Publications

Publications by João Gama

2009

Decision Trees Using the Minimum Entropy-of-Error Principle

Authors
Marques de Sa, JPM; Gama, J; Sebastiao, R; Alexandre, LA;

Publication
COMPUTER ANALYSIS OF IMAGES AND PATTERNS, PROCEEDINGS

Abstract
Binary decision trees based on univariate splits have traditionally employed so-called impurity functions as a means of searching for the best node splits. Such functions use estimates of the class distributions. In the present paper we introduce a new concept to binary tree design: instead of working with the class distributions of the data we work directly with the distribution of the errors originated by the node splits. Concretely, we search for the best splits using a minimum entropy-of-error (MEE) strategy. This strategy has recently been applied in other areas (e.g. regression, clustering, blind source separation, neural network training) with success. We show that MEE trees are capable of producing good results with often simpler trees, have interesting generalization properties and in the many experiments we have performed they could be used without pruning.

CloseRead Abstract

2011

New Results on Minimum Error Entropy Decision Trees

Authors
Marques de Sa, JPM; Sebastiao, R; Gama, J; Fontes, T;

Publication
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS

Abstract
We present new results on the performance of Minimum Error Entropy (MEE) decision trees, which use a novel node split criterion. The results were obtained in a comparive study with popular alternative algorithms, on 42 real world datasets. Carefull validation and statistical methods were used. The evidence gathered from this body of results show that the error performance of MEE trees compares well with alternative algorithms. An important aspect to emphasize is that MEE trees generalize better on average without sacrifing error performance.

CloseRead Abstract

2004

Functional trees

Authors
Gama, J;

Publication
MACHINE LEARNING

Abstract
In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on a combination of attributes. In the regression setting, model trees algorithms explore multiple representation languages but using linear models at leaf nodes. In this work we study the effects of using combinations of attributes at decision nodes, leaf nodes, or both nodes and leaves in regression and classification tree learning. In order to study the use of functional nodes at different places and for different types of modeling, we introduce a simple unifying framework for multivariate tree learning. This framework combines a univariate decision tree with a linear function by means of constructive induction. Decision trees derived from the framework are able to use decision nodes with multivariate tests, and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. We experimentally evaluate a univariate tree, a multivariate tree using linear combinations at inner and leaf nodes, and two simplified versions restricting linear combinations to inner nodes and leaves. The experimental evaluation shows that all functional trees variants exhibit similar performance, with advantages in different datasets. In this study there is a marginal advantage of the full model. These results lead us to study the role of functional leaves and nodes. We use the bias-variance decomposition of the error, cluster analysis, and learning curves as tools for analysis. We observe that in the datasets under study and for classification and regression, the use of multivariate decision nodes has more impact in the bias component of the error, while the use of multivariate decision leaves has more impact in the variance component.

CloseRead Abstract

2003

Accurate decision trees for mining high-speed data streams

Authors
Gama, J; Rocha, R; Medas, P;

Publication
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Abstract
In this paper we study the problem of constructing accurate decision tree models from data streams. Data streams are incremental tasks that require incremental, online, and any-time learning algorithms. One of the most successful algorithms for mining data streams is VFDT. In this paper we extend the VFDT system in two directions: the ability to deal with continuous data and the use of more powerful classification techniques at tree leaves. The proposed system, VFDTc, can incorporate and classify new information online, with a single scan of the data, in time constant per example. The most relevant property of our system is the ability to obtain a performance similar to a standard decision tree algorithm even for medium size datasets. This is relevant due to the any-time property. We study the behaviour of VFDTc in different problems and demonstrate its utility in large and medium data sets. Under a bias-variance analysis we observe that VFDTc in comparison to C4.5 is able to reduce the variance component. Copyright 2003 ACM.

CloseRead Abstract

2006

Learning with local drift detection

Authors
Gama, J; Castillo, G;

Publication
ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS

Abstract
Most of the work in Machine Learning assume that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detection of changes in the probability distribution of examples. The idea behind the drift detection method is to monitor the online error-rate of a learning algorithm looking for significant deviations. The method can be used as a wrapper over any learning algorithm. In most problems, a change affects only some regions of the instance space, not the instance space as a whole. In decision models that fit different functions to regions of the instance space, like Decision Trees and Rule Learners, the method can be used to monitor the error in regions of the instance space, with advantages of fast model adaptation. In this work we present experiments using the method as a wrapper over a decision tree and a linear model, and in each internal-node of a decision tree. The experimental results obtained in controlled experiments using artificial data and a real-world problem show a good performance detecting drift and in adapting the decision model to the new concept.

CloseRead Abstract

2006

An adaptive prequential learning framework for Bayesian Network Classifiers

Authors
Castillo, G; Gama, J;

Publication
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS

Abstract
We introduce an adaptive prequential learning framework for Bayesian Network Classifiers which attempts to handle the cost-performance trade-off and cope with concept drift. Our strategy for incorporating new data is based on bias management and gradual adaptation. Starting with the simple Naive Bayes, we scale up the complexity by gradually increasing the maximum number of allowable attribute dependencies, and then by searching for new dependences in the extended search space. Since updating the structure is a costly task, we use new data to primarily adapt the parameters and only if this is really necessary, do we adapt the structure. The method for handling concept drift is based on the Shewhart P-Chart. We evaluated our adaptive algorithms on artificial domains and benchmark problems and show its advantages and future applicability in real-world on-line learning systems.

CloseRead Abstract