Publications

Publications by João Gama

2011

Incremental multi-target model trees for data streams

Authors
Ikonomovska, E; Gama, J; Dzeroski, S;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
As in batch learning, one may identify a class of streaming real-world problems which require the modeling of several targets simultaneously. Due to the dependencies among the targets, simultaneous modeling can be more successful and informative than creating independent models for each target. As a result one may obtain a smaller model able to simultaneously explain the relations between the input attributes and the targets. This problem has not been addressed previously in the streaming setting. We propose an algorithm for inducing multi-target model trees with low computational complexity, based on the principles of predictive clustering trees and probability bounds for supporting splitting decisions. Linear models are computed for each target separately, by incremental training of perceptrons in the leaves of the tree. Experiments are performed on synthetic and real-world datasets. The multi-target regression tree algorithm produces equally accurate and smaller models for simultaneous prediction of all the target attributes, as compared to a set of independent regression trees built separately for each target attribute. When the regression surface is smooth, the linear models computed in the leaves significantly improve the accuracy for all of the targets. © 2011 ACM.

CloseRead Abstract

2010

The next generation of transportation systems, greenhouse emissions, and data mining

Authors
Kargupta, H; Gama, J; Fan, W;

Publication
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Abstract

2009

Issues in Evaluation of Stream Learning Algorithms

Authors
Gama, J; Sebastiao, R; Rodrigues, PP;

Publication
KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING

Abstract
Learning from data streams is a research area of increasing importance. Nowadays, several stream learning algorithms have been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, detect and react to changes in the environment generating data. One important issue, not yet conveniently addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. There are no golden standards for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of Predictive Sequential methods for error estimate - the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time. Nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimators we need some forgetting mechanism. Two viable alternatives are: sliding windows and fading factors. We observe that the prequential error converges to an holdout estimator when estimated over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i) assessing performance of a learning algorithm; ii) comparing learning algorithms; iii) hypothesis testing using McNemar test; and iv) change detection using Page-Hinkley test. In these tasks, the prequential error estimated using fading factors provide reliable estimators. In comparison to sliding windows, fading factors are faster and memory-less, a requirement for streaming applications. This paper is a contribution to a discussion in the good-practices on performance assessment when learning dynamic models that evolve over time.

CloseRead Abstract

2007

Incremental discretization, application to data with concept drift

Authors
Pinto, C; Gama, J;

Publication
APPLIED COMPUTING 2007, VOL 1 AND 2

Abstract
In this paper we present a method for incremental discretization able to be adapted to gradual changes in the target concept. The proposed method is based on the Partition incremental Discretization (PiD for short). The algorithm divides the discretization task in two layers. The first layer receives the sequence of input data and retains some statistics of the data using more intervals than required. The second layer computes the final discretization, based in the statistics stored by the first layer. The method is able to process streaming examples in a single scan, in constant time and space even for infinite sequences of examples. In dynamic environments the target concept can gradually change over time. Past examples may not reflect the actual status of the problem. To accommodate concept drift we use an exponential decay that smoothly reduces the importance of older examples. Experimental evaluation on a benchmark problem for drift environments, clearly illustrates the benefits of the weighting examples technique.

CloseRead Abstract

2012

A density-based clustering approach for behavior change detection in data streams

Authors
Vallim, RMM; Filho, JAA; Carvalho, ACPLF; Gama, J;

Publication
Proceedings - Brazilian Symposium on Neural Networks, SBRN

Abstract
Mining data streams poses many challenges to existing Machine Learning algorithms. Algorithms designed to learn in this scenario need to constantly update their decision models in accordance with current data behavior. Therefore, the ability to detect when the behavior of the stream is changing is an important feature of any learning technique approaching data streams. This work is concerned with unsupervised behavior change detection. It suggests the use of density-based clustering and an entropy measurement for change detection that is independent of the number and format of clusters. The proposed approach uses a modified version of the Den Stream algorithm that is designed to better cope with the entropy calculation. Experimental results using synthetic data provide insight on how clustering and novelty detection algorithms can be used for change detection in data streams. © 2012 IEEE.

CloseRead Abstract

2011

Data stream mining algorithms for building decision models in a computer roleplaying game simulation

Authors
Vallim, RMM; De Carvalho, ACPLF; Gama, J;

Publication
Proceedings - 2010 Brazilian Symposium on Games and Digital Entertainment, SBGames 2010

Abstract
Computer games are attracting increasing interest in the Artificial Intelligence (AI) research community, mainly because games involve reasoning, planning and learning [1]. One area of particular interest in the last years is the creation of adaptive game AI. Adaptive game AI is the implementation of AI in computer games that holds the ability to adapt to changing circumstances, i.e., to exhibit adaptive behavior during the play. This kind of adaptation can be created using Machine Learning techniques, such as neural networks, reinforcement learning and bioinspired methods. In order to learn online, a system needs to overcome the main difficulties imposed by games: processing time and memory requirements. Learning in a game needs to be fast and the memory available is usually not enough to store a large number of training examples to a traditional Machine Learning technique. In this context, methods for mining data streams seem to be a natural approach. Data streams are, by definition, sequences of training examples that arrive over time [2]. In the data stream scenario, algorithms are usually incremental and capable of adapting the decision model when a change in the distribution of the training examples is detected. One particularly interesting algorithm for mining data streams is the Very Fast Decision Tree (VFDT) [3]. VFDTs are incremental decision trees designed specifically to meet the data stream problem requirements. In this paper, we analyse the use of VFDTs in the task of learning in a Computer RolePlaying Game context. First, we simulate data from manually designed tactics for a Computer RolePlaying Game, based on Spronck's static tactics [4], and test the suitability of VFDT to rapid learn these tactics. Afterwards, we conduct an experiment in order to simulate a data stream of examples where changes of tactics occur over time, and analyse how VFDT and some of its variations respond to these changes in the target concept. © 2010 IEEE.

CloseRead Abstract