Publications - INESC TEC

Publications by João Gama

2009

Issues in Evaluation of Stream Learning Algorithms

Authors
Gama, J; Sebastiao, R; Rodrigues, PP

Publication
KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING

Abstract
Learning from data streams is a research area of increasing importance. Several stream learning algorithms have now been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet adequately addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. There are no gold standards for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of Predictive Sequential methods for error estimation: the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time. Nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimators we need some forgetting mechanism. Two viable alternatives are sliding windows and fading factors. We observe that the prequential error converges to a holdout estimator when estimated over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i) assessing the performance of a learning algorithm; ii) comparing learning algorithms; iii) hypothesis testing using the McNemar test; and iv) change detection using the Page-Hinkley test. In these tasks, the prequential error estimated using fading factors provides reliable estimates. In comparison to sliding windows, fading factors are faster and memoryless, a requirement for streaming applications. This paper contributes to the discussion of good practices in performance assessment when learning dynamic models that evolve over time.
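The two forgetting mechanisms named in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used recursive form of the faded estimate, E_i = S_i / N_i with S_i = L_i + alpha * S_(i-1) and N_i = 1 + alpha * N_(i-1), where L_i is the loss on the i-th example. The class names, the default alpha, and the window size are illustrative assumptions, not values taken from the paper.

```python
from collections import deque


class FadingPrequentialError:
    """Prequential error with a fading factor (assumed recursive form)."""

    def __init__(self, alpha=0.99):
        # alpha is a hypothetical default; alpha = 1.0 means no forgetting.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.loss_sum = 0.0  # faded sum of losses, S_i
        self.count = 0.0     # faded number of examples, N_i

    def update(self, loss):
        # Only two numbers are stored, regardless of stream length.
        self.loss_sum = loss + self.alpha * self.loss_sum
        self.count = 1.0 + self.alpha * self.count
        return self.loss_sum / self.count


class WindowPrequentialError:
    """Prequential error over a sliding window of the most recent losses."""

    def __init__(self, size=1000):
        # The window must keep `size` losses in memory.
        self.window = deque(maxlen=size)

    def update(self, loss):
        self.window.append(loss)
        return sum(self.window) / len(self.window)
```

In a test-then-train loop, each example is first predicted, the resulting 0/1 loss is passed to `update`, and only then is the model trained on that example. The faded estimator keeps two floats, whereas the windowed one keeps the whole window, which matches the abstract's remark that fading factors are memoryless.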


2007

Incremental discretization, application to data with concept drift

Authors
Pinto, C; Gama, J

Publication
APPLIED COMPUTING 2007, VOL 1 AND 2

Abstract
In this paper we present a method for incremental discretization that can adapt to gradual changes in the target concept. The proposed method is based on Partition Incremental Discretization (PiD for short). The algorithm divides the discretization task into two layers. The first layer receives the sequence of input data and retains some statistics of the data using more intervals than required. The second layer computes the final discretization, based on the statistics stored by the first layer. The method is able to process streaming examples in a single scan, in constant time and space, even for infinite sequences of examples. In dynamic environments the target concept can gradually change over time. Past examples may not reflect the current status of the problem. To accommodate concept drift we use an exponential decay that smoothly reduces the importance of older examples. Experimental evaluation on a benchmark problem for drift environments clearly illustrates the benefits of the example-weighting technique.
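The two-layer idea with exponential decay can be sketched as follows. This is a simplified illustration, not the PiD algorithm itself: it assumes a known value range, a fixed number of equal-width fine intervals in the first layer, and equal-frequency cut points in the second layer. The class name, parameters, and defaults are all hypothetical.

```python
class DecayedTwoLayerDiscretizer:
    """Simplified two-layer discretizer with exponentially decayed counts."""

    def __init__(self, low, high, n_fine=100, decay=0.999):
        # Assumption: the attribute range [low, high] is known in advance.
        self.low = low
        self.n_fine = n_fine
        self.decay = decay
        self.width = (high - low) / n_fine
        self.counts = [0.0] * n_fine  # layer 1: many fine-grained intervals

    def update(self, x):
        # Decay every count so that older examples weigh less, then add
        # the new example to its interval. Cost depends on n_fine only,
        # not on the number of examples seen so far.
        self.counts = [c * self.decay for c in self.counts]
        idx = int((x - self.low) / self.width)
        idx = min(max(idx, 0), self.n_fine - 1)
        self.counts[idx] += 1.0

    def cut_points(self, k):
        # Layer 2: merge the fine intervals into k equal-frequency bins,
        # using only the decayed counts kept by layer 1.
        total = sum(self.counts)
        if total == 0.0:
            return []
        target = total / k
        cuts, cum, next_q = [], 0.0, 1
        for i, c in enumerate(self.counts):
            cum += c
            if next_q < k and cum >= target * next_q:
                cuts.append(self.low + (i + 1) * self.width)
                # Skip quantiles already covered by this interval.
                while next_q < k and cum >= target * next_q:
                    next_q += 1
        return cuts
```

With `decay` close to 1 the counts change slowly; smaller values forget old examples faster, so the cut points follow the recent distribution of the data.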


2012

A density-based clustering approach for behavior change detection in data streams

Authors
Vallim, RMM; Filho, JAA; Carvalho, ACPLF; Gama, J

Publication
Proceedings - Brazilian Symposium on Neural Networks, SBRN

Abstract
Mining data streams poses many challenges to existing Machine Learning algorithms. Algorithms designed to learn in this scenario need to constantly update their decision models in accordance with current data behavior. Therefore, the ability to detect when the behavior of the stream is changing is an important feature of any learning technique for data streams. This work is concerned with unsupervised behavior change detection. It suggests the use of density-based clustering and an entropy measurement for change detection that is independent of the number and shape of clusters. The proposed approach uses a modified version of the DenStream algorithm that is designed to better cope with the entropy calculation. Experimental results using synthetic data provide insight into how clustering and novelty detection algorithms can be used for change detection in data streams. © 2012 IEEE.


2011

Data stream mining algorithms for building decision models in a computer roleplaying game simulation

Authors
Vallim, RMM; De Carvalho, ACPLF; Gama, J

Publication
Proceedings - 2010 Brazilian Symposium on Games and Digital Entertainment, SBGames 2010

Abstract
Computer games are attracting increasing interest in the Artificial Intelligence (AI) research community, mainly because games involve reasoning, planning and learning [1]. One area of particular interest in recent years is the creation of adaptive game AI. Adaptive game AI is the implementation of AI in computer games that has the ability to adapt to changing circumstances, i.e., to exhibit adaptive behavior during play. This kind of adaptation can be created using Machine Learning techniques, such as neural networks, reinforcement learning and bioinspired methods. In order to learn online, a system needs to overcome the main difficulties imposed by games: processing time and memory requirements. Learning in a game needs to be fast, and the memory available is usually not enough to store a large number of training examples for a traditional Machine Learning technique. In this context, methods for mining data streams seem to be a natural approach. Data streams are, by definition, sequences of training examples that arrive over time [2]. In the data stream scenario, algorithms are usually incremental and capable of adapting the decision model when a change in the distribution of the training examples is detected. One particularly interesting algorithm for mining data streams is the Very Fast Decision Tree (VFDT) [3]. VFDTs are incremental decision trees designed specifically to meet the requirements of the data stream problem. In this paper, we analyse the use of VFDTs in the task of learning in a Computer RolePlaying Game context. First, we simulate data from manually designed tactics for a Computer RolePlaying Game, based on Spronck's static tactics [4], and test the suitability of VFDT to rapidly learn these tactics. Afterwards, we conduct an experiment in order to simulate a data stream of examples where changes of tactics occur over time, and analyse how VFDT and some of its variations respond to these changes in the target concept. © 2010 IEEE.
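The abstract does not describe how a VFDT decides when to grow. As background, VFDT is usually described as using the Hoeffding bound to decide when enough examples have been seen at a leaf to split on the best attribute. The sketch below illustrates only that test; the function names and parameter values are illustrative assumptions and are not taken from the paper.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    # Split when the observed advantage of the best attribute over the
    # runner-up exceeds the bound, i.e. the ranking is unlikely to change
    # with more examples (with probability at least 1 - delta).
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because the bound shrinks as `n` grows, a leaf that cannot yet justify a split simply waits for more examples, which is what allows the tree to learn incrementally without storing the stream.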


2012

Mobile data stream mining: From algorithms to applications

Authors
Krishnaswamy, S; Gama, J; Gaber, MM

Publication
Proceedings - 2012 IEEE 13th International Conference on Mobile Data Management, MDM 2012

Abstract
This paper presents an overview of the current state of the art in mobile data stream mining. Mobile data stream mining is significant for a number of new application domains such as mobile crowd sensing and mobile activity recognition. The paper presents the strategies and techniques for adaptation that are essential in order to perform real-time, continuous data mining on mobile devices. We present an overview of the algorithms research in this area. Finally, we discuss the key toolkits, systems and applications of mobile data stream mining. © 2012 IEEE.


2008

RUSE-WARMR: Rule Selection for Classifier Induction in Multi-Relational Data-Sets

Authors
Ferreira, CA; Gama, J; Costa, VS

Publication
20TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL 1, PROCEEDINGS

Abstract
One of the major challenges in knowledge discovery is how to extract meaningful and useful knowledge from the complex structured data that one finds in scientific and technological applications. One approach is to explore the logic relations in the database and, using, say, an Inductive Logic Programming (ILP) algorithm, find descriptive and expressive patterns. These patterns can then be used as features to characterize the target concept. The effectiveness of these algorithms depends both upon the algorithm we use to generate the patterns and upon the classifier. Rule mining provides an excellent framework for efficiently mining the interesting patterns that are relevant. We propose a novel method to select discriminative patterns and evaluate the effectiveness of this method on a complex discovery application of practical interest.
