Publications

Publications by João Gama

2008

Online reliability estimates for individual predictions in data streams

Authors
Rodrigues, PP; Gama, J; Bosnic, Z;

Publication
Proceedings - IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008

Abstract
Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is most of the time defined by their average accuracy which has low or no information at all about the estimated error of each individual prediction. In many sensitive applications, users should be allowed to associate a measure of reliability to each prediction. In the case of batch systems, reliability measures have already been defined, mostly empirical measures as the estimation using the local sensitivity analysis. However, with the advent of data streams, these reliability estimates should also be computed online, based only on available data and current model's state. In this paper we define empirical measures to perform online estimation of reliability of individual predictions when made in the context of online learning systems. We present preliminary results and evaluate the estimators in two different problems. © 2008 IEEE.

2004

On data and algorithms: Understanding inductive performance

Authors
Kalousis, A; Gama, J; Hilario, M;

Publication
MACHINE LEARNING

Abstract
In this paper we address two symmetrical issues, the discovery of similarities among classification algorithms, and among datasets. Both on the basis of error measures, which we use to define the error correlation between two algorithms, and determine the relative performance of a list of algorithms. We use the first to discover similarities between learners, and both of them to discover similarities between datasets. The latter sketch maps on the dataset space. Regions within each map exhibit specific patterns of error correlation or relative performance. To acquire an understanding of the factors determining these regions we describe them using simple characteristics of the datasets. Descriptions of each region are given in terms of the distributions of dataset characteristics within it.

2011

Learning model trees from evolving data streams

Authors
Ikonomovska, E; Gama, J; Dzeroski, S;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The problem of real-time extraction of meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. Regression in time-changing data streams is a relatively unexplored topic, despite the apparent applications. This paper proposes an efficient and incremental stream mining algorithm which is able to learn regression and model trees from possibly unbounded, high-speed and time-changing data streams. The algorithm is evaluated extensively in a variety of settings involving artificial and real data. To the best of our knowledge there is no other general purpose algorithm for incremental learning regression/model trees able to perform explicit change detection and informed adaptation. The algorithm performs online and in real-time, observes each example only once at the speed of arrival, and maintains at any-time a ready-to-use model tree. The tree leaves contain linear models induced online from the examples assigned to them, a process with low complexity. The algorithm has mechanisms for drift detection and model adaptation, which enable it to maintain accurate and updated regression models at any time. The drift detection mechanism exploits the structure of the tree in the process of local change detection. As a response to local drift, the algorithm is able to update the tree structure only locally. This approach improves the any-time performance and greatly reduces the costs of adaptation.

2008

A review on the combination of binary classifiers in multiclass problems

Authors
Lorena, AC; de Carvalho, ACPLF; Gama, JMP;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.

2011

Best papers from the Fifth International Conference on Advanced Data Mining and Applications (ADMA 2009)

Authors
Pei, JA; Gama, J; Yang, QA; Huang, RH; Li, X;

Publication
KNOWLEDGE AND INFORMATION SYSTEMS

2007

An overview on learning from data streams - Preface

Authors
Gama, J; Rodrigues, P; Aguilar Ruiz, J;

Publication
NEW GENERATION COMPUTING
