
Publications by João Gama

2007

Change detection in learning histograms from data streams

Authors
Sebastiao, R; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
In this paper we study the problem of constructing histograms from high-speed, time-changing data streams. Learning in this context requires the ability to process examples once, at the rate they arrive, maintaining a histogram consistent with the most recent data and forgetting outdated data whenever a change in the distribution is detected. To construct histograms from high-speed data streams we use the two-layer structure of the Partition Incremental Discretization (PiD) algorithm. Our contribution is a new method to detect whenever a change occurs in the distribution generating the examples. The basic idea consists of monitoring distributions from two different time windows: the reference time window, which reflects the distribution observed in the past, and the current time window, which reflects the distribution observed in the most recent data. We compare both distributions and signal a change whenever their divergence exceeds a threshold value, using three different measures: the Entropy Absolute Difference, the Kullback-Leibler divergence, and the Cosine Distance. The experimental results suggest that the Kullback-Leibler divergence exhibits a high probability of change detection and fast detection rates, with few false alarms.
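The two-window comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the threshold value, the smoothing constant, and the direction of the divergence are assumptions made for the example.

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence between two discrete distributions
    defined over the same histogram bins (eps avoids log of zero)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def detect_change(reference_counts, current_counts, threshold=0.1):
    """Signal a change when KL(current || reference) exceeds a threshold.

    Bin counts from the reference and current time windows are first
    normalized into probability distributions.
    """
    q = [c / sum(reference_counts) for c in reference_counts]
    p = [c / sum(current_counts) for c in current_counts]
    return kl_divergence(p, q) > threshold

# Same distribution in both windows -> no change signalled;
# a shifted distribution in the current window -> change signalled.
reference = [50, 30, 20]
print(detect_change(reference, [48, 32, 20]))  # False
print(detect_change(reference, [10, 20, 70]))  # True
```

The Entropy Absolute Difference and Cosine Distance variants mentioned in the abstract would slot in as drop-in replacements for `kl_divergence`, each with its own threshold.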

2003

Iterative Bayes

Authors
Gama, J;

Publication
THEORETICAL COMPUTER SCIENCE

Abstract
Naive Bayes is a well-known and studied algorithm both in statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes. Those tables are then iteratively updated in order to improve the class probability distribution associated with each training example. In this paper we argue that Iterative Bayes minimizes a quadratic loss function instead of the 0-1 loss function that usually applies to classification problems. Experimental evaluation of Iterative Bayes on 27 benchmark data sets shows consistent gains in accuracy. An interesting side effect of our algorithm is that it proves to be robust to attribute dependencies.
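The idea of refining the count tables can be sketched as follows. This is a simplified illustration under stated assumptions: the update here only nudges the true class's counts up in proportion to the prediction error, whereas the update rule in the paper is more involved (it also adjusts the competing classes), so this is not a faithful reimplementation.

```python
import math

class IterativeNaiveBayes:
    """Naive Bayes over discrete attributes, with an iterative
    refinement sweep over the training set (simplified sketch)."""

    def __init__(self, n_classes, n_values_per_attr):
        self.n_classes = n_classes
        # Laplace-style initialization of priors and contingency tables.
        self.class_counts = [1.0] * n_classes
        self.tables = [[[1.0] * nv for nv in n_values_per_attr]
                       for _ in range(n_classes)]

    def _posterior(self, x):
        """Class probabilities for one example, via log-sum-exp."""
        logp = [math.log(self.class_counts[c] / sum(self.class_counts))
                for c in range(self.n_classes)]
        for c in range(self.n_classes):
            for a, v in enumerate(x):
                table = self.tables[c][a]
                logp[c] += math.log(table[v] / sum(table))
        m = max(logp)
        exp = [math.exp(lp - m) for lp in logp]
        z = sum(exp)
        return [e / z for e in exp]

    def fit(self, X, y, sweeps=10):
        # Standard naive Bayes counting pass.
        for x, c in zip(X, y):
            self.class_counts[c] += 1
            for a, v in enumerate(x):
                self.tables[c][a][v] += 1
        # Iterative refinement: bump the true class's entries by the
        # prediction error, so confident correct predictions change
        # the tables little and poor ones change them more.
        for _ in range(sweeps):
            for x, c in zip(X, y):
                err = 1.0 - self._posterior(x)[c]
                for a, v in enumerate(x):
                    self.tables[c][a][v] += err

    def predict(self, x):
        p = self._posterior(x)
        return p.index(max(p))
```

A usage example: `IterativeNaiveBayes(2, [2, 2])` builds a two-class model over two binary attributes; `fit` then counts and refines, and `predict` returns the most probable class index.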

1999

Iterative naive Bayes

Authors
Gama, J;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
Naive Bayes is a well-known and studied algorithm both in statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes. Those tables are then iteratively updated in order to improve the class probability distribution associated with each training example. Experimental evaluation of Iterative Bayes on 25 benchmark datasets shows consistent gains in accuracy. An interesting side effect of our algorithm is that it proves to be robust to attribute dependencies.

2006

Discretization from data streams: Applications to histograms and data mining

Authors
Gama, J; Pinto, C;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
In this paper we propose a new method to perform incremental discretization. The basic idea is to perform the task in two layers. The first layer receives the sequence of input data and keeps some statistics on the data using many more intervals than required. Based on the statistics stored by the first layer, the second layer creates the final discretization. The proposed architecture processes streaming examples in a single scan, in constant time and space, even for infinite sequences of examples. We experimentally demonstrate that incremental discretization is able to maintain the performance of learning algorithms in comparison to a batch discretization. The proposed method is much more appropriate for incremental learning and for problems where data flows continuously, as in most recent data mining applications.
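The two-layer architecture can be sketched as follows. The fine-bin count, the fixed value range, and the equal-frequency policy for the second layer are illustrative assumptions, not the paper's exact rules.

```python
class TwoLayerDiscretizer:
    """Sketch of two-layer incremental discretization.

    Layer 1 keeps counts over many fine equal-width bins, updated in
    constant time per example; layer 2 derives a coarser equal-frequency
    discretization from those counts on demand.
    """

    def __init__(self, lo, hi, n_fine=100):
        self.lo = lo
        self.n_fine = n_fine
        self.width = (hi - lo) / n_fine
        self.counts = [0] * n_fine

    def update(self, x):
        """Layer 1: route the example to its fine bin (constant time)."""
        i = int((x - self.lo) / self.width)
        i = min(max(i, 0), self.n_fine - 1)  # clamp out-of-range values
        self.counts[i] += 1

    def final_cut_points(self, k):
        """Layer 2: k-interval equal-frequency cuts from the fine counts."""
        total = sum(self.counts)
        target = total / k
        cuts, acc = [], 0
        for i, c in enumerate(self.counts):
            acc += c
            if acc >= target and len(cuts) < k - 1:
                cuts.append(self.lo + (i + 1) * self.width)
                acc = 0
        return cuts

# Feeding a uniform stream over [0, 100) yields evenly spaced cuts.
d = TwoLayerDiscretizer(0, 100)
for x in range(100):
    d.update(x)
print(d.final_cut_points(4))  # [25.0, 50.0, 75.0]
```

The key property the abstract claims is visible here: `update` touches one counter per example, so memory and per-example time stay constant no matter how long the stream runs.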

2008

Special track on data streams

Authors
Gama, J; Carvalho, A; Aguilar-Ruiz, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract

2011

Clustering data streams with weightless neural networks

Authors
Cardoso, DO; Lima, PMV; De Gregorio, M; Gama, J; Franca, FMG;

Publication
ESANN 2011 proceedings, 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning

Abstract
Producing good quality clustering of data streams in real time is a difficult problem, since it is necessary to analyse data points arriving continuously, with the support of quite limited computational resources. The incremental and evolving nature of the resulting clustering structures must reflect the dynamics of the target data stream. The WiSARD weightless perceptron, and its associated DRASiW extension, are intrinsically capable of, respectively, performing one-shot learning and producing prototypes of the learnt categories. This work introduces a simple generalization of RAM-based neurons in order to apply both weightless neural models to the data stream clustering problem.
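The RAM-based neurons underlying WiSARD can be sketched as follows. This is a simplified illustration, not the paper's generalization: the random tuple mapping, the sparse dictionary RAMs, and the counter-based storage (a DRASiW-style extension of the original single-bit RAMs) are assumptions made for the example.

```python
import random

class Discriminator:
    """Sketch of a WiSARD-style discriminator with counting RAM nodes.

    The binary input is split into fixed-size tuples; each tuple
    addresses one RAM node. Storing counters instead of single bits
    (as in DRASiW) is what allows prototypes of the learnt categories
    to be recovered from the RAM contents.
    """

    def __init__(self, input_bits, tuple_size, seed=0):
        rng = random.Random(seed)
        self.mapping = list(range(input_bits))
        rng.shuffle(self.mapping)  # random partition of the input bits
        self.tuple_size = tuple_size
        self.rams = [{} for _ in range(input_bits // tuple_size)]

    def _addresses(self, bits):
        """Yield (ram index, address tuple) for one binary input."""
        for r in range(len(self.rams)):
            chunk = self.mapping[r * self.tuple_size:(r + 1) * self.tuple_size]
            yield r, tuple(bits[i] for i in chunk)

    def train(self, bits):
        """One-shot learning: bump the counter at each addressed position."""
        for r, addr in self._addresses(bits):
            self.rams[r][addr] = self.rams[r].get(addr, 0) + 1

    def score(self, bits):
        """Number of RAM nodes that have seen this pattern's address."""
        return sum(1 for r, addr in self._addresses(bits)
                   if self.rams[r].get(addr, 0) > 0)
```

In the classification setting one discriminator is trained per category and an input is assigned to the highest-scoring one; clustering a stream, as in this work, requires deciding online when an arriving pattern warrants a new discriminator.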
