
Publications by João Gama

1999

Linear tree

Authors
Gama, J; Brazdil, P;

Publication
Intelligent Data Analysis

Abstract
In this paper we present the system Ltree for propositional supervised learning. Ltree is able to define decision surfaces both orthogonal and oblique to the axes defined by the attributes of the input space. This is done by combining a decision tree with a linear discriminant by means of constructive induction. At each decision node, Ltree defines a new instance space by inserting new attributes that are the projections of the examples falling at that node onto the hyper-planes given by a linear discriminant function. This new instance space is propagated down through the tree, and tests based on these new attributes are oblique with respect to the original input space. Ltree is a probabilistic tree in the sense that it outputs a class probability distribution for each query example. The class probability distribution is computed at learning time, taking into account the different class distributions on the path from the root to the current node. We have carried out experiments on twenty-one benchmark datasets and compared our system with other well-known decision tree systems (orthogonal and oblique) such as C4.5, OC1, LMDT, and CART. On these datasets we observed that our system has advantages with respect to accuracy and learning time at statistically significant confidence levels.
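The constructive-induction step the abstract describes can be sketched as follows. This is a minimal illustration, not Ltree's exact procedure: the function name `augment_with_discriminant` and the Fisher-style shared-covariance discriminant are assumptions made for the sketch.

```python
import numpy as np

def augment_with_discriminant(X, y):
    """At a decision node, fit a linear discriminant and append each
    example's projections onto the discriminant hyper-planes as new
    attributes (constructive induction).  Univariate tree tests on
    these new attributes are oblique in the original input space."""
    classes = np.unique(y)
    # per-class means and a shared covariance (Fisher-style sketch);
    # a small ridge keeps the covariance matrix invertible
    mu = np.array([X[y == k].mean(axis=0) for k in classes])
    sigma = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])
    W = np.linalg.solve(sigma, mu.T)   # one weight vector per class
    projections = X @ W                # the new oblique attributes
    return np.hstack([X, projections])
```

A standard univariate split on any of the appended columns then corresponds to an oblique hyper-plane in the original attribute space.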

2010

Drift Severity Metric

Authors
Kosina, P; Gama, J; Sebastiao, R;

Publication
ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE

Abstract
Concept drift in data is usually characterized only as abrupt or gradual, referring to the speed of change. Such a simple distinction by speed is sufficient for most problems, but there are situations in which a finer representation would be useful. This paper studies the phenomenon of concept drift further and introduces a simple measure that reflects both the speed and the amount of change between different concepts.
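One common way to quantify the amount of change between two concepts is the fraction of the instance space on which they disagree. The sketch below is an illustrative disagreement-based estimate of this kind, not the specific metric proposed in the paper; the names `drift_severity`, `old`, and `new` are assumptions.

```python
def drift_severity(old_concept, new_concept, sample):
    """Estimate the amount of change as the fraction of sampled
    instances on which the old and new concepts assign different
    labels (illustrative estimate, not the paper's metric)."""
    disagreements = sum(
        1 for x in sample if old_concept(x) != new_concept(x)
    )
    return disagreements / len(sample)

# hypothetical concepts: a decision threshold shifts from 0.5 to 0.7
old = lambda x: x > 0.5
new = lambda x: x > 0.7
sample = [i / 10 for i in range(10)]
print(drift_severity(old, new, sample))  # → 0.2
```

A severity near 0 then indicates a small change regardless of its speed, while a value near 1 indicates that the concepts differ over most of the instance space.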

2007

OLINDDA

Authors
Spinosa, EJ; de Leon F. de Carvalho, AP; Gama, J;

Publication
Proceedings of the 2007 ACM symposium on Applied computing - SAC '07

Abstract

2007

Knowledge discovery from data streams

Authors
Gama, J; Aguilar Ruiz, J;

Publication
Intelligent Data Analysis

Abstract

2008

The dimension of ECOCs for multiclass classification problems

Authors
Pimenta, E; Gama, J; Carvalho, A;

Publication
International Journal on Artificial Intelligence Tools

Abstract
Several classification problems involve more than two classes. These problems are known as multiclass classification problems. One approach to dealing with multiclass problems is to decompose them into a set of binary problems, and recent work shows important advantages of this approach. Several strategies have been proposed for this decomposition; the most frequently used are All-vs-All, One-vs-All and Error-Correcting Output Codes (ECOC). ECOCs are based on binary words (codewords) and have been adapted to deal with multiclass problems. To do so, they must comply with a number of specific constraints. Different dimensions may be adopted for the codewords for each number of classes in the problem, and these dimensions grow exponentially with the number of classes present in a dataset. Two methods to choose the dimension of an ECOC, which ensure a good trade-off between redundancy and error-correcting capacity, are proposed in this paper. The proposed methods are evaluated on a set of benchmark classification problems. Experimental results show that they are competitive with other multiclass decomposition methods.
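The codeword machinery behind ECOCs can be illustrated with a small sketch. The 4-class exhaustive code and the `ecoc_decode` helper below are illustrative assumptions for the sketch; they are not the dimension-selection methods proposed in the paper.

```python
import numpy as np

# exhaustive code for 4 classes: codewords of length 2^(4-1) - 1 = 7
# with minimum Hamming distance 4, so any single-bit error (one wrong
# binary classifier) is corrected
CODEWORDS = np.array([
    [1, 1, 1, 1, 1, 1, 1],   # class 0
    [0, 0, 0, 0, 1, 1, 1],   # class 1
    [0, 0, 1, 1, 0, 0, 1],   # class 2
    [0, 1, 0, 1, 0, 1, 0],   # class 3
])

def ecoc_decode(codewords, predictions):
    """Pick the class whose codeword is nearest, in Hamming distance,
    to the vector of binary-classifier outputs."""
    distances = (np.asarray(codewords) != np.asarray(predictions)).sum(axis=1)
    return int(np.argmin(distances))

# the last binary classifier flips a bit of class 2's codeword;
# decoding still recovers class 2
print(ecoc_decode(CODEWORDS, [0, 0, 1, 1, 0, 0, 0]))  # → 2
```

Longer codewords increase the minimum distance and hence the number of correctable classifier errors, at the cost of training more binary classifiers, which is the redundancy vs. error-correction trade-off the paper's methods address.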

2007

Change detection in learning histograms from data streams

Authors
Sebastiao, R; Gama, J;

Publication
Progress in Artificial Intelligence, Proceedings

Abstract
In this paper we study the problem of constructing histograms from high-speed, time-changing data streams. Learning in this context requires the ability to process each example once, at the rate it arrives, maintaining a histogram consistent with the most recent data and forgetting out-of-date data whenever a change in the distribution is detected. To construct histograms from high-speed data streams we use the two-layer structure of the Partition Incremental Discretization (PiD) algorithm. Our contribution is a new method to detect whenever a change occurs in the distribution generating the examples. The basic idea consists of monitoring distributions from two different time windows: the reference time window, which reflects the distribution observed in the past, and the current time window, which reflects the distribution observed in the most recent data. We compare both distributions and signal a change whenever the distance between them exceeds a threshold value, using three different measures: the Entropy Absolute Difference, the Kullback-Leibler divergence and the Cosine Distance. The experimental results suggest that the Kullback-Leibler divergence exhibits a high probability of change detection and faster detection rates, with few false-positive alarms.
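The window-comparison step can be sketched as follows, assuming the two windows' histograms are available as count vectors. The function names, the smoothing constant, and the threshold value are illustrative assumptions; the PiD two-layer structure itself is not reproduced here.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence between two histogram count
    vectors, smoothed so empty bins do not produce log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def cosine_distance(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 1.0 - float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def entropy_abs_difference(p, q, eps=1e-10):
    def entropy(x):
        x = np.asarray(x, dtype=float) + eps
        x = x / x.sum()
        return -float(np.sum(x * np.log(x)))
    return abs(entropy(p) - entropy(q))

def change_detected(reference, current, threshold=0.5,
                    distance=kl_divergence):
    """Signal a change when the distance between the reference-window
    and current-window distributions exceeds the threshold."""
    return distance(reference, current) > threshold
```

Any of the three measures can be passed as `distance`; on detection the reference window would be reset to the current one so the histogram forgets the out-of-date distribution.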
