
Publications by João Gama

1999

Linear tree

Authors
Gama, J; Brazdil, P;

Publication
Intelligent Data Analysis

Abstract
In this paper we present the system Ltree for propositional supervised learning. Ltree is able to define decision surfaces both orthogonal and oblique to the axes defined by the attributes of the input space. This is done by combining a decision tree with a linear discriminant by means of constructive induction. At each decision node, Ltree defines a new instance space by inserting new attributes that are the projections of the examples falling at that node onto the hyper-planes given by a linear discriminant function. This new instance space is propagated down through the tree, and tests based on these new attributes are oblique with respect to the original input space. Ltree is a probabilistic tree in the sense that it outputs a class probability distribution for each query example. The class probability distribution is computed at learning time, taking into account the different class distributions on the path from the root to the current node. We have carried out experiments on twenty-one benchmark datasets and compared our system with other well-known decision tree systems (orthogonal and oblique) such as C4.5, OC1, LMDT, and CART. On these datasets we observed that our system has advantages with respect to accuracy and learning time at statistically significant confidence levels.
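The constructive-induction step the abstract describes can be sketched as follows. This is a minimal illustration, not Ltree's exact procedure: the function name `augment_with_discriminant` and the Fisher-style shared-covariance discriminant are assumptions made for the sketch.

```python
import numpy as np

def augment_with_discriminant(X, y):
    """At a decision node, fit a linear discriminant and append each
    example's projections onto the discriminant hyper-planes as new
    attributes (constructive induction).  Univariate tree tests on
    these new attributes are oblique in the original input space."""
    classes = np.unique(y)
    # per-class means and a shared covariance (Fisher-style sketch);
    # a small ridge keeps the covariance matrix invertible
    mu = np.array([X[y == k].mean(axis=0) for k in classes])
    sigma = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])
    W = np.linalg.solve(sigma, mu.T)   # one weight vector per class
    projections = X @ W                # the new oblique attributes
    return np.hstack([X, projections])
```

A standard univariate split on any of the appended columns then corresponds to an oblique hyper-plane in the original attribute space.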

2010

Drift Severity Metric

Authors
Kosina, P; Gama, J; Sebastiao, R;

Publication
ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE

Abstract
Concept drift in data is usually characterized only as abrupt or gradual, referring to the speed of change. Such a simple distinction by speed is sufficient for most problems, but there are situations in which a finer representation would be useful. This paper studies the phenomenon of concept drift further and introduces a simple measure that reflects both the speed and the amount of change between different concepts.
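One common way to quantify the amount of change between two concepts is the fraction of the instance space on which they disagree. The sketch below is an illustrative disagreement-based estimate of this kind, not the specific metric proposed in the paper; the names `drift_severity`, `old`, and `new` are assumptions.

```python
def drift_severity(old_concept, new_concept, sample):
    """Estimate the amount of change as the fraction of sampled
    instances on which the old and new concepts assign different
    labels (illustrative estimate, not the paper's metric)."""
    disagreements = sum(
        1 for x in sample if old_concept(x) != new_concept(x)
    )
    return disagreements / len(sample)

# hypothetical concepts: a decision threshold shifts from 0.5 to 0.7
old = lambda x: x > 0.5
new = lambda x: x > 0.7
sample = [i / 10 for i in range(10)]
print(drift_severity(old, new, sample))  # → 0.2
```

A severity near 0 then indicates a small change regardless of its speed, while a value near 1 indicates that the concepts differ over most of the instance space.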

2007

OLINDDA

Authors
Spinosa, EJ; de Leon F. de Carvalho, AP; Gama, J;

Publication
Proceedings of the 2007 ACM symposium on Applied computing - SAC '07

Abstract

2007

Knowledge discovery from data streams

Authors
Gama, J; Aguilar Ruiz, J;

Publication
Intelligent Data Analysis

Abstract

2008

The dimension of ECOCs for multiclass classification problems

Authors
Pimenta, E; Gama, J; Carvalho, A;

Publication
International Journal on Artificial Intelligence Tools

Abstract
Several classification problems involve more than two classes. These problems are known as multiclass classification problems. One approach to dealing with multiclass problems is to decompose them into a set of binary problems, and recent work shows important advantages of this approach. Several strategies have been proposed for this decomposition; the most frequently used are All-vs-All, One-vs-All and Error-Correcting Output Codes (ECOC). ECOCs are based on binary words (codewords) and have been adapted to deal with multiclass problems. To do so, they must comply with a number of specific constraints. Different dimensions may be adopted for the codewords for each number of classes in the problem, and these dimensions grow exponentially with the number of classes present in a dataset. Two methods to choose the dimension of an ECOC, which ensure a good trade-off between redundancy and error-correcting capacity, are proposed in this paper. The proposed methods are evaluated on a set of benchmark classification problems. Experimental results show that they are competitive with other multiclass decomposition methods.
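The codeword machinery behind ECOCs can be illustrated with a small sketch. The 4-class exhaustive code and the `ecoc_decode` helper below are illustrative assumptions for the sketch; they are not the dimension-selection methods proposed in the paper.

```python
import numpy as np

# exhaustive code for 4 classes: codewords of length 2^(4-1) - 1 = 7
# with minimum Hamming distance 4, so any single-bit error (one wrong
# binary classifier) is corrected
CODEWORDS = np.array([
    [1, 1, 1, 1, 1, 1, 1],   # class 0
    [0, 0, 0, 0, 1, 1, 1],   # class 1
    [0, 0, 1, 1, 0, 0, 1],   # class 2
    [0, 1, 0, 1, 0, 1, 0],   # class 3
])

def ecoc_decode(codewords, predictions):
    """Pick the class whose codeword is nearest, in Hamming distance,
    to the vector of binary-classifier outputs."""
    distances = (np.asarray(codewords) != np.asarray(predictions)).sum(axis=1)
    return int(np.argmin(distances))

# the last binary classifier flips a bit of class 2's codeword;
# decoding still recovers class 2
print(ecoc_decode(CODEWORDS, [0, 0, 1, 1, 0, 0, 0]))  # → 2
```

Longer codewords increase the minimum distance and hence the number of correctable classifier errors, at the cost of training more binary classifiers, which is the redundancy vs. error-correction trade-off the paper's methods address.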

2007

Change detection in learning histograms from data streams

Authors
Sebastiao, R; Gama, J;

Publication
Progress in Artificial Intelligence, Proceedings

Abstract
In this paper we study the problem of constructing histograms from high-speed, time-changing data streams. Learning in this context requires the ability to process each example once, at the rate it arrives, maintaining a histogram consistent with the most recent data and forgetting out-of-date data whenever a change in the distribution is detected. To construct histograms from high-speed data streams we use the two-layer structure of the Partition Incremental Discretization (PiD) algorithm. Our contribution is a new method to detect whenever a change occurs in the distribution generating the examples. The basic idea consists of monitoring distributions from two different time windows: the reference time window, which reflects the distribution observed in the past, and the current time window, which reflects the distribution observed in the most recent data. We compare both distributions and signal a change whenever the distance between them exceeds a threshold value, using three different measures: the Entropy Absolute Difference, the Kullback-Leibler divergence and the Cosine Distance. The experimental results suggest that the Kullback-Leibler divergence exhibits a high probability of change detection and faster detection rates, with few false-positive alarms.
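The window-comparison step can be sketched as follows, assuming the two windows' histograms are available as count vectors. The function names, the smoothing constant, and the threshold value are illustrative assumptions; the PiD two-layer structure itself is not reproduced here.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence between two histogram count
    vectors, smoothed so empty bins do not produce log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def cosine_distance(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 1.0 - float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def entropy_abs_difference(p, q, eps=1e-10):
    def entropy(x):
        x = np.asarray(x, dtype=float) + eps
        x = x / x.sum()
        return -float(np.sum(x * np.log(x)))
    return abs(entropy(p) - entropy(q))

def change_detected(reference, current, threshold=0.5,
                    distance=kl_divergence):
    """Signal a change when the distance between the reference-window
    and current-window distributions exceeds the threshold."""
    return distance(reference, current) > threshold
```

Any of the three measures can be passed as `distance`; on detection the reference window would be reset to the current one so the histogram forgets the out-of-date distribution.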
