
Publications by João Gama

2008

Improving the performance of an incremental algorithm driven by error margins

Authors
del Campo Avila, J; Ramos Jimenez, G; Gama, J; Morales Bueno, R;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Classification is a highly relevant task within the field of data analysis. It is not trivial, and different difficulties can arise depending on the nature of the problem. All these difficulties can become worse when the datasets are too large or when new information can arrive at any time. Incremental learning is an approach that can be used to deal with the classification task in these cases; it must alleviate, or solve, the problem of limited time and memory resources. One emergent approach uses concentration bounds to ensure that decisions are made only when enough information supports them. IADEM is one of the most recent algorithms that use this approach. The aim of this paper is to improve the performance of this algorithm in several ways: simplifying the induced models, adding the ability to deal with continuous data, improving the detection of noise, selecting new criteria for evolving the model, including the use of more powerful prediction techniques, etc. Besides these new properties, the new system, IADEM-2, preserves the ability to obtain a performance similar to standard learning algorithms independently of dataset size, and it can incorporate new information as the basic algorithm does, spending a short time per example.
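The concentration-bound idea described in the abstract can be illustrated with the Hoeffding bound, commonly used by this family of incremental algorithms to decide when enough examples support a decision. This is a minimal sketch, not the actual IADEM-2 criterion; the function names and the two-best-attributes split rule are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    random variable with the given range lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Illustrative split rule: commit to the best attribute only when its
    observed advantage over the runner-up exceeds the bound, i.e. when
    enough examples have been seen to support the decision."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

With a wide gain gap and many examples the test passes; with a narrow gap and few examples the algorithm keeps waiting for more data.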

1999

Discriminant trees

Authors
Gama, J;

Publication
MACHINE LEARNING, PROCEEDINGS

Abstract
In a previous work, we presented the system Ltree, a multivariate tree that combines a decision tree with a linear discriminant by means of constructive induction. We have shown that it performs quite well, in terms of accuracy and learning times, in comparison with other multivariate systems such as LMDT, OC1, and CART. In this work, we extend the previous work with two new discriminant functions: a quadratic discriminant and a logistic discriminant. Using the same architecture as Ltree, we obtain two new multivariate trees, Qtree and LgTree. The three systems have been evaluated on 17 UCI datasets. From the empirical study, we argue that these systems can be seen as a composition of classifiers with low error correlation. A bias-variance analysis of the error rate shows that the error reduction of all the systems, in comparison to a univariate tree, is due to a reduction in both components.
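As a rough illustration of the kind of multivariate splitting test such trees place at a decision node, the sketch below builds a Fisher linear discriminant and turns it into a test of the form w·x > t. The helper names and the midpoint threshold are assumptions for illustration only; Ltree's actual construction differs.

```python
import numpy as np

def fisher_direction(X0: np.ndarray, X1: np.ndarray) -> np.ndarray:
    """Fisher linear discriminant direction between two classes:
    w = Sw^{-1} (mu1 - mu0), where Sw is the within-class scatter."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    return np.linalg.solve(Sw, mu1 - mu0)

def discriminant_split(X0: np.ndarray, X1: np.ndarray):
    """Multivariate splitting test w.x > t, thresholded at the midpoint
    of the two projected class means (an illustrative choice)."""
    w = fisher_direction(X0, X1)
    t = 0.5 * (X0.mean(axis=0) @ w + X1.mean(axis=0) @ w)
    return w, t
```

For well-separated classes, all examples of one class project above the threshold and all examples of the other below it, so the single test splits the node cleanly.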

2008

Knowledge discovery from data streams

Authors
Gama, J; Aguilar Ruiz, J; Klinkenberg, R;

Publication
INTELLIGENT DATA ANALYSIS

Abstract

2005

Extracting knowledge from databases and warehouses (EKDB&W 2005) - Introduction

Authors
Gama, J; Moura Pires, J; Cardoso, M; Marques, NC; Cavique, L;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract

2005

Learning decision trees from dynamic data streams

Authors
Gama, J; Medas, P;

Publication
JOURNAL OF UNIVERSAL COMPUTER SCIENCE

Abstract
This paper presents a system for the induction of forests of functional trees from data streams that is able to detect concept drift. The Ultra Fast Forest of Trees (UFFT) is an incremental algorithm that works online, processing each example in constant time and performing a single scan over the training examples. It uses analytical techniques to choose the splitting criteria, and the information gain to estimate the merit of each possible splitting test. For multi-class problems the algorithm builds a binary tree for each possible pair of classes, leading to a forest of trees. Decision nodes and leaves contain naive-Bayes classifiers playing different roles during the induction process. Naive-Bayes classifiers in leaves are used to classify test examples. Naive-Bayes classifiers in inner nodes play two different roles: they can be used as multivariate splitting tests if chosen by the splitting criteria, and they are used to detect changes in the class distribution of the examples that traverse the node. When a change in the class distribution is detected, the sub-tree rooted at that node is pruned. The use of naive-Bayes classifiers at leaves to classify test examples, the use of splitting tests based on the outcome of naive-Bayes, and the use of naive-Bayes classifiers at decision nodes to detect changes in the distribution of the examples are all obtained directly from the sufficient statistics required to compute the splitting criteria, without additional computations. This aspect is a main advantage in the context of high-speed data streams. This methodology was tested with artificial and real-world data sets. The experimental results show very good performance in comparison to a batch decision tree learner, and a high capacity to detect drift in the distribution of the examples.
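The abstract's key point is that a naive-Bayes prediction falls out of the per-attribute counts a streaming tree already keeps as sufficient statistics for its splitting criteria. A minimal sketch of that idea, assuming discrete attribute values; the class name, the counter layout, and the Laplace smoothing with two values per attribute are illustrative assumptions, not UFFT's actual implementation.

```python
import math
from collections import defaultdict

class LeafNaiveBayes:
    """Naive-Bayes predictor maintained from the same class and
    (attribute, value, class) counts a streaming tree accumulates
    per leaf to evaluate candidate splits."""

    def __init__(self):
        self.class_counts = defaultdict(int)          # class -> count
        self.counts = defaultdict(int)                # (attr index, value, class) -> count

    def update(self, x, y):
        """Incorporate one labelled example in constant time per attribute."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.counts[(i, v, y)] += 1

    def predict(self, x):
        """Return the class maximising log P(c) + sum_i log P(x_i | c),
        with Laplace smoothing (assumes two values per attribute)."""
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, nc in self.class_counts.items():
            score = math.log(nc / total)
            for i, v in enumerate(x):
                score += math.log((self.counts[(i, v, c)] + 1) / (nc + 2))
            if score > best_score:
                best, best_score = c, score
        return best
```

Because prediction reads only counters that are updated anyway for split evaluation, the classifier adds no extra pass over the stream.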

2005

Data streams - J.UCS special issue

Authors
Aguilar Ruiz, JS; Gama, J;

Publication
JOURNAL OF UNIVERSAL COMPUTER SCIENCE

Abstract
