2009
Authors
Qiang, Y; Ronghuai, H; Jian, P; Gama, J; Xiaofeng, M;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2005
Authors
Gama, J; Moura Pires, J; Cardoso, M; Marques, NC; Cavique, L;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2006
Authors
Gama, J; Fernandes, R; Rocha, R;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
In this paper we study the problem of constructing accurate decision tree models from data streams. Learning from data streams is an incremental task that requires incremental, online, and any-time learning algorithms. One of the most successful algorithms for mining data streams is VFDT. We have extended VFDT in three directions: the ability to deal with continuous data, the use of more powerful classification techniques at tree leaves, and the ability to detect and react to concept drift. The VFDTc system can incorporate and classify new information online, with a single scan of the data, in constant time per example. The most relevant property of our system is its ability to obtain performance similar to that of a standard decision tree algorithm even for medium-sized datasets. This matters because of the any-time property. We also extend VFDTc with the ability to deal with concept drift by continuously monitoring differences between two class distributions of the examples: the distribution when a node was built and the distribution in a time window of the most recent examples. We study the sensitivity of VFDTc with respect to drift, noise, the order of examples, and the initial parameters in different problems, and demonstrate its utility on large and medium-sized datasets.
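The drift-detection idea described above compares two class distributions at a node. The sketch below illustrates that comparison under stated assumptions: the class `NodeDriftMonitor`, the use of total variation distance, the window size, and the threshold of 0.3 are all hypothetical choices for illustration, not the measure or parameters used by VFDTc.

```python
from collections import Counter, deque

class NodeDriftMonitor:
    """Hypothetical helper: compares the class distribution a node was built
    on with the class distribution in a sliding window of recent examples."""

    def __init__(self, build_labels, window_size=100, threshold=0.3):
        self.reference = self._distribution(Counter(build_labels))
        self.window = deque(maxlen=window_size)
        self.threshold = threshold  # assumed drift threshold, not from the paper

    @staticmethod
    def _distribution(counts):
        total = sum(counts.values())
        return {label: n / total for label, n in counts.items()}

    def distance(self):
        """Total variation distance between reference and recent distributions
        (an assumed choice of distance)."""
        recent = self._distribution(Counter(self.window))
        labels = set(self.reference) | set(recent)
        return 0.5 * sum(abs(self.reference.get(c, 0.0) - recent.get(c, 0.0))
                         for c in labels)

    def update(self, label):
        """Record one new example's class; return True if drift is signalled."""
        self.window.append(label)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window of recent examples is full
        return self.distance() > self.threshold

# Usage: the node was built on 80% "a" / 20% "b" examples.
monitor = NodeDriftMonitor(["a"] * 80 + ["b"] * 20, window_size=50)
stable = [monitor.update(c) for c in ["a"] * 40 + ["b"] * 10]  # same mix
shifted = [monitor.update("b") for _ in range(50)]              # only "b" now
print(any(stable), shifted[-1])  # False True
```

A fixed-size `deque` keeps only the most recent examples, so the comparison always reflects the current window; a real system would also decide how to react (for example, pruning or regrowing the node), which this sketch leaves out.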
2008
Authors
del Campo Avila, J; Ramos Jimenez, G; Gama, J; Morales Bueno, R;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
Classification is a highly relevant task in the field of data analysis. It is not a trivial task, and different difficulties can arise depending on the nature of the problem. All these difficulties can become worse when the datasets are very large or when new information can arrive at any time. Incremental learning is an approach that can be used to deal with the classification task in these cases; it must alleviate, or solve, the problem of limited time and memory resources. One emerging approach uses concentration bounds to ensure that decisions are made only when enough information supports them. IADEM is one of the most recent algorithms that use this approach. The aim of this paper is to improve the performance of this algorithm in several ways: reducing the complexity of the induced models, adding the ability to deal with continuous data, improving noise detection, selecting new criteria for evolving the model, and using more powerful prediction techniques, among others. Besides these new properties, the new system, IADEM-2, preserves the ability to obtain performance similar to that of standard learning algorithms regardless of dataset size, and it incorporates new information as the basic algorithm does, spending little time per example.
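The abstract refers to concentration bounds without naming a specific one. As an illustration only, the sketch below uses the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), which is the bound VFDT relies on; it is an assumption that IADEM's decision rule takes this form, and the function names and merit scores are hypothetical.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a random
    variable with range value_range is within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def enough_evidence(best_score, second_score, value_range, delta, n):
    """Hypothetical decision rule: commit to the best candidate only if its
    observed advantage over the runner-up exceeds the bound."""
    return best_score - second_score > hoeffding_bound(value_range, delta, n)

# Hypothetical merit scores in [0, 1] (e.g. information gain with two classes).
for n in (1000, 10000):
    eps = hoeffding_bound(1.0, 1e-6, n)
    print(n, round(eps, 4), enough_evidence(0.30, 0.25, 1.0, 1e-6, n))
```

With delta = 1e-6, a gap of 0.05 between the two best candidates is not yet enough after 1,000 examples (epsilon is about 0.083), but it is after 10,000 examples (epsilon is about 0.026). The bound shrinks as 1/sqrt(n), so the learner waits only as long as the observed difference requires.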
1999
Authors
Gama, J;
Publication
MACHINE LEARNING, PROCEEDINGS
Abstract
In previous work, we presented Ltree, a multivariate tree system that combines a decision tree with a linear discriminant by means of constructive induction. We showed that it performs quite well, in terms of accuracy and learning time, compared with other multivariate systems such as LMDT, OC1, and CART. In this work, we extend the previous work with two new discriminant functions: a quadratic discriminant and a logistic discriminant. Using the same architecture as Ltree, we obtain two new multivariate trees, Qtree and LgTree. The three systems have been evaluated on 17 UCI datasets. From the empirical study, we argue that these systems can be seen as a composition of classifiers with low error correlation. A bias-variance analysis of the error rate shows that the error reduction of all the systems, compared with a univariate tree, is due to a reduction in both components.
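Constructive induction with a linear discriminant can be sketched as: fit a discriminant on the examples reaching a node, append its output as a new attribute, and then choose the best univariate test over original and constructed attributes. This is a minimal sketch under assumptions: it is restricted to two classes and two features, uses Fisher's discriminant with a pooled scatter matrix, scores splits by training errors rather than Ltree's actual split criterion, and the function names and toy data are hypothetical.

```python
from statistics import mean

def fisher_direction(xs, ys):
    """Two-class Fisher linear discriminant direction for 2-D inputs.
    xs: list of (x1, x2) tuples; ys: list of 0/1 class labels."""
    c0 = [x for x, y in zip(xs, ys) if y == 0]
    c1 = [x for x, y in zip(xs, ys) if y == 1]
    m0 = (mean(p[0] for p in c0), mean(p[1] for p in c0))
    m1 = (mean(p[0] for p in c1), mean(p[1] for p in c1))
    # Pooled within-class scatter matrix S = [[a, b], [b, d]].
    a = b = d = 0.0
    for pts, m in ((c0, m0), (c1, m1)):
        for p in pts:
            u, v = p[0] - m[0], p[1] - m[1]
            a += u * u
            b += u * v
            d += v * v
    det = a * d - b * b  # assumed non-zero (non-degenerate data)
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = S^-1 (m1 - m0), via the closed-form inverse of a 2x2 matrix.
    return ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)

def best_split_errors(rows, ys, j):
    """Fewest training errors of any threshold test on attribute j."""
    vals = sorted(set(r[j] for r in rows))
    best = len(ys)
    for lo, hi in zip(vals, vals[1:]):
        t = (lo + hi) / 2
        err = sum((r[j] > t) != (y == 1) for r, y in zip(rows, ys))
        best = min(best, err, len(ys) - err)  # either side may be class 1
    return best

# Hypothetical toy data: the classes are separated by the diagonal x1 + x2 = 1.
xs = [(0.0, 0.2), (0.2, 0.0), (0.1, 0.5), (0.5, 0.1), (0.3, 0.3),
      (0.6, 0.9), (0.9, 0.6), (1.0, 0.4), (0.4, 1.0), (0.8, 0.8)]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Constructive induction: append the discriminant projection as a new attribute.
w = fisher_direction(xs, ys)
rows = [(x1, x2, w[0] * x1 + w[1] * x2) for x1, x2 in xs]
errors = [best_split_errors(rows, ys, j) for j in range(3)]
print(errors)  # [1, 1, 0]: only the constructed attribute separates the classes
```

Neither original attribute alone separates the diagonal boundary, while a single threshold on the constructed attribute does; this is the sense in which the discriminant lets a univariate tree express an oblique split.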
2008
Authors
Gama, J; Aguilar Ruiz, J; Klinkenberg, R;
Publication
INTELLIGENT DATA ANALYSIS
Abstract