2001
Authors
Gama, J;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
The design of algorithms that explore multiple representation languages and different search spaces has an intuitive appeal. In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on combinations of attributes. The same applies to model tree algorithms in regression domains, which use linear models at leaf nodes. In this paper we study where to use combinations of attributes in regression and classification tree learning. We present an algorithm for multivariate tree learning that combines a univariate decision tree with a linear function by means of constructive induction. This algorithm is able to use decision nodes with multivariate tests and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. The algorithm has been implemented for both classification and regression problems. The experimental evaluation shows that our algorithm has clear advantages in generalization ability when compared against its components and two simplified versions, and competes well against the state-of-the-art in multivariate regression and classification trees. © Springer-Verlag Berlin Heidelberg 2001.
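The functional-leaf idea described in the abstract can be illustrated with a toy regression stump whose leaves predict with a fitted linear model rather than a constant. This is a minimal sketch under simplifying assumptions (one attribute, a fixed split, ordinary least squares at each leaf); the function and tuple layout are invented here, not taken from the paper.

```python
# Toy sketch: a regression tree whose leaves carry linear models ("functional
# leaves") instead of constants. Illustrative only; not the paper's algorithm.

def fit_linear_leaf(xs, ys):
    """Least-squares fit y = a*x + b on the 1-D examples reaching a leaf."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def predict(tree, x):
    """tree is ('split', threshold, left, right) or ('leaf', slope, intercept)."""
    if tree[0] == 'leaf':
        _, a, b = tree
        return a * x + b
    _, thr, left, right = tree
    return predict(left if x <= thr else right, x)

# A stump whose two leaves each fit their own linear model: the target's
# slope changes at x > 2, which a constant-leaf stump could not capture.
xs, ys = [0, 1, 2, 3, 4, 5], [0, 1, 2, 9, 12, 15]
tree = ('split', 2,
        ('leaf', *fit_linear_leaf(xs[:3], ys[:3])),
        ('leaf', *fit_linear_leaf(xs[3:], ys[3:])))
```

Each leaf recovers the locally linear trend exactly, so `predict(tree, 1)` returns 1.0 and `predict(tree, 4)` returns 12.0.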
2010
Authors
Shultz, TR; Fahlman, SE; Craw, S; Andritsos, P; Tsaparas, P; Silva, R; Drummond, C; Ling, CX; Sheng, VS; Drummond, C; Lanzi, PL; Gama, J; Wiegand, RP; Sen, P; Namata, G; Bilgic, M; Getoor, L; He, J; Jain, S; Stephan, F; Jain, S; Stephan, F; Sammut, C; Harries, M; Sammut, C; Ting, KM; Pfahringer, B; Case, J; Jain, S; Wagstaff, KL; Nijssen, S; Wirth, A; Ling, CX; Sheng, VS; Zhang, X; Sammut, C; Cancedda, N; Renders, J; Michelucci, P; Oblinger, D; Keogh, E; Mueen, A;
Publication
Encyclopedia of Machine Learning
Abstract
2008
Authors
del Campo Avila, J; Ramos Jimenez, G; Gama, J; Morales Bueno, R;
Publication
Intelligent Data Analysis
Abstract
Classification is a highly relevant task within the field of data analysis. It is not a trivial task, and different difficulties can arise depending on the nature of the problem. All these difficulties can become worse when the datasets are too large or when new information can arrive at any time. Incremental learning is an approach that can be used to deal with the classification task in these cases; it must alleviate, or solve, the problem of limited time and memory resources. One emergent approach uses concentration bounds to ensure that decisions are made only when enough information supports them. IADEM is one of the most recent algorithms that use this approach. The aim of this paper is to improve the performance of this algorithm in different ways: simplifying the induced models, adding the ability to deal with continuous data, improving the detection of noise, selecting new criteria for evolving the model, including the use of more powerful prediction techniques, etc. Besides these new properties, the new system, IADEM-2, preserves the ability to obtain performance similar to standard learning algorithms independently of the dataset size, and it can incorporate new information as the basic algorithm does: using a short time per example.
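The concentration-bound idea mentioned in the abstract can be sketched with the Hoeffding bound, as popularized by Hoeffding-tree style stream learners: a decision is committed only once the observed gap between the two best candidates exceeds the bound for the current sample size. The exact bounds used by IADEM may differ; this is an illustrative sketch with invented names.

```python
# Illustrative sketch of deciding from a concentration bound. The Hoeffding
# bound stands in for whatever bound IADEM actually uses; names are invented.
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n samples of a
    variable with range value_range is within this epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def enough_evidence(gain_best, gain_second, value_range, delta, n):
    """Commit to the best candidate only once the observed gap between the
    two best candidates exceeds the bound, i.e. enough examples support it."""
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)
```

With a gap of 0.2 and delta = 1e-7, 10 examples are not enough evidence, but 1000 are: the bound shrinks as more of the stream is seen.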
2000
Authors
Gama, J;
Publication
Intelligent Data Analysis
Abstract
Naive Bayes is a well-known and well-studied algorithm in both statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes. Those tables are then iteratively updated in order to improve the class probability distribution associated with each training example. Experimental evaluation of Iterative Bayes on 27 benchmark datasets shows consistent gains in accuracy. Moreover, the update schema can take costs into account, turning the algorithm cost-sensitive. Unlike stratification, it is applicable to any number of classes and to arbitrary cost matrices. An interesting side effect of our algorithm is that it proves robust to attribute dependencies.
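The abstract's core idea, starting from naive Bayes tables and nudging their entries so each training example's true class gets a higher posterior, can be sketched as follows. The update rule, learning rate, and table layout here are illustrative stand-ins, not the paper's exact schema.

```python
# Toy sketch of iteratively updating naive Bayes tables so the true class of
# each training example gains posterior mass. Illustrative update rule only.

def posterior(tables, priors, x):
    """Normalised class posteriors for example x (a list of attribute values).
    tables[c][i][v] holds P(attribute i = v | class c)."""
    scores = {}
    for c in priors:
        p = priors[c]
        for i, v in enumerate(x):
            p *= tables[c][i].get(v, 1e-6)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def iterative_update(tables, priors, data, lr=0.05, epochs=10):
    """Repeatedly raise table entries for each example's true class and
    lower them (proportionally to posterior) for the competing classes."""
    for _ in range(epochs):
        for x, y in data:
            post = posterior(tables, priors, x)
            delta = lr * (1.0 - post[y])  # larger step when the model is unsure
            for i, v in enumerate(x):
                for c in tables:
                    adj = delta if c == y else -delta * post[c]
                    tables[c][i][v] = max(1e-6, tables[c][i].get(v, 1e-6) + adj)
    return tables
```

On a two-class, one-attribute toy set, a few epochs of updates raise the posterior of each example's true class above its initial value.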
2008
Authors
Gama, J; Aguilar Ruiz, J; Klinkenberg, R;
Publication
Intelligent Data Analysis
Abstract
We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records, and is yet guaranteed to find a matching that is a close approximation of the matching that would be obtained if the entire stream were processed. The method can be applied to any given (combination of) similarity metrics that can be estimated from a sample with bounded error; we apply the algorithm to several metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases.
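The sampling idea in this abstract, estimating a similarity metric from a small sample with a bounded error, can be sketched with a simple value-overlap metric and a Hoeffding-style half-width. The overlap fraction is a stand-in for whichever similarity metrics the paper actually uses; names are invented.

```python
# Sketch: estimate an attribute-similarity metric from a sample of stream
# records rather than the full stream, with a probabilistic error bound.
# The overlap-fraction metric here is an illustrative stand-in.
import math

def sample_estimate(sampled_values, candidate_values):
    """Fraction of sampled stream values also seen under the candidate
    database attribute: a crude instance-level similarity estimate."""
    hits = sum(1 for v in sampled_values if v in candidate_values)
    return hits / len(sampled_values)

def error_bound(n, delta):
    """Hoeffding half-width: with probability at least 1 - delta, the sample
    estimate lies within this epsilon of the full-stream similarity."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```

Because the bound shrinks as the sample grows, one can keep sampling until the confidence intervals of rival attribute pairings no longer overlap and the matching is settled.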
2007
Authors
Pimenta, E; Gama, J; Carvalho, A;
Publication
Proceedings of the Twentieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2007
Abstract
Recent work highlights advantages of decomposing multiclass decision problems into multiple binary problems. Several strategies have been proposed for this decomposition; the most frequently investigated are All-vs-All, One-vs-All, and error-correcting output codes (ECOC). ECOC are binary words (codewords) that can be adapted for use in classification problems, but they must comply with some specific constraints. The codewords can have several dimensions for each number of classes to be represented, and these dimensions grow exponentially with the number of classes of the multiclass problem. Two methods to choose the dimension of an ECOC, which ensure a good trade-off between redundancy and error-correction capacity, are proposed in this paper. The methods are evaluated on a set of benchmark classification problems. Experimental results show that they are competitive with conventional multiclass decomposition methods.