
Publications by João Gama

2009

Evaluating algorithms that learn from data streams

Authors
Gama, J; Rodrigues, PP; Sebastião, R;

Publication
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), Honolulu, Hawaii, USA, March 9-12, 2009

Abstract
Learning from data streams is a research area of increasing importance. Several stream learning algorithms have been developed in recent years. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet adequately addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. In this paper we propose a general framework for assessing the quality of streaming learning algorithms. We advocate the use of Predictive Sequential (prequential) error estimates over a sliding window to assess the performance of learning algorithms that learn from open-ended data streams in non-stationary environments. This paper studies convergence properties and methods to comparatively assess the performance of algorithms. Copyright 2009 ACM.
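The prequential (predictive sequential, test-then-train) estimate over a sliding window can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `prequential_errors` and `MajorityClass`, the 0/1 loss, and the default window of 100 examples are assumptions chosen for the example.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Iterator, Tuple


def prequential_errors(
    stream: Iterable[Tuple[object, Hashable]],
    predict: Callable[[object], Hashable],
    learn: Callable[[object, Hashable], None],
    window: int = 100,
) -> Iterator[float]:
    """Test-then-train evaluation: each example first tests the current model
    and only then updates it. Yields the mean 0/1 loss over the most recent
    `window` predictions (window size is an assumption of this sketch)."""
    recent = deque(maxlen=window)  # the oldest loss is dropped automatically
    for x, y in stream:
        recent.append(0 if predict(x) == y else 1)  # test first ...
        learn(x, y)                                  # ... then train
        yield sum(recent) / len(recent)


class MajorityClass:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts: dict = {}

    def predict(self, x: object) -> Hashable:
        if not self.counts:
            return None  # no data yet, so the first prediction counts as an error
        return max(self.counts, key=self.counts.get)

    def learn(self, x: object, y: Hashable) -> None:
        self.counts[y] = self.counts.get(y, 0) + 1


# A synthetic stream whose concept changes abruptly after 200 examples.
stream = [(None, "a")] * 200 + [(None, "b")] * 200
model = MajorityClass()
errors = list(prequential_errors(stream, model.predict, model.learn, window=100))
print(errors[199], errors[-1])  # 0.0 1.0
```

On this stream the windowed error climbs back to 1.0 after the change, whereas the cumulative mean over all 400 predictions would be about 0.5 (201 errors), which illustrates why a sliding window suits non-stationary data better than an all-history average.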

2012

Sequential Pattern Knowledge in Multi-Relational Learning

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
Computer and Information Sciences II

Abstract
In this work we present xMuSer, a multi-relational framework suitable for exploring temporal patterns available in multi-relational databases. xMuSer's main idea consists of exploiting frequent sequence mining, using an efficient and direct method to learn temporal patterns in the form of sequences. Grounded on a coding methodology and on the efficiency of sequence miners, we find the most interesting sequential patterns available and then map these findings into a new table, which encodes the multi-relational timed data using sequential patterns. In the last step of our framework, we use an ILP algorithm to learn a theory on the enlarged relational database that consists of the original multi-relational database and the new sequence relation. We evaluate our framework by addressing three classification problems.
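The step that maps frequent sequential patterns into a new table can be illustrated with a small Python sketch. It is not xMuSer itself: a brute-force miner limited to patterns of at most two events stands in for an efficient sequence miner, and the helper names, the `min_support` threshold, and the boolean encoding of the new relation are assumptions made for illustration. In the framework described above, such a table would then be joined to the original database before running the ILP learner.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, seq: List[str]) -> bool:
    """True if the events of `pattern` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(event in it for event in pattern)  # `in` consumes the iterator


def frequent_patterns(
    seqs: Dict[str, List[str]], min_support: int, max_len: int = 2
) -> List[Pattern]:
    """Hypothetical brute-force miner: enumerates every ordered pattern of up to
    `max_len` events and keeps those contained in >= `min_support` sequences."""
    candidates = set()
    for seq in seqs.values():
        for n in range(1, max_len + 1):
            candidates.update(combinations(seq, n))  # order-preserving subsequences
    return sorted(
        p for p in candidates
        if sum(is_subsequence(p, s) for s in seqs.values()) >= min_support
    )


def pattern_table(
    seqs: Dict[str, List[str]], patterns: List[Pattern]
) -> Dict[str, Dict[Pattern, bool]]:
    """New relation: one row per entity, one boolean column per frequent pattern."""
    return {key: {p: is_subsequence(p, s) for p in patterns} for key, s in seqs.items()}


# Hypothetical timed events per customer, already ordered by time.
events = {
    "c1": ["login", "browse", "buy"],
    "c2": ["login", "buy"],
    "c3": ["browse", "login"],
}
patterns = frequent_patterns(events, min_support=2)
print(patterns)  # [('browse',), ('buy',), ('login',), ('login', 'buy')]
table = pattern_table(events, patterns)
print(table["c2"][("login", "buy")], table["c3"][("login", "buy")])  # True False
```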

1997

Regression Using Classification Algorithms

Authors
Torgo, L; Gama, J;

Publication
Intelligent Data Analysis

Abstract
This article presents an alternative approach to the problem of regression. The methodology we describe allows the use of classification algorithms in regression tasks. From a practical point of view this enables the use of a wide range of existing machine learning (ML) systems in regression problems. In effect, most of the widely available systems deal with classification. Our method works as a pre-processing step in which the continuous goal variable values are discretised into a set of intervals. We use misclassification costs as a means to reflect the implicit ordering among these intervals. We describe a set of alternative discretisation methods and, based on our experimental results, justify the need for a search-based approach to choose the best method. The discretisation process is isolated from the classification algorithm, thus being applicable to virtually any existing system. The implemented system (RECLA) can thus be seen as a generic pre-processing tool. We have tested RECLA with three different classification systems and evaluated it in several regression data sets. Our experimental results confirm the validity of our search-based approach to class discretisation, and reveal the accuracy benefits of adding misclassification costs. © 1997 Elsevier Science B.V.
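A minimal sketch of the pre-processing idea: the continuous target is discretised into ordered intervals, and a cost matrix penalises confusions between distant intervals more than confusions between neighbouring ones. Equal-frequency binning is only one of several possible discretisation methods, and using the absolute difference between interval medians as the misclassification cost, and the interval median as the numeric prediction for a class, are assumptions of this example; `discretise` is a hypothetical name, not part of RECLA.

```python
from statistics import median
from typing import List, Tuple


def discretise(y: List[float], k: int) -> Tuple[List[int], List[float], List[List[float]]]:
    """Hypothetical helper: equal-frequency discretisation of a continuous target.

    Returns the class label of each value, the median of each interval (assumed
    to be the numeric value predicted for that class), and a cost matrix in which
    cost[i][j] = |median_i - median_j| encodes the ordering of the intervals."""
    if not 1 <= k <= len(y):
        raise ValueError("k must be between 1 and the number of target values")
    ys = sorted(y)
    n = len(ys)
    bins = [ys[i * n // k:(i + 1) * n // k] for i in range(k)]  # k non-empty groups
    uppers = [b[-1] for b in bins]      # upper boundary of each interval
    medians = [median(b) for b in bins]

    def to_class(v: float) -> int:
        # First interval whose upper boundary covers v.
        for c, upper in enumerate(uppers):
            if v <= upper:
                return c
        return k - 1

    labels = [to_class(v) for v in y]
    cost = [[abs(mi - mj) for mj in medians] for mi in medians]
    return labels, medians, cost


y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels, medians, cost = discretise(y, k=3)
print(labels)   # [0, 0, 1, 1, 2, 2]
print(medians)  # [1.5, 6.5, 11.5]
print(cost[0])  # [0.0, 5.0, 10.0]
```

The cost row for interval 0 shows the intended effect: predicting the farthest interval costs twice as much as predicting the adjacent one, so a cost-sensitive classifier is pushed toward numerically closer answers.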

2011

Preface

Authors
Suzuki, E; Sebag, M; Ando, S; Balcazar, JL; Billard, A; Bratko, I; Bredeche, N; Gama, J; Grunwald, P; Iba, H; Kersting, K; Peters, J; Washio, T;

Publication
Proceedings - IEEE International Conference on Data Mining, ICDM


2011

Preface

Authors
Khan, L; Pechenizkiy, M; Zliobaite, I; Agrawal, C; Bifet, A; Delany, SJ; Dries, A; Fan, W; Gabrys, B; Gama, J; Gao, J; Gopalkrishnan, V; Holmes, G; Katakis, I; Kuncheva, L; Van Leeuwen, M; Masud, M; Menasalvas, E; Minku, L; Pfahringer, B; Polikar, R; Rodrigues, PP; Tsoumakas, G; Tsymbal, A;

Publication
Proceedings - IEEE International Conference on Data Mining, ICDM


2000

Cascade generalization

Authors
Gama, J; Brazdil, P;

Publication
Machine Learning

Abstract
Using multiple classifiers to increase learning accuracy is an active research area. In this paper we present two related methods for merging classifiers. The first method, Cascade Generalization, couples classifiers loosely. It belongs to the family of stacking algorithms. The basic idea of Cascade Generalization is to use the set of classifiers sequentially, at each step extending the original data by inserting new attributes. The new attributes are derived from the class probability distribution given by a base classifier. This constructive step extends the representational language for the high-level classifiers, relaxing their bias. The second method exploits tight coupling of classifiers by applying Cascade Generalization locally. At each iteration of a divide-and-conquer algorithm, the instance space is reconstructed by adding new attributes. Each new attribute represents the probability, given by a base classifier, that an example belongs to a class. We have implemented three Local Generalization Algorithms. The first merges a linear discriminant with a decision tree, the second merges a naive Bayes with a decision tree, and the third merges a linear discriminant and a naive Bayes with a decision tree. All the algorithms show an increase in performance when compared with the corresponding single models. Cascade also outperforms other methods for combining classifiers, such as Stacked Generalization, and competes well against Boosting at statistically significant confidence levels.
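The constructive step of Cascade Generalization, appending a base classifier's class probability distribution as new attributes and then training a high-level classifier on the extended data, is sketched below in plain Python. The abstract pairs a naive Bayes or a linear discriminant with a decision tree; here a minimal Gaussian naive Bayes is the base classifier and, only to keep the example short, a 1-nearest-neighbour classifier stands in for the decision tree. All class and function names are hypothetical, and the base classifier is fit on the full training set for simplicity.

```python
import math
from typing import List, Sequence


class GaussianNB:
    """Minimal Gaussian naive Bayes used as the base (level-0) classifier."""

    def fit(self, X: List[Sequence[float]], y: List[str]) -> "GaussianNB":
        self.classes = sorted(set(y))
        self.prior, self.mean, self.var = {}, {}, {}
        for c in self.classes:
            rows = [x for x, t in zip(X, y) if t == c]
            cols = list(zip(*rows))  # one tuple of values per attribute
            self.prior[c] = len(rows) / len(X)
            self.mean[c] = [sum(col) / len(col) for col in cols]
            # Small constant keeps variances positive for constant attributes.
            self.var[c] = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                           for col, m in zip(cols, self.mean[c])]
        return self

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        log_post = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            for v, m, s2 in zip(x, self.mean[c], self.var[c]):
                lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            log_post[c] = lp
        top = max(log_post.values())  # subtract the max for numerical stability
        unnorm = {c: math.exp(lp - top) for c, lp in log_post.items()}
        z = sum(unnorm.values())
        return [unnorm[c] / z for c in self.classes]


def cascade_extend(base: GaussianNB, X: List[Sequence[float]]) -> List[List[float]]:
    """Constructive step: append the base classifier's class-probability
    vector to every instance as new attributes."""
    return [list(x) + base.predict_proba(x) for x in X]


class OneNN:
    """Stand-in high-level classifier (the abstract uses a decision tree)."""

    def fit(self, X: List[List[float]], y: List[str]) -> "OneNN":
        self.X, self.y = X, y
        return self

    def predict(self, x: List[float]) -> str:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]


class Cascade:
    """Two-level cascade: the high-level learner sees the original attributes
    plus the base learner's class-probability attributes."""

    def __init__(self, base: GaussianNB, high: OneNN) -> None:
        self.base, self.high = base, high

    def fit(self, X: List[Sequence[float]], y: List[str]) -> "Cascade":
        self.base.fit(X, y)
        self.high.fit(cascade_extend(self.base, X), y)
        return self

    def predict(self, x: Sequence[float]) -> str:
        return self.high.predict(cascade_extend(self.base, [x])[0])


X = [[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]]
y = ["a", "a", "a", "b", "b", "b"]
model = Cascade(GaussianNB(), OneNN()).fit(X, y)
print(model.predict([0.1]), model.predict([2.3]))  # a b
```

Each one-attribute instance becomes a three-attribute instance (the original value plus P(a) and P(b)), which is the representational extension that lets the high-level learner relax its bias.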
