Publications

Publications by LIAAD

2015

Concept Drift Detection with Clustering via Statistical Change Detection Methods

Authors
Sakamoto, Y; Fukui, K; Gama, J; Nicklas, D; Moriyama, K; Numao, M;

Publication
2015 Seventh International Conference on Knowledge and Systems Engineering (KSE)

Abstract
We propose a concept drift detection method utilizing statistical change detection, in which a drift detection method and the Page-Hinkley test are employed. Our method enables users to annotate clustering results without constructing a drift detection model for every input. In experiments on synthetic data, we evaluated the proposed method in terms of detection delay and false detections, and also revealed the relations between the degree of drift and the parameters of the method.
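For readers unfamiliar with the Page-Hinkley test named in this abstract, the following is a minimal sketch of the textbook test for an upward shift in the mean of a univariate stream. It is not the authors' clustering-based detector, which the abstract does not describe in detail; the class name `PageHinkley`, the parameter names `delta` and `threshold`, and their default values are illustrative assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a univariate stream."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # magnitude of change tolerated without an alarm
        self.threshold = threshold  # lambda: alarm when m_T - M_T exceeds it
        self.reset()

    def reset(self):
        """Forget all history, e.g. after a detected change has been handled."""
        self.n = 0
        self.mean = 0.0        # running mean of the observations seen so far
        self.cumulative = 0.0  # m_T: cumulative deviation from the mean, minus delta
        self.minimum = 0.0     # M_T: smallest value m_t has taken so far

    def update(self, x):
        """Add one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


# Usage: the mean of the stream jumps from 0 to 1 at index 500.
stream = [0.0] * 500 + [1.0] * 500
detector = PageHinkley(delta=0.005, threshold=50.0)
alarm = next((i for i, x in enumerate(stream) if detector.update(x)), None)
print(alarm)
```

Larger values of `threshold` trade a longer detection delay for fewer false alarms, which is the same delay versus false-detection trade-off the abstract evaluates.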


2015

Data Stream Classification Guided by Clustering on Nonstationary Environments and Extreme Verification Latency

Authors
Souza, VMAd; Silva, DF; Gama, J; Batista, GEAPA;

Publication
Proceedings of the 2015 SIAM International Conference on Data Mining, Vancouver, BC, Canada, April 30 - May 2, 2015

Abstract
Data stream classification algorithms for nonstationary environments frequently assume the availability of class labels, instantly or with some lag after the classification. However, certain applications, mainly those related to sensors and robotics, involve high costs to obtain new labels during the classification phase. Such a scenario in which the actual labels of processed data are never available is called extreme verification latency. Extreme verification latency requires new classification methods capable of adapting to possible changes over time without external supervision. This paper presents a fast, simple, intuitive and accurate algorithm to classify nonstationary data streams in an extreme verification latency scenario, namely Stream Classification Algorithm Guided by Clustering - SCARGC. Our method consists of a clustering step followed by a classification step, applied repeatedly in a closed-loop fashion. We show in several classification tasks evaluated on synthetic and real data that our method is faster and more accurate than the state-of-the-art. Copyright © SIAM.
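The closed loop of clustering and classification described above can be illustrated with a simplified sketch. This is not SCARGC itself: the 1-nearest-neighbour classifier, the warm-started k-means, the rule that each new cluster inherits the label of the nearest previous centroid, and the helper names `nearest`, `kmeans` and `cluster_guided_stream` (with its `pool_size` parameter) are all assumptions chosen for illustration. Only the initial points carry labels; afterwards the stream is unlabeled, as in extreme verification latency.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, centers, iterations=10):
    """Plain k-means warm-started from the given centers (one per cluster)."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[i] = [sum(coord) / len(group) for coord in zip(*group)]
    return centers


def cluster_guided_stream(initial, stream, pool_size=50):
    """Label an unlabeled stream given only an initial labeled set.

    initial: list of (point, label) pairs; no labels arrive afterwards.
    Returns one predicted label per point of stream.
    """
    center_labels = sorted({label for _, label in initial})
    centers = []
    for label in center_labels:  # one initial centroid per class
        pts = [p for p, y in initial if y == label]
        centers.append([sum(coord) / len(pts) for coord in zip(*pts)])
    reference = list(initial)  # labeled set used by the 1-NN classifier
    pool, predictions = [], []
    for x in stream:
        # Classification step: 1-nearest neighbour on the current reference set.
        _, label = min(reference, key=lambda pair: math.dist(x, pair[0]))
        predictions.append(label)
        pool.append(x)
        if len(pool) == pool_size:
            # Clustering step: re-cluster the pool from the previous centers and
            # give each new cluster the label of its nearest previous center.
            new_centers = kmeans(pool, centers)
            center_labels = [center_labels[nearest(c, centers)] for c in new_centers]
            centers = new_centers
            # The pseudo-labelled pool replaces the reference set (closed loop).
            reference = [(p, center_labels[nearest(p, centers)]) for p in pool]
            pool = []
    return predictions


# Usage: two classes drift together along x; only the first 8 points are labeled.
initial = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
           ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"), ((11, 11), "b")]
stream, truth = [], []
for t in range(400):
    shift, jitter = 0.1 * t, 0.3 * ((t // 2) % 3 - 1)
    if t % 2 == 0:
        stream.append((0.5 + shift + jitter, 0.5))
        truth.append("a")
    else:
        stream.append((10.5 + shift, 10.5 + jitter))
        truth.append("b")
predictions = cluster_guided_stream(initial, stream)
print(sum(p == y for p, y in zip(predictions, truth)) / len(truth))
```

In this synthetic example, a static 1-NN classifier built only from `initial` would start mislabelling class "a" once it has drifted past the initial class "b" points; re-centring the clusters on each pool is what lets the loop follow the drift without any new labels.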


2015

Links between Scores, Real Default and Pricing: Evidence from the Freddie Mac’s Loan-Level Dataset

Authors
Rocha Sousa, M; Gama, J; Brandão, E;

Publication
Journal of Economics, Business and Management

Abstract

2015

Special track on data streams

Authors
Rodrigues, PP; Bifet, A; Krishnaswamy, S; Gama, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract

2015

Keynote speaker 2: Real time data mining

Authors
Gama, J;

Publication
2015 IEEE International Conference on Evolving and Adaptive Intelligent Systems, EAIS 2015, Douai, France, December 1-3, 2015

Abstract

2015

Online tree-based ensembles and option trees for regression on evolving data streams

Authors
Ikonomovska, E; Gama, J; Dzeroski, S;

Publication
Neurocomputing

Abstract
The emergence of ubiquitous sources of streaming data has given rise to the popularity of algorithms for online machine learning. In that context, Hoeffding trees represent the state-of-the-art algorithms for online classification. Their popularity stems in large part from their ability to process large quantities of data with a speed that goes beyond the processing power of any other streaming or batch learning algorithm. As a consequence, Hoeffding trees have often been used as base models of many ensemble learning algorithms for online classification. However, despite the existence of many algorithms for online classification, ensemble learning algorithms for online regression do not exist. In particular, the field of online any-time regression analysis seems to have experienced a serious lack of attention. In this paper, we address this issue through a study and an empirical evaluation of a set of online algorithms for regression, which includes the baseline Hoeffding-based regression trees, online option trees, and an online least mean squares filter. We also design, implement and evaluate two novel ensemble learning methods for online regression: online bagging with Hoeffding-based model trees, and an online RandomForest method in which we have used a randomized version of the online model tree learning algorithm as a basic building block. Within the study presented in this paper, we evaluate the proposed algorithms along several dimensions: predictive accuracy and quality of models, time and memory requirements, bias-variance and bias-variance-covariance decomposition of the error, and responsiveness to concept drift.
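The Hoeffding-based regression trees mentioned above decide when a leaf has seen enough examples to split by comparing candidate splits against the Hoeffding bound. The sketch below shows one common way to do this for regression, under stated assumptions: split quality is measured by standard deviation reduction (SDR), the ratio of the second-best to the best SDR is tested against the bound with range R = 1, and a tie threshold forces a split when the two candidates are indistinguishable. The function names and the defaults `delta=1e-7` and `tie_threshold=0.05` are illustrative, not taken from the paper.

```python
import math
from statistics import pstdev


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the mean of n i.i.d. samples
    of a variable with range value_range is within epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def sdr(targets, left, right):
    """Standard deviation reduction of splitting targets into non-empty left/right parts."""
    n = len(targets)
    return pstdev(targets) - len(left) / n * pstdev(left) - len(right) / n * pstdev(right)


def should_split(sdr_best, sdr_second, n, delta=1e-7, tie_threshold=0.05):
    """Split when the second-best candidate is confidently worse than the best one,
    or when the two are so close that waiting for more data is pointless."""
    ratio = sdr_second / sdr_best          # lies in [0, 1] when sdr_best >= sdr_second >= 0
    eps = hoeffding_bound(1.0, delta, n)   # range R = 1 because the ratio is in [0, 1]
    return ratio + eps < 1.0 or eps < tie_threshold


# Usage: a clean split of the targets, then two split decisions after 200 examples.
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(sdr(y, y[:3], y[3:]))
print(should_split(1.0, 0.5, 200), should_split(1.0, 0.95, 200))
```

Because the bound shrinks as 1/sqrt(n), a leaf whose two best candidates are close simply keeps accumulating examples until either the ratio test or the tie threshold resolves the decision.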
