Publications

Publications by João Gama

2015

Data Stream Classification Guided by Clustering on Nonstationary Environments and Extreme Verification Latency

Authors
Souza, VMAd; Silva, DF; Gama, J; Batista, GEAPA;

Publication
Proceedings of the 2015 SIAM International Conference on Data Mining, Vancouver, BC, Canada, April 30 - May 2, 2015

Abstract
Data stream classification algorithms for nonstationary environments frequently assume the availability of class labels, instantly or with some lag after the classification. However, certain applications, mainly those related to sensors and robotics, involve high costs to obtain new labels during the classification phase. Such a scenario in which the actual labels of processed data are never available is called extreme verification latency. Extreme verification latency requires new classification methods capable of adapting to possible changes over time without external supervision. This paper presents a fast, simple, intuitive and accurate algorithm to classify nonstationary data streams in an extreme verification latency scenario, namely Stream Classification Algorithm Guided by Clustering - SCARGC. Our method consists of a clustering followed by a classification step applied repeatedly in a closed loop fashion. We show in several classification tasks evaluated in synthetic and real data that our method is faster and more accurate than the state-of-the-art. Copyright © SIAM.

2017

Ensemble learning for data stream analysis: A survey

Authors
Krawczyk, B; Minku, LL; Gama, J; Stefanowski, J; Wozniak, M;

Publication
INFORMATION FUSION

Abstract
In many applications of information systems, learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary characteristics of streaming data, prediction models are often also required to adapt to concept drifts. Out of several new proposed stream algorithms, ensembles play an important role, in particular for non-stationary environments. This paper surveys research on ensembles for data stream classification as well as regression tasks. Besides presenting a comprehensive spectrum of ensemble approaches for data streams, we also discuss advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs. The paper concludes with a discussion of open research problems and lines of future research. Published by Elsevier B.V.

2016

Online Semi-supervised Learning for Multi-target Regression in Data Streams Using AMRules

Authors
Sousa, R; Gama, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
Most data stream systems that use online multi-target regression yield vast amounts of data which is not targeted. Targeting this data is usually impossible, time-consuming and expensive. Semi-supervised algorithms have been proposed to use this untargeted data (input information only) for model improvement. However, most algorithms are adapted to work in batch mode for classification and require huge computational and memory resources. Therefore, this paper proposes a semi-supervised algorithm for online processing systems, based on the AMRules algorithm, that handles both targeted and untargeted data and improves the regression model. The proposed method was evaluated through a comparison between a scenario where the untargeted examples are not used in training and a scenario where some untargeted examples are used. Evaluation results indicate that the use of the untargeted examples improved the target predictions by improving the model.

2016

Clustering data streams using a forgetful neural model

Authors
Cardoso, DdO; França, FMG; Gama, J;

Publication
Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, April 4-8, 2016

Abstract
Clustering a data stream is a more challenging task than its regular batch version, having stricter performance constraints. In this paper an approach to this problem is presented, based on WiSARD, a memory-based artificial neural network (ANN) model. The functioning of this model was reviewed and improved in order to adapt it to this task. The experimental results obtained support the use of this system for the analysis of data streams in an informative way. © 2016 ACM.

2014

Recurrent concepts in data streams classification

Authors
Gama, J; Kosina, P;

Publication
KNOWLEDGE AND INFORMATION SYSTEMS

Abstract
This work addresses the problem of mining data streams generated in dynamic environments where the distribution underlying the observations may change over time. We present a system that monitors the evolution of the learning process. The system is able to self-diagnose degradations of this process, using change detection mechanisms, and self-repair the decision models. The system uses meta-learning techniques that characterize the domain of applicability of previously learned models. The meta-learner can detect recurrence of contexts, using unlabeled examples, and take pro-active actions by activating previously learned models. The experimental evaluation on three text mining problems demonstrates the main advantages of the proposed system: it provides information about the recurrence of concepts and rapidly adapts decision models when drift occurs.

2014

Challenges in Learning from Streaming Data

Authors
Gama, J;

Publication
ADVANCES IN DATABASES AND INFORMATION SYSTEMS (ADBIS 2014)

Abstract