Publications

Publications by João Gama

2016

Analyzing the behavior dynamics of grain price indexes using Tucker tensor decomposition and spatio-temporal trajectories

Authors
Correa, FE; Oliveira, MDB; Gama, J; Correa, PLP; Rady, J;

Publication
COMPUTERS AND ELECTRONICS IN AGRICULTURE

Abstract
Agribusiness is an activity that generates huge amounts of temporal data. There are research centers that collect, store and create indexes of agricultural activities, providing multidimensional time series composed of years of data. In this paper, we are interested in studying the behavior of these time series, especially with regard to the evolution of agricultural price indexes over the years. We explore data mining techniques tailored to analyze temporal data, aiming to generate spatio-temporal trajectories of grain price indexes for six years of data. We propose the use of Tucker decomposition to both analyze the temporal patterns of these price indexes and map trajectories that represent their behavior over time in a concise and representative low-dimensional subspace. The case study presents an application of this methodology to real databases of price indexes of corn and soybeans in Brazil and the United States.
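As a rough illustration of the Tucker decomposition mentioned in this abstract, the sketch below computes a truncated higher-order SVD (HOSVD), one common way to obtain a Tucker model. It is not the authors' implementation: the function names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) and the example tensor layout (time x index x region) are assumptions made here for illustration only.

```python
import numpy as np

def unfold(tensor, mode):
    # Matricize: rows index the chosen mode, columns index all other modes.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    # Multiply `tensor` along axis `mode` by `matrix` (shape: new_dim x old_dim).
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor, ranks):
    # Truncated HOSVD: one orthonormal factor per mode plus a small core tensor.
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    # Map the core back to the original space through the factor matrices.
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical data: 72 months x 4 price indexes x 2 regions (random values).
rng = np.random.default_rng(0)
prices = rng.normal(size=(72, 4, 2))
core, factors = hosvd(prices, ranks=(3, 2, 2))
# Rows of factors[0] give each month's coordinates in a 3-dimensional temporal
# subspace; plotting them in order would trace a trajectory over time.
print(core.shape, factors[0].shape)
```

With full ranks the reconstruction is exact; with reduced ranks the factor matrices give the low-dimensional subspace in which trajectories could be drawn.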


2017

An evolutionary algorithm for clustering data streams with a variable number of clusters

Authors
Silva, JD; Hruschka, ER; Gama, J;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Several algorithms for clustering data streams based on k-Means have been proposed in the literature. However, most of them assume that the number of clusters, k, is known a priori by the user and can be kept fixed throughout the data analysis process. Besides the difficulty in choosing k, data stream clustering imposes several challenges, such as handling non-stationary, unbounded data that arrive in an online fashion. In this paper, we propose a Fast Evolutionary Algorithm for Clustering data streams (FEAC-Stream) that allows estimating k automatically from data in an online fashion. FEAC-Stream uses the Page-Hinkley Test to detect possible degradation in the quality of the induced clusters, thereby triggering an evolutionary algorithm that re-estimates k accordingly. FEAC-Stream relies on the assumption that clusters of (partially unknown) data can provide useful information about the dynamics of the data stream. We illustrate the potential of FEAC-Stream in a set of experiments using both synthetic and real-world data streams, comparing it to four related algorithms, namely: CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk and StreamKM++-BkM. The obtained results show that FEAC-Stream provides good data partitions and that it can detect, and accordingly react to, data changes.
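The Page-Hinkley test named in this abstract can be sketched in a few lines. The version below is a generic textbook form that flags an increase in the mean of a monitored signal (for example, a cluster-quality error); it is not FEAC-Stream itself, and the class name, the parameters `delta`, `threshold` and `min_samples`, their default values, and the helper `first_alarm` are assumptions chosen for illustration.

```python
class PageHinkley:
    """Minimal Page-Hinkley detector for an upward shift in a signal's mean."""

    def __init__(self, delta=0.005, threshold=50.0, min_samples=30):
        self.delta = delta              # tolerated magnitude of change (assumed value)
        self.threshold = threshold      # alarm threshold, often written lambda (assumed value)
        self.min_samples = min_samples  # warm-up period before alarms are allowed
        self.n = 0
        self.mean = 0.0                 # running mean of the observations
        self.cum = 0.0                  # cumulative deviation from the running mean
        self.min_cum = 0.0              # smallest cumulative deviation seen so far

    def update(self, x):
        # Feed one observation; return True when a change is signalled.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.n >= self.min_samples and (self.cum - self.min_cum) > self.threshold


def first_alarm(stream, **kwargs):
    # Return the index of the first alarm in `stream`, or None if there is none.
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Hypothetical signal: stable at 0.0, then jumping to 5.0 at position 200.
signal = [0.0] * 200 + [5.0] * 200
print(first_alarm(signal))
```

In a streaming clusterer, an alarm of this kind would be the trigger for re-estimating the model; the detector is normally reset after each alarm.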


2018

Proceedings of the Workshop on Large-scale Learning from Data Streams in Evolving Environments (STREAMEVOLV 2016) co-located with the 2016 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016), Riva del Garda, Italy, September 23, 2016

Authors
Mouchaweh, MS; Bouchachia, H; Gama, J; Ribeiro, RP;

Publication
STREAMEVOLV@ECML-PKDD

Abstract

2015

Data Stream Classification Based on the Gamma Classifier

Authors
Valeria Uriarte Arcia, AV; Lopez Yanez, I; Yanez Marquez, C; Gama, J; Camacho Nieto, O;

Publication
MATHEMATICAL PROBLEMS IN ENGINEERING

Abstract
The ever-increasing generation of data confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive continuous data streams in a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent to data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.


2015

Concept Drift Detection with Clustering via Statistical Change Detection Methods

Authors
Sakamoto, Y; Fukui, K; Gama, J; Nicklas, D; Moriyama, K; Numao, M;

Publication
2015 Seventh International Conference on Knowledge and Systems Engineering (KSE)

Abstract
We propose a concept drift detection method utilizing statistical change detection in which a drift detection method and the Page-Hinkley test are employed. Our method enables users to annotate clustering results without constructing a model of drift detection for every input. In our experiments using synthetic data, we evaluated our proposed method on the basis of detection delay and false detection, and revealed relations between the degree of drift and the parameters of the method.


2018

Multi-label classification from high-speed data streams with adaptive model rules and random rules

Authors
Sousa, R; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
Multi-label classification is a methodology that tries to solve classification problems where multiple classes are associated with each data example. Data streams pose new challenges to this methodology caused by the massive amounts of structured data production. In fact, most of the existing batch mode methods may not support this condition. Therefore, this paper proposes two multi-label classification methods based on rule and ensemble learning from a continuous flow of data. These methods are derived from a multi-target regression algorithm. The main contribution of this work is the rule specialization for subsets of class labels, instead of the usual local (individual models for each output) or global (one model for all outputs) methods. Prequential evaluation was conducted where global, local and subset operation modes were compared against other online classifiers found in the literature. Six real-world data sets were used. The evaluation demonstrated that the subset specialization presents competitive performance when compared to local and global approaches and online classifiers found in the literature.
