Publications - INESC TEC

Publications

Publications by João Gama

2013

Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, Porto, Portugal, June 20-22, 2013

Authors
Rodrigues, PP; Pechenizkiy, M; Gama, J; Correia, RC; Liu, J; Traina, AJM; Lucas, PJF; Soda, P;

Publication
CBMS


2013

Proceedings of the 3rd Workshop on Ubiquitous Data Mining co-located with the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013), Beijing, China, August 3, 2013

Authors
Gama, J; May, M; Marques, NC; Cortez, P; Ferreira, CA;

Publication
UDM@IJCAI


2014

Comparing Data Distribution Using Fading Histograms

Authors
Sebastião, R; Gama, J; Mendonça, T;

Publication
21st European Conference on Artificial Intelligence (ECAI 2014)

Abstract
The emergence of real temporal applications under non-stationary scenarios has drastically altered the ability to generate and gather information. Nowadays, under dynamic scenarios, potentially unbounded and massive amounts of information are generated at a high rate in the form of data streams. Dealing with evolving data streams requires online monitoring of the data in order to detect changes. The contribution of this paper is to show the advantage of using fading histograms to compare data distributions for change detection purposes. In a windowing scheme, the data distributions provided by the fading histograms are compared using the Kullback-Leibler divergence. The experimental results indicate that the detection delay is smaller when data are represented with fading histograms instead of standard histograms.
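
As a rough illustration of the idea in this abstract, the sketch below keeps a histogram whose counts are multiplied by a fading factor before each new observation is added, and compares two such histograms with the Kullback-Leibler divergence. It is not the authors' implementation: the fixed equal-width bins, the names `FadingHistogram` and `kl_divergence`, the default `alpha`, and the small smoothing constant `eps` are all assumptions made for the example.

```python
import math


class FadingHistogram:
    """Equal-width histogram whose old counts decay by `alpha` per update (illustrative)."""

    def __init__(self, low, high, n_bins, alpha=0.99):
        self.low = low
        self.high = high
        self.n_bins = n_bins
        self.alpha = alpha  # assumed fading factor, 0 < alpha <= 1
        self.counts = [0.0] * n_bins

    def update(self, x):
        # Down-weight everything seen so far, then count the new observation.
        self.counts = [c * self.alpha for c in self.counts]
        idx = int((x - self.low) / (self.high - self.low) * self.n_bins)
        idx = min(max(idx, 0), self.n_bins - 1)  # clamp out-of-range values
        self.counts[idx] += 1.0

    def frequencies(self, eps=1e-9):
        # Additive smoothing (an assumption) keeps every bin strictly positive.
        total = sum(self.counts) + eps * self.n_bins
        return [(c + eps) / total for c in self.counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    reference = FadingHistogram(0.0, 1.0, 10)
    current = FadingHistogram(0.0, 1.0, 10)
    for _ in range(200):
        reference.update(0.25)
        current.update(0.25)
    for _ in range(50):  # the stream shifts towards a different region
        current.update(0.75)
    print(kl_divergence(reference.frequencies(), current.frequencies()))
```

A change detector built on this sketch would raise an alarm when the divergence between a reference window and the current window exceeds a threshold; how that threshold is chosen is left open here.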


2014

Constructing fading histograms from data streams

Authors
Sebastião, R; Gama, J; Mendonça, T;

Publication
Progress in AI

Abstract
The ability to collect data is changing drastically. Nowadays, data are gathered in the form of transient and infinite data streams. Memory restrictions preclude keeping all received data in memory. When dealing with massive data streams, it is mandatory to create compact representations of data, also known as synopsis structures or summaries. Reducing memory occupancy is of utmost importance when handling a huge amount of data. This paper addresses the problem of constructing histograms from data streams under error constraints. When constructing online histograms from data streams, there are two main characteristics to consider: how easily the histogram can be updated and the error of the histogram. Moreover, in dynamic environments, besides the need for compact summaries that capture the most important properties of the data, it is also essential to forget old data. Therefore, this paper presents sliding histograms and fading histograms, an abrupt and a smooth strategy, respectively, for forgetting outdated data. © 2014 Springer-Verlag Berlin Heidelberg.
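
The contrast between abrupt and smooth forgetting can be sketched as follows. This is a minimal example under stated assumptions, not the construction from the paper: it uses fixed equal-width bins and ignores the error constraints the abstract mentions, and the names `bin_index`, `SlidingHistogram`, and `fading_counts` are hypothetical.

```python
from collections import deque


def bin_index(x, low, high, n_bins):
    """Map a value to an equal-width bin, clamping values outside [low, high)."""
    idx = int((x - low) / (high - low) * n_bins)
    return min(max(idx, 0), n_bins - 1)


class SlidingHistogram:
    """Abrupt forgetting: only the last `window` observations are counted."""

    def __init__(self, low, high, n_bins, window):
        self.low, self.high, self.n_bins = low, high, n_bins
        self.window = window
        self.recent = deque()  # bin indices of the observations still in the window
        self.counts = [0] * n_bins

    def update(self, x):
        idx = bin_index(x, self.low, self.high, self.n_bins)
        self.recent.append(idx)
        self.counts[idx] += 1
        if len(self.recent) > self.window:
            # The oldest observation leaves the window and is forgotten entirely.
            self.counts[self.recent.popleft()] -= 1


def fading_counts(stream, low, high, n_bins, alpha):
    """Smooth forgetting: every count is multiplied by `alpha` before each update."""
    counts = [0.0] * n_bins
    for x in stream:
        counts = [c * alpha for c in counts]
        counts[bin_index(x, low, high, n_bins)] += 1.0
    return counts


if __name__ == "__main__":
    stream = [0.1] * 5 + [0.9] * 3
    sliding = SlidingHistogram(0.0, 1.0, 4, window=3)
    for value in stream:
        sliding.update(value)
    print(sliding.counts)
    print(fading_counts(stream, 0.0, 1.0, 4, alpha=0.5))
```

In this sketch the sliding variant has to remember which bin each observation in the window fell into, whereas the fading variant stores only the bin counts; old observations never vanish completely from the fading counts, but their weight shrinks geometrically.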


2014

Distributed clustering of ubiquitous data streams

Authors
Rodrigues, PP; Gama, J;

Publication
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Abstract
Nowadays, information is generated and gathered from distributed streaming data sources, stressing communications and computing infrastructure and making the data hard to transmit, process, and store. Knowledge discovery from ubiquitous data streams has become a major goal for all sorts of applications, mostly based on unsupervised techniques such as clustering. Two subproblems exist: clustering streaming data observations and clustering streaming data sources. The former searches for dense regions of the data space, identifying hot spots where data sources tend to produce data, while the latter finds groups of sources that behave similarly over time. In order to better assess the current status of this topic, this article presents a thorough review of distributed algorithms addressing either of the subproblems. We characterize clustering algorithms for ubiquitous data streams, discussing advantages and disadvantages of distributed procedures. Overall, distributed stream clustering methods improve communication ratios, processing speed, and resource consumption, while achieving clustering validity similar to that of their centralized counterparts. © 2013 John Wiley & Sons, Ltd.


2014

Enhancing data stream predictions with reliability estimators and explanation

Authors
Bosnic, Z; Demsar, J; Kespret, G; Rodrigues, PP; Gama, J; Kononenko, I;

Publication
Engineering Applications of Artificial Intelligence

Abstract
Incremental learning from data streams is attracting increasing research attention because many real streaming problems (such as learning from transactions, sensors, or other sequential observations) require processing and forecasting in real time. In this paper we deal with two issues related to incremental learning, prediction accuracy and prediction explanation, and demonstrate their applicability on several streaming problems of predicting future electricity load. To improve prediction accuracy, we propose and evaluate the use of two reliability estimators that allow us to estimate the prediction error and correct the predictions. To improve the interpretability of the incremental model and its predictions, we propose an adaptation of an existing prediction explanation methodology, which was originally developed for batch learning from stationary data. The explanation methodology is combined with a state-of-the-art concept drift detector and a visualization technique to enhance the explanation in dynamic streaming settings. The results show that the proposed approaches can improve prediction accuracy and allow transparent insight into the modeled concept.
