2013
Authors
Rodrigues, PP; Pechenizkiy, M; Gama, J; Correia, RC; Liu, J; Traina, AJM; Lucas, PJF; Soda, P;
Publication
CBMS
Abstract
2013
Authors
Gama, J; May, M; Marques, NC; Cortez, P; Ferreira, CA;
Publication
UDM@IJCAI
Abstract
2014
Authors
Sebastiao, R; Gama, J; Mendonca, T;
Publication
21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014)
Abstract
The emergence of real-world temporal applications under non-stationary scenarios has drastically altered the ability to generate and gather information. Nowadays, under dynamic scenarios, potentially unbounded and massive amounts of information are generated at a high rate, known as data streams. Dealing with evolving data streams requires the online monitoring of data in order to detect changes. The contribution of this paper is to present the advantage of using fading histograms to compare data distributions for change detection purposes. In a windowing scheme, the data distributions provided by the fading histograms are compared using the Kullback-Leibler divergence. The experimental results show that the detection delay is smaller when fading histograms are used to represent the data instead of standard histograms.
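The technique the abstract describes can be illustrated with a minimal sketch: a histogram whose bin counts decay by a factor at every update (so old data fades), with two such histograms compared via the Kullback-Leibler divergence. This is not the authors' implementation; the class name, bin layout, and the decay factor `alpha` are illustrative assumptions.

```python
import math
import random

class FadingHistogram:
    """Equal-width histogram whose counts decay by a factor alpha at
    every update, so older observations fade away smoothly."""

    def __init__(self, n_bins, lo, hi, alpha=0.997):
        self.n_bins, self.lo, self.hi, self.alpha = n_bins, lo, hi, alpha
        self.counts = [0.0] * n_bins

    def update(self, x):
        # Decay all bins, then add the new observation to its bin.
        self.counts = [c * self.alpha for c in self.counts]
        i = int((x - self.lo) / (self.hi - self.lo) * self.n_bins)
        self.counts[min(max(i, 0), self.n_bins - 1)] += 1.0

    def probs(self, eps=1e-9):
        # Smoothed, normalized bin probabilities.
        total = sum(self.counts) + eps * self.n_bins
        return [(c + eps) / total for c in self.counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy usage: reference vs. current histogram; a shift in the data
# distribution raises the KL score between them.
random.seed(0)
ref = FadingHistogram(10, 0.0, 1.0)
cur = FadingHistogram(10, 0.0, 1.0)
for _ in range(2000):
    x = random.random() * 0.5           # stationary regime in [0, 0.5)
    ref.update(x)
    cur.update(x)
kl_before = kl_divergence(cur.probs(), ref.probs())
for _ in range(500):
    cur.update(0.5 + random.random() * 0.5)   # drifted regime in [0.5, 1)
kl_after = kl_divergence(cur.probs(), ref.probs())
assert kl_after > kl_before
```

A change detector would flag drift when the divergence exceeds a threshold; because the fading histogram discounts old data, the current distribution reflects the shift sooner, which is the source of the smaller detection delay reported in the paper.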
2014
Authors
Sebastião, R; Gama, J; Mendonça, T;
Publication
Progress in AI
Abstract
The ability to collect data is changing drastically. Nowadays, data are gathered in the form of transient and finite data streams. Memory restrictions preclude keeping all received data in memory. When dealing with massive data streams, it is mandatory to create compact representations of data, also known as synopsis structures or summaries. Reducing memory occupancy is of utmost importance when handling a huge amount of data. This paper addresses the problem of constructing histograms from data streams under error constraints. When constructing online histograms from data streams there are two main characteristics to embrace: the updating facility and the error of the histogram. Moreover, in dynamic environments, besides the need for compact summaries that capture the most important properties of the data, it is also essential to forget old data. Therefore, this paper presents sliding histograms and fading histograms, two strategies that forget outdated data abruptly and smoothly, respectively. © 2014 Springer-Verlag Berlin Heidelberg.
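The abrupt-forgetting strategy mentioned above can be sketched as a histogram maintained over a fixed-size sliding window: when the window is full, the contribution of the oldest observation is removed exactly. This is a minimal illustration, not the paper's implementation; the class name, bin layout, and window size are assumptions.

```python
from collections import deque

class SlidingHistogram:
    """Equal-width histogram over a fixed-size window: when the window
    is full, the oldest observation is forgotten abruptly."""

    def __init__(self, n_bins, lo, hi, window=1000):
        self.n_bins, self.lo, self.hi = n_bins, lo, hi
        self.capacity = window
        self.window = deque()          # raw values, oldest first
        self.counts = [0] * n_bins

    def _bin(self, x):
        i = int((x - self.lo) / (self.hi - self.lo) * self.n_bins)
        return min(max(i, 0), self.n_bins - 1)

    def update(self, x):
        if len(self.window) == self.capacity:
            old = self.window.popleft()        # abrupt forgetting
            self.counts[self._bin(old)] -= 1
        self.window.append(x)
        self.counts[self._bin(x)] += 1

# Usage: once the window turns over completely, the old regime leaves
# no trace in the histogram.
h = SlidingHistogram(n_bins=4, lo=0.0, hi=1.0, window=100)
for _ in range(150):
    h.update(0.1)                              # old regime, bin 0
for _ in range(100):
    h.update(0.9)                              # new regime, bin 3
assert sum(h.counts) == 100
assert h.counts[0] == 0 and h.counts[3] == 100
```

In contrast, a fading histogram decays all counts by a factor at each update, so old data loses weight gradually rather than all at once; the sliding variant must store the raw window to know what to evict, while the fading variant needs only the bin counts.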
2014
Authors
Rodrigues, PP; Gama, J;
Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Nowadays information is generated and gathered from distributed streaming data sources, stressing communications and computing infrastructure and making it hard to transmit, compute, and store. Knowledge discovery from ubiquitous data streams has become a major goal for all sorts of applications, mostly based on unsupervised techniques such as clustering. Two subproblems exist: clustering streaming data observations and clustering streaming data sources. The former searches for dense regions of the data space, identifying hot spots where data sources tend to produce data, while the latter finds groups of sources that behave similarly over time. In order to better assess the current status of this topic, this article presents a thorough review of distributed algorithms addressing either of the subproblems. We characterize clustering algorithms for ubiquitous data streams, discussing advantages and disadvantages of distributed procedures. Overall, distributed stream clustering methods improve communication ratios, processing speed, and resource consumption, while achieving clustering validity similar to that of their centralized counterparts. (C) 2013 John Wiley & Sons, Ltd.
2014
Authors
Bosnic, Z; Demsar, J; Kespret, G; Rodrigues, PP; Gama, J; Kononenko, I;
Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Abstract
Incremental learning from data streams is attracting increasing research attention due to the many real streaming problems (such as learning from transactions, sensors, or other sequential observations) that require processing and forecasting in real time. In this paper we deal with two issues related to incremental learning - prediction accuracy and prediction explanation - and demonstrate their applicability on several streaming problems for forecasting future electricity load. To improve prediction accuracy, we propose and evaluate the use of two reliability estimators that allow us to estimate the prediction error and correct predictions. To improve the interpretability of the incremental model and its predictions, we propose an adaptation of an existing prediction explanation methodology, originally developed for batch learning from stationary data. The explanation methodology is combined with a state-of-the-art concept drift detector and a visualization technique to enhance the explanation in dynamic streaming settings. The results show that the proposed approaches improve prediction accuracy and allow transparent insight into the modeled concept.
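The idea of correcting predictions with an online error estimate can be sketched in a generic form: track recent signed residuals and subtract their mean from each new prediction. This is a stand-in illustration of the principle, not the paper's two reliability estimators; the class name and the `horizon` parameter are assumptions.

```python
from collections import deque

class ResidualCorrector:
    """Correct predictions using an online estimate of recent error:
    the mean of the last `horizon` signed residuals is subtracted
    from each new prediction."""

    def __init__(self, horizon=50):
        self.residuals = deque(maxlen=horizon)

    def observe(self, y_pred, y_true):
        # Record the signed residual of a past prediction.
        self.residuals.append(y_pred - y_true)

    def correct(self, y_pred):
        if not self.residuals:
            return y_pred
        bias = sum(self.residuals) / len(self.residuals)
        return y_pred - bias

# Usage: a forecaster that is consistently 2.0 too high gets corrected.
c = ResidualCorrector(horizon=10)
for t in range(20):
    y_true = float(t)
    y_pred = y_true + 2.0              # biased forecaster
    c.observe(y_pred, y_true)
corrected = c.correct(12.0)
assert abs(corrected - 10.0) < 1e-9
```

Because the residual window is bounded, the correction adapts when the model's error characteristics change, which is the same motivation the abstract gives for combining error estimation with a concept drift detector.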