Publications

Publications by LIAAD

2014

Distributed Environment Framework for Optimization Experiments

Authors
Abreu, P; Soares, C; Camacho, R;

Publication
2014 14TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ITS APPLICATIONS (ICCSA)

Abstract
Optimization studies often require very large computational resources to execute experiments. Furthermore, most of the time, the experiments are repetitions (same problem instances and same algorithm with the same parameters) of experiments that were carried out in past studies. In this work, we propose a framework for the execution of optimization experiments in a distributed environment and for the storage of the results as well as of the experimental conditions. The framework not only supports the organized execution of experiments but also enables the reuse of the results in future studies.


2014

Preface

Authors
Vanschoren, J; Brazdil, P; Soares, C; Kotthoff, L;

Publication
CEUR Workshop Proceedings

Abstract

2014

Comparing Data Distribution Using Fading Histograms

Authors
Sebastiao, R; Gama, J; Mendonca, T;

Publication
21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014)

Abstract
The emergence of real temporal applications under non-stationary scenarios has drastically altered the ability to generate and gather information. Nowadays, under dynamic scenarios, potentially unbounded and massive amounts of information are generated at a high rate, known as data streams. Dealing with evolving data streams requires the online monitoring of data in order to detect changes. The contribution of this paper is to present the advantage of using fading histograms to compare data distributions for change detection purposes. In a windowing scheme, data distributions provided by the fading histograms are compared using the Kullback-Leibler divergence. The experimental results support that the detection delay time is smaller when using fading histograms to represent data instead of standard histograms.
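The idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `FadingHistogram`, the fixed equal-width bins, the fading factor `alpha`, and the smoothing constant `eps` are all assumptions made for illustration. Each new observation first scales every bin count by `alpha` (so older data weighs less) and then increments the bin it falls into; two such histograms are then compared with the Kullback-Leibler divergence.

```python
import math


class FadingHistogram:
    """Equal-width histogram whose counts decay by `alpha` on every update.

    Hypothetical helper for illustration; bin layout and parameter names
    are assumptions, not taken from the paper.
    """

    def __init__(self, low, high, n_bins, alpha=0.99):
        self.low = low
        self.high = high
        self.n_bins = n_bins
        self.alpha = alpha
        self.counts = [0.0] * n_bins

    def update(self, x):
        # Fade all existing counts so that older observations weigh less.
        self.counts = [self.alpha * c for c in self.counts]
        # Locate the bin for x, clamping out-of-range values to the edge bins.
        idx = int((x - self.low) / (self.high - self.low) * self.n_bins)
        idx = min(max(idx, 0), self.n_bins - 1)
        self.counts[idx] += 1.0

    def distribution(self, eps=1e-9):
        # Normalize to a probability vector; eps avoids zero-probability bins.
        total = sum(self.counts) + eps * self.n_bins
        return [(c + eps) / total for c in self.counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Example: compare a reference histogram with one built from shifted data.
reference = FadingHistogram(0.0, 1.0, n_bins=10, alpha=0.95)
current = FadingHistogram(0.0, 1.0, n_bins=10, alpha=0.95)
for i in range(200):
    reference.update((i % 10) / 10 * 0.5)        # values in [0, 0.5)
    current.update(0.5 + (i % 10) / 10 * 0.5)    # values in [0.5, 1.0)

divergence = kl_divergence(current.distribution(), reference.distribution())
```

A change detector built on this sketch would flag a change when `divergence` exceeds some threshold; how that threshold is chosen is left open here.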


2014

Constructing fading histograms from data streams

Authors
Sebastião, R; Gama, J; Mendonça, T;

Publication
Progress in AI

Abstract
The ability to collect data is changing drastically. Nowadays, data are gathered in the form of transient and finite data streams. Memory restrictions preclude keeping all received data in memory. When dealing with massive data streams, it is mandatory to create compact representations of data, also known as synopsis structures or summaries. Reducing memory occupancy is of utmost importance when handling a huge amount of data. This paper addresses the problem of constructing histograms from data streams under error constraints. When constructing online histograms from data streams there are two main characteristics to embrace: the updating facility and the error of the histogram. Moreover, in dynamic environments, besides the need for compact summaries to capture the most important properties of data, it is also essential to forget old data. Therefore, this paper presents sliding histograms and fading histograms, respectively an abrupt and a smooth strategy for forgetting outdated data. © 2014 Springer-Verlag Berlin Heidelberg.


2014

Distributed clustering of ubiquitous data streams

Authors
Rodrigues, PP; Gama, J;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Nowadays information is generated and gathered from distributed streaming data sources, stressing communications and computing infrastructure, making it hard to transmit, compute, and store. Knowledge discovery from ubiquitous data streams has become a major goal for all sorts of applications, mostly based on unsupervised techniques such as clustering. Two subproblems exist: clustering streaming data observations and clustering streaming data sources. The former searches for dense regions of the data space, identifying hot spots where data sources tend to produce data, while the latter finds groups of sources that behave similarly over time. In order to better assess the current status of this topic, this article presents a thorough review of distributed algorithms addressing either of the subproblems. We characterize clustering algorithms for ubiquitous data streams, discussing advantages and disadvantages of distributed procedures. Overall, distributed stream clustering methods improve communication ratios, processing speed, and resource consumption, while achieving clustering validity similar to that of their centralized counterparts. © 2013 John Wiley & Sons, Ltd.


2014

Enhancing data stream predictions with reliability estimators and explanation

Authors
Bosnic, Z; Demsar, J; Kespret, G; Rodrigues, PP; Gama, J; Kononenko, I;

Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Abstract
Incremental learning from data streams is increasingly attracting research focus due to many real streaming problems (such as learning from transactions, sensors or other sequential observations) that require processing and forecasting in real time. In this paper we deal with two issues related to incremental learning, prediction accuracy and prediction explanation, and demonstrate their applicability on several streaming problems for predicting future electricity load. To improve prediction accuracy, we propose and evaluate the use of two reliability estimators that allow us to estimate prediction error and correct predictions. To improve the interpretability of the incremental model and its predictions, we propose an adaptation of the existing prediction explanation methodology, which was originally developed for batch learning from stationary data. The explanation methodology is combined with a state-of-the-art concept drift detector and a visualization technique to enhance the explanation in dynamic streaming settings. The results show that the proposed approaches can improve prediction accuracy and allow transparent insight into the modeled concept.
