Publications

Publications by João Gama

2009

Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data, Paris, France, June 28, 2009

Authors
Omitaomu, OA; Ganguly, AR; Vatsavai, RR; Gama, J; Chawla, NV; Gaber, MM;

Publication
KDD Workshop on Knowledge Discovery from Sensor Data

2012

Estimating reliability for assessing and correcting individual streaming predictions

Authors
Rodrigues, PPE; Bosnic, Z; Gama, J; Kononenko, I;

Publication
Reliable Knowledge Discovery

Abstract
Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is most of the time defined by their average accuracy, which carries little or no information about the estimated error of each individual prediction. In these cases, users should be allowed to associate a measure of reliability to each prediction. However, with the advent of data streams, batch state-of-the-art reliability estimates need to be redefined. In this chapter we adapt and evaluate five empirical measures for online reliability estimation of individual predictions: similarity-based (k-NN) error, local sensitivity (bias and variance) and online bagging predictions (bias and variance). Evaluation is performed with a neural network base model on two different problems, with results showing that online bagging and k-NN estimates are consistently correlated with the error of the base model. Furthermore, we propose an approach for correcting individual predictions based on the CNK reliability estimate. Evaluation is done on a real-world problem (prediction of the electricity load for a selected European geographical region), using two different regression models: neural network and the k nearest neighbors algorithm. Comparison is performed with corrections based on the Kalman filter. The results show that our method performs better than the Kalman filter, significantly improving the original predictions to more accurate values.

2009

Knowledge discovery for sensor network comprehension

Authors
Rodrigues, PP; Gama, J; Lopes, L;

Publication
Intelligent Techniques for Warehousing and Mining Sensor Network Data

2007

Learning from data streams: Processing techniques in sensor networks

Authors
Gama, J; Gaber, MM;

Publication
Learning from Data Streams: Processing Techniques in Sensor Networks

Abstract
Sensor networks consist of distributed autonomous devices that cooperatively monitor an environment. Sensors are equipped with capacities to store information in memory, process this information and communicate with their neighbors. Processing data streams generated from wireless sensor networks has raised new research challenges over the last few years due to the huge numbers of data streams to be managed continuously and at a very high rate. The book provides the reader with a comprehensive overview of stream data processing, including famous prototype implementations like the Nile system and the TinyOS operating system. The set of chapters covers the state of the art in data stream mining approaches using clustering, predictive learning, and tensor analysis techniques, and applying them to applications in security, the natural sciences, and education. This research monograph delivers to researchers and graduate students the state of the art in data stream processing in sensor networks. The huge bibliography offers an excellent starting point for further reading and future research. © Springer-Verlag Berlin Heidelberg 2007. All rights are reserved.

2008

Clustering Distributed Sensor Data Streams

Authors
Rodrigues, PP; Gama, J; Lopes, L;

Publication
Machine Learning and Knowledge Discovery in Databases, Part II, Proceedings

Abstract
Nowadays applications produce infinite streams of data distributed across wide sensor networks. In this work we study the problem of continuously maintaining a cluster structure over the data points generated by the entire network. Usual techniques operate by forwarding and concentrating the entire data in a central server, processing it as a multivariate stream. In this paper, we propose DGClust, a new distributed algorithm which reduces both the dimensionality and the communication burdens, by allowing each local sensor to keep an online discretization of its data stream, which operates with constant update time and (almost) fixed space. Each new data point triggers a cell in this univariate grid, reflecting the current state of the data stream at the local site. Whenever a local site changes its state, it notifies the central server about the new state it is in. This way, at each point in time, the central site has the global multivariate state of the entire network. To avoid monitoring all possible states, whose number is exponential in the number of sensors, the central site keeps a small list of counters of the most frequent global states. Finally, a simple adaptive partitional clustering algorithm is applied to the frequent states' central points in order to provide an anytime definition of the cluster centers. The approach is evaluated in the context of distributed sensor networks, presenting both empirical and theoretical evidence of its advantages.
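
The local-discretization and state-notification idea in this abstract can be illustrated with a rough sketch. This is not the DGClust implementation: the equal-width grid, the plain `Counter` standing in for the "small list of counters", and all names (`LocalSite`, `CentralSite`, `run`) are assumptions made for illustration, and the final clustering step is omitted.

```python
from collections import Counter


class LocalSite:
    """Maps each reading to a cell of a fixed equal-width grid (assumed simplification)."""

    def __init__(self, low: float, high: float, cells: int):
        self.low, self.high, self.cells = low, high, cells
        self.state = None  # current cell index, unknown until the first reading

    def observe(self, value: float):
        """Return the new cell index if the local state changed, else None."""
        width = (self.high - self.low) / self.cells
        cell = int((value - self.low) / width)
        cell = min(max(cell, 0), self.cells - 1)  # clamp out-of-range readings
        if cell != self.state:
            self.state = cell
            return cell
        return None


class CentralSite:
    """Tracks the global state and counts how often each global state is entered."""

    def __init__(self, n_sites: int):
        self.global_state = [None] * n_sites
        self.counts = Counter()  # unbounded here; a bounded frequent-items list is assumed in practice
        self.messages = 0

    def notify(self, site_id: int, cell: int):
        self.messages += 1
        self.global_state[site_id] = cell
        if None not in self.global_state:  # count only fully known global states
            self.counts[tuple(self.global_state)] += 1


def run(readings, low=0.0, high=10.0, cells=5):
    """readings: list of per-time-step tuples, one value per sensor."""
    sites = [LocalSite(low, high, cells) for _ in readings[0]]
    central = CentralSite(len(sites))
    for step in readings:
        for i, value in enumerate(step):
            cell = sites[i].observe(value)
            if cell is not None:  # communicate only on a state change
                central.notify(i, cell)
    return central
```

In this sketch a sensor sends a message only when its reading moves to a different grid cell, so the number of messages can be much smaller than the number of readings.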

2010

Change Detection with Kalman Filter and CUSUM

Authors
Severo, Milton; Gama, Joao;

Publication
Ubiquitous Knowledge Discovery - Challenges, Techniques, Applications

Abstract
In most challenging applications, learning algorithms act in dynamic environments where the data is collected over time. A desirable property of these algorithms is the ability to incrementally incorporate new data into the current decision model. Several incremental learning algorithms have been proposed. However, most of them make the assumption that the examples are drawn from a stationary distribution [14]. The aim of this study is to present a detection system (DSKC) for regression problems. The system is modular and works as a post-processor of a regressor. It is composed of a regression predictor, a Kalman filter and a Cumulative Sum of Recursive Residual (CUSUM) change detector. The system continuously monitors the error of the regression model. A significant increase of the error is interpreted as a change in the distribution that generates the examples over time. When a change is detected, the current regression model is deleted and a new one is constructed. In this paper we tested DSKC with a set of three artificial experiments and two real-world datasets: a physiological dataset and a clinical dataset of sleep apnoea. Sleep apnoea is a common disorder characterized by periods of breathing cessation (apnoea) and periods of reduced breathing (hypopnea) [7]. This is a real-world application where the goal is to detect changes in the signals that monitor breathing. The experimental results showed that the system detected changes quickly and with high probability. The results also showed that the system is robust to false alarms and can be applied efficiently to problems where the information is available over time. © 2010 Springer-Verlag.
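
The error-monitoring step described in this abstract can be illustrated with a minimal one-sided CUSUM sketch. It is not the DSKC system: the Kalman filter stage is left out, and the class name, the `drift`, `threshold` and `warmup` parameters, and their default values are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CusumDetector:
    """One-sided CUSUM over a stream of prediction errors (illustrative only)."""

    drift: float = 0.5      # hypothetical slack tolerated above the in-control mean
    threshold: float = 5.0  # hypothetical alarm level for the cumulative sum
    warmup: int = 20        # number of errors used to estimate the in-control mean

    def __post_init__(self):
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cusum = 0.0

    def update(self, error: float) -> bool:
        """Feed one error value; return True when a change is signalled."""
        self.n += 1
        if self.n <= self.warmup:
            # Running mean of the error while the model is assumed stable.
            self.mean += (error - self.mean) / self.n
            return False
        # Accumulate only deviations above mean + drift; never go below zero.
        self.cusum = max(0.0, self.cusum + error - self.mean - self.drift)
        if self.cusum > self.threshold:
            self.reset()  # stands in for discarding the model and starting over
            return True
        return False


def first_alarm(errors):
    """Return the index of the first alarm in a sequence of errors, or None."""
    detector = CusumDetector()
    for i, error in enumerate(errors):
        if detector.update(error):
            return i
    return None
```

With these assumed settings, a stream whose error stays flat raises no alarm, while a sustained jump in the error level accumulates until the threshold is crossed.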
