Publications - INESC TEC

Publications

Publications by João Gama

2008

Clustering Distributed Sensor Data Streams

Authors
Rodrigues, PP; Gama, J; Lopes, L;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART II, PROCEEDINGS

Abstract
Today, applications produce infinite streams of data distributed across wide sensor networks. In this work we study the problem of continuously maintaining a cluster structure over the data points generated by the entire network. Usual techniques operate by forwarding and concentrating all the data in a central server, processing it as a multivariate stream. In this paper, we propose DGClust, a new distributed algorithm which reduces both the dimensionality and the communication burdens by allowing each local sensor to keep an online discretization of its data stream, which operates with constant update time and (almost) fixed space. Each new data point triggers a cell in this univariate grid, reflecting the current state of the data stream at the local site. Whenever a local site changes its state, it notifies the central server about its new state. This way, at each point in time, the central site has the global multivariate state of the entire network. To avoid monitoring all possible states, whose number is exponential in the number of sensors, the central site keeps a small list of counters of the most frequent global states. Finally, a simple adaptive partitional clustering algorithm is applied to the central points of the frequent states in order to provide an anytime definition of the cluster centers. The approach is evaluated in the context of distributed sensor networks, presenting both empirical and theoretical evidence of its advantages.
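The local-discretization and frequent-state steps described in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DGClust itself: each sensor uses an equal-width grid over a known range (the class LocalGrid is hypothetical, standing in for the paper's online discretization), and the central site's small list of counters is modelled with the Space-Saving top-k algorithm (an assumption; the names SpaceSaving and central_site are also hypothetical). The final clustering of the frequent states is omitted.

```python
class LocalGrid:
    """Equal-width univariate grid for one sensor.

    Assumption: a fixed, known value range split into n_cells equal cells,
    used here in place of the paper's online discretization.
    """

    def __init__(self, low, high, n_cells):
        self.low, self.high, self.n_cells = low, high, n_cells
        self.state = None  # current cell of this sensor

    def cell(self, x):
        # Map x to a cell index, clamping values outside [low, high].
        width = (self.high - self.low) / self.n_cells
        idx = int((x - self.low) // width)
        return min(max(idx, 0), self.n_cells - 1)

    def update(self, x):
        """Return the new cell if the local state changed, else None (no message)."""
        c = self.cell(x)
        if c != self.state:
            self.state = c
            return c
        return None


class SpaceSaving:
    """Bounded top-k counter (Space-Saving), assumed here as the 'small list of counters'."""

    def __init__(self, k):
        self.k = k
        self.counts = {}

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
        else:
            # Evict the least frequent state; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def top(self, n):
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:n]


def central_site(readings, grids, k):
    """Simulate the central site: apply local-state notifications, count global states."""
    global_state = [None] * len(grids)
    counter = SpaceSaving(k)
    for reading in readings:  # one value per sensor at each time step
        for i, (grid, x) in enumerate(zip(grids, reading)):
            new_cell = grid.update(x)
            if new_cell is not None:  # only state changes are sent to the center
                global_state[i] = new_cell
        counter.add(tuple(global_state))
    return counter
```

In this sketch a sensor sends a message only when its cell changes, so communication grows with the number of state changes rather than with the number of readings, while the center still holds the current global state at every step.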


2010

Change Detection with Kalman Filter and CUSUM

Authors
Severo, Milton; Gama, Joao;

Publication
Ubiquitous Knowledge Discovery - Challenges, Techniques, Applications

Abstract
In most challenging applications, learning algorithms act in dynamic environments where the data is collected over time. A desirable property of these algorithms is the ability to incrementally incorporate new data into the current decision model. Several incremental learning algorithms have been proposed. However, most of them assume that the examples are drawn from a stationary distribution [14]. The aim of this study is to present a detection system (DSKC) for regression problems. The system is modular and works as a post-processor of a regressor. It is composed of a regression predictor, a Kalman filter and a Cumulative Sum of Recursive Residual (CUSUM) change detector. The system continuously monitors the error of the regression model. A significant increase in the error is interpreted as a change in the distribution that generates the examples over time. When a change is detected, the current regression model is deleted and a new one is constructed. In this paper, we tested DSKC on three artificial experiments and two real-world datasets: a physiological dataset and a clinical dataset of sleep apnoea. Sleep apnoea is a common disorder characterized by periods of breathing cessation (apnoea) and periods of reduced breathing (hypopnea) [7]. This is a real-world application where the goal is to detect changes in the signals that monitor breathing. The experimental results showed that the system detected changes quickly and with high probability. The results also showed that the system is robust to false alarms and can be applied efficiently to problems where the information becomes available over time. © 2010 Springer-Verlag.
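A minimal sketch of the monitoring chain described above, assuming the regressor's absolute errors arrive one at a time: a scalar Kalman filter smooths the error and a change detector raises an alarm when the smoothed error drifts upward. Note that the detector here is the standard one-sided Page CUSUM applied to the filtered error, swapped in for simplicity; it is not the CUSUM of recursive residuals named in the abstract. The class names and all tuning values (q, r, target, drift, threshold) are hypothetical.

```python
class ScalarKalman:
    """1-D Kalman filter with a random-walk state model.

    q (process variance) and r (measurement variance) are assumed tuning values.
    """

    def __init__(self, q=1e-3, r=1.0, x0=0.0, p0=1.0):
        self.q, self.r, self.x, self.p = q, r, x0, p0

    def update(self, z):
        self.p += self.q                   # predict: uncertainty grows by q
        gain = self.p / (self.p + self.r)  # Kalman gain
        self.x += gain * (z - self.x)      # correct with the new measurement z
        self.p *= 1.0 - gain
        return self.x


class Cusum:
    """One-sided (Page) CUSUM for an increase in mean; target, drift, threshold are assumed."""

    def __init__(self, target, drift, threshold):
        self.target, self.drift, self.threshold = target, drift, threshold
        self.s = 0.0

    def update(self, x):
        self.s = max(0.0, self.s + (x - self.target - self.drift))
        return self.s > self.threshold


def monitor(abs_errors, target, drift=0.25, threshold=5.0):
    """Return the time steps at which the filtered error signals a change."""
    kalman, cusum = ScalarKalman(), Cusum(target, drift, threshold)
    alarms = []
    for t, e in enumerate(abs_errors):
        if cusum.update(kalman.update(e)):
            alarms.append(t)
            # DSKC would rebuild the regression model here; this sketch only restarts CUSUM.
            cusum.s = 0.0
    return alarms
```

Filtering before testing trades a few steps of detection delay for fewer alarms triggered by isolated noisy errors.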


2012

A Predictive Model for the Passenger Demand on a Taxi Network

Authors
Moreira Matias, L; Gama, J; Ferreira, M; Damas, L;

Publication
2012 15TH INTERNATIONAL IEEE CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC)

Abstract
In the last decade, real-time vehicle location systems have attracted wide attention for the new kind of rich spatio-temporal information they provide. The fast processing of this large amount of information is a growing and explosive challenge. Taxi companies are already exploiting such information for efficient taxi dispatching and time-saving route finding. In this paper, we propose a novel methodology to produce online short-term predictions of the spatial distribution of passenger demand over 63 taxi stands in the city of Porto, Portugal. We did so by applying time series forecasting techniques to the processed events continuously communicated by 441 taxi vehicles. Our tests, using 4 months of real data, demonstrated that this model is a major contribution to driver mobility intelligence: 76% of the 86411 demanded taxi services were accurately forecasted within a 30-minute time horizon.
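The abstract does not detail the forecasting model, so the following is only a hypothetical per-stand baseline illustrating online short-term forecasting over 30-minute periods: it blends the mean demand previously seen at the same weekday and period with the demand observed in the latest period. StandForecaster, slot_of and the blending weight alpha are illustrative assumptions, not the paper's method; one instance would be kept per taxi stand.

```python
from collections import defaultdict


def slot_of(minute_of_day, period=30):
    """Index of the 30-minute period containing a given minute of the day."""
    return minute_of_day // period


class StandForecaster:
    """Per-stand demand forecaster (hypothetical; not the paper's model).

    Blends the mean demand seen in the same (weekday, slot) with the most
    recent period's demand; alpha is an assumed blending weight.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.totals = defaultdict(float)  # summed demand per (weekday, slot)
        self.seen = defaultdict(int)      # observations per (weekday, slot)
        self.last = 0.0                   # demand in the most recent period

    def observe(self, weekday, slot, demand):
        key = (weekday, slot)
        self.totals[key] += demand
        self.seen[key] += 1
        self.last = float(demand)

    def forecast(self, weekday, slot):
        key = (weekday, slot)
        if self.seen.get(key, 0) == 0:
            return self.last  # no seasonal history yet: fall back to the last value
        seasonal = self.totals[key] / self.seen[key]
        return self.alpha * seasonal + (1 - self.alpha) * self.last
```

The seasonal term captures the weekly rhythm of demand at a stand, while the recent term lets the forecast react to the current period; alpha sets the balance between the two.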


2011

Contributions to a Decision Support System Based on Depth of Anesthesia Signals

Authors
Sebastiao, R; Silva, MM; Gama, J; Mendonca, T;

Publication
2012 25TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
In clinical practice, concerns about the administration of hypnotics and analgesics for minimally invasive diagnostic and therapeutic procedures have increased enormously in recent years. The automatic detection of changes in the signals used to evaluate the depth of anesthesia is hence of foremost importance in deciding how to adapt the doses of hypnotics and analgesics administered to patients. The aim of this work is to detect drifts online in these depth of anesthesia signals of patients undergoing general anesthesia. The performance of the proposed method is illustrated using BIS (bispectral index) records previously collected from patients subjected to abdominal surgery. The results show that the drifts detected by the proposed method agree with the actions of the clinicians regarding the times at which a change in the hypnotic or analgesic rates occurred. This detection was performed in the presence of noise and sensor faults. The presented algorithm was also validated online. The results encourage the inclusion of the proposed algorithm in a decision support system based on depth of anesthesia signals.
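The abstract does not name the drift detector. As an illustration only, the sketch below applies the Page-Hinkley test, a common online test for changes in the mean of a signal, to a synthetic step signal standing in for a depth of anesthesia record. The class and function names, delta and threshold are assumptions.

```python
class PageHinkley:
    """Two-sided Page-Hinkley test for changes in the mean of a signal.

    delta (tolerated magnitude) and threshold (lambda) are assumed values.
    """

    def __init__(self, delta=0.5, threshold=20.0):
        self.delta, self.threshold = delta, threshold
        self.reset()

    def reset(self):
        self.n, self.mean = 0, 0.0
        self.up, self.up_min = 0.0, 0.0      # cumulative sum for increases
        self.down, self.down_max = 0.0, 0.0  # cumulative sum for decreases

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n  # running mean of the signal
        self.up += x - self.mean - self.delta
        self.up_min = min(self.up_min, self.up)
        self.down += x - self.mean + self.delta
        self.down_max = max(self.down_max, self.down)
        if self.up - self.up_min > self.threshold:
            return "increase"
        if self.down_max - self.down > self.threshold:
            return "decrease"
        return None


def detect_drifts(signal, detector):
    """Run the detector online; restart it after each alarm."""
    alarms = []
    for t, x in enumerate(signal):
        kind = detector.update(x)
        if kind is not None:
            alarms.append((t, kind))
            detector.reset()
    return alarms
```

Restarting the detector after each alarm lets it learn the new signal level, so a single sustained change yields one alarm rather than a stream of them.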


1998

Combining Classifiers by Constructive Induction

Authors
Gama, J;

Publication
Machine Learning: ECML-98, 10th European Conference on Machine Learning, Chemnitz, Germany, April 21-23, 1998, Proceedings

Abstract
Using multiple classifiers to increase learning accuracy is an active research area. In this paper we present a new general method for merging classifiers. The basic idea of Cascade Generalization is to run the set of classifiers sequentially, at each step extending the original data set by adding new attributes. The new attributes are derived from the class probability distribution given by a base classifier. This constructive step extends the representational language of the high-level classifiers, relaxing their bias. Cascade Generalization produces a single but structured model for the data that combines the model class representations of the base classifiers. We have performed an empirical evaluation of Cascade composition of three well-known classifiers: Naive Bayes, Linear Discriminant, and C4.5. Composite models show an increase in performance, sometimes impressive, when compared with the corresponding single models, with significant statistical confidence levels. © Springer-Verlag Berlin Heidelberg 1998.
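The constructive step of Cascade Generalization can be sketched in plain Python: a level-0 classifier is trained, each example is extended with the level-0 class probability distribution, and a level-1 classifier is trained on the extended data. For brevity this sketch assumes a small Gaussian Naive Bayes, written from scratch here, at both levels; the paper composes Naive Bayes, Linear Discriminant and C4.5. All function and class names are hypothetical.

```python
import math


class GaussianNB:
    """Minimal Gaussian Naive Bayes (illustrative, not a library API)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.stats = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # Small variance floor keeps constant columns usable.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                         for col, m in zip(zip(*rows), means)]
            self.stats[c] = (len(rows) / len(y), means, variances)
        return self

    def predict_proba(self, x):
        logs = []
        for c in self.classes:
            prior, means, variances = self.stats[c]
            lp = math.log(prior)
            for v, m, s2 in zip(x, means, variances):
                lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            logs.append(lp)
        top = max(logs)  # subtract the max before exponentiating for stability
        exps = [math.exp(lp - top) for lp in logs]
        return [e / sum(exps) for e in exps]

    def predict(self, x):
        p = self.predict_proba(x)
        return self.classes[p.index(max(p))]


def extend(X, base):
    """Constructive step: append the base classifier's class probabilities."""
    return [list(x) + base.predict_proba(x) for x in X]


def cascade_fit(X, y, level0, level1):
    level0.fit(X, y)
    level1.fit(extend(X, level0), y)
    return level0, level1


def cascade_predict(x, level0, level1):
    return level1.predict(list(x) + level0.predict_proba(x))
```

The result is a single structured model: the level-1 classifier sees both the original attributes and the level-0 probability estimates, so it can use whichever representation separates the classes better.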


1997

Oblique linear tree

Authors
Gama, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS: REASONING ABOUT DATA

Abstract
In this paper we present the system Ltree for propositional supervised learning. Ltree is able to define decision surfaces both orthogonal and oblique to the axes defined by the attributes of the input space. This is done by combining a decision tree with a linear discriminant by means of constructive induction. At each decision node, Ltree defines a new instance space by inserting new attributes that are projections of the examples that fall at this node over the hyper-planes given by a linear discriminant function. This new instance space is propagated down through the tree. Tests based on those new attributes are oblique with respect to the original input space. Ltree is a probabilistic tree in the sense that it outputs a class probability distribution for each query example. The class probability distribution is computed at learning time, taking into account the different class distributions on the path from the root to the current node. We have carried out experiments on sixteen benchmark datasets and compared our system with other well-known decision-tree systems (orthogonal and oblique) such as C4.5, OC1 and LMDT. On these datasets we have observed that our system has advantages in accuracy and tree size at statistically significant confidence levels.
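The constructive step that Ltree performs at a decision node can be illustrated for the two-class case: fit a linear discriminant to the examples at the node and append each example's signed discriminant score as a new attribute. An ordinary axis-parallel test on that attribute is then oblique in the original space. This sketch assumes Fisher's two-class discriminant with a small ridge term; Ltree's exact discriminant and multi-class handling may differ, and the function names are hypothetical.

```python
def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting (small A)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        piv = M[i][i]
        M[i] = [v / piv for v in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [row[-1] for row in M]


def fisher_direction(X, y, ridge=1e-6):
    """Two-class Fisher discriminant: w = Sw^-1 (m1 - m0), bias at the midpoint."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    groups = [[x for x, c in zip(X, y) if c == k] for k in classes]
    m0, m1 = mean_vec(groups[0]), mean_vec(groups[1])
    d = len(m0)
    # Within-class scatter matrix, with a ridge on the diagonal to keep it invertible.
    Sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, m in zip(groups, (m0, m1)):
        for x in rows:
            diff = [a - c for a, c in zip(x, m)]
            for i in range(d):
                for j in range(d):
                    Sw[i][j] += diff[i] * diff[j]
    w = solve(Sw, [a - c for a, c in zip(m1, m0)])
    # Place the hyperplane halfway between the projected class means.
    b = -sum(wi * (a + c) / 2 for wi, a, c in zip(w, m0, m1))
    return w, b


def extend_node(X, w, b):
    """Constructive induction: append the discriminant score w.x + b to each example."""
    return [list(x) + [sum(wi * xi for wi, xi in zip(w, x)) + b] for x in X]
```

A split such as new_attribute <= 0 on the extended data corresponds to the oblique hyperplane w.x + b = 0 in the original attributes, and the extended instance space can be passed down to the node's children.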
