Publications - INESC TEC


Publications by João Gama

2010

Validation of both number and coverage of bus schedules using AVL data

Authors
Matias, L; Gama, J; Moreira, JM; de Sousa, JF;

Publication
13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Madeira, Portugal, 19-22 September 2010

Abstract
It is well known that the definition of bus schedules is critical for the service reliability of public transport. Several proposals have been made to improve that reliability using data from Automatic Vehicle Location (AVL) systems. In this paper we study the optimal number of schedules and the days covered by each of them, in order to increase reliability. We use the Dynamic Time Warping (DTW) distance to compute the similarity between pairs of irregularly spaced data sequences of different lengths before applying data clustering techniques. Applying this methodology with K-Means to a specific bus route showed that a new schedule could be considered for weekends during non-school periods, because their profile is distinct from that of the remaining days. As future work, we propose applying this methodology to larger data sets, covering longer time spans and more bus routes, in order to find a consensual clustering across all the routes. ©2010 IEEE.
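The key ingredient above is a distance that can compare sequences of different lengths. Below is a minimal sketch of the classic dynamic-programming Dynamic Time Warping distance, assuming an absolute-difference local cost and the standard three-step recurrence; the function names `dtw_distance` and `dtw_matrix` are hypothetical, and the abstract does not state which DTW variant or step pattern the authors used.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with |x - y| as the local cost (assumed variant)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier b element
                cost[i][j - 1],      # b[j-1] also aligned with an earlier a element
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


def dtw_matrix(series: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise DTW distances between all sequences."""
    k = len(series)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = dtw_distance(series[i], series[j])
    return d
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0`, because the warping path matches the repeated `2` twice at no cost. Each sequence could be, hypothetically, the travel times recorded along one day's trips. Note that standard K-Means needs centroids, so pairing it with DTW requires some averaging or medoid scheme; the abstract does not say how this was done, so the sketch stops at the distance matrix.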


2008

Knowledge discovery from sensor data

Authors
Ganguly, AR; Gama, J; Omitaomu, OA; Gaber, MM; Vatsavai, RR;

Publication
Knowledge Discovery from Sensor Data

Abstract
As sensors become ubiquitous, a set of broad requirements is beginning to emerge across high-priority applications including disaster preparedness and management, adaptability to climate change, national or homeland security, and the management of critical infrastructures. This book presents innovative solutions in offline data mining and real-time analysis of sensor or geographically distributed data. It discusses the challenges and requirements for sensor-data-based knowledge discovery solutions in high-priority applications, illustrated with case studies. It explores the fusion of heterogeneous data streams from multiple sensor types and applications in science, engineering, and security. © 2009 by Taylor & Francis Group, LLC.


2008

Introduction

Authors
Ganguly, AR; Gama, J; Omitaomu, OA; Gaber, MM; Vatsavai, RR;

Publication
Knowledge Discovery from Sensor Data


2007

OLINDDA: A cluster-based approach for detecting novelty and concept drift in data streams

Authors
Spinosa, EJ; de Carvalho, APDF; Gama, J;

Publication
APPLIED COMPUTING 2007, VOL 1 AND 2

Abstract
A machine learning approach capable of handling data streams faces new challenges and enables the analysis of a variety of real problems in which concepts change over time. In this scenario, the ability to identify novel concepts and the ability to deal with concept drift are two important attributes. This paper presents a technique based on the k-means clustering algorithm that addresses both situations in a single learning strategy. Experimental results on data from various domains provide insight into how clustering algorithms can be used to discover new concepts in data streams.
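The strategy described above can be sketched, under several assumptions, as a hypersphere-based detector: k-means clusters of the normal data become spheres, examples outside every sphere go to an "unknown" buffer, and when the buffer fills it is clustered again; a sufficiently large new cluster is labeled drift ("extension") if its centroid lies inside a global sphere around the normal data, and novelty otherwise. The class `NoveltyDetector`, its parameters (`k`, `buffer_size`, `min_members`), the first-k-points initialization, and the size-only validity check are all hypothetical simplifications, not the paper's implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]
Sphere = Tuple[Point, float]  # (centroid, radius)


def mean(points: List[Point]) -> Point:
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def assign(points: List[Point], centroids: List[Point]) -> List[List[Point]]:
    # Put each point in the group of its nearest centroid.
    groups: List[List[Point]] = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        groups[idx].append(p)
    return groups


def kmeans(points: List[Point], k: int, iters: int = 50):
    centroids = list(points[:k])  # naive init: first k points (assumption)
    for _ in range(iters):
        groups = assign(points, centroids)
        new = [mean(g) if g else c for c, g in zip(centroids, groups)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


def to_spheres(points: List[Point], k: int) -> List[Sphere]:
    # Each cluster becomes a hypersphere: centroid plus distance to its farthest member.
    centroids, groups = kmeans(points, k)
    return [(c, max(math.dist(c, p) for p in g)) for c, g in zip(centroids, groups) if g]


class NoveltyDetector:
    """Hypothetical, simplified detector inspired by the abstract (not the paper's code)."""

    def __init__(self, normal: List[Point], k: int = 2, buffer_size: int = 20, min_members: int = 3):
        self.normal = to_spheres(normal, k)
        # Global sphere enclosing all normal data, used to tell drift from novelty.
        self.center = mean(normal)
        self.radius = max(math.dist(self.center, p) for p in normal)
        self.k, self.buffer_size, self.min_members = k, buffer_size, min_members
        self.unknown: List[Point] = []
        self.extensions: List[Sphere] = []
        self.novelties: List[Sphere] = []

    def covered(self, x: Point) -> bool:
        models = self.normal + self.extensions + self.novelties
        return any(math.dist(c, x) <= r for c, r in models)

    def process(self, x: Point) -> str:
        if self.covered(x):
            return "known"
        self.unknown.append(x)
        if len(self.unknown) >= self.buffer_size:
            self._learn_from_unknown()
        return "unknown"

    def _learn_from_unknown(self) -> None:
        centroids, groups = kmeans(self.unknown, min(self.k, len(self.unknown)))
        leftover: List[Point] = []
        for c, g in zip(centroids, groups):
            if len(g) < self.min_members:
                leftover.extend(g)  # too few members to count as a concept yet
                continue
            sphere = (c, max(math.dist(c, p) for p in g))
            if math.dist(c, self.center) <= self.radius:
                self.extensions.append(sphere)  # treated as drift of the normal concept
            else:
                self.novelties.append(sphere)   # treated as a novel concept
        self.unknown = leftover
```

A fuller version would replace the size check with an explicit cohesiveness criterion and use a better k-means initialization; both are left out here to keep the sketch short.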


2008

Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks

Authors
Spinosa, EJ; de Carvalho, APDF; Gama, J;

Publication
APPLIED COMPUTING 2008, VOLS 1-3

Abstract
In this paper, a cluster-based novelty detection technique capable of dealing with a large amount of data is presented and evaluated in the context of intrusion detection. Starting with examples of a single class that describe the normal profile, the proposed technique detects novel concepts initially as cohesive clusters of examples and later as sets of clusters in an unsupervised incremental learning fashion. Experimental results with the KDD Cup 1999 data set show that the technique is capable of dealing with data streams, successfully learning novel concepts that are pure in terms of the real class structure.
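Two steps in this abstract lend themselves to a short illustration: grouping novel clusters into larger concepts ("sets of clusters"), and measuring how pure those concepts are with respect to the real classes. The sketch below assumes, hypothetically, that two clusters belong to the same concept when their hyperspheres overlap, and uses the standard majority-label purity measure; neither the overlap criterion nor the function names `group_into_concepts` and `purity` come from the paper, and the labels in the usage example are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Sphere = Tuple[Tuple[float, ...], float]  # (centroid, radius)


def group_into_concepts(spheres: List[Sphere]) -> List[List[int]]:
    """Union-find: overlapping spheres end up in the same concept (assumed criterion)."""
    parent = list(range(len(spheres)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(spheres)):
        for j in range(i + 1, len(spheres)):
            (ci, ri), (cj, rj) = spheres[i], spheres[j]
            if math.dist(ci, cj) <= ri + rj:  # spheres touch or overlap
                parent[find(i)] = find(j)
    concepts: Dict[int, List[int]] = {}
    for i in range(len(spheres)):
        concepts.setdefault(find(i), []).append(i)
    return list(concepts.values())


def purity(labels_per_concept: List[List[str]]) -> float:
    """Fraction of examples whose true label is the majority label of their concept."""
    total = sum(len(ls) for ls in labels_per_concept)
    majority = sum(Counter(ls).most_common(1)[0][1] for ls in labels_per_concept if ls)
    return majority / total
```

For instance, concepts with true labels `["dos", "dos", "dos"]` and `["probe", "probe", "dos"]` have purity 5/6, since five of the six examples carry their concept's majority label.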


2009

Adaptive Bayesian network classifiers

Authors
Castillo, G; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
This paper is concerned with adaptive learning algorithms for Bayesian network classifiers in a prequential (on-line) learning scenario. In this scenario, new data become available over time. An efficient supervised learning algorithm must be able to improve its predictive accuracy by incorporating the incoming data, while optimizing the cost of updating. However, if the process is not strictly stationary, the target concept could change over time. Hence, the predictive model should be adapted quickly to these changes. The main contribution of this work is a proposal of a unified, adaptive prequential framework for supervised learning called AdPreqFr4SL, which attempts to handle the cost-performance trade-off and deal with concept drift. Starting with the simple Naive Bayes, we scale up the complexity by gradually increasing the maximum number of allowable attribute dependencies, and then by searching for new dependencies in the extended search space. Since updating the structure is a costly task, we use new data primarily to adapt the parameters. We adapt the structure only when it is actually necessary. The method for handling concept drift is based on the Shewhart P-Chart. We experimentally demonstrate the advantages of using AdPreqFr4SL in comparison with its non-adaptive versions.
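The drift-handling component above rests on the Shewhart P-Chart, which flags a batch whose error proportion exceeds the running mean error rate by a multiple of its binomial standard deviation, $\bar{p} + k\sqrt{\bar{p}(1-\bar{p})/n}$. The sketch below assumes fixed-size batches of prequential predictions, a warning limit at 2 sigma, a drift limit at 3 sigma, and a baseline reset after drift; these settings and the class name `PChartDriftDetector` are hypothetical, and the adaptation of parameters or structure that a detected drift would trigger is not shown.

```python
import math


class PChartDriftDetector:
    """Monitors batch error rates with a Shewhart P-chart (hypothetical settings)."""

    def __init__(self, batch_size: int = 100, warn_k: float = 2.0, drift_k: float = 3.0):
        self.n = batch_size
        self.warn_k, self.drift_k = warn_k, drift_k
        self.errors = 0  # cumulative misclassifications while in control
        self.seen = 0    # cumulative examples while in control

    def update(self, batch_errors: int) -> str:
        """Feed the error count of one batch of size n; return the chart state."""
        p = batch_errors / self.n
        if self.seen == 0:
            # The first batch (or the first after a drift) only sets the baseline.
            self.errors, self.seen = batch_errors, self.n
            return "in-control"
        p_bar = self.errors / self.seen
        sigma = math.sqrt(p_bar * (1 - p_bar) / self.n)
        if p > p_bar + self.drift_k * sigma:
            # Out of control: forget the old baseline so it is re-estimated.
            self.errors, self.seen = 0, 0
            return "drift"
        self.errors += batch_errors
        self.seen += self.n
        return "warning" if p > p_bar + self.warn_k * sigma else "in-control"
```

With batches of 100 and a baseline error rate of 0.10, sigma is 0.03, so a batch with 17 errors lies above the 2-sigma warning limit (0.16) but below the 3-sigma drift limit (0.19). One caveat of this simple form: if the baseline error rate is exactly 0, sigma is 0 and any single error is reported as drift.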
