Publications

Publications by João Gama

2012

Event and anomaly detection using Tucker3 decomposition

Authors
Tork, HF; Oliveira, M; Gama, J; Malinowski, S; Morla, R;

Publication
CEUR Workshop Proceedings

Abstract
Failure detection in telecommunication networks is a vital task. So far, several supervised and unsupervised solutions have been proposed for discovering failures in such networks. Among them, unsupervised approaches have attracted more attention since no labeled data is required [1]. Often, network devices are not able to provide information about the type of failure. In such cases, an unsupervised setting is more appropriate for diagnosis. Among unsupervised approaches, Principal Component Analysis (PCA) has been widely used in the anomaly detection literature and can be applied to matrix data (e.g. Users-Features). However, one of the important properties of network data is its temporal, sequential nature. Considering the interaction of dimensions over a third dimension, such as time, may therefore provide better insights into the nature of network failures. In this paper we demonstrate the power of three-way analysis to detect events and anomalies in time-evolving network data.

2012

Holistic distributed stream clustering for smart grids

Authors
Rodrigues, PP; Gama, J;

Publication
CEUR Workshop Proceedings

Abstract
Smart grids consist of millions of automated electronic meters that will be installed in electricity distribution networks and connected to servers that will manage grid supervision, billing and customer services. World sustainability regarding energy management will definitely rely on such grids, so smart grids also need to be sustainable themselves. This sustainability depends on several research problems that emerge from this new setting (from power balance to energy markets), requiring new approaches for knowledge discovery and decision support. This paper presents a holistic distributed stream clustering view of possible solutions for those problems, supported by previous research in related domains. The approach is based on two orthogonal clustering algorithms, combined for a holistic clustering of the grid. Experimental results are included to illustrate the benefits of each algorithm, while the proposal is discussed in terms of application to smart grid problems. This holistic approach could be used to help solve some of the smart grid intelligent layer research problems, thus improving global sustainability.

2012

Semi-supervised learning: Predicting activities in Android environment

Authors
Lopes, A; Mendes Moreira, J; Gama, J;

Publication
CEUR Workshop Proceedings

Abstract
Predicting activities from data gathered with sensors has gained importance over the years, with the objective of getting a better understanding of the human body. The purpose of this paper is to show that predicting activities on an Android phone is possible. We take into consideration different classifiers, their accuracy using different approaches (hierarchical and one-step classification) and limitations of the mobile device itself, such as battery and memory usage. A semi-supervised learning approach is taken in order to compare its results against supervised learning. The objective is to discover whether the application can be adapted to the user, providing a better solution for this problem. The activities predicted are the most usual in everyday life: walking, running, standing idle and sitting. An Android prototype, embedding the software MOA, was developed to experimentally evaluate the ideas proposed here.

2012

A survey on learning from data streams: current and future trends

Authors
Gama, J;

Publication
Progress in AI

Abstract
Nowadays, there are applications in which the data are modeled best not as persistent tables, but rather as transient data streams. In this article, we discuss the limitations of current machine learning and data mining algorithms. We discuss the fundamental issues in learning in dynamic environments, such as continuously maintaining learning models that evolve over time, learning and forgetting, concept drift and change detection. Data streams produce a huge amount of data that introduces new constraints in the design of learning algorithms: limited computational resources in terms of memory, CPU power, and communication bandwidth. We present some illustrative algorithms, designed to take these constraints into account, for decision-tree learning, hierarchical clustering and frequent pattern mining. We identify the main issues and current challenges that emerge in learning from data streams and that open research lines for further developments. © 2011 Springer-Verlag.

2011

Ubiquitous Knowledge Discovery Introduction

Authors
Gama, J; May, M;

Publication
INTELLIGENT DATA ANALYSIS

Abstract

2010

Validation of both number and coverage of bus schedules using AVL data

Authors
Matias, L; Gama, J; Moreira, JM; de Sousa, JF;

Publication
13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Madeira, Portugal, 19-22 September 2010

Abstract
It is well known that the definition of bus schedules is critical for the service reliability of public transport. Several proposals have been suggested, using data from Automatic Vehicle Location (AVL) systems, in order to enhance the reliability of public transport. In this paper we study the optimum number of schedules and the days covered by each one of them, in order to increase reliability. We use the Dynamic Time Warping distance to calculate the similarity between two irregularly spaced data sequences of different lengths before applying data clustering techniques. The application of this methodology with K-Means to a specific bus route demonstrated that a new schedule for weekends in non-school periods could be considered, due to its distinct profile from the remaining days. For future work, we propose to apply this methodology to larger data sets, in time and in number, corresponding to different bus routes, in order to find a consensual cluster between all the routes. ©2010 IEEE.
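
For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the textbook dynamic-programming formulation, which is what lets two sequences of different lengths be compared. It is not taken from the paper: the function name `dtw_distance`, the absolute-difference local cost and the sample values are all assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (illustrative sketch only).

    Uses absolute difference as the local cost; the paper's exact cost
    function and any windowing constraint are not stated in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again with an earlier b element
                cost[i][j - 1],      # b[j-1] matched again with an earlier a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical daily profiles with different numbers of observations.
weekday = [10.0, 12.0, 15.0, 12.0, 10.0]
weekend = [10.0, 11.0, 11.0, 10.0]
print(dtw_distance(weekday, weekend))
```

A matrix of such pairwise distances between days could then be passed to a clustering step. Note that plain K-Means expects vectors of equal length, so how the distances are combined with it is a design choice the abstract does not spell out.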
