Publications

Publications by João Gama

2014

Collaborative Wind Power Forecast

Authors
Almeida, V; Gama, J;

Publication
ADAPTIVE AND INTELLIGENT SYSTEMS, ICAIS 2014

Abstract
Several emerging environments generate data that are spatially distributed and interrelated. These applications reinforce the importance of developing analytical systems capable of sensing the environment and receiving data from different locations. In this study we explore collaborative methodologies in a real-world problem: wind power prediction. Wind power is considered one of the most rapidly growing sources of electricity generation worldwide. The problem consists of monitoring a network of wind farms that collaborate by sharing information in a very short-term forecasting problem. We use an auto-regressive integrated moving average (ARIMA) model. The Symbolic Aggregate Approximation (SAX) is used to select the set of neighbours. We propose two collaborative methods. The first, based on centralized management, exchanges data points between nodes. In the second approach, correlated wind farms share their own ARIMA models. In the experimental work we use one year of data from 16 wind farms. The goal is to predict the energy produced at each farm every hour over the next 6 hours. We compare the proposed methods against ARIMA models trained on each farm's own data and against the persistence model at each farm. We observe a small but consistent reduction of the root mean square error (RMSE) of the predictions.
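As an illustration of the baseline and the error measure named in this abstract, the following is a minimal sketch of a persistence forecast scored with RMSE. The function names and the hourly values are hypothetical and are not taken from the paper; the sketch assumes the common definition of persistence, in which the last observed value is repeated over the forecast horizon.

```python
import math


def persistence_forecast(history, horizon=6):
    """Repeat the last observed value for each of the next `horizon` steps."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical hourly production values for a single farm (illustrative only).
history = [10.0, 12.0, 11.0]
actual_next = [11.0, 13.0, 11.0, 9.0, 11.0, 11.0]

forecast = persistence_forecast(history, horizon=6)
error = rmse(actual_next, forecast)
print(forecast, round(error, 4))
```

Any candidate model, such as a per-farm ARIMA model, could be scored with the same `rmse` function so that its error is directly comparable with the persistence baseline.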


2013

Contextual Anomalies in Medical Data

Authors
Vasco, D; Rodrigues, PP; Gama, J;

Publication
2013 IEEE 26TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
Anomalies in data can cause many problems in data analysis. Thus, it is necessary to improve data quality by detecting and eliminating errors and inconsistencies in the data, a process known as data cleaning [1]. Since detecting and correcting anomalies requires detailed domain knowledge, the involvement of experts in the field is essential to the success of data cleaning. However, considering the size of the data to be processed, this process should be as automatic as possible so as to minimize the time spent [1]. © 2013 IEEE.


2016

Evolving Centralities in Temporal Graphs: A Twitter Network Analysis

Authors
Pereira, FSF; Amo, Sd; Gama, J;

Publication
IEEE 17th International Conference on Mobile Data Management, MDM 2016, Porto, Portugal, June 13-16, 2016 - Workshops

Abstract

2015

Improving Mass Transit Operations by Using AVL-Based Systems: A Survey

Authors
Moreira Matias, L; Mendes Moreira, J; de Sousa, JF; Gama, J;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Intelligent transportation systems based on automated data collection frameworks are widely used by the major transit companies around the globe. This paper describes the current state of the art on improving both planning and control in public road transportation companies using automatic vehicle location (AVL) data. By surveying this topic, the expectation is to help develop a better understanding of the nature, approaches, challenges, and opportunities with regard to these problems. This paper starts by presenting a brief review on improving the network definition based on historical location-based data. Second, it presents a comprehensive review of AVL-based techniques for evaluating schedule plan (SP) reliability, discussing the existing metrics. Then, the different dimensions of improving SP reliability are presented in detail, as well as the works addressing this problem. Finally, the automatic control strategies are also reviewed, along with the research employed over the location-based data. A comprehensive discussion of the techniques employed is provided to encourage those who are starting research on this topic. It is important to highlight that there are still gaps in the AVL-based literature, such as the following: 1) long-term travel time prediction; 2) finding optimal slack time; or 3) choosing the best control strategy to apply in each situation in the event of schedule instability. Hence, this paper includes introductory model formulations, reference surveys, formal definitions, and an overview of a promising area, which is of interest to any researcher, regardless of the level of expertise.


2013

Random rules from data streams

Authors
Almeida, E; Kosina, P; Gama, J;

Publication
Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC '13, Coimbra, Portugal, March 18-22, 2013

Abstract
Existing works suggest that random inputs and random features produce good results in classification. In this paper we study the problem of generating random rule sets from data streams. One of the most interpretable and flexible models for data stream mining prediction tasks is the Very Fast Decision Rules learner (VFDR). In this work we extend the VFDR algorithm using random rules from data streams. The proposed algorithm generates several sets of rules. Each rule set is associated with a set of Natt attributes. The proposed algorithm maintains all properties required when learning from stationary data streams: online and any-time classification, processing each example once. Copyright 2013 ACM.


2017

Acute Kidney Injury Detection: An Alarm System to Improve Early Treatment

Authors
Nogueira, AR; Ferreira, CA; Gama, J;

Publication
Foundations of Intelligent Systems - 23rd International Symposium, ISMIS 2017, Warsaw, Poland, June 26-29, 2017, Proceedings

Abstract
This work aims to help in the correct and early diagnosis of acute kidney injury through the application of data mining techniques. The main goal is for the system to be implemented in Intensive Care Units (ICUs) as an alarm system that assists health professionals in the diagnosis of this disease. These techniques predict the future state of a patient based on the patient's current medical state and the type of ICU. Through the comparison of three different approaches (Markov Chain Model, Markov Chain Model ICU Specialists and Random Forest), we came to the conclusion that the best method is the Markov Chain Model ICU Specialists. © Springer International Publishing AG 2017.
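To make the idea of predicting a patient's next state with a Markov chain concrete, the following is a minimal sketch of a first-order chain estimated from observed state sequences. It is not the authors' model: the state labels, the example sequences, and the function names are hypothetical, and the abstract does not specify how states are defined or how the ICU type enters the model.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (current state, next state) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for state, nexts in counts.items()
    }


def predict_next(transitions, state):
    """Return the most probable next state, or None if `state` was never a source."""
    nexts = transitions.get(state)
    if not nexts:
        return None
    return max(nexts, key=nexts.get)


# Hypothetical patient trajectories (illustrative labels, not from the paper).
sequences = [
    ["stable", "stable", "risk", "injury"],
    ["stable", "risk", "risk", "injury"],
    ["stable", "stable", "stable"],
]

transitions = estimate_transitions(sequences)
print(transitions["stable"])
print(predict_next(transitions, "risk"))
```

One way to approximate a per-ICU variant would be to estimate a separate transition table for each ICU type; whether this matches the "ICU Specialists" model in the paper is an assumption.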
