Publications

Publications by João Gama

2014

An Online Learning Framework for Predicting the Taxi Stand's Profitability

Authors
Moreira Matias, L; Mendes Moreira, J; Ferreira, M; Gama, J; Damas, L;

Publication
2014 IEEE 17TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC)

Abstract
Taxi services play a central role in the mobility dynamics of major urban areas. Advanced communication devices such as GPS (Global Positioning System) and GSM (Global System for Mobile Communications) have made it possible to monitor the drivers' activities in real time. This paper presents an online learning approach to predict profitability in taxi stands. This approach consists of classifying each stand according to the type of services that are being requested (for instance, short and long trips). This classification is achieved by maintaining a time-evolving histogram to approximate local probability density functions (p.d.f.) in service revenues. The future values of this histogram are estimated using time series analysis methods assuming that a non-homogeneous Poisson process is in place. Finally, the method's outputs were combined using a voting ensemble scheme based on a sliding window of historical data. Experimental tests were conducted using online data transmitted by 441 vehicles of a fleet running in the city of Porto, Portugal. The results demonstrated that the proposed framework can provide effective insight into the characterization of taxi stand profitability.

2017

Improving Incremental Recommenders with Online Bagging

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)

Abstract
Online recommender systems often deal with continuous, potentially fast and unbounded flows of data. Ensemble methods for recommender systems have been used in the past in batch algorithms; however, they have never been studied with incremental algorithms that learn from data streams. We evaluate online bagging with an incremental matrix factorization algorithm for top-N recommendation with positive-only user feedback, often known as binary ratings. Our results show that online bagging is able to improve accuracy by up to 35% over the baseline, with small computational overhead.
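
As a rough illustration of the online bagging idea mentioned in this abstract, the sketch below presents each incoming interaction to every base model k times, with k drawn from a Poisson(1) distribution. This is an assumption-laden toy, not the authors' implementation: `PopularityModel` is a hypothetical stand-in for the incremental matrix factorization learner, and the class names, the seed and the score-averaging rule are all chosen here for illustration only.

```python
import math
import random


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class PopularityModel:
    """Toy incremental base model (hypothetical stand-in for incremental MF)."""

    def __init__(self):
        self.counts = {}

    def update(self, user, item):
        # Positive-only feedback: every observed (user, item) pair is a positive.
        self.counts[item] = self.counts.get(item, 0) + 1

    def score(self, user, item):
        return self.counts.get(item, 0)


class OnlineBagging:
    """Online bagging: each model sees each example Poisson(1) times."""

    def __init__(self, n_models, seed=0):
        self.rng = random.Random(seed)
        self.models = [PopularityModel() for _ in range(n_models)]

    def update(self, user, item):
        for model in self.models:
            for _ in range(poisson1(self.rng)):
                model.update(user, item)

    def recommend(self, user, candidates, n):
        # Assumed aggregation rule: average the base models' scores.
        def avg_score(item):
            return sum(m.score(user, item) for m in self.models) / len(self.models)

        return sorted(candidates, key=avg_score, reverse=True)[:n]


ensemble = OnlineBagging(n_models=10, seed=42)
stream = [("u1", "a")] * 50 + [("u2", "b")] * 5
for user, item in stream:
    ensemble.update(user, item)
top2 = ensemble.recommend("u3", ["a", "b", "c"], n=2)
```

The Poisson(1) resampling approximates bootstrap sampling without storing the stream, which is why the overhead stays proportional to the number of base models.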

2016

Recognizing Family, Genus, and Species of Anuran Using a Hierarchical Classification Approach

Authors
Colonna, JG; Gama, J; Nakamura, EF;

Publication
DISCOVERY SCIENCE (DS 2016)

Abstract
In bioacoustic recognition approaches, a "flat" classifier is usually trained to recognize several species of anuran, where the number of classes is equal to the number of species. Consequently, the complexity of the classification function increases proportionally to the number of species. To avoid this issue we propose a "hierarchical" approach that decomposes the problem into three taxonomic levels: the family, the genus, and the species level. To accomplish this, we transform the original single-label problem into a multi-dimensional problem (multi-label and multi-class) considering the Linnaeus taxonomy. Then, we develop a top-down method using a set of classifiers organized as a hierarchical tree. Thus, it is possible to predict the same set of species as a flat classifier, and additionally obtain new information about the samples and their taxonomic relationship. This helps us to understand the problem better and draw additional conclusions by inspecting the confusion matrices at the three levels of classification. In addition, we carry out our experiments using a Cross-Validation performed by individuals. This form of CV avoids mixing syllables that belong to the same specimens in the testing and training sets, preventing an overestimate of the accuracy and generalizing the predictive capabilities of the system. We tested our system on a dataset with sixty individual frogs, from ten different species, eight genera, and four families, achieving a final Micro- and Average-accuracy equal to 86% and 62% respectively.
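
The cross-validation "by individuals" described in this abstract can be pictured with a minimal sketch: all samples (syllables) from one individual are kept in the same fold, so no specimen appears in both training and testing sets. The function name `group_folds`, the round-robin assignment of individuals to folds and the example labels are assumptions made here for illustration, not details taken from the paper.

```python
def group_folds(groups, n_folds):
    """Split sample indices into folds so that each group stays in one fold.

    groups: one group label per sample (e.g. the individual frog that
            produced each syllable).
    Returns a list of (train_indices, test_indices) pairs.
    """
    unique = sorted(set(groups))
    # Assumed policy: assign individuals to folds round-robin.
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    folds = []
    for k in range(n_folds):
        test = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        folds.append((train, test))
    return folds


# Six syllables recorded from three hypothetical individuals.
individuals = ["frog1", "frog1", "frog2", "frog3", "frog3", "frog3"]
folds = group_folds(individuals, n_folds=3)
```

With a plain random split, syllables of the same frog would typically land on both sides of the split, which is the accuracy overestimate the abstract warns about.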

2016

Sequential anomalies: a study in the Railway Industry

Authors
Ribeiro, RP; Pereira, P; Gama, J;

Publication
MACHINE LEARNING

Abstract
Concerned with predicting equipment failures, predictive maintenance has a high impact both at a technical and at a financial level. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for anomaly and novelty detection enables us to explore those datasets, building predictive systems that can detect and issue an alert when a failure starts evolving, avoiding the unknown development up to breakdown. In the present case, we use a failure detection system to predict train door breakdowns before they happen using data from their logging system. We use sensor data from pneumatic valves that control the open and close cycles of a door. Still, the failure of a cycle does not necessarily indicate a breakdown. A cycle might fail due to user interaction. The goal of this study is to detect structural failures in the automatic train door system, not when there is a cycle failure, but when there are sequences of cycle failures. We study three methods for such structural failure detection: outlier detection, anomaly detection and novelty detection, using different windowing strategies. We propose a two-stage approach, where the output of a point-anomaly algorithm is post-processed by a low-pass filter to obtain a subsequence-anomaly detection. The main result of the two-level architecture is a strong impact on the false alarm rate.
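
To make the two-stage idea in this abstract concrete, here is a minimal sketch in which per-cycle point-anomaly flags are smoothed before an alarm is raised. The abstract does not say which low-pass filter is used; the exponential moving average below, the function name `low_pass_alarms` and the values of `alpha` and `threshold` are all assumptions chosen for illustration.

```python
def low_pass_alarms(point_flags, alpha=0.3, threshold=0.6):
    """Smooth point-anomaly flags and alarm only on sustained anomalies.

    point_flags: sequence of 0/1 outputs from a point-anomaly detector,
                 one per door cycle.
    alpha:       smoothing factor of the exponential moving average
                 (assumed filter; hypothetical value).
    threshold:   smoothed level above which an alarm is raised
                 (hypothetical value).
    """
    smoothed = 0.0
    alarms = []
    for flag in point_flags:
        smoothed = alpha * flag + (1.0 - alpha) * smoothed
        alarms.append(smoothed > threshold)
    return alarms


# An isolated cycle failure (e.g. caused by a passenger) raises no alarm.
isolated = low_pass_alarms([0, 0, 1, 0, 0])
# A run of consecutive cycle failures eventually does.
sustained = low_pass_alarms([1, 1, 1, 1])
```

Because a single flagged cycle can lift the smoothed value only to `alpha`, isolated failures stay below the threshold, which is how a filter of this kind would reduce false alarms.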

2016

Adaptive Model Rules From High-Speed Data Streams

Authors
Duarte, J; Gama, J; Bifet, A;

Publication
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA

Abstract
Decision rules are one of the most expressive and interpretable models for machine learning. In this article, we present Adaptive Model Rules (AMRules), the first stream rule learning algorithm for regression problems. In AMRules, the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. In order to maintain a regression model compatible with the most recent state of the process generating data, each rule uses a Page-Hinkley test to detect changes in this process and react to changes by pruning the rule set. Online learning might be strongly affected by outliers. AMRules is also equipped with outlier detection mechanisms to avoid model adaptation using anomalous examples. In the experimental section, we report the results of AMRules on benchmark regression problems, and compare the performance of our system with other streaming regression algorithms.
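
The Page-Hinkley test named in this abstract can be sketched in a few lines. The version below is the textbook one-sided form that watches for an increase in the mean of a monitored signal (for example, a rule's absolute error); how AMRules wires it into rule pruning is not shown, and the class name, `delta` and `threshold` values are illustrative assumptions.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=10.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # alarm level, often written lambda (assumed value)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation m_T
        self.min_cum = 0.0  # running minimum M_T

    def update(self, x):
        """Feed one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


detector = PageHinkley()
# Synthetic error stream: stable at 0.0, then a jump to 5.0 at index 200.
stream = [0.0] * 200 + [5.0] * 50
first_alarm = next(
    (i for i, x in enumerate(stream) if detector.update(x)), None
)
```

A larger `threshold` lowers the false alarm rate at the cost of a longer detection delay, so both parameters would need tuning for a real stream.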

2015

An overview on the exploitation of time in collaborative filtering

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Classic Collaborative Filtering (CF) algorithms rely on the assumption that data are static, and we usually disregard the temporal effects in natural user-generated data. These temporal effects include user preference drifts and shifts, seasonal effects, inclusion of new users and items entering the system (and old ones leaving), user and item activity rate fluctuations, and other similar time-related phenomena. These phenomena continuously change the underlying relations between users and items that recommendation algorithms essentially try to capture. In the past few years, a new generation of CF algorithms has emerged, using the time dimension as a key factor to improve recommendation models. In this overview, we present a comprehensive analysis of these algorithms and identify important challenges to be faced in the near future. © 2015 John Wiley & Sons, Ltd.
