
Publications by João Gama

2014

Event and Anomaly Detection Using Tucker3 Decomposition

Authors
Fanaee-T, Hadi; Oliveira, Marcia D. B.; Gama, Joao; Malinowski, Simon; Morla, Ricardo;

Publication
CoRR

Abstract

2014

Failure Prediction - An Application in the Railway Industry

Authors
Pereira, P; Ribeiro, RP; Gama, J;

Publication
DISCOVERY SCIENCE, DS 2014

Abstract
Machine or system failures have a high impact at both the technical and economic levels. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for novelty detection enables us to explore those datasets, building classification systems that can detect and issue an alert when a failure starts evolving, preventing it from developing unnoticed into a breakdown. In the present case we use a failure detection system to predict train door breakdowns before they happen, using data from the doors' logging system. We study three methods for failure detection: outlier detection, novelty detection and a supervised SVM. Given the problem's features, namely the possibility of a passenger interrupting the movement of a door, all three predictors are prone to false alarms. The main contribution of this work is the use of a low-pass filter to process the output of the predictors, leading to a strong reduction in the false alarm rate.
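The false-alarm reduction step lends itself to a short illustration. Below is a minimal sketch assuming a simple moving-average low-pass filter over per-cycle alarm scores; the function name, window size and threshold are illustrative assumptions, not the filter actually tuned in the paper.

```python
import numpy as np

def low_pass_alarms(scores, window=30, threshold=0.5):
    """Smooth raw per-cycle alarm scores with a moving-average low-pass
    filter and raise an alarm only when the smoothed signal exceeds a
    threshold. Window size and threshold are illustrative values."""
    scores = np.asarray(scores, dtype=float)
    kernel = np.ones(window) / window
    smoothed = np.convolve(scores, kernel, mode="same")
    return smoothed > threshold

# Example: isolated positive predictions (e.g. a passenger blocking a door)
# are filtered out, while a sustained run of positives triggers an alarm.
raw = [0, 1, 0, 0, 1, 0] * 5 + [1] * 30
print(low_pass_alarms(raw, window=10, threshold=0.6).any())
```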

2014

Fast Incremental Matrix Factorization for Recommendation with Positive-Only Feedback

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
USER MODELING, ADAPTATION, AND PERSONALIZATION, UMAP 2014

Abstract
Traditional Collaborative Filtering algorithms for recommendation are designed for stationary data. Likewise, conventional evaluation methodologies are only applicable in offline experiments, where data and models are static. However, in real-world systems, user feedback is continuously being generated at unpredictable rates. One way to deal with this data stream is to perform online model updates as new data points become available. This requires algorithms able to process data at least as fast as it is generated. Another issue is how to evaluate algorithms in such a streaming data environment. In this paper we introduce a simple but fast incremental Matrix Factorization algorithm for positive-only feedback. We also contribute a prequential evaluation protocol for recommender systems, suitable for streaming data environments. Using this evaluation methodology, we compare our algorithm with other state-of-the-art proposals. Our experiments reveal that, despite its simplicity, our algorithm has competitive accuracy while being significantly faster.
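As a rough sketch of the kind of incremental update described here, one might treat each positive (user, item) event as a target value of 1 and adjust the corresponding latent factors with a single SGD step, evaluating prequentially (predict, then update). The class name, hyperparameters and the evaluation loop below are assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np
from collections import defaultdict

class IncrementalMF:
    """Sketch of incremental SGD matrix factorization for positive-only
    feedback: each observed (user, item) event is learned with one
    gradient step towards a target of 1. Hyperparameters are illustrative."""
    def __init__(self, k=10, lr=0.05, reg=0.01, seed=0):
        self.lr, self.reg = lr, reg
        rng = np.random.default_rng(seed)
        self.P = defaultdict(lambda: rng.normal(0, 0.1, k))  # user factors
        self.Q = defaultdict(lambda: rng.normal(0, 0.1, k))  # item factors

    def score(self, user, item):
        return self.P[user] @ self.Q[item]

    def update(self, user, item):
        p, q = self.P[user], self.Q[item]
        err = 1.0 - p @ q
        self.P[user] = p + self.lr * (err * q - self.reg * p)
        self.Q[item] = q + self.lr * (err * p - self.reg * q)

# Prequential loop over a stream of positive events: score first, then learn.
model = IncrementalMF()
for user, item in [("u1", "i1"), ("u1", "i2"), ("u2", "i1")]:
    _ = model.score(user, item)   # evaluate before the event is learned
    model.update(user, item)
```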

2013

Learning model rules from high-speed data streams

Authors
Almeida, E; Ferreira, C; Gama, J;

Publication
CEUR Workshop Proceedings

Abstract
Decision rules are one of the most expressive languages for machine learning. In this paper we present Adaptive Model Rules (AMRules), the first streaming rule learning algorithm for regression problems. In AMRules the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of attribute values. Each rule in AMRules uses a Page-Hinkley test to detect changes in the process generating the data and reacts to changes by pruning the rule set. In the experimental section we report the results of AMRules on benchmark regression problems, and compare the performance of our algorithm with other streaming regression algorithms. © 2013 IJCAI.
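A Page-Hinkley detector of the kind AMRules attaches to each rule can be sketched as below; the tolerance `delta` and alarm threshold `lam` are illustrative values, not the settings used in the paper.

```python
class PageHinkley:
    """Sketch of the Page-Hinkley change detection test: accumulate the
    deviation of observed values (e.g. a rule's errors) from their running
    mean and signal a change when the accumulated deviation rises too far
    above its historical minimum."""
    def __init__(self, delta=0.005, lam=50.0):
        self.delta, self.lam = delta, lam
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation m_t
        self.cum_min = 0.0  # minimum of m_t seen so far

    def add(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.lam  # True signals a change
```

In this sketch, each rule would feed its per-example error into its own detector and be pruned from the rule set whenever the detector returns True.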

2013

On evaluating stream learning algorithms

Authors
Gama, J; Sebastiao, R; Rodrigues, PP;

Publication
MACHINE LEARNING

Abstract
Most streaming decision models evolve continuously over time, run in resource-aware environments, and detect and react to changes in the environment generating data. One important issue, not yet convincingly addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of prequential error with forgetting mechanisms to provide reliable error estimators. We prove that, in stationary data and for consistent learning algorithms, the holdout estimator, the prequential error and the prequential error estimated over a sliding window or using fading factors all converge to the Bayes error. The use of prequential error with forgetting mechanisms proves to be advantageous in assessing performance and in comparing stream learning algorithms. It is also worthwhile to use the proposed methods for hypothesis testing and for change detection. In a set of experiments in drift scenarios, we evaluate the ability of a standard change detection algorithm to detect change using three prequential error estimators. These experiments point out that the use of forgetting mechanisms (sliding windows or fading factors) is required for fast and efficient change detection. In comparison to sliding windows, fading factors are faster and memoryless, both important requirements for streaming applications. Overall, this paper is a contribution to a discussion on best practice for performance assessment when learning is a continuous process and the decision models are dynamic and evolve over time.
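The fading-factor variant of the prequential error has a very compact form. The sketch below assumes the usual exponential-forgetting recurrence (accumulated loss and accumulated count both decayed by a factor alpha at each example); the value of alpha is an illustrative choice.

```python
def prequential_error(losses, alpha=0.995):
    """Prequential (test-then-train) error with a fading factor: each
    example is first used to compute a loss, and losses are aggregated
    with exponential forgetting so that old errors gradually fade away.
    Returns the sequence of error estimates over the stream."""
    s = n = 0.0
    estimates = []
    for loss in losses:
        s = loss + alpha * s   # decayed sum of losses
        n = 1.0 + alpha * n    # decayed count of examples
        estimates.append(s / n)
    return estimates
```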

2014

On predicting a call center's workload: A discretization-based approach

Authors
Moreira-Matias, L; Nunes, R; Ferreira, M; Mendes-Moreira, J; Gama, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Agent scheduling in call centers is a major management problem, as the optimal ratio between service quality and costs is hard to achieve. In the literature, regression and time series analysis methods have been used to address this problem by predicting future arrival counts. In this paper, we propose to discretize these target variables into finite intervals. By reducing the length of their domain, the goal is to accurately mine the demand peaks, as these are the main cause of abandoned calls. This is done by employing multi-class classification. The approach was tested on a real-world dataset acquired from a taxi dispatching call center. The results demonstrate that this framework can accurately reduce the number of abandoned calls while maintaining a reasonable staff-based cost. © 2014 Springer International Publishing.
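The discretization step can be illustrated in a few lines; the bin edges and class indices below are made-up values for illustration, not the intervals used on the taxi dispatching data.

```python
import numpy as np

def discretize_counts(counts, bins):
    """Map continuous call-arrival counts into a small set of ordinal
    classes (e.g. low / medium / high / peak) so that demand peaks can be
    targeted by a multi-class classifier. Bin edges are illustrative."""
    edges = np.asarray(bins)
    return np.digitize(counts, edges)

# Example: counts of 120 calls per period or more fall into the "peak" class (3).
labels = discretize_counts([12, 45, 80, 150], bins=[30, 60, 120])
print(labels)  # [0 1 2 3]
```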
