Publications

Publications by João Gama

2023

MetroPT-3 Dataset

Authors
Davari, N; Veloso, B; Ribeiro, RP; Gama, J;

Publication

Abstract

2023

Guest Editorial: Special Issue on Stream Learning

Authors
Lu, J; Gama, J; Yao, X; Minku, L;

Publication
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Abstract
In recent years, learning from streaming data, commonly known as stream learning, has enjoyed tremendous growth and shown a wealth of development at both the conceptual and application levels. Stream learning is highly visible in both the machine learning and data science fields and has become a hot new direction in research. Advancements in stream learning include learning with concept drift detection, which covers detecting whether a drift has occurred; understanding where, when, and how a drift occurs; adaptation by actively or passively updating models; and online learning, active learning, incremental learning, and reinforcement learning in data streaming situations.

2023

Error Analysis on Industry Data: Using Weak Segment Detection for Local Model Agnostic Prediction Intervals

Authors
Mamede, R; Paiva, N; Gama, J;

Publication
Discovery Science - 26th International Conference, DS 2023, Porto, Portugal, October 9-11, 2023, Proceedings

Abstract
Machine Learning has been overtaken by a growing necessity to explain and understand decisions made by trained models as regulation and consumer awareness have increased. Alongside understanding the inner workings of a model comes the task of verifying how adequately we can model a problem with the learned functions. Traditional global assessment functions lack the granularity required to understand local differences in performance in different regions of the feature space, where the model can have problems adapting. Residual Analysis adds a layer of model understanding by interpreting prediction residuals in an exploratory manner. However, this task can be unfeasible for high-dimensionality datasets through hypotheses and visualizations alone. In this work, we use weak interpretable learners to identify regions of high prediction error in the feature space. We achieve this by examining the absolute residuals of predictions made by trained regressors. This methodology retains the interpretability of the identified regions. It allows practitioners to have tools to formulate hypotheses surrounding model failure on particular regions for future model tuning, data collection, or data augmentation on critical cohorts of data. We present a way of including information on different levels of model uncertainty in the feature space through the use of locally fitted Model Agnostic Prediction Intervals (MAPIE) in the identified regions, comparing this approach with other common forms of conformal predictions which do not take into account findings from weak segment identification, by assessing local and global coverage of the prediction intervals. To demonstrate the practical application of our approach, we present a real-world industry use case in the context of inbound retention call-centre operations for a Telecom Provider to determine optimal pairing between a customer and an available assistant through the prediction of contracted revenue.
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
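
The idea of fitting a weak interpretable learner to absolute residuals can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the weak learner is a single-split decision stump, and the function name `weakest_segment` and its return format are hypothetical. Prediction intervals are not covered here.

```python
from statistics import fmean


def _sse(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    centre = fmean(values)
    return sum((v - centre) ** 2 for v in values)


def weakest_segment(rows, y_true, y_pred):
    """Find the one-split region with the highest mean absolute residual.

    rows is a list of numeric feature tuples. The split is chosen like a
    regression stump: it minimises the squared error of the absolute
    residuals on both sides. Returns (feature_index, threshold, side, mean),
    where side is "<=" or ">" and mean is the mean absolute residual
    inside the reported region.
    """
    residuals = [abs(t - p) for t, p in zip(y_true, y_pred)]
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between consecutive values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r for row, r in zip(rows, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[feature] > threshold]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold, fmean(left), fmean(right))
    if best is None:
        raise ValueError("every feature is constant, so no split is possible")
    _, feature, threshold, left_mean, right_mean = best
    side = "<=" if left_mean >= right_mean else ">"
    return feature, threshold, side, max(left_mean, right_mean)


# Toy data (invented for illustration): errors grow when feature 0 exceeds 3.
rows = [(1, 10), (2, 20), (3, 10), (4, 20), (5, 10), (6, 20)]
y_true = [1, 2, 3, 4, 5, 6]
y_pred = [1.1, 1.9, 3.1, 7.0, 2.0, 9.0]
segment = weakest_segment(rows, y_true, y_pred)
```

The returned region stays readable as a single rule on one feature, which is the property the abstract emphasises. A deeper tree would find finer regions at some cost in interpretability.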

2023

Which Way to Go - Finding Frequent Trajectories Through Clustering

Authors
Andrade, T; Gama, J;

Publication
Discovery Science - 26th International Conference, DS 2023, Porto, Portugal, October 9-11, 2023, Proceedings

Abstract
Trajectory clustering is one of the most important issues in mobility patterns data mining. It is applied in several cases such as hot-spots detection, urban transportation control, animal migration movements, and tourist visiting routes, among others. In this paper, we describe how to identify the most frequent trajectories from raw GPS data. By making use of the Ramer-Douglas-Peucker (RDP) mechanism we simplify the trajectories in order to obtain fewer points to check without losing information. We construct a similarity matrix by using the Fréchet distance metric and then employ density-based clustering to find the most similar trajectories. We perform experiments over three real-world datasets collected in the city of Porto, Portugal, and in Beijing, China, and check the results of the most frequent trajectories for the top-k origins x destinations for the moves.
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
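
The first two steps of the pipeline described in this abstract, RDP simplification and a Fréchet distance matrix, can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code: points are treated as planar (x, y) pairs rather than geodesic GPS coordinates, the discrete variant of the Fréchet distance is used, RDP measures distance to the chord segment, and all function names are hypothetical. The density-based clustering step is left out; the resulting matrix could be passed to any clustering routine that accepts precomputed distances.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: drop points closer than epsilon to the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines (dynamic programming)."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]


def distance_matrix(trajectories, epsilon):
    """Simplify every trajectory, then compute pairwise Fréchet distances."""
    simplified = [rdp(t, epsilon) for t in trajectories]
    return [[discrete_frechet(a, b) for b in simplified] for a in simplified]


# Toy trajectory (invented for illustration).
track = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3)]
simplified_track = rdp(track, epsilon=0.1)
```

The value of `epsilon` controls how aggressively points are dropped, so it has to be chosen in the same units as the coordinates.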

2024

SWINN: Efficient nearest neighbor search in sliding windows using graphs

Authors
Mastelini, SM; Veloso, B; Halford, M; de Carvalho, ACPDF; Gama, J;

Publication
INFORMATION FUSION

Abstract
Nearest neighbor search (NNS) is one of the main concerns in data stream applications since similarity queries can be used in multiple scenarios. Online NNS is usually performed on a sliding window by lazily scanning every element currently stored in the window. This paper proposes Sliding Window-based Incremental Nearest Neighbors (SWINN), a graph-based online search index algorithm for speeding up NNS in potentially never-ending and dynamic data stream tasks. Our proposal broadens the application of online NNS-based solutions, as even moderately large data buffers become impractical to handle when a naive NNS strategy is selected. SWINN enables efficient handling of large data buffers by using an incremental strategy to build and update a search graph supporting any distance metric. Vertices can be added and removed from the search graph. To keep the graph reliable for search queries, lightweight graph maintenance routines are run. According to experimental results, SWINN is significantly faster than performing a naive complete scan of the data buffer while keeping competitive search recall values. We also apply SWINN to online classification and regression tasks and show that our proposal is effective against popular online machine learning algorithms.
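
For context, the naive baseline that the abstract compares against, a complete scan of the sliding window on every query, can be sketched in a few lines. This is not SWINN itself, which maintains a search graph; the class name `NaiveWindowNN` and its methods are hypothetical and exist only for illustration.

```python
from collections import deque
import heapq
import math


class NaiveWindowNN:
    """Baseline: keep the last window_size points and scan all of them per query."""

    def __init__(self, window_size):
        # A bounded deque discards the oldest point once the window is full.
        self.window = deque(maxlen=window_size)

    def add(self, point):
        self.window.append(tuple(point))

    def query(self, point, k=1, distance=math.dist):
        # Cost grows linearly with the window size, since every stored
        # point is compared against the query.
        return heapq.nsmallest(k, self.window, key=lambda stored: distance(point, stored))


index = NaiveWindowNN(window_size=3)
for p in [(0, 0), (1, 1), (5, 5), (6, 6)]:
    index.add(p)  # (0, 0) falls out of the window when (6, 6) arrives

nearest = index.query((0, 0), k=1)
two_nearest = index.query((5.4, 5.4), k=2)
```

Because every query touches the whole buffer, this approach becomes slow as the window grows, which is the limitation the abstract says the graph-based index is meant to address.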

2023

Discovery Science - 26th International Conference, DS 2023, Porto, Portugal, October 9-11, 2023, Proceedings

Authors
Bifet, A; Lorena, AC; Ribeiro, RP; Gama, J; Abreu, PH;

Publication
DS

Abstract
