Publications

Publications by LIAAD

2024

SWINN: Efficient nearest neighbor search in sliding windows using graphs

Authors
Mastelini, SM; Veloso, B; Halford, M; de Carvalho, ACPDF; Gama, J;

Publication
INFORMATION FUSION

Abstract
Nearest neighbor search (NNS) is one of the main concerns in data stream applications since similarity queries can be used in multiple scenarios. Online NNS is usually performed on a sliding window by lazily scanning every element currently stored in the window. This paper proposes Sliding Window-based Incremental Nearest Neighbors (SWINN), a graph-based online search index algorithm for speeding up NNS in potentially never-ending and dynamic data stream tasks. Our proposal broadens the application of online NNS-based solutions, as even moderately large data buffers become impractical to handle when a naive NNS strategy is selected. SWINN enables efficient handling of large data buffers by using an incremental strategy to build and update a search graph supporting any distance metric. Vertices can be added and removed from the search graph. To keep the graph reliable for search queries, lightweight graph maintenance routines are run. According to experimental results, SWINN is significantly faster than performing a naive complete scan of the data buffer while keeping competitive search recall values. We also apply SWINN to online classification and regression tasks and show that our proposal is effective against popular online machine learning algorithms.

2024

Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach

Authors
Moya, AR; Veloso, B; Gama, J; Ventura, S;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Hyper-parameter tuning of machine learning models has become a crucial task in achieving optimal results in terms of performance. Several researchers have explored the optimisation task during the last decades to reach a state-of-the-art method. However, most of them focus on batch or offline learning, where data distributions do not change arbitrarily over time. On the other hand, dealing with data streams and online learning is a challenging problem. In fact, the higher the technology goes, the greater the importance of sophisticated techniques to process these data streams. Thus, improving hyper-parameter self-tuning during online learning of these machine learning models is crucial. To this end, in this paper, we present MESSPT, an evolutionary algorithm for self-hyper-parameter tuning for data streams. We apply Differential Evolution to dynamically-sized samples, requiring a single pass-over of data to train and evaluate models and choose the best configurations. We take care of the number of configurations to be evaluated, which necessarily has to be reduced, thus making this evolutionary approach a micro-evolutionary one. Furthermore, we control how our evolutionary algorithm deals with concept drift. Experiments on different learning tasks and over well-known datasets show that our proposed MESSPT outperforms the state-of-the-art on hyper-parameter tuning for data streams.

2024

Unveiling Group-Specific Distributed Concept Drift: A Fairness Imperative in Federated Learning

Authors
Salazar, T; Gama, J; Araújo, H; Abreu, PH;

Publication
CoRR

Abstract

2024

A Neuro-Symbolic Explainer for Rare Events: A Case Study on Predictive Maintenance

Authors
Gama, J; Ribeiro, RP; Mastelini, SM; Davari, N; Veloso, B;

Publication
CoRR

Abstract

2024

Where Do We Go From Here? Location Prediction from Time-Evolving Markov Models

Authors
Andrade, T; Gama, J;

Publication
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024

Abstract
Various relevant aspects of our lives relate to the places we visit and our daily activities. The movement of individuals between regular places, such as work, school, or other important personal locations, is getting increasing attention due to the pervasiveness of geolocation devices and the amount of data they generate. This work presents an approach for location prediction using a probabilistic model and data mining techniques over mobility data streams. We evaluate the method over 5 real-world datasets. The results show the usefulness of the proposal in comparison with other well-known approaches.

2024

S+t-SNE - Bringing Dimensionality Reduction to Data Streams

Authors
Vieira, PC; Montrezol, JP; Vieira, JT; Gama, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT II, IDA 2024

Abstract
We present S+t-SNE, an adaptation of the t-SNE algorithm designed to handle infinite data streams. The core idea behind S+t-SNE is to update the t-SNE embedding incrementally as new data arrives, ensuring scalability and adaptability to handle streaming scenarios. By selecting the most important points at each step, the algorithm ensures scalability while keeping informative visualisations. By employing a blind method for drift management, the algorithm adjusts the embedding space, which facilitates the visualisation of evolving data dynamics. Our experimental evaluations demonstrate the effectiveness and efficiency of S+t-SNE, whilst highlighting its ability to capture patterns in a streaming scenario. We hope our approach offers researchers and practitioners a real-time tool for understanding and interpreting high-dimensional data.
