Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Tópicos
de interesse
Detalhes

Detalhes

  • Nome

    Bruno Miguel Veloso
  • Cargo

    Investigador Sénior
  • Desde

    01 março 2013
003
Publicações

2024

SWINN: Efficient nearest neighbor search in sliding windows using graphs

Autores
Mastelini, SM; Veloso, B; Halford, M; de Carvalho, ACPDF; Gama, J;

Publicação
INFORMATION FUSION

Abstract
Nearest neighbor search (NNS) is one of the main concerns in data stream applications since similarity queries can be used in multiple scenarios. Online NNS is usually performed on a sliding window by lazily scanning every element currently stored in the window. This paper proposes Sliding Window-based Incremental Nearest Neighbors (SWINN), a graph-based online search index algorithm for speeding up NNS in potentially never-ending and dynamic data stream tasks. Our proposal broadens the application of online NNS-based solutions, as even moderately large data buffers become impractical to handle when a naive NNS strategy is selected. SWINN enables efficient handling of large data buffers by using an incremental strategy to build and update a search graph supporting any distance metric. Vertices can be added and removed from the search graph. To keep the graph reliable for search queries, lightweight graph maintenance routines are run. According to experimental results, SWINN is significantly faster than performing a naive complete scan of the data buffer while keeping competitive search recall values. We also apply SWINN to online classification and regression tasks and show that our proposal is effective against popular online machine learning algorithms.

2024

Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach

Autores
Moya, AR; Veloso, B; Gama, J; Ventura, S;

Publicação
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Hyper-parameter tuning of machine learning models has become a crucial task in achieving optimal results in terms of performance. Several researchers have explored the optimisation task during the last decades to reach a state-of-the-art method. However, most of them focus on batch or offline learning, where data distributions do not change arbitrarily over time. On the other hand, dealing with data streams and online learning is a challenging problem. In fact, the higher the technology goes, the greater the importance of sophisticated techniques to process these data streams. Thus, improving hyper-parameter self-tuning during online learning of these machine learning models is crucial. To this end, in this paper, we present MESSPT, an evolutionary algorithm for self-hyper-parameter tuning for data streams. We apply Differential Evolution to dynamically-sized samples, requiring a single pass-over of data to train and evaluate models and choose the best configurations. We take care of the number of configurations to be evaluated, which necessarily has to be reduced, thus making this evolutionary approach a micro-evolutionary one. Furthermore, we control how our evolutionary algorithm deals with concept drift. Experiments on different learning tasks and over well-known datasets show that our proposed MESSPT outperforms the state-of-the-art on hyper-parameter tuning for data streams.

2024

Detecting and Explaining Anomalies in the Air Production Unit of a Train

Autores
Davari, N; Veloso, B; Ribeiro, RP; Gama, J;

Publicação
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024

Abstract
Predictive maintenance methods play a crucial role in the early detection of failures and errors in machinery, preventing them from reaching critical stages. This paper presents a comprehensive study on a real-world dataset called MetroPT3, with data from a Metro do Porto train's air production unit (APU) system. The dataset comprises data collected from various analogue and digital sensors installed on the APU system, enabling the analysis of behavioural changes and deviations from normal patterns. We propose a data-driven predictive maintenance framework based on a Long Short-Term Memory Autoencoder (LSTM-AE) network. The LSTM-AE efficiently identifies abnormal data instances, leading to a reduction in false alarm rates. We also implement a Sparse Autoencoder (SAE) approach for comparative analysis. The experimental results demonstrate that the LSTM-AE outperforms the SAE regarding F1 Score, Recall, and Precision. Furthermore, to gain insights into the reasons for anomaly detection, we apply the Shap method to determine the importance of features in the predictive maintenance model. This approach enhances the interpretability of the model to support the decision-making process better.

2024

From Random to Informed Data Selection: A Diversity-Based Approach to Optimize Human Annotation and Few-Shot Learning

Autores
Alcoforado, A; Okamura, LH; Fama, IC; Dias Bueno, BF; Lavado, AM; Ferraz, TP; Veloso, B; Reali Costa, AH;

Publicação
Proceedings of the 16th International Conference on Computational Processing of Portuguese, PROPOR 2024, Santiago de Compostela, Galicia/Spain, 12-15 March, 2024

Abstract

2024

Super-Resolution Analysis for Landfill Waste Classification

Autores
Molina, M; Ribeiro, RP; Veloso, B; Carna, J;

Publicação
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024

Abstract
Illegal landfills are a critical issue due to their environmental, economic, and public health impacts. This study leverages aerial imagery for environmental crime monitoring. While advances in artificial intelligence and computer vision hold promise, the challenge lies in training models with high-resolution literature datasets and adapting them to open-access low-resolution images. Considering the substantial quality differences and limited annotation, this research explores the adaptability of models across these domains. Motivated by the necessity for a comprehensive evaluation of waste detection algorithms, it advocates cross-domain classification and super-resolution enhancement to analyze the impact of different image resolutions on waste classification as an evaluation to combat the proliferation of illegal landfills. We observed performance improvements by enhancing image quality but noted an influence on model sensitivity, necessitating careful threshold fine-tuning.

Teses
supervisionadas

2023

New Challenges in Official Statistics: Big Data Analytics and Multi-level Product Classification of Web Scraped Data

Autor
Juliana de Freitas Ulisses Machado

Instituição
UP-FEP

2023

{Unlocking Performance Potential: Power BI Implementation and its Transformative Impact on Proef's Business Intelligence

Autor
Bárbara Alexandra Ferreira Salgado

Instituição
UP-FEP

2023

Comparative Analysis of the Relationship Between Organizational Behaviour, Business Strategy, and Results - Study of European Football Clubs

Autor
Pedro Miguel Coutinho Federico de Mendes Ferreira

Instituição
UP-FEP

2023

Customers' revenue fluctuation in a Telecommunication company: Data Warehouse Construction and Visualization

Autor
Cândido Rafael Toledo Rocha

Instituição
UP-FEP

2023

Text mining of companies annual reports in PDF format

Autor
Svetlana Zamyatina

Instituição
UP-FEP