Publications

Publications by LIAAD

2024

Online boxplot derived outlier detection

Authors
Mazarei, A; Sousa, R; Mendes-Moreira, J; Molchanov, S; Ferreira, HM;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Outlier detection is a widely used technique for identifying anomalous or exceptional events across various contexts. It has proven valuable in applications such as fault detection, fraud detection, and real-time monitoring systems. Detecting outliers in real time is crucial in several industries, for example in financial fraud detection and in quality control of manufacturing processes. In the context of big data, the amount of data generated is enormous, and traditional batch-mode methods are not practical because the entire dataset is not available. Limited computational resources further compound this issue. The boxplot is a widely used batch-mode method for outlier detection that involves several derivations. However, the statistical calculations needed to construct a boxplot have no incremental closed form, which poses considerable challenges for its application to big data. We propose an incremental/online version of the boxplot algorithm to address these challenges. The proposed algorithm is based on an approximation approach that involves numerical integration of the histogram and calculation of the cumulative distribution function. This approach is independent of the dataset's distribution, making it effective for all types of distributions, whether skewed or not. To assess the efficacy of the proposed algorithm, we conducted tests using simulated datasets featuring varying degrees of skewness. Additionally, we applied the algorithm to a challenging real-world dataset concerning software fault detection. The experimental results show robust performance, comparable to that of batch-mode methods that access the entire dataset. Our online boxplot method, which uses the dataset distribution to define the whiskers, consistently achieved exceptional outlier detection results. Notably, our algorithm demonstrated computational efficiency, maintaining constant memory usage with minimal hyperparameter tuning.
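
The histogram-and-CDF idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the class name `HistogramBoxplot`, the fixed value range, the bin count, and the classic Tukey fences (1.5 times the interquartile range) are all assumptions made here for illustration, whereas the paper derives its whiskers from the data distribution. The sketch keeps a fixed number of bin counts, so memory stays constant however many values arrive.

```python
class HistogramBoxplot:
    """Streaming boxplot approximation backed by a fixed-size histogram.

    Hypothetical illustration: the range [low, high] and the number of bins
    are assumed to be known up front.
    """

    def __init__(self, low, high, bins=100):
        self.low, self.high, self.bins = low, high, bins
        self.width = (high - low) / bins
        self.counts = [0] * bins
        self.total = 0

    def update(self, x):
        # Clamp out-of-range values into the edge bins so memory stays constant.
        idx = int((x - self.low) / self.width)
        idx = min(max(idx, 0), self.bins - 1)
        self.counts[idx] += 1
        self.total += 1

    def quantile(self, q):
        # Accumulate bin counts (a discrete CDF) and interpolate linearly
        # inside the bin where the cumulative count reaches q * total.
        target = q * self.total
        cumulative = 0
        for i, count in enumerate(self.counts):
            if count and cumulative + count >= target:
                fraction = (target - cumulative) / count
                return self.low + (i + fraction) * self.width
            cumulative += count
        return self.high  # only reached when no data has been seen

    def fences(self, k=1.5):
        # Assumption: classic Tukey fences, not the paper's whisker rule.
        q1, q3 = self.quantile(0.25), self.quantile(0.75)
        iqr = q3 - q1
        return q1 - k * iqr, q3 + k * iqr

    def is_outlier(self, x):
        lower, upper = self.fences()
        return x < lower or x > upper
```

A typical use would call `update(x)` for each incoming value and `is_outlier(x)` whenever a decision is needed. The accuracy of the quartile estimates depends on the bin width, which is the main parameter to tune in this sketch.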

2024

Sampling approaches to reduce very frequent seasonal time series

Authors
Baldo, A; Ferreira, PJS; Mendes-Moreira, J;

Publication
EXPERT SYSTEMS

Abstract
With technological advancements, much data is being captured by sensors, smartphones, wearable devices, and so forth. These vast datasets are stored in data centres and utilized to forge data-driven models for the condition monitoring of infrastructures and systems through future data mining tasks. However, these datasets often surpass the processing capabilities of traditional information systems and methodologies due to their significant size. Additionally, not all samples within these datasets contribute valuable information during the model training phase, leading to inefficiencies. The processing and training of Machine Learning algorithms become time-consuming, and storing all the data demands excessive space, contributing to the Big Data challenge. In this paper, we propose two novel techniques to reduce large time-series datasets into more compact versions without undermining the predictive performance of the resulting models. These methods also aim to decrease the time required for training the models and the storage space needed for the condensed datasets. We evaluated our techniques on five public datasets, employing three Machine Learning algorithms: Holt-Winters, SARIMA, and LSTM. The outcomes indicate that for most of the datasets examined, our techniques maintain, and in several instances enhance, the forecasting accuracy of the models. Moreover, we significantly reduced the time required to train the Machine Learning algorithms employed.

2024

An Unsupervised Chatter Detection Method Based on AE and DBSCAN Clustering Utilizing Internal CNC Machine Signals

Authors
---, MP; Mendes-Moreira, J;

Publication

Abstract
In manufacturing, chatter is an unwanted phenomenon that can lead to reduced product quality and tool wear. Real-time chatter detection is key to preventing these issues and improving overall machining efficiency. In this paper, we propose an unsupervised chatter detection method that combines autoencoders (AE) with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm and uses internal signals of Computer Numerical Control (CNC) machines. The method first uses an AE to extract features from raw internal signals collected from CNC machines. This step reduces the dimensionality of the data and captures the underlying patterns of chatter. The extracted features are then fed into DBSCAN, a density-based algorithm that groups similar data points and identifies outliers. We tested the method with real-world data collected from various CNC machines. The results show that it achieves high accuracy, precision, and recall, and that it can detect chatter and distinguish it from normal machining. The method is also robust to noise and can adapt to dynamic machining conditions. It offers a reliable and efficient solution for real-time chatter detection, so that manufacturers can improve product quality, optimize the machining process, and reduce tool wear during machining.
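
The clustering stage of this pipeline can be sketched in plain Python. The example below is a textbook DBSCAN, not the authors' code, and it leaves out the autoencoder entirely: `points` is assumed to hold feature vectors that some encoder has already produced, and the `eps` and `min_pts` values are hypothetical. How the paper maps clusters or noise points to chatter is not stated in the abstract, so the sketch only returns labels.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise (outliers)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query; the point itself is included.
        return [j for j, p in enumerate(points) if dist(points[i], p) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical 2-D encoded features: a dense group plus one isolated point.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
labels = dbscan(features, eps=0.3, min_pts=3)
```

The brute-force neighbour search is quadratic in the number of points, which is acceptable for a sketch but would normally be replaced by a spatial index or a library implementation on real signal data.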

2024

Towards a foundation large events model for soccer

Authors
Mendes-Neves, T; Meireles, L; Mendes-Moreira, J;

Publication
MACHINE LEARNING

Abstract
This paper introduces the Large Events Model (LEM) for soccer, a novel deep learning framework for generating and analyzing soccer matches. The framework can simulate games from a given game state, with its primary output being the ensuing probabilities and events from multiple simulations. These can provide insights into match dynamics and underlying mechanisms. We discuss the framework's design, features, and methodologies, including model optimization, data processing, and evaluation techniques. The models within this framework are developed to predict specific aspects of soccer events, such as event type, success likelihood, and further details. In an applied context, we showcase the estimation of xP+, a metric estimating a player's contribution to the team's points earned. This work contributes to sports event prediction and its practical applications, and highlights the potential of this kind of method.
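
The idea of running multiple simulations from a given game state and reading off probabilities can be sketched with a toy Monte Carlo loop. Nothing below comes from the paper: the event generator is a hand-written stand-in for a learned model, and the state layout, the probabilities `goal_prob` and `turnover_prob`, and the function names are all hypothetical. The expected-points helper uses the usual 3/1/0 league scoring and is not the xP+ metric itself.

```python
import random


def simulate_match(state, rng, goal_prob=0.01, turnover_prob=0.3):
    """Roll one match forward from `state` using a toy event generator.

    `state` is (home_goals, away_goals, team_in_possession, events_left).
    """
    home, away, possession, events_left = state
    for _ in range(events_left):
        roll = rng.random()
        if roll < goal_prob:
            if possession == "home":
                home += 1
            else:
                away += 1
            # After a goal the conceding team restarts play.
            possession = "away" if possession == "home" else "home"
        elif roll < goal_prob + turnover_prob:
            possession = "away" if possession == "home" else "home"
    return home, away


def outcome_probabilities(state, n_sims=2000, seed=0):
    """Estimate home win / draw / away win probabilities by simulation."""
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n_sims):
        home, away = simulate_match(state, rng)
        if home > away:
            wins += 1
        elif home == away:
            draws += 1
        else:
            losses += 1
    return {
        "home_win": wins / n_sims,
        "draw": draws / n_sims,
        "away_win": losses / n_sims,
    }


def expected_points(probabilities):
    """Expected league points for the home team (3 for a win, 1 for a draw)."""
    return 3 * probabilities["home_win"] + probabilities["draw"]


# Home team leads 1-0, has the ball, and 200 toy events remain.
probs = outcome_probabilities((1, 0, "home", 200))
```

In a setup like the one the abstract describes, the hand-written event generator would be replaced by model predictions of the next event, and the same counting step would turn many simulated continuations into outcome probabilities.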

2024

Characterising Class Imbalance in Transportation Mode Detection: An Experimental Study

Authors
Muhammad, AR; Aguiar, A; Moreira, JM;

Publication
Intelligent Data Engineering and Automated Learning - IDEAL 2024 - 25th International Conference, Valencia, Spain, November 20-22, 2024, Proceedings, Part II

Abstract
This study investigates the impact of class imbalance and its potential interplay with other factors on machine learning models for transportation mode classification, utilising two real-world GPS trajectory datasets. A Random Forest model serves as the baseline, demonstrating strong performance on the relatively balanced dataset but experiencing significant degradation on the imbalanced one. To mitigate this effect, we explore various state-of-the-art class imbalance learning techniques, finding only marginal improvements. Resampling the fairly balanced dataset to replicate the imbalanced distribution suggests that factors beyond class imbalance are at play. We hypothesise and provide preliminary evidence for class overlap as a potential contributing factor, underscoring the need for further investigation into the broader range of classification difficulty factors. Our findings highlight the importance of balanced class distributions and a deeper understanding of factors such as class overlap in developing robust and generalisable models for transportation mode detection. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
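
The resampling experiment mentioned in this abstract, in which a fairly balanced dataset is subsampled to replicate an imbalanced class distribution, might look roughly like the sketch below. The function name `resample_to_distribution`, the class labels, and the target proportions are assumptions for illustration; the abstract does not say how the authors performed the resampling.

```python
import random
from collections import Counter, defaultdict


def resample_to_distribution(labels, target_props, seed=0):
    """Return sorted sample indices whose class mix follows `target_props`.

    Subsamples without replacement, so no class is ever oversampled.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Largest total size n for which every class still has enough samples,
    # i.e. n * p_c <= available_c for all classes c.
    n = min(len(by_class[c]) / p for c, p in target_props.items() if p > 0)

    chosen = []
    for c, p in target_props.items():
        k = min(len(by_class[c]), round(n * p))
        chosen.extend(rng.sample(by_class[c], k))
    return sorted(chosen)


# Hypothetical balanced dataset of transport modes, skewed towards walking.
labels = ["walk"] * 100 + ["bus"] * 100 + ["car"] * 100
target = {"walk": 0.5, "bus": 0.25, "car": 0.25}
subset = resample_to_distribution(labels, target)
subset_counts = Counter(labels[i] for i in subset)
```

Training the same model on the original data and on the subset then separates the effect of the class proportions from other properties of the data, which is the comparison the abstract draws on to argue that factors beyond imbalance are involved.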

2024

A Fast and Energy-Efficient Method for Online and Incremental Pareto-Front Update

Authors
Ferreira, PJS; Moreira, JM; Cardoso, JMP;

Publication
10th IEEE World Forum on Internet of Things, WF-IoT 2024, Ottawa, ON, Canada, November 10-13, 2024

Abstract
Self-adaptive Systems (SaS) are becoming increasingly important for adapting to dynamic environments and for optimizing performance on resource-constrained devices. A practical approach to achieving self-adaptability involves using a Pareto-Front (PF) to store the system's hyperparameters and the outcomes of hyperparameter combinations. This paper proposes a novel method to approximate a PF, offering a configurable number of solutions that can be adapted to the device's limitations. We conducted extensive experiments across various scenarios in which all PF solutions were replaced, and we ran real-world scenarios using actual measurements from a Human Activity Recognition (HAR) system. Our results show that our method consistently outperforms previous methods, mainly when the maximum number of PF solutions is in the order of hundreds. The effectiveness of our method is most apparent in the real-world scenarios, where it achieves, when executed on a Raspberry Pi 5, up to 87% lower energy consumption and lower execution times than the second-best algorithm. Additionally, our method distributes solutions more evenly across the PF, preventing high concentrations of solutions. © 2024 IEEE.
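
To make the idea of an online, size-bounded Pareto front concrete, here is a minimal sketch. It is a generic stand-in, not the method proposed in the paper: it assumes two objectives that are both minimised, and when the front exceeds `max_size` it drops the interior point whose neighbours are closest together, a simple crowding heuristic chosen here for illustration. The names `dominates` and `update_front` are hypothetical.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_front(front, candidate, max_size):
    """Insert `candidate` into a 2-objective front of at most `max_size` points."""
    if max_size < 2:
        raise ValueError("max_size must be at least 2 to keep both extremes")
    if any(p == candidate or dominates(p, candidate) for p in front):
        return front  # candidate adds nothing new
    # Remove points the candidate dominates, then add it and keep the
    # front sorted by the first objective.
    front = [p for p in front if not dominates(candidate, p)]
    front.append(candidate)
    front.sort()
    while len(front) > max_size:
        # Drop the interior point whose two neighbours are closest together
        # (Manhattan distance), which keeps the extremes and thins crowded areas.
        gaps = [
            abs(front[i + 1][0] - front[i - 1][0])
            + abs(front[i + 1][1] - front[i - 1][1])
            for i in range(1, len(front) - 1)
        ]
        front.pop(1 + gaps.index(min(gaps)))
    return front


# Hypothetical (energy, execution time) measurements arriving one at a time.
front = []
for point in [(1, 5), (5, 1), (3, 3), (4, 4), (2, 4)]:
    front = update_front(front, point, max_size=3)
```

Each update costs time proportional to the current front size, so the cap on the number of stored solutions also bounds the work per update, which is the property that matters on a resource-constrained device.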
