Publications - INESC TEC

Publications

Publications by LIAAD

2023

Online Influence Forest for Streaming Anomaly Detection

Authors
Martins, I; Resende, JS; Gama, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023

Abstract
As the digital world grows, data is being collected at high speed on a continuous and real-time scale. Hence, the imposed imbalanced and evolving scenario that introduces learning from streaming data remains a challenge. As the research field is still open to consistent strategies that assess continuous and evolving data properties, this paper proposes an unsupervised, online, and incremental anomaly detection ensemble of influence trees that implement adaptive mechanisms to deal with inactive or saturated leaves. This proposal features the fourth standardized moment, also known as kurtosis, as the splitting criterion and the isolation score, Shannon's information content, and the influence function of an instance as the anomaly score. In addition to improving interpretability, this proposal is also evaluated on publicly available datasets, providing a detailed discussion of the results.
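As a rough illustration of two quantities named in this abstract, the sketch below computes the fourth standardized moment (kurtosis) of a feature and the Shannon information content of an event. The function names and the rule of picking the feature with the highest kurtosis are assumptions made for illustration only; the abstract does not specify how the paper combines these quantities.

```python
import math


def kurtosis(values):
    """Fourth standardized moment (non-excess): E[(x - mu)^4] / sigma^4."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    if m2 == 0.0:
        return 0.0  # constant feature: treated as zero here by convention
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


def information_content(p):
    """Shannon information content (surprisal), in bits, of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def pick_split_feature(rows):
    """Hypothetical rule: return the index of the feature with highest kurtosis."""
    n_features = len(rows[0])
    scores = [kurtosis([row[j] for row in rows]) for j in range(n_features)]
    return max(range(n_features), key=scores.__getitem__)


# Feature 1 contains one extreme value, so its kurtosis is the larger one.
rows = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 100)]
chosen = pick_split_feature(rows)
```

A heavy-tailed feature has high kurtosis, which is why it is a plausible place to look for values that are easy to isolate; a rare event (small p) carries high information content.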


2023

Predictive Maintenance, Adversarial Autoencoders and Explainability

Authors
Silva, MEP; Veloso, B; Gama, J;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VII

Abstract
The transition to Industry 4.0 provoked a transformation of industrial manufacturing with a significant leap in automation and intelligent systems. This paradigm shift has brought about a mindset that emphasizes predictive maintenance: detecting future failures when the current behaviour of industrial processes and machines is thought to be normal. The constant monitoring of industrial equipment produces massive quantities of data that enable the application of machine learning approaches to this task. This study uses deep learning-based models to build a data-driven predictive maintenance framework for the air production unit (APU), a crucial system for the proper functioning of a Metro do Porto train. This public transport system moves thousands of people every day, and train failures lead to delays and loss of trust by clients. Therefore, it is essential not only to detect APU failures before they occur to minimize negative impacts, but also to provide explanations for the failure warnings that can aid in decision-making processes. We propose an autoencoder architecture trained with an adversarial loss, known as the Wasserstein Autoencoder with Generative Adversarial Network (WAE-GAN), designed to detect sensor failures in systems connected to the APU. Our model can detect APU failures up to two hours before they occur, allowing timely intervention of the maintenance teams. We further augment our model with an explainability layer, by providing explanations generated by a rule-based model that focuses on rare events. Results show that our model is able to detect APU failures without any false alarms, fulfilling the requisites of Metro do Porto for early detection of the failures.


2023

Guest Editorial: Special Issue on Stream Learning

Authors
Lu, J; Gama, J; Yao, X; Minku, L;

Publication
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Abstract
In recent years, learning from streaming data, commonly known as stream learning, has enjoyed tremendous growth and shown a wealth of development at both the conceptual and application levels. Stream learning is highly visible in both the machine learning and data science fields and has become a hot new direction in research. Advancements in stream learning include learning with concept drift detection, which covers determining whether a drift has occurred; understanding where, when, and how a drift occurs; adaptation by actively or passively updating models; and online learning, active learning, incremental learning, and reinforcement learning in data streaming situations.


2023

Error Analysis on Industry Data: Using Weak Segment Detection for Local Model Agnostic Prediction Intervals

Authors
Mamede, R; Paiva, N; Gama, J;

Publication
Discovery Science - 26th International Conference, DS 2023, Porto, Portugal, October 9-11, 2023, Proceedings

Abstract
Machine Learning has been overtaken by a growing necessity to explain and understand decisions made by trained models as regulation and consumer awareness have increased. Alongside understanding the inner workings of a model comes the task of verifying how adequately we can model a problem with the learned functions. Traditional global assessment functions lack the granularity required to understand local differences in performance in different regions of the feature space, where the model can have problems adapting. Residual Analysis adds a layer of model understanding by interpreting prediction residuals in an exploratory manner. However, this task can be unfeasible for high-dimensionality datasets through hypotheses and visualizations alone. In this work, we use weak interpretable learners to identify regions of high prediction error in the feature space. We achieve this by examining the absolute residuals of predictions made by trained regressors. This methodology retains the interpretability of the identified regions. It allows practitioners to have tools to formulate hypotheses surrounding model failure on particular regions for future model tuning, data collection, or data augmentation on critical cohorts of data. We present a way of including information on different levels of model uncertainty in the feature space through the use of locally fitted Model Agnostic Prediction Intervals (MAPIE) in the identified regions, comparing this approach with other common forms of conformal predictions which do not take into account findings from weak segment identification, by assessing local and global coverage of the prediction intervals. To demonstrate the practical application of our approach, we present a real-world industry use case in the context of inbound retention call-centre operations for a Telecom Provider to determine optimal pairing between a customer and an available assistant through the prediction of contracted revenue.
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
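The idea of fitting a weak learner to absolute residuals, and then building an interval per identified region, can be sketched as follows. This is a minimal illustration under stated assumptions: a single-feature, depth-one split (a decision stump) stands in for the paper's weak interpretable learners, the interval width uses the standard split-conformal quantile computed by hand rather than the MAPIE library, and the residual values are synthetic.

```python
import math


def best_stump(x, abs_residuals):
    """Find the threshold on a 1-D feature that best separates absolute residuals.

    Returns (threshold, mean_abs_residual_left, mean_abs_residual_right),
    choosing the split with the smallest summed squared error.
    """
    pairs = sorted(zip(x, abs_residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = (sum((r - mean_l) ** 2 for r in left)
               + sum((r - mean_r) ** 2 for r in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_l, mean_r)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def conformal_halfwidth(abs_residuals, alpha=0.1):
    """Split-conformal half-width: the ceil((n + 1)(1 - alpha))-th smallest residual."""
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(abs_residuals)[k - 1]


# Synthetic absolute residuals: the model is much worse for x > 6.
x = [1, 2, 3, 4, 5, 6, 7, 8]
abs_res = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 2.0, 2.2]
threshold, err_left, err_right = best_stump(x, abs_res)
```

The returned threshold reads directly as a rule ("errors are larger when x exceeds the threshold"), which is the kind of interpretability the abstract refers to. Calling `conformal_halfwidth` separately on the residuals of each side would give region-specific interval widths instead of one global width.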


2023

Which Way to Go - Finding Frequent Trajectories Through Clustering

Authors
Andrade, T; Gama, J;

Publication
Discovery Science - 26th International Conference, DS 2023, Porto, Portugal, October 9-11, 2023, Proceedings

Abstract
Trajectory clustering is one of the most important issues in mobility patterns data mining. It is applied in several cases such as hot-spot detection, urban transportation control, animal migration movements, and tourist visiting routes, among others. In this paper, we describe how to identify the most frequent trajectories from raw GPS data. By making use of the Ramer-Douglas-Peucker (RDP) mechanism we simplify the trajectories in order to obtain fewer points to check without losing information. We construct a similarity matrix by using the Fréchet distance metric and then employ density-based clustering to find the most similar trajectories. We perform experiments over three real-world datasets collected in the city of Porto, Portugal, and in Beijing, China, and check the results of the most frequent trajectories for the top-k origins x destinations for the moves. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
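The first two steps of this pipeline, RDP simplification and Fréchet distance, can be sketched in plain Python. The sketch makes several assumptions not stated in the abstract: points are treated as planar (x, y) pairs rather than projected GPS coordinates, the distance from a point to the chord is measured to the segment rather than the infinite line, and the discrete variant of the Fréchet distance is used.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: drop points closer than epsilon to the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = rdp(points[:idx + 1], epsilon)
    right = rdp(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines (dynamic programming)."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]


# Pairwise distance matrix over simplified trajectories (synthetic data).
trajectories = [
    [(0, 0), (1, 0.01), (2, 0), (3, 0.01), (4, 0)],
    [(0, 1), (2, 1), (4, 1)],
]
simplified = [rdp(t, epsilon=0.1) for t in trajectories]
matrix = [[discrete_frechet(a, b) for b in simplified] for a in simplified]
```

A matrix like this could then be passed to a density-based clustering method that accepts precomputed distances, which corresponds to the final step described in the abstract.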


2023

Bayesian Federated Learning: A Survey

Authors
Cao, LB; Chen, H; Fan, XH; Gama, J; Ong, YS; Kumar, V;

Publication
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023

Abstract
Federated learning (FL) demonstrates its advantages in integrating distributed infrastructure, communication, computing and learning in a privacy-preserving manner. However, the robustness and capabilities of existing FL methods are challenged by limited and dynamic data and conditions, complexities including heterogeneities and uncertainties, and analytical explainability. Bayesian federated learning (BFL) has emerged as a promising approach to address these issues. This survey presents a critical overview of BFL, including its basic concepts, its relations to Bayesian learning in the context of FL, and a taxonomy of BFL from both Bayesian and federated perspectives. We categorize and discuss client- and server-side and FL-based BFL methods and their pros and cons. The limitations of the existing BFL methods and the future directions of BFL research further address the intricate requirements of real-life FL applications.
