Publications

Publications by Rita Paula Ribeiro

2010

Interval Forecast of Water Quality Parameters

Authors
Ohashi, O; Torgo, L; Ribeiro, RP;

Publication
ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE

Abstract
The current quality control methodology adopted by the water distribution service provider in the metropolitan region of Porto, Portugal, is based on simple heuristics and empirical knowledge. Given the domain complexity and data volume, this application is a strong candidate for a data mining process. In this paper, we propose a new methodology to predict the range of normality for the values of different water quality parameters. These intervals of normality are of key importance when deciding on costly inspection activities. Our experimental evaluation confirms that our proposal achieves good results on the task of forecasting the normal distribution of values for the following 30 days. The proposed method can be applied to other domains with similar network monitoring objectives.


2023

Explainable Predictive Maintenance

Authors
Pashami, S; Nowaczyk, S; Fan, Y; Jakubowski, J; Paiva, N; Davari, N; Bobek, S; Jamshidi, S; Sarmadi, H; Alabdallah, A; Ribeiro, RP; Veloso, B; Mouchaweh, MS; Rajaoarisoa, LH; Nalepa, GJ; Gama, J;

Publication
CoRR


2023

An Online Anomaly Detection Approach for Fault Detection on Fire Alarm Systems

Authors
Tome, ES; Ribeiro, RP; Dutra, I; Rodrigues, A;

Publication
SENSORS

Abstract
The early detection of fire is of utmost importance, since fire poses devastating threats to human lives and causes economic losses. Unfortunately, fire alarm sensory systems are known to be prone to failures and frequent false alarms, putting people and buildings at risk. It is therefore essential to guarantee the correct functioning of smoke detectors. Traditionally, these systems have been subject to periodic maintenance plans, which do not consider the state of the fire alarm sensors and are, therefore, sometimes carried out not when necessary but according to a predefined conservative schedule. To contribute to the design of a predictive maintenance plan, we propose an online data-driven anomaly detection approach for smoke sensors that models the behaviour of these systems over time and detects abnormal patterns that can indicate a potential failure. Our approach was applied to data collected from independent fire alarm sensory systems installed with four customers, from which about three years of data are available. For one of the customers, the obtained results were promising, with a precision score of 1 with no false positives for 3 out of 4 possible faults. Analysis of the remaining customers' results highlighted possible reasons and potential improvements to address this problem better. These findings can provide valuable insights for future research in this area.


2022

Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

Authors
Jesus, S; Pombal, J; Alves, D; Cruz, AF; Saleiro, P; Ribeiro, RP; Gama, J; Bizarro, P;

Publication
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022


2022

Model Optimization in Imbalanced Regression

Authors
Silva, A; Ribeiro, RP; Moniz, N;

Publication
DISCOVERY SCIENCE (DS 2022)

Abstract
Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain. Research in this field has been mainly focused on classification tasks. Comparatively, the number of studies carried out in the context of regression tasks is negligible. One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values. Recently, an evaluation metric was introduced: Squared Error Relevance Area (SERA). This metric places greater emphasis on the errors committed at extreme values while also accounting for the performance in the overall target variable domain, thus preventing severe bias. However, its effectiveness as an optimization metric is unknown. In this paper, our goal is to study the impact of using SERA as an optimization criterion in imbalanced regression tasks. Using gradient boosting algorithms as proof of concept, we perform an experimental study with 36 data sets of different domains and sizes. Results show that models that used SERA as an objective function are practically better than the models produced by their respective standard boosting algorithms at the prediction of extreme values. This confirms that SERA can be embedded as a loss function into optimization-based learning algorithms for imbalanced regression scenarios.
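As a rough illustration of the metric named in this abstract, the sketch below approximates SERA as the area under the curve of squared errors restricted to cases whose relevance is at least a threshold t, for t in [0, 1]. It is not the authors' implementation: the function name `sera`, the `steps` parameter, and the user-supplied `relevance` function (mapping a target value to [0, 1]) are assumptions made for this example, and the integral is approximated with the trapezoidal rule.

```python
def sera(y_true, y_pred, relevance, steps=1000):
    """Approximate SERA with the trapezoidal rule over thresholds t in [0, 1].

    `relevance` is a hypothetical user-supplied function mapping a target
    value to a relevance score in [0, 1]; it is an assumption of this sketch.
    """
    sq_err = [(p - y) ** 2 for y, p in zip(y_true, y_pred)]
    rel = [relevance(y) for y in y_true]

    def ser(t):
        # Sum of squared errors over cases whose relevance is at least t.
        return sum(e for e, r in zip(sq_err, rel) if r >= t)

    h = 1.0 / steps
    values = [ser(i / steps) for i in range(steps + 1)]
    # Trapezoidal rule: endpoints weigh one half, interior points weigh one.
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


if __name__ == "__main__":
    # Toy data: the relevance function below treats larger targets as rarer.
    print(sera([0.0, 10.0], [1.0, 12.0], lambda y: y / 10.0))
```

With a constant relevance of 1, every case counts at every threshold and the value reduces to the ordinary sum of squared errors; a relevance function that is high only for rare targets makes errors on those targets dominate.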


2022

The MetroPT dataset for predictive maintenance

Authors
Veloso, B; Gama, J; Ribeiro, RP; Pereira, PM;

Publication
SCIENTIFIC DATA

Abstract
The paper describes the MetroPT data set, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to develop machine learning methods for online anomaly detection and failure prediction. Several analog sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed) provide a framework that is easy to use and supports the development of new machine learning methods. This dataset contains some interesting characteristics and can be a good benchmark for predictive maintenance models.
