Publications by Rita Paula Ribeiro

2017

Preface

Authors
Sayed Mouchaweh, M; Bouchachia, H; Gama, J; Ribeiro, RP;

Publication
CEUR Workshop Proceedings

2015

Resampling strategies for regression

Authors
Torgo, L; Branco, P; Ribeiro, RP; Pfahringer, B;

Publication
EXPERT SYSTEMS

Abstract
Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that was thoroughly studied within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important applications involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of tasks. Namely, we propose to address such tasks by resampling approaches that change the distribution of the given data set to decrease the problem of imbalance between the rare target cases and the most frequent ones. We present two modifications of well-known resampling strategies for classification tasks: the under-sampling and the synthetic minority over-sampling technique (SMOTE) methods. These modifications allow the use of these strategies on regression tasks where the goal is to forecast rare extreme values of the target variable. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed resampling methods can be used with any existing regression algorithm, which means that they are general tools for addressing problems of forecasting rare extreme values of a continuous target variable.
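
The under-sampling idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `under_sample`, the `keep_ratio` parameter, and the rarity rule (a target more than two standard deviations from the mean) are all assumptions made for the example.

```python
import random
import statistics

def under_sample(rows, targets, is_rare, keep_ratio=0.5, seed=0):
    """Keep every rare case and a random fraction of the common ones.

    `is_rare` is a hypothetical predicate on the target value; the paper's
    own criterion for rare cases is not reproduced here.
    """
    rng = random.Random(seed)
    pairs = list(zip(rows, targets))
    rare = [(x, y) for x, y in pairs if is_rare(y)]
    common = [(x, y) for x, y in pairs if not is_rare(y)]
    n_keep = int(len(common) * keep_ratio)
    return rare + rng.sample(common, n_keep)

# Toy data: nine ordinary targets and one extreme value.
targets = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
rows = list(range(len(targets)))
mu = statistics.mean(targets)
sd = statistics.pstdev(targets)

# Assumed rarity rule: more than two standard deviations from the mean.
resampled = under_sample(rows, targets, lambda y: abs(y - mu) > 2 * sd)
```

Because the resampled data set is just another list of cases, it can be passed to any regression learner, which is the generality the abstract points to.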

2020

A Study on Imbalanced Data Streams

Authors
Aminian, E; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II

Abstract
Data are growing fast in today's world, and a great portion of them is in the form of streams. In many situations, data streams are imbalanced, making them difficult to use with classical data mining methods. However, mining these special kinds of streams is one of the most attractive research areas. In this paper, we propose two algorithms for learning from imbalanced regression data streams. Both methods are based on Chebychev's inequality, but use it in different ways. The first method under-samples the examples with frequent target values, while the second over-samples the examples with rare and extreme target values. This way, the learner will focus on the rare and more difficult cases. We applied our methods to train regression models using two benchmark datasets and two well-known regression algorithms: Perceptron and FIMT-DD. The results of our simulations indicate the usefulness of the proposed methods.
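
One way to picture the under-sampling variant is sketched below. Chebychev's inequality bounds the probability that a value lies at least t standard deviations from the mean by 1/t^2, so a small bound marks a rare target value. The selection rule used here (keep an example with probability one minus the bound) is an assumption for illustration; the abstract does not state the exact rule, and the names `RunningStats`, `cheby_bound`, and `under_sample_stream` are hypothetical.

```python
import math
import random

class RunningStats:
    """Welford's online mean and variance, suitable for a stream."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, y):
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

def cheby_bound(y, mean, std):
    """Upper bound 1/t^2 on P(|Y - mean| >= t * std), capped at 1."""
    if std == 0.0:
        return 1.0
    t = abs(y - mean) / std
    return 1.0 if t <= 1.0 else 1.0 / (t * t)

def under_sample_stream(stream, seed=0):
    """Yield (x, y) pairs, dropping examples whose target looks frequent.

    Assumed rule: keep an example with probability 1 - bound, so targets
    within one standard deviation of the running mean are always dropped.
    """
    rng = random.Random(seed)
    stats = RunningStats()
    for x, y in stream:
        bound = cheby_bound(y, stats.mean, stats.std())
        stats.update(y)  # statistics are updated whether or not we keep it
        if rng.random() >= bound:
            yield x, y
```

An over-sampling counterpart would, under the same assumption, present an example to the learner more than once when its bound is small, rather than discarding the frequent ones.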

2020

Using Property-Based Testing to Generate Feedback for C Programming Exercises

Authors
Vasconcelos, PB; Ribeiro, RP;

Publication
First International Computer Programming Education Conference, ICPEC 2020, June 25-26, 2020, ESMAD, Vila do Conde, Portugal (Virtual Conference).

Abstract
This paper reports on the use of property-based testing for providing feedback on C programming exercises. Test cases are generated automatically from properties specified in a test script; this not only makes it possible to conduct many tests (and thus potentially find more mistakes), but also allows failed test cases to be simplified automatically. We present some experimental validation gathered for an introductory C programming course during the fall semester of 2018 that shows significant positive correlations between getting feedback during the semester and the students' results in the final exam. We also discuss some limitations regarding feedback for undefined behaviors in the C language. 2012 ACM Subject Classification: Social and professional topics → Student assessment; Software and its engineering → Software testing and debugging; Software and its engineering → Domain specific languages.
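
The two mechanisms mentioned in this abstract, automatic test generation from a property and automatic simplification of a failing case, can be sketched in a few lines. The example below is a hand-rolled illustration in Python rather than the paper's tooling for C; `buggy_max`, `prop_max_matches_reference`, `shrink`, and `check` are hypothetical names, and the bug is invented for the example.

```python
import random

def buggy_max(xs):
    """Stand-in for a student submission: wrongly assumes values are >= 0."""
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best

def prop_max_matches_reference(xs):
    """Property: the submission agrees with a trusted reference."""
    return buggy_max(xs) == max(xs)

def shrink(xs, prop):
    """Greedily simplify a failing input while the property still fails."""
    while True:
        # Candidate simplifications: drop one element, or halve one value.
        candidates = [xs[:i] + xs[i + 1:] for i in range(len(xs))]
        candidates += [xs[:i] + [int(xs[i] / 2)] + xs[i + 1:]
                       for i in range(len(xs)) if xs[i] != 0]
        failing = [c for c in candidates if c and not prop(c)]
        if not failing:
            return xs
        xs = failing[0]

def check(prop, trials=500, seed=0):
    """Run the property on random inputs; return a shrunk counterexample."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(1, 8))]
        if not prop(xs):
            return shrink(xs, prop)
    return None

counterexample = check(prop_max_matches_reference)
```

The shrunk counterexample is what makes the feedback useful to a student: a one-element list with a small negative number points at the faulty assumption far more directly than a long random list would.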

2020

Imbalanced regression and extreme value prediction

Authors
Ribeiro, RP; Moniz, N;

Publication
MACHINE LEARNING

Abstract
Research in imbalanced domain learning has almost exclusively focused on solving classification tasks for accurate prediction of cases labelled with a rare class. Approaches for addressing such problems in regression tasks are still scarce due to two main factors. First, standard regression tasks assume each domain value as equally important. Second, standard evaluation metrics focus on assessing the performance of models on the most common values of data distributions. In this paper, we present an approach to tackle imbalanced regression tasks where the objective is to predict extreme (rare) values. We propose an approach to formalise such tasks and to optimise/evaluate predictive models, overcoming the factors mentioned and issues in related work. We present an automatic and non-parametric method to obtain relevance functions, building on the concept of relevance as the mapping of target values into non-uniform domain preferences. Then, we propose SERA, a new evaluation metric capable of assessing the effectiveness and of optimising models towards the prediction of extreme values while penalising severe model bias. An experimental study demonstrates how SERA provides valid and useful insights into the performance of models in imbalanced regression tasks.
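
A minimal sketch of how such a metric can be computed, assuming SERA is the area, over relevance thresholds t in [0, 1], under the curve of squared error summed over the cases whose relevance is at least t. The relevance values are taken as given here; the paper's automatic method for deriving them is not reproduced, and the function names and the trapezoidal step count are choices made for this example.

```python
def ser(y_true, y_pred, relevance, t):
    """Sum of squared errors over cases with relevance >= t."""
    return sum((p - y) ** 2
               for y, p, r in zip(y_true, y_pred, relevance)
               if r >= t)

def sera(y_true, y_pred, relevance, steps=1000):
    """Trapezoidal approximation of the integral of ser(t) for t in [0, 1]."""
    h = 1.0 / steps
    values = [ser(y_true, y_pred, relevance, i / steps)
              for i in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

# Toy example: the only error is on the third case.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]

# With uniform relevance 1, the area equals the plain sum of squared errors.
uniform = sera(y_true, y_pred, [1.0, 1.0, 1.0])

# With relevance 0.5 on the erroneous case, the error counts for half the range.
partial = sera(y_true, y_pred, [0.0, 0.0, 0.5])
```

Under this reading, an error on a highly relevant (extreme) case contributes across most of the threshold range, while an error on a common case contributes only near t = 0, yet is never ignored entirely.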

2020

Failure Detection of an Air Production Unit in Operational Context

Authors
Barros, M; Veloso, B; Pereira, PM; Ribeiro, RP; Gama, J;

Publication
IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning - Second International Workshop, IoT Streams 2020, and First International Workshop, ITEM 2020, Co-located with ECML/PKDD 2020, Ghent, Belgium, September 14-18, 2020, Revised Selected Papers

Abstract
The transformation of industrial manufacturing through computers, automation, and smart systems leads us to monitor and log industrial equipment events. It is possible to apply analytic approaches and to find interpretable results for strategic decision making, providing advantages such as failure detection and predictive maintenance. Over the last years, many researchers have been studying the application of machine learning techniques to improve such tasks. In this context, we develop a system capable of detecting anomalies in an Air Production Unit (APU), taking into consideration the peak frequency of each sensor. The study started with the analysis of the sensors installed on the APU, defining its normal behavior and its failure mode. Using that information, we define rules to monitor the APU, to detect anomalies in its components, and to predict possible failures. The definition of rules was based on the peak frequency analysis, which allowed the setting of boundaries of normality for the APU working modes and, thus, the identification of anomalies. © 2020, Springer Nature Switzerland AG.
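
A rule of the kind described here can be sketched as a peak-frequency check against a normality band. The sketch below is an illustration under assumptions: the function names, the plain discrete Fourier transform, and the band limits are hypothetical, and the abstract does not specify how the peak frequency or the boundaries were actually obtained.

```python
import cmath
import math

def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest non-constant component of a window."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n

def is_anomalous(samples, sample_rate, low, high):
    """Rule: flag the window if its peak frequency leaves [low, high]."""
    return not (low <= peak_frequency(samples, sample_rate) <= high)

# Synthetic sensor window: a 5 Hz oscillation sampled at 100 Hz for 1 second.
window = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
peak = peak_frequency(window, 100)

# Hypothetical normality bands for two working modes.
ok_in_mode_a = not is_anomalous(window, 100, 4.0, 6.0)
flagged_in_mode_b = is_anomalous(window, 100, 10.0, 20.0)
```

In practice one band per sensor and per working mode would be needed, which matches the abstract's description of boundaries of normality set for each APU working mode.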
