Publications

Publications by Rita Paula Ribeiro

2021

Improving Smart Waste Collection Using AutoML

Authors
Teixeira, S; Londres, G; Veloso, B; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, PT II

Abstract
The production and management of urban waste is a growing challenge and a consequence of our day-to-day resources and activities. According to the Portuguese Environment Agency, Portugal produced 1% more tons of urban waste in 2019 than in 2018. The proper management of this waste is supported by existing policies, namely national legislation and the Strategic Plan for Urban Waste. These policies assess and support the amount of waste processed, allowing the recovery of materials. Among the solutions for waste management is the selective collection of waste. We improve the management of smart waste collection of Paper, Plastic, and Glass packaging from corporate customers who joined a recycling program, using data collected from 2017 to 2020. The main objective of this work is to increase the system's predictive performance without any loss for citizens and with improved collection management. We analyze two types of problems: (i) the presence or absence of containers; and (ii) the prediction of the number of containers by type of waste. To carry out the analysis, we applied three machine learning algorithms: XGBoost, Random Forest, and Rpart. Additionally, we used AutoML for the XGBoost and Random Forest algorithms. The results show that AutoML generally yields better results, both for classifying the presence or absence of containers by type of waste and for predicting the number of containers.

2021

Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2021, Virtual Event, September 13-17, 2021, Proceedings, Part I

Authors
Kamp, M; Koprinska, I; Bibal, A; Bouadi, T; Frénay, B; Galárraga, L; Oramas, J; Adilova, L; Krishnamurthy, Y; Kang, B; Largeron, C; Lijffijt, J; Viard, T; Welke, P; Ruocco, M; Aune, E; Gallicchio, C; Schiele, G; Pernkopf, F; Blott, M; Fröning, H; Schindler, G; Guidotti, R; Monreale, A; Rinzivillo, S; Biecek, P; Ntoutsi, E; Pechenizkiy, M; Rosenhahn, B; Buckley, CL; Cialfi, D; Lanillos, P; Ramstead, M; Verbelen, T; Ferreira, PM; Andresini, G; Malerba, D; Medeiros, I; Viger, PF; Nawaz, MS; Ventura, S; Sun, M; Zhou, M; Bitetta, V; Bordino, I; Ferretti, A; Gullo, F; Ponti, G; Severini, L; Ribeiro, RP; Gama, J; Gavaldà, R; Cooper, LAD; Ghazaleh, N; Richiardi, J; Roqueiro, D; Miranda, DS; Sechidis, K; Graça, G;

Publication
PKDD/ECML Workshops (1)

2021

Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2021, Virtual Event, September 13-17, 2021, Proceedings, Part II

Publication
PKDD/ECML Workshops (2)

2022

Bank Statements to Network Features: Extracting Features Out of Time Series Using Visibility Graph

Authors
Shaji, N; Gama, J; Ribeiro, RP; Gomes, P;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XX, IDA 2022

Abstract
Non-traditional data, such as the applicant's bank statement, are a significant source for decision-making when granting loans. We find that methods from network science can be applied to the applicant's bank statements to convert inherent cash flow characteristics into predictors for default prediction in a credit scoring or credit risk assessment model. First, the credit cash flow is extracted from a bank statement and converted into a visibility graph, or network. We then use this visibility network to derive features that predict the borrowers' repayment behaviour. Feature selection methods select all five extracted features. Finally, SMOTE is used to balance the training data. The model that uses the network features together with the standard features outperforms the model that uses only the standard features, indicating the predictive power of the network features.
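The time-series-to-network step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the natural visibility criterion (two points are linked when every intermediate point lies strictly below the straight line joining them) and takes a plain list of credit amounts as input. The function names `natural_visibility_edges` and `simple_network_features`, and the generic features computed here, are hypothetical and are not necessarily the five features used in the paper.

```python
from itertools import combinations


def natural_visibility_edges(values):
    """Return the edge set of the natural visibility graph of a series.

    Node i is the i-th observation. Points (a, y_a) and (b, y_b), a < b,
    are linked if every intermediate point c lies strictly below the
    straight line that joins them.
    """
    edges = set()
    for a, b in combinations(range(len(values)), 2):
        ya, yb = values[a], values[b]
        # Height of the line from (a, ya) to (b, yb) evaluated at position c.
        visible = all(
            values[c] < yb + (ya - yb) * (b - c) / (b - a)
            for c in range(a + 1, b)
        )
        if visible:
            edges.add((a, b))
    return edges


def simple_network_features(values):
    """Compute a few illustrative (hypothetical) network features."""
    n = len(values)
    edges = natural_visibility_edges(values)
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    max_edges = n * (n - 1) / 2
    return {
        "n_nodes": n,
        "n_edges": len(edges),
        "mean_degree": sum(degree) / n if n else 0.0,
        "max_degree": max(degree, default=0),
        "density": len(edges) / max_edges if max_edges else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical daily credit amounts taken from a bank statement.
    credits = [120.0, 0.0, 300.0, 50.0, 80.0]
    print(simple_network_features(credits))
```

The brute-force check is O(n^3) in the series length, which is acceptable for short statement histories; longer series would call for a divide-and-conquer construction. In a credit model, features like these would be appended to the standard applicant features before class balancing and training.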

2022

Combining Multiple Data Sources to Predict IUCN Conservation Status of Reptiles

Authors
Soares, N; Goncalves, JF; Vasconcelos, R; Ribeiro, RP;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XX, IDA 2022

Abstract
Biodiversity loss is a hot topic. We are losing species at a high rate, even before their extinction risk is assessed. The International Union for Conservation of Nature (IUCN) Red List is the most complete assessment of the conservation status of species, yet it covers only a small part of the species identified so far. Additionally, many of the existing assessments are outdated, either due to the ever-evolving nature of taxonomy or to the lack of reassessments. Assessing the conservation status of a species is a long, mostly manual process that must be carried out carefully by experts. The conservation field would benefit from ways of automating this process, for instance, by prioritising the species on which experts and financing should focus. In this paper, we present a pipeline used to derive a conservation dataset from openly available data and to obtain predictions, through machine learning techniques, of which species are most likely to be threatened. We applied this pipeline to the different groups within the Reptilia class as a model of one of the most under-assessed taxonomic groups. Additionally, we compared the performance of models using datasets that include different sets of predictors describing species' ecological requirements and geographical distributions, such as IUCN's area of occupancy and extent of occurrence. Our results show that most groups benefit from using ecological variables together with IUCN predictors. Random Forest was the best method for most species groups, and feature selection improved the results.

2022

Data-Driven Predictive Maintenance

Authors
Gama, J; Ribeiro, RP; Veloso, B;

Publication
IEEE INTELLIGENT SYSTEMS

Abstract