Publications

Publications by Davide Rua Carneiro

2023

Predicting and explaining absenteeism risk in hospital patients before and during COVID-19

Authors
Borges, A; Carvalho, M; Maia, M; Guimaraes, M; Carneiro, D;

Publication
SOCIO-ECONOMIC PLANNING SCIENCES

Abstract
In order to address one of the most challenging problems in hospital management - patients' absenteeism without prior notice - this study analyses the risk factors associated with this event. To this end, using real data from a hospital located in the North of Portugal, a prediction model previously validated in the literature is used to infer absenteeism risk factors, and an explainable model is proposed, based on a modified CART algorithm. The latter is intended to generate a human-interpretable explanation for patient absenteeism, and its implementation is described in detail. Furthermore, given the significant impact the COVID-19 pandemic had on hospital management, a comparison between the profiles of absent patients before and during the COVID-19 pandemic is performed. The results differ between hospital specialities and time periods, meaning that the profiles of absent patients change during pandemic periods and across specialities.


2023

Predicting Model Training Time to Optimize Distributed Machine Learning Applications

Authors
Guimaraes, M; Carneiro, D; Palumbo, G; Oliveira, F; Oliveira, O; Alves, V; Novais, P;

Publication
ELECTRONICS

Abstract
Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed datasets. In this paper, we describe CEDEs, a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models constituted by different base models, trained with different and distributed subsets of data. Specifically, we address the issue of predicting the training time of a given model, given its characteristics and the characteristics of the data. Given that the creation of an Ensemble may imply the training of hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for the efficient management of the cluster's computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach is able to predict the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data.


2023

Teaching Data Structures and Algorithms Through Games

Authors
Carneiro, D; Carvalho, M;

Publication
METHODOLOGIES AND INTELLIGENT SYSTEMS FOR TECHNOLOGY ENHANCED LEARNING

Abstract
Computer Science degrees are often seen as challenging by students, especially when it comes to subjects such as programming, data structures or algorithms. Many reasons can be pointed out for this, some of which are related to the abstract nature of these subjects and the students' lack of previous related knowledge. In this paper we tackle this challenge using gamification in the teaching/learning process, with two main goals in mind. The first is to increase the intrinsic motivation of students to learn, by making the whole process more fun, enjoyable and competitive. The second is to facilitate the learning process by providing intuitive tools for the visualization of data structures and algorithmic output, together with a tool for automated assessment that decreases students' dependence on the teacher and allows them to work more autonomously. We validated this approach over the course of three academic years in a Computer Science degree of the Polytechnic of Porto, Portugal, through the use of a questionnaire. Results show that using games and game elements has a generally positive effect on motivation and on the overall learning process.


2023

Using meta-learning to predict performance metrics in machine learning problems

Authors
Carneiro, D; Guimaraes, M; Carvalho, M; Novais, P;

Publication
EXPERT SYSTEMS

Abstract
Machine learning has been facing significant challenges over the last years, many of which stem from the new characteristics of machine learning problems, such as learning from streaming data or incorporating human feedback into existing datasets and models. In these dynamic scenarios, data change over time and models must adapt. However, new data do not necessarily mean new patterns. The main goal of this paper is to devise a method to predict a model's performance metrics before it is trained, in order to decide whether it is worth training it or not. That is, will the model yield significantly better results than the current one? To address this issue, we propose the use of meta-learning. Specifically, we evaluate two different meta-models, one built for a specific machine learning problem, and another built based on many different problems, meant to be a generic meta-model, applicable to virtually any problem. In this paper, we focus only on the prediction of the root mean square error (RMSE). Results show that it is possible to accurately predict the RMSE of future models, even in streaming scenarios. Moreover, results also show that it is possible to reduce the need for re-training models by between 60% and 98%, depending on the problem and on the threshold used.


2021

A Meta-Learning Approach to Error Prediction

Authors
Guimaraes, M; Carneiro, D;

Publication
PROCEEDINGS OF 2021 16TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI'2021)

Abstract
Machine Learning is one of the most trending topics nowadays. The reason, of course, is that it is more and more present in our everyday lives, even if we do not notice it. What goes even more unnoticed is the fact that every Machine Learning model needs computational power. And of course, it also needs data. But how much data is necessary to build the best Machine Learning model possible, and how many times do we need to retrain a model so that it does not become obsolete as data change? Answering these kinds of questions can reduce unnecessary costs for a company. In this paper we propose a novel approach to predict the performance of a model given some characteristics of the data, which are called meta-features. The goal is to train a new model only when some error metric (e.g., RMSE) is expected to decrease substantially compared with a previously trained model. This approach is best applied in data streaming or Big Data scenarios, as well as in Interactive Machine Learning scenarios. We validate it on a real Fraud Detection case, and this scenario is also briefly described.


2021

Optimization of the grapes reception process

Authors
Carneiro, D; Pereira, J; Silva, ECE;

Publication
NEURAL COMPUTING & APPLICATIONS

Abstract
Grapes reception is a key process in wine production. Harvest days are extremely challenging for managing the reception of the grapes, as the winery needs to deal with the non-uniform arrival of the grapes, while guaranteeing suppliers' satisfaction and wine quality. The best management of the resources of the suppliers (i.e., grapes and trucks) and winery (i.e., grain-tanks and pressing machines) must be ensured. In this paper, the underlying optimization problem for grape reception is solved by developing a genetic algorithm (GA) tailored for this specific challenge. The results of this algorithm are compared with a FIFO policy for a typical scenario that occurs on the harvest days of a real winery. Additionally, different scenarios are simulated to assess the validity and quality of the solutions found. The results show that, using modest computational resources, it is possible to achieve better solutions with the proposed GA. This allows the algorithm to be used in real time, even when plant conditions change significantly (e.g., when a new truck arrives, when a machine fails). Furthermore, the waiting times of trucks and grapes obtained with the developed GA are significantly shorter than the ones observed using a FIFO approach.
