Publications

Publications by CESE

2023

Block size, parallelism and predictive performance: finding the sweet spot in distributed learning

Authors
Oliveira, F; Carneiro, D; Guimaraes, M; Oliveira, O; Novais, P;

Publication
INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS

Abstract
As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimum block size and the best heuristic to create distributed Ensembles. We evaluated three different heuristics and five block sizes on four publicly available datasets. Results show that using fewer but better base models matches or outperforms a standard Random Forest, and that 32 MB is the best block size.

2023

Selection of Replicas with Predictions of Resources Consumption

Authors
Monteiro, J; Oliveira, Ó; Carneiro, D;

Publication
Lecture Notes in Networks and Systems

Abstract

2023

Predicting and explaining absenteeism risk in hospital patients before and during COVID-19

Authors
Borges, A; Carvalho, M; Maia, M; Guimaraes, M; Carneiro, D;

Publication
SOCIO-ECONOMIC PLANNING SCIENCES

Abstract
In order to address one of the most challenging problems in hospital management - patients' absenteeism without prior notice - this study analyses the risk factors associated with this event. To this end, through real data from a hospital located in the North of Portugal, a prediction model previously validated in the literature is used to infer absenteeism risk factors, and an explainable model is proposed, based on a modified CART algorithm. The latter intends to generate a human-interpretable explanation for patient absenteeism, and its implementation is described in detail. Furthermore, given the significant impact the COVID-19 pandemic had on hospital management, a comparison between patients' profiles upon absenteeism before and during the COVID-19 pandemic situation is performed. Results obtained differ between hospital specialities and time periods, meaning that patient profiles on absenteeism change during pandemic periods and within specialities.

2023

Predicting Model Training Time to Optimize Distributed Machine Learning Applications

Authors
Guimaraes, M; Carneiro, D; Palumbo, G; Oliveira, F; Oliveira, O; Alves, V; Novais, P;

Publication
ELECTRONICS

Abstract
Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed datasets. In this paper, we describe CEDEs, a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models constituted by different base models, trained with different and distributed subsets of data. Specifically, we address the issue of predicting the training time of a given model, given its characteristics and the characteristics of the data. Given that the creation of an Ensemble may imply the training of hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for an efficient management of the cluster's computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach is able to predict the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data.
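As an illustration of why per-task duration predictions matter for makespan, the following is a minimal sketch of a greedy longest-processing-time-first scheduler. It is not the scheduling method used in CEDEs, which the abstract does not describe; the function name `schedule_lpt`, the task labels and the durations are hypothetical.

```python
import heapq


def schedule_lpt(predicted_seconds, n_workers):
    """Assign tasks to workers, longest predicted task first.

    predicted_seconds: dict mapping a task name to its predicted
    training time in seconds (hypothetical values, for illustration).
    Returns (assignment, makespan), where makespan is the load of the
    busiest worker.
    """
    # Min-heap of (current load, worker index): least-loaded worker on top.
    loads = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(loads)
    assignment = {w: [] for w in range(n_workers)}

    # Place the longest predicted tasks first, each on the least-loaded worker.
    ordered = sorted(predicted_seconds.items(), key=lambda kv: kv[1], reverse=True)
    for task, seconds in ordered:
        load, worker = heapq.heappop(loads)
        assignment[worker].append(task)
        heapq.heappush(loads, (load + seconds, worker))

    makespan = max(load for load, _ in loads)
    return assignment, makespan


if __name__ == "__main__":
    # Hypothetical predicted training times for four base models.
    predictions = {"tree_1": 5.0, "tree_2": 4.0, "net_1": 3.0, "net_2": 2.0}
    plan, makespan = schedule_lpt(predictions, n_workers=2)
    print(plan, makespan)
```

The greedy rule is a heuristic rather than an optimal solver, but it shows how the quality of the schedule depends directly on the accuracy of the predicted durations.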

2023

Teaching Data Structures and Algorithms Through Games

Authors
Carneiro, D; Carvalho, M;

Publication
METHODOLOGIES AND INTELLIGENT SYSTEMS FOR TECHNOLOGY ENHANCED LEARNING

Abstract
Computer Science degrees are often seen as challenging by students, especially with regard to subjects such as programming, data structures or algorithms. Many reasons can be pointed out for this, some of which are related to the abstract nature of these subjects and the lack of previous related knowledge by the students. In this paper we tackle this challenge using gamification in the teaching/learning process, with two main goals in mind. The first is to increase the intrinsic motivation of students to learn, by making the whole process more fun, enjoyable and competitive. The second is to facilitate the learning process by providing intuitive tools for the visualization of data structures and algorithmic output, together with a tool for automated assessment that decreases the dependence on the teacher and allows students to work more autonomously. We validated this approach over the course of three academic years in a Computer Science degree of the Polytechnic of Porto, Portugal, through the use of a questionnaire. Results show that using games and game elements has a generally positive effect on motivation and on the overall learning process.

2023

Using meta-learning to predict performance metrics in machine learning problems

Authors
Carneiro, D; Guimaraes, M; Carvalho, M; Novais, P;

Publication
EXPERT SYSTEMS

Abstract
Machine learning has been facing significant challenges over the last years, many of which stem from the new characteristics of machine learning problems, such as learning from streaming data or incorporating human feedback into existing datasets and models. In these dynamic scenarios, data change over time and models must adapt. However, new data do not necessarily mean new patterns. The main goal of this paper is to devise a method to predict a model's performance metrics before it is trained, in order to decide whether it is worth training it or not. That is, will the model yield significantly better results than the current one? To address this issue, we propose the use of meta-learning. Specifically, we evaluate two different meta-models, one built for a specific machine learning problem, and another built based on many different problems, meant to be a generic meta-model, applicable to virtually any problem. In this paper, we focus only on the prediction of the root mean square error (RMSE). Results show that it is possible to accurately predict the RMSE of future models, even in streaming scenarios. Moreover, results also show that it is possible to reduce the need for re-training models by between 60% and 98%, depending on the problem and on the threshold used.
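The re-training decision described in this abstract can be sketched as follows. The RMSE definition is standard; the decision rule, the relative-improvement threshold and the function `should_retrain` are assumptions made for illustration, since the abstract does not state how the threshold is applied.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def should_retrain(current_rmse, predicted_rmse, threshold=0.05):
    """Hypothetical rule: retrain only if the RMSE predicted for the new
    model improves on the current model by more than `threshold`,
    expressed as a fraction of the current RMSE."""
    if current_rmse <= 0:
        return False  # The current model is already perfect on this metric.
    improvement = (current_rmse - predicted_rmse) / current_rmse
    return improvement > threshold


if __name__ == "__main__":
    current = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
    # A meta-model would supply the predicted RMSE; here it is a made-up value.
    print(should_retrain(current, predicted_rmse=1.0))
```

In this sketch a higher threshold skips more re-training runs, at the risk of keeping a model that has become stale.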
