Publications - INESC TEC

Publications

Publications by Carlos Manuel Soares

2018

Smart energy management as a means towards improved energy efficiency

Authors
Lindert, Dt; de Sá, CR; Soares, C; Knobbe, AJ;

Publication
CoRR

Abstract

2019

Preference rules for label ranking: Mining patterns in multi-target relations

Authors
de Sá, CR; Azevedo, PJ; Soares, C; Jorge, AM; Knobbe, AJ;

Publication
CoRR

Abstract

2023

Exploring the Reduction of Configuration Spaces of Workflows

Authors
Freitas, F; Brazdil, P; Soares, C;

Publication
Discovery Science - 26th International Conference, DS 2023, Porto, Portugal, October 9-11, 2023, Proceedings

Abstract
Many current AutoML platforms include a very large space of alternatives (the configuration space) that makes it difficult to identify the best alternative for a given dataset. In this paper we explore a method that can reduce a large configuration space to a significantly smaller one and thus help reduce the search time for the potentially best workflow. We empirically validate the method on a set of workflows that include four ML algorithms (SVM, RF, LogR and LD) with different sets of hyperparameters. Our results show that it is possible to reduce the given space by more than one order of magnitude, from a few thousand to tens of workflows, while the risk that the best workflow is eliminated is nearly zero. After reduction, the system is about one order of magnitude faster than the original one while maintaining the same predictive accuracy and loss. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
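The abstract does not spell out the reduction procedure, so the following is only a minimal sketch of one generic way to shrink a workflow portfolio, not the method of the paper. It assumes a hypothetical table `scores[dataset][workflow]` of past performance results and applies a greedy set-cover heuristic: a workflow "covers" a dataset if it scores within `tolerance` of the best workflow on that dataset, and workflows are kept until every dataset is covered. The function name and the tolerance parameter are illustrative assumptions.

```python
from typing import Dict, List


def reduce_configuration_space(
    scores: Dict[str, Dict[str, float]],
    tolerance: float = 0.01,
) -> List[str]:
    """Greedy set-cover reduction of a workflow portfolio (illustrative sketch).

    scores[dataset][workflow] is a performance score (higher is better).
    A workflow covers a dataset if its score is within `tolerance` of the
    best score on that dataset. Workflows are added greedily until every
    dataset is covered by at least one kept workflow.
    """
    # For each dataset, the set of near-best workflows.
    near_best = {}
    for dataset, by_workflow in scores.items():
        best = max(by_workflow.values())
        near_best[dataset] = {
            w for w, s in by_workflow.items() if s >= best - tolerance
        }

    all_workflows = sorted({w for by_wf in scores.values() for w in by_wf})
    uncovered = set(scores)
    kept: List[str] = []
    while uncovered:
        # Keep the workflow that is near-best on the most still-uncovered datasets.
        chosen = max(
            all_workflows,
            key=lambda w: sum(1 for d in uncovered if w in near_best[d]),
        )
        kept.append(chosen)
        uncovered = {d for d in uncovered if chosen not in near_best[d]}
    return kept
```

The loop always terminates because each dataset's best workflow is in its own near-best set, so every iteration covers at least one dataset. With a table such as `{"d1": {"svm_a": 0.90, "rf_a": 0.89, ...}, ...}`, a larger tolerance lets fewer workflows stand in for the whole space, at a higher risk of discarding the true best one.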


2023

Federated Learning for Computer-Aided Diagnosis of Glaucoma Using Retinal Fundus Images

Authors
Baptista, T; Soares, C; Oliveira, T; Soares, F;

Publication
APPLIED SCIENCES-BASEL

Abstract
Deep learning approaches require a large amount of data to be transferred to centralized entities. However, this is often not a feasible option in healthcare, as it raises privacy concerns over sharing sensitive information. Federated Learning (FL) aims to address this issue by allowing machine learning without transferring the data to a centralized entity. FL has shown great potential to ensure privacy in digital healthcare while maintaining performance. Despite this, there is a lack of research on the impact of different types of data heterogeneity on the results. In this study, we investigate the robustness of various FL strategies under different data distributions and data quality for glaucoma diagnosis using retinal fundus images. We use RetinaQualEvaluator to generate quality labels for the datasets and then a data distributor to achieve our desired distributions. Finally, we evaluate the performance of the different strategies on local data and on an independent test dataset. We observe that federated learning shows the potential to enable high-performance models without compromising sensitive data. Furthermore, we infer that FedProx is more suitable for scenarios where the distributions and quality of the participating clients' data are diverse, at a lower communication cost.
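To make the FedProx strategy mentioned above more concrete, here is a minimal sketch on a toy one-parameter model `y ≈ w * x`, not the study's implementation (which works with deep networks on fundus images). FedProx adds a proximal term `(mu / 2) * (w - w_global)**2` to each client's local loss, discouraging local models from drifting far from the current global model when client data are heterogeneous; the server then averages the client models weighted by sample count, as in FedAvg. The function names, learning rate and epoch count are illustrative assumptions.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_fedprox_update(w_global: float, data: Dataset, mu: float,
                         lr: float = 0.05, epochs: int = 20) -> float:
    """Client-side FedProx training for the toy model y ~ w * x (sketch).

    Runs gradient descent on mean squared error plus the proximal term
    (mu / 2) * (w - w_global) ** 2.
    """
    w = w_global
    n = len(data)
    for _ in range(epochs):
        # Gradient of the local mean squared error.
        grad = sum(2 * (w * x - y) * x for x, y in data) / n
        # Gradient of the proximal term pulls w back towards w_global.
        grad += mu * (w - w_global)
        w -= lr * grad
    return w


def fedprox_round(w_global: float, clients: List[Dataset], mu: float) -> float:
    """One communication round: local updates, then a sample-weighted average."""
    updates = [local_fedprox_update(w_global, data, mu) for data in clients]
    sizes = [len(data) for data in clients]
    return sum(w * n for w, n in zip(updates, sizes)) / sum(sizes)
```

Only the parameter `w` is exchanged, so the raw `(x, y)` pairs stay on each client. Setting `mu=0` recovers a FedAvg-style local update; a larger `mu` keeps each client's model closer to the global one.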


2024

Systematic Analysis of the Impact of Label Noise Correction on ML Fairness

Authors
Silva, IOE; Soares, C; Sousa, I; Ghani, R;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
Arbitrary, inconsistent, or faulty decision-making raises serious concerns, and preventing unfair models is an increasingly important challenge in Machine Learning. Data often reflect past discriminatory behavior, and models trained on such data may reflect bias on sensitive attributes, such as gender, race, or age. One approach to developing fair models is to preprocess the training data to remove the underlying biases while preserving the relevant information, for example, by correcting biased labels. While multiple label noise correction methods are available, little is known about how well they identify discrimination. In this work, we develop an empirical methodology to systematically evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets. Our methodology involves manipulating the amount of label noise and can be used with fairness benchmarks as well as with standard ML datasets. We apply the methodology to analyze six label noise correction methods according to several fairness metrics on standard OpenML datasets. Our results suggest that the Hybrid Label Noise Correction [20] method achieves the best trade-off between predictive performance and fairness. Clustering-Based Correction [14] reduces discrimination the most, but at the cost of lower predictive performance.
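The methodology above hinges on injecting a controlled amount of biased label noise and then measuring fairness. The abstract does not state the exact noise model or metrics, so the sketch below assumes one simple option: positive labels of the protected group are flipped to negative with probability `rate`, and fairness is measured with the demographic parity difference, one common group-fairness metric. Both function names are hypothetical.

```python
import random
from typing import List, Sequence


def inject_biased_noise(labels: Sequence[int], protected: Sequence[bool],
                        rate: float, seed: int = 0) -> List[int]:
    """Flip positive labels (1 -> 0) of protected-group instances with probability `rate`.

    This simulates historical discrimination against the protected group
    (an assumed noise model, for illustration only).
    """
    rng = random.Random(seed)
    noisy = []
    for y, is_protected in zip(labels, protected):
        if is_protected and y == 1 and rng.random() < rate:
            noisy.append(0)
        else:
            noisy.append(y)
    return noisy


def demographic_parity_difference(preds: Sequence[int],
                                  protected: Sequence[bool]) -> float:
    """|P(pred = 1 | protected) - P(pred = 1 | unprotected)|; 0 means parity."""
    prot = [y for y, p in zip(preds, protected) if p]
    unprot = [y for y, p in zip(preds, protected) if not p]
    return abs(sum(prot) / len(prot) - sum(unprot) / len(unprot))
```

In an evaluation loop of this kind, one would sweep `rate`, apply a label noise correction method to the noisy labels, train a model on the corrected labels, and compare its predictive performance and `demographic_parity_difference` against a model trained on the original labels.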


2024

Symbolic Data Analysis to Improve Completeness of Model Combination Methods

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems, tailoring models for each. However, handling a large number of individual models can make it difficult to understand organization-wide phenomena. Recent studies focus on using decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite these efforts, the resulting models may still be incomplete, i.e., unable to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of independent variables. Two approaches are introduced: synthetic dataset creation followed by decision tree training, and a specialized algorithm for creating a decision tree from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on several datasets.
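The first approach described above (synthetic dataset creation followed by decision tree training) can be sketched as follows, under explicit assumptions that are not taken from the paper: rules are axis-aligned intervals over numeric variables, and the "rough estimate" of each independent variable's distribution is a uniform range. The sketch samples synthetic points, labels those covered by a rule, and reports coverage, where coverage below 1 signals an incomplete rule set. All names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# A rule: (conditions mapping variable -> half-open interval [low, high), class label).
Rule = Tuple[Dict[str, Tuple[float, float]], str]
Point = Dict[str, float]


def matching_class(rules: List[Rule], point: Point) -> Optional[str]:
    """Return the class of the first rule whose conditions all hold, or None if uncovered."""
    for conditions, label in rules:
        if all(lo <= point[v] < hi for v, (lo, hi) in conditions.items()):
            return label
    return None


def synthetic_dataset(rules: List[Rule], ranges: Dict[str, Tuple[float, float]],
                      n: int, seed: int = 0) -> Tuple[List[Tuple[Point, str]], float]:
    """Sample n points uniformly within per-variable ranges and label them with the rules.

    Returns the labeled points and the coverage, i.e. the fraction of sampled
    points that at least one rule classifies.
    """
    rng = random.Random(seed)
    labeled: List[Tuple[Point, str]] = []
    covered = 0
    for _ in range(n):
        point = {v: rng.uniform(lo, hi) for v, (lo, hi) in ranges.items()}
        label = matching_class(rules, point)
        if label is not None:
            covered += 1
            labeled.append((point, label))
    return labeled, covered / n
```

The labeled points could then be passed to any decision-tree learner (for example scikit-learn's `DecisionTreeClassifier`). Because a trained tree assigns a class to every input, the resulting model covers the whole decision space, including regions where no original rule applied.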
