2024
Autores
Silva, IOE; Soares, C; Sousa, I; Ghani, R;
Publicação
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II
Abstract
Arbitrary, inconsistent, or faulty decision-making raises serious concerns, and preventing unfair models is an increasingly important challenge in Machine Learning. Data often reflect past discriminatory behavior, and models trained on such data may reflect bias on sensitive attributes, such as gender, race, or age. One approach to developing fair models is to preprocess the training data to remove the underlying biases while preserving the relevant information, for example, by correcting biased labels. While multiple label noise correction methods are available, the information about their behavior in identifying discrimination is very limited. In this work, we develop an empirical methodology to systematically evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets. Our methodology involves manipulating the amount of label noise and can be used with fairness benchmarks but also with standard ML datasets. We apply the methodology to analyze six label noise correction methods according to several fairness metrics on standard OpenML datasets. Our results suggest that the Hybrid Label Noise Correction [20] method achieves the best trade-off between predictive performance and fairness. Clustering-Based Correction [14] can reduce discrimination the most, however, at the cost of lower predictive performance.
2024
Autores
Strecht, P; Mendes Moreira, J; Soares, C;
Publicação
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II
Abstract
A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems, tailoring models for each. However, handling a large number of individual models can pose challenges in understanding organization-wide phenomena. Recent studies focus on using decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite efforts, the resulting models may still be incomplete, i.e., not able to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of independent variables. Two approaches are introduced: synthetic dataset creation followed by decision tree training and a specialized algorithm for creating a decision tree from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.
2024
Autores
Roberto, GF; Pereira, DC; Martins, AS; Tosta, TAA; Soares, C; Lumini, A; Rozendo, GB; Neves, LA; Nascimento, MZ;
Publicação
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I
Abstract
Covid-19 is a serious disease caused by the Sars-CoV-2 virus that has been first reported in China at late 2019 and has rapidly spread around the world. As the virus affects mostly the lungs, chest X-rays are one of the safest and most accessible ways of diagnosing the infection. In this paper, we propose the use of an approach for detecting Covid-19 in chest X-ray images through the extraction and classification of local and global percolation-based features. The method was applied in two datasets: one containing 2,002 segmented samples split into two classes (Covid-19 and Healthy); and another containing 1,125 non-segmented samples split into three classes (Covid-19, Healthy and Pneumonia). The 48 obtained percolation features were given as input to six different classifiers and then AUC and accuracy values were evaluated. We employed the 10-fold cross-validation method and evaluated the lesion sub-types with binary and multiclass classification using the Hermite Polynomial classifier, which had never been employed in this context. This classifier provided the best overall results when compared to other five machine learning algorithms. These results based in the association of percolation features and Hermite polynomial can contribute to the detection of the lesions by supporting specialists in clinical practices.
2024
Autores
Cerqueira, V; Moniz, N; Inácio, R; Soares, C;
Publicação
CoRR
Abstract
2024
Autores
Lopes, TRS; Roberto, GF; Soares, C; Tosta, TAA; Silva, AB; Loyola, AM; Cardoso, SV; de Faria, PR; do Nascimento, MZ; Neves, LA;
Publicação
Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2024, Volume 2: VISAPP, Rome, Italy, February 27-29, 2024.
Abstract
In this work, a method based on the use of explainable artificial intelligence techniques with multiscale and multidimensional fractal techniques is presented in order to investigate histological images stained with Hematoxylin-Eosin. The CNN GoogLeNet neural activation patterns were explored, obtained from the gradient-weighted class activation mapping and locally-interpretable model-agnostic explanation techniques. The feature vectors were generated with multiscale and multidimensional fractal techniques, specifically fractal dimension, lacunarity and percolation. The features were evaluated by ranking each entry, using the ReliefF algorithm. The discriminative power of each solution was defined via classifiers with different heuristics. The best results were obtained from LIME, with a significant increase in accuracy and AUC rates when compared to those provided by GoogLeNet. The details presented here can contribute to the development of models aimed at the classification of histological images. © 2024 by SCITEPRESS – Science and Technology Publications, Lda.
2024
Autores
Baghcheband, H; Soares, C; Reis, LP;
Publicação
FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2024
Abstract
Data valuation, the process of assigning value to data based on its utility and usefulness, is a critical and largely unexplored aspect of data markets. Within the Machine Learning Data Market (MLDM), a platform that enables data exchange among multiple agents, the challenge of quantifying the value of data becomes particularly prominent. Agents within MLDM are motivated to exchange data based on its potential impact on their individual performance. Shapley Value-based methods have gained traction in addressing this challenge, prompting our study to investigate their effectiveness within the MLDM context. Specifically, we propose the Gain Data Shapley Value (GDSV) method tailored for MLDM and compare it to the original data valuation method used in MLDM. Our analysis focuses on two common learning algorithms, Decision Tree (DT) and K-nearest neighbors (KNN), within a simulated society of five agents, tested on 45 classification datasets. results show that the GDSV leads to incremental improvements in predictive performance across both DT and KNN algorithms compared to performance-based valuation or the baseline. These findings underscore the potential of Shapley Value-based methods in identifying high-value data within MLDM while indicating areas for further improvement.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.