Publications

Publications by Carlos Manuel Soares

2024

Detection of Covid-19 in Chest X-Ray Images Using Percolation Features and Hermite Polynomial Classification

Authors
Roberto, GF; Pereira, DC; Martins, AS; Tosta, TAA; Soares, C; Lumini, A; Rozendo, GB; Neves, LA; Nascimento, MZ;

Publication
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I

Abstract
Covid-19 is a serious disease caused by the SARS-CoV-2 virus that was first reported in China in late 2019 and has rapidly spread around the world. As the virus affects mostly the lungs, chest X-rays are one of the safest and most accessible ways of diagnosing the infection. In this paper, we propose an approach for detecting Covid-19 in chest X-ray images through the extraction and classification of local and global percolation-based features. The method was applied to two datasets: one containing 2,002 segmented samples split into two classes (Covid-19 and Healthy); and another containing 1,125 non-segmented samples split into three classes (Covid-19, Healthy and Pneumonia). The 48 obtained percolation features were given as input to six different classifiers and then AUC and accuracy values were evaluated. We employed the 10-fold cross-validation method and evaluated the lesion sub-types with binary and multiclass classification using the Hermite Polynomial classifier, which had never been employed in this context. This classifier provided the best overall results when compared to the other five machine learning algorithms. These results, based on the association of percolation features and the Hermite polynomial classifier, can contribute to the detection of the lesions by supporting specialists in clinical practice.

2023

Machine Learning Data Markets: Evaluating the Impact of Data Exchange on the Agent Learning Performance

Authors
Baghcheband, H; Soares, C; Reis, LP;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
In recent years, the increasing availability of distributed data has led to a growing interest in transfer learning across multiple nodes. However, local data may not be adequate to learn sufficiently accurate models, and the problem of learning from multiple distributed sources remains a challenge. To address this issue, Machine Learning Data Markets (MLDM) have been proposed as a potential solution. In MLDM, autonomous agents exchange relevant data in a cooperative relationship to improve their models. Previous research has shown that data exchange can lead to better models, but this has been demonstrated with only two agents. In this paper, we present an extended evaluation of a simple version of the MLDM framework in a collaborative scenario. Our experiments show that data exchange has the potential to improve learning performance, even in a simple version of MLDM. The findings indicate a direct correlation between the number of agents and the gained performance, while an inverse correlation was observed between the performance and the data batch sizes. The results of this study provide important insights into the effectiveness of MLDM and how it can be used to improve learning performance in distributed systems. By increasing the number of agents, a more efficient system can be achieved, while larger data batch sizes can decrease the global performance of the system. These observations highlight the importance of considering both the number of agents and the data batch sizes when designing distributed learning systems using the MLDM framework.

2023

tsMorph: generation of semi-synthetic time series to understand algorithm performance

Authors
dos Santos, MR; de Carvalho, ACPLF; Soares, C;

Publication
CoRR

Abstract

2024

Time Series Data Augmentation as an Imbalanced Learning Problem

Authors
Cerqueira, V; Moniz, N; Inácio, R; Soares, C;

Publication
Progress in Artificial Intelligence - 23rd EPIA Conference on Artificial Intelligence, EPIA 2024, Viana do Castelo, Portugal, September 3-6, 2024, Proceedings, Part II

Abstract
Recent state-of-the-art forecasting methods are trained on collections of time series. These methods, often referred to as global models, can capture common patterns in different time series to improve their generalization performance. However, they require large amounts of data that might not be available. Moreover, global models may fail to capture relevant patterns unique to a particular time series. In these cases, data augmentation can be useful to increase the sample size of time series datasets. The main contribution of this work is a novel method for generating univariate time series synthetic samples. Our approach stems from the insight that the observations concerning a particular time series of interest represent only a small fraction of all observations. In this context, we frame the problem of training a forecasting model as an imbalanced learning task. Oversampling strategies are popular approaches used to handle the imbalance problem in machine learning. We use these techniques to create synthetic time series observations and improve the accuracy of forecasting models. We carried out experiments using 7 different databases that contain a total of 5502 univariate time series. We found that the proposed solution outperforms both a global and a local model, thus providing a better trade-off between these two approaches. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

2024

Association of Grad-CAM, LIME and Multidimensional Fractal Techniques for the Classification of H&E Images

Authors
Lopes, TRS; Roberto, GF; Soares, C; Tosta, TAA; Silva, AB; Loyola, AM; Cardoso, SV; de Faria, PR; do Nascimento, MZ; Neves, LA;

Publication
Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2024, Volume 2: VISAPP, Rome, Italy, February 27-29, 2024.

Abstract
In this work, a method based on the use of explainable artificial intelligence techniques with multiscale and multidimensional fractal techniques is presented in order to investigate histological images stained with Hematoxylin-Eosin. The CNN GoogLeNet neural activation patterns were explored, obtained from the gradient-weighted class activation mapping and locally-interpretable model-agnostic explanation techniques. The feature vectors were generated with multiscale and multidimensional fractal techniques, specifically fractal dimension, lacunarity and percolation. The features were evaluated by ranking each entry, using the ReliefF algorithm. The discriminative power of each solution was defined via classifiers with different heuristics. The best results were obtained from LIME, with a significant increase in accuracy and AUC rates when compared to those provided by GoogLeNet. The details presented here can contribute to the development of models aimed at the classification of histological images. © 2024 by SCITEPRESS – Science and Technology Publications, Lda.

2021

Preface

Authors
Soares, C; Torgo, L;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract