Publications - INESC TEC

Publications

Publications by Pedro Henriques Abreu

2017

An artificial neural networks approach for assessment treatment response in oncological patients using PET/CT images

Authors
Nogueira, MA; Abreu, PH; Martins, P; Machado, P; Duarte, H; Santos, J;

Publication
BMC MEDICAL IMAGING

Abstract
Background: Positron Emission Tomography - Computed Tomography (PET/CT) imaging is the basis for evaluating the response to treatment of several oncological diseases. In practice, this evaluation is performed manually by specialists, which is complex and time-consuming. Evaluation measures have been proposed, but their reliability is questionable. The use of before- and after-treatment image descriptors of the lesions for treatment response evaluation remains largely unexplored. Methods: In this project, Artificial Neural Network approaches were implemented to automatically assess the treatment response of patients suffering from neuroendocrine tumors and Hodgkin lymphoma, based on image features extracted from PET/CT. Results: The results show that the considered set of features allows very high classification performance, especially when the data is properly balanced. Conclusions: After synthetic data generation and PCA-based dimensionality reduction to only two components, LVQNN achieved classification accuracies of 100%, 100%, 96.3% and 100% for the 4 response-to-treatment classes.

2023

A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research

Authors
Santos, MS; Abreu, PH; Japkowicz, N; Fernandez, A; Santos, J;

Publication
INFORMATION FUSION

Abstract
The combination of class imbalance and overlap is currently one of the most challenging issues in machine learning. While seminal work focused on establishing class overlap as a complicating factor for classification tasks in imbalanced domains, ongoing research mostly concerns the study of their synergy over real-world applications. However, given the lack of a well-formulated definition and measurement of class overlap in real-world domains, especially in the presence of class imbalance, the research community has not yet reached a consensus on the characterisation of both problems. This naturally complicates the evaluation of existing approaches that address these issues simultaneously and prevents future research from moving towards devising specialised solutions. In this work, we advocate for a unified view of the problem of class overlap in imbalanced domains. Acknowledging class overlap as the overarching problem, since it has proven to be more harmful for classification tasks than class imbalance, we start by discussing the key concepts associated with its definition, identification, and measurement in real-world domains, while advocating for a characterisation of the problem that attends to multiple sources of complexity. We then provide an overview of existing data complexity measures and establish the link to the specific types of class overlap problems these measures cover, proposing a novel taxonomy of class overlap complexity measures. Additionally, we characterise the relationship between measures and the insights they provide, and discuss to what extent they account for class imbalance. Finally, we systematise the current body of knowledge on the topic across several branches of Machine Learning (Data Analysis, Data Preprocessing, Algorithm Design, and Meta-learning), identifying existing limitations and discussing possible lines for future research.

2020

How distance metrics influence missing data imputation with k-nearest neighbours

Authors
Santos, MS; Abreu, PH; Wilk, S; Santos, J;

Publication
PATTERN RECOGNITION LETTERS

Abstract
In missing data contexts, k-nearest neighbours imputation has proven beneficial since it takes advantage of the similarity between patterns to replace missing values. When dealing with heterogeneous data, researchers traditionally apply the HEOM distance, which handles continuous, nominal and missing data. Although other heterogeneous distances have been proposed, they have not yet been investigated and compared for k-nearest neighbours imputation. In this work, we study the effect of several heterogeneous distances on k-nearest neighbours imputation on a large benchmark of publicly available datasets.
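
As an illustration of the technique named in this abstract, the following is a minimal sketch of k-nearest neighbours imputation with the HEOM (Heterogeneous Euclidean-Overlap Metric) distance. It is not the authors' implementation. The function names `heom` and `knn_impute`, the use of `None` to mark missing values, and the choice of mean (continuous) or mode (nominal) as the replacement value are assumptions made for this example.

```python
import math
from collections import Counter


def heom(x, y, nominal, ranges):
    """HEOM distance between two records; None marks a missing value."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # a missing value contributes the maximal distance
        elif a in nominal:
            d = 0.0 if xa == ya else 1.0  # overlap metric for nominal features
        else:
            # range-normalised absolute difference for continuous features
            d = abs(xa - ya) / ranges[a] if ranges[a] else 0.0
        total += d * d
    return math.sqrt(total)


def knn_impute(data, nominal, k=3):
    """Return a copy of data with missing values filled from the k nearest donors."""
    n_features = len(data[0])
    ranges = {}
    for a in range(n_features):
        if a not in nominal:
            observed = [row[a] for row in data if row[a] is not None]
            ranges[a] = max(observed) - min(observed) if observed else 0.0

    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for a, value in enumerate(row):
            if value is not None:
                continue
            # donors are the other records in which feature a is observed
            donors = [r for j, r in enumerate(data) if j != i and r[a] is not None]
            donors.sort(key=lambda r: heom(row, r, nominal, ranges))
            nearest = [r[a] for r in donors[:k]]
            if not nearest:
                continue  # no donor available; leave the value missing
            if a in nominal:
                imputed[i][a] = Counter(nearest).most_common(1)[0][0]
            else:
                imputed[i][a] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy data: feature 1 is nominal, features 0 and 2 are continuous.
records = [
    [1.0, "a", 10.0],
    [2.0, "a", None],
    [9.0, "b", 50.0],
    [1.5, "a", 12.0],
]
filled = knn_impute(records, nominal={1}, k=2)
```

Swapping `heom` for another heterogeneous distance only requires changing the sort key, which is the kind of comparison the abstract describes.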

2020

Assessing the Impact of Distance Functions on K-Nearest Neighbours Imputation of Biomedical Datasets

Authors
Santos, MS; Abreu, PH; Wilk, S; Santos, JAM;

Publication
Artificial Intelligence in Medicine - 18th International Conference on Artificial Intelligence in Medicine, AIME 2020, Minneapolis, MN, USA, August 25-28, 2020, Proceedings

Abstract
In healthcare domains, dealing with missing data is crucial since absent observations compromise the reliability of decision support models. K-nearest neighbours imputation has proven beneficial since it takes advantage of the similarity between patients to replace missing values. Nevertheless, its performance largely depends on the distance function used to evaluate such similarity. In the literature, k-nearest neighbours imputation frequently neglects the nature of the data or performs feature transformation, whereas in this work, we study the impact of different heterogeneous distance functions on k-nearest neighbours imputation for biomedical datasets. Our results show that distance functions considerably impact the performance of classifiers learned from the imputed data, especially when the data is complex.

2018

Cross-Validation for Imbalanced Datasets: Avoiding Overoptimistic and Overfitting Approaches

Authors
Santos, MS; Soares, JP; Abreu, PH; Araujo, H; Santos, J;

Publication
IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE

Abstract
Although cross-validation is a standard procedure for performance evaluation, its joint application with oversampling remains an open question for researchers less familiar with the imbalanced data topic. A frequent experimental flaw is the application of oversampling algorithms to the entire dataset, resulting in biased models and overly optimistic estimates. We emphasize and distinguish overoptimism from overfitting, showing that the former is associated with the cross-validation procedure, while the latter is influenced by the chosen oversampling algorithm. Furthermore, we perform a thorough empirical comparison of well-established oversampling algorithms, supported by a data complexity analysis. The best oversampling techniques seem to possess three key characteristics: use of cleaning procedures, cluster-based example synthesis and adaptive weighting of minority examples. Among them, Synthetic Minority Oversampling Technique coupled with Tomek Links and Majority Weighted Minority Oversampling Technique stand out, being capable of increasing the discriminative power of the data.
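
The experimental flaw described in this abstract can be avoided by oversampling inside each cross-validation iteration, on the training partition only. The sketch below illustrates that protocol under stated assumptions: plain random oversampling stands in for the algorithms compared in the paper, and the function names `random_oversample`, `stratified_folds` and `cv_splits_with_oversampling` are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def stratified_folds(y, n_folds, rng):
    """Split indices into folds that roughly preserve the class proportions."""
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def cv_splits_with_oversampling(X, y, n_folds=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test); only the training part is oversampled."""
    rng = random.Random(seed)
    folds = stratified_folds(y, n_folds, rng)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Oversampling happens after the split, so no copy of a test
        # example can leak into the training partition.
        X_train, y_train = random_oversample(X_train, y_train, rng)
        X_test = [X[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        yield X_train, y_train, X_test, y_test


# Hypothetical toy dataset: 4 minority and 16 majority examples.
X = [(i,) for i in range(20)]
y = [1] * 4 + [0] * 16
splits = list(cv_splits_with_oversampling(X, y, n_folds=4, seed=0))
```

Each test fold keeps the original class distribution, so the resulting estimate reflects performance on imbalanced data rather than on an artificially balanced set.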

2018

BI-RADS CLASSIFICATION OF BREAST CANCER: A NEW PRE-PROCESSING PIPELINE FOR DEEP MODELS TRAINING

Authors
Domingues, I; Abreu, PH; Santos, J;

Publication
2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)

Abstract
One of the main difficulties in the use of deep learning strategies in medical contexts is the training set size. While these methods need large annotated training sets, these datasets are costly to obtain in medical contexts and suffer from intra- and inter-subject variability. In the present work, two new pre-processing techniques are introduced to improve the performance of a deep classifier. First, data augmentation based on co-registration is suggested. Then, multi-scale enhancement based on Difference of Gaussians is proposed. Results are assessed on a public mammogram database, InBreast, in the context of an ordinal problem, the BI-RADS classification. Moreover, a pre-trained Convolutional Neural Network with the AlexNet architecture was used as a base classifier. The multi-class classification experiments show that the proposed pipeline with the Difference of Gaussians and the data augmentation technique outperforms both the original dataset alone and the original dataset augmented by mirroring the images.
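
The abstract does not detail how the multi-scale enhancement is computed. As a generic illustration of the Difference of Gaussians operator it names, the sketch below subtracts a coarser Gaussian blur from a finer one at several scale pairs. The function names, the kernel truncation at three standard deviations, the border clamping and the sigma values are all assumptions made for this example, not the pipeline used in the paper.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(3 * sigma + 0.5))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    blurred = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(blurred), kernel))


def difference_of_gaussians(image, sigma_small, sigma_large):
    """Band-pass response: fine blur minus coarse blur."""
    fine = gaussian_blur(image, sigma_small)
    coarse = gaussian_blur(image, sigma_large)
    return [[f - c for f, c in zip(fr, cr)] for fr, cr in zip(fine, coarse)]


def multi_scale_dog(image, sigma_pairs):
    """One Difference of Gaussians response per (sigma_small, sigma_large) pair."""
    return [difference_of_gaussians(image, s1, s2) for s1, s2 in sigma_pairs]


# Hypothetical 9x9 test image with a single bright pixel in the centre.
point = [[0.0] * 9 for _ in range(9)]
point[4][4] = 1.0
responses = multi_scale_dog(point, [(1.0, 2.0), (2.0, 4.0)])
```

Small structures produce a strong response at the fine scale pair and a weaker one at the coarse pair, which is why responses at several scales are usually combined.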