Publicacoes - INESC TEC

Publicações

Publicações por Pedro Henriques Abreu

2020

Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes

Autores
Pereira, RC; Santos, MS; Rodrigues, PP; Abreu, PH;

Publicação
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH

Abstract
Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.

FecharLer Abstract

2021

Advances in Intelligent Data Analysis XIX - 19th International Symposium on Intelligent Data Analysis, IDA 2021, Porto, Portugal, April 26-28, 2021, Proceedings

Autores
Abreu, PH; Rodrigues, PP; Fernández, A; Gama, J;

Publicação
IDA

Abstract

2021

Artificial Intelligence in Medicine - 19th International Conference on Artificial Intelligence in Medicine, AIME 2021, Virtual Event, June 15-18, 2021, Proceedings

Autores
Tucker, A; Abreu, PH; Cardoso, JS; Rodrigues, PP; Riaño, D;

Publicação
AIME

Abstract

2021

Preface

Autores
Abreu, PH; Rodrigues, PP; Fernández, A; Gama, J;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

2022

On the joint-effect of class imbalance and overlap: a critical review

Autores
Santos, MS; Abreu, PH; Japkowicz, N; Fernandez, A; Soares, C; Wilk, S; Santos, J;

Publicação
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Current research on imbalanced data recognises that class imbalance is aggravated by other data intrinsic characteristics, among which class overlap stands out as one of the most harmful. The combination of these two problems creates a new and difficult scenario for classification tasks and has been discussed in several research works over the past two decades. In this paper, we argue that despite some insightful information can be derived from related research, the joint-effect of class overlap and imbalance is still not fully understood, and advocate for the need to move towards a unified view of the class overlap problem in imbalanced domains. To that end, we start by performing a thorough analysis of existing literature on the joint-effect of class imbalance and overlap, elaborating on important details left undiscussed on the original papers, namely the impact of data domains with different characteristics and the behaviour of classifiers with distinct learning biases. This leads to the hypothesis that class overlap comprises multiple representations, which are important to accurately measure and analyse in order to provide a full characterisation of the problem. Accordingly, we devise two novel taxonomies, one for class overlap measures and the other for class overlap-based approaches, both resonating with the distinct representations of class overlap identified. This paper therefore presents a global and unique view on the joint-effect of class imbalance and overlap, from precursor work to recent developments in the field. It meticulously discusses some concepts taken as implicit in previous research, explores new perspectives in light of the limitations found, and presents new ideas that will hopefully inspire researchers to move towards a unified view on the problem and the development of suitable strategies for imbalanced and overlapped domains.

FecharLer Abstract

2022

Partial Multiple Imputation With Variational Autoencoders: Tackling Not at Randomness in Healthcare Data

Autores
Pereira, RC; Abreu, PH; Rodrigues, PP;

Publicação
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS

Abstract
Missing data can pose severe consequences in critical contexts, such as clinical research based on routinely collected healthcare data. This issue is usually handled with imputation strategies, but these tend to produce poor and biased results under the Missing Not At Random (MNAR) mechanism. A recent trend that has been showing promising results for MNAR is the use of generative models, particularly Variational Autoencoders. However, they have a limitation: the imputed values are the result of a single sample, which can be biased. To tackle it, an extension to the Variational Autoencoder that uses a partial multiple imputation procedure is introduced in this work. The proposed method was compared to 8 state-of-the-art imputation strategies, in an experimental setup with 34 datasets from the medical context, injected with the MNAR mechanism (10% to 80% rates). The results were evaluated through the Mean Absolute Error, with the new method being the overall best in 71% of the datasets, significantly outperforming the remaining ones, particularly for high missing rates. Finally, a case study of a classification task with heart failure data was also conducted, where this method induced improvements in 50% of the classifiers.

FecharLer Abstract