2020
Authors
Cardoso, JS; Nguyen, HV; Heller, N; Abreu, PH; Isgum, I; Silva, W; Cruz, R; Amorim, JP; Patel, V; Roysam, B; Zhou, SK; Jiang, SB; Le, N; Luu, K; Sznitman, R; Cheplygina, V; Mateus, D; Trucco, E; Sureshjani, SA;
Publication
Interpretable and Annotation-Efficient Learning for Medical Image Computing - Third International Workshop, iMIMIC 2020, Second International Workshop, MIL3ID 2020, and 5th International Workshop, LABELS 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4-8, 2020, Proceedings
Abstract
2020
Authors
Pereira, RC; Abreu, PH; Rodrigues, PP;
Publication
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
Abstract
The missing data issue is often found in real-world datasets, and it is usually handled with imputation strategies that replace the missing values with new data. Recently, generative models such as Variational Autoencoders have been applied to this imputation task. However, they have always been used to perform the entire imputation, which has yielded limited results when compared with other state-of-the-art methods. In this work, a new approach called Variational Autoencoder Filter for Bayesian Ridge Imputation is introduced. It uses a Variational Autoencoder at the beginning of the imputation pipeline to filter the instances that are later used to fit a Bayesian ridge regression, which predicts the new values. The approach was compared with four state-of-the-art imputation methods using 10 datasets from the healthcare context covering clinical trials, all injected with missing values at different rates. The proposed approach significantly outperformed the remaining methods in all settings, achieving an overall improvement between 26% and 67%.
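To make the two-stage pipeline concrete, the following is a minimal sketch under loudly stated assumptions. The VAE itself is not shown: each row is assumed to already carry a filter score (hypothetically, a reconstruction error from a separately trained VAE), and rows scoring above a threshold are excluded from the regression fit. The filtering criterion, the names `impute_column` and `bayesian_ridge_1d`, and the single-feature setting are all illustrative assumptions, not the authors' implementation. The regression uses the posterior mean of a Bayesian linear model with fixed precisions, which coincides with ridge regression with penalty `lam / alpha`.

```python
from statistics import mean


def bayesian_ridge_1d(xs, ys, alpha=1.0, lam=1.0):
    """Posterior-mean slope and intercept of a 1-D Bayesian linear model.

    alpha: noise precision; lam: precision of the Gaussian prior on the slope.
    Both are held fixed here (a simplification: full Bayesian ridge
    implementations usually estimate them from the data). The intercept is
    left unpenalised by centring x and y.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + lam / alpha)
    return slope, y_bar - slope * x_bar


def impute_column(rows, target, feature, scores, threshold, alpha=1.0, lam=1.0):
    """Fill None entries of rows[i][target] using rows[i][feature].

    scores: one filter score per row, HYPOTHETICALLY a reconstruction error
    from a separately trained VAE (not shown). Only rows with an observed
    target and a score <= threshold are used to fit the regression.
    """
    train = [r for r, s in zip(rows, scores)
             if r[target] is not None and s <= threshold]
    slope, intercept = bayesian_ridge_1d(
        [r[feature] for r in train], [r[target] for r in train], alpha, lam)
    # Keep observed values; predict only the missing ones.
    return [r[target] if r[target] is not None
            else intercept + slope * r[feature]
            for r in rows]
```

For example, with rows following y = 2x + 1 plus one outlier (x = 3, y = 100) that receives a high filter score, calling `impute_column` with `lam=0.0` excludes the outlier and imputes 9.0 at x = 4. Larger values of `lam / alpha` shrink the slope toward zero.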
2019
Authors
Pereira, RC; Santos, MS; Rodrigues, PP; Abreu, PH;
Publication
Progress in Artificial Intelligence, 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3-6, 2019, Proceedings, Part II.
Abstract
Missing data is a problem found in real-world datasets that has a considerable impact on the learning process of classifiers. Although extensive work has been done in this field, the MNAR mechanism still remains a challenge for existing imputation methods, mainly because it is not related to any observed information. In healthcare contexts, MNAR is present in multiple scenarios, such as clinical trials in which participants may drop out of the study for reasons related to the outcome being measured. This work proposes an approach that uses different sources of information from the same healthcare context to improve the imputation quality and classification performance for datasets with missing data under MNAR. The experiments were performed with several databases from the medical context, and the results show that the use of multiple sources of data has a positive impact on the imputation error and classification performance. © 2019, Springer Nature Switzerland AG.
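Why MNAR is hard can be shown with a small sketch of one common way such missingness is simulated in experiments: self-masking, where the largest values of a variable are removed. This is an assumption for illustration; the paper's own injection protocol and its multi-source method are not reproduced here, and `inject_mnar` is a hypothetical helper.

```python
def inject_mnar(values, rate):
    """Self-masking MNAR: remove the largest `rate` fraction of `values`.

    Whether an entry goes missing depends only on its own value, which is
    exactly the information that is no longer observed afterwards.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    n_missing = int(round(rate * len(values)))
    # Indices sorted from largest to smallest value (ties keep input order).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    dropped = set(order[:n_missing])
    return [None if i in dropped else v for i, v in enumerate(values)]
```

Applied to `[5, 1, 9, 3, 7]` with `rate=0.4`, the two largest values are removed, giving `[5, 1, None, 3, None]`. The mean of the observed values is 3 while the true mean is 5, and no remaining variable in the dataset explains the gap, which is why additional sources of information can help.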
2020
Authors
Pereira, RC; Santos, JC; Amorim, JP; Rodrigues, PP; Abreu, PH;
Publication
28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2020, Bruges, Belgium, October 2-4, 2020
Abstract
Missing data is an issue often addressed with imputation strategies that replace the missing values with plausible ones. A trend in these strategies is the use of generative models, one of which is the Variational Autoencoder. However, the default loss function of this method gives the same importance to all data, whereas a more suitable solution should focus on the missing values. In this work, an extension of this method with a custom loss function, the Variational Autoencoder with Weighted Loss, is introduced. The method was compared with state-of-the-art generative models, and the results showed improvements higher than 40% in several settings. © ESANN 2020 - Proceedings, 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.
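The idea of a loss that focuses on the missing values can be sketched as a weighted reconstruction error. This is one plausible form, not the paper's exact formulation: it assumes reference values exist for the flagged entries (for instance, because missingness was artificially injected into a complete dataset), normalises by the total weight, and covers only the reconstruction term; a full VAE objective would also include the KL divergence term. The function name and the `weight` parameter are hypothetical.

```python
def weighted_mse(original, reconstructed, missing_mask, weight=2.0):
    """Mean squared reconstruction error where entries flagged in
    `missing_mask` count `weight` times as much as observed entries.

    Assumes reference values exist for the flagged entries, e.g. because the
    missingness was artificially injected into a complete dataset.
    """
    if len(original) != len(reconstructed) or len(original) != len(missing_mask):
        raise ValueError("inputs must have the same length")
    total, norm = 0.0, 0.0
    for x, x_hat, m in zip(original, reconstructed, missing_mask):
        w = weight if m else 1.0
        total += w * (x - x_hat) ** 2
        norm += w
    # Weighted average, so the loss scale does not grow with `weight` alone.
    return total / norm
```

With `weight=1.0` this reduces to the ordinary mean squared error, which treats every entry equally, as the abstract describes for the default loss. Raising `weight` makes errors on the missing entries dominate the gradient signal.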
2020
Authors
Pereira, RC; Santos, MS; Rodrigues, PP; Abreu, PH;
Publication
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
Abstract
Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.
2021
Authors
Abreu, PH; Rodrigues, PP; Fernández, A; Gama, J;
Publication
IDA
Abstract