Publications

Publications by LIAAD

2017

Influence of data distribution in missing data imputation

Authors
Santos M.S.; Soares J.P.; Abreu P.H.; Araújo H.; Santos J.;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Dealing with missing data is a crucial step in the preprocessing stage of most data mining projects. Especially in healthcare contexts, addressing this issue is fundamental, since it may result in keeping or losing critical patient information that can help physicians in their daily clinical practice. Over the years, many researchers have addressed this problem, basing their approach on the implementation of a set of imputation techniques and evaluating their performance in classification tasks. These classic approaches, however, do not consider some intrinsic data information that could be related to the performance of those algorithms, such as the distribution of each feature. Establishing a correspondence between data distribution and the most appropriate imputation method avoids the need to repeatedly test a large set of methods, since it provides a heuristic for the best choice for each feature in the study. The goal of this work is to understand the relationship between data distribution and the performance of well-known imputation techniques, such as Mean, Decision Trees, k-Nearest Neighbours, Self-Organizing Maps and Support Vector Machines imputation. Several publicly available datasets, all complete, were selected according to several characteristics such as number of distributions, features and instances. Missing values were artificially generated at different percentages and the imputation methods were evaluated in terms of Predictive and Distributional Accuracy. Our findings show that there is a relationship between feature distribution and algorithm performance, although some factors must be taken into account, such as the number of features per distribution and the missing rate at stake.
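
The evaluation protocol described in this abstract (remove values from a complete dataset, impute them, compare against the known originals) can be illustrated with a minimal sketch. It is not the authors' code: it assumes values are removed completely at random, covers only Mean imputation, and uses root-mean-square error as a stand-in for predictive accuracy. The function names and the sample feature are hypothetical.

```python
import math
import random


def inject_missing(values, rate, seed=0):
    """Return a copy of `values` with round(rate * n) entries set to None.

    Positions are chosen uniformly at random (an assumed
    missing-completely-at-random mechanism).
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def mean_impute(values):
    """Replace every None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def rmse_on_missing(original, masked, imputed):
    """Root-mean-square error computed only over the removed positions."""
    errors = [
        (o - i) ** 2
        for o, m, i in zip(original, masked, imputed)
        if m is None
    ]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical complete feature; 20% of its values are removed, then imputed.
complete = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
masked = inject_missing(complete, rate=0.2)
imputed = mean_impute(masked)
error = rmse_on_missing(complete, masked, imputed)
```

Repeating this loop per feature, per missing rate and per imputation method would give the kind of comparison the abstract refers to.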

2017

HCC Survival

Authors
Santos, MS; Abreu, PH; García Laencina, PJ; Simão, A; Carvalho, A;

Publication

Abstract

2017

Agents and Multi-Agent Systems for Health Care - 10th International Workshop, A2HC 2017, São Paulo, Brazil, May 8, 2017, and International Workshop, A-HEALTH 2017, Porto, Portugal, June 21, 2017, Revised and Extended Selected Papers

Authors
Montagna, S; Abreu, PH; Giroux, S; Schumacher, MI;

Publication
A2HC@AAMAS/A-HEALTH@PAAMS

Abstract

2017

Influence of Data Distribution in Missing Data Imputation

Authors
Santos, MS; Soares, JP; Abreu, PH; Araújo, H; Santos, JAM;

Publication
Artificial Intelligence in Medicine - 16th Conference on Artificial Intelligence in Medicine, AIME 2017, Vienna, Austria, June 21-24, 2017, Proceedings

Abstract

2017

On modifying the temporal modeling of HSMMs for pediatric heart sound segmentation

Authors
Oliveira, J; Mantadelis, T; Renna, F; Gomes, P; Coimbra, M;

Publication
2017 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS)

Abstract
Heart sounds are difficult to interpret because a) they are composed of several different sounds, all contained in very tight time windows; b) they vary with physiognomy even if they show similar characteristics; c) human ears are not naturally trained to recognize heart sounds. Computer-assisted decision systems may help, but they require robust signal processing algorithms. In this paper, we use a real-life dataset to compare the performance of a hidden Markov model and several hidden semi-Markov models that use the Poisson, Gaussian and Gamma distributions, as well as a non-parametric probability mass function, to model the sojourn time. Using a subject-dependent approach, a model that uses the Poisson distribution as an approximation for the sojourn time is shown to outperform all other models. This model was able to recreate the "true" state sequence with a positive predictability per state of 96%. Finally, we used a conditional distribution to compute the confidence of our classifications. By using the proposed confidence metric, we were able to identify wrong classifications and boost our system, on average, from approximately 83% to approximately 90% positive predictability per sample.
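
To make the sojourn-time modelling concrete, the sketch below contrasts two duration distributions: the geometric one that a standard hidden Markov model implies through its self-transition probability, and a Poisson one of the kind a hidden semi-Markov model can use instead. It is an illustration only, not the paper's implementation; the function names, the self-transition probability and the mean duration of 12.5 frames are hypothetical.

```python
import math


def geometric_pmf(duration, self_transition):
    """Duration distribution implied by an HMM state.

    The state is kept for duration - 1 steps, then left:
    P(d) = a**(d - 1) * (1 - a), for d >= 1.
    """
    return self_transition ** (duration - 1) * (1.0 - self_transition)


def poisson_pmf(duration, mean_duration):
    """Poisson sojourn-time model: P(d) = exp(-lam) * lam**d / d!."""
    return (
        math.exp(-mean_duration)
        * mean_duration ** duration
        / math.factorial(duration)
    )


# Hypothetical state lasting about 12.5 frames on average.
durations = range(1, 60)
most_likely_poisson = max(durations, key=lambda d: poisson_pmf(d, 12.5))
most_likely_geometric = max(durations, key=lambda d: geometric_pmf(d, 0.92))
```

The geometric model always places its highest probability on the shortest duration, whereas the Poisson model peaks near its mean, which is one reason an explicit duration model can suit states with a typical length.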

2017

A Data-Driven Feature Extraction Method for Enhanced Phonocardiogram Segmentation

Authors
Renna, F; Oliveira, J; Coimbra, MT;

Publication
2017 COMPUTING IN CARDIOLOGY (CINC)

Abstract
In this work, we present a method to extract features from heart sound signals in order to enhance segmentation performance. The approach is data-driven, since the way features are extracted from the recorded signals is adapted to the data itself. The proposed method is based on the extraction of delay vectors, which are modeled with Gaussian mixture model priors, and an information-theoretic dimensionality reduction step which aims to maximize discrimination between delay vectors in different segments of the heart sound signal. We test our approach with heart sounds from the publicly available PhysioNet dataset, showing an average F1 score of 92.6% in detecting S1 and S2 sounds.
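
As a rough illustration of the first step named in this abstract, the sketch below builds delay vectors from a sampled signal using the common time-delay embedding definition. The abstract does not state the paper's exact construction, so the embedding dimension, the delay and the function name are assumptions, and the Gaussian mixture modelling and dimensionality reduction steps are not shown.

```python
def delay_vectors(signal, dim, delay=1):
    """Return all vectors [x[t], x[t + delay], ..., x[t + (dim - 1) * delay]].

    This is the usual time-delay embedding; `dim` and `delay` are
    illustrative parameters, not values taken from the paper.
    """
    span = (dim - 1) * delay  # distance between first and last sample used
    return [
        [signal[t + k * delay] for k in range(dim)]
        for t in range(len(signal) - span)
    ]


# Hypothetical short signal standing in for a heart sound recording.
samples = [1, 2, 3, 4, 5]
vectors = delay_vectors(samples, dim=3, delay=1)
```

Each resulting vector summarizes a short window of the signal and could then be passed to a statistical model.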
