2023
Authors
Portela, D; Amaral, R; Rodrigues, PP; Freitas, A; Costa, E; Fonseca, JA; Sousa Pinto, B;
Publication
HEALTH INFORMATION MANAGEMENT JOURNAL
Abstract
Background Quantifying and dealing with lack of consistency in administrative databases (namely, under-coding) requires tracking patients longitudinally without compromising anonymity, which is often a challenging task. Objective This study aimed to (i) assess and compare different hierarchical clustering methods on the identification of individual patients in an administrative database that does not easily allow tracking of episodes from the same patient; (ii) quantify the frequency of potential under-coding; and (iii) identify factors associated with such phenomena. Method We analysed the Portuguese National Hospital Morbidity Dataset, an administrative database registering all hospitalisations occurring in Mainland Portugal between 2011 and 2015. We applied different approaches of hierarchical clustering methods (either isolated or combined with partitional clustering methods), to identify potential individual patients based on demographic variables and comorbidities. Diagnosis codes were grouped into the Charlson and Elixhauser comorbidity defined groups. The algorithm displaying the best performance was used to quantify potential under-coding. A generalised mixed model (GML) of binomial regression was applied to assess factors associated with such potential under-coding. Results We observed that the hierarchical cluster analysis (HCA) + k-means clustering method with comorbidities grouped according to the Charlson defined groups was the algorithm displaying the best performance (with a Rand Index of 0.99997). We identified potential under-coding in all Charlson comorbidity groups, ranging from 3.5% (overall diabetes) to 27.7% (asthma). Overall, being male, having a medical admission, dying during hospitalisation or being admitted to more specific and complex hospitals were associated with increased odds of potential under-coding.
Discussion We assessed several approaches to identify individual patients in an administrative database and, subsequently, by applying the HCA + k-means algorithm, we tracked coding inconsistency and potentially improved data quality. We reported consistent potential under-coding in all defined groups of comorbidities and potential factors associated with such lack of completeness. Conclusion Our proposed methodological framework could both enhance data quality and act as a reference for other studies relying on databases with similar problems.
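The Rand Index reported in the abstract above measures how well two partitions of the same episodes agree: it is the fraction of episode pairs that both partitions treat the same way (together in both, or apart in both). The following is a minimal sketch of that definition only, not the authors' evaluation code; the function name `rand_index` and the toy labels are illustrative assumptions.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions place the two
    items in the same cluster, or both place them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label lists must have the same length")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agreements += same_true == same_pred
        total_pairs += 1

    # With fewer than two items there are no pairs to disagree on.
    return agreements / total_pairs if total_pairs else 1.0


# Hypothetical toy data: four episodes, true patient ids vs. cluster ids.
identical_up_to_renaming = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index counts pairs, cluster label names do not matter, only which episodes end up grouped together. The pairwise loop is quadratic in the number of episodes, so a dataset of national scale would need a contingency-table formulation instead.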
2023
Authors
Portela, D; Rodrigues, PP; Freitas, A; Costa, E; Bousquet, J; Fonseca, JA; Pinto, BS;
Publication
JOURNAL OF ASTHMA
Abstract
Background: Most previous studies assessing multimorbidity in asthma assessed the frequency of individual comorbid diseases. Objective: We aimed to assess the frequency and clinical and economic impact of co-occurring groups of comorbidities (comorbidity patterns using the Charlson Comorbidity Index) on asthma hospitalizations. Methods: We assessed the dataset containing a registration of all Portuguese hospitalizations between 2011 and 2015. We applied three different approaches (regression models, association rule mining, and decision trees) to assess both the frequency and impact of comorbidity patterns in the length-of-stay, in-hospital mortality and hospital charges. For each approach, separate analyses were performed for episodes with asthma as main and as secondary diagnosis. Separate analyses were performed by participants' age group. Results: We assessed 198340 hospitalizations in patients >18 years old. Both in hospitalizations with asthma as main or secondary diagnosis, combinations of diseases involving cancer, metastasis, cerebrovascular disease, hemiplegia/paraplegia, and liver disease displayed a relevant clinical and economic burden. In hospitalizations having asthma as a secondary diagnosis, we identified several comorbidity patterns involving asthma and associated with increased length-of-stay (average impact of 1.3 [95%CI=0.6-2.0]-3.2 [95%CI=1.8-4.6] additional days), in-hospital mortality (OR range=1.4 [95%CI=1.0-2.0]-7.9 [95%CI=2.6-23.5]) and hospital charges (average additional charges of 351.0 [95%CI=219.1-482.8] to 1470.8 [95%CI=1004.6-1937.0] Euro) compared with hospitalizations without any registered Charlson comorbidity. Consistent results were observed with association rule mining and decision tree approaches.
Conclusions: Our findings highlight the importance not only of a complete assessment of patients with asthma, but also of considering the presence of asthma in patients admitted for other diseases, as it may have a relevant impact on clinical and health services outcomes.
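Association rule mining, one of the three approaches named in the abstract above, scores a rule "antecedent implies consequent" with support, confidence and lift. The sketch below computes these three standard measures over episodes represented as sets of labels. It illustrates the metrics only and is not the authors' pipeline; the function name `rule_metrics`, the label strings and the four toy episodes are hypothetical.

```python
def rule_metrics(episodes, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    episodes   : non-empty list of sets of labels, one set per hospitalization
    antecedent : labels on the left-hand side of the rule
    consequent : labels on the right-hand side of the rule
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    n = len(episodes)
    if n == 0:
        raise ValueError("at least one episode is required")

    n_antecedent = sum(1 for e in episodes if antecedent <= e)
    n_consequent = sum(1 for e in episodes if consequent <= e)
    n_both = sum(1 for e in episodes if (antecedent | consequent) <= e)

    support = n_both / n                      # P(antecedent and consequent)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Hypothetical toy episodes, not data from the study.
toy_episodes = [
    {"asthma", "diabetes"},
    {"asthma", "diabetes", "long_stay"},
    {"asthma"},
    {"diabetes", "long_stay"},
]
toy_result = rule_metrics(toy_episodes, {"asthma", "diabetes"}, {"long_stay"})
```

A lift above 1 indicates that the outcome is more frequent among episodes matching the comorbidity pattern than in the dataset overall; a lift of 1 indicates no association.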
2024
Authors
Pereira, RC; Abreu, PH; Rodrigues, PP;
Publication
JOURNAL OF COMPUTATIONAL SCIENCE
Abstract
Missing data is an issue that can negatively impact any task performed with the available data and it is often found in real-world domains such as healthcare. One of the most common strategies to address this issue is to perform imputation, where the missing values are replaced by estimates. Several approaches based on statistics and machine learning techniques have been proposed for this purpose, including deep learning architectures such as generative adversarial networks and autoencoders. In this work, we propose a novel siamese neural network suitable for missing data imputation, which we call Siamese Autoencoder-based Approach for Imputation (SAEI). Besides having a deep autoencoder architecture, SAEI also has a custom loss function and triplet mining strategy that are tailored for the missing data issue. The proposed SAEI approach is compared to seven state-of-the-art imputation methods in an experimental setup that comprises 14 heterogeneous datasets of the healthcare domain injected with Missing Not At Random values at a rate between 10% and 60%. The results show that SAEI significantly outperforms all the remaining imputation methods for all experimented settings, achieving an average improvement of 35%. This work is an extension of the article Siamese Autoencoder-Based Approach for Missing Data Imputation [1] presented at the International Conference on Computational Science 2023. It includes new experiments focused on runtime, generalization capabilities, and the impact of the imputation in classification tasks, where the results show that SAEI is the imputation method that induces the best classification results, improving the F1 scores for 50% of the used datasets.
2024
Authors
Pereira, RC; Abreu, PH; Rodrigues, PP; Figueiredo, MAT;
Publication
EXPERT SYSTEMS WITH APPLICATIONS
Abstract
Experimental assessments of different missing data imputation methods often compute error rates between the original values and the estimated ones. This experimental setup relies on complete datasets that are injected with missing values. The injection process is straightforward for the Missing Completely At Random and Missing At Random mechanisms; however, the Missing Not At Random mechanism poses a major challenge, since the available artificial generation strategies are limited. Furthermore, the studies focused on this latter mechanism tend to disregard a comprehensive baseline of state-of-the-art imputation methods. In this work, both challenges are addressed: four new Missing Not At Random generation strategies are introduced and a benchmark study is conducted to compare six imputation methods in an experimental setup that covers 10 datasets and five missingness levels (10% to 80%). The overall findings are that, for most missing rates and datasets, the best imputation method to deal with Missing Not At Random values is the Multiple Imputation by Chained Equations, whereas for higher missingness rates autoencoders show promising results.
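The experimental setup described above (inject missing values into a complete dataset, impute, then measure the error against the original values) can be sketched for a single numeric feature as follows. The injection rule used here, masking the largest values so that missingness depends on the unobserved value itself, is a simple textbook Missing Not At Random scheme chosen for illustration. It is an assumption and is not claimed to be one of the four strategies introduced in the paper. Mean imputation stands in for any imputation method, and all function names are hypothetical.

```python
def inject_mnar_upper(values, missing_rate):
    """Mask the largest values of a feature (None marks a missing entry).

    Missingness depends on the value that goes missing, which is what
    makes this a Missing Not At Random mechanism.
    """
    n_missing = int(round(missing_rate * len(values)))
    by_value_desc = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    masked = set(by_value_desc[:n_missing])
    injected = [None if i in masked else v for i, v in enumerate(values)]
    return injected, masked


def mean_impute(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a feature with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def mae_on_masked(original, imputed, masked):
    """Mean absolute error, computed only on the entries that were masked."""
    if not masked:
        raise ValueError("no masked entries to evaluate")
    return sum(abs(original[i] - imputed[i]) for i in masked) / len(masked)


# Hypothetical complete feature with a 20% missing rate.
original = [1.0, 2.0, 3.0, 4.0, 10.0]
injected, masked = inject_mnar_upper(original, 0.2)
imputed = mean_impute(injected)
error = mae_on_masked(original, imputed, masked)
```

In this toy run the largest value is masked and the observed mean underestimates it by a wide margin, which illustrates why Missing Not At Random data is harder to impute than data missing completely at random.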