Publications

Publications by Carlos Baquero

2023

Consistent comparison of symptom-based methods for COVID-19 infection detection

Authors
Rufino, J; Ramirez, JM; Aguilar, J; Baquero, C; Champati, J; Frey, D; Lillo, RE; Fernandez Anta, A;

Publication
INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS

Abstract
Background: During the global pandemic crisis, various detection methods of COVID-19-positive cases based on self-reported information were introduced to provide quick diagnosis tools for effectively planning and managing healthcare resources. These methods typically identify positive cases based on a particular combination of symptoms, and they have been evaluated using different datasets. Purpose: This paper presents a comprehensive comparison of various COVID-19 detection methods based on self-reported information using the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS), a large health surveillance platform, which was launched in partnership with Facebook. Methods: Detection methods were implemented to identify COVID-19-positive cases among UMD-CTIS participants reporting at least one symptom and a recent antigen test result (positive or negative) for six countries and two periods. Multiple detection methods were implemented for three different categories: rule-based approaches, logistic regression techniques, and tree-based machine-learning models. These methods were evaluated using different metrics including F1-score, sensitivity, specificity, and precision. An explainability analysis has also been conducted to compare methods. Results: Fifteen methods were evaluated for six countries and two periods. We identify the best method for each category: rule-based methods (F1-score: 51.48%-71.11%), logistic regression techniques (F1-score: 39.91%-71.13%), and tree-based machine learning models (F1-score: 45.07%-73.72%). According to the explainability analysis, the relevance of the reported symptoms in COVID-19 detection varies between countries and years. However, there are two variables consistently relevant across approaches: stuffy or runny nose, and aches or muscle pain. Conclusions: Regarding the categories of detection methods, evaluating detection methods using homogeneous data across countries and years provides a solid and consistent comparison. An explainability analysis of a tree-based machine-learning model can assist in identifying infected individuals specifically based on their relevant symptoms. This study is limited by the self-report nature of data, which cannot replace clinical diagnosis.
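As a reading aid, the four metrics named in this abstract can be computed from confusion-matrix counts as in the minimal sketch below. The function name `detection_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the paper or from UMD-CTIS data.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision and F1-score
    from confusion-matrix counts (hypothetical helper, not from the paper)."""
    # Sensitivity (recall): share of true positive cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of true negative cases that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: share of predicted positives that were truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1-score: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Made-up counts, for illustration only.
metrics = detection_metrics(tp=60, fp=20, tn=100, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score ignores true negatives, it is often preferred when positive cases are the class of interest, which may explain why the abstract reports it as the headline figure.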


2024

Performance and explainability of feature selection-boosted tree-based classifiers for COVID-19 detection

Authors
Rufino, J; Ramírez, JM; Aguilar, J; Baquero, C; Champati, J; Frey, D; Lillo, RE; Fernández Anta, A;

Publication
HELIYON

Abstract
In this paper, we evaluate the performance and analyze the explainability of machine learning models boosted by feature selection in predicting COVID-19-positive cases from self-reported information. In essence, this work describes a methodology to identify COVID-19 infections that considers the large amount of information collected by the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS). More precisely, this methodology performs a feature selection stage based on the recursive feature elimination (RFE) method to reduce the number of input variables without compromising detection accuracy. A tree-based supervised machine learning model is then optimized with the selected features to detect COVID-19-active cases. In contrast to previous approaches that use a limited set of selected symptoms, the proposed approach builds the detection engine considering a broad range of features including self-reported symptoms, local community information, vaccination acceptance, and isolation measures, among others. To implement the methodology, three different supervised classifiers were used: random forests (RF), light gradient boosting (LGB), and extreme gradient boosting (XGB). Based on data collected from the UMD-CTIS, we evaluated the detection performance of the methodology for four countries (Brazil, Canada, Japan, and South Africa) and two periods (2020 and 2021). The proposed approach was assessed in terms of various quality metrics: F1-score, sensitivity, specificity, precision, receiver operating characteristic (ROC), and area under the ROC curve (AUC). This work also shows the normalized daily incidence curves obtained by the proposed approach for the four countries. Finally, we perform an explainability analysis using Shapley values and feature importance to determine the relevance of each feature and the corresponding contribution for each country and each country/year.


2024

Pondering the Ugly Underbelly, and Whether Images Are Real

Authors
Hill, RK; Baquero, C;

Publication
Commun. ACM

Abstract
[No abstract available]


2025

Social Compliance With NPIs, Mobility Patterns, and Reproduction Number: Lessons From COVID-19 in Europe

Authors
Baccega, D; Aguilar, J; Baquero, C; Anta, AF; Ramirez, JM;

Publication
IEEE ACCESS

Abstract
Non-pharmaceutical interventions (NPIs), such as lockdowns, travel restrictions, and social distancing mandates, play a critical role in controlling the spread of infectious diseases by shaping human mobility patterns. Using COVID-19 as a case study, this research investigates the relationships between NPIs, mobility, and the effective reproduction number (R_t) across 13 European countries. We employ XGBoost regression models to estimate missing mobility data from NPIs and missing R_t values from mobility, achieving high accuracy. Additionally, using clustering techniques, we uncover national distinctions in social compliance. Northern European countries demonstrate higher adherence to NPIs than Southern Europe, which exhibits more variability in response to restrictions. These differences highlight the influence of cultural and social norms on public health outcomes. In general, our analysis reveals a strong correlation between NPIs and mobility reductions, highlighting the immediate impact of restrictions on population movement. However, the relationship between mobility and R_t is weaker and more nuanced, reflecting the time delays involved, as changes in mobility take time to influence transmission rates. These results underscore the interdependence of restrictions, mobility, and disease spread while demonstrating the potential for data-driven approaches to guide policy decisions. Our approach offers valuable insights for optimizing public health strategies and tailoring interventions to diverse cultural contexts during future health crises.


2016

Why logical clocks are easy

Authors
Baquero, C; Preguiça, N;

Publication
Queue

Abstract
[No abstract available]

2025

Distributed Generalized Linear Models: A Privacy-Preserving Approach

Authors
Tinoco, D; Menezes, R; Baquero, C;

Publication
COMPUTATIONAL STATISTICS

Abstract
This paper presents a novel approach to classical linear regression, enabling accurate model computation from data streams or in a distributed setting while preserving data privacy in federated environments. We extend this framework to generalized linear models (GLMs), ensuring scalability and adaptability to diverse data distributions while maintaining privacy-preserving properties. To assess the effectiveness of our approach, we conduct numerical studies on both simulated and real datasets, comparing our method with conventional maximum likelihood estimation for GLMs using iteratively reweighted least squares. Our results demonstrate the advantages of the proposed method in distributed and federated settings.
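The abstract does not spell out the mechanism, so the sketch below shows only one well-known way a linear regression can be fitted across sites without pooling raw rows: each site shares additive summary statistics, and a coordinator merges them and solves the normal equations. This is an assumption made for illustration, not the authors' method; the names `local_stats`, `merge_stats`, and `fit_line` and the data are hypothetical, and sharing aggregates is not by itself a formal privacy guarantee.

```python
def local_stats(xs, ys):
    """Summaries one site would share: n, sum x, sum y, sum x^2, sum x*y.
    (Hypothetical helper; raw observations never leave the site.)"""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)),
    )


def merge_stats(a, b):
    """Summaries are additive, so merging two sites is element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Solve the least-squares normal equations for y = intercept + slope * x."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Made-up data split across two sites; both lie on y = 2x + 1.
site_a = local_stats([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
site_b = local_stats([3.0, 4.0], [7.0, 9.0])

intercept, slope = fit_line(merge_stats(site_a, site_b))
print(intercept, slope)
```

Because the summaries simply add up, the same merge step also covers the streaming case: a new batch contributes its own `local_stats` and is merged into the running total.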
