
Publications by Pedro Pereira Rodrigues

2008

Hierarchical clustering of time-series data streams

Authors
Rodrigues, PP; Gama, J; Pedroso, JP;

Publication
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Abstract
This paper presents and analyzes an incremental system for clustering streaming time series. The Online Divisive-Agglomerative Clustering (ODAC) system continuously maintains a tree-like hierarchy of clusters that evolves with data, using a top-down strategy. The splitting criterion is a correlation-based dissimilarity measure among time series, splitting each node by the farthest pair of streams. The system also uses a merge operator that reaggregates a previously split node in order to react to changes in the correlation structure between time series. The split and merge operators are triggered in response to changes in the diameters of existing clusters, assuming that in stationary environments, expanding the structure leads to a decrease in the diameters of the clusters. The system is designed to process thousands of data streams that flow at a high rate. The main features of the system include update time and memory consumption that do not depend on the number of examples in the stream. Moreover, the time and memory required to process an example decrease whenever the cluster structure expands. Experimental results on artificial and real data assess the processing qualities of the system, suggesting competitive performance on clustering streaming time series and also exploring its ability to deal with concept drift.
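The abstract describes the splitting criterion only in words. As an illustration, the following Python sketch shows how a leaf could keep constant-size sufficient statistics for each pair of streams, derive a correlation-based dissimilarity from them, and pick the farthest pair as split candidates. The dissimilarity form `sqrt((1 - r) / 2)` and the names `PairStats`, `dissimilarity`, and `farthest_pair` are assumptions for illustration, not taken from the paper.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


class PairStats:
    """Hypothetical sufficient statistics for the correlation of two streams."""

    def __init__(self) -> None:
        self.n = 0
        self.sum_a = 0.0
        self.sum_b = 0.0
        self.sum_a2 = 0.0
        self.sum_b2 = 0.0
        self.sum_ab = 0.0

    def update(self, a: float, b: float) -> None:
        # O(1) time and memory per example, independent of stream length.
        self.n += 1
        self.sum_a += a
        self.sum_b += b
        self.sum_a2 += a * a
        self.sum_b2 += b * b
        self.sum_ab += a * b

    def correlation(self) -> float:
        # Pearson correlation from running sums; returns 0.0 if a stream is constant.
        if self.n < 2:
            return 0.0
        cov = self.sum_ab - self.sum_a * self.sum_b / self.n
        var_a = self.sum_a2 - self.sum_a ** 2 / self.n
        var_b = self.sum_b2 - self.sum_b ** 2 / self.n
        if var_a <= 0.0 or var_b <= 0.0:
            return 0.0
        return cov / math.sqrt(var_a * var_b)


def dissimilarity(r: float) -> float:
    # Assumed correlation-based dissimilarity in [0, 1]: 0 when r = 1, 1 when r = -1.
    return math.sqrt(max(0.0, (1.0 - r) / 2.0))


Pair = Tuple[int, int]


def farthest_pair(stats: Dict[Pair, PairStats]) -> Pair:
    # The pair with the largest dissimilarity defines the leaf's diameter
    # and would seed the two new child clusters on a split.
    return max(stats, key=lambda pair: dissimilarity(stats[pair].correlation()))


# Toy example with three streams observed over four time steps.
streams = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with stream 0
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with streams 0 and 1
]
stats = {pair: PairStats() for pair in combinations(range(len(streams)), 2)}
for t in range(len(streams[0])):
    for (i, j), s in stats.items():
        s.update(streams[i][t], streams[j][t])

print(farthest_pair(stats))  # (0, 2): ties are broken by the first pair encountered
```

Running sums keep the per-example cost constant but can lose precision over very long streams; a Welford-style update is a common, numerically safer alternative.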

2006

ODAC: Hierarchical Clustering of Time Series Data Streams

Authors
Rodrigues, PP; Gama, J; Pedroso, JP;

Publication
PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING

Abstract
This paper presents a time series whole clustering system that incrementally constructs a tree-like hierarchy of clusters, using a top-down strategy. The Online Divisive-Agglomerative Clustering (ODAC) system uses a correlation-based dissimilarity measure between time series over a data stream and possesses an agglomerative phase to enhance a dynamic behavior capable of concept drift detection. Main features include splitting and agglomerative criteria based on the diameters of existing clusters and supported by a significance level. At each new example, only the leaves are updated, reducing computation of unneeded dissimilarities and speeding up the process every time the structure grows. Experimental results on artificial and real data suggest competitive performance on clustering time series and show that the system is equivalent to a batch divisive clustering on stationary time series, being also capable of dealing with concept drift. With this work, we assure the possibility and importance of hierarchical incremental time series whole clustering in the data stream paradigm, presenting a valuable and usable option.
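The abstract states that the splitting and agglomerative criteria are supported by a significance level but does not give the test. One standard way to attach a confidence level to a decision made from streaming estimates is the Hoeffding bound; the Python sketch below assumes that choice, a dissimilarity range of 1, and a simplified rule (split when the largest dissimilarity in a leaf exceeds the second largest by more than the bound). The function names and the default `delta` are hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # With probability 1 - delta, the true mean of a variable with range
    # `value_range` lies within epsilon of the mean observed over n examples.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(d1: float, d2: float, n: int,
                 delta: float = 0.05, value_range: float = 1.0) -> bool:
    # Hypothetical rule: d1 and d2 are the largest and second-largest pairwise
    # dissimilarities in a leaf; split only if their gap is statistically reliable.
    return (d1 - d2) > hoeffding_bound(value_range, delta, n)


print(should_split(0.90, 0.85, n=100))     # False: gap 0.05 < epsilon ~ 0.122
print(should_split(0.90, 0.85, n=10_000))  # True: gap 0.05 > epsilon ~ 0.012
```

Because epsilon shrinks roughly as 1/sqrt(n), a leaf whose two candidate pairs are close simply waits for more examples before splitting.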

2024

Achieving rapid and significant results in healthcare services by using the theory of constraints

Authors
Bacelar Silva, GM; Cox, JF III; Rodrigues, P;

Publication
HEALTH SYSTEMS

Abstract
Lack of timeliness and capacity are seen as fundamental problems that jeopardise healthcare delivery systems everywhere. Many believe the shortage of medical providers is causing this timeliness problem. This action research presents how one doctor implemented the theory of constraints (TOC) to improve the throughput (quantity of patients treated) of his ophthalmology imaging practice by 64% in a few weeks with little to no expense. The five focusing steps (5FS) guided the TOC implementation - which included the drum-buffer-rope scheduling and buffer management - and occurred in a matter of days. The implementation provided significant bottom-line results almost immediately. This article explains each step of the 5FS in general terms followed by specific applications to healthcare services, as well as the detailed use in this action research. Although TOC successfully addressed the practice problems, this implementation was not sustained after the TOC champion left the organisation. However, this drawback provided valuable knowledge. The article provides insightful knowledge to help readers implement TOC in their environments to provide immediate and significant results at little to no expense.

2022

Helping early obstructive sleep apnea diagnosis with machine learning: A systematic review (Preprint)

Authors
Ferreira-Santos, D; Amorim, P; Silva Martins, T; Monteiro-Soares, M; Pereira Rodrigues, P;

Publication

Abstract
BACKGROUND

The American Academy of Sleep Medicine guidelines suggest that clinical prediction algorithms can be used to screen patients for obstructive sleep apnea (OSA) without replacing polysomnography (PSG), the gold standard.

OBJECTIVE

We aimed to identify, gather, and analyze existing machine learning approaches that are being used for disease screening in adult patients suspected of OSA.

METHODS

We searched the MEDLINE, Scopus, and ISI Web of Knowledge databases for studies evaluating the validity of different machine learning techniques, with PSG as the gold-standard outcome measure. This systematic review was registered in PROSPERO under reference CRD42021221339.

RESULTS

Our search retrieved 5479 articles, of which 63 were included. We found 23 studies performing diagnostic model development alone, 26 with added internal validation, and 14 applying the clinical prediction algorithm to an independent sample (although not all reported the most common discrimination metrics: sensitivity and/or specificity). Logistic regression was applied in 35 studies, linear regression in 16, support vector machines in 9, neural networks in 8, decision trees in 6, and Bayesian networks in 4. Random forest, discriminant analysis, classification and regression trees, and nomograms were each used in 2 studies, while Pearson correlation, adaptive neuro-fuzzy inference systems, artificial immune recognition systems, genetic algorithms, supersparse linear integer models, and the k-nearest neighbors algorithm were each used in 1 study. The best AUC was .98 [.96-.99], for a logistic regression with age, waist circumference, Epworth somnolence, and oxygen saturation as predictors.

CONCLUSIONS

Although high values were obtained, these models still lack external validation in large cohorts and a standard definition of OSA criteria.

2020

COVID-19 surveillance - a descriptive study on data quality issues

Authors
Costa-Santos, C; Luísa Neves, A; Correia, R; Santos, P; Monteiro-Soares, M; Freitas, A; Ribeiro-Vaz, I; Henriques, T; Rodrigues, PP; Costa-Pereira, A; Pereira, AM; Fonseca, J;

Publication

Abstract
BACKGROUND

High-quality data is crucial for guiding decision making and practicing evidence-based healthcare, especially if previous knowledge is lacking. Nevertheless, data quality frailties have been exposed worldwide during the current COVID-19 pandemic. Focusing on a major Portuguese surveillance dataset, our study aims to assess data quality issues and suggest possible solutions.

METHODS

On April 27th 2020, the Portuguese Directorate-General of Health (DGS) made available a dataset (DGSApril) for researchers, upon request. On August 4th, an updated dataset (DGSAugust) was also obtained. The quality of data was assessed through analysis of data completeness and consistency between both datasets.

RESULTS

DGSAugust did not follow the same data format and variables as DGSApril, and a significant amount of missing data and inconsistencies were found (e.g., 4,075 cases from DGSApril were apparently not included in DGSAugust). Several variables also showed a low degree of completeness and/or changed their values from one dataset to the other (e.g., for the variable ‘underlying conditions’, more than half of the cases showed different information between datasets). There were also significant inconsistencies between the numbers of COVID-19 cases and deaths shown in DGSAugust and those in the daily public DGS reports.

CONCLUSIONS

The low quality of COVID-19 surveillance datasets limits their usability to inform good decisions and perform useful research. Major improvements in surveillance datasets are therefore urgently needed (e.g., simplification of data entry processes, constant monitoring of data, and increased training and awareness of health care providers), as low data quality may lead to deficient pandemic control.
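The Methods describe two checks: completeness within a dataset and consistency between two releases. A minimal Python sketch of both follows, assuming each release is a mapping from a case identifier to a record of variables; the function names, field names, and toy records are hypothetical and do not reflect the actual DGS schema or data.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def completeness(records: Dict[str, Record], variable: str) -> float:
    # Share of cases where `variable` is present and non-empty.
    if not records:
        return 0.0
    filled = sum(1 for r in records.values() if r.get(variable) not in (None, ""))
    return filled / len(records)


def missing_cases(earlier: Dict[str, Record], later: Dict[str, Record]) -> List[str]:
    # Case IDs in the earlier release that do not appear in the later one.
    return sorted(set(earlier) - set(later))


def changed_fraction(earlier: Dict[str, Record], later: Dict[str, Record],
                     variable: str) -> float:
    # Among cases present in both releases, share whose value for `variable` differs.
    shared = set(earlier) & set(later)
    if not shared:
        return 0.0
    changed = sum(1 for cid in shared
                  if earlier[cid].get(variable) != later[cid].get(variable))
    return changed / len(shared)


# Hypothetical toy releases (not real surveillance data).
april = {
    "c1": {"age": "54", "underlying_conditions": "yes"},
    "c2": {"age": None, "underlying_conditions": "no"},
    "c3": {"age": "70", "underlying_conditions": None},
}
august = {
    "c1": {"age": "54", "underlying_conditions": "no"},
    "c2": {"age": "61", "underlying_conditions": "no"},
}

print(completeness(april, "age"))                                # 0.666...
print(missing_cases(april, august))                              # ['c3']
print(changed_fraction(april, august, "underlying_conditions"))  # 0.5
```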

2020

Excess mortality during COVID-19 in five European countries and a critique of mortality analysis data

Authors
Felix-Cardoso, J; Vasconcelos, H; Rodrigues, P; Cruz-Correia, R;

Publication

Abstract
INTRODUCTION

The COVID-19 pandemic is an ongoing event disrupting lives, health systems, and economies worldwide. Clear data about the pandemic's impact are lacking, namely regarding mortality. This work aims to study the impact of COVID-19 through the analysis of all-cause mortality data made available by different European countries, and to critique their mortality surveillance data.

METHODS

European countries that had publicly available data on the number of deaths per day or week were selected (England and Wales, France, Italy, the Netherlands, and Portugal). Two methods were used to estimate excess mortality due to COVID-19: deviation from the expected value for homologous periods (DEV) and the remainder after seasonal time series decomposition (RSTS). We estimated total, age-specific, and gender-specific excess mortality. Furthermore, we compared different policy responses to COVID-19.

RESULTS

Excess mortality was found in all 5 countries, ranging from 10.6% in Portugal (DEV) to 98.5% in Italy (DEV). Furthermore, excess mortality was higher than COVID-attributed deaths in all 5 countries.

DISCUSSION

The impact of COVID-19 on mortality appears to be larger than officially attributed deaths, to varying degrees in different countries. Comparisons between countries would be useful, but large disparities in mortality surveillance data could not be overcome. Unreliable data, and even a lack of cause-specific mortality data, undermine the understanding of the impact of policy choices on both direct and indirect deaths during COVID-19. European countries should invest more in mortality surveillance systems to improve the publicly available data.
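The DEV method compares observed deaths with an expected value derived from homologous periods. The abstract does not define the baseline, so the Python sketch below assumes that the expected count for each week is the mean of the same week in previous years, and reports excess as a percentage of the expected total; the function name and toy counts are hypothetical, and the RSTS method (seasonal decomposition) is not shown.

```python
from statistics import mean
from typing import Dict, List


def excess_mortality_dev(observed: Dict[int, float],
                         history: Dict[int, List[float]]) -> float:
    """Percent excess deaths over the weeks in `observed`.

    observed: deaths per week number in the period of interest.
    history:  deaths in the same week number for each previous year (assumed baseline).
    """
    total_observed = sum(observed.values())
    # Expected deaths: mean of homologous weeks in earlier years.
    total_expected = sum(mean(history[week]) for week in observed)
    return 100.0 * (total_observed - total_expected) / total_expected


# Hypothetical toy counts, not data from the study.
observed = {10: 120, 11: 130}
history = {10: [100, 100, 100], 11: [100, 110, 90]}
print(excess_mortality_dev(observed, history))  # 25.0
```

Age- and gender-specific estimates would apply the same calculation to the corresponding subsets of counts.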
