Publications by LIAAD

2015

A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients

Authors
Santos, MS; Abreu, PH; Garcia Laencina, PJ; Simao, A; Carvalho, A;

Publication
JOURNAL OF BIOMEDICAL INFORMATICS

Abstract
Liver cancer is the sixth most frequently diagnosed cancer and, particularly, Hepatocellular Carcinoma (HCC) represents more than 90% of primary liver cancers. Clinicians assess each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient, given the biological variability among individuals. Over the years, and for the particular case of Hepatocellular Carcinoma, some research studies have been developing strategies for assisting clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from the clinical data. However, these studies have some limitations that have not yet been addressed: some do not focus entirely on Hepatocellular Carcinoma patients, others have strict application boundaries, and none considers the heterogeneity between patients nor the presence of missing data, a common drawback in healthcare contexts. In this work, a real complex Hepatocellular Carcinoma database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach robust to small and imbalanced datasets, which accounts for the heterogeneity of patients with Hepatocellular Carcinoma. The preprocessing procedures of this work are based on data imputation considering appropriate distance metrics for both heterogeneous and missing data (HEOM) and clustering studies to assess the underlying patient groups in the studied dataset (K-means). The final approach is applied in order to diminish the impact of underlying patient profiles with reduced sizes on survival prediction. It is based on K-means clustering and the SMOTE algorithm to build a representative dataset and use it as training example for different machine learning procedures (logistic regression and neural networks). 
The results are evaluated in terms of survival prediction and compared across baseline approaches that do not consider clustering and/or oversampling using the Friedman rank test. Our proposed methodology coupled with neural networks outperformed all others, suggesting an improvement over the classical approaches currently used in Hepatocellular Carcinoma prediction models.
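
The abstract names HEOM as the distance used for heterogeneous and missing data. As a rough illustration only, the sketch below follows the commonly cited definition of the Heterogeneous Euclidean-Overlap Metric: overlap distance for nominal attributes, range-normalised difference for numeric ones, and the maximum distance of 1 when either value is missing. The function name, the use of `None` for missing values, and the `ranges` argument are assumptions made for this example, not details taken from the paper.

```python
import math

def heom(x, y, is_nominal, ranges):
    """Illustrative HEOM distance between two records.

    x, y       -- sequences of attribute values; None marks a missing value (assumption)
    is_nominal -- sequence of booleans, True where the attribute is categorical
    ranges     -- per-attribute (max - min) observed in the data; ignored for nominal ones
    """
    total = 0.0
    for a, b, nominal, rng in zip(x, y, is_nominal, ranges):
        if a is None or b is None:
            d = 1.0                              # missing value: maximum distance
        elif nominal:
            d = 0.0 if a == b else 1.0           # overlap metric for categories
        else:
            d = abs(a - b) / rng if rng else 0.0  # range-normalised difference
        total += d * d
    return math.sqrt(total)

# Example: one numeric attribute, one nominal attribute, one attribute missing in x
print(heom([1.0, "a", None], [3.0, "a", 2.0], [False, True, False], [4.0, None, 10.0]))
```

A distance of this kind could serve both nearest-neighbour imputation and the neighbour search inside SMOTE, since it never requires a complete numeric record.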

2014

The harmonic and noise information of the glottal pulses in speech

Authors
Sousa, R; Ferreira, A; Alku, P;

Publication
BIOMEDICAL SIGNAL PROCESSING AND CONTROL

Abstract
This paper presents an algorithm, in the context of speech analysis and pathologic/dysphonic voices evaluation, which splits the signal of the glottal excitation into harmonic and noise components. The algorithm uses a harmonic and noise splitter and a glottal inverse filtering. The combination of these two functionalities leads to an improved estimation of the glottal excitation and its components. The results demonstrate this improvement of estimates of the glottal excitation in comparison to a known inverse filtering method (IAIF). These results comprise performance tests with synthetic voices and application to natural voices that show the waveforms of harmonic and noise components of the glottal excitation. This enhances the glottal information retrieval such as waveform patterns with physiological meaning.

2014

Fast Incremental Matrix Factorization for Recommendation with Positive-Only Feedback

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
USER MODELING, ADAPTATION, AND PERSONALIZATION, UMAP 2014

Abstract
Traditional Collaborative Filtering algorithms for recommendation are designed for stationary data. Likewise, conventional evaluation methodologies are only applicable in offline experiments, where data and models are static. However, in real world systems, user feedback is continuously being generated, at unpredictable rates. One way to deal with this data stream is to perform online model updates as new data points become available. This requires algorithms able to process data at least as fast as it is generated. One other issue is how to evaluate algorithms in such a streaming data environment. In this paper we introduce a simple but fast incremental Matrix Factorization algorithm for positive-only feedback. We also contribute with a prequential evaluation protocol for recommender systems, suitable for streaming data environments. Using this evaluation methodology, we compare our algorithm with other state-of-the-art proposals. Our experiments reveal that despite its simplicity, our algorithm has competitive accuracy, while being significantly faster.
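
The abstract gives no update rule, so the following is only a sketch of how a single-pass, stochastic-gradient matrix factorization for positive-only feedback and a prequential (test-then-learn) evaluation loop might look. Every observed user-item pair is treated as a target value of 1. The class name, hyperparameters, initialisation, and ranking rule are assumptions chosen for illustration.

```python
import random

class IncrementalMF:
    """Hypothetical incremental MF: one SGD step per observed (user, item) event."""

    def __init__(self, k=10, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = random.Random(seed)
        self.users = {}   # user id -> latent vector
        self.items = {}   # item id -> latent vector

    def _init_vec(self):
        return [self.rng.gauss(0.0, 0.1) for _ in range(self.k)]

    def score(self, u, i):
        return sum(a * b for a, b in zip(self.users[u], self.items[i]))

    def update(self, u, i):
        # Unknown users and items get a fresh random vector on first sight.
        p = self.users.setdefault(u, self._init_vec())
        q = self.items.setdefault(i, self._init_vec())
        err = 1.0 - sum(a * b for a, b in zip(p, q))  # positive-only target is 1
        for f in range(self.k):
            pf, qf = p[f], q[f]
            p[f] += self.lr * (err * qf - self.reg * pf)
            q[f] += self.lr * (err * pf - self.reg * qf)

    def recommend(self, u, n, seen=()):
        if u not in self.users:
            return []
        candidates = [i for i in self.items if i not in seen]
        # Rank by how close the predicted score is to the positive target 1.
        candidates.sort(key=lambda i: abs(1.0 - self.score(u, i)))
        return candidates[:n]

def prequential_hits(model, stream, n=10):
    """Test-then-learn: score each event with the current model, then update on it."""
    hits, seen = [], {}
    for u, i in stream:
        if u in model.users:                      # only known users are evaluated
            recs = model.recommend(u, n, seen.get(u, set()))
            hits.append(1 if i in recs else 0)
        model.update(u, i)
        seen.setdefault(u, set()).add(i)
    return hits

events = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y")]
print(prequential_hits(IncrementalMF(k=4), events, n=1))
```

Averaging the returned hit indicators over a sliding window would give a recall-at-N curve that evolves with the stream, which is the kind of view an offline train/test split cannot provide.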

2014

GTE-cluster: A temporal search interface for implicit temporal queries

Authors
Campos, R; Dias, G; Jorge, AM; Nunes, C;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
In this paper, we present GTE-Cluster, an online temporal search interface that allows users to search for topics from a temporal perspective by clustering relevant temporal Web search results. GTE-Cluster is designed to improve user experience by augmenting document relevance with temporal relevance. The rationale is that offering the user a comprehensive temporal perspective of a topic is intuitively more informative than retrieving a result that only contains topical information. Our system does not pose any constraint in terms of language or domain, so users can issue queries in any language and on any subject, from business, culture, and politics to music, to cite just a few. The ability to exploit this information in a temporal manner can be, from a user perspective, potentially useful for several tasks, including user query understanding or temporal clustering. © 2014 Springer International Publishing Switzerland.

2014

Heart Sounds Classification using Motif based Segmentation

Authors
Oliveira, SC; Gomes, EF; Jorge, AM;

Publication
PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14)

Abstract
In this paper we describe an algorithm for heart sound classification (classes Normal, Murmur and Extrasystole) based on the discretization of sound signals using the SAX (Symbolic Aggregate Approximation) representation. The general strategy is to automatically discover relevant top frequent motifs and relate them with the occurrence of systolic (S1) and diastolic (S2) sounds in the audio signals. The algorithm was tuned using motifs generated from a collection of audio signals obtained from a clinical trial in a hospital. Validation was performed on a separate set of unlabeled audio signals. Results indicate ability to improve the precision of the classification of the classes Normal and Murmur.
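
For readers unfamiliar with SAX, the sketch below shows the standard discretization steps (z-normalisation, piecewise aggregate approximation, and mapping to symbols through equiprobable Gaussian breakpoints), followed by a naive count of frequent fixed-length motifs in the resulting word. It is not the authors' pipeline: the function names, the default alphabet, and the sliding-window motif count are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist, mean, pstdev

def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of n_segments symbols."""
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]
    # Piecewise Aggregate Approximation: mean of each equal-length segment.
    # Assumes len(series) >= n_segments; trailing samples that do not fill
    # a whole segment are dropped.
    seg = len(z) // n_segments
    paa = [mean(z[j * seg:(j + 1) * seg]) for j in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable regions.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / a) for j in range(1, a)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)

def frequent_motifs(word, length, top):
    """Count every sliding window of the given length and return the most common."""
    windows = (word[j:j + length] for j in range(len(word) - length + 1))
    return Counter(windows).most_common(top)

print(sax([0, 0, 0, 0, 10, 10, 10, 10], 2))
print(frequent_motifs("ababab", 2, 1))
```

In a heart-sound setting, the positions of the most frequent motifs could then be compared against the expected timing of S1 and S2 sounds, as the abstract describes in general terms.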

2014

Monitoring Recommender Systems: A Business Intelligence Approach

Authors
Felix, C; Soares, C; Jorge, A; Vinagre, J;

Publication
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, PART VI - ICCSA 2014

Abstract
Recommender systems (RS) are increasingly adopted by e-business, social networks and many other user-centric websites. Based on the user's previous choices or interests, an RS suggests new items in which the user might be interested. With constant changes in user behavior, the quality of an RS may decrease over time. Therefore, we need to monitor the performance of the RS, giving timely information to management, who can then manage the RS to maximize results. Our work consists of creating a monitoring platform, based on Business Intelligence (BI) and On-line Analytical Processing (OLAP) tools, that provides information about the recommender system in order to assess its quality, the impact it has on users and their adherence to the recommendations. We present a case study with Palco Principal, a social network for music.
