
Publications by LIAAD

2014

Using probabilistic graphical models to enhance the prognosis of health-related quality of life in adult survivors of critical illness

Authors
Dias, CC; Granja, C; Costa Pereira, A; Gama, J; Rodrigues, PP;

Publication
2014 IEEE 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
Health-related quality of life (HR-QoL) is a subjective concept, reflecting the overall mental and physical state of the patient and their own sense of well-being. Estimating current and future QoL has become a major outcome in the evaluation of critically ill patients. The aim of this study is to enhance the inference process for 6-week and 6-month prognosis of QoL after an intensive care unit (ICU) stay, using the EQ-5D questionnaire. The main outcomes of the study were the EQ-5D's five main dimensions: mobility, self-care, usual activities, pain and anxiety/depression. For each outcome, three Bayesian classifiers were built and validated with 10-fold cross-validation. Sixty and 473 patients (6 weeks and 6 months, respectively) were included. Overall, QoL at 6 months is higher than at 6 weeks, with the probability of absence of problems ranging from 31% (6-week mobility) to 72% (6-month self-care). Bayesian models achieved prognosis accuracies from 56% (6 months, anxiety/depression) up to 80% (6 weeks, mobility). The prognosis inference process for an individual patient was enhanced with the visual analysis of the models, showing that women, the elderly, and patients with longer ICU stays have a higher risk of QoL problems at 6 weeks. Likewise, for the 6-month prognosis, a higher APACHE II severity score also leads to a higher risk of problems, except for anxiety/depression, where the youngest and most active have increased risk. Bayesian networks are competitive with less descriptive strategies, improve the inference process by incorporating domain knowledge, and present a more interpretable model. The relationships among different factors extracted by the Bayesian models are in accordance with those reported in previous state-of-the-art literature, hence showing their usability as inference models.
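As a rough illustration of the kind of Bayesian classifier the abstract describes (not the paper's actual models), here is a minimal naive Bayes over discretized features; the feature names, labels, and toy data are all hypothetical:

```python
import math
from collections import defaultdict

def train_naive_bayes(rows):
    """Estimate class priors and per-feature conditional counts."""
    class_counts = defaultdict(int)
    cond_counts = defaultdict(int)      # (label, feature, value) -> count
    feature_values = defaultdict(set)
    for features, label in rows:
        class_counts[label] += 1
        for f, v in features.items():
            cond_counts[(label, f, v)] += 1
            feature_values[f].add(v)
    return class_counts, cond_counts, feature_values

def predict(model, features):
    """Pick the label maximizing log P(label) + sum of log P(value | label),
    with Laplace smoothing for unseen feature values."""
    class_counts, cond_counts, feature_values = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)
        for f, v in features.items():
            num = cond_counts[(label, f, v)] + 1
            den = n + len(feature_values[f])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical toy data: (discretized patient features, problems at 6 weeks?)
rows = [
    ({"sex": "F", "age": "old"},   1),
    ({"sex": "F", "age": "old"},   1),
    ({"sex": "M", "age": "young"}, 0),
    ({"sex": "M", "age": "old"},   0),
]
model = train_naive_bayes(rows)
print(predict(model, {"sex": "F", "age": "old"}))  # → 1
```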

2014

Event labeling combining ensemble detectors and background knowledge

Authors
Fanaee-T, H; Gama, J;

Publication
Progress in Artificial Intelligence

Abstract
Event labeling is the process of marking events in unlabeled data. Traditionally, this is done by involving one or more human experts, through an expensive and time-consuming task. In this article we propose an event labeling system relying on an ensemble of detectors and background knowledge. The target data are the usage log of a real bike-sharing system. We first label events in the data and then evaluate the performance of the ensemble and of the individual detectors on the labeled data set, using ROC analysis and static evaluation metrics, in the absence and presence of background knowledge. Our results show that when there is no access to human experts, the proposed approach can be an effective alternative for labeling events. In addition to the main proposal, we conduct a comparative study of the performance of various predictive models, semi-supervised and unsupervised approaches, training data scale, time series filtering methods, online and offline predictive models, and distance functions for measuring time series similarity. © Springer-Verlag Berlin Heidelberg 2013.
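The core idea, several detectors voting on each point, with background knowledge able to add labels directly, can be sketched as follows; the detectors, thresholds, and data are invented for illustration, not taken from the paper:

```python
import statistics

def zscore_detector(series, k):
    """Flag points more than k (population) standard deviations from the mean."""
    mu = statistics.mean(series)
    sd = statistics.pstdev(series) or 1.0
    return [abs(x - mu) / sd > k for x in series]

def ensemble_label(series, ks=(1.5, 2.0, 2.5), background=None):
    """Label an event where a majority of detectors fire; background
    knowledge (e.g. a set of known event days) can add labels directly."""
    votes = [zscore_detector(series, k) for k in ks]
    labels = [sum(col) > len(ks) / 2 for col in zip(*votes)]
    if background:
        labels = [flag or (day in background) for day, flag in enumerate(labels)]
    return labels

usage = [100, 98, 103, 101, 300, 99, 102]     # hypothetical daily rental counts
print(ensemble_label(usage, background={2}))  # day 4 by vote, day 2 by knowledge
```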

2014

Unsupervised density-based behavior change detection in data streams

Authors
Vallim, RMM; Andrade Filho, JA; de Mello, RF; de Carvalho, ACPLF; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
The ability to detect changes in the data distribution is an important issue in data stream mining. Detecting changes in the data distribution allows the adaptation of a previously learned model to accommodate the most recent data and, therefore, improve its prediction capability. This paper proposes a framework for unsupervised automatic change detection in data streams called M-DBScan. This framework is composed of a density-based clustering step followed by a novelty detection procedure based on entropy level measures. This work uses two different types of entropy measures: one considers the spatial distribution of the data, while the other models temporal relations between observations in the stream. The performance of the method is assessed in a set of experiments comparing M-DBScan with a proximity-based approach. Experimental results provide important insight into how to design change detection mechanisms for streams.
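M-DBScan itself combines density-based clustering with entropy monitoring; the following is only a stripped-down sketch of the entropy-monitoring half (the clustering step is assumed to have already produced cluster assignments, and the window size and threshold are invented):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a window of cluster assignments."""
    n = len(labels)
    counts = Counter(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def detect_changes(assignments, window=4, threshold=0.5):
    """Flag a change whenever the entropy of consecutive non-overlapping
    windows of cluster assignments jumps by more than `threshold`."""
    changes = []
    prev = None
    for start in range(0, len(assignments) - window + 1, window):
        h = entropy(assignments[start:start + window])
        if prev is not None and abs(h - prev) > threshold:
            changes.append(start)
        prev = h
    return changes

# Hypothetical stream of cluster ids: one stable cluster, then two mixed ones.
stream = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(detect_changes(stream))  # → [8]: change at the window starting at index 8
```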

2014

Dynamic communities in evolving customer networks: an analysis using landmark and sliding windows

Authors
Oliveira, M; Guerreiro, A; Gama, J;

Publication
Social Network Analysis and Mining

Abstract
The widespread availability of Customer Relationship Management applications in modern organizations allows companies to collect and store vast amounts of highly detailed customer-related data. Making sense of these data with appropriate methods can yield insights into customers' behaviour and preferences. The extracted knowledge can then be exploited for marketing purposes. Social Network Analysis techniques can play a key role in business analytics. By modelling the implicit relationships among customers as a social network, it is possible to understand how patterns in these relationships translate into competitive advantages for the company. Additionally, incorporating the temporal dimension in such analysis can help detect market trends and changes in customers' preferences. In this paper, we introduce a methodology to examine the dynamics of customer communities, which relies on two different time window models: a landmark window and a sliding window. Landmark windows keep all the historical data and treat all nodes and links equally, even if they only appear in the early stages of the network's life. This approach is appropriate for the long-term analysis of networks, but may fail to provide a realistic picture of the current evolution. Sliding windows, on the other hand, focus on the most recent past, thus allowing current events to be captured. The application of the proposed methodology to a real-world customer network suggests that both window models provide complementary information. Nevertheless, the sliding window model better captures recent changes in the network. © 2014, Springer-Verlag Wien.
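The contrast between the two window models can be sketched on a toy edge stream (the edges and window width below are invented; the paper's actual networks are customer networks):

```python
from collections import deque

def landmark_edges(stream):
    """Landmark window: every edge ever seen stays in the network."""
    seen = set()
    for t, edge in enumerate(stream):
        seen.add(edge)
        yield t, set(seen)

def sliding_edges(stream, width):
    """Sliding window: only edges from the last `width` steps remain."""
    window = deque()
    for t, edge in enumerate(stream):
        window.append((t, edge))
        while window and window[0][0] <= t - width:
            window.popleft()
        yield t, {e for _, e in window}

stream = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
landmark = dict(landmark_edges(stream))
sliding = dict(sliding_edges(stream, width=2))
# At the final step the landmark network keeps all 4 edges,
# while the sliding window keeps only the 2 most recent.
print(len(landmark[3]), len(sliding[3]))  # → 4 2
```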

2014

A Survey on Concept Drift Adaptation

Authors
Gama, J; Zliobaite, I; Bifet, A; Pechenizkiy, M; Bouchachia, A;

Publication
ACM COMPUTING SURVEYS

Abstract
Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this article we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology for adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to concept drift adaptation for researchers, industry analysts, and practitioners.
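One representative technique in this family of drift detectors is the Page-Hinkley test, which monitors a signal such as a model's error rate; a compact sketch, with illustrative threshold values:

```python
class PageHinkley:
    """Page-Hinkley test: signals drift when the cumulative deviation of a
    monitored signal (e.g. a model's error rate) from its running mean
    exceeds a threshold `lam`; `delta` tolerates small fluctuations."""

    def __init__(self, delta=0.005, lam=5.0):
        self.delta, self.lam = delta, lam
        self.mean, self.n = 0.0, 0
        self.cum, self.min_cum = 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n       # running mean
        self.cum += x - self.mean - self.delta      # cumulative deviation
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.lam   # True → drift detected

ph = PageHinkley(lam=2.0)
errors = [0.1] * 30 + [0.9] * 30   # error rate jumps at t=30: concept drift
alarms = [t for t, e in enumerate(errors) if ph.update(e)]
print(alarms[0])                   # first alarm, shortly after the jump
```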

2014

Distributed Adaptive Model Rules for Mining Big Data Streams

Authors
Vu, AT; De Francisci Morales, GD; Gama, J; Bifet, A;

Publication
2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)

Abstract
Decision rules are among the most expressive data mining models. We propose the first distributed streaming algorithm to learn decision rules for regression tasks. The algorithm is available in SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It uses a hybrid of vertical and horizontal parallelism to distribute Adaptive Model Rules (AMRules) on a cluster. The decision rules built by AMRules are comprehensible models, where the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. Our evaluation shows that this implementation is scalable in relation to CPU and memory consumption. On a small commodity Samza cluster of 9 nodes, it can handle a rate of more than 30000 instances per second, and achieve a speedup of up to 4.7x over the sequential version.
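The rule shape described here, a conjunctive antecedent over attribute values with a linear-model consequent, can be sketched as follows; the attribute names, thresholds, and weights are invented, and rule induction itself (the hard part AMRules does online) is not shown:

```python
def make_rule(conditions, weights, bias):
    """A hypothetical AMRules-style rule: the antecedent is a conjunction of
    attribute conditions; the consequent is a linear model of the attributes."""
    def covers(x):
        return all(test(x[attr]) for attr, test in conditions.items())
    def predict(x):
        return bias + sum(w * x[attr] for attr, w in weights.items())
    return covers, predict

# Hypothetical rule: IF temp > 20 AND humidity <= 60 THEN y = 2 + 0.5 * temp
covers, predict = make_rule(
    conditions={"temp": lambda v: v > 20, "humidity": lambda v: v <= 60},
    weights={"temp": 0.5},
    bias=2.0,
)
x = {"temp": 30, "humidity": 50}
if covers(x):
    print(predict(x))  # → 17.0
```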
