Publications - INESC TEC


Publications by João Gama

2018

Guest Editorial Special Issue on Knowledge Discovery From Mobility Data for Intelligent Transportation Systems

Authors
Moreira Matias, L; Gama, J; Monreal, CO; Nair, R; Trasarti, R;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Recent technological advances in telecommunications have created a new reality in mobility sensing. We live in an era where ubiquitous digital devices can broadcast rich information about human mobility in real time and at a high rate. This has exponentially increased the availability of large-scale mobility data, which has been popularized in the media as the new currency, fueling the future vision of smart cities that will transform our lives. In reality, we have only begun to recognize significant research challenges across a spectrum of topics. Consequently, there is increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. However, this availability also raises privacy issues that both industrial and academic stakeholders must consider when using these resources. © 2000-2011 IEEE.


2018

Outliers and the Simpson's Paradox

Authors
Portela, E; Ribeiro, RP; Gama, J;

Publication
ADVANCES IN SOFT COMPUTING, MICAI 2017, PT I

Abstract
There is no standard definition of outliers, but most authors agree that outliers are points far from other data points. Several outlier detection techniques have been developed, mainly with two different purposes. On the one hand, outliers are the interesting observations, as in fraud detection; on the other hand, outliers are considered measurement observations that should be removed from the analysis, as in robust statistics. In this work, we start from the observation that outliers are affected by the so-called Simpson's paradox: a trend that appears in different groups of data but disappears or reverses when these groups are combined. Given a dataset, we learn a regression tree. The tree grows by partitioning the data into groups that are increasingly homogeneous with respect to the target variable. At each partition defined by the tree, we apply a box plot to the target variable to detect outliers. We would expect deeper nodes of the tree to contain fewer and fewer outliers. We observe that some points previously flagged as outliers are no longer flagged as such, but new outliers appear. The identification of outliers depends on the context considered. Based on this observation, we propose a new method to quantify the level of outlierness of data points. © Springer Nature Switzerland AG 2018.
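The context dependence described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the data are toy values invented for illustration, a single hand-picked split (`feature < 5`) stands in for a learned regression tree, and `boxplot_outliers` is a hypothetical helper that applies the common 1.5 × IQR box-plot rule.

```python
from statistics import quantiles


def boxplot_outliers(values):
    """Return the values outside the usual box-plot fences (1.5 * IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Toy (feature, target) pairs, invented for illustration only.
data = [(1, 9), (2, 10), (3, 10), (4, 11), (1, 10), (2, 9), (3, 11), (4, 30),
        (6, 98), (7, 100), (8, 101), (9, 99), (6, 102), (7, 100), (8, 99), (9, 101)]

# Root node: every target value is considered together.
root_outliers = boxplot_outliers([y for _, y in data])

# One hand-picked split (feature < 5) stands in for a learned tree split.
left = [y for x, y in data if x < 5]
right = [y for x, y in data if x >= 5]
left_outliers = boxplot_outliers(left)
right_outliers = boxplot_outliers(right)

print(root_outliers, left_outliers, right_outliers)
```

In this toy data the target value 30 is unremarkable at the root, where the two groups are pooled, but it is flagged once the split isolates the low-valued group. This is the "new outliers appear" effect the abstract mentions.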


2018

Biased Dynamic Sampling for Temporal Network Streams

Authors
Tabassum, S; Gama, J;

Publication
Complex Networks and Their Applications VII - Volume 1 Proceedings The 7th International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2018, Cambridge, UK, December 11-13, 2018.

Abstract
Considering the avalanche of evolving data and memory constraints, the sampling of streaming networks has gained much attention in the recent decade. However, samples that choose data uniformly from the beginning to the end of a temporal stream are not very relevant for temporally evolving networks, where recent activities are more important than old events. Moreover, relationships also change over time. Recent interactions clearly show the current status of relationships; nevertheless, some old, stronger relations are also substantially significant. Considering the above issues, we propose a fast, memoryless dynamic sampling mechanism for weighted or multi-graph high-speed streams. For this purpose, we use a forgetting function with two parameters that help introduce biases on the network based on time and relationship strengths. Our experiments on real-world data sets show that our samples not only preserve basic properties such as degree distributions but also maintain the temporal distribution correlations. We also observe that our method generates samples with increased efficiency. It also outperforms current sampling algorithms in the area. © 2019, Springer Nature Switzerland AG.


2018

Improving acute kidney injury detection with conditional probabilities

Authors
Nogueira, AR; Ferreira, CA; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Acute Kidney Injury (AKI) is a disease that affects the kidneys and is characterized by the rapid deterioration of these organs, usually associated with a pre-existing critical illness. Because it is an acute disease, time is a key element in its prevention. By anticipating a patient's state transition, we can prevent future complications in the patient's health, such as the development of a chronic disease or the loss of an organ, in addition to decreasing the amount of money spent on the patient's care. The main goal of this paper is to address the problem of correctly predicting the illness path in various patients by studying different methodologies to predict this disease and proposing new, distinct approaches aimed at improving classification performance. Through the comparison of five different approaches (Markov Chain Model ICU Specialists, Markov Chain Model Features, Markov Chain Model Conditional Features, Markov Chain Model and Random Forest), we conclude that applying conditional probabilities to this problem produces a more accurate prediction, based on common inputs.
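As a rough illustration of how conditioning can change a Markov chain's estimated transitions, the Python sketch below counts state transitions first over all sequences and then separately per group. It is only a sketch under stated assumptions and not the models compared in the paper: the stage labels (`s0`, `s1`, `s2`), the binary grouping feature (`low`/`high`) and the sequences are all invented toy values, and `transition_probs` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def transition_probs(sequences):
    """Estimate P(next state | current state) by counting transitions."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for cur, ctr in counts.items()
    }


# Toy records: (hypothetical feature value, sequence of hypothetical stages).
records = [
    ("low", ["s0", "s0", "s1"]),
    ("low", ["s0", "s0", "s0"]),
    ("high", ["s0", "s1", "s2"]),
    ("high", ["s0", "s1", "s1"]),
]

# Plain Markov chain: the feature is ignored.
pooled = transition_probs([seq for _, seq in records])

# Conditional variant: one transition table per feature value.
by_feature = {
    value: transition_probs([seq for f, seq in records if f == value])
    for value in {f for f, _ in records}
}

print(pooled["s0"]["s1"])              # 0.5
print(by_feature["high"]["s0"]["s1"])  # 1.0
print(by_feature["low"]["s0"]["s1"])   # 0.25
```

In the toy data the pooled chain gives an even chance of leaving `s0`, while the conditional tables separate a group that always progresses from one that rarely does.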


2018

Artificial Neural Networks Classification of Patients with Parkinsonism based on Gait

Authors
Fernandes, C; Fonseca, L; Ferreira, F; Gago, M; Costa, L; Sousa, N; Ferreira, C; Gama, J; Erlhagen, W; Bicho, E;

Publication
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)

Abstract
Differential diagnosis between Idiopathic Parkinson's disease (IPD) and Vascular Parkinsonism (VaP) is a difficult task, especially early in the disease. There is growing evidence to support the use of gait assessment in the diagnosis and management of movement disorders. The aim of this study is to evaluate the effectiveness of some machine learning strategies in distinguishing IPD and VaP gait. Wearable sensors positioned on both feet were used to acquire gait data from 15 IPD, 15 VaP, and 15 healthy subjects. A comparative classification analysis was performed by applying two supervised machine learning algorithms: Multilayer Perceptrons (MLPs) and Deep Belief Networks (DBNs). The decisional space was composed of the gait variables, with or without neuropsychological evaluation (Montreal Cognitive Assessment (MoCA) score), top-ranked in an error incremental analysis. In the classification task of characterizing parkinsonian gait by distinguishing between patients (IPD+VaP) and healthy controls, from the classification of all strides of the gait performed by the person, high accuracy (93% with or without MoCA) was obtained for both algorithms. In the classification task of the two groups of patients (VaP vs. IPD), the DBN classifier achieved higher performance (73% with MoCA). To the best of our knowledge, this is the first study on gait classification that includes a VaP group. DBN classifiers are not frequently applied in the literature to similar studies, but the results obtained here demonstrate that the use of DBN classifiers based on gait analysis is a promising support for the neurologist in distinguishing VaP and IPD.


2018

Self Hyper-parameter Tuning for Stream Recommendation Algorithms

Authors
Veloso, B; Gama, J; Malheiro, B; Vinagre, J;

Publication
ECML PKDD 2018 Workshops - DMLE 2018 and IoTStream 2018, Dublin, Ireland, September 10-14, 2018, Revised Selected Papers

Abstract
E-commerce platforms explore the interaction between users and digital content (user-generated streams of events) to build and maintain dynamic user preference models, which are used to make meaningful recommendations. However, the accuracy of these incremental models is critically affected by the choice of hyper-parameters. So far, the incremental recommendation algorithms used to process data streams rely on human expertise for hyper-parameter tuning. In this work we apply our Self Hyper-Parameter Tuning (SPT) algorithm to incremental recommendation algorithms. SPT adapts the Nelder-Mead optimisation algorithm to perform hyper-parameter tuning. First, it creates three models with random hyper-parameter values and then, at dynamic size intervals, assesses the models and applies the Nelder-Mead operators to update their hyper-parameters until the models converge. The main contribution of this work is the adaptation of the SPT method to incremental matrix factorisation recommendation algorithms. The proposed method was evaluated with well-known recommendation data sets. The results show that SPT systematically improves data stream recommendations.
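The Nelder-Mead operators mentioned in this abstract can be sketched in Python on three candidate configurations of two hyper-parameters. This is a simplified textbook variant (reflection, expansion, inside contraction, shrink), not the SPT algorithm itself: `window_error` is a hypothetical quadratic stand-in for the error a model would accumulate over a stream interval, and the two coordinates are hypothetical hyper-parameters.

```python
def window_error(p):
    """Hypothetical stand-in for a model's error over one stream interval."""
    return (p[0] - 0.1) ** 2 + (p[1] - 0.01) ** 2


def nelder_mead_step(simplex, loss):
    """Apply one simplified Nelder-Mead update to three 2-D points."""
    best, good, worst = sorted(simplex, key=loss)
    centroid = ((best[0] + good[0]) / 2, (best[1] + good[1]) / 2)

    def along(t):
        # Point on the line through the worst vertex and the centroid.
        return (centroid[0] + t * (centroid[0] - worst[0]),
                centroid[1] + t * (centroid[1] - worst[1]))

    reflected = along(1.0)
    if loss(reflected) < loss(best):
        expanded = along(2.0)
        better = expanded if loss(expanded) < loss(reflected) else reflected
        return [best, good, better]
    if loss(reflected) < loss(good):
        return [best, good, reflected]
    contracted = along(-0.5)
    if loss(contracted) < loss(worst):
        return [best, good, contracted]
    # Shrink: move the two non-best points halfway toward the best one.
    return [best] + [((best[0] + p[0]) / 2, (best[1] + p[1]) / 2)
                     for p in (good, worst)]


# Three candidate hyper-parameter pairs, one per model (toy starting values).
simplex = [(0.5, 0.5), (0.6, 0.5), (0.5, 0.6)]
history = [min(map(window_error, simplex))]
for _ in range(50):
    simplex = nelder_mead_step(simplex, window_error)
    history.append(min(map(window_error, simplex)))

print(history[0], history[-1])
```

Because every branch keeps the best point, the best error in `history` never increases from one step to the next. In a streaming setting each call to the loss would correspond to evaluating a model on recent data rather than a closed-form function.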
