Publications by João Gama

2019

The search of conditional outliers

Authors
Portel, E; Ribeiro, RP; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
There is no standard definition of outliers, but most authors agree that outliers are points far from other data points. Several outlier detection techniques have been developed, mainly for two different purposes. On the one hand, outliers are considered erroneous measurements that should be removed from the analysis, as in robust statistics. On the other hand, outliers are the interesting observations, as in fraud detection, and should be modelled by some learning method. In this work, we start from the observation that outliers are affected by the so-called Simpson's paradox: a trend that appears in different groups of data but disappears or reverses when these groups are combined. Given a data set, we learn a regression tree. The tree grows by partitioning the data into groups that are increasingly homogeneous with respect to the target variable. At each partition defined by the tree, we apply a box plot to the target variable to detect outliers. We would expect the deeper nodes of the tree to contain fewer and fewer outliers. We observe that some points previously signalled as outliers are no longer signalled as such, but new outliers appear.
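
The procedure outlined in this abstract (grow a regression tree, then apply a box plot to the target variable at each node) can be illustrated with a minimal sketch. This is not the authors' implementation: the single-feature tree, the squared-error split criterion, the 1.5 × IQR fences, the function names, and the toy data are all assumptions chosen for illustration.

```python
from statistics import quantiles


def boxplot_outliers(values):
    """Return the values outside Tukey's fences (1.5 * IQR beyond the quartiles)."""
    if len(values) < 4:
        return []
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(points, min_leaf):
    """Threshold on x that minimises the children's total squared error, or None."""
    pts = sorted(points)
    best = None
    for i in range(min_leaf, len(pts) - min_leaf + 1):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pts[:i]]) + sse([y for _, y in pts[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pts[i - 1][0] + pts[i][0]) / 2)
    return None if best is None else best[1]


def outliers_by_node(points, depth=0, max_depth=2, min_leaf=4, path="root"):
    """Yield (path, outliers) for every node of a small regression tree."""
    yield path, boxplot_outliers([y for _, y in points])
    if depth >= max_depth:
        return
    t = best_split(points, min_leaf)
    if t is None:
        return
    left = [(x, y) for x, y in points if x <= t]
    right = [(x, y) for x, y in points if x > t]
    yield from outliers_by_node(left, depth + 1, max_depth, min_leaf, f"{path} / x<={t:g}")
    yield from outliers_by_node(right, depth + 1, max_depth, min_leaf, f"{path} / x>{t:g}")


# Hypothetical toy data: a large low-valued group and a small high-valued group.
ys_a = [10, 9, 11, 10, 14, 10, 9, 11, 10, 9, 11, 10]
ys_b = [50, 51, 50, 51]
points = list(enumerate(ys_a + ys_b))  # x is simply the index 0..15

results = list(outliers_by_node(points, max_depth=1))
for node, found in results:
    print(node, found)
```

On this toy data the four high values are flagged at the root and are no longer flagged once the tree isolates them in their own node, while the value 14 is flagged only inside the low-valued node. That is the kind of conditional behaviour the abstract describes.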

2018

Clustering in the Presence of Concept Drift

Authors
Moulton, RH; Viktor, HL; Japkowicz, N; Gama, J;

Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Dublin, Ireland, September 10-14, 2018, Proceedings, Part I

Abstract
Clustering naturally addresses many of the challenges of data streams and many data stream clustering algorithms (DSCAs) have been proposed. The literature does not, however, provide quantitative descriptions of how these algorithms behave in different circumstances. In this paper we study how the clusterings produced by different DSCAs change, relative to the ground truth, as quantitatively different types of concept drift are encountered. This paper makes two contributions to the literature. First, we propose a method for generating real-valued data streams with precise quantitative concept drift. Second, we conduct an experimental study to provide quantitative analyses of DSCA performance with synthetic real-valued data streams and show how to apply this knowledge to real world data streams. We find that large magnitude and short duration concept drifts are most challenging and that DSCAs with partitioning-based offline clustering methods are generally more robust than those with density-based offline clustering methods. Our results further indicate that increasing the number of classes present in a stream is a more challenging environment than decreasing the number of classes. Code related to this paper is available at: https://doi.org/10.5281/zenodo.1168699, https://doi.org/10.5281/zenodo.1216189, https://doi.org/10.5281/zenodo.1213802, https://doi.org/10.5281/zenodo.1304380. © Springer Nature Switzerland AG 2019.

2019

ECML PKDD 2018 Workshops - DMLE 2018 and IoTStream 2018, Dublin, Ireland, September 10-14, 2018, Revised Selected Papers

Authors
Monreale, A; Alzate, C; Kamp, M; Krishnamurthy, Y; Paurat, D; Mouchaweh, MS; Bifet, A; Gama, J; Ribeiro, RP;

Publication
DMLE/IOTSTREAMING@PKDD/ECML

Abstract

2019

Anomaly Detection in Sequential Data: Principles and Case Studies

Authors
Andrade, T; Gama, J; Ribeiro, RP; Sousa, W; Carvalho, A;

Publication
Wiley Encyclopedia of Electrical and Electronics Engineering

Abstract

2019

Gait stride-to-stride variability and foot clearance pattern analysis in Idiopathic Parkinson's Disease and Vascular Parkinsonism

Authors
Ferreira, F; Gago, MF; Bicho, E; Carvalho, C; Mollaei, N; Rodrigues, L; Sousa, N; Rodrigues, PP; Ferreira, C; Gama, J;

Publication
JOURNAL OF BIOMECHANICS

Abstract
The literature on gait analysis in Vascular Parkinsonism (VaP), addressing issues such as variability, foot clearance patterns, and the effect of levodopa, is scarce. This study investigates whether spatiotemporal, foot clearance and stride-to-stride variability analysis can discriminate VaP, and responsiveness to levodopa. Fifteen healthy subjects, 15 Idiopathic Parkinson's Disease (IPD) patients and 15 VaP patients were assessed in two phases: before (Off-state), and one hour after (On-state) the acute administration of a suprathreshold (1.5 times the usual) levodopa dose. Participants were asked to walk a 30-meter continuous course at a self-selected walking speed while wearing foot-worn inertial sensors. For each gait variable, mean, coefficient of variation (CV), and standard deviations SD1 and SD2 obtained by Poincaré analysis were calculated. General linear models (GLMs) were used to identify group differences. Patients were subject to neuropsychological evaluation (MoCA test) and brain MRI. VaP patients presented lower mean stride velocity, stride length, lift-off and strike angle, and height of maximum toe (later swing) (p < .05), and higher %gait cycle in double support, with only the latter unresponsive to levodopa. VaP patients also presented higher CV, significantly reduced after levodopa. Yet, all VaP versus IPD differences lost significance when accounting for mean stride length as a covariate. In conclusion, VaP patients presented a unique gait with reduced degrees of foot clearance, probably correlated to vascular lesioning in dopaminergic/non-dopaminergic cortical and subcortical non-dopaminergic networks, still amenable to benefit from levodopa. The dependency of gait and foot clearance and variability deficits on stride length deserves future clarification.
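
The variability measures named in this abstract can be sketched as follows, using the usual textbook definitions of the coefficient of variation and of the Poincaré descriptors commonly written SD1 and SD2. The abstract does not give the study's exact formulas, so the use of the sample standard deviation, the function names, and the example stride lengths are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(series):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(series) / mean(series)


def poincare_sd1_sd2(series):
    """Poincaré descriptors of the plot of each stride against the next one.

    SD1 measures spread across the identity line (short-term variability),
    SD2 measures spread along it (longer-term variability).
    """
    pairs = list(zip(series, series[1:]))
    diffs = [b - a for a, b in pairs]
    sums = [b + a for a, b in pairs]
    sd1 = stdev(diffs) / sqrt(2)
    sd2 = stdev(sums) / sqrt(2)
    return sd1, sd2


# Hypothetical stride lengths in metres, not data from the study.
stride_lengths = [1.20, 1.25, 1.18, 1.22, 1.27, 1.19, 1.23]
cv = coefficient_of_variation(stride_lengths)
sd1, sd2 = poincare_sd1_sd2(stride_lengths)
print(f"CV = {cv:.2f} %, SD1 = {sd1:.4f} m, SD2 = {sd2:.4f} m")
```

A strictly alternating series has all of its spread across the identity line (SD2 is zero), whereas a steadily increasing series has all of it along the line (SD1 is zero), which is why the two descriptors are reported separately.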

2019

Special track on data streams

Authors
Bifet, A; Carvalho, A; Ferreira, C; Gama, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
