Publications by João Gama

2017

Feature ranking in Hoeffding algorithms for regression

Authors
Duarte, J; Gama, J;

Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
Feature selection and feature ranking are two aspects of the same learning task. They are well studied in batch scenarios, but not in the streaming setting. This paper presents a study on feature ranking from data streams in online learning regression models. The main challenge is that the relevance of features might change over time: features relevant in the past might be irrelevant now, and vice versa. We propose three new online feature ranking algorithms designed for Hoeffding algorithms. We have implemented the three methods in AMRules, a streaming regression algorithm that learns model rules. We compare their behaviour experimentally and present the pros and cons of each method. Copyright 2017 ACM.

2015

Multi-Target Regression from High-Speed Data Streams with Adaptive Model Rules

Authors
Duarte, J; Gama, J;

Publication
Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA 2015)

Abstract
Many real-life prediction problems involve predicting a structured output. Multi-target regression is an instance of structured output prediction whose task is to predict multiple target variables. Structured output algorithms are usually computationally and memory demanding, and are therefore not suited to dealing with massive amounts of data. Most of these algorithms can be categorized as local or global methods. Local methods produce individual models for each output component and combine them to produce the structured prediction. Global methods adapt traditional learning algorithms to predict the output structure as a whole. We propose the first rule-based algorithm for solving multi-target regression problems from data streams. The algorithm builds on the adaptive model rules framework. In contrast to the majority of structured output predictors, this algorithm falls into neither the local nor the global category. Instead, each rule specializes on related subsets of the output attributes. To evaluate the performance of the proposed algorithm, two other rule-based algorithms were developed, one using the local strategy and the other using the global strategy. These methods were compared in terms of prediction error, memory usage, computational time, and model complexity. Experimental results on synthetic and real data show that the local-strategy algorithm usually obtains the lowest error. However, the proposed and the global-strategy algorithms use much less memory and run significantly faster at the cost of a slight increase in error, which makes them very attractive when computational resources are an important factor. Also, the models produced by the latter approaches are much easier to understand, since considerably fewer rules are produced.

2016

Detecting Events in Evolving Social Networks through Node Centrality Analysis

Authors
Pereira, FSF; Amo, Sd; Gama, J;

Publication
Proceedings of the Workshop on Large-scale Learning from Data Streams in Evolving Environments (STREAMEVOLV 2016) co-located with the 2016 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016), Riva del Garda, Italy, September 23, 2016.

Abstract
Social networks evolve because of the continuous interaction between users. Existing event detection tasks do not analyze events from a user-centric perspective. In this paper, we propose to detect node centrality events, that is, to find events based on the positions and roles of the nodes. We present a naive algorithm for detecting such events in network streams. Moreover, we apply our proposal in a case study, showing how node centrality events can be used to track changes in user preferences.

2016

First Principle Models Based Dataset Generation for Multi-Target Regression and Multi-Label Classification Evaluation

Authors
Sousa, R; Gama, J;

Abstract
Machine Learning and Data Mining research strongly depends on the quality and quantity of real-world datasets for evaluating the methods under development. In the context of the emerging Online Multi-Target Regression and Multi-Label Classification methodologies, datasets present new characteristics that require specific testing and pose new challenges. The first difficulty found in evaluation is the reduced number of examples caused by data damage, privacy preservation, or the high cost of acquisition. Second, data events of interest, such as data changes, are difficult to find in datasets from specific domains, since such events are naturally scarce. For these reasons, this work proposes a method for producing synthetic datasets with desired properties (number of examples, data change events, ...) for the evaluation of Multi-Target Regression and Multi-Label Classification methods. These datasets are produced using First Principle Models, which give the input and output variables more realistic and representative properties, such as real-world meaning (physical, financial, ...). This type of dataset generation can be used to produce infinite streams and to evaluate incremental methods such as online anomaly and change detection. This paper illustrates the use of synthetic data generation through two showcases of data change evaluation.

2018

Predicting short term mood developments among depressed patients using adherence and ecological momentary assessment data

Authors
Mikus, A; Hoogendoorn, M; Rocha, A; Gama, J; Ruwaard, J; Riper, H;

Publication
Internet Interventions: The Application of Information Technology in Mental and Behavioural Health

Abstract
Technology-driven interventions provide us with an increasing amount of fine-grained data about the patient. This data includes regular ecological momentary assessments (EMA) but also a user's response times to EMA questions. When observing this data, we see a huge variation between the patterns exhibited by different patients. Some are more stable, while others vary a lot over time. This poses a challenging problem for the domain of artificial intelligence and makes one wonder whether it is possible to predict the future mental state of a patient using the data that is available. Ultimately, these predictions could contribute to interventions that tailor the feedback to the user on a daily basis, for example by warning a user that a fall-back might be expected during the next days, or by applying a strategy to prevent the fall-back from occurring in the first place. In this work, we focus on short-term mood prediction, considering the adherence and usage data as an additional predictor. We apply recurrent neural networks to best handle the temporal aspects and explore whether individual, group-level, or one single predictive model provides the highest predictive performance (measured using the root mean squared error (RMSE)). We use data collected from patients in five countries who used the ICT4Depression/MoodBuster platform in the context of the EU E-COMPARED project. In total, we used the data from 143 patients (with between 9 and 425 days of EMA data) who were diagnosed with a major depressive disorder according to DSM-IV. Results show that we can predict short-term mood change quite accurately (RMSE ranging between 0.065 and 0.11). The past EMA mood ratings proved to be the most influential, while adherence and usage data did not improve prediction accuracy. In general, group-level predictions proved the most promising, although the differences were not significant.
Short-term mood prediction remains a difficult task, but from this research we can conclude that sophisticated machine learning algorithms/setups can achieve accurate performance. In future work, we want to use more data from the mobile phone to improve the predictive performance of short-term mood prediction.

2018

Cover Image, Volume 8, Issue 5

Authors
Tabassum, S; Pereira, FSF; Silva Fernandes, Sd; Gama, J;

Publication
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Abstract