Publications

Publications by João Gama

2019

Machine learning for streaming data: state of the art, challenges, and opportunities

Authors
Gomes, HM; Read, J; Bifet, A; Barddal, JP; Gama, J;

Publication
SIGKDD Explorations

2019

Novelty Detection for Multi-Label Stream Classification

Authors
Costa Júnior, JD; de Faria, ER; Andrade Silva, Jd; Gama, J; Cerri, R;

Publication
8th Brazilian Conference on Intelligent Systems, BRACIS 2019, Salvador, Brazil, October 15-18, 2019

2020

Proceedings of the 8th International Workshop on Big Data, IoT Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications co-located with 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019), Anchorage, Alaska, August 4-8, 2019

Authors
Bifet, A; Berlingerio, M; Gama, J; Read, J; Nogueira, AR;

Publication
BigMine@KDD

2020

A Study on Imbalanced Data Streams

Authors
Aminian, E; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II

Abstract
Data are growing fast in today's world, and a great portion of them take the form of streams. In many situations, data streams are imbalanced, which makes them difficult to use with classical data mining methods. Nevertheless, mining these special kinds of streams is one of the most attractive research areas. In this paper, we propose two algorithms for learning from imbalanced regression data streams. Both methods are based on Chebyshev's inequality, but they apply it in different ways. The first method under-samples the examples with frequent target values, while the second method over-samples the examples with rare and extreme target values. This way, the learner focuses on the rare and more difficult cases. We applied our methods to train regression models using two benchmark datasets and two well-known regression algorithms: Perceptron and FIMT-DD. The results of the simulations indicate the usefulness of the proposed methods.
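
The under-sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the selection rule, so the rule below (keep an example with probability 1 - 1/t^2, where t is the distance of its target value from the running mean in standard deviations) is an assumption derived directly from Chebyshev's inequality, P(|Y - mean| >= t * std) <= 1/t^2. The names `RunningStats`, `keep_probability`, and `under_sample` are hypothetical.

```python
import math
import random


class RunningStats:
    """Welford's online mean and variance, so the stream is read only once."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, y):
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (y - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self._m2 / self.n) if self.n > 1 else 0.0


def keep_probability(y, mean, std):
    """Assumed rule: 1 - min(1, 1/t^2) with t = |y - mean| / std."""
    if std == 0.0:
        return 1.0  # no spread estimate yet, so keep the example
    t = abs(y - mean) / std
    if t <= 1.0:
        return 0.0  # the Chebyshev bound is >= 1 here: treat as frequent
    return 1.0 - 1.0 / (t * t)


def under_sample(stream, rng):
    """Yield (x, y) pairs, dropping most examples with frequent targets."""
    stats = RunningStats()
    for x, y in stream:
        stats.update(y)
        if rng.random() < keep_probability(y, stats.mean, stats.std):
            yield x, y
```

A caller would wrap the stream before training, for example `for x, y in under_sample(stream, random.Random(0)): model.learn(x, y)`, where `model.learn` stands in for whatever incremental regressor is used. The over-sampling variant described in the abstract would instead repeat rare examples, with a repetition count that grows with t; that rule is likewise not specified here.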

2020

Identifying Points of Interest and Similar Individuals from Raw GPS Data

Authors
Andrade, T; Gama, J;

Publication
Mobility Internet of Things 2018 - EAI/Springer Innovations in Communication and Computing

2020

Mining Human Mobility Data to Discover Locations and Habits

Authors
Andrade, T; Cancela, B; Gama, J;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II

Abstract
Many aspects of life are associated with places and human mobility patterns, and the mobile devices that individuals carry are becoming increasingly pervasive. The positioning technologies that serve these devices, such as cellular antennas (GSM networks), global navigation satellite systems (GPS), and more recently the WiFi positioning system (WPS), continuously provide large amounts of spatio-temporal data. Therefore, detecting significant places and the frequency of movements between them is fundamental to understanding human behavior. In this paper, we propose a method for discovering user habits without any a priori or external knowledge. The method introduces a density-based clustering approach for spatio-temporal data to identify meaningful places, and then applies a Gaussian Mixture Model (GMM) over the set of meaningful places to identify the representations of individual habits. To evaluate the proposed method, we use two real-world datasets. One dataset contains high-density GPS data, and the other contains GSM mobile phone data in a coarse representation. The results show that the proposed method is suitable for this task, as many unique habits were identified. The method can be used to understand users' behavior and to draw their characterizing profiles, giving a panorama of the mobility patterns in the data.
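
The first step described in this abstract, grouping position fixes into meaningful places by density, can be sketched as follows. This is plain DBSCAN over planar coordinates and not the spatio-temporal clustering method proposed in the paper, whose details the abstract does not give; the GMM step is omitted. The function names, the `eps` radius, and the `min_pts` threshold are illustrative assumptions, and the points are assumed to be already projected to a flat coordinate system (for example, metres).

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels
```

Each resulting cluster is a candidate place, and points labelled -1 are treated as noise, such as fixes recorded while travelling. This version recomputes neighbourhoods by scanning every point, which is quadratic in the number of fixes; a spatial index would be needed for high-density GPS traces.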
