Publications

Publications by João Gama

2016

Measures for Combining Prediction Intervals Uncertainty and Reliability in Forecasting

Authors
Almeida, V; Gama, J;

Publication
Proceedings of the 9th International Conference on Computer Recognition Systems, CORES 2015

Abstract
In this paper we propose a new methodology for evaluating prediction intervals (PIs). Typically, PIs are evaluated with reference to confidence values. However, other metrics should be considered, since high confidence values are associated with overly wide intervals that convey little information and are of no use for decision-making. We propose to compare the error distribution (predictions outside the interval) and the maximum mean absolute error (MAE) allowed by the confidence limits. Throughout this paper, PIs based on neural networks for short-term load forecasting are compared using two different strategies: (1) the dual perturb and combine (DPC) algorithm and (2) conformal prediction. We demonstrate that, depending on the real scenario (e.g., time of day), different algorithms perform better. The main contribution is the identification of high uncertainty levels in forecasts, which can guide decision-makers to avoid selecting risky actions under uncertain conditions. Small errors mean that decisions can be made more confidently, with less chance of confronting an unexpected future condition.

2015

Streaming networks sampling using top-K networks

Authors
Sarmento, R; Cordeiro, M; Gama, J;

Publication
ICEIS 2015 - 17th International Conference on Enterprise Information Systems, Proceedings

Abstract
The combination of a top-K network representation of the data stream with community detection is a novel approach to streaming networks sampling. By keeping an always up-to-date sample of the full network, this method has the advantage, compared to previous ones, of preserving larger communities and the original network distribution. Empirically, it will also be shown that these techniques, in conjunction with community detection, provide effective ways to perform sampling and analysis of large-scale streaming networks with power-law distributions.

2017

Co-training Semi-supervised Learning for Single-Target Regression in Data Streams Using AMRules

Authors
Sousa, R; Gama, J;

Publication
Foundations of Intelligent Systems - 23rd International Symposium, ISMIS 2017, Warsaw, Poland, June 26-29, 2017, Proceedings

Abstract
In a single-target regression context, some important systems based on data streaming produce huge quantities of unlabeled data (without output value), for which label assignment may be impossible, time-consuming or expensive. Semi-supervised methods, which include the co-training approach, were proposed to use the input information of the unlabeled examples to improve models and predictions. In the literature, co-training methods are essentially applied to classification and operate in batch mode. For these reasons, this work proposes an online co-training algorithm for single-target regression that performs model improvement with unlabeled data. This work is also the first step towards the development of online multi-target regressors that create models for multiple outputs simultaneously. The experimental framework compared the performance of this method when it rejects unlabeled data and when it uses unlabeled data with different parametrizations in the training. The results suggest that the co-training regressor predicts better when a portion of the unlabeled examples is used. However, the prediction improvements are relatively small. © Springer International Publishing AG 2017.

2014

Dynamic communities in evolving customer networks: an analysis using landmark and sliding windows

Authors
Oliveira, M; Guerreiro, A; Gama, J;

Publication
Social Network Analysis and Mining

Abstract
The widespread availability of Customer Relationship Management applications in modern organizations allows companies to collect and store vast amounts of highly detailed customer-related data. Making sense of these data using appropriate methods can yield insights into customers’ behaviour and preferences. The extracted knowledge can then be explored for marketing purposes. Social Network Analysis techniques can play a key role in business analytics. By modelling the implicit relationships among customers as a social network, it is possible to understand how patterns in these relationships translate into competitive advantages for the company. Additionally, the incorporation of the temporal dimension in such analysis can help detect market trends and changes in customers’ preferences. In this paper, we introduce a methodology to examine the dynamics of customer communities, which relies on two different time window models: a landmark and a sliding window. Landmark windows keep all the historical data and treat all nodes and links equally, even if they only appear at the early stages of the network life. Such an approach is appropriate for the long-term analysis of networks, but may fail to provide a realistic picture of the current evolution. On the other hand, sliding windows focus on the most recent past, thus allowing current events to be captured. The application of the proposed methodology to a real-world customer network suggests that both window models provide complementary information. Nevertheless, the sliding window model is better able to capture the recent changes of the network. © 2014, Springer-Verlag Wien.
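
As a rough illustration of the two window models named in this abstract, the following minimal Python sketch filters a timestamped edge list. It is not the authors' implementation: the edge format `(source, target, timestamp)`, the function names, and the toy data are all assumptions made for this example.

```python
# Hypothetical sketch of landmark vs. sliding windows over a timestamped
# edge list. Edge format and function names are illustrative assumptions.

def landmark_window(edges, now):
    """Keep every edge seen from the start of the stream up to `now`."""
    return [(u, v, t) for (u, v, t) in edges if t <= now]

def sliding_window(edges, now, width):
    """Keep only edges with a timestamp in the interval (now - width, now]."""
    return [(u, v, t) for (u, v, t) in edges if now - width < t <= now]

def nodes(window):
    """Return the set of nodes that appear in at least one edge."""
    return {n for (u, v, _) in window for n in (u, v)}

# Toy customer network: two early links and two recent ones.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 5), ("d", "e", 6)]

landmark = landmark_window(edges, now=6)
sliding = sliding_window(edges, now=6, width=2)

print(sorted(nodes(landmark)))
print(sorted(nodes(sliding)))
```

In this toy data the landmark window still contains the early nodes `a` and `b`, while the sliding window only sees the nodes involved in the most recent links, which mirrors the trade-off the abstract describes.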

2016

Novelty detection in data streams

Authors
Faria, ER; Goncalves, IJCR; de Carvalho, ACPLF; Gama, J;

Publication
Artificial Intelligence Review

Abstract
In massive data analysis, data usually come in streams. In recent years, several studies have investigated novelty detection in these data streams. Different approaches have been proposed and validated in many application domains. A review of the main aspects of these studies can provide useful information to improve the performance of existing approaches, allow their adaptation to new applications and help to identify new important issues to be addressed in future studies. This article presents and analyses different aspects of novelty detection in data streams, such as the offline and online phases, the number of classes considered at each phase, the use of an ensemble versus a single classifier, supervised and unsupervised approaches for the learning task, information used for decision model update, forgetting mechanisms for outdated concepts, concept drift treatment, how to distinguish noise and outliers from novelty concepts, classification strategies for data with unknown labels, and how to deal with recurring classes. This article also describes several applications of novelty detection in data streams investigated in the literature and discusses important challenges and future research directions.

2016

Clustering from Data Streams

Authors
Gama, J;

Publication
Encyclopedia of Machine Learning and Data Mining

Abstract