Publications

Publications by LIAAD

2016

Evolving Centralities in Temporal Graphs: A Twitter Network Analysis

Authors
Pereira, FSF; Amo, Sd; Gama, J;

Publication
IEEE 17th International Conference on Mobile Data Management, MDM 2016, Porto, Portugal, June 13-16, 2016 - Workshops

Abstract

2016

MINAS: multiclass learning algorithm for novelty detection in data streams

Authors
de Faria, ER; de Leon Ferreira Carvalho, ACPDF; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Data stream mining is an emergent research area that aims at extracting knowledge from large amounts of continuously generated data. Novelty detection (ND) is a classification task that assesses whether one or a set of examples differs significantly from the previously seen examples. This is an important task for data streams, as new concepts may appear, disappear or evolve over time. Most of the works found in the ND literature present it as a binary classification task. In several real-life data stream problems, ND must be treated as a multiclass task, in which the known concept is composed of one or more classes and different new classes may appear. This work proposes MINAS, an algorithm for ND in data streams. MINAS deals with ND as a multiclass task. In the initial training phase, MINAS builds a decision model based on a labeled data set. In the online phase, new examples are classified using this model, or marked as unknown. Groups of unknown examples can be used later to create valid novelty patterns (NP), which are added to the current model. The decision model is updated as new data arrive over the stream in order to reflect changes in the known classes and allow the addition of NP. This work also presents a set of experiments comparing MINAS with the main novelty detection algorithms found in the literature, using artificial and real data sets. The experimental results show the potential of the proposed algorithm.
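The online phase described in this abstract (classify with the current model, mark the rest as unknown, later turn groups of unknowns into a novelty pattern) can be illustrated with a small sketch. This is not the published MINAS algorithm: the micro-cluster representation as a (centroid, radius, label) tuple, the function names `classify` and `promote_unknowns`, the `min_examples` threshold, and the toy data are all assumptions made for illustration.

```python
import math

def classify(example, micro_clusters):
    """Return the label of the nearest micro-cluster covering the example, else 'unknown'."""
    best_label, best_dist = "unknown", math.inf
    for centroid, radius, label in micro_clusters:
        dist = math.dist(example, centroid)
        if dist <= radius and dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def promote_unknowns(buffer, micro_clusters, label, min_examples=3):
    """Turn a large enough buffer of unknowns into one new micro-cluster (simplified)."""
    if len(buffer) < min_examples:
        return False
    dims = len(buffer[0])
    centroid = tuple(sum(p[d] for p in buffer) / len(buffer) for d in range(dims))
    radius = max(math.dist(p, centroid) for p in buffer)
    micro_clusters.append((centroid, radius, label))
    buffer.clear()
    return True

# Hypothetical initial model with two known classes.
model = [((0.0, 0.0), 1.0, "class_a"), ((5.0, 5.0), 1.0, "class_b")]
stream = [(0.2, 0.1), (5.1, 4.8), (10.0, 10.0), (10.2, 9.9), (9.9, 10.1), (10.1, 10.0)]

labels, buffer = [], []
for example in stream:
    label = classify(example, model)
    if label == "unknown":
        buffer.append(example)
        promote_unknowns(buffer, model, "novelty_1")
    labels.append(label)
```

In this toy run, the three examples near (10, 10) are first marked as unknown, and the later example in the same region is assigned to the newly added pattern. A real implementation would cluster the unknown buffer and validate each candidate group before adding it to the model.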

2016

Tensor-based anomaly detection: An interdisciplinary survey

Authors
Fanaee T, H; Gama, J;

Publication
KNOWLEDGE-BASED SYSTEMS

Abstract
Traditional spectral-based methods such as PCA are popular for anomaly detection in a variety of problems and domains. However, if data includes tensor (multiway) structure (e.g. space-time-measurements), some meaningful anomalies may remain invisible with these methods. Although tensor-based anomaly detection (TAD) has been applied within a variety of disciplines over the last twenty years, it is not yet recognized as a formal category in anomaly detection. This survey aims to highlight the potential of tensor-based techniques as a novel approach for detection and identification of abnormalities and failures. We survey the interdisciplinary works in which TAD is reported and characterize the learning strategies, methods and applications; extract the important open issues in TAD and provide the corresponding existing solutions according to the state-of-the-art.

2016

Dynamic credit score modeling with short-term and long-term memories: the case of Freddie Mac's database

Authors
Sousa, MR; Gama, J; Brandao, E;

Publication
JOURNAL OF RISK MODEL VALIDATION

Abstract
In this paper, we investigate the two mechanisms of memory, short-term memory (STM) and long-term memory (LTM), in the context of credit risk assessment. These components are fundamental to learning but are overlooked in credit risk modeling frameworks. As a consequence, current models are insensitive to changes, such as population drifts or periods of financial distress. We extend the typical development of credit score modeling based on static learning settings to the use of dynamic learning frameworks. Exploring different amounts of memory enables a better adaptation of the model to the current state. This is particularly relevant during shocks, when limited memory is required for a rapid adjustment. At other times, a long memory is favored. An empirical study relying on the Freddie Mac database, with 16.7 million mortgage loans granted in the United States from 1999 to 2013, suggests using a dynamic modeling of STM and LTM components to optimize current rating frameworks.
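The contrast between short and long memory can be illustrated with a simple windowed estimate. The sketch below is only an illustration of the idea, not the model from the paper: the function `windowed_rate`, the window size, and the toy default series are assumptions.

```python
def windowed_rate(defaults, window=None):
    """Default rate over the last `window` observations (all history if None)."""
    recent = defaults if window is None else defaults[-window:]
    return sum(recent) / len(recent)

# Hypothetical series: 1 = default, 0 = repaid; a shock hits the last 5 loans.
history = [0] * 20 + [1, 1, 0, 1, 1]

short_term = windowed_rate(history, window=5)  # reacts quickly to the shock
long_term = windowed_rate(history)             # dominated by the calm period
```

With this toy series the short window reports a much higher default rate than the full history, which is the kind of rapid adjustment the abstract associates with limited memory during shocks.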

2016

How to Correctly Evaluate an Automatic Bioacoustics Classification Method

Authors
Colonna, JG; Gama, J; Nakamura, EF;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2016

Abstract
In this work, we introduce a more appropriate (or alternative) approach to evaluate the performance and the generalization capabilities of a framework for automatic anuran call recognition. We show that, by using the common k-fold Cross-Validation (k-CV) procedure to evaluate the expected error in a syllable-based recognition system, the recognition accuracy is overestimated. To overcome this problem, and to provide a fair evaluation, we propose a new CV procedure in which the specimen information is considered during the split step of the k-CV. Therefore, we performed a k-CV by specimens (or individuals), showing that the accuracy of the system decreases considerably. By introducing the specimen information, we are able to answer a more fundamental question: Given a set of syllables that belong to a specific group of individuals, can we recognize new specimens of the same species? In this article, we go deeper into the reviews and the experimental evaluations to answer this question.
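Splitting by specimen rather than by syllable can be sketched as follows. This is a minimal illustration of group-wise k-fold splitting, not the authors' code: the function name `specimen_k_fold`, the round-robin fold assignment, and the toy specimen labels are assumptions.

```python
from collections import defaultdict

def specimen_k_fold(specimen_ids, k):
    """Yield (train_idx, test_idx) pairs so that no specimen appears in both sets."""
    by_specimen = defaultdict(list)
    for idx, specimen in enumerate(specimen_ids):
        by_specimen[specimen].append(idx)
    specimens = sorted(by_specimen)
    if k > len(specimens):
        raise ValueError("k cannot exceed the number of specimens")
    # Assign whole specimens to folds in round-robin order.
    folds = [specimens[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        test_idx = [i for s in held_out for i in by_specimen[s]]
        train_idx = [i for i, s in enumerate(specimen_ids) if s not in held]
        yield train_idx, test_idx

# Hypothetical toy data: one specimen label per syllable.
specimen_ids = ["frog_a", "frog_a", "frog_b", "frog_b", "frog_c", "frog_c"]
splits = list(specimen_k_fold(specimen_ids, k=3))
```

Because every syllable of a held-out specimen lands in the test set, the classifier is always evaluated on individuals it has never seen, which is the condition the abstract argues ordinary k-CV fails to enforce.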

2016

Measures for Combining Prediction Intervals Uncertainty and Reliability in Forecasting

Authors
Almeida, V; Gama, J;

Publication
PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS, CORES 2015

Abstract
In this paper we propose a new methodology for evaluating prediction intervals (PIs). Typically, PIs are evaluated with reference to confidence values. However, other metrics should be considered, since high confidence values are associated with overly wide intervals that convey little information and are of no use for decision-making. We propose to compare the error distribution (predictions out of the interval) and the maximum mean absolute error (MAE) allowed by the confidence limits. Throughout this paper, PIs based on neural networks for short-term load forecasting are compared using two different strategies: (1) the dual perturb and combine (DPC) algorithm and (2) conformal prediction. We demonstrate that, depending on the real scenario (e.g., time of day), different algorithms perform better. The main contribution is the identification of high uncertainty levels in forecasts that can guide decision-makers to avoid the selection of risky actions under uncertain conditions. Small errors mean that decisions can be made more confidently with less chance of confronting a future unexpected condition.
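The tension between coverage and interval width can be made concrete with a small sketch. The quantities below (empirical coverage, mean width, and mean distance of out-of-interval observations from the nearest bound) are generic diagnostics chosen for illustration, not the specific measures proposed in the paper; the function name `evaluate_intervals` and the toy values are assumptions.

```python
def evaluate_intervals(y_true, lower, upper):
    """Summarize coverage, width, and out-of-interval error for prediction intervals."""
    n = len(y_true)
    covered = 0
    total_width = 0.0
    misses = []
    for y, lo, hi in zip(y_true, lower, upper):
        total_width += hi - lo
        if lo <= y <= hi:
            covered += 1
        else:
            # Distance from the observation to the nearest interval bound.
            misses.append(lo - y if y < lo else y - hi)
    return {
        "coverage": covered / n,
        "mean_width": total_width / n,
        "mean_miss": sum(misses) / len(misses) if misses else 0.0,
    }

# Hypothetical load observations and interval bounds.
y_true = [10.0, 12.0, 15.0, 11.0]
lower = [9.0, 12.5, 13.0, 8.0]
upper = [11.0, 14.0, 16.0, 10.0]
report = evaluate_intervals(y_true, lower, upper)
```

Reporting width and miss distance alongside coverage shows when a method reaches its coverage only by producing intervals too wide to be useful.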
