Publications - INESC TEC

Publications

Publications by João Gama

2016

Dynamic credit score modeling with short-term and long-term memories: the case of Freddie Mac's database

Authors
Sousa, MR; Gama, J; Brandao, E;

Publication
JOURNAL OF RISK MODEL VALIDATION

Abstract
In this paper, we investigate the two mechanisms of memory, short-term memory (STM) and long-term memory (LTM), in the context of credit risk assessment. These components are fundamental to learning but are overlooked in credit risk modeling frameworks. As a consequence, current models are insensitive to changes, such as population drifts or periods of financial distress. We extend the typical development of credit score modeling based on static learning settings to the use of dynamic learning frameworks. Exploring different amounts of memory enables a better adaptation of the model to the current state. This is particularly relevant during shocks, when limited memory is required for a rapid adjustment. At other times, a long memory is favored. An empirical study relying on the Freddie Mac database, with 16.7 million mortgage loans granted in the United States from 1999 to 2013, suggests using a dynamic modeling of STM and LTM components to optimize current rating frameworks.


2013

Preface

Authors
Rodrigues, PP; Pechenizkiy, M; Gama, J; Correia, RC; Liu, J; Traina, A; Lucas, P; Soda, P;

Publication
Proceedings of CBMS 2013 - 26th IEEE International Symposium on Computer-Based Medical Systems

2015

Very fast decision rules for classification in data streams

Authors
Kosina, P; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. Many decision tasks can be formulated as stream mining problems and therefore many new algorithms for data streams are being proposed. Decision rules are one of the most interpretable and flexible models for predictive data mining. Nevertheless, few algorithms have been proposed in the literature to learn rule models for time-changing and high-speed flows of data. In this paper we present the very fast decision rules (VFDR) algorithm and discuss interesting extensions to the base version. All the proposed versions are one-pass and any-time algorithms. They work on-line and learn ordered or unordered rule sets. Algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. In order to manage these situations we also present the adaptive extension (AVFDR) to detect changes in the process generating data and adapt the decision model. Detecting local drifts takes advantage of the modularity of the rule sets. In AVFDR, each individual rule monitors the evolution of performance metrics to detect concept drift. AVFDR prunes rules whenever a drift is signaled. This explicit change detection mechanism provides useful information about the dynamics of the process generating data, enables faster adaptation to changes, and generates more compact rule sets. The experimental evaluation demonstrates that the algorithms achieve competitive results in comparison to alternative methods and that the adaptive methods are able to learn fast and compact rule sets from evolving streams.


2017

Fading histograms in detecting distribution and concept changes

Authors
Sebastião, R; Gama, J; Mendonça, T;

Publication
I. J. Data Science and Analytics

Abstract
The remarkable number of real applications under dynamic scenarios is driving a novel ability to generate and gather information. Nowadays, a massive amount of information is generated at a high-speed rate, known as data streams. Moreover, data are collected under evolving environments. Due to memory restrictions, data must be promptly processed and discarded immediately. Therefore, dealing with evolving data streams raises two main questions: (i) how to remember discarded data? and (ii) how to forget outdated data? To maintain an updated representation of the time-evolving data, this paper proposes fading histograms. Regarding the dynamics of nature, changes in data are detected through a windowing scheme that compares data distributions computed by the fading histograms: the adaptive cumulative windows model (ACWM). The online monitoring of the distance between data distributions is evaluated using a dissimilarity measure based on the asymmetry of the Kullback–Leibler divergence. The experimental results support the ability of fading histograms in providing an updated representation of data. Such property works in favor of detecting distribution changes with smaller detection delay time when compared with standard histograms. With respect to the detection of concept changes, the ACWM is compared with 3 known algorithms taken from the literature, using artificial data and using public data sets, presenting better results. Furthermore, the proposed method was extended to multidimensional data, and the experiments performed show the ability of the ACWM for detecting distribution changes in these settings.
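
To make the fading idea in this abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes equal-width bins over a known range and an exponential fading factor `alpha`. The names `FadingHistogram` and `kl_divergence` are hypothetical, and the plain smoothed Kullback–Leibler divergence used here is a stand-in for the paper's asymmetry-based dissimilarity measure, which the abstract does not specify.

```python
import math


class FadingHistogram:
    """Equal-width histogram whose counts decay by `alpha` on every update.

    Assumption: the value range [low, high) and the number of bins are
    known in advance; the abstract does not state how bins are chosen.
    """

    def __init__(self, low, high, n_bins, alpha=0.99):
        if not (0.0 < alpha <= 1.0):
            raise ValueError("alpha must be in (0, 1]")
        if n_bins < 1 or high <= low:
            raise ValueError("need n_bins >= 1 and high > low")
        self.low, self.high = low, high
        self.n_bins, self.alpha = n_bins, alpha
        self.counts = [0.0] * n_bins

    def _bin(self, x):
        # Out-of-range values are clamped into the first or last bin.
        width = (self.high - self.low) / self.n_bins
        idx = int((x - self.low) / width)
        return min(max(idx, 0), self.n_bins - 1)

    def update(self, x):
        # Fade every old count, then add the new observation with weight 1.
        self.counts = [c * self.alpha for c in self.counts]
        self.counts[self._bin(x)] += 1.0

    def frequencies(self):
        # Normalised bin weights; uniform if nothing has been observed yet.
        total = sum(self.counts)
        if total == 0:
            return [1.0 / self.n_bins] * self.n_bins
        return [c / total for c in self.counts]


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) with additive smoothing so empty bins do not divide by zero.

    Assumption: a stand-in for the paper's dissimilarity measure.
    """
    ps = [a + eps for a in p]
    qs = [b + eps for b in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(ps, qs))
```

With `alpha = 1.0` the structure behaves like a standard histogram; smaller values give recent observations more weight, which is what lets the summary track a changing distribution without storing the discarded data.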


2014

Unsupervised density-based behavior change detection in data streams

Authors
Vallim, RMM; Andrade Filho, JA; de Mello, RF; de Carvalho, ACPLF; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
The ability to detect changes in the data distribution is an important issue in Data Stream mining. Detecting changes in data distribution allows the adaptation of a previously learned model to accommodate the most recent data and, therefore, improve its prediction capability. This paper proposes a framework for non-supervised automatic change detection in Data Streams called M-DBScan. This framework is composed of a density-based clustering step followed by a novelty detection procedure based on entropy level measures. This work uses two different types of entropy measures, where one considers the spatial distribution of data while the other models temporal relations between observations in the stream. The performance of the method is assessed in a set of experiments comparing M-DBScan with a proximity-based approach. Experimental results provide important insight on how to design change detection mechanisms for streams.


2016

How to Correctly Evaluate an Automatic Bioacoustics Classification Method

Authors
Colonna, JG; Gama, J; Nakamura, EF;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2016

Abstract
In this work, we introduce a more appropriate (or alternative) approach to evaluate the performance and the generalization capabilities of a framework for automatic anuran call recognition. We show that, by using the common k-folds Cross-Validation (k-CV) procedure to evaluate the expected error in a syllable-based recognition system, the recognition accuracy is overestimated. To overcome this problem, and to provide a fair evaluation, we propose a new CV procedure in which the specimen information is considered during the split step of the k-CV. Therefore, we performed a k-CV by specimens (or individuals), showing that the accuracy of the system decreases considerably. By introducing the specimen information, we are able to answer a more fundamental question: Given a set of syllables that belong to a specific group of individuals, can we recognize new specimens of the same species? In this article, we go deeper into the reviews and the experimental evaluations to answer this question.
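
The split-by-specimen idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `specimen_k_fold` and the round-robin assignment of specimens to folds are assumptions made for the example. The only property it aims to show is that all syllables from one specimen land on the same side of every split.

```python
from collections import defaultdict


def specimen_k_fold(specimen_ids, k):
    """Yield (train_idx, test_idx) pairs so that no specimen is in both.

    `specimen_ids[i]` is the (hypothetical) identifier of the individual
    that produced syllable i. Specimens are assigned to folds round-robin
    in sorted order, which is an assumption of this sketch.
    """
    by_specimen = defaultdict(list)
    for i, s in enumerate(specimen_ids):
        by_specimen[s].append(i)

    specimens = sorted(by_specimen)
    if k < 2 or k > len(specimens):
        raise ValueError("k must be between 2 and the number of specimens")

    for fold in range(k):
        held_out = specimens[fold::k]          # specimens tested in this fold
        held_out_set = set(held_out)
        test_idx = [i for s in held_out for i in by_specimen[s]]
        train_idx = [i for i, s in enumerate(specimen_ids)
                     if s not in held_out_set]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Six syllables recorded from four individuals.
    ids = ["a", "a", "b", "b", "c", "d"]
    for train_idx, test_idx in specimen_k_fold(ids, k=2):
        print("train:", train_idx, "test:", test_idx)
```

A standard k-CV would shuffle the six syllables freely, so syllables from specimen `a` could appear in both the training and the test set; that overlap is the source of the overestimated accuracy the abstract describes.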
