Publications

Publications by João Gama

2016

Dynamic credit score modeling with short-term and long-term memories: the case of Freddie Mac's database

Authors
Sousa, MR; Gama, J; Brandao, E;

Publication
JOURNAL OF RISK MODEL VALIDATION

Abstract
In this paper, we investigate the two mechanisms of memory, short-term memory (STM) and long-term memory (LTM), in the context of credit risk assessment. These components are fundamental to learning but are overlooked in credit risk modeling frameworks. As a consequence, current models are insensitive to changes, such as population drifts or periods of financial distress. We extend the typical development of credit score modeling based on static learning settings to the use of dynamic learning frameworks. Exploring different amounts of memory enables a better adaptation of the model to the current state. This is particularly relevant during shocks, when limited memory is required for a rapid adjustment. At other times, a long memory is favored. An empirical study relying on the Freddie Mac database, with 16.7 million mortgage loans granted in the United States from 1999 to 2013, suggests using dynamic modeling of STM and LTM components to optimize current rating frameworks.
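The trade-off between limited and long memory described in this abstract can be illustrated with a toy example. The sketch below is not the authors' model: it only compares a running default rate computed over a short window and over a long window on a made-up outcome stream. The function name `windowed_rates`, the window sizes, and the synthetic data are all assumptions chosen for illustration.

```python
from collections import deque


def windowed_rates(outcomes, window):
    """Running default rate over the last `window` outcomes (1 = default)."""
    recent = deque(maxlen=window)  # oldest outcomes fall out automatically
    rates = []
    for y in outcomes:
        recent.append(y)
        rates.append(sum(recent) / len(recent))
    return rates


# Hypothetical stream: a calm period with 5% defaults, then a shock with 50%.
calm = [1 if i % 20 == 0 else 0 for i in range(200)]
shock = [1 if i % 2 == 0 else 0 for i in range(40)]
stream = calm + shock

short_memory = windowed_rates(stream, window=20)   # STM-like estimate
long_memory = windowed_rates(stream, window=200)   # LTM-like estimate

print("end of calm period:", short_memory[199], long_memory[199])
print("end of shock:      ", short_memory[-1], long_memory[-1])
```

In this toy stream both estimates agree at the end of the calm period, while after the shock the short window already reflects the new default rate and the long window still lags behind it.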

2013

Preface

Authors
Rodrigues, PP; Pechenizkiy, M; Gama, J; Correia, RC; Liu, J; Traina, A; Lucas, P; Soda, P;

Publication
Proceedings of CBMS 2013 - 26th IEEE International Symposium on Computer-Based Medical Systems

Abstract

2015

Very fast decision rules for classification in data streams

Authors
Kosina, P; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. Many decision tasks can be formulated as stream mining problems and therefore many new algorithms for data streams are being proposed. Decision rules are one of the most interpretable and flexible models for predictive data mining. Nevertheless, few algorithms have been proposed in the literature to learn rule models for time-changing and high-speed flows of data. In this paper we present the very fast decision rules (VFDR) algorithm and discuss interesting extensions to the base version. All the proposed versions are one-pass and any-time algorithms. They work on-line and learn ordered or unordered rule sets. Algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. In order to manage these situations we also present the adaptive extension (AVFDR) to detect changes in the process generating data and adapt the decision model. Detecting local drifts takes advantage of the modularity of the rule sets. In AVFDR, each individual rule monitors the evolution of performance metrics to detect concept drift. AVFDR prunes rules whenever a drift is signaled. This explicit change detection mechanism provides useful information about the dynamics of the process generating data, adapts faster to changes, and generates more compact rule sets. The experimental evaluation demonstrates that the algorithms achieve competitive results in comparison to alternative methods and that the adaptive methods are able to learn fast and compact rule sets from evolving streams.

2017

Fading histograms in detecting distribution and concept changes

Authors
Sebastião, R; Gama, J; Mendonça, T;

Publication
I. J. Data Science and Analytics

Abstract

2014

Unsupervised density-based behavior change detection in data streams

Authors
Vallim, RMM; Andrade Filho, JA; de Mello, RF; de Carvalho, ACPLF; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
The ability to detect changes in the data distribution is an important issue in Data Stream mining. Detecting changes in data distribution allows the adaptation of a previously learned model to accommodate the most recent data and, therefore, improve its prediction capability. This paper proposes a framework for unsupervised automatic change detection in Data Streams called M-DBScan. This framework is composed of a density-based clustering step followed by a novelty detection procedure based on entropy level measures. This work uses two different types of entropy measures, where one considers the spatial distribution of data while the other models temporal relations between observations in the stream. The performance of the method is assessed in a set of experiments comparing M-DBScan with a proximity-based approach. Experimental results provide important insight into how to design change detection mechanisms for streams.

2016

How to Correctly Evaluate an Automatic Bioacoustics Classification Method

Authors
Colonna, JG; Gama, J; Nakamura, EF;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2016

Abstract
In this work, we introduce a more appropriate (or alternative) approach to evaluate the performance and the generalization capabilities of a framework for automatic anuran call recognition. We show that using the common k-fold Cross-Validation (k-CV) procedure to evaluate the expected error in a syllable-based recognition system overestimates the recognition accuracy. To overcome this problem, and to provide a fair evaluation, we propose a new CV procedure in which the specimen information is considered during the split step of the k-CV. We therefore performed a k-CV by specimens (or individuals), showing that the accuracy of the system decreases considerably. By introducing the specimen information, we are able to answer a more fundamental question: Given a set of syllables that belong to a specific group of individuals, can we recognize new specimens of the same species? In this article, we go deeper into the reviews and the experimental evaluations to answer this question.
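The idea of splitting by specimen rather than by syllable can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure from the paper: the function name `specimen_k_fold`, the round-robin assignment of specimens to folds, and the specimen labels are all hypothetical.

```python
from collections import defaultdict


def specimen_k_fold(specimen_ids, k):
    """Yield (train_idx, test_idx) pairs in which no specimen is in both."""
    # Group syllable indices by the specimen that produced them.
    by_specimen = defaultdict(list)
    for idx, specimen in enumerate(specimen_ids):
        by_specimen[specimen].append(idx)

    specimens = sorted(by_specimen)
    if k > len(specimens):
        raise ValueError("k cannot exceed the number of specimens")

    for fold in range(k):
        # Assumption: specimens are assigned to folds round-robin.
        test_specimens = set(specimens[fold::k])
        test_idx = [i for s in specimens if s in test_specimens
                    for i in by_specimen[s]]
        train_idx = [i for s in specimens if s not in test_specimens
                     for i in by_specimen[s]]
        yield train_idx, test_idx


# Hypothetical data: one specimen label per syllable.
specimen_ids = ["frog_a", "frog_a", "frog_b", "frog_b",
                "frog_c", "frog_c", "frog_d"]
folds = list(specimen_k_fold(specimen_ids, k=2))

for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Because every syllable of a held-out specimen lands in the test split, the measured accuracy reflects recognition of unseen individuals. Where scikit-learn is available, its `GroupKFold` splitter provides group-aware splitting of this kind.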
