Publications - INESC TEC

Publications

Publications by João Gama

2013

Evaluation Methodology for Multiclass Novelty Detection Algorithms

Authors
Faria, ER; Goncalves, IJCR; Gama, J; Carvalho, ACPLF;

Publication
2013 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS)

Abstract
Novelty detection is a useful ability for learning systems, especially in data stream scenarios, where new concepts can appear, known concepts can disappear, and concepts can evolve over time. Several studies in the literature investigate the use of machine learning classification techniques for novelty detection in data streams. However, there is no consensus on how to evaluate the performance of these techniques, particularly for multiclass problems. In this study, we propose a new evaluation approach for multiclass novelty detection problems in data streams. This approach is able to deal with: i) multiclass problems; ii) a confusion matrix with a column representing the unknown examples; iii) a confusion matrix that grows over time; iv) unsupervised learning, which generates novelties without an association with the problem classes; and v) representation of the evaluation measures over time. We evaluate the performance of the proposed approach using known novelty detection algorithms with artificial and real data sets.
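To make items ii), iii) and v) of the abstract concrete, here is a minimal Python sketch of a confusion matrix that has a dedicated column for unknown examples, gains rows and columns as new labels appear, and can report a measure after every example. It is an illustration under stated assumptions, not the authors' implementation: the class name `GrowingConfusionMatrix`, the `UNKNOWN` label, the novelty-pattern label `"NP1"`, and the `unknown_rate` measure are all hypothetical.

```python
from collections import defaultdict

UNKNOWN = "unknown"  # hypothetical label for examples the model leaves unclassified


class GrowingConfusionMatrix:
    """Confusion matrix whose rows and columns are added as new labels appear."""

    def __init__(self):
        self.counts = defaultdict(int)      # (true_label, predicted_label) -> count
        self.true_labels = []               # rows, in order of first appearance
        self.predicted_labels = [UNKNOWN]   # columns; the unknown column always exists

    def update(self, true_label, predicted_label=None):
        """Record one example; predicted_label=None means it was marked unknown."""
        if predicted_label is None:
            predicted_label = UNKNOWN
        if true_label not in self.true_labels:
            self.true_labels.append(true_label)
        if predicted_label not in self.predicted_labels:
            self.predicted_labels.append(predicted_label)
        self.counts[(true_label, predicted_label)] += 1

    def unknown_rate(self):
        """Fraction of all examples seen so far that fell in the unknown column."""
        total = sum(self.counts.values())
        if total == 0:
            return 0.0
        unknown = sum(c for (_, p), c in self.counts.items() if p == UNKNOWN)
        return unknown / total

    def as_rows(self):
        """Dense view: one row per true label, one column per predicted label."""
        return [[self.counts.get((t, p), 0) for p in self.predicted_labels]
                for t in self.true_labels]


# Hypothetical stream of (true label, prediction) pairs. "NP1" stands for a
# novelty pattern produced without any association to a problem class.
stream = [("A", "A"), ("B", "A"), ("C", None), ("C", "NP1"), ("A", "A")]

cm = GrowingConfusionMatrix()
history = []  # the measure tracked over time, one value per example
for true_label, predicted in stream:
    cm.update(true_label, predicted)
    history.append(cm.unknown_rate())
```

Storing counts in a dictionary keyed by label pairs avoids resizing a dense array each time a new class or novelty pattern shows up; the dense view is built only on request.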


2016

Evolution Analysis of Call Ego-Networks

Authors
Tabassum, S; Gama, J;

Publication
DISCOVERY SCIENCE, (DS 2016)

Abstract
With networks now recognized in many real-world domains, research in network science has gained much attention. Real-world interaction networks are exploited to gain insights into real-world connections. One line of inquiry is to analyze how these networks grow and evolve. Most existing work relies on socio-centric networks. A socio-centric network comprises several ego networks, and how these ego networks evolve greatly influences the structure of the network. In this work, we analyze the evolution of ego networks from a massive call network stream using an extensive list of graph metrics. By doing this, we study the evolution of the structural properties of the graph and relate them to real-world user behaviors. We also proved the densification power law over the temporal call ego networks. Many evolving networks obey the densification power law, and the number of edges increases as a function of time. Therefore, we discuss a sequential sampling method with a forgetting factor to sample the evolving ego network stream. This method captures the most active and recent nodes from the network while preserving the tie strengths between them, maintaining the density of the graph, and decreasing redundancy.
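The abstract does not spell out the sampling method, so the following Python sketch shows only one plausible reading of "sequential sampling with a forgetting factor": stored tie strengths are multiplied by a factor `alpha` at each step, new calls add weight, and ties that fade below a threshold are dropped. The function name `update_ego_sample`, the parameters `alpha` and `threshold`, and the node names are assumptions made for illustration, not details taken from the paper.

```python
def update_ego_sample(weights, new_calls, alpha=0.9, threshold=0.1):
    """Decay stored tie strengths by alpha, add the new calls, drop faded ties.

    weights   -- dict mapping (caller, callee) to the current tie strength
    new_calls -- iterable of (caller, callee) pairs seen in this time step
    alpha     -- hypothetical forgetting factor in (0, 1); smaller forgets faster
    threshold -- hypothetical cutoff below which a tie leaves the sample
    """
    decayed = {edge: w * alpha for edge, w in weights.items()}
    for caller, callee in new_calls:
        edge = (caller, callee)
        decayed[edge] = decayed.get(edge, 0.0) + 1.0
    return {edge: w for edge, w in decayed.items() if w >= threshold}


# Three hypothetical time steps of calls made by one ego node.
batches = [
    [("ego", "a"), ("ego", "b")],
    [("ego", "a")],
    [("ego", "a"), ("ego", "c")],
]

sample = {}
for calls in batches:
    sample = update_ego_sample(sample, calls, alpha=0.5, threshold=0.3)

# The tie to "b" was seen once and has faded out; "a" is active and recent.
print(sample)
```

With this scheme, a frequently called contact keeps a high weight, a contact called once fades away, and the sample size stays bounded even as the stream keeps growing.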


2016

On Using Temporal Networks to Analyze User Preferences Dynamics

Authors
Pereira, FSF; de Amo, S; Gama, J;

Publication
DISCOVERY SCIENCE, (DS 2016)

Abstract
User preferences are fairly dynamic, since users tend to exploit a wide range of information and modify their tastes accordingly over time. Existing models and formulations are too constrained to capture the complexity of this underlying phenomenon. In this paper, we investigate the interplay between user preferences and social networks over time. We propose to analyze the dynamics of a user's preferences together with his or her social network, modeled as a temporal network. First, we define a temporal preference model for reasoning with preferences. Then, we use evolving centralities from temporal networks to link with preference dynamics. Our results indicate that modeling Twitter as a temporal network is more appropriate for analyzing user preference dynamics than using just snapshots of a static network.


2014

A Survey on Concept Drift Adaptation

Authors
Gama, J; Zliobaite, I; Bifet, A; Pechenizkiy, M; Bouchachia, A;

Publication
ACM COMPUTING SURVEYS

Abstract
Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to concept drift adaptation for researchers, industry analysts, and practitioners.


2016

A new dynamic modeling framework for credit risk assessment

Authors
Sousa, MR; Gama, J; Brandao, E;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
We propose a new dynamic modeling framework for credit risk assessment that extends the prevailing credit scoring models, which are built on static settings of historical data. The driving idea mimics the principle of films, by composing the model with a sequence of snapshots, rather than a single photograph. In doing so, the dynamic modeling consists of sequential learning from the new incoming data. A key contribution is provided by the insight that different amounts of memory can be explored concurrently. Memory refers to the amount of historic data being used for estimation. This is important in the credit risk area, which often seems to undergo shocks. During a shock, limited memory is important. At other times, a larger memory has merit. An application to a real-world financial dataset of credit cards from a financial institution in Brazil illustrates our methodology, which is able to consistently outperform the static modeling schema.
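The idea of exploring several memory lengths concurrently can be sketched in Python as follows. This is a simplified illustration, not the paper's framework: a windowed mean of a default rate stands in for a credit scoring model, and the function names `windowed_mean` and `best_memory`, the candidate memories, and the data are all hypothetical.

```python
def windowed_mean(history, memory):
    """Estimate from only the most recent `memory` observations."""
    recent = history[-memory:]
    return sum(recent) / len(recent)


def best_memory(history, memories, eval_size):
    """Pick the memory length whose one-step-ahead forecasts had the lowest
    mean absolute error over the last `eval_size` observations.

    Requires len(history) > eval_size so every forecast has some past data.
    """
    errors = {}
    start = len(history) - eval_size
    for memory in memories:
        total = 0.0
        for t in range(start, len(history)):
            forecast = windowed_mean(history[:t], memory)  # uses data before t only
            total += abs(history[t] - forecast)
        errors[memory] = total / eval_size
    return min(errors, key=errors.get), errors


# Hypothetical default rates: a stable period followed by a shock.
default_rates = [0.02] * 8 + [0.10] * 4

best, errors = best_memory(default_rates, memories=(2, 8), eval_size=3)
print(best, errors)
```

On this toy series the short memory wins right after the shock because it stops averaging in the pre-shock values; on a long stable stretch the larger memory would give the less noisy estimate.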


2015

Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality

Authors
Saez, C; Rodrigues, P; Gama, J; Robles, M; Garcia Gomez, JM;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Knowledge discovery on biomedical data can be based on on-line, data-stream analyses, or on retrospective, timestamped, off-line datasets. In both cases, changes over time in the processes that generate data or in their quality features may hinder either the knowledge discovery process or the generalization of past knowledge. These problems can be seen as a lack of temporal stability in the data. This work establishes temporal stability as a data quality dimension and proposes new methods for its assessment based on a probabilistic framework. Concretely, methods are proposed for (1) monitoring changes, and (2) characterizing changes and trends and detecting temporal subgroups. First, a probabilistic change detection algorithm is proposed based on the Statistical Process Control of the posterior Beta distribution of the Jensen-Shannon distance, with a memoryless forgetting mechanism. This algorithm (PDF-SPC) classifies the degree of current change into three states: In-Control, Warning, and Out-of-Control. Second, a novel method is proposed to visualize and characterize the temporal changes of data based on the projection of a non-parametric information-geometric statistical manifold of time windows. This projection facilitates the exploration of temporal trends using the proposed IGT-plot and, by means of unsupervised learning methods, the discovery of conceptually-related temporal subgroups. Methods are evaluated using real and simulated data based on the National Hospital Discharge Survey (NHDS) dataset.
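As a rough illustration of the quantity being monitored, the Python sketch below computes the Jensen-Shannon distance between the distributions of two time windows and maps it to the three states named in the abstract. It deliberately swaps in fixed thresholds for the posterior Beta distribution and forgetting mechanism that PDF-SPC uses, so it is not the PDF-SPC algorithm; the function `classify_change` and the threshold values `warning` and `out_of_control` are assumptions.

```python
import math


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2), in [0, 1].

    p and q are discrete probability distributions over the same categories.
    """
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]  # mixture distribution
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def classify_change(distance, warning=0.2, out_of_control=0.4):
    """Map a distance to a state using fixed, hypothetical thresholds."""
    if distance >= out_of_control:
        return "Out-of-Control"
    if distance >= warning:
        return "Warning"
    return "In-Control"


# Hypothetical category frequencies of one variable in three time windows.
reference = [0.5, 0.3, 0.2]
same = [0.5, 0.3, 0.2]
disjoint_a, disjoint_b = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]

d_same = jensen_shannon_distance(reference, same)
d_disjoint = jensen_shannon_distance(disjoint_a, disjoint_b)
state_same = classify_change(d_same)
state_disjoint = classify_change(d_disjoint)
```

The distance is bounded between 0 (identical distributions) and 1 (distributions with no overlap) when base-2 logarithms are used, which is what makes a small set of control thresholds meaningful across variables.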
