Publications

Publications by LIAAD

2016

Adaptive Model Rules From High-Speed Data Streams

Authors
Duarte, J; Gama, J; Bifet, A;

Publication
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA

Abstract
Decision rules are one of the most expressive and interpretable models for machine learning. In this article, we present Adaptive Model Rules (AMRules), the first stream rule learning algorithm for regression problems. In AMRules, the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. To keep the regression model consistent with the most recent state of the process generating the data, each rule uses a Page-Hinkley test to detect changes in this process and reacts to them by pruning the rule set. Online learning can be strongly affected by outliers, so AMRules is also equipped with outlier detection mechanisms that prevent the model from adapting to anomalous examples. In the experimental section, we report the results of AMRules on benchmark regression problems and compare the performance of our system with other streaming regression algorithms.
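
As an illustration of the change detection step mentioned in this abstract, the following is a minimal sketch of a Page-Hinkley test for an upward shift in the mean of a monitored signal. It is not the AMRules implementation: the choice of a rule's absolute prediction error as the monitored signal, the class name `PageHinkley`, the helper `first_alarm`, and the values of `delta` and `threshold` are all assumptions made for the example.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream (sketch)."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # alarm threshold, often written lambda (assumed value)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # smallest cumulative deviation seen so far, M_t

    def update(self, x):
        """Feed one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(stream, detector):
    """Return the index of the first alarm, or None if no change is signalled."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Hypothetical error stream: stable at 1.0, then jumping to 5.0 at index 200.
errors = [1.0] * 200 + [5.0] * 200
alarm_at = first_alarm(errors, PageHinkley())
print("first alarm at index:", alarm_at)
```

In a rule-based learner, an alarm of this kind would be the trigger for removing the affected rule; how AMRules does that is described in the article, not in this sketch.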

2016

Evolving Centralities in Temporal Graphs: A Twitter Network Analysis

Authors
Pereira, FSF; Amo, Sd; Gama, J;

Publication
IEEE 17th International Conference on Mobile Data Management, MDM 2016, Porto, Portugal, June 13-16, 2016 - Workshops

Abstract

2016

MINAS: multiclass learning algorithm for novelty detection in data streams

Authors
de Faria, ER; de Leon Ferreira Carvalho, ACPDF; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Data stream mining is an emerging research area that aims at extracting knowledge from large amounts of continuously generated data. Novelty detection (ND) is a classification task that assesses whether one example or a set of examples differs significantly from previously seen examples. This is an important task for data streams, as new concepts may appear, disappear or evolve over time. Most works in the ND literature present it as a binary classification task. In several real-life data stream problems, ND must be treated as a multiclass task, in which the known concept is composed of one or more classes and different new classes may appear. This work proposes MINAS, an algorithm for ND in data streams. MINAS deals with ND as a multiclass task. In the initial training phase, MINAS builds a decision model based on a labeled data set. In the online phase, new examples are classified using this model, or marked as unknown. Groups of unknown examples can be used later to create valid novelty patterns (NP), which are added to the current model. The decision model is updated as new data arrive over the stream in order to reflect changes in the known classes and allow the addition of NP. This work also presents a set of experiments comparing MINAS with the main novelty detection algorithms found in the literature, using artificial and real data sets. The experimental results show the potential of the proposed algorithm.
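
The two phases described in this abstract can be sketched as follows. This is a simplified stand-in, not the MINAS algorithm: it assumes one centroid and radius per class instead of a clustering step, and it uses a minimum group size as a placeholder for the validity check on novelty patterns. The function names `build_model`, `classify`, and `add_novelty_pattern`, and the parameter `min_examples`, are hypothetical.

```python
import math


def build_model(labeled):
    """Offline phase (sketch): one (centroid, radius) summary per known class.

    labeled is a list of (point, label) pairs, where point is a tuple of floats.
    """
    by_label = {}
    for point, label in labeled:
        by_label.setdefault(label, []).append(point)
    model = {}
    for label, points in by_label.items():
        dim = len(points[0])
        centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
        radius = max(math.dist(p, centroid) for p in points)
        model[label] = (centroid, radius)
    return model


def classify(model, point):
    """Online phase (sketch): nearest class whose radius covers the point, else 'unknown'."""
    best_label, best_dist = None, math.inf
    for label, (centroid, radius) in model.items():
        d = math.dist(point, centroid)
        if d <= radius and d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_label is not None else "unknown"


def add_novelty_pattern(model, unknown_points, name, min_examples=3):
    """Turn a group of unknown examples into a new pattern (assumed validity rule)."""
    if len(unknown_points) < min_examples:
        return False
    model.update(build_model([(p, name) for p in unknown_points]))
    return True


# Hypothetical two-class training set.
training = [((0.0, 0.0), "a"), ((0.0, 2.0), "a"), ((2.0, 0.0), "a"), ((2.0, 2.0), "a"),
            ((10.0, 10.0), "b"), ((10.0, 12.0), "b"), ((12.0, 10.0), "b"), ((12.0, 12.0), "b")]
model = build_model(training)
print(classify(model, (1.0, 1.5)), classify(model, (5.0, 5.0)))
```

Because the model is a dictionary of class summaries, adding a novelty pattern is the same operation as adding a known class, which mirrors the abstract's description of patterns being added to the current model.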

2016

Tensor-based anomaly detection: An interdisciplinary survey

Authors
Fanaee T, H; Gama, J;

Publication
KNOWLEDGE-BASED SYSTEMS

Abstract
Traditional spectral-based methods such as PCA are popular for anomaly detection in a variety of problems and domains. However, if data includes tensor (multiway) structure (e.g. space-time-measurements), some meaningful anomalies may remain invisible with these methods. Although tensor-based anomaly detection (TAD) has been applied within a variety of disciplines over the last twenty years, it is not yet recognized as a formal category in anomaly detection. This survey aims to highlight the potential of tensor-based techniques as a novel approach for detection and identification of abnormalities and failures. We survey the interdisciplinary works in which TAD is reported and characterize the learning strategies, methods and applications; extract the important open issues in TAD and provide the corresponding existing solutions according to the state-of-the-art.

2016

Dynamic credit score modeling with short-term and long-term memories: the case of Freddie Mac's database

Authors
Sousa, MR; Gama, J; Brandao, E;

Publication
JOURNAL OF RISK MODEL VALIDATION

Abstract
In this paper, we investigate the two mechanisms of memory, short-term memory (STM) and long-term memory (LTM), in the context of credit risk assessment. These components are fundamental to learning but are overlooked in credit risk modeling frameworks. As a consequence, current models are insensitive to changes, such as population drifts or periods of financial distress. We extend the typical development of credit score modeling, which is based on static learning settings, to the use of dynamic learning frameworks. Exploring different amounts of memory enables a better adaptation of the model to the current state. This is particularly relevant during shocks, when limited memory is required for a rapid adjustment. At other times, a long memory is favored. An empirical study relying on the Freddie Mac database, with 16.7 million mortgage loans granted in the United States from 1999 to 2013, suggests using a dynamic modeling of STM and LTM components to optimize current rating frameworks.
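
To make the role of memory size concrete, here is a toy sketch that estimates a default rate from a bounded window (standing in for STM) and from the full history (standing in for LTM). It is not the paper's credit scoring model: the class `MemoryRateEstimator`, the window of 50 loans, and the synthetic default sequence are assumptions chosen only to show that a short memory reacts faster to a shock.

```python
from collections import deque


class MemoryRateEstimator:
    """Default-rate estimate over the last `window` loans (None keeps all loans)."""

    def __init__(self, window=None):
        self.values = deque(maxlen=window)  # maxlen=None gives an unbounded deque

    def update(self, defaulted):
        self.values.append(defaulted)

    def rate(self):
        return sum(self.values) / len(self.values) if self.values else 0.0


# Synthetic stream (assumed): 5% defaults in calm times, then a shock with 50% defaults.
calm = [1 if i % 20 == 0 else 0 for i in range(900)]
shock = [1 if i % 2 == 0 else 0 for i in range(100)]

stm = MemoryRateEstimator(window=50)  # short-term memory
ltm = MemoryRateEstimator()           # long-term memory

for outcome in calm + shock:
    stm.update(outcome)
    ltm.update(outcome)

print("STM rate:", stm.rate(), "LTM rate:", ltm.rate())
```

After the shock, the short window reflects the new regime while the full-history estimate is still dominated by the calm period, which is the trade-off the abstract describes.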

2016

How to Correctly Evaluate an Automatic Bioacoustics Classification Method

Authors
Colonna, JG; Gama, J; Nakamura, EF;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2016

Abstract
In this work, we introduce a more appropriate (or alternative) approach to evaluate the performance and the generalization capabilities of a framework for automatic anuran call recognition. We show that using the common k-fold Cross-Validation (k-CV) procedure to evaluate the expected error of a syllable-based recognition system overestimates the recognition accuracy. To overcome this problem, and to provide a fair evaluation, we propose a new CV procedure in which the specimen information is considered during the split step of the k-CV. We therefore performed a k-CV by specimens (or individuals), showing that the accuracy of the system decreases considerably. By introducing the specimen information, we are able to answer a more fundamental question: given a set of syllables that belong to a specific group of individuals, can we recognize new specimens of the same species? In this article, we go deeper into the reviews and the experimental evaluations to answer this question.
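
The split by specimen described in this abstract can be sketched as below. This is an illustrative version, not the authors' code: the function `specimen_k_folds`, the round-robin assignment of specimens to folds, and the example specimen names are assumptions. scikit-learn's `GroupKFold` offers a comparable grouped split for readers who prefer a library routine.

```python
def specimen_k_folds(specimen_ids, k):
    """Yield (train_idx, test_idx) pairs in which each specimen stays in one fold.

    specimen_ids[i] names the individual that produced syllable i.
    """
    specimens = sorted(set(specimen_ids))
    if k > len(specimens):
        raise ValueError("k cannot exceed the number of specimens")
    # Assign whole specimens (not single syllables) to folds, round-robin.
    fold_of = {s: i % k for i, s in enumerate(specimens)}
    for fold in range(k):
        test = [i for i, s in enumerate(specimen_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(specimen_ids) if fold_of[s] != fold]
        yield train, test


# Hypothetical data: ten syllables recorded from four individuals.
specimen_ids = ["frogA"] * 3 + ["frogB"] * 2 + ["frogC"] * 4 + ["frogD"] * 1
folds = list(specimen_k_folds(specimen_ids, k=2))
for train, test in folds:
    print("train:", train, "test:", test)
```

In a plain k-CV over syllables, calls from the same individual can land on both sides of the split, which is the source of the overestimated accuracy the abstract reports. Splitting by specimen removes that overlap.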
