Publicacoes - INESC TEC

Publicações

Publicações por Vítor Santos Costa

2008

RUSE-WARMR: Rule Selection for Classifier Induction in Multi-Relational Data-Sets

Autores
Ferreira, CA; Gama, J; Costa, VS;

Publicação
20TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL 1, PROCEEDINGS

Abstract
One of the major challenges in knowledge discovery is how to extract meaningful and useful knowledge from the complex structured data that one finds in Scientific and Technological applications. One approach is to explore the logic relations in the database and using, say, an Inductive Logic Programming (ILP) algorithm find descriptive and expressive patterns. These patterns can then be used as features to characterize the target concept, The effectiveness of these algorithms depends both upon the algorithm we use to generate the patterns and upon the classifier Rule mining provides an excellent framework for efficiently mining the interesting patterns that are relevant. We propose a novel method to select discriminative patterns and evaluate the effectiveness of this method on a complex discovery application of practical interest.

FecharLer Abstract

2012

Predictive sequence miner in ILP learning

Autores
Ferreira, CA; Gama, J; Santos Costa, V;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
This work presents an optimized version of XMuSer, an ILP based framework suitable to explore temporal patterns available in multi-relational databases. XMuSer's main idea consists of exploiting frequent sequence mining, an efficient method to learn temporal patterns in the form of sequences. XMuSer framework efficiency is grounded on a new coding methodology for temporal data and on the use of a predictive sequence miner. The frameworks selects and map the most interesting sequential patterns into a new table, the sequence relation. In the last step of our framework, we use an ILP algorithm to learn a classification theory on the enlarged relational database that consists of the original multi-relational database and the new sequence relation. We evaluate our framework by addressing three classification problems and map each one of three different types of sequential patterns: frequent, closed or maximal. The experiments show that our ILP based framework gains both from the descriptive power of the ILP algorithms and the efficiency of the sequential miners. © 2012 Springer-Verlag Berlin Heidelberg.

FecharLer Abstract

2010

Sequential Pattern Mining in Multi-relational Datasets

Autores
Ferreira, CA; Gama, J; Costa, VS;

Publicação
CURRENT TOPICS IN ARTIFICIAL INTELLIGENCE

Abstract
We present a framework designed to mine sequential temporal patterns from multi-relational databases. In order to exploit logic-relational information without using aggregation methodologies, we convert the multi-relational dataset into what we name a multi-sequence database. Each example in a multi-relational target table is coded into a sequence that combines intra-table and inter-table relational temporal information. This allows us to find heterogeneous temporal patterns through standard sequence miners. Our framework is grounded in the excellent results achieved by previous propositionalization strategies. We follow a pipelined approach, where we first use a sequence miner to find frequent sequences in the multi-sequence database. Next, we select the most interesting findings to augment the representational space of the examples. The most interesting sequence patterns are discriminative and class correlated. In the final step we build a classifier model by taking an enlarged target table as input to a classifier algorithm. We evaluate the performance of this work through a motivating application, the hepatitis multi-relational dataset. We prove the effectiveness of our methodology by addressing two problems of the hepatitis dataset.

FecharLer Abstract

2009

Discovery Science, 12th International Conference, DS 2009, Porto, Portugal, October 3-5, 2009

Autores
Gama, J; Costa, VS; Jorge, AM; Brazdil, P;

Publicação
Discovery Science

Abstract

2009

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface

Autores
Gama, J; Costa, VS; Jorge, A; Brazdil, P;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

2012

Predicting Ramp Events with a Stream-Based HMM Framework

Autores
Ferreira, CA; Gama, J; Costa, VS; Miranda, V; Botterud, A;

Publicação
Discovery Science - 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings

Abstract
The motivation for this work is the study and prediction of wind ramp events occurring in a large-scale wind farm located in the US Midwest. In this paper we introduce the SHRED framework, a stream-based model that continuously learns a discrete HMM model from wind power and wind speed measurements. We use a supervised learning algorithm to learn HMM parameters from discretized data, where ramp events are HMM states and discretized wind speed data are HMM observations. The discretization of the historical data is obtained by running the SAX algorithm over the first order variations in the original signal. SHRED updates the HMM using the most recent historical data and includes a forgetting mechanism to model natural time dependence in wind patterns. To forecast ramp events we use recent wind speed forecasts and the Viterbi algorithm, that incrementally finds the most probable ramp event to occur. We compare SHRED framework against Persistence baseline in predicting ramp events occurring in short-time horizons, ranging from 30 minutes to 90 minutes. SHRED consistently exhibits more accurate and cost-effective results than the baseline. © 2012 Springer-Verlag Berlin Heidelberg.

FecharLer Abstract