Publicacoes - INESC TEC

Publicações

Publicações por Paulo Jorge Azevedo

2011

Mining Association Rules for Label Ranking

Autores
de Sa, CR; Soares, C; Jorge, AM; Azevedo, P; Costa, J;

Publicação
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011

Abstract
Recently, a number of learning algorithms have been adapted for label ranking, including instance-based and tree-based methods. In this paper, we propose an adaptation of association rules for label ranking. The adaptation, which is illustrated in this work with APRIORI Algorithm, essentially consists of using variations of the support and confidence measures based on ranking similarity functions that are suitable for label ranking. We also adapt the method to make a prediction from the possibly conflicting consequents of the rules that apply to an example. Despite having made our adaptation from a very simple variant of association rules for classification, the results clearly show that the method is making valid predictions. Additionally, they show that it competes well with state-of-the-art label ranking algorithms.

FecharLer Abstract

2007

Comparing rule measures for predictive association rules

Autores
Azevedo, PJ; Jorge, AM;

Publicação
Machine Learning: ECML 2007, Proceedings

Abstract
We study the predictive ability of some association rule measures typically used to assess descriptive interest. Such measures, namely conviction, lift and chi(2) are compared with confidence, Laplace, mutual information, cosine, Jaccard and phi-coefficient. As prediction models, we use sets of association rules. Classification is done by selecting the best rule, or by weighted voting. We performed an evaluation on 17 datasets with different characteristics and conclude that conviction is on average the best predictive measure to use in this setting. We also provide some meta-analysis insights for explaining the results.

FecharLer Abstract

2006

Visual interactive subgroup discovery with numerical properties of interest

Autores
Jorge, AM; Pereira, F; Azevedo, PJ;

Publicação
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
We propose an approach to subgroup discovery using distribution rules (a kind of association rules with a probability distribution on the consequent) for numerical properties of interest. The objective interest of the subgroups is measured through statistical goodness of fit tests. Their subjective interest can be assessed by the data analyst through a visual interactive subgroup browsing procedure.

FecharLer Abstract

2006

Distribution rules with numeric attributes of interest

Autores
Jorge, AM; Azevedo, PJ; Pereira, F;

Publicação
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS

Abstract
In this paper we introduce distribution rules, a kind of association rules with a distribution on the consequent. Distribution rules are related to quantitative association rules but can be seen as a more fundamental concept, useful for learning distributions. We formalize the main concepts and indicate applications to tasks such as frequent pattern discovery, sub group discovery and forecasting. An efficient algorithm for the generation of distribution rules is described. We also provide interest measures, visualization techniques and evaluation.

FecharLer Abstract

2010

Ensembles of jittered association rule classifiers

Autores
Azevedo, PJ; Jorge, AM;

Publicação
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble with N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where one single model is perturbed in order to obtain variants that can be used as an ensemble. We use as base classifiers sets of classification association rules. The two methods of jittering ensembling we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set over a single run, and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single model association rule classifier. A bias-variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective in reducing the bias component. We show that Model Jittering Ensembling can represent a very good speed-up w.r.t. multiple model learning ensembling. We also compare Model Jittering with various state of the art classifiers in terms of predictive accuracy and computational efficiency.

FecharLer Abstract

2012

Finding interesting contexts for explaining deviations in bus trip duration using distribution rules

Autores
Jorge, AM; Mendes Moreira, J; De Sousa, JF; Soares, C; Azevedo, PJ;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
In this paper we study the deviation of bus trip duration and its causes. Deviations are obtained by comparing scheduled times against actual trip duration and are either delays or early arrivals. We use distribution rules, a kind of association rules that may have continuous distributions on the consequent. Distribution rules allow the systematic identification of particular conditions, which we call contexts, under which the distribution of trip time deviations differs significantly from the overall deviation distribution. After identifying specific causes of delay the bus company operational managers can make adjustments to the timetables increasing punctuality without disrupting the service. © Springer-Verlag Berlin Heidelberg 2012.

FecharLer Abstract