Publications

Publications by Alípio Jorge

2012

Ensemble Approaches for Regression: A Survey

Authors
Mendes Moreira, J; Soares, C; Jorge, AM; De Sousa, JF;

Publication
ACM COMPUTING SURVEYS

Abstract
The goal of ensemble regression is to combine several models in order to improve the prediction accuracy in learning problems with a numerical target variable. The process of ensemble learning can be divided into three phases: the generation phase, the pruning phase, and the integration phase. We discuss different approaches to each of these phases that are able to deal with the regression problem, categorizing them in terms of their relevant characteristics and linking them to contributions from different fields. Furthermore, this work makes it possible to identify interesting areas for future research.
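
As a rough illustration of the three phases named in the abstract, here is a minimal Python sketch. It is not taken from the survey: the choice of bootstrap resampling for generation, validation-error ranking for pruning, and simple averaging for integration are assumptions made for the example, and all function names are hypothetical.

```python
import random
from statistics import mean

def fit_line(points):
    """Ordinary least squares for y = a*x + b on (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx

def mse(model, points):
    """Mean squared error of a (slope, intercept) model on points."""
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in points)

def generate(train, n_models, rng):
    """Generation (assumed strategy): one base model per bootstrap resample."""
    return [fit_line([rng.choice(train) for _ in train]) for _ in range(n_models)]

def prune(models, valid, keep):
    """Pruning (assumed strategy): keep the models with lowest validation error."""
    return sorted(models, key=lambda m: mse(m, valid))[:keep]

def integrate(models, x):
    """Integration (assumed strategy): average the base predictions."""
    return mean(a * x + b for a, b in models)

# Synthetic data for the example only: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in range(40)]
rng.shuffle(data)
train, valid = data[:30], data[30:]

ensemble = prune(generate(train, 20, rng), valid, keep=5)
prediction = integrate(ensemble, 10.0)
```

Each phase is a separate function, so any one strategy can be swapped (for example, a weighted average in `integrate`) without touching the other two.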


1995

Learning recursion with iterative bootstrap induction

Authors
Jorge, A; Brazdil, P;

Publication
MACHINE LEARNING: ECML-95

Abstract
In this paper we are concerned with the problem of inducing recursive Horn clauses from small sets of training examples. The method of iterative bootstrap induction is presented. In the first step, the system generates simple clauses, which can be regarded as properties of the required definition. Properties represent generalizations of the positive examples, simulating the effect of having a larger number of examples. Properties are used subsequently to induce the required recursive definitions. This paper describes the method together with a series of experiments. The results support the thesis that iterative bootstrap induction is indeed an effective technique that could be of general use in ILP.


2002

Post-processing operators for browsing large sets of association rules

Authors
Jorge, A; Pocas, J; Azevedo, P;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
Association rule engines typically output a very large set of rules. Although association rules are regarded as highly comprehensible and useful for data mining and decision support in fields such as marketing, retail, and demographics, lengthy outputs may discourage users from using the technique. In this paper we propose a post-processing methodology and tool for browsing/visualizing large sets of association rules. The method is based on a set of operators that transform sets of rules into sets of rules, allowing the user to focus on interesting regions of the rule space. Each set of rules can then be viewed with different graphical representations. The tool is web-based and uses SVG. Association rules are given in PMML.


2001

Combining rule-based and case-based learning for iterative part-of-speech tagging

Authors
Lopes, AA; Jorge, A;

Publication
ADVANCES IN CASE-BASED REASONING, PROCEEDINGS

Abstract
In this article we show how the accuracy of a rule-based first-order theory may be increased by combining it with a case-based approach in a classification task. Case-based learning is used when the rule language bias is exhausted. This is achieved in an iterative approach. In each iteration, theories consisting of first-order rules are induced and covered examples are removed. The process stops when it is no longer possible to find rules with satisfactory quality. The remaining examples are then handled as cases. The case-based approach proposed here is also, to a large extent, new. Instead of only storing the cases as provided, it has a learning phase where, for each case, it constructs and stores a set of explanations with support and confidence above given thresholds. These explanations have different levels of generality and the maximally specific one corresponds to the case itself. The same case may have different explanations representing different perspectives of the case. Therefore, to classify a new case, it looks for relevant stored explanations applicable to the new case. The different possible views of the case given by the explanations correspond to considering different sets of conditions/features to analyze the case. In other words, they lead to different ways to compute similarity between known cases/explanations and the new case to be classified (as opposed to the commonly used global metric). Experimental results have been obtained on a corpus of Portuguese texts for the task of part-of-speech tagging with significant improvement.


1997

Integrity constraints in ILP using a Monte Carlo approach

Authors
Jorge, A; Brazdil, PB;

Publication
INDUCTIVE LOGIC PROGRAMMING

Abstract
Many state-of-the-art ILP systems require large numbers of negative examples to avoid overgeneralization. This is a considerable disadvantage for many ILP applications, namely inductive program synthesis, where relatively small and sparse example sets are a more realistic scenario. Integrity constraints are first-order clauses that can play the role of negative examples in an inductive process. One integrity constraint can replace a long list of ground negative examples. However, checking the consistency of a program with a set of integrity constraints usually involves heavy theorem-proving. We propose an efficient constraint satisfaction algorithm that applies to a wide variety of useful integrity constraints and uses a Monte Carlo strategy. It looks for inconsistencies by random generation of queries to the program. This method allows the use of integrity constraints instead of (or together with) negative examples. As a consequence, programs to induce can be specified more rapidly by the user, and the ILP system tends to obtain more accurate definitions. Average running times are not greatly affected by the use of integrity constraints compared to ground negative examples.
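
The idea of looking for inconsistencies through randomly generated queries can be sketched loosely in Python. This is only an analogy, not the paper's algorithm: the candidate "program" is an ordinary Python function rather than a logic program, the integrity constraint is a Python predicate over a query and its answer, and every name below is hypothetical.

```python
import random

def monte_carlo_check(program, constraint, gen_query, n_queries, rng):
    """Sample random queries; return the first one whose answer breaks
    the constraint, or None if no violation was found in n_queries tries."""
    for _ in range(n_queries):
        query = gen_query(rng)
        if not constraint(query, program(query)):
            return query
    return None

def output_is_ordered(query, answer):
    """Assumed integrity constraint: a sorting program must never
    return a list with an adjacent pair out of order."""
    return all(a <= b for a, b in zip(answer, answer[1:]))

def random_list(rng):
    """Assumed query generator: short lists of small integers."""
    return [rng.randint(0, 9) for _ in range(rng.randint(2, 6))]

def correct_sort(xs):
    return sorted(xs)

def overgeneral_sort(xs):
    # Stands in for an overgeneral induced definition: it accepts
    # any permutation, here simply the input itself.
    return list(xs)

rng = random.Random(1)
ok = monte_carlo_check(correct_sort, output_is_ordered, random_list, 200, rng)
bad = monte_carlo_check(overgeneral_sort, output_is_ordered, random_list, 200, rng)
```

Here `ok` is `None` because the correct program never violates the constraint, while `bad` holds a sampled list that exposes the overgeneral definition. As with any Monte Carlo check, a `None` result means only that no violation was found among the sampled queries, not that none exists.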


2001

Collaboration support for virtual data mining enterprises

Authors
Voß, A; Richter, G; Moyle, S; Jorge, A;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
RAMSYS is a web-based infrastructure for collaborative data mining. It is being developed in the SolEuNet European Project for virtual enterprise services in data mining and decision support. Central to RAMSYS is the idea of sharing the current best understanding to foster efficient collaboration. This paper presents the design and rationale of Zeno, a core component of RAMSYS. Zeno is a groupware for discourses on the Internet and, for RAMSYS, aims to provide a “virtual data mining laboratory” to aid data miners in collaboratively producing better solutions to data mining problems.
