Publications

Publications by Alípio Jorge

2004

Hierarchical clustering for thematic browsing and summarization of large sets of association rules

Authors
Jorge, A;

Publication
Proceedings of the Fourth SIAM International Conference on Data Mining

Abstract
In this paper we propose a method for grouping and summarizing large sets of association rules according to the items contained in each rule. We use hierarchical clustering to partition the initial rule set into thematically coherent subsets. This enables the summarization of the rule set by adequately choosing a representative rule for each subset, and helps in the interactive exploration of the rule model by the user. We define the requirements of our approach, and formally show the adequacy of the chosen approach to our aims. Rule clusters can also be used to infer novel interest measures for the rules. Such measures are based on the lexicon of the rules and are complementary to measures based on statistical properties, such as confidence, lift and conviction. We show examples of the application of the proposed techniques.
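The grouping idea in this abstract can be illustrated with a minimal sketch. It assumes each rule is reduced to the set of items it contains, uses Jaccard distance between item sets, and merges clusters by average linkage. The distance, the linkage, and the names `jaccard_distance` and `cluster_rules` are illustrative assumptions, not details taken from the paper.

```python
def jaccard_distance(a, b):
    """Distance between two item sets: 1 - |intersection| / |union|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_rules(rules, k):
    """Agglomerative clustering of rules (given as sets of items) into k groups.

    Returns a list of clusters, each a list of indices into `rules`.
    Average linkage and the stopping criterion are assumptions of this sketch.
    """
    clusters = [[i] for i in range(len(rules))]

    def linkage(c1, c2):
        # Average pairwise distance between the rules of two clusters.
        total = sum(jaccard_distance(rules[i], rules[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(k, 1):
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge the closest pair of clusters.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical rules, each represented only by the items it mentions.
rules = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"beer", "chips"}),
    frozenset({"beer", "chips", "salsa"}),
]
groups = cluster_rules(rules, 2)
```

With these four made-up rules, the two bread-related rules end up in one group and the two beer-related rules in the other. A representative rule could then be chosen from each group to summarize it.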


2009

Item-Based and User-Based Incremental Collaborative Filtering for Web Recommendations

Authors
Miranda, C; Jorge, AM;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
In this paper we propose an incremental item-based collaborative filtering algorithm. It works with binary ratings (sometimes also called implicit ratings), as is typically the case in a Web environment. Our method is capable of incorporating new information in parallel with performing recommendation. New sessions and new users are used to update the similarity matrix as they appear. The proposed algorithm is compared with a non-incremental one, as well as with an incremental user-based approach, based on an existing explicit rating recommender. The use of techniques for working with sparse matrices on these algorithms is also evaluated. All versions, implemented in R, are evaluated on 5 datasets with various numbers of users and/or items. We observed that: recall tends to improve when we continuously add information to the recommender model; the time spent for recommendation does not degrade; the time for updating the similarity matrix (necessary to the recommendation) is relatively low and motivates the use of the item-based incremental approach. Moreover, we study how the number of items and users affects the user-based and the item-based approaches.
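A minimal sketch of how an item-based recommender over binary ratings could be updated incrementally is given here in Python (the paper's own implementation is in R). It assumes cosine similarity computed from item counts and co-occurrence counts, which can be updated session by session without rebuilding the model. The class `IncrementalItemCF`, its methods, and the scoring rule are hypothetical and are not the authors' algorithm.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


class IncrementalItemCF:
    """Item-based recommender for binary (implicit) ratings, updated per session."""

    def __init__(self):
        self.item_count = defaultdict(int)  # number of sessions containing each item
        self.co_count = defaultdict(int)    # number of sessions containing both items

    def update(self, session_items):
        """Incorporate one new session; items must be mutually comparable (e.g. strings)."""
        items = sorted(set(session_items))
        for i in items:
            self.item_count[i] += 1
        for i, j in combinations(items, 2):
            self.co_count[(i, j)] += 1

    def similarity(self, i, j):
        """Cosine similarity between two binary item vectors."""
        if i == j:
            return 1.0
        key = (i, j) if i < j else (j, i)
        co = self.co_count.get(key, 0)
        if co == 0:
            return 0.0
        return co / sqrt(self.item_count[i] * self.item_count[j])

    def recommend(self, seen_items, n=5):
        """Rank unseen items by summed similarity to the items already seen."""
        seen = set(seen_items)
        scores = {}
        for cand in self.item_count:
            if cand in seen:
                continue
            score = sum(self.similarity(cand, s) for s in seen if s in self.item_count)
            if score > 0:
                scores[cand] = score
        # Highest score first; ties broken by item name for determinism.
        return sorted(scores, key=lambda c: (-scores[c], c))[:n]


model = IncrementalItemCF()
for session in (["a", "b"], ["a", "b", "c"], ["a", "c"]):
    model.update(session)  # the model is usable after every single update
top = model.recommend(["b"])
```

Because only counts are stored, each new session touches just the entries for the items it contains, which is the property that makes the update step cheap in this sketch.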


2007

A tool for interactive subgroup discovery using distribution rules

Authors
Lucas, JP; Jorge, AM; Pereira, F; Pernas, AM; Machado, AA;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
We describe an approach and a tool for the discovery of subgroups within the framework of distribution rule mining. Distribution rules are a kind of association rule particularly suited for the exploratory study of numerical variables of interest. Being an exploratory technique, the result of a distribution mining process is typically a very large number of patterns. Exploring such results is thus a complex task and limits the use of the technique. To overcome this shortcoming we developed a tool, written in Java, which supports subgroup discovery in a post-processing step. The tool engages the analyst in an interactive process of subgroup discovery by means of a graphical interface with well-defined statistical grounds, where domain knowledge can be used during the identification of such subgroups amid the population. We show a case study to analyze the results of students in a large-scale university admission examination.


2007

Comparing rule measures for predictive association rules

Authors
Azevedo, PJ; Jorge, AM;

Publication
Machine Learning: ECML 2007, Proceedings

Abstract
We study the predictive ability of some association rule measures typically used to assess descriptive interest. Such measures, namely conviction, lift and χ², are compared with confidence, Laplace, mutual information, cosine, Jaccard and phi-coefficient. As prediction models, we use sets of association rules. Classification is done by selecting the best rule, or by weighted voting. We performed an evaluation on 17 datasets with different characteristics and conclude that conviction is on average the best predictive measure to use in this setting. We also provide some meta-analysis insights for explaining the results.
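For reference, three of the measures named in this abstract can be computed from transaction counts using their standard definitions, as in the sketch below. The best-rule classifier that follows is a simplified illustration: the rule representation, the function names, and the absence of tie-breaking or a default class are assumptions, not the paper's procedure.

```python
def rule_measures(n, n_a, n_c, n_ac):
    """Measures for a rule A -> C from counts.

    n: total transactions, n_a: those containing A,
    n_c: those containing C, n_ac: those containing both.
    """
    confidence = n_ac / n_a
    supp_c = n_c / n
    lift = confidence / supp_c
    # Conviction is unbounded when the rule never fails.
    if confidence == 1.0:
        conviction = float("inf")
    else:
        conviction = (1 - supp_c) / (1 - confidence)
    return {"confidence": confidence, "lift": lift, "conviction": conviction}


def classify_best_rule(rules, instance, measure="conviction"):
    """Predict with the applicable rule that scores highest on `measure`.

    Each rule is a dict with keys "antecedent" (set of items),
    "consequent" (class label) and "measures" (as returned above).
    Returns None when no rule's antecedent is contained in the instance.
    """
    applicable = [r for r in rules if r["antecedent"] <= instance]
    if not applicable:
        return None
    return max(applicable, key=lambda r: r["measures"][measure])["consequent"]


# Hypothetical counts and rules, for illustration only.
m1 = rule_measures(n=100, n_a=40, n_c=50, n_ac=30)
m2 = rule_measures(n=100, n_a=20, n_c=50, n_ac=12)
rules = [
    {"antecedent": {"x"}, "consequent": "yes", "measures": m1},
    {"antecedent": {"y"}, "consequent": "no", "measures": m2},
]
prediction = classify_best_rule(rules, {"x", "y"})
```

Swapping the `measure` argument changes which rule wins, which is the kind of comparison the abstract describes at scale.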


2003

Visualization and evaluation support of knowledge discovery through the predictive model markup language

Authors
Wettschereck, D; Jorge, A; Moyle, S;

Publication
KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS

Abstract
The emerging standard for the platform- and system-independent representation of data mining models, PMML (Predictive Model Markup Language), is currently supported by a number of knowledge discovery support engines. The primary purpose of the PMML standard is to separate model generation from model storage in order to enable users to view, post-process, and utilize data mining models independently of the tool that generated the model. In this paper two systems, called VizWiz and PEAR, are described. These software packages allow for the visualization and evaluation of data mining models that are specified in PMML. They can be viewed as decision support systems, since they enable non-expert users of data mining results to interactively inspect and evaluate these results.


2004

Extreme adaptivity

Authors
Alves, MA; Jorge, A; Leal, JP;

Publication
ADAPTIVE HYPERMEDIA AND ADAPTIVE WEB-BASED SYSTEMS, PROCEEDINGS

Abstract
This Doctoral Consortium paper focuses on Extreme Adaptivity, a set of top level requirements for adaptive hypertext systems, which has resulted from one year of examining the adaptive hypertext landscape. The complete specification of a system, KnowledgeAtoms, is also given, mainly as an example of Extreme Adaptivity. Additional methodological elements are discussed.
