
Publications by Alípio Jorge

2003

Automatic selection of table areas in documents for information extraction

Authors
Silva, ACE; Jorge, A; Torgo, L;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
The information contained in companies' financial statements is valuable to several users. Much of the relevant information in such documents is contained in tables and is currently mainly extracted by hand. We propose a method that accomplishes a prior step of the task of automatically extracting information from tables in documents: selecting the lines that are likely to belong to tables. Our method has been developed by empirically analyzing a set of Portuguese companies' financial statements using statistical and data mining techniques. Empirical evaluation indicates that more than 99% of table lines are selected after discarding at least 50% of all lines. The method can cope with the complexity of styles used in assembling information on paper and adapt its performance accordingly, thus maximizing its results.

2004

Hierarchical clustering for thematic browsing and summarization of large sets of association rules

Authors
Jorge, A;

Publication
Proceedings of the Fourth SIAM International Conference on Data Mining

Abstract
In this paper we propose a method for grouping and summarizing large sets of association rules according to the items contained in each rule. We use hierarchical clustering to partition the initial rule set into thematically coherent subsets. This enables the summarization of the rule set by adequately choosing a representative rule for each subset, and helps in the interactive exploration of the rule model by the user. We define the requirements of our approach, and formally show the adequacy of the chosen approach to our aims. Rule clusters can also be used to infer novel interest measures for the rules. Such measures are based on the lexicon of the rules and are complementary to measures based on statistical properties, such as confidence, lift and conviction. We show examples of the application of the proposed techniques.
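
As a rough illustration of grouping rules by the items they contain, the sketch below clusters rules agglomeratively. The abstract does not state the distance or linkage used, so Jaccard distance over item sets and single linkage are assumptions made here, and the names `jaccard_distance` and `cluster_rules` are hypothetical.

```python
def jaccard_distance(a, b):
    """Distance between two rules, each seen as the set of its items."""
    a, b = frozenset(a), frozenset(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_rules(rules, threshold):
    """Single-linkage agglomerative clustering (assumed, not from the paper).

    Repeatedly merges the two closest clusters until the smallest
    inter-cluster distance exceeds `threshold`.
    """
    clusters = [[frozenset(r)] for r in rules]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: distance of the closest pair of members.
                d = min(jaccard_distance(p, q)
                        for p in clusters[x] for q in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]
    return clusters


# Each rule is represented by the union of its antecedent and consequent items.
rules = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
]
groups = cluster_rules(rules, threshold=0.5)
```

With this toy input the rules fall into a "bread" group and a "beer" group; a representative rule could then be picked from each group to summarize it.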

2009

Item-Based and User-Based Incremental Collaborative Filtering for Web Recommendations

Authors
Miranda, C; Jorge, AM;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
In this paper we propose an incremental item-based collaborative filtering algorithm. It works with binary ratings (sometimes also called implicit ratings), as is typically the case in a Web environment. Our method is capable of incorporating new information in parallel with performing recommendation. New sessions and new users are used to update the similarity matrix as they appear. The proposed algorithm is compared with a non-incremental one, as well as with an incremental user-based approach, based on an existing explicit rating recommender. The use of techniques for working with sparse matrices on these algorithms is also evaluated. All versions, implemented in R, are evaluated on 5 datasets with varying numbers of users and/or items. We observed that: recall tends to improve when we continuously add information to the recommender model; the time spent for recommendation does not degrade; the time for updating the similarity matrix (necessary to the recommendation) is relatively low and motivates the use of the item-based incremental approach. Moreover, we study how the number of items and users affects the user-based and the item-based approaches.
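
A minimal sketch of how an item-based recommender over binary ratings might be updated incrementally. It is written in Python rather than the R used by the authors, and it assumes cosine similarity computed from co-occurrence counts, with each new session treated as a new user. The class `IncrementalItemCF` and its methods are hypothetical names, not the paper's implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


class IncrementalItemCF:
    """Item-based CF over binary (implicit) ratings, updated per session."""

    def __init__(self):
        self.item_count = defaultdict(int)  # sessions containing each item
        self.co_count = defaultdict(lambda: defaultdict(int))  # pair counts

    def update(self, session_items):
        """Fold one new session into the counts; no full recomputation."""
        items = set(session_items)
        for i in items:
            self.item_count[i] += 1
        for i, j in combinations(items, 2):
            self.co_count[i][j] += 1
            self.co_count[j][i] += 1

    def similarity(self, i, j):
        """Cosine similarity between the binary session vectors of i and j."""
        co = self.co_count.get(i, {}).get(j, 0)
        if co == 0:
            return 0.0
        return co / sqrt(self.item_count[i] * self.item_count[j])

    def recommend(self, seen_items, n=3):
        """Rank unseen items by summed similarity to the items already seen."""
        seen = set(seen_items)
        scores = defaultdict(float)
        for i in seen:
            for j in self.co_count.get(i, {}):
                if j not in seen:
                    scores[j] += self.similarity(i, j)
        # Ties are broken alphabetically so the output is deterministic.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:n]]


model = IncrementalItemCF()
for session in (["a", "b"], ["a", "b", "c"], ["a", "c"]):
    model.update(session)
```

Because similarities are derived from counts on demand, each update touches only the items in the new session, which is one way to keep the update cost low.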

2007

A tool for interactive subgroup discovery using distribution rules

Authors
Lucas, JP; Jorge, AM; Pereira, F; Pernas, AM; Machado, AA;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
We describe an approach and a tool for the discovery of subgroups within the framework of distribution rule mining. Distribution rules are a kind of association rule particularly suited for the exploratory study of numerical variables of interest. Being an exploratory technique, the result of a distribution mining process is typically a very large number of patterns. Exploring such results is thus a complex task and limits the use of the technique. To overcome this shortcoming we developed a tool, written in Java, which supports subgroup discovery in a post-processing step. The tool engages the analyst in an interactive process of subgroup discovery by means of a graphical interface with well-defined statistical grounds, where domain knowledge can be used during the identification of such subgroups amid the population. We show a case study to analyze the results of students in a large-scale university admission examination.

2007

Comparing rule measures for predictive association rules

Authors
Azevedo, PJ; Jorge, AM;

Publication
Machine Learning: ECML 2007, Proceedings

Abstract
We study the predictive ability of some association rule measures typically used to assess descriptive interest. Such measures, namely conviction, lift and chi-squared, are compared with confidence, Laplace, mutual information, cosine, Jaccard and phi-coefficient. As prediction models, we use sets of association rules. Classification is done by selecting the best rule, or by weighted voting. We performed an evaluation on 17 datasets with different characteristics and conclude that conviction is on average the best predictive measure to use in this setting. We also provide some meta-analysis insights for explaining the results.
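
For reference, several of the measures named above can be computed from four counts for a rule A -> C. The sketch below uses the usual textbook definitions. The function name `rule_measures`, its parameters, and the Laplace form with `k` classes (default 2) are assumptions made for illustration, not details taken from the paper.

```python
import math


def rule_measures(n, n_a, n_c, n_ac, k=2):
    """Interest measures for a rule A -> C.

    n: total transactions, n_a: those containing A,
    n_c: those containing C, n_ac: those containing both.
    k: number of classes used by the Laplace estimate (assumed form).
    """
    if not (0 < n_a <= n and 0 < n_c <= n and 0 <= n_ac <= min(n_a, n_c)):
        raise ValueError("inconsistent counts")
    p_a, p_c, p_ac = n_a / n, n_c / n, n_ac / n
    confidence = n_ac / n_a
    # Conviction is unbounded when the rule never fails.
    if confidence == 1.0:
        conviction = math.inf
    else:
        conviction = (1.0 - p_c) / (1.0 - confidence)
    phi_den = math.sqrt(p_a * p_c * (1.0 - p_a) * (1.0 - p_c))
    return {
        "confidence": confidence,
        "lift": confidence / p_c,
        "conviction": conviction,
        "laplace": (n_ac + 1) / (n_a + k),
        "cosine": p_ac / math.sqrt(p_a * p_c),
        "jaccard": p_ac / (p_a + p_c - p_ac),
        "phi": (p_ac - p_a * p_c) / phi_den if phi_den > 0 else 0.0,
    }


# Toy counts: 100 transactions, A in 40, C in 50, both in 30.
m = rule_measures(n=100, n_a=40, n_c=50, n_ac=30)
```

In a best-rule classifier, the rules covering a case would be ranked by one of these values and the consequent of the top rule taken as the prediction.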

2003

Visualization and evaluation support of knowledge discovery through the predictive model markup language

Authors
Wettschereck, D; Jorge, A; Moyle, S;

Publication
KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS

Abstract
The emerging standard for the platform- and system-independent representation of data mining models, PMML (Predictive Model Markup Language), is currently supported by a number of knowledge discovery support engines. The primary purpose of the PMML standard is to separate model generation from model storage in order to enable users to view, post-process, and utilize data mining models independently of the tool that generated the model. In this paper two systems, called VizWiz and PEAR, are described. These software packages allow for the visualization and evaluation of data mining models that are specified in PMML. They can be viewed as decision support systems, since they enable non-expert users of data mining results to interactively inspect and evaluate these results.
