Publications - INESC TEC

Publications

Publications by Alípio Jorge

2009

The Effect of Varying Parameters and Focusing on Bus Travel Time Prediction

Authors
Moreira, JM; Soares, C; Jorge, AM; de Sousa, JF

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS

Abstract
Travel time prediction is an important tool for the planning tasks of mass transit and logistics companies. In this paper we investigate the use of regression methods for the problem of predicting the travel time of buses in a Portuguese public transportation company. More specifically, we empirically evaluate the impact of varying parameters on the performance of different regression algorithms, such as support vector machines (SVM), random forests (RF) and projection pursuit regression (PPR). We also evaluate the impact of the focusing tasks (example selection, domain value definition and feature selection) on the accuracy of those algorithms. Concerning the algorithms, we observe that 1) RF is quite robust to the choice of parameters and focusing methods; 2) the choice of parameters for SVM can be made independently of focusing methods, while 3) for PPR they should be selected simultaneously. For the focusing methods, we observe that a stronger effect is obtained using example selection, particularly in combination with SVM.


2009

A Knowledge Discovery Method for the Characterization of Protein Unfolding Processes

Authors
Fernandes, E; Jorge, AM; Silva, CG; Brito, RMM

Publication
2ND INTERNATIONAL WORKSHOP ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (IWPACBB 2008)

Abstract
This work presents a method of knowledge discovery in data obtained from Molecular Dynamics Protein Unfolding Simulations. The data under study was obtained from simulations of the unfolding process of the protein Transthyretin (TTR), responsible for amyloid diseases such as Familial Amyloid Polyneuropathy (FAP). Protein unfolding and misfolding are at the source of many amyloidogenic diseases. Thus, the molecular characterization of protein unfolding processes through experimental and simulation methods may be essential in the development of effective treatments. Here, we analyzed the distance variation of the Cα (alpha carbon) atom of each of the 127 amino acids of TTR to the centre of mass of the protein, along 10 different unfolding simulations: five simulations of WT-TTR and five simulations of L55P-TTR, a highly amyloidogenic TTR variant. Using data mining techniques, and considering all the information of the 10 runs, we identified several clusters of amino acids. For each cluster we selected the representative element and identified events which were used as features. With Association Rules we found patterns that characterize the type of TTR variant under study. These results may help discriminate between amyloidogenic and non-amyloidogenic behaviour among different TTR variants and contribute to the understanding of the molecular mechanisms of FAP.
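
The distance measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes each simulation frame is a list of (x, y, z) atom coordinates with matching atomic masses; the function names and the toy data layout are hypothetical and are not taken from the paper.

```python
import math

def centre_of_mass(coords, masses):
    """Mass-weighted mean position of all atoms in one frame."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )

def ca_distances(frame, masses, ca_indices):
    """Distance of each selected (alpha carbon) atom to the centre of mass."""
    com = centre_of_mass(frame, masses)
    return [math.dist(frame[i], com) for i in ca_indices]

def distance_series(frames, masses, ca_indices):
    """One list of distances per frame, i.e. the variation along a run."""
    return [ca_distances(frame, masses, ca_indices) for frame in frames]
```

Each column of the resulting series is a time series for one residue, which is the kind of input a clustering step could then group.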


2007

Quantitative evaluation of Clusterings for marketing applications: A web portal case study

Autores
Rebelo, C; Brito, PQ; Soares, C; Jorge, A; Brandao, R;

Publicação
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
The potential value of a market segmentation for a company is usually assessed in terms of six criteria: identifiability, substantiality, accessibility, responsiveness, stability and actionability. These are widely accepted as essential criteria, but they are difficult to quantify. Quantification is particularly important in early stages of the segmentation process, especially when automatic clustering methods are employed. With such methods it is easy to produce a large number of segmentations but only the most interesting ones should be selected for further analysis. In this paper, we address the problem of how to quantify the value of a segmentation according to the criteria above. We propose several measures and test them on a case study, consisting of a segmentation of portal users.


2008

A methodology for exploring association models

Authors
Jorge, A; Pocas, J; Azevedo, PJ

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Visualization in data mining is typically related to data exploration. In this chapter we present a methodology for the post-processing and visualization of association rule models. One aim is to provide the user with a tool that enables the exploration of a large set of association rules. The method is inspired by the hypertext metaphor. The initial set of rules is dynamically divided into small comprehensible sets or pages, according to the interest of the user. From each set, the user can move to other sets by choosing one appropriate operator. The set of available operators transforms sets of rules into sets of rules, allowing the user to focus on interesting regions of the rule space. Each set of rules can then also be viewed with different graphical representations. The tool is web-based and dynamically generates SVG pages to represent graphics. Association rules are given in PMML format. © 2008 Springer-Verlag Berlin Heidelberg.
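
The idea of operators that map sets of rules to sets of rules can be sketched as follows. This is only an illustration: the `Rule` fields, the two operators (`same_consequent`, `more_specific`) and the `page` helper are hypothetical names chosen here, not the operators defined in the chapter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # items on the left-hand side
    consequent: str        # single item on the right-hand side
    support: float
    confidence: float

def same_consequent(rules, focus):
    """Operator: keep the rules that share the focus rule's consequent."""
    return [r for r in rules if r.consequent == focus.consequent]

def more_specific(rules, focus):
    """Operator: keep rules whose antecedent strictly extends the focus rule's."""
    return [
        r for r in rules
        if r.consequent == focus.consequent and focus.antecedent < r.antecedent
    ]

def page(rules, size=2):
    """Split a rule set into small pages, highest confidence first."""
    ranked = sorted(rules, key=lambda r: r.confidence, reverse=True)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

Because every operator returns a plain list of rules, operators can be chained, which mirrors following links from one page of rules to another.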


2006

Semi-automatic creation and maintenance of web resources with webTopic

Authors
Escudeiro, NF; Jorge, AM

Publication
Semantics, Web and Mining

Abstract
In this paper we propose a methodology for automatically retrieving document collections from the web on specific topics and for organizing them and keeping them up-to-date over time, according to user specific persistent information needs. The documents collected are organized according to user specifications and are classified partly by the user and partly automatically. A presentation layer enables the exploration of large sets of documents and, simultaneously, monitors and records user interaction with these document collections. The quality of the system is permanently monitored; the system periodically measures and stores the values of its quality parameters. Using this quality log it is possible to maintain the quality of the resources by triggering procedures aimed at correcting or preventing quality degradation.


2005

Monitoring the quality of meta-data in web portals using statistics, visualization and data mining

Authors
Soares, C; Jorge, AM; Domingues, MA

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
We propose a methodology to monitor the quality of the meta-data used to describe content in web portals. It is based on the analysis of the meta-data using statistics, visualization and data mining tools. The methodology enables the site's editor to detect and correct problems in the description of contents, thus improving the quality of the web portal and the satisfaction of its users. We also define a general architecture for a platform to support the proposed methodology. We have implemented this platform and tested it on a Portuguese portal for management executives. The results validate the methodology proposed.
