Publications - INESC TEC

Publications by Alípio Jorge

2006

Factor analysis to support the visualization and interpretation of clusters of portal users

Authors
Rebelo, C; Brito, PQ; Soares, C; Jorge, A

Publication
2006 IEEE/WIC/ACM International Conference on Web Intelligence, (WI 2006 Main Conference Proceedings)

Abstract
Clusterings based on many variables are difficult to visualize and interpret. We present a methodology based on Factor Analysis (FA) that eases both tasks. FA generates a small set of variables that encode most of the information in the original variables. We apply the methodology to segment the users of a web portal, using access log data. It not only makes the clusters obtained on the original variables simpler to visualize and understand, but also helps the analyst select some of the original variables for further analysis of those clusters.


2006

Personalization of e-newsletters based on web log analysis and clustering

Authors
Carvalho, C; Jorge, AM; Soares, C

Publication
2006 IEEE/WIC/ACM International Conference on Web Intelligence, (WI 2006 Main Conference Proceedings)

Abstract
We present a methodology for the personalization of e-newsletters based on the analysis of user access logs. To approach the problem we have used clustering on the set of users, described by their web access patterns. Our work is evaluated using a case study with real data from e-newsletters sent by mail to users of a web portal, and can be adapted to similar situations. Positive results were obtained, indicating that the methodology is able to automatically select contents for a personalized e-newsletter.
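
The idea of clustering users by their web access patterns and then selecting content per cluster can be sketched as follows. This is a minimal illustration, not the authors' implementation: the section names, the toy page-view counts, the use of plain k-means, and every function name are assumptions made for the example.

```python
import math

# Hypothetical content sections of a portal (not taken from the paper).
SECTIONS = ["finance", "marketing", "hr", "it"]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids (deterministic)."""
    assign = []
    for _ in range(iterations):
        assign = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return assign, centroids

def newsletter_for(centroid, top_n=2):
    """Sections with the highest average page views in a cluster."""
    ranked = sorted(range(len(SECTIONS)), key=lambda i: centroid[i], reverse=True)
    return [SECTIONS[i] for i in ranked[:top_n]]

# Toy access logs: page views per section for each user (invented data).
users = {
    "u1": [9, 1, 0, 0],
    "u2": [8, 2, 1, 0],
    "u3": [0, 1, 7, 9],
    "u4": [1, 0, 8, 8],
}
names = list(users)
points = [users[n] for n in names]
assign, centroids = kmeans(points, [list(points[0]), list(points[2])])
selection = {n: newsletter_for(centroids[a]) for n, a in zip(names, assign)}
```

Here each user receives the sections most visited by their cluster, so a user can be shown content that similar users read even if they have not yet visited it themselves.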


2006

A web-based system to monitor the quality of meta-data in web portals

Authors
Domingues, MA; Soares, C; Jorge, AM

Publication
2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Workshops Proceedings

Abstract
We present a web-based system to monitor the quality of the meta-data used to describe content in web portals. The system implements meta-data analysis using statistical, visualization and data mining tools. The web-based system enables the site's editor to detect and correct problems in the description of contents, thus improving the quality of the web portal and the satisfaction of its users. We have developed this system and tested it on a Portuguese portal for management executives.


2012

Forgetting mechanisms for scalable collaborative filtering

Authors
Vinagre, J; Jorge, AM

Publication
Journal of the Brazilian Computer Society

Abstract
Collaborative filtering (CF) has been an important subject of research in the past few years. Many achievements have been made in this field; however, many challenges remain, mainly related to scalability and predictive ability. One important issue is how to deal with old and potentially obsolete data in order to avoid unnecessary memory usage and processing time. Our proposal is to use forgetting mechanisms. In this paper, we present and evaluate the impact of two forgetting mechanisms, sliding windows and fading factors, in user-based and item-based CF algorithms with implicit binary ratings under a scenario of abrupt change. Our results suggest that forgetting mechanisms reduce time and space requirements, improving scalability, while not significantly affecting the predictive ability of the algorithms. © 2012 The Brazilian Computer Society.
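
The two forgetting mechanisms named above can be sketched on a stream of implicit binary (user, item) events. This is a simplified illustration under stated assumptions: it tracks per-item counts as a stand-in for the statistics a CF algorithm would maintain, and the class names, window size, and fading factor value are hypothetical. The paper's exact update rules may differ.

```python
from collections import Counter, deque

class SlidingWindow:
    """Keeps only the most recent `size` (user, item) events."""
    def __init__(self, size):
        self.events = deque(maxlen=size)  # older events are dropped automatically

    def update(self, user, item):
        self.events.append((user, item))

    def item_counts(self):
        return Counter(item for _, item in self.events)

class FadingCounts:
    """Keeps every item but multiplies old evidence by `alpha` at each new event."""
    def __init__(self, alpha):
        self.alpha = alpha
        self.weights = {}

    def update(self, user, item):
        for key in self.weights:          # decay everything seen so far
            self.weights[key] *= self.alpha
        self.weights[item] = self.weights.get(item, 0.0) + 1.0

# Toy stream with an abrupt change: interest moves from item "a" to item "b".
stream = [("u1", "a"), ("u2", "a"), ("u1", "b"), ("u3", "b"), ("u2", "b")]
window = SlidingWindow(size=3)
fading = FadingCounts(alpha=0.5)
for user, item in stream:
    window.update(user, item)
    fading.update(user, item)
window_counts = window.item_counts()
```

The sliding window bounds memory by discarding old events outright, whereas the fading factor keeps old items with progressively smaller weight, so recent behaviour dominates in both cases.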


2012

D-Confidence: An active learning strategy to reduce label disclosure complexity in the presence of imbalanced class distributions

Authors
Escudeiro, NF; Jorge, AM

Publication
Journal of the Brazilian Computer Society

Abstract
In some classification tasks, such as those related to the automatic building and maintenance of text corpora, it is expensive to obtain labeled instances to train a classifier. In such circumstances it is common to have massive corpora where a few instances are labeled (typically a minority) while the others are not. Semi-supervised learning techniques try to leverage the intrinsic information in unlabeled instances to improve classification models. However, these techniques assume that the labeled instances cover all the classes to be learned, which might not be the case. Moreover, in the presence of an imbalanced class distribution, getting labeled instances from minority classes might be very costly, requiring extensive labeling, if queries are randomly selected. Active learning allows asking an oracle to label new instances, which are selected by criteria that aim to reduce the labeling effort. D-Confidence is an active learning approach that is effective in the presence of imbalanced training sets. In this paper we evaluate the performance of d-Confidence in comparison to its baseline criteria over tabular and text datasets. We provide empirical evidence that d-Confidence reduces label disclosure complexity, which we have defined as the number of queries required to identify instances from all the classes to be learned, in the presence of imbalanced data. © 2012 The Brazilian Computer Society.
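
The definition of label disclosure complexity given above can be made concrete by counting oracle queries until every class has been seen at least once. The sketch below assumes no class is known before the first query; the function name and the toy query sequences are illustrative and are not results from the paper.

```python
def label_disclosure_complexity(queried_labels, all_classes):
    """Number of queries needed until every class in `all_classes` has
    appeared at least once among the oracle's answers.
    Returns None if the sequence ends before all classes are disclosed."""
    remaining = set(all_classes)
    for n_queries, label in enumerate(queried_labels, start=1):
        remaining.discard(label)
        if not remaining:
            return n_queries
    return None

# Invented example: one majority class and two rare classes.
classes = {"majority", "rare_1", "rare_2"}
random_order = ["majority", "majority", "rare_1", "majority", "majority", "rare_2"]
guided_order = ["majority", "rare_1", "rare_2", "majority", "majority", "majority"]

random_ldc = label_disclosure_complexity(random_order, classes)
guided_ldc = label_disclosure_complexity(guided_order, classes)
```

A query selection criterion that reaches the rare classes earlier yields a lower value, which is the sense in which an active learning strategy can reduce this measure.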


2010

Ensembles of jittered association rule classifiers

Authors
Azevedo, PJ; Jorge, AM

Publication
Data Mining and Knowledge Discovery

Abstract
The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble with N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where one single model is perturbed in order to obtain variants that can be used as an ensemble. We use as base classifiers sets of classification association rules. The two methods of jittering ensembling we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set over a single run, and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single model association rule classifier. A bias-variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective in reducing the bias component. We show that Model Jittering Ensembling can deliver a very good speed-up with respect to multiple model learning ensembling. We also compare Model Jittering with various state-of-the-art classifiers in terms of predictive accuracy and computational efficiency.
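
The general idea of deriving several rule sets from one learned rule set, without relearning, can be sketched as below. The sketch bootstrap-samples the rules in the spirit of Post Bagging, classifies with the highest-confidence matching rule in each sample, and takes a majority vote. The rule format, the best-rule prediction strategy, the default class, and all names are assumptions for illustration; the paper's IRE and PB procedures may differ in their details.

```python
import random
from collections import Counter

# A rule is (antecedent items, predicted class, confidence).
# Hypothetical rules standing in for the output of a single learning run.
RULES = [
    (frozenset({"outlook=sunny"}), "no", 0.80),
    (frozenset({"humidity=normal"}), "yes", 0.90),
    (frozenset({"outlook=rain", "windy=true"}), "no", 0.85),
    (frozenset({"outlook=overcast"}), "yes", 0.95),
]
DEFAULT_CLASS = "yes"  # assumed fallback when no rule matches

def predict(rules, case):
    """Class of the highest-confidence rule whose antecedent the case satisfies."""
    matching = [r for r in rules if r[0] <= case]
    if not matching:
        return DEFAULT_CLASS
    return max(matching, key=lambda r: r[2])[1]

def jitter(rules, n_models, seed=0):
    """Build n_models rule sets by sampling the learned rules with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(rules) for _ in rules] for _ in range(n_models)]

def ensemble_predict(rule_sets, case):
    """Majority vote over the predictions of the jittered rule sets."""
    votes = Counter(predict(rs, case) for rs in rule_sets)
    return votes.most_common(1)[0][0]

case = frozenset({"outlook=overcast", "humidity=normal", "windy=false"})
rule_sets = jitter(RULES, n_models=11)
prediction = ensemble_predict(rule_sets, case)
```

Only `jitter` is called repeatedly; the rules themselves are produced once, which is where the speed-up over learning N separate models would come from.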
