Publications

Publications by José Luís Borges

2009

Variable Length Markov Chains for Web Usage Mining

Authors
Borges, J; Levene, M;

Publication
Encyclopedia of Data Warehousing and Mining, Second Edition

Abstract

2012

A New Method to Obtain a Consensus Ranking of a Region's Vintages' Quality

Authors
Borges, J; Real, AC; Cabral, JS; Jones, GV;

Publication
Journal of Wine Economics - J Wine Econ

Abstract

2000

A fine grained heuristic to capture web navigation patterns

Authors
Borges, J; Levene, M;

Publication
SIGKDD Explor. Newsl. - ACM SIGKDD Explorations Newsletter

Abstract

2007

Evaluating variable-length Markov chain models for analysis of user Web navigation sessions

Authors
Borges, J; Levene, M;

Publication
IEEE Transactions on Knowledge and Data Engineering

Abstract
Markov models have been widely used to represent and analyze user Web navigation data. In previous work, we have proposed a method to dynamically extend the order of a Markov chain model and a complementary method for assessing the predictive power of such a variable-length Markov chain. Herein, we review these two methods and propose a novel method for measuring the ability of a variable-length Markov model to summarize user Web navigation sessions up to a given length. Although the summarization ability of a model is important to enable the identification of user navigation patterns, the ability to make predictions is important in order to foresee the next link choice of a user after following a given trail so as, for example, to personalize a Web site. We present an extensive experimental evaluation providing strong evidence that prediction accuracy increases linearly with summarization ability.


2006

Ranking pages by topology and popularity within web sites

Authors
Borges, J; Levene, M;

Publication
World Wide Web - Internet and Web Information Systems

Abstract
We compare two link analysis ranking methods of web pages in a site. The first, called Site Rank, is an adaptation of PageRank to the granularity of a web site and the second, called Popularity Rank, is based on the frequencies of user clicks on the outlinks in a page that are captured by navigation sessions of users through the web site. We ran experiments on artificially created web sites of different sizes and on two real data sets, employing the relative entropy to compare the distributions of the two ranking methods. For the real data sets we also employ a nonparametric measure, called Spearman's footrule, which we use to compare the top-ten web pages ranked by the two methods. Our main result is that the distributions of the Popularity Rank and Site Rank are surprisingly close to each other, implying that the topology of a web site is very instrumental in guiding users through the site. Thus, in practice, the Site Rank provides a reasonable first order approximation of the aggregate behaviour of users within a web site given by the Popularity Rank.
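The two comparison measures named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy data are hypothetical, the relative entropy is computed in bits, and the footrule is written under the assumption that both rankings contain exactly the same pages (the abstract does not say how pages missing from one top-ten list were handled).

```python
import math


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in bits.

    Assumes p and q are aligned probability lists and that q[i] > 0
    wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def spearman_footrule(ranking_a, ranking_b):
    """Sum of absolute position differences between two rankings.

    Assumption for this sketch: both rankings are orderings of the
    same set of pages.
    """
    position_in_b = {page: i for i, page in enumerate(ranking_b)}
    return sum(abs(i - position_in_b[page]) for i, page in enumerate(ranking_a))


# Hypothetical rank distributions over three pages (illustrative values only).
pages = ["home", "products", "about"]
site_rank = [0.5, 0.3, 0.2]
popularity_rank = [0.6, 0.3, 0.1]

divergence = relative_entropy(popularity_rank, site_rank)
footrule = spearman_footrule(["home", "products", "about"],
                             ["products", "home", "about"])
print(f"relative entropy: {divergence:.4f} bits, footrule distance: {footrule}")
```

A relative entropy near zero indicates that the two rank distributions are close, and a footrule distance of zero indicates identical orderings.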


2007

Testing the predictive power of variable history web usage

Authors
Borges, J; Levene, M;

Publication
Soft Computing

Abstract
We present two methods for testing the predictive power of a variable length Markov chain induced from a collection of user web navigation sessions. The collection of sessions is split into a training and a test set. The first method uses a chi-squared statistical test to measure the significance of the distance between the distribution of the probabilities assigned to the test trails by a Markov model built from the full collection of sessions and a model built from the training set. The statistical test measures the ability of the model to generalise its predictions to the unseen sessions from the test set. The second method evaluates the model's ability to predict the last page of a navigation session based on the preceding pages viewed by recording the mean absolute error of the rank of the last occurring page among the predictions provided by the model. Experimental results conducted on both real and random data sets are reported and the results show that in most cases a second-order model is able to capture sufficient history to predict the next link choice with high accuracy.
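The second method described in this abstract can be sketched roughly as follows. This is a simplified illustration and not the paper's implementation: it uses a fixed first-order Markov model instead of a variable length one, all function names are hypothetical, the error is taken as the distance of the true last page from rank 1, and a page never seen after the preceding page is assumed to rank just below all predictions.

```python
from collections import Counter, defaultdict


def train_first_order(sessions):
    """Count page-to-page transitions in the training sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def rank_of_last_page(counts, session):
    """1-based rank of the true last page among the model's predictions."""
    previous, actual = session[-2], session[-1]
    ranked = [page for page, _ in counts.get(previous, Counter()).most_common()]
    if actual in ranked:
        return ranked.index(actual) + 1
    # Assumption: an unseen page is ranked just below every prediction.
    return len(ranked) + 1


def mean_absolute_rank_error(counts, test_sessions):
    """Average distance of the true last page from the top prediction."""
    errors = [rank_of_last_page(counts, s) - 1
              for s in test_sessions if len(s) >= 2]
    if not errors:
        raise ValueError("need at least one test session with two or more pages")
    return sum(errors) / len(errors)


# Hypothetical navigation sessions (illustrative only).
training = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"]]
testing = [["A", "B", "C"], ["A", "B", "D"]]

model = train_first_order(training)
print(mean_absolute_rank_error(model, testing))
```

A value of 0 would mean the model always placed the true last page first; larger values mean the true page tends to appear further down the list of predictions.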
