Publications

Publications by LIAAD

2012

Optimal leverage association rules with numerical interval conditions

Authors
Jorge, AM; Azevedo, PJ

Publication
INTELLIGENT DATA ANALYSIS

Abstract
In this paper we propose a framework for defining and discovering optimal association rules involving a numerical attribute A in the consequent. The consequent has the form of interval conditions (A < x, A >= x, or A ∈ I, where I is an interval or a set of intervals of the form [x_l, x_u)). Optimality is with respect to leverage, a well-known association rule interest measure. The generated rules are called Maximal Leverage Rules (MLR) and are generated from Distribution Rules. The principle for finding the MLR is related to the Kolmogorov-Smirnov goodness-of-fit statistical test. We propose different methods for MLR generation, taking into account leverage optimality and readability. We theoretically demonstrate the optimality of the main exact methods, and measure the leverage loss of approximate methods. We show empirically that the discovery process is scalable.
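
As an illustration of the measure involved (not the paper's MLR generation methods), the sketch below scans candidate thresholds for a rule of the form `antecedent -> A < x`. It uses the standard definition leverage = P(antecedent and A < x) - P(antecedent) * P(A < x), which equals P(antecedent) times the gap between the conditional and overall cumulative distributions of A at x; that gap is what links the search to the Kolmogorov-Smirnov statistic. The function name `best_upper_bound_rule` and its inputs are assumptions made for this example.

```python
from bisect import bisect_left


def best_upper_bound_rule(values, covered):
    """Brute-force search for the x maximising leverage of `antecedent -> A < x`.

    values  : numeric attribute A, one value per record
    covered : parallel list of booleans, True where the antecedent holds
    (Hypothetical helper for illustration; not the authors' algorithm.)
    """
    n = len(values)
    p_ant = sum(covered) / n  # support of the antecedent
    all_sorted = sorted(values)
    ant_sorted = sorted(v for v, c in zip(values, covered) if c)

    best_x, best_lev = None, float("-inf")
    for x in sorted(set(values)):
        # bisect_left counts the records with A strictly below x
        p_cons = bisect_left(all_sorted, x) / n  # P(A < x)
        p_both = bisect_left(ant_sorted, x) / n  # P(antecedent and A < x)
        lev = p_both - p_ant * p_cons
        if lev > best_lev:
            best_x, best_lev = x, lev
    return best_x, best_lev
```

Only the one-sided condition A < x is covered here; the interval and set-of-intervals consequents described in the abstract would need a wider search.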

2012

Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets

Authors
Campos, R; Jorge, AM; Dias, G; Nunes, C

Publication
2012 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2012), VOL 1

Abstract
With the growing popularity of research in Temporal Information Retrieval (T-IR), a large amount of temporal data is ready to be exploited. The ability to exploit this information can be useful for several tasks. For example, when querying "Football World Cup Germany", it would be interesting to have two separate clusters {1974, 2006} corresponding to each of the two temporal instances. However, clustering search results by time is a non-trivial task that involves determining the most relevant dates associated with a query. In this paper, we propose a first approach to flat temporal clustering of search results. We rely on a second-order co-occurrence similarity measure, which first identifies the top relevant dates. Documents are then grouped at the year level, forming the temporal instances of the query. Experimental tests were performed using real-world text queries. We used several measures for evaluating the performance of the system and compared our approach with the Carrot Web-snippet clustering engine. Both experiments were complemented with a user survey.
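
The year-level grouping step can be pictured with a minimal sketch. It assumes years appear as four-digit tokens between 1800 and 2099 and keeps every year it finds; the relevance filtering that the abstract attributes to the similarity measure is deliberately left out. The names `YEAR` and `group_by_year` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a "year" is any standalone four-digit number from 1800 to 2099.
YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def group_by_year(snippets):
    """Map each year mentioned in the snippets to the indices of those snippets."""
    clusters = defaultdict(list)
    for idx, text in enumerate(snippets):
        # set() avoids listing a snippet twice when a year is repeated in it
        for year in sorted(set(YEAR.findall(text))):
            clusters[int(year)].append(idx)
    return dict(clusters)
```

A snippet that mentions several years lands in several groups, so a real system would still need to decide which of those dates are relevant to the query.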

2012

A Multi-agent Recommender System

Authors
Jorge Morais, AJ; Oliveira, E; Jorge, AM

Publication
DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE

Abstract
The large number of pages on websites is a problem for users, who waste time looking for the information they really want. Knowledge about users' previous visits may reveal patterns that allow the website to be customized. This concept is known as an Adaptive Website: a website that adapts itself in order to improve the user's experience. Some Web Mining algorithms have been proposed for adapting a website. In this paper, a recommender system using agents with two different algorithms (association rules and collaborative filtering) is described. Both algorithms are incremental and work with binary data. Results show that this multi-agent approach, which combines different algorithms, is capable of improving user satisfaction.

2012

GTE: a distributional second-order co-occurrence approach to improve the identification of top relevant dates in web snippets

Authors
Campos, R; Dias, G; Jorge, A; Nunes, C

Publication
21st ACM International Conference on Information and Knowledge Management, CIKM'12, Maui, HI, USA, October 29 - November 02, 2012

Abstract
In this paper, we present an approach to identify top relevant dates in Web snippets with respect to a given implicit temporal query. Our approach is two-fold. First, we propose a generic temporal similarity measure called GTE, which evaluates the temporal similarity between a query and a date. Second, we propose a classification model to accurately relate relevant dates to their corresponding query terms and discard irrelevant ones. We suggest two different solutions: a threshold-based classification strategy and a supervised classifier based on a combination of multiple similarity measures. We evaluate both strategies over a set of real-world text queries and compare the performance of our Web snippet approach with a query log approach over the same set of queries. Experiments show that determining the most relevant dates of any given implicit temporal query can be improved with GTE combined with the second-order similarity measure InfoSimba, the Dice coefficient and the threshold-based strategy, compared to (1) first-order similarity measures and (2) the query-log-based approach. © 2012 ACM.
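
To make two of the named ingredients concrete, the sketch below computes a plain first-order Dice coefficient between a term and a year over tokenized snippets, then applies a fixed cut-off. It is not GTE and does not include InfoSimba; the function names, the set-of-tokens representation and the default threshold of 0.5 are assumptions chosen for illustration.

```python
def dice(term, year, snippets):
    """Dice coefficient 2*|T and Y| / (|T| + |Y|) over snippets given as token sets."""
    with_term = sum(1 for s in snippets if term in s)
    with_year = sum(1 for s in snippets if year in s)
    with_both = sum(1 for s in snippets if term in s and year in s)
    if with_term + with_year == 0:
        return 0.0  # neither token occurs anywhere
    return 2 * with_both / (with_term + with_year)


def relevant_years(term, years, snippets, threshold=0.5):
    """Threshold-based filter: keep years whose Dice score reaches the cut-off."""
    return [y for y in years if dice(term, y, snippets) >= threshold]
```

With a query made of several terms, the per-term scores would have to be combined in some way; the abstract does not say how, so that step is omitted here.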

2012

Ensemble Approaches for Regression: A Survey

Authors
Mendes Moreira, J; Soares, C; Jorge, AM; De Sousa, JF

Publication
ACM COMPUTING SURVEYS

Abstract
The goal of ensemble regression is to combine several models in order to improve the prediction accuracy in learning problems with a numerical target variable. The process of ensemble learning can be divided into three phases: the generation phase, the pruning phase, and the integration phase. We discuss different approaches to each of these phases that are able to deal with the regression problem, categorizing them in terms of their relevant characteristics and linking them to contributions from different fields. Furthermore, this work makes it possible to identify interesting areas for future research.
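
The three phases can be sketched in a few lines. This is one possible instantiation under simple assumptions (bootstrap sampling for generation, ranking by validation error for pruning, unweighted averaging for integration, one-dimensional linear base models); the survey covers many alternatives for each phase, and all names below are hypothetical.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:  # all x identical in this sample: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def mse(model, points):
    """Mean squared error of a (slope, intercept) model on (x, y) pairs."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


def build_ensemble(train, valid, n_models=20, keep=5, seed=0):
    rng = random.Random(seed)
    # Generation: fit one base model per bootstrap sample of the training set.
    models = [fit_line([rng.choice(train) for _ in train]) for _ in range(n_models)]
    # Pruning: keep the models with the lowest error on held-out data.
    models.sort(key=lambda m: mse(m, valid))
    return models[:keep]


def predict(ensemble, x):
    # Integration: combine the retained models by simple averaging.
    return sum(a * x + b for a, b in ensemble) / len(ensemble)
```

For example, `build_ensemble(train, valid)` followed by `predict(ensemble, 20)` returns the averaged prediction of the five retained lines.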

2012

Enriching temporal query understanding through date identification: How to tag implicit temporal queries?

Authors
Campos, R; Dias, G; Jorge, AM; Nunes, C

Publication
ACM International Conference Proceeding Series

Abstract
In general, search engines fail to understand the user's temporal intent when it is expressed as an implicit temporal query. This causes the retrieval of less relevant information and prevents users from being aware of the possible temporal dimension of the query results. In this paper, we aim to develop a language-independent model that tackles the temporal dimensions of a query and identifies its most relevant time periods. For this purpose, we propose a temporal similarity measure capable of associating relevant dates with a given query and filtering out irrelevant ones. Our approach is based on the exploitation of temporal information from web content, particularly within the set of top-k web snippets returned in response to a query. We focus on extracting years, a kind of temporal information that often appears in this type of collection. We evaluate our methodology using a set of real-world temporal text queries, which are clear concepts (i.e. queries which are non-ambiguous in concept and temporal in their purpose). Experiments show that, when compared to baseline methods, determining the most relevant dates relating to any given implicit temporal query can be improved with a new temporal similarity measure. © 2012 ACM.
