Publications

Publications by Ricardo Campos

2012

Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets

Authors
Campos, R; Jorge, AM; Dias, G; Nunes, C;

Publication
2012 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2012), VOL 1

Abstract
With the growing popularity of research in Temporal Information Retrieval (T-IR), a large amount of temporal data is ready to be exploited. Exploiting this information can be useful for several tasks. For example, when querying "Football World Cup Germany", it would be interesting to have two separate clusters, {1974, 2006}, one for each of the two temporal instances. However, clustering search results by time is a non-trivial task that involves determining the most relevant dates associated with a query. In this paper, we propose a first approach to flat temporal clustering of search results. We rely on a second-order co-occurrence similarity measure which first identifies the top relevant dates. Documents are grouped at the year level, forming the temporal instances of the query. Experimental tests were performed using real-world text queries. We used several measures to evaluate the performance of the system and compared our approach with the Carrot Web-snippet clustering engine. Both experiments were complemented with a user survey.
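The year-level grouping described in this abstract can be illustrated with a short Python sketch. It assumes that the relevant years have already been selected (for instance {1974, 2006} for the World Cup query) and that years are detected with a simple four-digit pattern covering 1000 to 2099. The function name `temporal_clusters` and the sample snippets are hypothetical; this is not the authors' implementation.

```python
import re
from typing import Dict, Iterable, List

# Assumption: years are four-digit tokens between 1000 and 2099.
YEAR_PATTERN = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def temporal_clusters(snippets: List[str], relevant_years: Iterable[str]) -> Dict[str, List[int]]:
    """Group snippet indices by each relevant year they mention (flat, year-level clusters)."""
    keep = set(relevant_years)
    clusters: Dict[str, List[int]] = {}
    for i, text in enumerate(snippets):
        # A snippet joins every cluster whose year it mentions; non-relevant years are ignored.
        for year in set(YEAR_PATTERN.findall(text)) & keep:
            clusters.setdefault(year, []).append(i)
    # Order the clusters chronologically for display.
    return dict(sorted(clusters.items()))


if __name__ == "__main__":
    snippets = [
        "World Cup 2006 in Germany",
        "Germany won the World Cup in 1974",
        "Germany 2006 World Cup final",
        "Cheap flights 1999",
    ]
    print(temporal_clusters(snippets, ["1974", "2006"]))
    # {'1974': [1], '2006': [0, 2]}
```

In this sketch a snippet that mentions several relevant years joins each corresponding cluster, and snippets without a relevant year are dropped; both are choices made for the example, not details taken from the paper.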

2012

GTE: a distributional second-order co-occurrence approach to improve the identification of top relevant dates in web snippets

Authors
Campos, R; Dias, G; Jorge, A; Nunes, C;

Publication
21st ACM International Conference on Information and Knowledge Management, CIKM'12, Maui, HI, USA, October 29 - November 02, 2012

Abstract
In this paper, we present an approach to identify top relevant dates in Web snippets with respect to a given implicit temporal query. Our approach is two-fold. First, we propose a generic temporal similarity measure called GTE, which evaluates the temporal similarity between a query and a date. Second, we propose a classification model to accurately relate relevant dates to their corresponding query terms and filter out irrelevant ones. We suggest two different solutions: a threshold-based classification strategy and a supervised classifier based on a combination of multiple similarity measures. We evaluate both strategies over a set of real-world text queries and compare the performance of our Web snippet approach with a query log approach over the same set of queries. Experiments show that the identification of the most relevant dates for any given implicit temporal query can be improved by combining GTE with the second-order similarity measure InfoSimba, the Dice coefficient and the threshold-based strategy, compared to (1) first-order similarity measures and (2) the query log based approach. © 2012 ACM.
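As a rough illustration of a threshold-based strategy, the Python sketch below scores each candidate year against the query terms with the Dice coefficient over snippet co-occurrences and keeps the years whose score reaches a threshold. This uses Dice only as a first-order measure; it is not GTE and does not implement the second-order InfoSimba measure. The mean over query terms, the 0.5 threshold, the year pattern and all function names are assumptions made for the example.

```python
import re
from typing import Dict, List, Set

# Assumption: candidate dates are four-digit years between 1000 and 2099.
YEAR_PATTERN = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def index_tokens(snippets: List[str]) -> Dict[str, Set[int]]:
    """Map each lowercased word token to the set of snippet indices it occurs in."""
    index: Dict[str, Set[int]] = {}
    for i, text in enumerate(snippets):
        for token in re.findall(r"\w+", text.lower()):
            index.setdefault(token, set()).add(i)
    return index


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two sets of snippet indices."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def relevant_years(query: str, snippets: List[str], threshold: float = 0.5) -> Dict[str, float]:
    """Score every year found in the snippets against the query; keep scores >= threshold."""
    index = index_tokens(snippets)
    terms = query.lower().split()
    if not terms:
        return {}
    years = {year for text in snippets for year in YEAR_PATTERN.findall(text)}
    kept: Dict[str, float] = {}
    for year in sorted(years):
        # Assumption: the query-year score is the mean Dice value over the query terms.
        scores = [dice(index.get(term, set()), index.get(year, set())) for term in terms]
        score = sum(scores) / len(scores)
        if score >= threshold:
            kept[year] = score
    return kept


if __name__ == "__main__":
    snippets = [
        "World Cup 2006 in Germany",
        "Germany won the World Cup in 1974",
        "Germany 2006 World Cup final",
        "Cheap flights 1999",
    ]
    print(sorted(relevant_years("World Cup Germany", snippets)))
    # ['1974', '2006']
```

On these sample snippets, 2006 co-occurs with every query term in two of three snippets and scores about 0.8, 1974 scores 0.5, and 1999 never co-occurs with the query and is discarded.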

2012

Enriching temporal query understanding through date identification: How to tag implicit temporal queries?

Authors
Campos, R; Dias, G; Jorge, AM; Nunes, C;

Publication
ACM International Conference Proceeding Series

Abstract
In general, search engines fail to understand the user's temporal intent when it is expressed as an implicit temporal query. This causes the retrieval of less relevant information and prevents users from being aware of the possible temporal dimension of the query results. In this paper, we aim to develop a language-independent model that tackles the temporal dimensions of a query and identifies its most relevant time periods. For this purpose, we propose a temporal similarity measure capable of associating relevant dates with a given query and filtering out irrelevant ones. Our approach is based on the exploitation of temporal information from web content, particularly within the set of top-k web snippets returned in response to a query. We focus in particular on extracting years, a kind of temporal information that often appears in this type of collection. We evaluate our methodology using a set of real-world temporal text queries which are clear concepts (i.e. queries which are non-ambiguous in concept and temporal in their purpose). Experiments show that, compared to baseline methods, determining the most relevant dates for any given implicit temporal query can be improved with a new temporal similarity measure. © 2012 ACM.

2011

An Exploratory Study on the Impact of Temporal Features on the Classification and Clustering of Future-Related Web Documents

Authors
Campos, R; Dias, G; Jorge, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
In the last few years, a huge amount of temporal written information has become widely available on the Internet with the advent of forums, blogs and social networks. This gave rise to a new and challenging problem called future retrieval, which consists of extracting future temporal information that is known in advance from web sources in order to answer queries that combine text with a future temporal nature. This paper aims to determine whether web snippets can be used to form an intelligent web that can detect future expected events when their dates are already known. Moreover, the objective is to identify the nature of future texts and understand how these temporal features affect the classification and clustering of the different types of future-related texts: informative texts, scheduled texts and rumor texts. We have conducted a set of comprehensive experiments, and the results show that web documents are a valuable source of future data that can be particularly useful in identifying and understanding the future temporal nature of a given implicit temporal query.

2011

What is the temporal value of web snippets?

Authors
Campos, R; Dias, G; Jorge, AM;

Publication
CEUR Workshop Proceedings

Abstract
The World Wide Web (WWW) is a huge information network from which retrieving and organizing quality relevant content remains an open question for almost all implicit temporal queries, i.e., queries without any date but with an underlying temporal intent. In this research, we aim at studying the temporal nature of any given query by means of web snippets or web query logs. For that purpose, we conducted a set of experiments whose goal is to assess the percentage of web snippets or queries (in query logs) having temporal features, thus checking whether they are a valuable source of data to help infer the temporal intent of queries, namely implicit ones. Our results show that web snippets, as opposed to web query logs, are an important source of concentrated information where time clues often appear. As a consequence, they can be particularly useful to identify and understand "on-the-fly" the implicit temporal nature of queries in the context of ephemeral clustering.
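The measurement this study describes, the share of snippets or queries that carry a temporal feature, can be sketched in Python as the fraction of texts containing a year-like token. Restricting temporal features to four-digit years between 1000 and 2099 is an assumption of this example, and the function name `temporal_ratio` is hypothetical; the study itself may count a broader set of time clues.

```python
import re
from typing import List

# Assumption: a "temporal feature" is a four-digit year between 1000 and 2099.
YEAR_PATTERN = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def temporal_ratio(texts: List[str]) -> float:
    """Return the fraction of texts (snippets or queries) containing at least one year."""
    if not texts:
        return 0.0
    with_year = sum(1 for text in texts if YEAR_PATTERN.search(text))
    return with_year / len(texts)


if __name__ == "__main__":
    sample = ["Euro 2012 final", "weather today", "olympics 1996 atlanta", "python tutorial"]
    print(temporal_ratio(sample))  # 0.5
```

Running the same function once over a set of snippets and once over a set of logged queries gives the kind of side-by-side comparison the abstract reports, under the simplified definition of a temporal feature used here.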

2009

Disambiguating Web Search Results by Topic and Temporal Clustering: A Proposal

Authors
Campos, R; Dias, G; Jorge, AM;

Publication
KDIR 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL

Abstract
With so much information available on the web, looking for relevant documents on the Internet has become a difficult task. Temporal features play an important role with the introduction of a time dimension and the possibility to restrict a search by time, recreating a particular moment of a web page set. Despite its importance, temporal information is still under-considered by current search engines, which limit themselves to capturing the most recent snapshot of the information. In this paper, we describe the architecture of a temporal search engine which uses timelines to browse search results. More specifically, we intend to add a time measure to cluster web page results by analyzing web page contents, supporting the search of temporal and non-temporal information embedded in web documents.