
Publications by Gabriel David

2007

Using neighbors to date web documents

Authors
Nunes, S; Ribeiro, C; David, G;

Publication
International Conference on Information and Knowledge Management, Proceedings

Abstract
Time has been successfully used as a feature in web information retrieval tasks. In this context, estimating a document's inception date or last update date is a necessary task. Classic approaches have used HTTP header fields to estimate a document's last update time. The main problem with this approach is that it applies only to a small fraction of web documents. In this work, we evaluate an alternative strategy based on a document's neighborhood. Using a random sample of 10,000 URLs from the Yahoo! Directory, we study each document's links and media assets to determine its age. Considering documents in isolation, we are able to date 52% of them. Including each document's neighborhood, we are able to estimate the date of more than 86% of the same sample. We also find that estimates differ significantly according to the type of neighbors used: the most reliable estimates are based on the document's media assets, while the worst are based on incoming links. These results are experimentally evaluated with a real-world application using different datasets. Copyright 2007 ACM.
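The neighborhood idea can be illustrated with a minimal Python sketch. It assumes, as a simplification not taken from the paper, that the document and each neighbor contribute at most one HTTP `Last-Modified` value, that the document's own header is used when present, and that otherwise the median date of the most reliable neighbor type available is used, with neighbor types ranked as the abstract reports (media assets first, incoming links last). The names `estimate_date` and `NEIGHBOR_PRIORITY` and the median rule are hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from statistics import median_low

# Hypothetical preference order, loosely following the abstract's finding that
# media assets give the most reliable estimates and incoming links the worst.
NEIGHBOR_PRIORITY = ["media", "outlinks", "inlinks"]


def parse_last_modified(header):
    """Parse an HTTP Last-Modified value; return None if absent or malformed."""
    if not header:
        return None
    try:
        parsed = parsedate_to_datetime(header)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    # Treat naive results (e.g. a "-0000" zone) as UTC so all dates are comparable.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def estimate_date(own_header, neighbors):
    """Estimate a document's last-update date.

    own_header: the document's own Last-Modified value, or None.
    neighbors:  dict mapping a neighbor type ("media", "outlinks", "inlinks")
                to a list of Last-Modified header strings.
    Returns (datetime or None, label of the source used).
    """
    own = parse_last_modified(own_header)
    if own is not None:
        return own, "self"
    for kind in NEIGHBOR_PRIORITY:
        dates = [d for d in map(parse_last_modified, neighbors.get(kind, []))
                 if d is not None]
        if dates:
            # median_low returns an actual element, so it works on datetimes.
            return median_low(dates), kind
    return None, "undated"


if __name__ == "__main__":
    date, source = estimate_date(None, {
        "media": ["Wed, 21 Oct 2015 07:28:00 GMT",
                  "Mon, 02 Jan 2006 10:00:00 GMT",
                  "Fri, 05 Mar 2010 12:00:00 GMT"],
    })
    print(source, date.isoformat())
```

Falling back through neighbor types in a fixed order mirrors the reported reliability ranking; a real system might instead weight or combine several neighbor types.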

2007

An evaluation framework for multidimensional multimedia descriptor indexing

Authors
Gonçalves, B; Calistru, C; Ribeiro, C; David, G;

Publication
2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1-2

Abstract
Automatic multimedia retrieval requires the use of complex features, which are typically captured by multidimensional descriptors. A basic operation in a multimedia retrieval system is similarity computation, making use of descriptor-dependent metrics. Many data structures have been proposed for managing the representation of multidimensional descriptors, each geared towards efficiency in some set of basic operations. The paper describes a framework for evaluating multidimensional descriptor indexing structures and reports a set of experiments with selected descriptor indexing methods. The extensibility of the framework is illustrated by incorporating a recently proposed structure, the BitMatrix. Data sets and experimental conditions can be configured to produce results that help in choosing appropriate indexing structures for a class of multimedia retrieval applications.
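A pluggable evaluation harness of this kind can be sketched in Python. The sketch assumes a hypothetical `DescriptorIndex` interface (`build` and `knn`), Euclidean distance as the descriptor metric, and recall@k against an exact linear scan as the quality measure; none of these names or choices are taken from the paper's framework. A new structure (such as a BitMatrix implementation) would be added by writing another class with the same two methods.

```python
import math
import random
import time
from typing import List, Protocol, Sequence, Tuple

Vector = Sequence[float]


class DescriptorIndex(Protocol):
    """Hypothetical plug-in interface for an indexing structure under test."""

    def build(self, data: List[Vector]) -> None: ...

    def knn(self, query: Vector, k: int) -> List[int]: ...


class LinearScan:
    """Exact baseline: compares the query with every stored descriptor."""

    def build(self, data: List[Vector]) -> None:
        self.data = list(data)

    def knn(self, query: Vector, k: int) -> List[int]:
        # Assumed metric: Euclidean distance (math.dist, Python 3.8+).
        ranked = sorted(range(len(self.data)),
                        key=lambda i: math.dist(query, self.data[i]))
        return ranked[:k]


def evaluate(index: DescriptorIndex, data: List[Vector],
             queries: List[Vector], k: int) -> Tuple[float, float, float]:
    """Return (build time s, mean query time s, mean recall@k vs. exact scan)."""
    if not queries:
        raise ValueError("need at least one query")
    exact = LinearScan()
    exact.build(data)

    start = time.perf_counter()
    index.build(data)
    build_s = time.perf_counter() - start

    query_s = recall = 0.0
    for q in queries:
        start = time.perf_counter()
        found = index.knn(q, k)
        query_s += time.perf_counter() - start
        truth = set(exact.knn(q, k))
        recall += len(truth.intersection(found)) / k
    return build_s, query_s / len(queries), recall / len(queries)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.random() for _ in range(8)] for _ in range(500)]
    queries = [[rng.random() for _ in range(8)] for _ in range(20)]
    print(evaluate(LinearScan(), data, queries, k=5))
```

Fixing the random seed keeps data sets and query sets reproducible across structures, which is what makes their timings and recall comparable.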

2012

SIARD Archive Browser

Authors
Rahman, AU; David, G; Ribeiro, C;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
SIARD Suite makes it possible to preserve a relational database in an open format. It migrates a relational database to the SIARD format and preserves technical and contextual metadata along with the primary data, ensuring long-term accessibility. This paper introduces a web application, the SIARD Archive Browser, which supports operations on the archive such as searching for a specific record, counting the records in a table that contain a keyword, sorting by a column, and performing joins. In many use cases, the application avoids the need to load a preserved database into a DBMS. © 2012 Springer-Verlag.
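One of the listed operations, counting the records of a table that contain a keyword, can be sketched in Python without a DBMS by streaming the table's XML. The sketch assumes a SIARD-like layout in which each table is stored as XML with one `row` element per record and one child element per column; it compares local tag names so XML namespaces are ignored, and it is not the SIARD Archive Browser's actual implementation. The function name `count_rows_with_keyword` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{urn:x}row' -> 'row'."""
    return tag.rsplit("}", 1)[-1]


def count_rows_with_keyword(table_xml, keyword):
    """Count <row> elements in which any column value contains `keyword`
    (case-insensitive), streaming with iterparse and clearing each finished
    row to limit memory use."""
    needle = keyword.casefold()
    count = 0
    for _event, elem in ET.iterparse(table_xml, events=("end",)):
        if local_name(elem.tag) != "row":
            continue
        if any(needle in (cell.text or "").casefold() for cell in elem):
            count += 1
        elem.clear()  # drop the finished row's column elements
    return count


if __name__ == "__main__":
    # Assumed SIARD-like table content: one <row> per record, <cN> per column.
    sample = b"""<table>
      <row><c1>1</c1><c2>Porto</c2></row>
      <row><c1>2</c1><c2>Lisboa</c2></row>
      <row><c1>3</c1><c2>porto antigo</c2></row>
    </table>"""
    print(count_rows_with_keyword(io.BytesIO(sample), "Porto"))
```

Streaming keeps the operation usable on large archived tables; operations such as sorting or joins would need more than a single pass.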

2010

Model Migration Approach for Database Preservation

Authors
Rahman, AU; David, G; Ribeiro, C;

Publication
ROLE OF DIGITAL LIBRARIES IN A TIME OF GLOBAL CHANGE

Abstract
Strategies developed for database preservation in the past include technology preservation, migration, emulation, and the use of a universal virtual computer. In this paper, we present a new concept: "Model Migration for Database Preservation". The proposed approach involves two major activities: first, migrating the database from the conventional relational model to a dimensional model; and second, calculating the information embedded in code and preserving it instead of preserving the code required to calculate it. This affects the originality of the database but improves two other characteristics: the information considered relevant is kept in a simpler, easier-to-understand format, and the systematic process for preserving the dimensional model is independent of DBMS details and application logic.
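The second activity, storing computed information instead of the code that computes it, can be illustrated with a small Python sketch. The order tables, the discount rule in `line_amount`, and the fact-table layout are all hypothetical, chosen only to show a value that lived in application code becoming a stored measure in a dimensional (star-schema) fact table.

```python
# Hypothetical rule that, in the source system, lived in application code
# rather than in the database: 10% discount on lines with 10 or more units.
def line_amount(qty, unit_price):
    gross = qty * unit_price
    return round(gross * 0.9 if qty >= 10 else gross, 2)


def build_sales_facts(orders, order_lines):
    """Flatten normalized order tables into fact rows of a star schema.

    The computed `amount` is stored as data, so the preserved archive no
    longer needs `line_amount` (or the application) to reproduce it.
    """
    order_by_id = {o["order_id"]: o for o in orders}
    facts = []
    for line in order_lines:
        order = order_by_id[line["order_id"]]
        facts.append({
            "date_key": order["date"],          # date dimension
            "customer_key": order["customer"],  # customer dimension
            "product_key": line["product"],     # product dimension
            "quantity": line["qty"],            # measure copied as-is
            "amount": line_amount(line["qty"], line["unit_price"]),  # computed measure
        })
    return facts


if __name__ == "__main__":
    orders = [{"order_id": 1, "date": "2010-05-01", "customer": "C1"}]
    lines = [{"order_id": 1, "product": "P1", "qty": 10, "unit_price": 2.0},
             {"order_id": 1, "product": "P2", "qty": 1, "unit_price": 5.0}]
    for fact in build_sales_facts(orders, lines):
        print(fact)
```

The trade-off described in the abstract is visible here: the archive keeps the amounts but no longer records how they were derived.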

2008

Use of temporal expressions in web search

Authors
Nunes, S; Ribeiro, C; David, G;

Publication
ADVANCES IN INFORMATION RETRIEVAL

Abstract
In efforts to understand and characterize users' online behavior, the temporal dimension has received little attention from the research community. This exploratory study uses two collections of web search queries to investigate temporal information needs. Using state-of-the-art information extraction techniques, we identify temporal expressions in these queries. We find that temporal expressions are rarely used (1.5% of queries) and, when used, relate to current and past events. There are also specific topics in which the use of temporal expressions is more visible.
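To make the measured quantity concrete (the share of queries containing at least one temporal expression), here is a deliberately naive Python sketch. The small English-only regular expression is a hypothetical stand-in and is not the state-of-the-art extraction technique used in the study; the function names and example queries are also illustrative.

```python
import re

# Deliberately small, English-only pattern: four-digit years, month names,
# a few deictic words and "last/next/this week|month|year". "may" is left out
# of the month names because in queries it is usually a verb.
TEMPORAL = re.compile(
    r"\b(?:(?:19|20)\d{2}"
    r"|january|february|march|april|june|july|august"
    r"|september|october|november|december"
    r"|today|yesterday|tomorrow|tonight"
    r"|(?:last|next|this)\s+(?:week|month|year))\b",
    re.IGNORECASE,
)


def has_temporal_expression(query):
    """True if the query contains at least one match of TEMPORAL."""
    return TEMPORAL.search(query) is not None


def temporal_share(queries):
    """Fraction of queries containing at least one temporal expression."""
    if not queries:
        return 0.0
    return sum(map(has_temporal_expression, queries)) / len(queries)


if __name__ == "__main__":
    sample = ["world cup 2006", "cheap flights lisbon",
              "weather tomorrow", "python tutorial"]
    print(temporal_share(sample))
```

A pattern list like this trades recall for simplicity; it misses relative and implicit expressions that proper temporal taggers handle.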

2006

Multidimensional descriptor indexing: Exploring the BitMatrix

Authors
Calistru, C; Ribeiro, C; David, G;

Publication
IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS

Abstract
Multimedia retrieval brings new challenges, mainly derived from the mismatch between the level of user interaction (high-level concepts) and that of automatically processed descriptors (low-level features). The effective use of low-level descriptors is therefore essential. Many data structures have been proposed for managing the representation of multidimensional descriptors, each geared toward efficiency in some set of basic operations. The paper introduces a highly parameterizable structure called the BitMatrix, along with its search algorithms. The BitMatrix is compared with existing methods, all implemented in a common framework. Tests were performed on two datasets, with parameters covering significant ranges of values. The BitMatrix proved to be a robust and flexible structure that can compete with other methods for multidimensional descriptor indexing.
