Publications - INESC TEC

Publications by Cristina Ribeiro

2012

SIARD Archive Browser

Authors
Rahman, AU; David, G; Ribeiro, C

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
SIARD Suite enables us to preserve a relational database in an open format. It migrates a relational database to the SIARD format and preserves technical and contextual metadata along with the primary data, ensuring long-term accessibility. This paper introduces a web application, the SIARD Archive Browser, which allows operations on the archive such as searching for a specific record, counting the records in a table that contain a keyword, sorting by a column and making joins. In many use cases, the application avoids the need to load a preserved database into a DBMS. © 2012 Springer-Verlag.

2010

Model Migration Approach for Database Preservation

Authors
Rahman, AU; David, G; Ribeiro, C

Publication
Role of Digital Libraries in a Time of Global Change

Abstract
Strategies developed for database preservation in the past include technology preservation, migration, emulation and the use of a universal virtual computer. In this paper we present a new concept of "Model Migration for Database Preservation". Our proposed approach involves two major activities: first, migrating the database model from the conventional relational model to a dimensional model and, second, calculating the information embedded in code and preserving it instead of preserving the code required to calculate it. This will affect the originality of the database but improve two other characteristics: the information considered relevant is kept in a simpler, easier-to-understand format, and the systematic process to preserve the dimensional model is independent of the DBMS details and application logic.

2008

Use of temporal expressions in web search

Authors
Nunes, S; Ribeiro, C; David, G

Publication
Advances in Information Retrieval

Abstract
In efforts to understand and characterize users' behavior online, the temporal dimension has received little attention from the research community. This exploratory study uses two collections of web search queries to investigate temporal information needs. Using state-of-the-art information extraction techniques, we identify temporal expressions in these queries. We find that temporal expressions are rarely used (1.5% of queries) and, when used, they are related to current and past events. Also, there are specific topics where the use of temporal expressions is more visible.

2006

Multidimensional descriptor indexing: Exploring the BitMatrix

Authors
Calistru, C; Ribeiro, C; David, G

Publication
Image and Video Retrieval, Proceedings

Abstract
Multimedia retrieval brings new challenges, mainly derived from the mismatch between the level of the user interaction (high-level concepts) and that of the automatically processed descriptors (low-level features). The effective use of the low-level descriptors is therefore mandatory. Many data structures have been proposed for managing the representation of multidimensional descriptors, each geared toward efficiency in some set of basic operations. The paper introduces a highly parametrizable structure called the BitMatrix, along with its search algorithms. The BitMatrix is compared with existing methods, all implemented in a common framework. The tests have been performed on two datasets, with parameters covering significant ranges of values. The BitMatrix has proved to be a robust and flexible structure that can compete with other methods for multidimensional descriptor indexing.

2011

Term Weighting Based on Document Revision History

Authors
Nunes, S; Ribeiro, C; David, G

Publication
Journal of the American Society for Information Science and Technology

Abstract
In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In the second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments, results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone.

2001

A Metadata Model for Multimedia Databases

Authors
Ribeiro, C; David, G

Publication
ICHIM (1)

Abstract