2007
Authors
Ribeiro, C; David, G; Calistru, C;
Publication
ASIAN DIGITAL LIBRARIES: LOOKING BACK 10 YEARS AND FORGING NEW FRONTIERS, PROCEEDINGS
Abstract
The paper presents a multimedia database model accounting for the representation of documents, collections and the associated metadata. Appropriate structures are provided for descriptive metadata and for metadata resulting from automatic content analysis. The model is based on the identification and unification of the main concepts in the archival standards and the audiovisual area. The main features of the model, designed to support multimedia database applications, are the integration of descriptive and content analysis metadata, the association of metadata to collections as well as to items, the extensibility with respect to the inclusion of new descriptors and the support for several retrieval modes. The MetaMedia application development platform, based on the model, has been used to support the construction of a historic documentation collection where a common web interface provides collection administrators, metadata creators and visitors with a multi-faceted view of the repository.
2007
Authors
Ribeiro, C; David, G; Calistru, C;
Publication
Research and Advanced Technology for Digital Libraries, Proceedings
Abstract
The web is currently the information searching and browsing environment of choice for scholars and lay users alike. The goal of most cultural heritage applications is to interest a large audience, and therefore web interfaces are being developed even when part of their functionality is not offered to the general public. We present a web-based interface for managing, browsing and searching a repository of historic documents. The documents pertain to a region that was an important power in medieval times, and their originals are in the custody of the Portuguese national archives. The challenges of the project came from its requirements in three aspects: rigorous archival description, the incorporation of document analysis and a flexible search interface. The system is an instance of a multimedia database framework providing both browse and retrieval functionalities to end users and configuration and content management services to the collection administrators.
2009
Authors
Calistru, C; Ribeiro, C; David, G;
Publication
EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING
Abstract
Cultural heritage documents are often subject to digitization processes resulting in image material, even for textual contents. It is therefore common, in collections of valuable documents, to have descriptive information generated by the institutions, along with digitized images, transcriptions created by scholars, translations and even miscellaneous annotations. To offer faceted access to the collection it is necessary to explore these diverse materials, integrate them according to a model that accounts for both metadata and content, and provide a comprehensive retrieval environment. In this work we have applied the MetaMedia multimedia database framework to a collection of ancient documents, processed the documents in their descriptive, textual, and image content and produced a browsing and searching system. The main challenges are the integrated management of metadata and content, the indexing of the image content, and the design of the browsing and searching interface where various views on the data are kept together. Copyright (C) 2009 Catalin Calistru et al.
2010
Authors
Nunes, S; Ribeiro, C; David, G;
Publication
DOCENG2010: PROCEEDINGS OF THE 2010 ACM SYMPOSIUM ON DOCUMENT ENGINEERING
Abstract
Documents on the World Wide Web are dynamic entities. Mainstream information retrieval systems and techniques are primarily focused on the latest version of a document, generally ignoring its evolution over time. In this work, we study the term frequency dynamics in web documents over their lifespan. We use Wikipedia as a document collection because it is a broad and public resource and, more importantly, because it provides access to the complete revision history of each document. We investigate the progression of similarity values over two projection variables, namely revision order and revision date. Based on this investigation we find that term frequency in encyclopedic documents - i.e. comprehensive and focused on a single topic - exhibits a rapid and steady progression towards the document's current version. The content in early versions quickly becomes very similar to the present version of the document.
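The comparison of each revision against the current version can be illustrated with a minimal sketch. It assumes cosine similarity over raw term-frequency vectors and a simple alphanumeric tokenizer; the abstract does not state which similarity measure or tokenization the authors used, and the function names and sample revisions below are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text and count alphanumeric word tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_to_current(revisions):
    """Similarity of each revision (oldest first) to the last, i.e. current, one."""
    current = term_frequencies(revisions[-1])
    return [cosine_similarity(term_frequencies(r), current) for r in revisions]


# Hypothetical revision history of a single short document, oldest first.
revisions = [
    "Porto is a city.",
    "Porto is a city in Portugal.",
    "Porto is a coastal city in northern Portugal.",
]
scores = similarity_to_current(revisions)
```

Plotting `scores` against revision order, or against revision timestamps, gives the two projections mentioned in the abstract.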
2007
Authors
Nunes, S; Ribeiro, C; David, G;
Publication
International Conference on Information and Knowledge Management, Proceedings
Abstract
Time has been successfully used as a feature in web information retrieval tasks. In this context, estimating a document's inception date or last update date is a necessary task. Classic approaches have used HTTP header fields to estimate a document's last update time. The main problem with this approach is that it is applicable to a small part of web documents. In this work, we evaluate an alternative strategy based on a document's neighborhood. Using a random sample containing 10,000 URLs from the Yahoo! Directory, we study each document's links and media assets to determine its age. If we only consider isolated documents, we are able to date 52% of them. Including the document's neighborhood, we are able to estimate the date of more than 86% of the same sample. Also, we find that estimates differ significantly according to the type of neighbors used. The most reliable estimates are based on the document's media assets, while the worst estimates are based on incoming links. These results are experimentally evaluated with a real world application using different datasets. Copyright 2007 ACM.
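A neighborhood-based estimate of this kind might be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the fallback from the document's own date to its neighbors, the use of the (low) median, and the position of outgoing links in the priority order are all assumptions; the abstract only reports that media assets gave the most reliable estimates and incoming links the worst. All names are hypothetical.

```python
from datetime import date
from statistics import median_low

# Assumed priority: media assets first and incoming links last, following the
# reliability ranking in the abstract; the middle position is a guess.
NEIGHBOR_PRIORITY = ("media_assets", "outgoing_links", "incoming_links")


def estimate_last_update(own_date, neighbor_dates):
    """Return (estimated_date, source) for a document.

    own_date: date taken from the document's own HTTP headers, or None.
    neighbor_dates: dict mapping a neighbor type to a list of dates (or None
    entries for neighbors that could not be dated).
    """
    if own_date is not None:
        return own_date, "self"
    for kind in NEIGHBOR_PRIORITY:
        dates = [d for d in neighbor_dates.get(kind, []) if d is not None]
        if dates:
            # median_low returns an actual observed date, never an average.
            return median_low(dates), kind
    return None, None


# Hypothetical document with no usable header of its own.
estimate = estimate_last_update(
    None,
    {
        "media_assets": [date(2006, 1, 1), date(2006, 3, 1), date(2006, 2, 1)],
        "incoming_links": [date(2001, 1, 1)],
    },
)
```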
2007
Authors
Gonçalves, B; Calistru, C; Ribeiro, C; David, G;
Publication
2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1-2
Abstract
Automatic multimedia retrieval requires the use of complex features, which are typically captured by multidimensional descriptors. A basic operation in a multimedia retrieval system is similarity computation, making use of descriptor-dependent metrics. Many data structures have been proposed for managing the representation of multidimensional descriptors, each geared towards efficiency in some set of basic operations. The paper describes a framework for evaluating multidimensional descriptor indexing structures and reports a set of experiments with selected descriptor indexing methods. The extensibility of the framework is illustrated by incorporating a recently-proposed structure, the BitMatrix. Data sets and experiment conditions can be set up so as to provide results that can be used in the choice of appropriate indexing structures for a class of multimedia retrieval applications.