
Publications by Gabriel David


WikiChanges - Exposing Wikipedia revision activity

Nunes, S; Ribeiro, C; David, G;

WikiSym 2008 - The 4th International Symposium on Wikis, Proceedings

Wikis are popular tools commonly used to support distributed collaborative work. Wikis can be seen as virtual scrapbooks that anyone can edit without any specific technical know-how. Wikipedia is a flagship example of a real-world application of wikis. Due to the large scale of Wikipedia, it is difficult to grasp much of the information that is stored in this wiki. We address one particular aspect of this issue by looking at the revision history of each article. By plotting the revision activity in a timeline, we expose the article's complete history in an easily understandable format. We present WIKICHANGES, a web-based application designed to plot an article's revision timeline in real time. WIKICHANGES also includes a web browser extension that incorporates activity sparklines into the real Wikipedia. Finally, we introduce a revisions summarization task that addresses the need to understand what occurred during a given set of revisions. We present a first approach to this task using tag clouds to present the revisions made. © 2008 ACM.
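The revision timeline and sparkline idea in this abstract can be illustrated with a minimal sketch. It assumes the revision timestamps of one article are already available as ISO-8601 strings. The function names `monthly_activity` and `sparkline` and the monthly granularity are illustrative assumptions, not details taken from WIKICHANGES.

```python
from collections import Counter
from datetime import datetime


def monthly_activity(timestamps):
    """Count revisions per calendar month, filling silent months with 0.

    `timestamps` is a hypothetical list of ISO-8601 strings without a
    timezone suffix, e.g. "2008-01-05T10:00:00".
    """
    dates = sorted(datetime.fromisoformat(t) for t in timestamps)
    if not dates:
        return []
    counts = Counter((d.year, d.month) for d in dates)
    year, month = dates[0].year, dates[0].month
    last = (dates[-1].year, dates[-1].month)
    series = []
    while (year, month) <= last:
        series.append((f"{year:04d}-{month:02d}", counts.get((year, month), 0)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return series


def sparkline(values):
    """Render a list of counts as a one-line text sparkline."""
    bars = "▁▂▃▄▅▆▇█"
    peak = max(values, default=0)
    if peak == 0:
        return bars[0] * len(values)
    # Scale each count to one of the eight bar heights.
    return "".join(bars[(v * (len(bars) - 1)) // peak] for v in values)


revisions = ["2008-01-05T10:00:00", "2008-01-20T12:30:00", "2008-03-02T08:15:00"]
series = monthly_activity(revisions)
print(series)
print(sparkline([count for _, count in series]))
```

Filling the silent months with zeros keeps the horizontal axis proportional to time, so gaps in editing remain visible in the sparkline.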


INESC Porto at TRECVID 2007: Automatic and interactive video search

Calistru, C; Ribeiro, C; David, G; Rodrigues, I; Laboreiro, G;

2007 TREC Video Retrieval Evaluation Notebook Papers

The INESC Porto group has participated in the search task (automatic and interactive). Our approach combines high-level features (the 39 concepts of the LSCOM-Lite set) with low-level features. We use a large set of low-level features with the intention of analysing as many facets as possible of each shot. The aggregation of large feature sets can be time-consuming, as it needs to be done at query time. We have developed the BitMatrix indexing method to speed up the search process. For each shot, binary signatures in the form of bit sequences are obtained in an on-line process. At query time, the query bit signature is compared to each of the shot signatures. The automatic run performs above the median, in spite of not using any classifier or any other knowledge source except the translation of the query into LSCOM-Lite concepts.
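The query-time comparison of bit signatures mentioned in this abstract can be illustrated with a generic sketch. The abstract does not say how BitMatrix derives or compares its signatures, so the thresholding step, the Hamming distance, and the names `signature`, `hamming` and `rank_shots` are all assumptions chosen for illustration, not the authors' method.

```python
def signature(features, thresholds):
    """Pack one bit per feature: 1 if the value exceeds its threshold.

    Thresholding is an illustrative assumption, not the BitMatrix scheme.
    """
    bits = 0
    for value, threshold in zip(features, thresholds):
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def rank_shots(query_sig, shot_sigs):
    """Return shot ids ordered by increasing Hamming distance to the query."""
    return sorted(shot_sigs, key=lambda shot_id: hamming(query_sig, shot_sigs[shot_id]))


# Hypothetical four-feature shots, all with the same threshold.
thresholds = [0.5, 0.5, 0.5, 0.5]
shots = {
    "s1": signature([0.9, 0.1, 0.8, 0.2], thresholds),
    "s2": signature([0.1, 0.9, 0.1, 0.9], thresholds),
    "s3": signature([0.9, 0.1, 0.1, 0.2], thresholds),
}
query = signature([0.7, 0.2, 0.9, 0.3], thresholds)
print(rank_shots(query, shots))
```

Comparing packed integers with XOR and a bit count is cheap, which is the general reason bit signatures can speed up matching over large feature sets.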


Multimedia in cultural heritage collections: A model and applications

Ribeiro, C; David, G; Calistru, C;


The paper presents a multimedia database model accounting for the representation of documents, collections and the associated metadata. Appropriate structures are provided for descriptive metadata and for metadata resulting from automatic content analysis. The model is based on the identification and unification of the main concepts in the archival standards and the audiovisual area. The main features of the model, designed to support multimedia database applications, are the integration of descriptive and content analysis metadata, the association of metadata with collections as well as with items, the extensibility with respect to the inclusion of new descriptors, and the support for several retrieval modes. The MetaMedia application development platform, based on the model, has been used to support the construction of a historic documentation collection where a common web interface provides collection administrators, metadata creators and visitors with a multi-faceted view of the repository.


A historic documentation repository for specialized and public access

Ribeiro, C; David, G; Calistru, C;

Research and Advanced Technology for Digital Libraries, Proceedings

The web is currently the information searching and browsing environment of choice for scholars and lay users alike. The goal of most cultural heritage applications is to interest a large audience, and therefore web interfaces are being developed even when part of their functionality is not offered to the general public. We present a web-based interface for managing, browsing and searching a repository of historic documents. The documents pertain to a region which was an important regional power in medieval times, and their originals are in the custody of the Portuguese national archives. The challenges of the project came from its requirements in three aspects: rigorous archival description, the incorporation of document analysis and a flexible search interface. The system is an instance of a multimedia database framework providing both browse and retrieval functionalities to end users and configuration and content management services to the collection administrators.


Multimedia in Cultural Heritage Manuscripts: Integrating Description, Transcription, and Image Content

Calistru, C; Ribeiro, C; David, G;


Cultural heritage documents are often subject to digitization processes resulting in image material, even for textual contents. It is therefore common, in collections of valuable documents, to have descriptive information generated by the institutions, along with digitized images, transcriptions created by scholars, translations and even miscellaneous annotations. To offer faceted access to the collection, it is necessary to explore these diverse materials, integrate them according to a model that accounts for both the metadata and the content, and provide a comprehensive retrieval environment. In this work we have applied the MetaMedia multimedia database framework to a collection of ancient documents, processed the documents in their descriptive, textual, and image content, and produced a browsing and searching system. The main challenges are the integrated management of metadata and content, the indexing of the image content, and the design of the browsing and searching interface where various views on the data are kept together. Copyright (C) 2009 Catalin Calistru et al.


Term Frequency Dynamics in Collaborative Articles

Nunes, S; Ribeiro, C; David, G;


Documents on the World Wide Web are dynamic entities. Mainstream information retrieval systems and techniques are primarily focused on the latest version of a document, generally ignoring its evolution over time. In this work, we study the term frequency dynamics in web documents over their lifespan. We use Wikipedia as a document collection because it is a broad and public resource and, more importantly, because it provides access to the complete revision history of each document. We investigate the progression of similarity values over two projection variables, namely revision order and revision date. Based on this investigation, we find that term frequency in encyclopedic documents - i.e. comprehensive and focused on a single topic - exhibits a rapid and steady progression towards the document's current version. The content in early versions quickly becomes very similar to the present version of the document.
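The progression of similarity values over revision order can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so cosine similarity over raw term counts, the simple tokenizer, and the names `term_frequencies`, `cosine` and `similarity_to_current` are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_to_current(revisions):
    """Similarity of each revision, in revision order, to the latest one."""
    if not revisions:
        return []
    current = term_frequencies(revisions[-1])
    return [cosine(term_frequencies(text), current) for text in revisions]


# Hypothetical revision texts, oldest first.
history = [
    "wiki",
    "wiki tools",
    "wiki tools support collaborative work",
]
for order, value in enumerate(similarity_to_current(history), start=1):
    print(order, round(value, 3))
```

Plotting the same values against revision dates instead of revision order would give the second projection mentioned in the abstract.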
