Publications - INESC TEC

Publications by Cristina Ribeiro

2017

Involving data creators in an ontology-based design process for metadata models

Authors
Castro, JA; Amorim, RC; Gattelli, R; Karimova, Y; Da Silva, JR; Ribeiro, C

Publication
Developing Metadata Application Profiles

Abstract
Research data are the cornerstone of science, and their current fast rate of production is a concern for researchers. Adequate research data management strongly depends on accurate metadata records that capture the production context of the datasets, thus enabling data interpretation and reuse. This chapter reports on the authors' experience in developing metadata models, formalized as ontologies, for several research domains, involving members of small research teams in the overall process. This process is instantiated with four case studies: vehicle simulation, hydrogen production, biological oceanography, and social sciences. The authors also present a data description workflow that includes a research data management platform, named Dendro, where researchers can prepare their datasets for subsequent deposit in external data repositories. © 2017, IGI Global.

2014

LabTablet: Semantic Metadata Collection on a Multi-domain Laboratory Notebook

Authors
Amorim, RC; Castro, JA; da Silva, JR; Ribeiro, C

Publication
METADATA AND SEMANTICS RESEARCH, MTSR 2014

Abstract
The value of research data is recognized, and so is the importance of the associated metadata to contextualize, describe and ultimately render them understandable in the long term. Laboratory notebooks are an excellent source of domain-specific metadata, but this paper-based approach can pose risks of data loss and limits the possibilities for collaborative metadata production. The paper discusses the advantages of tools that complement paper-based laboratory notebooks in capturing metadata, regardless of the research domain. We propose LabTablet, an electronic laboratory notebook aimed at collecting metadata from the early stages of the research workflow. To evaluate LabTablet and the proposed workflow, researchers in two domains were asked to perform a set of tasks and to provide insights about their experience. By rethinking the workflow and helping researchers actively contribute to data description, research outputs can be described with generic and domain-dependent metadata, thus improving their chances of being deposited, reused and preserved.

2013

Managing Research Data at the University of Porto: Requirements, Technologies, and Services

Authors
da Silva, JR; Ribeiro, C; Lopes, JC

Publication
INNOVATIONS IN XML APPLICATIONS AND METADATA MANAGEMENT: ADVANCING TECHNOLOGIES

Abstract
This chapter presents a solution for the management of research data at a higher education and research institution. The chapter is based on a small-scale data audit study, which included contacts with researchers and yielded some preliminary requirements and use cases. These requirements led to the design of a data curation workflow involving the researcher, the curator, and a data repository. The authors describe the features of the data repository prototype, which extends the widely used DSpace repository platform and introduces a set of features that the majority of the interviewed researchers mentioned as relevant for a data repository. The data repository platform contributes to the curation workflow at the university, with XML technology at its core: data are stored as XML documents, which, unlike the data in their original format, can be systematically processed and queried. This system is capable of indexing, querying, and retrieving, in whole or in part, datasets represented in tabular form. Elements from domain-specific XML schemas can also be used in the cataloguing process, improving the interoperability and quality of the deposited data. Copyright (C) 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

2013

Measuring the value of health query translation: An analysis by user language proficiency

Authors
Lopes, CT; Ribeiro, C

Publication
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY

Abstract
English is by far the most used language on the web. In some domains, the availability of less content in the users' native language may not be a problem and may even help them cope with information overload. Yet, in domains such as health, where information quality is critical, a larger quantity of information may mean easier access to higher-quality content. Query translation may be a good strategy to access content in other languages, but the presence of medical terms in health queries makes translation more difficult, even for users with very good language proficiency. In this study, we evaluate how translating a health query affects users with different levels of language proficiency. We chose English as the non-native language because it is widely spoken and is the most used language on the web. Our findings suggest that non-English-speaking users with at least elementary English proficiency can benefit from a system that suggests English alternatives for their queries or automatically retrieves English content from a non-English query. This awareness of the user profile results in higher precision, more accurate medical knowledge, and better access to high-quality content. Moreover, suggesting English-translated queries may also trigger new health search strategies.

2014

Ontology-Based Multi-Domain metadata for research data management using triple stores

Authors
Silva, JRD; Ribeiro, C; Lopes, JC

Publication
ACM International Conference Proceeding Series

Abstract
Most current research data management solutions rely on a fixed set of descriptors (e.g., Dublin Core Terms) to describe the resources they manage. These are easy to understand and use, but their semantics are limited to general concepts, leaving out domain-specific metadata. The textual values of descriptors are easily indexed through free-text indexes, but faceted search and dataset interlinking become limited. From the point of view of the relational database schema modeler, designing a more flexible metadata model is a non-trivial challenge, because it means representing entities whose attributes are unknown at modeling time and can change over time. These traits, combined with hierarchies among the entities, can make the relational schema quite complex. This work describes the approaches followed by current open-source platforms and proposes a graph-based model for achieving modular, ontology-based metadata for interlinked data assets in the Semantic Web. The proposed model was implemented in a collaborative research data management platform currently under development at the University of Porto. © 2014 ACM.

2017

Predicting the Situational Relevance of Health Web Documents

Authors
Oroszlanyova, M; Lopes, CT; Nunes, S; Ribeiro, C

Publication
2017 12TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI)

Abstract
Search engines usually estimate relevance from document content, disregarding the user behind the search and the characteristics of the task. In this work, we look at relevance as framed in a situational context, which we call situational relevance, and analyze whether it can be predicted from document, user and task characteristics. Using an existing dataset composed of health web documents, relevance judgments for information needs, and user and task characteristics, we build a multivariate prediction model for situational relevance. Our model has an accuracy of 77.17%. Our findings provide insights into features that could improve relevance estimation by search engines, helping to reconcile the systemic and situational views of relevance. In the near future, we will work on the automatic assessment of document, user and task characteristics.