Publications - INESC TEC


Publications by Cristina Ribeiro

2013

The impact of time in link-based Web ranking

Authors
Nunes, S; Ribeiro, C; David, G;

Publication
INFORMATION RESEARCH-AN INTERNATIONAL ELECTRONIC JOURNAL

Abstract
Introduction. The strongly dynamic nature of the Web is a well-known reality. Nonetheless, research on Web dynamics is still a minor part of mainstream Web research, and this is particularly true of Web link analysis. In this paper we investigate and measure the impact of time on link-based ranking algorithms applied to a particular subset of the Web: blogs. Method. Using a large collection of blog posts spanning more than three years, we compare a traditional link-based ranking algorithm with a time-biased alternative, providing some insights into the evolution of link data over time. We designed two experiments to evaluate the use of temporal features in authority estimation algorithms. In the first experiment we compare time-independent and time-sensitive ranking algorithms with a reference rank based on the total number of visits to each blog. In the second, we use feedback from communication media domain experts to contrast different rankings of Portuguese news Websites. Results. The distribution of citations to a Web document over time contains valuable information. Based on several examples we show that time-independent algorithms are unable to capture the correct popularity of sites with high citation activity. Using a reference rank based on the number of visits to a site, we show that a time-biased approach performs better. Conclusions. Although both time-independent and time-aware approaches are based on the same raw data, the experiments indicate that they can be treated as complementary signals for relevance assessment by information retrieval systems. We show that temporal information present in blogs can be used to derive stable time-dependent features, which can be successfully used in the context of Web document ranking.
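The abstract does not state which time-biased algorithm was used, so the following is only a generic illustration of the idea, not the authors' method. It assumes a PageRank-style power iteration in which each citation's weight decays exponentially with its age; the function name `time_biased_pagerank`, the `half_life` parameter, the redistribution of decayed weight, and the toy link data are all hypothetical. Setting `half_life` to infinity gives every link weight 1, which recovers a plain time-independent PageRank.

```python
import math


def time_biased_pagerank(links, now, half_life, damping=0.85, iters=100):
    """Hypothetical time-decayed PageRank.

    links: list of (source, target, timestamp) citations.
    A citation of age a keeps weight 0.5 ** (a / half_life); the decayed
    remainder is spread uniformly, so total rank mass stays equal to 1.
    """
    nodes = sorted({v for s, t, _ in links for v in (s, t)})
    out = {v: [] for v in nodes}
    for s, t, ts in links:
        # Exponential decay: a citation loses half its weight every half_life units.
        out[s].append((t, 0.5 ** ((now - ts) / half_life)))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        pool = 0.0  # mass from dangling nodes plus weight lost to decay
        for s in nodes:
            if not out[s]:
                pool += damping * rank[s]
                continue
            share = damping * rank[s] / len(out[s])
            for t, w in out[s]:
                new[t] += share * w
                pool += share * (1.0 - w)
        for v in nodes:
            new[v] += pool / n
        rank = new
    return rank


# Toy data (hypothetical): blog A was cited twice long ago, blog B once recently.
links = [("X", "A", 0), ("Y", "A", 0), ("Z", "B", 100)]
static = time_biased_pagerank(links, now=100, half_life=math.inf)
biased = time_biased_pagerank(links, now=100, half_life=10)
```

With these toy links the static ranking puts A above B, while the time-biased ranking reverses the order because A's citations have decayed. This mirrors, in miniature, the abstract's point that both rankings come from the same raw data yet carry different signals.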


2013

Preservation of Data Warehouses: Extending the SIARD System with DWXML Language and Tools

Authors
Aldeias, C; David, G; Ribeiro, C;

Publication
INNOVATIONS IN XML APPLICATIONS AND METADATA MANAGEMENT: ADVANCING TECHNOLOGIES

Abstract
Data warehouses are used in many application domains, and there is no established method for their preservation. A data warehouse can be implemented in multidimensional structures or in relational databases that represent the dimensional model concepts in the relational model. The focus of this work is on describing the dimensional model of a data warehouse and migrating it to an XML model, in order to achieve a long-term preservation format. This chapter presents the definition of the XML structure that extends the SIARD format used for the description and archiving of relational databases, enriching it with a layer of metadata for the data warehouse components. Data Warehouse Extensible Markup Language (DWXML) is the XML language proposed to describe the data warehouse. An application that combines the SIARD format and the DWXML metadata layer supports the XML language and helps to acquire the relevant metadata for the warehouse and to build the archival format. Copyright (C) 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


2013

Query behavior: The impact of health literacy, topic familiarity and terminology

Authors
Lopes, CT; Ribeiro, C;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
We conducted a user study to analyze how health literacy, topic familiarity and the terminology used in past queries affect query behavior in health searches. We found that users with inadequate health literacy have less success in web searches and show more difficulties in query formulation. These users and the ones not familiar with the topic use medico-scientific terminology less often than users with more health literacy and topic familiarity. We conclude that search engines should help these groups of users in query formulation and, since technical documents stimulate the use of medico-scientific terminology in query reformulation, mechanisms like query suggestion can have long-term benefits. © 2013 Springer-Verlag.


2014

The Dendro research data management platform: Applying ontologies to long-term preservation in a collaborative environment

Authors
da Silva, JR; Castro, JA; Ribeiro, C; Lopes, JC;

Publication
Proceedings of the 11th International Conference on Digital Preservation, iPRES 2014, Melbourne, Australia, October 6 - 10, 2014

Abstract

2013

The Dotted-Board Model: A new MIP model for nesting irregular shapes

Authors
Toledo, FMB; Carravilla, MA; Ribeiro, C; Oliveira, JF; Gomes, AM;

Publication
INTERNATIONAL JOURNAL OF PRODUCTION ECONOMICS

Abstract
The nesting problem, also known as the irregular packing problem, belongs to the generic class of cutting and packing (C&P) problems. It differs from other 2-D C&P problems in the irregular shape of the pieces. This paper proposes a new mixed-integer model in which binary decision variables are associated with each discrete point of the board (a dot) and with each piece type. It is much more flexible than previously proposed formulations and solves larger instances of the nesting problem to optimality, at the cost of having its precision depend on the board discretization. To date, no results have been published concerning optimal solutions for nesting problems with more than 7 pieces. We ran computational experiments on 45 problem instances with the new model, solving 34 of them to optimality, with a total number of pieces ranging from 16 to 56, depending on the number of piece types, grid resolution and the size of the board. A strong advantage of the model is its insensitivity to piece and board geometry, which makes it easy to extend to more complex problems such as non-convex boards, possibly with defects. Additionally, the number of binary variables does not depend on the total number of pieces but on the number of piece types, making the model particularly suitable for problems with few piece types. The discrete nature of the model requires a trade-off between grid resolution and problem size, as the number of binary variables grows with the square of the selected grid resolution and with board size.
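The size argument at the end of the abstract can be made concrete with a small count. The sketch below assumes a rectangular board with integer dimensions, a grid of `resolution` dots per unit length, and pieces described only by their bounding boxes; a binary variable is counted for every (dot, piece type) pair at which the piece fits on the board. The function name and parameterization are hypothetical and only approximate the paper's formulation, which handles irregular shapes and boards more generally.

```python
def dotted_board_binary_vars(board_w, board_l, resolution, piece_types):
    """Count (dot, piece type) binary variables on a rectangular board.

    Hypothetical parameterization: integer board size, `resolution` dots per
    unit length, and piece_types given as (width, length) bounding boxes,
    one entry per piece TYPE. Piece demand is deliberately not an input.
    """
    total = 0
    for w, l in piece_types:
        if w > board_w or l > board_l:
            continue  # this type cannot be placed anywhere on the board
        # Dots where the bounding box's reference corner keeps the piece on the board.
        nx = (board_w - w) * resolution + 1
        ny = (board_l - l) * resolution + 1
        total += nx * ny
    return total


types = [(2, 3), (4, 4)]  # two hypothetical piece types
coarse = dotted_board_binary_vars(10, 20, 1, types)
fine = dotted_board_binary_vars(10, 20, 2, types)
```

Here `coarse` is 162 + 119 = 281 and `fine` is 595 + 429 = 1024: doubling the resolution multiplies the variable count by roughly four, while the number of pieces of each type never enters the calculation.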


2013

UPBox e DataNotes: um ambiente de suporte à gestão colaborativa de dados científicos [UPBox and DataNotes: an environment supporting the collaborative management of scientific data]

Authors
Silva, JRd; Ribeiro, C; Lopes, JC;

Publication
InCID: Revista de Ciência da Informação e Documentação - InCID: Rev. Ci. Inf. Doc.

Abstract