2015
Authors
Devezas, T; Nunes, S; Rodríguez, MT;
Publication
Proceedings of the 2015 International Workshop on Human-centric Independent Computing, HIC@HT 2015, Guzelyurt, Northern Cyprus, September 1, 2015
Abstract
In this paper, we present the tools of the MediaViz project, a work-in-progress platform that aims to provide researchers, academics and professionals from the media field with a set of analytical and exploratory resources to answer high-level and complex questions about the online media panorama in an efficient, visual and interactive way. Our approach consists of aggregating and processing news data from multiple online sources and providing programmatic access to it through an Application Programming Interface (API). The visualization tools leverage the data provided by the API, allowing users to interact with, explore and interrogate that information. Through the use of data visualization techniques, we aim to characterize the publication patterns of multiple online news sources by analyzing and comparing distinct dimensions. Dimensions of interest include the frequency and flow of publications and social shares over time, and the geographic coverage of online news outlets. We present some of the developed visualization tools and describe how they can offer meaningful insights by providing a bird's-eye view of distinct characteristics of the online mediascape. © 2015 ACM.
2015
Authors
Rodríguez, MT; Nunes, S; Devezas, T;
Publication
NHT 2015 - Proceedings of the 2015 Workshop on Narrative and Hypertext - co-located with HT 2015
Abstract
In this article we survey the historical background and development of information and data visualization, and provide an overview of the intersection of data visualization and storytelling in the field of data journalism, where it finds its most widespread use in narrative visualizations. We start by explaining why the mere act of visualization can be highly useful to readers, helping them discover patterns and comprehend information. Backed by historical references, we describe how some of the first data visualizations were used to explain facts, understand certain events, and determine courses of action. We then outline how storytelling and narrative techniques are currently used with data visualization to leverage the power of visual expression. Our goal is to characterize storytelling with data as a vibrant and interesting field that current journalism practices employ to help readers understand and form opinions on complex facts. By presenting concepts like storytelling with data and data stories, we aim to spark interest in further research on the applications of data visualization and narrative. © 2015 ACM.
2015
Authors
Oroszlanyova, M; Ribeiro, C; Nunes, S; Lopes, CT;
Publication
CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS/INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT/CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERIS/PROJMAN / HCIST 2015
Abstract
Search engines typically estimate relevance using features of the documents. We believe that several features of the user and the task can also contribute to this process. In the health domain, there are specific characteristics of web documents that can also add value to this estimation. In the present work, using a dataset composed of a set of annotated web pages and users' assessments of their relevance and comprehension, we analyse which characteristics affect documents' relevance and which influence how well users comprehend them. We conducted a bivariate analysis using characteristics of this data collection. The strongest relations we found are linked to task features, suggesting a direct association between a task's clarity and easiness and both the relevance and the comprehension of the content. The language of the document, its medical certification, its update status, and its content on pathology definitions and on prevention, prognosis and treatment are other characteristics that consumers value in terms of relevance. Users' previous experience with health searches and, particularly, with the topic being searched, their gender, and the language and terminology of their queries were shown to be related to their success in the search tasks. We also found that lay terminology, knowledge of medico-scientific terms and the language of the documents are good indicators of comprehension. Documents containing links and testimonies, recently updated documents, and blog posts and comments were observed to be better understood by users. (C) 2015 The Authors. Published by Elsevier B.V.
2013
Authors
Nunes, S; Ribeiro, C; David, G;
Publication
INFORMATION RESEARCH-AN INTERNATIONAL ELECTRONIC JOURNAL
Abstract
Introduction. The strongly dynamic nature of the Web is a well-known reality. Nonetheless, research on Web dynamics is still a minor part of mainstream Web research. This is largely the case in Web link analysis. In this paper we investigate and measure the impact of time on link-based ranking algorithms in a particular subset of the Web, specifically blogs. Method. Using a large collection of blog posts that span more than three years, we compare a traditional link-based ranking algorithm with a time-biased alternative, providing some insights into the evolution of link data over time. We designed two experiments to evaluate the use of temporal features in authority estimation algorithms. In the first experiment we compare time-independent and time-sensitive ranking algorithms with a reference rank based on the total number of visits to each blog. In the second, we use feedback from communication media domain experts to contrast different rankings of Portuguese news Websites. Results. The distribution of citations to a Web document over time contains valuable information. Based on several examples we show that time-independent algorithms are unable to capture the correct popularity of sites with high citation activity. Using a reference rank based on the number of visits to a site, we show that a time-biased approach performs better. Conclusions. Although both time-independent and time-aware approaches are based on the same raw data, the experiments indicate that they can be treated as complementary signals for relevance assessment by information retrieval systems. We show that temporal information present in blogs can be used to derive stable time-dependent features, which can be successfully used in the context of Web document ranking.
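The contrast between a time-independent and a time-biased link-based ranking can be illustrated with a minimal sketch. It assumes PageRank as the traditional algorithm and, as a hypothetical time bias, weights each citation by an exponential decay of its age. The `half_life_days` parameter, the helper names and the toy link data are illustrative assumptions, not the method or data used in the paper.

```python
import math


def weighted_pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration over weighted edges (src, dst, weight)."""
    nodes = sorted({n for s, d, _ in edges for n in (s, d)})
    n_nodes = len(nodes)
    out_weight = {n: 0.0 for n in nodes}
    for s, _, w in edges:
        out_weight[s] += w
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping + damping * dangling) / n_nodes
        new_rank = {n: base for n in nodes}
        for s, d, w in edges:
            # Each node passes its rank on in proportion to its outgoing link weights.
            new_rank[d] += damping * rank[s] * w / out_weight[s]
        rank = new_rank
    return rank


def time_biased_edges(links, now, half_life_days=90.0):
    """Hypothetical time bias: a citation loses half its weight every half_life_days."""
    decay = math.log(2) / half_life_days
    return [(s, d, math.exp(-decay * (now - t))) for s, d, t in links]


# Hypothetical toy citation data: (citing blog, cited blog, day of the citation).
links = [("a", "b", 0), ("c", "b", 10), ("a", "c", 360), ("b", "c", 362)]
static = weighted_pagerank([(s, d, 1.0) for s, d, _ in links])
recent = weighted_pagerank(time_biased_edges(links, now=365))
print(static)
print(recent)
```

With uniform weights, `b` and `c` receive equal scores; once older citations are discounted, `c`, which received the most recent citations, ranks above `b`. Both rankings are computed from the same raw links, which mirrors the idea that the two signals can be used side by side.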
2017
Authors
Devezas, JL; Nunes, S;
Publication
6th Symposium on Languages, Applications and Technologies, SLATE 2017, June 26-27, 2017, Vila do Conde, Portugal
Abstract
Search engines are evolving towards richer and stronger semantic approaches, focusing on entity-oriented tasks where knowledge bases have become fundamental. In order to support semantic search, search engines are increasingly reliant on robust information extraction systems. In fact, most modern search engines are already highly dependent on a well-curated knowledge base. Nevertheless, they still lack the ability to effectively and automatically take advantage of multiple heterogeneous data sources. Central tasks include harnessing the information locked within textual content by linking mentioned entities to a knowledge base, or integrating multiple knowledge bases to answer natural language questions. Combining text and knowledge bases is frequently used to improve search results, but it can also be used for the query-independent ranking of entities like events. In this work, we present a complete information extraction pipeline for the Portuguese language, covering all stages from data acquisition to knowledge base population. We also describe a practical application of the automatically extracted information, to support the ranking of upcoming events displayed on the landing page of an institutional search engine, where space is limited to only three relevant events. We manually annotate a dataset of news covering event announcements from multiple faculties and organic units of the institution. We then use it to train and evaluate the named entity recognition module of the pipeline. We rank events by taking advantage of identified entities, as well as partOf relations, in order to compute an entity popularity score, as well as an entity click score based on implicit feedback from clicks in the institutional search engine. We then combine these two scores with the number of days until the event, obtaining a final ranking of the three most relevant upcoming events. © José Devezas and Sérgio Nunes
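One possible reading of the final ranking step is sketched below. It assumes that entity popularity is a mention count propagated along partOf relations and that the combined entity score is divided by the number of days until the event. The formula, the helper names (`entity_popularity`, `top_events`) and the toy data are hypothetical; the abstract does not specify how the scores are combined.

```python
from dataclasses import dataclass


@dataclass
class Event:
    title: str
    entities: list[str]  # entities recognised in the event announcement
    days_until: int      # days from today to the event


def entity_popularity(mentions, part_of):
    """Count mentions, also crediting every unit an entity is partOf (hypothetical)."""
    counts = {}
    for entity in mentions:
        node, seen = entity, set()
        # Walk up the partOf chain, guarding against cycles.
        while node is not None and node not in seen:
            seen.add(node)
            counts[node] = counts.get(node, 0) + 1
            node = part_of.get(node)
    return counts


def top_events(events, popularity, clicks, k=3):
    """Rank upcoming events by entity scores discounted by time (hypothetical formula)."""
    def score(ev):
        pop = sum(popularity.get(e, 0) for e in ev.entities)
        clk = sum(clicks.get(e, 0.0) for e in ev.entities)
        return (pop + clk) / (1 + ev.days_until)

    # Only events that have not happened yet are eligible.
    upcoming = [ev for ev in events if ev.days_until >= 0]
    return sorted(upcoming, key=score, reverse=True)[:k]


# Hypothetical toy data: a lab that is part of a department, itself part of a faculty.
part_of = {"lab": "dept", "dept": "faculty"}
popularity = entity_popularity(["dept", "dept", "faculty", "lab"], part_of)
clicks = {"lab": 2.0}
events = [
    Event("Seminar", ["lab"], 2),
    Event("Open day", ["faculty"], 10),
    Event("Thesis defence", ["dept"], 1),
    Event("Past talk", ["dept"], -1),
    Event("Workshop", ["lab"], 30),
]
top = top_events(events, popularity, clicks)
print([ev.title for ev in top])  # ['Thesis defence', 'Seminar', 'Open day']
```

Dividing by `1 + days_until` is just one simple way of favouring imminent events; the limit `k=3` mirrors the three slots available on the landing page.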
2017
Authors
Devezas, J; Nunes, S;
Publication
ERCIM NEWS
Abstract
In an information society, people expect to find answers to their questions quickly and with little effort. Sometimes, these answers are locked within textual documents, which often require manual analysis after being retrieved from the web through search engines. At FEUP InfoLab, we are researching graph-based models to index combined data (text and knowledge), with the goal of improving entity-oriented search effectiveness.