2021
Authors
Vinagre, J; Jorge, AM; Rocha, C; Gama, J;
Publication
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Abstract
Online incremental models for recommendation are nowadays pervasive in both industry and academia. However, there is not yet a standard evaluation methodology for the algorithms that maintain such models. Moreover, online evaluation methodologies available in the literature generally fall short on the statistical validation of results, since this validation is not trivially applicable to stream-based algorithms. We propose a k-fold validation framework for the pairwise comparison of recommendation algorithms that learn from user feedback streams, using prequential evaluation. Our proposal enables continuous statistical testing on adaptive-size sliding windows over the outcome of the prequential process, allowing practitioners and researchers to make decisions in real time based on solid statistical evidence. We present a set of experiments to gain insights into the sensitivity and robustness of two statistical tests, McNemar's and the Wilcoxon signed-rank test, in a streaming data environment. Our results show that besides allowing a real-time, fine-grained online assessment, the online versions of the statistical tests are at least as robust as the batch versions, and definitely more robust than a simple prequential single-fold approach.
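A minimal sketch of continuous testing over a sliding window of prequential outcomes, assuming each incoming event yields a hit/miss outcome for two recommenders A and B. It uses a fixed-size window rather than the adaptive-size windows described above and covers only McNemar's test (with continuity correction); the names `SlidingMcNemar` and `mcnemar_statistic` are hypothetical, and this is not the authors' implementation.

```python
from collections import deque


def mcnemar_statistic(b: int, c: int) -> float:
    """McNemar chi-squared statistic with continuity correction.

    b: events where model A hit and model B missed.
    c: events where model A missed and model B hit.
    """
    if b + c == 0:
        return 0.0
    # max() keeps the corrected difference from going negative when b == c
    return max(0, abs(b - c) - 1) ** 2 / (b + c)


class SlidingMcNemar:
    """Pairwise prequential comparison over a FIXED-size window (an assumption;
    the paper describes adaptive-size windows)."""

    def __init__(self, window: int = 1000):
        self.events = deque(maxlen=window)  # (hit_a, hit_b) pairs, oldest first
        self.b = 0  # A hit, B missed, inside the window
        self.c = 0  # A missed, B hit, inside the window

    def update(self, hit_a: bool, hit_b: bool) -> float:
        # When the window is full, the oldest outcome is evicted on append,
        # so remove its contribution to the discordant counts first.
        if len(self.events) == self.events.maxlen:
            old_a, old_b = self.events[0]
            if old_a and not old_b:
                self.b -= 1
            elif old_b and not old_a:
                self.c -= 1
        self.events.append((hit_a, hit_b))
        if hit_a and not hit_b:
            self.b += 1
        elif hit_b and not hit_a:
            self.c += 1
        # Statistic for the current window, recomputed after every event
        return mcnemar_statistic(self.b, self.c)
```

At a 5% significance level, a statistic above 3.841 (the chi-squared critical value with one degree of freedom) would suggest that the two recommenders differ within the current window.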
2021
Authors
Gatzioura, A; Vinagre, J; Jorge, AM; Sanchez Marre, M;
Publication
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Abstract
Although widely used, the majority of current music recommender systems still focus on recommendation accuracy, user preferences and isolated item characteristics, without evaluating other important factors, such as joint item selections and the recommendation moment. However, when it comes to playlist recommendations, additional dimensions, as well as the notion of user experience and perception, should be taken into account to improve recommendation quality. In this work, HybA, a hybrid recommender system for automatic playlist continuation that combines Latent Dirichlet Allocation and Case-Based Reasoning, is proposed. This system aims to address "similar concepts" rather than similar users. Rather than only generating a playlist based on user requirements, as automatic playlist generation methods do, HybA identifies the semantic characteristics of a started playlist and reuses the most similar past ones to recommend relevant playlist continuations. In addition, support is provided for beyond-accuracy dimensions, such as increased coherence or the discovery of diverse items. To overcome the semantic gap between music descriptions and user preferences, identify playlist structures and capture song similarity, a graph model is used. Experiments on real datasets have shown that the proposed algorithm is able to outperform other state-of-the-art techniques in terms of accuracy, while balancing diversity and coherence.
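The retrieve-and-reuse step of case-based reasoning can be illustrated with a toy sketch, assuming each past playlist (case) is already represented by a topic distribution, such as one produced by LDA. Cosine similarity, the parameters `k` and `n`, the function `continue_playlist` and the scoring rule are hypothetical choices for illustration; HybA's graph model and its balancing of diversity and coherence are not reproduced here.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def continue_playlist(query_topics, started, case_base, k=2, n=3):
    """Retrieve the k most similar past playlists and reuse their songs.

    query_topics: topic distribution of the started playlist (assumed precomputed).
    started: set of songs already in the started playlist.
    case_base: list of (topic_distribution, songs) pairs for past playlists.
    """
    # Retrieve: rank past playlists by similarity to the started one
    ranked = sorted(case_base, key=lambda case: cosine(query_topics, case[0]), reverse=True)
    # Reuse: each unseen song accumulates the similarity of the cases containing it
    scores = defaultdict(float)
    for topics, songs in ranked[:k]:
        sim = cosine(query_topics, topics)
        for song in songs:
            if song not in started:
                scores[song] += sim
    # Highest accumulated similarity first; ties broken by song id for determinism
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]
```

Songs shared by several similar past playlists rise to the top, which is one simple way of turning "similar concepts" into a ranked continuation.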
2021
Authors
Campos, R; Duque, J; Cândido, T; Mendes, J; Dias, G; Jorge, A; Nunes, C;
Publication
Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part II
Abstract
Over the past few years, the amount of information generated, consumed and stored on the Web has grown exponentially, making it impossible for users to keep up to date. Temporal data representation can help in this process by giving documents a sense of organization. Timelines are a natural way to showcase this data, giving users the chance to get familiar with a topic in a shorter amount of time. Despite their importance, little is known about their use in the context of single documents. In this paper, we present Time-Matters, a novel system to automatically explore arbitrary texts through temporal narratives in an interactive fashion that allows users to get insights into the relevant temporal happenings of a story through multiple components, including temporal annotation, storylines or temporal clustering. In contrast to classical timeline multi-document summarization tasks, we focus on performing text summaries of single documents with a temporal lens. This approach may be of interest to a number of providers such as media outlets, for which automatically building a condensed overview of a text is an important issue. © 2021, Springer Nature Switzerland AG.
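As a naive illustration of single-document temporal annotation, the sketch below anchors sentences to explicit four-digit years with a regular expression and orders them chronologically. It is a toy under stated assumptions: the regex, the sentence splitter and the `build_timeline` function are hypothetical, and a system such as Time-Matters would rely on proper temporal tagging and on scoring the relevance of dates rather than on raw pattern matching.

```python
import re
from collections import defaultdict

# Assumption: only explicit years between 1000 and 2099 are recognised
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def build_timeline(text):
    """Map each explicit year to the sentences that mention it, in chronological order."""
    # Naive sentence split on terminal punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    timeline = defaultdict(list)
    for sentence in sentences:
        # set() avoids listing a sentence twice under the same year
        for year in sorted(set(YEAR.findall(sentence))):
            timeline[int(year)].append(sentence)
    return dict(sorted(timeline.items()))
```

Each key of the returned dictionary is a point on the timeline, and its sentences act as a minimal temporal summary of the story at that date.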
2021
Authors
Pasquali, A; Campos, R; Ribeiro, A; Santana, BS; Jorge, A; Jatowt, A;
Publication
Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part I
Abstract
The rise of social media and the explosion of digital news in the web sphere have created new challenges to extract knowledge and make sense of published information. Automated timeline generation appears in this context as a promising answer to help users deal with this information overload problem. Formally, Timeline Summarization (TLS) can be defined as a subtask of Multi-Document Summarization (MDS) conceived to highlight the most important information during the development of a story over time by summarizing long-lasting events in chronological order. As opposed to traditional MDS, TLS has a limited number of publicly available datasets. In this paper, we propose TLS-Covid19 dataset, a novel corpus for the Portuguese and English languages. Our aim is to provide a new, larger and multi-lingual TLS annotated dataset that could foster timeline summarization evaluation research and, at the same time, enable the study of news coverage about the COVID-19 pandemic. TLS-Covid19 consists of 178 curated topics related to the COVID-19 outbreak, with associated news articles covering almost the entire year of 2020 and their respective reference timelines as gold-standard. As a final outcome, we conduct an experimental study on the proposed dataset using two extreme baseline methods. All the resources are publicly available at https://github.com/LIAAD/tls-covid19. © 2021, Springer Nature Switzerland AG.
2021
Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Finlayson, MA;
Publication
Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part II
Abstract
Narrative extraction, understanding and visualization is currently a popular topic and an important tool for humans interested in achieving a deeper understanding of text. Information Retrieval (IR), Natural Language Processing (NLP) and Machine Learning (ML) already offer many instruments that aid the exploration of narrative elements in text and within unstructured data. Despite evident advances in the last couple of years, the problem of automatically representing narratives in a structured form, beyond the conventional identification of common events, entities and their relationships, is yet to be solved. This workshop, held virtually on April 1st, 2021 and co-located with the 43rd European Conference on Information Retrieval (ECIR’21), aims at presenting and discussing current and future directions for IR, NLP, ML and other computational fields capable of improving the automatic understanding of narratives. It includes a session devoted to regular, short and demo papers, keynote talks and space for an informal discussion of the methods, of the challenges and of the future of the area. © 2021, Springer Nature Switzerland AG.
2021
Authors
Trindade, J; Vinagre, J; Fernandes, K; Paiva, N; Jorge, A;
Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XIX, IDA 2021
Abstract
In the past decade, we have witnessed the widespread adoption of Deep Neural Networks (DNNs) in several Machine Learning tasks. However, in many critical domains, such as healthcare, finance, or law enforcement, transparency is crucial. In particular, the lack of ability to conform with prior knowledge greatly affects the trustworthiness of predictive models. This paper contributes to the trustworthiness of DNNs by promoting monotonicity. We develop a multi-layer learning architecture that handles a subset of features in a dataset that, according to prior knowledge, have a monotonic relation with the response variable. We use two alternative approaches: (i) imposing constraints on the model's parameters, and (ii) applying an additional component to the loss function that penalises non-monotonic gradients. Our method is evaluated on classification and regression tasks using two datasets. Our model is able to conform to known monotonic relations, improving trustworthiness in decision making, while simultaneously maintaining small and controllable degradation in predictive ability.
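Approach (ii) can be sketched without a deep learning framework as a penalty term added to the task loss, assuming the monotone features should be increasing and that partial derivatives are approximated by forward finite differences (a DNN framework would compute them with automatic differentiation). The function name `monotonicity_penalty`, the step `eps` and the averaging scheme are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Callable, Sequence


def monotonicity_penalty(model: Callable[[Sequence[float]], float],
                         batch: Sequence[Sequence[float]],
                         monotone_features: Sequence[int],
                         eps: float = 1e-4) -> float:
    """Mean penalty for negative partial derivatives on features assumed to be increasing.

    Partial derivatives are approximated with forward finite differences.
    """
    total = 0.0
    for x in batch:
        base = model(x)
        for j in monotone_features:
            shifted = list(x)
            shifted[j] += eps
            grad = (model(shifted) - base) / eps
            # Only decreasing directions are penalised; increasing ones cost nothing
            total += max(0.0, -grad)
    return total / (len(batch) * len(monotone_features))
```

The training objective would then take a form like `task_loss + lam * monotonicity_penalty(...)`, where `lam` is a hypothetical weight trading predictive ability for conformity with the monotonic prior knowledge.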