2021
Authors
Sato, M; Jatowt, A; Duan, YJ; Campos, R; Yoshikawa, M;
Publication
2021 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL 2021)
Abstract
Our society generates massive amounts of digital data, a significant portion of which is being archived and made accessible to the public for current and future use. In addition, historical born-analog documents are increasingly being digitized and included in document archives that are available online. Professionals who use document archives tend to know what they wish to search for. Yet, if the results are to be useful and attractive for ordinary users, they need to contain content that is interesting and familiar. However, state-of-the-art retrieval methods for document archives basically apply the same techniques as search engines for synchronic document collections. In this paper, we introduce a novel concept of estimating the relation of archival documents to the present times, called contemporary relevance. Contemporary relevance can be used to improve access to archival document collections so that users have a higher probability of finding interesting or useful content. We then propose an effective method for computing contemporary relevance degrees of news articles using Learning to Rank with a range of diverse features, and we successfully test it on the New York Times Annotated document collection. Our proposal offers a novel paradigm of information access to archival document collections by incorporating the context of contemporary time.
2021
Authors
Campos, R; Pasquali, A; Jatowt, A; Mangaravite, V; Jorge, AM;
Publication
The Past Web: Exploring Web Archives
Abstract
Despite significant advances in web archive infrastructures, the problem of exploring the historical heritage preserved by web archives is yet to be solved. Timeline generation emerges in this context as one possible solution for automatically producing summaries of news over time. Thanks to this, users can gain a better sense of reported news events, entities, stories or topics over time, such as getting a summary of the most important news about a politician, an organisation or a locality. Web archives play an important role here by providing access to a historical set of preserved information. This particular characteristic of web archives makes them an irreplaceable infrastructure and a valuable source of knowledge that contributes to the process of timeline generation. Accordingly, the authors of this chapter developed "Tell me Stories", a news summarisation system built on top of the infrastructure of Arquivo.pt, the Portuguese web archive, to automatically generate a timeline summary of a given topic. In this chapter, we begin by providing a brief overview of the most relevant research conducted on the automatic generation of timelines for past-web events. Next, we describe the architecture and some use cases for "Tell me Stories". Our system demonstrates how web archives can be used as infrastructures to develop innovative services. We conclude this chapter by enumerating open challenges in this field and possible future directions in the general area of temporal summarisation in web archives. © Springer Nature Switzerland AG 2021. All rights reserved.
2022
Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;
Publication
ADVANCES IN INFORMATION RETRIEVAL, PT II
Abstract
Narrative extraction, understanding, verification, and visualization are currently popular topics for users interested in achieving a deeper understanding of text, researchers who want to develop accurate methods for text mining, and commercial companies that strive to provide efficient tools for that. Information Retrieval (IR), Natural Language Processing (NLP), Machine Learning (ML) and Computational Linguistics (CL) already offer many instruments that aid the exploration of narrative elements in text and within unstructured data. Despite evident advances in the last couple of years, the problem of automatically representing narratives in a structured form and interpreting them, beyond the conventional identification of common events, entities and their relationships, is yet to be solved. This workshop, held virtually on April 10th, 2022, in conjunction with the 44th European Conference on Information Retrieval (ECIR '22), aims at presenting and discussing current and future directions for IR, NLP, ML and other computational linguistics-related fields capable of improving the automatic understanding of narratives. It includes sessions devoted to research, demo, position, work-in-progress, project description, nectar, and negative results papers, as well as keynote talks and space for an informal discussion of the methods, the challenges and the future of this research area.
2022
Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M;
Publication
Text2Story@ECIR
Abstract
2021
Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Finlayson, MA; Cordeiro, JP; Rocha, C; Ribeiro, A; Mansouri, B; Ansah, J; Pasquali, A;
Publication
SIGIR Forum
Abstract
2022
Authors
Campos, V; Campos, R; Mota, P; Jorge, A;
Publication
ADVANCES IN INFORMATION RETRIEVAL, PT II
Abstract
Social media platforms are used to discuss current events with very complex narratives that become difficult to understand. In this work, we introduce Tweet2Story, a web app to automatically extract narratives from short texts such as tweets and describe them through annotations. By doing this, we aim to mitigate the difficulties in creating narratives and take a step towards deeply understanding the actors and their corresponding relations found in a text. We built the web app to be modular and easy to use, which allows it to easily incorporate new techniques as they are developed.