2023
Authors
Ströhle, T; Campos, R; Jatowt, A;
Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
In our data-flooded age, an enormous amount of redundant, but also disparate, textual data is collected on a daily basis on a wide variety of topics. Much of this information refers to documents related to the same theme, that is, different versions of the same document, or different documents discussing the same topic. Being aware of such differences turns out to be an important aspect for those who want to perform a comparative task. However, as documents increase in size and volume, keeping up to date with, detecting, and summarizing relevant changes between different documents or versions of them becomes unfeasible. This motivates the rise of the contrastive or comparative summarization task, which attempts to summarize the text of different documents related to the same topic in a way that highlights the relevant differences between them. Our research aims to provide a systematic literature review on contrastive or comparative summarization, highlighting the different methods, data sets, metrics, and applications. Overall, we found that contrastive summarization is most commonly applied to controversial news articles, controversial opinions or sentiments on a topic, and product reviews. Despite the great interest in the topic, we note that standard data sets, as well as a competitive task dedicated to this topic, have yet to be proposed, which may impede the emergence of new methods. Moreover, the great breakthrough of using deep learning-based language models for abstractive summaries in contrastive summarization is still missing.
2023
Authors
Eder, L; Campos, R; Jatowt, A;
Publication
PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023
Abstract
Versioned documents are common in many situations and play a vital part in numerous applications enabling an overview of the revisions made to a document or document collection. However, as documents increase in size, it gets difficult to summarize and comprehend all the changes made to versioned documents. In this paper, we propose a novel research problem of contrastive keyword extraction from versioned documents, and introduce an unsupervised approach that extracts keywords to reflect the key changes made to an earlier document version. In order to provide an easy-to-use comparison and summarization tool, an open-source demonstration is made available which can be found at https://contrastive-keyword-extraction.streamlit.app/.
2023
Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, M; Jatowt, A;
Publication
CEUR Workshop Proceedings
Abstract
[No abstract available]
2023
Authors
Campos, V; Campos, R; Jorge, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I
Abstract
Topics discussed on social media platforms contain a disparate amount of information written in colloquial language, making it difficult to understand the narrative of the topic. In this paper, we take a step towards the resolution of this problem by proposing a framework that performs the automatic extraction of narratives from documents, such as tweets. To this end, we propose a methodology that extracts information from the texts through a pipeline of tasks, such as co-reference resolution and the extraction of entity relations. The result of this process is embedded into an annotation file to be used by subsequent operations, such as visualization schemas. We named this framework Tweet2Story and measured its effectiveness under an evaluation schema that involved three different aspects: (i) as an Open Information Extraction (OpenIE) task, (ii) by comparing the narratives of manually annotated news articles linked to tweets about the same topic, and (iii) by qualitatively comparing the knowledge graphs produced from the narratives. The results obtained show a high precision and a moderate recall, on par with other state-of-the-art OpenIE frameworks, and confirm that the narratives can be extracted from small texts. Furthermore, we show that the narrative can be visualized in an easily understandable way.
2023
Authors
Cunha, LF; Campos, R; Jorge, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I
Abstract
Event extraction is an Information Retrieval task that commonly consists of identifying the central word for the event (trigger) and the event's arguments. This task has been extensively studied for English but lags behind for Portuguese, partly due to the lack of task-specific annotated corpora. This paper proposes a framework in which two separate BERT-based models were fine-tuned to identify and classify events in Portuguese documents. We decompose this task into two sub-tasks. Firstly, we use a token classification model to detect event triggers. Secondly, to extract event arguments, we train a Question Answering model that queries the triggers about their corresponding event argument roles. Given the lack of event-annotated corpora in Portuguese, we translated the original version of the ACE-2005 dataset (a reference in the field) into Portuguese, producing a new corpus for Portuguese event extraction. To accomplish this, we developed an automatic translation pipeline. Our framework obtains F1 scores of 64.4 for trigger classification and 46.7 for argument classification, setting a new state-of-the-art reference for these tasks in Portuguese.
2023
Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, H; Mansouri, B;
Publication
SIGIR Forum
Abstract