Publications

Publications by Evelin Freire Amorim

2024

Identification of Participants of Narratives Using Knowledge Bases

Authors
Machado, J; Amorim, E;

Publication
Anais do XXXIX Simpósio Brasileiro de Banco de Dados (SBBD 2024)

Abstract
Identifying participants in narratives is important to understand and extract meaning from unstructured texts. This paper investigates the use of DBpedia and Wikifier for this task. We tested these two knowledge base platforms to evaluate their performance in recognizing and extracting entities in Portuguese-language journalistic narrative texts. The results show that DBpedia and Wikifier perform similarly in identifying participants, both with an F1-score of around 0.40. The objective of this paper is to study the potential of knowledge bases to improve the understanding of narratives, and to suggest directions for future research in this domain.

2024

ISO 24617-8 Applied: Insights from Multilingual Discourse Relations Annotation in English, Polish, and Portuguese

Authors
Tomaszewska, A; Silvano, P; Leal, A; Amorim, E;

Publication
ISA 2024: 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation at LREC-COLING 2024, Workshop Proceedings

Abstract
The main objective of this study is to contribute to multilingual discourse research by employing ISO 24617 Part 8 (Semantic Relations in Discourse, Core Annotation Schema – DR-core) to annotate discourse relations. Centering on a parallel discourse relations corpus that includes English, Polish, and European Portuguese, we initiate one of the few ISO-based comparative analyses using a multilingual corpus that aligns discourse relations across these languages. In this paper, we discuss the project’s contributions, including the annotated corpus, research findings, and statistics related to the use of discourse relations. The paper further discusses the challenges encountered in complying with the ISO standard, such as defining the scope of arguments and annotating specific relation types like Expansion. Our findings highlight the need for clearer definitions of certain discourse relations and more precise guidelines for argument spans, especially concerning the inclusion of connectives. Additionally, the study underscores the importance of ongoing collaborative efforts to include more languages and more comprehensive datasets, with the objective of widening the reach of ISO-guided multilingual discourse research. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

2024

A Legal Framework for Natural Language Processing Model Training in Portugal

Authors
Almeida, R; Amorim, E;

Publication
Legal and Ethical Issues in Human Language Technologies 2024, LEGAL 2024 at LREC-COLING 2024 - Workshop Proceedings

Abstract
Recent advances in deep learning have promoted the advent of many computational systems capable of performing intelligent actions that, until then, were restricted to the human intellect. In the particular case of human languages, these advances allowed the introduction of applications like ChatGPT that are capable of generating coherent text without being explicitly programmed to do so. Instead, these models use large volumes of textual data to learn meaningful representations of human languages. Alongside these advances, concerns have emerged about copyright and data privacy infringements caused by these applications. Despite these concerns, the pace at which new natural language processing applications continued to be developed far outpaced the introduction of new regulations. Today, communication barriers between legal experts and computer scientists lead to many unintentional legal infringements during the development of such applications. In this paper, a multidisciplinary team aims to bridge this communication gap and promote more compliant Portuguese NLP research by presenting a series of everyday NLP use cases, while highlighting the Portuguese legislation that may apply during their development. © 2024 ELRA Language Resource Association.

2024

Untangling a Web of Temporal Relations in News Articles

Authors
Silvano, P; Amorim, E; Leal, A; Cantante, I; Jorge, A; Campos, R; Yu, N;

Publication
Proceedings of Text2Story - Seventh Workshop on Narrative Extraction From Texts held in conjunction with the 46th European Conference on Information Retrieval (ECIR 2024), Glasgow, Scotland, UK, March 24, 2024.

Abstract
Temporal reasoning has been the focus of several studies in recent years, both in linguistics and in computational studies. Although advances in this topic are undeniable, there are still improvements to be made and new avenues to pursue. One relevant problem concerns the temporal ordering of events, particularly ascertaining and representing how events are temporally related and how the story told in the narrative evolves. This paper aims to analyse the temporal structure of narratives present in news articles with the aid of different visualisations. To this end, we annotated a dataset of 119 news articles in European Portuguese following an annotation scheme that combines different parts of ISO 24617 Language Resource Management - Semantic Annotation Framework (SemAF). The temporal layer of this annotation scheme identifies the events and their main features, as well as the temporal links between the events. The annotation provided essential information about the temporal characteristics of news at two levels: the story level and the report level. The visualisations that we propose make it easier to understand how news articles are temporally organised, providing a more practical means to observe them. © 2024 Copyright for this paper by its authors.

2024

text2story: A Python Toolkit to Extract and Visualize Story Components of Narrative Text

Authors
Amorim, E; Campos, R; Jorge, AM; Mota, P; Almeida, R;

Publication
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy.

Abstract
Story components, namely events, time, participants, and their relations, are present in narrative texts from different domains such as journalism, medicine, finance, and law. The automatic extraction of narrative elements encompasses several NLP tasks such as Named Entity Recognition, Semantic Role Labeling, Event Extraction, and Temporal Inference. The text2story Python package, an easy-to-use modular library, supports the narrative extraction and visualization pipeline. The package contains an array of narrative extraction tools that can be used separately or in sequence. With this toolkit, end users can process free text in English or Portuguese and obtain formal representations, such as standard annotation files or a formal logical representation. The toolkit also enables narrative visualization as Message Sequence Charts (MSC), Knowledge Graphs, and Bubble Diagrams, making it useful to visualize and transform human-annotated narratives. The package combines off-the-shelf and custom tools and is easily patched (by replacing existing components) and extended (e.g. with new visualizations). It includes an experimental module for assessing the effectiveness of narrative element extraction and is therefore also a valuable asset for researchers developing solutions for narrative extraction. To evaluate the baseline components, we present results for the main annotators embedded in our package on datasets in English and Portuguese. We also compare these results with the extraction of narrative elements by GPT-3, a robust LLM.
