Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Tópicos
de interesse
Detalhes

Detalhes

  • Nome

    Evelin Freire Amorim
  • Cargo

    Investigador Auxiliar
  • Desde

    21 setembro 2020
002
Publicações

2024

Text2Story Lusa: A Dataset for Narrative Analysis in European Portuguese News Articles

Autores
Nunes, S; Jorge, AM; Amorim, E; Sousa, HO; Leal, A; Silvano, PM; Cantante, I; Campos, R;

Publicação
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy.

Abstract
Narratives have been the subject of extensive research across various scientific fields such as linguistics and computer science. However, the scarcity of freely available datasets, essential for studying this genre, remains a significant obstacle. Furthermore, datasets annotated with narratives components and their morphosyntactic and semantic information are even scarcer. To address this gap, we developed the Text2Story Lusa datasets, which consist of a collection of news articles in European Portuguese. The first datasets consists of 357 news articles and the second dataset comprises a subset of 117 manually densely annotated articles, totaling over 50 thousand individual annotations. By focusing on texts with substantial narrative elements, we aim to provide a valuable resource for studying narrative structures in European Portuguese news articles. On the one hand, the first dataset provides researchers with data to study narratives from various perspectives. On the other hand, the annotated dataset facilitates research in information extraction and related tasks, particularly in the context of narrative extraction pipelines. Both datasets are made available adhering to FAIR principles, thereby enhancing their utility within the research community.

2024

Keywords attention for fake news detection using few positive labels

Autores
de Souza, MC; Golo, MPS; Jorge, AMG; de Amorim, ECF; Campos, RNT; Marcacini, RM; Rezende, SO;

Publicação
INFORMATION SCIENCES

Abstract
Fake news detection (FND) tools are essential to increase the reliability of information in social media. FND can be approached as a machine learning classification problem so that discriminative features can be automatically extracted. However, this requires a large news set, which in turn implies a considerable amount of human experts' effort for labeling. In this paper, we explore Positive and Unlabeled Learning (PUL) to reduce the labeling cost. In particular, we improve PUL with the network-based Label Propagation (PU-LP) algorithm. PU-LP achieved competitive results in FND exploiting relations between news and terms and using few labeled fake news. We propose integrating an attention mechanism in PU-LP that can define which terms in the network are more relevant for detecting fake news. We use GNEE, a state-of-the-art algorithm based on graph attention networks. Our proposal outperforms state-of-the-art methods, improving F-1 in 2% to 10%, especially when only 10% labeled fake news are available. It is competitive with the binary baseline, even when nearly half of the data is labeled. Discrimination ability is also visualized through t-SNE. We also present an analysis of the limitations of our approach according to the type of text found in each dataset.

2024

Identification of Participants of Narratives Using Knowledge Bases

Autores
Machado, J; Amorim, E;

Publicação
Anais do XXXIX Simpósio Brasileiro de Banco de Dados (SBBD 2024)

Abstract
Identifying participants in narratives is important to understand and extract meaning from unstructured texts. This paper investigates the use of DBpedia and Wikifier for this task. We tested these two knowledge base platforms to evaluate their performance in recognizing and extracting entities in Portuguese-language journalistic narrative texts. The results show that both DBpedia and Wikifier present similar results in identifying participants, around 0.40 in the f1-score. The objective of this paper is to study the potential of knowledge bases to improve the understanding of narratives, in addition to suggesting directions for future research in this domain.

2024

ISO 24617-8 Applied: Insights from Multilingual Discourse Relations Annotation in English, Polish, and Portuguese

Autores
Tomaszewska, A; Silvano, P; Leal, A; Amorim, E;

Publicação
ISA 2024: 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation at LREC-COLING 2024, Workshop Proceedings

Abstract
The main objective of this study is to contribute to multilingual discourse research by employing ISO-24617 Part 8 (Semantic Relations in Discourse, Core Annotation Schema – DR-core) for annotating discourse relations. Centering around a parallel discourse relations corpus that includes English, Polish, and European Portuguese, we initiate one of the few ISO-based comparative analyses through a multilingual corpus that aligns discourse relations across these languages. In this paper, we discuss the project’s contributions, including the annotated corpus, research findings, and statistics related to the use of discourse relations. The paper further discusses the challenges encountered in complying with the ISO standard, such as defining the scope of arguments and annotating specific relation types like Expansion. Our findings highlight the necessity for clearer definitions of certain discourse relations and more precise guidelines for argument spans, especially concerning the inclusion of connectives. Additionally, the study underscores the importance of ongoing collaborative efforts to broaden the inclusion of languages and more comprehensive datasets, with the objective of widening the reach of ISO-guided multilingual discourse research. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

2024

A Legal Framework for Natural Language Processing Model Training in Portugal

Autores
Almeida, R; Amorim, E;

Publicação
Legal and Ethical Issues in Human Language Technologies 2024, LEGAL 2024 at LREC-COLING 2024 - Workshop Proceedings

Abstract
Recent advances in deep learning have promoted the advent of many computational systems capable of performing intelligent actions that, until then, were restricted to the human intellect. In the particular case of human languages, these advances allowed the introduction of applications like ChatGPT that are capable of generating coherent text without being explicitly programmed to do so. Instead, these models use large volumes of textual data to learn meaningful representations of human languages. Associated with these advances, concerns about copyright and data privacy infringements caused by these applications have emerged. Despite these concerns, the pace at which new natural language processing applications continued to be developed largely outperformed the introduction of new regulations. Today, communication barriers between legal experts and computer scientists motivate many unintentional legal infringements during the development of such applications. In this paper, a multidisciplinary team intends to bridge this communication gap and promote more compliant Portuguese NLP research by presenting a series of everyday NLP use cases, while highlighting the Portuguese legislation that may arise during its development. © 2024 ELRA Language Resource Association.