2024
Authors
de Souza, MC; Golo, MPS; Jorge, AMG; de Amorim, ECF; Campos, RNT; Marcacini, RM; Rezende, SO;
Publication
INFORMATION SCIENCES
Abstract
Fake news detection (FND) tools are essential to increase the reliability of information in social media. FND can be approached as a machine learning classification problem so that discriminative features can be automatically extracted. However, this requires a large news set, which in turn implies considerable labeling effort from human experts. In this paper, we explore Positive and Unlabeled Learning (PUL) to reduce the labeling cost. In particular, we improve PUL with the network-based Label Propagation (PU-LP) algorithm. PU-LP achieved competitive results in FND by exploiting relations between news and terms and using few labeled fake news. We propose integrating an attention mechanism into PU-LP that can define which terms in the network are more relevant for detecting fake news. We use GNEE, a state-of-the-art algorithm based on graph attention networks. Our proposal outperforms state-of-the-art methods, improving F1 by 2% to 10%, especially when only 10% of the fake news is labeled. It is competitive with the binary baseline, even when nearly half of the data is labeled. Discrimination ability is also visualized through t-SNE. We also present an analysis of the limitations of our approach according to the type of text found in each dataset.
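The idea of propagating labels over a news-term network can be illustrated with a minimal sketch. This is generic label propagation with clamped seeds, not the PU-LP or GNEE algorithm from the paper. The function name `propagate`, the toy graph, and the presence of a reliable negative seed (`news3`) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def propagate(edges: List[Tuple[str, str]],
              seeds: Dict[str, float],
              iterations: int = 20) -> Dict[str, float]:
    """Spread seed scores (1.0 = fake, 0.0 = real) over an undirected graph."""
    neighbors: Dict[str, set] = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Unlabeled nodes start at the neutral score 0.5.
    scores = {node: seeds.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Seed nodes are clamped to their known label.
                updated[node] = seeds[node]
            else:
                # Other nodes take the mean score of their neighbors.
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Hypothetical toy network: news items linked to the terms they contain.
edges = [
    ("news1", "term:shocking"), ("news2", "term:shocking"),
    ("news3", "term:report"), ("news4", "term:report"),
]
# Assumption: news1 is a labeled fake item, news3 an inferred reliable negative.
seeds = {"news1": 1.0, "news3": 0.0}
scores = propagate(edges, seeds)
```

In this toy graph, `news2` shares a term only with the fake seed and drifts toward 1.0, while `news4` drifts toward 0.0.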
2024
Authors
Cunha, LF; Silvano, P; Campos, R; Jorge, A;
Publication
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024
Abstract
Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive information about the overall event structure and semantics. However, its limited language coverage restricts its usability. This paper introduces ACE-2005-PT, a corpus created by translating ACE-2005 into Portuguese, with European and Brazilian variants. To speed up the process of obtaining ACE-2005-PT, we rely on automatic translators. This, however, poses some challenges related to automatically identifying the correct alignments between multi-word annotations in the original text and in the corresponding translated sentence. To address this, we developed an alignment pipeline that incorporates several alignment techniques: lemmatization, fuzzy matching, synonym matching, multiple translations and a BERT-based word aligner. To measure the alignment effectiveness, a subset of annotations from the ACE-2005-PT corpus was manually aligned by an expert linguist. This subset was then compared against our pipeline results, which achieved exact and relaxed match scores of 70.55% and 87.55%, respectively. As a result, we successfully generated a Portuguese version of the ACE-2005 corpus, which has been accepted for publication by LDC.
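The difference between exact and relaxed match scores can be sketched as follows. The abstract does not define the relaxed criterion, so this sketch assumes that a relaxed match is any token overlap between the predicted and the gold span. The names `match_scores` and `overlaps` and the example spans are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def match_scores(predicted: List[Span], gold: List[Span]) -> Tuple[float, float]:
    """Return (exact %, relaxed %) over paired predicted and gold spans."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be paired one-to-one")
    if not gold:
        return 0.0, 0.0
    exact = sum(p == g for p, g in zip(predicted, gold))
    # Assumption: "relaxed" means any overlap with the gold span.
    relaxed = sum(overlaps(p, g) for p, g in zip(predicted, gold))
    return 100.0 * exact / len(gold), 100.0 * relaxed / len(gold)


# Hypothetical gold alignments versus pipeline output for four annotations.
gold = [(0, 2), (5, 6), (8, 11), (12, 13)]
pred = [(0, 2), (5, 7), (9, 11), (14, 15)]
exact_pct, relaxed_pct = match_scores(pred, gold)
print(exact_pct, relaxed_pct)  # 25.0 75.0
```

Under this assumption every exact match is also a relaxed match, so the relaxed score is never lower than the exact one, which is consistent with the two figures reported in the abstract.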
2024
Authors
Kurunathan, H; Li, K; Tovar, E; Jorge, AM; Ni, W; Jamalipour, A;
Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
Abstract
The exploitation of radio channels' inherent randomness for generating secret keys within a vehicular platoon offers a promising approach to securing communications in dynamic and unpredictable environments. Channel-based key generation leverages the fact that the physical characteristics of the radio channel, such as fading, shadowing, and multipath propagation, vary in a complex manner that makes it difficult for external adversaries to predict or replicate. A challenge lies in accurately assessing the channel's randomness to ensure the generated keys are both secure and consistent across the platooning vehicles, especially in vehicular environments with high mobility and an ever-changing urban landscape. This paper proposes a novel channel-based key generation technique (DRL-KeyAgree) to enhance communication security within vehicular platoons through combinatorial deep reinforcement learning (DRL). DRL-KeyAgree addresses key disagreement among platooning vehicles by training an advantage Actor-Critic (A2C) model, which integrates policy- and value-based strategies to dynamically select optimal quantization intervals adapting to the random wireless channels. Further incorporation of Long Short-Term Memory (LSTM) allows DRL-KeyAgree to capture the characteristics of partially observable radio channels, significantly enhancing the key agreement rate among vehicles. DRL-KeyAgree is rigorously evaluated using the standard National Institute of Standards and Technology (NIST) test suite.
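Why the choice of quantization intervals affects key agreement can be illustrated with a minimal sketch. It uses a fixed two-level quantizer with a guard band, not the learned A2C/LSTM policy of DRL-KeyAgree. The function names, the threshold, and the signal-strength samples (in dBm) are hypothetical.

```python
from typing import List, Optional


def quantize(samples: List[float], threshold: float,
             guard: float) -> List[Optional[int]]:
    """Map each sample to bit 1 or 0, or None if it falls in the guard band."""
    bits: List[Optional[int]] = []
    for s in samples:
        if s > threshold + guard:
            bits.append(1)
        elif s < threshold - guard:
            bits.append(0)
        else:
            bits.append(None)  # too close to the threshold: discard
    return bits


def agreement_rate(a: List[Optional[int]], b: List[Optional[int]]) -> float:
    """Fraction of matching bits among positions both sides kept."""
    kept = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not kept:
        return 0.0
    return sum(x == y for x, y in kept) / len(kept)


# Hypothetical reciprocal channel measurements at two platoon vehicles;
# the small differences stand in for measurement noise.
lead = [-70.0, -64.5, -58.0, -66.0, -61.0, -72.0]
follower = [-69.0, -65.5, -59.0, -64.5, -60.0, -71.0]
threshold = -65.0

rate_no_guard = agreement_rate(quantize(lead, threshold, 0.0),
                               quantize(follower, threshold, 0.0))
rate_guard = agreement_rate(quantize(lead, threshold, 2.0),
                            quantize(follower, threshold, 2.0))
```

In this toy data the guard band raises the agreement rate by discarding samples near the threshold, at the cost of fewer key bits. That trade-off is one reason an adaptive choice of intervals is attractive.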
2024
Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, HO; Mansouri, B;
Publication
SIGIR Forum
Abstract
2024
Authors
Piskorski, J; Stefanovitch, N; Alam, F; Campos, R; Dimitrov, D; Jorge, A; Pollak, S; Ribin, N; Fijavz, Z; Hasanain, M; Silvano, P; Sartori, E; Guimarães, N; Vitez, AZ; Pacheco, AF; Koychev, I; Yu, N; Nakov, P; San Martino, GD;
Publication
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024.
Abstract
We present an overview of CheckThat! Lab's 2024 Task 3, which focuses on detecting 23 persuasion techniques at the text-span level in online media. The task covers five languages, namely, Arabic, Bulgarian, English, Portuguese, and Slovene, and highly debated topics in the media, e.g., the Israeli-Palestinian conflict, the Russia-Ukraine war, climate change, COVID-19, abortion, etc. A total of 23 teams registered for the task, and two of them submitted system responses, which were compared against a baseline and the task organizers' system, which used a state-of-the-art transformer-based architecture. We provide a description of the dataset and the overall task setup, including the evaluation methodology, and an overview of the participating systems. The datasets, accompanied by the evaluation scripts, are released to the research community, which we believe will foster research on persuasion technique detection and analysis of online media content in various fields and contexts. © 2024 Copyright for this paper by its authors.
2024
Authors
Silvano, P; Amorim, E; Leal, A; Cantante, I; Jorge, A; Campos, R; Yu, N;
Publication
Proceedings of Text2Story - Seventh Workshop on Narrative Extraction From Texts held in conjunction with the 46th European Conference on Information Retrieval (ECIR 2024), Glasgow, Scotland, UK, March 24, 2024.
Abstract
Temporal reasoning has been the focus of several studies in recent years, in both linguistics and computational studies. Although advances on this topic are undeniable, there are still improvements to be made and new avenues to pursue. One relevant problem concerns the temporal ordering of events, particularly asserting and representing how events are temporally related and how the story told in the narrative evolves. This paper aims to analyse the temporal structure of narratives present in news articles with the aid of different visualisations. To this end, we annotated a dataset of 119 news articles in European Portuguese following an annotation scheme that combines different parts of ISO 24617 - Language Resource Management - Semantic Annotation Framework (SemAF). The temporal layer of this annotation scheme identifies the events and their main features, as well as the temporal links between the events. The annotation provided us with paramount information about the temporal characteristics of news at two levels: the story and the report levels. The visualisations that we propose facilitate the process of understanding how news is temporally organised, providing a more practical means to observe it. © 2024 Copyright for this paper by its authors.