Publications

Publications by LIAAD

2023

Event Extraction for Portuguese: A QA-Driven Approach Using ACE-2005

Authors
Cunha, LF; Campos, R; Jorge, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
Event extraction is an Information Retrieval task that commonly consists of identifying the central word of the event (the trigger) and the event's arguments. This task has been extensively studied for English but lags behind for Portuguese, partly due to the lack of task-specific annotated corpora. This paper proposes a framework in which two separate BERT-based models are fine-tuned to identify and classify events in Portuguese documents. We decompose this task into two sub-tasks. First, we use a token classification model to detect event triggers. Then, to extract event arguments, we train a Question Answering model that queries the triggers about their corresponding event argument roles. Given the lack of event-annotated corpora in Portuguese, we translated the original version of the ACE-2005 dataset (a reference in the field) into Portuguese, producing a new corpus for Portuguese event extraction. To accomplish this, we developed an automatic translation pipeline. Our framework obtains F1 scores of 64.4 for trigger classification and 46.7 for argument classification, setting a new state-of-the-art reference for these tasks in Portuguese.
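
The QA-driven step described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the question template, the `ROLES_BY_EVENT` table, and the function name `build_role_questions` are hypothetical, and the role list is an illustrative subset. The idea shown is that, once a trigger is detected, one question per argument role is generated and paired with the sentence, ready to be passed to an extractive Question Answering model.

```python
from typing import Dict, List

# Hypothetical, illustrative subset of argument roles per event type.
ROLES_BY_EVENT: Dict[str, List[str]] = {
    "Conflict.Attack": ["Attacker", "Target", "Place"],
}


def build_role_questions(trigger: str, event_type: str, context: str) -> List[Dict[str, str]]:
    """Build one (question, context) pair per argument role of a detected trigger.

    The question template is an assumption made for this sketch.
    """
    roles = ROLES_BY_EVENT.get(event_type, [])
    return [
        {
            "role": role,
            "question": f"What is the {role} in the event triggered by '{trigger}'?",
            "context": context,
        }
        for role in roles
    ]


if __name__ == "__main__":
    sentence = "Rebels attacked the convoy near the border."
    for item in build_role_questions("attacked", "Conflict.Attack", sentence):
        print(item["role"], "->", item["question"])
```

Each pair would then be answered by the fine-tuned QA model, with the predicted answer span taken as the argument filling that role.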

2023

Symbolic Versus Deep Learning Techniques for Explainable Sentiment Analysis

Authors
Muhammad, SH; Brazdil, P; Jorge, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
Deep learning approaches have become popular in many different areas, including sentiment analysis (SA), because of their competitive performance. However, the downside of these approaches is that they do not provide understandable explanations of how the sentiment values are calculated. In contrast, earlier approaches based on sentiment lexicons can provide such explanations, but their performance is normally not high. To leverage the strengths of both approaches, we present a neuro-symbolic approach that combines deep learning (DL) and symbolic methods for SA tasks. The DL approach uses a pre-trained language model (PLM) to construct a sentiment lexicon. The symbolic approach exploits the constructed sentiment lexicon and manually constructed shifter patterns to determine the sentiment of a sentence. Our experimental results show that the proposed approach leads to promising results, with the additional advantage that sentiment predictions can be accompanied by understandable explanations.
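
To make the symbolic side of this idea concrete, here is a minimal sketch, assuming a tiny hand-written lexicon and a very simple shifter rule. The `LEXICON` and `SHIFTERS` entries, their numeric values, and the rule that a shifter applies to the next sentiment word are all assumptions made for illustration; the abstract does not specify the actual lexicon or shifter patterns. The point is only to show how a lexicon-based score can carry its own explanation.

```python
from typing import List, Tuple

# Hypothetical lexicon and shifter values, for illustration only.
LEXICON = {"good": 1.0, "bad": -1.0, "great": 2.0}
SHIFTERS = {"not": -1.0, "never": -1.0, "very": 1.5}


def score_sentence(sentence: str) -> Tuple[float, List[str]]:
    """Score a sentence with a lexicon and shifters, returning the score and an explanation."""
    score = 0.0
    explanation: List[str] = []
    pending = 1.0  # product of shifters seen since the last sentiment word
    for raw in sentence.lower().split():
        token = raw.strip(".,!?;:")
        if token in SHIFTERS:
            pending *= SHIFTERS[token]
        elif token in LEXICON:
            value = LEXICON[token] * pending
            explanation.append(
                f"{token}: lexicon {LEXICON[token]:+.1f} x shifters {pending:+.1f} = {value:+.1f}"
            )
            score += value
            pending = 1.0  # shifters are consumed by the sentiment word
    return score, explanation


if __name__ == "__main__":
    total, why = score_sentence("The food was not very good.")
    print(total)
    for line in why:
        print(" ", line)
```

Because every contribution to the score is logged, the prediction comes with a human-readable trace, which is the kind of explanation a purely neural classifier does not provide by default.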

2023

Combining Neighbor Models to Improve Predictions of Age of Onset of ATTRv Carriers

Authors
Pedroto, M; Jorge, A; Mendes-Moreira, J; Coelho, T;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT II

Abstract
Transthyretin (TTR)-related familial amyloid polyneuropathy (ATTRv) is a life-threatening autosomal dominant disease, and the age of onset represents the moment when the first symptoms are felt. Accurately predicting the age of onset for a given patient is relevant for risk assessment and treatment management. In this work, we evaluate the impact of combining prediction models obtained from neighboring time windows on prediction error. We propose Symmetric (Sym) and Asymmetric (Asym) models, which represent two different averaging approaches. These are combined with a weighting mechanism to create four ensemble models: Symmetric (Sym), Symmetric-weighted (Sym-w), Asymmetric (Asym), and Asymmetric-weighted (Asym-w). These four ensemble models are then compared to the original approach, which is focused on individual regression base learners, namely: Baseline (BL), Decision Tree (DT), Elastic Net (EN), Lasso (LA), Linear Regression (LR), Random Forest (RF), Ridge (RI), Support Vector Regressor (SV) and XGBoost (XG). Our results show that by aggregating predictions from neighbor models, the average mean absolute error obtained by each base learner decreases. Overall, the best results are achieved by regression-based ensemble tree models as base learners.
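
The four averaging variants can be sketched as follows. This is a hedged illustration, not the authors' definition: the abstract does not say how the neighborhoods or weights are built, so this sketch assumes that "symmetric" means windows on both sides of the target window, "asymmetric" means only the target and earlier windows, and that weights decay as 1 / (1 + distance). The function name `combine_neighbors` and its parameters are hypothetical.

```python
from typing import Dict


def combine_neighbors(
    preds: Dict[int, float],
    center: int,
    radius: int = 1,
    symmetric: bool = True,
    weighted: bool = False,
) -> float:
    """Average the predictions of models trained on neighboring time windows.

    preds maps a time-window index to that window's predicted age of onset.
    Assumed semantics: symmetric uses [center - radius, center + radius],
    asymmetric uses [center - radius, center].
    """
    low = center - radius
    high = center + radius if symmetric else center
    windows = [w for w in range(low, high + 1) if w in preds]
    if not windows:
        raise ValueError("no predictions available in the selected neighborhood")
    # Assumed weighting scheme: closer windows count more.
    weights = [1.0 / (1 + abs(w - center)) if weighted else 1.0 for w in windows]
    return sum(wt * preds[w] for wt, w in zip(weights, windows)) / sum(weights)


if __name__ == "__main__":
    window_preds = {1: 40.0, 2: 44.0, 3: 48.0}
    print("Sym   :", combine_neighbors(window_preds, 2))
    print("Sym-w :", combine_neighbors(window_preds, 2, weighted=True))
    print("Asym  :", combine_neighbors(window_preds, 2, symmetric=False))
    print("Asym-w:", combine_neighbors(window_preds, 2, symmetric=False, weighted=True))
```

In this toy setting, each entry of `window_preds` would come from one base learner (for example a Random Forest) trained on a different time window, and the combined value replaces the single-window prediction.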

2023

Report on the 6th International Workshop on Narrative Extraction from Texts (Text2Story 2023) at ECIR 2023

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, H; Mansouri, B;

Publication
SIGIR Forum

Abstract

2023

GPT Struct Me: Probing GPT Models on Narrative Entity Extraction

Authors
Sousa, H; Guimaraes, N; Jorge, A; Campos, R;

Publication
2023 IEEE INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT

Abstract
The importance of systems that can extract structured information from textual data becomes increasingly pronounced given the ever-increasing volume of text produced on a daily basis. Having a system that can effectively extract such information in an interoperable manner would be an asset for several domains, be it finance, health, or legal. Recent developments in natural language processing have led to the production of powerful language models that can, to some degree, mimic human intelligence. Such effectiveness raises a pertinent question: Can these models be leveraged for the extraction of structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art language models - GPT-3 and GPT-3.5, commonly known as ChatGPT - in the extraction of narrative entities, namely events, participants, and temporal expressions. This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework includes a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study, run on a subset of the dataset's documents, over prompt components that provide varying degrees of information. Subsequently, we use the best templates to evaluate the effectiveness of the models on the remaining documents. The results obtained indicate that GPT models are competitive with out-of-the-box baseline systems, presenting an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and avenues to explore in this field.

2023

Proceedings of the 6th Workshop on Online Recommender Systems and User Modeling co-located with the 17th ACM Conference on Recommender Systems (RecSys 2023), Singapore, September 19th, 2023

Authors
Vinagre, J; Ghossein, MA; Peska, L; Jorge, AM; Bifet, A;

Publication
ORSUM@RecSys

Abstract
