Publications

Publications by LIAAD

2025

ICDAR 2025 Competition on Automatic Classification of Literary Epochs

Authors
Rabaev, I; Litvak, M; Bass, R; Campos, R; Jorge, AM; Jatowt, A;

Publication
Document Analysis and Recognition - ICDAR 2025 - 19th International Conference, Wuhan, China, September 16-21, 2025, Proceedings, Part V

Abstract
This report describes the ICDAR 2025 Competition on Automatic Classification of Literary Epochs (ICDAR 2025 CoLiE), which consisted of two tasks focused on automatic prediction of the time in which a book was written (date of first publication). Both tasks comprised two sub-tasks, where a related fine-grained classification was addressed. Task 1 consisted of the identification of literary epochs, such as Romanticism or Modernism (sub-task 1.1), and a more precise classification of the period within the epoch (sub-task 1.2). Task 2 addressed the chronological identification of century (sub-task 2.1) or decade (sub-task 2.2). The compiled dataset and the reported findings are valuable to the scientific community and contribute to advancing research in the automatic dating of texts and its applications in digital humanities and temporal text analysis.

2025

Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M;

Publication
Text2Story@ECIR

2025

Resilience Under Attack: Benchmarking Optimizers Against Poisoning in Federated Learning for Image Classification Using CNN

Authors
Biadgligne, Y; Baghoussi, Y; Li, K; Jorge, A;

Publication
Advances in Computational Intelligence - 18th International Work-Conference on Artificial Neural Networks, IWANN 2025, A Coruña, Spain, June 16-18, 2025, Proceedings, Part I

Abstract
Federated Learning (FL) enables decentralized model training while preserving data privacy but remains susceptible to poisoning attacks. Malicious clients can manipulate local data or model updates, threatening FL’s reliability, especially in privacy-sensitive domains like healthcare and finance. While client-side optimization algorithms play a crucial role in training local models, their resilience to such attacks is underexplored. This study empirically evaluates the robustness of three widely used optimization algorithms (SGD, Adam, and RMSProp) against label-flipping attacks (LFAs) in image classification tasks using Convolutional Neural Networks (CNNs). Through 900 individual runs in both federated and centralized learning (CL) settings, we analyze their performance under Independent and Identically Distributed (IID) and Non-IID data distributions. Results reveal that SGD is the most resilient, achieving the highest accuracy in 87% of cases, while Adam performs best in 13%. Additionally, centralized models outperform FL on CIFAR-10, whereas FL excels on Fashion-MNIST, highlighting the impact of dataset characteristics on adversarial robustness.

2025

Knowledge-Aware Clinical Narrative Extraction Using Ontologies and Knowledge Graphs

Authors
Leite, M; Silva, RR; Guimarães, N; Stork, L; Jorge, A;

Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I

Abstract
Providing healthcare professionals with quick access to structured standardized information enables comprehensive analysis and improves clinical decision-making. However, an important part of the records in health institutions is in the form of free text. This paper proposes a pipeline that automatically extracts medical information from Electronic Medical Records (EMRs), based on large language models (LLMs) and a domain ontology defined and validated in collaboration with a medical expert. The output is a knowledge graph of clinical narratives that can be used to search through repositories of EMRs or discover new facts. To promote the standardization of the extracted medical terms, we link them to existing international coding systems using biomedical repositories (UMLS - Unified Medical Language System and BioPortal - Biomedical Ontology Repository). We showcase our approach on a set of Portuguese clinical texts of cases of Acute Myeloid Leukemia (AML) guided by one medical expert. We evaluate the quality of the extraction and of the knowledge graph.

2025

LLM-Based Framework for Synthetic Data Generation in Portuguese Clinical NER

Authors
Henriques, L; Guimarães, N; Jorge, A;

Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I

Abstract
The ever-increasing volume of data produced in Healthcare demands solutions capable of automatically extracting the relevant elements of their narratives. However, given privacy regulations, bureaucratic procedures, and annotation efforts, the development of such solutions via Natural Language Processing (NLP) systems is hindered by training data scarcity. This scarcity is more severe for languages and language varieties with lower resource availability, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based Synthetic Data Generation (SDG) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework a few real annotated clinical texts, we can generate synthetic data capable of increasing the performance of NER models with respect to their non-augmented counterparts. In addition, the reduction of the BLEU scores in the generated texts indicates a decrease in the risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.

2025

Anomaly Detection in Pet Behavioural Data

Authors
Silva, I; Ribeiro, RP; Gama, J;

Publication
Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023, Pt II

Abstract
Pet owners are increasingly conscious of their pets' needs and are paying more attention to their overall wellness. The well-being of their pets is intricately linked to their own emotional and physical well-being. Some veterinary system solutions are emerging to provide proactive healthcare options for pets. One such solution offers the continuous monitoring of a pet's activity through accelerometer tracking devices. Based on data collected by this application, in this paper, we study different time aggregations and three unsupervised machine learning techniques to identify anomalies in pet behaviour data. Specifically, we apply three algorithms, Isolation Forest, Local Outlier Factor, and K-Nearest Neighbour, with various thresholds to differentiate between normal and abnormal events. Experiments conducted on ten pets (five cats and five dogs) show that the most effective approach is to use daily data divided into periods. Moreover, the Local Outlier Factor is the best algorithm for detecting anomalies when prioritizing the identification of true positives. However, it also produces a high false positive ratio.
