2026
Authors
Biadgligne, Y; Baghoussi, Y; Li, K; Jorge, A;
Publication
ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2025, PT I
Abstract
Federated Learning (FL) enables decentralized model training while preserving data privacy but remains susceptible to poisoning attacks. Malicious clients can manipulate local data or model updates, threatening FL's reliability, especially in privacy-sensitive domains like healthcare and finance. While client-side optimization algorithms play a crucial role in training local models, their resilience to such attacks is underexplored. This study empirically evaluates the robustness of three widely used optimization algorithms (SGD, Adam, and RMSProp) against label-flipping attacks (LFAs) in image classification tasks using Convolutional Neural Networks (CNNs). Through 900 individual runs in both federated and centralized learning (CL) settings, we analyze their performance under Independent and Identically Distributed (IID) and Non-IID data distributions. Results reveal that SGD is the most resilient, achieving the highest accuracy in 87% of cases, while Adam performs best in 13%. Additionally, centralized models outperform FL on CIFAR-10, whereas FL excels on Fashion-MNIST, highlighting the impact of dataset characteristics on adversarial robustness.
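As an illustration of the label-flipping attack named in the abstract, the following is a minimal sketch, not the authors' experimental code. The function name `flip_labels`, the `flip_fraction` parameter, and the choice of a uniformly random wrong class are assumptions made for the example; the abstract does not specify how labels were flipped.

```python
import random


def flip_labels(labels, num_classes, flip_fraction, seed=0):
    """Return a copy of `labels` with a fraction of entries moved to a different class.

    Hypothetical helper: models a malicious client corrupting its local labels.
    """
    rng = random.Random(seed)
    poisoned = list(labels)
    n_flip = int(round(flip_fraction * len(poisoned)))
    # Pick distinct sample indices to corrupt.
    for i in rng.sample(range(len(poisoned)), n_flip):
        # Draw a replacement label that is guaranteed to differ from the original.
        wrong_classes = [c for c in range(num_classes) if c != poisoned[i]]
        poisoned[i] = rng.choice(wrong_classes)
    return poisoned


# Toy local dataset: 1000 labels spread evenly over 10 classes.
clean = [i % 10 for i in range(1000)]
poisoned = flip_labels(clean, num_classes=10, flip_fraction=0.2)
changed = sum(a != b for a, b in zip(clean, poisoned))
```

Because every selected label is replaced by a different class, `changed` equals the number of selected samples. A targeted variant (for example, always mapping one source class to one target class) would replace the random choice with a fixed mapping.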
2026
Authors
Fernandes, RF; Oliveira, HS; Ribeiro, PP; Oliveira, HP;
Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2025, PT II
Abstract
Medical image captioning is an essential tool for producing descriptive text reports of medical images. A central problem in medical image captioning is poor domain-specific description generation, because large pre-trained language models are trained primarily on non-medical text whose semantics differ from those of medical text. To overcome this limitation, we explore improvements in contrastive learning for X-ray images, complemented with soft-prompt engineering for medical image captioning and conditional text decoding for caption generation. The main objective is to develop a soft-prompt model that improves the accuracy and clinical relevance of the automatically generated captions while guaranteeing their linguistic accuracy without degrading the models' performance. Experiments on the MIMIC-CXR and ROCO datasets showed that the inclusion of tailored soft prompts improved accuracy and efficiency while ensuring a more cohesive medical context for captions, aiding medical diagnosis and encouraging more accurate reporting.
2026
Authors
Leite, M; Rb Silva, R; Guimaraes, N; Stork, L; Jorge, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
Providing healthcare professionals with quick access to structured, standardized information enables comprehensive analysis and improves clinical decision-making. However, an important part of the records in health institutions is in the form of free text. This paper proposes a pipeline that automatically extracts medical information from Electronic Medical Records (EMRs), based on large language models (LLMs) and a domain ontology defined and validated in collaboration with a medical expert. The output is a knowledge graph of clinical narratives that can be used to search through repositories of EMRs or discover new facts. We showcase our approach on a set of Portuguese clinical texts describing cases of Acute Myeloid Leukemia (AML), guided by one medical expert. We evaluate the quality of the extraction and of the knowledge graph.
2026
Authors
Pereira, AC; Folgado, D; Barandas, M; Soares, C; Carreiro, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
Subgroup discovery aims to identify interpretable segments of a dataset where model behavior deviates from global trends. Traditionally, this involves uncovering patterns among data instances with respect to a target property, such as class labels or performance metrics. For example, classification accuracy can highlight subpopulations where models perform unusually well or poorly. While effective for model auditing and failure analysis, accuracy alone provides a limited view, as it does not reflect model confidence or sources of uncertainty. This work proposes a complementary approach: subgroup discovery using model uncertainty. Rather than identifying where the model fails, we focus on where it is systematically uncertain, even when predictions are correct. Such uncertainty may arise from intrinsic data ambiguity (aleatoric) or poor data representation in training (epistemic). It can highlight areas of the input space where the model's predictions are less robust or reliable. We evaluate the feasibility of this approach through controlled experiments on the classification of synthetic data and the Iris dataset. While our findings are exploratory and qualitative, they suggest that uncertainty-based subgroup discovery may uncover interpretable regions of interest, providing a promising direction for model auditing and analysis.
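To make the aleatoric/epistemic distinction concrete, here is a minimal sketch of one common entropy-based decomposition. It assumes an ensemble of classifiers that each output class probabilities; the abstract does not state how uncertainty was estimated, so the ensemble setup and the function names `entropy` and `decompose_uncertainty` are assumptions for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for one input into three quantities.

    member_probs: list of class-probability vectors, one per ensemble member.
    Returns (total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual predictions,
      epistemic = total - aleatoric (disagreement between members).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / n
    return total, aleatoric, total - aleatoric


# Members agree that the input is ambiguous: uncertainty is purely aleatoric.
ambiguous = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
# Members are individually confident but disagree: uncertainty is purely epistemic.
disputed = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

Per-instance values such as these could then serve as the target property for a subgroup discovery algorithm in place of accuracy, which is the direction the abstract describes.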
2026
Authors
Henriques, L; Guimaraes, N; Jorge, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
The ever-increasing volume of data produced in healthcare demands solutions capable of automatically extracting the relevant elements of clinical narratives. However, given privacy regulations, bureaucratic procedures, and annotation efforts, the development of such solutions via Natural Language Processing (NLP) systems is hindered by training data scarcity. This scarcity is more severe for languages and language varieties with lower resource availability, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based Synthetic Data Generation (SDG) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework a few real annotated clinical texts, we can generate synthetic data capable of increasing the performance of NER models with respect to their non-augmented counterparts. In addition, the reduction of the BLEU scores in the generated texts indicates a decrease in the risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.
2026
Authors
Viana, FD; Pereira, BVL; Santos, M; Soares, C; Neto, AD;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
One strategy for constructing an artificial neural network with multiple hidden layers is to insert layers incrementally in stages. However, for this approach to be effective, each newly added layer must be properly aligned with the previous layers to avoid degradation of the network output and preserve the already learned knowledge. Ideally, inserting new layers should expand the network's search space, enabling it to explore more complex representations and ultimately improve overall performance. In this work, we present a novel method for layer insertion in stacked autoencoder networks. The proposed method preserves the learning obtained before the layer insertion and allows the acquisition of new knowledge; therefore, it is denoted collaborative. This approach allows this kind of neural network to evolve and learn effectively, while significantly reducing the design time. Unlike traditional methods, it addresses the common challenges associated with manually defining the number of layers and the number of neurons in each layer. By automating this aspect of network design, the proposed method promotes scalability and adaptability across tasks. The effectiveness of the approach was validated on multiple binary classification datasets using neural networks initialized with various architectures. The experimental results demonstrate that the method maintains performance while streamlining the architectural design process.
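The requirement that a newly inserted layer must not degrade the network output can be illustrated with a generic function-preserving trick. This is not the collaborative method of the paper, whose details the abstract does not give. The sketch assumes a small ReLU multilayer perceptron without biases (not a stacked autoencoder), and the helper names `forward` and `insert_identity_layer` are hypothetical. An identity-initialised layer placed after a ReLU activation leaves the output unchanged, because the activations it receives are already non-negative.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def forward(layers, x):
    """Hidden layers apply a weight matrix then ReLU; the last matrix is a linear output."""
    h = x
    for W in layers[:-1]:
        h = relu(matvec(W, h))
    return matvec(layers[-1], h)


def insert_identity_layer(layers, position):
    """Insert an identity-initialised hidden layer before index `position`.

    `position` must be at least 1 so the new layer receives post-ReLU
    (non-negative) activations; only then does relu(I h) equal h.
    """
    if not 1 <= position <= len(layers) - 1:
        raise ValueError("position must lie between 1 and len(layers) - 1")
    width = len(layers[position][0])  # input width of the layer that follows
    identity = [[1.0 if i == j else 0.0 for j in range(width)] for i in range(width)]
    return layers[:position] + [identity] + layers[position:]


# Toy network: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]],
    [[1.0, 2.0, -1.0]],
]
x = [1.0, 0.5]
deeper = insert_identity_layer(layers, position=1)
before = forward(layers, x)
after = forward(deeper, x)
```

Here `before` and `after` are equal, so training can continue from the deeper network without losing what was learned. Subsequent gradient updates are then free to move the new layer away from the identity.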