2026
Authors
Fernandes, RF; Oliveira, HS; Ribeiro, PP; Oliveira, HP;
Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2025, PT II
Abstract
Medical image captioning is an essential tool for producing descriptive text reports of medical images. A central problem in medical image captioning is poor domain-specific description generation, because large pre-trained language models are primarily trained on non-medical text whose semantics differ from those of medical text. To overcome this limitation, we explore improvements in contrastive learning for X-ray images, complemented with soft-prompt engineering for medical image captioning and conditional text decoding for caption generation. The main objective is to develop a soft-prompt model that improves the accuracy and clinical relevance of the automatically generated captions while preserving their linguistic quality and the models' performance. Experiments on the MIMIC-CXR and ROCO datasets showed that including tailored soft-prompts improved accuracy and efficiency while ensuring a more cohesive medical context for captions, aiding medical diagnosis and encouraging more accurate reporting.
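The core idea behind soft prompts can be illustrated with a minimal sketch: learnable continuous vectors are prepended to the token embeddings before they enter a frozen language model, and only the prompt vectors are updated during training. The function and dimensions below are illustrative, not taken from the paper.

```python
# Minimal sketch of "soft prompts": trainable continuous vectors prepended
# to the (frozen) token embeddings fed into a language model decoder.
# All names and values here are illustrative assumptions.

def prepend_soft_prompt(prompt_vectors, token_embeddings):
    """Concatenate trainable prompt vectors in front of the input embeddings."""
    return prompt_vectors + token_embeddings

# Toy 2-d embeddings: two prompt vectors, three token vectors.
prompt = [[0.1, 0.2], [0.3, 0.4]]               # trainable parameters
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # frozen embedding lookup

augmented = prepend_soft_prompt(prompt, tokens)
print(len(augmented))  # 5 vectors now condition the frozen decoder
```

In practice the prompt vectors live in the model's embedding space and are optimized by gradient descent while the backbone stays frozen, which is what makes the approach parameter-efficient.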
2026
Authors
Leite, M; Rb Silva, R; Guimaraes, N; Stork, L; Jorge, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
Providing healthcare professionals with quick access to structured, standardized information enables comprehensive analysis and improves clinical decision-making. However, a substantial portion of the records in health institutions is in the form of free text. This paper proposes a pipeline that automatically extracts medical information from Electronic Medical Records (EMRs), based on large language models (LLMs) and a domain ontology defined and validated in collaboration with a medical expert. The output is a knowledge graph of clinical narratives that can be used to search through repositories of EMRs or to discover new facts. We showcase our approach on a set of Portuguese clinical texts describing cases of Acute Myeloid Leukemia (AML), guided by a medical expert. We evaluate the quality of both the extraction and the resulting knowledge graph.
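The final step of such a pipeline, assembling extracted facts into a knowledge graph, can be sketched as follows. The triples and entity names are invented examples, not drawn from the paper's ontology.

```python
# Illustrative sketch: turning (subject, relation, object) triples emitted by
# an LLM extractor into a simple adjacency-list knowledge graph.
# Entity and relation names are hypothetical examples.

def build_knowledge_graph(triples):
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph

# Triples an extractor might emit for an AML clinical narrative.
triples = [
    ("patient_01", "diagnosed_with", "Acute Myeloid Leukemia"),
    ("patient_01", "treated_with", "cytarabine"),
    ("Acute Myeloid Leukemia", "has_subtype", "AML-M2"),
]

kg = build_knowledge_graph(triples)
print(kg["patient_01"])  # all facts recorded about this patient
```

A real system would validate each triple against the domain ontology (allowed entity types and relations) before insertion, which is where the expert-defined schema constrains the LLM output.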
2026
Authors
Pereira, AC; Folgado, D; Barandas, M; Soares, C; Carreiro, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
Subgroup discovery aims to identify interpretable segments of a dataset where model behavior deviates from global trends. Traditionally, this involves uncovering patterns among data instances with respect to a target property, such as class labels or performance metrics. For example, classification accuracy can highlight subpopulations where models perform unusually well or poorly. While effective for model auditing and failure analysis, accuracy alone provides a limited view, as it does not reflect model confidence or sources of uncertainty. This work proposes a complementary approach: subgroup discovery using model uncertainty. Rather than identifying where the model fails, we focus on where it is systematically uncertain, even when predictions are correct. Such uncertainty may arise from intrinsic data ambiguity (aleatoric) or poor data representation in training (epistemic). It can highlight areas of the input space where the model's predictions are less robust or reliable. We evaluate the feasibility of this approach through controlled experiments on the classification of synthetic data and the Iris dataset. While our findings are exploratory and qualitative, they suggest that uncertainty-based subgroup discovery may uncover interpretable regions of interest, providing a promising direction for model auditing and analysis.
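The shift from accuracy-based to uncertainty-based subgroup discovery can be sketched with a toy example: score each attribute-value subgroup by the mean predictive entropy of the model's class distributions rather than by error rate. The data and attribute names below are synthetic illustrations, not the paper's method.

```python
import math

# Sketch: rank simple attribute-value subgroups by mean predictive entropy
# instead of accuracy. Higher entropy = the model is less certain there,
# even if its predictions happen to be correct. Data is synthetic.

def entropy(probs):
    """Shannon entropy (bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Each instance: (attribute value, predicted class distribution).
data = [
    ("petal_short", [0.95, 0.05]),   # confident predictions
    ("petal_short", [0.90, 0.10]),
    ("petal_long",  [0.55, 0.45]),   # systematically uncertain region
    ("petal_long",  [0.60, 0.40]),
]

subgroups = {}
for value, probs in data:
    subgroups.setdefault(value, []).append(entropy(probs))

ranked = sorted(subgroups.items(), key=lambda kv: -sum(kv[1]) / len(kv[1]))
print(ranked[0][0])  # subgroup with highest mean uncertainty: "petal_long"
```

Separating aleatoric from epistemic contributions would require an uncertainty decomposition (e.g. ensembles or Bayesian approximations) on top of this ranking step.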
2026
Authors
Henriques, L; Guimaraes, N; Jorge, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
The ever-increasing volume of data produced in healthcare demands solutions capable of automatically extracting the relevant elements of its narratives. However, given privacy regulations, bureaucratic procedures, and annotation effort, the development of such solutions via Natural Language Processing (NLP) systems is hindered by training-data scarcity. This scarcity increases when we consider languages and language varieties with lower resource availability, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based Synthetic Data Generation (SDG) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework a few real annotated clinical texts, we can generate synthetic data capable of increasing the performance of NER models with respect to their non-augmented counterparts. In addition, the reduction of BLEU scores in the generated texts indicates a decreased risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human-annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.
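A few-shot prompt of the kind described (real annotated examples embedded in a system/user prompt asking for new annotated texts) might be assembled as below. The wording, annotation format, and function names are assumptions for illustration; they are not the paper's actual prompt.

```python
# Hedged sketch of few-shot prompt assembly for LLM-based synthetic data
# generation: real annotated examples are inlined, and the model is asked
# to produce fictitious examples in the same format. Illustrative only.

def build_sdg_prompt(real_examples, n_synthetic=3):
    shots = "\n\n".join(
        f"Text: {text}\nEntities: {entities}" for text, entities in real_examples
    )
    return (
        "You generate synthetic Portuguese clinical notes annotated for NER.\n"
        f"Produce {n_synthetic} new, entirely fictitious examples "
        "in the same format as the examples below.\n\n"
        f"{shots}"
    )

examples = [
    ("Doente com leucemia mieloide aguda.", "[leucemia mieloide aguda: DISEASE]"),
]
prompt = build_sdg_prompt(examples)
print(prompt.splitlines()[0])  # system-style instruction line
```

The prompt string would then be sent to the generator model (GPT-4o in the paper); measuring BLEU between generated and real texts, as the authors do, checks that the output is lexically distant from the seed examples.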
2026
Authors
Viana, FD; Pereira, BVL; Santos, M; Soares, C; Neto, AD;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
One strategy for constructing an artificial neural network with multiple hidden layers is to insert layers incrementally, in stages. However, for this approach to be effective, each newly added layer must be properly aligned with the previous layers to avoid degrading the network output and to preserve the knowledge already learned. Ideally, inserting new layers should expand the network's search space, enabling it to explore more complex representations and ultimately improve overall performance. In this work, we present a novel method for layer insertion in stacked autoencoder networks. The method maintains the learning obtained before the layer insertion while allowing the acquisition of new knowledge; we therefore call it collaborative. This approach allows such networks to evolve and learn effectively while significantly reducing design time. Unlike traditional methods, it addresses the common challenges associated with manually defining the number of layers and the number of neurons in each layer. By automating this aspect of network design, the proposed method promotes scalability and adaptability across tasks. The effectiveness of the approach was validated on multiple binary classification datasets using neural networks initialized with various architectures. The experimental results demonstrate that the method maintains performance while streamlining the architectural design process.
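One classical way to insert a layer without degrading the output, shown here as a toy with linear layers and no biases, is to initialize the new layer as an identity map so prior learning is preserved and training can then refine it. This is a generic illustration of the preservation idea, not the paper's specific mechanism, and with nonlinear activations a function-preserving initialization is more involved.

```python
# Toy sketch of function-preserving layer insertion: the new layer starts
# as an identity map, so the network output is unchanged at insertion time.
# Linear layers without biases; purely illustrative.

def apply(layers, x):
    for w in layers:  # each layer: square weight matrix, linear activation
        x = [sum(w[i][j] * x[j] for j in range(len(x))) for i in range(len(w))]
    return x

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

layers = [[[2.0, 0.0], [0.0, 3.0]]]   # already-trained layer
x = [1.0, 1.0]
before = apply(layers, x)

layers.insert(1, identity(2))         # insert new layer initialized as identity
after = apply(layers, x)
print(before == after)                # True: prior learning preserved
```

Subsequent training updates the inserted weights away from identity, which is how the added layer expands the search space while the earlier layers keep their learned representation.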
2026
Authors
Salazar, T; Araujo, H; Cano, A; Abreu, PH;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
Group fairness in machine learning is an important area of research focused on achieving equitable outcomes across different groups defined by sensitive attributes such as race or gender. Federated learning (FL), a decentralized approach to training machine learning models across multiple clients, amplifies the need for fairness methodologies, as its inherently heterogeneous data distributions can exacerbate biases. The intersection of federated learning and group fairness has attracted significant interest, with 48 research works specifically dedicated to addressing this issue; however, no comprehensive survey has focused on group fairness in FL. In this work, we analyze the key challenges of this topic, propose practices for its identification and benchmarking, and create a novel taxonomy based on criteria such as data partitioning, location, and strategy. Furthermore, we analyze broader concerns, review how different approaches handle the complexities of various sensitive attributes, examine common datasets and applications, and discuss the ethical, legal, and policy implications of group fairness in FL. We conclude by highlighting key areas for future research, emphasizing the need for more methods to address the complexities of achieving group fairness in federated systems.
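A standard group-fairness quantity referenced throughout this literature, demographic parity difference, can be computed as below: the gap in positive-prediction rates between groups defined by a sensitive attribute. The data and group names are synthetic; in an FL setting the predictions would be aggregated across clients before this metric is evaluated.

```python
# Illustrative computation of demographic parity difference: the gap in
# positive-prediction rates between sensitive-attribute groups.
# Synthetic binary predictions; group names are hypothetical.

def positive_rate(preds):
    return sum(preds) / len(preds)

def demographic_parity_diff(preds_by_group):
    rates = [positive_rate(p) for p in preds_by_group.values()]
    return max(rates) - min(rates)

# Binary predictions pooled across federated clients, split by group.
preds = {"group_a": [1, 1, 0, 1], "group_b": [0, 1, 0, 0]}
print(demographic_parity_diff(preds))  # 0.75 - 0.25 = 0.5
```

A value of 0 means both groups receive positive predictions at the same rate; fairness-aware FL methods typically constrain or penalize such gaps during aggregation or local training.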