2026
Authors
Henriques, L; Guimaraes, N; Jorge, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
The ever-increasing volume of data produced in Healthcare demands solutions capable of automatically extracting the relevant elements of their narratives. However, given privacy regulations, bureaucratic procedures, and annotation efforts, the development of such solutions via Natural Language Processing (NLP) systems is hindered by the scarcity of training data. This scarcity is greater for languages and language varieties with lower resource availability, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based Synthetic Data Generation (SDG) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework a few real annotated clinical texts, we can generate synthetic data capable of increasing the performance of NER models with respect to their non-augmented counterparts. In addition, the reduction of the BLEU scores in the generated texts indicates a decrease in the risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.
2026
Authors
Viana, FD; Pereira, BVL; Santos, M; Soares, C; Neto, AD;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I
Abstract
One strategy for constructing an artificial neural network with multiple hidden layers is to insert layers incrementally in stages. However, for this approach to be effective, each newly added layer must be properly aligned with the previous layers to avoid degradation of the network output and preserve the already learned knowledge. Ideally, inserting new layers should expand the network's search space, enabling it to explore more complex representations and ultimately improve overall performance. In this work, we present a novel method for layer insertion in stacked autoencoder networks. The proposed method preserves what was learned before the layer insertion and allows the acquisition of new knowledge; it is therefore termed collaborative. This approach allows this kind of neural network to evolve and learn effectively, while significantly reducing the design time. Unlike traditional methods, it addresses the common challenges associated with manually defining the number of layers and the number of neurons in each layer. By automating this aspect of network design, the proposed method promotes scalability and adaptability across tasks. The effectiveness of the approach was validated on multiple binary classification datasets using neural networks initialized with various architectures. The experimental results demonstrate that the method maintains performance while streamlining the architectural design process.
2026
Authors
Salazar, T; Araujo, H; Cano, A; Abreu, PH;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
Group fairness in machine learning is an important area of research focused on achieving equitable outcomes across different groups defined by sensitive attributes such as race or gender. Federated learning (FL), a decentralized approach to training machine learning models across multiple clients, amplifies the need for fairness methodologies because its inherently heterogeneous data distributions can exacerbate biases. The intersection of federated learning and group fairness has attracted significant interest, with 48 research works specifically dedicated to addressing this issue. However, no comprehensive survey has specifically focused on group fairness in federated learning. In this work, we analyze the key challenges of this topic, propose practices for its identification and benchmarking, and create a novel taxonomy based on criteria such as data partitioning, location, and strategy. Furthermore, we analyze broader concerns, review how different approaches handle the complexities of various sensitive attributes, examine common datasets and applications, and discuss the ethical, legal, and policy implications of group fairness in FL. We conclude by highlighting key areas for future research, emphasizing the need for more methods to address the complexities of achieving group fairness in federated systems.
2026
Authors
Marcela Jorio; António Amaral; Paula Ferreira;
Publication
Springer proceedings in earth and environmental sciences
Abstract
2026
Authors
Matos, M; Gomes, F; Nogueira, F; Almeida, F;
Publication
INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS
Abstract
Purpose: Detecting anomalous access to electronic health records (EHRs) is critical for safeguarding patient privacy and ensuring compliance with healthcare regulations. Traditional anomaly detection methods often struggle in this domain due to extreme class imbalance, limited labelled data and the subtlety of insider threats. This study proposes a lightweight, hybrid anomaly detection framework that integrates unsupervised, supervised and rule-based approaches using a meta-classifier architecture. Design/methodology/approach: An experimental and model-development approach is employed, combining machine learning techniques with domain-inspired rule modelling to construct a hybrid anomaly detection framework for healthcare access logs. Performance of the algorithm is measured using standard classification metrics such as precision, recall, F1-score and accuracy. Findings: Evaluated on a synthetic but realistic dataset of 50,000 normal and 500 labelled anomalous healthcare access events, the proposed framework achieved superior performance compared to standalone models as well as other hybrid models, with an F1-score of 0.8989 and recall of 0.8180. It also maintained low inference latency (0.028 ms) and energy consumption (4.03e-07 kg CO2), making it suitable for deployment in resource-constrained clinical environments. Originality/value: This study highlights the potential of a hybrid meta-classifier to enhance anomaly detection in healthcare access logs, capturing both subtle and obvious anomalies while outperforming conventional models and remaining efficient, scalable and practical for real-time monitoring.
2026
Authors
Rabaev, I; Litvak, M; Bass, R; Campos, R; Jorge, AM; Jatowt, A;
Publication
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2025, PT V
Abstract
This report describes the ICDAR 2025 Competition on Automatic Classification of Literary Epochs (ICDAR 2025 CoLiE), which consisted of two tasks focused on automatic prediction of the time in which a book was written (date of first publication). Both tasks comprised two sub-tasks, where a related fine-grained classification was addressed. Task 1 consisted of the identification of literary epochs, such as Romanticism or Modernism (sub-task 1.1), and a more precise classification of the period within the epoch (sub-task 1.2). Task 2 addressed the chronological identification of century (sub-task 2.1) or decade (sub-task 2.2). The compiled dataset and the reported findings are valuable to the scientific community and contribute to advancing research in the automatic dating of texts and its applications in digital humanities and temporal text analysis.