Publications

2026

Evaluating Transfer Learning Methods on Real-World Data Streams: A Case Study in Financial Fraud Detection

Authors
Pereira, RR; Bono, J; Ferreira, H; Ribeiro, P; Soares, C; Bizarro, P;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES. APPLIED DATA SCIENCE TRACK, ECML PKDD 2025, PT IX

Abstract
When the available data for a target domain is limited, transfer learning (TL) methods leverage related data-rich source domains to train and evaluate models, before deploying them on the target domain. However, most TL methods assume fixed levels of labeled and unlabeled target data, which contrasts with real-world scenarios where both data and labels arrive progressively over time. As a result, evaluations based on these static assumptions may not reflect how methods perform in practice. To support a more realistic assessment of TL methods in dynamic settings, we propose an evaluation framework that (1) simulates varying data availability over time, (2) creates multiple domains via resampling of a given dataset and (3) introduces inter-domain variability through controlled transformations, e.g., including time-dependent covariate and concept shifts. These capabilities enable the systematic simulation of a large number of variants of the experiments, providing deeper insights into how algorithms may behave when deployed. We demonstrate the usefulness of the proposed framework by performing a case study on a proprietary real-world suite of card payment datasets. To support reproducibility, we also apply the framework on the publicly available Bank Account Fraud (BAF) dataset. By providing a methodology for evaluating TL methods over time and in different data availability conditions, our framework supports a better understanding of model behavior in real-world environments, which enables more informed decisions when deploying models in new domains.
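
The first capability listed in the abstract, simulating varying data availability over time, can be illustrated with a minimal sketch. It is not the authors' framework: the `Event` record, the `available_at` helper, and the fixed `label_delay` are all hypothetical names and simplifications introduced here. The sketch assumes each event becomes visible at its timestamp and that its label arrives a fixed number of time steps later.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical record type, not taken from the paper.
    timestamp: int
    features: tuple
    label: int


def available_at(events, t, label_delay):
    """Return (seen, labeled) as they would look at time t.

    seen    : events whose features have arrived by time t
    labeled : the subset of seen whose labels have also arrived,
              assuming a fixed delay between event and label
    """
    seen = [e for e in events if e.timestamp <= t]
    labeled = [e for e in seen if e.timestamp + label_delay <= t]
    return seen, labeled


# Toy stream: one event per day for ten days (illustrative data only).
stream = [Event(timestamp=day, features=(float(day),), label=day % 2)
          for day in range(10)]

seen, labeled = available_at(stream, t=6, label_delay=3)
print(len(seen), len(labeled))
```

Evaluating a method repeatedly for increasing values of `t` would show how its performance changes as more target data and labels accumulate, which is the kind of time-resolved comparison the abstract argues for.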

2026

A subject-based association network defines new pediatric sleep apnea phenotypes with different odds of recovery after treatment

Authors
Gutiérrez-Tobal, GC; Gomez-Pilar, J; Ferreira-Santos, D; Pereira-Rodrigues, P; Alvarez, D; del Campo, F; Gozal, D; Hornero, R;

Publication
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE

Abstract
Background and objectives: Timely treatment of pediatric obstructive sleep apnea (OSA) can prevent or reverse neurocognitive and cardiovascular morbidities. However, whether distinct phenotypes exist and account for divergent treatment effectiveness remains unknown. In this study, our goal is threefold: i) to define new data-driven pediatric OSA phenotypes, ii) to evaluate possible treatment effectiveness differences among them, and iii) to assess phenotypic information in predicting OSA resolution. Methods: We used 22 sociodemographic, anthropometric, and clinical variables from 464 children (5-10 years old) from the Childhood Adenotonsillectomy Trial (CHAT) database. Baseline information was used to automatically define pediatric OSA phenotypes using a new unsupervised subject-based association network. Follow-up data (7 months later) were used to evaluate the effects of the therapeutic intervention in terms of changes in the obstructive apnea-hypopnea index (OAHI) and the resolution of OSA (OAHI < 1 event per hour). An explainable artificial intelligence (XAI) approach was also developed to assess phenotypic information as a predictor of OSA resolution at baseline. Results: Our approach identified three OSA phenotypes (PHOSA1-PHOSA3), with PHOSA2 showing significantly lower odds of OSA recovery than PHOSA1 and PHOSA3 when treatment information was not considered (odds ratios, OR: 1.64 and 1.66, 95 % confidence intervals, CI: 1.03-2.62 and 1.01-2.69, respectively). The odds of OSA recovery were also significantly lower in PHOSA2 than in PHOSA3 when adenotonsillectomy was adopted as treatment (OR: 2.60, 95 % CI: 1.26-5.39). Our XAI approach identified 79.4 % (CI: 69.9-88.0 %) of children reaching OSA resolution after adenotonsillectomy, with a positive predictive value of 77.8 % (CI: 70.3 %-86.0 %).
Conclusions: Our new subject-based association network successfully identified three clinically useful pediatric OSA phenotypes with different odds of therapeutic intervention effectiveness. Specifically, we found that children of any sex, >6 years old, overweight or obese, and with enlarged neck and waist circumference (PHOSA2) have lower odds of recovering from OSA. Similarly, younger female children with no enlarged neck (PHOSA3) have higher odds of benefiting from adenotonsillectomy.
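
For readers unfamiliar with the odds ratios and 95 % confidence intervals reported above, the following sketch shows one common way to compute them from a 2x2 table, using the Wald interval on the log odds ratio. The abstract does not state which interval method the authors used, so this is an assumption, and the counts in the example are made up for illustration and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b : recovered / not recovered in group 1
    c, d : recovered / not recovered in group 2
    z    : normal quantile (1.96 gives an approximate 95 % interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT from the paper.
odds_ratio, lower, upper = odds_ratio_ci(a=40, b=20, c=30, d=30)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a difference in odds that is significant at roughly the 5 % level, which matches how the intervals in the abstract are read.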

2026

Video-based epileptic seizure classification: A novel multi-stage approach integrating vision and motion transformer deep learning models

Authors
Aslani, R; Karácsony, T; Fearns, N; Caldeiras, C; Vollmar, C; Rego, R; Rémi, J; Noachtar, S; Cunha, JPS;

Publication
BIOMEDICAL SIGNAL PROCESSING AND CONTROL

Abstract
Automated seizure quantification and classification are needed for semiology-based epileptic seizure diagnosis support. To the best of our knowledge, the 5-class (Hypermotor, Automotor, Complex Motor, Psychogenic Non-Epileptic Seizures, and Generalized Tonic-Clonic Seizures) seizure video dataset (198 seizures from 74 patients) studied in this paper is the largest 5-class dataset ever curated, composed of monocular RGB videos from two university hospital epilepsy monitoring units. 2D skeletons were estimated using ViTPose, a vision transformer deep learning (DL) architecture, and lifted to 3D space using MotionBERT, a multimodal motion transformer architecture. The movements were quantified based on the estimated 3D skeleton sequences. Two approaches were evaluated for seizure classification: (1) classical machine learning methods (Random Forest (RF) and XGBoost) applied to quantified movement parameters, and (2) 2D skeleton-based DL using MotionBERT action, an action recognition DL model, to which we apply transfer learning. The best model achieved a promising, above-literature, 5-fold cross-validated macro average F1-score of 0.84 +/- 0.09 (RF) for 5-class classification. The binary case (Automotor vs Hypermotor) resulted in 0.80 +/- 0.18 (MotionBERT action), and adding a 3rd class (Complex Motor) lowered the score to 0.65 +/- 0.14 (RF). This novel multi-stage classification ensures that the included movement features are traceable, allowing interpretable AI exploration of this novel approach supporting future clinical diagnosis.
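
The macro average F1-score reported above gives every class equal weight regardless of how many seizures it contains. A minimal sketch of the metric, written from its standard definition rather than from the authors' code, is shown below. The function name `macro_f1` and the toy labels are illustrative assumptions.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for three classes (illustrative only, not study data).
y_true = ["A", "A", "H", "H", "C", "C"]
y_pred = ["A", "H", "H", "H", "C", "A"]
score = macro_f1(y_true, y_pred, classes=["A", "H", "C"])
print(round(score, 3))
```

In a 5-fold cross-validation such as the one described in the abstract, this score would be computed once per fold and then summarized as a mean and standard deviation.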

2026

Resilience Under Attack: Benchmarking Optimizers Against Poisoning in Federated Learning for Image Classification Using CNN

Authors
Biadgligne, Y; Baghoussi, Y; Li, K; Jorge, A;

Publication
ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2025, PT I

Abstract
Federated Learning (FL) enables decentralized model training while preserving data privacy but remains susceptible to poisoning attacks. Malicious clients can manipulate local data or model updates, threatening FL's reliability, especially in privacy-sensitive domains like healthcare and finance. While client-side optimization algorithms play a crucial role in training local models, their resilience to such attacks is underexplored. This study empirically evaluates the robustness of three widely used optimization algorithms, SGD, Adam, and RMSProp, against label-flipping attacks (LFAs) in image classification tasks using Convolutional Neural Networks (CNNs). Through 900 individual runs in both federated and centralized learning (CL) settings, we analyze their performance under Independent and Identically Distributed (IID) and Non-IID data distributions. Results reveal that SGD is the most resilient, achieving the highest accuracy in 87% of cases, while Adam performs best in 13%. Additionally, centralized models outperform FL on CIFAR-10, whereas FL excels on Fashion-MNIST, highlighting the impact of dataset characteristics on adversarial robustness.
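
A label-flipping attack of the kind studied above can be sketched in a few lines. The version below is an assumption-laden illustration, not the paper's attack configuration: the `flip_labels` helper, the untargeted choice of a random wrong class, and the 20 % flip rate are all hypothetical. It shows what a malicious client might do to its local labels before training.

```python
import random


def flip_labels(labels, num_classes, flip_fraction, seed=0):
    """Return a copy of labels with a fraction replaced by a wrong class.

    This is an untargeted variant: each selected label is swapped for a
    uniformly chosen different class. Targeted variants map one specific
    class to another instead.
    """
    rng = random.Random(seed)
    poisoned = list(labels)
    n_flip = int(round(flip_fraction * len(poisoned)))
    for i in rng.sample(range(len(poisoned)), n_flip):
        wrong = [c for c in range(num_classes) if c != poisoned[i]]
        poisoned[i] = rng.choice(wrong)
    return poisoned


# Toy client dataset with 10 classes (illustrative only).
clean = [i % 10 for i in range(100)]
poisoned = flip_labels(clean, num_classes=10, flip_fraction=0.2)
changed = sum(a != b for a, b in zip(clean, poisoned))
print(changed)
```

Because every flipped label is guaranteed to differ from the original, the number of changed labels equals the requested fraction of the dataset, which makes the poisoning rate easy to control in a benchmark.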

2026

Optimizing Medical Image Captioning with Conditional Prompt Encoding

Authors
Fernandes, RF; Oliveira, HS; Ribeiro, PP; Oliveira, HP;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2025, PT II

Abstract
Medical image captioning is an essential tool to produce descriptive text reports of medical images. One of the central problems of medical image captioning is poor domain-specific description generation, because large pre-trained language models are primarily trained on non-medical text whose semantics differ from those of medical text. To overcome this limitation, we explore improvements in contrastive learning for X-ray images complemented with soft prompt engineering for medical image captioning and conditional text decoding for caption generation. The main objective is to develop a soft-prompt model to improve the accuracy and clinical relevance of the automatically generated captions while guaranteeing their complete linguistic accuracy without corrupting the models' performance. Experiments on the MIMIC-CXR and ROCO datasets showed that the inclusion of tailored soft prompts improved accuracy and efficiency, while ensuring a more cohesive medical context for captions, aiding medical diagnosis and encouraging more accurate reporting.

2026

A survey on group fairness in federated learning: challenges, taxonomy of solutions and directions for future research

Authors
Salazar, T; Araujo, H; Cano, A; Abreu, PH;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Group fairness in machine learning is an important area of research focused on achieving equitable outcomes across different groups defined by sensitive attributes such as race or gender. Federated learning, a decentralized approach to training machine learning models across multiple clients, amplifies the need for fairness methodologies due to its inherent heterogeneous data distributions that can exacerbate biases. The intersection of federated learning and group fairness has attracted significant interest, with 48 research works specifically dedicated to addressing this issue. However, no comprehensive survey has specifically focused on group fairness in Federated Learning. In this work, we analyze the key challenges of this topic, propose practices for its identification and benchmarking, and create a novel taxonomy based on criteria such as data partitioning, location, and strategy. Furthermore, we analyze broader concerns, review how different approaches handle the complexities of various sensitive attributes, examine common datasets and applications, and discuss the ethical, legal, and policy implications of group fairness in FL. We conclude by highlighting key areas for future research, emphasizing the need for more methods to address the complexities of achieving group fairness in federated systems.
