Publications

2026

Video-based epileptic seizure classification: A novel multi-stage approach integrating vision and motion transformer deep learning models

Authors
Aslani, R; Karácsony, T; Fearns, N; Caldeiras, C; Vollmar, C; Rego, R; Rémi, J; Noachtar, S; Cunha, JPS;

Publication
BIOMEDICAL SIGNAL PROCESSING AND CONTROL

Abstract
Automated seizure quantification and classification are needed for semiology-based epileptic seizure diagnosis support. To the best of our knowledge, the 5-class (Hypermotor, Automotor, Complex Motor, Psychogenic Non-Epileptic Seizures, and Generalized Tonic-Clonic Seizures) seizure video dataset (198 seizures from 74 patients) studied in this paper is the largest 5-class dataset ever curated, composed of monocular RGB videos from two university hospital epilepsy monitoring units. 2D skeletons were estimated using ViTPose, a vision transformer deep learning (DL) architecture, and lifted to 3D space using MotionBERT, a multimodal motion transformer architecture. The movements were quantified based on the estimated 3D skeleton sequences. Two approaches were evaluated for seizure classification: (1) classical machine learning methods (Random Forest (RF) and XGBoost) applied to quantified movement parameters, and (2) 2D skeleton-based DL using MotionBERT action, an action recognition DL model, to which we apply transfer learning. The best model achieved a promising, above-literature, 5-fold cross-validated macro-average F1-score of 0.84 +/- 0.09 (RF) for 5-class classification. The binary case (Automotor vs Hypermotor) resulted in 0.80 +/- 0.18 (MotionBERT action), and adding a 3rd class (Complex Motor) lowered the score to 0.65 +/- 0.14 (RF). This novel multi-stage classification ensures that the included movement features are traceable, allowing interpretable AI exploration of this novel approach supporting future clinical diagnosis.
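
To make the reported metric concrete, the following minimal Python sketch computes a macro-averaged F1-score, i.e. the unweighted mean of the per-class F1-scores, and summarises per-fold scores as a mean and a spread. It is not the authors' evaluation code: the function names, the class abbreviations and the choice of the population standard deviation are assumptions made only for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def summarise_folds(fold_scores):
    """Mean and population standard deviation of per-fold scores (an assumed convention)."""
    return mean(fold_scores), pstdev(fold_scores)


if __name__ == "__main__":
    # Hypothetical labels: HM = Hypermotor, AM = Automotor, GTCS = Generalized Tonic-Clonic.
    y_true = ["HM", "HM", "AM", "AM", "GTCS", "GTCS"]
    y_pred = ["HM", "AM", "AM", "AM", "GTCS", "GTCS"]
    print(round(macro_f1(y_true, y_pred), 3))
```

Because every class contributes equally to the average, a rare seizure class that is poorly recognised lowers the macro score as much as a common one would.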

2026

Optimizing Medical Image Captioning with Conditional Prompt Encoding

Authors
Fernandes, RF; Oliveira, HS; Ribeiro, PP; Oliveira, HP;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2025, PT II

Abstract
Medical image captioning is an essential tool to produce descriptive text reports of medical images. A central problem in medical image captioning is poor domain-specific description generation, because large pre-trained language models are trained primarily on non-medical text whose semantics differ from those of medical text. To overcome this limitation, we explore improvements in contrastive learning for X-ray images complemented with soft prompt engineering for medical image captioning and conditional text decoding for caption generation. The main objective is to develop a soft-prompt model that improves the accuracy and clinical relevance of the automatically generated captions while guaranteeing their complete linguistic accuracy without corrupting the models' performance. Experiments on the MIMIC-CXR and ROCO datasets showed that the inclusion of tailored soft prompts improved accuracy and efficiency, while ensuring a more cohesive medical context for captions, aiding medical diagnosis and encouraging more accurate reporting.

2026

Overview of the CLEF 2025 JOKER Lab: Humour in Machine

Authors
Ermakova, L; Campos, R; Bosser, AG; Miller, T;

Publication
EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, CLEF 2025

Abstract
Humour poses a unique challenge for artificial intelligence, as it often relies on non-literal language, cultural references, and linguistic creativity. The JOKER Lab, now in its fourth year, aims to advance computational humour research through shared tasks on curated, multilingual datasets, with applications in education, computer-mediated communication and translation, and conversational AI. This paper provides an overview of the JOKER Lab held at CLEF 2025, detailing the setup and results of its three main tasks: (1) humour-aware information retrieval, which involves searching a document collection for humorous texts relevant to user queries in either English or Portuguese; (2) pun translation, focussed on humour-preserving translation of paronomastic jokes from English into French; and (3) onomastic wordplay translation, a task addressing the translation of name-based wordplay from English into French. The 2025 edition builds upon previous iterations by expanding datasets and emphasising nuanced, manual evaluation methods. The Task 1 results show a marked improvement this year, apparently due to participants' judicious combination of retrieval and filtering techniques. Tasks 2 and 3 remain challenging, not only in terms of system performance but also in terms of defining meaningful and reliable evaluation metrics.

2026

Unsupervised contrastive analysis for anomaly detection in brain MRIs via conditional diffusion models

Authors
Patrício, C; Barbano, CA; Fiandrotti, A; Renzulli, R; Grangetto, M; Teixeira, LF; Neves, JC;

Publication
PATTERN RECOGNITION LETTERS

Abstract
Contrastive Analysis (CA) detects anomalies by contrasting patterns unique to a target group (e.g., unhealthy subjects) from those in a background group (e.g., healthy subjects). In the context of brain MRIs, existing CA approaches rely on supervised contrastive learning or variational autoencoders (VAEs) using both healthy and unhealthy data, but such reliance on target samples is challenging in clinical settings. Unsupervised Anomaly Detection (UAD) learns a reference representation of healthy anatomy, eliminating the need for target samples. Deviations from this reference distribution can indicate potential anomalies. In this context, diffusion models have been increasingly adopted in UAD due to their superior performance in image generation compared to VAEs. Nonetheless, precisely reconstructing the anatomy of the brain remains a challenge. In this work, we bridge CA and UAD by reformulating contrastive analysis principles for the unsupervised setting. We propose an unsupervised framework to improve the reconstruction quality by training a self-supervised contrastive encoder on healthy images to extract meaningful anatomical features. These features are used to condition a diffusion model to reconstruct the healthy appearance of a given image, enabling interpretable anomaly localization via pixel-wise comparison. We validate our approach through a proof-of-concept on a facial image dataset and further demonstrate its effectiveness on four brain MRI datasets, outperforming baseline methods in anomaly localization on the NOVA benchmark.
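
The final step described above, localizing anomalies by comparing an image pixel by pixel with its reconstructed healthy appearance, can be sketched as follows. This is only an illustrative sketch: the diffusion-based reconstruction is assumed to be given, and the function names, the absolute-difference residual and the fixed threshold are assumptions rather than details taken from the paper.

```python
def anomaly_map(image, reconstruction):
    """Absolute pixel-wise residual between an image and its healthy reconstruction.

    Both inputs are 2D lists of floats with identical shape.
    """
    same_rows = len(image) == len(reconstruction)
    if not same_rows or any(len(a) != len(b) for a, b in zip(image, reconstruction)):
        raise ValueError("image and reconstruction must have the same shape")
    return [
        [abs(pixel - recon) for pixel, recon in zip(image_row, recon_row)]
        for image_row, recon_row in zip(image, reconstruction)
    ]


def localize(image, reconstruction, threshold):
    """Binary mask marking pixels whose residual exceeds the (assumed) threshold."""
    return [
        [1 if residual > threshold else 0 for residual in row]
        for row in anomaly_map(image, reconstruction)
    ]


if __name__ == "__main__":
    # Toy 2x2 example: only the bottom-left pixel differs from the reconstruction.
    image = [[0.1, 0.2], [0.9, 0.2]]
    reconstruction = [[0.1, 0.2], [0.3, 0.2]]
    print(localize(image, reconstruction, threshold=0.25))
```

The residual map is directly interpretable, since each value states how far a pixel departs from the reconstructed healthy appearance.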

2026

Challenges and Opportunities for Designing Digital Communication Interfaces for Persons with Partial Locked-In Syndrome

Authors
Amado, P; Penedos-Santiago, E; Lima, C; Simoes, S; Giesteira, B; Peçaibes, V;

Publication
ARTSIT, INTERACTIVITY AND GAME CREATION, ARTSIT 2024, PT II

Abstract
This integrative literature review synthesizes insights from multiple disciplines to address the challenges and opportunities in designing digital communication interfaces for persons with Locked-In Syndrome (LIS). The paper highlights the importance of a multidisciplinary approach that includes ethical co-design, visual design principles, and Human-Computer Interaction (HCI). It emphasizes how important it is to have user-friendly, visually appealing, and accessible interfaces to help persons with LIS to communicate more effectively. Important technologies are evaluated for their potential to improve communication, including Augmented and Virtual Reality (AR & VR), Eye Tracking, and Brain-Computer Interfaces (BCI). To guarantee that the emerging technologies are both efficient and considerate of user demands, the review emphasizes the significance of ethical considerations and patient-centered design. This study intends to direct future design-based action research in constructing functional digital communication systems, using head-mounted Extended Reality (XR) technologies, by combining the various research findings from the review.

2026

Covering with Network Design for Wildfire Promptness

Authors
Silva, E; Alvelos, F; Marto, M;

Publication
Lecture Notes in Operations Research

Abstract
We consider the problem of selecting bases for firefighting activities (e.g., vigilance, water refill, initial attack) and links between them in the context of wildfire promptness. Bases can be facilities, such as watchtowers and water tanks, or positions from where an initial attack is conducted. It is assumed that it is advantageous to connect bases in such a way that resources (e.g., ground crews) can quickly move between them. The problem is modelled in general terms as the integration of a set covering problem (for selecting the location of the bases) and a travelling salesman problem where the cities are the selected locations and the arcs are the links that connect them. We propose a mixed integer programming model where objectives are addressed by lexicographic optimization. The first objective is to cover potential ignition points with a high estimated initial fire spread rate at detection time. Computational experiments are discussed for a scenario based on an actual landscape, with parameters estimated from a fire behaviour model that takes into account slope, fuels, and wind. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
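
The combination of covering and routing under lexicographic optimization can be illustrated with a small brute-force Python sketch. It is not the mixed integer programming model of the paper: the coverage radius, the fixed number of bases `k`, the use of tour length as the second objective, and all function names are assumptions chosen to keep the example short, and exhaustive enumeration is only feasible for toy instances.

```python
from itertools import combinations, permutations
from math import dist, inf


def covered_weight(bases, points, radius):
    """Total weight of ignition points within `radius` of at least one base."""
    return sum(w for p, w in points if any(dist(p, b) <= radius for b in bases))


def tour_length(bases):
    """Length of the shortest closed tour through all bases (brute force)."""
    if len(bases) < 2:
        return 0.0
    first, rest = bases[0], bases[1:]
    best = inf
    for order in permutations(rest):
        route = (first, *order, first)
        best = min(best, sum(dist(a, b) for a, b in zip(route, route[1:])))
    return best


def lexicographic_select(candidates, points, radius, k):
    """Choose k bases: maximise covered weight first, then minimise tour length."""
    best_key, best_bases = None, None
    for bases in combinations(candidates, k):
        # Tuples compare element by element, which gives the lexicographic order.
        key = (-covered_weight(bases, points, radius), tour_length(bases))
        if best_key is None or key < best_key:
            best_key, best_bases = key, bases
    return best_bases, -best_key[0], best_key[1]


if __name__ == "__main__":
    candidates = [(0, 0), (4, 0), (0, 3), (10, 10)]
    # Each ignition point is ((x, y), weight); weights stand in for estimated spread rates.
    points = [((0, 1), 5), ((4, 1), 3), ((0, 4), 3), ((10, 9), 1)]
    print(lexicographic_select(candidates, points, radius=1.5, k=2))
```

In this toy instance two pairs of bases cover the same total weight, so the second objective decides between them and the pair with the shorter connecting tour is selected.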
