Publications

Publications by Jaime Cardoso

2023

Compressed Models Decompress Race Biases: What Quantized Models Forget for Fair Face Recognition

Authors
Neto, PC; Caldeira, E; Cardoso, JS; Sequeira, AF;

Publication
International Conference of the Biometrics Special Interest Group, BIOSIG 2023, Darmstadt, Germany, September 20-22, 2023

Abstract

2023

Detecting Concepts and Generating Captions from Medical Images: Contributions of the VCMI Team to ImageCLEFmedical Caption 2023

Authors
Torto, IR; Patrício, C; Montenegro, H; Gonçalves, T; Cardoso, JS;

Publication
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023.

Abstract
This paper presents the main contributions of the VCMI Team to the ImageCLEFmedical Caption 2023 task. We addressed both the concept detection and caption prediction tasks. Regarding concept detection, our team employed different approaches to assign concepts to medical images: multi-label classification, adversarial training, autoregressive modelling, image retrieval, and concept retrieval. We also developed three model ensembles merging the results of some of the proposed methods. Our best submission obtained an F1-score of 0.4998, ranking 3rd among nine teams. Regarding the caption prediction task, our team explored two main approaches based on image retrieval and language generation. The language generation approaches, based on a vision model as the encoder and a language model as the decoder, yielded the best results, allowing us to rank 5th among thirteen teams, with a BERTScore of 0.6147. © 2023 Copyright for this paper by its authors.

2024

Active Supervision: Human in the Loop

Authors
Cruz, RPM; Shihavuddin, ASM; Maruf, MH; Cardoso, JS;

Publication
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, CIARP 2023, Part I

Abstract
After the learning process, certain types of images may not be modeled correctly because they were not well represented in the training set. These failures can then be compensated for by collecting more images from the real world and incorporating them into the learning process, an expensive process known as active learning. The proposed twist, called active supervision, uses the model itself to change the existing images in the direction where the boundary is less defined and requests feedback from the user on how the new image should be labeled. Experiments in the context of class imbalance show the technique is able to increase model performance in rare classes. Active human supervision helps provide crucial information to the model during training that the training set lacks.

2024

Explaining Bounding Boxes in Deep Object Detectors Using Post Hoc Methods for Autonomous Driving Systems

Authors
Nogueira, C; Fernandes, L; Fernandes, JND; Cardoso, JS;

Publication
Sensors

Abstract
Deep learning has rapidly increased in popularity, leading to the development of perception solutions for autonomous driving. The latter field leverages techniques developed for computer vision in other domains for accomplishing perception tasks such as object detection. However, the black-box nature of deep neural models and the complexity of the autonomous driving context motivate the study of explainability in these models that perform perception tasks. Accordingly, this work explores explainable AI techniques for the object detection task in the context of autonomous driving. An extensive and detailed comparison is carried out between gradient-based and perturbation-based methods (e.g., D-RISE). Moreover, several experimental setups are used with different backbone architectures and different datasets to observe the influence of these aspects on the explanations. All the techniques explored consist of saliency methods, making their interpretation and evaluation primarily visual. Nevertheless, numerical assessment methods are also used. Overall, D-RISE and guided backpropagation obtain more localized explanations. However, D-RISE highlights more meaningful regions, providing more human-understandable explanations. To the best of our knowledge, this is the first approach to obtaining explanations focusing on the regression of the bounding box coordinates.

2024

Intrinsic Explainability for End-to-End Object Detection

Authors
Fernandes, L; Fernandes, JND; Calado, M; Pinto, JR; Cerqueira, R; Cardoso, JS;

Publication
IEEE Access

Abstract
Deep learning models are automating many daily routine tasks, indicating that in the future even high-risk tasks, such as those in healthcare and automated driving, will be automated. However, due to the complexity of such deep learning models, it is challenging to understand their reasoning. Furthermore, the black-box nature of the designed deep learning models may undermine public confidence in critical areas. Current efforts on intrinsically interpretable models focus only on classification tasks, leaving a gap in models for object detection. Therefore, this paper proposes a deep learning model that is intrinsically explainable for the object detection task. The chosen design for such a model is a combination of the well-known Faster-RCNN model with the ProtoPNet model. For the Explainable AI experiments, the chosen performance metric was the similarity score from the ProtoPNet model. Our experiments show that this combination leads to a deep learning model that is able to explain its classifications, with similarity scores, using a visual bag of words, which are called prototypes, that are learned during the training process. Furthermore, the adoption of such an explainable method does not seem to hinder the performance of the proposed model, which achieved a mAP of 69% in the KITTI dataset and a mAP of 66% in the GRAZPEDWRI-DX dataset. Moreover, our explanations have shown a high reliability on the similarity score.

2023

Transformer-Based Multi-Prototype Approach for Diabetic Macular Edema Analysis in OCT Images

Authors
Vidal, PL; Moura, Jd; Novo, J; Ortega, M; Cardoso, JS;

Publication
IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023

Abstract
Optical Coherence Tomography (OCT) is the major diagnostic tool for the leading cause of blindness in developed countries: Diabetic Macular Edema (DME). Depending on the type of fluid accumulations, different treatments are needed. In particular, Cystoid Macular Edemas (CMEs) represent the most severe scenario, while Diffuse Retinal Thickening (DRT) is an early indicator of the disease but a challenging scenario to detect. While methodologies exist, their explanatory power is limited to the input sample itself. However, due to the complexity of these accumulations, this may not be enough for a clinician to assess the validity of the classification. Thus, in this work, we propose a novel approach based on multi-prototype networks with vision transformers to obtain an example-based explainable classification. Our proposal achieved robust results in two representative OCT devices, with a mean accuracy of 0.9099 ± 0.0083 and 0.8582 ± 0.0126 for CME and DRT-type fluid accumulations, respectively. © 2023 IEEE.
