Publications by Jaime Cardoso

2023

Compressed Models Decompress Race Biases: What Quantized Models Forget for Fair Face Recognition

Authors
Neto, PC; Caldeira, E; Cardoso, JS; Sequeira, AF;

Publication
International Conference of the Biometrics Special Interest Group, BIOSIG 2023, Darmstadt, Germany, September 20-22, 2023

Abstract

2023

Detecting Concepts and Generating Captions from Medical Images: Contributions of the VCMI Team to ImageCLEFmedical Caption 2023

Authors
Torto, IR; Patrício, C; Montenegro, H; Gonçalves, T; Cardoso, JS;

Publication
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023.

Abstract
This paper presents the main contributions of the VCMI Team to the ImageCLEFmedical Caption 2023 task. We addressed both the concept detection and caption prediction tasks. Regarding concept detection, our team employed different approaches to assign concepts to medical images: multi-label classification, adversarial training, autoregressive modelling, image retrieval, and concept retrieval. We also developed three model ensembles merging the results of some of the proposed methods. Our best submission obtained an F1-score of 0.4998, ranking 3rd among nine teams. Regarding the caption prediction task, our team explored two main approaches based on image retrieval and language generation. The language generation approaches, based on a vision model as the encoder and a language model as the decoder, yielded the best results, allowing us to rank 5th among thirteen teams, with a BERTScore of 0.6147. © 2023 Copyright for this paper by its authors.
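For readers unfamiliar with the encoder-decoder setup mentioned in this abstract, the sketch below shows what a vision-encoder / language-decoder captioning pipeline can look like in practice. It is purely illustrative: the `nlpconnect/vit-gpt2-image-captioning` checkpoint, the Hugging Face `transformers` API, and the input file name are assumptions made for the example and are not the VCMI team's actual models or code.

```python
# Illustrative vision-encoder / language-decoder captioning sketch (not the VCMI pipeline).
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Public checkpoint used only to demonstrate the encoder-decoder generation pattern.
model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
processor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

def caption(path: str) -> str:
    image = Image.open(path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    output_ids = model.generate(pixel_values, max_length=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(caption("example_image.png"))  # hypothetical input file
```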

2024

Active Supervision: Human in the Loop

Authors
Cruz, RPM; Shihavuddin, ASM; Maruf, MH; Cardoso, JS;

Publication
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, CIARP 2023, Part I

Abstract
After the learning process, certain types of images may not be modeled correctly because they were not well represented in the training set. These failures can then be compensated for by collecting more images from the real world and incorporating them into the learning process, an expensive process known as active learning. The proposed twist, called active supervision, uses the model itself to change existing images in the direction where the decision boundary is less well defined and asks the user how the new image should be labeled. Experiments in the context of class imbalance show that the technique increases model performance on rare classes. Active human supervision thus provides crucial information during training that the training set lacks.
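As a rough illustration of the active-supervision loop described in the abstract, the sketch below perturbs an existing image toward a region of high predictive uncertainty and leaves the labeling of the result to a human annotator. The model interface, the entropy-ascent objective, the pixel-range assumption, and all names are assumptions made for illustration, not the authors' implementation.

```python
# Minimal active-supervision sketch: push an image toward the uncertain region
# of the classifier, then ask a human to label the resulting image.
import torch

def synthesize_uncertain_image(model, image, steps=10, step_size=0.01):
    """Move `image` (values assumed in [0, 1]) in the direction that maximizes predictive entropy."""
    x = image.clone().detach().requires_grad_(True)
    for _ in range(steps):
        probs = torch.softmax(model(x.unsqueeze(0)), dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum()
        grad, = torch.autograd.grad(entropy, x)
        x = (x + step_size * grad.sign()).clamp(0, 1).detach().requires_grad_(True)
    return x.detach()

# Usage (hypothetical): show the perturbed image to an annotator, record the
# label they assign, and append (new_image, new_label) to the training set.
```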

2024

Explaining Bounding Boxes in Deep Object Detectors Using Post Hoc Methods for Autonomous Driving Systems

Authors
Nogueira, C; Fernandes, L; Fernandes, JND; Cardoso, JS;

Publication
Sensors

Abstract
Deep learning has rapidly increased in popularity, leading to the development of perception solutions for autonomous driving. The latter field leverages techniques developed for computer vision in other domains to accomplish perception tasks such as object detection. However, the black-box nature of deep neural models and the complexity of the autonomous driving context motivate the study of explainability in the models that perform these perception tasks. Accordingly, this work explores explainable AI techniques for the object detection task in the context of autonomous driving. An extensive and detailed comparison is carried out between gradient-based and perturbation-based methods (e.g., D-RISE). Moreover, several experimental setups with different backbone architectures and different datasets are used to observe the influence of these aspects on the explanations. All the techniques explored are saliency methods, so their interpretation and evaluation are primarily visual; nevertheless, numerical assessment methods are also used. Overall, D-RISE and guided backpropagation obtain more localized explanations, but D-RISE highlights more meaningful regions, providing more human-understandable explanations. To the best of our knowledge, this is the first approach to obtain explanations focused on the regression of the bounding-box coordinates.
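The perturbation-based method cited above, D-RISE, scores pixels by how much randomly masked versions of the input still produce detections similar to a target box. The sketch below is a heavily simplified version of that idea (binary full-resolution masks and an IoU-times-score similarity); the `detect` callable, the mask parameters, and the image layout are assumptions, not the paper's setup.

```python
# Simplified D-RISE-style saliency sketch for one target box of an object detector.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def drise_saliency(image, target_box, detect, n_masks=1000, p=0.5):
    """`image` is assumed H x W x C; `detect(img)` is assumed to return a list of (box, score) pairs."""
    h, w = image.shape[:2]
    saliency = np.zeros((h, w), dtype=np.float32)
    for _ in range(n_masks):
        mask = (np.random.rand(h, w) < p).astype(np.float32)
        detections = detect(image * mask[..., None])
        # Similarity of the best surviving detection to the target box, weighted by its score.
        best = max((iou(target_box, b) * s for b, s in detections), default=0.0)
        saliency += best * mask
    return saliency / n_masks
```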

2024

Intrinsic Explainability for End-to-End Object Detection

Authors
Fernandes, L; Fernandes, JND; Calado, M; Pinto, JR; Cerqueira, R; Cardoso, JS;

Publication
IEEE Access

Abstract
Deep learning models are automating many daily routine tasks, suggesting that in the future even high-risk tasks, such as those in healthcare and automated driving, will be automated. However, due to the complexity of such deep learning models, it is challenging to understand their reasoning, and the black-box nature of these models may undermine public confidence in critical areas. Current efforts on intrinsically interpretable models focus only on classification tasks, leaving a gap in models for object detection. Therefore, this paper proposes a deep learning model that is intrinsically explainable for the object detection task. The chosen design combines the well-known Faster-RCNN model with the ProtoPNet model. For the explainable AI experiments, the chosen performance metric was the similarity score from the ProtoPNet model. Our experiments show that this combination yields a deep learning model that is able to explain its classifications, through similarity scores, using a visual bag of words, called prototypes, learned during training. Furthermore, adopting such an explainable method does not seem to hinder the performance of the proposed model, which achieved a mAP of 69% on the KITTI dataset and a mAP of 66% on the GRAZPEDWRI-DX dataset. Moreover, our explanations showed high reliability in terms of the similarity score.
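The similarity score mentioned in this abstract follows the ProtoPNet formulation, in which each learned prototype is matched against the spatial positions of a feature map and its smallest distance is mapped to a log-activation. The sketch below illustrates only that scoring step; the feature shapes and the way it would plug into Faster-RCNN's region features are assumptions made for illustration.

```python
# ProtoPNet-style prototype similarity sketch (scoring step only).
import torch

def prototype_similarities(features, prototypes, eps=1e-4):
    """features: (C, H, W) region features; prototypes: (P, C, 1, 1) learned prototypes."""
    f = features.unsqueeze(0)                       # (1, C, H, W)
    # Squared L2 distance between each prototype and each spatial location.
    dists = ((f - prototypes) ** 2).sum(dim=1)      # (P, H, W)
    min_dists = dists.flatten(1).min(dim=1).values  # (P,) best match per prototype
    # ProtoPNet's log-activation: small distance -> large similarity score.
    return torch.log((min_dists + 1) / (min_dists + eps))
```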

2023

Transformer-Based Multi-Prototype Approach for Diabetic Macular Edema Analysis in OCT Images

Authors
Vidal, PL; Moura, Jd; Novo, J; Ortega, M; Cardoso, JS;

Publication
IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023

Abstract
Optical Coherence Tomography (OCT) is the major diagnostic tool for the leading cause of blindness in developed countries: Diabetic Macular Edema (DME). Depending on the type of fluid accumulation, different treatments are needed. In particular, Cystoid Macular Edemas (CMEs) represent the most severe scenario, while Diffuse Retinal Thickening (DRT) is an early indicator of the disease but a challenging scenario to detect. While methodologies exist, their explanatory power is limited to the input sample itself. However, due to the complexity of these accumulations, this may not be enough for a clinician to assess the validity of the classification. Thus, in this work, we propose a novel approach based on multi-prototype networks with vision transformers to obtain an example-based explainable classification. Our proposal achieved robust results on two representative OCT devices, with mean accuracies of 0.9099 ± 0.0083 and 0.8582 ± 0.0126 for CME- and DRT-type fluid accumulations, respectively. © 2023 IEEE.
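To make the example-based explanation idea concrete, the sketch below shows one way a multi-prototype classifier can both predict a class and return the reference example behind that prediction: an embedding is compared against several prototypes per class, and the nearest prototype's stored training example is surfaced as the explanation. The encoder, the data structures, and the distance choice are assumptions made for illustration, not the paper's architecture.

```python
# Multi-prototype, example-based classification sketch.
import torch

def classify_with_explanation(embedding, prototypes, prototype_examples):
    """
    embedding: (D,) image/patch embedding, e.g. from a vision transformer.
    prototypes: dict class_name -> (K, D) tensor of learned prototypes.
    prototype_examples: dict class_name -> list of K reference image paths.
    """
    best = None
    for cls, protos in prototypes.items():
        dists = torch.cdist(embedding.unsqueeze(0), protos).squeeze(0)  # (K,) distances to prototypes
        k = int(dists.argmin())
        if best is None or dists[k] < best[1]:
            best = (cls, dists[k], prototype_examples[cls][k])
    cls, _, example_path = best
    # Return the predicted class and the training example shown to the clinician as explanation.
    return cls, example_path
```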
