Publications

Publications by CTM

2024

Explaining Bounding Boxes in Deep Object Detectors Using Post Hoc Methods for Autonomous Driving Systems

Authors
Nogueira, C; Fernandes, L; Fernandes, JND; Cardoso, JS;

Publication
SENSORS

Abstract
Deep learning has rapidly increased in popularity, leading to the development of perception solutions for autonomous driving. The latter field leverages techniques developed for computer vision in other domains for accomplishing perception tasks such as object detection. However, the black-box nature of deep neural models and the complexity of the autonomous driving context motivate the study of explainability in these models that perform perception tasks. Accordingly, this work explores explainable AI techniques for the object detection task in the context of autonomous driving. An extensive and detailed comparison is carried out between gradient-based and perturbation-based methods (e.g., D-RISE). Moreover, several experimental setups are used with different backbone architectures and different datasets to observe the influence of these aspects on the explanations. All the techniques explored are saliency methods, making their interpretation and evaluation primarily visual. Nevertheless, numerical assessment methods are also used. Overall, D-RISE and guided backpropagation obtain more localized explanations. However, D-RISE highlights more meaningful regions, providing more human-understandable explanations. To the best of our knowledge, this is the first approach to obtaining explanations focusing on the regression of the bounding box coordinates.

2024

Intrinsic Explainability for End-to-End Object Detection

Authors
Fernandes, L; Fernandes, JND; Calado, M; Pinto, JR; Cerqueira, R; Cardoso, JS;

Publication
IEEE ACCESS

Abstract
Deep learning models are automating many daily routine tasks, suggesting that even high-risk tasks, in areas such as healthcare and automated driving, will be automated in the future. However, due to the complexity of such deep learning models, it is challenging to understand their reasoning. Furthermore, the black-box nature of the designed deep learning models may undermine public confidence in critical areas. Current efforts on intrinsically interpretable models focus only on classification tasks, leaving a gap in models for object detection. Therefore, this paper proposes a deep learning model that is intrinsically explainable for the object detection task. The chosen design for such a model is a combination of the well-known Faster-RCNN model with the ProtoPNet model. For the Explainable AI experiments, the chosen performance metric was the similarity score from the ProtoPNet model. Our experiments show that this combination leads to a deep learning model that is able to explain its classifications, with similarity scores, using a visual bag of words, called prototypes, that is learned during the training process. Furthermore, the adoption of such an explainable method does not seem to hinder the performance of the proposed model, which achieved a mAP of 69% on the KITTI dataset and a mAP of 66% on the GRAZPEDWRI-DX dataset. Moreover, our explanations have shown a high reliability on the similarity score.
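As a rough illustration of the similarity score mentioned in the abstract, the sketch below follows the log-activation used in the original ProtoPNet formulation, where a prototype's score grows as its squared distance to the closest image patch shrinks. It is an assumption that the paper uses this exact form; the function name, the `eps` value, and the toy feature vectors are hypothetical.

```python
import math

def prototype_similarity(patches, prototype, eps=1e-4):
    """Similarity between one prototype and the closest patch of a feature map.

    patches:   list of feature vectors (one per spatial location)
    prototype: a single feature vector of the same length
    eps:       small constant that keeps the score finite at zero distance
               (assumed value, for illustration only)
    """
    # Squared Euclidean distance from the prototype to its nearest patch.
    min_sq_dist = min(
        sum((a - b) ** 2 for a, b in zip(patch, prototype))
        for patch in patches
    )
    # Log activation: large when the distance is near zero, near 0 when far.
    return math.log((min_sq_dist + 1.0) / (min_sq_dist + eps))

# Toy feature map with two patches (hypothetical values).
patches = [[0.0, 0.0], [1.0, 2.0]]
close_score = prototype_similarity(patches, [1.0, 2.0])   # exact match exists
far_score = prototype_similarity(patches, [10.0, 10.0])   # no nearby patch
```

Here `close_score` is much larger than `far_score`, which is the property that lets a prototype act as visual evidence for a detection.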

2024

YOLOMM - You Only Look Once for Multi-modal Multi-tasking

Authors
Campos, F; Cerqueira, FG; Cruz, RPM; Cardoso, JS;

Publication
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I

Abstract
Autonomous driving can reduce the number of road accidents due to human error and result in safer roads. One important part of the system is the perception unit, which provides information about the environment surrounding the car. Currently, most manufacturers are using not only RGB cameras, which are passive sensors that capture light already in the environment, but also Lidar. This sensor actively emits laser pulses to a surface or object and measures reflection and time-of-flight. Previous work, YOLOP, already proposed a model for object detection and semantic segmentation, but only using RGB. This work extends it to Lidar and evaluates performance on KITTI, a public autonomous driving dataset. The implementation shows improved precision across objects of all sizes. The full implementation is available at https://github.com/filipepcampos/yolomm.

2024

Anonymizing medical case-based explanations through disentanglement

Authors
Montenegro, H; Cardoso, JS;

Publication
MEDICAL IMAGE ANALYSIS

Abstract
Case-based explanations are an intuitive method to gain insight into the decision-making process of deep learning models in clinical contexts. However, medical images cannot be shared as explanations due to privacy concerns. To address this problem, we propose a novel method for disentangling identity and medical characteristics of images and apply it to anonymize medical images. The disentanglement mechanism replaces some feature vectors in an image while ensuring that the remaining features are preserved, obtaining independent feature vectors that encode the images' identity and medical characteristics. We also propose a model to manufacture synthetic privacy-preserving identities to replace the original image's identity and achieve anonymization. The models are applied to medical and biometric datasets, demonstrating their capacity to generate realistic-looking anonymized images that preserve their original medical content. Additionally, the experiments show the network's inherent capacity to generate counterfactual images through the replacement of medical features.

2024

Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

Authors
Tame, ID; Tolosana, R; Melzi, P; Rodríguez, RV; Kim, M; Rathgeb, C; Liu, X; Morales, A; Fiérrez, J; Garcia, JO; Zhong, Z; Huang, Y; Mi, Y; Ding, S; Zhou, S; He, S; Fu, L; Cong, H; Zhang, R; Xiao, Z; Smirnov, E; Pimenov, A; Grigorev, A; Timoshenko, D; Asfaw, KM; Low, CY; Liu, H; Wang, C; Zuo, Q; He, Z; Shahreza, HO; George, A; Unnervik, A; Rahimi, P; Marcel, S; Neto, PC; Huber, M; Kolf, JN; Damer, N; Boutros, F; Cardoso, JS; Sequeira, AF; Atzori, A; Fenu, G; Marras, M; Struc, V; Yu, J; Li, Z; Li, J; Zhao, W; Lei, Z; Zhu, X; Zhang, XY; Biesseck, B; Vidal, P; Coelho, L; Granada, R; Menotti, D;

Publication
IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Workshops, Seattle, WA, USA, June 17-18, 2024

Abstract

2024

Phasing segmented telescopes via deep learning methods: application to a deployable CubeSat

Authors
Dumont, M; Correia, CM; Sauvage, JF; Schwartz, N; Gray, M; Cardoso, J;

Publication
JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION

Abstract
Capturing high-resolution imagery of the Earth's surface often calls for a telescope of considerable size, even from low Earth orbits (LEOs). A large aperture often requires large and expensive platforms. For instance, achieving a resolution of 1 m at visible wavelengths from LEO typically requires an aperture diameter of at least 30 cm. Additionally, ensuring high revisit times often prompts the use of multiple satellites. In light of these challenges, a small, segmented, deployable CubeSat telescope was recently proposed, creating the additional need of phasing the telescope's mirrors. Phasing methods on compact platforms are constrained by the limited volume and power available, excluding solutions that rely on dedicated hardware or demand substantial computational resources. Neural networks (NNs) are known for their computationally efficient inference and reduced onboard requirements. Therefore, we developed an NN-based method to measure co-phasing errors inherent to a deployable telescope. The proposed technique demonstrates its ability to detect phasing errors at the targeted performance level [typically a wavefront error (WFE) below 15 nm RMS for a visible imager operating at the diffraction limit] using a point source. The robustness of the NN method is verified in the presence of high-order aberrations or noise, and the results are compared against existing state-of-the-art techniques. The developed NN model ensures its feasibility and provides a realistic pathway towards achieving diffraction-limited images. (c) 2024 Optica Publishing Group
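The 30 cm figure quoted in the abstract can be sanity-checked with the Rayleigh criterion for a circular aperture. The sketch below is only a back-of-the-envelope check: the 550 nm wavelength and 500 km altitude are assumed values that do not come from the abstract, and the function name is hypothetical.

```python
def ground_resolution_m(wavelength_m, aperture_m, altitude_m):
    """Diffraction-limited ground resolution for a nadir-pointing telescope.

    Uses the Rayleigh criterion, theta = 1.22 * lambda / D, and the
    small-angle approximation to project theta onto the ground.
    """
    angular_resolution_rad = 1.22 * wavelength_m / aperture_m
    return angular_resolution_rad * altitude_m

# Assumed inputs (not from the abstract): green light, 500 km orbit.
resolution_m = ground_resolution_m(
    wavelength_m=550e-9,  # mid-visible wavelength
    aperture_m=0.30,      # 30 cm aperture, as quoted in the abstract
    altitude_m=500e3,     # a representative LEO altitude
)
print(f"{resolution_m:.2f} m")  # prints "1.12 m"
```

Under these assumptions a 30 cm aperture yields roughly 1 m on the ground, consistent with the order of magnitude stated in the abstract.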
