2022
Authors
Goncalves, T; Rio-Torto, I; Teixeira, LF; Cardoso, JS;
Publication
IEEE ACCESS
Abstract
The increasing popularity of attention mechanisms in deep learning algorithms for computer vision and natural language processing has made these models attractive to other research domains. In healthcare, there is a strong need for tools that may improve the routines of clinicians and patients, so the adoption of attention-based algorithms for medical applications followed naturally. However, since healthcare is a domain that depends on high-stakes decisions, the scientific community must consider whether these high-performing algorithms fit the needs of medical applications. With this motivation, this paper extensively reviews the use of attention mechanisms in machine learning methods (including Transformers) for several medical applications, organised by the types of tasks that typically compose medical-domain pipelines. This work distinguishes itself from its predecessors by providing a critical analysis of the claims and potentialities of attention mechanisms reported in the literature, through an experimental case study on medical image classification with three different use cases. These experiments focus on the process of integrating attention mechanisms into established deep learning architectures, on the analysis of their predictive power, and on a visual assessment of the saliency maps generated by post-hoc explanation methods. The paper concludes with a critical appraisal of the claims and potentialities attributed to attention mechanisms in the literature and proposes future research lines in medical applications that may benefit from these frameworks.
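As a rough illustration of the kind of integration the paper studies, the sketch below adds a squeeze-and-excitation-style channel-attention block to a pre-trained ResNet-18 classifier in PyTorch. This is a hypothetical, minimal example; the module names, hyperparameters, and choice of attention block are assumptions for illustration, not the paper's exact setup.

import torch.nn as nn
from torchvision.models import resnet18

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation style attention: re-weights feature channels
    # with a gate computed from globally pooled context.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # attended feature maps

# Insert the attention block after the last residual stage and replace the
# head for a (hypothetical) binary medical task such as lesion malignancy.
model = resnet18(weights="IMAGENET1K_V1")
model.layer4 = nn.Sequential(model.layer4, ChannelAttention(512))
model.fc = nn.Linear(512, 2)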
2023
Authors
Castro, E; Ferreira, PM; Rebelo, A; Rio-Torto, I; Capozzi, L; Ferreira, MF; Goncalves, T; Albuquerque, T; Silva, W; Afonso, C; Sousa, RG; Cimarelli, C; Daoudi, N; Moreira, G; Yang, HY; Hrga, I; Ahmad, J; Keswani, M; Beco, S;
Publication
MACHINE VISION AND APPLICATIONS
Abstract
Every year, the VISion Understanding and Machine intelligence (VISUM) summer school runs a competition where participants can learn and share knowledge about Computer Vision and Machine Learning in a vibrant environment. The 2021 edition of VISUM focused on applying those methodologies to fashion. Recently, there has been growing interest within the scientific community in applying computer vision methodologies to the fashion domain, largely because fashion is one of the world's largest industries and has seen rapid growth in e-commerce, especially since the COVID-19 pandemic. Computer vision for fashion enables a wide range of innovations, from personalized recommendations to outfit matching. The competition enabled students to apply the knowledge acquired in the summer school to a real-world problem. The ambition was to foster research and development in fashion outfit complementary product retrieval by leveraging vast visual and textual data together with domain knowledge. To this end, a new fashion outfit dataset (acquired and curated by FARFETCH) is introduced for research and benchmarking purposes. Additionally, a competitive baseline with an original negative sampling process for triplet mining was implemented and served as a starting point for participants. The top three performing methods are described in this paper, since they constitute the reference state of the art for this particular problem. To our knowledge, this is the first challenge in fashion outfit complementary product retrieval. Moreover, this joint project between academia and industry brings several relevant contributions to disseminating science and technology, promoting economic and social development, and helping to connect early-career researchers to real-world industry challenges.
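Since the baseline centres on triplet mining, a minimal PyTorch sketch of the idea may help. The random in-batch negative-sampling rule below is an assumption for illustration only, not the competition's original sampling process.

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the compatible (positive) item closer to the anchor than the
    # sampled incompatible (negative) item, by at least `margin`.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

def sample_random_negatives(embeddings: torch.Tensor) -> torch.Tensor:
    # Naive in-batch negative sampling: pair each anchor with another random
    # item from the batch; harder schemes mine semi-hard negatives instead.
    n = embeddings.size(0)
    idx = torch.randperm(n)
    idx = torch.where(idx == torch.arange(n), (idx + 1) % n, idx)
    return embeddings[idx]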
2024
Authors
Dani, M; Rio Torto, I; Alaniz, S; Akata, Z;
Publication
PATTERN RECOGNITION, DAGM GCPR 2023
Abstract
Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks. In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned. Our DeViL method generates textual descriptions of visual features at different layers of the network as well as highlights the attribution locations of learned concepts. We train a transformer network to translate individual image features of any vision layer into a prompt that a separate off-the-shelf language model decodes into natural language. By employing dropout both per-layer and per-spatial-location, our model can generalize training on image-text pairs to generate localized explanations. As it uses a pre-trained language model, our approach is fast to train and can be applied to any vision backbone. Moreover, DeViL can create open-vocabulary attribution maps corresponding to words or phrases even outside the training scope of the vision model. We demonstrate that DeViL generates textual descriptions relevant to the image content on CC3M, surpassing previous lightweight captioning models, as well as attribution maps that uncover the learned concepts of the vision backbone. Further, we analyze fine-grained descriptions of layers as well as specific spatial locations and show that DeViL outperforms the current state-of-the-art on the neuron-wise descriptions of the MILANNOTATIONS dataset.
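The core feature-to-prompt translation step can be pictured with a short sketch. The prefix length, dimensions, and module names below are illustrative assumptions, not the released DeViL code.

import torch
import torch.nn as nn

class FeatureTranslator(nn.Module):
    # Maps a single vision-layer feature vector to a sequence of "prompt"
    # embeddings that a frozen, off-the-shelf language model can decode.
    def __init__(self, feat_dim: int, lm_dim: int, prefix_len: int = 10):
        super().__init__()
        self.prefix_len = prefix_len
        self.proj = nn.Linear(feat_dim, lm_dim * prefix_len)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b = feat.size(0)
        # The returned tensor is fed to the language model as input embeddings;
        # DeViL additionally applies per-layer and per-spatial-location dropout
        # during training, which is omitted here for brevity.
        return self.proj(feat).view(b, self.prefix_len, -1)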
2024
Authors
Torto, IR; Gonçalves, T; Cardoso, JS; Teixeira, LF;
Publication
IEEE International Symposium on Biomedical Imaging, ISBI 2024, Athens, Greece, May 27-30, 2024
Abstract
In fields that rely on high-stakes decisions, such as medicine, interpretability plays a key role in promoting trust and facilitating the adoption of deep learning models by the clinical communities. In the medical image analysis domain, gradient-based class activation maps are the most widely used explanation methods, and the field lacks a more in-depth investigation into inherently interpretable models that integrate knowledge ensuring the model learns the correct rules. B-cos networks, a new approach that increases the interpretability of deep neural networks by inducing weight-input alignment during training, showed promising results on natural image classification. In this work, we study the suitability of B-cos networks for the medical domain by testing them on different use cases (skin lesions, diabetic retinopathy, cervical cytology, and chest X-rays) and conducting a thorough evaluation with several explanation quality assessment metrics. We find that, just like in natural image classification, B-cos explanations yield more localised maps, but it is not clear that they are better than other methods' explanations when more explanation properties are considered.
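For intuition, a simplified B-cos linear unit can be sketched as follows; this is a minimal re-implementation for illustration, following the published formulation rather than the authors' released code, and the layer name and defaults are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BcosLinear(nn.Module):
    # B-cos transform: scales the linear response by |cos(x, w)|^(B-1),
    # so outputs are large only when the input aligns with the weight vector.
    def __init__(self, in_dim: int, out_dim: int, b: float = 2.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim))
        self.b = b

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.normalize(self.weight, dim=1)         # unit-norm weight vectors
        lin = F.linear(x, w)                        # w_hat . x
        cos = lin / (x.norm(dim=-1, keepdim=True) + 1e-9)
        return lin * cos.abs().pow(self.b - 1.0)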
2022
Authors
Gonçalves, T; Torto, IR; Teixeira, LF; Cardoso, JS;
Publication
CoRR