About

Isabel Rio-Torto received her master's degree in Electrical and Computer Engineering in 2019 from the Faculty of Engineering of the University of Porto (FEUP). She is currently a research assistant at INESC TEC, affiliated with the Visual Computing and Machine Intelligence Group (VCMI), and a Ph.D. student in Computer Science at the Faculty of Sciences of the University of Porto (FCUP). She is also an Invited Teaching Assistant at FEUP, teaching programming courses. Her work currently focuses on "Self-explanatory computer-aided diagnosis with limited supervision".


Details

  • Name

    Isabel Rio-Torto
  • Role

    Research Assistant
  • Since

    6 July 2020
Publications

2024

DeViL: Decoding Vision features into Language

Authors
Dani, M; Rio Torto, I; Alaniz, S; Akata, Z;

Publication
PATTERN RECOGNITION, DAGM GCPR 2023

Abstract
Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks. In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned. Our DeViL method generates textual descriptions of visual features at different layers of the network as well as highlights the attribution locations of learned concepts. We train a transformer network to translate individual image features of any vision layer into a prompt that a separate off-the-shelf language model decodes into natural language. By employing dropout both per-layer and per-spatial-location, our model can generalize training on image-text pairs to generate localized explanations. As it uses a pre-trained language model, our approach is fast to train and can be applied to any vision backbone. Moreover, DeViL can create open-vocabulary attribution maps corresponding to words or phrases even outside the training scope of the vision model. We demonstrate that DeViL generates textual descriptions relevant to the image content on CC3M, surpassing previous lightweight captioning models and attribution maps, uncovering the learned concepts of the vision backbone. Further, we analyze fine-grained descriptions of layers as well as specific spatial locations and show that DeViL outperforms the current state-of-the-art on the neuron-wise descriptions of the MILANNOTATIONS dataset.

2024

On the Suitability of B-cos Networks for the Medical Domain

Authors
Torto, IR; Gonçalves, T; Cardoso, JS; Teixeira, LF;

Publication
IEEE International Symposium on Biomedical Imaging, ISBI 2024, Athens, Greece, May 27-30, 2024

Abstract
In fields that rely on high-stakes decisions, such as medicine, interpretability plays a key role in promoting trust and facilitating the adoption of deep learning models by the clinical communities. In the medical image analysis domain, gradient-based class activation maps are the most widely used explanation methods, and the field lacks a more in-depth investigation into inherently interpretable models that focus on integrating knowledge that ensures the model is learning the correct rules. A new approach, B-cos networks, for increasing the interpretability of deep neural networks by inducing weight-input alignment during training showed promising results on natural image classification. In this work, we study the suitability of these B-cos networks to the medical domain by testing them on different use cases (skin lesions, diabetic retinopathy, cervical cytology, and chest X-rays) and conducting a thorough evaluation of several explanation quality assessment metrics. We find that, just like in natural image classification, B-cos explanations yield more localised maps, but it is not clear that they are better than other methods' explanations when considering more explanation properties. © 2024 IEEE.

2023

Fill in the blank for fashion complementary outfit product Retrieval: VISUM summer school competition

Authors
Castro, E; Ferreira, PM; Rebelo, A; Rio-Torto, I; Capozzi, L; Ferreira, MF; Goncalves, T; Albuquerque, T; Silva, W; Afonso, C; Sousa, RG; Cimarelli, C; Daoudi, N; Moreira, G; Yang, HY; Hrga, I; Ahmad, J; Keswani, M; Beco, S;

Publication
MACHINE VISION AND APPLICATIONS

Abstract
Every year, the VISion Understanding and Machine intelligence (VISUM) summer school runs a competition where participants can learn and share knowledge about Computer Vision and Machine Learning in a vibrant environment. The 2021 edition of VISUM focused on applying those methodologies to fashion. Recently, there has been an increase of interest within the scientific community in applying computer vision methodologies to the fashion domain. That is highly motivated by fashion being one of the world's largest industries, with rapid development in e-commerce, mainly since the COVID-19 pandemic. Computer Vision for Fashion enables a wide range of innovations, from personalized recommendations to outfit matching. The competition enabled students to apply the knowledge acquired in the summer school to a real-world problem. The ambition was to foster research and development in fashion outfit complementary product retrieval by leveraging vast visual and textual data with domain knowledge. For this, a new fashion outfit dataset (acquired and curated by FARFETCH) for research and benchmark purposes is introduced. Additionally, a competitive baseline with an original negative sampling process for triplet mining was implemented and served as a starting point for participants. The top 3 performing methods are described in this paper since they constitute the reference state-of-the-art for this particular problem. To our knowledge, this is the first challenge in fashion outfit complementary product retrieval. Moreover, this joint project between academia and industry brings several relevant contributions to disseminating science and technology, promoting economic and social development, and helping to connect early-career researchers to real-world industry challenges.

2022

From Captions to Explanations: A Multimodal Transformer-based Architecture for Natural Language Explanation Generation

Authors
Rio-Torto, I; Cardoso, JS; Teixeira, LF;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022)

Abstract
The growing importance of the Explainable Artificial Intelligence (XAI) field has led to the proposal of several methods for producing visual heatmaps of the classification decisions of deep learning models. However, visual explanations are not sufficient because different end-users have different backgrounds and preferences. Natural language explanations (NLEs) are inherently understandable by humans and, thus, can complement visual explanations. Therefore, we introduce a novel architecture based on multimodal Transformers to enable the generation of NLEs for image classification tasks. Contrary to the current literature, which models NLE generation as a supervised image captioning problem, we propose to learn to generate these textual explanations without their direct supervision, by starting from image captions and evolving to classification-relevant text. Preliminary experiments on a novel dataset where there is a clear demarcation between captions and NLEs show the potential of the approach and shed light on how it can be improved.

2022

Hybrid Quality Inspection for the Automotive Industry: Replacing the Paper-Based Conformity List through Semi-Supervised Object Detection and Simulated Data

Authors
Rio-Torto, I; Campanico, AT; Pinho, P; Filipe, V; Teixeira, LF;

Publication
APPLIED SCIENCES-BASEL

Abstract
The still prevalent use of paper conformity lists in the automotive industry has a serious negative impact on the performance of quality control inspectors. We propose instead a hybrid quality inspection system, where we combine automated detection with human feedback, to increase worker performance by reducing mental and physical fatigue, and the adaptability and responsiveness of the assembly line to change. The system integrates the hierarchical automatic detection of the non-conforming vehicle parts and information visualization on a wearable device to present the results to the factory worker and obtain human confirmation. Besides designing a novel 3D vehicle generator to create a digital representation of the non-conformity list and to collect automatically annotated training data, we apply and aggregate in a novel way state-of-the-art domain adaptation and pseudo-labeling methods to our real application scenario, in order to bridge the gap between the labeled data generated by the vehicle generator and the real unlabeled data collected on the factory floor. This methodology allows us to obtain, without any manual annotation of the real dataset, an example-based F1 score of 0.565 in an unconstrained scenario and 0.601 in a fixed camera setup (improvements of 11 and 14.6 percentage points, respectively, over a baseline trained with purely simulated data). Feedback obtained from factory workers highlighted the usefulness of the proposed solution, and showed that a truly hybrid assembly line, where machine and human work in symbiosis, increases both efficiency and accuracy in automotive quality control.

Supervised Theses

2023

Self-Supervised Learning for Medical Image Classification: A Study on MoCo-CXR

Author
Hugo Miguel Monteiro Guimarães

Institution
UM

2023

Improving Image Captioning through Segmentation

Author
Pedro Daniel Fernandes Ferreira

Institution
UM

2021

Combining simulated and real images in deep learning

Author
Pedro Xavier Tavares Monteiro Correia de Pinho

Institution
UM

2020

Automatic generation of textual explanations in deep learning

Author
Patrícia Ferreira Rocha

Institution
UM