Publications - INESC TEC

Publications

2025

Using Explanations to Estimate the Quality of Computer Vision Models

Authors
Oliveira, F; Carneiro, D; Pereira, J;

Publication
Springer Proceedings in Business and Economics

Abstract
Explainable AI (xAI) emerged as one of the ways of addressing the interpretability issues of so-called black-box models. Most of the xAI artifacts proposed so far were designed, as expected, for human users. In this work, we posit that such artifacts can also be used by computer systems. Specifically, we propose a set of metrics derived from LIME explanations that can eventually be used to ascertain the quality of each output of an underlying image classification model. We validate these metrics against quantitative human feedback and identify 4 potentially interesting metrics for this purpose. This research is particularly useful in concept drift scenarios, in which models are deployed into production and there is no new labelled data to continuously evaluate them, making it impossible to know the current performance of the model. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.


2025

CNN explanation methods for ordinal regression tasks

Authors
Barbero-Gómez, J; Cruz, RPM; Cardoso, JS; Gutiérrez, PA; Hervás-Martínez, C;

Publication
Neurocomputing

Abstract
The use of Convolutional Neural Network (CNN) models for image classification tasks has gained significant popularity. However, the lack of interpretability in CNN models poses challenges for debugging and validation. To address this issue, various explanation methods have been developed to provide insights into CNN models. This paper focuses on the validity of these explanation methods for ordinal regression tasks, where the classes have a predefined order relationship. Different modifications are proposed for two explanation methods to exploit the ordinal relationships between classes: Grad-CAM based on Ordinal Binary Decomposition (GradOBD-CAM) and Ordinal Information Bottleneck Analysis (OIBA). The performance of these modified methods is compared to existing popular alternatives. Experimental results demonstrate that GradOBD-CAM outperforms other methods in terms of interpretability for three out of four datasets, while OIBA achieves superior performance compared to IBA.
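As an illustration of the ordinal binary decomposition idea mentioned in the abstract, the sketch below shows one common way to encode K ordered classes as K-1 binary targets of the form "is the label greater than k?". The abstract does not give the paper's exact formulation, so the function names `obd_encode` and `obd_decode` and the 0.5 decision threshold are assumptions made for this example only.

```python
import numpy as np

def obd_encode(labels, num_classes):
    """Encode ordinal labels as (num_classes - 1) binary targets.

    Column k holds 1 when label > k, else 0 (an assumed, commonly used form
    of ordinal binary decomposition; not taken from the paper itself).
    """
    labels = np.asarray(labels)
    thresholds = np.arange(num_classes - 1)
    return (labels[:, None] > thresholds[None, :]).astype(int)

def obd_decode(probs, threshold=0.5):
    """Recover an ordinal label by counting binary outputs above the threshold."""
    return (np.asarray(probs) > threshold).sum(axis=1)

# Hypothetical labels for a 4-class ordinal problem.
targets = obd_encode([0, 2, 3], num_classes=4)
recovered = obd_decode(targets)
```

Each row of `targets` is a run of ones followed by zeros, so the class order is preserved in the encoding, which is the property an ordinal-aware explanation method can exploit.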


2025

Aligning priorities: A Comparative analysis of scientific and policy perspectives on municipal solid waste management

Authors
Rodrigues, M; Antunes, JA; Migueis, V;

Publication
Waste Management

Abstract
Municipal solid waste (MSW) management has become a critical issue today, posing substantial economic, environmental, and social challenges. Identifying and analyzing dominant themes in this field is essential for advancing research and policies towards sustainable MSW management practices. This study aims to explore the key issues related to MSW management that have been addressed by both the scientific community and policymakers through funded projects. By doing so, the study seeks to guide the scientific community as a knowledge producer and the EU as a key funder. Two Latent Dirichlet Allocation (LDA) models were applied to analyze the themes from two corpora: one representing scientific literature and another focusing on EU-funded projects. Additionally, this analysis was complemented by a quantitative estimation of the similarity between the two corpora, providing a measure of alignment between the scientific community and policymakers. The results generally indicate that the two spheres are aligned and highlight the diversity of topics explored by the scientific community. Nevertheless, it is concluded that there are opportunities for further research on specific topics, such as leaching and the extraction of heavy metals. Additionally, the popularity of topics identified in European Union-funded projects has fluctuated considerably over time, focusing primarily on waste management rather than its prevention. In light of these findings, waste prevention emerges as a promising avenue for future EU-funded research initiatives.


2025

Enhancing Nut-Tightening Processes in the Automotive Industry: Integration of 3D Vision Systems with Collaborative Robots

Authors
Gonçalves, A; Pereira, T; Lopes, D; Cunha, F; Lopes, F; Coutinho, F; Barreiros, J; Durães, J; Santos, P; Simões, F; Ferreira, P; Freitas, DC; Trovão, F; Santos, V; Ferreira, P; Ferreira, M;

Publication
Automation

Abstract
This paper presents a method for position correction in collaborative robots, applied to a case study in an industrial environment. The case study is aligned with the GreenAuto project and aims to optimize industrial processes through the integration of various hardware elements. The case study focuses on tightening a specific number of nuts onto bolts located on a partition plate, referred to as “Cloison”, which is mounted on commercial vans produced by Stellantis, to secure the plate. The main challenge lies in deviations that may occur in the plate during its assembly process, leading to uncertainties in its fastening to the vehicles. To address this and optimize the process, a collaborative robot was integrated with a 3D vision system and a screwdriving system. By using the 3D vision system, it is possible to determine the bolts’ positions and adjust them within the robot’s frame of reference, enabling the screwdriving system to tighten the nuts accurately. Thus, the proposed method aims to integrate these different systems to tighten the nuts effectively, regardless of the deviations that may arise in the plate during assembly. © 2025 by the authors.
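The step of expressing bolt positions measured by the 3D vision system in the robot's frame of reference can be sketched with a rigid homogeneous transform. This is a generic illustration, not the authors' implementation: the helper names `make_transform` and `camera_to_robot`, the transform `T_robot_cam`, and the numeric values are all hypothetical, and in practice the camera-to-robot transform would come from a calibration procedure that the abstract does not describe.

```python
import numpy as np

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(rotation, dtype=float)
    T[:3, 3] = np.asarray(translation, dtype=float)
    return T

def camera_to_robot(points_cam, T_robot_cam):
    """Map Nx3 points from the camera frame into the robot frame."""
    pts = np.asarray(points_cam, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # N x 4
    return (T_robot_cam @ homog.T).T[:, :3]

# Hypothetical calibration: 90 degree rotation about z plus a 1 m offset in x.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
T_robot_cam = make_transform(R, [1.0, 0.0, 0.0])

# Hypothetical bolt position detected in the camera frame.
bolts_robot = camera_to_robot([[1.0, 0.0, 0.0]], T_robot_cam)
```

With the bolt positions in the robot frame, a deviation in the plate shows up as a shift in these coordinates, and the tightening targets can be adjusted accordingly.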


2025

Evaluation of Lyrics Extraction from Folk Music Sheets Using Vision Language Models (VLMs)

Authors
Sales Mendes, A; Lozano Murciego, Á; Silva, LA; Jiménez Bravo, M; Navarro Cáceres, M; Bernardes, G;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Monodic folk music has traditionally been preserved in physical documents. It constitutes a vast archive that needs to be digitized to facilitate comprehensive analysis using AI techniques. A critical component of music score digitization is the transcription of lyrics, an extensively researched process in Optical Character Recognition (OCR) and document layout analysis. These fields typically require the development of specific models that operate in several stages: first, to detect the bounding boxes of specific texts, then to identify the language, and finally, to recognize the characters. Recent advances in vision language models (VLMs) have introduced multimodal capabilities, such as processing images and text, which are competitive with traditional OCR methods. This paper proposes an end-to-end system for extracting lyrics from images of handwritten musical scores. We aim to evaluate the performance of two state-of-the-art VLMs to determine whether they can eliminate the need to develop specialized text recognition and OCR models for this task. The results of the study, obtained from a dataset in a real-world application environment, are presented along with promising new research directions in the field. This progress contributes to preserving cultural heritage and opens up new possibilities for global analysis and research in folk music. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

