Publications - INESC TEC

Publications

Publications by CTM

2025

A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

Authors
Vilaça, L; Yu, Y; Viana, P;

Publication
ACM COMPUTING SURVEYS

Abstract
Audio-visual correlation learning aims at capturing and understanding natural phenomena between audio and visual data. The rapid growth of deep learning has propelled the development of methods that process audio-visual data, as reflected in the number of proposals in recent years, which motivates a comprehensive survey. Besides analyzing the models used in this context, we also discuss task definitions and paradigms applied in AI multimedia. In addition, we investigate frequently used objective functions and discuss how audio-visual data is exploited in the optimization process, i.e., the different methodologies for representing knowledge in the audio-visual domain. In particular, we focus on how human-understandable mechanisms, i.e., structured knowledge that reflects comprehensible knowledge, can guide the learning process. Most importantly, we summarize recent progress in Audio-Visual Correlation Learning (AVCL) and discuss future research directions.


2025

Correction: Guimarães et al. A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition. Appl. Sci. 2023, 13, 2871

Authors
Guimarães, V; Nascimento, J; Viana, P; Carvalho, P;

Publication
Applied Sciences

Abstract
There was an error in the original publication [...]


2025

An Assessment of the Sensory Function in the Maxillofacial Region: A Dual-Case Pilot Study

Authors
Aguiar, JM; da Silva, JM; Fonseca, C; Marinho, J;

Publication
SENSORS

Abstract
Trigeminal somatosensory-evoked potentials (TSEPs) provide valuable insight into neural responses to oral stimuli. This study investigates TSEP recording methods and their impact on interpreting results in clinical settings to improve the development process of neurostimulation-based therapies. The experiments and results presented here aim at identifying appropriate stimulation characteristics to design an active dental prosthesis capable of contributing to restoring the lost neurosensitive connection between the teeth and the brain. Two methods of TSEP acquisition, traditional and occluded, were used, each conducted with a different volunteer. Traditional TSEP acquisition involves stimulation at different sites with varying parameters to establish a control baseline. In contrast, occluded TSEPs examine responses acquired under low- and high-force bite conditions to assess the influence of periodontal mechanoreceptors and muscle activation on measurements. Traditional TSEPs demonstrated methodological feasibility with satisfactory results despite a limited subject pool. However, occluded TSEPs presented challenges in interpreting results, with responses deviating from expected norms, particularly under high-force conditions, due to the simultaneous occurrence of stimulation and dental occlusion. While the traditional approach demonstrates methodological feasibility, the occluded approach reveals complexities in outcome interpretation and urges caution in clinical application. Previously unreported results were achieved, which underscores the importance of conducting further research with larger sample sizes and refined protocols in order to strengthen the reliability and validity of TSEP assessments.


2025

A Reinforcement Learning Based Recommender System Framework for Web Apps: Radio and Game Aggregators Scenarios

Authors
Batista, A; Torres, JM; Sobral, P; Moreira, RS; Soares, C; Pereira, I;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2024, PT I

Abstract
Recommendation systems can play an important role in today's digital content platforms by supporting the suggestion of relevant content in a personalised manner for each customer. Such content customisation has not been consistent across most media domains, particularly in radio streaming and gaming aggregators, which are the two real-world application domains addressed in this work. The challenges faced in these application areas are the dynamic nature of user preferences and the difficulty of generating recommendations for less popular content, due to the overwhelming choice and polarisation of available top content. We present the design and implementation of a Reinforcement Learning-based Recommendation System (RLRS) for web applications, using a Deep Deterministic Policy Gradient (DDPG) agent and, as a reward function, a weighted sum of the user Click Distribution (CD) across the recommended items and the Dwell Time (DT), a measure of the time users spend interacting with those items. Our system has been deployed in real production scenarios with preliminary but promising results. Several metrics are used to track the effectiveness of our approach, such as content coverage, category diversity, and intra-list similarity. In both scenarios tested, the system shows consistent improvement and adaptability over time, reinforcing its applicability.
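The reward described in this abstract, a weighted sum of a Click Distribution (CD) term and a Dwell Time (DT) term, might be sketched roughly as follows. The abstract does not give the exact formula, so everything specific here is an assumption made for illustration: summarising CD as the normalised entropy of clicks over the recommended items, normalising DT by a cap of 300 seconds, the equal default weights, and all function and parameter names.

```python
import math


def click_distribution_score(clicks):
    """Hypothetical CD summary: normalised entropy of clicks across the
    recommended items (1.0 = clicks spread evenly, 0.0 = all on one item)."""
    total = sum(clicks)
    if total == 0 or len(clicks) < 2:
        return 0.0
    probs = [c / total for c in clicks if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(clicks))


def dwell_time_score(dwell_seconds, cap_seconds=300.0):
    """Hypothetical DT summary: mean dwell time, clipped to [0, cap] and
    scaled to [0, 1]. The cap value is an assumption, not from the paper."""
    if not dwell_seconds:
        return 0.0
    clipped = [min(max(t, 0.0), cap_seconds) for t in dwell_seconds]
    return sum(clipped) / (len(clipped) * cap_seconds)


def reward(clicks, dwell_seconds, w_cd=0.5, w_dt=0.5):
    """Weighted sum of the CD and DT terms; the weights are assumed values."""
    return (w_cd * click_distribution_score(clicks)
            + w_dt * dwell_time_score(dwell_seconds))
```

In a setup like this, raising `w_cd` would push the agent towards recommendation lists whose clicks are spread over more items, while raising `w_dt` would favour items that hold attention longer. How the authors actually balance the two terms is not stated in the abstract.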


2025

Transformer-Based Models for Probabilistic Time Series Forecasting with Explanatory Variables

Authors
Caetano, R; Oliveira, JM; Ramos, P;

Publication
MATHEMATICS

Abstract
Accurate demand forecasting is essential for retail operations as it directly impacts supply chain efficiency, inventory management, and financial performance. However, forecasting retail time series presents significant challenges due to their irregular patterns, hierarchical structures, and strong dependence on external factors such as promotions, pricing strategies, and socio-economic conditions. This study evaluates the effectiveness of Transformer-based architectures, specifically Vanilla Transformer, Informer, Autoformer, ETSformer, NSTransformer, and Reformer, for probabilistic time series forecasting in retail. A key focus is the integration of explanatory variables, such as calendar-related indicators, selling prices, and socio-economic factors, which play a crucial role in capturing demand fluctuations. This study assesses how incorporating these variables enhances forecast accuracy, addressing a research gap in the comprehensive evaluation of explanatory variables within multiple Transformer-based models. Empirical results, based on the M5 dataset, show that incorporating explanatory variables generally improves forecasting performance. Models leveraging these variables achieve up to a 12.4% reduction in Normalized Root Mean Squared Error (NRMSE) and a 2.9% improvement in Mean Absolute Scaled Error (MASE) compared to models that rely solely on past sales. Furthermore, probabilistic forecasting enhances decision-making by quantifying uncertainty, providing more reliable demand predictions for risk management. These findings underscore the effectiveness of Transformer-based models in retail forecasting and emphasize the importance of integrating domain-specific explanatory variables to achieve more accurate, context-aware predictions in dynamic retail environments.
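For readers unfamiliar with the two error measures named above, a minimal sketch of how they are commonly computed is given here. The abstract does not say which normalisation the authors use for NRMSE, so dividing by the mean of the actual values is an assumption (dividing by the range is another common choice). The MASE shown scales by the in-sample error of a naive forecast with lag `m`, and the function names are illustrative.

```python
import math


def nrmse(actual, forecast):
    """RMSE divided by the mean of the actual values (assumed normalisation)."""
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    return rmse / (sum(actual) / n)


def mase(train, actual, forecast, m=1):
    """Out-of-sample MAE scaled by the in-sample MAE of a naive forecast
    that repeats the value observed m steps earlier."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(train[i] - train[i - m]) for i in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale


# Toy numbers, not taken from the M5 dataset.
train = [10.0, 12.0, 14.0, 16.0]
actual = [18.0, 20.0]
forecast = [17.0, 21.0]
print(nrmse(actual, forecast), mase(train, actual, forecast))
```

With these toy numbers the MASE is 0.5, meaning the forecast's mean absolute error is half that of the in-sample naive forecast. Values below 1 indicate an improvement over that naive baseline.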


2025

Deep Learning-Driven Integration of Multimodal Data for Material Property Predictions

Authors
Costa, V; Oliveira, JM; Ramos, P;

Publication

Abstract
This study investigates the integration of deep learning for single-modality and multimodal data within materials science. Traditional methods for materials discovery are often resource-intensive and slow, prompting the exploration of machine learning to streamline the prediction of material properties. While single-modality models have been effective, they often miss the complexities inherent in material data. The paper explores multimodal data integration, combining text, images, and tabular data, and demonstrates its potential to improve predictive accuracy. Utilizing the Alexandria dataset, the research introduces a custom methodology involving multimodal data creation, model tuning with the AutoGluon framework, and evaluation through targeted fusion techniques. Results reveal that multimodal approaches enhance predictive accuracy and efficiency, particularly when text and image data are integrated. However, challenges remain in predicting complex features like band gaps. Future directions include incorporating new data types and refining specialized models to improve materials discovery and innovation.
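The abstract does not specify which fusion techniques were evaluated. As a generic illustration of one common option, late fusion, the sketch below combines per-modality predictions of a material property with a weighted average. It is not the authors' method and does not use AutoGluon. The modality names, weights, and numbers are hypothetical.

```python
def late_fusion(predictions, weights):
    """Weighted average of per-modality predictions, sample by sample.

    predictions: dict mapping a modality name to a list of predicted values
    weights:     dict mapping the same modality names to non-negative weights
    """
    lengths = {len(values) for values in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must predict the same samples")
    n_samples = lengths.pop()
    total_weight = sum(weights[name] for name in predictions)
    return [
        sum(weights[name] * predictions[name][i] for name in predictions)
        / total_weight
        for i in range(n_samples)
    ]


# Hypothetical per-modality predictions for two materials.
per_modality = {
    "text": [1.0, 2.0],
    "image": [3.0, 4.0],
    "tabular": [2.0, 3.0],
}
fused = late_fusion(per_modality, {"text": 2.0, "image": 1.0, "tabular": 1.0})
print(fused)
```

Late fusion keeps each single-modality model independent, which makes it simple to test how much each modality contributes. Approaches that merge features earlier in the network can capture cross-modal interactions that this averaging cannot.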
