Publications - INESC TEC


Publications by Paula Viana

2023

Data2MV - A user behaviour dataset for multi-view scenarios

Authors
da Costa, TS; Andrade, MT; Viana, P; Silva, NC;

Published in
DATA IN BRIEF

Abstract
The Data2MV dataset contains gaze fixation data obtained through experimental procedures from a total of 45 participants using an Intel RealSense F200 camera module and seven different video playlists. Each playlist lasted approximately 20 minutes and was viewed at least 17 times, with raw tracking data recorded at 0.05-second intervals. The Data2MV dataset encompasses a total of 1,000,845 gaze fixations, gathered across 128 experiments. It also includes 68,393 image frames, extracted from each of the 6 videos selected for these experiments, and an equal number of saliency maps, generated from aggregate fixation data. Software tools to obtain saliency maps and generate complementary plots are also provided as an open-source software package. The Data2MV dataset was publicly released to the research community on Mendeley Data and constitutes an important contribution to reducing the current scarcity of such data, particularly in immersive, multi-view streaming scenarios. © 2023 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
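
The abstract states that saliency maps are generated from aggregate fixation data. As a rough sketch only, assuming the common approach of accumulating fixations on a frame grid and smoothing each one with a Gaussian, the aggregation could look like the code below; the `saliency_map` name and the `sigma` value are hypothetical and not taken from the Data2MV tools. For scale, a 20-minute playlist sampled every 0.05 s yields about 20 × 60 / 0.05 = 24,000 raw samples per viewing.

```python
import math


def saliency_map(fixations, width, height, sigma=2.0):
    """Build a normalized saliency map for one frame from (x, y) fixations.

    Assumption: every fixation adds an isotropic Gaussian of standard
    deviation `sigma` pixels; the Data2MV tools may use other parameters.
    """
    grid = [[0.0] * width for _ in range(height)]
    radius = int(math.ceil(3 * sigma))  # ignore contributions beyond 3 sigma
    for fx, fy in fixations:
        for y in range(max(0, fy - radius), min(height, fy + radius + 1)):
            for x in range(max(0, fx - radius), min(width, fx + radius + 1)):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        # Scale so the most-fixated location has value 1.0
        grid = [[v / peak for v in row] for row in grid]
    return grid


# Two viewers fixate (2, 2) and one fixates (7, 7) on a 10x10 frame.
m = saliency_map([(2, 2), (2, 2), (7, 7)], width=10, height=10)
print(m[2][2], round(m[7][7], 2))
```

Running it prints `1.0 0.5`: the location fixated twice is normalized to 1.0 and the location fixated once to roughly half of that.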


2024

A Machine Learning App for Monitoring Physical Therapy at Home

Authors
Pereira, B; Cunha, B; Viana, P; Lopes, M; Melo, ASC; Sousa, ASP;

Published in
SENSORS

Abstract
Shoulder rehabilitation is a process that requires physical therapy sessions to recover the mobility of the affected limbs. However, these sessions are often limited by the availability and cost of specialized technicians, as well as by the need for patients to travel to the session locations. This paper presents a novel smartphone-based approach that uses a pose estimation algorithm to evaluate the quality of movements and provide feedback, allowing patients to perform recovery sessions autonomously. It reviews the state of the art in wearable devices and camera-based systems for human body detection and rehabilitation support, and describes the developed system, which uses MediaPipe to extract the coordinates of 33 key points on the patient's body and compares them with reference videos recorded by professional physiotherapists using cosine similarity and dynamic time warping. The paper also presents a clinical study that uses QTM, an optoelectronic motion capture system, to validate the methods used by the smartphone application. The results show statistically significant differences between the three methods for different exercises, highlighting the importance of selecting an appropriate method for each exercise. Finally, the paper discusses the implications and limitations of the findings and suggests directions for future research.
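
The comparison step can be illustrated with a minimal sketch, assuming each frame is the 33 MediaPipe key points flattened into one vector of (x, y) values and that DTW uses 1 minus cosine similarity as its local cost. The function names and this cost choice are assumptions, not the app's published code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened pose vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dtw_distance(patient, reference):
    """Dynamic time warping with (1 - cosine similarity) as the local cost.

    Each sequence is a list of frames; each frame is a flattened list of
    key-point coordinates (e.g. 33 points x 2 = 66 values).
    """
    n, m = len(patient), len(reference)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - cosine_similarity(patient[i - 1], reference[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


reference = [[1.0, 0.0], [0.0, 1.0]]
slower = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # same motion, first pose held longer
reversed_order = [[0.0, 1.0], [1.0, 0.0]]        # same poses in the wrong order
print(dtw_distance(slower, reference), dtw_distance(reversed_order, reference))
```

The time-stretched copy scores 0.0 while the reordered poses score 2.0, so the sketch prints `0.0 2.0`. Because raw image coordinates are all positive, a real pipeline would likely express key points relative to a body-centred origin before computing cosine similarity; that normalization is omitted here.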


2024

Improving Efficiency in Facial Recognition Tasks Through a Dataset Optimization Approach

Authors
Vilça, L; Viana, P; Carvalho, P; Andrade, MT;

Published in
IEEE ACCESS

Abstract
It is well known that the performance of Machine Learning techniques, notably when applied to Computer Vision (CV), depends heavily on the amount and quality of the training dataset. However, large datasets lead to time-consuming training loops and, in many situations, are difficult or even impossible to create. There is therefore a need for solutions that reduce dataset size while ensuring good levels of performance, i.e., solutions that achieve the best tradeoff between the amount/quality of training data and the model's performance. This paper proposes a dataset reduction approach for training data used in Deep Learning methods for Facial Recognition (FR) problems. We focus on maximizing the variability of representations for each subject (person) in the training data, thus favoring quality over size. The main research questions are: 1) Which facial features better discriminate different identities? 2) Is it possible to significantly reduce training time without compromising performance? 3) Should we favor quality over quantity for very large datasets in FR? The analysis uses a pipeline to identify a set of features suitable for capturing this diversity and a cluster-based sampling step to select the best images for each training subject. Results were obtained using VGGFace2 and Labeled Faces in the Wild (for benchmarking) and show that, with the proposed approach, the training data can be reduced while maintaining similar levels of accuracy.
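
The cluster-based sampling step can be sketched for a single subject as k-means over per-image feature vectors, keeping the real image nearest each centroid. This is an illustration under stated assumptions: the feature vectors, the farthest-point initialization, and the `select_representatives` helper are hypothetical, and the paper's own feature pipeline is not reproduced.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def select_representatives(features, k, iterations=20):
    """Return sorted indices of at most k images (one per cluster) for one subject.

    features: per-image feature vectors (hypothetical, e.g. head pose and
    illumination descriptors).
    """
    if k >= len(features):
        return list(range(len(features)))
    # Farthest-point initialization keeps the sketch deterministic.
    centroids = [features[0]]
    while len(centroids) < k:
        centroids.append(max(features, key=lambda f: min(_dist2(f, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda c: _dist2(f, centroids[c]))
            clusters[nearest].append(f)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [_mean(cl) if cl else centroids[i] for i, cl in enumerate(clusters)]
    chosen = []
    for c in centroids:
        # Keep the real image closest to each centroid.
        idx = min(range(len(features)), key=lambda i: _dist2(features[i], c))
        if idx not in chosen:
            chosen.append(idx)
    return sorted(chosen)


features = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]]
print(select_representatives(features, 2))
```

With two well-separated groups of images, one image per group is kept and the call prints `[0, 3]`. Keeping the image closest to each centroid, rather than a synthetic average, means every selected sample is a real training image.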


2024

Movie trailer genre classification using multimodal pretrained features

Authors
Sulun, S; Viana, P; Davies, MEP;

Published in
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models. These models extract high-level features related to visual scenery, objects, characters, text, speech, music, and audio effects. To intelligently fuse these pretrained features, we train small classifier models with low time and memory requirements. Employing a transformer model, our approach uses all video and audio frames of movie trailers without performing any temporal pooling, efficiently exploiting the correspondence between all elements, as opposed to the small, fixed number of frames typically used by traditional methods. Unlike current approaches, our method fuses features originating from different tasks and modalities, with different dimensionalities, different temporal lengths, and complex dependencies. Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP). To foster future research, we make the pretrained features for the entire MovieNet dataset, along with our genre classification code and the trained models, publicly available.
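
The fusion idea (one token per frame from every modality, no temporal pooling, different input dimensionalities) can be shown with a toy single attention step. This is only a sketch under heavy assumptions: the weights are random rather than trained, and the [CLS] token, the modality embeddings, and the `d_model` and `n_genres` values are hypothetical; the paper's actual transformer architecture is not reproduced.

```python
import math
import random


def linear(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows x d_out cols)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(modalities, d_model=8, n_genres=4, seed=0):
    """Return one probability per genre for a trailer (multi-label).

    modalities: list of feature sequences; each sequence is a list of frames
    and may have its own length and feature dimension.
    """
    rng = random.Random(seed)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)]]  # [CLS] token
    for frames in modalities:
        proj = rand_matrix(len(frames[0]), d_model, rng)         # per-modality projection
        mod_emb = [rng.gauss(0.0, 1.0) for _ in range(d_model)]  # modality embedding
        # Every frame becomes a token: no temporal pooling.
        tokens.extend([a + b for a, b in zip(linear(f, proj), mod_emb)] for f in frames)
    wq, wk, wv = (rand_matrix(d_model, d_model, rng) for _ in range(3))
    query = linear(tokens[0], wq)  # only the [CLS] output feeds the classifier
    keys = [linear(t, wk) for t in tokens]
    values = [linear(t, wv) for t in tokens]
    scale = math.sqrt(d_model)
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    cls_out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(d_model)]
    logits = linear(cls_out, rand_matrix(d_model, n_genres, rng))
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


video = [[0.1 * t, 0.2, -0.3] for t in range(5)]  # 5 frames, 3-dim features
audio = [[0.05 * t, 1.0] for t in range(7)]       # 7 frames, 2-dim features
probs = fuse_and_classify([video, audio])
print([round(p, 3) for p in probs])
```

Assuming multi-label classification (a trailer can carry several genres), the sketch applies a sigmoid per genre instead of a softmax over genres. With untrained weights the printed probabilities carry no meaning; only the data flow is illustrative.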


2024

CONVERGE: A Vision-Radio Research Infrastructure Towards 6G and Beyond

Authors
Teixeira, FB; Ricardo, M; Coelho, A; Oliveira, HP; Viana, P; Paulino, N; Fontes, H; Marques, P; Campos, R; Pessoa, LM;

Published in
2024 JOINT EUROPEAN CONFERENCE ON NETWORKS AND COMMUNICATIONS & 6G SUMMIT, EUCNC/6G SUMMIT 2024

Abstract
Telecommunications and computer vision have evolved separately so far. Yet, with the shift to sub-terahertz (sub-THz) and terahertz (THz) radio communications, there is an opportunity to explore computer vision technologies together with radio communications, given that both technologies depend on Line of Sight. The combination of radio sensing and computer vision can address challenges such as obstructions and poor lighting. Also, machine learning algorithms, capable of processing multimodal data, play a crucial role in deriving insights from raw and low-level sensing data, offering a new level of abstraction that can enhance various applications and use cases such as beamforming and terminal handovers. This paper introduces CONVERGE, a pioneering vision-radio paradigm that bridges this gap by leveraging Integrated Sensing and Communication (ISAC) to facilitate a dual View-to-Communicate, Communicate-to-View approach. CONVERGE offers tools that merge wireless communications and computer vision, establishing a novel Research Infrastructure (RI) that will be open to the scientific community and capable of providing open datasets. This new infrastructure will support future research in 6G and beyond concerning multiple verticals, such as telecommunications, automotive, manufacturing, media, and health.


2024

Enhancing Indoor Localisation: a Bluetooth Low Energy (BLE) Beacon Placement approach

Authors
Dias, J; Oliper, D; Soares, MR; Viana, P;

Published in
2024 IEEE 22ND MEDITERRANEAN ELECTROTECHNICAL CONFERENCE, MELECON 2024

Abstract
This paper addresses the critical challenge of optimising beacon placement to support indoor location services and proposes a methodology to optimise Base Station (BS) coverage while maintaining or even improving system precision. The algorithm takes the building schematics as input and accounts for several factors that affect radio link range (material attenuation, Line of Sight (LOS) conditions, transmitted power, and receiver sensitivity). The outcome is presented as a coverage heat map, which is then compared with a standard layout that follows existing expert guidelines to evaluate the efficacy of the proposed layout.
