2023
Autores
da Costa, TS; Andrade, MT; Viana, P; Silva, NC;
Publicação
DATA IN BRIEF
Abstract
The Data2MV dataset contains gaze fixation data obtained through experimental procedures from a total of 45 participants using an Intel RealSense F200 camera module and seven different video playlists. Each of the playlists had an approximate duration of 20 minutes and was viewed at least 17 times, with raw tracking data being recorded with a 0.05 second interval. The Data2MV dataset encompasses a total of 1.0 0 0.845 gaze fixations, gathered across a total of 128 experiments. It is also composed of 68.393 image frames, extracted from each of the 6 videos selected for these experiments, and an equal quantity of saliency maps, generated from aggregate fixation data. Software tools to obtain saliency maps and generate complementary plots are also provided as an open source software package. The Data2MV dataset was publicly released to the research community on Mendeley Data and constitutes an important contribution to reduce the current scarcity of such data, particularly in immersive, multi-view streaming scenarios. (c) 2023 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
2024
Autores
Pereira, B; Cunha, B; Viana, P; Lopes, M; Melo, ASC; Sousa, ASP;
Publicação
SENSORS
Abstract
Shoulder rehabilitation is a process that requires physical therapy sessions to recover the mobility of the affected limbs. However, these sessions are often limited by the availability and cost of specialized technicians, as well as the patient's travel to the session locations. This paper presents a novel smartphone-based approach using a pose estimation algorithm to evaluate the quality of the movements and provide feedback, allowing patients to perform autonomous recovery sessions. This paper reviews the state of the art in wearable devices and camera-based systems for human body detection and rehabilitation support and describes the system developed, which uses MediaPipe to extract the coordinates of 33 key points on the patient's body and compares them with reference videos made by professional physiotherapists using cosine similarity and dynamic time warping. This paper also presents a clinical study that uses QTM, an optoelectronic system for motion capture, to validate the methods used by the smartphone application. The results show that there are statistically significant differences between the three methods for different exercises, highlighting the importance of selecting an appropriate method for specific exercises. This paper discusses the implications and limitations of the findings and suggests directions for future research.
2024
Autores
Vilça, L; Viana, P; Carvalho, P; Andrade, MT;
Publicação
IEEE ACCESS
Abstract
It is well known that the performance of Machine Learning techniques, notably when applied to Computer Vision (CV), depends heavily on the amount and quality of the training data set. However, large data sets lead to time-consuming training loops and, in many situations, are difficult or even impossible to create. Therefore, there is a need for solutions to reduce their size while ensuring good levels of performance, i.e., solutions that obtain the best tradeoff between the amount/quality of training data and the model's performance. This paper proposes a dataset reduction approach for training data used in Deep Learning methods in Facial Recognition (FR) problems. We focus on maximizing the variability of representations for each subject (person) in the training data, thus favoring quality instead of size. The main research questions are: 1) Which facial features better discriminate different identities? 2) Will it be possible to significantly reduce the training time without compromising performance? 3) Should we favor quality over quantity for very large datasets in FR? This analysis uses a pipeline to discriminate a set of features suitable for capturing the diversity and a cluster-based sampling to select the best images for each training subject, i.e., person. Results were obtained using VGGFace2 and Labeled Faces in the Wild (for benchmarking) and show that, with the proposed approach, a data reduction is possible while ensuring similar levels of accuracy.
2024
Autores
Sulun, S; Viana, P; Davies, MEP;
Publicação
EXPERT SYSTEMS WITH APPLICATIONS
Abstract
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models. These models extract high-level features related to visual scenery, objects, characters, text, speech, music, and audio effects. To intelligently fuse these pretrained features, we train small classifier models with low time and memory requirements. Employing the transformer model, our approach utilizes all video and audio frames of movie trailers without performing any temporal pooling, efficiently exploiting the correspondence between all elements, as opposed to the fixed and low number of frames typically used by traditional methods. Our approach fuses features originating from different tasks and modalities, with different dimensionalities, different temporal lengths, and complex dependencies as opposed to current approaches. Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP). To foster future research, we make the pretrained features for the entire MovieNet dataset, along with our genre classification code and the trained models, publicly available.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.