Publications

Publications by Pedro Miguel Carvalho

2010

Hybrid framework for evaluating video object tracking algorithms

Authors
Carvalho, P; Cardoso, JS; Corte Real, L;

Publication
ELECTRONICS LETTERS

Abstract
A simple and efficient hybrid framework for evaluating algorithms for tracking objects in video sequences is presented. The framework unifies state-of-the-art evaluation metrics with diverse requirements in terms of reference information, thus overcoming weaknesses of individual approaches. With foundations on already demonstrated and well known metrics, this framework assumes the role of a flexible and powerful tool for the research community to assess and compare algorithms.


2009

Partition-distance methods for assessing spatial segmentations of images and videos

Authors
Cardoso, JS; Carvalho, P; Teixeira, LF; Corte Real, L;

Publication
COMPUTER VISION AND IMAGE UNDERSTANDING

Abstract
The primary goal of the research on image segmentation is to produce better segmentation algorithms. In spite of almost 50 years of research and development in this field, the general problem of splitting an image into meaningful regions remains unsolved. New and emerging techniques are constantly being applied with reduced success. The design of each of these new segmentation algorithms requires spending careful attention judging the effectiveness of the technique. This paper demonstrates how the proposed methodology is well suited to perform a quantitative comparison between image segmentation algorithms using a ground-truth segmentation. It consists of a general framework already partially proposed in the literature, but dispersed over several works. The framework is based on the principle of eliminating the minimum number of elements such that a specified condition is met. This rule translates directly into a global optimization procedure and the intersection-graph between two partitions emerges as the natural tool to solve it. The objective of this paper is to summarize, aggregate and extend the dispersed work. The principle is clarified, presented stripped of unnecessary supports and extended to sequences of images. Our study shows that the proposed framework for segmentation performance evaluation is simple, general and mathematically sound.
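
The "eliminate the minimum number of elements" principle in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `partition_distance`, the flat label-list input, and the brute-force search over region matchings are all assumptions made for illustration, and the brute force is only practical for a handful of regions. The sketch builds the table of region intersections between two labelings and finds the one-to-one region matching that keeps the most elements; every element outside that matching has to be removed.

```python
from itertools import permutations


def partition_distance(labels_a, labels_b):
    """Minimum number of elements to remove so that both partitions coincide.

    Hypothetical helper: labels_a and labels_b give a region label per element
    (e.g. per pixel) for the two segmentations being compared.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same elements")
    if not labels_a:
        return 0

    regions_a = sorted(set(labels_a))
    regions_b = sorted(set(labels_b))
    index_a = {r: i for i, r in enumerate(regions_a)}
    index_b = {r: j for j, r in enumerate(regions_b)}

    # Intersection table: overlap[i][j] counts the elements shared by
    # region i of the first partition and region j of the second.
    overlap = [[0] * len(regions_b) for _ in regions_a]
    for a, b in zip(labels_a, labels_b):
        overlap[index_a[a]][index_b[b]] += 1

    # Put the partition with fewer regions on the rows so that every row
    # can be matched to a distinct column.
    if len(overlap) > len(overlap[0]):
        overlap = [list(col) for col in zip(*overlap)]
    n_rows, n_cols = len(overlap), len(overlap[0])

    # Brute-force the best one-to-one matching (assumption: few regions).
    kept = max(
        sum(overlap[i][cols[i]] for i in range(n_rows))
        for cols in permutations(range(n_cols), n_rows)
    )
    return len(labels_a) - kept


reference = [0, 0, 0, 1, 1, 1]
result = [0, 0, 1, 1, 1, 1]
print(partition_distance(reference, result))  # one misassigned element
```

For realistic numbers of regions, the exhaustive search would normally be replaced by a polynomial-time assignment solver; the brute force is used here only to keep the sketch dependency-free.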


2023

From a Visual Scene to a Virtual Representation: A Cross-Domain Review

Authors
Pereira, A; Carvalho, P; Pereira, N; Viana, P; Corte-Real, L;

Publication
IEEE ACCESS

Abstract
The widespread use of smartphones and other low-cost equipment as recording devices, the massive growth in bandwidth, and the ever-growing demand for new applications with enhanced capabilities, made visual data a must in several scenarios, including surveillance, sports, retail, entertainment, and intelligent vehicles. Despite significant advances in analyzing and extracting data from images and video, there is a lack of solutions able to analyze and semantically describe the information in the visual scene so that it can be efficiently used and repurposed. Scientific contributions have focused on individual aspects or addressing specific problems and application areas, and no cross-domain solution is available to implement a complete system that enables information passing between cross-cutting algorithms. This paper analyses the problem from an end-to-end perspective, i.e., from the visual scene analysis to the representation of information in a virtual environment, including how the extracted data can be described and stored. A simple processing pipeline is introduced to set up a structure for discussing challenges and opportunities in different steps of the entire process, allowing to identify current gaps in the literature. The work reviews various technologies specifically from the perspective of their applicability to an end-to-end pipeline for scene analysis and synthesis, along with an extensive analysis of datasets for relevant tasks.


2023

Unveiling the performance of video anomaly detection models - A benchmark-based review

Authors
Caetano, F; Carvalho, P; Cardoso, JS;

Publication
Intell. Syst. Appl.

Abstract
Deep learning has recently gained popularity in the field of video anomaly detection, with the development of various methods for identifying abnormal events in visual data. The growing need for automated systems to monitor video streams for anomalies, such as security breaches and violent behaviours in public areas, requires the development of robust and reliable methods. As a result, there is a need to provide tools to objectively evaluate and compare the real-world performance of different deep learning methods to identify the most effective approach for video anomaly detection. Current state-of-the-art metrics favour weakly-supervised strategies stating these as the best-performing approaches for the task. However, the area under the ROC curve, used to justify this statement, has been shown to be an unreliable metric for highly unbalanced data distributions, as is the case with anomaly detection datasets. This paper provides a new perspective and insights on the performance of video anomaly detection methods. It reports the results of a benchmark study with state-of-the-art methods using a novel proposed framework for evaluating and comparing the different models. The results of this benchmark demonstrate that using the currently employed set of reference metrics led to the misconception that weakly-supervised methods consistently outperform semi-supervised ones.
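
The point about the area under the ROC curve on highly unbalanced data can be illustrated with a toy example. The sketch below is not the benchmark framework from the paper: the synthetic scores, the 990/10 class split, and the choice of average precision as the contrasting metric are assumptions made purely for illustration. It shows a detector whose ROC AUC looks strong even though most of its top-ranked frames are false alarms.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairwise wins, with ties counted as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values measured at each positive in the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Synthetic, hypothetical data: 990 normal frames with evenly spread scores
# and 10 anomalous frames that all score 0.9505.
normal_scores = [i / 1000 for i in range(990)]
anomaly_scores = [0.9505] * 10
scores = normal_scores + anomaly_scores
labels = [0] * 990 + [1] * 10

print(f"ROC AUC:           {roc_auc(scores, labels):.3f}")
print(f"Average precision: {average_precision(scores, labels):.3f}")
```

In this toy setting, 39 normal frames score above every anomaly. That costs little in ROC AUC because the negatives are so numerous, but it pulls average precision down sharply, which is the kind of discrepancy the abstract warns about.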


2023

Synthesizing Human Activity for Data Generation

Authors
Romero, A; Carvalho, P; Corte-Real, L; Pereira, A;

Publication
JOURNAL OF IMAGING

Abstract
The problem of gathering sufficiently representative data, such as those about human actions, shapes, and facial expressions, is costly and time-consuming and also requires training robust models. This has led to the creation of techniques such as transfer learning or data augmentation. However, these are often insufficient. To address this, we propose a semi-automated mechanism that allows the generation and editing of visual scenes with synthetic humans performing various actions, with features such as background modification and manual adjustments of the 3D avatars to allow users to create data with greater variability. We also propose an evaluation methodology for assessing the results obtained using our method, which is two-fold: (i) the usage of an action classifier on the output data resulting from the mechanism and (ii) the generation of masks of the avatars and the actors to compare them through segmentation. The avatars were robust to occlusion, and their actions were recognizable and accurate to their respective input actors. The results also showed that even though the action classifier concentrates on the pose and movement of the synthetic humans, it strongly depends on contextual information to precisely recognize the actions. Generating the avatars for complex activities also proved problematic for action recognition and the clean and precise formation of the masks.


2024

Improving Efficiency in Facial Recognition Tasks Through a Dataset Optimization Approach

Authors
Vilça, L; Viana, P; Carvalho, P; Andrade, MT;

Publication
IEEE ACCESS

Abstract
It is well known that the performance of Machine Learning techniques, notably when applied to Computer Vision (CV), depends heavily on the amount and quality of the training data set. However, large data sets lead to time-consuming training loops and, in many situations, are difficult or even impossible to create. Therefore, there is a need for solutions to reduce their size while ensuring good levels of performance, i.e., solutions that obtain the best tradeoff between the amount/quality of training data and the model's performance. This paper proposes a dataset reduction approach for training data used in Deep Learning methods in Facial Recognition (FR) problems. We focus on maximizing the variability of representations for each subject (person) in the training data, thus favoring quality instead of size. The main research questions are: 1) Which facial features better discriminate different identities? 2) Will it be possible to significantly reduce the training time without compromising performance? 3) Should we favor quality over quantity for very large datasets in FR? This analysis uses a pipeline to discriminate a set of features suitable for capturing the diversity and a cluster-based sampling to select the best images for each training subject, i.e., person. Results were obtained using VGGFace2 and Labeled Faces in the Wild (for benchmarking) and show that, with the proposed approach, a data reduction is possible while ensuring similar levels of accuracy.
