Pedro Miguel Carvalho

INESC TEC: Porto, Portugal | +351 222 094 000 | info@inesctec.pt

I am from the district of Porto. I received my degree (Licenciatura) in Electrical and Computer Engineering in 2001, my Master's degree in Communication Networks and Services in 2004 and my PhD in Electrical and Computer Engineering in 2012, all from the Faculdade de Engenharia da Universidade do Porto (FEUP). I have been with INESC TEC since 2001, where I am a Senior Researcher at the Centro de Telecomunicações e Multimédia. I am also an Invited Adjunct Professor in the Department of Electrical Engineering at the Instituto Superior de Engenharia do Porto (ISEP). My current research interests include image and video processing, multimedia systems and computer vision.

Details

Name: Pedro Miguel Carvalho
Position: Senior Researcher
Since: 1 September 2001
Nationality: Portugal
Centre: Centro de Telecomunicações e Multimédia
Contacts: +351222094299, pedro.m.carvalho@inesctec.pt

Publications

2024

Improving Efficiency in Facial Recognition Tasks Through a Dataset Optimization Approach

Authors
Vilça, L; Viana, P; Carvalho, P; Andrade, MT;

Publication
IEEE ACCESS

Abstract
It is well known that the performance of Machine Learning techniques, notably when applied to Computer Vision (CV), depends heavily on the amount and quality of the training data set. However, large data sets lead to time-consuming training loops and, in many situations, are difficult or even impossible to create. Therefore, there is a need for solutions to reduce their size while ensuring good levels of performance, i.e., solutions that obtain the best tradeoff between the amount/quality of training data and the model's performance. This paper proposes a dataset reduction approach for training data used in Deep Learning methods in Facial Recognition (FR) problems. We focus on maximizing the variability of representations for each subject (person) in the training data, thus favoring quality instead of size. The main research questions are: 1) Which facial features better discriminate different identities? 2) Will it be possible to significantly reduce the training time without compromising performance? 3) Should we favor quality over quantity for very large datasets in FR? This analysis uses a pipeline to discriminate a set of features suitable for capturing the diversity and a cluster-based sampling to select the best images for each training subject, i.e., person. Results were obtained using VGGFace2 and Labeled Faces in the Wild (for benchmarking) and show that, with the proposed approach, a data reduction is possible while ensuring similar levels of accuracy.


2024

A Transition Towards Virtual Representations of Visual Scenes

Authors
Pereira, A; Carvalho, P; Côrte Real, L;

Publication
Advances in Internet of Things & Embedded Systems

Abstract
We propose a unified architecture for visual scene understanding, aimed at overcoming the limitations of traditional, fragmented approaches in computer vision. Our work focuses on creating a system that accurately and coherently interprets visual scenes, with the ultimate goal to provide a 3D virtual representation, which is particularly useful for applications in virtual and augmented reality. By integrating various visual and semantic processing tasks into a single, adaptable framework, our architecture simplifies the design process, ensuring a seamless and consistent scene interpretation. This is particularly important in complex systems that rely on 3D synthesis, as the need for precise and semantically coherent scene descriptions keeps on growing. Our unified approach addresses these challenges, offering a flexible and efficient solution. We demonstrate the practical effectiveness of our architecture through a proof-of-concept system and explore its potential in various application domains, proving its value in advancing the field of computer vision.


2023

Benchmarking edge computing devices for grape bunches and trunks detection using accelerated object detection single shot multibox deep learning models

Authors
Magalhaes, SC; dos Santos, FN; Machado, P; Moreira, AP; Dias, J;

Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Abstract
Purpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually time-expensive and require powerful devices to process the visual data in real-time, which is unfeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for object detection in real-time. This research benchmarks three architectures: embedded GPU-Graphical Processing Units (such as NVIDIA Jetson Nano 2 GB and 4 GB, and NVIDIA Jetson TX2), TPU-Tensor Processing Unit (such as Coral Dev Board TPU), and DPU-Deep Learning Processor Unit (such as in AMD-Xilinx ZCU104 Development Board, and AMD-Xilinx Kria KV260 Starter Kit). Methods: The authors used the RetinaNet ResNet-50 fine-tuned using the natural VineSet dataset. Afterwards, the trained model was converted and compiled for target-specific hardware formats to improve the execution efficiency. Conclusions and Results: The platforms were assessed in terms of performance of the evaluation metrics and efficiency (time of inference). Graphical Processing Units (GPUs) were the slowest devices, running at 3 FPS to 5 FPS, and Field Programmable Gate Arrays (FPGAs) were the fastest devices, running at 14 FPS to 25 FPS. The efficiency of the Tensor Processing Unit (TPU) is irrelevant and similar to NVIDIA Jetson TX2. TPU and GPU are the most power-efficient, consuming about 5 W. The performance differences, in the evaluation metrics, across devices are irrelevant and have an F1 of about 70 % and mean Average Precision (mAP) of about 60 %.


2023

A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition

Authors
Guimaraes, V; Nascimento, J; Viana, P; Carvalho, P;

Publication
APPLIED SCIENCES-BASEL

Abstract
When compared with traditional local shops where the customer has a personalised service, in large retail departments, the client has to make his purchase decisions independently, mostly supported by the information available in the package. Additionally, people are becoming more aware of the importance of the food ingredients and demanding about the type of products they buy and the information provided in the package, despite it often being hard to interpret. Big shops such as supermarkets have also introduced important challenges for the retailer due to the large number of different products in the store, heterogeneous affluence and the daily needs of item repositioning. In this scenario, the automatic detection and recognition of products on the shelves or off the shelves has gained increased interest as the application of these technologies may improve the shopping experience through self-assisted shopping apps and autonomous shopping, or even benefit stock management with real-time inventory, automatic shelf monitoring and product tracking. These solutions can also have an important impact on customers with visual impairments. Despite recent developments in computer vision, automatic grocery product recognition is still very challenging, with most works focusing on the detection or recognition of a small number of products, often under controlled conditions. This paper discusses the challenges related to this problem and presents a review of proposed methods for retail product label processing, with a special focus on assisted analysis for customer support, including for the visually impaired. Moreover, it details the public datasets used in this topic and identifies their limitations, and discusses future research directions of related fields.


2023

From a Visual Scene to a Virtual Representation: A Cross-Domain Review

Authors
Pereira, A; Carvalho, P; Pereira, N; Viana, P; Corte-Real, L;

Publication
IEEE ACCESS

Abstract
The widespread use of smartphones and other low-cost equipment as recording devices, the massive growth in bandwidth, and the ever-growing demand for new applications with enhanced capabilities, made visual data a must in several scenarios, including surveillance, sports, retail, entertainment, and intelligent vehicles. Despite significant advances in analyzing and extracting data from images and video, there is a lack of solutions able to analyze and semantically describe the information in the visual scene so that it can be efficiently used and repurposed. Scientific contributions have focused on individual aspects or addressing specific problems and application areas, and no cross-domain solution is available to implement a complete system that enables information passing between cross-cutting algorithms. This paper analyses the problem from an end-to-end perspective, i.e., from the visual scene analysis to the representation of information in a virtual environment, including how the extracted data can be described and stored. A simple processing pipeline is introduced to set up a structure for discussing challenges and opportunities in different steps of the entire process, allowing to identify current gaps in the literature. The work reviews various technologies specifically from the perspective of their applicability to an end-to-end pipeline for scene analysis and synthesis, along with an extensive analysis of datasets for relevant tasks.


Supervised theses

2023

Image Processing of Grocery Labels for Assisted Analysis

Author
Jéssica Mireie Fernandes do Nascimento

2023

Synthesizing Human Activity for Data Generation

Author
Ana Ysabella Rodrigues Romero

2022

Visual Data Processing for Anomaly Detection

Author
Francisco Tiago de Espírito Santo e Caetano

2022

Identification and extraction of floor planes for 3D representation

Author
Carlos Miguel Guerra Soeiro

2022

Segmentation and Extraction of Human Characteristics for 3D Video Synthesis

Author
André Filipe Cardoso Madureira

Projects

ASSIST

RETAIL_PRO

Cloud-Setup

SURGEONMATE

TenisApp2

WATSON

SUSTAINABLE PLASTICS

AURORA

CHIC

FotoInMotion

CLOUD4CANDY

NEXUS

Vision2Control
