2022
Authors
Capozzi, L; Barbosa, V; Pinto, C; Pinto, JR; Pereira, A; Carvalho, PM; Cardoso, JS;
Publication
IEEE ACCESS
Abstract
With the advent of self-driving cars and the push by large companies into fully driverless transportation services, monitoring passenger behaviour in vehicles is becoming increasingly important for several reasons, such as ensuring safety and comfort. Although several human action recognition (HAR) methods have been proposed, developing a true HAR system remains a very challenging task. If the dataset used to train a model contains a small number of actors, the model can become biased towards these actors and their unique characteristics. This can cause the model to generalise poorly when confronted with new actors performing the same actions. This limitation is particularly acute when developing models to characterise the activities of vehicle occupants, for which data sets are short and scarce. In this study, we describe and evaluate three different methods that aim to address this actor bias and assess their performance in detecting in-vehicle violence. These methods work by removing specific information about the actor from the model's features during training or by using data that is independent of the actor, such as information about body posture. The experimental results show improvements over the baseline model when evaluated with real data. On the Hanau03 Vito dataset, the accuracy improved from 65.33% to 69.41%. On the Sunnyvale dataset, the accuracy improved from 82.81% to 86.62%.
2022
Authors
Caetano, F; Carvalho, P; Cardoso, J;
Publication
APPLIED SCIENCES-BASEL
Abstract
Anomaly detection has been an active research area for decades, with high application potential. Recent work has explored deep learning approaches to the detection of abnormal behaviour and abandoned objects in outdoor video surveillance scenarios. The extension of this recent work to in-vehicle monitoring using solely visual data represents a relevant research opportunity that has been overlooked in the accessible literature. With the increasing importance of public and shared transportation for urban mobility, it becomes imperative to provide autonomous intelligent systems capable of detecting abnormal behaviour that threatens passenger safety. To investigate the applicability of current works to this scenario, a recapitulation of relevant state-of-the-art techniques and resources is presented, including available datasets for their training and benchmarking. The lack of public datasets dedicated to in-vehicle monitoring is addressed alongside other issues not considered in previous works, such as moving backgrounds and frequent illumination changes. Despite its relevance, similar surveys and reviews have disregarded this scenario and its specificities. This work initiates an important discussion on application-oriented issues, proposing solutions to be followed in future works, particularly synthetic data augmentation to achieve representative instances with the low amount of available sequences.
2022
Authors
Pereira, A; Carvalho, P; Corte Real, L;
Publication
SIGNAL IMAGE AND VIDEO PROCESSING
Abstract
Color comparison is a key aspect in many areas of application, including industrial applications, and different metrics have been proposed. In many applications, this comparison is required to be closely related to human perception of color differences, thus adding complexity to the process. To tackle this, different approaches were proposed through the years, culminating in the CIEDE2000 formulation. In our previous work, we showed that simple color properties could be used to reduce the computational time of a color similarity decision process that employed this metric, which is recognized as having high computational complexity. In this paper, we show mathematically and experimentally that these findings can be adapted and extended to the recently proposed CIEDE2000 PF metric, which has been recommended by the CIE for industrial applications. Moreover, we propose new efficient models that not only achieve lower error rates, but also outperform the results obtained for the CIEDE2000 metric.
2023
Authors
Magalhaes, SC; dos Santos, FN; Machado, P; Moreira, AP; Dias, J;
Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Abstract
Purpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually time-expensive and require powerful devices to process the visual data in real-time, which is unfeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for object detection in real-time. This research benchmarks three architectures: embedded GPU-Graphical Processing Units (such as NVIDIA Jetson Nano 2 GB and 4 GB, and NVIDIA Jetson TX2), TPU-Tensor Processing Unit (such as Coral Dev Board TPU), and DPU-Deep Learning Processor Unit (such as in AMD-Xilinx ZCU104 Development Board, and AMD-Xilinx Kria KV260 Starter Kit). Methods: The authors used the RetinaNet ResNet-50 fine-tuned using the natural VineSet dataset. After the trained model was converted and compiled for target-specific hardware formats to improve the execution efficiency.Conclusions and Results: The platforms were assessed in terms of performance of the evaluation metrics and efficiency (time of inference). Graphical Processing Units (GPUs) were the slowest devices, running at 3 FPS to 5 FPS, and Field Programmable Gate Arrays (FPGAs) were the fastest devices, running at 14 FPS to 25 FPS. The efficiency of the Tensor Processing Unit (TPU) is irrelevant and similar to NVIDIA Jetson TX2. TPU and GPU are the most power-efficient, consuming about 5 W. The performance differences, in the evaluation metrics, across devices are irrelevant and have an F1 of about 70 % and mean Average Precision (mAP) of about 60 %.
2023
Authors
Guimaraes, V; Nascimento, J; Viana, P; Carvalho, P;
Publication
APPLIED SCIENCES-BASEL
Abstract
When compared with traditional local shops where the customer has a personalised service, in large retail departments, the client has to make his purchase decisions independently, mostly supported by the information available in the package. Additionally, people are becoming more aware of the importance of the food ingredients and demanding about the type of products they buy and the information provided in the package, despite it often being hard to interpret. Big shops such as supermarkets have also introduced important challenges for the retailer due to the large number of different products in the store, heterogeneous affluence and the daily needs of item repositioning. In this scenario, the automatic detection and recognition of products on the shelves or off the shelves has gained increased interest as the application of these technologies may improve the shopping experience through self-assisted shopping apps and autonomous shopping, or even benefit stock management with real-time inventory, automatic shelf monitoring and product tracking. These solutions can also have an important impact on customers with visual impairments. Despite recent developments in computer vision, automatic grocery product recognition is still very challenging, with most works focusing on the detection or recognition of a small number of products, often under controlled conditions. This paper discusses the challenges related to this problem and presents a review of proposed methods for retail product label processing, with a special focus on assisted analysis for customer support, including for the visually impaired. Moreover, it details the public datasets used in this topic and identifies their limitations, and discusses future research directions of related fields.
2007
Authors
Ciobanu, G; Andrade, MT; Carvalho, P; Carrapatoso, E;
Publication
NOVAS PERSPECTIVAS EM SISTEMAS E TECNOLOGIAS DE INFORMACAO, VOL II
Abstract
MPEG-21 enables content consumers to access and interoperate with a large variety of multimedia resources and their descriptions in a flexible manner. Considering the great heterogeneity that presently exists across the entire multimedia content chain and the growing importance of open standards to facilitate the interoperations across environments, applications and formats, an MPEG-21 Peer was developed to process and present complex multimedia content, represented as MPEG-21 Digital Items. The novelty of the work essentially relies on the adoption of a Web Services architecture, based on a single Digital Items processing core available for all types of terminal devices.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.