Publications

Publications by Paula Viana

2022

Symbolic Music Generation Conditioned on Continuous-Valued Emotions

Authors
Sulun, S; Davies, MEP; Viana, P;

Publication
IEEE ACCESS

Abstract
In this paper, we present a new approach for the generation of multi-instrument symbolic music driven by musical emotion. The principal novelty of our approach centres on conditioning a state-of-the-art transformer on continuous-valued valence and arousal labels. In addition, we provide a new large-scale dataset of symbolic music paired with emotion labels in terms of valence and arousal. We evaluate our approach quantitatively in two ways: first by measuring its note prediction accuracy, and second via a regression task in the valence-arousal plane. Our results demonstrate that our proposed approaches outperform conditioning using control tokens, which is representative of the current state of the art.

2022

Enhancing Photography Management Through Automatically Extracted Metadata

Authors
Carvalho, P; Freitas, D; Machado, T; Viana, P;

Publication
INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021

Abstract
The tremendous increase in photographs that are captured each day by common users has been favoured by the availability of high quality devices at accessible costs, such as smartphones and digital cameras. However, the quantity of captured photos raises new challenges regarding the access and management of image repositories. This paper describes a lightweight distributed framework intended to help overcome these problems. It uses image metadata in EXIF format, already widely added to images by digital acquisition devices, and automatic facial recognition to provide management and search functionalities. Moreover, a visualization functionality using a graph-based strategy was integrated, enabling an enhanced and more interactive navigation through search results and the corresponding relations.

2022

Improving word embeddings in Portuguese: increasing accuracy while reducing the size of the corpus

Authors
Pinto, JP; Viana, P; Teixeira, I; Andrade, M;

Publication
PEERJ COMPUTER SCIENCE

Abstract
The subjectiveness of multimedia content description has a strong negative impact on tag-based information retrieval. In our work, we propose enhancing available descriptions by adding semantically related tags. To meet this objective, we use a word embedding technique based on the Word2Vec neural network, parameterized and trained using a new dataset built by scraping and pre-processing a large number of news stories from online newspapers. Our target language is Portuguese, one of the most spoken languages worldwide. The results achieved significantly outperform similar existing solutions developed in the scope of different languages, including Portuguese. Contributions also include an online application and API available for external use. Although the presented work has been designed to enhance multimedia content annotation, it can be used in several other application areas.
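As a rough illustration of the tag-enrichment idea described in this abstract, the sketch below adds to each tag its nearest neighbours in an embedding space, ranked by cosine similarity. The tiny hand-written vectors, the function name `expand_tags`, and the `top_k` and `min_similarity` parameters are assumptions made for illustration only; the paper's own Word2Vec model, trained on Portuguese news, is not reproduced here.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load vectors from a trained Word2Vec model.
EMBEDDINGS = {
    "praia": (0.9, 0.1, 0.0),
    "mar": (0.8, 0.2, 0.1),
    "areia": (0.85, 0.05, 0.1),
    "futebol": (0.0, 0.9, 0.2),
    "golo": (0.1, 0.95, 0.1),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_tags(tags, embeddings, top_k=2, min_similarity=0.8):
    """Return the original tags followed by semantically close words.

    For each known tag, the top_k most similar vocabulary words are
    considered, and only those above min_similarity are added.
    """
    expanded = list(tags)
    for tag in tags:
        if tag not in embeddings:
            continue  # unknown tags are kept but cannot be expanded
        scored = sorted(
            (
                (cosine(embeddings[tag], vec), word)
                for word, vec in embeddings.items()
                if word != tag
            ),
            reverse=True,
        )
        for score, word in scored[:top_k]:
            if score >= min_similarity and word not in expanded:
                expanded.append(word)
    return expanded
```

With these toy vectors, `expand_tags(["praia"], EMBEDDINGS)` adds the two beach-related words and leaves the football-related ones out, because their similarity falls below the threshold.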

2022

Automated Adequacy Assessment of Cervical Cytology Samples Using Deep Learning

Authors
Mosiichuk, V; Viana, P; Oliveira, T; Rosado, L;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022)

Abstract
Cervical cancer has been among the most common causes of cancer death in women. Screening tests such as liquid-based cytology (LBC) were responsible for a substantial decrease in mortality rates. Still, visual examination of cervical cells on microscopic slides is a time-consuming, ambiguous and challenging task, aggravated by inadequate sample quality (e.g. low cellularity or the presence of obscuring factors like blood or inflammation). While most works in the literature are focused on the automated detection of cervical lesions to support diagnosis, to the best of our knowledge, none of them address the automated assessment of sample adequacy, as established by The Bethesda System (TBS) guidelines. This work proposes a new methodology for automated adequacy assessment of cervical cytology samples. Since the most common reason for rejecting samples is a low count of squamous nuclei, our approach relies on a deep learning object detection model for the detection and counting of different types of nuclei present in LBC samples. A dataset of 41 samples with a total of 42387 nuclei manually annotated by experienced specialists was used, and the best solution proposed achieved promising results for the automated detection of squamous nuclei (AP of 82.4%, Accuracy of 79.8%, Recall of 73.8% and F1 score of 81.5%). Additionally, by merging the developed automated cell counting approach with the adequacy criteria stated by the TBS guidelines, we validated our approach by correctly classifying an entire subset of 12 samples as adequate or inadequate.
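The count-then-threshold logic outlined in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Detection` type, the class label `"squamous"`, the confidence cut-off `min_score`, and the `min_squamous` value supplied by the caller are all hypothetical, and the actual adequacy criteria should be taken from the TBS guidelines.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One nucleus reported by a (hypothetical) object detector."""
    label: str    # e.g. "squamous" or another nucleus type
    score: float  # detector confidence in [0, 1]


def count_nuclei(detections, min_score=0.5):
    """Count detections per class, ignoring low-confidence ones."""
    return Counter(d.label for d in detections if d.score >= min_score)


def is_adequate(detections, min_squamous, min_score=0.5):
    """Flag a sample as adequate when enough squamous nuclei are found.

    min_squamous is supplied by the caller; no guideline value is
    hard-coded here.
    """
    counts = count_nuclei(detections, min_score)
    return counts["squamous"] >= min_squamous
```

Keeping the threshold as an explicit argument makes it easy to swap in whichever adequacy criterion applies to the sample preparation being assessed.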

2023

A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition

Authors
Guimaraes, V; Nascimento, J; Viana, P; Carvalho, P;

Publication
APPLIED SCIENCES-BASEL

Abstract
Compared with traditional local shops, where the customer receives a personalised service, in large retail stores the client has to make purchase decisions independently, mostly supported by the information available on the package. Additionally, people are becoming more aware of the importance of food ingredients and more demanding about the type of products they buy and the information provided on the package, even though it is often hard to interpret. Big shops such as supermarkets have also introduced important challenges for the retailer due to the large number of different products in the store, heterogeneous customer footfall and the daily need to reposition items. In this scenario, the automatic detection and recognition of products on or off the shelves has gained increased interest, as the application of these technologies may improve the shopping experience through self-assisted shopping apps and autonomous shopping, or even benefit stock management with real-time inventory, automatic shelf monitoring and product tracking. These solutions can also have an important impact on customers with visual impairments. Despite recent developments in computer vision, automatic grocery product recognition is still very challenging, with most works focusing on the detection or recognition of a small number of products, often under controlled conditions. This paper discusses the challenges related to this problem and presents a review of proposed methods for retail product label processing, with a special focus on assisted analysis for customer support, including for the visually impaired. Moreover, it details the public datasets used in this topic, identifies their limitations, and discusses future research directions in related fields.

2023

A Dataset for User Visual Behaviour with Multi-View Video Content

Authors
da Costa, TS; Andrade, MT; Viana, P; Silva, NC;

Publication
PROCEEDINGS OF THE 14TH ACM MULTIMEDIA SYSTEMS CONFERENCE, MMSYS 2023

Abstract
Immersive video applications impose impractical bandwidth requirements on best-effort networks. With Multi-View (MV) streaming, these can be minimized by resorting to view prediction techniques. SmoothMV is a multi-view system that uses a non-intrusive head tracking mechanism to detect the viewer's interest and select appropriate views. By coupling Neural Networks (NNs) to anticipate the viewer's interest, a reduction of view-switching latency is likely to be obtained. The objective of this paper is twofold: 1) to present a solution for the acquisition of gaze data from users viewing MV content; 2) to describe a dataset, collected with a large-scale testbed, that can be used to train NNs to predict the user's viewing interest. Tracking data from head movements was obtained from 45 participants using an Intel Realsense F200 camera, with 7 video playlists, each being viewed a minimum of 17 times. This dataset is publicly available to the research community and constitutes an important contribution to reducing the current scarcity of such data. Tools to obtain saliency/heat maps and generate complementary plots are also provided as an open-source software package.
