2010
Authors
Viana, P; Alves, AP;
Publication
MULTIMEDIA TOOLS AND APPLICATIONS
Abstract
The challenge of managing large-scale media assets has led to the development of metadata schemas that are expected to enable efficient search and retrieval of multimedia content. These approaches propose schemas ranging from simple keyword-based descriptions to complex hierarchical organizations of information. However, effective media asset management requires more than content search and retrieval: the underlying infrastructures are usually complex and involve a variety of equipment, and management decisions have to be made based on information from the multimedia metadata layer as well as on data describing system resources and capabilities. In this paper we propose a new ontology that aggregates information from different sources and enables a top-level, business-oriented view of multimedia archives.
2000
Authors
Viana, P; Alves, AP;
Publication
INTERNET MULTIMEDIA MANAGEMENT SYSTEMS
Abstract
The evolution of television towards the digital domain is opening new opportunities but also new challenges for both users and system managers. Audiovisual television archives will be an essential component of digital television operators' systems, as archived information needs to be available to a wide range of users. This paper presents the work developed at INESC Porto within the VIDION project and the experiments on merging television, computer and telecommunications concepts and technologies through the use of software agents and CORBA to help solve problems of information and system configuration and management in a TV archive. The paper focuses on the definition of the problem, the proposed architecture, and the current state of the work.
2023
Authors
Pereira, A; Carvalho, P; Pereira, N; Viana, P; Corte-Real, L;
Publication
IEEE ACCESS
Abstract
The widespread use of smartphones and other low-cost equipment as recording devices, the massive growth in bandwidth, and the ever-growing demand for new applications with enhanced capabilities have made visual data essential in several scenarios, including surveillance, sports, retail, entertainment, and intelligent vehicles. Despite significant advances in analyzing and extracting data from images and video, there is a lack of solutions able to analyze and semantically describe the information in the visual scene so that it can be efficiently used and repurposed. Scientific contributions have focused on individual aspects or on specific problems and application areas, and no cross-domain solution is available to implement a complete system that enables information passing between cross-cutting algorithms. This paper analyses the problem from an end-to-end perspective, i.e., from the visual scene analysis to the representation of information in a virtual environment, including how the extracted data can be described and stored. A simple processing pipeline is introduced to provide a structure for discussing challenges and opportunities in the different steps of the entire process and to identify current gaps in the literature. The work reviews various technologies specifically from the perspective of their applicability to an end-to-end pipeline for scene analysis and synthesis, along with an extensive analysis of datasets for relevant tasks.
2023
Authors
Mosiichuk, V; Sampaio, A; Viana, P; Oliveira, T; Rosado, L;
Publication
APPLIED SCIENCES-BASEL
Abstract
Liquid-based cytology (LBC) plays a crucial role in the effective early detection of cervical cancer, contributing to a substantial decrease in mortality rates. However, the visual examination of microscopic slides is a challenging, time-consuming, and ambiguous task. Shortages of specialized staff and equipment are increasing the interest in developing artificial intelligence (AI)-powered portable solutions to support screening programs. This paper presents a novel approach based on a RetinaNet model with a ResNet50 backbone to detect the nuclei of cervical lesions on mobile-acquired microscopic images of cytology samples, stratifying the lesions according to The Bethesda System (TBS) guidelines. This work was supported by a new dataset of images from LBC samples digitized with a portable smartphone-based microscope, encompassing nucleus annotations of 31,698 normal squamous cells and 1395 lesions. Several experiments were conducted to optimize the model's detection performance, namely hyperparameter tuning, transfer learning, detected class adjustments, and per-class score threshold optimization. The proposed nucleus-based methodology improved on the best baseline reported in the literature for detecting cervical lesions on microscopic images exclusively acquired with mobile devices coupled to the µSmartScope prototype, with per-class average precision, recall, and F1 scores up to 17.6%, 22.9%, and 16.0%, respectively. Performance improvements were obtained by transferring knowledge from networks pre-trained on a smaller dataset closer to the target application domain, as well as by including normal squamous nuclei as a class detected by the model. Per-class tuning of the score threshold also yielded a model more suitable for supporting screening procedures, achieving F1 score improvements in most TBS classes.
While further improvements are still required to use the proposed approach in a clinical context, this work reinforces the potential of using AI-powered mobile-based solutions to support cervical cancer screening. Such solutions can significantly impact screening programs worldwide, particularly in areas with limited access and restricted healthcare resources.
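The per-class score threshold optimization mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes detections have already been matched to ground-truth nuclei (for example, with an IoU criterion), so each detection is simply flagged as a true positive or not, and the functions best_threshold and tune_thresholds are hypothetical names. For each class, the observed detection scores are swept as candidate thresholds and the one maximizing F1 = 2TP / (2TP + FP + FN) is kept.

```python
from typing import Dict, List, Tuple


def best_threshold(scores: List[float], is_tp: List[bool], n_gt: int) -> Tuple[float, float]:
    """Return (threshold, F1) that maximizes F1 for a single class.

    Assumption: detections are already matched to ground truth, so is_tp[i]
    says whether detection i is correct, and n_gt is the number of
    ground-truth objects of this class.
    """
    best_t, best_f1 = 1.0, 0.0
    # Every distinct observed score is a candidate threshold.
    for t in sorted(set(scores)):
        kept = [tp for s, tp in zip(scores, is_tp) if s >= t]
        tp = sum(kept)          # correct detections above the threshold
        fp = len(kept) - tp     # wrong detections above the threshold
        fn = n_gt - tp          # ground-truth objects left undetected
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Strict ">" keeps the lowest threshold among ties.
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def tune_thresholds(per_class: Dict[str, Tuple[List[float], List[bool], int]]) -> Dict[str, float]:
    """Pick an independent score threshold for each class (hypothetical helper)."""
    return {cls: best_threshold(*data)[0] for cls, data in per_class.items()}


if __name__ == "__main__":
    # Illustrative, made-up validation detections for one TBS-style class.
    print(tune_thresholds({"LSIL": ([0.9, 0.8, 0.4, 0.3], [True, True, False, False], 2)}))
```

With these made-up values the sketch selects 0.8 for the example class, since at that threshold both true detections are kept and both false ones are discarded. Tuning each class separately, rather than using one global threshold, lets rare lesion classes settle on a different precision/recall trade-off than the abundant normal class.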
2023
Authors
Costa, TS; Viana, P; Andrade, MT;
Publication
IEEE ACCESS
Abstract
Quality of Experience (QoE) in multi-view streaming systems is known to be severely affected by the latency associated with view-switching procedures. Anticipating the viewer's navigation intentions within the multi-view scene could greatly reduce such latency. The research work presented in this article builds on this premise by proposing a new predictive view-selection mechanism. A VGG16-inspired Convolutional Neural Network (CNN) is used to identify the viewer's focus of attention and determine which views would be most suitable to present in the short term, i.e., the near-term viewing intentions. This way, those views can be locally buffered before they are actually needed. To this end, two datasets were used to evaluate the prediction performance and the impact on latency, in particular when compared to the solution implemented in the previous version of our multi-view streaming system. The results show an overall improvement in perceived QoE. A significant reduction in latency during view-switching procedures was effectively achieved. Moreover, the results also demonstrated that the user's visual interest was predicted with a high level of accuracy. An experimental platform was also established on which future predictive models can be integrated and compared with previously implemented models.
2023
Authors
Sulun, S; Oliveira, P; Viana, P;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT II
Abstract
We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. To create this dataset, we first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. We then applied these models to lyrics from two large-scale MIDI datasets. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions and, especially, to develop models that can generate music based on specific emotions. Our inference code, trained models, and datasets are available online.