2016
Authors
Fernandes, R; Andrade, MT;
Publication
U.Porto Journal of Engineering
Abstract
Multimedia content consumption is very popular nowadays. However, not all content can be consumed in its original format: the characteristics of the content, the transport and access networks, the consumption device and the usage environment may all impose restrictions. One way to provide the best possible quality to the user is to adapt the content according to these restrictions as well as to user preferences. This adaptation stage performs best when the characteristics of the content are known a priori. To provide this knowledge, we classify the content using metrics that define its temporal and spatial complexity. The temporal complexity classification is based on the motion vectors of the predictively encoded frames and on the difference between frames. The spatial complexity classification is based on different implementations of an edge detection algorithm and on an image activity measure.
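As a rough illustration of such complexity metrics, the sketch below (Python with OpenCV) computes a per-clip spatial score from edge density and a temporal score from the mean absolute difference between consecutive frames. The use of Canny edge density, the frame-differencing proxy and all function names are assumptions, not the paper's exact metrics; extracting motion vectors from the compressed bitstream, as the paper's temporal metric does, would additionally require access to a decoder's internals.

```python
import cv2
import numpy as np

def spatial_complexity(gray: np.ndarray) -> float:
    """Edge density (fraction of edge pixels) as a spatial-complexity proxy."""
    edges = cv2.Canny(gray, 100, 200)
    return float(np.count_nonzero(edges)) / edges.size

def temporal_complexity(prev_gray: np.ndarray, gray: np.ndarray) -> float:
    """Mean absolute luminance difference between consecutive frames."""
    return float(np.mean(cv2.absdiff(prev_gray, gray)))

def analyse_clip(path: str):
    """Return the clip's average spatial and temporal complexity scores."""
    cap = cv2.VideoCapture(path)
    s_scores, t_scores, prev = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        s_scores.append(spatial_complexity(gray))
        if prev is not None:
            t_scores.append(temporal_complexity(prev, gray))
        prev = gray
    cap.release()
    return float(np.mean(s_scores or [0.0])), float(np.mean(t_scores or [0.0]))
```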
2019
Authors
Costa, TS; Andrade, MT; Viana, P;
Publication
ELECTRONICS LETTERS
Abstract
This Letter discusses the benefits of introducing machine learning techniques in multi-view streaming applications. Widespread use of machine learning has contributed to significant gains in numerous scientific and industrial fields. Nonetheless, these techniques have not yet been specifically applied to adaptive interactive multimedia streaming systems where, typically, the encoding bit rate is adapted based on resource availability, targeting the efficient use of network resources whilst offering the best possible user quality of experience (QoE). Intrinsic user data could be coupled with such existing quality adaptation mechanisms to derive better results, driven also by the preferences of the user. Head-tracking data, captured from camera feeds available at the user side, is an example of such data, to which Recurrent Attention Models could be applied to accurately predict the focus of attention of users within video frames. Information obtained from such models could be used to assist a preemptive buffering approach for specific viewing angles, contributing to the joint goal of maximising QoE. Based on these assumptions, a research line is presented, focusing on obtaining better QoE in an already existing multi-view streaming system.
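A minimal sketch of the kind of predictor that could drive such preemptive buffering is given below. It is a simplified stand-in (a plain, untrained GRU regressor in PyTorch), not the Recurrent Attention Model referred to in the Letter; the input encoding, layer sizes and the mapping from predicted head yaw to a camera view index are all assumptions.

```python
import math
import torch
import torch.nn as nn

class YawPredictor(nn.Module):
    """Predict the user's next head-yaw angle from a sequence of past samples."""
    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, yaw_seq):           # yaw_seq: (batch, time, 1), in radians
        out, _ = self.gru(yaw_seq)
        return self.head(out[:, -1])      # predicted next yaw, shape (batch, 1)

def view_to_prefetch(pred_yaw: float, n_views: int = 8) -> int:
    """Map a predicted yaw to the nearest camera view; the uniform angular
    spacing of 2*pi/n_views is an illustrative assumption."""
    step = 2 * math.pi / n_views
    return int(round(pred_yaw / step)) % n_views

model = YawPredictor()
recent = torch.zeros(1, 30, 1)            # 30 past yaw samples for one user
print(view_to_prefetch(float(model(recent))))
```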
2018
Authors
Castro, H; Andrade, MT;
Publication
International Journal of Computer Information Systems and Industrial Management Applications
Abstract
Machine Learning (ML), presently the major research area within Artificial Intelligence, aims at developing tools that can learn, approximately on their own, from data. ML tools learn, through a training phase, to perform an association between some input data and some output evaluation of it. When the input data is audio or visual media (i.e. akin to sensory information) and the output corresponds to some interpretation of it, the process may be described as Synthetic Cognition (SC). Presently, ML (or SC) research is heterogeneous, comprising a broad set of disconnected initiatives that make no systematic effort towards cooperation or integration of their achievements, and no standards exist to facilitate that. The training datasets (base sensory data and targeted interpretation), which are very labour-intensive to produce, are also built employing ad-hoc structures and (metadata) formats, have very narrow expressive objectives and thus enable no true interoperability or standardisation. Our work contributes to overcoming this fragility by putting forward: a specification for a standard ML dataset repository, describing how it internally stores the different components of datasets and how it interfaces with external services; and a tool for the comprehensive structuring of ML datasets, defining them as Synthetic Cognitive Experience (SCE) records, which interweave the base audio-visual sensory data with multilevel interpretative information. A standardised structure to express the different components of the datasets and their interrelations will promote re-usability, resulting in the availability of a very large pool of datasets for a myriad of application domains. Our work thus contributes to: the universal interpretability and reusability of ML datasets; greatly easing the acquisition and sharing of training and testing datasets within the ML research community; facilitating the comparison of results from different ML tools; and accelerating the overall research process. © MIR Labs.
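To make the idea of an SCE record concrete, the following sketch shows one possible in-memory structure and its JSON serialisation; the field names, interpretation levels and the example URI are illustrative assumptions and do not reproduce the schema specified in the paper.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json

@dataclass
class Interpretation:
    level: str        # e.g. "object", "action", "scene" (assumed levels)
    label: str        # e.g. "person", "running", "street"
    start_s: float    # start of the annotated interval, in seconds
    end_s: float      # end of the annotated interval, in seconds

@dataclass
class SCERecord:
    media_uri: str                                    # base audio-visual asset
    modality: str                                     # "video", "audio", "image"
    interpretations: List[Interpretation] = field(default_factory=list)

record = SCERecord(
    media_uri="https://example.org/clips/0001.mp4",   # hypothetical URI
    modality="video",
    interpretations=[Interpretation("object", "person", 0.0, 4.2)],
)
print(json.dumps(asdict(record), indent=2))           # interoperable serialisation
```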
2019
Authors
Fernandes, R; Andrade, MT;
Publication
INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS (ICNAAM-2018)
Abstract
A multimedia content adaptation decision is necessary whenever a multimedia transmission system has multiple adaptations available to adjust the content's representation requirements to the currently available system resources. Implementing an adaptation decision module based on a Markov Decision Process requires weighting the adaptations in order to establish the adaptation plan that delivers the best possible Quality of Experience (QoE) to the user. We present a method, using a feedforward neural network, to determine these costs, following two approaches: the user's and the service provider's perspectives.
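A minimal sketch of the cost-estimation idea is given below: a single-hidden-layer feedforward network (here with random, untrained weights) maps a vector of adaptation features to a scalar cost in (0, 1), which could then act as a negative reward when the Markov Decision Process selects among available adaptations. The chosen features, layer sizes and activation functions are assumptions, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)   # input: 3 adaptation features
W2, b2 = rng.normal(size=8), 0.0                # output: one scalar cost

def adaptation_cost(features: np.ndarray) -> float:
    """One hidden-layer forward pass; the sigmoid keeps the cost in (0, 1)."""
    h = np.tanh(features @ W1 + b1)
    return float(1.0 / (1.0 + np.exp(-(h @ W2 + b2))))

# Example: cut bitrate by 50%, keep resolution, drop frame rate by 25%.
print(adaptation_cost(np.array([0.5, 0.0, 0.25])))
```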
2018
Authors
Vilaça, L; Viana, P; Carvalho, P; Andrade, MT;
Publication
Proceedings of the Tenth International Conference on Soft Computing and Pattern Recognition, SoCPaR 2018, Porto, Portugal, December 13-15, 2018
Abstract
Over the last years, Deep Learning has become one of the most popular research fields within Artificial Intelligence. Several approaches have been developed to address conventional challenges of AI. In computer vision, these methods provide the means to solve tasks such as image classification, object identification and feature extraction. In this paper, some approaches to face detection and recognition are presented and analyzed, in order to identify the one with the best performance. The main objective is to automate the annotation of a large dataset and to avoid the costly and time-consuming process of manual content annotation. The approach follows the concept of incremental learning, and an R-CNN model was implemented. Tests were conducted with the objective of detecting and recognizing one personality within image and video content. Results from this initial automatic process are then made available to an auxiliary tool that enables further validation of the annotations prior to uploading them to the archive. Tests show that, even with a small dataset, the results obtained are satisfactory. © 2020, Springer Nature Switzerland AG.
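For illustration, the sketch below shows the shape of such an automatic annotation step, using OpenCV's Haar-cascade face detector as a simplified stand-in for the paper's R-CNN model; the file name and the output format for pending annotations are assumptions.

```python
import cv2

def detect_faces(image_path: str):
    """Return candidate face bounding boxes as (x, y, w, h) tuples."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Each detection becomes a pending annotation awaiting manual validation.
for (x, y, w, h) in detect_faces("frame_0001.jpg"):   # hypothetical file name
    print({"bbox": [int(x), int(y), int(w), int(h)], "label": "unverified"})
```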
2020
Authors
Viana, P; Carvalho, P; Andrade, MT; Jonker, PP; Papanikolaou, V; Teixeira, IN; Vilaça, L; Pinto, JP; Costa, T;
Publication
MM '20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020
Abstract
Multimedia content production is nowadays widespread thanks to technological advances, namely smartphones and social media. Although the massive amount of media content brings new opportunities to the industry, it also dilutes the relevance of marketing content, which is meant to retain existing audiences and attract new ones. This leads to an emerging need to produce such content as quickly and engagingly as possible. Creating it automatically would decrease both production costs and time, in particular by using static media to create short storytelling animated clips. We propose an innovative approach that uses context and content information to transform a still photo into an appealing context-aware video clip. Our solution thus contributes to the state of the art in computer vision and multimedia technologies and assists content creators with a value-added service that automatically builds rich, contextualized multimedia stories from single photographs. © 2020 Owner/Author.
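As a very reduced illustration of animating a still photograph, the sketch below renders a basic pan-and-zoom ("Ken Burns") clip with OpenCV. The paper's approach is context-aware and considerably richer; the zoom parameters and file names here are assumptions.

```python
import cv2

def photo_to_clip(photo_path: str, out_path: str, seconds=5, fps=25, zoom=1.3):
    """Write a short clip that slowly zooms into the centre of a still photo."""
    img = cv2.imread(photo_path)
    h, w = img.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    n = seconds * fps
    for i in range(n):
        z = 1.0 + (zoom - 1.0) * i / (n - 1)        # linear zoom-in over time
        cw, ch = int(w / z), int(h / z)
        x0, y0 = (w - cw) // 2, (h - ch) // 2       # crop centred on the image
        crop = img[y0:y0 + ch, x0:x0 + cw]
        writer.write(cv2.resize(crop, (w, h)))      # upscale crop back to full size
    writer.release()

photo_to_clip("photo.jpg", "clip.mp4")              # hypothetical file names
```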