Serkan Sulun

Details

Name
Serkan Sulun
Role
Research Assistant
Since
11th March 2019

Nationality
Turkey
Centre
Telecommunications and Multimedia
Contacts
+351 222 094 000
serkan.sulun@inesctec.pt

Publications

2024

Movie trailer genre classification using multimodal pretrained features

Authors
Sulun, S; Viana, P; Davies, MEP;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models. These models extract high-level features related to visual scenery, objects, characters, text, speech, music, and audio effects. To intelligently fuse these pretrained features, we train small classifier models with low time and memory requirements. Employing the transformer model, our approach utilizes all video and audio frames of movie trailers without performing any temporal pooling, efficiently exploiting the correspondence between all elements, as opposed to the fixed and low number of frames typically used by traditional methods. Our approach fuses features originating from different tasks and modalities, with different dimensionalities, different temporal lengths, and complex dependencies as opposed to current approaches. Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP). To foster future research, we make the pretrained features for the entire MovieNet dataset, along with our genre classification code and the trained models, publicly available.

2023

Emotion4MIDI: A Lyrics-Based Emotion-Labeled Symbolic Music Dataset

Authors
Sulun, S; Oliveira, P; Viana, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT II

Abstract
We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. To create this dataset, we first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. We then applied these models to lyrics from two large-scale MIDI datasets. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions and, especially, to develop models that can generate music based on specific emotions. Our code for inference, trained models, and datasets are available online.

2022

Symbolic Music Generation Conditioned on Continuous-Valued Emotions

Authors
Sulun, S; Davies, MEP; Viana, P;

Publication
IEEE ACCESS

Abstract
In this paper, we present a new approach for the generation of multi-instrument symbolic music driven by musical emotion. The principal novelty of our approach centres on conditioning a state-of-the-art transformer based on continuous-valued valence and arousal labels. In addition, we provide a new large-scale dataset of symbolic music paired with emotion labels in terms of valence and arousal. We evaluate our approach in a quantitative manner in two ways, first by measuring its note prediction accuracy, and second via a regression task in the valence-arousal plane. Our results demonstrate that our proposed approaches outperform conditioning using control tokens, which is representative of the current state of the art.

2021

On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks

Authors
Sulun, S; Davies, MEP;

Publication
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING

Abstract
In this paper, we address a subtopic of the broad domain of audio enhancement, namely musical audio bandwidth extension. We formulate the bandwidth extension problem using deep neural networks, where a band-limited signal is provided as input to the network, with the goal of reconstructing a full-bandwidth output. Our main contribution centers on the impact of the choice of low-pass filter when training and subsequently testing the network. For two different state-of-the-art deep architectures, ResNet and U-Net, we demonstrate that when the training and testing filters are matched, improvements in signal-to-noise ratio (SNR) of up to 7 dB can be obtained. However, when these filters differ, the improvement falls considerably and under some training conditions results in a lower SNR than the band-limited input. To circumvent this apparent overfitting to filter shape, we propose a data augmentation strategy which utilizes multiple low-pass filters during training and leads to improved generalization to unseen filtering conditions at test time.

2020

Can learned frame prediction compete with block motion compensation for video coding?

Authors
Sulun, S; Tekalp, AM;

Publication
Signal, Image and Video Processing
