
Publications by Aníbal Ferreira

2012

Automatic Transcription of Polyphonic Piano Music Using Genetic Algorithms, Adaptive Spectral Envelope Modeling, and Dynamic Noise Level Estimation

Authors
Reis, G; Fernandez de Vega, FF; Ferreira, A;

Publication
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Abstract
This paper presents a new method for multiple fundamental frequency (F0) estimation on piano recordings. We propose a framework based on a genetic algorithm in order to analyze the overlapping overtones and search for the most likely F0 combination. The search process is aided by adaptive spectral envelope modeling and dynamic noise level estimation: while the noise is dynamically estimated, the spectral envelope of previously recorded piano samples (internal database) is adapted to best match the piano played on the input signals and aid the search for the most likely combination of F0s. For comparison, several state-of-the-art algorithms were run across various musical pieces played on different pianos and then compared using three different metrics. The proposed algorithm ranked first on the Hybrid Decay/Sustain Score metric, which correlates better with human hearing perception, and second on both the onset-only and onset-offset metrics. A previous genetic algorithm approach is also included in the comparison to show that the proposed system brings significant improvements in both quality of results and computing time.
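
As a rough illustration of the genetic-search idea described in the abstract (and only that: the adaptive spectral envelope modeling, dynamic noise estimation and internal piano sample database are not reproduced here), a minimal sketch might look as follows. The sample rate, FFT size, pitch range, harmonic template and fitness measure are all assumptions made for the example, not the paper's choices.

```python
# Toy genetic search for a multi-F0 combination. Illustrative only.
import numpy as np

SR = 44100                    # sample rate (assumed)
N_FFT = 4096                  # analysis frame length (assumed)
MIDI_RANGE = range(36, 96)    # candidate pitch range (assumed)

def midi_to_hz(m):
    return 440.0 * 2.0 ** ((m - 69) / 12.0)

def harmonic_template(midi_notes, n_partials=10):
    """Synthetic magnitude spectrum of a chord with 1/k partial decay."""
    spec = np.zeros(N_FFT // 2 + 1)
    for m in midi_notes:
        f0 = midi_to_hz(m)
        for k in range(1, n_partials + 1):
            b = int(round(k * f0 * N_FFT / SR))
            if b <= N_FFT // 2:
                spec[b] += 1.0 / k
    return spec

def fitness(midi_notes, observed):
    """Negative distance between the observed magnitude spectrum
    (length N_FFT//2 + 1) and its projection onto the chord template."""
    if not midi_notes:
        return -np.inf
    t = harmonic_template(sorted(midi_notes))
    t /= (np.linalg.norm(t) + 1e-12)
    return -np.linalg.norm(observed - t * np.dot(observed, t))

def genetic_search(observed, pop_size=40, generations=100,
                   rng=np.random.default_rng(0)):
    # random initial population of small note sets
    pop = [set(rng.choice(list(MIDI_RANGE), size=rng.integers(1, 5), replace=False))
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda c: fitness(c, observed), reverse=True)
        survivors = scored[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = rng.choice(len(survivors), size=2, replace=False)
            child = set(rng.permutation(list(survivors[a] | survivors[b]))[:4])
            if rng.random() < 0.3:                      # mutation: toggle one note
                child ^= {int(rng.choice(list(MIDI_RANGE)))}
            children.append(child or {int(rng.choice(list(MIDI_RANGE)))})
        pop = survivors + children
    return max(pop, key=lambda c: fitness(c, observed))
```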

2001

Combined spectral envelope normalization and subtraction of sinusoidal components in the ODFT and MDCT frequency domains

Authors
Ferreira, AJS;

Publication
PROCEEDINGS OF THE 2001 IEEE WORKSHOP ON THE APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS

Abstract
Recent research in high-quality audio coding seeks not only improved coding gains but also new functionalities, such as easy semantic access to compressed audio material and audio modification in the compressed domain. These objectives imply the decomposition of the audio signal into several components of specific semantic value, such as sinusoidal components, that take advantage of selective coding and parametrization tools. In this paper we presume an MDCT-based audio coding environment and present a new technique combining spectral envelope normalization with accurate subtraction of sinusoidal components in the MDCT frequency domain. It is shown how a parametrization of L stationary sinusoids in the complex ODFT spectrum can lead to the effective subtraction of 3L spectral lines in the real MDCT spectrum. A demonstration of the implementation of the technique is available on the Internet.
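
Two of the ingredients mentioned in the abstract, the ODFT and spectral envelope normalization, can be sketched as below. The pre-twiddled FFT and the crude moving-average envelope are generic stand-ins chosen for the example; the paper's combined normalization and sinusoid subtraction in the MDCT domain is substantially more elaborate, and the window and smoothing width here are assumptions.

```python
# Illustrative sketch: ODFT of a windowed frame and a crude envelope flattening.
import numpy as np

def odft(frame):
    """X(k) = sum_n x[n] exp(-j*2*pi*n*(k + 1/2)/N), via a pre-twiddled FFT."""
    n = np.arange(len(frame))
    return np.fft.fft(frame * np.exp(-1j * np.pi * n / len(frame)))

def envelope_normalize(frame, smoothing_bins=32):
    X = odft(frame * np.hanning(len(frame)))
    mag_db = 20.0 * np.log10(np.abs(X) + 1e-12)
    kernel = np.ones(smoothing_bins) / smoothing_bins
    env_db = np.convolve(mag_db, kernel, mode="same")    # coarse spectral envelope (dB)
    flat = X * 10.0 ** (-env_db / 20.0)                  # envelope-normalized spectrum
    return flat, env_db
```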

2001

Accurate estimation in the ODFT domain of the frequency, phase and magnitude of stationary sinusoids

Authors
Ferreira, AJS;

Publication
PROCEEDINGS OF THE 2001 IEEE WORKSHOP ON THE APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS

Abstract
This paper addresses the extraction of parametric information in an audio coder that uses the MDCT filter bank. The computation of the filter bank is reformulated as a function of the Odd-DFT in order to allow the estimation of the frequency, phase and magnitude of stationary sinusoids. Closed-form expressions delivering accurate estimates are derived and explained, and their implementation and accuracy are illustrated in a Web page that includes a demonstration Matlab M-file.
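
The general idea of estimating a stationary sinusoid's frequency, magnitude and phase from an ODFT peak can be sketched as follows. The quadratic interpolation used here is a generic stand-in for the paper's closed-form expressions, and the sample rate, frame length and Hann window are assumptions for the example.

```python
# Illustrative sketch: peak picking plus quadratic interpolation in the ODFT domain.
import numpy as np

SR, N = 48000, 1024            # sample rate and frame length (assumed)

def odft(x):
    n = np.arange(len(x))
    return np.fft.fft(x * np.exp(-1j * np.pi * n / len(x)))

def estimate_sinusoid(frame):
    X = odft(frame * np.hanning(len(frame)))
    half = len(frame) // 2
    mag_db = 20 * np.log10(np.abs(X[:half]) + 1e-12)
    k = int(np.argmax(mag_db[1:half - 1])) + 1           # peak bin, edges excluded
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    delta = 0.5 * (a - c) / (a - 2 * b + c)              # fractional bin offset
    freq = (k + 0.5 + delta) * SR / N                    # ODFT bin k maps to (k + 1/2)*SR/N
    mag = b - 0.25 * (a - c) * delta                     # interpolated peak level (dB)
    phase = np.angle(X[k])                               # coarse phase estimate
    return freq, mag, phase

# quick check on a synthetic 1 kHz sinusoid
t = np.arange(N) / SR
print(estimate_sinusoid(np.cos(2 * np.pi * 1000.0 * t + 0.3)))
```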

1996

Convolutional effects in transform coding with TDAC: An optimal window

Authors
Ferreira, AJS;

Publication
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING

Abstract
Perceptual coders have proven to be highly efficient in the context of audio or video applications involving bit rate reduction. However, this efficiency is strongly limited in very low bit rate coding conditions. This paper studies the multiplicative effects of quantization in the frequency domain, when an overlapped filter bank (TDAC) is used to shape the quantization noise in a perceptually optimal way. The associated circular convolution operation generates aliased components in the time domain that are examined and subjected to minimization. A closed form expression is suggested to approximate an optimal transform window offering a desired tradeoff between the reduction of the time artifacts produced by a coarse quantization and the reduction of the stop-band leakage, relative to other transform windows commonly used.
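
For context, the sketch below shows the sine window commonly used with TDAC/MDCT filter banks and numerically checks the Princen-Bradley perfect-reconstruction condition. This is the usual baseline window, not the optimized window proposed in the paper, and the window length is an assumption.

```python
# Illustrative check of the TDAC perfect-reconstruction condition
# w[n]^2 + w[n + N/2]^2 = 1 for the (symmetric) sine window.
import numpy as np

N = 2048                                   # window length (assumed)
n = np.arange(N)
w = np.sin(np.pi * (n + 0.5) / N)          # sine window

pr = w[: N // 2] ** 2 + w[N // 2:] ** 2    # should be all ones
print(np.allclose(pr, 1.0))                # True
```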

2009

Automatic Recognition of Isolated Vowels Using F0-Normalized Harmonic Features

Authors
Ferreira, A;

Publication
E-BUSINESS AND TELECOMMUNICATIONS

Abstract
Human recognition of isolated vowels is quite robust considering intra- and inter-speaker variability. Automatic recognition techniques typically exhibit poor performance, notably in the case of female or child speech, because a higher fundamental frequency (F0) generates a sparser sampling of the magnitude spectrum. In this paper we extend previous results on a perceptually motivated concept of vowel recognition that is based on Perceptual Spectral Clusters (PSC) of harmonic partials. We study the effect of normalizing relevant PSC features by F0, taking as a reference the recognition performance of static features derived from either Linear Prediction (LP) analysis or Mel-Frequency Cepstral Coefficients (MFCC), and using the Mahalanobis distance on a database of five natural Portuguese vowel sounds uttered by 44 speakers. Test results reveal that the recognition performance of F0-normalized PSC features increases, approaching that of MFCC features. These results are significant since PSC-related features are amenable to concurrent vowel identification while LP- or MFCC-related features are not.
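
The minimum-Mahalanobis-distance classification step mentioned in the abstract can be sketched as follows. Feature extraction (PSC, LP or MFCC features) is not shown, and the training data layout is an assumption for the example.

```python
# Illustrative sketch: per-class Gaussian statistics and minimum-Mahalanobis
# classification of a feature vector.
import numpy as np

def fit_classes(features, labels):
    """Per-vowel mean vector and (pseudo-)inverse covariance matrix.
    features: (n_samples, n_dims) array; labels: (n_samples,) array."""
    model = {}
    for v in set(labels):
        X = features[labels == v]
        mu = X.mean(axis=0)
        inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
        model[v] = (mu, inv_cov)
    return model

def mahalanobis(x, mu, inv_cov):
    d = x - mu
    return float(d @ inv_cov @ d)

def classify(x, model):
    """Return the vowel label whose class statistics are closest to x."""
    return min(model, key=lambda v: mahalanobis(x, *model[v]))
```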

2012

Accurate analysis and visual feedback of vibrato in singing

Authors
Ventura, J; Sousa, R; Ferreira, A;

Publication
5th International Symposium on Communications Control and Signal Processing, ISCCSP 2012

Abstract
Vibrato is a frequency modulation effect of the singing voice and is very relevant in musical terms. Its most important characteristics are the vibrato frequency (in Hertz) and the vibrato extension (in semitones). In singing teaching and learning, it is very convenient to provide a visual feedback of those two objective signal characteristics, in real-time. In this paper we describe an algorithm performing vibrato detection and analysis. Since this capability depends on fundamental frequency (F0) analysis of the singing voice, we first discuss F0 estimation and compare three algorithms that are used in voice and speech analysis. Then we describe the vibrato detection and analysis algorithm and assess its performance using both synthetic and natural singing signals. Overall, results indicate that the relative estimation errors in vibrato frequency and extension are lower than 0.1%. © 2012 IEEE.
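
A minimal sketch of vibrato rate and extent estimation from an F0 contour is given below. It is not the paper's algorithm: the hop rate, the 3-10 Hz search band and the peak-to-peak definition of extent are assumptions, and real systems also need voicing detection and far more careful F0 estimation.

```python
# Illustrative sketch: vibrato rate (Hz) and extent (semitones) from an F0 contour.
import numpy as np

def vibrato_parameters(f0_hz, hop_rate):
    """f0_hz: F0 contour in Hz; hop_rate: contour sampling rate in frames/s (assumed)."""
    semitones = 12.0 * np.log2(f0_hz / np.mean(f0_hz))   # contour around its mean
    win = np.hanning(len(semitones))
    spec = np.fft.rfft(semitones * win)
    freqs = np.fft.rfftfreq(len(semitones), d=1.0 / hop_rate)
    band = (freqs >= 3.0) & (freqs <= 10.0)              # typical vibrato range (assumed)
    k = np.argmax(np.abs(spec) * band)                   # dominant modulation bin
    amplitude = 2.0 * np.abs(spec[k]) / np.sum(win)      # modulation depth in semitones
    return freqs[k], 2.0 * amplitude                     # rate (Hz), peak-to-peak extent

# example: 6 Hz vibrato of +/- 1 semitone around 440 Hz, contour at 100 frames/s
t = np.arange(300) / 100.0
f0 = 440.0 * 2.0 ** (np.sin(2 * np.pi * 6.0 * t) / 12.0)
print(vibrato_parameters(f0, 100.0))                     # roughly (6.0, 2.0)
```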
