Publications

Publications by Aníbal Ferreira

2011

Estimation of harmonic and noise components of the glottal excitation

Authors
Sousa, R; Ferreira, A; Alku, P;

Publication
Models and Analysis of Vocal Emissions for Biomedical Applications - 7th International Workshop, MAVEBA 2011

Abstract
This paper describes an algorithm that splits the glottal excitation of voiced speech into harmonic and noise components. The algorithm applies a straightforward harmonic and noise splitter prior to glottal inverse filtering. The results show improved estimates of the glottal excitation in comparison to a known inverse filtering method.

2012

Speaker identification using phonetic segmentation and normalized relative delays of source harmonics

Authors
Mendes, D; Ferreira, A;

Publication
Proceedings of the AES International Conference

Abstract
Current state-of-the-art speaker identification systems achieve high performance in reasonably well controlled conditions. However, some scenarios still pose significant challenges, particularly in audio forensics, where voice records are typically just a few seconds long and are severely affected by distortion, interference, and abnormal speaking attitudes. In this paper we are inspired by the concept of minutiae in the context of fingerprinting, and try to extract localized, phase-related singularities from the speech signal denoting glottal source idiosyncratic information. First, we perform MFCC+GMM experiments in order to find the most effective phonetic segmentation of the speech signal for speaker modelling and discrimination. Secondly, we rely on effective phonetic segmentation and, in addition to MFCC features, we extract Normalized Relative Delays (NRDs) obtained from the phase of spectral harmonics. We use a Nearest Neighbour generalized classifier for speaker modelling and identification. Our results indicate that, by combining careful phonetic segmentation with the inclusion of phase-related information, performance in speaker identification may increase significantly. Copyright © 2012 Audio Engineering Society, Inc.

2005

Accurate spectral replacement

Authors
Ferreira, AJS; Sinha, D;

Publication
Audio Engineering Society - 118th Convention Spring Preprints 2005

Abstract
Recent advances in perceptual audio coding are strongly based on the concept of bandwidth extension. Most techniques implementing bandwidth extension require an analysis/synthesis filter bank in addition to that used by the associated perceptual audio coder, which increases the overall system complexity and coding delay, and makes it difficult to correctly align the operation of the audio coder with the operation of the bandwidth extension technique. We present a new Accurate Spectral Replacement (ASR) technique that is based on a suitable decomposition of the MDCT filter bank, and that implements synthesis of sinusoidal components with an accuracy much higher than the natural frequency resolution of the filter bank. The ASR technique is described, its performance is assessed with both synthetic and natural audio signals, and its main areas of application are addressed. Audio demos are available at http://www.atc-labs.com/asr/.

2005

A new low-delay codec for two-way high-quality audio communication

Authors
Ferreira, AJS; Sinha, D;

Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005

Abstract
High-quality audio bit-rate reduction systems are widely used in many application areas involving audio broadcast, streaming and download services. With the advent of 3G mobile and wireless communication networks, there is a clear opportunity for new multimedia services, notably those relying on two-way high-quality audio communication. In this paper we describe a new source/perceptual audio coder that features low delay, intrinsic error robustness and high subjective audio quality at competitive compression ratios. The structure of the audio coder is described and emphasis is placed on its innovative approaches to semantic signal segmentation and decomposition, independent coding of sinusoidal and noise components, and bandwidth extension using Accurate Spectral Replacement. A few test results are presented that illustrate the operation and performance of the new coder.

2005

A new broadcast quality low bit rate audio coding scheme utilizing novel bandwidth extension tools

Authors
Sinha, D; Ferreira, AJS;

Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005

Abstract
In this paper we describe the components of a novel audio coding algorithm capable of delivering high-fidelity CD-like stereo audio at bit rates of 40-48 kbps and natural sounding FM grade mono at bit rates of 18-22 kbps. Bandwidth Extension has emerged as an important tool for the satisfactory performance of low bit rate audio codecs. Recently we proposed two new bandwidth extension algorithms, Fractal Self-Similarity Model (FSSM) and Accurate Spectral Replacement (ASR), which belong to a new class of Bandwidth Extension techniques that are applied directly to the high resolution frequency representation of the signal (e.g., MDCT or ODFT). The proposed coding scheme uses FSSM and ASR in an adaptive and complementary framework. Another important component of the proposed codec is a wideband psychoacoustic model that makes explicit use of the Comodulation Release of Masking (CMR) phenomenon. It also includes a novel parametric stereo coding technique. The proposed audio coding scheme is geared towards broadcast applications where codec latency and encoder complexity are generally not an overriding concern. In this paper we present algorithmic details of the new codec, audio demonstrations, and a comparison with other audio coding schemes. Further information and audio demonstrations are available at http://www.atc-labs.com/teslapro.

2005

A new class of smooth power complementary windows and their application to audio signal processing

Authors
Sinha, D; Ferreira, AJS;

Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005

Abstract
In this paper we describe a new family of smooth power complementary windows which exhibit a very high level of localization in both the time and frequency domains. This window family is parameterized by a "smoothness quotient". As the smoothness quotient increases, the window becomes increasingly localized in time (most of the energy gets concentrated in the center half of the window) and frequency (far field rejection becomes increasingly stronger, on the order of 150 dB or higher). A closed form solution for such a window function exists and the associated design procedure is described. The new class of windows is quite attractive for a number of applications as switching functions, equalization functions, or as windows for overlap-add and modulated filter banks. An extension to the family of smooth windows which exhibits improved near-field response in the frequency domain is also discussed. More information is available at http://www.atc-labs.com/technology/misc/windows.
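
As a rough illustration of the two properties this abstract names (power complementarity and energy concentration in the center half), the sketch below checks them on the classic sine window. The sine window is a stand-in chosen for illustration; it is not the window family proposed in the paper, whose closed form is not given here. The function names and the 50% overlap assumption are likewise hypothetical.

```python
import math


def sine_window(length):
    """Classic sine window (illustrative stand-in, not the paper's family)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def power_complementarity_error(window):
    """Largest deviation of w[n]^2 + w[n + N/2]^2 from 1, assuming 50% overlap."""
    half = len(window) // 2
    return max(
        abs(window[n] ** 2 + window[n + half] ** 2 - 1.0) for n in range(half)
    )


def center_half_energy_fraction(window):
    """Fraction of the window energy that lies in its center half."""
    length = len(window)
    quarter = length // 4
    total = sum(w * w for w in window)
    center = sum(w * w for w in window[quarter:length - quarter])
    return center / total


if __name__ == "__main__":
    w = sine_window(1024)
    print("power complementarity error:", power_complementarity_error(w))
    print("center-half energy fraction:", center_half_energy_fraction(w))
```

A window that passes the first check can be used for both analysis and synthesis in a 50% overlap-add scheme without changing the signal power. The second figure gives a simple way to compare time localization between candidate windows.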
