2011
Authors
Sousa, R; Ferreira, A; Alku, P;
Publication
Models and Analysis of Vocal Emissions for Biomedical Applications - 7th International Workshop, MAVEBA 2011
Abstract
This paper describes an algorithm which enables harmonic and noise splitting of the glottal excitation of voiced speech. The algorithm uses a straightforward harmonic and noise splitter that is applied prior to glottal inverse filtering. The results show improved estimates of the glottal excitation in comparison to a known inverse filtering method.
2012
Authors
Mendes, D; Ferreira, A;
Publication
Proceedings of the AES International Conference
Abstract
Current state-of-the-art speaker identification systems achieve high performance in reasonably well controlled conditions. However, some scenarios still pose significant challenges, particularly in audio forensics when voice records are typically just a few seconds long and are severely affected by distortion, interferences, and abnormal speaking attitudes. In this paper we are inspired by the concept of minutiae in the context of fingerprinting, and try to extract localized, phase-related singularities from the speech signal denoting glottal source idiosyncratic information. First, we perform MFCC+GMM experiments in order to find the most effective phonetic segmentation of the speech signal for speaker modelling and discrimination. Secondly, we rely on effective phonetic segmentation and, in addition to MFCC features, we extract Normalized Relative Delays (NRDs) obtained from the phase of spectral harmonics. We use a Nearest Neighbour generalized classifier for speaker modelling and identification. Our results indicate that, by combining careful phonetic segmentation with the inclusion of phase-related information, performance in speaker identification may increase significantly. Copyright © 2012 Audio Engineering Society, Inc.
2005
Authors
Ferreira, AJS; Sinha, D;
Publication
Audio Engineering Society - 118th Convention Spring Preprints 2005
Abstract
Recent advances in perceptual audio coding are strongly based on the concept of bandwidth extension. Most techniques implementing bandwidth extension require an analysis/synthesis filter bank in addition to that used by the associated perceptual audio coder, which increases the overall system complexity and coding delay, and makes difficult the correct alignment between the operation of the audio coder and the operation of the bandwidth extension technique. We present a new Accurate Spectral Replacement (ASR) technique that is based on a suitable decomposition of the MDCT filter bank, and that implements synthesis of sinusoidal components with an accuracy much higher than the natural frequency resolution of the filter bank. The ASR technique is described, its performance is assessed with both synthetic and natural audio signals, and its main areas of application are addressed. Audio demos are available at http://www.atc-labs.com/asr/.
2005
Authors
Ferreira, AJS; Sinha, D;
Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005
Abstract
High-quality audio bit-rate reduction systems are widely used in many application areas involving audio broadcast, streaming and download services. With the advent of 3G mobile and wireless communication networks, there is a clear opportunity for new multimedia services, notably those relying on two-way high-quality audio communication. In this paper we describe a new source/perceptual audio coder that features low-delay, intrinsic error robustness and high subjective audio quality at competitive compression ratios. The structure of the audio coder is described and emphasis is given to its innovative approaches to semantic signal segmentation and decomposition, independent coding of sinusoidal and noise components, and bandwidth extension using Accurate Spectral Replacement. A few test results are presented that illustrate the operation and performance of the new coder.
2005
Authors
Sinha, D; Ferreira, AJS;
Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005
Abstract
In this paper we describe the components of a novel audio coding algorithm capable of delivering high-fidelity CD-like stereo audio at bit rates of 40-48 kbps and natural sounding FM grade mono at bit rates of 18-22 kbps. Bandwidth Extension has emerged as an important tool for the satisfactory performance of low bit rate audio codecs. Recently we proposed two new bandwidth extension algorithms, Fractal Self-Similarity Model (FSSM) and Accurate Spectral Replacement (ASR), which belong to a new class of Bandwidth Extension techniques that are applied directly to the high resolution frequency representation of the signal (e.g., MDCT or ODFT). The proposed coding scheme uses FSSM and ASR in an adaptive and complementary framework. Another important component of the proposed codec is a wideband psychoacoustic model that makes explicit use of the Comodulation Release of Masking (CMR) phenomenon. It also includes a novel parametric stereo coding technique. The proposed audio coding scheme is geared towards broadcast applications where codec latency and encoder complexity are generally not an overriding concern. In this paper we present algorithmic details of the new codec, audio demonstrations, and a comparison to other audio coding schemes. Further information and audio demonstrations are available at http://www.atc-labs.com/teslapro.
2005
Authors
Sinha, D; Ferreira, AJS;
Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005
Abstract
In this paper we describe a new family of smooth power complementary windows which exhibit a very high level of localization in both the time and frequency domains. This window family is parameterized by a "smoothness quotient". As the smoothness quotient increases, the window becomes increasingly localized in time (most of the energy gets concentrated in the center half of the window) and frequency (far field rejection becomes increasingly stronger, to the order of 150 dB or higher). A closed form solution for such a window function exists and the associated design procedure is described. The new class of windows is quite attractive for a number of applications as switching functions, equalization functions, or as windows for overlap-add and modulated filter banks. An extension to the family of smooth windows which exhibits improved near-field response in the frequency domain is also discussed. More information is available at http://www.atc-labs.com/technology/misc/windows.