2016
Authors
Ferreira, A; Sinha, D;
Publication
140th Audio Engineering Society International Convention 2016, AES 2016
Abstract
In recent years, tools in perceptual coding of high-quality audio have been tailored to capture highly detailed information regarding signal components, so that they have gained an intrinsic ability to represent audio parametrically. In a recent paper, we described a first validation model of such an approach applied to parametric coding of wideband speech. In this paper we describe specific advances to that approach that improve coding efficiency and signal quality. Special focus is given to avoiding persistent transmission of phase information to the decoder, to the synthesis of both impulse-like and noise-based plosives using short-term windows, to improved ways of spectral envelope modelling, and to allowing direct time-domain synthesis of the periodic content of speech in order to cope with fast F0 changes. A few examples of signal coding and transformation illustrate the impact of those improvements.
2017
Authors
Lobo, J; Ferreira, L; Ferreira, AJ;
Publication
Health Care Delivery and Clinical Science
Abstract
2018
Authors
Ferreira, A;
Publication
ICETE 2018 - Proceedings of the 15th International Joint Conference on e-Business and Telecommunications
Abstract
In this paper we report on a number of speaker identification experiments that assume the existence of a phonetic-oriented segmentation scheme, which motivates the extraction of psychoacoustically-motivated phase and pitch related features. MFCC features are also considered for benchmarking. Emphasis is given to an innovative shift-invariant phase-related feature that is closely linked to the glottal source. A very simple statistical model is proposed and adapted in order to highlight the relative discrimination capabilities of different feature types. Results are presented for individual features, and the possibilities of fusing features at the speaker modeling stage, or fusing distances at the speaker identification stage, are also discussed.
2019
Authors
Ferreira, AJ;
Publication
2019 AES INTERNATIONAL CONFERENCE ON AUDIO FORENSICS
Abstract
Automatic speaker identification typically relies on sophisticated statistical modeling and classification, which requires large amounts of data for good performance. However, in actual audio forensics casework, frequently only a few seconds of speech material are available. In this paper, we favor diversity in feature extraction, simple modeling and classification, and constructive combination of congruent classification scores. We use phase, spectral magnitude and F0-related features in speaker identification experiments on a database of 35 speakers, most of whom are twins. Using only 4.4 s of vowel-like sounds per speaker, we characterize the performance that is reached with individual features and we characterize simple yet effective ways of classification score fusion. Insights for further research are also presented.
2018
Authors
Ferreira, AJ; Tribolet, JM;
Publication
DAFx 2018 - Proceedings: 21st International Conference on Digital Audio Effects
Abstract
This paper addresses a phase-related feature that is time-shift invariant, and that expresses the relative phases of all harmonics with respect to that of the fundamental frequency. We identify the feature as Normalized Relative Delay (NRD) and we show that it is particularly useful to describe the holistic phase properties of voiced sounds produced by a human speaker, notably vowel sounds. We illustrate the NRD feature with real data that is obtained from five sustained vowels uttered by 20 female speakers and 17 male speakers. It is shown that not only do NRD coefficients carry idiosyncratic information, but their estimation is also quite stable and robust for all harmonics encompassing, for most vowels, at least the first four formant frequencies. The average NRD model that is estimated using data pertaining to all speakers in our database is compared to that of the idealized Liljencrants-Fant (LF) and Rosenberg glottal models. We also present results on the phase effects of linear-phase FIR and IIR vocal tract filter models when a plausible source excitation is used that corresponds to the derivative of the LF glottal flow model. These results suggest that the shape of NRD feature vectors is mainly determined by the glottal pulse and only marginally affected by either the group delay of the vocal tract filter model, or by the acoustic coupling between glottis and vocal tract structures.
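The time-shift invariance described in the abstract above can be illustrated with a minimal sketch. It is not the paper's NRD definition, which the abstract does not give; it assumes a common relative-phase construction in which k times the phase of the fundamental is subtracted from the phase of harmonic k and the result is normalized to one cycle. The function names `relative_harmonic_phase` and `shift_phases` are hypothetical.

```python
import numpy as np

def relative_harmonic_phase(phases):
    """Relative phase of each harmonic with respect to the fundamental.

    phases[k-1] is the phase (radians) of harmonic k.
    ASSUMPTION: this construction is illustrative, not the paper's formula.
    Returns values in cycles, wrapped to [0, 1).
    """
    phases = np.asarray(phases, dtype=float)
    k = np.arange(1, len(phases) + 1)      # harmonic indices 1..K
    rel = phases - k * phases[0]           # cancels any common time shift
    return np.mod(rel / (2 * np.pi), 1.0)

def shift_phases(phases, f0, tau):
    """Phases of the same harmonics after delaying the signal by tau seconds."""
    phases = np.asarray(phases, dtype=float)
    k = np.arange(1, len(phases) + 1)
    return phases - 2 * np.pi * k * f0 * tau   # harmonic k rotates by 2*pi*k*f0*tau

# Hypothetical harmonic phases and a hypothetical delay
phases = [0.3, 1.1, -2.0, 0.7]
original = relative_harmonic_phase(phases)
delayed = relative_harmonic_phase(shift_phases(phases, f0=120.0, tau=0.0037))
```

A delay of tau seconds rotates harmonic k by 2π·k·f0·tau, so subtracting k times the fundamental's phase removes the delay term, and `original` and `delayed` agree up to rounding.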
2020
Authors
Ferreira, A; Silva, J; Brito, F; Sinha, D;
Publication
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING
Abstract
Harmonic representation models are widely used, notably in speech coding and synthesis. In this paper, we describe two fully parametric harmonic representation and signal reconstruction alternatives that rely on a shift-invariant harmonic phase model and that implement accurate frame-based synthesis in the frequency-domain, and accurate pitch pulse-based synthesis in the time-domain. We use natural spoken and sung voice signals in order to assess the objective and subjective quality of both alternatives when parameters are exact, and when they are replaced by compact and shift-invariant harmonic phase and magnitude approximation models. We highlight the flexibility of these models and present results indicating that not only does the compact shift-invariant phase model cause a smaller impact than that caused by harmonic magnitude modeling, but it also compares favorably to results presented in the literature.