
Publications by Aníbal Ferreira

1998

A new frequency domain approach to time-scale expansion of audio signals

Authors
Ferreira, AJS;

Publication
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6

Abstract
We present a new algorithm for time-scale expansion of audio signals that comprises time interpolation, frequency-scale expansion, and modification of a spectral representation of the signal. The algorithm relies on an accurate model of signal analysis and synthesis, and was constrained to a non-iterative modification of the magnitudes and the wrapped phases of the relevant sinusoidal components of the signal. The structure of the algorithm is described and its performance is illustrated. A few examples of time-expanded wideband speech can be found on the Internet.
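
For illustration only, a minimal Python sketch of a generic phase-vocoder-style time-scale expansion follows. It is not the paper's exact analysis/synthesis model: the window, FFT size, hop sizes, and phase handling are assumptions, using the textbook magnitude/phase update rather than the constrained wrapped-phase modification described in the abstract.

import numpy as np

def time_expand(x, factor=2.0, n_fft=1024, hop_a=256):
    """Stretch x in time by `factor` using frame-by-frame magnitude/phase updates."""
    hop_s = int(round(hop_a * factor))                 # synthesis hop > analysis hop
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_a
    y = np.zeros(n_frames * hop_s + n_fft)
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop_a / n_fft   # nominal phase advance per hop
    prev_phase = np.zeros(n_fft // 2 + 1)
    acc_phase = np.zeros(n_fft // 2 + 1)
    for m in range(n_frames):
        frame = x[m * hop_a : m * hop_a + n_fft] * win
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Wrapped deviation of the measured phase increment from the nominal one:
        dev = phase - prev_phase - omega
        dev -= 2 * np.pi * np.round(dev / (2 * np.pi))
        true_freq = (omega + dev) / hop_a              # refined frequency, rad/sample
        acc_phase += true_freq * hop_s                 # advance phases at the synthesis rate
        y[m * hop_s : m * hop_s + n_fft] += np.fft.irfft(mag * np.exp(1j * acc_phase)) * win
        prev_phase = phase
    return y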

2012

Evolutionary Algorithms and Automatic Transcription of Music

Authors
Reis, G; Fernandez, F; Ferreira, A;

Publication
PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION COMPANION (GECCO'12)

Abstract
The main difficulty behind Automatic Transcription (Multiple Fundamental Frequency - F0 - Estimation) lies in its complexity. Harmonic collision and partial overlapping create a frequency lattice that is almost impossible to deconstruct. Although traditional approaches to this problem rely mainly on Digital Signal Processing (DSP) techniques, evolutionary algorithms have recently been applied to it and have achieved competitive results. We describe all evolutionary approaches to the problem of automatic music transcription and how some were improved so that they could achieve competitive results. Finally, we show how the best evolutionary approach performs on piano transcription when compared with the state of the art.
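
As a rough illustration of how an evolutionary search can be applied to multiple-F0 estimation, the Python toy below evolves sets of candidate MIDI notes against an observed magnitude spectrum. The synthesizer, fitness function, operators, and all constants are assumptions made for this sketch, not the methods compared in the paper.

import random
import numpy as np

N_BINS, SR, PARTIALS = 2048, 44100, 8
POP_SIZE, GENERATIONS, MUT_RATE = 100, 200, 0.05
NOTE_RANGE = range(36, 96)                              # candidate MIDI pitches

def synthesize_spectrum(notes):
    """Very rough internal synthesizer: decaying harmonic peaks for each note."""
    spec = np.zeros(N_BINS)
    for midi in notes:
        f0 = 440.0 * 2 ** ((midi - 69) / 12)
        for h in range(1, PARTIALS + 1):
            k = int(round(h * f0 * 2 * N_BINS / SR))
            if k < N_BINS:
                spec[k] += 1.0 / h
    return spec

def fitness(notes, observed):
    """Higher is better: negative squared error between observed and synthesized spectra."""
    return -float(np.sum((observed - synthesize_spectrum(notes)) ** 2))

def evolve(observed):
    pop = [set(random.sample(NOTE_RANGE, k=random.randint(1, 6))) for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=lambda ind: fitness(ind, observed), reverse=True)
        survivors = pop[:POP_SIZE // 2]                 # truncation selection
        children = []
        while len(survivors) + len(children) < POP_SIZE:
            a, b = random.sample(survivors, 2)
            child = set(random.sample(sorted(a | b), k=max(1, (len(a) + len(b)) // 2)))
            if random.random() < MUT_RATE:              # mutation: toggle one random note
                child ^= {random.choice(NOTE_RANGE)}
            children.append(child or {random.choice(NOTE_RANGE)})
        pop = survivors + children
    return max(pop, key=lambda ind: fitness(ind, observed))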

2011

Concatenative singing voice resynthesis

Authors
Fonseca, N; Ferreira, A; Rocha, AP;

Publication
17th DSP 2011 International Conference on Digital Signal Processing, Proceedings

Abstract
The concept of capturing the sound of something for later replication is not new, and it is used in many synthesizers. But capturing sounds and using them as an audio effect is less common. This paper presents an approach to the resynthesis of a singing voice, based on concatenative techniques, that uses pre-recorded audio material as a high-level semantic audio effect, replacing an original audio recording with the sound of a different singer while trying to keep the same musical/phonetic performance. © 2011 IEEE.
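
A minimal sketch of the general idea of frame-level concatenative resynthesis follows, assuming per-frame feature vectors (e.g. pitch and spectral envelope) have already been extracted for both the input performance and the pre-recorded corpus; the selection criterion and overlap-add details are assumptions, not the paper's system.

import numpy as np

def select_units(target_feats, corpus_feats):
    """For each target frame, return the index of the nearest corpus frame."""
    indices = []
    for t in target_feats:
        d = np.linalg.norm(corpus_feats - t, axis=1)    # Euclidean feature distance
        indices.append(int(np.argmin(d)))
    return indices

def concatenate(indices, corpus_frames, hop):
    """Overlap-add the selected corpus audio frames at the target hop size."""
    n = corpus_frames.shape[1]
    win = np.hanning(n)
    y = np.zeros(len(indices) * hop + n)
    for m, i in enumerate(indices):
        y[m * hop : m * hop + n] += corpus_frames[i] * win
    return y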

2008

A Genetic Algorithm Approach with Harmonic Structure Evolution for Polyphonic Music Transcription

Authors
Reis, G; Fonseca, N; Fernandez, F; Ferreira, A;

Publication
ISSPIT: 8TH IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY

Abstract
This paper presents a Genetic Algorithm approach with Harmonic Structure Evolution for Polyphonic Music Transcription. Automatic Music Transcription is a very complex problem that remains unsolved due to the harmonic complexity of musical sounds. More traditional approaches try to extract the information directly from the audio stream, but, taking into account that a polyphonic audio stream is no more than a combination of several musical notes, music transcription can be addressed as a search-space problem where the goal is to find the sequence of notes that best models the audio signal. By taking advantage of the ability of genetic algorithms to explore large search spaces, we present a new approach to the music transcription problem. In order to reduce harmonic overfitting, several techniques were used, including the encoding of the harmonic structure of the internal synthesizer inside the individual's genotype as a way to evolve towards the instrument played in the original audio signal. The results obtained in polyphonic piano transcriptions show the feasibility of the approach.
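
The Python sketch below illustrates one way a genotype could couple a note list with an evolving harmonic-amplitude profile for an internal synthesizer, so that the timbre model can drift towards the instrument in the recording. The encoding, mutation operator, and constants are assumptions made for illustration, not the paper's exact representation.

import random
import numpy as np

N_HARMONICS = 10

def random_individual():
    """Genotype = candidate notes plus the harmonic structure of the internal synthesizer."""
    notes = sorted(random.sample(range(36, 96), k=random.randint(1, 5)))
    harmonics = np.random.rand(N_HARMONICS)             # relative partial amplitudes
    return {"notes": notes, "harmonics": harmonics / harmonics.sum()}

def mutate(ind, sigma=0.05):
    """Perturb both parts of the genotype; the harmonic profile evolves along with the notes."""
    harmonics = np.clip(ind["harmonics"] + np.random.normal(0, sigma, N_HARMONICS), 1e-6, None)
    child = {"notes": list(ind["notes"]), "harmonics": harmonics / harmonics.sum()}
    if random.random() < 0.2:                           # occasionally toggle one note
        child["notes"] = sorted(set(child["notes"]) ^ {random.randrange(36, 96)})
    return child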

1999

An odd-DFT based approach to time-scale expansion of audio signals

Authors
Ferreira, AJS;

Publication
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING

Abstract
A new time-scale expansion algorithm based on a frequency-scale modification approach combined with time interpolation is presented. The algorithm is noniterative and is constrained to a blind modification of the magnitudes and phases of the relevant spectral components of the signal, on a frame-by-frame basis. The resulting advantages and limitations are discussed. A few simplified models for signal analysis/synthesis are developed, the most critical of which concern phase and frequency estimation beyond the frequency resolution of the filterbank. The structure of the algorithm is described and its performance is illustrated with both synthetic and natural audio signals.
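
For reference, the odd-DFT places its analysis frequencies half a bin away from those of the standard DFT, X(k) = sum_n x(n) exp(-j*2*pi*n*(k + 1/2)/N). A small Python sketch of the transform and its inverse, computed through a half-bin pre-modulation and an ordinary FFT, is given below; the expansion algorithm itself is not reproduced here.

import numpy as np

def odft(frame):
    """Odd-DFT: X(k) = sum_n x(n) * exp(-1j * 2*pi * n * (k + 0.5) / N)."""
    n = np.arange(len(frame))
    return np.fft.fft(frame * np.exp(-1j * np.pi * n / len(frame)))  # half-bin pre-modulation

def iodft(spectrum):
    """Inverse odd-DFT: undo the half-bin modulation after the inverse FFT."""
    n = np.arange(len(spectrum))
    return np.fft.ifft(spectrum) * np.exp(1j * np.pi * n / len(spectrum))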

2007

Static features in real-time recognition of isolated vowels at high pitch

Authors
Ferreira, AJS;

Publication
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA

Abstract
This paper addresses the problem of automatic identification of vowels uttered in isolation by female and child speakers. In this case, the magnitude spectrum of voiced vowels is sparsely sampled since only frequencies at integer multiples of F0 are significant. This impacts negatively on the performance of vowel identification techniques that either ignore pitch or rely on global shape models. A new pitch-dependent approach to vowel identification is proposed that emerges from the concept of timbre and that defines perceptual spectral clusters (PSC) of harmonic partials. A representative set of static PSC-related features is estimated and its performance is evaluated in automatic classification tests using the Mahalanobis distance. Linear prediction features and Mel-frequency cepstral coefficients (MFCC) are used as a reference, and a database of five (Portuguese) natural vowel sounds uttered by 44 speakers (including 27 child speakers) is used for training and testing the Gaussian models. Results indicate that PSC features perform better than plain linear prediction features, but slightly worse than MFCC features. However, PSC features have the potential to take full advantage of the pitch structure of voiced vowels, namely in the analysis of concurrent voices, or by using pitch as a normalization parameter. (C) 2007 Acoustical Society of America.
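
As an illustration of the classification stage only, the Python sketch below fits one Gaussian model per vowel and classifies a feature vector by the smallest Mahalanobis distance; the PSC, linear prediction, and MFCC feature extraction is outside this sketch, and the class interface is an assumption.

import numpy as np

class MahalanobisVowelClassifier:
    def fit(self, X, labels):
        """X: (n_samples, n_features) array; labels: array of vowel labels, one per row."""
        labels = np.asarray(labels)
        self.models = {}
        for vowel in np.unique(labels):
            Xv = X[labels == vowel]
            mean = Xv.mean(axis=0)
            cov_inv = np.linalg.pinv(np.cov(Xv, rowvar=False))   # pseudo-inverse for stability
            self.models[vowel] = (mean, cov_inv)
        return self

    def predict(self, x):
        """Return the vowel whose Gaussian model is closest in Mahalanobis distance."""
        def distance(vowel):
            mean, cov_inv = self.models[vowel]
            d = x - mean
            return float(d @ cov_inv @ d)
        return min(self.models, key=distance)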
