2020
Authors
Ferreira, A; Silva, J; Brito, F; Sinha, D;
Publication
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING
Abstract
Harmonic representation models are widely used, notably in speech coding and synthesis. In this paper, we describe two fully parametric harmonic representation and signal reconstruction alternatives that rely on a shift-invariant harmonic phase model and that implement accurate frame-based synthesis in the frequency domain, and accurate pitch pulse-based synthesis in the time domain. We use natural spoken and sung voice signals in order to assess the objective and subjective quality of both alternatives when parameters are exact, and when they are replaced by compact and shift-invariant harmonic phase and magnitude approximation models. We highlight the flexibility of these models and present results indicating that not only does the compact shift-invariant phase model cause a smaller impact than that caused by harmonic magnitude modeling, but it also compares favorably to results presented in the literature.
2020
Authors
Silva, JP; Oliveira, MA; Cardoso, CF; Ferreira, AJ;
Publication
IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
Abstract
In this paper, we present a computationally efficient and fully parametric harmonic speech model that is suitable for real-time flexible frame-based analysis and synthesis implementation in the frequency domain. We carry out a performance comparison between this vocoder and similar ones, such as WORLD and HPMD. Then, the speaker's fundamental frequency micro-variations are deliberately manipulated in order to understand how they convey prosodic and idiosyncratic information. We conclude by evaluating the impact of these manipulations through perceptual tests. © 2020 IEEE.
2021
Authors
Silva, J; Oliveira, M; Ferreira, A;
Publication
28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020)
Abstract
Whispered-voice to normal-voice conversion is typically achieved using codec-based analysis and re-synthesis, using statistical conversion of important spectral and prosodic features, or using data-driven end-to-end signal conversion. These approaches are, however, highly constrained by the architecture of the codec, the statistical projection, or the size and quality of the training data. In this paper, we presume direct implantation of voiced phonemes in whispered speech and we focus on fully flexible parametric models that i) can be independently controlled, ii) synthesize natural and linguistically correct voiced phonemes, iii) preserve idiosyncratic characteristics of a given speaker, and iv) are amenable to co-articulation effects through simple model interpolation. We use natural spoken and sung vowels to illustrate these capabilities in a signal modeling and re-synthesis process where spectral magnitude, phase structure, F0 contour and sound morphing can be independently controlled in arbitrary ways.
2023
Authors
Silva, JM; Oliveira, MA; Saraiva, AF; Ferreira, AJS;
Publication
ACOUSTICS
Abstract
The estimation of the frequency of sinusoids has been the object of intense research for more than 40 years. Its importance in classical fields such as telecommunications, instrumentation, and medicine has been extended to numerous specific signal processing applications involving, for example, speech, audio, and music processing. In many cases, these applications run in real time and, thus, require accurate, fast, and low-complexity algorithms. Taking the normalized Cramér-Rao lower bound as a reference, this paper evaluates the relative performance of nine non-iterative discrete Fourier transform (DFT)-based individual sinusoid frequency estimators when the target sinusoid is affected by full-bandwidth quasi-harmonic interference, in addition to stationary noise. Three levels of quasi-harmonic interference severity are considered: no harmonic interference, mild harmonic interference, and strong harmonic interference. Moreover, the harmonic interference is amplitude-modulated and frequency-modulated, reflecting real-world conditions, e.g., in singing and musical chords. Results are presented for Signal-to-Noise Ratios (SNR) between -10 dB and 70 dB, and they reveal that the relative performance of different frequency estimators depends on the SNR and on the selectivity and leakage of the window that is used, but also changes drastically as a function of the severity of the quasi-harmonic interference. In particular, when this interference is strong, the performance curves of the majority of the tested frequency estimators collapse to a few trends around and above 0.4% of the DFT bin width.
2023
Authors
Silva, JM; Nogueira, AR; Pinto, J; Alves, AC; Sousa, R;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I
Abstract
Effective quality control is essential for efficient and successful manufacturing processes in the era of Industry 4.0. Artificial Intelligence solutions are increasingly employed to enhance the accuracy and efficiency of quality control methods. In Computer Numerical Control machining, one challenge is identifying and verifying specific patterns of interest or trends in a time-series dataset, a task made difficult by the extensive diversity involved. Therefore, this work aims to develop a methodology capable of verifying the presence of a specific pattern of interest in a given collection of time series. This study mainly focuses on evaluating One-Class Classification techniques using Linear Frequency Cepstral Coefficients to describe the patterns in the time series. A real-world dataset produced by turning machines was used, where a time series with a certain pattern needed to be verified to monitor the wear offset. The initial findings reveal that the classifiers can accurately distinguish between the target pattern of the time series and the remaining data. Specifically, the One-Class Support Vector Machine achieves a classification accuracy of 95.6% ± 1.2 and an F1-score of 95.4% ± 1.3.
2023
Authors
Oliveira, M; Almeida, V; Silva, J; Ferreira, A;
Publication
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Abstract
Cricket sounds are usually regarded as pleasant and, thus, can be used as suitable test signals in psychoacoustic experiments assessing human listening acuity to specific temporal and spectral features. In addition, the simple structure of cricket sounds makes them amenable to reverse engineering, such that they can be analyzed and re-synthesized with desired alterations in their defining parameters. This paper describes cricket sounds from a parametric point of view, characterizes their main temporal and spectral features, namely jitter, shimmer and frequency sweeps, and explains a re-synthesis process generating modified natural cricket sounds. These are subsequently used in listening tests that help shed light on the sound identification and discrimination capabilities of humans that are important, for example, in voice recognition. © 2023 IEEE.