Publications

Publications by Ricardo Teixeira Sousa

2022

Exploiting BIM Objects for Synthetic Data Generation toward Indoor Point Cloud Classification Using Deep Learning

Authors
Frias, E; Pinto, J; Sousa, R; Lorenzo, H; Diaz Vilarino, L;

Publication
JOURNAL OF COMPUTING IN CIVIL ENGINEERING

Abstract
Advances in technology are leading to more and more devices integrating sensors capable of acquiring data quickly and with high accuracy. Point clouds are no exception. Therefore, there is increased research interest in exploiting the large amount of available light detection and ranging (LiDAR) data through point cloud classification using artificial intelligence. Nevertheless, point cloud labeling is a time-consuming task. Hence the amount of labeled data is still scarce. Data synthesis is gaining attention as an alternative to increase the volume of classified data. At the same time, the amount of Building Information Models (BIMs) provided by manufacturers on website databases is increasing. In line with these recent trends, this paper presents a deep-learning framework for classifying point cloud objects based on synthetic data sets created from BIM objects. The method starts by transforming BIM objects into point clouds, deriving a data set consisting of 21 object classes characterized by various perturbation patterns. Then, the data set is split into four subsets to carry out the evaluation of synthetic data on the implemented flexible two-dimensional (2D) deep neural framework. In the latter, binary or greyscale images can be generated from point clouds by either orthographic or perspective projection to feed the network. Moreover, the surface variation feature was computed in order to aggregate more geometric information to images and to evaluate how it influences the object classification. The overall accuracy is over 85% in all tests when orthographic images are used. Also, the use of greyscale images representing surface variation improves performance in almost all tests, although the computation of this feature may not be robust in point clouds with complex geometry or perturbations. © 2022 American Society of Civil Engineers.

2011

Importance of the relative delay of glottal source harmonics

Authors
Sousa, R; Ferreira, A;

Publication
Proceedings of the AES International Conference

Abstract
In this paper we focus on the real-time frequency domain analysis of speech signals, and on the extraction of suitable and perceptually meaningful features that are related to the glottal source and that may pave the way for robust speaker identification and voice register classification. We take advantage of an analysis-synthesis framework derived from an audio coding algorithm in order to estimate and model the relative delays between the different harmonics reflecting the contribution of the glottal source and the group delay of the vocal tract filter. We show in this paper that this approach effectively captures the shape invariance of a periodic signal and may be suited to monitor and extract in real-time perceptually important features correlating well with specific voice registers or with a speaker's unique sound signature. A first validation study is described that confirms the competitive performance of the proposed approach in the automatic classification of the breathy, normal and pressed voice phonation types.

2009

A NEW ACCURATE METHOD OF HARMONIC-TO-NOISE RATIO EXTRACTION

Authors
de Sousa, RJT;

Publication
BIOSIGNALS 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIO-INSPIRED SYSTEMS AND SIGNAL PROCESSING

Abstract
In this paper, an accurate method that estimates the HNR from sustained vowels based on harmonic structure modeling is proposed. Basically, the proposed algorithm creates an accurate harmonic structure where each harmonic is parameterized by frequency, magnitude and phase. The harmonic structure is then synthesized and assumed as the harmonic component of the speech signal. The noise component can be estimated by subtracting the harmonic component from the speech signal. The proposed algorithm was compared to other HNR extraction algorithms based on spectral, cepstral and time domain methods, and using different performance measures.
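
The subtraction step described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the harmonic parameters are assumed to be known in advance (the paper estimates them), and the function names `synthesize_harmonics` and `hnr_db`, the sample rate, and the test signal are all hypothetical choices made for illustration. The HNR is taken here as the usual energy ratio in decibels between the harmonic and noise components.

```python
import math
import random

def synthesize_harmonics(params, n_samples, fs):
    """Sum of cosines; params is a list of (frequency_hz, magnitude, phase_rad).
    Assumption: the parameters are already known, unlike in the paper."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * t / fs + ph) for f, a, ph in params)
        for t in range(n_samples)
    ]

def hnr_db(signal, harmonic):
    """HNR in dB: energy of the harmonic part over energy of the residual.
    The residual (signal minus harmonic) must not be identically zero."""
    noise = [s - h for s, h in zip(signal, harmonic)]
    e_h = sum(h * h for h in harmonic)
    e_n = sum(v * v for v in noise)
    return 10.0 * math.log10(e_h / e_n)

# Hypothetical test signal: three harmonics of 200 Hz plus scaled Gaussian noise.
fs = 8000.0
params = [(200.0, 1.0, 0.0), (400.0, 0.5, 0.3), (600.0, 0.25, 1.1)]
harmonic = synthesize_harmonics(params, 400, fs)

rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in harmonic]
# Scale the noise so that its energy is exactly 1/100 of the harmonic energy.
scale = math.sqrt(sum(h * h for h in harmonic) / (100.0 * sum(v * v for v in raw)))
speech = [h + scale * v for h, v in zip(harmonic, raw)]

hnr = hnr_db(speech, harmonic)
print(f"HNR = {hnr:.2f} dB")
```

Because the noise is scaled to one hundredth of the harmonic energy, the sketch should report an HNR close to 20 dB. With real speech, the accuracy of the result depends entirely on how well the harmonic parameters are estimated.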

2010

DFT-based frequency estimation under harmonic interference

Authors
Ferreira, A; Sousa, R;

Publication
Final Program and Abstract Book - 4th International Symposium on Communications, Control, and Signal Processing, ISCCSP 2010

Abstract
In this paper we address the accurate estimation of the frequency of sinusoids of natural signals such as singing, voice or music. These signals are intrinsically harmonic and are normally contaminated by noise. Taking the Cramér-Rao Lower Bound for unbiased frequency estimators as a reference, we compare the performance of several DFT-based frequency estimators that are non-iterative and that use the rectangular window or the Hanning window. Test conditions simulate harmonic interference and two new ArcTan-based frequency estimators are also included in the tests. Conclusions are presented on the relative performance of the different frequency estimators as a function of the SNR. ©2010 IEEE.

2010

Non-iterative frequency estimation in the DFT magnitude domain

Authors
Sousa, R; Ferreira, A;

Publication
Final Program and Abstract Book - 4th International Symposium on Communications, Control, and Signal Processing, ISCCSP 2010

Abstract
The accurate estimation of the frequency of sinusoids is a frequent task in many signal processing problems, including the real-time analysis of the singing voice. In this paper we rely on a single DFT magnitude spectrum in order to perform frequency estimation in a non-iterative way. Two new frequency estimation methods are derived that are matched to the time analysis window and that reduce the maximum absolute estimation error to about 0.1% of the bin width of the DFT. The performance of these methods is evaluated including the parabolic method as a reference, and considering the influence of noise. A combined model is proposed that offers higher noise robustness than that of a single model. ©2010 IEEE.
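
For readers unfamiliar with the parabolic method that this abstract uses as a reference, the following is a minimal sketch of the generic textbook version: a parabola is fitted to the log magnitudes of the peak DFT bin and its two neighbours, and the vertex gives a fractional bin estimate. This is not one of the two new estimators proposed in the paper. The Hann window, the test frequency, and the function names `dft_magnitudes` and `parabolic_peak` are assumptions chosen for illustration.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for a demo)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def parabolic_peak(mags):
    """Fractional bin of the largest peak, from a parabola fitted to the
    log magnitudes of the peak bin and its two neighbours."""
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    a, b, c = math.log(mags[k - 1]), math.log(mags[k]), math.log(mags[k + 1])
    delta = 0.5 * (a - c) / (a - 2.0 * b + c)  # vertex offset, in bins
    return k + delta

# Hypothetical test: one sinusoid located 0.3 bins above bin 32.
fs = 8000.0
n = 256
f_true = 1000.0 + 0.3 * fs / n
x = [
    (0.5 - 0.5 * math.cos(2.0 * math.pi * t / n))  # Hann window
    * math.sin(2.0 * math.pi * f_true * t / fs)
    for t in range(n)
]

est_bin = parabolic_peak(dft_magnitudes(x))
est_hz = est_bin * fs / n
print(f"true = {f_true:.3f} Hz, estimated = {est_hz:.3f} Hz")
```

With a Hann window the parabolic fit is only approximate, so a residual bias of a small fraction of a bin remains even without noise. That residual is the kind of error the window-matched estimators in the abstract aim to reduce.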

2012

Accurate analysis and visual feedback of vibrato in singing

Authors
Ventura, J; Sousa, R; Ferreira, A;

Publication
5th International Symposium on Communications Control and Signal Processing, ISCCSP 2012

Abstract
Vibrato is a frequency modulation effect of the singing voice and is very relevant in musical terms. Its most important characteristics are the vibrato frequency (in Hertz) and the vibrato extension (in semitones). In singing teaching and learning, it is very convenient to provide visual feedback of those two objective signal characteristics in real time. In this paper we describe an algorithm performing vibrato detection and analysis. Since this capability depends on fundamental frequency (F0) analysis of the singing voice, we first discuss F0 estimation and compare three algorithms that are used in voice and speech analysis. Then we describe the vibrato detection and analysis algorithm and assess its performance using both synthetic and natural singing signals. Overall, results indicate that the relative estimation errors in vibrato frequency and extension are lower than 0.1%. © 2012 IEEE.
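
The two quantities named in this abstract can be made concrete with a small sketch that measures them on a synthetic F0 contour. This is a simple stand-in, not the algorithm of the paper: it assumes the F0 contour is already available, estimates the vibrato frequency from upward zero crossings of the pitch deviation, and takes the extension as half the peak-to-peak deviation in semitones. The function name `vibrato_rate_and_extent`, the frame rate, and the test contour are hypothetical.

```python
import math

def vibrato_rate_and_extent(f0_hz, frame_rate_hz):
    """Return (rate in Hz, extent in semitones) of an F0 contour.
    Rate comes from interpolated upward zero crossings of the deviation
    from the mean pitch; extent is half the peak-to-peak deviation."""
    semis = [12.0 * math.log2(f) for f in f0_hz]
    mean = sum(semis) / len(semis)
    dev = [s - mean for s in semis]
    extent = (max(dev) - min(dev)) / 2.0

    crossings = []  # times (s) of upward zero crossings
    for i in range(1, len(dev)):
        if dev[i - 1] < 0.0 <= dev[i]:
            frac = -dev[i - 1] / (dev[i] - dev[i - 1])  # linear interpolation
            crossings.append((i - 1 + frac) / frame_rate_hz)
    if len(crossings) < 2:
        return None, extent  # too few cycles to measure a rate
    rate = (len(crossings) - 1) / (crossings[-1] - crossings[0])
    return rate, extent

# Hypothetical contour: 220 Hz tone, 5.5 Hz vibrato, 0.5 semitone extent,
# sampled at 100 F0 frames per second for 2 seconds.
frame_rate = 100.0
f0 = [
    220.0 * 2.0 ** ((0.5 / 12.0) * math.sin(2.0 * math.pi * 5.5 * t / frame_rate))
    for t in range(200)
]

rate, extent = vibrato_rate_and_extent(f0, frame_rate)
print(f"rate = {rate:.3f} Hz, extent = {extent:.3f} semitones")
```

On clean synthetic input such as this, the zero-crossing approach recovers both values closely. On natural singing, F0 estimation errors and pitch drift would need to be handled before a measurement like this is reliable.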
