How to estimate a speech sound's fundamental frequency

Hello,
I used the YIN code from http://audition.ens.fr/adc/sw/yin.zip, provided by Alain de Cheveigné.
This code has a parameter documented as "P.hop % samples - interval between estimates (default: 32)".
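P.hop only controls how often an estimate is produced. In other words, it sets the time resolution of the F0 contour. It is not the analysis window (frame) length that has to span several periods. Below is a small Python sketch that converts a hop in samples to seconds and to estimates per second. It uses the default hop of 32 samples and the 44100 Hz rate from this question. The function names are hypothetical helpers, not part of yin.m.

```python
def hop_seconds(hop_samples, sample_rate):
    """Time step between successive F0 estimates, in seconds."""
    return hop_samples / sample_rate


def estimates_per_second(hop_samples, sample_rate):
    """Number of F0 estimates the contour contains per second of audio."""
    return sample_rate / hop_samples


# Default hop of 32 samples at 44.1 kHz:
step = hop_seconds(32, 44100)           # about 0.73 ms between estimates
rate = estimates_per_second(32, 44100)  # 1378.125 estimates per second
```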
I read that the frame (interval) needs to be longer than two fundamental periods to capture the characteristics of the speech, but not so long that the estimate becomes inaccurate.
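The sketch below shows why frame length is tied to the lowest F0. It is a simplified Python version of the core YIN steps:

- the difference function;
- the cumulative mean normalized difference (CMND);
- the absolute threshold.

YIN's parabolic interpolation and best-local-estimate steps are omitted. This is not a port of yin.m, and the values min_f0=60, max_f0=500 and threshold=0.1 are illustrative assumptions. Each frame must hold the integration window plus the longest period searched (sample_rate / min_f0 samples). That is one concrete reason a frame has to cover more than one period of the lowest F0.

```python
import math


def yin_f0(frame, sample_rate, min_f0=60.0, max_f0=500.0, threshold=0.1):
    """Simplified YIN estimate of the F0 (Hz) of one frame.

    Returns None when no lag dips below the threshold, which is typical
    of unvoiced or silent frames.
    """
    max_lag = int(sample_rate / min_f0)          # longest period searched (samples)
    min_lag = max(2, int(sample_rate / max_f0))  # shortest period searched (samples)
    window = len(frame) - max_lag                # integration window length
    if window <= 0:
        raise ValueError("frame must be longer than the longest period searched")

    # Difference function: d(tau) = sum_j (x[j] - x[j + tau])^2
    d = [0.0] * (max_lag + 1)
    for tau in range(1, max_lag + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))

    # CMND: d'(tau) = d(tau) * tau / sum_{k=1..tau} d(k), with d'(0) = 1
    cmnd = [1.0] * (max_lag + 1)
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0

    # Absolute threshold: first lag below it, then walk down to the local minimum.
    for tau in range(min_lag, max_lag + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None


# Usage: a 200 Hz sine sampled at 8 kHz (one period = 40 samples).
sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
estimate = yin_f0(frame, sr)
```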
If my speech file's fundamental frequency is about 200 Hz, is its fundamental period 1/200 s?
The file's sampling rate is 44100 Hz, so do I need to set the frame larger than 1/200*2*44100 = 441 samples?
The YIN output R.f0 is documented as "fundamental frequency in octaves re: 440 Hz". So, to get the speech file's fundamental frequency contour, I compute F0=2.^R.f0*440; is F0 then the fundamental frequency contour of the speech?
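Because R.f0 is in octaves relative to 440 Hz, the value in Hz is 440 * 2^f0. That is exactly what 2.^R.f0*440 computes, since .^ binds more tightly than * in MATLAB. Below is a minimal Python sketch of the conversion and its inverse. The function names are hypothetical.

```python
import math


def octaves_to_hz(f0_octaves, ref_hz=440.0):
    """Convert an F0 given in octaves relative to ref_hz into Hz."""
    return ref_hz * 2.0 ** f0_octaves


def hz_to_octaves(f0_hz, ref_hz=440.0):
    """Inverse conversion: Hz to octaves relative to ref_hz."""
    return math.log2(f0_hz / ref_hz)


one_octave_down = octaves_to_hz(-1.0)  # 220.0
speech_f0 = hz_to_octaves(200.0)       # about -1.14 octaves re 440 Hz
```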
However, when I do that, some values in the F0 vector are wrong (above 500 Hz, reaching about 1000 to 2000 Hz). Is that because those frames are unvoiced speech, so the estimate there is meaningless? And can I simply set values larger than 500 Hz in F0 to NaN, to mark the unvoiced parts of the speech as having no fundamental frequency?
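In MATLAB, the simple version of this idea is a one-liner: F0(F0 > 500) = NaN;. The Python sketch below also applies a lower bound. It can optionally use a per-frame aperiodicity (or other voicing) measure, assuming your estimator returns one. A fixed upper cap alone has two weaknesses:

- it misses octave errors that land below the cap (for example, 200 Hz doubled to 400 Hz);
- it can discard genuinely high-pitched frames.

For these reasons, a voicing measure is often a better criterion. The 75 to 500 Hz bounds and the max_ap threshold are illustrative assumptions, and mask_unvoiced is a hypothetical helper.

```python
import math


def mask_unvoiced(f0_hz, min_hz=75.0, max_hz=500.0, aperiodicity=None, max_ap=None):
    """Return a copy of f0_hz with implausible or unvoiced frames set to NaN."""
    out = []
    for i, f in enumerate(f0_hz):
        # Out-of-range values (and NaN inputs) fail this comparison.
        bad = not (min_hz <= f <= max_hz)
        # Optionally also reject frames that are too aperiodic to be voiced.
        if aperiodicity is not None and max_ap is not None:
            bad = bad or aperiodicity[i] > max_ap
        out.append(math.nan if bad else f)
    return out


# Usage with made-up values: the 1500 Hz frame is masked by the range check.
masked = mask_unvoiced([210.0, 1500.0, 190.0])
```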
Thanks.

Answers (1)

Wayne King on 18 Feb 2013
Edited: Wayne King on 18 Feb 2013
Often, you want to lowpass filter your speech waveform before you attempt to extract the fundamental frequency.
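Below is a minimal Python sketch of the lowpass step. It is a windowed-sinc FIR filter with a Hamming window, normalized to unity gain at DC. The 1000 Hz cutoff, 16 kHz rate and 101 taps are illustrative assumptions, chosen to keep a typical speech F0 range while attenuating higher harmonics. In MATLAB, fir1 and filter play the same roles. A linear-phase FIR filter with N taps delays the signal by (N - 1)/2 samples, which matters if you align the F0 contour with the original audio.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc lowpass FIR coefficients (Hamming window), unity DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps should be odd for a symmetric, linear-phase filter")
    fc = cutoff_hz / sample_rate  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal lowpass impulse response: sin(2*pi*fc*k) / (pi*k), 2*fc at k = 0
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    total = sum(taps)
    return [t / total for t in taps]  # normalize so the DC gain is exactly 1


def apply_fir(taps, x):
    """Direct-form convolution with zero initial state; output has len(x) samples."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j, h in enumerate(taps):
            if i - j >= 0:
                acc += h * x[i - j]
        y.append(acc)
    return y


def magnitude_response(taps, freq_hz, sample_rate):
    """|H(f)| of the FIR filter evaluated at a single frequency."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


taps = lowpass_fir(1000.0, 16000.0)
```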
Have you seen these two examples in the Signal Processing Toolbox documentation:
As for your questions:
"If my speech file's fundamental frequency is about 200 Hz, then it's fundamental period is 1/200 sec?
And the file's sampling rate is 44100 Hz, so I need to set the frame larger then 1/200*2*44100=441?"
A 200 Hz oscillation sampled at 44.1 kHz has a period of 220.5 samples.
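dt = 1/44100;   % sampling interval in seconds
T = 1/200;      % period of a 200 Hz oscillation in seconds
N = T/dt        % period in samples (220.5)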
So two periods would be 441 samples, as you state. However, in my experience you need more than two periods to make accurate estimates, so I would consider increasing that length if you can. 441 samples is only 0.01 seconds, or 10 milliseconds, and you should be able to find vowel sounds in the speech signal that last longer than that.
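A lower F0 means a longer period, so the window has to be sized for the lowest F0 you expect, not the typical one. Below is a small Python sketch of that arithmetic; it reproduces the 220.5 and 441 sample figures above. The 100 Hz lower bound and three periods in the last line are illustrative assumptions, not recommendations from this answer. The helper names are hypothetical.

```python
import math


def samples_per_period(f0_hz, sample_rate):
    """Length of one fundamental period, in samples."""
    return sample_rate / f0_hz


def window_samples(min_f0_hz, sample_rate, periods=2.0):
    """Window long enough to hold `periods` cycles of the lowest expected F0."""
    return math.ceil(periods * sample_rate / min_f0_hz)


period = samples_per_period(200, 44100)        # 220.5 samples
two_periods = window_samples(200, 44100)       # 441 samples, i.e. 10 ms
wider = window_samples(100, 44100, periods=3)  # 1323 samples, i.e. 30 ms
```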
  3 Comments
Wayne King on 18 Feb 2013
Without having your data, it's very difficult to answer this question.
Eason on 18 Feb 2013
The link above is my speech file; it is a Chinese sentence spoken by a female speaker.
And this is the YIN code, an F0 estimator.
My final goal is to create a sine wave whose frequency changes over time in the same way as the speech file's F0.
So I need to estimate how the speech's fundamental frequency changes over time. For example, if the speech is 2.5 s long, the sine wave should also be 2.5 s long. If the speech's F0 between 0.5 and 0.52 s is 226 Hz, then the sine wave's frequency between 0.5 and 0.52 s should also be 226 Hz.
It seems this sine wave could be created with the frest.Sinestream function, but I am confused about how to extract the time-varying fundamental frequency of the speech (the speech's F0 contour).
Thanks very much:)
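Below is a minimal Python sketch of one way to build such a sine. It assumes the F0 contour has one value per hop (as with YIN's P.hop) and that unvoiced frames are already NaN. The sketch linearly interpolates F0 between frames and accumulates phase sample by sample, so the instantaneous frequency follows the contour without phase jumps. This technique is phase-accumulation synthesis, not frest.Sinestream. As far as I know, frest.Sinestream targets frequency-response estimation and plays a series of fixed frequencies, which suits a continuously varying contour poorly. The function name and the timing convention (frame k sits at sample k * hop) are hypothetical.

```python
import math


def synthesize_f0_sine(f0_frames, hop_samples, sample_rate, amplitude=0.5):
    """Sine wave whose instantaneous frequency follows a frame-rate F0 contour.

    Assumes frame k describes the signal at sample k * hop_samples. NaN frames
    are treated as unvoiced and rendered as silence while the phase is held.
    F0 is linearly interpolated between frames and integrated into phase,
    so frequency changes do not cause phase jumps.
    """
    out = []
    phase = 0.0
    last = len(f0_frames) - 1
    for n in range(len(f0_frames) * hop_samples):
        pos = n / hop_samples
        k = int(pos)
        frac = pos - k
        f_a = f0_frames[k]
        f_b = f0_frames[min(k + 1, last)]
        if math.isnan(f_a):
            out.append(0.0)  # unvoiced frame: output silence
            continue
        # Interpolate toward the next frame unless that frame is unvoiced.
        f = f_a if math.isnan(f_b) else f_a + frac * (f_b - f_a)
        phase = (phase + 2 * math.pi * f / sample_rate) % (2 * math.pi)
        out.append(amplitude * math.sin(phase))
    return out


# Usage with a made-up contour: 226 Hz rising to 250 Hz, then an unvoiced frame.
tone = synthesize_f0_sine([226.0, 238.0, 250.0, math.nan], 32, 44100)
```

In MATLAB, the same idea applied to a per-sample frequency vector f is y = sin(2*pi*cumsum(f)/fs);.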
