
HTK MFCC MATLAB

version 1.2 (307 KB) by

Mel frequency cepstral coefficient feature extraction that closely matches that of HTK's HCopy.

273 Downloads


Computes mel frequency cepstral coefficient (MFCC) features from a given speech signal. The speech signal is first preemphasised using a first-order FIR filter with a specified preemphasis coefficient. The preemphasised signal is then subjected to short-time Fourier transform analysis with a specified frame duration, frame shift and analysis window function. Next, the magnitude spectrum is computed and a filterbank of M triangular filters, uniformly spaced on the mel scale between the lower and upper frequency limits, is designed. The filterbank is applied to the magnitude spectrum values to produce filterbank energies (FBEs). Log-compressed FBEs are then decorrelated using the discrete cosine transform to produce cepstral coefficients. The final step applies a sinusoidal lifter to produce liftered MFCCs that closely match those produced by HTK. Demo scripts are included.
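
The processing chain described above can be sketched in a few lines of MATLAB. This is only an illustration written for this page, not the code of mfcc.m: the parameter values, the random stand-in signal, the hand-built Hamming window and the filter construction are all assumptions, and the output is not guaranteed to match HTK or the submission numerically.

```matlab
% Illustrative MFCC pipeline (a sketch; all settings below are assumptions)
fs = 16000;                          % sampling frequency (Hz)
speech = randn( fs, 1 );             % one second of noise as stand-in input
Tw = 25; Ts = 10;                    % frame duration and frame shift (ms)
alpha = 0.97;                        % preemphasis coefficient
R = [ 300 3700 ];                    % filterbank frequency limits (Hz)
M = 20; N = 13; L = 22;              % filters, cepstral coefficients (incl. c0), lifter

hz2mel = @( hz )( 1127*log(1+hz/700) );      % Hertz to mel warping
mel2hz = @( mel )( 700*exp(mel/1127)-700 );  % mel to Hertz warping

% 1. Preemphasis with a first-order FIR filter: y[n] = x[n] - alpha*x[n-1]
speech = filter( [1 -alpha], 1, speech );

% 2. Framing and windowing (frames as columns, Hamming window built by hand)
Nw = round( 1E-3*Tw*fs );            % frame length (samples)
Ns = round( 1E-3*Ts*fs );            % frame shift (samples)
nframes = floor( (length(speech)-Nw)/Ns ) + 1;
idx = repmat( (1:Nw).', 1, nframes ) + repmat( (0:nframes-1)*Ns, Nw, 1 );
win = 0.54 - 0.46*cos( 2*pi*(0:Nw-1).'/(Nw-1) );
frames = speech( idx ) .* repmat( win, 1, nframes );

% 3. Magnitude spectrum (unique half only)
nfft = 2^nextpow2( Nw );
K = nfft/2 + 1;
MAG = abs( fft( frames, nfft, 1 ) );
MAG = MAG( 1:K, : );

% 4. M triangular filters, uniformly spaced on the mel scale within R
f = linspace( 0, fs/2, K );                                  % bin frequencies (Hz)
c = mel2hz( linspace( hz2mel(R(1)), hz2mel(R(2)), M+2 ) );   % filter edges (Hz)
H = zeros( M, K );
for m = 1:M
    rise = ( f - c(m)   ) / ( c(m+1) - c(m)   );
    fall = ( c(m+2) - f ) / ( c(m+2) - c(m+1) );
    H(m,:) = max( 0, min( rise, fall ) );
end

% 5. Filterbank energies, log compression and DCT
FBE = H * MAG;
DCT = sqrt(2/M) * cos( (0:N-1).' * ( pi*((1:M)-0.5)/M ) );
CC = DCT * log( FBE );

% 6. Sinusoidal liftering
lifter = 1 + 0.5*L*sin( pi*(0:N-1).'/L );
MFCCs = repmat( lifter, 1, nframes ) .* CC;
```

Each column of MFCCs then holds one frame's feature vector, with the zeroth coefficient in the first row.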

Comments and Ratings (58)

Dear all,

How can I run the program, please? Could somebody send me the manual?
mohgamal1@yahoo.com

Hello, I have to extract MFCC coefficients for a school project, but when I tried the example.m file the only output was the plots. How can I get the coefficients, please? Thank you for your help.

Kamil Wojcicki

Glad you got it resolved Deep. Thanks for sharing the solution here!

Deep

Hi Kamil! Thanks a lot for the great support in resolving this issue. It is resolved now.
It pertains to the AMD A8-7410 APU, which is prone to crashing during matrix multiplication. Setting the environment variables "MKL_CBWR=AVX" and "MKL_DEBUG_CPU_TYPE=4" finally helped. The answer came from Chestor Gillon. Thanks a lot, MathWorks!

Kamil Wojcicki

Hi Deep. The whole MATLAB is crashing? Or, you get an error? What are the inputs N and M set to? When you type-in "dctm" by itself in MATLAB prompt, what do you get?

Deep

Hi. My MATLAB R2014a crashes every time I reach the line "DCT = dctm( N, M );" in the given example.
Some help, please.

Deep

Thanks, Kamil for a quick and sure shot answer..

Kamil Wojcicki

Fizza, you are getting 13 coefficients for every audio frame. To compute delta coefficients refer to the HTK book. If you don't want to implement this in MATLAB yourself, just export the MFCCs to HTK format and then use HTK to append delta and delta-delta coefficients.

Deep, does "help vec2frames" produce error for you? If so, ensure _all_ helper functions included in the submission are in MATLAB path.
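
For readers who would rather stay in MATLAB, the delta regression formula from the HTK book, d_t = sum_{k=1..W} k*(c_{t+k} - c_{t-k}) / (2*sum_{k=1..W} k^2), can be sketched as below. This is not part of the submission: the stand-in feature matrix and the variable names are assumptions, W = 2 is HTK's default DELTAWINDOW, and the first and last frames are replicated at the edges as the HTK book describes.

```matlab
% Delta coefficients via the HTK regression formula (a sketch, not mfcc.m code)
MFCCs = cumsum( ones(13,10), 2 );    % stand-in features: columns are frames
W = 2;                               % regression window (HTK default DELTAWINDOW)
T = size( MFCCs, 2 );                % number of frames

% Replicate the first and last frames W times to handle the edges
padded = [ repmat(MFCCs(:,1),1,W), MFCCs, repmat(MFCCs(:,end),1,W) ];

deltas = zeros( size(MFCCs) );
for k = 1:W
    % frame t sits at column t+W of the padded matrix
    deltas = deltas + k*( padded(:,(1:T)+W+k) - padded(:,(1:T)+W-k) );
end
deltas = deltas / ( 2*sum((1:W).^2) );
```

Running the same steps on deltas instead of MFCCs gives delta-delta (acceleration) coefficients, and stacking [ MFCCs; deltas; accelerations ] gives the usual 39-dimensional vector for 13 static coefficients.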


Deep

Hi, I am unable to run the code using MATLAB R2014a on my Windows 7 PC. I get an error saying that the function vec2frames is not defined for variables of type double, and likewise for type single. I tried both audioread('filename', 'native') and audioread('filename', 'double').

Anand Mohan

Dear Kamil,
thank you for this File Exchange submission.
I require only 13 coefficients for the 100-1600 Hz frequency range.
Currently it gives FBE 13*93, CC=256*93, frames 256*93 with these settings:
Ts = 10;
alpha = 0.97;
R = [ 100 1600 ];
M = 13;
C = 13;
L = 22;
What should I do?
Secondly, can you help me calculate delta and double-delta coefficients?

Kamil Wojcicki

Hi Wilmer, Have you looked at example.m? There the MFCCs are computed in this line:

% Feature extraction (feature vectors as columns)
[ MFCCs, FBEs, frames ] = mfcc( speech, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );

Remember to update wavread to audioread as described in the comment below.

HTH,
Kamil

I am lost. In which part can I obtain the mel coefficients? I suppose there are 12? Please help me.

safa zighem

thanks, you have been really helpful!

Kamil Wojcicki

Hi Salvatore,

If the audio file has multiple channels, then yes, you'll get a matrix after loading the file into MATLAB. For example:

>> [x,fs]=audioread('speech.wav');
>> size(x)
ans =
46417 2

This means that there are two channels of audio, each with 46417 samples. It really depends on your task what you do with it. For example, if you:

>> x=x(:);
>> size(x)
ans =
92834 1

then you are essentially concatenating the samples of the two channels, i.e., samples from the second channel are appended after the last sample of the first channel. You can verify this by plotting the signal waveform and/or spectrogram. Now, if you pass this concatenated x vector through the mfcc() function, it will extract the features as expected.

One alternative would be to loop over the channels and pass one channel at a time to the mfcc() function, so that you get the features for each channel separately.

HTH,
Kamil
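
The per-channel alternative can be sketched as follows. The small matrix x stands in for the output of audioread on a two-channel file, and extract is a hypothetical placeholder for a call to mfcc() with your own settings; neither comes from the submission. Averaging the channels into a mono signal is shown as a further option that was not discussed above.

```matlab
% Per-channel feature extraction (sketch; x and extract are stand-ins)
x = [ (1:5).', (11:15).' ];          % stand-in for a two-channel recording
extract = @( v )( v.' );             % hypothetical placeholder for mfcc( v, fs, ... )

nch = size( x, 2 );                  % number of channels (columns)
features = cell( 1, nch );
for ch = 1:nch
    % each channel is passed on as a column vector, never as a matrix
    features{ch} = extract( x(:,ch) );
end

% Another option (an assumption, not from the thread): downmix to mono
mono = mean( x, 2 );
```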

Thanks for the answer! I have another problem. I read that "speech" has to be a vector, not a matrix. When I load it I get a matrix, so I converted it to a vector with speech1 = speech(:). Do you think this is a correct transformation for obtaining the MFCC coefficients?

Kamil Wojcicki

MATLAB removed the wavread and wavwrite functions some releases ago (and I haven't got around to revising the File Exchange submissions). They were replaced with the audioread and audiowrite functions, so you'll have to use those. Note that the API is somewhat different (e.g., audioread does not return nbits as the third output).

So in the case you are pointing out, you would replace the wavread line with:
[ speech, fs ] = audioread( 'trial.wav' );

HTH,
Kamil
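
If a script needs to run on both old and new releases and still needs nbits, one possible workaround is a small compatibility check like the sketch below. It is not part of the submission. The demo file written at the top exists only to make the sketch runnable on a recent release (the file name is an assumption), and reading the bit depth from the BitsPerSample field of audioinfo assumes a WAV file.

```matlab
% Demo file so the sketch runs as-is on a recent release (hypothetical name)
wav_file = fullfile( tempdir, 'trial_demo.wav' );
audiowrite( wav_file, 0.5*sin( 2*pi*440*(0:7999).'/8000 ), 8000 );

% Prefer audioread when it is available, otherwise fall back to wavread
if exist( 'audioread' ) ~= 0
    [ speech, fs ] = audioread( wav_file );
    info = audioinfo( wav_file );    % audioread itself does not return nbits
    nbits = info.BitsPerSample;
else
    [ speech, fs, nbits ] = wavread( wav_file );
end
```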

[speech, fs, nbits ] = wavread( 'trial.wav' );
Undefined function or variable 'wavread'.

Why? I installed the toolbox.

Jovan Galic

Once again, thank you very much!

Kamil Wojcicki

Hi Jovan,

Unfortunately there isn't.

One possibility would be to cite HTK documentation, as well as to provide the link to the implementation in a footnote, e.g.:

"The MFCC features were extracted according to [1] using MATLAB.^#"

# The MATLAB-based MFCC routines can be found at: http://www.mathworks.com/matlabcentral/fileexchange/32849-htk-mfcc-matlab

[1] Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., 2006. The HTK Book (HTK Version 3.4.1). Engineering Department, Cambridge University.

Alternatively, you could cite the original paper by Davis and Mermelstein (1980).

Unless you are doing something like assessing various implementations (e.g., Ganchev et al., 2005), I would say that the footnote part is optional given how standard this task is. I would just ensure to state the relevant settings used in your task in the methods section.

Best,
Kamil

Jovan Galic

Is there a conference or journal paper I can cite for this MATLAB code?
Regards,
Jovan

Jovan Galic

Kamil, thank you for clear explanation.

Regards,
Jovan

Lucas R

Kamil Wojcicki

Hi Jovan,

>> And how the code would be if warping function is between mel and linear for example warp = 0.5 * hz?

Well, 0.5*hz is linear. You'll need a pair of nonlinear forward/backward warping functions instead if you want the filters to be non-uniformly spaced on the Hz scale.

>> Is it enough to change only mel2hz and hz2mel?

Yes. Simply assign function handles to these backward and forward warping functions, respectively.

See the documentation for the trifbank function. Use the example provided there to visualize the triangular filterbanks for the different warping functions you may want to try.

HTH,
Kamil

Jovan Galic

And how the code would be if warping function is between mel and linear for example warp = 0.5 * hz?

Is it enough to change only mel2hz and hz2mel?

Regards.

Jovan Galic

Dear Kamil,

thank you very much for very helpful and quick reply!

Best regards.

Kamil Wojcicki

Hi Jovan,

Modify mfcc.m as follows:

add:
lin = @(x)(x);

replace:
hz2mel = @( hz )( 1127*log(1+hz/700) ); % Hertz to mel warping function
mel2hz = @( mel )( 700*exp(mel/1127)-700 ); % mel to Hertz warping function

with:
hz2mel = lin;
mel2hz = lin;

HTH,
Kamil
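
The effect of swapping the warping functions can be checked without running the full feature extraction. The sketch below is an illustration only, not the trifbank code: it places filter edge frequencies uniformly on the warped scale and maps them back to Hz. The edges helper and the values of R and M are assumptions.

```matlab
% Filter edge placement under two warping functions (sketch)
hz2mel = @( hz )( 1127*log(1+hz/700) );      % Hertz to mel warping function
mel2hz = @( mel )( 700*exp(mel/1127)-700 );  % mel to Hertz warping function
lin = @( x )( x );                           % identity warping, as for LFCCs

R = [ 300 3700 ];                    % frequency limits (Hz)
M = 5;                               % number of filters

% Uniform spacing on the warped scale, then mapped back to Hz (hypothetical helper)
edges = @( h2w, w2h )( w2h( linspace( h2w(R(1)), h2w(R(2)), M+2 ) ) );

c_mel = edges( hz2mel, mel2hz );     % spacing in Hz grows with frequency
c_lin = edges( lin, lin );           % spacing in Hz is constant
```

Comparing diff( c_mel ) with diff( c_lin ) shows the widening mel-spaced filters against the equal-width linear ones.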

Jovan Galic

Hello!

How and where should I modify the code to get LFCC feature vectors, where the triangular filters are uniformly distributed over a linear (not mel) frequency scale?
In some applications LFCCs show greater robustness.

Regards!

Kamil Wojcicki

Hi Donal.

Yes, the zeroth coefficient is included in the output from mfcc.m. This function emulates HTK's MFCC_0 feature computation, which includes the zeroth coefficient (i.e., the _0 modifier).

Note, however, that for plotting purposes in the included example, the zeroth coefficient was discarded. See example.m, and specifically the following lines:
subplot( 313 );
imagesc( time_frames, [1:C], MFCCs(2:end,:) ); % HTK's TARGETKIND: MFCC
%imagesc( time_frames, [1:C+1], MFCCs ); % HTK's TARGETKIND: MFCC_0

HTH,
Kamil
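
Outside of plotting, the same indexing separates the zeroth coefficient from the rest. In the sketch below the random matrix is only a stand-in for the MFCCs output of mfcc() called with C+1 coefficients, and the variable names c0 and MFCCs_only are assumptions.

```matlab
% Splitting off the zeroth coefficient (sketch with stand-in data)
C = 12;                              % number of coefficients excluding c0
MFCCs = randn( C+1, 93 );            % stand-in for mfcc() output: one column per frame

c0 = MFCCs( 1, : );                  % zeroth coefficient of every frame
MFCCs_only = MFCCs( 2:end, : );      % coefficients 1..C (HTK's TARGETKIND: MFCC)
```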

Hi, does this MFCC calculation include c(0) (the first MFCC) in the output?

Kamil Wojcicki

Hi Mehmet, the triangular filterbank function implementation is based on information from a speech processing book, a reference to which is included in trifbank.m. HTH, Kamil

Hi, where is your article that is cited in the trifbank script?
I am trying to build a mel filterbank.

thanks

Kamil Wojcicki

Hi Ankur,

Could you elaborate on what you mean?

If you are wondering how to load audio from a file and extract features using the mfcc function, take a look at example.m.

Note that for newer MATLAB releases you may want to replace wavread with audioread, i.e.,

[ x, fs ] = audioread( wav_file );

Hope that helps.

Ankur Kalita

Is there any specification for the input audio file?

Kamil Wojcicki

Hi Yi,

In general you don't really have to do that. It is just that here we are trying to match the output of HTK feature extractor when it reads in audio data as 16-bit signed shorts. With this, you can compare directly the output features generated using this MATLAB routine with the corresponding features extracted with HTK (as demonstrated in compare.m).

Beyond that, i.e., if you are not comparing w/ HTK and just are looking to extract features for some task, you can drop this scaling.

HTH,
Kamil
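
A short numerical check shows why the scaling can usually be dropped. Assuming the filterbank energies are linear in the signal amplitude and no flooring is applied, multiplying the signal by 2^15 adds the constant log(2^15) to every log energy, and a DCT of the kind used for cepstra maps a constant offset onto the zeroth coefficient only. The DCT matrix D and the random energies below are stand-ins, not code or data from the submission.

```matlab
% Effect of the 2^15 scaling on cepstral coefficients (sketch)
M = 20; N = 13;                      % number of filters and cepstral coefficients
D = sqrt(2/M) * cos( (0:N-1).' * ( pi*((1:M)-0.5)/M ) );   % DCT matrix (assumed form)

FBE = rand( M, 1 ) + 0.1;            % stand-in filterbank energies for one frame
cc_unit  = D * log( FBE );           % samples in the range [-1, 1]
cc_short = D * log( FBE * 2^15 );    % same samples scaled to 16-bit shorts

difference = cc_short - cc_unit;     % only difference(1), i.e. c0, is nonzero
```

Under these assumptions the scaling matters only for the zeroth coefficient, which is one reason features with and without it can otherwise agree.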

yi wu

I am confused about the following lines:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% Explode samples to the range of 16 bit shorts
if( max(abs(speech))<=1 ), speech = speech * 2^15; end;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Why does speech need to be multiplied by 2^15 before calculating the STFT? Can someone kindly help answer this? Thank you!

Kamil Wojcicki

Yibo, the overlap used is as defined in:

Huang, X., Acero, A., Hon, H., 2001. Spoken Language Processing: A guide to theory, algorithm, and system development. Prentice Hall, Upper Saddle River, NJ, USA (pp. 314-315).

Kamil Wojcicki

Olessya, what is the dimensionality of your input vector? i.e., what is size(speech)? It must be a vector and not a matrix.

Hi, I am trying to use your code, but it gives me the usage error:
[ MFCCs, FBEs, frames ] = ...
mfcc( speech, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );
Error using vec2frames (line 83)
usage: [ frames, indexes ] = vec2frames( vector, frame_length, frame_shift, direction, window, padding );

Error in mfcc (line 151)
frames = vec2frames( speech, Nw, Ns, 'cols', window, false );

Do you have any idea what the problem might be? Thank you

Yibo Yang

Quick question Kamil: how can I tweak the trifbank code so that I can generate triangular filters with, say, 50% overlaps in the mel scale?
Thanks for your work!

Kamil Wojcicki

Brittany, are you using the provided example.m with sp10.wav, or your own audio files? If the audio file you are using happens to have long sections of zero only samples, that could explain NaN MFCC values. If that is the case, you could add some very low level noise to your audio samples, e.g.,

speech = speech + randn(size(speech))*1E-10;

Hope this helps.
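
The mechanism can be reproduced in isolation. In the sketch below the first M FFT magnitude bins of a single frame stand in for filterbank energies and D is an assumed DCT matrix, so this is a simplified model rather than the mfcc.m code path. An all-zero frame gives log(0) = -Inf, and mixing -Inf with DCT weights of both signs yields NaN. The tiny dither keeps the energies nonzero.

```matlab
% Why all-zero frames produce NaN, and how dithering avoids it (sketch)
M = 20; N = 13;                      % number of energies and cepstral coefficients
D = sqrt(2/M) * cos( (0:N-1).' * ( pi*((1:M)-0.5)/M ) );   % DCT matrix (assumed form)

silent = zeros( 64, 1 );             % a frame of zero-only samples
mag = abs( fft( silent ) );
cc_silent = D * log( mag(1:M) );     % log(0) = -Inf, so NaNs appear

dithered = silent + randn( size(silent) )*1E-10;   % very low level noise
mag = abs( fft( dithered ) );
cc_dithered = D * log( mag(1:M) );   % finite values
```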

I get some NaN values in the MFCC variable. Why is that so?

clarissa yong

Does anyone know which file I should run to get the final result? Please help, thanks!

Adnan Farooq

For a sequence of images, how can we use MFCC? First, we convert each 2D/3D frame to a 1D vector, but I am confused about how to use these parameters:
"fs, Tw, Ts, alpha, window, R, M, N, L"

Agus Reza

Telkom University Indonesia - was here :D

Christophe

yingxue wang

NOT BAD

Lehigh

very good!

FJK

Updates

1.2

Title change

MATLAB Release
MATLAB 7.10 (R2010a)
