5.0 | 12 ratings | 154 Downloads (last 30 days) | File Size: 307 KB | File ID: #32849 | Version: 1.2

HTK MFCC MATLAB

by Kamil Wojcicki

11 Sep 2011

Mel frequency cepstral coefficient feature extraction that closely matches that of HTK's HCopy.


File Information
Description

Computes mel frequency cepstral coefficient (MFCC) features from a given speech signal. The speech signal is first preemphasised using a first-order FIR filter with a given preemphasis coefficient. The preemphasised signal is then analysed with the short-time Fourier transform (STFT) using a specified frame duration, frame shift and analysis window function. Next, the magnitude spectrum is computed, and a filterbank of M triangular filters, uniformly spaced on the mel scale between the lower and upper frequency limits, is designed. The filterbank is applied to the magnitude spectrum to produce filterbank energies (FBEs). The log-compressed FBEs are then decorrelated using the discrete cosine transform (DCT) to produce cepstral coefficients. The final step applies a sinusoidal lifter to produce liftered MFCCs that closely match those produced by HTK. Demo scripts are included.
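The steps above can be sketched in plain MATLAB. This is a minimal, hypothetical re-implementation, not the packaged mfcc.m: the parameter values, the HTK-style mel formula 1127*ln(1 + f/700), the DCT scaling sqrt(2/M) and the lifter form 1 + (L/2)*sin(pi*n/L) are assumptions, and random noise stands in for speech. The Hamming window and DCT matrix are written out so that no toolbox is needed.

```matlab
% Minimal MFCC pipeline sketch (not the packaged mfcc.m; values and formulas assumed).
fs = 8000;                                   % sampling rate (Hz), assumed
speech = randn(fs, 1);                       % placeholder: 1 s of noise instead of speech
Tw = 25; Ts = 10;                            % frame duration and frame shift (ms)
alpha = 0.97;                                % preemphasis coefficient
M = 20; C = 12; L = 22;                      % filters, cepstra (excluding c0), lifter parameter
LF = 300; HF = 3700;                         % filterbank frequency limits (Hz)

% 1) Preemphasis: first-order FIR filter y(n) = x(n) - alpha*x(n-1)
speech = filter([1 -alpha], 1, speech);

% 2) Framing and windowing (frames as columns; incomplete trailing frame dropped)
Nw = round(1E-3*Tw*fs);                      % frame length (samples)
Ns = round(1E-3*Ts*fs);                      % frame shift (samples)
T = floor((numel(speech) - Nw)/Ns) + 1;      % number of complete frames
idx = repmat((1:Nw)', 1, T) + repmat((0:T-1)*Ns, Nw, 1);
win = 0.54 - 0.46*cos(2*pi*(0:Nw-1)'/(Nw-1));  % symmetric Hamming window
frames = speech(idx) .* repmat(win, 1, T);

% 3) One-sided magnitude spectrum
nfft = 2^nextpow2(Nw);
K = nfft/2 + 1;
MAG = abs(fft(frames, nfft));
MAG = MAG(1:K, :);

% 4) M triangular filters uniformly spaced on the mel scale between LF and HF
hz2mel = @(f) 1127*log(1 + f/700);
mel2hz = @(m) 700*(exp(m/1127) - 1);
f = linspace(0, fs/2, K);                    % FFT bin frequencies (Hz)
c = mel2hz(linspace(hz2mel(LF), hz2mel(HF), M+2));  % edge/centre frequencies (Hz)
H = zeros(M, K);
for m = 1:M
    up   = (f - c(m))   / (c(m+1) - c(m));   % rising slope
    down = (c(m+2) - f) / (c(m+2) - c(m+1)); % falling slope
    H(m, :) = max(0, min(up, down));         % triangle peaking at c(m+1)
end

% 5) FBEs, log compression, DCT-II, sinusoidal lifter
FBE = H * MAG;                                         % M x T
DCT = sqrt(2/M) * cos((0:C)' * (pi*((1:M) - 0.5)/M));  % (C+1) x M
CC = DCT * log(FBE);                                   % (C+1) x T, row 1 is c0
lifter = 1 + 0.5*L*sin(pi*(0:C)'/L);                   % weight 1 for c0
MFCCs = diag(lifter) * CC;                             % liftered MFCCs
```

With these settings each frame is 200 samples long with an 80-sample shift, giving 98 frames from one second of audio.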

Acknowledgements

Triangular Filterbank, File I/O For Cell Arrays, and Framing Routines inspired this file.

Required Products: Signal Processing Toolbox
MATLAB release: MATLAB 7.10 (R2010a)
Other requirements: HTK, RASTAMAT, VOICEBOX
Comments and Ratings (24)
13 May 2016 Kamil Wojcicki

Hi Donal.

Yes, the zeroth coefficient is included in the output from mfcc.m. This function emulates HTK's MFCC_0 feature computation, which includes the zeroth coefficient (i.e., the _0 modifier).

Note, however, that for plotting purposes in the included example, the zeroth coefficient was discarded. See example.m, and specifically the following lines:
subplot( 313 );
imagesc( time_frames, [1:C], MFCCs(2:end,:) ); % HTK's TARGETKIND: MFCC
%imagesc( time_frames, [1:C+1], MFCCs ); % HTK's TARGETKIND: MFCC_0

HTH,
Kamil
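Assuming, as this reply describes, that the output matrix has C+1 rows with c0 in the first row, the two HTK target kinds correspond to simple row selections. A sketch with random placeholder data standing in for real features (C and T are illustrative):

```matlab
% Placeholder feature matrix: C+1 rows, one column per frame; row 1 holds c0.
C = 12; T = 50;                      % assumed number of cepstra and frames
MFCCs = randn(C+1, T);               % stand-in for the output of mfcc.m
mfcc_0 = MFCCs;                      % c0..cC, analogous to HTK's MFCC_0
mfcc_only = MFCCs(2:end, :);         % c1..cC, analogous to HTK's MFCC
c0 = MFCCs(1, :);                    % zeroth coefficient on its own
```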

13 May 2016 Donal O Sullivan

Hi, does this MFCC calculation include c(0) (the first MFCC) in the output?

05 May 2016 Kamil Wojcicki

Hi Mehmet, the triangular filterbank implementation is based on information from a speech processing book, a reference to which is included in trifbank.m. HTH, Kamil

05 May 2016 Mehmet Kazanç

Hi, where is your article? It is mentioned in the trifbank script.
I am trying to build a mel filter bank.

Thanks

10 Apr 2016 Kamil Wojcicki

Hi Ankur,

Could you elaborate on what you mean?

If you are wondering how to load audio from a file and extract features using the mfcc function, take a look at example.m.

Note that for newer MATLAB releases you may want to replace wavread with audioread, i.e.,

[ x, fs ] = audioread( wav_file );

Hope that helps.
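To see what audioread hands back, here is a self-contained sketch that writes a short synthetic tone to a temporary WAV file and reads it back. The tone, sampling rate and file name are illustrative assumptions. For 16-bit files, audioread returns doubles scaled to [-1, 1] together with the sampling rate, and a mono file comes back as a column vector.

```matlab
% Round trip through a temporary WAV file to show what audioread returns.
fs = 16000;                                  % assumed sampling rate (Hz)
t = (0:fs-1)' / fs;                          % 1 s time axis
tone = 0.5 * sin(2*pi*440*t);                % 440 Hz test tone in place of speech
wav_file = [tempname '.wav'];                % temporary file name
audiowrite(wav_file, tone, fs);              % double input is written as 16-bit PCM by default
[x, fs_read] = audioread(wav_file);          % samples come back as doubles in [-1, 1]
delete(wav_file);                            % clean up the temporary file
```

The recovered samples differ from the original tone only by 16-bit quantisation error.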

08 Apr 2016 Ankur Kalita

Is there any specification for the input audio file?

05 Apr 2016 Viet Nguyen Van  
21 Mar 2016 Kamil Wojcicki

Hi Yi,

In general, you don't have to do that. Here we are trying to match the output of the HTK feature extractor, which reads audio data as 16-bit signed shorts. With this scaling, you can directly compare the features generated using this MATLAB routine with the corresponding features extracted with HTK (as demonstrated in compare.m).

Beyond that, i.e., if you are not comparing with HTK and are just looking to extract features for some task, you can drop this scaling.

HTH,
Kamil
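Under the assumptions of the description (magnitude-based filterbank energies, natural log, DCT-II), scaling the signal by 2^15 multiplies every FBE by 2^15, so each log FBE shifts by the same constant log(2^15) and only c0 changes. A minimal sketch that checks this with random stand-in FBEs rather than real speech; the sqrt(2/M) DCT scaling is an assumed HTK-style form:

```matlab
% Effect of the 2^15 scaling on cepstra, using stand-in filterbank energies.
M = 20; C = 12;                                        % assumed filters and cepstra
fbe = rand(M, 1) + 0.1;                                % positive stand-in FBEs
DCT = sqrt(2/M) * cos((0:C)' * (pi*((1:M) - 0.5)/M));  % (C+1) x M DCT-II
cc = DCT * log(fbe);                                   % cepstra of the unscaled signal
cc_scaled = DCT * log(2^15 * fbe);                     % FBEs scale linearly with the signal
delta = cc_scaled - cc;                                % nonzero only in delta(1), i.e. c0
```

Because the sinusoidal lifter weights c0 by 1, the liftered features differ in the same way: c0 shifts by sqrt(2M)*15*ln(2), about 65.8 for M = 20, and c1..cC are unaffected.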

21 Mar 2016 yi wu

I am confused about the following line:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% Explode samples to the range of 16 bit shorts
if( max(abs(speech))<=1 ), speech = speech * 2^15; end;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Why does the speech need to be multiplied by 2^15 before calculating the STFT? Can someone kindly help answer this? Thank you!

29 Jul 2015 Kamil Wojcicki

Yibo, the overlap used is as defined in:

Huang, X., Acero, A., Hon, H., 2001. Spoken Language Processing: A guide to theory, algorithm, and system development. Prentice Hall, Upper Saddle River, NJ, USA (pp. 314-315).
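For reference, in the usual construction (M+2 points equally spaced on the mel scale, with filter m rising from point m to a peak at point m+1 and falling to point m+2), neighbouring filters already share half of their support on the mel axis. A sketch that checks this, assuming the HTK-style mel conversion and illustrative limits; it is not trifbank.m itself:

```matlab
hz2mel = @(f) 1127*log(1 + f/700);          % HTK-style mel scale (assumed)
mel2hz = @(m) 700*(exp(m/1127) - 1);
LF = 300; HF = 3700; M = 20;                % illustrative limits and filter count
p = linspace(hz2mel(LF), hz2mel(HF), M+2);  % equally spaced mel points
left = p(1:M); right = p(3:M+2);            % filter m spans p(m)..p(m+2), peak at p(m+1)
support = right - left;                     % width of each triangle (mel)
shared = right(1:M-1) - left(2:M);          % support shared by filters m and m+1 (mel)
ratio = shared ./ support(1:M-1);           % fraction of each filter overlapped by the next
edges_hz = mel2hz(p);                       % the same points in Hz, for building the filters
```

Since every triangle is two mel steps wide and its neighbour starts one step later, the ratio is 0.5 for every adjacent pair.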

29 Jul 2015 Kamil Wojcicki

Olessya, what is the dimensionality of your input vector? i.e., what is size(speech)? It must be a vector and not a matrix.
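One common way to end up with a matrix is a stereo WAV file, for which audioread returns an N-by-2 array. Assuming that is the cause here, a sketch of reducing the input to a single channel before calling mfcc (the random data stands in for a real recording):

```matlab
% Stand-in for a two-channel recording, as audioread returns for a stereo file.
fs = 16000;                                  % assumed sampling rate (Hz)
stereo = randn(fs, 2);                       % N x 2 matrix: not a valid speech vector
speech = mean(stereo, 2);                    % average the channels (or use stereo(:,1))
assert(isvector(speech), 'speech must be a vector, not a matrix');
```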

28 Jul 2015 Olessya Medvedeva

Hi, I am trying to use your code, but it gives me a usage error:
[ MFCCs, FBEs, frames ] = ...
mfcc( speech, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );
Error using vec2frames (line 83)
usage: [ frames, indexes ] = vec2frames( vector, frame_length, frame_shift, direction, window, padding );

Error in mfcc (line 151)
frames = vec2frames( speech, Nw, Ns, 'cols', window, false );

Do you have any idea what the problem might be? Thank you

26 Jul 2015 Yibo Yang

Quick question Kamil: how can I tweak the trifbank code so that I can generate triangular filters with, say, 50% overlaps in the mel scale?
Thanks for your work!

18 Jun 2015 Kamil Wojcicki

Brittany, are you using the provided example.m with sp10.wav, or your own audio files? If your audio file happens to contain long sections of zero-valued samples, that could explain the NaN MFCC values. If that is the case, you could add some very low-level noise to your audio samples, e.g.,

speech = speech + randn(size(speech))*1E-10;

Hope this helps.
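To see why digital silence produces NaNs: if every filterbank energy in a frame is zero, log returns -Inf, and the DCT multiplies those -Inf values by both positive and negative weights, so the sum contains -Inf + Inf = NaN. A sketch with a hypothetical random filterbank H standing in for the real triangular filters and an assumed sqrt(2/M) DCT scaling:

```matlab
M = 20; C = 12; K = 129;                               % assumed filters, cepstra, FFT bins
H = rand(M, K);                                        % stand-in non-negative filterbank
DCT = sqrt(2/M) * cos((0:C)' * (pi*((1:M) - 0.5)/M));  % (C+1) x M DCT-II
frame = zeros(256, 1);                                 % a frame of digital silence
mag = abs(fft(frame));
fbe = H * mag(1:K);                                    % all zeros
cc_silent = DCT * log(fbe);                            % log(0) = -Inf, mixed signs give NaN

dithered = frame + randn(size(frame))*1E-10;           % very low-level noise, as above
mag = abs(fft(dithered));
fbe = H * mag(1:K);                                    % small but positive
cc_dithered = DCT * log(fbe);                          % finite
```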

18 Jun 2015 Brittany Davis

I get some NaN values in the MFCC variable. Why is that so?

24 Jul 2014 clarissa yong

Does anyone know which file I should run to get the final result? Please help, thanks!

13 Jul 2014 Adnan Farooq

For a sequence of images, how can we use MFCCs? First, we convert each 2D/3D frame to a 1D vector, but I am confused about how to use these parameters:
"fs, Tw, Ts, alpha, window, R, M, N, L"

16 Mar 2014 Agus Reza

Telkom University Indonesia - was here :D

06 Jan 2014 wuhan institute of technology  
31 May 2013 Christophe  
16 May 2013 yingxue wang

NOT BAD

04 Feb 2013 Saurabh Verma  
21 Dec 2012 Lehigh

very good!

05 Sep 2012 FJK

Updates
19 Sep 2011 1.2

Title change
