melSpectrogram(___) plots the mel spectrogram on a
surface in the current figure.
Calculate Mel Spectrogram
Use the default settings to calculate the mel spectrogram for an entire audio file. Print the number of bandpass filters in the filter bank and the number of frames in the mel spectrogram.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
S = melSpectrogram(audioIn,fs);
[numBands,numFrames] = size(S);
fprintf("Number of bandpass filters in filterbank: %d\n",numBands)
Number of bandpass filters in filterbank: 32
fprintf("Number of frames in spectrogram: %d\n",numFrames)
Number of frames in spectrogram: 1551
Plot the mel spectrogram.
Calculate Mel Spectrums of 2048-Point Windows
Calculate the mel spectrums of 2048-point periodic Hann windows with 1024-point overlap. Convert to the frequency domain using a 4096-point FFT. Pass the frequency-domain representation through 64 half-overlapped triangular bandpass filters that span the range 62.5 Hz to 8 kHz.
[audioIn,fs] = audioread('FunkyDrums-44p1-stereo-25secs.mp3');
S = melSpectrogram(audioIn,fs, ...
    'Window',hann(2048,'periodic'), ...
    'OverlapLength',1024, ...
    'FFTLength',4096, ...
    'NumBands',64, ...
    'FrequencyRange',[62.5,8e3]);
Call melSpectrogram again, this time with no output arguments so that you can visualize the mel spectrogram. The input audio is a multichannel signal. If you call melSpectrogram with a multichannel input and with no output arguments, only the first channel is plotted.
melSpectrogram(audioIn,fs, ...
    'Window',hann(2048,'periodic'), ...
    'OverlapLength',1024, ...
    'FFTLength',4096, ...
    'NumBands',64, ...
    'FrequencyRange',[62.5,8e3])
Get Filter Bank Center Frequencies and Analysis Window Time Instants
melSpectrogram applies a frequency-domain filter bank to audio signals that are windowed in time. You can get the center frequencies of the filters and the time instants corresponding to the analysis windows as the second and third output arguments from melSpectrogram.
Get the mel spectrogram, filter bank center frequencies, and analysis window time instants of a multichannel audio signal. Use the center frequencies and time instants to plot the mel spectrogram for each channel.
[audioIn,fs] = audioread('AudioArray-16-16-4channels-20secs.wav');
[S,cF,t] = melSpectrogram(audioIn,fs);
S = 10*log10(S+eps); % Convert to dB for plotting
for i = 1:size(S,3)
    figure(i)
    surf(t,cF,S(:,:,i),'EdgeColor','none');
    xlabel('Time (s)')
    ylabel('Frequency (Hz)')
    view([0,90])
    title(sprintf('Channel %d',i))
    axis([t(1) t(end) cF(1) cF(end)])
end
audioIn — Audio input
column vector | matrix
Audio input, specified as a column vector or matrix. If specified as a matrix, the function treats columns as independent audio channels.
fs — Input sample rate (Hz)
Input sample rate in Hz, specified as a positive scalar.
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is the argument name and
Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name in quotes.
Window — Window applied in time domain
hamming(round(fs*0.03),'periodic') (default) | vector
OverlapLength — Analysis window overlap length (samples)
round(0.02*fs) (default) | integer in the range [0, (WindowLength - 1)]
Analysis window overlap length in samples, specified as the comma-separated pair
consisting of 'OverlapLength' and an integer in the range
[0, (WindowLength - 1)].
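The relationship between the window length, the overlap length, and the number of frames can be sketched with simple arithmetic. The values below are hypothetical (a 2-second signal at 16 kHz, a 30 ms window, and a 20 ms overlap), and the sketch assumes that a trailing partial frame is dropped rather than padded.

```matlab
% Frame-count arithmetic (sketch; assumes a trailing partial frame is dropped).
fs = 16000;                       % hypothetical sample rate (Hz)
numSamples = 2*fs;                % hypothetical 2-second signal
windowLength = round(0.03*fs);    % 480 samples
overlapLength = round(0.02*fs);   % 320 samples

% Each new frame advances by the part of the window that is not overlapped.
hopLength = windowLength - overlapLength;

% Number of complete frames that fit in the signal.
numFrames = floor((numSamples - overlapLength)/hopLength);

fprintf("Hop length: %d samples, frames: %d\n",hopLength,numFrames)
```

A larger overlap shortens the hop and therefore increases the number of frames for the same signal.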
FFTLength — Number of DFT points
WindowLength (default) | positive integer
Number of points used to calculate the DFT, specified as the comma-separated pair
consisting of 'FFTLength' and a positive integer greater than or
equal to WindowLength. If unspecified,
FFTLength defaults to WindowLength.
NumBands — Number of mel bandpass filters
32 (default) | positive integer
Number of mel bandpass filters, specified as the comma-separated pair consisting
of 'NumBands' and a positive integer.
FrequencyRange — Frequency range over which to compute mel spectrogram (Hz)
[0 fs/2] (default) | two-element row vector
Frequency range over which to compute the mel spectrogram in Hz, specified as the
comma-separated pair consisting of
'FrequencyRange' and a
two-element row vector of monotonically increasing values in the range [0, fs/2].
SpectrumType — Type of mel spectrogram
'power' (default) | 'magnitude'
Type of mel spectrogram, specified as the comma-separated pair consisting of
'SpectrumType' and 'power' or 'magnitude'.
WindowNormalization — Apply window normalization
true (default) | false
Apply window normalization, specified as the comma-separated pair consisting of
'WindowNormalization' and true or false. When
WindowNormalization is set to
true, the power (or magnitude) in the mel spectrogram is
normalized to remove the power (or magnitude) of the time domain
window.
FilterBankNormalization — Type of filter bank normalization
'bandwidth' (default) |
Type of filter bank normalization, specified as the comma-separated pair
S — Mel spectrogram
column vector | matrix | 3-D array
Mel spectrogram, returned as a column vector, matrix, or 3-D array. The dimensions
of S correspond to mel bands, frames, and audio channels, in that order.
Trailing singleton dimensions are removed from the output.
F — Center frequencies of mel bandpass filters (Hz)
Center frequencies of mel bandpass filters in Hz, returned as a row vector with
length size(S,1).
T — Location of each window of audio (s)
Location of each analysis window of audio in seconds, returned as a row vector
of length size(S,2). The location corresponds to
the center of each window.
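As an illustration of "center of each window", the following sketch computes window-center times from a hop length and a window length. The values are hypothetical, and the exact centering convention used by melSpectrogram (for example, a half-sample offset) is an assumption here and may differ.

```matlab
% Window-center time instants (sketch with hypothetical values).
fs = 16000;            % sample rate (Hz)
windowLength = 480;    % samples per analysis window
overlapLength = 320;   % samples shared by adjacent windows
numFrames = 5;         % number of analysis windows to locate

hopLength = windowLength - overlapLength;
frameStart = (0:numFrames-1)*hopLength;          % zero-based start sample of each frame
tSketch = (frameStart + windowLength/2)/fs;      % center of each window (s)

disp(tSketch)
```

With these values the first window is centered at 15 ms and consecutive centers are 10 ms apart, which is the hop length expressed in seconds.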
The melSpectrogram function follows the general algorithm to compute
a mel spectrogram as described in [1].
In this algorithm, the audio input is first buffered into frames of
numel(Window) number of samples. The frames are overlapped by
OverlapLength number of samples. The specified
Window is applied to each frame, and then the frame is converted to a
frequency-domain representation with
FFTLength number of points. The
frequency-domain representation can be either magnitude or power, specified by
SpectrumType. If
WindowNormalization is set to
true, the spectrum is normalized by the window. Each frame of the
frequency-domain representation passes through a mel filter bank. The spectral values output
from the mel filter bank are summed, and then the channels are concatenated so that each frame
is transformed to a
NumBands-element column vector.
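The steps described above can be sketched in base MATLAB for a mono signal. This is an illustration under stated assumptions, not the function's actual implementation: the test tone, all parameter values, the mel-scale formula (toMel and toHz are hypothetical helper names), and the omission of window and filter bank normalization are choices made for this sketch.

```matlab
% Mel spectrogram pipeline sketch (mono input, no normalization applied).
fs = 16000;
x = sin(2*pi*440*(0:fs-1)'/fs);           % synthetic 1-second, 440 Hz tone
winLen = 512; overlap = 256; fftLen = 512; numBands = 32;
n = (0:winLen-1)';
win = 0.54 - 0.46*cos(2*pi*n/winLen);     % periodic Hamming window

% 1. Buffer the signal into overlapped frames (trailing partial frame dropped).
hop = winLen - overlap;
numFrames = floor((numel(x) - overlap)/hop);
idx = (1:winLen)' + (0:numFrames-1)*hop;  % winLen-by-numFrames sample indices
frames = x(idx);

% 2. Window each frame and convert it to a one-sided power spectrum.
X = fft(frames.*win,fftLen);
P = abs(X(1:fftLen/2+1,:)).^2;
binHz = (0:fftLen/2)'*fs/fftLen;          % frequency of each FFT bin (Hz)

% 3. Design half-overlapped triangular filters, equally spaced on an
%    assumed mel scale (this formula is an assumption of the sketch).
toMel = @(f) 2595*log10(1 + f/700);
toHz  = @(m) 700*(10.^(m/2595) - 1);
edges = toHz(linspace(toMel(0),toMel(fs/2),numBands+2));
fb = zeros(numBands,numel(binHz));
for b = 1:numBands
    rising  = (binHz - edges(b))/(edges(b+1) - edges(b));
    falling = (edges(b+2) - binHz)/(edges(b+2) - edges(b+1));
    fb(b,:) = max(0,min(rising,falling))';
end

% 4. Apply the filter bank: each frame becomes a numBands-element column.
Ssketch = fb*P;                           % numBands-by-numFrames
[~,bMax] = max(mean(Ssketch,2));
fprintf("Strongest band is centered near %.1f Hz\n",edges(bMax+1))
```

For the 440 Hz test tone, the strongest band should be one whose center frequency lies close to 440 Hz, which gives a quick sanity check of the framing and filter bank steps.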
Filter Bank Design
The mel filter bank is designed as half-overlapped triangular filters equally spaced on
the mel scale.
NumBands controls the number of mel bandpass filters.
FrequencyRange controls the band edges of the first and last filters
in the mel filter bank.
FilterBankNormalization specifies the type of
normalization applied to the individual bands.
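The roles of NumBands and FrequencyRange can be illustrated by computing band edges that are equally spaced on a mel scale. The sketch below is not the toolbox design routine: the mel conversion formula, the helper names toMel and toHz, and the per-band scale factor shown for a bandwidth-style normalization are all assumptions made for illustration.

```matlab
% Mel-spaced band edges for half-overlapped triangular filters (sketch).
numBands = 64;
frequencyRange = [62.5 8000];             % Hz, values from the example above

% Assumed mel-scale conversion (hypothetical helpers for this sketch).
toMel = @(f) 2595*log10(1 + f/700);
toHz  = @(m) 700*(10.^(m/2595) - 1);

% numBands filters need numBands+2 edges: each filter spans from the
% center of the previous filter to the center of the next one.
edgesMel = linspace(toMel(frequencyRange(1)),toMel(frequencyRange(2)),numBands+2);
edgesHz = toHz(edgesMel);

centerHz = edgesHz(2:end-1);                   % one center per filter
bandwidthHz = edgesHz(3:end) - edgesHz(1:end-2);

% One possible bandwidth-style normalization (assumed, not confirmed):
% scale each triangle so that wider bands receive smaller weights.
scale = 2./bandwidthHz;

fprintf("First center: %.1f Hz, last center: %.1f Hz\n",centerHz(1),centerHz(end))
```

Because the edges are uniform in mel and not in hertz, the filters become progressively wider toward high frequencies.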
[1] Rabiner, Lawrence R., and Ronald W. Schafer. Theory and Applications of Digital Speech Processing. Upper Saddle River, NJ: Pearson, 2010.
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Version History
Introduced in R2019a
WindowLength will be removed in a future release
The WindowLength parameter will be removed from the
melSpectrogram function in a future release. Use the
Window parameter instead.
In releases prior to R2020b, you could only specify the length of a time-domain window. The window was always designed as a periodic Hamming window. You can replace instances of the code
S = melSpectrogram(audioIn,fs,'WindowLength',1024);
with this code:
S = melSpectrogram(audioIn,fs,'Window',hamming(1024,'periodic'));