Voice Activity Detector
Detect presence of speech in audio signal
Library: Audio Toolbox / Measurements
Description
The Voice Activity Detector block detects the presence of speech in an audio signal. You can also use the Voice Activity Detector block to output an estimate of the noise variance per frequency bin.
Ports
Input
Output
Parameters
Model Examples
Block Characteristics
Data Types 

Direct Feedthrough 

Multidimensional Signals 

Variable-Size Signals

Zero-Crossing Detection

Algorithms
The Voice Activity Detector implements the algorithm described in [1].
If Domain of the input is specified as Time, the input signal is windowed and then converted to the frequency domain according to the Window, Sidelobe attenuation of the window (dB), and FFT length parameters. If Domain of the input is specified as Frequency, the input is assumed to be a windowed discrete-time Fourier transform (DTFT) of an audio signal. The signal is then converted to the power domain. Noise variance is estimated according to [2]. The posterior and prior SNR are estimated according to the Minimum Mean-Square Error (MMSE) formula described in [3]. A log-likelihood ratio test with a Hidden Markov Model (HMM)-based hangover scheme is used, according to [1].
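The processing chain described above can be sketched in a few lines of Python. This is an illustration only, not the block's implementation. It rests on several assumptions: the per-bin noise variance is taken as known rather than estimated as in [2]; the prior SNR uses a decision-directed update with a Wiener gain standing in for the MMSE amplitude estimator of [3]; the hangover is the forward recursion of a two-state HMM applied to the speech/noise odds; and the function names (hann, power_spectrum, detect) and the values of alpha, a01, and a10 are hypothetical choices, not block parameters or defaults.

```python
import cmath
import math
import random


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def power_spectrum(frame, window):
    """Window a time-domain frame and return |X_k|^2 for the one-sided DFT."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, window)]
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def detect(frames, noise_var, alpha=0.98, a01=0.05, a10=0.2):
    """Return one speech-presence probability per frame.

    noise_var: per-bin noise variance, ASSUMED known (not estimated here).
    alpha:     decision-directed smoothing factor (illustrative value).
    a01, a10:  HMM transition probabilities noise->speech and speech->noise
               (illustrative values).
    """
    window = hann(len(frames[0]))
    a00, a11 = 1.0 - a01, 1.0 - a10
    prev_clean = [0.0] * len(noise_var)  # previous clean-speech power estimate
    odds = a01 / a10                     # stationary prior odds of speech
    probs = []
    for frame in frames:
        power = power_spectrum(frame, window)
        log_lr = 0.0
        for k, (p, nv) in enumerate(zip(power, noise_var)):
            gamma = p / nv  # posterior SNR
            # Prior SNR, decision-directed update.
            xi = alpha * prev_clean[k] / nv + (1.0 - alpha) * max(gamma - 1.0, 0.0)
            gain = xi / (1.0 + xi)  # Wiener gain (simplification, see lead-in)
            prev_clean[k] = gain * gain * p
            # Per-bin log-likelihood ratio under a Gaussian model.
            log_lr += gamma * xi / (1.0 + xi) - math.log1p(xi)
        log_lr /= len(noise_var)          # geometric mean over bins
        lr = math.exp(min(log_lr, 50.0))  # clamp to avoid overflow
        # Two-state HMM forward recursion on the odds (hangover).
        odds = (a01 + a11 * odds) / (a00 + a10 * odds) * lr
        probs.append(odds / (1.0 + odds))
    return probs


# Toy input: 5 noise frames, 5 frames of tone plus noise, 8 noise frames.
random.seed(0)
N, SIGMA, K0 = 64, 0.1, 8


def make_frame(with_tone):
    return [
        (math.cos(2 * math.pi * K0 * t / N) if with_tone else 0.0)
        + random.gauss(0.0, SIGMA)
        for t in range(N)
    ]


frames = [make_frame(5 <= i < 10) for i in range(18)]
# For white noise, the variance of each windowed DFT bin is sigma^2 * sum(w^2).
noise_var = [SIGMA ** 2 * sum(w * w for w in hann(N))] * (N // 2 + 1)
probs = detect(frames, noise_var)
print([round(p, 2) for p in probs])
```

Comparing each returned probability with 0.5 gives a frame-level speech decision. Because the odds from the previous frame enter the recursion, the probability does not fall back to its noise-only level in the very first frame after the tone stops, which is the purpose of a hangover scheme.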
References
[1] Sohn, Jongseo, Nam Soo Kim, and Wonyong Sung. "A Statistical Model-Based Voice Activity Detection." IEEE Signal Processing Letters. Vol. 6, No. 1, 1999.
[2] Martin, R. "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics." IEEE Transactions on Speech and Audio Processing. Vol. 9, No. 5, 2001, pp. 504–512.
[3] Ephraim, Y., and D. Malah. "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator." IEEE Transactions on Acoustics, Speech, and Signal Processing. Vol. 32, No. 6, 1984, pp. 1109–1121.
Extended Capabilities
Version History
Introduced in R2018a