input shape to the LSTM net when doing inference for VAD tasks

Question

0 votes

Hi, I am following this article to train an LSTM network for VAD tasks: https://www.mathworks.com/help/deeplearning/ug/voice-activity-detection-in-noise-using-deep-learning.html

My question is about testing the trained LSTM network. In the article, the input data at test time is not shaped like the training input, (#frames, #time_steps, #features). Does this mean that, during inference, the trained LSTM network takes each frame as an independent input and classifies it as noise or voice, so that essentially no hidden state is used during inference? Am I right?

Thank you in advance!

Answer 1

Brian Hemmat on 7 Mar 2023

0 votes

I did not look at the dimensions you're discussing, but I can say that you are correct that the "streaming" code in the example classifies chunks independently. Note that it calls classify rather than classifyAndUpdateState, so the network state is not carried over from one chunk to the next.

Stay tuned for the R2023a release, where we have updated the example to maintain state (should be coming in the next few weeks).
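
To make the distinction concrete, here is a minimal sketch of stateless versus stateful chunked inference. It is written in plain Python with a toy one-unit recurrent cell, not MATLAB's LSTM layer; `run_cell`, its weights `w` and `u`, the `signal` values, and the chunk size are all hypothetical and chosen only for illustration. Resetting the state for every chunk plays the role of calling classify per chunk, and carrying the state forward plays the role of classifyAndUpdateState.

```python
import math

def run_cell(xs, h=0.0, w=0.8, u=0.5):
    """Toy recurrent cell (an assumption, not an LSTM): h_t = tanh(w*x_t + u*h_{t-1}).

    Returns the per-step outputs and the final hidden state.
    """
    outputs = []
    for x in xs:
        h = math.tanh(w * x + u * h)
        outputs.append(h)
    return outputs, h

# Hypothetical feature stream, split into chunks of two time steps each.
signal = [0.1, 0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.9]
chunks = [signal[i:i + 2] for i in range(0, len(signal), 2)]

# Reference: process the whole sequence in one pass.
full_outputs, _ = run_cell(signal)

# Stateless streaming: the hidden state starts from zero for every chunk,
# so each chunk is processed independently of the ones before it.
stateless = []
for chunk in chunks:
    out, _ = run_cell(chunk)
    stateless.extend(out)

# Stateful streaming: the final state of one chunk seeds the next chunk.
stateful = []
h = 0.0
for chunk in chunks:
    out, h = run_cell(chunk, h)
    stateful.extend(out)

print("stateful matches full pass: ", stateful == full_outputs)
print("stateless matches full pass:", stateless == full_outputs)
```

In this sketch the stateful loop reproduces the single-pass outputs, while the stateless loop agrees only on the first chunk. Note that even the stateless loop still propagates the hidden state across the time steps inside a chunk; what it discards is the state at each chunk boundary.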
