Automating an Audio Labeling Workflow with Deep Learning for Voice Activity Detection
Dr. Ramakrishnan Raman, Honeywell
Vasantha Paulraj, Honeywell
Deep learning models require labeled data for training, so labeling the data is an important step. For audio files, labeling involves analyzing segments of audio, listening to them, and manually assigning appropriate labels to specific time slots in the files. This manual workflow is labor-intensive and lengthens the development cycle. For instance, to train a deep learning model that classifies segments as speech or noise for voice activity detection (VAD), we need to label the speech and noise segments in the audio files. In this presentation, we'll discuss our experience using a pretrained deep learning model in a labeling algorithm and show how we automated the audio labeling workflow for VAD, thereby reducing the development cycle time.
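A common final step in automated VAD labeling is turning frame-by-frame speech/noise decisions into labeled time slots like the ones a human labeler would assign by hand. The sketch below is illustrative only and is not the authors' algorithm. It assumes a pretrained model (hypothetical, not shown) has already produced one speech probability per frame. The frame hop `hop_s`, the 0.5 `threshold`, and the function name `frames_to_segments` are likewise assumptions. The function thresholds each frame and merges consecutive frames that share a label into `(start_s, end_s, label)` regions.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start time in s, end time in s, label)


def frames_to_segments(probs: List[float], hop_s: float,
                       threshold: float = 0.5) -> List[Segment]:
    """Hypothetical helper: convert per-frame speech probabilities into
    labeled time regions. `probs` is assumed to come from some pretrained
    VAD model, one value per frame spaced `hop_s` seconds apart."""
    if not probs:
        return []
    # Assumed decision rule: a frame is speech when its probability
    # reaches the threshold, otherwise noise.
    labels = ["speech" if p >= threshold else "noise" for p in probs]
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the signal or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((round(start * hop_s, 6), round(i * hop_s, 6),
                             labels[start]))
            start = i
    return segments


if __name__ == "__main__":
    # Made-up probabilities for six 10 ms frames, for illustration only.
    print(frames_to_segments([0.1, 0.2, 0.9, 0.8, 0.7, 0.3], hop_s=0.01))
```

In practice, a labeling pipeline would probably also smooth the decisions, for example by dropping very short regions, before a human reviews them. That step is omitted here to keep the sketch minimal.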