AI for Audio
Audio Toolbox™ provides functionality to develop machine and deep learning solutions for audio, speech, and acoustic applications, including speaker identification, speech command recognition, speech separation, acoustic scene recognition, denoising, and more.
Use audioDatastore to ingest large audio data sets and process files in parallel.
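As a minimal sketch, an audioDatastore can point at a folder of audio files and derive labels from the folder names (the folder name "audio_data" here is an assumption for illustration):

```matlab
% Assumption: "audio_data" contains WAV files organized in one subfolder per class.
ads = audioDatastore("audio_data", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

% Read one file at a time; info carries the sample rate and label.
[audioIn, info] = read(ads);
fs = info.SampleRate;
```

Datastores read files lazily, so the full data set never has to fit in memory, and functions such as readall can run over the files in parallel.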
Use Signal Labeler to build audio data sets by annotating audio recordings manually and automatically.
Use audioDataAugmenter to create randomized pipelines of built-in or custom signal processing methods for augmenting and synthesizing audio data sets.
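For example, a sequential augmentation pipeline that randomly time-stretches, pitch-shifts, and adds noise might look like this (the probabilities and ranges shown are illustrative choices, not defaults):

```matlab
% Randomized augmentation pipeline: each method is applied with the
% given probability, with parameters drawn from the specified ranges.
aug = audioDataAugmenter( ...
    AugmentationMode="sequential", ...
    TimeStretchProbability=0.5, ...
    PitchShiftProbability=0.5, ...
    SemitoneShiftRange=[-2 2], ...
    AddNoiseProbability=0.5, ...
    SNRRange=[5 20]);

fs = 16e3;
audioIn = randn(fs,1);          % placeholder one-second signal
out = augment(aug, audioIn, fs); % table with Audio and AugmentationInfo columns
```

Each call to augment returns a table whose AugmentationInfo column records which methods and parameter values were applied, which helps keep augmented data sets reproducible.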
Use audioFeatureExtractor to extract combinations of different features while sharing intermediate computations.
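A sketch of extracting several features at once, so that shared intermediates such as the short-time Fourier transform are computed only once (window and overlap values here are example settings):

```matlab
fs = 16e3;
afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(512,"periodic"), ...
    OverlapLength=256, ...
    melSpectrum=true, ...
    mfcc=true, ...
    pitch=true, ...
    spectralCentroid=true);

features = extract(afe, randn(fs,1)); % frames-by-features matrix
idx = info(afe);                      % maps each feature name to its columns
```

The info output tells you which columns of the feature matrix belong to which feature, which is useful when feeding the matrix into a network.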
Audio Toolbox also provides access to third-party APIs for text-to-speech and speech-to-text, and it includes pretrained models so that you can perform transfer learning, classify sounds, and extract feature embeddings. Using pretrained networks requires Deep Learning Toolbox™.
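As a hedged example of the pretrained-model workflow (assuming the corresponding pretrained-network support packages are installed, and with "mySound.wav" as a hypothetical input file):

```matlab
% Assumption: mySound.wav exists on the path.
[audioIn, fs] = audioread("mySound.wav");

% Classify sounds with the pretrained YAMNet-based classifier.
sounds = classifySound(audioIn, fs);

% Or extract VGGish feature embeddings for transfer learning.
embeddings = vggishEmbeddings(audioIn, fs);
```

The embeddings can serve as fixed features for a small downstream classifier, which is often far cheaper than training an audio network from scratch.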
Apply AI workflows to audio applications
- Dataset Management and Labeling
Ingest, create, and label large data sets
- Feature Extraction
Mel spectrogram, MFCC, pitch, spectral descriptors
- Data Augmentation
Augmentation pipelines, shift pitch and time, stretch time, control volume and noise
- Segmentation
Detect and isolate speech and other sounds
- Pretrained Models
Transfer learning, sound classification, feature embeddings, pretrained audio deep learning networks
- Speech Transcription and Synthesis
Use a pretrained model or third-party APIs for text-to-speech and speech-to-text
- Code Generation and GPU Support
Generate portable C/C++/MEX functions and use GPUs to deploy or accelerate processing
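Several of the workflows above, such as detecting speech regions, come down to a single function call. A minimal sketch using detectSpeech (with "speech.wav" as a hypothetical recording):

```matlab
% Assumption: speech.wav is a recording containing speech and silence.
[audioIn, fs] = audioread("speech.wav");

% Each row of idx is the [start end] sample range of one detected
% speech region.
idx = detectSpeech(audioIn, fs);

% Listen to the first detected region:
% sound(audioIn(idx(1,1):idx(1,2)), fs)
```

Functions like this one support C/C++ code generation, so the same detection logic can be deployed outside MATLAB.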