Pretrained Networks

Transfer learning, sound classification, feature embeddings

Audio Toolbox™ provides the pretrained VGGish and YAMNet networks. Use the vggish and yamnet functions to interact directly with the pretrained networks. The classifySound function performs the required preprocessing and postprocessing for YAMNet so that you can locate sounds in an audio signal and classify each one into one of 521 categories. You can explore the YAMNet ontology using the yamnetGraph function. The vggishFeatures function performs the necessary preprocessing and postprocessing for VGGish so that you can extract feature embeddings to use as input to machine learning and deep learning systems. The toolbox also provides the OpenL3 network (openl3, openl3Preprocess, openl3Features) for feature extraction and the CREPE network (crepe, crepePreprocess, crepePostprocess, pitchnn) for pitch estimation.

This functionality requires Deep Learning Toolbox™.
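
The two high-level workflows described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes Audio Toolbox and Deep Learning Toolbox are installed and that the pretrained models are available locally (some releases may prompt you to download them first). The synthetic noise signal and the variable names are placeholders chosen for this sketch; substitute your own recording.

```matlab
% Placeholder input: 3 seconds of uniform noise at 16 kHz (assumption for
% illustration only; replace with audio loaded through audioread).
fs = 16e3;
audioIn = 2*rand(3*fs,1) - 1;

% YAMNet: classifySound handles preprocessing and postprocessing and
% returns the detected sound classes with their time stamps.
[sounds,timestamps] = classifySound(audioIn,fs);

% VGGish: vggishFeatures handles preprocessing and postprocessing and
% returns one 128-element embedding per analysis window (one per row).
embeddings = vggishFeatures(audioIn,fs);

disp(sounds)
disp(size(embeddings))
```

The embeddings matrix can then be passed to a classifier or another network of your choice, while sounds and timestamps describe what YAMNet detected and where.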


vggishFeatures - Extract VGGish features
vggish - VGGish neural network
vggishPreprocess - Preprocess audio for VGGish feature extraction
classifySound - Classify sounds in audio signal
yamnet - YAMNet neural network
yamnetGraph - Graph of YAMNet AudioSet ontology
yamnetPreprocess - Preprocess audio for YAMNet classification
openl3 - OpenL3 neural network
openl3Preprocess - Preprocess audio for OpenL3 feature extraction
openl3Features - Extract OpenL3 features
crepe - CREPE neural network
crepePreprocess - Preprocess audio for CREPE deep learning network
crepePostprocess - Postprocess output of CREPE deep learning network
pitchnn - Estimate pitch with deep learning neural network

Featured Examples