OpenL3 embeddings extraction network
Audio Toolbox / Deep Learning
The OpenL3 block uses a pretrained convolutional neural network to extract feature embeddings from audio signals. You can use these embeddings as audio representations for tasks such as classification. This block requires Deep Learning Toolbox™.
Port_1 — Spectrograms
matrix | 4-D array
Spectrograms generated from audio, specified as an N-by-M matrix or an N-by-M-by-1-by-K array. K is the number of spectrograms, and N-by-M is the size of each spectrogram, which depends on the value of the Spectrum type parameter:
Mel (128 bands): The network accepts mel spectrograms of size 128-by-199, where 128 is the number of mel bands and 199 is the number of time hops.
Mel (256 bands): The network accepts mel spectrograms of size 256-by-199, where 256 is the number of mel bands and 199 is the number of time hops.
Linear: The network accepts positive one-sided spectrograms of size 257-by-197, where 257 is the number of frequency bins in the positive one-sided spectrum and 197 is the number of time hops.
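The size rules above can be sketched as a small shape check. This is only an illustration in Python, not part of the block or of any toolbox API: the names `EXPECTED_SIZE` and `count_spectrograms` are hypothetical, and the sizes are copied from the list above.

```python
# Expected spectrogram size (N, M) for each Spectrum type setting,
# copied from the port description. The names here are hypothetical.
EXPECTED_SIZE = {
    "Mel (128 bands)": (128, 199),
    "Mel (256 bands)": (256, 199),
    "Linear": (257, 197),
}


def count_spectrograms(shape, spectrum_type="Mel (128 bands)"):
    """Return K, the number of spectrograms, for an input of the given shape.

    Accepts an N-by-M matrix (K = 1) or an N-by-M-by-1-by-K array.
    Raises ValueError if the shape does not match the spectrum type.
    """
    n, m = EXPECTED_SIZE[spectrum_type]
    shape = tuple(shape)
    if shape == (n, m):
        return 1  # a single N-by-M spectrogram
    if len(shape) == 4 and shape[:3] == (n, m, 1):
        return shape[3]  # K spectrograms stacked along the fourth dimension
    raise ValueError(
        f"{spectrum_type} expects {n}-by-{m} or {n}-by-{m}-by-1-by-K, got {shape}"
    )
```

For example, `count_spectrograms((257, 197, 1, 5), "Linear")` reports five spectrograms, while a 128-by-199 input combined with the Linear setting is rejected.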
Port_1 — Embeddings
Output embeddings, returned as a K-by-L matrix, where K is the number of input spectrograms, and L is specified by the Embedding length parameter.
Spectrum type — Type of spectrum
Mel (128 bands) (default) | Mel (256 bands) | Linear
Type of spectrum generated from audio and used as input to the neural network, specified as Mel (128 bands), Mel (256 bands), or Linear. This parameter determines the size of the input that the network expects at Port_1.
Content type — Type of audio content
Environmental sounds (default) | Musical sounds
Type of audio content the neural network was trained on, specified as Environmental sounds or Musical sounds. Set this parameter to Environmental sounds to use a neural network pretrained on environmental audio data, and set it to Musical sounds to use a network pretrained on musical audio data.
Embedding length — Output embedding length
512 (default) |
Length of output embedding, specified as
Mini-batch size — Size of mini-batches
128 (default) | positive integer
Size of mini-batches to use for prediction, specified as a positive integer. Larger mini-batch sizes require more memory but can lead to faster predictions.
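To make the memory and speed trade-off concrete, the following Python sketch shows how K spectrograms might be divided into mini-batches. It assumes the usual scheme in which inputs are processed in consecutive chunks of at most the mini-batch size. The block's internal scheduling is not documented here, and `plan_mini_batches` is a hypothetical helper.

```python
def plan_mini_batches(num_spectrograms, mini_batch_size=128):
    """Return one (start, stop) index range per mini-batch.

    Assumes consecutive chunking: every batch holds mini_batch_size
    spectrograms except possibly the last, which holds the remainder.
    """
    if num_spectrograms < 0:
        raise ValueError("num_spectrograms must be nonnegative")
    if mini_batch_size < 1:
        raise ValueError("mini_batch_size must be a positive integer")
    return [
        (start, min(start + mini_batch_size, num_spectrograms))
        for start in range(0, num_spectrograms, mini_batch_size)
    ]
```

Under this assumption, 300 spectrograms with the default size of 128 take three prediction passes, the last holding 44 spectrograms. Halving the mini-batch size to 64 raises that to five passes, each holding fewer spectrograms in memory at once.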
Cramer, Jason, et al. "Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings." In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 3852-56. doi:10.1109/ICASSP.2019.8682475.
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Usage notes and limitations:
To generate generic C code that does not depend on third-party libraries, in the Configuration Parameters > Code Generation general category, set the Language parameter to C.
To generate C++ code, in the Configuration Parameters > Code Generation general category, set the Language parameter to C++. To specify the target library for code generation, in the Code Generation > Interface category, set the Target Library parameter. Setting this parameter to None generates generic C++ code that does not depend on third-party libraries.
For ERT-based targets, the Support: variable-size signals parameter in the Code Generation > Interface pane must be enabled.
For a list of networks and layers supported for code generation, see Networks and Layers Supported for Code Generation (MATLAB Coder).
Introduced in R2022b