Code Generation for Deep Learning Networks with ARM Compute Library

With MATLAB® Coder™, you can generate code for prediction from an already trained convolutional neural network (CNN), targeting an embedded platform that uses an ARM® processor that supports the NEON extension. The code generator takes advantage of the ARM Compute Library for computer vision and machine learning. The generated code implements a CNN that has the architecture, layers, and parameters specified in the input SeriesNetwork or DAGNetwork network object.

Generate code by using one of these methods:

  • The codegen command

  • The MATLAB Coder app

  • The cnncodegen command

When you generate code for a neural network by using codegen or the MATLAB Coder app, the generated code uses column-major layout for your array data. To match the row-major layout that the deep learning library uses, the code generator must insert operations to convert the column-major layout to row-major layout. These conversion operations can degrade the performance of the generated code. Code generation for deep learning neural networks does not support MATLAB Coder row-major options, such as the RowMajor configuration parameter.

Requirements

  • MATLAB Coder Interface for Deep Learning Libraries. To install the support package, select it from the MATLAB Add-Ons menu.

  • ARM Compute Library for computer vision and machine learning must be installed on the target hardware.

  • Deep Learning Toolbox™.

  • Environment variables for the compilers and libraries.

For supported versions of libraries and for information about setting up environment variables, see Prerequisites for Deep Learning with MATLAB Coder.

Code Generation by Using codegen

To generate code for deep learning on an ARM target by using codegen:

  • Write an entry-point function that loads the pretrained CNN and calls predict. For example:

    function out = squeezenet_predict(in)
    %#codegen
    
    % Reuse the loaded network across calls instead of reloading it each time.
    persistent net;
    
    % Add OpenCV compiler and linker flags to the generated build.
    opencv_linkflags = '`pkg-config --cflags --libs opencv`';
    coder.updateBuildInfo('addLinkFlags',opencv_linkflags);
    
    if isempty(net)
        net = coder.loadDeepLearningNetwork('squeezenet', 'squeezenet');
    end
    
    out = net.predict(in);
    end
    

  • If your target hardware is Raspberry Pi™, you can take advantage of the MATLAB Support Package for Raspberry Pi Hardware. With the support package, codegen moves the generated code to the Raspberry Pi and builds the executable program on the Raspberry Pi. When you generate code for a target that does not have a hardware support package, you must run commands to move the generated files and build the executable program.

  • MEX generation is not supported for code generation for deep learning on ARM targets.

  • For ARM targets, when the input to predict contains multiple images or observations (N > 1), a MiniBatchSize greater than 1 is not supported. Specify a MiniBatchSize of 1.

Code Generation for Deep Learning on a Raspberry Pi

When you have the MATLAB Support Package for Raspberry Pi Hardware, to generate code for deep learning on a Raspberry Pi:

  1. To connect to the Raspberry Pi, use raspi. For example:

    r = raspi('raspiname','username','password');
    

  2. Create a code generation configuration object for a library or executable by using coder.config. Set the TargetLang property to 'C++'.

    cfg = coder.config('exe');
    cfg.TargetLang = 'C++';
    

  3. Create a deep learning configuration object by using coder.DeepLearningConfig. Set the ArmComputeVersion and ArmArchitecture properties. Set the DeepLearningConfig property of the code generation configuration object to the coder.ARMNEONConfig object. For example:

    dlcfg = coder.DeepLearningConfig('arm-compute');
    dlcfg.ArmArchitecture = 'armv7';
    dlcfg.ArmComputeVersion = '19.02';
    cfg.DeepLearningConfig = dlcfg;
    

  4. To configure code generation hardware settings for the Raspberry Pi, create a coder.Hardware object by using coder.hardware. Set the Hardware property of the code generation configuration object to the coder.Hardware object.

    hw = coder.hardware('Raspberry Pi');
    cfg.Hardware = hw;
    

  5. If you are generating an executable program, provide a C++ main program. For example:

    cfg.CustomSource = 'main.cpp';

  6. To generate code, use codegen. Specify the code generation configuration object by using the -config option. For example:

    codegen -config cfg squeezenet_raspi_predict -args {ones(227, 227, 3,'single')} -report

For an example, see Code Generation for Deep Learning on Raspberry Pi.

Code Generation When You Do Not Have a Hardware Support Package

To generate code for deep learning when you do not have a hardware support package for the target:

  1. Generate code on a Linux® host only.

  2. Create a configuration object for a library. For example:

    cfg = coder.config('lib');

    Do not use a configuration object for an executable program.

  3. Configure code generation to generate C++ code and to generate source code only.

    cfg.GenCodeOnly = true;
    cfg.TargetLang = 'C++';

  4. To specify code generation with the ARM Compute Library, create a coder.ARMNEONConfig object by using coder.DeepLearningConfig. Set the ArmComputeVersion and ArmArchitecture properties. Set the DeepLearningConfig property of the code generation configuration object to the coder.ARMNEONConfig object.

    dlcfg = coder.DeepLearningConfig('arm-compute');
    dlcfg.ArmArchitecture = 'armv7';
    dlcfg.ArmComputeVersion = '19.02';
    cfg.DeepLearningConfig = dlcfg;
    

  5. To configure code generation parameters that are specific to the target hardware, set the ProdHWDeviceType property of the HardwareImplementation object.

    • For the ARMv7 architecture, use 'ARM Compatible->ARM Cortex'.

    • For the ARMv8 architecture, use 'ARM Compatible->ARM 64-bit (LP64)'.

    For example:

    cfg.HardwareImplementation.ProdHWDeviceType = 'ARM Compatible->ARM 64-bit (LP64)';

  6. To generate code, use codegen. Specify the code generation configuration object by using the -config option. For example:

    codegen -config cfg squeezenet_predict -args {ones(227, 227, 3, 'single')} -d arm_compute

For an example, see Code Generation for Deep Learning on ARM Targets.

Generated Code

The series network is generated as a C++ class containing an array of layer classes.

class b_squeezenet_0
{
 public:
  int32_T batchSize;
  int32_T numLayers;
  real32_T *inputData;
  real32_T *outputData;
  MWCNNLayer *layers[68];
 private:
  MWTargetNetworkImpl *targetImpl;
 public:
  b_squeezenet_0();
  void presetup();
  void postsetup();
  void setup();
  void predict();
  void cleanup();
  real32_T *getLayerOutput(int32_T layerIndex, int32_T portIndex);
  ~b_squeezenet_0();
};

The setup() method of the class sets up handles and allocates memory for each layer of the network object. The predict() method invokes prediction for each of the layers in the network. Suppose that you generate code for an entry-point function, squeezenet_predict. In the generated file, squeezenet_predict.cpp, the entry-point function squeezenet_predict() constructs a static object of the b_squeezenet_0 class and invokes setup and predict on the network object.

static b_squeezenet_0 net;
static boolean_T net_not_empty;

// Function Definitions
//
// A persistent object net is used to load the DAGNetwork object.
//  At the first call to this function, the persistent object is constructed and
//  set up. When the function is called subsequent times, the same object is reused
//  to call predict on inputs, avoiding reconstructing and reloading the
//  network object.
// Arguments    : const real32_T in[154587]
//                real32_T out[1000]
// Return Type  : void
//
void squeezenet_predict(const real32_T in[154587], real32_T out[1000])
{
  //  Copyright 2018 The MathWorks, Inc.
  if (!net_not_empty) {
    DeepLearningNetwork_setup(&net);
    net_not_empty = true;
  }

  DeepLearningNetwork_predict(&net, in, out);
}

Binary files are exported for layers that have parameters, such as the fully connected and convolution layers in the network. For example, files with names matching the patterns cnn_squeezenet_*_w and cnn_squeezenet_*_b contain the weights and bias parameters for the convolution layers in the network.

cnn_squeezenet_conv10_b            
cnn_squeezenet_conv10_w            
cnn_squeezenet_conv1_b             
cnn_squeezenet_conv1_w             
cnn_squeezenet_fire2-expand1x1_b   
cnn_squeezenet_fire2-expand1x1_w   
cnn_squeezenet_fire2-expand3x3_b   
cnn_squeezenet_fire2-expand3x3_w   
cnn_squeezenet_fire2-squeeze1x1_b  
cnn_squeezenet_fire2-squeeze1x1_w 
...

Code Generation by Using the MATLAB Coder App

  1. Complete the Select Source Files and Define Input Types steps.

  2. Go to the Generate Code step. (Skip the Check for Run-Time Issues step because MEX generation is not supported for code generation with the ARM Compute Library.)

  3. Set Language to C++.

  4. Specify the target ARM hardware.

    If your target hardware is Raspberry Pi and you installed the MATLAB Support Package for Raspberry Pi Hardware:

    • For Hardware Board, select Raspberry Pi.

    • To access the Raspberry Pi settings, click More Settings. Then, click Hardware. Specify the Device Address, Username, Password, and Build directory.

    When you do not have a support package for your ARM target:

    • Make sure that Build type is Static Library or Dynamic Library and select the Generate code only check box.

    • For Hardware Board, select None - Select device below.

    • For Device vendor, select ARM Compatible.

    • For the Device type:

      • For the ARMv7 architecture, select ARM Cortex.

      • For the ARMv8 architecture, select ARM 64-bit (LP64).

    Note

    If you generate code for deep learning on an ARM target, and do not use a hardware support package, generate code on a Linux host only.

  5. In the Deep Learning pane, set Target library to ARM Compute. Specify ARM Compute Library version and ARM Compute Architecture.

  6. Generate the code.

Code Generation by Using cnncodegen

  • Load the pretrained network in MATLAB. For example:

    net = alexnet;
    

  • Generate code for the CNN by using cnncodegen with 'targetlib' specified as 'arm-compute'.

    Specify the ARM Compute Library version and ARM architecture by using the targetparams argument.

    For example:

    cnncodegen(net,'targetlib','arm-compute','targetparams',struct('ArmComputeVersion','19.02','ArmArchitecture','armv8'));

    If you specify a version of the ARM Compute Library that is later than 19.02, the code generator produces code for 19.02. On the ARM target, the generated code can build with the later version of the library.

  • Write a C++ main function that calls predict.

  • Move the files to the ARM hardware and build the executable program.

For an example, see Code Generation for Deep Learning Networks with ARM Compute Library.

Generated Code

The cnncodegen command generates C++ code and a makefile, cnnbuild_rtw.mk. The generated files are in the codegen folder. Do not compile the generated code on the MATLAB host. Move the generated code to the ARM target platform for compilation.

The series network is generated as a C++ class containing an array of layer classes.

class CnnMain
{
  ...
  public:
    CnnMain();
    ...
    void setup();
    void predict();
    void cleanup();
    ...
    ~CnnMain();
};

The setup() method of the class sets up handles and allocates memory for each layer of the network object. The predict() method invokes prediction for each of the layers in the network.

void CnnMain::predict()
{
    int32_T idx;
    for (idx = 0; idx < 25; idx++) {
        this->layers[idx]->predict();
    }
}

Binary files are exported for layers that have parameters, such as the fully connected and convolution layers in the network. For example, the files cnn_CnnMain_conv*_w and cnn_CnnMain_conv*_b contain the weights and bias parameters for the convolution layers in the network.

cnn_CnnMain_avg         cnn_CnnMain_conv5_w     
cnn_CnnMain_conv1_b     cnn_CnnMain_fc6_b       
cnn_CnnMain_conv1_w     cnn_CnnMain_fc6_w       
cnn_CnnMain_conv2_b     cnn_CnnMain_fc7_b       
cnn_CnnMain_conv2_w     cnn_CnnMain_fc7_w       
cnn_CnnMain_conv3_b     cnn_CnnMain_fc8_b       
cnn_CnnMain_conv3_w     cnn_CnnMain_fc8_w       
cnn_CnnMain_conv4_b     cnn_CnnMain_labels.txt  
cnn_CnnMain_conv4_w     
cnn_CnnMain_conv5_b      
