Code Generation for Deep Learning Networks with ARM Compute Library

With MATLAB® Coder™, you can generate code for prediction from an already trained convolutional neural network (CNN), targeting an embedded platform that uses an ARM® processor that supports the NEON extension. The code generator takes advantage of the ARM Compute Library for computer vision and machine learning. The generated code implements a CNN that has the architecture, layers, and parameters specified in the input SeriesNetwork or DAGNetwork network object.

Generate code by using one of these methods:

  • The codegen command

  • The MATLAB Coder app

  • The cnncodegen command

When you generate code for a neural network by using codegen or the MATLAB Coder app, the generated code uses column-major layout for your array data. To match the row-major layout that the deep learning library uses, the code generator must insert operations to convert the column-major layout to row-major layout. These conversion operations can degrade the performance of the generated code. Code generation for deep learning neural networks does not support MATLAB Coder row-major options, such as the RowMajor configuration parameter.

Requirements

  • MATLAB Coder Interface for Deep Learning Libraries. To install the support package, select it from the MATLAB Add-Ons menu.

  • ARM Compute Library for computer vision and machine learning must be installed on the target hardware.

  • Deep Learning Toolbox™.

  • Environment variables for the compilers and libraries.

For supported versions of libraries and for information about setting up environment variables, see Prerequisites for Deep Learning with MATLAB Coder.

Code Generation by Using codegen

To generate code for deep learning on an ARM target by using codegen:

  • Write an entry-point function that loads the pretrained CNN and calls predict. For example:

    function out = squeezenet_predict(in)
    %#codegen
    
    % Reuse the loaded network across calls instead of reloading it each time.
    persistent net;
    
    % Add OpenCV compiler and linker flags to the generated build.
    opencv_linkflags = '`pkg-config --cflags --libs opencv`';
    coder.updateBuildInfo('addLinkFlags',opencv_linkflags);
    
    if isempty(net)
        net = coder.loadDeepLearningNetwork('squeezenet', 'squeezenet');
    end
    
    out = net.predict(in);
    end
    

  • If your target hardware is Raspberry Pi™, you can take advantage of the MATLAB Support Package for Raspberry Pi Hardware. With the support package, codegen moves the generated code to the Raspberry Pi and builds the executable program on the Raspberry Pi. When you generate code for a target that does not have a hardware support package, you must run commands to move the generated files and build the executable program.

  • MEX generation is not supported for code generation for deep learning on ARM targets.

  • For ARM targets, when the input to predict contains multiple images or observations (N > 1), a MiniBatchSize greater than 1 is not supported. Specify a MiniBatchSize of 1.

Code Generation for Deep Learning on a Raspberry Pi

When you have the MATLAB Support Package for Raspberry Pi Hardware, to generate code for deep learning on a Raspberry Pi:

  1. To connect to the Raspberry Pi, use raspi. For example:

    r = raspi('raspiname','username','password');
    

  2. Create a code generation configuration object for a library or executable by using coder.config. Set the TargetLang property to 'C++'.

    cfg = coder.config('exe');
    cfg.TargetLang = 'C++';
    

  3. Create a deep learning configuration object by using coder.DeepLearningConfig. Set the ArmComputeVersion and ArmArchitecture properties. Set the DeepLearningConfig property of the code generation configuration object to the coder.ARMNEONConfig object. For example:

    dlcfg = coder.DeepLearningConfig('arm-compute');
    dlcfg.ArmArchitecture = 'armv7';
    dlcfg.ArmComputeVersion = '19.02';
    cfg.DeepLearningConfig = dlcfg;
    

  4. To configure code generation hardware settings for the Raspberry Pi, create a coder.Hardware object by using coder.hardware. Set the Hardware property of the code generation configuration object to the coder.Hardware object.

    hw = coder.hardware('Raspberry Pi');
    cfg.Hardware = hw;
    

  5. If you are generating an executable program, provide a C++ main program. For example:

    cfg.CustomSource = 'main.cpp';

  6. To generate code, use codegen. Specify the code generation configuration object by using the -config option. For example:

    codegen -config cfg squeezenet_raspi_predict -args {ones(227, 227, 3,'single')} -report

For an example, see Code Generation for Deep Learning on Raspberry Pi.

Code Generation When You Do Not Have a Hardware Support Package

To generate code for deep learning when you do not have a hardware support package for the target:

  1. Generate code on a Linux® host only.

  2. Create a configuration object for a library. For example:

    cfg = coder.config('lib');

    Do not use a configuration object for an executable program.

  3. Configure code generation to generate C++ code and to generate source code only.

    cfg.GenCodeOnly = true;
    cfg.TargetLang = 'C++';

  4. To specify code generation with the ARM Compute Library, create a coder.ARMNEONConfig object by using coder.DeepLearningConfig. Set the ArmComputeVersion and ArmArchitecture properties. Set the DeepLearningConfig property of the code generation configuration object to the coder.ARMNEONConfig object.

    dlcfg = coder.DeepLearningConfig('arm-compute');
    dlcfg.ArmArchitecture = 'armv7';
    dlcfg.ArmComputeVersion = '19.02';
    cfg.DeepLearningConfig = dlcfg;
    

  5. To configure code generation parameters that are specific to the target hardware, set the ProdHWDeviceType property of the HardwareImplementation object.

    • For the ARMv7 architecture, use 'ARM Compatible->ARM Cortex'.

    • For the ARMv8 architecture, use 'ARM Compatible->ARM 64-bit (LP64)'.

    For example:

    cfg.HardwareImplementation.ProdHWDeviceType = 'ARM Compatible->ARM 64-bit (LP64)';

  6. To generate code, use codegen. Specify the code generation configuration object by using the -config option. For example:

    codegen -config cfg squeezenet_predict -args {ones(227, 227, 3, 'single')} -d arm_compute

For an example, see Code Generation for Deep Learning on ARM Targets.

Generated Code

The series network is generated as a C++ class containing an array of layer classes.

class b_squeezenet_0
{
 public:
  int32_T batchSize;
  int32_T numLayers;
  real32_T *inputData;
  real32_T *outputData;
  MWCNNLayer *layers[68];
 private:
  MWTargetNetworkImpl *targetImpl;
 public:
  b_squeezenet_0();
  void presetup();
  void postsetup();
  void setup();
  void predict();
  void cleanup();
  real32_T *getLayerOutput(int32_T layerIndex, int32_T portIndex);
  ~b_squeezenet_0();
};

The setup() method of the class sets up handles and allocates memory for each layer of the network object. The predict() method invokes prediction for each of the layers in the network. Suppose that you generate code for an entry-point function, squeezenet_predict. In the generated file, squeezenet_predict.cpp, the entry-point function squeezenet_predict() constructs a static object of the b_squeezenet_0 class and invokes setup and predict on the network object.

static b_squeezenet_0 net;
static boolean_T net_not_empty;

// Function Definitions
//
// A persistent object net is used to load the DAGNetwork object.
//  At the first call to this function, the persistent object is constructed and
//  set up. When the function is called subsequent times, the same object is reused
//  to call predict on inputs, avoiding reconstructing and reloading the
//  network object.
// Arguments    : const real32_T in[154587]
//                real32_T out[1000]
// Return Type  : void
//
void squeezenet_predict(const real32_T in[154587], real32_T out[1000])
{
  //  Copyright 2018 The MathWorks, Inc.
  if (!net_not_empty) {
    DeepLearningNetwork_setup(&net);
    net_not_empty = true;
  }

  DeepLearningNetwork_predict(&net, in, out);
}

Binary files are exported for layers that have parameters, such as the fully connected and convolution layers in the network. For example, files with names matching the patterns cnn_squeezenet_*_w and cnn_squeezenet_*_b contain the weights and bias parameters for the convolution layers in the network.

cnn_squeezenet_conv10_b            
cnn_squeezenet_conv10_w            
cnn_squeezenet_conv1_b             
cnn_squeezenet_conv1_w             
cnn_squeezenet_fire2-expand1x1_b   
cnn_squeezenet_fire2-expand1x1_w   
cnn_squeezenet_fire2-expand3x3_b   
cnn_squeezenet_fire2-expand3x3_w   
cnn_squeezenet_fire2-squeeze1x1_b  
cnn_squeezenet_fire2-squeeze1x1_w 
...

Code Generation by Using the MATLAB Coder App

  1. Complete the Select Source Files and Define Input Types steps.

  2. Go to the Generate Code step. (Skip the Check for Run-Time Issues step because MEX generation is not supported for code generation with the ARM Compute Library.)

  3. Set Language to C++.

  4. Specify the target ARM hardware.

    If your target hardware is Raspberry Pi and you installed the MATLAB Support Package for Raspberry Pi Hardware:

    • For Hardware Board, select Raspberry Pi.

    • To access the Raspberry Pi settings, click More Settings. Then, click Hardware. Specify the Device Address, Username, Password, and Build directory.

    When you do not have a support package for your ARM target:

    • Make sure that Build type is Static Library or Dynamic Library and select the Generate code only check box.

    • For Hardware Board, select None - Select device below.

    • For Device vendor, select ARM Compatible.

    • For the Device type:

      • For the ARMv7 architecture, select ARM Cortex.

      • For the ARMv8 architecture, select ARM 64-bit (LP64).

    Note

    If you generate code for deep learning on an ARM target, and do not use a hardware support package, generate code on a Linux host only.

  5. In the Deep Learning pane, set Target library to ARM Compute. Specify ARM Compute Library version and ARM Compute Architecture.

  6. Generate the code.

Code Generation by Using cnncodegen

  • Load the pretrained network in MATLAB. For example:

    net = alexnet;
    

  • Generate code for the CNN by using cnncodegen with 'targetlib' specified as 'arm-compute'.

    Specify the ARM Compute Library version and ARM architecture by using the targetparams argument.

    For example:

    cnncodegen(net,'targetlib','arm-compute','targetparams',struct('ArmComputeVersion','19.02','ArmArchitecture','armv8'));

    If you specify a version of the ARM Compute Library that is later than 19.02, the code generator produces code for 19.02. On the ARM target, the generated code can build with the later version of the library.

  • Write a C++ main function that calls predict.

  • Move the files to the ARM hardware and build the executable program.

For an example, see Code Generation for Deep Learning Networks with ARM Compute Library.

Generated Code

The cnncodegen command generates C++ code and a makefile, cnnbuild_rtw.mk. The generated files are in the codegen folder. Do not compile the generated code on the MATLAB host. Move the generated code to the ARM target platform for compilation.

The series network is generated as a C++ class containing an array of layer classes.

class CnnMain
{
  ...
  public:
    CnnMain();
    ...
    void setup();
    void predict();
    void cleanup();
    ...
    ~CnnMain();
};

The setup() method of the class sets up handles and allocates memory for each layer of the network object. The predict() method invokes prediction for each of the layers in the network.

void CnnMain::predict()
{
    int32_T idx;
    for (idx = 0; idx < 25; idx++) {
        this->layers[idx]->predict();
    }
}

Binary files are exported for layers that have parameters, such as the fully connected and convolution layers in the network. For example, the files cnn_CnnMain_conv*_w and cnn_CnnMain_conv*_b contain the weights and bias parameters for the convolution layers in the network.

cnn_CnnMain_avg         cnn_CnnMain_conv5_w     
cnn_CnnMain_conv1_b     cnn_CnnMain_fc6_b       
cnn_CnnMain_conv1_w     cnn_CnnMain_fc6_w       
cnn_CnnMain_conv2_b     cnn_CnnMain_fc7_b       
cnn_CnnMain_conv2_w     cnn_CnnMain_fc7_w       
cnn_CnnMain_conv3_b     cnn_CnnMain_fc8_b       
cnn_CnnMain_conv3_w     cnn_CnnMain_fc8_w       
cnn_CnnMain_conv4_b     cnn_CnnMain_labels.txt  
cnn_CnnMain_conv4_w     
cnn_CnnMain_conv5_b      
