calibrate

Simulate and collect ranges of a deep neural network

Syntax

calResults = calibrate(quantObj,calData)

calResults = calibrate(quantObj,calData,Name=Value)

Description

Add-On Required: This feature requires the Deep Learning Toolbox Model Compression Library add-on.

calResults = calibrate(quantObj,calData) exercises the network specified by the dlquantizer object quantObj using the data in calData. The function collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network, and the dynamic ranges of the activations in all layers of the network.

calResults = calibrate(quantObj,calData,Name=Value) calibrates the network with additional options specified by one or more name-value arguments.

Examples

Quantize a Neural Network for GPU Target

This example shows how to quantize the learnable parameters in the convolution layers of a neural network for a GPU target and explore the behavior of the quantized network. You quantize the squeezenet neural network after retraining it to classify new images. In this example, quantization reduces the memory required for the network by approximately 75% without affecting the accuracy of the network.

Load the pretrained network. net is the output network of the Train Deep Learning Network to Classify New Images example.

load squeezedlnetmerch
net

net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.

Define calibration and validation data to use for quantization.

The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.

In this example, use the images in the MerchData data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);

Create a dlquantizer object and specify the network to quantize.

dlquantObj = dlquantizer(net);

Specify the GPU target.

quantOpts = dlquantizationOptions(Target='gpu');
quantOpts.MetricFcn = {@(x)hAccuracy(x,net,aug_valData,classes)}

quantOpts = 
  dlquantizationOptions with properties:

   Validation Metric Info
    MetricFcn: {[@(x)hAccuracy(x,net,aug_valData,classes)]}

   Validation Environment Info
       Target: 'gpu'
    Bitstream: ''
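
The helper hAccuracy is not listed on this page. As a rough illustration of what a Top-1 accuracy metric computes, the following sketch uses only base MATLAB. The score matrix, class names, and labels are hypothetical stand-ins, not MerchData results, and the real helper also has to run the network on the validation datastore.

```matlab
% Hypothetical class scores: one row per image, one column per class.
scores = [0.7 0.2 0.1;
          0.1 0.8 0.1;
          0.3 0.3 0.4;
          0.6 0.3 0.1];
classNames = ["classA"; "classB"; "classC"];   % placeholder class names

% Ground-truth labels for the four hypothetical images.
trueLabels = categorical(["classA"; "classB"; "classC"; "classB"], classNames);

% Top-1 prediction: the class with the highest score in each row.
[~, idx] = max(scores, [], 2);
predicted = categorical(classNames(idx), classNames);

% Accuracy is the fraction of images whose top prediction matches the label.
accuracy = mean(predicted == trueLabels);
fprintf('Top-1 accuracy: %.2f\n', accuracy);
```

A metric function passed through MetricFcn would wrap a computation of this kind and return a single scalar per network.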

Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.

calResults = calibrate(dlquantObj, aug_calData)

calResults=120×5 table
               'conv1_Weights'               'conv1'    "Weights"    -0.9198    0.8849
                  'conv1_Bias'               'conv1'       "Bias"    -0.0793    0.2634
    'fire2-squeeze1x1_Weights'    'fire2-squeeze1x1'    "Weights"    -1.3800    1.2477
       'fire2-squeeze1x1_Bias'    'fire2-squeeze1x1'       "Bias"    -0.1164    0.2427
     'fire2-expand1x1_Weights'     'fire2-expand1x1'    "Weights"    -0.7406    0.9098
        'fire2-expand1x1_Bias'     'fire2-expand1x1'       "Bias"    -0.0601    0.1460
     'fire2-expand3x3_Weights'     'fire2-expand3x3'    "Weights"    -0.7440    0.6691
        'fire2-expand3x3_Bias'     'fire2-expand3x3'       "Bias"    -0.0518    0.0742
    'fire3-squeeze1x1_Weights'    'fire3-squeeze1x1'    "Weights"    -0.7712    0.6892
       'fire3-squeeze1x1_Bias'    'fire3-squeeze1x1'       "Bias"    -0.1014    0.3267
     'fire3-expand1x1_Weights'     'fire3-expand1x1'    "Weights"    -0.7204    0.9743
        'fire3-expand1x1_Bias'     'fire3-expand1x1'       "Bias"    -0.0670    0.3043
     'fire3-expand3x3_Weights'     'fire3-expand3x3'    "Weights"    -0.6144    0.7741
        'fire3-expand3x3_Bias'     'fire3-expand3x3'       "Bias"    -0.0536    0.1033
      ⋮
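
To make the idea of a collected dynamic range concrete, here is a minimal sketch in base MATLAB. It assumes that a range is simply the running minimum and maximum observed over all calibration batches. Random arrays stand in for real layer activations, and the toolbox's internal bookkeeping may differ.

```matlab
rng(0);                       % reproducible stand-in data
numBatches = 5;               % hypothetical number of calibration batches

runMin =  inf;                % running minimum over all batches
runMax = -inf;                % running maximum over all batches

for b = 1:numBatches
    act = randn(4, 4, 8);     % stand-in for one layer's activations
    runMin = min(runMin, min(act(:)));
    runMax = max(runMax, max(act(:)));
end

fprintf('MinValue = %.4f, MaxValue = %.4f\n', runMin, runMax);
```

This is why the calibration data should be representative: a range gathered from unrepresentative inputs can be too narrow or too wide for the inputs the deployed network sees.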

Use the validate function to quantize the learnable parameters in the convolution layers of the network and exercise the network. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.

valResults = validate(dlquantObj, aug_valData, quantOpts)

valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: [2×2 table]

Examine the validation output to see the performance of the quantized network.

valResults.MetricResults.Result

ans=2×2 table
    'Floating-Point'    1
         'Quantized'    1

valResults.Statistics

ans=2×2 table
    'Floating-Point'    2900268
         'Quantized'     733932

In this example, quantization reduced the memory required for the network by approximately 75%. The accuracy of the network is not affected.
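
The 75% figure can be checked directly from the two values in valResults.Statistics. The variable names below are illustrative only.

```matlab
floatMem = 2900268;   % 'Floating-Point' row of valResults.Statistics
quantMem =  733932;   % 'Quantized' row of valResults.Statistics

% Relative reduction in memory, as a percentage.
reductionPct = 100 * (1 - quantMem / floatMem);
fprintf('Memory reduction: %.1f%%\n', reductionPct);
```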

The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.
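
As a rough illustration of how a calibrated range maps to a scaled 8-bit integer type, the sketch below quantizes a few values using the conv1_Weights range from the calibration table. It assumes a symmetric linear scale onto [-127, 127]. The scaling scheme the toolbox actually applies may differ, so treat this as a conceptual example only.

```matlab
% Range taken from the conv1_Weights row of the calibration results.
minVal = -0.9198;
maxVal =  0.8849;

% Assumption: symmetric linear scaling onto the signed 8-bit range.
scale = max(abs([minVal maxVal])) / 127;

x    = [minVal 0 0.5 maxVal];      % sample values inside the range
q    = int8(round(x / scale));     % quantize to 8-bit integers
xHat = double(q) * scale;          % map the integers back to real values

% For in-range inputs, the rounding error is at most half a quantization step.
maxErr = max(abs(xHat - x));
fprintf('scale = %.6f, max error = %.6f\n', scale, maxErr);
```

Each value is stored in one byte instead of the four bytes of a single-precision number, which is consistent with the memory reduction of roughly 75% reported above.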

Quantize Network for FPGA Deployment

Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. This example shows how to use Deep Learning Toolbox Model Compression Library and Deep Learning HDL Toolbox to deploy the int8 network to a target FPGA board.

Load Pretrained Network

Load the pretrained LogoNet network and analyze the network architecture.

snet = getLogoNetwork;
deepNetworkDesigner(snet);

Set random number generator for reproducibility.

rng(0);

Load Data

This example uses the logos_dataset data set. The data set consists of 320 images. Each image is 227-by-227 in size and has three color channels (RGB). Create an imageDatastore object for the images, then split it into calibration and validation data sets.

curDir = pwd;
unzip("logos_dataset.zip");
imageData = imageDatastore(fullfile(curDir,'logos_dataset'), ...
    'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');
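
The effect of a randomized, per-label 50% split can be sketched in base MATLAB without the image files. The sketch assumes 32 classes with 10 images each, which is consistent with the 320 images mentioned above but is not stated by the source. It mimics the idea of a proportional split per label and is not the splitEachLabel implementation.

```matlab
rng(0);                                   % reproducible shuffle
numClasses = 32;                          % 'adidas' and 31 other classes
perClass   = 10;                          % assumption: equal class sizes
labels = repelem((1:numClasses)', perClass);

isCal = false(size(labels));              % true for calibration samples
for c = 1:numClasses
    idx  = find(labels == c);             % indices of this class
    idx  = idx(randperm(numel(idx)));     % shuffle within the class
    nCal = round(0.5 * numel(idx));       % half of each class
    isCal(idx(1:nCal)) = true;
end

numCal = nnz(isCal);
numVal = nnz(~isCal);
fprintf('%d calibration images, %d validation images\n', numCal, numVal);
```

Under these assumptions, the validation half contains 160 images, which matches the NumSamples value reported by validate later in this example.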

Generate Calibration Result File for the Network

Create a dlquantizer (Deep Learning HDL Toolbox) object and specify the network to quantize. Specify the execution environment as FPGA.

dlQuantObj = dlquantizer(snet,'ExecutionEnvironment',"FPGA");

Use the calibrate (Deep Learning HDL Toolbox) function to exercise the network with sample inputs and collect the range information. The calibrate function collects the dynamic ranges of the weights and biases. The calibrate function returns a table. Each row of the table contains range information for a learnable parameter of the quantized network.

calibrate(dlQuantObj,calibrationData)

ans=35×5 table
    'conv_1_Weights'    'conv_1'    "Weights"    -0.0490    0.0394
       'conv_1_Bias'    'conv_1'       "Bias"          1    1.0028
    'conv_2_Weights'    'conv_2'    "Weights"    -0.0555    0.0619
       'conv_2_Bias'    'conv_2'       "Bias"    -0.0006    0.0023
    'conv_3_Weights'    'conv_3'    "Weights"    -0.0459    0.0469
       'conv_3_Bias'    'conv_3'       "Bias"    -0.0014    0.0015
    'conv_4_Weights'    'conv_4'    "Weights"    -0.0460    0.0510
       'conv_4_Bias'    'conv_4'       "Bias"    -0.0016    0.0038
      'fc_1_Weights'      'fc_1'    "Weights"    -0.0514    0.0543
         'fc_1_Bias'      'fc_1'       "Bias"    -0.0005    0.0008
      'fc_2_Weights'      'fc_2'    "Weights"    -0.0502    0.0516
         'fc_2_Bias'      'fc_2'       "Bias"    -0.0018    0.0019
      'fc_3_Weights'      'fc_3'    "Weights"    -0.0507    0.0468
         'fc_3_Bias'      'fc_3'       "Bias"    -0.0295    0.0249
      ⋮

Create Target Object

Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx Vivado® Design Suite 2022.1. To set the Xilinx Vivado toolpath, enter:

hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2022.1\bin\vivado.bat');

To create the target object, enter:

hTarget = dlhdl.Target('Xilinx','Interface','Ethernet','IPAddress','10.10.10.15');

Alternatively, use the JTAG interface.

% hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');

Create dlquantizationOptions Object

Create a dlquantizationOptions object. Specify the target bitstream and target board interface. The default metric function is a Top-1 accuracy metric function.

options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget);
options_emulation = dlquantizationOptions('Target','host');

To use a custom metric function, specify the metric function in the dlquantizationOptions object.

options_FPGA = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData)},'Bitstream','zcu102_int8','Target',hTarget);
options_emulation = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData)})

Validate Quantized Network

Use the validate function to quantize the learnable parameters in the convolution layers of the network. The validate function simulates the quantized network in MATLAB. The validate function uses the metric function defined in the dlquantizationOptions object to compare the results of the single-data-type network object to the results of the quantized network object.

prediction_emulation = dlQuantObj.validate(validationData,options_emulation)

prediction_emulation = struct with fields:
       NumSamples: 160
    MetricResults: [1×1 struct]
       Statistics: []

For validation on an FPGA, the validate function:

Programs the FPGA board by using the output of the compile method and the programming file
Downloads the network weights and biases
Compares the performance of the network before and after quantization

prediction_FPGA = dlQuantObj.validate(validationData,options_FPGA)

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8.
### The network includes the following layers:
1 'imageinput' Image Input 227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations (SW Layer)
2 'conv_1' 2-D Convolution 96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
3 'relu_1' ReLU ReLU (HW Layer)
4 'maxpool_1' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
5 'conv_2' 2-D Convolution 128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
6 'relu_2' ReLU ReLU (HW Layer)
7 'maxpool_2' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
8 'conv_3' 2-D Convolution 384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
9 'relu_3' ReLU ReLU (HW Layer)
10 'maxpool_3' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
11 'conv_4' 2-D Convolution 128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer)
12 'relu_4' ReLU ReLU (HW Layer)
13 'maxpool_4' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
14 'fc_1' Fully Connected 2048 fully connected layer (HW Layer)
15 'relu_5' ReLU ReLU (HW Layer)
16 'fc_2' Fully Connected 2048 fully connected layer (HW Layer)
17 'relu_6' ReLU ReLU (HW Layer)
18 'fc_3' Fully Connected 32 fully connected layer (HW Layer)
19 'softmax' Softmax softmax (SW Layer)
20 'classoutput' Classification Output crossentropyex with 'adidas' and 31 other classes (SW Layer)

### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>relu_4 ...
### Compiling layer group: conv_1>>relu_4 ... complete.
### Compiling layer group: maxpool_4 ...
### Compiling layer group: maxpool_4 ... complete.
### Compiling layer group: fc_1>>fc_3 ...
### Compiling layer group: fc_1>>fc_3 ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "11.9 MB"       
    "OutputResultOffset"        "0x00be0000"     "128.0 kB"      
    "SchedulerDataOffset"       "0x00c00000"     "128.0 kB"      
    "SystemBufferOffset"        "0x00c20000"     "9.9 MB"        
    "InstructionDataOffset"     "0x01600000"     "4.6 MB"        
    "ConvWeightDataOffset"      "0x01aa0000"     "8.2 MB"        
    "FCWeightDataOffset"        "0x022e0000"     "10.4 MB"       
    "EndOffset"                 "0x02d40000"     "Total: 45.2 MB"

### Network compilation complete.

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
### Finished writing input activations.
### Running single input activation.
⋮
### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### The network includes the following layers:
1 'imageinput' Image Input 227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations (SW Layer)
2 'conv_1' 2-D Convolution 96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
3 'relu_1' ReLU ReLU (HW Layer)
4 'maxpool_1' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
5 'conv_2' 2-D Convolution 128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
6 'relu_2' ReLU ReLU (HW Layer)
7 'maxpool_2' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
8 'conv_3' 2-D Convolution 384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
9 'relu_3' ReLU ReLU (HW Layer)
10 'maxpool_3' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
11 'conv_4' 2-D Convolution 128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer)
12 'relu_4' ReLU ReLU (HW Layer)
13 'maxpool_4' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
14 'fc_1' Fully Connected 2048 fully connected layer (HW Layer)
15 'relu_5' ReLU ReLU (HW Layer)
16 'fc_2' Fully Connected 2048 fully connected layer (HW Layer)
17 'relu_6' ReLU ReLU (HW Layer)
18 'fc_3' Fully Connected 32 fully connected layer (HW Layer)
19 'softmax' Softmax softmax (SW Layer)
20 'classoutput' Classification Output crossentropyex with 'adidas' and 31 other classes (SW Layer)

### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.

Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                   ------------------------   -------------------------   ---------   -------------   --------
Network                    39136574                    0.17789                1          39136574        5.6
imageinput_norm              216472                    0.00098
conv_1                      6832680                    0.03106
maxpool_1                   3705912                    0.01685
conv_2                     10454501                    0.04752
maxpool_2                   1173810                    0.00534
conv_3                      9364533                    0.04257
maxpool_3                   1229970                    0.00559
conv_4                      1759348                    0.00800
maxpool_4                     24450                    0.00011
fc_1                        2651288                    0.01205
fc_2                        1696632                    0.00771
fc_3                          26978                    0.00012
 * The clock frequency of the DL processor is: 220MHz

### Finished writing input activations.
### Running single input activation.

prediction_FPGA = struct with fields:
       NumSamples: 160
    MetricResults: [1×1 struct]
       Statistics: [2×7 table]
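
The allocated_space column in the compiler log above can be cross-checked against the offset addresses. The sketch below assumes that each buffer extends to the next offset and that 1 MB means 2^20 bytes. The shortened buffer names are illustrative.

```matlab
% Buffer names (shortened) in the order they appear in the log.
names = {'InputData', 'OutputResult', 'SchedulerData', 'SystemBuffer', ...
         'InstructionData', 'ConvWeightData', 'FCWeightData'};

% Offsets from the log, with EndOffset as the last entry.
offsets = hex2dec({'00000000', '00be0000', '00c00000', '00c20000', ...
                   '01600000', '01aa0000', '022e0000', '02d40000'});

% Size of each buffer is the distance to the next offset.
sizesMB = diff(offsets) / 2^20;
for k = 1:numel(names)
    fprintf('%-16s %7.3f MB\n', names{k}, sizesMB(k));
end

totalMB = offsets(end) / 2^20;
fprintf('Total            %7.3f MB\n', totalMB);
```

Under these assumptions the computed sizes agree with the logged values after rounding, for example 11.875 MB for the input buffer and 45.25 MB in total.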

View Performance of Quantized Neural Network

Display the accuracy of the quantized network.

prediction_emulation.MetricResults.Result

ans=2×2 table
    'Floating-Point'    0.9875
         'Quantized'    0.9875

prediction_FPGA.MetricResults.Result

ans=2×2 table
    'Floating-Point'    0.9875
         'Quantized'    0.9875

Display the performance of the quantized network in frames per second.

prediction_FPGA.Statistics

ans=2×7 table
    'Floating-Point'     5.6213    16     4    93.1976    63.9254    15.5952
         'Quantized'    19.4335    64    16    62.3099    50.1096    32.1032
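
The throughput figures can be related to the estimator output with simple arithmetic. The sketch assumes that the first numeric column of the Statistics table is frames per second, as the preceding sentence suggests, and takes the latency and clock frequency from the performance results shown earlier.

```matlab
clockHz    = 220e6;       % DL processor clock frequency from the log
latencyCyc = 39136574;    % LastFrameLatency(cycles) for the whole network

latencySec = latencyCyc / clockHz;   % latency of one frame in seconds
fps        = 1 / latencySec;         % frames per second for that latency

% Frames-per-second values read from prediction_FPGA.Statistics.
fpsFloat = 5.6213;
fpsQuant = 19.4335;
speedup  = fpsQuant / fpsFloat;      % relative throughput gain

fprintf('latency = %.5f s, fps = %.4f, speedup = %.2fx\n', ...
    latencySec, fps, speedup);
```

The frame rate derived from the latency agrees with the 'Floating-Point' row, and the quantized network processes roughly 3.5 times as many frames per second.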

Quantize a Neural Network for CPU Target

This example shows how to quantize and validate a neural network for a CPU target. This workflow is similar to the workflow for other execution environments, but before validating you must establish a raspi connection and specify it as the target by using dlquantizationOptions.

First, load your network. This example uses the pretrained network squeezenet.

load squeezedlnetmerch
net

net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.

Then define your calibration and validation data, aug_calData and aug_valData, respectively.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227],calData);
aug_valData = augmentedImageDatastore([227 227],valData);

Create the dlquantizer object and specify a CPU execution environment.

dq = dlquantizer(net,'ExecutionEnvironment','CPU')

dq = 
  dlquantizer with properties:

           NetworkObject: [1×1 dlnetwork]
    ExecutionEnvironment: 'CPU'

Calibrate the network.

calResults = calibrate(dq,aug_calData,'UseGPU','off')

calResults=120×5 table
               "conv1_Weights"               'conv1'    "Weights"    -0.9198    0.8849
                  "conv1_Bias"               'conv1'       "Bias"    -0.0793    0.2634
    "fire2-squeeze1x1_Weights"    'fire2-squeeze1x1'    "Weights"    -1.3800    1.2477
       "fire2-squeeze1x1_Bias"    'fire2-squeeze1x1'       "Bias"    -0.1164    0.2427
     "fire2-expand1x1_Weights"     'fire2-expand1x1'    "Weights"    -0.7406    0.9098
        "fire2-expand1x1_Bias"     'fire2-expand1x1'       "Bias"    -0.0601    0.1460
     "fire2-expand3x3_Weights"     'fire2-expand3x3'    "Weights"    -0.7440    0.6691
        "fire2-expand3x3_Bias"     'fire2-expand3x3'       "Bias"    -0.0518    0.0742
    "fire3-squeeze1x1_Weights"    'fire3-squeeze1x1'    "Weights"    -0.7712    0.6892
       "fire3-squeeze1x1_Bias"    'fire3-squeeze1x1'       "Bias"    -0.1014    0.3267
     "fire3-expand1x1_Weights"     'fire3-expand1x1'    "Weights"    -0.7204    0.9743
        "fire3-expand1x1_Bias"     'fire3-expand1x1'       "Bias"    -0.0670    0.3043
     "fire3-expand3x3_Weights"     'fire3-expand3x3'    "Weights"    -0.6144    0.7741
        "fire3-expand3x3_Bias"     'fire3-expand3x3'       "Bias"    -0.0536    0.1033
      ⋮
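
Because calResults is an ordinary MATLAB table, standard table indexing can be used to inspect it. The following sketch builds a small stand-in table from the first three rows displayed above and works on it by column position. The variable names Name, Layer, Kind, MinValue, and MaxValue are placeholders chosen for illustration and are not necessarily the names that calibrate returns.

```matlab
% Stand-in for the first three rows of calResults shown above.
% The variable names are placeholders, not the names returned by calibrate.
T = table(["conv1_Weights"; "conv1_Bias"; "fire2-squeeze1x1_Weights"], ...
          {'conv1'; 'conv1'; 'fire2-squeeze1x1'}, ...
          ["Weights"; "Bias"; "Weights"], ...
          [-0.9198; -0.0793; -1.3800], ...
          [ 0.8849;  0.2634;  1.2477], ...
          'VariableNames', {'Name','Layer','Kind','MinValue','MaxValue'});

% Columns 4 and 5 hold the minimum and maximum values of each range.
absMax = max(abs(T{:,4:5}), [], 2);

% Keep only the rows that describe weights (column 3).
weightsOnly = T(T{:,3} == "Weights", :);

% Find the entry with the largest magnitude, which needs the coarsest scaling.
[~, idx] = max(absMax);
widest = T{idx,1};
```

The same positional indexing can be applied to the full calResults table, for example to spot layers whose ranges are much wider than the rest.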

Use the Raspberry Pi® Blockset function, raspi, to create a connection to the Raspberry Pi. In the following code, replace:

raspiname with the name or address of your Raspberry Pi
username with your user name
password with your password

% r = raspi('raspiname','username','password')

For example,

r = raspi('gpucoder-raspberrypi-8','pi','matlab')

r = 
  raspi with properties:

         DeviceAddress: 'gpucoder-raspberrypi-8'      
                  Port: 18734                         
             BoardName: 'Raspberry Pi 3 Model B+'     
         AvailableLEDs: {'led0'}                      
  AvailableDigitalPins: [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]
  AvailableSPIChannels: {}                            
     AvailableI2CBuses: {}                            
      AvailableWebcams: {}                            
           I2CBusSpeed:                               
AvailableCANInterfaces: {}                            

  Supported peripherals

Specify the raspi object as the target for the quantized network.

opts = dlquantizationOptions('Target',r);
opts.MetricFcn = {@(x)hAccuracy(x,net,aug_valData,classes)}

opts = 
  dlquantizationOptions with properties:

   Validation Metric Info
    MetricFcn: {[@(x)hAccuracy(x,net,aug_valData,classes)]}

   Validation Environment Info
       Target: [1×1 raspi]
    Bitstream: ''

Validate the quantized network with the validate function.

valResults = validate(dq,aug_valData,opts)

### Starting application: 'codegen/lib/validate_predict_int8/pil/validate_predict_int8.elf'
    To terminate execution: clear validate_predict_int8_pil
### Launching application validate_predict_int8.elf...
### Host application produced the following standard output (stdout) and standard error (stderr) messages:

valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: []

Examine the validation output to see the performance of the quantized network.

valResults.MetricResults.Result

ans=2×2 table
    'Floating-Point'    1
         'Quantized'    1

Quantize YOLO v3 Object Detector

This example shows how to quantize a yolov3ObjectDetector (Computer Vision Toolbox) object using preprocessed calibration and validation data.

First, download a pretrained YOLO v3 object detector.

detector = downloadPretrainedNetwork();

This example uses a small labeled data set that contains one or two labeled instances of a vehicle. Many of these images come from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona and used with permission.

Unzip the vehicle images and load the vehicle ground truth data.

unzip vehicleDatasetImages.zip
data = load('vehicleDatasetGroundTruth.mat');
vehicleDataset = data.vehicleDataset;

Add the full path to the local vehicle data folder.

vehicleDataset.imageFilename = fullfile(pwd, vehicleDataset.imageFilename);

Create an imageDatastore for loading the images and a boxLabelDatastore (Computer Vision Toolbox) for the ground truth bounding boxes.

imds = imageDatastore(vehicleDataset.imageFilename);
blds = boxLabelDatastore(vehicleDataset(:,2));

Use the combine function to combine the two datastores into a CombinedDatastore.

combinedDS = combine(imds, blds);

Split the data into calibration and validation data.

calData = combinedDS.subset(1:32);
valData = combinedDS.subset(33:64);

Use the preprocess (Computer Vision Toolbox) method of the yolov3ObjectDetector (Computer Vision Toolbox) object with the transform function to prepare the data for calibration and validation.

The transform function returns a TransformedDatastore object.

processedCalData = transform(calData, @(data)preprocess(detector,data));
processedValData = transform(valData, @(data)preprocess(detector,data));

Create the dlquantizer object. When you use the MATLAB execution environment, quantization is performed using the fi fixed-point data type, which requires a Fixed-Point Designer™ license.

dq = dlquantizer(detector, 'ExecutionEnvironment', 'MATLAB');

Calibrate the network.

calResults = calibrate(dq, processedCalData,'UseGPU','off')

calResults=135×5 table
               'conv1_Weights'               'conv1'    "Weights"    -0.9219    0.8569
                  'conv1_Bias'               'conv1'       "Bias"    -0.0963    0.2663
    'fire2-squeeze1x1_Weights'    'fire2-squeeze1x1'    "Weights"    -1.3751    1.2444
       'fire2-squeeze1x1_Bias'    'fire2-squeeze1x1'       "Bias"    -0.1207    0.2310
     'fire2-expand1x1_Weights'     'fire2-expand1x1'    "Weights"    -0.7528    0.9162
        'fire2-expand1x1_Bias'     'fire2-expand1x1'       "Bias"    -0.0593    0.1404
     'fire2-expand3x3_Weights'     'fire2-expand3x3'    "Weights"    -0.7527    0.6774
        'fire2-expand3x3_Bias'     'fire2-expand3x3'       "Bias"    -0.0622    0.0882
    'fire3-squeeze1x1_Weights'    'fire3-squeeze1x1'    "Weights"    -0.7586    0.6877
       'fire3-squeeze1x1_Bias'    'fire3-squeeze1x1'       "Bias"    -0.1021    0.3165
     'fire3-expand1x1_Weights'     'fire3-expand1x1'    "Weights"    -0.7157    0.9768
        'fire3-expand1x1_Bias'     'fire3-expand1x1'       "Bias"    -0.0693    0.3288
     'fire3-expand3x3_Weights'     'fire3-expand3x3'    "Weights"    -0.6008    0.7764
        'fire3-expand3x3_Bias'     'fire3-expand3x3'       "Bias"    -0.0580    0.1123
      ⋮

Validate the quantized network with the validate function.

valResults = validate(dq, processedValData)

valResults = struct with fields:
       NumSamples: 32
    MetricResults: [1×1 struct]
       Statistics: []

function detector = downloadPretrainedNetwork()
   pretrainedURL = 'https://ssd.mathworks.com/supportfiles/vision/data/yolov3SqueezeNetVehicleExample_21aSPKG.zip';
   websave('yolov3SqueezeNetVehicleExample_21aSPKG.zip', pretrainedURL);

   unzip('yolov3SqueezeNetVehicleExample_21aSPKG.zip');

   pretrained = load("yolov3SqueezeNetVehicleExample_21aSPKG.mat");
   detector = pretrained.detector;
end

Input Arguments

`quantObj` — Network to quantize
`dlquantizer` object

Network to quantize, specified as a dlquantizer object.

`calData` — Data to use for calibration of quantized network
`imageDatastore` object | `augmentedImageDatastore` object | `pixelLabelImageDatastore` object | `CombinedDatastore` object | `TransformedDatastore` object | `arrayDatastore` object | `dlarray` object | cell array of `dlarray` objects | cell array of numeric arrays | numeric array

Data to use for calibration of quantized network, specified as one of these values:

imageDatastore object
augmentedImageDatastore object
pixelLabelImageDatastore (Computer Vision Toolbox) object
CombinedDatastore object
TransformedDatastore object
arrayDatastore object
dlarray object
cell array of dlarray objects
cell array of numeric arrays
numeric array

When using a dlarray object, format the dlarray to ensure that the data has the appropriate shape. Unformatted dlarray objects are treated in the same way as numeric arrays.

For more information on how data in dlarray objects is handled for dlnetwork objects, see Deep Learning Data Formats.

You must preprocess the data used for calibration of a yolov3ObjectDetector (Computer Vision Toolbox) object using the preprocess (Computer Vision Toolbox) function. For an example of using preprocessed data for calibration of a yolov3ObjectDetector, see Quantize YOLO v3 Object Detector.

For more information on supported datastores, see Prepare Data for Quantizing Networks.
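
As a rough sketch of the non-datastore input forms listed above, the following snippet prepares the same batch of images as a numeric array, an arrayDatastore, and a formatted dlarray. The random data, the 227-by-227-by-3 input size (taken from the SqueezeNet examples on this page), and the variable names are illustrative assumptions. Real calibration data should be representative of the network inputs. The calibrate calls run only if a dlquantizer object named dq already exists in the workspace.

```matlab
% Illustrative batch of 8 RGB images (random values, used only to show shapes).
X = rand(227, 227, 3, 8, 'single');

% Array datastore that reads one image at a time along the 4th (batch) dimension.
ads = arrayDatastore(X, IterationDimension=4);

% Formatted dlarray: two spatial dimensions, then channel, then batch.
dlX = dlarray(X, "SSCB");

% dq is assumed to be a dlquantizer object created as in the examples on this page.
if exist('dq', 'var')
    calFromArray     = calibrate(dq, X);
    calFromDatastore = calibrate(dq, ads);
    calFromDlarray   = calibrate(dq, dlX);   % for dlnetwork objects
end
```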

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: calResults = calibrate(quantObj,calData,'UseGPU','on')

`MiniBatchSize` — Size of mini-batches
`32` (default) | positive integer

Size of the mini-batches to use for calibration, specified as a positive integer. Larger mini-batch sizes require more memory, but can lead to faster calibration.

`UseGPU` — Whether to use host GPU for calibration
`'auto'` (default) | `'on'` | `'off'`

Whether to use host GPU for calibration, specified as one of the following:

'auto' — Use host GPU for calibration if one is available. Otherwise, use host CPU for calibration.
'on' — Use host GPU for calibration.
'off' — Use host CPU for calibration.

Data Types: char
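
For illustration, both name-value arguments can be combined in one call. The helper below is a hypothetical wrapper (its name and the mini-batch size of 16 are arbitrary choices, not part of the toolbox) that calibrates on the host CPU with smaller mini-batches to reduce memory use.

```matlab
function calResults = calibrateOnHostCpu(quantObj, calData)
% calibrateOnHostCpu  Hypothetical wrapper around calibrate.
%   quantObj is a dlquantizer object and calData is any supported
%   calibration data, as described in Input Arguments.

    % Use mini-batches of 16 observations and force calibration on the host CPU.
    calResults = calibrate(quantObj, calData, ...
        MiniBatchSize=16, ...
        UseGPU='off');
end
```

With the objects from the CPU target example, the call would be calResults = calibrateOnHostCpu(dq,aug_calData).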

Output Arguments

`calResults` — Dynamic ranges of network
table

Dynamic ranges of layers of the network, returned as a table. Each row in the table displays the minimum and maximum values of a learnable parameter of a convolution layer of the optimized network. The software uses these minimum and maximum values to determine the scaling for the data type of the quantized parameter.
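
To illustrate how a range can translate into a scaling, the sketch below derives a power-of-two scale factor for a signed 8-bit type from one minimum/maximum pair. The symmetric power-of-two scheme is an assumption made for illustration. This page does not state the exact rule that the software applies.

```matlab
% Range of conv1_Weights from the CPU target example on this page.
minValue = -0.9198;
maxValue =  0.8849;

% Largest magnitude that the quantized type must represent.
absMax = max(abs([minValue maxValue]));

% Assumed scheme: smallest power-of-two scale such that absMax <= 127*scale.
scaleExponent = ceil(log2(absMax/127));
scale = 2^scaleExponent;

% Quantize and dequantize a sample value. int8 rounds to nearest and saturates.
x = 0.3;
xq = int8(x/scale);
xHat = double(xq)*scale;
quantError = abs(x - xHat);
```

Under this assumed scheme, a wider range forces a larger scale and therefore a larger rounding error per value, which is why representative calibration data matters.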

Version History

Introduced in R2020a

R2024b: Calibrate `dlnetwork` objects with deep learning arrays

The calibrate function supports dlarray objects as calibration data input for dlnetwork objects.

R2023b: Calibrate networks with numeric arrays and array datastores

The calibrate function supports numeric arrays and arrayDatastore objects as calibration data input.

R2023b: Code generation support for `dlnetwork`, `yolov3ObjectDetector`, and `yolov4ObjectDetector`

You can generate INT8 ARM or CuDNN code after you call the calibrate function of the dlquantizer object with a dlnetwork, yolov3ObjectDetector (Computer Vision Toolbox), or yolov4ObjectDetector (Computer Vision Toolbox) object as the network object.

In earlier releases, code generation from a calibrated dlquantizer object is supported only for DAGNetwork objects.

R2022b: Calibrate on host GPU or host CPU

You can now choose whether to calibrate your network using the host GPU or host CPU. By default, the calibrate function and the Deep Network Quantizer app calibrate on the host GPU if one is available.

In previous versions, the execution environment had to be the same as the instrumentation environment used for the calibration step of quantization.

R2022b: Specify mini-batch size to use for calibration

Use MiniBatchSize to specify the size of mini-batches to use for calibration.

R2021a: ARM Cortex-A calibration support

The Deep Learning Toolbox™ Model Compression Library now supports calibration of a network for quantization and deployment on ARM® Cortex®-A microcontrollers.
