Main Content

dlquantizer

Quantize a deep neural network to 8-bit scaled integer data types

Description

Use the dlquantizer object to reduce the memory requirement of a deep neural network by quantizing weights, biases, and activations to 8-bit scaled integer data types.

Creation

Description

quantObj = dlquantizer(net) creates a dlquantizer object for the specified network.

example

quantObj = dlquantizer(net,Name,Value) creates a dlquantizer object for the specified network, with additional options specified by one or more name-value pair arguments.

Use dlquantizer to create an quantized network for GPU, FPGA, or CPU deployment. To learn about the products required to quantize and deploy the deep learning network to a GPU, FPGA, or CPU environment, see Quantization Workflow Prerequisites.

Input Arguments

expand all

Pretrained neural network, specified as a DAGNetwork, SeriesNetwork, yolov2ObjectDetector (Computer Vision Toolbox), or a ssdObjectDetector (Computer Vision Toolbox) object.

Quantization of yolov2ObjectDetector (Computer Vision Toolbox) and ssdObjectDetector (Computer Vision Toolbox) networks requires a GPU Coder™ license.

Properties

expand all

Pre-trained neural network, specified as a DAGNetwork, SeriesNetwork, yolov2ObjectDetector (Computer Vision Toolbox), or a ssdObjectDetector (Computer Vision Toolbox) object.

Specify the execution environment for the quantized network. When this parameter is not specified the default execution environment is GPU. To learn about the products required to quantize and deploy the deep learning network to a GPU, FPGA, or CPU environment, see Quantization Workflow Prerequisites.

Example: 'ExecutionEnvironment','FPGA'

Enable or disable the MATLAB simulation workflow. When this parameter is set to on, the quantized network is validated by simulating the quantized network in MATLAB and comparing the single data type network prediction results to the simulated network prediction results.

Example: 'Simulation', 'on'

Object Functions

calibrateSimulate and collect ranges of a deep neural network
validateQuantize and validate a deep neural network

Examples

collapse all

This example shows how to quantize learnable parameters in the convolution layers of a neural network for GPU and explore the behavior of the quantized network. In this example, you quantize the squeezenet neural network after retraining the network to classify new images according to the Train Deep Learning Network to Classify New Images example. In this example, the memory required for the network is reduced approximately 75% through quantization while the accuracy of the network is not affected.

Load the pretrained network. net is the output network of the Train Deep Learning Network to Classify New Images example.

load net
net
net = 
  DAGNetwork with properties:

         Layers: [68×1 nnet.cnn.layer.Layer]
    Connections: [75×2 table]
     InputNames: {'data'}
    OutputNames: {'new_classoutput'}

Define calibration and validation data to use for quantization.

The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.

In this example, use the images in the MerchData data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);

Create a dlquantizer object and specify the network to quantize.

quantObj = dlquantizer(net);

Define a metric function to use to compare the behavior of the network before and after quantization. This example uses the hComputeModelAccuracy metric function.

function accuracy = hComputeModelAccuracy(predictionScores, net, dataStore)
%% Computes model-level accuracy statistics
    
    % Load ground truth
    tmp = readall(dataStore);
    groundTruth = tmp.response;
    
    % Compare with predicted label with actual ground truth 
    predictionError = {};
    for idx=1:numel(groundTruth)
        [~, idy] = max(predictionScores(idx,:)); 
        yActual = net.Layers(end).Classes(idy);
        predictionError{end+1} = (yActual == groundTruth(idx)); %#ok
    end
    
    % Sum all prediction errors.
    predictionError = [predictionError{:}];
    accuracy = sum(predictionError)/numel(predictionError);
end

Specify the metric function in a dlquantizationOptions object.

quantOpts = dlquantizationOptions('MetricFcn',{@(x)hComputeModelAccuracy(x, net, aug_valData)});

Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.

calResults = calibrate(quantObj, aug_calData)
calResults=95×5 table
                   Optimized Layer Name                      Network Layer Name        Learnables / Activations    MinValue     MaxValue
    __________________________________________________    _________________________    ________________________    _________    ________

    {'conv1_relu_conv1_Weights'                      }    {'relu_conv1'           }           "Weights"             -0.91985     0.88489
    {'conv1_relu_conv1_Bias'                         }    {'relu_conv1'           }           "Bias"                -0.07925     0.26343
    {'fire2-squeeze1x1_fire2-relu_squeeze1x1_Weights'}    {'fire2-relu_squeeze1x1'}           "Weights"                -1.38      1.2477
    {'fire2-squeeze1x1_fire2-relu_squeeze1x1_Bias'   }    {'fire2-relu_squeeze1x1'}           "Bias"                -0.11641     0.24273
    {'fire2-expand1x1_fire2-relu_expand1x1_Weights'  }    {'fire2-relu_expand1x1' }           "Weights"              -0.7406     0.90982
    {'fire2-expand1x1_fire2-relu_expand1x1_Bias'     }    {'fire2-relu_expand1x1' }           "Bias"               -0.060056     0.14602
    {'fire2-expand3x3_fire2-relu_expand3x3_Weights'  }    {'fire2-relu_expand3x3' }           "Weights"             -0.74397     0.66905
    {'fire2-expand3x3_fire2-relu_expand3x3_Bias'     }    {'fire2-relu_expand3x3' }           "Bias"               -0.051778    0.074239
    {'fire3-squeeze1x1_fire3-relu_squeeze1x1_Weights'}    {'fire3-relu_squeeze1x1'}           "Weights"             -0.77198     0.68818
    {'fire3-squeeze1x1_fire3-relu_squeeze1x1_Bias'   }    {'fire3-relu_squeeze1x1'}           "Bias"                -0.10144     0.32676
    {'fire3-expand1x1_fire3-relu_expand1x1_Weights'  }    {'fire3-relu_expand1x1' }           "Weights"              -0.7213     0.97523
    {'fire3-expand1x1_fire3-relu_expand1x1_Bias'     }    {'fire3-relu_expand1x1' }           "Bias"               -0.067021     0.30422
    {'fire3-expand3x3_fire3-relu_expand3x3_Weights'  }    {'fire3-relu_expand3x3' }           "Weights"             -0.61274     0.77511
    {'fire3-expand3x3_fire3-relu_expand3x3_Bias'     }    {'fire3-relu_expand3x3' }           "Bias"               -0.053607      0.1033
    {'fire4-squeeze1x1_fire4-relu_squeeze1x1_Weights'}    {'fire4-relu_squeeze1x1'}           "Weights"             -0.74225      1.0891
    {'fire4-squeeze1x1_fire4-relu_squeeze1x1_Bias'   }    {'fire4-relu_squeeze1x1'}           "Bias"                -0.10885     0.13878
      ⋮

Use the validate function to quantize the learnable parameters in the convolution layers of the network and exercise the network. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.

valResults = validate(quantObj, aug_valData, quantOpts)
valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: [2×2 table]

Examine the validation output to see the performance of the quantized network.

valResults.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}           1      
     {'Quantized'     }           1      

valResults.Statistics
ans=2×2 table
    NetworkImplementation    LearnableParameterMemory(bytes)
    _____________________    _______________________________

     {'Floating-Point'}                2.9003e+06           
     {'Quantized'     }                7.3393e+05           

In this example, the memory required for the network was reduced approximately 75% through quantization. The accuracy of the network is not affected.

The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.

This example shows how to quantize learnable parameters in the convolution layers of a neural network, and explore the behavior of the quantized network. In this example, you quantize the LogoNet neural network. Quantization helps reduce the memory requirement of a deep neural network by quantizing weights, biases and activations of network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results from the target device.

To run this example, you need the products listed under FPGA in Quantization Workflow Prerequisites.

For additional requirements, see Quantization Workflow Prerequisites.

Create a file in your current working directory called getLogoNetwork.m. Enter these lines into the file:

function net = getLogoNetwork()
    data = getLogoData();
    net  = data.convnet;
end

function data = getLogoData()
    if ~isfile('LogoNet.mat')
        url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat';
        websave('LogoNet.mat',url);
    end
    data = load('LogoNet.mat');
end

Load the pretrained network.

snet = getLogoNetwork();
snet = 

  SeriesNetwork with properties:

         Layers: [22×1 nnet.cnn.layer.Layer]
     InputNames: {'imageinput'}
    OutputNames: {'classoutput'}

Define calibration and validation data to use for quantization.

The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.

This example uses the images in the logos_dataset data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.

curDir = pwd;
newDir = fullfile(matlabroot,'examples','deeplearning_shared','data','logos_dataset.zip');
copyfile(newDir,curDir);
unzip('logos_dataset.zip');
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
 'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');

Create a dlquantizer object and specify the network to quantize.

dlQuantObj = dlquantizer(snet,'ExecutionEnvironment','FPGA');

Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.

 dlQuantObj.calibrate(calibrationData)
ans = 
        Optimized Layer Name        Network Layer Name    Learnables / Activations     MinValue       MaxValue 
    ____________________________    __________________    ________________________    ___________    __________

    {'conv_1_Weights'          }      {'conv_1'    }           "Weights"                -0.048978      0.039352
    {'conv_1_Bias'             }      {'conv_1'    }           "Bias"                     0.99996        1.0028
    {'conv_2_Weights'          }      {'conv_2'    }           "Weights"                -0.055518      0.061901
    {'conv_2_Bias'             }      {'conv_2'    }           "Bias"                 -0.00061171       0.00227
    {'conv_3_Weights'          }      {'conv_3'    }           "Weights"                -0.045942      0.046927
    {'conv_3_Bias'             }      {'conv_3'    }           "Bias"                  -0.0013998     0.0015218
    {'conv_4_Weights'          }      {'conv_4'    }           "Weights"                -0.045967         0.051
    {'conv_4_Bias'             }      {'conv_4'    }           "Bias"                    -0.00164     0.0037892
    {'fc_1_Weights'            }      {'fc_1'      }           "Weights"                -0.051394      0.054344
    {'fc_1_Bias'               }      {'fc_1'      }           "Bias"                 -0.00052319    0.00084454
    {'fc_2_Weights'            }      {'fc_2'      }           "Weights"                 -0.05016      0.051557
    {'fc_2_Bias'               }      {'fc_2'      }           "Bias"                  -0.0017564     0.0018502
    {'fc_3_Weights'            }      {'fc_3'      }           "Weights"                -0.050706       0.04678
    {'fc_3_Bias'               }      {'fc_3'      }           "Bias"                    -0.02951      0.024855
    {'imageinput'              }      {'imageinput'}           "Activations"                    0           255
    {'imageinput_normalization'}      {'imageinput'}           "Activations"              -139.34        198.72

Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To create the target object, enter:

hTarget = dlhdl.Target('Intel', 'Interface', 'JTAG');

Define a metric function to use to compare the behavior of the network before and after quantization. Save this function in a local file.

function accuracy = hComputeModelAccuracy(predictionScores, net, dataStore)
%% hComputeModelAccuracy test helper function computes model level accuracy statistics

% Copyright 2020 The MathWorks, Inc.
    
    % Load ground truth 
    groundTruth = dataStore.Labels;
    
    % Compare predicted label with ground truth 
    predictionError = {};
    for idx=1:numel(groundTruth)
        [~, idy] = max(predictionScores(idx, :)); 
        yActual = net.Layers(end).Classes(idy);
        predictionError{end+1} = (yActual == groundTruth(idx)); %#ok
    end
    
    % Sum all prediction errors.
    predictionError = [predictionError{:}];
    accuracy = sum(predictionError)/numel(predictionError);
end

Specify the metric function in a dlquantizationOptions object.

options = dlquantizationOptions('MetricFcn', ...
    {@(x)hComputeModelAccuracy(x, snet, validationData)},'Bitstream','arria10soc_int8',...
'Target',hTarget);

To compile and deploy the quantized network, run the validate function of the dlquantizer object. Use the validate function to quantize the learnable parameters in the convolution layers of the network and exercise the network. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function checks for the Intel Quartus tool and the supported tool version. It then starts programming the FPGA device by using the sof file, displays progress messages, and the time it takes to deploy the network. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.

prediction = dlQuantObj.validate(validationData,options);
           offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "48.0 MB"        
    "OutputResultOffset"        "0x03000000"     "4.0 MB"         
    "SystemBufferOffset"        "0x03400000"     "60.0 MB"        
    "InstructionDataOffset"     "0x07000000"     "8.0 MB"         
    "ConvWeightDataOffset"      "0x07800000"     "8.0 MB"         
    "FCWeightDataOffset"        "0x08000000"     "12.0 MB"        
    "EndOffset"                 "0x08c00000"     "Total: 140.0 MB"

### Programming FPGA Bitstream using JTAG...
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 16-Jul-2020 12:45:10
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 16-Jul-2020 12:45:26
### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   13570959                  0.09047                      30          380609145             11.8
    conv_module           12667786                  0.08445 
        conv_1             3938907                  0.02626 
        maxpool_1          1544560                  0.01030 
        conv_2             2910954                  0.01941 
        maxpool_2           577524                  0.00385 
        conv_3             2552707                  0.01702 
        maxpool_3           676542                  0.00451 
        conv_4              455434                  0.00304 
        maxpool_4            11251                  0.00008 
    fc_module               903173                  0.00602 
        fc_1                536164                  0.00357 
        fc_2                342643                  0.00228 
        fc_3                 24364                  0.00016 
 * The clock frequency of the DL processor is: 150MHz


### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   13570364                  0.09047                      30          380612682             11.8
    conv_module           12667103                  0.08445 
        conv_1             3939296                  0.02626 
        maxpool_1          1544371                  0.01030 
        conv_2             2910747                  0.01940 
        maxpool_2           577654                  0.00385 
        conv_3             2551829                  0.01701 
        maxpool_3           676548                  0.00451 
        conv_4              455396                  0.00304 
        maxpool_4            11355                  0.00008 
    fc_module               903261                  0.00602 
        fc_1                536206                  0.00357 
        fc_2                342688                  0.00228 
        fc_3                 24365                  0.00016 
 * The clock frequency of the DL processor is: 150MHz


### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   13571561                  0.09048                      30          380608338             11.8
    conv_module           12668340                  0.08446 
        conv_1             3939070                  0.02626 
        maxpool_1          1545327                  0.01030 
        conv_2             2911061                  0.01941 
        maxpool_2           577557                  0.00385 
        conv_3             2552082                  0.01701 
        maxpool_3           676506                  0.00451 
        conv_4              455582                  0.00304 
        maxpool_4            11248                  0.00007 
    fc_module               903221                  0.00602 
        fc_1                536167                  0.00357 
        fc_2                342643                  0.00228 
        fc_3                 24409                  0.00016 
 * The clock frequency of the DL processor is: 150MHz


### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   13569862                  0.09047                      30          380613327             11.8
    conv_module           12666756                  0.08445 
        conv_1             3939212                  0.02626 
        maxpool_1          1543267                  0.01029 
        conv_2             2911184                  0.01941 
        maxpool_2           577275                  0.00385 
        conv_3             2552868                  0.01702 
        maxpool_3           676438                  0.00451 
        conv_4              455353                  0.00304 
        maxpool_4            11252                  0.00008 
    fc_module               903106                  0.00602 
        fc_1                536050                  0.00357 
        fc_2                342645                  0.00228 
        fc_3                 24409                  0.00016 
 * The clock frequency of the DL processor is: 150MHz


### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   13570823                  0.09047                      30          380619836             11.8
    conv_module           12667607                  0.08445 
        conv_1             3939074                  0.02626 
        maxpool_1          1544519                  0.01030 
        conv_2             2910636                  0.01940 
        maxpool_2           577769                  0.00385 
        conv_3             2551800                  0.01701 
        maxpool_3           676795                  0.00451 
        conv_4              455859                  0.00304 
        maxpool_4            11248                  0.00007 
    fc_module               903216                  0.00602 
        fc_1                536165                  0.00357 
        fc_2                342643                  0.00228 
        fc_3                 24406                  0.00016 
 * The clock frequency of the DL processor is: 150MHz


          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "48.0 MB"        
    "OutputResultOffset"        "0x03000000"     "4.0 MB"         
    "SystemBufferOffset"        "0x03400000"     "60.0 MB"        
    "InstructionDataOffset"     "0x07000000"     "8.0 MB"         
    "ConvWeightDataOffset"      "0x07800000"     "8.0 MB"         
    "FCWeightDataOffset"        "0x08000000"     "12.0 MB"        
    "EndOffset"                 "0x08c00000"     "Total: 140.0 MB"

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   13572329                  0.09048                      10          127265075             11.8
    conv_module           12669135                  0.08446 
        conv_1             3939559                  0.02626 
        maxpool_1          1545378                  0.01030 
        conv_2             2911243                  0.01941 
        maxpool_2           577422                  0.00385 
        conv_3             2552064                  0.01701 
        maxpool_3           676678                  0.00451 
        conv_4              455657                  0.00304 
        maxpool_4            11227                  0.00007 
    fc_module               903194                  0.00602 
        fc_1                536140                  0.00357 
        fc_2                342688                  0.00228 
        fc_3                 24364                  0.00016 
 * The clock frequency of the DL processor is: 150MHz


### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   13572527                  0.09048                      10          127266427             11.8
    conv_module           12669266                  0.08446 
        conv_1             3939776                  0.02627 
        maxpool_1          1545632                  0.01030 
        conv_2             2911169                  0.01941 
        maxpool_2           577592                  0.00385 
        conv_3             2551613                  0.01701 
        maxpool_3           676811                  0.00451 
        conv_4              455418                  0.00304 
        maxpool_4            11348                  0.00008 
    fc_module               903261                  0.00602 
        fc_1                536205                  0.00357 
        fc_2                342689                  0.00228 
        fc_3                 24365                  0.00016 
 * The clock frequency of the DL processor is: 150MHz

Examine the MetricResults.Result field of the validation output to see the performance of the quantized network.

validateOut = prediction.MetricResults.Result
ans = 
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}         0.9875   
     {'Quantized'     }         0.9875   

Examine the QuantizedNetworkFPS field of the validation output to see the frames per second performance of the quantized network.

prediction.QuantizedNetworkFPS
ans = 11.8126

The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.

This example shows you how to import a dlquantizer object from the base workspace into the Deep Network Quantizer app. This allows you to begin quantization of a deep neural network using the command line or the app, and resume your work later in the app.

Open the Deep Network Quantizer app.

deepNetworkQuantizer

In the app, click New and select Import dlquantizer object.

Deep Network Quantizer import dlquantizer object

In the dialog, select the dlquantizer object to import from the base workspace. For this example, use quantObj that you create in the above example Quantize a Neural Network for GPU Target.

Select a dlquantizer object to import

The app imports any data contained in the dlquantizer object that was collected at the command line. This data can include the network to quantize, calibration data, validation data, and calibration statistics.

The app displays a table containing the calibration data contained in the imported dlquantizer object, quantObj. To the right of the table, the app displays histograms of the dynamic ranges of the parameters. The gray regions of the histograms indicate data that cannot be represented by the quantized representation. For more information on how to interpret these histograms, see Quantization of Deep Neural Networks.

Deep Network Quantizer app

To explore the behavior of a neural network that has quantized convolution layers, use the Deep Network Quantizer app. This example quantizes the learnable parameters of the convolution layers of the LogoNet neural network.

For this example, you need the products listed under FPGA in Quantization Workflow Prerequisites.

For additional requirements, see Quantization Workflow Prerequisites.

Create a file in your current working folder called getLogoNetwork.m. In the file, enter:

function net = getLogoNetwork
 if ~isfile('LogoNet.mat')
        url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat';
        websave('LogoNet.mat',url);
    end
    data = load('LogoNet.mat');
    net  = data.convnet;
end

Load the pretrained network.

snet = getLogoNetwork;
snet = 

  SeriesNetwork with properties:

         Layers: [22×1 nnet.cnn.layer.Layer]
     InputNames: {'imageinput'}
    OutputNames: {'classoutput'}

Define calibration and validation data to use for quantization.

The app uses calibration data to exercise the network and collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network. The app also exercises the dynamic ranges of the activations in all layers of the LogoNet network. For the best quantization results, the calibration data must be representative of inputs to the LogoNet network.

After quantization, the app uses the validation data set to test the network to understand the effects of the limited range and precision of the quantized learnable parameters of the convolution layers in the network.

In this example, use the images in the logos_dataset data set to calibrate and validate the LogoNet network. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.

Expedite the calibration and validation process by using a subset of the calibrationData and validationData. Store the new reduced calibration data set in calibrationData_concise and the new reduced validation data set in validationData_concise.

curDir = pwd;
newDir = fullfile(matlabroot,'examples','deeplearning_shared','data','logos_dataset.zip');
copyfile(newDir,curDir);
unzip('logos_dataset.zip');
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
 'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');
calibrationData_concise = calibrationData.subset(1:20);
validationData_concise = vaidationData.subset(1:1);

At the MATLAB command prompt, open the Deep Network Quantizer app.

deepNetworkQuantizer

Click New and select Quantize a network.

The app verifies your execution environment.

Select the execution environment and the network to quantize from the base workspace. For this example, select a FPGA execution environment and the series network snet.

Select a network and execution environment

The app displays the layer graph of the selected network.

In the Calibrate section of the app toolstrip, under Calibration Data, select the augmentedImageDatastore object from the base workspace containing the calibration data calibrationData_concise.

Click Calibrate.

The Deep Network Quantizer app uses the calibration data to exercise the network and collect range information for the learnable parameters in the network layers.

When the calibration is complete, the app displays a table containing the weights and biases in the convolution and fully connected layers of the network. Also displayed are the dynamic ranges of the activations in all layers of the network and their minimum and maximum values during the calibration. The app displays histograms of the dynamic ranges of the parameters. The gray regions of the histograms indicate data that cannot be represented by the quantized representation. For more information on how to interpret these histograms, see Quantization of Deep Neural Networks.

Deep Network Quantizer calibration

In the Quantize column of the table, indicate whether to quantize the learnable parameters in the layer. You cannot quantize layers that are not convolution layers. Layers that are not quantized remain in single-precision.

In the Validate section of the app toolstrip, under Validation Data, select the augmentedImageDatastore object from the base workspace containing the validation data validationData_concise.

In the Hardware Settings section of the toolstrip, select from the options listed in the table:

Simulation EnvironmentAction
MATLAB (Simulate in MATLAB)Simulates the quantized network in MATLAB. Validates the quantized network by comparing performance to single-precision version of the network.
Intel Arria 10 SoC (arria10soc_int8)

Deploys the quantized network to an Intel® Arria® 10 SoC board by using the arria10soc_int8 bitstream. Validates the quantized network by comparing performance to single-precision version of the network.

Xilinx ZCU102 (zcu102_int8)

Deploys the quantized network to a Xilinx® Zynq® UltraScale+™ MPSoC ZCU102 10 SoC board by using the zcu102_int8 bitstream. Validates the quantized network by comparing performance to single-precision version of the network.

Xilinx ZC706 (zc706_int8)

Deploys the quantized network to a Xilinx Zynq-7000 ZC706 board by using the zc706_int8 bitstream. Validates the quantized network by comparing performance to single-precision version of the network.

When you select the Intel Arria 10 SoC (arria10soc_int8), Xilinx ZCU102 (zcu102_int8), or Xilinx ZC706 (zc706_int8) options, select the interface to use to deploy and validate the quantized network. The Target interface options are listed in this table.

Target OptionAction
JTAGPrograms the target FPGA board selected in Simulation Environment by using a JTAG cable. For more information, see JTAG Connection (Deep Learning HDL Toolbox)
EthernetPrograms the target FPGA board selected in Simulation Environment through the Ethernet interface. Specify the IP address for your target board in IP Address.

For this example, select Xilinx ZCU102 (zcu102_int8), select Ethernet, and enter the board IP address.

Deep Network Quantizer Hardware Settings

In the Validate section of the app toolstrip, under Quantization Options, select the Default metric function.

Click Quantize and Validate.

The Deep Network Quantizer app quantizes the weights, activations, and biases of convolution layers in the network to scaled 8-bit integer data types and uses the validation data to exercise the network. The app determines a metric function to use for the validation based on the type of network that is being quantized.

Type of NetworkMetric Function
Classification

Top-1 Accuracy – Accuracy of the network

Object Detection

Average Precision – Average precision over all detection results. See evaluateDetectionPrecision (Computer Vision Toolbox).

Regression

MSE – Mean squared error of the network

Semantic SegmentationevaluateSemanticSegmentation (Computer Vision Toolbox) – Evaluate semantic segmentation data set against ground truth
Single Shot Detector (SSD)

WeightedIOU – Average IoU of each class, weighted by the number of pixels in that class

When the validation is complete, the app displays the results of the validation, including:

  • Metric function used for validation

  • Result of the metric function before and after quantization

Deep Network Quantizer validation

If you want to use a different metric function for validation, for example to use the Top-5 accuracy metric function instead of the default Top-1 accuracy metric function, you can define a custom metric function. Save this function in a local file.

function accuracy = hComputeAccuracy(predictionScores, net, dataStore)
%% Computes model-level accuracy statistics
    
    % Load ground truth
    tmp = readall(dataStore);
    groundTruth = tmp.response;
    
    % Compare predicted label with ground truth 
    predictionError = {};
    for idx=1:numel(groundTruth)
        [~, idy] = max(predictionScores(idx,:)); 
        yActual = net.Layers(end).Classes(idy);
        predictionError{end+1} = (yActual == groundTruth(idx)); %#ok
    end
    
    % Sum all prediction errors.
    predictionError = [predictionError{:}];
    accuracy = sum(predictionError)/numel(predictionError);
end

To revalidate the network by using this custom metric function, under Quantization Options, enter the name of the custom metric function hComputeAccuracy. Select Add to add hComputeAccuracy to the list of metric functions available in the app. Select hComputeAccuracy as the metric function to use.

The custom metric function must be on the path. If the metric function is not on the path, this step produces an error.

Deep Network Quantizer select custom metric function

Click Quantize and Validate.

The app quantizes the network and displays the validation results for the custom metric function.

Deep Network Quantizer validation with custom metric function

The app displays only scalar values in the validation results table. To view the validation results for a custom metric function with nonscalar output, export the dlquantizer object, then validate the quantized network by using the validate function in the MATLAB command window.

After quantizing and validating the network, you can choose to export the quantized network.

Click the Export button. In the drop-down list, select Export Quantizer to create a dlquantizer object in the base workspace. You can deploy the quantized network to your target FPGA board and retrieve the prediction results by using MATLAB. See, Deploy Quantized Network Example (Deep Learning HDL Toolbox).

Introduced in R2020a