Main Content

Accelerate Prototyping Workflow for Large Networks by Using Ethernet

This example shows how to deploy a deep learning network and obtain prediction results using the Ethernet connection to your target device. You can significantly speed up the deployment and prediction times for large deep learning networks by using Ethernet versus JTAG. This example shows the workflow on a ZCU102 SoC board. The example also works on the other boards supported by Deep Learning HDL Toolbox. See Supported Networks, Layers, Boards, and Tools.

Prerequisites

  • Xilinx ZCU102 SoC development kit. For help with board setup, see Guided SD Card Set Up (Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices).

  • Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC

  • Deep Learning HDL Toolbox™

  • Deep Learning Toolbox™ Model for AlexNet Network

Introduction

Deep Learning HDL Toolbox establishes a connection between the host computer and FPGA board to prototype deep learning networks on hardware. This connection is used to deploy deep learning networks and run predictions. The connection provides two services:

  • Programming the bitstream onto the FPGA

  • Communicating with the design running on FPGA from MATLAB

There are two hardware interfaces for establishing a connection between the host computer and FPGA board: JTAG and Ethernet.

JTAG Interface

The JTAG interface, programs the bitstream onto the FPGA over JTAG. The bitstream is not persistent through power cycles. You must reprogram the bitstream each time the FPGA is turned on.

MATLAB uses JTAG to control an AXI Master IP in the FPGA design, to communicate with the design running on the FPGA. You can use the AXI Master IP to read and write memory locations in the onboard memory and deep learning processor.

This figure shows the high-level architecture of the JTAG interface.

Ethernet Interface

The Ethernet interface leverages the ARM processor to send and receive information from the design running on the FPGA. The ARM processor runs on a Linux operating system. You can use the Linux operating system services to interact with the FPGA. When using the Ethernet interface, the bitstream is downloaded to the SD card. The bitstream is persistent through power cycles and is reprogrammed each time the FPGA is turned on. The ARM processor is configured with the correct device tree when the bitstream is programmed.

To communicate with the design running on the FPGA, MATLAB leverages the Ethernet connection between the host computer and ARM processor. The ARM processor runs a LIBIIO service, which communicates with a datamover IP in the FPGA design. The datamover IP is used for fast data transfers between the host computer and FPGA, which is useful when prototyping large deep learning networks that would have long transfer times over JTAG. The ARM processor generates the read and write transactions to access memory locations in both the onboard memory and deep learning processor.

The figure below shows the high-level architecture of the Ethernet interface.\

Load and Compile Deep Learning Network

This example uses the pretrained series network alexnet. This network is a larger network that has significant improvement in transfer time when deploying it to the FPGA by using Ethernet. To load alexnet, run the command:

snet = alexnet;

To view the layers of the network enter:

analyzeNetwork(snet);
% The saved network contains 25 layers including input, convolution, ReLU, cross channel normalization,
% max pool, fully connected, and the softmax output layers.

To deploy the deep learning network on the target FPGA board, create a dlhdl.Workflow object that has the pretrained network snet as the network and the bitstream for your target FPGA board. This example uses the bitstream 'zcu102_single', which has single data type and is configured for the ZCU102 board. To run this example on a different board, use the bitstream for your board.

hW = dlhdl.Workflow('Network', snet, 'Bitstream', 'zcu102_single');

Compile the alexnet network for deployment to the FPGA.

hW.compile;
          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "24.0 MB"        
    "OutputResultOffset"        "0x01800000"     "4.0 MB"         
    "SystemBufferOffset"        "0x01c00000"     "28.0 MB"        
    "InstructionDataOffset"     "0x03800000"     "4.0 MB"         
    "ConvWeightDataOffset"      "0x03c00000"     "16.0 MB"        
    "FCWeightDataOffset"        "0x04c00000"     "224.0 MB"       
    "EndOffset"                 "0x12c00000"     "Total: 300.0 MB"

The output displays the size of the compiled network, which is 300 MB. The entire 300 MB is transferred to the FPGA by using the deploy method. Due to the large size of the network, the transfer can take a significant amount of time if using JTAG. When using Ethernet, the transfer happens quickly.

Deploy Deep Learning Network to FPGA

Before deploying a network, you must first establish a connection to the FPGA board. The dlhdl.Target object represents this connection between the host computer and the FPGA. Create two target objects, one for connection through the JTAG interface and one for connection through the Ethernet interface. To use the JTAG connection, install Xilinx™ Vivado™ Design Suite 2019.2 and set the path to your installed Xilinx Vivado executable if it is not already set up.

% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2019.2\bin\vivado.bat');
hTargetJTAG = dlhdl.Target('Xilinx', 'Interface', 'JTAG')
hTargetJTAG = 
  Target with properties:

       Vendor: 'Xilinx'
    Interface: JTAG

hTargetEthernet = dlhdl.Target('Xilinx', 'Interface', 'Ethernet')
hTargetEthernet = 
  Target with properties:

       Vendor: 'Xilinx'
    Interface: Ethernet
    IPAddress: '192.168.1.100'
     Username: 'root'
         Port: 22

To deploy the network, assign the target object to the dlhdl.Workflow object and execute the deploy method. The deployment happens in two stages. First, the bitstream is programmed onto the FPGA. Then, the network is transferred to the onboard memory.

Select the JTAG interface and time the operation. This operation might take several minutes.

hW.Target = hTargetJTAG;
tic;
hW.deploy;
### Programming FPGA Bitstream using JTAG...
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to FC Processor.
### 8% finished, current time is 29-Jun-2020 16:33:14.
### 17% finished, current time is 29-Jun-2020 16:34:20.
### 25% finished, current time is 29-Jun-2020 16:35:38.
### 33% finished, current time is 29-Jun-2020 16:36:56.
### 42% finished, current time is 29-Jun-2020 16:38:13.
### 50% finished, current time is 29-Jun-2020 16:39:31.
### 58% finished, current time is 29-Jun-2020 16:40:48.
### 67% finished, current time is 29-Jun-2020 16:42:02.
### 75% finished, current time is 29-Jun-2020 16:43:10.
### 83% finished, current time is 29-Jun-2020 16:44:23.
### 92% finished, current time is 29-Jun-2020 16:45:39.
### FC Weights loaded. Current time is 29-Jun-2020 16:46:31
elapsedTimeJTAG = toc
elapsedTimeJTAG = 1.0614e+03

Use the Ethernet interface by setting the dlhdl.Workflow target object to hTargetEthernet and running the deploy function. There is a significant acceleration in the network deployment when you use Ethernet to deploy the bitstream and network to the FPGA.

hW.Target = hTargetEthernet;
tic;
hW.deploy;
### Programming FPGA Bitstream using Ethernet...
Downloading target FPGA device configuration over Ethernet to SD card ...
# Copied /tmp/hdlcoder_rd to /mnt/hdlcoder_rd
# Copying Bitstream hdlcoder_system.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/hdlcoder_system.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'

Downloading target FPGA device configuration over Ethernet to SD card done. The system will now reboot for persistent changes to take effect.


System is rebooting . . . . . .
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to FC Processor.
### 8% finished, current time is 29-Jun-2020 16:47:08.
### 17% finished, current time is 29-Jun-2020 16:47:08.
### 25% finished, current time is 29-Jun-2020 16:47:09.
### 33% finished, current time is 29-Jun-2020 16:47:10.
### 42% finished, current time is 29-Jun-2020 16:47:10.
### 50% finished, current time is 29-Jun-2020 16:47:11.
### 58% finished, current time is 29-Jun-2020 16:47:13.
### 67% finished, current time is 29-Jun-2020 16:47:13.
### 75% finished, current time is 29-Jun-2020 16:47:15.
### 83% finished, current time is 29-Jun-2020 16:47:16.
### 92% finished, current time is 29-Jun-2020 16:47:18.
### FC Weights loaded. Current time is 29-Jun-2020 16:47:18
elapsedTimeEthernet = toc
elapsedTimeEthernet = 47.5854

Changing from JTAG to Ethernet the deploy function reprograms the bitstream, which accounts for most of the elapsed time. Reprogramming is due to different methods that are used to program the bitstream for the different hardware interfaces. The Ethernet interface configures the ARM processor and uses a persistent programming method so that the bitstream is reprogrammed each time the board is turned on. When deploying different deep learning networks by using the same bitstream and hardware interface, you can skip the bitstream programming, which further speeds up network deployment.

Run Prediction for Example Image

Run a prediction for an example image by using the predict method.

imgFile = 'zebra.JPEG';
inputImg = imresize(imread(imgFile), [227,227]);
imshow(inputImg)

prediction = hW.predict(single(inputImg));
### Finished writing input activations.
### Running single input activations.
[val, idx] = max(prediction);
result = snet.Layers(end).ClassNames{idx}
result = 
'zebra'

Release any hardware resources associated with the dlhdl.Target objects.

release(hTargetJTAG)
release(hTargetEthernet)