Interface with the Deep Learning Processor IP Core

Retrieve predictions for a batch of images or for a data stream from a live camera input by using the generated deep learning processor IP core. Select between batch processing mode and streaming mode based on available board resources, the availability of input data, and your application requirements. Use MATLAB® to run your deep learning network on the generated deep learning processor IP core and to retrieve the network predictions.

Create Deep Learning Processor Configuration

To generate a deep learning processor IP core that has the required interfaces for processing multiple data frames, create a deep learning processor configuration by using the dlhdl.ProcessorConfig class. In the deep learning processor configuration:

  • Set InputRunTimeControl and OutputRunTimeControl to either port or register.

  • Set InputDataInterface and OutputDataInterface to External Memory.

Use the dlhdl.buildProcessor function with the deep learning processor configuration object as the input argument to generate the deep learning processor IP core. For example, this code generates a deep learning processor IP core with the interfaces to process multiple data frames.

hPC = dlhdl.ProcessorConfig;
hPC.InputRunTimeControl = 'port';
hPC.OutputRunTimeControl = 'port';
hPC.InputDataInterface = 'External Memory';
hPC.OutputDataInterface = 'External Memory';
dlhdl.buildProcessor(hPC);
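After generating the deep learning processor IP core, you can deploy a network to it from MATLAB by using a dlhdl.Workflow object. This is a minimal sketch, not a definitive recipe: the network, board vendor, interface, and bitstream name are assumptions for illustration; replace them with your own setup.

```matlab
% Sketch: deploy a network to the generated processor IP core and run it.
% The network, board vendor, interface, and bitstream name below are
% assumptions for illustration; replace them with your own setup.
net = resnet18;                                          % pretrained network
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet'); % board connection
hW = dlhdl.Workflow('Network',net, ...
    'Bitstream','dlprocessor.bit', ...                   % placeholder name for the generated bitstream
    'Target',hTarget);
hW.compile;   % generate weights, instructions, and memory maps
hW.deploy;    % program the board and load the network
```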

Select Data Processing Mode

Choose between batch processing mode and streaming mode based on your resource requirements, the availability of input data, and the interface complexity that your design can support. Use these criteria to select a mode.

  • Availability of input data: In batch processing mode, all input data must be available before you trigger the deep learning processor IP core to start processing. In streaming mode, you stream input data as and when the data becomes available.

  • Memory requirements: Batch processing mode requires large memory resources to store all the input data and processed output data, because the deep learning processor IP core processes all the data together. Streaming mode requires minimal memory resources; the smallest memory required is twice the size of one input data frame.

  • Interface complexity: Batch processing mode uses a simple protocol with no handshaking required. Streaming mode uses a complex protocol; you must implement a handshaking protocol.

Design Processing Mode Interface Signals

You can group the interface signals into run-time signals and handshaking signals. Handshaking signals are used only when the data processing mode is set to streaming mode.

Run-Time Signals

This table lists the run-time signals, data types, interface types, and description.

  • Done (logical, register): Signal indicating that the deep learning processor IP core has processed all the input data and written the last output to memory.

  • InputStart (logical, register): Signal from you to the deep learning processor IP core to start processing the data.

  • FrameCount (integer, register): Signal from you to the deep learning processor IP core specifying the number of input data frames.

  • StreamingMode (logical, register): Signal from you to the deep learning processor IP core selecting the data processing mode. false selects batch processing (buffer) mode and true selects streaming mode.

  • StreamingDone (logical, register): Signal for testing streaming mode. During testing, this signal becomes true when you retrieve the last output.

  • InputStop (logical, register): Signal that stops continuous streaming mode. To stop continuous streaming mode, set this signal to true.

Handshaking Signals

This table lists the handshaking signals, their data types, interface types, and descriptions. These signals are used only in streaming mode. The interface type depends on the InputRunTimeControl and OutputRunTimeControl settings. For example, if InputRunTimeControl is set to port, the interface type of the input handshaking signals is port.

  • InputAddr (uint32, InputRunTimeControl, port or register): Signal indicating the address location in memory at which to load the input data. Use this signal when the InputValid signal is high.

  • InputNext (logical, InputRunTimeControl, port or register): Signal to the deep learning processor IP core indicating that the next data frame is available for processing. Use this signal when the InputValid signal is high.

  • InputSize (uint32, InputRunTimeControl, port or register): Signal indicating the size of the next input data frame. Use this signal when the InputValid signal is high.

  • InputValid (logical, InputRunTimeControl, port or register): Signal from the deep learning processor IP core indicating that the input data is valid.

  • OutputAddr (uint32, OutputRunTimeControl, port or register): Signal indicating the address location in memory from which to retrieve the output data. Use this signal when the OutputValid signal is high.

  • OutputNext (logical, OutputRunTimeControl, port or register): Signal to the deep learning processor IP core indicating that you have read the current output data frame. Use this signal when the OutputValid signal is high.

  • OutputSize (uint32, OutputRunTimeControl, port or register): Signal indicating the size of the next output data frame. Use this signal when the OutputValid signal is high.

  • OutputValid (logical, OutputRunTimeControl, port or register): Signal from the deep learning processor IP core indicating that the output data is valid.

Design Batch Processing Mode Interface

When you have all your input data available and access to large double data rate (DDR) memory space, process multiple frames by using the batch processing mode. The figure shows the generated deep learning processor IP core with interface signals for the batch processing mode of operation. You use MATLAB and a dlhdl.Workflow object to run your deep learning network on the deep learning processor IP core. Retrieve the network prediction results from the deep learning processor IP core.

To process a single data frame, set the FrameCount register value to one.
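In MATLAB, batch processing corresponds to passing a multi-frame input to the predict method of the dlhdl.Workflow object. This is a minimal sketch, assuming hW is an already-deployed dlhdl.Workflow object and that the network takes 224-by-224 RGB frames; both are assumptions for illustration.

```matlab
% Sketch: run eight frames in one call (batch processing mode).
% hW is assumed to be a deployed dlhdl.Workflow object; the input size
% is an assumption for illustration.
imgBatch = single(rand(224,224,3,8));   % eight frames stacked along dimension 4
prediction = hW.predict(imgBatch);      % one prediction per input frame
```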

Deep Learning Processor IP core with buffer mode interface signals

This flowchart shows the operation of the batch processing mode.

Flowchart detailing buffer mode of operation.

This timing diagram shows the operation of the batch processing mode.

Buffer mode timing diagram for three input data frames

Load all the data frames into consecutive input DDR memory locations, toggle the inputStart signal, wait for the done signal to go high, and then read the output data from the consecutive output DDR memory locations. The clientAction signals represent your actions of loading input data into and reading output data from the DDR memory.
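The sequence above can be sketched as testbench-style MATLAB pseudocode. The helpers writeDDR, readDDR, writeReg, and readReg are hypothetical placeholders for your board's memory and register access functions; they are not part of the dlhdl API.

```matlab
% Hypothetical sketch of the batch processing sequence. writeDDR, readDDR,
% writeReg, and readReg are placeholder helpers, not dlhdl functions.
numFrames = 3;
for k = 1:numFrames                      % load all frames into consecutive DDR locations
    writeDDR(inputBaseAddr + (k-1)*frameSize, inputFrames(:,:,:,k));
end
writeReg('FrameCount', numFrames);
writeReg('InputStart', true);            % pulse the start signal
writeReg('InputStart', false);
while ~readReg('Done')                   % wait for the done signal to go high
    pause(0.01);
end
outputs = cell(1,numFrames);
for k = 1:numFrames                      % read results from consecutive DDR locations
    outputs{k} = readDDR(outputBaseAddr + (k-1)*outFrameSize, outFrameSize);
end
```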

Design Streaming Mode Interface

When your input data is streaming in, you have access to limited double data rate (DDR) memory space, and your application requires a handshaking protocol, process multiple frames by using the streaming mode. The figure shows the generated deep learning processor IP core with interface signals for the streaming mode of operation. In this figure, the live camera streams data to an image preprocessing design under test (DUT) that implements the streaming mode handshaking protocol to interact with the generated deep learning processor IP core.

Data can be streamed to the deep learning processor IP core in two modes:

  • Stream data up to a frame count value: In this mode, the deep learning processor IP core processes data frames up to the value specified in the FrameCount register. After processing all the frames, the deep learning processor IP core sets the Done signal to true.

    To process a single data frame, set the FrameCount register value to one.

  • Continuous streaming mode: In this mode, the deep learning processor IP core processes data frames until you set the InputStop signal to true.

Deep Learning Processor IP core with interface signals for streaming mode

Streaming Mode up to a Frame Count

This flowchart shows the operation of the streaming mode up to a frame count. The read and write operations occur in parallel. The value of the InputFrameNumberLimit argument specifies the number of spaces in the DDR input and output ring buffers. If you use a larger ring buffer, the deep learning processor IP core processes more input images before you read the first output result.

Flowchart detailing streaming mode operation

This timing diagram shows the operation of the streaming mode up to a frame count.

Streaming up to a frame count mode timing diagram

  1. Set the InputFrameNumberLimit argument of the compile method to a value greater than two.

  2. Set the StreamingMode signal to true.

  3. Set the number of data frames to process in the FrameCount register.

  4. Pulse the inputStart signal. These next actions can be performed in parallel:

    1. Wait for the inputValid signal to become true and then:

      • Use the inputAddr and inputSize signals to write the next input data frame to DDR memory.

      • Pulse the inputNext signal.

    2. Wait for the outputValid signal to become true and then:

      • Use the outputAddr and outputSize signals to read the processed output data frame.

      • Pulse the outputNext signal.

  5. After the deep learning processor IP core has processed all the frames, it sets the Done signal to true.

The clientAction signals represent your actions of loading input data into and reading output data from the DDR memory.
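The numbered steps above can be sketched the same way. Again, writeReg, readReg, writeDDR, and readDDR are hypothetical placeholder helpers, and in a real design the producer and consumer handshakes run in parallel; they are interleaved in one loop here only for clarity.

```matlab
% Hypothetical sketch of streaming mode up to a frame count.
% writeReg, readReg, writeDDR, and readDDR are placeholder helpers.
writeReg('StreamingMode', true);
writeReg('FrameCount', numFrames);
writeReg('InputStart', true); writeReg('InputStart', false);    % pulse start
framesWritten = 0; framesRead = 0;
outputs = cell(1,numFrames);
while framesRead < numFrames
    % Producer side: write the next frame when the core signals InputValid.
    if framesWritten < numFrames && readReg('InputValid')
        writeDDR(readReg('InputAddr'), inputFrames(:,:,:,framesWritten+1));
        writeReg('InputNext', true); writeReg('InputNext', false);   % pulse
        framesWritten = framesWritten + 1;
    end
    % Consumer side: read the next result when the core signals OutputValid.
    if readReg('OutputValid')
        framesRead = framesRead + 1;
        outputs{framesRead} = readDDR(readReg('OutputAddr'), readReg('OutputSize'));
        writeReg('OutputNext', true); writeReg('OutputNext', false); % pulse
    end
end
% The core sets Done to true after processing all the frames.
```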

Continuous Streaming Mode

You can continuously stream data to the deep learning processor IP core in continuous streaming mode. To use continuous streaming mode, set the FrameCount register to zero. To stop the data processing, set the InputStop signal to true.

This flowchart shows the operation of the continuous streaming mode. The read and write operations occur in parallel.

Flowchart detailing continuous streaming mode operation

This timing diagram shows the operation of the continuous streaming mode.

Continuous streaming mode timing diagram

  1. Set the InputFrameNumberLimit argument of the compile method to a value greater than two.

  2. Set the StreamingMode signal to true.

  3. Set the FrameCount register to zero.

  4. Pulse the inputStart signal. These next actions can be performed in parallel:

    1. Wait for the inputValid signal to become true and then:

      • Use the inputAddr and inputSize signals to write the next input data frame to DDR memory.

      • Pulse the inputNext signal.

    2. Wait for the outputValid signal to become true and then:

      • Use the outputAddr and outputSize signals to read the processed output data frame.

      • Pulse the outputNext signal.

  5. After you have written all the input data and read all the output data, pulse the InputStop signal.
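Continuous streaming differs from the frame-count case only in the FrameCount value and the stop condition. A brief sketch under the same assumptions (writeReg and the handshake helpers are hypothetical placeholders; stopRequested stands in for your own stop condition):

```matlab
% Hypothetical sketch of continuous streaming mode; placeholder helpers
% as in the earlier sketches, and stopRequested is a placeholder for
% your application's stop condition.
writeReg('StreamingMode', true);
writeReg('FrameCount', 0);                  % zero selects continuous streaming
writeReg('InputStart', true); writeReg('InputStart', false);    % pulse start
while ~stopRequested()
    % Same InputValid/InputNext and OutputValid/OutputNext handshakes
    % as in the frame-count sketch above.
end
writeReg('InputStop', true);                % stop the deep learning processor IP core
```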

Access Data from DDR

The deep learning processor IP core uses three AXI4 master interfaces to store and process:

  • Activation data

  • Weight data

  • Debug data

The deep learning processor IP core uses these AXI4 master interfaces to read and write data from the DDR, based on the data processing mode of operation.
