Streaming Data from Software to Hardware

This example shows how to design the data-path from an embedded processor to hardware logic (FPGA) using SoC Blockset™. Design and simulate the entire application, comprising FPGA and processor algorithms, memory interface, and task scheduling, to meet the system requirements. Then, validate the design on hardware by generating code from the model and implementing it on a System-on-Chip (SoC) device.

Supported hardware platforms:

  • Xilinx® Zynq® ZC706 evaluation kit

  • Xilinx Zynq UltraScale™+ MPSoC ZCU102 Evaluation Kit

  • Xilinx Zynq UltraScale™+ RFSoC ZCU111 Evaluation Kit

  • ZedBoard™ Zynq-7000 Development Board

  • Altera® Cyclone® V SoC development kit

  • Altera Arria® 10 SoC development kit

Design Task and System Requirements

In this example, the embedded processor sends test data, either a low-frequency or a high-frequency sinusoid, to the FPGA. The FPGA algorithm detects the frequency of the signal by filtering and lights up a light-emitting diode (LED) to indicate the detection. This example models the data-path in the same way as the Streaming Data from Hardware to Software example, except that the direction of data flow is reversed.

The application has these performance requirements.

  • Throughput: 10e6 samples per second

  • Maximum latency: 10 ms

  • Data streaming: Continuous

Design Using SoC Blockset

Create SoC model soc_swhw_stream_top using the template Stream from Processor to FPGA Template. The top model includes FPGA model soc_swhw_stream_fpga and processor model soc_swhw_stream_proc instantiated as model references. The top model also includes Memory Channel and Memory Controller blocks that model shared external memory between the FPGA and processor.

Design to Meet Latency Requirement: Begin with a few potential frame sizes and calculate the frame period for each frame size in Table-1. The frame period is the time between two consecutive frames from the processor to the FPGA. For this example, the FPGA output sample time is 1/10e6, or 1e-7 seconds, because the FPGA algorithm runs at 10 MHz. The frame period is calculated as

$FramePeriod = FrameSize * FPGAOutputSampleTime$

The latency of the memory channel is the time that samples spend waiting in the queue of frame buffers and in the FPGA FIFO. Select the FPGA FIFO size such that it is equivalent to the size of one frame buffer. To stay within the maximum latency requirement, calculate the number of frame buffers for each frame size such that:

$(NumFrameBuffers + 1) * FramePeriod <= MaxLatency$

The maximum latency allowed for this example is 10 ms. Calculate the maximum number of frame buffers for each case in Table-1. Because each buffer count is derived from the maximum latency, every case meets the latency requirement by construction.

The range for number of buffers is dictated by memory architecture constraints. The maximum number of frame buffers allowed by the software Direct Memory Access (DMA) driver is 64. The minimum number of frame buffers is 3. While the processor writes one frame buffer, the FPGA reads from another frame buffer. Therefore, the range for the number of frame buffers is:

$3 <= NumFrameBuffers <= 64$

Cases #5 and #6 violate the minimum buffer requirement.
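The latency and buffer-range checks above can be sketched as a short calculation. This is a minimal illustration in Python, not part of the example model. The frame sizes are hypothetical placeholders rather than the values from Table-1, and the helper names `max_buffers_for_latency` and `choose_buffers` are invented for this sketch.

```python
import math

FPGA_SAMPLE_TIME = 1 / 10e6        # seconds per sample (FPGA runs at 10 MHz)
MAX_LATENCY = 10e-3                # seconds (10 ms requirement)
MIN_BUFFERS, MAX_BUFFERS = 3, 64   # range allowed by the memory architecture


def max_buffers_for_latency(frame_size):
    """Largest NumFrameBuffers with (NumFrameBuffers + 1) * FramePeriod <= MaxLatency."""
    frame_period = frame_size * FPGA_SAMPLE_TIME
    # Small tolerance guards against floating-point round-off at exact boundaries.
    return math.floor(MAX_LATENCY / frame_period + 1e-9) - 1


def choose_buffers(frame_size):
    """Return a buffer count inside the allowed range, or None if none exists."""
    n = min(max_buffers_for_latency(frame_size), MAX_BUFFERS)
    return n if n >= MIN_BUFFERS else None


# Hypothetical frame sizes (samples per frame), for illustration only.
for frame_size in (1000, 10000, 50000):
    print(frame_size, frame_size * FPGA_SAMPLE_TIME, choose_buffers(frame_size))
```

A frame size for which `choose_buffers` returns `None` cannot satisfy the latency limit with at least three buffers, which is the kind of violation noted for cases #5 and #6.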

Design to Meet Throughput Requirement: On average, the software processing must complete within a frame period. If it does not, the software task does not generate data fast enough for consumption by the FPGA, violating the throughput requirement. That is:

$FramePeriod > MeanTaskDuration$

Various ways exist to obtain the mean task duration corresponding to each frame size for your algorithm. These concepts are covered in the Task Execution example. Mean task durations for various frame sizes are captured in Table-2. Because the mean task duration is greater than the calculated frame period, cases #1 and #2 violate the throughput requirement.
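The throughput condition can be checked with a few lines of code. The following Python sketch is illustrative only. The frame sizes and mean task durations are made-up values, not the measurements in Table-2, and `meets_throughput` is a hypothetical helper name.

```python
FPGA_SAMPLE_TIME = 1 / 10e6  # seconds per sample (FPGA runs at 10 MHz)


def meets_throughput(frame_size, mean_task_duration):
    """True when FramePeriod > MeanTaskDuration for the given frame size."""
    frame_period = frame_size * FPGA_SAMPLE_TIME
    return frame_period > mean_task_duration


# Hypothetical (frame size, mean task duration in seconds) pairs.
candidates = [(100, 20e-6), (1000, 50e-6), (10000, 200e-6)]
for frame_size, duration in candidates:
    print(frame_size, meets_throughput(frame_size, duration))
```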

Design to Meet Data Continuity Requirement: To meet the data continuity requirement, fill the frame buffers in memory (priming) before starting to stream the data. When temporary disruptions occur due to processor execution, the data is available from the previously filled frame buffers. Priming is accomplished by designing software logic under the soc_swhw_stream_proc/Writer/Priming subsystem, which generates a streamEnable command for the FPGA to start streaming data after the memory is almost full.

Because the task durations can vary for many reasons such as different code execution paths and variation in OS switching time, the software task might not deliver data to the FPGA through shared memory on time. This can result in loss of data continuity. Specify the mean task execution duration and its statistical distribution in the mask of the Task Manager block, and then simulate to verify if this requirement is met.
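To see why priming and task-duration variation matter, consider this simplified queue model, sketched in Python. It is not the behavior of the Memory Channel or Task Manager blocks. It assumes a single writer that fills one buffer per task execution, a reader that fetches one frame per frame period, and a first fetch that happens one frame period after the priming level is reached. The function name, the parameter names, and the durations are all hypothetical.

```python
def simulate_stream(task_durations_us, frame_period_us, num_buffers, prime_level):
    """Return the fewest filled buffers seen at any reader fetch (0 means underrun)."""
    if not 1 <= prime_level <= num_buffers:
        raise ValueError("prime_level must be between 1 and num_buffers")
    filled = 0                 # buffers currently holding unread frames
    t = 0                      # time at which the writer becomes free (microseconds)
    next_read = None           # time of the next reader fetch; None until primed
    min_available = num_buffers
    for duration in task_durations_us:
        if filled == num_buffers:
            # Memory is full: the writer waits for the next fetch to free a buffer.
            t = max(t, next_read)
        finish = t + duration
        # Process every reader fetch that occurs before this write completes.
        while next_read is not None and next_read <= finish:
            if filled == 0:
                return 0       # reader found no data: continuity is lost
            min_available = min(min_available, filled)
            filled -= 1
            next_read += frame_period_us
        filled += 1
        t = finish
        if next_read is None and filled >= prime_level:
            # Assumed: streaming starts one frame period after priming completes.
            next_read = t + frame_period_us
    return min_available


steady = [800] * 20                          # every task finishes within a frame period
stalled = [800] * 5 + [6000] + [800] * 5     # one long task execution
print(simulate_stream(steady, 1000, 4, 3))
print(simulate_stream(stalled, 1000, 4, 3))
```

In this toy model, a single task execution that lasts longer than the primed buffers can cover drains the queue to zero, even though most task executions finish within a frame period. Simulating the actual model with the Task Manager settings is still required to verify the real design.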

By default, the model is configured with case #3 parameters. Simulate the top model, and click Data Inspector on the Simulation tab. Add the bufAvail signals to the top view. In this case, the available software buffer signal does not drop to zero, and validDropLED in the top model does not light up, indicating that the data is streamed continuously.

Configure the model for case #4 by running this command, and then simulate the model again.

soc_swhw_stream_set_parameters(4); % row # 4

In this case, the available software buffers drop to zero, and the validDropLED in the top model lights up.

Case #4 violates the data continuity requirement. Case #3 is therefore the only case that meets all of the design requirements. Table-3 shows the updated results.

Run the soc_swhw_stream_set_parameters(3) command to restore the case #3 parameters before deploying the model.

Implement and Run Model on Hardware

These products are required for this section:

  • HDL Coder™

  • Embedded Coder®

  • SoC Blockset Support Package for Xilinx Devices, or SoC Blockset Support Package for Intel Devices

For more information about support packages, see SoC Blockset Supported Hardware.

To implement the model on a supported SoC board, use the SoC Builder tool. By default, the model is configured for the Xilinx® Zynq® ZC706 evaluation kit and is implemented on that board. To open SoC Builder, click the Configure, Build, & Deploy button in the toolstrip and follow these steps:

  1. Select Build Model on the Setup screen. Click Next.

  2. Click Next on the Review Task Map screen.

  3. On the Review Memory Map screen, click View/Edit Memory Map to view the memory map. Click Next.

  4. Specify the project folder on the Select Project Folder screen. Click Next.

  5. Select Build, load for external mode on the Select Build Action screen. Click Next.

  6. On the Validate Model screen, click Validate to check the compatibility of the model for implementation. Click Next.

  7. On the Build Model screen, click Build to begin building the model. An external shell opens when FPGA synthesis begins. Click Next.

  8. Click Test Connection on the Connect Hardware screen to test the connectivity of the host computer with the SoC board. Click Next to go to the Run Application screen.

The FPGA synthesis can take more than 30 minutes to complete. To save time, you can use the provided pregenerated bitstream by following these steps.

  1. Close the external shell to terminate synthesis.

  2. Copy the pregenerated bitstream to your project folder by running the copyfile command shown after these steps.

  3. Click Load and Run to load the pregenerated bitstream and open the generated software model soc_swhw_stream_top_sw.

copyfile(fullfile(matlabroot,'toolbox','soc','socexamples','bitstreams','soc_swhw_stream_top-zc706.bit'), './soc_prj');

After loading the bitstream, run the generated software model soc_swhw_stream_top_sw in external mode by clicking Monitor and Tune on the toolstrip. LED2 on the board lights up, indicating that the FPGA has detected the high-frequency signal. To change the frequency of the sinusoid dynamically at run time, replace the SourceSelector terminator block with a Constant block, and then run the model again in external mode. Change the constant value from 0 to 1 to switch the signal from high frequency to low frequency.

Implementation on other boards: To implement the model on a supported board other than the ZC706, first configure the model for that board, and then set the example parameters as follows.

  • On the Hardware tab, click Hardware Settings to open the Configuration Parameters window.

  • On the Hardware Implementation tab, select your board from the Hardware board drop-down list in both the top model and the processor model.

  • Navigate to the Target hardware resources > FPGA design (top level) tab and set IP core clock frequency (MHz) to 10.

Next, open SoC Builder and follow the steps described previously for the Xilinx® Zynq® ZC706 evaluation kit. Modify the copyfile command to match the bitstream for your board. For the Altera Arria® 10 SoC development kit, copy both the '.periph.rbf' and '.core.rbf' files. These pregenerated bitstream files are available:

  • 'soc_swhw_stream_top-zc706.bit'

  • 'soc_swhw_stream_top-zedboard.bit'

  • 'soc_swhw_stream_top-zcu102.bit'

  • 'soc_swhw_stream_top-XilinxZynqUltraScale_RFSoCZCU111EvaluationKit.bit'

  • 'soc_swhw_stream_top-c5soc.rbf'

  • 'soc_swhw_stream_top-a10soc.periph.rbf'

  • 'soc_swhw_stream_top-a10soc.core.rbf'

In summary, this example showed how to design the data-path from processor to FPGA for continuous streaming. You designed and modeled the behavior using SoC Blockset and went through the workflow required to implement it on an SoC device.