This example demonstrates how to analyze memory bandwidth for an SoC application. In memory-intensive hardware designs, multiple masters may access a common DDR memory. In such cases, it is important to analyze the dynamic bandwidth requirements of all memory masters to guide algorithm design and the choice of hardware board for deployment. You can simulate the memory traffic using Memory Traffic Generator blocks, analyze the bandwidth usage, and verify it on hardware.
Supported hardware platforms
Xilinx® Zynq® ZC706 evaluation kit
Xilinx® Kintex® 7 KC705 development board
Consider an application that performs HD video processing in an FPGA on real-time input and output. This application involves four memory masters vying for DDR access simultaneously. Memory master 1 writes incoming video frames to memory, and Memory master 4 reads video frames out of memory and sends them to the output display. Memory master 2 reads the data from memory for processing in the FPGA, and Memory master 3 writes the processed data back to memory.
Each master operates on HD video with the following characteristics:
Frame size: 1920x1080p
Pixel size: 2 Bytes (YCbCr format)
Frame period: 1/60 = 16.67 ms (for 60 FPS)
Frame data: 1920x1080x2 = 4.1472 MB
To sustain a frame rate of 60 FPS, each master requires the following minimum memory bandwidth:
Memory bandwidth: Frame data/Frame period = 4.1472e6/16.67e-3 = 248.8 MBps
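The per-master requirement above can be reproduced with a few lines of MATLAB. This is a minimal sketch of the arithmetic only; the variable names are illustrative and are not part of the example model.

```matlab
% Per-master bandwidth requirement for 1080p video at 60 FPS.
% Variable names are illustrative, not taken from the example model.
frameWidth    = 1920;   % pixels per line
frameHeight   = 1080;   % lines per frame
bytesPerPixel = 2;      % YCbCr format
frameRate     = 60;     % frames per second

frameBytes  = frameWidth * frameHeight * bytesPerPixel;  % bytes per frame
framePeriod = 1 / frameRate;                             % seconds per frame
requiredBW  = frameBytes / framePeriod;                  % bytes per second

fprintf('Frame data: %.4f MB\n', frameBytes / 1e6);
fprintf('Required bandwidth per master: %.1f MBps\n', requiredBW / 1e6);
```

The result rounds to the 248.8 MBps figure used as the pass criterion in the rest of the example.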
Assume the memory controller characteristics are as follows:
Clock frequency: 200 MHz
Data width: 32 bits
Burst transaction length: 128
Memory Controller: Set the memory controller parameters in Configuration Parameters > Hardware Implementation > Target Hardware Resources. On the FPGA design (mem controllers) tab, set the clock frequency to 200 MHz and the data width to 32. On the FPGA design (debug) tab, select Include AXI interconnect monitor.
Memory Traffic Generators 1 & 4: The memory traffic characteristics for masters 1 and 4 are the same, because both represent streaming of video frames to and from memory. Set the memory traffic characteristics for masters 1 and 4 as follows:
Burst size (in bytes): Burst transaction length * (Data width/8) = 128 * 32/8 = 512
Total burst requests: 4 * Frame data/Burst size = 4 * 8100 = 32400 (4 frames of data for simulation)
Burst inter access time: Frame period/Number of burst requests per frame = 16.67e-3/8100 = 20.58e-7 sec. Because this is constant-rate traffic, data arrives continuously at a fixed rate. Set the burst times as follows:
First burst time = 20.58e-7
Random time between the bursts = [20.58e-7 20.58e-7]
Update the Memory Traffic Generator1 and Memory Traffic Generator4 block masks with the above values. Set the Request type to Writer for Memory Traffic Generator1 and to Reader for Memory Traffic Generator4. Clear the Wait for burst done option in both block masks, because these masters represent sources and sinks of continuous traffic, such as an HDMI camera and a display.
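The settings for the streaming masters follow directly from the frame size and the controller parameters. The sketch below recomputes them for the 32-bit controller; the variable names are illustrative, and the values still have to be entered in the block masks by hand.

```matlab
% Traffic settings for the streaming masters (generators 1 and 4)
% with a 32-bit controller. Variable names are illustrative.
burstLength  = 128;            % burst transaction length (beats)
dataWidth    = 32;             % controller data width (bits)
frameBytes   = 1920 * 1080 * 2;
framePeriod  = 16.67e-3;       % seconds, rounded as in the text
framesToSim  = 4;              % frames of data simulated

burstSize      = burstLength * dataWidth / 8;   % bytes per burst
burstsPerFrame = frameBytes / burstSize;        % burst requests per frame
totalBursts    = framesToSim * burstsPerFrame;  % total burst requests
interAccess    = framePeriod / burstsPerFrame;  % seconds between bursts

fprintf('Burst size: %d bytes\n', burstSize);
fprintf('Total burst requests: %d\n', totalBursts);
fprintf('Burst inter access time: %.3e s\n', interAccess);
```

Using the same value for the first burst time and for both ends of the random range gives a fixed burst interval, which models the constant-rate traffic.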
Memory Traffic Generators 2 & 3: Memory Traffic Generator2 represents the reader for the FPGA algorithm, and Memory Traffic Generator3 represents the writer from the FPGA algorithm. Set the memory traffic characteristics for masters 2 and 3 as follows:
Burst size (in bytes): Burst transaction length * (Data width/8) = 128 * 32/8 = 512
Total burst requests: 4 * Frame data/Burst size = 4 * 8100 = 32400 (4 frames of data for simulation)
Burst inter access time: (Burst transaction length + 10)/Clock frequency = (128 + 10)/200e6 = 6.9e-7 sec (0.69 us). To allow some randomness in the burst times of the read and write requests, reflecting variation in the demands of the algorithm, set the burst times as follows:
First burst time: 7.2e-7
Random time between the bursts: [7.2e-7 7.4e-7]
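As a cross-check, the sketch below recomputes the burst inter access time for the algorithm masters and the data rate their settings would request if every burst were served on time. The requested-rate calculation is not part of the original example; it is added here as an assumption-labeled illustration, and the variable names are illustrative.

```matlab
% Burst timing for the algorithm masters (generators 2 and 3).
% Variable names are illustrative.
clockFrequency = 200e6;   % controller clock (Hz)
burstLength    = 128;     % burst transaction length (beats)
dataWidth      = 32;      % controller data width (bits)
extraCycles    = 10;      % the 10-cycle allowance used in the text

burstSize      = burstLength * dataWidth / 8;                  % bytes
minInterAccess = (burstLength + extraCycles) / clockFrequency; % seconds

% Random range entered in the block mask
randomRange = [7.2e-7 7.4e-7];

% Assumption: rate requested if no burst is ever delayed (bytes/s)
requestedBW = burstSize ./ randomRange;

fprintf('Minimum inter access time: %.2e s\n', minInterAccess);
fprintf('Requested rate: %.1f to %.1f MBps\n', ...
    min(requestedBW) / 1e6, max(requestedBW) / 1e6);
```

Under this assumption, each algorithm master asks for roughly 700 MBps, well above the 248.8 MBps needed for real-time operation.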
Run the model. After the simulation completes, open the Memory Controller block and click View performance plots on the Performance tab. Select all the masters on the Bandwidth tab and click Create Plot. The plot shows that each master achieves a bandwidth of roughly 190 MBps, which falls short of the required 248.8 MBps. The warnings in the Diagnostic Viewer also indicate this shortfall.
To meet the required bandwidth, change the controller data width from 32 to 64 in the configuration parameters under Target Hardware Resources. This change requires updating the Memory Traffic Generator settings as follows:
Burst size (in bytes): Burst transaction length * (Data width/8) = 128 * 64/8 = 1024
Total burst requests: 4 * Frame data/Burst size = 4 * 4050 = 16200 (4 frames of data for simulation)
Burst inter access time for Memory Traffic Generators 1 & 4: Frame period/Number of burst requests per frame = 16.67e-3/4050 = 41.16e-7 sec. Set the burst times as follows:
First burst time: 41.16e-7
Random time between the bursts: [41.16e-7 41.16e-7]
There is no change in First burst time and Random time between the bursts for Memory Traffic Generators 2 and 3, since they are determined based on algorithm requirements.
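The updated values for the 64-bit controller can be recomputed in the same way. This is a minimal sketch with illustrative variable names; only the data width differs from the 32-bit case.

```matlab
% Traffic settings for the streaming masters (generators 1 and 4)
% after widening the controller to 64 bits. Names are illustrative.
burstLength = 128;            % burst transaction length (beats)
dataWidth   = 64;             % controller data width (bits)
frameBytes  = 1920 * 1080 * 2;
framePeriod = 16.67e-3;       % seconds, rounded as in the text
framesToSim = 4;              % frames of data simulated

burstSize      = burstLength * dataWidth / 8;   % bytes per burst
burstsPerFrame = frameBytes / burstSize;        % burst requests per frame
totalBursts    = framesToSim * burstsPerFrame;  % total burst requests
interAccess    = framePeriod / burstsPerFrame;  % seconds between bursts

fprintf('Burst size: %d bytes\n', burstSize);
fprintf('Total burst requests: %d\n', totalBursts);
fprintf('Burst inter access time: %.3e s\n', interAccess);
```

Doubling the data width doubles the burst size, so each frame needs half as many bursts and the streaming masters can space them twice as far apart.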
Simulate the model and open the Bandwidth plot from the Memory Controller block as described earlier. Notice that the memory bandwidth achieved by Memory Traffic Generators 1 and 4 is 248 MBps. The memory bandwidth achieved by Memory Traffic Generators 2 and 3 is around 500 MBps. This meets the design requirement, because all the masters meet the real-time requirement of 248 MBps. Observe that there are no warnings in the Diagnostic Viewer, because no burst requests are dropped.
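One way to sanity-check these results is to compare the total observed bandwidth with an idealized ceiling for the controller interface. The sketch below assumes the ceiling is one data-width transfer per clock cycle; this formula is an assumption added here, not something the example states, and a real DDR controller delivers less because of refresh and arbitration overhead. The observed values are the approximate figures quoted in the text, and the variable names are illustrative.

```matlab
% Compare observed aggregate bandwidth with an idealized interface peak.
% Assumption: peak = clock frequency * data width / 8 (one beat per clock).
clockFrequency = 200e6;          % controller clock (Hz)
dataWidths     = [32 64];        % controller data widths compared (bits)
peakBW = clockFrequency * dataWidths / 8;   % bytes per second

% Approximate per-master bandwidths quoted in the text (bytes/s)
observed32 = [190 190 190 190] * 1e6;   % masters 1 to 4, 32-bit controller
observed64 = [248 500 500 248] * 1e6;   % masters 1 to 4, 64-bit controller

totalObserved = [sum(observed32) sum(observed64)];
utilization   = totalObserved ./ peakBW;

for k = 1:numel(dataWidths)
    fprintf('%d-bit: total %.0f MBps of %.0f MBps ideal peak (%.0f%%)\n', ...
        dataWidths(k), totalObserved(k) / 1e6, peakBW(k) / 1e6, ...
        100 * utilization(k));
end
```

Under this assumption, the 32-bit interface is close to saturation while every master still falls short of its target, which is consistent with the decision to widen the controller rather than retune the traffic.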
SoC Blockset Support Package for Xilinx Devices is required for this section.
To implement the model on a supported FPGA board, use the SoC Builder application. By default, the model will be implemented on Xilinx® Zynq® ZC706 evaluation kit as it is configured with that board.
The AXI Traffic Generator (ATG), which is the hardware IP core for the Memory Traffic Generator block, does not support random burst inter access times. Unlike the Memory Traffic Generator block in simulation, it also treats Reader and Writer masters differently in its arbitration policy. Therefore, before implementing on hardware, modify the Memory Traffic Generator block settings as follows:
Set the Request type of all the Memory Traffic Generators to Writer.
For Memory Traffic Generators 2 and 3, set Random time between the bursts to [7.2e-7 7.2e-7] to get a fixed inter-burst time of 7.2e-7.
To open SoC Builder, select the System on Chip tab in the Simulink toolstrip, and click the Configure, Build, & Deploy button. Once SoC Builder opens, follow these steps:
Select Build Model on the Setup screen. Click Next.
Click View/Edit Memory Map to view the memory map on the Review Memory Map screen. Click Next.
Specify the project folder on the Select Project Folder screen. Click Next.
Select Build, load and run on the Select Build Action screen. Click Next.
Click Validate on the Validate Model screen to check that the model is compatible for implementation. Click Next.
Click Build on the Build Model screen to begin building the model. An external shell opens when FPGA synthesis begins. Click Next to go to the Load Bitstream screen.
The FPGA synthesis may take more than 30 minutes to complete. To save time, you can use the provided pre-generated bitstream by following these steps:
Close the external shell to terminate synthesis.
Copy the pre-generated bitstream to your project folder and rename it by running the following command.
copyfile(fullfile(matlabshared.supportpkg.getSupportPackageRoot,'toolbox','soc',...
    'supportpackages','xilinxsoc','xilinxsocexamples','bitstreams',...
    'soc_memory_traffic_generator-zc706.bit'), './soc_prj');
Click Load to load the pre-generated bitstream.
To run this example, copy the example test bench to your project folder.
copyfile(fullfile(matlabroot,'toolbox','soc','socexamples',...
    'soc_memory_traffic_generator_aximaster.m'), './soc_prj','f');
The test bench configures the generated hardware ATG IP cores for the Memory Traffic Generators. To run on hardware, increase the number of burst requests by a factor of 100, because the test bench uses MATLAB® as AXI Master IP to read the samples back into MATLAB®, which involves substantial delay in accessing the hardware. Load the soc_memory_traffic_generator_zc706_aximaster.mat file, multiply the number of burst requests for all the masters in the ATG configuration by 100, and save the .mat file.
Enter the following command to run the test bench soc_memory_traffic_generator_aximaster.
After you run the test bench, the following output shows the memory traffic. All masters meet the bandwidth requirements.
Implementation on Xilinx® Kintex® 7 KC705 development board: To implement the model on the KC705 development board, first configure the model for the Xilinx® Kintex® 7 KC705 development board and set the following example parameters. Open the Model Configuration Parameters, navigate to the Hardware Implementation tab, and perform the following steps:
Select Xilinx® Kintex® 7 KC705 development board from the drop-down list under Hardware board.
Navigate to Target hardware resources > FPGA design (top level) tab and enable Include MATLAB as AXI Master IP for host-based interaction.
Navigate to Target hardware resources > FPGA design (mem controllers) tab and set Controller data width (bits) to 64.
Navigate to Target hardware resources > FPGA design (debug) tab and enable Include AXI interconnect monitor.
Next, open SoC Builder and follow the steps as previously stated for Xilinx® Zynq® ZC706 above. Modify the copyfile command to match Kintex® 7 KC705 development board bitstream as below.
copyfile(fullfile(matlabshared.supportpkg.getSupportPackageRoot,'toolbox','soc',...
    'supportpackages','xilinxsoc','xilinxsocexamples','bitstreams',...
    'soc_memory_traffic_generator-kc705.bit'), './soc_prj');
In summary, you simulated the memory traffic for a prospective design before designing the algorithms. You analyzed memory bandwidth and modified memory parameters to meet the design requirement. You verified the results on hardware.