Multicore Execution Using Dataflow Domain

This example uses:

This example shows how to speed up execution of models using dataflow domain in Simulink®. We use the digital up converter and digital down converter blocks to create a family radio service transmitter and receiver.

Introduction

Dataflow execution domain allows you to make use of multiple cores in the simulation of computationally intensive signal processing systems.

This example shows how to specify dataflow as the execution domain of a subsystem, improve simulation performance of the model, and generate multicore code.

Family Radio Service System

This example uses the Digital Up-Converter (DUC) and Digital Down-Converter (DDC) blocks to create a Family Radio Service (FRS) transmitter and receiver. The Digital Up-Converter (DUC) block converts a complex digital baseband signal to real passband signal. The Digital Down-Converter (DDC) block converts the digitized real signal back to a baseband complex signal. Open familyRadioServiceExample model.

Simulate the model and measure the execution time. Execution time is measured using the output of the sim command which returns the simulation execution time of the model. To measure the time taken primarily for the dataflow subsystem, comment out the Spectrum Analyzer blocks and Audio Device Writer block.

Simulation execution time for single-threaded model = 4.24s

Specify Dataflow Execution Domain

In Simulink, you specify dataflow as the execution domain for a subsystem by setting the Domain parameter to Dataflow using Property Inspector. To access Property Inspector, in the Simulink Toolstrip, on the Modeling tab, in the Design gallery select Property Inspector or on the Simulation tab, Prepare gallery, select Property Inspector. In the Property Inspector, you can set the Domain to Dataflow by selecting Set domain specification and then choosing Dataflow for Domain setting. You can also use Dataflow Subsystem block from the Dataflow library of DSP System toolbox to get a subsystem that is preconfigured with the dataflow execution domain.

Multicore Simulation of Dataflow Domain

Dataflow domains automatically partition your model into multiple threads for better performance. Once you set the Domain parameter to Dataflow, you can use the Multicore tab analysis to analyze your model to get better performance. The Multicore tab is available in the toolstrip when there is a dataflow domain in the model. To learn more about the Multicore tab, see Perform Multicore Analysis for Dataflow.

For this example the Multicore tab mode is set to Simulation Profiling for simulation performance analysis.

It is recommended to optimize model settings for optimal simulation performance. To accept the proposed model settings, on the Multicore tab, click Optimize. Alternatively, you can use the drop menu below the Optimize button to change the settings individually. In this example the model settings are already optimal.

On the Multicore tab, click the Run Analysis button to start the analysis of the dataflow domain for simulation performance. Once the analysis is finished, the Analysis Report and Suggestions window shows how many threads the dataflow subsystem uses during simulation.

After analyzing the model, the Analysis Report and Suggestions window shows one thread because the data dependency between the blocks in the model prevents blocks from being executed concurrently. By pipelining the data dependent blocks, the dataflow subsystem can increase concurrency for higher data throughput. The Analysis Report and Suggestions window shows the recommended number of pipeline delays as Suggested for Increasing Concurrency. The suggested latency value is computed to give the best performance.

The following diagram shows the Analysis Report and Suggestions window where the suggested latency is 2 for the dataflow subsystem.

Click the Accept button to use the recommended latency for the dataflow subsystem. This value can also be entered directly in the Property Inspector for Latency parameter. Simulink shows the Latency parameter value using $Z^{-n}$ tags at the output ports of the dataflow subsystem.

The Analysis Report and Suggestions window now shows the number of threads as 2 meaning that the blocks inside the dataflow subsystem simulate in parallel using 2 threads. Highlight threads highlights the blocks with colors based on their thread allocation as shown in the Thread Highlighting Legend. Show pipeline delays shows where pipelining delays were inserted within the dataflow subsystem using $Z^{-n}$ tags.

Compensating for Latency

When latency is increased in the dataflow execution domain to break data dependencies between blocks and create concurrency, that delay needs to be accounted for in other parts of the model. For example, signals that are compared or combined with the signals at the output ports of the dataflow subsystem must be delayed to align in time with the signals at the output ports of the dataflow subsystem. In this example, the audio signal from the source block that goes to the Audio Device Writer must be delayed to align with other signals. To compensate for the latency specified on the dataflow subsystem, use a delay block to delay this signal by 2 frames. For this signal, the frame length is 1000. A delay value of 2000 is set in the delay block to align the signal from source and the signal processed through dataflow subsystem.

Dataflow Simulation Performance

Simulate the model and measure model execution time. When measuring the time taken for simulating the model, comment out the Spectrum Analyzer blocks and Audio Device Writer blocks to measure the time taken primarily for the dataflow subsystem. Execution time is measured using the sim command, which returns the simulation execution time of the model. We can measure the amount of speedup obtained by dividing the execution time taken by the model using multiple threads with the execution time taken by the original model. This number is computed and shown below.

These numbers and analysis were published on a Windows® desktop computer with Intel® Xeon® CPU E5-1650 v3 @ 3.4 GHz 6 Cores 12 Threads processor.

Simulation execution time for multithreaded model = 2.20s
Actual speedup with dataflow: 1.9x

Code Generation

Code generation requires a Simulink Coder™ or an Embedded Coder® license. Press Ctrl+B to build the model and generate single-core code for your desktop target. If your desktop machine is Windows or Linux®, you can generate multicore code for the model. To enable multicore code generation for the model, you must select the Allow tasks to execute concurrently on target parameter in the Solver pane under Solver details. Selecting this parameter allows:

Each rate in the model to execute as an independent concurrent task on the target processor
The dataflow subsystem to generate additional concurrent tasks by automatically partitioning the blocks

In the generated code you can observe the generated functions for each concurrent task created by the dataflow domain and realized as an OpenMP section.

Summary

This example shows how to specify dataflow as the execution domain in a model to design computationally-intensive signal processing systems, improve simulation performance of the model and generate multicore code.

Appendix

The following helper functions are used in this example.

dspSimulateDataflowExample