
Perform Multicore Analysis for Dataflow

When a subsystem in a model is configured to use a dataflow execution domain, the Multicore tab becomes available on the Simulink® toolstrip. This tab consolidates the multicore analysis techniques used by dataflow domains into an incremental, iterative workflow.

Using the controls on the Multicore tab, you can:

  • Estimate the relative cost of blocks using internal Simulink heuristics.

  • Measure average execution times (cost) of blocks inside the dataflow subsystems by simulating the model with software-in-the-loop (SIL) or processor-in-the-loop (PIL) profiling. This functionality requires an Embedded Coder® license.

  • Manually override the block cost values.

  • Provide analysis constraints, such as the maximum number of threads and the multithreading threshold.

  • Run analysis to generate a block-to-threads allocation and visualize analysis results.

Multicore tab

This chart illustrates the steps of multicore analysis. After you specify the dataflow execution domain for the subsystems in your model, you can select a cost calculation method, override block costs, specify analysis constraints, run the analysis, and review the results.

Multicore analysis workflow

Select the Cost Calculation Method

On the Multicore tab, in the Mode section, select the cost calculation method: Cost Estimation or SIL/PIL Profiling. In both modes, the cost of each block is determined automatically and used by the multicore analysis to distribute the computational load evenly across multiple CPU cores.

Cost estimation in the toolstrip

Cost Estimation

Use Cost Estimation for:

  • Quick analysis without running the simulation or generating code.

  • Preliminary analysis when the model is not fully implemented. In this case, you can modify the results of the estimation to match the anticipated cost values for the final implementation.

When you click Estimate Cost, the Cost Editor displays the estimated execution cost of each block in your model without simulating it.

Cost Editor window

SIL/PIL Profiling

Use the software-in-the-loop (SIL) or processor-in-the-loop (PIL) profiling method (requires an Embedded Coder license) to:

  • Acquire accurate cost values by measuring the generated code on the host computer. The generated code is the closest representation of the code that will be deployed on the hardware.

  • Measure cost values on the actual target hardware to maximize core utilization when the final code is deployed.

SIL/PIL profiling measures average execution times (cost) of blocks inside the dataflow subsystems by simulating the model with SIL/PIL.
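To make "average execution time" concrete, here is a minimal Python sketch that averages per-call timing samples for each block. The block paths, the sample values, and the `average_costs` helper are hypothetical; the real measurements are collected by the SIL/PIL profiling run and shown in the Cost Editor, and this sketch does not use any MathWorks API.

```python
from statistics import mean


def average_costs(samples: dict[str, list[float]]) -> dict[str, float]:
    """Return the average execution time (cost) of each block.

    samples maps a (hypothetical) block path to the execution times, in
    microseconds, measured for each call during the profiling run.
    Blocks with no samples are skipped, because mean() of an empty list
    raises StatisticsError.
    """
    return {block: mean(times) for block, times in samples.items() if times}


# Illustrative values only, not real profiling output.
samples = {
    "model/Dataflow/FFT": [120.0, 118.0, 122.0],
    "model/Dataflow/Filter": [40.0, 44.0],
}
print(average_costs(samples))
```

Averaging over many calls smooths out call-to-call variation, which is why a longer profiling run tends to give more stable cost values.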

SIL and PIL profiling settings in the toolstrip

  • Use Settings to configure C/C++ code generation and hardware implementation settings.

  • Use Stop Time to specify the simulation stop time of the profiling run, which determines how long the costs are measured.

  • Use the list to select Software-in-the-Loop (SIL) or Processor-in-the-Loop (PIL).

  • Click Profile to measure block costs with the specified settings.

This example shows a highlighted block in the model and its cost. When you perform SIL/PIL profiling, the Cost Editor also displays the units of the profiled cost values.

Results from SIL/PIL profiling

Manually Change Block Costs

You can manually change block cost values to understand their impact on multicore behavior. To override a block cost, clear the Auto column for the block and edit the value in the Cost column.

Overriding block cost values lets you run the analysis with custom costs, for example, the anticipated costs of parts of the model that are not yet fully implemented.
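The override rule can be sketched in Python: a block uses its automatically determined cost while Auto is selected, and the value entered in the Cost column once Auto is cleared. The `BlockCost` class and its fields are hypothetical stand-ins for the Auto and Cost columns of the Cost Editor, not a MathWorks API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockCost:
    auto_cost: float                     # cost from estimation or profiling
    auto: bool = True                    # state of the Auto column
    manual_cost: Optional[float] = None  # value entered in the Cost column

    def effective(self) -> float:
        """Cost used for this block in this sketch."""
        # Fall back to the automatic cost if Auto is selected or no
        # manual value has been entered yet.
        if self.auto or self.manual_cost is None:
            return self.auto_cost
        return self.manual_cost


costs = {
    "Filter": BlockCost(auto_cost=42.0),
    "FFT": BlockCost(auto_cost=120.0, auto=False, manual_cost=200.0),
}
print(sum(c.effective() for c in costs.values()))  # 242.0
```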

Cost column

Specify Analysis Constraints and Run Analysis

Next, set constraints and run multicore analysis. In the Analyze section:

Number of cores and multithreading threshold

  • Use Maximum Number of Threads to specify the maximum number of threads the analysis can produce. By default, the tool tries to determine the number of cores of the target processor from the hardware settings and uses that value as the maximum number of threads. If the tool cannot determine the exact value, it uses the number of cores on the host platform.

  • Use Multithreading Threshold to specify the minimum total cost (in microseconds) that a subsystem must have for the tool to apply multithreading. If the total cost of a subsystem falls below the threshold, the tool does not partition the subsystem. By default, the threshold is a nominal value of 25 microseconds.

  • Click Run Analysis to perform the analysis based on your configuration.
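The effect of the two constraints can be illustrated with a minimal Python sketch. It is not the tool's partitioning algorithm: it assumes the blocks are independent and balances load with a greedy longest-processing-time heuristic, whereas the real analysis must also respect data dependencies between blocks. The function names, the optional `target_cores` argument (a stand-in for the core count read from the hardware settings), and the use of `os.cpu_count()` for the host fallback are assumptions; the 25-microsecond default threshold comes from the description above.

```python
import heapq
import os
from typing import Optional


def resolve_max_threads(target_cores: Optional[int] = None) -> int:
    """Prefer the target core count; otherwise fall back to the host's cores."""
    if target_cores:
        return target_cores
    return os.cpu_count() or 1  # os.cpu_count() may return None


def allocate(costs: dict[str, float], max_threads: int,
             threshold_us: float = 25.0) -> list[list[str]]:
    """Greedy LPT allocation of independent blocks to at most max_threads threads."""
    if sum(costs.values()) < threshold_us or max_threads <= 1:
        # Below the threshold (or only one thread allowed): do not partition.
        return [list(costs)]
    heap = [(0.0, t) for t in range(max_threads)]  # (current load, thread index)
    threads: list[list[str]] = [[] for _ in range(max_threads)]
    # Place the costliest blocks first, each on the least-loaded thread.
    for block, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, t = heapq.heappop(heap)
        threads[t].append(block)
        heapq.heappush(heap, (load + cost, t))
    return [blocks for blocks in threads if blocks]  # drop unused threads


costs = {"A": 10.0, "B": 8.0, "C": 6.0, "D": 4.0}  # microseconds
print(allocate(costs, resolve_max_threads(2)))  # [['A', 'D'], ['B', 'C']]
print(allocate({"A": 10.0, "B": 8.0}, 4))       # [['A', 'B']]
```

With two threads, the four blocks split into two threads of 14 microseconds each. The second call stays on one thread because its total cost (18 microseconds) is below the threshold.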

Review Results

Use the tools provided in the Review Results section to visualize and understand the multicore behavior of your model.

Tools to review results

Highlight and View Threads

Select Highlight threads to highlight the threads in the model and visualize how blocks are assigned to threads based on their execution cost values.

Thread highlighting

Select Thread Viewer to visualize the allocation of blocks to threads.

Allocation of the blocks to threads

Analysis Report and Suggestions

Review the Suggestions for Increasing Concurrency section to see whether the analysis suggests latencies for pipelining delays. By pipelining data-dependent blocks, the Dataflow Subsystem block can increase concurrency for higher data throughput. For more information about pipelining delays, see Multicore Simulation and Code Generation of Dataflow Domains.
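The throughput benefit of pipelining can be seen in an idealized Python model. It assumes a chain of data-dependent blocks, each on its own thread, with one pipeline delay between consecutive blocks and no threading overhead; these are simplifying assumptions for illustration, not how the tool computes its suggestions. Without delays, each time step must run the blocks one after another, so the step time is the sum of their costs. With delays, each block works on data from a different time step in parallel, so the step time drops to the cost of the slowest block, at the price of extra latency. The helper names are hypothetical.

```python
def step_time(stage_costs: list[float], pipelined: bool) -> float:
    """Idealized time per step for a chain of data-dependent blocks."""
    # Without delays the chain runs serially; with delays the stages overlap.
    return max(stage_costs) if pipelined else sum(stage_costs)


def added_latency_steps(num_stages: int, pipelined: bool) -> int:
    """Assumes one pipeline delay between each pair of consecutive stages."""
    return num_stages - 1 if pipelined else 0


stages = [30.0, 50.0, 20.0]  # hypothetical block costs in microseconds
serial = step_time(stages, pipelined=False)    # 100.0
overlap = step_time(stages, pipelined=True)    # 50.0
print(serial / overlap)                        # 2.0 (throughput gain)
print(added_latency_steps(len(stages), True))  # 2
```

In this model the throughput gain is capped by the slowest block, which is why balanced block costs matter as much as the number of threads.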

Suggested latency

After accepting suggested latencies for pipelining delays, you can use Show pipeline delays to visualize the delays in your model.

Pipeline delays in the model

Use the analysis report to investigate the relative weight of the dataflow subsystems and the maximum theoretical speedup for the entire model. This is the speedup that the partitioning performed during the analysis can achieve in theory. The amount of speedup depends on the relative weight of the dataflow subsystems with respect to the entire model: the larger their share of the total cost, the higher the achievable speedup.

The analysis report displays total cost and number of threads values for each Dataflow Subsystem block.

Theoretical speedup chart

The speedup is calculated using this formula, where n is the total number of Dataflow Subsystem blocks, pctPar is the percentage of parallel execution of a subsystem, and criticalPathCost is the cost of the most expensive thread in a subsystem.
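Because the formula itself is not reproduced here, this Python sketch shows one plausible form, a generalized Amdahl's law, as an assumption rather than the tool's exact formula. It treats the execution time of each dataflow subsystem after partitioning as its critical path cost and leaves the rest of the model serial, so speedup = model cost / (serial cost + sum of critical path costs). Dividing through by the model cost gives 1 / ((1 - Σ p_i) + Σ p_i · criticalPathCost_i / totalCost_i), where p_i is the share of the model cost spent in subsystem i; that p_i corresponds to pctPar is also an assumption. The function names are hypothetical.

```python
def relative_weight(model_cost: float, subsystems: list[tuple[float, float]]) -> float:
    """Share of the total model cost spent in dataflow subsystems."""
    return sum(total for total, _ in subsystems) / model_cost


def theoretical_speedup(model_cost: float, subsystems: list[tuple[float, float]]) -> float:
    """Assumed generalized Amdahl's law.

    subsystems holds (total_cost, critical_path_cost) per Dataflow Subsystem.
    """
    serial = model_cost - sum(total for total, _ in subsystems)
    partitioned = sum(critical for _, critical in subsystems)
    return model_cost / (serial + partitioned)


# Example: a 1000 us model with one subsystem costing 800 us whose
# costliest thread takes 200 us.
print(relative_weight(1000.0, [(800.0, 200.0)]))      # 0.8
print(theoretical_speedup(1000.0, [(800.0, 200.0)]))  # 2.5
```

Even though the subsystem itself runs four times faster in this example, the model as a whole speeds up only 2.5 times, because the remaining 20% of the cost stays serial.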

