
Multicore Programming with Simulink

Using the process of partitioning, mapping, and profiling in Simulink®, you can address common challenges of designing systems for concurrent execution.

Partitioning enables you to designate regions of your model as tasks, independent of the details of the embedded multicore processing hardware. This independence enables you to arrange the content and hierarchy of your model to best suit the needs of your application.

In a partitioned system, mapping enables you to assign partitions to processing elements in your embedded processing system. You can use the Simulink mapping tool to represent and manage the details of executing threads, HDL code on FPGAs, and the work that these threads or FPGAs perform. While creating your model, you do not need to keep track of the partitions or data transfer between them. The tool does this work. This capability enables you to reuse your model across multiple architectures.

Profiling simulates deployment of your application under typical computational loads. It enables you to determine the partitioning and mapping for your model that gives the best performance, before you deploy to your hardware.

How Simulink Helps You to Overcome Challenges in Multicore Programming

Manually programming your application for concurrent execution poses challenges beyond the typical challenges with manual coding. With Simulink, you can overcome the challenges of portability across multiple architectures, efficiency of deployment for an architecture, and cyclic data dependencies between application components. For more information on these challenges, see Challenges in Multicore Programming.

Portability

Simulink enables you to determine the content and hierarchical needs of the modeled system without considering the target system. While creating model content, you do not need to keep track of the number of cores in your target system. Instead, you select the partitioning methods that allow you to create model content. Simulink generates code for the architecture you specify.

You can select an architecture from the available supported architectures or add a custom architecture. When you change your architecture, Simulink regenerates only the code that needs to change for the new architecture. The new architecture reuses blocks and functions. For more information, see Supported Targets and Select Target Architecture.

Handle Data Transfers

Data dependencies arise when a signal originates from one block in one partition and is connected to a block in another partition. To create opportunities for parallelism, Simulink provides multiple options for handling data transfers between concurrently executing partitions. These options help you trade off computational latency for numerical signal delays:

Option 1: Ensure deterministic transfer (maximum delay)

  • Goals: Create opportunity for parallelism. Produce numeric results that are repeatable with each run of the generated code.

  • Action: In the Data Transfer pane of the Concurrent Execution dialog box, select Ensure deterministic transfer (maximum delay) for either signal type.

  • Considerations: To achieve this behavior, Simulink introduces signal delays, which can affect the numeric results. To compensate, you might need to specify an initial condition for these delay elements.

Option 2: Ensure data integrity only

  • Goals: Create opportunity for parallelism. Reduce signal latency.

  • Action: In the Data Transfer pane of the Concurrent Execution dialog box, select Ensure data integrity only for either signal type.

  • Considerations: Simulink generates code to operate with maximum responsiveness and data integrity. However, the implementation is interruptible, which can lead to loss of data during data transfer. Use a deterministic execution schedule to achieve determinism in the deployment environment.

Option 3: Ensure deterministic transfer (minimum delay)

  • Goals: Enforce data dependency. Produce numeric results that are repeatable with each run of the generated code.

  • Action: In the Data Transfer pane of the Concurrent Execution dialog box, select Ensure deterministic transfer (minimum delay) for either signal type.

  • Considerations: Simulink uses target-specific synchronization primitives to synchronize data transfer.
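The two deterministic options differ only in when a consumer partition sees a producer's output. The following Python sketch models that difference for one producer and one consumer stepping at a single rate. It is a conceptual model under that assumption, not the code Simulink generates, and the functions `run_min_delay` and `run_max_delay` are hypothetical names. The data-integrity-only option is deliberately not modeled, because its result depends on how the scheduler interleaves the tasks.

```python
from typing import Callable, List, Sequence


def run_min_delay(
    inputs: Sequence[float],
    producer: Callable[[float], float],
    consumer: Callable[[float], float],
) -> List[float]:
    """Hypothetical model of Ensure deterministic transfer (minimum delay).

    In each step the consumer waits for the value the producer computes in
    that same step, so the two partitions run one after the other.
    """
    outputs = []
    for u in inputs:
        y = producer(u)               # producer must finish first
        outputs.append(consumer(y))   # consumer uses this step's value
    return outputs


def run_max_delay(
    inputs: Sequence[float],
    producer: Callable[[float], float],
    consumer: Callable[[float], float],
    initial_condition: float,
) -> List[float]:
    """Hypothetical model of Ensure deterministic transfer (maximum delay).

    The consumer reads the value the producer published in the previous
    step. Neither partition needs the other's current-step result, so they
    could execute in parallel; the price is a one-step signal delay, and
    the first step uses the delay's initial condition.
    """
    outputs = []
    published = initial_condition
    for u in inputs:
        y = producer(u)                      # independent of the consumer
        outputs.append(consumer(published))  # uses last step's value
        published = y                        # hand-off at the step boundary
    return outputs
```

For example, with `producer = lambda u: 2 * u` and `consumer = lambda y: y + 1`, the inputs `[1, 2, 3]` give `[3, 5, 7]` under the minimum-delay model and `[1, 3, 5]` under the maximum-delay model with an initial condition of 0. Both results are repeatable from run to run; the maximum-delay output lags by one step and begins from the initial condition, which is why that initial condition might need care.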

For example, consider a control application in which a controller that reads sensor data at time T must produce the control signals to the actuator at time T+Δ.

  • If the sequential algorithm meets the timing deadlines, consider using option 3, Ensure deterministic transfer (minimum delay).

  • If the embedded system provides deterministic scheduling, consider using option 2, Ensure data integrity only.

  • Otherwise, use option 1, Ensure deterministic transfer (maximum delay), to create opportunities for parallelism by introducing signal delays.
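The guidance above can be restated as a small helper. This sketch assumes that you already know the worst-case execution time of each partition on the path from sensor read to actuator write, in the same time unit as the deadline Δ, and it ignores scheduling and data-transfer overhead. The function `choose_transfer_option` and its parameters are hypothetical.

```python
from typing import Sequence


def choose_transfer_option(
    wcets: Sequence[float], deadline: float, deterministic_scheduling: bool
) -> int:
    """Return 1, 2, or 3, matching the data transfer options described above.

    wcets: hypothetical worst-case execution times of the partitions that run
        between reading sensor data at T and writing the actuator at T + deadline.
    """
    if sum(wcets) <= deadline:
        # Running the partitions back to back meets the deadline, so keep
        # the data dependency and add no signal delay.
        return 3  # Ensure deterministic transfer (minimum delay)
    if deterministic_scheduling:
        # The platform's schedule supplies determinism; minimize latency.
        return 2  # Ensure data integrity only
    # Otherwise, gain parallelism by accepting signal delays.
    return 1  # Ensure deterministic transfer (maximum delay)
```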

The table provides the model-level options that you can apply to each signal that requires data transfer in the system. In addition to model-level control, Simulink enables you to override how the data transfer settings are handled for each signal. For more information, see Configuring Data Transfer Communications.

Deployment Efficiency

To improve the performance of the deployed application, Simulink allows you to simulate it under typical computational loads and try multiple configurations of partitioning and mapping the application. Simulink compares the performance of each of these configurations to provide the optimal configuration for deployment. This is known as profiling. Profiling helps you determine the optimum partition configuration before you deploy your system to the desired hardware.

You can create a mapping for your application in which Simulink maps the application components across different processing nodes. You can also manually assign components to processing nodes. For any mapping, you can see the data dependencies between components and remap accordingly. You can also introduce and remove data dependencies between different components.
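To illustrate what comparing mappings involves, the following Python sketch exhaustively assigns partitions to cores and keeps the mapping whose busiest core has the smallest per-step load. It assumes independent partitions with fixed, known execution costs and ignores data-transfer and synchronization costs, so it is a toy model rather than the method Simulink profiling uses; `best_mapping` is a hypothetical name. Exhaustive search grows as cores to the power of partitions, so it is practical only for a handful of partitions.

```python
from itertools import product
from typing import Dict, Tuple


def best_mapping(costs: Dict[str, float], num_cores: int) -> Tuple[Dict[str, int], float]:
    """Exhaustively map partitions to cores, minimizing the busiest core's load.

    costs: hypothetical per-step execution time of each partition.
    Returns (mapping from partition name to core index, load of busiest core).
    """
    names = sorted(costs)
    best_map: Dict[str, int] = {}
    best_load = float("inf")
    # Try every assignment of each partition to one of the cores.
    for assignment in product(range(num_cores), repeat=len(names)):
        loads = [0.0] * num_cores
        for name, core in zip(names, assignment):
            loads[core] += costs[name]
        # Keep the assignment whose most heavily loaded core is lightest.
        if max(loads) < best_load:
            best_load = max(loads)
            best_map = dict(zip(names, assignment))
    return best_map, best_load
```

For costs `{"A": 4, "B": 3, "C": 3, "D": 2}` on two cores, the busiest core carries 6.0, with A and D on one core and B and C on the other.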

Cyclic Data Dependency

Some tasks of a system depend on the output of other tasks. The data dependency between tasks determines their processing order. Two or more partitions containing data dependencies in a cycle create a data dependency loop, also known as an algebraic loop. Simulink does not allow algebraic loops to occur across potentially parallel partitions because of the high cost of solving the loop using parallel algorithms.

In some cases, the algebraic loop is artificial. For example, you can have an artificial algebraic loop because of Model-block-based partitioning. An algebraic loop involving Model blocks is artificial if removing the use of Model partitioning eliminates the loop. To minimize the occurrence of artificial loops, in the Configuration Parameters dialog boxes for the models involved in the algebraic loop, select Model Referencing > Minimize algebraic loop occurrences.

Additionally, if the model is configured for the Generic Real-Time target (grt.tlc) or the Embedded Real-Time target (ert.tlc) in the Configuration Parameters dialog box, clear the Code Generation > Interface > Single output/update function check box.

If the algebraic loop is a true algebraic loop rather than an artificial one, you must either place all the blocks in the loop in one Model partition or eliminate the loop by introducing a delay element in the loop.
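The rule can be illustrated with a dependency check over partitions: each cross-partition signal is an edge, and an edge that passes through a delay element is ignored because its value comes from the previous step. If the remaining edges form a cycle, the partitions contain an algebraic loop. The Python sketch below is a conceptual model under those assumptions, not Simulink's implementation, and `has_algebraic_loop` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def has_algebraic_loop(edges: Iterable[Tuple[str, str, bool]]) -> bool:
    """Return True if the undelayed partition-to-partition edges form a cycle.

    edges: (source partition, destination partition, delayed) tuples.
    A delayed edge carries the previous step's value, so it cannot close
    a loop within a single step and is left out of the graph.
    """
    graph: Dict[str, List[str]] = {}
    for src, dst, delayed in edges:
        graph.setdefault(src, [])
        graph.setdefault(dst, [])
        if not delayed:
            graph[src].append(dst)

    # Depth-first search with three states: unvisited, on stack, finished.
    UNVISITED, ON_STACK, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}

    def visit(node: str) -> bool:
        state[node] = ON_STACK
        for nxt in graph[node]:
            if state[nxt] == ON_STACK:
                return True  # back edge: a direct-feedthrough cycle
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNVISITED and visit(n) for n in list(graph))
```

Marking the edge from P2 back to P1 as delayed, for example, turns a loop between two partitions into an acyclic graph, which corresponds to introducing a delay element in the loop.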

The following examples show how to implement different types of parallelism in Simulink. These examples contain models that are partitioned and mapped to a simple architecture with one CPU and one FPGA.

Implement Data Parallelism in Simulink

This example shows how to implement data parallelism in a Simulink model. The model consists of an input, a functional component that applies to each input, and a concatenated output.

Set up this model for concurrent execution. To see the completed model, open ex_data_parallelism_top.

  1. Convert areas in this model to referenced models. Use the same referenced model to replace each of the functional components that process the input. The figure shows a sample configuration.

  2. Open the Model Explorer.

  3. Expand the options for the top-level model and right-click the Configuration (Active), the active configuration set. Select Show Concurrent Execution options to see how the options for concurrent execution are set in the model configuration parameters.

  4. Close the Model Explorer and open the model configuration parameters. On the Code Generation > Interface pane, clear the MAT-file logging check box.

  5. On the Solver pane, set Type to Fixed-step and click Apply. Under Additional options, click Configure Tasks.

  6. In the Concurrent Execution dialog box, in the right pane, select the Enable explicit model partitioning for concurrent behavior check box. With explicit partitioning, you can partition your model manually.

  7. In the selection pane, select CPU. Click Add task four times to add four new tasks.

  8. In the selection pane, select Tasks and Mapping. On the Map block to tasks pane:

    • Under Block: Input, click select task and select Periodic: Task.

    • Under Block: Function 1, select Periodic: Task1.

    • Under Block: Function 2, select Periodic: Task2.

    • Under Block: Function 3, select Periodic: Task3.

    • Under Block: Output, select Periodic: Task.

    This maps your partitions to the tasks you created. The Input and Output model blocks are on one task. Each functional component is assigned a separate task.

  9. In the selection pane, select Data transfer. In the Data Transfer Options pane, set the parameter Periodic signals to Ensure deterministic transfer (minimum delay). Click Apply and close the Concurrent Execution dialog box.

  10. Apply these configuration parameters to all referenced models. For more information, see Share a Configuration for Multiple Models.

Update your model to see the tasks mapped to individual model blocks.
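The same data-parallel structure can be sketched outside Simulink in Python: split the input, apply one function to every piece concurrently, and concatenate the results in order. This mirrors the example, where the three Function blocks reference the same model. The function `process_chunk`, its arithmetic, and the three-way split are assumptions for illustration, and a thread pool stands in for Task1 through Task3. The sketch shows the structure, not a speedup: CPython threads do not execute pure-Python computation in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_chunk(chunk: List[float]) -> List[float]:
    # Hypothetical stand-in for the shared referenced model: every chunk
    # goes through the same computation.
    return [2 * x + 1 for x in chunk]


def data_parallel(signal: List[float], num_tasks: int = 3) -> List[float]:
    if not signal:
        return []
    # Split the input into up to num_tasks contiguous chunks (the Input block).
    size = -(-len(signal) // num_tasks)  # ceiling division
    chunks = [signal[i:i + size] for i in range(0, len(signal), size)]
    # Apply the same function to every chunk concurrently (Task1 to Task3).
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        parts = list(pool.map(process_chunk, chunks))
    # pool.map preserves input order, so the concatenation matches the
    # sequential result (the Output block).
    return [y for part in parts for y in part]
```

For example, `data_parallel([0, 1, 2, 3, 4, 5])` returns `[1, 3, 5, 7, 9, 11]`, the same result as applying `process_chunk` to the whole input at once.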

Implement Task Parallelism in Simulink

This example shows how to implement task parallelism in a Simulink model. The model consists of an input, functional components applied to the same input, and a concatenated output.

Set up this model for concurrent execution. To see the completed model, open ex_task_parallelism_top.

  1. Convert areas in this model to referenced models. Use the same referenced model to replace each of the functional components that process the input. The figure shows a sample configuration.

  2. Open the Model Explorer.

  3. Expand the options for the top-level model and right-click the Configuration (Active), the active configuration set. Select Show Concurrent Execution options to see how the options for concurrent execution are set in the model configuration parameters.

  4. Close the Model Explorer and open the model configuration parameters. On the Code Generation > Interface pane, clear the MAT-file logging check box.

  5. On the Solver pane, set Type to Fixed-step and click Apply. Under Additional options, click Configure Tasks.

  6. In the Concurrent Execution dialog box, in the right pane, select the Enable explicit model partitioning for concurrent behavior check box. With explicit partitioning, you can partition your model manually.

  7. In the selection pane, select CPU. Click Add task three times to add three new tasks.

  8. In the selection pane, select Tasks and Mapping. On the Map block to tasks pane:

    • Under Block: Input, click select task and select Periodic: Task.

    • Under Block: Function 1, select Periodic: Task1.

    • Under Block: Function 2, select Periodic: Task2.

    • Under Block: Output, select Periodic: Task.

    This maps your partitions to the tasks you created. The Input and Output model blocks are on one task. Each functional component is assigned a separate task.

  9. In the selection pane, select Data transfer. In the Data Transfer Options pane, set the parameter Periodic signals to Ensure deterministic transfer (minimum delay). Click Apply and close the Concurrent Execution dialog box.

  10. Apply these configuration parameters to all referenced models. For more information, see Share a Configuration for Multiple Models.

Update your model to see the tasks mapped to individual model blocks.
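For contrast with data parallelism, the following Python sketch applies two different functions to the same input concurrently and concatenates their outputs, which is the structure of the task-parallel model. The functions `function_1` and `function_2` are hypothetical stand-ins for the Function 1 and Function 2 referenced models, and the thread pool stands in for Task1 and Task2.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def function_1(signal: List[float]) -> List[float]:
    # Hypothetical first functional component.
    return [x + 1 for x in signal]


def function_2(signal: List[float]) -> List[float]:
    # Hypothetical second functional component.
    return [x * x for x in signal]


def task_parallel(signal: List[float]) -> List[float]:
    # Both components read the same input and do not depend on each other,
    # so they can run as separate tasks (Task1 and Task2).
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(function_1, signal)
        f2 = pool.submit(function_2, signal)
        # Wait for both results and concatenate them (the Output block).
        return f1.result() + f2.result()
```

For example, `task_parallel([1, 2, 3])` returns `[2, 3, 4, 1, 4, 9]`.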

Implement Pipelining in Simulink

This example shows how to implement pipelining in a Simulink model. The model consists of an input, functional components applied to the same input, and a concatenated output.

Set up this model for concurrent execution. To see the completed model, open ex_pipelining_top.

  1. Convert areas in this model to referenced models. Use the same referenced model to replace each of the functional components that process the input. The figure shows a sample configuration.

  2. Open the Model Explorer.

  3. Expand the options for the top-level model and right-click the Configuration (Active), the active configuration set. Select Show Concurrent Execution options to see how the options for concurrent execution are set in the model configuration parameters.

  4. Close the Model Explorer and open the model configuration parameters. On the Code Generation > Interface pane, clear the MAT-file logging check box.

  5. On the Solver pane, set Type to Fixed-step and click Apply. Under Additional options, click Configure Tasks.

  6. In the Concurrent Execution dialog box, in the right pane, select the Enable explicit model partitioning for concurrent behavior check box. With explicit partitioning, you can partition your model manually.

  7. In the selection pane, select CPU. Click Add task three times to add three new tasks.

  8. In the selection pane, select Tasks and Mapping. On the Map block to tasks pane:

    • Under Block: Input, click select task and select Periodic: Task.

    • Under Block: Function 1, select Periodic: Task1.

    • Under Block: Function 2, select Periodic: Task2.

    • Under Block: Output, select Periodic: Task.

    This maps your partitions to the tasks you created. The Input and Output model blocks are on one task. Each functional component is assigned a separate task.

  9. Close the Concurrent Execution dialog box.

  10. Apply these configuration parameters to all referenced models. For more information, see Share a Configuration for Multiple Models.

Update your model to see the tasks mapped to individual model blocks.
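Pipelining trades latency for concurrency: when each stage boundary holds a one-step delay, every stage in a step works only on values latched at the end of the previous step, so all stages can run at the same time. The following Python sketch models that behavior under the assumption of a one-step delay between consecutive stages; it is not the code Simulink generates, and `pipeline` and its parameters are hypothetical.

```python
from typing import Callable, List, Sequence


def pipeline(
    inputs: Sequence[float],
    stages: Sequence[Callable[[float], float]],
    initial_condition: float = 0,
) -> List[float]:
    """Run stages in series with a one-step delay between consecutive stages."""
    # One latch per stage boundary, holding the previous step's value.
    latches = [initial_condition] * (len(stages) - 1)
    outputs = []
    for u in inputs:
        # Stage 0 reads the new input; stage i reads what stage i-1
        # produced in the previous step. No stage waits for another
        # within a step, so all stages could run concurrently.
        stage_inputs = [u] + latches
        stage_outputs = [stage(x) for stage, x in zip(stages, stage_inputs)]
        # Latch intermediate results for the next step.
        latches = stage_outputs[:-1]
        outputs.append(stage_outputs[-1])
    return outputs
```

For example, `pipeline([1, 2, 3], [lambda x: x + 1, lambda x: 10 * x])` returns `[0, 20, 30]`, whereas running the two stages back to back gives `[20, 30, 40]`. With N stages, the output lags the sequential result by N - 1 steps, and the first outputs are derived from the initial condition.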

Ways to Partition

There are multiple ways to partition your model for concurrent execution in Simulink. Rate-based and model-based approaches give you primarily graphical means to represent concurrency for systems that are represented using Simulink and Stateflow® blocks. You can partition MATLAB® code using the MATLAB System block. You can also partition models of physical systems using multisolver methods.

Each method has additional considerations to help you decide which to use.

Goal: Increase the performance of a simulation on the host computer.

  • Valid partitioning methods: None of the listed methods.

  • Considerations: In general, Simulink tries to make the best use of the host computer performance regardless of the modeling method you use. For more information on the ways that Simulink helps you improve performance, see Performance.

Goal: Increase the performance of a plant simulation in a multicore HIL system.

  • Valid partitioning methods: Any of the partitioning methods, as well as their combinations.

  • Considerations: The processing characteristics of the HIL system and the embedded processing system can vary greatly. Consider partitioning your system into more units of work than there are processing elements in the HIL or embedded system. This convention allows flexibility in the mapping process.

Goal: Create a valid model of a multirate concurrent system to take advantage of a multicore processing system.

  • Valid partitioning methods: Any of the partitioning methods, as well as their combinations.

  • Considerations: Partitioning can introduce signal delays to represent the data transfer requirements for concurrent execution. For more information, see .

Goal: Create a valid model of a heterogeneous system to take advantage of multicore and FPGA processing.

  • Valid partitioning methods: For multicore processing, use any of the partitioning methods. For FPGA processing, partition your model using Model blocks.

  • Considerations: Consider partitioning for FPGA processing where your computations have bottlenecks that can benefit from fine-grain hardware parallelism.

Supported Targets

Supported Multicore Targets

You can build and download concurrent execution models for the following multicore targets using system target files:

  • Linux®, Windows®, and Mac OS using ert.tlc and grt.tlc

  • Simulink Real-Time™ using slrt.tlc and slrtert.tlc

  • Linux, Windows, and VxWorks® using idelink_ert.tlc, idelink_grt.tlc, and ert.tlc with the Code Generation > Target hardware parameter set to a value other than None

    Note:  

    • To build and download your model, you must have Simulink Coder™ software installed.

    • To build and download your model to a Simulink Real-Time system, you must have Simulink Real-Time software installed. You must also have a multicore target system supported by the Simulink Real-Time product.

    • Deploying to an embedded processor that runs the Linux or VxWorks operating system requires the Embedded Coder® product.

Supported Heterogeneous Targets

In addition to multicore targets, Simulink also supports building and downloading partitions of a model to heterogeneous targets that contain a multicore target and one or more field-programmable gate arrays (FPGAs).

To build and download to a heterogeneous target, use one of the multicore targets listed in Supported Multicore Targets, and then select the heterogeneous architecture using the Target architecture option on the Concurrent Execution pane of the Concurrent Execution dialog box:

  • Sample Architecture: Example architecture consisting of a single CPU with multiple cores and two FPGAs. You can use this architecture to model for concurrent execution.

  • Simulink Real-Time: Simulink Real-Time target containing FPGA boards.

  • Xilinx Zynq ZC702 evaluation kit: Xilinx® Zynq® ZC702 evaluation kit target.

  • Xilinx Zynq ZC706 evaluation kit: Xilinx Zynq ZC706 evaluation kit target.

  • Xilinx Zynq Zedboard: Xilinx Zynq ZedBoard™ target.

    Note:   Building HDL code and downloading it to FPGAs requires the HDL Coder™ product. You can generate HDL code if:

    • You have an HDL Coder license

    • You are building on Windows or Linux operating systems

    You cannot generate HDL code on Macintosh systems.

Simulation Limitations

The following limitations apply when partitioning a model for concurrent execution.

  • A partitioned model must consist entirely of Model blocks, MATLAB System blocks, and virtual connectivity blocks at the root level. The following are valid virtual connectivity blocks:

  • Configure the model to use the fixed-step solver.

  • Do not use the following modes of simulation for models in the concurrent execution environment:

    • External mode

    • Logging to MAT-files (Configuration Parameters > Interface > MAT-file logging check box selected). However, you can use the To Workspace and To File blocks.

  • If you are simulating your model using Rapid Accelerator mode, the top-level model cannot contain a root-level Inport block that outputs function calls.

  • In the Configuration Parameters dialog box, set the Diagnostics > Sample Time > Multitask conditionally executed subsystem and Diagnostics > Data Validity > Multitask data store parameters to error.

  • Use the model-level control to handle data transfer for rate transitions. Alternatively, if you use Rate Transition blocks:

    • Select the Ensure data integrity during data transfer check box.

    • Clear the Ensure deterministic data transfer (maximum delay) check box.
