How Simulink Solves Parallel and Multicore Processing Problems

Basics of Concurrent Execution

The purpose of concurrent execution is to help you create models of real-world concurrent systems, where parts of your model represent computations that can execute in parallel. These models allow you to take advantage of multicore processing power and FPGA hardware parallelism to increase the performance of an embedded system.

In general, use these concurrent execution modeling concepts for real-time system design. They do not help if you want to improve the performance of a Simulink® simulation on a non-real-time host computer. Simulink tries to optimize the usage of host computers regardless of the modeling pattern you use, and it provides a variety of techniques to improve simulation performance. For more information on these techniques, see Performance.

You can use these modeling concepts, described in Model Parallel Computations, to design a model for concurrent execution now or for use in future concurrent execution environments. This can help when:

  • You want to take advantage of multicore and FPGA processing now or at a future stage in your design process.

  • You want to take into account scalability so your models can take advantage of increasing numbers of cores and FPGA processing power from year to year. Simulink helps you achieve scalability through the process of partitioning and mapping.

    • Partitioning lets you designate regions of your model as units of work, independent of the details of the embedded multicore processing system. This independence lets you arrange the content and hierarchy of your model according to the physical, control, or signal processing system of interest.

    • Given a partitioned system, mapping is a separate tool that lets you assign the work designated within partitions to actual processing elements in the embedded processing system. The assignment occurs outside of the model content, so you do not need to modify the blocks or signal lines of your model. This capability lets you reuse your model as the number of cores and FPGAs increases.

For example, manually programming a multicore processor connected to an FPGA poses multiple challenges. Amongst these, you need to keep track of:

  • The threads that execute on the multicore processor of the embedded processing system

  • The data transfers to and from the FPGA

In contrast, partitioning provides a model-based approach in which you arrange the content and hierarchy of your model according to the needs of the physical, control, or signal processing system of interest. While creating your model content, you do not need to keep track of threads or of data transfers to and from these threads. Instead, you handle these details with the mapping tool, which provides a more natural interface to represent and manage the actual details of executing threads, HDL code on FPGAs, and the work that these threads or FPGAs perform.
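The separation between partitioning and mapping can be illustrated outside Simulink with a minimal Python sketch. Everything in it is an assumption made for illustration: the partition names and the `map_round_robin` helper are hypothetical and are not part of any Simulink API. The sketch only shows that the list of units of work stays unchanged while the assignment to cores varies.

```python
# Conceptual sketch only: not Simulink API. Partition names are hypothetical.

def map_round_robin(partitions, num_cores):
    """Assign each partition to a core index without modifying the partitions."""
    mapping = {core: [] for core in range(num_cores)}
    for index, name in enumerate(partitions):
        mapping[index % num_cores].append(name)
    return mapping

# Units of work designated once, independent of the target hardware.
partitions = ["sensor_filter", "controller", "plant_observer", "logger"]

# The same partition list is reused as the core count grows.
two_cores = map_round_robin(partitions, 2)
four_cores = map_round_robin(partitions, 4)

print(two_cores)
print(four_cores)
```

Only the mapping changes between the two calls. The partition list, which stands in for the model content, is the same in both.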

Model Parallel Computations

Partitioning methods help you designate areas of your model for concurrent computation. Partitioning lets you create model content independently of target processing details. For example, while creating model content, you should not need to keep track of how many cores are in your target system. Instead, select the partitioning methods that fit the way you create model content. Simulink gives you the flexibility to express the natural content and hierarchical needs of the modeled system without consideration for the target system.

The rate and model based approaches give you primarily graphical means to represent concurrency for systems that are represented using Simulink and Stateflow® blocks. You can partition MATLAB® code using the MATLAB System block. You can also partition models of physical systems using multisolver methods. The following summarizes these partitioning methods:

  • Rate based

    If your model contains multiple rates, you can designate each rate as a potentially concurrent computation. A rate-grouped partition can contain blocks that are located in different subsystems and models, so this capability does not impose any modeling constraints on the hierarchical representation of your modeled system.

  • Model based

    You can use Model blocks in addition to the rate based approach to refine rate groups into finer-grained computations. When using Model blocks to express partition boundaries, you can use each rate group in each root-level Model block to designate a potentially concurrent computation.

  • MATLAB System block based

    You can partition MATLAB code by dividing your logic into separate MATLAB System blocks located at the root level of your system. You can designate each root-level MATLAB System block as a potentially parallel unit of computation.

  • Physical

    You can design a model using Simscape™ blocks to represent physical parts of your model. In particular, you can use different solvers on different parts of the system. For more information, see Multiple Local Solvers Example with a Mixed Stiff-Nonstiff System.

    When using multiple solvers, you can designate the system of equations under each solver as a potentially parallel computation. To do this, designate the partition boundaries using a Model block and place a different solver into each Model block. The solver in each Model block designates a potentially parallel unit of computation.

Each method has additional considerations to help you decide which to use.

  • To: Increase the performance of a simulation on the host computer.

    Valid partitioning methods: None of the listed.

    Considerations: In general, Simulink tries to make the best use of the host computer performance regardless of the modeling method you use. For more information on the ways that Simulink helps you improve performance, see Performance.

  • To: Increase the performance of a plant simulation in a multicore HIL system.

    Valid partitioning methods: You can use any of the partitioning methods as well as their combinations.

    Considerations: The processing characteristics of the HIL system and the embedded processing system can vary greatly. Consider partitioning your system into more units of work than there are processing elements in the HIL or embedded system. This convention allows flexibility in the mapping process.

  • To: Create a valid model of a multirate concurrent system to take advantage of a multicore processing system.

    Valid partitioning methods: You can use any of the partitioning methods as well as their combinations.

    Considerations: Partitioning can introduce signal delays to represent the data transfer requirements for concurrent execution. For more information, see Handle Data Transfers.

  • To: Create a valid model of a heterogeneous system to take advantage of multicore and FPGA processing.

    Valid partitioning methods:

    • Multicore processing: Use any of the partitioning methods.

    • FPGA processing: Use the Model block based method.

    Considerations: Consider partitioning for FPGA processing where your computations have bottlenecks that can benefit from fine-grain hardware parallelism.
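The advice to create more units of work than there are processing elements can be illustrated with a small Python sketch. The costs are hypothetical, and the greedy longest-first heuristic is chosen only for illustration; it is not presented as the algorithm that any Simulink tool uses. The sketch compares the same total work split into two partitions and into six partitions.

```python
# Conceptual sketch only: costs are hypothetical and the greedy heuristic is
# an illustration, not the algorithm that any Simulink tool uses.
import heapq

def makespan(costs, num_elements):
    """Greedy longest-first assignment; returns the busiest element's load."""
    loads = [0.0] * num_elements
    heapq.heapify(loads)
    for cost in sorted(costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)

# Same total work (12 units) split two different ways.
coarse = [6.0, 6.0]                       # two partitions
fine = [3.0, 3.0, 2.0, 2.0, 1.0, 1.0]     # six partitions

for elements in (2, 3, 4):
    print(elements, makespan(coarse, elements), makespan(fine, elements))
```

With two processing elements both splits give the same result. With three or four, only the finer split can use the additional elements, which is the flexibility in mapping that the consideration refers to.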

Handle Problems that Arise from Parallelism

A parallel computation cannot execute faster than its longest sequence of dependent partitions, which must execute sequentially. To achieve scalable and efficient concurrency, track data dependencies between partitions. Data dependencies arise whenever a signal originates from one block in one partition and is connected to a block in another partition.
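The bound described above can be sketched in Python as a longest-path computation over the partition dependency graph. The partition names, costs, and dependencies are hypothetical, and the sketch assumes that enough processing elements are available so that only data dependencies limit execution.

```python
# Conceptual sketch only: partition names, costs, and dependencies are hypothetical.
from functools import lru_cache

# Execution time of each partition (arbitrary units).
cost = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 4.0}

# deps[p] lists the partitions whose output signals p consumes.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

@lru_cache(maxsize=None)
def finish_time(partition):
    """Earliest finish time of a partition given unlimited processing elements."""
    start = max((finish_time(d) for d in deps[partition]), default=0.0)
    return start + cost[partition]

critical_path_length = max(finish_time(p) for p in cost)
sequential_length = sum(cost.values())

print(critical_path_length)  # lower bound on parallel execution time
print(sequential_length)     # single-core execution time
```

In this example the chain A, B, D dominates, so adding processing elements cannot reduce the execution time below the length of that chain.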

Handle Data Transfers

To create opportunities for parallelism, Simulink provides multiple options for handling data transfers between concurrently executing partitions. These options help you trade off computational latency for numerical signal delays, as follows:

  • Option 1

    You want to:

    • Create opportunity for parallelism.

    • Produce numeric results that are repeatable with each run of the generated code.

    Action:

    • In the Data Transfer pane of the Concurrent Execution dialog box, select Ensure deterministic transfer (Maximum Delay) for either signal type.

    • To achieve this behavior, Simulink introduces signal delays, which can affect the numeric results. To compensate, you may need to specify an initial condition for these delay elements.

  • Option 2

    You want to:

    • Create opportunity for parallelism.

    • Reduce signal latency.

    Action:

    • In the Data Transfer pane of the Concurrent Execution dialog box, select Ensure data integrity only for either signal type.

    • Simulink generates code to operate with maximum responsiveness and data integrity. However, the implementation is interruptible, which can lead to loss of data during data transfer.

    • Use a deterministic execution schedule to achieve determinism in the deployment environment.

  • Option 3

    You want to:

    • Enforce data dependency.

    • Produce numeric results that are repeatable with each run of the generated code.

    Action:

    • In the Data Transfer pane of the Concurrent Execution dialog box, select Ensure deterministic transfer (Minimum Delay) for either signal type.

    • Simulink uses target specific synchronization primitives to synchronize data transfer.

For example, consider a control application in which a controller that reads sensory data at time T must produce the control signals to the actuator at time T+Δ.

  • If the sequential algorithm meets the timing deadlines, consider using option 3.

  • If the embedded system provides deterministic scheduling, consider using option 2.

  • Otherwise, use option 1 to create opportunities for parallelism by introducing signal delays.

The preceding options are model-level settings that apply to each signal that requires data transfer in the system. In addition to this model-level control, Simulink lets you override the data transfer setting for each individual signal. For more information, see Configuring Data Transfer Communications.
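The numeric trade-off between the delayed and the synchronized transfer options can be sketched in Python. The sketch assumes a delay of exactly one step and a hypothetical producer sequence. It models only what the consumer partition reads and does not reproduce the code that Simulink generates.

```python
# Conceptual sketch only: models the numeric effect of a one-step transfer delay.
# It does not reproduce the code that Simulink generates.

def run(producer_outputs, delayed, initial_condition=0.0):
    """Return what the consumer partition reads at each step."""
    reads = []
    held = initial_condition  # value stored in the delay element
    for value in producer_outputs:
        if delayed:
            # Consumer reads the value from the previous step, so it does not
            # have to wait for the producer in the current step.
            reads.append(held)
            held = value
        else:
            # Consumer waits for the producer and reads the current value.
            reads.append(value)
    return reads

producer = [1.0, 2.0, 3.0, 4.0]

max_delay_reads = run(producer, delayed=True, initial_condition=0.0)
min_delay_reads = run(producer, delayed=False)

print(max_delay_reads)
print(min_delay_reads)
```

In the delayed case the first value that the consumer reads is the initial condition, which is why the choice of initial condition for the delay element matters. In the synchronized case the numeric results match the sequential algorithm, at the cost of the consumer waiting for the producer.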

Algebraic Loops

When two or more partitions contain data dependencies in a cycle, an algebraic loop condition occurs. Simulink does not allow algebraic loops to occur across potentially parallel partitions because of the high cost of solving the loop using parallel algorithms.

In some cases, the algebraic loop may be artificial. For example, you might have an artificial algebraic loop because of Model block based partitioning. An algebraic loop involving Model blocks is artificial if removing the Model block partitioning eliminates the loop. Simulink provides an option to minimize the occurrence of artificial loops. In the Configuration Parameters dialog boxes for the models involved in the algebraic loop, select Model Referencing > Minimize algebraic loop occurrences.

Additionally, if the model is configured for the Generic Real-Time target (grt.tlc) or the Embedded Real-Time target (ert.tlc) in the Configuration Parameters dialog box, clear the Code Generation > Interface > Single output/update function check box.

If the algebraic loop is a true algebraic condition, you must either contain all the blocks in the loop in one Model partition, or eliminate the loop by introducing a delay element in the loop.
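The cycle condition and the effect of a delay element can be sketched in Python as a search for a cycle in the graph of direct dependencies between partitions. The partition names and connections are hypothetical, and the depth-first search is a generic technique used here for illustration, not a description of how Simulink detects algebraic loops.

```python
# Conceptual sketch only: partition names and signal connections are hypothetical.

def find_cycle(edges):
    """Return one cycle of partitions as a list, or None if there is none.

    edges maps each partition to the partitions that consume its outputs
    directly (that is, without a delay element on the signal).
    """
    visiting, done = set(), set()
    path = []

    def visit(node):
        visiting.add(node)
        path.append(node)
        for successor in edges.get(node, []):
            if successor in visiting:
                return path[path.index(successor):] + [successor]
            if successor not in done:
                cycle = visit(successor)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in edges:
        if start not in done:
            cycle = visit(start)
            if cycle:
                return cycle
    return None

# Controller feeds the plant, and the plant feeds the controller back directly.
direct = {"controller": ["plant"], "plant": ["controller"]}

# Same system after a delay element is placed on the feedback signal,
# which removes the direct dependency from plant to controller.
with_delay = {"controller": ["plant"], "plant": []}

print(find_cycle(direct))
print(find_cycle(with_delay))
```

The first graph contains a cycle across two partitions. Once the feedback signal passes through a delay, the direct dependency disappears and no cycle remains.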

Supported Multicore Targets

You can build and download concurrent execution models for the following multicore targets using system target files:

  • Linux®, Windows®, and Mac OS using ert.tlc and grt.tlc

  • Simulink Real-Time™ using slrt.tlc and slrtert.tlc

  • Linux, Windows, and VxWorks® using idelink_ert.tlc, idelink_grt.tlc, and ert.tlc with the Code Generation > Target hardware parameter set to a value other than None


    • To build and download your model, you must have Simulink Coder™ software installed.

    • To build and download your model to a Simulink Real-Time system, you must have Simulink Real-Time software installed. You must also have a multicore target system supported by the Simulink Real-Time product.

    • Deploying to an embedded processor that runs a Linux or VxWorks operating system requires the Embedded Coder® product.

Supported Heterogeneous Targets

In addition to multicore targets, Simulink also supports building and downloading partitions of a model to heterogeneous targets that contain a multicore target and one or more field-programmable gate arrays (FPGAs).

For building and downloading to the multicore part of a heterogeneous target, use one of the targets listed in Supported Multicore Targets. Select the heterogeneous architecture using the Target architecture option in the Concurrent Execution pane of the Concurrent Execution dialog box:


  • Sample Architecture: Example architecture consisting of a single CPU with multiple cores and two FPGAs. You can use this architecture to model for concurrent execution.

  • Simulink Real-Time: Simulink Real-Time target containing FPGA boards.

  • Xilinx Zynq ZC702 evaluation kit: Xilinx® Zynq® ZC702 evaluation kit target.

  • Xilinx Zynq ZC706 evaluation kit: Xilinx Zynq ZC706 evaluation kit target.

  • Xilinx Zynq ZedBoard: Xilinx Zynq ZedBoard™ target.

    Note:   Building HDL code and downloading it to FPGAs requires the HDL Coder™ product. You can generate HDL code if:

    • You have an HDL Coder license

    • You are building on Windows or Linux operating systems

    You cannot generate HDL code on Macintosh systems.

Helpful Terms

Task — Object that corresponds to a thread of execution on a target. From within the Simulink environment, you can specify tasks, configure their properties, and map Model blocks to them.

Trigger — Abstraction of operating system timers, signals, interrupts, and system events.

Aperiodic trigger — Event trigger that has no inherent periodicity, such as an interrupt. When multiple triggers coexist, the software assumes that they are asynchronous to each other.

Periodic trigger — Event trigger that recurs at a fixed period, such as an operating system timer. When multiple triggers coexist, the software assumes that they are asynchronous to each other.

Hardware node — Abstraction of an FPGA processing element.

Software node — Abstraction of a multicore processor (CPU).

Simulation Limitations

  • A partitioned model must consist entirely of Model blocks, MATLAB System blocks, and virtual connectivity blocks at the root-level. The following are valid virtual connectivity blocks:

  • Configure the model to use a fixed-step solver.

  • Do not use the following modes of simulation for models in the concurrent execution environment:

    • External mode

    • Logging to MAT-files (Configuration Parameters > Interface > MAT-file logging check box selected). However, you can use the To Workspace and To File blocks.

  • If you are simulating your model using Rapid Accelerator mode, the top-level model cannot contain a root-level Inport block that outputs function calls.

  • In the Configuration Parameters dialog box, set the Diagnostics > Sample Time > Multitask conditionally executed subsystem and Diagnostics > Data Validity > Multitask data store parameters to error.

  • In addition, either use the model-level control to handle data transfer for rate transitions or, if you use Rate Transition blocks:

    • Select the Ensure data integrity during data transfer check box.

    • Clear the Ensure deterministic data transfer (maximum delay) check box.
