Using the process of partitioning, mapping, and profiling in Simulink®, you can address common challenges of designing systems for concurrent execution.
Partitioning enables you to designate regions of your model as tasks, independent of the details of the embedded multicore processing hardware. This independence enables you to arrange the content and hierarchy of your model to best suit the needs of your application.
In a partitioned system, mapping enables you to assign partitions to processing elements in your embedded processing system. Use the Simulink mapping tool to represent and manage the details of executing threads, HDL code on FPGAs, and the work that these threads or FPGAs perform. While creating your model, you do not need to track the partitions or data transfer between them because the tool does this work. Also, you can reuse your model across multiple architectures.
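The idea that the mapping, not the model, determines where data must move between processing elements can be illustrated with a minimal Python sketch. The partition names (`Sensor`, `Filter`, `Controller`, `Actuator`), the processing-element names (`FPGA`, `CPU0`, `CPU1`), and the helper `cross_element_transfers` are all hypothetical. This is not the Simulink mapping tool or its API. The connections between partitions stay fixed, and only the mapping changes. Each connection whose endpoints land on different processing elements becomes a transfer that the tool would have to manage.

```python
from typing import Dict, List, Tuple

def cross_element_transfers(
    connections: List[Tuple[str, str]],
    mapping: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Return (src, dst, src_pe, dst_pe) for every connection whose two
    partitions are mapped to different processing elements."""
    transfers = []
    for src, dst in connections:
        src_pe, dst_pe = mapping[src], mapping[dst]
        if src_pe != dst_pe:
            # Data crosses a processing-element boundary and must be transferred.
            transfers.append((src, dst, src_pe, dst_pe))
    return transfers

# Hypothetical partitioned model: the connections do not depend on the hardware.
connections = [("Sensor", "Filter"), ("Filter", "Controller"), ("Controller", "Actuator")]

# Two hypothetical mappings of the same partitions onto different architectures.
mapping_1 = {"Sensor": "FPGA", "Filter": "FPGA", "Controller": "CPU0", "Actuator": "CPU0"}
mapping_2 = {"Sensor": "CPU0", "Filter": "CPU1", "Controller": "CPU1", "Actuator": "CPU0"}

print(cross_element_transfers(connections, mapping_1))
# [('Filter', 'Controller', 'FPGA', 'CPU0')]
print(cross_element_transfers(connections, mapping_2))
```

Reusing the same `connections` with a different mapping mirrors the way a model can be reused across architectures.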
Profiling simulates deployment of your application under typical computational loads. It enables you to determine the partitioning and mapping for your model that gives the best performance, before you deploy to your hardware.
To deploy your model to the target:
1. Set up your model for concurrent execution.
   - If you are using a desktop target, configure your model. For more information, see Configure Your Model for Concurrent Execution.
   - If you are not using a desktop target, configure your model, specify a target, and explicitly partition your model. You may also want to change the default mapping of blocks to tasks in explicit partitioning. To set up your model for concurrent execution, see Configure Your Model for Concurrent Execution. To specify the target architecture, see Specify a Target Architecture. For explicit partitioning of a model, see Partition Your Model Using Explicit Partitioning.
2. Generate code and deploy it to your target. You can choose to deploy onto multiple targets.
   - To build and deploy on a desktop target, see Build on Desktop.
   - To deploy onto embedded targets using Embedded Coder®, see Deployment.
   - To deploy onto FPGAs using HDL Coder™, see Deployment.
   - To build and deploy on a real-time target using Simulink Real-Time™, see Standalone Operation.
3. Optimize your design. This optional step involves iterating over the design and mapping of your model to get the best performance, based on your metrics. One way to evaluate your model is to profile it and get execution times. For more information on profiling, see the topic for your target:
|Target|Profiling topic|
|---|---|
|Desktop target|Profile and Evaluate on a Desktop|
|Simulink Real-Time|Execution Profiling for Real-Time Applications|
|Embedded Coder|Perform Execution-Time Profiling for IDE and Toolchain Targets|
|HDL Coder|Speed and Area Optimization|
Manually programming your application for concurrent execution poses challenges beyond the typical challenges of manual coding. With Simulink, you can overcome the challenges of portability across multiple architectures, efficiency of deployment for an architecture, and cyclic data dependencies between application components. For more information on these challenges, see Challenges in Multicore Programming.
Simulink enables you to determine the content and hierarchical needs of the modeled system without considering the target system. While creating model content, you do not need to keep track of the number of cores in your target system. Instead, you select the partitioning methods that enable you to create model content. Simulink generates code for the architecture you specify.
You can select an architecture from the supported architectures or add a custom architecture. When you change the architecture, Simulink generates only the code that needs to change for the new architecture. The new architecture reuses blocks and functions. For more information, see Supported Targets For Multicore Programming and Specify a Target Architecture.
To improve the performance of the deployed application, Simulink allows you to simulate it under typical computational loads and try multiple configurations of partitioning and mapping the application. Simulink compares the performance of each of these configurations to provide the optimal configuration for deployment. This is known as profiling. Profiling helps you to determine the optimum partition configuration before you deploy your system to the desired hardware.
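The "try several configurations and compare them" idea behind profiling can be shown with a minimal Python sketch. It makes several simplifying assumptions, and it is not how Simulink profiling is implemented. The tasks are assumed to be independent, each core is assumed to run its tasks serially, and the per-task execution times are invented placeholder numbers rather than real measurements. The hypothetical helper `best_mapping` evaluates every task-to-core assignment exhaustively. It then keeps the assignment with the smallest per-step time, which is the load on the busiest core.

```python
import itertools
from typing import Dict, Tuple

def best_mapping(exec_times: Dict[str, float],
                 cores: Tuple[str, ...]) -> Tuple[Dict[str, str], float]:
    """Exhaustively try every task-to-core assignment and return the one
    with the smallest per-step time (the load on the busiest core)."""
    tasks = sorted(exec_times)
    best: Dict[str, str] = {}
    best_time = float("inf")
    for assignment in itertools.product(cores, repeat=len(tasks)):
        load = {core: 0.0 for core in cores}
        for task, core in zip(tasks, assignment):
            load[core] += exec_times[task]  # tasks on one core run back to back
        step_time = max(load.values())     # the busiest core bounds the step
        if step_time < best_time:
            best, best_time = dict(zip(tasks, assignment)), step_time
    return best, best_time

# Hypothetical per-task execution times (ms) standing in for profiled values.
times = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}
mapping, step = best_mapping(times, ("Core0", "Core1"))
print(mapping, step)
```

Exhaustive search grows exponentially with the number of tasks, so it only suits tiny examples like this one. The point is the compare-and-select step, not the search strategy.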
You can create a mapping for your application in which Simulink maps the application components across different processing nodes. You can also manually assign components to processing nodes. For any mapping, you can see the data dependencies between components and remap accordingly. You can also introduce and remove data dependencies between different components.
Some tasks of a system depend on the output of other tasks. The data dependencies between tasks determine their processing order. When two or more partitions have data dependencies that form a cycle, they create a data dependency loop, also known as an algebraic loop. Simulink does not allow algebraic loops to occur across potentially parallel partitions because of the high cost of solving the loop using parallel algorithms.
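The way data dependencies fix a processing order, and the way a cycle makes such an order impossible, can be shown with a minimal Python sketch. It uses Kahn's topological sort. Task names are hypothetical, and this is an illustration of the general technique, not Simulink's internal scheduler. A dependency `(producer, consumer)` means that the consumer needs the producer's output in the same step. If some tasks are never released, the dependencies contain a cycle, which is the data dependency loop described above.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

def processing_order(tasks: List[str],
                     deps: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Return an order in which every task runs after the tasks it depends on,
    or None if the dependencies form a cycle (an algebraic loop)."""
    indegree = {t: 0 for t in tasks}
    consumers: Dict[str, List[str]] = {t: [] for t in tasks}
    for producer, consumer in deps:
        consumers[producer].append(consumer)
        indegree[consumer] += 1
    # Start with tasks that depend on nothing.
    ready = deque(t for t in tasks if indegree[t] == 0)
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for consumer in consumers[task]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:  # all of its inputs are now available
                ready.append(consumer)
    # Tasks left over are stuck waiting on each other: a dependency cycle.
    return order if len(order) == len(tasks) else None

print(processing_order(["A", "B", "C"], [("A", "B"), ("B", "C")]))  # ['A', 'B', 'C']
print(processing_order(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")]))  # None
```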
In some cases, the algebraic loop is artificial. For example, Model-block-based partitioning can introduce an artificial algebraic loop. An algebraic loop involving Model blocks is artificial if removing the use of Model partitioning eliminates the loop. You can minimize the occurrence of artificial loops: in the Configuration Parameters dialog boxes for the models involved in the algebraic loop, select Model Referencing > Minimize algebraic loop occurrences.
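A minimal Python sketch shows why a loop through a Model block can be artificial. It uses a hypothetical Model block `M1` and a hypothetical block `Gain`, and it is not Simulink's actual loop analysis. Inside `M1`, `out1` depends only on `in1`, and `out2` depends only on `in2`. `Gain` reads `out1` and drives `in2`. If `M1` is treated as one opaque node whose outputs depend on all of its inputs, the graph appears to contain a cycle. At the port level, the cycle disappears.

```python
from typing import Dict, List

def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Depth-first-search cycle check on a directed graph of adjacency lists."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    for targets in graph.values():
        for target in targets:
            color.setdefault(target, WHITE)

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:  # back edge: we returned to the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(color))

# Coarse view: M1 is a single node, so M1 -> Gain -> M1 looks like a loop.
coarse = {"M1": ["Gain"], "Gain": ["M1"]}

# Port-level view: out1 depends only on in1, and out2 depends only on in2.
port_level = {
    "M1.in1": ["M1.out1"],
    "M1.out1": ["Gain"],
    "Gain": ["M1.in2"],
    "M1.in2": ["M1.out2"],
}

print(has_cycle(coarse), has_cycle(port_level))
```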
Additionally, if the model is configured for the Generic Real-Time target (grt.tlc) or the Embedded Real-Time target (ert.tlc) in the Configuration Parameters dialog box, clear the All Parameters > Single output/update function check box.
If the algebraic loop is a true algebraic condition, you must either contain all the blocks in the loop in one Model partition, or eliminate the loop by introducing a delay element in the loop.
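A minimal Python sketch shows how a delay element breaks a true algebraic loop. It uses two hypothetical partitions, A and B, with arbitrary gains of 0.5 and 2.0 chosen only for illustration. Without the delay, A computes `a = u - 0.5*b` and B computes `b = 2*a` within the same step. These two equations would have to be solved simultaneously (here, `a = u/2`), which is an algebraic loop. With a unit delay on the feedback path, A reads B's output from the previous step. A and B can then be computed one after the other with no simultaneous solve.

```python
from typing import List, Tuple

def simulate(steps: int, u: List[float],
             initial_delay_state: float = 0.0) -> List[Tuple[float, float]]:
    """Two hypothetical partitions in a feedback loop that a unit delay breaks."""
    delay_state = initial_delay_state  # delay output: B's value from step k-1
    outputs = []
    for k in range(steps):
        a = u[k] - 0.5 * delay_state  # partition A reads only the delayed value
        b = 2.0 * a                   # partition B reads A's current output
        outputs.append((a, b))
        delay_state = b               # B[k] becomes visible to A at step k+1
    return outputs

result = simulate(3, [1.0, 1.0, 1.0])
print(result)
```

The delay changes the system's dynamics (here, the outputs oscillate instead of settling at `a = 0.5`). Whether that is acceptable is a design decision, not a free transformation.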
The following examples show how to implement different types of parallelism in Simulink. These examples contain models that are partitioned and mapped to a simple architecture with one CPU and one FPGA.