Optimize and Deploy on a Multicore Target

This topic shows how to take a model that is configured for concurrent execution with explicit partitioning and deploy it onto a target. To set up your model for concurrent execution, see Configure Your Model for Concurrent Execution. To specify the target architecture, see Specify a Target Architecture. To use explicit partitioning in a model that is set up for concurrent execution, see Partition Your Model Using Explicit Partitioning.

Generate Code

To generate code for a model that is configured for concurrent execution, on the Apps tab of the Simulink® editor, select Simulink Coder. On the C Code tab, select Build. The resulting code includes:

C code for parts of the model that are mapped to tasks and triggers in the Concurrent Execution dialog box. C code generation requires a Simulink Coder™ license. For more information, see Code Generation (Simulink Coder) and Code Generation (Embedded Coder).
HDL code for parts of the model that are mapped to hardware nodes in the Concurrent Execution dialog box. HDL code generation requires an HDL Coder™ license. For more information, see HDL Code Generation from Simulink (HDL Coder).
Code to handle data transfer between the concurrent tasks and triggers and to interface with the hardware and software components.

The generated C code contains one function for each task or trigger defined in the system. The task and trigger names determine the name of the function:

void <TriggerName>_TaskName(void);

The content for each such function consists of target-independent C code, except for:

Code corresponding to blocks that implement target-specific functionality
Customizations, including those derived from custom storage classes (see Organize Parameter Data into a Structure by Using Struct Storage Class (Embedded Coder)) or Code Replacement Libraries (Simulink Coder)
Code that is generated to handle how data is transferred between tasks. In particular, Simulink Coder uses target-specific implementations of mutual exclusion primitives and data synchronization semaphores to implement the data transfer as described in the following table of pseudocode.

Data Transfer	Initialization	Reader	Writer
Data Integrity Only	BufferIndex = 0; Initialize Buffer[1] with IC	Begin mutual exclusion; Tmp = 1 - BufferIndex; End mutual exclusion; Read Buffer[ Tmp ];	Write Buffer[ BufferIndex ]; Begin mutual exclusion; BufferIndex = 1 - BufferIndex; End mutual exclusion
Ensure Determinism (Maximum Delay)	WriterIndex = 0; ReaderIndex = 1; Initialize Buffer[1] with IC	Read Buffer[ ReaderIndex ]; ReaderIndex = 1 - ReaderIndex;	Write Buffer[ WriterIndex ]; WriterIndex = 1 - WriterIndex;
Ensure Determinism (Minimum Delay)	N/A	Wait dataReady; Read data; Post readDone;	Wait readDone; Write data; Post dataReady;
Data Integrity Only C-HDL interface	The Simulink Coder and HDL Coder products both take advantage of target-specific communication implementations and devices to handle the data transfer between hardware and software components.
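
As a rough illustration of the Data Integrity Only row, the following C sketch implements the double-buffer pseudocode with a POSIX mutex. The names Transfer, transfer_init, transfer_write, and transfer_read are hypothetical and do not correspond to identifiers in the generated code. The double payload type and the single-writer, single-reader arrangement are also assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical double-buffered transfer between one writer task
   and one reader task (not an identifier from generated code). */
typedef struct {
    double buffer[2];
    int buffer_index;        /* slot the writer fills next */
    pthread_mutex_t lock;    /* protects buffer_index only */
} Transfer;

/* Initialization column: BufferIndex = 0; Buffer[1] holds the
   initial condition (IC). */
int transfer_init(Transfer *t, double initial_condition)
{
    assert(t != NULL);
    t->buffer_index = 0;
    t->buffer[0] = 0.0;
    t->buffer[1] = initial_condition;
    return pthread_mutex_init(&t->lock, NULL);
}

/* Writer column: write the current slot, then flip the index
   inside the mutual exclusion region. */
void transfer_write(Transfer *t, double value)
{
    t->buffer[t->buffer_index] = value;
    pthread_mutex_lock(&t->lock);
    t->buffer_index = 1 - t->buffer_index;
    pthread_mutex_unlock(&t->lock);
}

/* Reader column: select the slot the writer is not filling,
   then read it outside the mutual exclusion region. */
double transfer_read(Transfer *t)
{
    int tmp;
    pthread_mutex_lock(&t->lock);
    tmp = 1 - t->buffer_index;
    pthread_mutex_unlock(&t->lock);
    return t->buffer[tmp];
}
```

Before the first write, transfer_read returns the initial condition. After each write, it returns the most recently completed value. Only the index flip is protected, which keeps the critical section short regardless of the payload size.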

The generated HDL code contains one HDL project for each hardware node.

Build on Desktop

Simulink Coder and Embedded Coder® provide an example target, known as the native threads example, that generates code for Windows®, Linux®, and macOS operating systems. Use it to deploy your model to a desktop target. The desktop may not be your final target, but it can help you profile and optimize your model before you deploy it on another target.

If you have specified an Embedded Coder target, make the following changes in the Configuration Parameters dialog box.

Select the Code Generation > Templates > Generate an example main program check box.
From the Code Generation > Templates > Target Operating System list, select NativeThreadsExample.
Click OK to save your changes and close the Configuration Parameters dialog box.
Apply these settings to all referenced models in your model.

After you set up your model, press Ctrl+B to build and deploy it to your desktop. The native threads example illustrates how Simulink Coder and Embedded Coder use target-specific threading APIs and data management primitives, as shown in Threading APIs Used by Native Threads Example. The data transfer between concurrently executing tasks behaves as described in Data Transfer Options. The coder products use the APIs on supported targets for this behavior, as described in Data Protection and Synchronization APIs Used by Native Threads Example.

Threading APIs Used by Native Threads Example

Aspect of Concurrent Execution	Linux Implementation	Windows Implementation	macOS Implementation
Periodic triggering event	POSIX timer	Windows timer	Not applicable
Aperiodic triggering event	POSIX real-time signal	Windows event	POSIX non-real-time signal
Aperiodic trigger	For blocks mapped to an aperiodic task: thread waiting for a signal For blocks mapped to an aperiodic trigger: signal action	Thread waiting for an event	For blocks mapped to an aperiodic task: thread waiting for a signal For blocks mapped to an aperiodic trigger: signal action
Threads	POSIX®	Windows	POSIX
Threads priority	Assigned based on sample time: fastest task has highest priority	Priority class inherited from the parent process. Assigned based on sample time: fastest task has highest priority for the first three fastest tasks. The rest of the tasks share the lowest priority.	Assigned based on sample time: fastest task has highest priority
Example of overrun detection	Yes	Yes	No
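
The thread-priority row can be read as a rate-monotonic rule. The sketch below is one hypothetical way to express it in C, not the native threads example's actual code. Tasks sorted from fastest to slowest sample time receive decreasing priorities, and an optional cap models one reading of the Windows behavior, in which the three fastest tasks get their own levels and the remaining tasks share a single lowest level. The function name assign_priorities, the numeric priority scale, and the distinct_levels parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assign priorities to n tasks whose sample times are sorted in
 * ascending order (fastest first). A larger number means a higher
 * priority. The first distinct_levels tasks each get their own
 * level; any remaining tasks share one level below those.
 * Pass distinct_levels = n for no sharing.
 */
void assign_priorities(const double *sample_times, size_t n,
                       int highest, size_t distinct_levels,
                       int *priorities)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        size_t level;
        /* The input must already be ordered fastest to slowest. */
        assert(i == 0 || sample_times[i - 1] <= sample_times[i]);
        level = (i < distinct_levels) ? i : distinct_levels;
        priorities[i] = highest - (int)level;
    }
}
```

For five tasks with a highest priority of 10, passing distinct_levels = 5 yields 10, 9, 8, 7, 6, and passing distinct_levels = 3 yields 10, 9, 8, 7, 7.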

Data Protection and Synchronization APIs Used by Native Threads Example

API	Linux Implementation	Windows Implementation	macOS Implementation
Data protection API	`pthread_mutex_init` `pthread_mutex_destroy` `pthread_mutex_lock` `pthread_mutex_unlock`	`CreateMutex` `CloseHandle` `WaitForSingleObject` `ReleaseMutex`	`pthread_mutex_init` `pthread_mutex_destroy` `pthread_mutex_lock` `pthread_mutex_unlock`
Synchronization API	`sem_init` `sem_destroy` `sem_wait` `sem_post`	`CreateSemaphore` `CloseHandle` `WaitForSingleObject` `ReleaseSemaphore`	`sem_open` `sem_unlink` `sem_wait` `sem_post`
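
To connect the Linux synchronization calls in this table with the Ensure Determinism (Minimum Delay) pseudocode, here is a minimal C sketch of the reader and writer handshake using unnamed POSIX semaphores. The Channel type and the channel_* function names are hypothetical. The initial semaphore counts are an assumption, because the pseudocode table lists initialization as N/A: read_done starts at 1 so that the first write can proceed, and data_ready starts at 0 so that the reader waits for that write.

```c
#include <assert.h>
#include <stddef.h>
#include <semaphore.h>

/* Hypothetical channel for the minimum-delay handshake. */
typedef struct {
    double data;
    sem_t data_ready;   /* posted by the writer after each write */
    sem_t read_done;    /* posted by the reader after each read */
} Channel;

/* Assumed initial counts: data_ready = 0, read_done = 1. */
int channel_init(Channel *c)
{
    assert(c != NULL);
    if (sem_init(&c->data_ready, 0, 0) != 0) {
        return -1;
    }
    if (sem_init(&c->read_done, 0, 1) != 0) {
        sem_destroy(&c->data_ready);
        return -1;
    }
    c->data = 0.0;
    return 0;
}

/* Writer: Wait readDone; Write data; Post dataReady. */
void channel_write(Channel *c, double value)
{
    sem_wait(&c->read_done);
    c->data = value;
    sem_post(&c->data_ready);
}

/* Reader: Wait dataReady; Read data; Post readDone. */
double channel_read(Channel *c)
{
    double value;
    sem_wait(&c->data_ready);
    value = c->data;
    sem_post(&c->read_done);
    return value;
}

void channel_destroy(Channel *c)
{
    sem_destroy(&c->data_ready);
    sem_destroy(&c->read_done);
}
```

With these counts, writes and reads strictly alternate: a second write blocks until the reader has consumed the first value. The sketch ignores sem_wait interruptions by signals, which production code would need to handle. It also relies on unnamed semaphores, which is consistent with the table listing sem_open rather than sem_init for macOS.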

Profile and Evaluate Explicitly Partitioned Models on a Desktop

Profile the execution of your code on the multicore target using the Profile Report pane of the Concurrent Execution dialog box. You can profile using Simulink Coder (GRT) and Embedded Coder (ERT) targets. Profiling helps you identify the areas in your model that are execution bottlenecks. You can analyze the execution time of each task and find the task that takes most of the execution time. For example, you can compare the average execution times of the tasks. If a task is computation intensive, or does not satisfy real-time requirements and overruns, you can break it into tasks that are less computation intensive and that can run concurrently.

When you generate a profile report, the software:

Builds the model.
Generates code for the model.
Adds tooling to the generated code to collect data.
Executes the generated code on the target and collects data.
Collates the data, generates an HTML file (model_name_ProfileReport.html) in the current folder, and displays that HTML file in the Profile Report pane of the Concurrent Execution dialog box.
Note
If an HTML profile report exists for the model, the Profile Report pane displays that file. To generate a new profile report, click .
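
The data collection and comparison steps can be pictured with a small C sketch. It is not the tooling that the software inserts into the generated code. It only shows the general idea of timing each task step, averaging the samples, and picking the task with the largest average. All names (TaskProfile, profile_step, average_us, slowest_task) are hypothetical, and the use of clock_gettime with CLOCK_MONOTONIC is an assumption suited to POSIX desktops.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-task accumulator for execution-time samples. */
typedef struct {
    double total_us;   /* sum of measured execution times, microseconds */
    size_t samples;    /* number of measured time steps */
} TaskProfile;

/* Elapsed time between two timestamps, in microseconds. */
double elapsed_us(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e6
         + (double)(end.tv_nsec - start.tv_nsec) / 1e3;
}

/* Run one step of a task function and record how long it took. */
void profile_step(TaskProfile *p, void (*task_step)(void))
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    task_step();
    clock_gettime(CLOCK_MONOTONIC, &end);
    p->total_us += elapsed_us(start, end);
    p->samples += 1;
}

/* Average execution time per step; 0 if nothing was measured. */
double average_us(const TaskProfile *p)
{
    return p->samples ? p->total_us / (double)p->samples : 0.0;
}

/* Index of the task with the largest average execution time. */
size_t slowest_task(const TaskProfile *profiles, size_t n)
{
    size_t i, worst = 0;
    assert(n > 0);
    for (i = 1; i < n; ++i) {
        if (average_us(&profiles[i]) > average_us(&profiles[worst])) {
            worst = i;
        }
    }
    return worst;
}
```

A task that slowest_task singles out is a natural candidate for splitting into smaller tasks that can run concurrently.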

Section Description

Section	Description
Summary	Summarizes model execution statistics, such as total execution time and profile report creation time. It also lists the total number of cores on the host machine.
Task Execution Time	Displays the execution time, in microseconds, for each task in a pie chart color coded by task. Visible for Windows, Linux, and macOS platforms.
Task Affinitization to Processor Cores	Platform-dependent. For each time step and task, Simulink displays the processor core number the task started executing on at that time step, color coded by processor. If there is no task scheduled for a particular time step, `NR` is displayed. Visible for Windows and Linux platforms.

After you analyze the profile report, consider changing the mapping of Model blocks to efficiently use the concurrency available on your multicore system (see Map Blocks to Tasks, Triggers, and Nodes).

Generate Profile Report

This topic assumes a previously configured model ready to be profiled for concurrent execution. For more information, see Configure Your Model for Concurrent Execution.

In the Concurrent Execution dialog box, click the Profile report node.
The profile tool looks for a file named model_name_ProfileReport.html. If such a file does not exist for the current model, the Profile Report pane displays the following.
Note
If an HTML profile report exists for the model, the Profile Report pane displays that file. To generate a new profile report, click .
Enter the number of time steps for which you want the profiler to collect data for the model execution.
Click the Generate task execution profile report button.
This action builds the model, generates code, adds data collection tooling to the code, and executes it on the target, which also generates an HTML profile report. This process can take several minutes. When the process is complete, the contents of the profile report appear in the Profile Report pane. For example:
The profiling report shows the summary, the execution time for each task, and the mapping of each task to processor cores. In this example, tasks 1 and 2 run on core 0, whereas tasks 3 and 4 run on core 1. The Task Execution Time section of the report indicates that task 1 and task 3 take the most time to run. Note that the period of task 3 is twice that of tasks 1 and 2, and the period of task 4 is twice that of task 3.
Analyze the profile report. Create and modify your model or task mapping if needed, and regenerate the profile report.

Generate Profile Report at Command Line

Alternatively, you can generate a profile report for a model configured for concurrent execution at the command line by using the Simulink.architecture.profile function.

For example, to create a profile report for the model slexMulticoreSolverExample:

openExample('slexMulticoreSolverExample');
Simulink.architecture.profile('slexMulticoreSolverExample');

To create a profile report with a specific number of samples (120 in this example) for the model slexMulticoreSolverExample:

Simulink.architecture.profile('slexMulticoreSolverExample',120);

The function creates a profile report named slexMulticoreSolverExample_ProfileReport.html in your current folder.

Customize the Generated C Code

The generated code is suitable for many different applications and development environments. To meet your needs, you can customize the generated C code as described in Code and Tool Customization (Embedded Coder). In addition to those customization capabilities, for multicore and heterogeneous targets you can further customize the generated code as follows:

You can register your preferred implementation of mutual exclusion and data synchronization primitives using the code replacement library.
You can define a custom target architecture file that allows you to specify target-specific properties for tasks and triggers in the Concurrent Execution dialog box. For more information, see Define a Custom Architecture File.

Related Examples

Assign Tasks to Cores for Multicore Programming

More About

Multicore Programming with Simulink