Sample Time and Throughput in Real-Time Applications

After you design and debug the functionality of your model in Simulink®, test and debug it as a real-time application. While testing your real-time application, you can encounter performance issues.

Real-Time Performance Factors

Real-time performance consists of sample time and throughput.

Sample time refers to the time during which the real-time application reads data into blocks and processes it. Physical systems have an inherent sample time (the Nyquist sample time) that is based on physical constraints. For example, when you use the brakes in a truck, the inertia of the truck limits how fast the road speed can change. A significant change requires about a second. Therefore, the speedometer does not need to sample the road speed more often than every tenth of a second.

If the data changes significantly between samples taken at the inherent sample time, sample times longer than the inherent sample time can miss those changes. If the data includes undesirable noise, sample times shorter than the inherent sample time can capture that noise.
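As a rough illustration of how an overly long sample time can miss changes, the following sketch samples a hypothetical 1 Hz sinusoid (an assumed stand-in for a quantity that changes significantly in about one second) at two sample times. The signal, durations, and variable names are assumptions chosen for illustration and are not part of the model in this topic.

```matlab
% Hypothetical signal: a 1 Hz sinusoid standing in for a quantity that
% changes significantly in about one second (assumption for illustration).
f_signal = 1;                        % signal frequency, Hz
Ts_fast  = 0.1;                      % sample every tenth of a second
Ts_slow  = 1.0;                      % sample once per second (too slow)

t_fast = 0:Ts_fast:4;                % sample instants over four seconds
t_slow = 0:Ts_slow:4;

x_fast = sin(2*pi*f_signal*t_fast);  % samples follow the oscillation
x_slow = sin(2*pi*f_signal*t_slow);  % every sample lands near a zero crossing

% Peak-to-peak range seen by each sampler.
range_fast = max(x_fast) - min(x_fast);
range_slow = max(x_slow) - min(x_slow);

fprintf('Ts = %.1f s sees a range of %.2f\n', Ts_fast, range_fast);
fprintf('Ts = %.1f s sees a range of %.2f\n', Ts_slow, range_slow);
```

With the longer sample time, the sampled data looks almost constant even though the underlying signal swings over its full range.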

Throughput refers to how much data the real-time application can process without a CPU overload in a given sample time. Throughput is limited by the resources that are available from the target computer. Sample times that are too short can overload the target computer CPU and stop execution.
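The relationship between sample time and CPU overload can be sketched with simple arithmetic. The sketch below assumes a single task with a constant, hypothetical execution time per step and flags an overload when the step cannot finish before the next one begins. The numbers are illustrative and are not measurements from a target computer.

```matlab
exec_time     = 5.0e-4;                   % hypothetical execution time per step, s
Ts_candidates = [2.5e-4 5.0e-4 1.0e-3];   % candidate sample times, s

utilization = exec_time ./ Ts_candidates; % fraction of each step spent computing
overloads   = utilization >= 1;           % true where the step cannot finish in time

for k = 1:numel(Ts_candidates)
    fprintf('Ts = %.1e s: utilization %.0f%%, overload = %d\n', ...
        Ts_candidates(k), 100*utilization(k), overloads(k));
end
```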

Resources

The target computer system resources that affect a real-time application include:

  • CPU cycles available on multicore systems

  • Target computer RAM access speed

  • RAM available for RAM disk

  • Backplane I/O channel bandwidth and latency

  • Disk storage bandwidth and latency

A multicore target computer can improve throughput and sample time. A multicore computer contains multiple CPUs, or cores, that share the processing load. In a four-core target computer, for example, the following tasks can happen simultaneously on different cores:

  • Execute a referenced model

  • Acquire data through an I/O channel

  • Log results to a RAM disk

  • Communicate with the development computer

The strategy that you use to improve throughput depends on your application system requirements.

Application System Requirement | Hardware Capabilities | Modeling Style | Available Tools
Heavy sensor and effector I/O | Fast I/O channels | - | Simulink Real-Time™ profiler
Heavy real-time computation | Additional multicore processors; faster multicore processors; faster RAM speed | Polling mode | Simulink Performance Advisor; minimum sample time function
Reference models with different inherent rates | Multicore processors | Rate transition blocks; trade off deterministic data transfer for data transfer speed; concurrent execution options; reference model task mapping | Simulink Performance Advisor
Real-time applications connected by network | Multiple target computers; fast network switches | Multiple real-time applications that use network blocks for communication | Simulink Real-Time profiler; network analyzer
Data logging | Large fast hard drive; large RAM disk | Selective marking of signals for capture; file scopes | Simulation Data Inspector in buffered mode
Low-level mechanical and electronic control | FPGA | - | HDL Coder™ HDL Workflow Advisor

Improving Performance with Concurrency

Whether you can use concurrency to improve real-time performance depends on the model. For example, a model that has heavy data traffic between referenced models is limited by data propagation and not by data processing.

To use concurrency, first convert the blocks at the root level of your model into MATLAB System blocks or into models that are referenced with Model blocks. Do not use Subsystem blocks.

Simulink provides concurrency settings in the Solver pane, under Additional options:

  • Allow tasks to execute concurrently on target: 'on' (default) or 'off'. When this parameter is 'on', the kernel allocates tasks to the next available CPU core. For most models, use the default value.

    When Allow tasks to execute concurrently on target is 'off', the parameter Treat each discrete rate as a separate task is available. When Treat each discrete rate as a separate task is 'off', the real-time application executes in single-tasking mode. In single-tasking mode, the application does not take advantage of a multicore target computer.

    In a future release, single-tasking mode will not work for multirate Simulink Real-Time models.

  • Enable explicit model partitioning for concurrent behavior: 'on' or 'off' (default). This parameter is available only if Allow tasks to execute concurrently on target is 'on' and you click Configure tasks.
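The first setting can also be inspected or changed from the command line. The following is a minimal sketch, assuming that Simulink is installed and that the model-level parameter name ConcurrentTasks corresponds to Allow tasks to execute concurrently on target. The model name concurrency_sketch is hypothetical and is created only to hold the configuration.

```matlab
% Assumes Simulink is installed. 'concurrency_sketch' is a hypothetical,
% empty scratch model used only to demonstrate the parameter.
mdl = 'concurrency_sketch';
new_system(mdl);

% ConcurrentTasks is assumed to map to the Solver pane option
% "Allow tasks to execute concurrently on target".
set_param(mdl, 'ConcurrentTasks', 'on');
disp(get_param(mdl, 'ConcurrentTasks'));

close_system(mdl, 0);   % discard the scratch model without saving
```

For a real application, replace the scratch model with the name of your loaded top-level model and skip the new_system and close_system calls.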

This scenario shows how to use the inherent sample time of a model and concurrency to improve the sample time and throughput of a model. As frequently happens during prototyping, the original version is a single-rate model. Using Simulink Performance Advisor and the profiler, this scenario iterates through these tasks:

  • Eliminating CPU overloads while executing in the required sample time range

  • Converting the single-rate model to multirate by using the design specification

  • Improving multirate performance by using concurrency with implicit partitioning

  • Refactoring a multirate model to reduce the CPU requirements of individual referenced models

  • Improving multirate performance by using concurrency with explicit partitioning

At each stage, you view the allocation of single-rate and multirate models among the cores of a multicore target computer by using the Simulink Real-Time profiler functions.

Prerequisites

This scenario assumes that you can:

  1. Open Simulink Real-Time Explorer.

  2. Start the target computer.

  3. Connect Explorer to the target computer.

  4. Build and download a real-time application to the target computer.

  5. Execute a real-time application on the target computer.

For more information, see Related Topics.

Single-Rate Model

You implemented the basic functionality as a single-rate model. To expedite tuning the sample time, you used variable Ts to define the base sample time for the constant blocks in the ref1 and ref2 referenced models.

You debugged the model at a sample time of Ts = 1.0e-3 s. To meet its real-time performance requirement, this model must achieve a base sample time in the range 1.0e-4 ≤ Ts ≤ 3.0e-4 while running on a four-core target computer.

Test Against Requirement.  To test the model, set its base sample time to the top of the required range, 3.0e-4 s.

  1. To open this model, open these files in sequence:

    To view the sample time legend, right-click in Simulink Editor and click Sample Time Display > All. For a single-rate model, the top-level sample time legend color applies to all referenced models.

  2. Set Ts = 3.0e-4.

  3. Build, download, and execute the real-time application.

    The real-time application overloads the CPU. The target computer does not have enough CPU cycles to completely execute the model at the base sample time.

Determine Minimum Sample Time.  Because the CPU overloaded, you cannot take a baseline until you have determined the minimum sample time.

  1. Open the Analysis > Performance Tools menu and run the Simulink Performance Advisor.

  2. Select the Execute real-time execution activity.

  3. Select and run the baseline checks other than Real-Time Performance Baseline, including Determine minimum sample time.

  4. Evaluate the smallest sample time this model can attain, about 2.2e-3.

  5. To avoid CPU overloads caused by random variations, set Ts to a value about 5% higher than that sample time, or Ts = 2.3e-3.
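The margin calculation in the last step can be sketched as follows. The helper padSampleTime is a hypothetical name, and rounding to two significant digits is an assumption made here to match the precision of the values quoted in the steps.

```matlab
% Pad a measured minimum sample time by a relative margin and round the
% result to two significant digits (rounding precision is an assumption).
padSampleTime = @(TsMin, margin) round(TsMin*(1 + margin), 2, 'significant');

TsMin = 2.2e-3;                       % minimum sample time from Performance Advisor, s
Ts    = padSampleTime(TsMin, 0.05);   % about 5% headroom

fprintf('Ts = %.1e s\n', Ts);
```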

Determine Baseline.  Using Performance Advisor, establish a baseline and evaluate whether improvement is possible for this version of the model.

  1. In Performance Advisor, run Real-Time Performance Baseline.

    The run succeeds and produces a pie chart.

    This chart shows two usage allocations, BaseRate and Background. The BaseRate allocation shows the execution of the single-rate real-time application as one task. The Background allocation shows the execution of the kernel tasks, such as accessing the target computer disk for data logging or communicating between the development and target computers.

    This example uses a four-core target computer, but the real-time application only uses a quarter of the available CPU cycles. BaseRate has a low margin before CPU overload, about 5%. To improve performance, the real-time application must use more of the available CPU resources.

  2. As a best practice, run all of the Real-Time checks except Simscape checks.

    The Real-Time checks pass. This version of the model cannot be improved further.

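Evaluate Task Allocation.  Evaluate the allocation of tasks across the four cores.

  1. In the Command Window, run the profiler:

    tg = slrt;            % get the target object for the target computer
    startProfiler(tg);    % arm the profiler before execution starts
    start(tg);            % start the real-time application
    stop(tg);             % stop the real-time application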

    The stop function also calls the stopProfiler function.

  2. Retrieve the profiler data and display the results:

    profiler_data = getProfilerData(tg);    % retrieve the profiler results from the target
    plot(profiler_data);                    % display the task execution timeline

    To skip initialization, start the display at 3*Ts. To show a representative example of concurrency, use a range of 4*Ts.

    In the profiler display, the highlighted numbers within each task bar give the task number. Task number 2 shows how much of the available time is being used by the BaseRate task. Task number 1 is the timer interrupt, part of the background tasks.

    The labels under the task bars give the CPU core on which each task runs. Because this is a single-rate model, the referenced model tasks run one after the other on core 2 at the same rate after each timer interrupt.

    The execution bar at one timer interrupt almost fills the time until the next timer interrupt. If the execution bar at one interrupt overlaps with the execution bar at the next, the target computer CPU overloads and stops execution.
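The display window and the "almost fills" observation can be reproduced with a few lines of arithmetic. This sketch uses the sample times quoted for this stage and assumes, as a rough reading, that the minimum sample time found by Performance Advisor approximates the execution time of one step.

```matlab
Ts    = 2.3e-3;              % base sample time at this stage, s
TsMin = 2.2e-3;              % minimum sample time from Performance Advisor, s

t_start = 3*Ts;              % skip initialization
t_range = 4*Ts;              % span that shows a representative set of steps
t_end   = t_start + t_range;

% Rough fill fraction of each timer interval (assumes execution time ~ TsMin).
fill = TsMin / Ts;

fprintf('Display window: %.2e s to %.2e s\n', t_start, t_end);
fprintf('Execution fills about %.0f%% of each interval\n', 100*fill);
```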

Multirate Model: Concurrency On, Implicit Partitioning

At this stage of the optimization process, the current value of Ts is 2.3e-3, which is outside the required range of 1.0e-4 ≤ Ts ≤ 3.0e-4.

To improve the sample time of the real-time application, start with the inherent rates of the model. After converting the single-rate model to a multirate model, you can turn on concurrency with implicit partitioning.

Convert to Multirate Model.  From the design specification, determine which parts of the model can run at lower rates and which cannot.

  1. Specify rates for parts of the model.

    As a best practice, specify rates that are multiples of a single base rate. In this model, the valid rates are multiples of Ts: Ts, 2*Ts, 3*Ts, and 4*Ts.

  2. In the original model, Ref1/Out4 connects directly to Ref2/In1. Because Ref1/Out4 and Ref2/In1 have different rates, add a Rate Transition block to Ref1.

  3. Configure the Rate Transition block:

    • Set the Ensure data integrity during data transfer parameter.

    • Clear the Ensure deterministic data transfer (maximum delay) parameter.

      Data transfers between triggered tasks cannot require deterministic data integrity.

  4. To open this model, open these files in sequence:

    For this model, the sample time legend colors for the top level also apply to the Ref1 referenced model. A separate set of sample time legend colors appears for the Ref2 referenced model.
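The best practice of choosing rates that are multiples of one base rate can be checked numerically. The sketch below is an illustration only: the validity test, its tolerance, and the idea of computing the interval after which all tasks realign are assumptions added here, not part of the Simulink workflow.

```matlab
Ts        = 3.0e-4;                  % base sample time, s (top of the required range)
multiples = [1 2 3 4];               % rate multipliers used in this model
rates     = multiples * Ts;          % Ts, 2*Ts, 3*Ts, 4*Ts

% A candidate rate is treated as valid if it is an integer multiple of Ts.
isValid = @(r) abs(r/Ts - round(r/Ts)) < 1e-9;
assert(all(arrayfun(isValid, rates)));

% All tasks line up again after the least common multiple of the multipliers.
m = 1;
for k = multiples
    m = lcm(m, k);
end
realignTime = m * Ts;

fprintf('Tasks realign every %d base steps (%.1e s)\n', m, realignTime);
fprintf('Is 2.5*Ts a valid rate? %d\n', isValid(2.5*Ts));
```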

Configure Implicit Partitioning.  To configure implicit partitioning, turn on task-level concurrency and take the defaults.

  1. Open the Configuration Parameters dialog box for the top-level model.

  2. In the Solver pane, under Additional options, set the Allow tasks to execute concurrently on target parameter.

  3. Click Simulation > Update Diagram.

Test Against Requirement.  To test the model, set its base sample time to the top of the required range, 3.0e-4 s.

  1. Set Ts = 3.0e-4.

  2. Build, download, and execute the real-time application.

    The real-time application overloads the CPU. The target computer does not have enough CPU cycles to completely execute the model at the base sample time.

Determine Minimum Sample Time.  Because the CPU overloaded, you cannot take a baseline until you have determined the minimum sample time.

  1. Open the Analysis > Performance Tools menu and run the Simulink Performance Advisor.

  2. Select the Execute real-time execution activity.

  3. Select and run the baseline checks other than Real-Time Performance Baseline, including Determine minimum sample time.

  4. Evaluate the smallest sample time this model can attain, about 4.2e-4.

  5. To avoid CPU overloads caused by random variations, set Ts to a value about 5% higher than that sample time, or Ts = 4.4e-4.

Determine Baseline.  Using Performance Advisor, establish a baseline and evaluate whether improvement is possible for this version of the model.

  1. To take a baseline for optimization, run Real-Time Performance Baseline.

    The run succeeds and produces a pie chart.

    The CPU core usage has improved, but the real-time application only uses half of the available CPU cycles. Also, SubRate2 has a low margin before CPU overload, about 5%. The real-time application needs better load balancing to improve the base sample time and to make its execution more likely to succeed.

  2. As a best practice, run all of the Real-Time checks except Simscape checks.

    The Real-Time checks pass. This version of the model cannot be improved further.

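Evaluate Task Allocation.  Evaluate the allocation of tasks across the four cores.

  1. In the Command Window, run the profiler:

    tg = slrt;            % get the target object for the target computer
    startProfiler(tg);    % arm the profiler before execution starts
    start(tg);            % start the real-time application
    stop(tg);             % stop the real-time application

  2. Retrieve the profiler data and display the results: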

    profiler_data = getProfilerData(tg);
    plot(profiler_data);

    To skip initialization, start the display at 3*Ts. To show a representative example of concurrency, use a range of 4*Ts.

    The execution bars for SubRate2, the task with the largest CPU requirement, almost overlap. Concurrency is in full operation as of time tick 1.32e-3, which is 3*Ts.

Refactored Multirate Model: Concurrency On, Explicit Partitioning

At this stage of the optimization process, the current value of Ts is 4.4e-4, which is still outside the required range of 1.0e-4 ≤ Ts ≤ 3.0e-4.

You can improve the performance of your real-time application by explicitly balancing the load of the different processing nodes in the multicore target computer. This process involves iteratively refactoring the model, moving tasks between different processing nodes, and testing the result. For more information, see Concepts in Multicore Programming (Simulink).

Before refactoring a model, note which tasks of a system depend on the output of other tasks. The data dependency between tasks determines their execution order within a time step. Two or more partitions that contain data dependencies in a cycle create a data dependency loop, also known as an algebraic loop. To detect these loops, in the Diagnostics pane, set the Algebraic loop parameter to error. Simulink identifies algebraic loops during execution, displays an error message, and highlights the portion of the block diagram that comprises the loop. Remove these loops from your model. For more information, see Algebraic loop (Simulink).
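Before you repartition, it can help to sketch the dependencies between partitions as a directed graph and check it for cycles. The example below is an illustration under stated assumptions: the partitions A, B, and C and their connections are hypothetical, and the check uses the MATLAB digraph functions rather than the Simulink algebraic loop diagnostic.

```matlab
% Hypothetical partitions A, B, C with data dependencies source -> destination.
acyclic = digraph({'A', 'B'}, {'B', 'C'});            % A -> B -> C
cyclic  = digraph({'A', 'B', 'C'}, {'B', 'C', 'A'});  % C feeds back into A

graphs = {acyclic, cyclic};
for k = 1:numel(graphs)
    G = graphs{k};
    if isdag(G)
        order = G.Nodes.Name(toposort(G));            % one valid execution order
        fprintf('No loop. Execution order: %s\n', strjoin(order', ' -> '));
    else
        fprintf('Data dependency loop detected among %d partitions.\n', numnodes(G));
    end
end
```

A graph without cycles has at least one execution order that respects every dependency. A graph with a cycle has none, which is why the loop must be removed before partitioning.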

Refactor Model.  In this scenario, the multirate model consists of two referenced models, each containing many signals to process during each sample time. With implicit partitioning, each referenced model task is assigned to a core. To improve interleaving among CPU cores, divide each referenced model in half. Each half contains half the number of signals in the original referenced model. This configuration produces the same number of referenced models as cores with each referenced model having smaller CPU requirements than the original.

Configure Explicit Partitioning.  To configure explicit partitioning, turn on task-level concurrency and explicitly configure the tasks for each referenced model.

  1. To increase the CPU interleaving of real-time tasks, open the Configuration Parameters dialog box for the top-level model.

  2. In the Solver pane, set the Allow tasks to execute concurrently on target parameter.

  3. Under Configure Tasks, set the Enable explicit model partitioning for concurrent behavior parameter.

  4. Under Concurrent Execution > Tasks and Mapping, open CPU > Periodic.

  5. Create periodic tasks for each rate in each referenced model. Name the tasks Model1_R1, Model1_R2, and so on.

    You use a periodic trigger to represent periodic interrupt sources, such as a timer. The periodicity of the trigger is either the base rate of the task or the period of the trigger. See Concepts in Multicore Programming (Simulink).

  6. Assign each periodic task to the corresponding rate in each referenced model.

    • Model1, Model3 — Four tasks of rates Ts, 2*Ts, 3*Ts, and 4*Ts.

    • Model2, Model4 — Three tasks of rates Ts, 3*Ts, and 4*Ts.

    At the end of this process, the Concurrent Execution window looks like the figure.

  7. In the Verification tab, clear the MAT-file logging check box.

  8. Click Simulation > Update Diagram.
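The task list from the steps above can be enumerated in a short script to confirm how many periodic tasks the mapping needs. This is a bookkeeping sketch only. It assumes that the task suffix R1, R2, and so on counts the rates within each model in order, which the naming rule in the steps does not state explicitly, and it does not create any tasks in the Concurrent Execution dialog box.

```matlab
% Rate multipliers per referenced model, taken from the task assignment step.
models      = {'Model1', 'Model2', 'Model3', 'Model4'};
multipliers = {[1 2 3 4], [1 3 4], [1 2 3 4], [1 3 4]};

Ts      = 3.0e-4;            % base sample time, s
names   = {};
periods = [];
for m = 1:numel(models)
    for r = 1:numel(multipliers{m})
        % Assumed naming: suffix is the ordinal of the rate within the model.
        names{end+1}   = sprintf('%s_R%d', models{m}, r);
        periods(end+1) = multipliers{m}(r) * Ts;
    end
end

for k = 1:numel(names)
    fprintf('%-10s period %.1e s\n', names{k}, periods(k));
end
fprintf('Total periodic tasks: %d\n', numel(names));
```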

For this model, the sample time legend colors for the top level also apply to the Ref1A and Ref1B referenced models. A separate set of sample time legend colors appears for the Ref2A and Ref2B referenced models.

Test Against Requirement.  To test the model, set its base sample time to the top of the required range, 3.0e-4 s.

  1. Set Ts = 3.0e-4.

  2. Build, download, and execute the real-time application.

    The real-time application runs. Your model has met the basic sample-time requirement.

Determine Minimum Sample Time.  To evaluate where this version of the model falls in the sample-time range and how much margin it has:

  1. Open the Analysis > Performance Tools menu and run the Simulink Performance Advisor.

  2. Select the Execute real-time execution activity.

  3. Select and run the baseline checks other than Real-Time Performance Baseline, including Determine minimum sample time. You cannot take a baseline until you have determined the minimum sample time.

  4. Evaluate the smallest sample time this model can attain, about 2.6e-4.

  5. To avoid CPU overloads caused by random variations, set Ts to a value about 5% higher than that sample time, or Ts = 2.7e-4.

Determine Baseline.  Using Performance Advisor, establish a baseline and evaluate whether improvement is possible for this version of the model.

  1. To take a baseline, run Real-Time Performance Baseline.

    The run succeeds and produces output like the figure.

    At the lowest achievable sample time, this real-time application uses three-quarters of the available CPU cycles. The smallest margin before CPU overload is about 27%, which is an improvement over the 5% margin in the previous version.

  2. As a best practice, run all of the Real-Time checks except Simscape checks.

    The Real-Time checks pass. This version of the model cannot be improved further.

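Evaluate Task Allocation.  Evaluate the allocation of tasks across the four cores.

  1. In the Command Window, run the profiler:

    tg = slrt;            % get the target object for the target computer
    startProfiler(tg);    % arm the profiler before execution starts
    start(tg);            % start the real-time application
    stop(tg);             % stop the real-time application
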
  2. Retrieve the profiler data and display the results:

    profiler_data = getProfilerData(tg);
    plot(profiler_data);

    To skip initialization, start the display at 3*Ts. To show a representative example of concurrency, use a range of 4*Ts.

    The Model*_R3 tasks start running on all four processors, but Model*_R1 tasks preempt them. The overhead of preemption limits the performance improvement that you can achieve by using concurrency alone.

Additional Optimizations

In the model scenario, the change that produced the greatest improvement was going from single-rate to multirate execution with the default task mapping (over 5X improvement). The other optimization, explicit partitioning of the refactored model, produced less improvement (about 1.6X), but was required to reach the required sample-time range of 1.0e-4 ≤ Ts ≤ 3.0e-4.

Optimization | Achievable Sample Time Ts
Single-rate | 2.3e-3
Multirate, implicit task mapping | 4.4e-4
Partitioned multirate, explicit task mapping | 2.7e-4
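The stage-to-stage improvement and the requirement check follow directly from the table values. The sketch below recomputes them. The variable names and the shortened stage labels are illustrative.

```matlab
stages  = {'Single-rate', 'Multirate, implicit', 'Partitioned, explicit'};
Ts      = [2.3e-3 4.4e-4 2.7e-4];      % achievable sample times from the table, s
TsRange = [1.0e-4 3.0e-4];             % required range, s

speedup = Ts(1:end-1) ./ Ts(2:end);    % improvement of each stage over the previous one
meets   = Ts >= TsRange(1) & Ts <= TsRange(2);

for k = 1:numel(Ts)
    fprintf('%-22s Ts = %.1e s, meets requirement = %d\n', ...
        stages{k}, Ts(k), meets(k));
end
fprintf('Stage-to-stage improvement: %s\n', mat2str(speedup, 3));
```

Only the last stage falls inside the required range, even though the first optimization gives the larger ratio.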

To gain more improvement, consider the following optimizations.

Isolated Rate Transitions

If a multirate model contains many rate-transition blocks covering a few overlapping rates, consider extracting each similar rate transition into a new referenced model. You can then set the Enable explicit model partitioning for concurrent behavior parameter and create an explicit periodic task mapping for the new referenced models. If a referenced model does not contain a block with the base-rate sample time, add a separate base-rate task to the mapping table for that model.

For this model, factoring out rate transitions provides only a small improvement. To open ex_slrt_multirate_refactor, open these files in sequence:

Explicit Partitioning of Single-Rate Model

If the model is a single-rate model with a high computational requirement for each referenced model without data dependencies between them, consider setting the Enable explicit model partitioning for concurrent behavior parameter. You can then create an explicit periodic task mapping for each of the referenced models.

The improvement that you can achieve by explicitly mapping the tasks of a single-rate model is limited by the number of cores. For example, if you have four cores and the tasks run at a single rate, the most you can achieve is a 4X improvement in CPU usage.
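The core-count limit can be illustrated with an idealized calculation. The sketch assumes one referenced model per core, no data dependencies, no scheduling overhead, and hypothetical execution times, so real results are lower.

```matlab
nCores = 4;

% Hypothetical per-model execution times in seconds, one model per core.
balanced   = [1.0e-3 1.0e-3 1.0e-3 1.0e-3];
unbalanced = [2.0e-3 1.0e-3 5.0e-4 5.0e-4];

% Serial time: every model runs back to back on one core.
% Parallel time: each model has its own core, so the slowest one sets the pace.
speedup = @(t) sum(t) / max(t);

fprintf('Balanced load:   %.1fX (limit %dX)\n', speedup(balanced), nCores);
fprintf('Unbalanced load: %.1fX (limit %dX)\n', speedup(unbalanced), nCores);
```

The 4X limit is reached only when the load is evenly balanced. An uneven split leaves cores idle while the largest task finishes.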

Function Execution Optimization

To find additional optimizations, consider running the Simulink Real-Time profiler with function execution time logging enabled (see Profiling and Optimization). The profiler provides detailed, low-level information about the CPU tasks. You can then identify bottleneck blocks and replace or improve them.

FPGA Coprocessing

In cases where you cannot meet your system requirements by other optimization methods, consider embedding the crucial algorithms in an FPGA by using HDL Coder HDL Workflow Advisor.

Related Topics