Code Optimization using CMSIS DSP Library
This example shows you how to use code replacement libraries for ARM Cortex-M processors to generate optimized code for the STMicroelectronics STM32F4-Discovery board.
A code replacement library (CRL) is a set of one or more code replacement tables that define target-specific implementations of functions and operators to be used when generating code for your Simulink model. CRL tables provide the basis for replacing default functions and operators in your model code with target-specific code. Controlling function and operator replacements lets you optimize code execution speed and memory footprint, and better integrate external and legacy code with the model code.
The DSP System Toolbox Support Package for ARM Cortex-M Processors provides a CRL table that replaces the standard ANSI-C code generated for certain Simulink blocks with ARM Cortex-M optimized code from the CMSIS DSP library. The CMSIS DSP library includes a set of control and signal processing functions such as filters, Fourier transforms, matrix math operations, and vector operations. The Cortex-M4 processor uses the ARM DSP SIMD instruction set and a floating-point unit (FPU) to efficiently compute signal processing algorithms.
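To make the idea of replacement concrete, here is a minimal sketch of a portable single-precision FIR filter, the kind of generic loop that a CRL table can swap for an optimized library routine (for the Discrete FIR Filter block in this example, arm_fir_f32 from the CMSIS DSP library). The function name fir_portable_f32 and its argument layout are illustrative assumptions; this is not the code that Simulink actually generates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Portable single-precision FIR filter (direct form), hypothetical helper.
 *   coeffs : num_taps coefficients b[0..num_taps-1]
 *   delay  : num_taps most recent inputs, newest first; zero before first call
 *   in/out : n input samples and n output samples
 */
void fir_portable_f32(const float *coeffs, float *delay, size_t num_taps,
                      const float *in, float *out, size_t n)
{
    assert(num_taps > 0);

    for (size_t i = 0; i < n; i++) {
        /* Shift the delay line by one sample and store the newest input. */
        for (size_t k = num_taps - 1; k > 0; k--) {
            delay[k] = delay[k - 1];
        }
        delay[0] = in[i];

        /* Multiply-accumulate across all taps. */
        float acc = 0.0f;
        for (size_t k = 0; k < num_taps; k++) {
            acc += coeffs[k] * delay[k];
        }
        out[i] = acc;
    }
}
```

Feeding a unit impulse through this function returns the coefficients in order, which is a quick way to check any FIR implementation. An optimized library routine computes the same sums but is free to reorganize the loops and use DSP or FPU instructions, which is where the speedup comes from.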
This example shows you how to use the ARM Cortex-M CRL table to generate code optimized for the Cortex-M4 processor present on the STM32F4-Discovery board. You will learn how to use PIL to get execution profiling measurements and observe the performance improvements obtained while using the ARM Cortex-M CRL table.
This example requires the DSP System Toolbox Support Package for ARM Cortex-M Processors.
We recommend completing the Code Verification and Validation with PIL and External Mode example before starting this one.
To run this example you will need the following hardware:
- STMicroelectronics STM32F4-Discovery board
- USB type A to Mini-B cable
- USB TTL-232 cable - TTL-232R 3.3V (this example was tested with the FTDI Friend USB TTL-232R 3.3V adapter)
Task 1 - Configure the model for PIL simulation
In this task, you configure a Simulink model to generate optimized code for the STM32F4-Discovery board, then run a PIL simulation to collect execution profiling measurements.
1. Open the Code Optimization model. This model is configured for the STM32F4-Discovery target. The objective is to create a PIL block from the FIR subsystem and run it on the STM32F4-Discovery board. The FIR subsystem contains a 64-tap FIR filter. The model uses the single-precision floating-point data type to take full advantage of the floating-point unit of the STM32F4xx processor.
2. Follow the steps below to select the ARM Cortex-M CRL table:
3. Enable PIL from Configuration Parameters > All Parameters: search for Create block and select PIL from the drop-down list.
Alternatively, you can enable PIL for the Code Optimization model by running set_param('stm32f4discovery_cmsis_crl','CreateSILPILBlock','PIL') in the MATLAB Command Window.
4. Follow the steps below to enable profiling with PIL:
5. Follow the steps below to choose a PIL communication interface:
In this example, the serial communication interface is selected and the COM port corresponding to the USB TTL-232 cable (COM28) is specified in the COM port edit box. Refer to Task 1 of the Code Verification and Validation with PIL and External Mode example for more information on selecting the PIL communication interface.
6. Create a PIL block for the FIR subsystem by following Task 1 - Step 3 of the Code Verification and Validation with PIL and External Mode example.
7. Run a PIL simulation by following Task 1 - Step 4 of the Code Verification and Validation with PIL and External Mode example.
Task 2 - Inspect execution profiling results
This task shows you how to inspect the execution profiling results collected during the PIL simulation.
1. In Task 1, you ran a PIL simulation to collect execution profiling measurements. The measurements are saved in the firProfile workspace variable. To view a report of the code execution profiling measurements, enter the following command at the MATLAB prompt:
The following report opens and displays execution profiling measurements:
The default unit for execution time measurements is nanoseconds.
2. Expand the FIR_step [0.009375 0] entry in the profiling report to view the total time spent in the Discrete FIR Filter function, arm_fir_f32, from the CMSIS DSP library. To see the code that corresponds to the Discrete FIR Filter entry in the table, click the link next to the MATLAB icon (number 2 in the above figure).
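When reading the report, it can help to translate the nanosecond figures into CPU cycles and into a fraction of the available time per step. The sketch below makes two assumptions that are not stated in this example: that [0.009375 0] denotes a sample period of 0.009375 s with zero offset, and that the core runs at 168 MHz (the maximum for the STM32F407 on this board; your clock configuration may differ). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a measured execution time in nanoseconds to CPU cycles.
 * Rounds down. The product time_ns * clock_hz must fit in 64 bits,
 * which holds for times of a few seconds at clocks of a few hundred MHz.
 */
uint64_t ns_to_cycles(uint64_t time_ns, uint64_t clock_hz)
{
    return (time_ns * clock_hz) / 1000000000ULL;
}

/*
 * Fraction of one sample period spent executing the step function.
 * A value of 1.0 or more would mean the task overruns its period.
 */
double cpu_utilization(uint64_t exec_time_ns, double sample_period_s)
{
    assert(sample_period_s > 0.0);
    return ((double)exec_time_ns * 1e-9) / sample_period_s;
}
```

For example, under the assumed 168 MHz clock, one millisecond of execution time corresponds to 168000 cycles. Substitute the times from your own profiling report; no measured values are implied here.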
3. Repeat the PIL simulation, choosing None instead of ARM Cortex-M as your CRL table. To keep the profiling measurements acquired with the ARM Cortex-M CRL table, first rename the firProfile workspace variable. Compare the execution profiling results of the two approaches. You should notice a significant performance improvement for the filtering algorithm when the ARM Cortex-M CRL table is used.
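One simple way to quantify the comparison is to compute a speedup factor and a percentage reduction from the two measured times. The following is a minimal sketch with hypothetical function names; the inputs are whatever execution times your two profiling reports show for the same function.

```c
#include <assert.h>

/* Speedup factor: how many times faster the optimized run is. */
double speedup(double baseline_ns, double optimized_ns)
{
    assert(optimized_ns > 0.0);
    return baseline_ns / optimized_ns;
}

/* Percentage of the baseline execution time that was saved. */
double percent_reduction(double baseline_ns, double optimized_ns)
{
    assert(baseline_ns > 0.0);
    return 100.0 * (baseline_ns - optimized_ns) / baseline_ns;
}
```

Pass the time measured with the CRL table set to None as baseline_ns and the time measured with the ARM Cortex-M CRL table as optimized_ns. Compare like with like: use the same report row and the same statistic (for example, average time) from both runs.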
This example illustrated how to reduce the execution time of code generated for an FIR filter by using the ARM Cortex-M CRL table to replace standard operations with CMSIS DSP library equivalents. The example also introduced the workflow for collecting and analyzing execution profiling measurements during a PIL simulation.