Note: This page has been translated by MathWorks. Please click here

To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

**Products Used**- DSP System
Toolbox™, MATLAB
^{®}Coder™

The benchmarks in this example have been measured on a machine with four physical cores.

This example shows how to accelerate a signal
processing algorithm in MATLAB using the `codegen`

and `dspunfold`

functions. You can generate a MATLAB executable (MEX function) from
an entire MATLAB function
or specific parts of the MATLAB function.
When you run the MEX function instead of the original MATLAB code, simulation speed can increase
significantly. To generate the MEX equivalent, the algorithm must
support code generation.

To use `codegen`

, you must
have MATLAB
Coder installed. To use `dspunfold`

,
you must have MATLAB
Coder and DSP System
Toolbox installed.

To use `dspunfold`

on Windows
and Linux, you must use a compiler that supports the Open Multi-Processing
(OpenMP) application interface. See Supported
Compilers.

Consider a simple FIR filter algorithm to accelerate. Copy the `firfilter`

function
code into the `firfilter.m`

file.

function [y,z1] = firfilter(b,x) % Inputs: % b - 1xNTaps row vector of coefficients % x - A frame of noisy input % States: % z, z1 - NTapsx1 column vector of states % Output: % y - A frame of filtered output persistent z; if (isempty(z)) z = zeros(length(b),1); end Lx = size(x,1); y = zeros(size(x),'like',x); z1 = z; for m = 1:Lx % Load next input sample z1(1,:) = x(m,:); % Compute output y(m,:) = b*z1; % Update states z1(2:end,:) = z1(1:end-1,:); z = z1; end

The `firfilter`

function accepts a vector
of filter coefficients, *b*, a noisy input
signal, *x*, as inputs. Generate the filter
coefficients using the `fir1`

function.

NTaps = 250; Fp = 4e3/(44.1e3/2); b = fir1(NTaps-1,Fp);

Filter a stream of a noisy sine wave signal by using the `firfilter`

function.
The sine wave has a frame size of 4000 samples and a sample rate of
192 kHz. Generate the sine wave using the `dsp.SineWave`

System
object™. The noise
is a white Gaussian with a mean of 0 and a variance of 0.02. Name
this function `firfilter_sim`

. The `firfilter_sim`

function
calls the `firfilter`

function on the noisy
input.

function totVal = firfilter_sim(b) % Create the signal source Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200); totVal = zeros(4000,500); R = 0.02; clear firfilter; % Iteration loop. Each iteration filters a frame of the noisy signal. for i = 1 : 500 trueVal = Sig(); % Original sine wave noisyVal = trueVal + sqrt(R)*randn; % Noisy sine wave filteredVal = firfilter(b,noisyVal); % Filtered sine wave totVal(:,i) = filteredVal; % Store the entire sine wave end

Run `firfilter_sim`

and measure the
speed of execution. The execution speed varies depending on your machine.

```
tic;totVal = firfilter_sim(b);t1 = toc;
fprintf('Original Algorithm Simulation Time: %4.1f seconds\n',t1);
```

Original Algorithm Simulation Time: 7.8 seconds

`codegen`

Call `codegen`

on `firfilter`

,
and generate its MEX equivalent, `firfilter_mex`

.
Generate and pass the filter coefficients and the sine wave signal
as inputs to the `firfilter`

function.

Ntaps = 250; Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200); % Create the Signal Source R = 0.02; trueVal = Sig(); % Original sine wave noisyVal = trueVal + sqrt(R)*randn; % Noisy sine wave Fp = 4e3/(44.1e3/2); b = fir1(Ntaps-1,Fp); % Filter coefficients codegen firfilter -args {b,noisyVal}

In the `firfilter_sim`

function, replace `firfilter(b,noisyVal)`

function
call with `firfilter_mex(b,noisyVal)`

. Name this
function `firfilter_codegen`

.

function totVal = firfilter_codegen(b) % Create the signal source Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200); totVal = zeros(4000,500); R = 0.02; clear firfilter_mex; % Iteration loop. Each iteration filters a frame of the noisy signal. for i = 1 : 500 trueVal = Sig(); % Original sine wave noisyVal = trueVal + sqrt(R)*randn; % Noisy sine wave filteredVal = firfilter_mex(b,noisyVal); % Filtered sine wave totVal(:,i) = filteredVal; % Store the entire sine wave end

Run `firfilter_codegen`

and measure the speed
of execution. The execution speed varies depending on your machine.

tic;totValcodegen = firfilter_codegen(b);t2 = toc; fprintf('Algorithm Simulation Time with codegen: %5f seconds\n',t2); fprintf('Speedup factor with codegen: %5f\n',(t1/t2));

Algorithm Simulation Time with codegen: 0.923683 seconds Speedup factor with codegen: 8.5531

The speedup gain is approximately `8.5`

.

`dspunfold`

The `dspunfold`

function generates a multi-threaded
MEX file which can improve the speedup gain even further.

`dspunfold`

also generates a single-threaded
MEX file and a self-diagnostic analyzer function. The multi-threaded
MEX file leverages the multicore CPU architecture of the host computer.
The single-threaded MEX file is similar to the MEX file that the `codegen`

function
generates. The analyzer function measures the speedup gain of the
multi-threaded MEX file over the single-threaded MEX file.

Call `dspunfold`

on `firfilter`

and
generate its multi-threaded MEX equivalent, `firfilter_mt`

.
Detect the state length in samples by using the `-f`

option,
which can improve the speedup gain further. `-s auto`

triggers
the automatic state length detection. For more information on using
the `-f`

and `-s`

options, see `dspunfold`

.

dspunfold firfilter -args {b,noisyVal} -s auto -f [false,true]

State length: [autodetect] samples, Repetition: 1, Output latency: 8 frames, Threads: 4 Analyzing: firfilter.m Creating single-threaded MEX file: firfilter_st.mexw64 Searching for minimal state length (this might take a while) Checking stateless ... Insufficient Checking 4000 samples ... Sufficient Checking 2000 samples ... Sufficient Checking 1000 samples ... Sufficient Checking 500 samples ... Sufficient Checking 250 samples ... Sufficient Checking 125 samples ... Insufficient Checking 187 samples ... Insufficient Checking 218 samples ... Insufficient Checking 234 samples ... Insufficient Checking 242 samples ... Insufficient Checking 246 samples ... Insufficient Checking 248 samples ... Insufficient Checking 249 samples ... Sufficient Minimal state length is 249 samples Creating multi-threaded MEX file: firfilter_mt.mexw64 Creating analyzer file: firfilter_analyzer.p

The automatic state length detection tool detects an exact state
length of `259`

samples.

Call the analyzer function and measure the speedup gain of the multi-threaded MEX file with respect to the single-threaded MEX file. Provide at least two different frames for each input argument of the analyzer. The frames are appended along the first dimension. The analyzer alternates between these frames while verifying that the outputs match. Failure to provide multiple frames for each input can decrease the effectiveness of the analyzer and can lead to false positive verification results.

firfilter_analyzer([b;0.5*b;0.6*b],[noisyVal;0.5*noisyVal;0.6*noisyVal]);

Analyzing multi-threaded MEX file firfilter_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done. Latency = 8 frames Speedup = 3.2x

`firfilter_mt`

has a speedup gain factor
of `3.2`

when compared to the single-threaded MEX
file, `firfilter_st`

. To increase the speedup further,
increase the repetition factor using the `-r`

option.
The tradeoff is that the output latency increases. Use a repetition
factor of `3`

. Specify the exact state length to
reduce the overhead and increase the speedup further.

dspunfold firfilter -args {b,noisyVal} -s 249 -f [false,true] -r 3

State length: 249 samples, Repetition: 3, Output latency: 24 frames, Threads: 4 Analyzing: firfilter.m Creating single-threaded MEX file: firfilter_st.mexw64 Creating multi-threaded MEX file: firfilter_mt.mexw64 Creating analyzer file: firfilter_analyzer.p

Call the analyzer function.

firfilter_analyzer([b;0.5*b;0.6*b],[noisyVal;0.5*noisyVal;0.6*noisyVal]);

Analyzing multi-threaded MEX file firfilter_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done. Latency = 24 frames Speedup = 3.8x

The speedup gain factor is `3.8`

, or approximately
32 times the speed of execution of the original simulation.

For this particular algorithm, you can see that `dspunfold`

is
generating a highly optimized code, without having to write any C
or C++ code. The speedup gain scales with the number of cores on your
host machine.

The FIR filter function in this example is only an illustrative
algorithm that is easy to understand. You can apply this workflow
on any of your custom algorithms. If you want to use an FIR filter,
it is recommended that you use the `dsp.FIRFilter`

System
object in DSP System
Toolbox.
This object runs much faster than the benchmark numbers presented
in this example, without the need for code generation.

Consider a Kalman filter algorithm, which estimates the sine
wave signal from a noisy input. This example shows the performance
of Kalman filter with `codegen`

and `dspunfold`

.

The noisy sine wave input has a frame size of 4000 samples and a sample rate of 192 kHz. The noise is a white Gaussian with mean of 0 and a variance of 0.02.

The function `filterNoisySignal`

calls the `kalmanfilter`

function
on the noisy input.

```
type filterNoisySignal
```

function totVal = filterNoisySignal % Create the signal source Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200); totVal = zeros(4000,500); R = 0.02; clear kalmanfilter; % Iteration loop to estimate the sine wave signal for i = 1 : 500 trueVal = Sig(); % Actual values noisyVal = trueVal + sqrt(R)*randn; % Noisy measurements estVal = kalmanfilter(noisyVal); % Sine wave estimated by Kalman filter totVal(:,i) = estVal; % Store the entire sine wave end

```
type kalmanfilter
```

function [estVal,estState] = kalmanfilter(noisyVal) % This function tracks a noisy sinusoid signal using a Kalman filter % % State Transition Matrix A = 1; stateSpaceDim = size(A,1); % Measurement Matrix H = 1; measurementSpaceDim = size(H,1); numTsteps = size(noisyVal,1)/measurementSpaceDim; % Containers to store prediction and estimates for all time steps zEstContainer = noisyVal; xEstContainer = zeros(size(noisyVal)); Q = 0.0001; % Process noise covariance R = 0.02; % Measurement noise covariance persistent xhat P xPrior PPrior; % Local copies of discrete states if isempty(xhat) xhat = 5; % Initial state estimate end if isempty(P) P = 1; % Error covariance estimate end if isempty(xPrior) xPrior = 0; end if isempty(PPrior) PPrior = 0; end % Loop over all time steps for n=1:numTsteps % Gather chunks for current time step zRowIndexChunk = (n-1)*measurementSpaceDim + (1:measurementSpaceDim); stateEstsRowIndexChunk = (n-1)*stateSpaceDim + (1:stateSpaceDim); % Prediction step xPrior = A * xhat; PPrior = A * P * A' + Q; % Correction step. Compute Kalman gain. PpriorH = PPrior * H'; HPpriorHR = H * PpriorH + R; KalmanGain = (HPpriorHR \ PpriorH')'; KH = KalmanGain * H; % States and error covariance are updated in the % correction step xhat = xPrior + KalmanGain * noisyVal(zRowIndexChunk,:) - ... KH * xPrior; P = PPrior - KH * PPrior; % Append estimates xEstContainer(stateEstsRowIndexChunk, :) = xhat; zEstContainer(zRowIndexChunk,:) = H*xhat; end % Populate the outputs estVal = zEstContainer; estState = xEstContainer; end

Run `filterNoisySignal.m`

and measure the
speed of execution.

```
tic;totVal = filterNoisySignal;t1 = toc;
fprintf('Original Algorithm Simulation Time: %4.1f seconds\n',t1);
```

Original Algorithm Simulation Time: 21.7 seconds

`codegen`

Call the `codegen`

function on `kalmanfilter`

,
and generate its MEX equivalent, `kalmanfilter_mex`

.

The `kalmanfilter`

function requires the
noisy sine wave as the input.

Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200); % Create the Signal Source R = 0.02; % Measurement noise covariance trueVal = step(Sig); % Actual values noisyVal = trueVal + sqrt(R)*randn; % Noisy measurements codegen -args {noisyVal} kalmanfilter.m

Replace `kalmanfilter(noisyVal)`

in `filterNoisySignal`

function
with `kalmanfilter_mex(noisyVal)`

. Name this function
as `filterNoisySignal_codegen`

function totVal = filterNoisySignal_codegen % Create the signal source Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200); totVal = zeros(4000,500); R = 0.02; clear kalmanfilter_mex; % Iteration loop to estimate the sine wave signal for i = 1 : 500 trueVal = Sig(); % Actual values noisyVal = trueVal + sqrt(R)*randn; % Noisy measurements estVal = kalmanfilter_mex(noisyVal); % Sine wave estimated by Kalman filter totVal(:,i) = estVal; % Store the entire sine wave end

Run `filterNoisySignal_codegen`

and measure
the speed of execution.

tic; totValcodegen = filterNoisySignal_codegen; t2 = toc; fprintf('Algorithm Simulation Time with codegen: %5f seconds\n',t2); fprintf('Speedup with codegen is %0.1f',t1/t2);

Algorithm Simulation Time with codegen: 0.095480 seconds Speedup with codegen is 227.0

The Kalman filter algorithm implements several matrix multiplications. `codegen`

uses
the Basic Linear Algebra Subroutines (BLAS) libraries to perform these
multiplications. These libraries generate a highly optimized code,
hence giving a speedup gain of 227.

`dspunfold`

Generate a multi-threaded MEX file using `dspunfold`

and
compare its performance with `codegen`

.

Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200); % Create the signal source R = 0.02; % Measurement noise covariance trueVal = step(Sig); % Actual values noisyVal = trueVal + sqrt(R)*randn; % Noisy measurements dspunfold kalmanfilter -args {noisyVal} -s auto

State length: [autodetect] frames, Repetition: 1, Output latency: 8 frames, Threads: 4 Analyzing: kalmanfilter.m Creating single-threaded MEX file: kalmanfilter_st.mexw64 Searching for minimal state length (this might take a while) Checking stateless ... Insufficient Checking 1 frames ... Sufficient Minimal state length is 1 frames Creating multi-threaded MEX file: kalmanfilter_mt.mexw64 Creating analyzer file: kalmanfilter_analyzer.p

Call the analyzer function.

kalmanfilter_analyzer([noisyVal;0.01*noisyVal;0.05*noisyVal;0.1*noisyVal]);

Analyzing multi-threaded MEX file kalmanfilter_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done. Latency = 8 frames Speedup = 0.7x

`kalmanfilter_mt`

has a speedup factor of `0.7`

,
which is a performance loss of `30%`

when compared
to the single-threaded MEX file, `kalmanfilter_st`

.
Increase the repetition factor to `3`

to see if the
performance increases. Also, detect the state length in samples.

dspunfold kalmanfilter -args {noisyVal} -s auto -f true -r 3

State length: [autodetect] samples, Repetition: 3, Output latency: 24 frames, Threads: 4 Analyzing: kalmanfilter.m Creating single-threaded MEX file: kalmanfilter_st.mexw64 Searching for minimal state length (this might take a while) Checking stateless ... Insufficient Checking 4000 samples ... Sufficient Checking 2000 samples ... Sufficient Checking 1000 samples ... Sufficient Checking 500 samples ... Sufficient Checking 250 samples ... Insufficient Checking 375 samples ... Sufficient Checking 312 samples ... Sufficient Checking 281 samples ... Sufficient Checking 265 samples ... Sufficient Checking 257 samples ... Insufficient Checking 261 samples ... Sufficient Checking 259 samples ... Sufficient Checking 258 samples ... Insufficient Minimal state length is 259 samples Creating multi-threaded MEX file: kalmanfilter_mt.mexw64 Creating analyzer file: kalmanfilter_analyzer.p

Call the analyzer function.

kalmanfilter_analyzer([noisyVal;0.01*noisyVal;0.05*noisyVal;0.1*noisyVal]);

Analyzing multi-threaded MEX file kalmanfilter_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done. Latency = 24 frames Speedup = 1.4x

`dspunfold`

gives a speedup gain of `40%`

when
compared to the highly optimized single-threaded MEX file. Specify
the exact state length and increase the repetition factor to `4`

.

dspunfold kalmanfilter -args {noisyVal} -s 259 -f true -r 4

State length: 259 samples, Repetition: 4, Output latency: 32 frames, Threads: 4 Analyzing: kalmanfilter.m Creating single-threaded MEX file: kalmanfilter_st.mexw64 Creating multi-threaded MEX file: kalmanfilter_mt.mexw64 Creating analyzer file: kalmanfilter_analyzer.p

Invoke the analyzer function to see the speedup gain.

kalmanfilter_analyzer([noisyVal;0.01*noisyVal;0.05*noisyVal;0.1*noisyVal]);

Analyzing multi-threaded MEX file kalmanfilter_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done. Latency = 32 frames Speedup = 1.5x

The speedup gain factor is `50%`

when compared
to the single-threaded MEX file.

The performance gain factors `codegen`

and `dspunfold`

give
depend on your algorithm. `codegen`

provides sufficient
acceleration for some MATLAB constructs. `dspunfold`

can
provide additional performance gains using the cores available on
your machine to distribute your algorithm via DSP unfolding. As shown
in this example, the amount of speedup that `dspunfold`

provides depends on the particular algorithm to accelerate. Use `dspunfold`

in
addition to `codegen`

if your algorithm is well-suited
for distributing via DSP unfolding, and if the resulting latency cost
is in line with the constraints of your application.

Some MATLAB constructs are highly optimized with MATLAB interpreted
execution. The `fft`

function,
for example, runs much faster in interpreted simulation than with
code generation.

Was this topic helpful?