
Accelerating Correlation with GPUs

This example shows how a GPU can be used to accelerate cross-correlation. Many correlation problems involve large data sets and can be solved much faster using a GPU. To use this example, you must have a Parallel Computing Toolbox™ user license and a CUDA-enabled NVIDIA GPU with compute capability 1.3 or above.

Introduction

You access the GPU through Parallel Computing Toolbox, and the GPU must have a ComputeCapability of 1.3 or greater. Before benchmarking, check that a supported GPU is available and display basic information about it.

fprintf('Benchmarking GPU-accelerated Cross-Correlation.\n');

if ~parallel.gpu.GPUDevice.isAvailable
    % isAvailable is false when there is no GPU or the GPU is not supported
    fprintf(['\n\t**No supported GPU (compute capability 1.3 or ' ...
             'greater) is available. Stopping.**\n']);
    return;
else
    dev = gpuDevice;
    fprintf(...
    'GPU detected (%s, %d multiprocessors, Compute Capability %s)\n',...
    dev.Name, dev.MultiprocessorCount, dev.ComputeCapability);
end
Benchmarking GPU-accelerated Cross-Correlation.
GPU detected (Tesla C2075, 14 multiprocessors, Compute Capability 2.0)
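The check above reports whether a supported GPU exists but does not test the compute capability as a number. If you want to test the 1.3 requirement explicitly, a sketch along these lines may help. It assumes Parallel Computing Toolbox is installed; Name, ComputeCapability, TotalMemory, and SupportsDouble are GPUDevice properties, while the threshold test itself is an illustrative addition rather than part of the original example.

```matlab
% Sketch, assuming Parallel Computing Toolbox. gpuDevice errors if the GPU is unsupported.
if gpuDeviceCount == 0
    error('No GPU device was found.');
end
dev = gpuDevice;                               % select the default GPU
cc = str2double(dev.ComputeCapability);        % e.g. '2.0' -> 2
fprintf('%s: compute capability %.1f, %.2f GB memory, double support: %d\n', ...
    dev.Name, cc, dev.TotalMemory/2^30, dev.SupportsDouble);
if cc < 1.3
    error('This example needs compute capability 1.3 or greater.');
end
```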

Benchmarking Functions

Because xcorr and xcorr2 accept gpuArray inputs as well as ordinary arrays, the same function can benchmark both the CPU and the GPU. However, GPU code executes asynchronously from the CPU, so take care when measuring performance: before reading the timer, make sure all GPU processing has finished by calling the wait method on the GPUDevice object. When the inputs are on the CPU there is no pending GPU work, so this extra call adds negligible overhead to the CPU timings.
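A minimal sketch of the pitfall, assuming a supported GPU and Parallel Computing Toolbox R2013b or later (for gputimeit). The matrix size is arbitrary. Without wait, toc can be read as soon as the multiplication has been queued, so the naive measurement may understate the true cost; gputimeit performs the synchronization itself and repeats the call to give a more robust estimate.

```matlab
% Sketch, assuming a supported GPU and Parallel Computing Toolbox (gputimeit needs R2013b+).
dev = gpuDevice;
G = gpuArray.rand(2000);        % 2000-by-2000 random matrix created on the GPU
H = G*G;  wait(dev);            % warm-up run, excluded from the timings

% Naive timing: toc may be read while the GPU is still computing.
ts = tic;
H = G*G;
tNaive = toc(ts);
wait(dev);                      % drain the queue before the next measurement

% Synchronized timing: wait blocks until all queued GPU work has finished.
ts = tic;
H = G*G;
wait(dev);
tSync = toc(ts);

% gputimeit synchronizes internally and repeats the call several times.
tAuto = gputimeit(@() G*G);

fprintf('naive: %.2f ms, with wait: %.2f ms, gputimeit: %.2f ms\n', ...
    1000*tNaive, 1000*tSync, 1000*tAuto);
```

Expect tNaive to be the smallest of the three on most systems, since it may include only the time to queue the work.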

This example benchmarks three different types of cross-correlation.

Benchmarking Simple Cross-Correlation

For the first case, two vectors of equal size are cross-correlated using the syntax xcorr(u,v). The ratio of CPU execution time to GPU execution time is plotted against the size of the vectors.
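One common way to compute the full cross-correlation of two length-N vectors is through the FFT, a large, regular computation that suits a GPU well. The sketch below illustrates the technique and is not a claim about how xcorr is implemented internally; the variable names are hypothetical. It uses only fft, ifft, and conj, which also accept gpuArray inputs, and it checks the result against the direct definition computed with conv.

```matlab
% FFT-based cross-correlation of two equal-length column vectors (sketch).
% Lags run from -(N-1) to N-1, the same ordering xcorr(x,y) uses.
N = 1000;
x = rand(N, 1);
y = rand(N, 1);

nfft = 2^nextpow2(2*N - 1);                    % zero-pad so circular = linear correlation
c = ifft(fft(x, nfft) .* conj(fft(y, nfft)));  % c(k+1) = sum_n x(n+k)*conj(y(n)), circularly
r = [c(nfft-N+2:nfft); c(1:N)];                % reorder to lags -(N-1) ... N-1
r = real(r);                                   % real inputs: drop any round-off imaginary part

% Direct reference: correlation equals convolution with the flipped conjugate.
rRef = conv(x, conj(flipud(y)));
fprintf('max abs difference: %g\n', max(abs(r - rRef)));
```

To run the FFT version on the GPU, wrap x and y with gpuArray and call gather on r.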

fprintf('\n\n *** Benchmarking vector-vector cross-correlation*** \n\n');
fprintf('Benchmarking function :\n');
type('benchXcorrVec');
fprintf('\n\n');

sizes = [2000 1e4 1e5 5e5 1e6];
tc = zeros(1,numel(sizes));
tg = zeros(1,numel(sizes));
numruns = 10;

for s = 1:numel(sizes)
    fprintf('Running xcorr of %d elements...\n', sizes(s));
    delchar = repmat('\b', 1,numruns);

    a = rand(sizes(s),1);
    b = rand(sizes(s),1);
    tc(s) = benchXcorrVec(a, b, numruns);
    fprintf([delchar '\t\tCPU  time : %.2f ms\n'], 1000*tc(s));
    tg(s) = benchXcorrVec(gpuArray(a), gpuArray(b), numruns);
    fprintf([delchar '\t\tGPU time :  %.2f ms\n'], 1000*tg(s));
end

%Plot the results
fig = figure;
ax = axes('parent', fig);
semilogx(ax, sizes, tc./tg, 'r*-');
ylabel(ax, 'Speedup');
xlabel(ax, 'Vector size');
title(ax, 'GPU Acceleration of XCORR');
drawnow;

 *** Benchmarking vector-vector cross-correlation*** 

Benchmarking function :

function t = benchXcorrVec(u,v, numruns)
%Used to benchmark xcorr with vector inputs on the CPU and GPU.
    
%   Copyright 2012 The MathWorks, Inc.

    timevec = zeros(1,numruns);
    gdev = gpuDevice;
    for ii=1:numruns
        ts = tic;
        o = xcorr(u,v); %#ok<NASGU>
        wait(gdev)
        timevec(ii) = toc(ts);
        fprintf('.');
    end
    t = min(timevec);
end


Running xcorr of 2000 elements...
		CPU  time : 0.74 ms
		GPU time :  3.51 ms
Running xcorr of 10000 elements...
		CPU  time : 1.98 ms
		GPU time :  3.69 ms
Running xcorr of 100000 elements...
		CPU  time : 16.16 ms
		GPU time :  5.47 ms
Running xcorr of 500000 elements...
		CPU  time : 84.70 ms
		GPU time :  15.51 ms
Running xcorr of 1000000 elements...
		CPU  time : 325.30 ms
		GPU time :  28.03 ms
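The speedup plotted in the figure is the ratio of the two timings. Computing it from the printed values shows that the GPU is slower than the CPU for the two smallest vectors, presumably because fixed per-call overhead dominates there, and about 11.6 times faster at one million elements.

```matlab
% Times copied from the run above (ms); speedup > 1 means the GPU was faster.
n  = [2000 1e4 1e5 5e5 1e6];
tc = [0.74 1.98 16.16 84.70 325.30];
tg = [3.51 3.69 5.47 15.51 28.03];
speedup = tc ./ tg;
fprintf('%8d elements: speedup %.2f\n', [n; speedup]);
% Prints speedups of 0.21, 0.54, 2.95, 5.46, and 11.61
```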

Benchmarking Matrix Column Cross-Correlation

For the second case, every pair of columns of a matrix A is cross-correlated using the syntax xcorr(A), producing a large output matrix that contains all of the correlations. The ratio of CPU execution time to GPU execution time is plotted against the number of elements in A.
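For an M-by-N matrix, xcorr(A) returns a (2M-1)-by-N^2 matrix, one column per ordered pair of columns, so the amount of work grows with the square of the number of columns; for the 100-by-100 case below the output is 199-by-10000. The loop is a slow reference sketch of that structure. It assumes the column ordering in which column (i-1)*N+j holds the cross-correlation of A(:,i) with A(:,j), and it uses conv for each pair rather than calling xcorr.

```matlab
% Reference sketch of all pairwise column cross-correlations (slow, for illustration).
M = 50;  N = 4;
A = rand(M, N);
C = zeros(2*M - 1, N^2);
for i = 1:N
    for j = 1:N
        % Cross-correlation of column i with column j, lags -(M-1) ... M-1
        C(:, (i-1)*N + j) = conv(A(:, i), conj(flipud(A(:, j))));
    end
end
disp(size(C));   % (2M-1)-by-N^2
```

Row M holds lag 0, so C(M,1) is the energy of the first column, and for real data the (1,2) and (2,1) columns are mirror images of each other.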

fprintf('\n\n *** Benchmarking matrix column cross-correlation*** \n\n');
fprintf('Benchmarking function :\n');
type('benchXcorrMatrix');
fprintf('\n\n');

sizes = 10:10:100;
tc = zeros(1,numel(sizes));
tg = zeros(1,numel(sizes));
numruns = 10;

for s = 1:numel(sizes)
    fprintf('Running xcorr (matrix) of a %d x %d matrix...\n', sizes(s), sizes(s));
    delchar = repmat('\b', 1,numruns);

    a = rand(sizes(s));
    tc(s) = benchXcorrMatrix(a, numruns);
    fprintf([delchar '\t\tCPU  time : %.2f ms\n'], 1000*tc(s));
    tg(s) = benchXcorrMatrix(gpuArray(a), numruns);
    fprintf([delchar '\t\tGPU time :  %.2f ms\n'], 1000*tg(s));
end

%Plot the results
fig = figure;
ax = axes('parent', fig);
plot(ax, sizes.^2, tc./tg, 'r*-');
ylabel(ax, 'Speedup');
xlabel(ax, 'Matrix Elements');
title(ax, 'GPU Acceleration of XCORR (Matrix)');
drawnow;

 *** Benchmarking matrix column cross-correlation*** 

Benchmarking function :

function t = benchXcorrMatrix(A, numruns)
%Used to benchmark xcorr with Matrix input on CPU and GPU.
    
%   Copyright 2012 The MathWorks, Inc.

    timevec = zeros(1,numruns);
    gdev = gpuDevice;
    for ii=1:numruns,
        ts = tic;
        o = xcorr(A); %#ok<NASGU>
        wait(gdev)
        timevec(ii) = toc(ts);
        fprintf('.');
    end
    t = min(timevec);
end


Running xcorr (matrix) of a 10 x 10 matrix...
		CPU  time : 0.78 ms
		GPU time :  3.65 ms
Running xcorr (matrix) of a 20 x 20 matrix...
		CPU  time : 1.39 ms
		GPU time :  3.40 ms
Running xcorr (matrix) of a 30 x 30 matrix...
		CPU  time : 2.26 ms
		GPU time :  3.47 ms
Running xcorr (matrix) of a 40 x 40 matrix...
		CPU  time : 6.04 ms
		GPU time :  3.79 ms
Running xcorr (matrix) of a 50 x 50 matrix...
		CPU  time : 10.23 ms
		GPU time :  3.92 ms
Running xcorr (matrix) of a 60 x 60 matrix...
		CPU  time : 15.18 ms
		GPU time :  4.16 ms
Running xcorr (matrix) of a 70 x 70 matrix...
		CPU  time : 35.34 ms
		GPU time :  5.58 ms
Running xcorr (matrix) of a 80 x 80 matrix...
		CPU  time : 46.71 ms
		GPU time :  6.26 ms
Running xcorr (matrix) of a 90 x 90 matrix...
		CPU  time : 59.43 ms
		GPU time :  7.04 ms
Running xcorr (matrix) of a 100 x 100 matrix...
		CPU  time : 74.93 ms
		GPU time :  8.21 ms

Benchmarking Two-Dimensional Cross-Correlation

For the final case, two matrices, X and Y, are cross-correlated using xcorr2(X,Y). X is fixed at 100-by-100 while the size of the square matrix Y varies from 100-by-100 to 2000-by-2000. The speedup is plotted against the number of elements in Y.
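The full two-dimensional correlation is equivalent to convolving X with Y conjugated and rotated by 180 degrees. For an Ma-by-Na X and an Mb-by-Nb Y the output is (Ma+Mb-1)-by-(Na+Nb-1), so the largest case benchmarked here produces a 2099-by-2099 result. The sketch below uses conv2, which also accepts gpuArray inputs, to illustrate the definition on a small example; it is not a claim about how xcorr2 is implemented internally.

```matlab
% 2-D cross-correlation expressed as a convolution (sketch).
X = [1 2; 3 4];
Y = ones(2);
C = conv2(X, rot90(conj(Y), 2));   % full correlation, size 3-by-3
disp(C)                            % [1 3 2; 4 10 6; 3 7 4]

% Output size for the largest case benchmarked: 100-by-100 X and 2000-by-2000 Y
outSize = [100 100] + [2000 2000] - 1;
disp(outSize)                      % 2099 2099
```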

fprintf('\n\n *** Benchmarking 2-D cross-correlation*** \n\n');
fprintf('Benchmarking function :\n');
type('benchXcorr2');
fprintf('\n\n');

sizes = [100, 200, 500, 1000, 1500, 2000];
tc = zeros(1,numel(sizes));
tg = zeros(1,numel(sizes));
numruns = 4;
a = rand(100);

for s = 1:numel(sizes)
    fprintf('Running xcorr2 of a 100x100 matrix and %d x %d matrix...\n', sizes(s), sizes(s));
    delchar = repmat('\b', 1,numruns);

    b = rand(sizes(s));
    tc(s) = benchXcorr2(a, b, numruns);
    fprintf([delchar '\t\tCPU  time : %.2f ms\n'], 1000*tc(s));
    tg(s) = benchXcorr2(gpuArray(a), gpuArray(b), numruns);
    fprintf([delchar '\t\tGPU time :  %.2f ms\n'], 1000*tg(s));
end

%Plot the results
fig = figure;
ax = axes('parent', fig);
semilogx(ax, sizes.^2, tc./tg, 'r*-');
ylabel(ax, 'Speedup');
xlabel(ax, 'Matrix Elements');
title(ax, 'GPU Acceleration of XCORR2');
drawnow;

fprintf('\n\nBenchmarking completed.\n\n');

 *** Benchmarking 2-D cross-correlation*** 

Benchmarking function :

function t = benchXcorr2(X, Y, numruns)
%Used to benchmark xcorr2 on the CPU and GPU.

%   Copyright 2012 The MathWorks, Inc.
 
    timevec = zeros(1,numruns);
    gdev = gpuDevice;
    for ii=1:numruns,
        ts = tic;
        o = xcorr2(X,Y); %#ok<NASGU>
        wait(gdev)
        timevec(ii) = toc(ts);
        fprintf('.');
    end
    t = min(timevec);
end


Running xcorr2 of a 100x100 matrix and 100 x 100 matrix...
		CPU  time : 15.92 ms
		GPU time :  10.49 ms
Running xcorr2 of a 100x100 matrix and 200 x 200 matrix...
		CPU  time : 31.35 ms
		GPU time :  20.30 ms
Running xcorr2 of a 100x100 matrix and 500 x 500 matrix...
		CPU  time : 159.65 ms
		GPU time :  60.75 ms
Running xcorr2 of a 100x100 matrix and 1000 x 1000 matrix...
		CPU  time : 552.55 ms
		GPU time :  188.32 ms
Running xcorr2 of a 100x100 matrix and 1500 x 1500 matrix...
		CPU  time : 1228.93 ms
		GPU time :  391.52 ms
Running xcorr2 of a 100x100 matrix and 2000 x 2000 matrix...
		CPU  time : 2402.89 ms
		GPU time :  655.20 ms


Benchmarking completed.

Other GPU Accelerated Signal Processing Functions

There are several other signal processing functions that can run on the GPU, including fft, ifft, conv, filter, and fftfilt. In some cases, you can achieve large acceleration relative to the CPU. For a full list of GPU-accelerated signal processing functions, see the "GPU Acceleration" section in the Signal Processing Toolbox™ table of contents.
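The pattern is the same as for xcorr: move the data to the GPU with gpuArray, call the function as usual, and bring the result back with gather. A minimal sketch with filter, assuming a supported GPU and Parallel Computing Toolbox; the 20-tap moving-average filter is an arbitrary choice for illustration.

```matlab
% Sketch, assuming a supported GPU and Parallel Computing Toolbox.
x = rand(1e6, 1);
b = ones(1, 20) / 20;                        % 20-tap moving-average FIR filter
yCPU = filter(b, 1, x);                      % CPU reference
yGPU = gather(filter(b, 1, gpuArray(x)));    % run on the GPU, copy the result back
fprintf('max abs difference: %g\n', max(abs(yCPU - yGPU)));
```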
