File Exchange

Fast 2D GPU-based convolution

version 1.0 (49 KB) by

Graphics chip assisted fast 2d convolution

7 Downloads

cudaconv - Performs 2d convolution using an NVIDIA graphics chipset.

For large datasets (~1 million elements), and especially for large kernels (run time grows very little with kernel size), cudaconv can outperform conv2 by as much as 5000%.
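A rough way to check the speed claim on your own hardware is sketched below. The matrix and kernel sizes are arbitrary illustrative choices, and the script assumes cudaconv accepts the same (data, kernel) arguments shown in the comments on this page. The warm-up call is an assumption about GPU start-up cost, not something the submission documents.

```matlab
% Rough timing sketch: conv2 versus cudaconv on the same inputs.
n = 1024;  k = 31;                 % illustrative sizes (n^2 is about 1 million elements)
A = rand(n);
K = ones(k) / k^2;                 % simple averaging kernel

tic;  refOut = conv2(A, K, 'same');  tCpu = toc;
fprintf('conv2:    %.4f s\n', tCpu);

% exist(...) == 3 means a MEX-file with that name is on the path.
if exist('cudaconv', 'file') == 3
    cudaconv(A, K);                % warm-up call (assumed to absorb one-time GPU setup)
    tic;  gpuOut = cudaconv(A, K);  tGpu = toc;
    fprintf('cudaconv: %.4f s (%.1fx faster)\n', tGpu, tCpu / tGpu);
else
    disp('cudaconv MEX-file not found on the path; skipping GPU timing.');
end
```

A single tic/toc measurement is noisy, so repeating the timed calls a few times and taking the minimum gives a fairer comparison.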

I did not create this algorithm. It is adapted from an example included in the CUDA SDK and wrapped in MATLAB-compatible C code.

With very large data matrices, it can *completely* crash your computer (or possibly just the graphics driver), so beware. In testing, I found an upper limit on convolution size of roughly 2^20 elements (imposed either by the size the CUDA FFT function can accept or by the maximum size of a 2D texture), so above that limit the code breaks the convolution into smaller pieces. If you are feeling adventurous, feel free to raise that limit, but be aware that at those sizes cudaconv is already roughly 50-100x faster than conv2.
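The idea of splitting a large convolution into smaller pieces can be illustrated on the CPU with conv2. The sketch below uses overlap-add over row strips; it is not taken from the cudaconv source, and the function name blockconv2 and the parameter maxElems are hypothetical. It relies only on the linearity of convolution: each strip is convolved in 'full' mode and the overlapping rows are summed.

```matlab
function C = blockconv2(A, K, maxElems)
% BLOCKCONV2  Full 2-D convolution computed strip by strip (overlap-add).
%   maxElems is a hypothetical cap on the number of elements per strip.
[am, an] = size(A);
[km, kn] = size(K);
rowsPerBlock = max(1, floor(maxElems / an));   % rows of A handled per strip
C = zeros(am + km - 1, an + kn - 1);           % size of the 'full' result

for r0 = 1:rowsPerBlock:am
    r1 = min(r0 + rowsPerBlock - 1, am);
    part = conv2(A(r0:r1, :), K);              % (r1 - r0 + km) rows
    rows = r0:(r1 + km - 1);                   % where this strip lands in C
    C(rows, :) = C(rows, :) + part;            % sum the overlapping rows
end
end
```

For example, blockconv2(magic(8), ones(3), 16) processes two rows at a time and should agree with conv2(magic(8), ones(3)) up to rounding.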

Comments and Ratings (13)

DBrown

Has anybody successfully compiled and run the code under Windows, with correct results?

Try adding '-m 64' to the nvcc compile line.

I had similar issues on MacOS (10.6.7+) because 'uname -a' returns i386 but gcc builds for x86_64 by default. nvcc tries to 'autodetect' but gets the wrong value.

I hope this helps.

Diego Ardila

I get the same result as Dung Chu when I use the .mexmaci file which is included with the download.

I believe that you are supposed to delete that file, and create a new one using make. (Go to that directory in terminal, type 'make')

However, when I do this I get architecture issues that I do not know how to deal with.
When compiling, I get warnings like this:

warning: in cudaconv.o, file was built for i386 which is not the architecture being linked (x86_64)

When using the resulting file I get this:
c = cudaconv(2,2)
??? Invalid MEX-file '/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci': dlopen(/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci, 1): no suitable image found. Did find:
/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci: mach-o, but wrong architecture.
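One way to narrow down an architecture mismatch like this is to ask MATLAB which MEX extension it expects and then inspect the binary that is actually on the path. The snippet below is a minimal sketch; it assumes the file is named cudaconv and, on Mac or Linux, that the standard `file` utility is available.

```matlab
% Compare what this MATLAB session expects with the cudaconv file it finds.
expectedExt = mexext;                  % e.g. 'mexmaci' (32-bit) or 'mexmaci64'
fprintf('MATLAB architecture: %s, expected MEX extension: .%s\n', ...
        computer('arch'), expectedExt);

found = which('cudaconv');             % empty if nothing named cudaconv is on the path
if isempty(found)
    disp('No cudaconv file found on the path.');
else
    [folder, name, ext] = fileparts(found);
    if ~strcmp(ext, ['.' expectedExt])
        fprintf('Found %s, which does not end in .%s\n', found, expectedExt);
    end
    if isunix                          % true on both Mac and Linux
        % The extension can be right while the binary inside is not,
        % so let the `file` utility report the actual architecture.
        system(['file "' found '"']);
    end
end
```

If `file` reports an architecture that differs from the one MATLAB runs as, the object files were built for the wrong target, and the compiler flags are the place to look.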

Dung Chu

It runs, but the result is weird. I ran this:
y = ones(5);
f = 1/5 * ones(3);
z = cudaconv(y, f)
z2 = conv2(y, f, 'same')

z =

   1.0e-35 *

   -0.1319 0.0000 -0.1319 0.0000 -0.1319
    0.0000 0 0.0000 0 0
         0 0 0.0000 0 -0.1320
   -0.1319 0.0000 -0.1319 0.0000 -0.1941
         0 0 0 0 0.0000

z2 =

    0.8000 1.2000 1.2000 1.2000 0.8000
    1.2000 1.8000 1.8000 1.8000 1.2000
    1.2000 1.8000 1.8000 1.8000 1.2000
    1.2000 1.8000 1.8000 1.8000 1.2000
    0.8000 1.2000 1.2000 1.2000 0.8000

I'm using Fedora 10 with MATLAB 2008. Does anyone have any idea why?
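A check like the one above can be automated. The helper below is a sketch with a hypothetical name (matchesConv2); it assumes cudaconv returns a matrix the same size as its first input, as in the output shown here. It compares only the interior, since other comments on this page report that the edges differ from conv2, and its default tolerance allows for single-precision arithmetic on the GPU.

```matlab
function ok = matchesConv2(z, y, f, tol)
% MATCHESCONV2  True if z agrees with conv2(y, f, 'same') away from the edges.
%   z   - result to check (for example, the output of cudaconv(y, f))
%   tol - absolute tolerance; defaults to a loose single-precision bound
if nargin < 4
    tol = 1e3 * double(eps('single'));
end
ref = conv2(y, f, 'same');
ok = false;
if ~isequal(size(z), size(ref))
    return;                              % wrong size: cannot match
end
[fm, fn] = size(f);
pm = floor(fm / 2);                      % rows near the border to ignore
pn = floor(fn / 2);                      % columns near the border to ignore
zi = double(z(1+pm:end-pm, 1+pn:end-pn));
ri = ref(1+pm:end-pm, 1+pn:end-pn);
ok = all(abs(zi(:) - ri(:)) <= tol);
end
```

With y = ones(5) and f = 1/5 * ones(3), each interior output sums nine kernel entries of 1/5, so a correct result has 1.8 at every interior position.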

Oh HongSic

Dear Alex, I compiled this example as follows.

First, I inserted these lines into cudaconv.cu:
#pragma comment(lib,"C:\\CUDA\\lib64\\cufft.lib")
#pragma comment(lib,"C:\\CUDA\\lib64\\cudart.lib")

Next, build an object file:

>> system('c:\cuda\bin64\nvcc --compile "d:\cudaconv\cudaconv\cudaconv.cu" -ccbin "C:\Dev\msvs\VC\bin" -o cudaconv.o -IC:\Dev\MATLAB\R2009b\extern\include -IC:\Dev\Msvs\VC\include')

Finally, compile and link it:

>> mex ('cudaconv.o')

Good luck, and sorry for my poor English.
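The steps above hard-code the MATLAB include directory. A small variation, sketched below, builds the nvcc command from matlabroot instead. The helper name buildNvccCommand is hypothetical, nvcc is assumed to be on the system path, and the -ccbin option and any library settings from the steps above are left out, so they may need to be added back for a given setup.

```matlab
function cmd = buildNvccCommand(cuFile, objFile)
% BUILDNVCCCOMMAND  Assemble an nvcc compile command for a MEX source file.
%   Assumes nvcc is on the system path; compiler-specific options such as
%   -ccbin are not included and may need to be added for your setup.
includeDir = fullfile(matlabroot, 'extern', 'include');   % location of mex.h
cmd = sprintf('nvcc --compile "%s" -o "%s" -I"%s"', ...
              cuFile, objFile, includeDir);
end
```

A possible use is status = system(buildNvccCommand('cudaconv.cu', 'cudaconv.o')); followed by mex('cudaconv.o') only when status is 0.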

Alex

The documentation clearly states that Windows is not supported. I am trying to alter the MEX files to make this work. Has anyone had any luck getting this to work under Windows?

Don

I have not ventured outside of MATLAB yet. How do I compile this code so I can run it?

-D

Jveer

finally functions that use the GPU!

Bernd

The convolution is very fast and fairly accurate for the 'valid' part of a 2D signal (apart from the known double- versus single-precision difference), but there are big differences near the edges when using the 'same' shape. I therefore wrote a piece of shaping code so that it can be called like conv2. Please test and report any coding mistakes!
____________________________________________________
function [newimage] = cudaconv2(image,filter,shape)
if nargin == 2
    shape = 'full';
end

if (strcmp(shape, 'full')) % it's not a real 'full' convolution !!!!!
    [im in] = size(image);
    [fm fn] = size(filter);
    outM1 = 1;
    outN1 = 1;
    image2 = zeros(im+fm-1, in+fn-1);
    image2(round(fm/2):round(im + fm/2 - 1), round(fn/2):round(in + fn/2 - 1)) = image;
    output = cudaconv(image2,filter);
    [outM2, outN2] = size(output);
    
elseif (strcmp(shape, 'same')) % large differences on the edges
    output = cudaconv(image,filter);
    [Am An] = size(image);
    outM1 = 1;
    outN1 = 1;
    outM2 = Am;
    outN2 = An;

elseif (strcmp(shape, 'valid')) % very accurate
    output = cudaconv(image,filter);
    [Am An] = size(image);
    [Cm Rn] = size(filter);
    outM1 = round(Cm/2);
    outN1 = round(Rn/2);
    outM2 = round(Am - Cm/2);
    outN2 = round(An - Rn/2);
else
    % Raise an error so the function never returns with newimage unassigned.
    error('Shape type not valid');
end

newimage = output(outM1:outM2,outN1:outN2);
____________________________________________________
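For reference, the three shapes in conv2 are related by simple cropping of the 'full' result, which is what a wrapper like the one above has to reproduce. The sketch below checks the index ranges against conv2 itself, using an arbitrary odd-sized kernel; it runs on the CPU only and does not call cudaconv.

```matlab
% How 'same' and 'valid' relate to the 'full' result of conv2.
A = magic(6);
K = [1 2 1; 2 4 2; 1 2 1] / 16;          % arbitrary 3-by-3 smoothing kernel
[am, an] = size(A);
[km, kn] = size(K);

fullOut  = conv2(A, K);                   % (am+km-1)-by-(an+kn-1)

% 'same': the central am-by-an part of the full result.
sameOut  = fullOut(floor(km/2) + (1:am), floor(kn/2) + (1:an));

% 'valid': only positions where the kernel lies entirely inside A.
validOut = fullOut(km:am, kn:an);

errSame  = max(max(abs(sameOut  - conv2(A, K, 'same'))));
errValid = max(max(abs(validOut - conv2(A, K, 'valid'))));
fprintf('max difference: same %g, valid %g\n', errSame, errValid);
```

Both differences should be zero or at rounding level, which makes this a convenient reference when testing a wrapper around another convolution routine.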

Yi Cao

It works as expected on my Geforce 8400 GPU.

Bjorn Bjorno

To solve the problem with the all-zeros output (see the previous message by Simon Knight), run the NVIDIA CUDA toolkit installer again, opt for the customized installation and check 'CUDAKext'. After rebooting, the cudaconv function should run perfectly.

Simon Knight

Hi, I am only getting a matrix of zeros when I run this:
>> y = rand(64);
>> f = 1/9*ones(3);
>> z1 = conv2(y,f, 'same');
>> z2 = cudaconv(y,f);
>> any(any(z1))
ans =
     1
>> any(any(z2))
ans =
     0

I am using R2007a, and have tried this on OS X. Is the latest zip file supplied above the one with the corrected header file?
This stuff looks promising, so I'd be very keen to try it.
Thanks!

Alex Huth

Sorry there was a missing header file -- all should be fixed when the update is posted.

Updates

Fixed missing header file, removed unnecessary file resource forks, reformatted m-file help.

MATLAB Release
MATLAB 7.4 (R2007a)
