Code covered by the BSD License  

Fast 2D GPU-based convolution

Rating: 4.2 (5 ratings) | 37 downloads (last 30 days) | File size: 49 KB | File ID: #20220
09 Jun 2008

Graphics-chip-assisted fast 2D convolution

File Information
Description

cudaconv - Performs 2D convolution using an NVIDIA graphics chipset.

For large datasets (~1 million elements), and especially for large kernels (performance does not scale much with kernel size), cudaconv can outperform conv2 by as much as 5000% (roughly 50 times faster).
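A rough operation count helps explain why run time barely depends on kernel size. It assumes that cudaconv follows the FFT-based approach of the CUDA SDK example it is adapted from, and it ignores constant factors and memory traffic. For an M x N image and a P x Q kernel, padded to an M' x N' grid for the FFT:

$$
C_{\text{direct}} \approx M\,N\,P\,Q
\qquad\text{vs.}\qquad
C_{\text{FFT}} = O\!\left(M' N' \log_2 (M' N')\right),
\quad M' \ge M+P-1,\; N' \ge N+Q-1
$$

Take M = N = 1024 (about 1 million elements) and P = Q = 31. The direct method needs about 1024^2 x 31^2 = 1,048,576 x 961 ≈ 1.0 x 10^9 multiply-adds. The padded FFT grid has about 1054^2 ≈ 1.1 x 10^6 points, with log2 ≈ 20. Doubling the kernel side length quadruples the direct cost but enlarges M'N' by only a few percent.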

I did not create this algorithm; it is adapted from an example included in the CUDA SDK and wrapped in MATLAB-compatible C code.

With very large data matrices it can *completely* crash your computer (or possibly just the graphics driver), so beware. In testing, I found an upper limit on convolution size of roughly 2^20 elements. That limit is set either by the largest input the CUDA FFT function accepts or by the maximum size of a 2D texture, so above it the code breaks the convolution into smaller pieces. If you are feeling adventurous, feel free to raise that limit, but be aware that at those sizes cudaconv is already roughly 50-100x faster than conv2.
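The description does not say exactly how the pieces are recombined. One standard way to split a large convolution is overlap-add: convolve each block separately and sum the overlapping partial results. The pure-Python sketch below uses hypothetical helpers `conv2_full` and `conv2_overlap_add`, which are not part of cudaconv. A direct CPU loop stands in for the size-limited GPU/FFT call.

```python
def conv2_full(a, k):
    """Direct full 2D convolution, like MATLAB's conv2(a, k).

    a and k are lists of rows; the result has
    (rows(a) + rows(k) - 1) x (cols(a) + cols(k) - 1) entries."""
    ma, na = len(a), len(a[0])
    mk, nk = len(k), len(k[0])
    out = [[0.0] * (na + nk - 1) for _ in range(ma + mk - 1)]
    for i in range(ma):
        for j in range(na):
            for p in range(mk):
                for q in range(nk):
                    out[i + p][j + q] += a[i][j] * k[p][q]
    return out


def conv2_overlap_add(a, k, tile):
    """Full 2D convolution computed piece by piece (overlap-add).

    Each tile x tile block of a is convolved on its own (conv2_full stands
    in for a size-limited GPU call). Neighbouring partial results overlap
    by (kernel size - 1) rows/columns and are summed into out."""
    ma, na = len(a), len(a[0])
    mk, nk = len(k), len(k[0])
    out = [[0.0] * (na + nk - 1) for _ in range(ma + mk - 1)]
    for r0 in range(0, ma, tile):
        for c0 in range(0, na, tile):
            block = [row[c0:c0 + tile] for row in a[r0:r0 + tile]]
            part = conv2_full(block, k)
            for i, prow in enumerate(part):
                for j, v in enumerate(prow):
                    out[r0 + i][c0 + j] += v
    return out


if __name__ == "__main__":
    # 7 x 9 test image with small integer values and a 3 x 4 kernel
    a = [[float((7 * (9 * i + j)) % 5 - 2) for j in range(9)] for i in range(7)]
    k = [[(4 * p + q) % 3 + 0.5 for q in range(4)] for p in range(3)]
    direct = conv2_full(a, k)
    tiled = conv2_overlap_add(a, k, tile=4)
    worst = max(abs(x - y)
                for drow, trow in zip(direct, tiled)
                for x, y in zip(drow, trow))
    print("max |direct - tiled| =", worst)
```

Here `tile` plays the role of the roughly 2^20-element limit. In a GPU implementation each block would go through the FFT path rather than the direct loop used here.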

MATLAB release: MATLAB 7.4 (R2007a)
Other requirements: To compile and run this software, you need the NVIDIA CUDA Toolkit (http://www.nvidia.com/object/cuda_get.html) and a modern NVIDIA graphics card. Tested on OS X 10.5; assumed to work under any Linux distribution; no guarantees on Windows.
Comments and Ratings (13)
09 May 2012 Xuefeng

Has anybody successfully compiled and run the code under Windows, with correct results?

20 Aug 2011 Bogdan Vacaliuc

Try adding '-m 64' to the nvcc compile line.

I had similar issues on MacOS (10.6.7+) because 'uname -a' returns i386 but gcc builds for x86_64 by default. nvcc tries to 'autodetect' but gets the wrong value.

I hope this helps.

06 Jun 2011 Diego Ardila

I get the same result as Dung Chu when I use the .mexmaci file that is included with the download.

I believe you are supposed to delete that file and build a new one with make (go to that directory in a terminal and type 'make').

However, when I do this I run into architecture issues that I do not know how to deal with.
When compiling, I get warnings like this:

warning: in cudaconv.o, file was built for i386 which is not the architecture being linked (x86_64)

When using the resulting file I get this:
c = cudaconv(2,2)
??? Invalid MEX-file '/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci': dlopen(/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci, 1): no suitable image found. Did find:
/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci: mach-o, but wrong architecture.

01 Apr 2010 Dung Chu

It runs, but the result is strange. I ran this:
y = ones(5);
f = 1/5 * ones(3);
z = cudaconv(y, f)
z2 = conv2(y, f, 'same')

z =

1.0e-35 *

-0.1319 0.0000 -0.1319 0.0000 -0.1319
0.0000 0 0.0000 0 0
0 0 0.0000 0 -0.1320
-0.1319 0.0000 -0.1319 0.0000 -0.1941
0 0 0 0 0.0000

z2 =

0.8000 1.2000 1.2000 1.2000 0.8000
1.2000 1.8000 1.8000 1.8000 1.2000
1.2000 1.8000 1.8000 1.8000 1.2000
1.2000 1.8000 1.8000 1.8000 1.2000
0.8000 1.2000 1.2000 1.2000 0.8000

I'm using Fedora 10 with MATLAB 2008. Does anyone have any idea why?
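Comparing against a CPU reference is one way to tell a broken build apart from ordinary single-precision rounding. The pure-Python sketch below uses hypothetical helpers `conv2_same` and `max_rel_error`. The 1e-4 tolerance is an assumption sized for single-precision GPU arithmetic. The sketch reproduces the z2 matrix above and shows that an output like z (entries around 1e-35) is nowhere near that tolerance.

```python
def conv2_same(a, k):
    """CPU reference for conv2(a, k, 'same'): the central part of the full
    2D convolution, with the same size as a (lists of rows)."""
    ma, na = len(a), len(a[0])
    mk, nk = len(k), len(k[0])
    full = [[0.0] * (na + nk - 1) for _ in range(ma + mk - 1)]
    for i in range(ma):
        for j in range(na):
            for p in range(mk):
                for q in range(nk):
                    full[i + p][j + q] += a[i][j] * k[p][q]
    r0, c0 = mk // 2, nk // 2  # 0-based offset conv2 uses for 'same'
    return [row[c0:c0 + na] for row in full[r0:r0 + ma]]


def max_rel_error(result, reference):
    """Largest |result - reference|, divided by the largest |reference|."""
    scale = max(abs(v) for row in reference for v in row) or 1.0
    worst = max(abs(x - y)
                for rrow, frow in zip(result, reference)
                for x, y in zip(rrow, frow))
    return worst / scale


if __name__ == "__main__":
    y = [[1.0] * 5 for _ in range(5)]
    f = [[1.0 / 5] * 3 for _ in range(3)]
    z2 = conv2_same(y, f)
    for row in z2:
        print(" ".join(f"{v:.4f}" for v in row))
    tol = 1e-4  # assumed tolerance for single-precision GPU output
    # An essentially all-zero result fails the check by a wide margin.
    print(max_rel_error([[0.0] * 5 for _ in range(5)], z2) > tol)
```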

23 Feb 2010 Oh HongSic

Dear Alex, I compiled this example as follows.

First, I inserted these lines into cudaconv.cu:
#pragma comment(lib,"C:\\CUDA\\lib64\\cufft.lib")
#pragma comment(lib,"C:\\CUDA\\lib64\\cudart.lib")

Next, build an object file:

>> system('c:\cuda\bin64\nvcc --compile "d:\cudaconv\cudaconv\cudaconv.cu" -ccbin "C:\Dev\msvs\VC\bin" -o cudaconv.o -IC:\Dev\MATLAB\R2009b\extern\include -IC:\Dev\Msvs\VC\include')

Finally, compile and link it:

>> mex ('cudaconv.o')

Good luck, and sorry for my poor English.

03 Dec 2009 Alex

The documentation clearly states that Windows is not supported. I am trying to alter the MEX files to get this working. Has anyone had any luck getting this to work under Windows?

04 May 2009 Don

I have not ventured outside of MATLAB yet. How do I compile this code so I can run it?

-D

24 Apr 2009 Jveer

Finally, functions that use the GPU!

24 Apr 2009 Bernd

The convolution is very fast and quite accurate for the 'valid' part of a 2D signal (apart from the known single- versus double-precision difference), but there are big differences near the edges when using the 'same' shape. So I wrote a piece of shaping code that handles the shape argument the way conv2 does. Please test it and report any coding mistakes!
____________________________________________________
function [newimage] = cudaconv2(image, filter, shape)
% Wraps cudaconv so that the shape argument behaves like conv2's.
if nargin < 3
    shape = 'full';
end

if strcmp(shape, 'full') % it's not a real 'full' convolution !!!!!
    [im, in] = size(image);
    [fm, fn] = size(filter);
    % zero-pad the image to the full output size, then convolve
    r0 = round(fm/2);
    c0 = round(fn/2);
    image2 = zeros(im+fm-1, in+fn-1);
    image2(r0:r0+im-1, c0:c0+in-1) = image;
    output = cudaconv(image2, filter);
    outM1 = 1;
    outN1 = 1;
    [outM2, outN2] = size(output);

elseif strcmp(shape, 'same') % large differences on the edges
    output = cudaconv(image, filter);
    [Am, An] = size(image);
    outM1 = 1;
    outN1 = 1;
    outM2 = Am;
    outN2 = An;

elseif strcmp(shape, 'valid') % very accurate
    output = cudaconv(image, filter);
    [Am, An] = size(image);
    [Cm, Rn] = size(filter);
    outM1 = round(Cm/2);
    outN1 = round(Rn/2);
    outM2 = round(Am - Cm/2);
    outN2 = round(An - Rn/2);
else
    % fail loudly instead of returning without assigning newimage
    error('cudaconv2:shape', 'Shape must be ''full'', ''same'' or ''valid''.');
end

newimage = output(outM1:outM2, outN1:outN2);
____________________________________________________
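For checking a wrapper like cudaconv2 on small inputs, it helps to know where conv2 takes each shape from the full convolution. In 1-based terms, 'same' starts at row floor(fm/2)+1 and column floor(fn/2)+1 of the full result, and 'valid' starts at row fm, column fn. The pure-Python sketch below applies those offsets with 0-based slicing. It is a CPU stand-in with hypothetical names `conv2_full` and `conv2_shape`, not part of cudaconv.

```python
def conv2_full(a, k):
    """Direct full 2D convolution of a with k (lists of rows)."""
    ma, na = len(a), len(a[0])
    mk, nk = len(k), len(k[0])
    out = [[0.0] * (na + nk - 1) for _ in range(ma + mk - 1)]
    for i in range(ma):
        for j in range(na):
            for p in range(mk):
                for q in range(nk):
                    out[i + p][j + q] += a[i][j] * k[p][q]
    return out


def conv2_shape(a, k, shape="full"):
    """Mimic conv2(a, k, shape) by cropping the full convolution."""
    full = conv2_full(a, k)
    ma, na = len(a), len(a[0])
    mk, nk = len(k), len(k[0])
    if shape == "full":
        return full
    if shape == "same":
        # central part, same size as a
        r0, c0, m, n = mk // 2, nk // 2, ma, na
    elif shape == "valid":
        # only outputs computed without zero padding (may be empty)
        r0, c0 = mk - 1, nk - 1
        m, n = max(ma - mk + 1, 0), max(na - nk + 1, 0)
    else:
        raise ValueError("shape must be 'full', 'same' or 'valid'")
    return [row[c0:c0 + n] for row in full[r0:r0 + m]]


if __name__ == "__main__":
    a = [[1, 2, 3, 4]]
    for shape in ("full", "same", "valid"):
        print(shape, conv2_shape(a, [[1, 1, 1]], shape))
```

With an even-length kernel, 'same' keeps the later of the two possible central windows. This is why the offsets use floor(fm/2) rather than a symmetric trim.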

09 Apr 2009 Yi Cao

It works as expected on my Geforce 8400 GPU.

01 Mar 2009 Bjorn Bjorno

To solve the problem with the all-zeros output (see the earlier message by Simon Knight), run the NVIDIA CUDA Toolkit installer again, choose the customized installation, and check 'CUDAKext'. After rebooting, the cudaconv function should run perfectly.

22 Sep 2008 Simon Knight

Hi, I am only getting a matrix of zeros when I run this:
>> y = rand(64);
>> f = 1/9*ones(3);
>> z1 = conv2(y,f, 'same');
>> z2 = cudaconv(y,f);
>> any(any(z1))
ans =
1
>> any(any(z2))
ans =
0

I am using R2007a and have tried it on OS X. Is the latest zip file above the one with the corrected header file?
This stuff looks promising, so I'd be very keen to try it.
Thanks!

17 Jun 2008 Alex Huth

Sorry, there was a missing header file; everything should be fixed once the update is posted.

Updates
17 Jun 2008

Fixed missing header file, removed unnecessary file resource forks, reformatted m-file help.
