Hello to all!
This is my first question here! I'm running an image-processing code and trying to move it onto GPU arrays in order to reduce the run time. The main part of the code is below. When I execute it with all the matrices loaded as gpuArrays, it takes much longer than with ordinary arrays in the workspace. I'm very new to GPU processing and this is my first try. Can someone explain this delay to me?
Thanks a lot!
PS: I'm using MATLAB R2013a on a MacBook Pro.
for i = 1:dim(1)
    for j = 1:dim(2)
        % local window bounds, clipped at the image border
        iMin = max(i-w,1);
        iMax = min(i+w,dim(1));
        jMin = max(j-w,1);
        jMax = min(j+w,dim(2));
        I = A(iMin:iMax,jMin:jMax);
        % range weights around the center pixel A(i,j)
        H = exp(-(I-A(i,j)).^2/(2*sigma_r^2));
        % combine with the spatial kernel G and normalize
        F = H.*G((iMin:iMax)-i+w+1,(jMin:jMax)-j+w+1);
        B(i,j) = sum(F(:).*I(:))/sum(F(:));
    end
end
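(Not part of the original question, just for reference: a minimal sketch of what "loading all the matrices as gpuArrays" presumably looks like, assuming the Parallel Computing Toolbox calls gpuArray/gather; the actual calls are not shown above.)

A = gpuArray(A);                     % move the image to GPU memory
G = gpuArray(G);                     % move the spatial kernel to GPU memory
B = gpuArray(zeros(dim(1),dim(2)));  % output allocated on the GPU
% ... run the double loop above unchanged ...
B = gather(B);                       % copy the result back to host memory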
Answer by Matt J:
You're not using any of gpuArray's accelerated functions as far as I can see, so it's no wonder it is slow. The computations you're doing also don't look terribly appropriate for GPU acceleration. About the only thing that can be split into parallel work is the iterations of the for-loop, and that is best done on the CPU. You might try the version below, which uses more vectorization and is also reorganized to use PARFOR.
II = 1:dim(1);  JJ = 1:dim(2);
IMin = max(II-w,1);  IMax = min(II+w,dim(1));   % window bounds precomputed per row
JMin = max(JJ-w,1);  JMax = min(JJ+w,dim(2));   % and per column
z = 2*sigma_r^2;                                % precomputed range-kernel denominator
wplus1 = w+1;
B = zeros(dim(1),dim(2));
parfor i = 1:dim(1)
    irange = IMin(i):IMax(i);
    Brow = zeros(1,dim(2));   % accumulate one row so B remains a valid sliced output
    for j = 1:dim(2)
        jrange = JMin(j):JMax(j);
        I = A(irange,jrange);
        H = exp(-(I-A(i,j)).^2/z);
        F = H.*G(irange+(wplus1-i),jrange+(wplus1-j));
        Brow(j) = sum(F(:).*I(:))/sum(F(:));
    end
    B(i,:) = Brow;
end
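(A usage sketch, not part of the answer above: on R2013a a pool of workers is opened with MATLABPOOL before PARFOR runs in parallel (PARPOOL in newer releases), and TIC/TOC gives a quick timing comparison against the original loop. The function name bilateralParfor is only a placeholder for the PARFOR code above wrapped into a function.)

matlabpool open                          % start local workers (R2013a syntax; parpool in newer releases)
tic
B = bilateralParfor(A, G, w, sigma_r);   % placeholder name for the PARFOR loop above
toc
matlabpool close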