MATLAB Answers

## GPU processing slower than CPU

Asked by Evripidis

on 28 Jun 2013
Accepted Answer by Evripidis

Hello to all!

This is my first question here! I am running image-processing code and trying to move it to gpuArrays to reduce the run time. The main part of the code is below. When I run it with all matrices loaded as gpuArrays, it takes much longer than with ordinary arrays in the workspace. I'm very new to GPU processing and this is my first attempt. Can someone explain where this delay comes from?

Thanks a lot

PS: I'm using MATLAB R2013a on a MacBook Pro

```
for i = 1:dim(1)
    for j = 1:dim(2)
        iMin = max(i-w,1);
        iMax = min(i+w,dim(1));
        jMin = max(j-w,1);
        jMax = min(j+w,dim(2));
        I = A(iMin:iMax,jMin:jMax);
        H = exp(-(I-A(i,j)).^2/(2*sigma_r^2));
        F = H.*G((iMin:iMax)-i+w+1,(jMin:jMax)-j+w+1);
        B(i,j) = sum(F(:).*I(:))/sum(F(:));
    end
end
```

## 3 Answers

Answer by Evripidis on 28 Jun 2013 (accepted answer; edited by Matt J on 29 Jun 2013)

Relocated to Comment by Matt J

Answer by Matt J on 28 Jun 2013 (edited by Matt J on 28 Jun 2013)

You're not using any of gpuArray's accelerated functions as far as I can see, so it's no wonder that it is slow. The computations you're doing also don't look well suited to GPU acceleration. About the only thing that can be split up in parallel is the set of for-loop iterations, and that is best done on the CPU. You might try the version below, which is more vectorized and is reorganized to use PARFOR.

```
    % Precompute the clipped window bounds for every row and column
    II = 1:dim(1);
    JJ = 1:dim(2);
    [III,JJJ] = ndgrid(II,JJ);

    IMin = max(II - w,1);
    IMax = min(II + w,dim(1));
    JMin = max(JJ - w,1);
    JMax = min(JJ + w,dim(2));

    z = 2*sigma_r^2;
    wplus1 = w+1;

    B = zeros(dim(1),dim(2));   % preallocate the output
    parfor k = 1:numel(JJJ)
        i = III(k); j = JJJ(k);

        irange = IMin(i):IMax(i);
        jrange = JMin(j):JMax(j);

        I = A(irange,jrange);
        H = exp(-(I-A(i,j)).^2/z);
        F = H.*G(irange+(wplus1-i),jrange+(wplus1-j));
        % parfor needs the output indexed by the loop variable;
        % B(k) is the same element as B(i,j)
        B(k) = sum(F(:).*I(:))/sum(F(:));
    end
```
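
One detail worth checking when two nested loops are flattened into a single loop like this: `ndgrid` enumerates the grid in column-major order, so the linear index `k` addresses the same element as the subscripts `(III(k), JJJ(k))`. That is what would let a result be stored with a single loop-variable index such as `B(k)`. A minimal sketch, using a hypothetical 3-by-4 size that is not from the post:

```matlab
dim = [3 4];                               % hypothetical image size
[III, JJJ] = ndgrid(1:dim(1), 1:dim(2));   % same grids as in the loop setup

k = 8;                                     % any linear index in 1:prod(dim)
[i, j] = ind2sub(dim, k);                  % subscripts MATLAB associates with k

% Both pairs should agree for every k, not just this one
sameElement = isequal([III(k) JJJ(k)], [i j]);
allAgree = isequal(sub2ind(dim, III(:), JJJ(:)), (1:prod(dim))');
```

Both `sameElement` and `allAgree` should come out true, since `ndgrid` varies its first output fastest, which matches MATLAB's storage order.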

Matt J on 29 Jun 2013

Evripidis Commented:

Thanks for the answer... :) I convert the matrices to gpuArrays before this part of the program runs. I only posted the important part of the code here, so A, I, G and H are all already gpuArrays.

Matt J on 29 Jun 2013

Hi Evripidis,

Yes, I understood that A, I, G and H are already gpuArrays. But converting arrays to gpuArrays doesn't magically make everything you do with them faster. Only certain kinds of operations are accelerated for gpuArrays, such as arrayfun() and the functions listed here:

http://www.mathworks.com/help/distcomp/using-gpuarray.html#bsloua3-1

For-loops, in particular, are bad on the GPU. That's why I suggested that you convert all the gpuArrays back to normal arrays and try a parfor approach.
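
To illustrate the kind of restructuring that suits whole-array operations, here is a rough sketch (not from the thread) that loops over the window offsets instead of over the pixels. Each pass then does element-wise arithmetic on large sub-blocks of the image, which is the sort of work gpuArray supports. The test image, `sigma_d`, and the Gaussian form of `G` are assumptions made only so the example is self-contained, and the `'like'` option assumes a MATLAB release that supports it for the array type in use.

```matlab
% Hypothetical test data; none of these values come from the original post.
A = rand(32, 48);              % wrap in gpuArray(...) to try it on a GPU
w = 3;                         % window half-width
sigma_d = 2;                   % assumed spatial standard deviation
sigma_r = 0.1;                 % range standard deviation
[X, Y] = meshgrid(-w:w, -w:w);
G = exp(-(X.^2 + Y.^2) / (2*sigma_d^2));   % assumed Gaussian spatial kernel

[m, n] = size(A);
z = 2*sigma_r^2;
num = zeros(m, n, 'like', A);  % running sum of F.*I for every pixel
den = zeros(m, n, 'like', A);  % running sum of F for every pixel

for di = -w:w
    for dj = -w:w
        % Output pixels whose neighbour at offset (di,dj) lies inside the image
        r = max(1, 1-di):min(m, m-di);
        c = max(1, 1-dj):min(n, n-dj);
        S = A(r+di, c+dj);                                   % neighbour values
        F = G(di+w+1, dj+w+1) * exp(-(S - A(r, c)).^2 / z);  % combined weight
        num(r, c) = num(r, c) + F .* S;
        den(r, c) = den(r, c) + F;
    end
end
B = num ./ den;                % same weighted average as the per-pixel loop
```

The loop body now runs (2w+1)^2 times rather than once per pixel, and pixels near the border simply receive fewer contributions, which mirrors the clipped window in the original code. Whether this actually beats the CPU on a particular GPU would still need to be measured.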

Answer by Evripidis on 1 Jul 2013

Thanks for the info, Matt J! :) I will implement this code with parfor loops to reduce the run time. :)
