Why for GPU is slower than CPU for this code? Is it because of sparsity or because of "for" loop

2 views (last 30 days)
I want to generate two 512 by 512 matrices, and change the value of about 1000-2000 points in each matrix.
Yet the GPU is significantly slower than the CPU, about 20 times slower. Is it because of the sparsity of the problem or because of the for loop?
I know that GPU for MATLAB can't deal with sparse matrix efficiently, and "for" loop is not well optimized by GPU, either. But which one is the bottle neck here?
Is there a clue that how I can evade this difficulty? Can't figure it out myself. Kinda of killing me.
Nx=512;
rhsx= zeros(Nx,Nx);
rhsy= zeros(Nx,Nx);
for i=1:256
bmatrix=randn(4,4);
rhsx(nx(i)-1:nx(i)+2,ny(i)-1:ny(i)+2)=...
rhsx(nx(i)-1:nx(i)+2,ny(i)-1:ny(i)+2)+rforcx(i)*bmatrix;
%"nx" and "ny" denote the position where a
%4 by 4 small square in the big matrix is changed.
rhsy(nx(i)-1:nx(i)+2,ny(i)-1:ny(i)+2)=...
rhsy(nx(i)-1:nx(i)+2,ny(i)-1:ny(i)+2)+rforcy(i)*bmatrix;
end

Answers (1)

Joss Knight
Joss Knight on 4 Feb 2016
Edited: Joss Knight on 4 Feb 2016
What you have here is a highly serial algorithm, accumulating small amounts of data inside a large array. To vectorize, perhaps consider looping over the 16 elements of your 4x4 square rather than over your nx and ny arrays. Something like:
bmatrix = randn(256,4,4);
for x = -1:2
for y = -1:2
linearIndex = sub2ind(size(rhsx), nx+x, ny+y);
rhsx(linearIndex) = rhsx(linearIndex) + rforcx.*bmatrix(:,x+2,y+2);
rhsy(linearIndex) = rhsy(linearIndex) + rforcy.*bmatrix(:,x+2,y+2);
end
end
This only works as long as the values of nx and ny are unique. If not, we need to go for the motherload, and vectorize the remaining loops using accumarray:
bmatrix = randn(256,16);
[x, y] = meshgrid(-1:2);
x = bsxfun(@plus, nx(:), x(:)');
y = bsxfun(@plus, ny(:), y(:)');
bmatrixx = bsxfun(@times, rforcx(:), bmatrix);
bmatrixy = bsxfun(@times, rforcy(:), bmatrix);
rhsx = accumarray([x(:) y(:)], bmatrixx(:), [Nx Nx], @sum);
rhsy = accumarray([x(:) y(:)], bmatrixy(:), [Nx Nx], @sum);
  2 Comments
Joss Knight
Joss Knight on 10 Feb 2016
Let me clarify:
% Get all the random boxes for all 256 positions. Each box
% is a row of this matrix.
bmatrix = randn(256,16);
% [x(:) y(:)] will be a list of the 16 offset coordinates for
% each box element
[x, y] = meshgrid(-1:2);
% Add every offset to every coordinate. The result is 256x16, that
% is one row per box, with 16 elements in each box. Now x and y
% identify every element of every box.
x = bsxfun(@plus, nx(:), x(:)');
y = bsxfun(@plus, ny(:), y(:)');
% Apply the multiplier to each box. So for every box b we multiply
% the appropriate elements rforcx(b) and rforcy(b).
bmatrixx = bsxfun(@times, rforcx(:), bmatrix);
bmatrixy = bsxfun(@times, rforcy(:), bmatrix);
% Now we have arrays that identify what to add together and where
% to put the results. accumarray can do the accumulation based
% on the list of coordinates [x(:) y(:)] and the data going into
% each element bmatrixx(:) and bmatrixy(:).
rhsx = accumarray([x(:) y(:)], bmatrixx(:), [Nx Nx], @sum);
rhsy = accumarray([x(:) y(:)], bmatrixy(:), [Nx Nx], @sum);

Sign in to comment.

Categories

Find more on Language Fundamentals in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!