Parallel matrix operations on GPU using arrayfun, how to accelerate for big matrix?
I am attempting to use the GPU to accelerate a for-loop problem.
The process is as follows:
for an n^2-by-n^2 matrix,
step 1: reshape each column (n^2 elements) to an n-by-n matrix,
step 2: apply a 2-D discrete sine transform,
step 3: reshape the result back into the n^2-by-n^2 matrix.
My computer currently takes 4.23 seconds for this operation. My question is: how can I improve the speed?
The code is like this:
% input
n_square=10000;
A= rand(n_square, n_square);
tic
B=bigMatrixMuliplication(A);
toc
% core function
function output = bigMatrixMuliplication(input)
n_square = size(input,1);
n = sqrt(n_square);
output = zeros(n_square, n_square);
for ii = 1:n_square
    output(:,ii) = reshape(dst2(reshape(input(:,ii), n, n)), n_square, 1); % steps 1-3
end
end
% 1-D discrete sine transform
function y = dst1(x)
n = size(x,1);
m = size(x,2);
y = [zeros(1,m); x];
y = -imag(fft(y, 2*n+2));
y = sqrt(2/(n+1)) * y(2:n+1,:);
end
% 2-D discrete sine transform
function y = dst2(x)
y = dst1(dst1(x).').';
end
One of my ideas is using parfor, but the result is similar, also about 5 seconds. Another idea is to split the matrix into a 3-D array, for example the 10000-by-10000 matrix into 10000 matrices of size 100-by-100, and use the GPU to process those matrices in parallel. But arrayfun on the GPU is even slower than on the CPU. Does anyone have an idea how to do this? Thank you very much!
A  = rand(100, 100, 5000);
A1 = gpuArray.rand(100, 100, 5000);
idx = 1:5000;
tic
B = arrayfun(@(i) dst2(A(:,:,i)), idx, 'UniformOutput', false);  % CPU
time1 = toc
tic
B = arrayfun(@(i) dst2(A1(:,:,i)), idx, 'UniformOutput', false); % GPU slices
time2 = toc
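For reference, the per-column loop can often be removed entirely: fft accepts a dimension argument, so the 1-D DST can be applied to a whole n-by-n-by-n^2 stack in one call, on CPU or gpuArray input alike. A minimal sketch (the function names dstAllColumns and dstDim1 are mine, not toolbox functions):

```matlab
function output = dstAllColumns(input)
% Batched 2-D DST: treat each column of input as an n-by-n image,
% stack the images along the third dimension, and apply the 1-D DST
% along both page dimensions via dimension-aware fft.
n_square = size(input,1);
n = sqrt(n_square);
X = reshape(input, n, n, n_square);                   % n x n x n^2 stack
X = dstDim1(X);                                       % DST down each column
X = permute(dstDim1(permute(X, [2 1 3])), [2 1 3]);   % DST along each row
output = reshape(X, n_square, n_square);
end

function y = dstDim1(x)
% 1-D DST along the first dimension of a 3-D array,
% same construction as dst1 in the question.
n = size(x,1);
y = cat(1, zeros(1, size(x,2), size(x,3)), x);
y = -imag(fft(y, 2*n+2, 1));
y = sqrt(2/(n+1)) * y(2:n+1, :, :);
end
```

Passing a gpuArray into dstAllColumns keeps all the work on the GPU without any arrayfun over slices, which is where the per-slice overhead comes from.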
Answers (2)
I believe you can accelerate this considerably using a combination of my File Exchange tools FUNC2MAT and KronProd.
First, obtain a 1D DST matrix
M=gpuArray( func2mat(@dst1,ones(n,1),'doSparse',0) ); % n x n DST matrix
Then build a KronProd object to apply it in 2D
K=KronProd({M,M});
and now simply multiply
input=gpuArray(input);
tic;
output = K*input;
toc
Note that the relevant performance measure is the time for the multiplication alone. The construction of the K-object is a one-time cost.
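If you would rather avoid the File Exchange dependency, the same structure can be written out directly: since dst1(X) = D*X with D = dst1(eye(n)), the 2-D transform is dst2(X) = D*X*D.', and applying K = kron(D,D) to each column of the input is exactly that page-wise product. A sketch using pagemtimes (requires R2020b or newer; dst1 is the function from the question; works on gpuArray input as well):

```matlab
n = 100;
D = dst1(eye(n));              % n-by-n 1-D DST matrix (dst1 from the question)
A = rand(n^2, 200);            % a few columns for illustration
X = reshape(A, n, n, []);      % one n-by-n page per column of A
Y = pagemtimes(pagemtimes(D, X), 'none', D, 'transpose');  % D*X*D.' per page
B = reshape(Y, n^2, []);       % back to column form
```

The single 2-D matrix D is implicitly expanded across all pages of X, so no loop is needed.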
1 Comment
Here's a simplified experiment mimicking your data sizes. The execution times that I get on my GPU (Titan X) are well below 4.23 sec.
>> M=gpuArray(rand(100));
>> K=KronProd({M,M});
>> input=gpuArray.rand(1e4);
>> gputimeit(@() K*input)
ans =
0.2863
Jan on 2 Oct 2017
I've played with the code. Inlining the contents of dst1 into dst2, or inlining both functions into bigMatrixMuliplication, might save some milliseconds. But the main work remains in -imag(fft(y,2*n+2)) and in copying values between arrays.
It is a pity to compute the full fft if you crop the result afterwards. Perhaps a hand-made fft variant can avoid this.
The columns are independent of each other, so I'd assume that parfor accelerates the code. Please post how you have implemented it and on how many cores you ran it. A limitation might be that fft is multi-threaded already; then all cores are loaded and cannot be used for further parallelization.
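For reference, a minimal parfor version of the original column loop could look like this (assuming a pool has been opened with parpool, and reusing dst2 from the question); as Jan notes, it may not help if fft already saturates the cores:

```matlab
function output = bigMatrixParfor(input)
% Column-wise 2-D DST with the loop distributed over parallel workers.
n_square = size(input,1);
n = sqrt(n_square);
output = zeros(n_square, n_square);
parfor ii = 1:n_square
    page = reshape(input(:,ii), n, n);               % step 1
    output(:,ii) = reshape(dst2(page), n_square, 1); % steps 2-3
end
end
```

Each column is an independent slice of output, so the parfor loop satisfies the sliced-variable rules without extra communication.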