Faster interp1 and indexing on GPU

Dear all,
This is my first time using MATLAB on a GPU.
I ran the benchmark code to test my GPU; for double precision, my GPU is around 50 times faster than the CPU.
I changed my input array into a gpuArray. The performance is shown in the figures. test_bi_grlt_pat*.m calls Bi_GLRT_patch1_1.m, which in turn calls Dnoisefun.m (Dnoisefun.m and noisefun.m are similar).
I am doing image processing. Bi_GLRT_patch1_1.m is basically gradient descent on each pixel: Dnoisefun.m calculates the gradient at each pixel, and noisefun.m calculates the value at each pixel.
For CPU: [profiler results figure]
For GPU: [profiler results figure]
As we can see, the GPU is much slower than the CPU. The reasons seem to be: Dnoisefun.m and noisefun.m are called many times; 'interp1' should be faster on the GPU but doesn't appear to be; and the indexing operation 'result(result<0)' is very slow on the GPU.
Any advice on how to improve this?
Furthermore, I wrote a simple script to test how array orientation (row vs. column) affects performance on the GPU and CPU, where Inten and DProb are the x and y data for the interpolation:
gridSize = 1000000;
x = linspace(min(Inten), max(Inten), gridSize);
disp(size(x));
xg = gpuArray(x);

tic
result1 = interp1(Inten, DProb, x, 'linear', 'extrap');
time1 = toc;
disp(time1)

x1 = x';
tic
result2 = interp1(Inten, DProb, x1, 'linear', 'extrap');
time2 = toc;
disp(time2)

tic
result3 = interp1(Inten, DProb, xg, 'linear', 'extrap');
time3 = toc;
disp(time3)

xg1 = xg';
tic
result4 = interp1(Inten, DProb, xg1, 'linear', 'extrap');
time4 = toc;
disp(time4)
The timings are not very consistent across trials. Here are the results of several runs (the first line of each run is the array size printed by disp(size(x))):
test_gpu
1 10000
8.0200e-04
2.8000e-04
3.2500e-04
1.2600e-04
>> clear
>> test_gpu
1 100000
9.7700e-04
8.8300e-04
0.0011
1.6100e-04
>> clear
>> test_gpu
1 1000000
0.0055
0.0048
5.1600e-04
9.3200e-04
>> clear
>> test_gpu
1 1000000
0.0051
0.0046
3.5500e-04
1.1500e-04
>> clear
>> test_gpu
1 1000000
0.0059
0.0043
3.7100e-04
1.1600e-04
>> clear
>> test_gpu
1 1000000
0.0058
0.0046
3.6500e-04
1.1900e-04
>> clear
>> test_gpu
1 1000000
0.0057
0.0047
6.5600e-04
0.0011
Similarly, the indexing performance is not consistent either:
clear
load('DDetectorProb.mat')
gridSize = 1000000;
x = linspace(min(Inten), max(Inten), gridSize);
xs = x;
ban = (min(Inten) + max(Inten))/2;
disp(size(x));
xg = gpuArray(x);
xgs = xg;

tic
xs(x > ban) = 1;
time1 = toc;
disp(time1)

x1 = x';
xs = x1;
tic
xs(x1 > ban) = 1;
time2 = toc;
disp(time2)

tic
xgs(xg > ban) = 1;
time3 = toc;
disp(time3)

xg1 = xg';
xg1s = xg1;
tic
xg1s(xg1 > ban) = 1;
time4 = toc;
disp(time4)
Results:
1 1000000
0.0031
0.0034
0.0010
0.0014
1 1000000
0.0032
0.0030
7.6000e-04
8.7700e-04
1 1000000
0.0032
0.0031
7.2500e-04
0.0021
1 1000000
0.0030
0.0031
7.7100e-04
0.0019

5 Comments

Please provide the inputs Inten and DProb so that I can inspect your code.
Please also call wait(gpuDevice) before each call to tic or toc as per the guidelines for timing GPU code in the documentation here. That way you will be getting correct timings for your code.
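For reference, the timing pattern being suggested looks roughly like the following sketch (Inten, DProb, and the grid size are taken from the question; gputimeit is an alternative that handles the synchronization for you):

```matlab
% Correct way to time GPU code: synchronize before starting and
% before stopping the timer, so queued kernels are actually measured.
dev = gpuDevice;
xg  = gpuArray(linspace(min(Inten), max(Inten), 1000000));

wait(dev);   % ensure any prior GPU work has finished
tic
result = interp1(Inten, DProb, xg, 'linear', 'extrap');
wait(dev);   % ensure the interpolation has actually completed
t = toc;
disp(t)

% Alternatively, gputimeit synchronizes and averages for you:
t2 = gputimeit(@() interp1(Inten, DProb, xg, 'linear', 'extrap'));
disp(t2)
```

Without the wait calls, toc can fire while kernels are still queued, which would explain the inconsistent (and sometimes implausibly fast) GPU timings above.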
Thanks. Just attached it.
Hopefully your response through technical support was sufficient?
Thanks. Yes. Should I post the answers I received, or will you, so that other users who find this post later can benefit?
I'd be interested to see your response from technical support - would you be able to post?
Thanks!


Answers (1)

Instead of indexing, modify your lower boundary slightly and use min and max:
result = min(0.8, max(realmin, result));
The difference is that in your original code, any value that was exactly 0 was left at exactly 0 and negative values were changed to realmin (which is positive), whereas in this revised code, values that are exactly 0 are changed to realmin as well.
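A small sketch of that behavioral difference, using illustrative values rather than your actual data:

```matlab
result = [-3 0 0.5 2];                       % sample values

% Clamped version: no logical indexing, works identically on gpuArray.
clamped = min(0.8, max(realmin, result));
% Negatives AND exact zeros both become realmin; values above 0.8 are capped.

% The original indexing-based version, for comparison:
orig = result;
orig(orig < 0) = realmin;                    % exact zeros are left untouched here
orig = min(0.8, orig);
```

Since min and max are element-wise, the clamped version avoids the scattered read-modify-write pattern that logical indexing triggers on the GPU.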

1 Comment

Which is to say: don't do your own indexing on GPUs if you can avoid it. The architecture of NVIDIA GPUs makes that kind of scattered indexing inefficient.


Asked on 18 Nov 2019
Last commented on 12 Mar 2020
