Faster interp1 and indexing on GPU
Dear all,
This is my first time using Matlab on GPU.
I ran the benchmark code to test my GPU. In double precision, my GPU is around 50 times faster than my CPU.
I converted my input array to a gpuArray. The performance is shown in the figures. test_bi_grlt_pat*.m calls Bi_GLRT_patch1_1.m, which then calls Dnoisefun.m (Dnoisefun.m and noisefun.m are similar).
I am doing image processing. Bi_GLRT_patch1_1.m is basically gradient descent on each pixel. Dnoisefun.m calculates the gradient on each pixel. noisefun.m calculates the value on each pixel.
For CPU:
[Figure not recovered]
For GPU:
[Figure not recovered]
As we can see, the GPU is much slower than the CPU. The reasons appear to be: Dnoisefun.m and noisefun.m are called many times; 'interp1' should be faster on the GPU but does not seem to be; and the indexing operation 'result(result<0)' is very slow on the GPU.
Any advice on how to improve this?
Furthermore, I wrote a simple script to test how row and column arrays perform on the GPU and CPU, where Inten and DProb are the x and y data for the interpolation:
gridSize = 1000000;
x =linspace(min(Inten),max(Inten),gridSize);
disp(size(x));
xg= gpuArray(x);
tic
result1=interp1(Inten,DProb,x,'linear','extrap' );
time1 = toc;
disp(time1)
x1=x';
tic
result2=interp1(Inten,DProb,x1,'linear','extrap' );
time2 = toc;
disp(time2)
tic
result3=interp1(Inten,DProb,xg,'linear','extrap' );
time3 = toc;
disp(time3)
xg1=xg';
tic
result=interp1(Inten,DProb,xg1,'linear','extrap' );
time4 = toc;
disp(time4)
The performance is not very consistent across trials. Here are the results of some trials:
>> test_gpu
1 10000
8.0200e-04
2.8000e-04
3.2500e-04
1.2600e-04
>> clear
>> test_gpu
1 100000
9.7700e-04
8.8300e-04
0.0011
1.6100e-04
>> clear
>> test_gpu
1 1000000
0.0055
0.0048
5.1600e-04
9.3200e-04
>> clear
>> test_gpu
1 1000000
0.0051
0.0046
3.5500e-04
1.1500e-04
>> clear
>> test_gpu
1 1000000
0.0059
0.0043
3.7100e-04
1.1600e-04
>> clear
>> test_gpu
1 1000000
0.0058
0.0046
3.6500e-04
1.1900e-04
>> clear
>> test_gpu
1 1000000
0.0057
0.0047
6.5600e-04
0.0011
Similarly, the indexing performance is not consistent either:
clear
load('DDetectorProb.mat')
gridSize = 1000000;
x =linspace(min(Inten),max(Inten),gridSize);
xs=x;
ban = (min(Inten)+max(Inten))/2;
disp(size(x));
xg= gpuArray(x);
xgs = xg;
tic
xs(x>ban)=1;
time1 = toc;
disp(time1)
x1=x';
xs = x1;
tic
xs(x1>ban)=1;
time2 = toc;
disp(time2)
tic
xgs(xg>ban)=1;
time3 = toc;
disp(time3)
xg1=xg';
xg1s = xg1;
tic
xg1s(xg1>ban)=1;
time4 = toc;
disp(time4)
Results:
1 1000000
0.0031
0.0034
0.0010
0.0014
1 1000000
0.0032
0.0030
7.6000e-04
8.7700e-04
1 1000000
0.0032
0.0031
7.2500e-04
0.0021
1 1000000
0.0030
0.0031
7.7100e-04
0.0019
5 Comments
Joss Knight
on 20 Nov 2019
Please provide the inputs Inten and DProb so that I can inspect your code.
Please also call wait(gpuDevice) before each call to tic or toc, as per the guidelines for timing GPU code in the documentation. That way you will get correct timings for your code.
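The timing advice above can be sketched as follows. This is a minimal example under stated assumptions, not the asker's actual script: Inten and DProb are replaced by synthetic stand-ins because the real data (from DDetectorProb.mat) is not shown in the thread. It shows the manual wait(gpuDevice) pattern and, as an alternative, gputimeit, which handles synchronization and repeated runs itself.

```matlab
% Synthetic stand-ins for Inten and DProb (assumption: the real values
% come from DDetectorProb.mat, which is not part of this thread).
Inten = linspace(0, 1, 256);
DProb = exp(-Inten);

gridSize = 1000000;
x  = linspace(min(Inten), max(Inten), gridSize);
xg = gpuArray(x);

dev = gpuDevice;   % handle to the currently selected GPU

% Manual timing: synchronize before tic and again before toc, so that
% the measurement covers the whole GPU computation and nothing else.
wait(dev);
tic
r = interp1(Inten, DProb, xg, 'linear', 'extrap');
wait(dev);
tManual = toc;

% Alternative: gputimeit synchronizes and averages several runs.
tGpu = gputimeit(@() interp1(Inten, DProb, xg, 'linear', 'extrap'));

% CPU reference measured the same way with timeit.
tCpu = timeit(@() interp1(Inten, DProb, x, 'linear', 'extrap'));

fprintf('manual GPU: %.3g s, gputimeit: %.3g s, CPU timeit: %.3g s\n', ...
    tManual, tGpu, tCpu);
```

Without the wait calls, toc can return before the GPU has finished its work, which is one plausible explanation for trial-to-trial scatter like that in the question.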
mengya hu
on 21 Nov 2019
Joss Knight
on 25 Nov 2019
Hopefully your response through technical support was sufficient?
mengya hu
on 26 Nov 2019
Kyle Steiner
on 12 Mar 2020
I'd be interested to see your response from technical support - would you be able to post?
Thanks!
Answers (1)
Walter Roberson
on 12 Mar 2020
Instead of indexing, modify your lower boundary slightly and use min and max:
result = min(0.8, max(realmin, result));
The difference is that in your original code any value that was exactly 0 was left exactly 0 and negative values were modified to realmin (which is positive), whereas in this revised code, values that are exactly 0 would be modified to realmin as well.
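The difference described above can be checked on a small array. This is only a sketch: the indexing version below is an assumed form of the original code, inferred from this answer and from 'result(result<0)' in the question, since the original Dnoisefun.m and noisefun.m are not shown. The sample values are made up for illustration.

```matlab
% Hypothetical sample values covering each case: negative, exactly zero,
% in range, exactly at the upper bound, and above the upper bound.
result = [-0.5, 0, 0.3, 0.8, 1.2];

% Indexing version (assumed form of the original code).
a = result;
a(a < 0)   = realmin;
a(a > 0.8) = 0.8;

% Clamp version from the answer.
b = min(0.8, max(realmin, result));

% Positions where the two versions disagree.
differs = find(a ~= b);
disp(differs)
```

The same min/max expression works unchanged when result is a gpuArray.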
1 Comment
Walter Roberson
on 12 Mar 2020
Which is to say: don't do your own indexing on GPUs if you can avoid it. The architecture of NVIDIA GPUs makes indexing inefficient.
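One way to check this on a particular GPU is to time both approaches with proper synchronization. The sketch below uses made-up random data rather than the asker's arrays and only handles the lower bound. No particular speedup is claimed here, as the outcome depends on the hardware and MATLAB release.

```matlab
dev = gpuDevice;

% Hypothetical test data: values spread around zero, created on the GPU.
xg = rand(1, 1000000, 'gpuArray') - 0.5;

% Logical-indexing assignment.
a = xg;
wait(dev);
tic
a(a < 0) = realmin;
wait(dev);
tIndex = toc;

% Element-wise clamp with max, no indexing involved.
wait(dev);
tic
b = max(realmin, xg);
wait(dev);
tClamp = toc;

fprintf('indexing: %.3g s, max clamp: %.3g s\n', tIndex, tClamp);
```

A single tic/toc pair is noisy, so repeating the measurement or using gputimeit gives steadier numbers.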