# Matrix Multiplication on GPU quite slow?

Hi, I have just started using the GPU in MATLAB and hoped for considerable performance gains in matrix multiplication. I ran some performance tests and read about the topic in several places, but my results are frustrating, and I found no good explanation online for why they are so mixed.

First some hardware info: i5-4590 quad-core 3.30 GHz, 64-bit (Win 7, MATLAB R2016a); GeForce GT 640, 384 CUDA cores, ~1 GHz.

When running the tests, I got some gains when multiplying two 1024x1024 matrices. But when looping over multiplications of 200x200 or 500x500 matrices, the GPU is slower than the CPU by roughly the ratio of the clock speeds. Looping over a similar matrix addition, on the other hand, turns out as successful as I had hoped.

I also get different timings depending on whether I use tic/toc or (gpu)timeit.

Here are my timing results, which are mostly self-explanatory. The minimal example that produces this output is attached.

-------------------------------------

Single Matrix Operation on 1024x1024

-------------------------------------

Standard CPU:

tictoc

Elapsed time is 0.030685 seconds.

timeit

Elapsed time is 0.035352 seconds

Lets check GPU:

tictoc

Elapsed time is 0.000323 seconds.

Elapsed time is 0.000173 seconds.

timeit

Elapsed time is 0.061935 seconds

Elapsed time is 0.061718 seconds

-------------------------------------

Now starting some loops:

-------------------------------------

-------------------------------------

Matrix Addition n=10000:

-------------------------------------

-------------------------------------

Matrix is 600x600

-------------------------------------

Standard CPU:

Elapsed time is 1.675066 seconds.

Lets check GPU:

Elapsed time is 0.123021 seconds.

-------------------------------------

Matrix is 1000x1000

-------------------------------------

Standard CPU:

Elapsed time is 20.782437 seconds.

Lets check GPU:

Elapsed time is 0.119888 seconds.

-------------------------------------

Matrix Multiplication n=1000:

-------------------------------------

-------------------------------------

Matrix is 200x200

-------------------------------------

Standard CPU:

Elapsed time is 0.190912 seconds.

Lets check GPU:

Elapsed time is 0.751289 seconds.

-------------------------------------

Matrix is 500x500

-------------------------------------

Standard CPU:

Elapsed time is 2.620033 seconds.

Lets check GPU:

Elapsed time is 7.402474 seconds.

To summarize: a single multiplication (1024x1024) takes around 0.031 s on the CPU. On the GPU, tic/toc reports only about 0.0003 s, but timeit reports about 0.06 s. This is my first point of confusion: does the choice of timing function matter that much? Does the GPU really give a speedup?
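For reference, the gap between the tic/toc and timeit numbers can be reproduced with a small sketch like the one below. It assumes the Parallel Computing Toolbox is available and relies on the documented behaviour that gpuArray operations are launched asynchronously, so a bare tic/toc may stop the clock before the GPU has finished. All variable names here are illustrative and are not taken from the attached example.

```matlab
% Sketch: time one 1024x1024 multiplication on CPU and GPU.
% Variable names are illustrative; requires Parallel Computing Toolbox.
n = 1024;
A = rand(n);  B = rand(n);            % CPU operands (double precision)
Ag = gpuArray(A);  Bg = gpuArray(B);  % copies of the operands on the GPU

tCpu = timeit(@() A * B);             % CPU reference timing

dev = gpuDevice;                      % handle to the currently selected GPU

tic;
Cg = Ag * Bg;                         % the GPU work is launched asynchronously
tLaunch = toc;                        % may capture little more than launch overhead
wait(dev);                            % let the pending work finish

tic;
Cg = Ag * Bg;
wait(dev);                            % block until the GPU is done, then stop the clock
tSync = toc;

tGpu = gputimeit(@() Ag * Bg);        % handles synchronisation internally

fprintf('CPU %.4f s | GPU launch %.6f s | GPU tic/toc+wait %.4f s | gputimeit %.4f s\n', ...
        tCpu, tLaunch, tSync, tGpu);
```

If the synchronised tic/toc figure and the gputimeit figure roughly agree while the unsynchronised one is far smaller, the 0.0003 s reading most likely reflects launch time rather than compute time.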

Next, 1,000 multiplications of 500x500 matrices take 2.62 s on the CPU and 7.40 s on the GPU, a slowdown of roughly the clock speed ratio.

For the 10,000 additions of 1000x1000 matrices, the GPU speeds things up dramatically, from 20.78 s to 0.12 s.

So is there a consistent way to speed up matrix multiplication with the GPU? Can the exact implementation matter a lot? What slows down the multiplication loop?
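Two implementation variants that may be worth comparing are sketched below, purely as an experiment and not as a confirmed fix. The first uses single precision, on the assumption that a consumer card such as the GT 640 has much lower double-precision throughput. The second stacks many small matrices into a 3-D array and multiplies them in one call with `pagefun`, on the assumption that per-call overhead dominates in the loop. The sizes `n` and `k` and all variable names are illustrative.

```matlab
% Sketch: compare double vs single precision, and looped vs batched products.
% n and k are illustrative; k is limited by available GPU memory.
n = 200;   % matrix size
k = 100;   % number of matrices multiplied per batch

% (a) Double vs single precision for a single product
Ad = rand(n, 'gpuArray');            Bd = rand(n, 'gpuArray');
As = rand(n, 'single', 'gpuArray');  Bs = rand(n, 'single', 'gpuArray');
tDouble = gputimeit(@() Ad * Bd);
tSingle = gputimeit(@() As * Bs);

% (b) k products in one call: each page A3(:,:,i) is multiplied by B3(:,:,i)
A3 = rand(n, n, k, 'single', 'gpuArray');
B3 = rand(n, n, k, 'single', 'gpuArray');
tBatch = gputimeit(@() pagefun(@mtimes, A3, B3));

fprintf('double %.6f s | single %.6f s | batched %.6f s per product\n', ...
        tDouble, tSingle, tBatch / k);
```

Whether either variant helps depends on the card and the matrix size, so the per-product times should be compared against the CPU loop before drawing conclusions.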

Thanks in advance. Best, Sven

### Answers (2)

Edric Ellis
on 8 Dec 2017
