GPUBench

Compare GPUs using standard numerical benchmarks in MATLAB.

194 Downloads

Updated 25 Jul 2018


Editor's Note: This file was selected as MATLAB Central Pick of the Week

GPUBENCH times different MATLAB GPU tasks and estimates the peak performance of your GPU in floating-point operations per second (FLOP/s). It produces a detailed HTML report showing how your GPU's performance compares to pre-stored performance results from a range of other GPUs.
Note that this tool is designed for comparing GPU hardware. It does not compare GPU performance across different MATLAB releases.
Requires MATLAB R2013b or above and a GPU with CUDA Compute Capability 2.0 or higher.
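A quick way to check whether your GPU qualifies, sketched with Parallel Computing Toolbox commands:

>> g = gpuDevice();      % query the currently selected GPU
>> g.ComputeCapability   % must be '2.0' or higher
>> g.DeviceSupported     % 1 if this MATLAB release can use the device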

Comments and Ratings (81)

Ben Tordoff

Hi Mohammed, normally gpuBench will measure both your GPU and your CPU to give you an idea of the best-case speed-up you might be able to achieve. If no GPU is detected you should be given the option to run only on the CPU so that you can see what speed-up different GPUs might give you compared to your CPU if you installed them. Is that what you saw, or did it only run on the CPU without offering the choice?

Whilst it might seem odd at first, being able to compare your CPU with various GPUs has been useful to many people thinking of buying a GPU.

Mohammed

It ran the GPU computation on the CPU; isn't that weird??

Dario Tilves

Hi, just ran this on an Nvidia K80 and got the warning below (btw, only one of the card's two GPUs was used; how do I quickly change that?):

Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring
something that takes longer.
> In timeit (line 158)
In gpuBench>iTimeit (line 323)
In gpuBench>runMTimes (line 207)
In gpuBench (line 103)
In gpuBenchLauncher (line 11)
In gpuBenchApp/startApp (line 88)
In gpuBenchApp (line 48)
In appinstall.internal.runapp>execute (line 78)
In appinstall.internal.runapp>runapp13a (line 57)
In appinstall.internal.runapp>runcorrectversion (line 36)
In appinstall.internal.runapp (line 18)
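For what it's worth, gpuBench runs on whichever device is currently selected, so a sketch for benchmarking the K80's second GPU would be:

>> gpuDeviceCount()   % a K80 should report two devices
>> gpuDevice(2);      % make the second device current
>> gpuBench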

KSSV

Malcolm Cook

My TITAN X (Pascal) results are below.

I think I should expect double-precision arithmetic to proceed at 11 TFLOP/s / 32 ≈ 343 GFLOP/s.

This is based on reading http://www.guru3d.com/articles-pages/nvidia-geforce-titan-x-pascal-review,1.html

343 GFLOP/s is just about what you see in the table of benchmarks below, so I think I'm getting what I paid for.

                   Results for data-type 'double'   Results for data-type 'single'
                   (In GFLOPS)                      (In GFLOPS)
                   MTimes   Backslash   FFT         MTimes    Backslash   FFT
TITAN X (Pascal)   357.95   308.44      187.75      7349.88   2175.31     632.93

I'd appreciate feedback. Am I reasoning correctly?
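For reference, the arithmetic behind Malcolm's estimate (the 1/32 FP64:FP32 throughput ratio of consumer Pascal cards is the assumption here):

>> peakSingle = 11e12;            % ~11 TFLOP/s FP32, per the linked review
>> peakDouble = peakSingle / 32   % ans = 3.4375e+11, i.e. ~344 GFLOP/s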

Nike Dattani

Why don't you just post the results somewhere?

Rebeca Joca

i7 7700K CPU @ 4.2 GHz / 16GB RAM (3200MHz)
CUDA 8

==========================================
Double Precision
MTimes | Backslash | FFT
GeForce GTX 1080Ti | 423 | 286 | 190
Host PC | 258 | 162 | 23
==========================================
Single Precision
MTimes | Backslash | FFT
GeForce GTX 1080Ti | 11907 | 1897 | 679
Host PC | 502 | 340 | 33
==========================================

Harel

There is still hope for Mac users!

MBP 2016, i7-6920HQ CPU @ 2.90GHz
Memory 16GB
CUDA 8

Results:
==========================================
Double Precision
MTimes | Backslash | FFT
GeForce GTX 980 Ti | 190 | 165 | 104
Host PC | 157 | 105 | 12
==========================================
Single Precision
MTimes | Backslash | FFT
GeForce GTX 980 Ti | 5998 | 1058 | 433
Host PC | 316 | 202 | 20
==========================================

Tony

Greg

benkant

Adrian

Philip

https://devblogs.nvidia.com/parallelforall/cuda-8-features-revealed/

Hopefully we'll get proper Pascal support soon.

osnr

i7-6850K@3.60GHz, CUDA 8RC, Memory 16GB

Results:
==========================================
Double Precision
MTimes | Backslash | FFT
GeForce GTX 1080 | 276 | 188 | 139
Host PC | 204 | 124 | 8
==========================================
Single Precision
MTimes | Backslash | FFT
GeForce GTX 1080 | 5273 | 1403 | 422
Host PC | 367 | 245 | 15
==========================================

On closer examination of individual curves:

-- FFT(double) drops after ArraySize=4M
-- MTimes(single) drops after ArraySize=4M
-- FFT(single) drops after ArraySize=16M

arnold

Martin,
I'm also interested in that, but it's clear that the 'Pascal' GeForce Titan X will perform roughly 30% better than the GTX 1080 I've posted results for 7 posts down. The single-precision performance is nowhere near what it should be (>8 TFLOP/s for the GTX 1080 and <12 TFLOP/s for the Titan X) since the current CUDA version (7.5) does not properly support the Pascal platform.
As for double precision, the Titan X, like any GeForce card, won't stand a chance against your K40c, because NVIDIA wants to sell the much pricier Tesla cards.

At single precision, the GTX 1080 is already 20% faster than your K40c, but it is 160% faster hardware-wise.

Martin

Would anyone be willing to post their benchmark results using the latest Titan X? I currently have a K40c that I'm using for calculations (mainly FFT) and I'm looking to buy more computational power (double precision). From what I understand, FFT is mostly memory-bound, so something like the Titan X might work as well as the K40c at a much cheaper price. I've posted my results from the K40c below.

Double | MTimes | Backslash | FFT
Tesla K40c | 1154.72 | 706.48 | 135.51
Host PC | 186.81 | 117.49 | 4.97

Single | MTimes | Backslash | FFT
Tesla K40c | 3071.64 | 1284.10 | 299.57
Host PC | 468.12 | 226.27 | 8.94

arnold

Thanks Alison, tried this just now. Works after a restart of the entire computer.

MATLAB/the GPU keeps crashing when trying to use a simple filter like medfilt2(A,[9,9]). Smaller neighbourhoods work; [9,9] or larger doesn't. The free memory is then NaN and nothing helps besides restarting MATLAB. The hardware seems rock solid, stress-tested with other CUDA code over several days. Here's a description: http://de.mathworks.com/matlabcentral/answers/299970-reset-gpudevice-does-not-work

I don't want to get into the details here (wrong place), but the bottom line for us at the moment is that we planned on expanding some of our simulation work using MATLAB plus consumer GPUs (GTX 1080, Titan X), with 8 or 12 TFLOP/s sounding great for the money (we only need single precision). At this point, though, I'm not convinced the combination runs very robustly.

Maybe CUDA 8 and/or R2016b will solve this.

Alison Eele

Hi Arnold

One way to stop this delay recurring on the first gpuArray or other GPU command in MATLAB with the GTX 1080 is to set a system environment variable called CUDA_CACHE_MAXSIZE. By default this is set to 32MB, which is not enough room once we re-optimize our libraries for the Pascal architecture, so rather than being a one-time optimization the delay occurs every time.

From experimentation we recommend setting this to between 500MB and 1GB. To set the cache to 1GB, use CUDA_CACHE_MAXSIZE=1073741824. On Windows you can do this in Properties > Advanced system settings > Environment Variables.
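The same variable can also be set from within MATLAB, provided it happens before the session's first GPU call (a sketch; the CUDA libraries only read it when they initialize):

>> setenv('CUDA_CACHE_MAXSIZE', '1073741824')   % 1GB JIT-compilation cache
>> A = gpuArray(1);                             % first GPU use now sees the larger cache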

arnold

Hi Ben,

about delays: yes, very much to my surprise it currently takes a minute (!) or so to create the first gpuArray every time I start MATLAB, which kind of defeats the purpose of quickly analyzing a large image, for instance. The same happened when executing this benchmark; it took very long before it even started. This seems to fall within your description of CUDA 7.5 then.
Will the current MATLAB release get the update as soon as NVIDIA updates CUDA? It would be a shame to have to wait another six months (worst case) after NVIDIA releases it.
I'm also holding back on purchasing a new private license then, because using the GPU is exactly what I was planning to do at home as well. At work we have a subscription plan, so no worries there... 'when it's done'.

Thanks for letting me know. I was literally about to purchase an R2016a license for home usage soon, but that would be a waste if I need CUDA 8 for Pascal and MATLAB will probably not provide it for R2016a anymore.

arnold

As a sidenote:
I'm having problems using gpuArrays with MATLAB: medfilt2(A,[11,11]) always crashes, whereas a size of [7,7] still works.
Only a MATLAB restart makes the GPU usable again.
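A minimal sketch reproducing the failure mode arnold describes (the array size and type are assumed for illustration; medfilt2 on gpuArrays requires Image Processing Toolbox):

>> A = gpuArray(rand(4096, 'single'));   % hypothetical test image on the GPU
>> B = medfilt2(A, [7 7]);               % works
>> C = medfilt2(A, [11 11]);             % reportedly crashes the device
>> g = gpuDevice(); g.FreeMemory         % then reads NaN, and reset(g) does not recover it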

Ben Tordoff

Hi Arnold, thanks for sharing these results.

Although timing might be an issue at smaller sizes, I think the real reason you're not seeing much in the way of a gain for the GTX 1080 is that the versions of CUDA used in MATLAB and Parallel Computing Toolbox (CUDA 7.5 and earlier) do not directly support the new "Pascal" class GPUs. Instead they fall back to just-in-time recompilation of the libraries, which is also why you will see a large delay on first use. This means the resulting algorithms are not fully optimized for the new Pascal GPU architecture.

CUDA 8 will be the first CUDA release to have native Pascal GPU support, but as of now (22nd August) it is not yet available except as a "release candidate".

arnold

Hi all,

did the test on a new machine (GTX 1080 & Intel 5960X), got a nice warning message:
====================
Warning: The measured time for F may be inaccurate because it is
running too fast. Try measuring something that takes longer.
====================

So it is fast, I guess :P, though obviously only at single precision (thank you, NVIDIA). Interestingly, it is far from its theoretical 8 TFLOP/s at single precision. At 4.42 TFLOP/s it's only ~500 GFLOP/s faster than the GTX 970 I've tried. It might be that the warning message is right and this test can't really measure the proper performance?

Results:
==========================================
Double Precision
MTimes | Backslash | FFT
GeForce GTX 1080 | 219.50 | 175.22 | 115.19
Host PC | 329.06 | 202.88 | 16.29
==========================================
Single Precision
MTimes | Backslash | FFT
GeForce GTX 1080 | 4420.24 | 1570.92 | 414.50
Host PC | 617.28 | 407.54 | 19.79
==========================================

Staffan

One general question: I assume some of you use GPU computing with neural networks. Have any of you used the GPU for prediction problems and obtained faster computation than with the CPU?

More info here:
http://se.mathworks.com/matlabcentral/answers/291744-time-series-prediction-using-neural-networks-narnet-narxnet-is-it-at-all-possible-to-train-a-ne

Staffan

Yufeng, what score do you get for the Titan X?

Yufeng Huang

Malcolm: I'm very interested in this, can you keep us posted?
(also rated this; I'm running one Titan X and one C2075 on two separate machines)
Yufeng

Malcolm Cook

I'm in the process of building an external GPU for a MacBook Pro 15" mid 2015, trying to work out which is the best GPU to run over Thunderbolt 2.

Staffan

(Arnold, I meant GTX 980 and not GTX 1070...sorry for this)

Staffan

Thanks Arnold for the specs on the GTX 1070 card. If I may make a wish, it would be to have the same test performed with a GTX 1080 card. Tomorrow morning I will add 16 GB of RAM and a pro SSD to my rig; the next step might very well be adding a GTX 1080 (however, I expect a drop in the pricing of this card soon and will wait a few weeks before buying one). If no one has beaten me to it, I will add the specs for the GTX 1080 card once obtained.

arnold

works.
tried it on this system:
Intel 2500K, GeForce GTX 970 4GB
==============
                  Double: MTimes | Backslash | FFT     Single: MTimes | Backslash | FFT
GeForce GTX 970   115.58 | 86.22 | 62.41               3755.02 | 444.68 | 247.94
Host PC           104.40 | 62.44 | 7.68                214.48  | 152.65 | 14.94

such a shame that the ordinary cards are so crippled at double precision

The app gives many warnings that nargchk is outdated; it would be nice if you could update the app accordingly.
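(nargchk was deprecated in favour of narginchk in R2011b, so the fix would presumably look something like this; the argument counts here are illustrative:)

% old style, now triggering the deprecation warnings:
error(nargchk(1, 2, nargin));
% modern replacement, same effect:
narginchk(1, 2);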

Shiv Tewari

Provides the user with a good perspective on whether he/she really wants to go ahead and implement their code on the GPU. You know, the pain of transforming your code vs. the reward of faster computation.

cheng joylin

Philip

Martin

I just wanted to point out that I could run it on MATLAB R2014a only after changing line 442 in gpuBench.m from "freeMem = gpu.AvailableMemory;" to "freeMem = gpu.FreeMemory;" in case someone else has the same issue. Cheers, Martin

Peri

@Alex. I agree totally. The 15" Macbook Pro double-precision floating point performance is crippled. See GPUBench score @ http://www.tinyurl.com/cuda-on-mac

Alex R.

@Fabio: This is on a MacBook Pro Late 2013, 2.3 GHz, 16 GB. See the last two rows. It seems only the FFT double precision is faster than the CPU (about twice as fast); \ and * are much slower. Pretty crippled performance...
Apple should have used the Quadro K1100M (same physical chip as the 750M without the crippled double precision). From the looks of it, it's not really worth the effort of coding for the GPU on the MBP with the 750M; you can just stick with the CPU. The problem is that if you want to buy the MBP without the 750M but with the same CPU, you end up paying the same price (at least that was the case back in May when I bought it).

                    Results for data-type 'double'   Results for data-type 'single'
                    (In GFLOPS)                      (In GFLOPS)
                    MTimes   Backslash   FFT         MTimes    Backslash   FFT
Quadro K6000        1489.50  453.38      141.32      3998.82   737.72      295.48
Tesla K20c          1005.00  490.83      110.40      2690.21   772.21      257.51
Tesla C2075         327.83   242.26      69.13       684.97    425.15      144.56
GeForce GTX TITAN   213.35   124.43      90.89       3840.88   735.68      328.85
GeForce GTX 680     139.20   97.53       58.82       1468.69   620.54      214.67
Quadro 2000         38.60    33.01       14.18       232.90    122.57      46.32
GeForce GT 640      18.13    14.08       8.51        185.60    95.49       33.62
Quadro K600         13.24    10.69       6.17        135.40    0.01        26.57

Host PC             136.45   82.92       7.90        250.23    178.69      5.71
GeForce GT 750M     27.38    23.01       13.83       348.97    0.03        59.32

Peri

Noel, MacBook Pro Retina Late 2013, i7 2.3GHz, 16 GB, with Nvidia GT 750M discrete card:

GFLOPS            double: MTimes | Backslash | FFT    single: MTimes | Backslash | FFT
Host PC           144.88 | 63.95 | 6.93               235.92 | 153.01 | 11.81
GeForce GT 750M   27.92  | 19.58 | 13.04              296.35 | 0.03   | 60.88

Fabio Freschi

Noel

And a request... does anyone have gpuBench results for the Nvidia GT 750M that lives inside the latest MacBook Pro? I'd like to know just how crippled the double precision is before I buy one. Thanks in advance.

Noel

New version post-R2014a, but still no data for R2014a, so GPUBench falls over when it comes to report production (if running R2014a). A workaround is to rename R2013b.mat to R2014a.mat; then R2014a can run GPUBench successfully!
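Scripted, Noel's workaround is a two-liner (a sketch; it assumes the data folder sits alongside gpuBench.m, as the data/R20xxx.mat paths in Ben's comment below suggest):

>> d = fullfile(fileparts(which('gpuBench')), 'data');
>> copyfile(fullfile(d, 'R2013b.mat'), fullfile(d, 'R2014a.mat'))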

Philip

Cedric Wannaz

Hi Ben, yes, I went from an 8-pin + defective 6-pin to an 8-pin + LP4-to-6-pin adapter, which works great now. Thank you for the support!

Ben Tordoff

Just a final clarification: do you now have both 6-pin and 8-pin connectors connected? You definitely need both to get the full 250W that the Titan can consume at peak load.

Ben

Cedric Wannaz

Hi Ben, thank you for your comment. After performing a lot of tests, swaps, etc., I found out that my PSU has a defect: everything works now (gpuBench, 3DMark, etc.) after I replaced the direct 6-pin outlet from the PSU with a dual LP4 outlet + adapter.

Ben Tordoff

Hi Cedric, could you send us the full log entry? If it's too big to post here, send it direct using the author link above.

At a guess it sounds like you exceeded some power limit whilst running computations - gpuBench is deliberately computation (and therefore power) heavy. Do you definitely have both power connectors connected? Your PSU sounds big enough, so it's a bit odd.

Ben

Cedric Wannaz

I get a power off/restart (entry named "kernel-power" in the event logs) when I try running GPUbench, in the "GPU single" test/section.

GTX Titan Black (in slot PCIe2 16x 75W) on DELL Precision T7500, dual Xeon X5550, 24GB RAM, 1110W power supply, latest BIOS update, SERR/DMI disabled, driver 337.88 for the graphic card.

Lanier

Win 7 SP1 64bit, CPU E5-2687W v2, MATLAB R2014a

                  double: MTimes | Backslash | FFT    single: MTimes | Backslash | FFT
GTX TITAN Black   1312.05 | 517.26 | 150.15           3730.83 | 881.97 | 309.47
Host PC           140.18  | 101.90 | 6.89             327.19  | 209.63 | 9.50

Lanier

Remsus

Thank you Michal,

I think you had the same problem with double precision as I did.

It seems it's necessary to enable double-precision mode for the GTX Titan: it's in the NVIDIA Control Panel, under Manage 3D Settings, Global Settings tab.

After enabling it, things look much different:

                    MTimes_D | Backslash_D | FFT_D
GeForce GTX TITAN   1285.83  | 128.35      | 146.92
Tesla C2075         333.84   | 246.11      | 73.36

Ubuntu 12.04.3 64bit, MATLAB R2014a

                    Results for data-type 'double'   Results for data-type 'single'
                    (In GFLOPS)                      (In GFLOPS)
                    MTimes   Backslash   FFT         MTimes    Backslash   FFT
Tesla K20c          1005.83  496.82      131.46      2690.80   783.38      282.48
Tesla C2075         333.84   246.11      73.36       696.37    435.56      163.04
GeForce GTX TITAN   213.31   130.69      95.01       3826.94   514.20      365.85
GeForce GTX 680     139.26   94.66       60.66       1463.78   604.57      223.48
GeForce GTX 670     117.73   81.77       52.22       1165.37   519.18      201.95
Quadro K5000        85.48    64.17       41.00       955.10    451.36      172.25
Quadro K4000        60.57    49.64       28.40       663.63    364.36      128.24
Quadro K2000        28.79    20.93       13.90       310.71    141.58      56.71
GeForce GT 640      28.79    21.10       13.71       314.82    141.85      59.29
Host PC             38.97    29.15       2.10        79.29     47.97       4.05
Quadro K600         13.24    10.38       6.31        135.57    71.12       27.61

CUDADevice with properties:

Name: 'GeForce GTX TITAN'
Index: 1
ComputeCapability: '3.5'
SupportsDouble: 1
DriverVersion: 5.5000
ToolkitVersion: 5.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 6.4421e+09
FreeMemory: 5.9798e+09
MultiprocessorCount: 14
ClockRateKHz: 875500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1

Remsus

For the people who get the error with max 500 recursions: try not to run the app, but just type gpuBench(). For me that worked.

Is there any information on what systems the reference result statistics were made?
We decided to go for a GeForce GTX TITAN instead of a C2075 because on specs it should beat the C2075, except for the ECC memory, but most people turn that off to get faster performance. But now that I've run the benchmark, the Tesla C2075 beats the GTX in our system on nearly everything, except for MTimes and FFT (single).
Especially Backslash double at 82 GFLOPS was very disappointing compared to the 246 for the C2075 in the reference system.

Anyone else there with a Titan who could share his/her results? Please send a PM if so.

The current version of gpuBench is not compatible with R2014a.

Some problem with the latest version on R2014a:

Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N) to change the limit. Be aware that exceeding your
available stack space can crash MATLAB and/or your computer.

Error in gpuBenchApp

Ben Tordoff

Thanks Matthew, you're right - I'll get that fixed.

Ideally timing should be measured using timeit (for host) or gputimeit (for gpu), but if I started using those then this would stop working on R2013a and earlier. I'll post an update shortly.

Matthew

Hi Ben,
I wanted to say thanks for the great app, but also to point out something that could cause inaccurate results in some cases. The function gtoc() is using the wait() function (which is good), but it's also calling gpuDevice every time, which is actually pretty slow - it typically takes between 3.6 and 5.6ms on my machine - and this time gets added to the total. You might consider storing the output of gpuDevice in a persistent variable, e.g. gpuid, and instead call wait(gpuid).
For large array sizes I suppose it doesn't matter too much, but for smaller arrays the extra gpuDevice time can make it look like a GPU is slower than a CPU in cases where it's really not.
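A sketch of that suggestion (gtoc is gpuBench's internal timer; the body below is an assumed shape, not the actual source):

function t = gtoc()
    % Query the device once and reuse it; calling gpuDevice() on every
    % call costs several milliseconds, which pollutes small timings.
    persistent gpuid
    if isempty(gpuid)
        gpuid = gpuDevice();
    end
    wait(gpuid);   % block until all queued GPU work has finished
    t = toc;       % read the host timer started by the matching tic
end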

Ben Tordoff

Hi Rodrigo,

gpuBench does not show any "speed-up" comparisons; it shows absolute performance in floating-point operations per second (FLOPS). The results for your CPU are the absolute performance results for your CPU in isolation, not a comparison. Likewise for the other results. The pre-stored "host" results are the absolute performance of the machines used to capture those results.

All of the plots include both GPU and Host PC results, so the text should probably say "These results show the performance of the GPU or host PC when calculating...". I'll fix that.

Thanks
Ben

R

R (view profile)

For example if I click on Host PC in the results I see

"These results show the performance of the GPU when calculating ... "

Also, why is there a speed-up for my CPU? Presumably it is because it is using parallel computation with an increasing number of CPUs; is that the case?

R

R (view profile)

Ben,

Thank you for your answer. If I understand correctly, the highlighted GeForce GTX 770M in the GPUBench report is the speed-up from my own GPU, and the main host is my CPU against the CPU used for the pre-stored data?

I'm still not clear on what the results are telling me. Perhaps the report could include a bit more explanation?

Thanks.

Rodrigo.

Ben Tordoff

Hi Rodrigo,

the "host PC" data doesn't use the GPU at all, it measures your PC's main CPU(s). As such, you are probably just seeing that we used a pretty high-spec PC for hosting the various GPUs we tested (to make for a fairer GPU vs CPU comparison).

Ben

R

R (view profile)

Hi, Thanks for a very nice submission!

I'm finding that my computer (host PC) is considerably slower than the exact same card (Nvidia GTX 770M) in the pre-stored data. Are there any recommendations that may improve this? Any recommended reading?

thanks again,

Rodrigo.

Ben Tordoff

Hi Mike, I have no problem with bug reports appearing here as it means others can see them too. I was able to reproduce the problem using a fresh MATLAB install and I have a fix in the works.

As a work-around, you should be able to run gpuBench at the commandline (just type "gpuBench") - it is just the app launcher that is broken.

Mike

Mike (view profile)

@Ben I've messaged you details via FileExchange. I should have done that in the first instance. Could you, or someone at Mathworks, remove my comments please so that I'm not messing up the comments and ratings thread for what is a bug report. Sorry about that.

Ben Tordoff

Hi Mike. I've just tried downloading and installing the app on both R2013b and R2013a and didn't hit any problems. Could you describe exactly what steps you performed so that I can try and diagnose the problem?

Mike

Mike (view profile)

This has always worked well in the past, but on downloading today and running in MATLAB R2013a, I get the error

Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N) to change the
limit. Be aware that exceeding your available stack space can crash MATLAB and/or
your computer.

Error in gpuBenchApp

Jos Martin

Great GPU application to show how your GPU compares to others.

Firas Sawaf

Justin, I had a similar error to the one you described. I fixed it by copying the files to a different folder (c:\gpubench) and running the install from there.

Justin

Justin (view profile)

I am getting the following error when attempting to use your app on R2013a:

Error using evalin
Undefined function or variable 'GPUBenchApp'.

Error in appinstall.internal.runapp>execute (line 69)
out = evalin('caller', [script ';']);

Error in appinstall.internal.runapp>runapp13a (line 51)
outobj = execute(fullfile(appinstalldir, [wrapperfile 'App.m']));

Error in appinstall.internal.runapp>runcorrectversion (line 35)
appobj = runapp13a(appinstalldir);

Error in appinstall.internal.runapp (line 17)
out = runcorrectversion(appmetadata, appentrypoint, appinstalldir);

Ben Tordoff

Hi Andrei,

yes, you can do this with the tool as it is, although it isn't that easy. I will look at adding a more convenient way later.

1. Remove the data-file for the release you are using (so data/R2013a.mat if using the latest release).
2. Capture and store the results from each machine/GPU you are interested in:

>> data = gpuBench();
>> gpubench.saveResults(data);

This will build up a new data-file specific to your machines and the MATLAB release being used. Let me know if this doesn't work for you or you have suggestions as to how to make this more convenient.

Cheers
Ben

Andrei

As stated in the description, GPUBench "produces a detailed HTML report showing how your GPU's performance compares to PRE-STORED PERFORMANCE RESULTS from a range of other GPUs." Although I am very happy with GPUBench, I found it strange that the application only allows comparing against a pre-defined set of other hardware.

Quite a typical situation is that your bosses (or you yourself) want to compare machines that the company already has (e.g., to decide which computers to allocate for development and which for running release versions, or to decide which computers should be enhanced with additional processor units). It would be good to be able to run GPUBench on one computer, save the benchmark structure to a file, copy this file to another computer, and run GPUBench there in such a way that its data are added to the benchmark structure. Thus the user could compare his/her own computers.

Can this mode be realized somehow in the current version of the application? If not, can it be included in future versions?

Mirko

Wow, super well-thought-through app. Smart to include your own computer and other GPUs.

Narfi

If you run into CUDA_ERROR_LAUNCH_TIMEOUT, have a look at

http://www.mathworks.com/gputimeout

It explains how to change your system settings to avoid this.

David Allen

Hi Ben,

Thanks for the code.

I am getting this error, though. I know it is to do with the time-out settings, but I don't know what to do from here. My Quadro 1000M does not appear to be speeding up my FFTs etc.

Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT.
> In gpuBench at 75
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT.
> In gpuBench at 75
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT.
> In gpuBench at 75
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT.

Error in C:\Program
Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\currentDeviceFreeMem.p>currentDeviceFreeMem
(line 7)

Error in parallel.gpu.CUDADevice/get.FreeMemory (line 107)
fm = parallel.internal.gpu.currentDeviceFreeMem();

Error in gpuBench>getTestSizes (line 371)
freeMem = gpu.FreeMemory;

Error in gpuBench>runMTimes (line 163)
sizes = getTestSizes( type, safetyFactor, device );

Error in gpuBench (line 76)
gpuData = runMTimes( gpuData, reps, 'double', 'GPU', progressTitle, numTasks );

Thanks,
Dave

Ben Tordoff

Hi Tristan,

GPUBench only benchmarks one GPU at a time. Since it just uses the current device, you can use "gpuDevice(n)" to select the nth GPU before calling it. However, NVIDIA's drivers normally default to the most powerful card first, so if you're only getting results for your slowest card that indicates a wider problem. Can you try doing:

>> gpuDeviceCount()

to make sure all four devices are found? You can then try

>> for ii=1:gpuDeviceCount(), gpuDevice(ii), end

to print out the details of all the cards found. You need to make sure all of them have the "DeviceSupported" flag set to 1.

I've never seen the particular error you report, and looking on NVIDIA's forums they say it is most likely caused by a hardware problem and once you hit it you have to reboot to fully flush memory:

http://forums.nvidia.com/index.php?showtopic=204333

That doesn't sound good, I'm afraid!
Let me know how you get on.

Ben

Tristan

I've attempted to run the benchmark. I have 3 Teslas and a Quadro in my machine. I noticed that only my fourth GPU was being used at all. The benchmark failed at 19% with the following error:
An unexpected error occurred during CUDA execution. The CUDA error was: CUDA_ERROR_ECC_UNCORRECTABLE.

Error in C:\Program
Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\currentDeviceFreeMem.p>currentDeviceFreeMem
(line 7)

Error in parallel.gpu.CUDADevice/get.FreeMemory (line 107)
fm = parallel.internal.gpu.currentDeviceFreeMem();

Error in gpuBench>getTestSizes (line 371)
freeMem = gpu.FreeMemory;

Error in gpuBench>runMTimes (line 163)
sizes = getTestSizes( type, safetyFactor, device );

Error in gpuBench (line 76)
gpuData = runMTimes( gpuData, reps, 'double', 'GPU', progressTitle, numTasks );

Thanks for your help on this.

Thomas

Good benchmark for GPUs

Updates

1.11.0.0

Update report style.
Remove warning about old pre-stored results.

1.10.0.0

Update R2017b data file

1.10.0.0

Add data files for R2017a

1.9.0.0

* Update gpuBench with data for R2014b,...,R2016a
* Clean gpuBench code by using gputimeit and by removing dead code
* Make gpuBench robust when running MATLAB with -nodesktop or -nojvm

1.10.0.0

Fix a problem with the data location when running the app in R2014b

1.9.0.0

* Improve compatibility with MATLAB R2014b

1.8.0.0

Fix recursion problems when using the MATLAB App version.

1.7.0.0

* Add datafile for R2013b

1.6.0.0

* Add results for R2013a (including K20!)

1.5.0.0

* Suppressed warnings about results being skipped
* Now includes a set of pre-stored host-PC data so that you get a rough CPU/GPU comparison when just viewing the report
* Reduced largest size used for MTIMES to avoid out of memory

1.4.0.0

* Add an "app" version for use with R2012b and above
* Updated data-files for R2012a and R2012b

1.2.0.0

Try to prevent timeout being hit on very slow GPUs that happen to be driving the display as well.

1.1.0.0

Add data for C2075

MATLAB Release Compatibility
Created with R2013b
Compatible with any release
Platform Compatibility
Windows macOS Linux
