Code covered by the MathWorks Limited License

Rating: 4.9 (8 ratings) | 250 downloads (last 30 days) | File size: 278 KB | File ID: #34080

GPUBench

by

 

05 Dec 2011 (Updated )

Compare GPUs using standard numerical benchmarks in MATLAB.

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week


File Information
Description

GPUBENCH times different MATLAB GPU tasks and estimates the peak performance of your GPU in floating-point operations per second (FLOP/s). It produces a detailed HTML report showing how your GPU's performance compares to pre-stored performance results from a range of other GPUs.
Note that this tool is designed for comparing GPU hardware. It does not compare GPU performance across different MATLAB releases.

Requires MATLAB R2011b or above and a GPU with CUDA Compute Capability 1.3 or higher.
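
As a rough illustration of how a timing turns into a FLOP/s figure, the sketch below times one single-precision matrix multiply on the host CPU and divides an assumed operation count by the elapsed time. The 2*N^3 count is the conventional estimate for a dense N-by-N multiply and is an assumption here, not necessarily the count GPUBench itself uses; the variable names are illustrative only.

```matlab
% Minimal host-only sketch: estimate GFLOP/s for one matrix multiply.
N = 1024;                      % matrix dimension (illustrative choice)
A = rand(N, 'single');
B = rand(N, 'single');

C = A * B;                     % warm-up run so one-off start-up costs are not timed
tStart = tic;
C = A * B;
elapsed = toc(tStart);         % seconds for one N-by-N multiply

flopCount = 2 * N^3;           % ASSUMPTION: conventional count, not taken from GPUBench
gflops = flopCount / elapsed / 1e9;
fprintf('MTimes (single, N=%d): %.2f GFLOP/s\n', N, gflops);
```

A single timed run like this is noisy; repeating the measurement over several sizes and keeping the best figure would give a steadier estimate of peak performance.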

Required Products: Parallel Computing Toolbox, MATLAB
MATLAB release: MATLAB 7.13 (R2011b)
Other requirements: GPU with CUDA Compute Capability 1.3 or higher.
Comments and Ratings (35)
24 Jun 2014 Cedric Wannaz

Hi Ben, yes, I went from 8-pin + defective 6-pin to 8-pin + LP4->6-pin, which works great now. Thank you for the support!

17 Jun 2014 Ben Tordoff

Just a final clarification: do you now have both 6-pin and 8-pin connectors connected? You definitely need both to get the full 250W that the Titan can consume at peak load.

Ben

17 Jun 2014 Cedric Wannaz

Hi Ben, thank you for your comment. After a lot of tests, swaps, etc., I found out that my PSU has a defect: it works now (gpuBench, 3DMark, etc.) after I replaced the direct 6-pin outlet from the PSU with a dual LP4 outlet + adapter.

17 Jun 2014 Ben Tordoff

Hi Cedric, could you send us the full log entry? If it's too big to post here, send it direct using the author link above.

At a guess, it sounds like you exceeded some power setting whilst running computations. GPUBench is deliberately computation (and therefore power) heavy. Do you definitely have both power connectors connected? Your PSU sounds big enough, so it's a bit odd.

Ben

16 Jun 2014 Cedric Wannaz

I get a power off/restart (entry named "kernel-power" in the event logs) when I try running GPUBench, in the "GPU single" test/section.

GTX Titan Black (in slot PCIe2 16x 75W) on Dell Precision T7500, dual Xeon X5550, 24GB RAM, 1110W power supply, latest BIOS update, SERR/DMI disabled, driver 337.88 for the graphics card.

04 Jun 2014 Lanier

Win 7SP1 64bit, CPU E5-2687Wv2, Matlab 2014a

GTX TITAN Black 1312.05 517.26 150.15 3730.83 881.97 309.47
Host PC 140.18 101.90 6.89 327.19 209.63 9.50

28 Mar 2014 Remsus

Thank you, Michal.

I think you have the same problem with double precision as I had.

It seems that it is necessary to enable double-precision mode for the GTX TITAN.

The setting is in the NVIDIA Control Panel, under Manage 3D settings, on the Global Settings tab.

After enabling it, things look much different:
                   MTimes_D  Backslash_D  FFT_D
GeForce GTX TITAN  1285.83   128.35       146.92
Tesla C2075        333.84    246.11       73.36

28 Mar 2014 Michal Kvasnicka

Ubuntu 12.04.3 64bit, Matlab R2014a
Results for data-types 'double' and 'single' (in GFLOPS)

                   double                       single
                   MTimes   Backslash  FFT      MTimes   Backslash  FFT
Tesla K20c         1005.83  496.82     131.46   2690.80  783.38     282.48
Tesla C2075        333.84   246.11     73.36    696.37   435.56     163.04
GeForce GTX TITAN  213.31   130.69     95.01    3826.94  514.20     365.85
GeForce GTX 680    139.26   94.66      60.66    1463.78  604.57     223.48
GeForce GTX 670    117.73   81.77      52.22    1165.37  519.18     201.95
Quadro K5000       85.48    64.17      41.00    955.10   451.36     172.25
Quadro K4000       60.57    49.64      28.40    663.63   364.36     128.24
Quadro K2000       28.79    20.93      13.90    310.71   141.58     56.71
GeForce GT 640     28.79    21.10      13.71    314.82   141.85     59.29
Host PC            38.97    29.15      2.10     79.29    47.97      4.05
Quadro K600        13.24    10.38      6.31     135.57   71.12      27.61

28 Mar 2014 Michal Kvasnicka

CUDADevice with properties:

Name: 'GeForce GTX TITAN'
Index: 1
ComputeCapability: '3.5'
SupportsDouble: 1
DriverVersion: 5.5000
ToolkitVersion: 5.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 6.4421e+09
FreeMemory: 5.9798e+09
MultiprocessorCount: 14
ClockRateKHz: 875500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1

28 Mar 2014 Remsus

For the people who get the error with max 500 recursions, try not to run the app, but just type gpuBench(). For me it worked.

Is there any information on which systems the reference results were measured on?
We decided to go for a GeForce GTX TITAN instead of a C2075 because on specs it should beat the C2075, except for the ECC memory, which most people turn off to get faster performance. But when I ran the benchmark, the Tesla C2075 beat the GTX TITAN in our system on nearly everything except MTimes and FFT (single).
Backslash double at 82 GFLOPS was especially disappointing compared to the 246 for the C2075 in the reference system.

Is anyone else out there with a Titan who could share their results? Please send a PM if so.

10 Mar 2014 Michal Kvasnicka

The current version of gpuBench is not compatible with R2014a.

10 Mar 2014 Michal Kvasnicka

Some problem with latest version R2014a:

Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N) to change the limit. Be aware that exceeding your
available stack space can crash MATLAB and/or your computer.

Error in gpuBenchApp

06 Feb 2014 Ben Tordoff

Thanks Matthew, you're right - I'll get that fixed.

Ideally, timing should be measured using timeit (for the host) or gputimeit (for the GPU), but if I started using those then this would stop working on R2013a and earlier. I'll post an update shortly.
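
For readers on a release where both functions are available, a minimal sketch of the timing approach mentioned above might look like the following. It assumes R2013b or later, Parallel Computing Toolbox and a supported GPU; the matrix size and variable names are illustrative and are not taken from gpuBench.

```matlab
% Sketch only: compare host and GPU timings with timeit and gputimeit.
N = 2048;                            % illustrative size
hostA = rand(N, 'single');
hostB = rand(N, 'single');
gpuA = gpuArray(hostA);              % copy the operands to the GPU
gpuB = gpuArray(hostB);

tHost = timeit(@() hostA * hostB);   % host timing, repeated internally
tGpu  = gputimeit(@() gpuA * gpuB);  % waits for the GPU to finish before recording the time

fprintf('Host: %.4f s, GPU: %.4f s\n', tHost, tGpu);
```

Both functions run the operation several times and report a typical time, which removes most of the manual tic/toc bookkeeping.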

05 Feb 2014 Matthew Bergkoetter

Hi Ben,
I wanted to say thanks for the great app, but also to point out something that could cause inaccurate results in some cases. The function gtoc() is using the wait() function (which is good), but it's also calling gpuDevice every time, which is actually pretty slow - it typically takes between 3.6 and 5.6ms on my machine - and this time gets added to the total. You might consider storing the output of gpuDevice in a persistent variable, e.g. gpuid, and instead call wait(gpuid).
For large array sizes I suppose it doesn't matter too much, but for smaller arrays the extra gpuDevice time can make it look like a GPU is slower than a CPU in cases where it's really not.
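
The caching suggestion above could be sketched roughly as follows. The function name gtocCached and its tic-value argument are hypothetical; the real gtoc in gpuBench is not shown in this thread and may be structured differently.

```matlab
function elapsed = gtocCached(startTime)
% gtocCached  Hypothetical variant of gtoc that caches the GPU device handle.
% ASSUMPTION: startTime is the value returned by an earlier call to tic.
persistent gpuid
if isempty(gpuid)
    gpuid = gpuDevice();   % queried once, then reused on later calls
end
wait(gpuid);               % block until the GPU has finished all queued work
elapsed = toc(startTime);
end
```

A call would look like t0 = tic; C = A*B; t = gtocCached(t0);. One trade-off: if a different device is later selected with gpuDevice(n), the cached handle becomes stale, so the cache would need clearing (for example with clear gtocCached) before timing again.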

10 Jan 2014 Ben Tordoff

Hi Rodrigo,

gpuBench does not show any "speed-up" comparisons, it shows absolute performance in floating-point operations per second (FLOPS). The results for your CPU are the absolute performance results for your CPU in isolation, not as a comparison. Likewise for the other results. The pre-stored "host" results are the absolute performance of the machine used to capture the results.

All of the plots include both GPU and Host PC results, so the text should probably say "These results show the performance of the GPU or host PC when calculating...". I'll fix that.

Thanks
Ben

09 Jan 2014 R

For example if I click on Host PC in the results I see

"These results show the performance of the GPU when calculating ... "

Also why is there a speed-up for my CPU? Presumably it is because it is using parallel computations with increasing number of CPUs, is that the case?

09 Jan 2014 R

Ben,

Thank you for your answer. If I understand correctly, the highlighted GeForce GTX 770M in the GPUBench report is the speed-up from my own GPU and the main host is my CPU against the CPU used for the pre-stored data?

I'm still not clear on what the results are telling me. Perhaps the report could include a bit more explanation?

Thanks.

Rodrigo.

06 Jan 2014 Ben Tordoff

Hi Rodrigo,

the "host PC" data doesn't use the GPU at all, it measures your PC's main CPU(s). As such, you are probably just seeing that we used a pretty high-spec PC for hosting the various GPUs we tested (to make for a fairer GPU vs CPU comparison).

Ben

01 Jan 2014 R

Hi, Thanks for a very nice submission!

I'm finding that my computer (host PC) is considerably slower than the exact same card (NVIDIA GTX 770M) in the pre-stored data. Are there any recommendations that may improve this? Any recommended reading?

thanks again,

Rodrigo.

13 Nov 2013 Ben Tordoff

Hi Mike, I have no problem with bug reports appearing here as it means others can see them too. I was able to reproduce the problem using a fresh MATLAB install and I have a fix in the works.

As a work-around, you should be able to run gpuBench at the command line (just type "gpuBench"). It is only the app launcher that is broken.

12 Nov 2013 Mike

@Ben I've messaged you details via File Exchange. I should have done that in the first instance. Could you, or someone at MathWorks, remove my comments please so that I'm not messing up the comments and ratings thread with what is a bug report. Sorry about that.

12 Nov 2013 Ben Tordoff

Hi Mike. I've just tried downloading and installing the app on both R2013b and R2013a and didn't hit any problems. Could you describe exactly what steps you performed so that I can try and diagnose the problem?

12 Nov 2013 Mike

This has always worked well in the past, but on downloading it today and running in MATLAB R2013a, I get the error

Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N) to change the
limit. Be aware that exceeding your available stack space can crash MATLAB and/or
your computer.

Error in gpuBenchApp

16 Sep 2013 Jos Martin

Great GPU application to show how your GPU compares to others.

03 Jul 2013 Firas Sawaf

Justin, I had a similar error to the one you described. I fixed it by copying the files to a different folder (c:\gpubench) and running the install from there.

24 May 2013 Justin

I am getting the following error when attempting to use your app on R2013a:

Error using evalin
Undefined function or variable 'GPUBenchApp'.

Error in appinstall.internal.runapp>execute (line 69)
out = evalin('caller', [script ';']);

Error in appinstall.internal.runapp>runapp13a (line 51)
outobj = execute(fullfile(appinstalldir, [wrapperfile 'App.m']));

Error in appinstall.internal.runapp>runcorrectversion (line 35)
appobj = runapp13a(appinstalldir);

Error in appinstall.internal.runapp (line 17)
out = runcorrectversion(appmetadata, appentrypoint, appinstalldir);

09 May 2013 Ben Tordoff

Hi Andrei,

yes, you can do this with the tool as it is, although it isn't that easy. I will look at adding a more convenient way later.

1. Remove the data-file for the release you are using (so data/R2013a.mat if using the latest release).
2. Capture and store the results from each machine/GPU you are interested in:

>> data = gpuBench();
>> gpubench.saveResults(data);

This will build up a new data-file specific to your machines and the MATLAB release being used. Let me know if this doesn't work for you or you have suggestions as to how to make this more convenient.

Cheers
Ben

08 May 2013 Andrei Borissovitch Utkin

As stated in the description, GPUBench "produces a detailed HTML report showing how your GPU's performance compares to PRE-STORED PERFORMANCE RESULTS from a range of other GPUs." Although I am very happy with GPUBench, I found it strange that the application only allows comparison against a pre-defined set of other hardware.

A typical situation is that your bosses (or you yourself) want to compare machines that the company already has (e.g., to decide which computers to allocate to development and which to running release versions, or to decide which computers must be enhanced with additional processor units). It would be good to be able to run GPUBench on one computer, save the benchmark structure to a file, copy this file to another computer and run GPUBench on that other computer in such a way that its data are added to the benchmark structure. The user could then compare his/her own computers.

Can this mode be achieved somehow in the current version of the application? If not, can it be included in future versions?

17 Apr 2013 Mirko

Wow, a super well-thought-through app. Smart to include your own computer and other GPUs.

16 Apr 2012 Narfi

If you run into CUDA_ERROR_LAUNCH_TIMEOUT, have a look at

http://www.mathworks.com/gputimeout

It explains how to change your system settings to avoid this.

13 Apr 2012 David Allen

Hi Ben,

Thanks for the code.

I am getting this error though. I know it is to do with the time-out settings, but don't know what to do from here. My Quadro 1000M does not appear to be speeding up my ffts etc.

Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT.
> In gpuBench at 75
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT.
> In gpuBench at 75
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT.
> In gpuBench at 75
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT.

Error in C:\Program Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\currentDeviceFreeMem.p>currentDeviceFreeMem (line 7)

Error in parallel.gpu.CUDADevice/get.FreeMemory (line 107)
fm = parallel.internal.gpu.currentDeviceFreeMem();

Error in gpuBench>getTestSizes (line 371)
freeMem = gpu.FreeMemory;

Error in gpuBench>runMTimes (line 163)
sizes = getTestSizes( type, safetyFactor, device );

Error in gpuBench (line 76)
gpuData = runMTimes( gpuData, reps, 'double', 'GPU', progressTitle, numTasks );

Thanks,
Dave

21 Feb 2012 Ben Tordoff

Hi Tristan,

GPUBench only benchmarks one GPU at a time. Since it just uses the current device, you can use "gpuDevice(n)" to select the nth GPU before calling it. However, NVIDIA's drivers normally default to the most powerful card first, so if you're only getting results for your slowest card that indicates a wider problem. Can you try doing:

>> gpuDeviceCount()

to make sure all four devices are found? You can then try

>> for ii=1:gpuDeviceCount(), gpuDevice(ii), end

to print out the details of all the cards found. You need to make sure all of them have the "DeviceSupported" flag set to 1.

I've never seen the particular error you report, and looking on NVIDIA's forums they say it is most likely caused by a hardware problem and once you hit it you have to reboot to fully flush memory:

http://forums.nvidia.com/index.php?showtopic=204333

That doesn't sound good, I'm afraid!
Let me know how you get on.

Ben
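
The device-selection advice above can be combined into one loop. This is a minimal sketch, assuming gpuBench is on the path and returns its results when called with an output (as in data = gpuBench() elsewhere on this page), and that the installed release provides parallel.gpu.GPUDevice.getDevice for querying a device without selecting it; the results cell array is an illustrative name.

```matlab
% Sketch only: benchmark every supported GPU in turn.
numDevices = gpuDeviceCount();
results = cell(1, numDevices);
for ii = 1:numDevices
    info = parallel.gpu.GPUDevice.getDevice(ii);   % query without selecting
    if info.DeviceSupported
        gpuDevice(ii);                             % make device ii the current device
        fprintf('Benchmarking device %d: %s\n', ii, info.Name);
        results{ii} = gpuBench();                  % runs on the currently selected device
    else
        fprintf('Skipping unsupported device %d: %s\n', ii, info.Name);
    end
end
```

Entries of results stay empty for any device that was skipped, which makes it easy to see afterwards which cards were actually benchmarked.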

20 Feb 2012 Tristan Martel

I've attempted to run the benchmark. I have 3 Teslas and a Quadro in my machine. I noticed that only my fourth GPU was being used at all. The benchmark failed at 19% with the following error:
An unexpected error occurred during CUDA execution. The CUDA error was: CUDA_ERROR_ECC_UNCORRECTABLE.

Error in C:\Program Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\currentDeviceFreeMem.p>currentDeviceFreeMem (line 7)

Error in parallel.gpu.CUDADevice/get.FreeMemory (line 107)
fm = parallel.internal.gpu.currentDeviceFreeMem();

Error in gpuBench>getTestSizes (line 371)
freeMem = gpu.FreeMemory;

Error in gpuBench>runMTimes (line 163)
sizes = getTestSizes( type, safetyFactor, device );

Error in gpuBench (line 76)
gpuData = runMTimes( gpuData, reps, 'double', 'GPU', progressTitle, numTasks );

Thanks for your help on this.

25 Jan 2012 Thomas

Good benchmark for GPUs

Updates
18 Jan 2012

Add data for C2075

23 Jul 2012

Try to prevent timeout being hit on very slow GPUs that happen to be driving the display as well.

16 Oct 2012

* Add an "app" version for use with R2012b and above
* Updated data-files for R2012a and R2012b

01 Nov 2012

* Suppressed warnings about results being skipped
* Now includes a set of pre-stored host-PC data so that you get a rough CPU/GPU comparison when just viewing the report
* Reduced largest size used for MTIMES to avoid out of memory

08 May 2013

* Add results for R2013a (including K20!)

06 Nov 2013

* Add datafile for R2013b

01 Jul 2014

Fix recursion problems when using the MATLAB App version.
