I wanted to say thanks for the great app, but also to point out something that could cause inaccurate results in some cases. The function gtoc() calls wait() before reading the timer (which is good), but it also calls gpuDevice on every invocation, which is actually pretty slow: it typically takes between 3.6 and 5.6 ms on my machine, and this time gets added to the measured total. You might consider storing the output of gpuDevice in a persistent variable, e.g. gpuid, and calling wait(gpuid) instead.
For large array sizes I suppose it doesn't matter too much, but for smaller arrays the extra gpuDevice time can make it look like a GPU is slower than a CPU in cases where it's really not.
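For what it's worth, here is a rough sketch of what I mean. I'm assuming gtoc() takes the timer value returned by tic and returns the elapsed time in seconds; the argument name timerVal and the variable gpuid are just my own placeholders, and the actual function in the app may well look different.

```matlab
function t = gtoc(timerVal)
% Sketch only: assumes gtoc() wraps toc() and syncs the GPU first.
% The real signature and body in the app may differ.
persistent gpuid
if isempty(gpuid)
    % Query the device once and reuse the handle on later calls,
    % so the cost of gpuDevice is not added to every measurement.
    gpuid = gpuDevice;
end
wait(gpuid);        % block until all pending GPU work has finished
t = toc(timerVal);  % elapsed time in seconds since the matching tic
end
```

One caveat with this approach: if the selected device changes or is reset during a run, the cached handle would be stale, so the persistent variable would need to be cleared (e.g. with clear gtoc) before timing again.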