This MEX file performs 2-D bilinear interpolation using an NVIDIA graphics chipset. To compile and run it, you need the NVIDIA CUDA Toolkit (http://www.nvidia.com/object/cuda_get.html) and, of course, an NVIDIA graphics card of reasonably modern vintage.
BUILDING INSTRUCTIONS: Set the 'MATLAB' (and, if necessary, 'MEX') variables in the Makefile to the appropriate values, then run 'make' at a prompt. A compiled MEX file (mex/mexmac/mexmaci/dll?, depending on platform) will be created.
This code uses your GPU's built-in bilinear texture interpolation and is very fast. For reasonably sized operations (say, taking a 50x50 matrix up to 1000x1000), the CUDA-based code is 5-10x faster than interp2 with linear interpolation (tested on a 2.4 GHz Core 2 Duo MacBook Pro with a GeForce 8600M GT).
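For readers who want to see what the texture hardware is computing, here is a minimal CPU reference for bilinear interpolation. It is a sketch, not the MEX's actual code: the row-major layout, 0-based coordinates and border clamping are assumptions (MATLAB itself stores arrays column-major). Per the CUDA Programming Guide's description of linear filtering, the GPU stores the interpolation weights in 9-bit fixed point, so its results can differ slightly from a float reference like this one when compared against interp2.

```c
#include <stddef.h>

/* CPU reference for bilinear interpolation (a sketch, not the MEX's code).
   img is a rows x cols single-precision array stored row-major.
   (r, c) are finite, 0-based fractional coordinates, (0, 0) being the
   first element. Out-of-range coordinates are clamped to the border,
   similar in spirit to cudaAddressModeClamp. Assumes rows, cols >= 1. */
float bilinear_sample(const float *img, size_t rows, size_t cols,
                      float r, float c)
{
    /* Clamp into [0, rows-1] x [0, cols-1]. */
    if (r < 0.0f) r = 0.0f;
    if (c < 0.0f) c = 0.0f;
    if (r > (float)(rows - 1)) r = (float)(rows - 1);
    if (c > (float)(cols - 1)) c = (float)(cols - 1);

    /* r and c are now non-negative, so truncation equals floor. */
    size_t r0 = (size_t)r;
    size_t c0 = (size_t)c;
    size_t r1 = (r0 + 1 < rows) ? r0 + 1 : r0;
    size_t c1 = (c0 + 1 < cols) ? c0 + 1 : c0;
    float fr = r - (float)r0;   /* weight given to row r1 */
    float fc = c - (float)c0;   /* weight given to column c1 */

    /* Interpolate along each of the two rows, then between the rows. */
    float top    = (1.0f - fc) * img[r0 * cols + c0] + fc * img[r0 * cols + c1];
    float bottom = (1.0f - fc) * img[r1 * cols + c0] + fc * img[r1 * cols + c1];
    return (1.0f - fr) * top + fr * bottom;
}
```

On the 2x2 array {0, 1, 2, 3}, sampling at (0.5, 0.5) averages all four corners and gives 1.5.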
With very (VERY) large matrices, however, it can crash your computer outright or give bizarre results. Be careful!
This is difficult to review, really, as I learned a lot from it (I've done quite a bit of MATLAB and CUDA integration but somehow managed to avoid texture mapping). So I think this is a good example: you saved me figuring it out for myself! Thanks :)
On the strict review side, I'd say the code needs to be quite a lot more rigorous before being put to general use. Put it like this: if you get a segmentation fault (SEGV) in a MEX file you have to restart MATLAB, which is bad enough, but this code has loads of omissions that could lead to whole-system or graphics-system crashes.
So sorry for being annoying, but hopefully some of my (painful!) experience will be useful:
1. More rigorous checking of the inputs. Size and class checks on all inputs will avoid many problems (normal MEX file stuff).
2. Look at the 'deviceQuery' source code in the CUDA SDK examples; rip it out and use it to check device properties. John is usually right (the array just not allocating and the code returning garbage), but in this case I suspect he may not be, as a failed allocation is more likely to lead to a seg fault. I think it's probably something to do with the block or grid size, which may exceed your card's limits when large matrices are used (hence the garbled results). The deviceQuery code shows you how to check all of the GPU's properties (including maximum memory allocation).
3. You should be able to set which device you use. I think this code defaults to device 0, so my system, with a wussy 8600 GT and a massive Tesla C1060, would run the code on the 8600 GT! You can select the device automatically (pick the one with the highest GFLOPS) or pass in an argument saying which one you want.
4. Use the cutilSafeCall utility (again, check the SDK examples), or check the error codes and handle errors yourself. At the moment the status codes returned by CUDA functions such as cudaMalloc are ignored. This leads to two problems: (a) seg faults and (b) memory leaks when things fail, requiring a system restart. Doing this will also answer (2) above, as it will show whether there are problems.
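The grid-size hypothesis in point 2 can be checked before any kernel launch. Below is a minimal sketch, assuming a 2-D launch with one thread per output element; the struct, function names and the 16x16 block shape are hypothetical, and the struct simply mirrors fields that cudaGetDeviceProperties() fills in (maxThreadsPerBlock, maxGridSize[0], maxGridSize[1]).

```c
#include <stddef.h>

/* Hypothetical subset of the limits reported in cudaDeviceProp. */
typedef struct {
    int max_threads_per_block;  /* cudaDeviceProp.maxThreadsPerBlock */
    int max_grid_x;             /* cudaDeviceProp.maxGridSize[0]     */
    int max_grid_y;             /* cudaDeviceProp.maxGridSize[1]     */
} DeviceLimits;

/* Ceiling division: blocks of size b needed to cover n items (b > 0). */
static size_t blocks_needed(size_t n, size_t b)
{
    return n / b + (n % b != 0);
}

/* Hypothetical pre-launch check for a grid covering an out_rows x out_cols
   output with bx x by thread blocks (columns along x, rows along y).
   Returns 0 if the launch fits, otherwise a code naming the failed check. */
int check_launch(const DeviceLimits *lim, size_t out_rows, size_t out_cols,
                 size_t bx, size_t by)
{
    if (bx == 0 || by == 0) return 1;                                    /* bad block shape */
    if (bx * by > (size_t)lim->max_threads_per_block) return 2;          /* block too big   */
    if (blocks_needed(out_cols, bx) > (size_t)lim->max_grid_x) return 3; /* grid too wide   */
    if (blocks_needed(out_rows, by) > (size_t)lim->max_grid_y) return 4; /* grid too tall   */
    return 0;
}
```

With limits typical of cards of the 8600M GT's generation (512 threads per block, 65535 blocks per grid dimension), a 1000x1000 output with 16x16 blocks needs only 63x63 blocks and passes, while a 2,000,000-column output would need 125,000 blocks in x and fails. In the MEX, a nonzero result could be reported with mexErrMsgTxt instead of launching.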
The description is somewhat misleading. This function (and Mr. Buchgraber's) only interpolates to a set of regularly spaced points; it is equivalent to the ZI = interp2(Z,ntimes) form of interp2. It does not do 2-D interpolation at a general (i.e., not equally spaced) set of points, as interp2 can.
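To make the "regularly spaced" point concrete: interp2(Z,ntimes) inserts one point between every pair of neighbours on each pass, so a dimension of length n becomes (n - 1) * 2^ntimes + 1, and every output sample sits at a fixed fractional offset in the source. The sketch below (hypothetical helper names, 0-based indices) computes just that; it does not cover the general interp2(X,Y,Z,XI,YI) case with arbitrary query points.

```c
#include <stddef.h>

/* Length of one dimension after interp2(Z, ntimes)-style refinement:
   each pass halves the spacing, so n points become (n - 1) * 2^ntimes + 1.
   Assumes n >= 1 and a small ntimes (no overflow checking). */
size_t refined_length(size_t n, unsigned int ntimes)
{
    return (n - 1) * ((size_t)1 << ntimes) + 1;
}

/* 0-based source coordinate of output sample k along that dimension.
   (Add 1 to map to MATLAB's 1-based indexing.) */
double refined_coord(size_t k, unsigned int ntimes)
{
    return (double)k / (double)((size_t)1 << ntimes);
}
```

For example, a 50-point dimension refined once gives 99 points, and with ntimes = 2 the fourth output sample (k = 3) lies at source coordinate 0.75.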
I did something similar with some improvements.
I also included a test comparing my solution (bilininterp) with cudainterp2, and as you can see here, it is much faster than cudainterp2.
My lack of any C expertise (or a compiler) shows here, so I can't test this.
I'd still revise the help a bit more, however. Tell the user what will be returned, even though it seems logical to me that the new array will have shape (nrows,ncols). Or does this code overwrite data? Tell your users what you expect of the input data. Must it be a double array, or may it be single or uint8? Complex?
John, thanks for the tips. I've submitted a version with corrected help and a better zipped version (OS X's archiver ...).
More MATLAB functions compiled with NVIDIA CUDA, please!!!
E.g., Image Processing Toolbox functions that are generally very slow (3-D rotation, filtering, radon, etc.).
CPU speed doubles every 18 months; GPU speed doubles every 6!
While I do agree with Scott, I'll add a few comments to what he said.
It should not be necessary to have a file named cudainterp2_help. Instead, just rename that help file cudainterp2.m. MATLAB is smart enough to look for help in an m-file with the same name, even if that m-file contains ONLY help. When running the code, it uses the MEX file instead.
One thing I do like is the author's comment about the limits of this code. I'll conjecture that the crashes on very large problems are due to memory limitations in the GPU (or something like that). As such, it's probably not a bug in this code. But warning the user about known issues is a nice thing to do.
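If the memory hypothesis is right, the MEX could refuse oversized requests instead of crashing. Here is a rough sketch, assuming (hypothetically) that one call holds a single-precision copy of the input plus a single-precision output buffer on the GPU; the function names and the 10% reserve are illustrative, and the free-memory figure would come from the CUDA runtime (e.g. cudaMemGetInfo) or from totalGlobalMem as shown by deviceQuery.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical estimate of the device memory one call needs, assuming a
   float copy of the in_rows x in_cols input and a float out_rows x out_cols
   output. Writes the byte count to *bytes and returns 0, or returns -1 if
   the count would overflow size_t. */
int device_bytes_needed(size_t in_rows, size_t in_cols,
                        size_t out_rows, size_t out_cols, size_t *bytes)
{
    size_t in_elems, out_elems, total;
    if (in_cols != 0 && in_rows > SIZE_MAX / in_cols) return -1;
    in_elems = in_rows * in_cols;
    if (out_cols != 0 && out_rows > SIZE_MAX / out_cols) return -1;
    out_elems = out_rows * out_cols;
    if (out_elems > SIZE_MAX - in_elems) return -1;
    total = in_elems + out_elems;
    if (total > SIZE_MAX / sizeof(float)) return -1;
    *bytes = total * sizeof(float);
    return 0;
}

/* Nonzero if `needed` bytes fit in `free_bytes` while keeping a
   (hypothetical) 10% reserve for the display and other GPU users. */
int fits_in_device(size_t needed, size_t free_bytes)
{
    return needed <= free_bytes - free_bytes / 10;
}
```

For the 50x50 to 1000x1000 case quoted in the description, this estimate is (2,500 + 1,000,000) x 4 = 4,010,000 bytes, comfortably inside a 256 MB card; it is the much larger outputs where a check like this would start to matter.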
One more thing I'd point out is the H1 line, i.e., the first line of help. It should be a simple, concise line that includes the important keywords lookfor will find. In this code, the first line of help was a long-winded thing that went on to tell the user about installation requirements, compilation particulars, etc. Split up that line.
Finally, I see that the zip file contains some unnecessary files. These should be cleaned up before zipping the directory. The gipper utility (on the File Exchange, written by Tim Davis) is a nice way to zip directories while leaving out the files you don't want to include.
Thanks for posting this. I won't review it, since I haven't run it, but it would be great if there were more routines that ran on GPUs with all the interfaces to MATLAB needed to use them as direct plug-ins, without having to know all the gritty details!
Updated help, added test benchmarking script.
Cleaned up ZIP file, help.