I had similar issues on MacOS (10.6.7+) because 'uname -a' returns i386 but gcc builds for x86_64 by default. nvcc tries to 'autodetect' but gets the wrong value.
I get the same result as Dung Chu when I use the .mexmaci file which is included with the download.
I believe that you are supposed to delete that file, and create a new one using make. (Go to that directory in terminal, type 'make')
However, when I do this I am getting architecture issues that I do not know how to deal with:
When compiling, I get errors like this:
warning: in cudaconv.o, file was built for i386 which is not the architecture being linked (x86_64)
When using the resulting file I get this:
c = cudaconv(2,2)
??? Invalid MEX-file '/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci': dlopen(/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci, 1): no suitable image found. Did find:
/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci: mach-o, but wrong architecture.
This is difficult to review, really - as I learned a lot from this (I've done quite a bit of MATLAB and CUDA integration but somehow managed to avoid texture mapping) - so I think this is a good example, as you saved me figuring it out for myself! Thanks :)
On the strict review side, I'd say the code actually needs to be quite a lot more rigorous before being put to general use. Let's put it like this - if you get a SegV in a MEX file you have to restart MATLAB, which is bad enough, but this code has loads of omissions - which could lead to whole system or graphics system crashes.
So sorry for being annoying, but hopefully some of my (painful!) experience will be useful:
1. More rigorous checking of the inputs. Sizes and class checks on all inputs will avoid many problems (normal MEX file stuff).
2. Look in the CUDA SDK examples at the 'deviceQuery' source code - rip it out, and use it to check device properties. John is usually right (array just not allocating and code returning garbage) - but in this case I suspect he may not be, as a failed allocation is more likely to lead to a seg fault than to garbled output. I think it's probably something to do with the block or grid size, which may exceed your card's limit when large matrices are used (hence garbled results). Using code from deviceQuery will show you how to check all the GPU's properties (including max memory allocation).
3. You should be able to set which device you use. I think this code defaults to device 0. So my system with a wussy GT8600 and a massive Tesla C1060 would run the code on the GT8600! You can select devices automatically (get the one with maxGFlops) or pass in an argument to say which one you want.
4. Use the cutilSafeCall utility (again, check the SDK examples), or use the error flags to handle errors yourself. At the moment the status flags returned from CUDA functions such as cudaMalloc aren't used. This leads to two problems: a. Seg faults and b. Memory leaks when things fail, requiring system restart. Doing this will also tell you the answer to (2) above, as it'll show if there are problems.
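To make points 2 and 4 a bit more concrete, here is a rough sketch of the kind of pre-launch check I mean. It is not taken from the submission or from deviceQuery: DeviceLimits, LaunchConfig, checkLaunch and queryLimits are names I made up for illustration. The struct fields mirror the cudaDeviceProp fields of the same name, and the part that actually talks to the CUDA runtime only compiles under nvcc.

```cpp
#include <cstddef>

// Hypothetical plain mirror of the cudaDeviceProp fields used by the check.
struct DeviceLimits {
    int maxThreadsPerBlock;      // max total threads in one block
    int maxThreadsDim[3];        // max block size per dimension
    int maxGridSize[3];          // max grid size per dimension
    std::size_t totalGlobalMem;  // total device global memory in bytes
};

// Hypothetical description of one kernel launch.
struct LaunchConfig {
    int block[3];
    int grid[3];
};

// Returns nullptr if the launch fits the device limits, else a short reason.
// The memory test is only a necessary condition: cudaMalloc can still fail,
// so its return status has to be checked as well.
const char* checkLaunch(const DeviceLimits& lim, const LaunchConfig& cfg,
                        std::size_t bytesNeeded) {
    long long threads = 1;
    for (int i = 0; i < 3; ++i) {
        if (cfg.block[i] < 1 || cfg.grid[i] < 1)
            return "block and grid dimensions must be at least 1";
        if (cfg.block[i] > lim.maxThreadsDim[i])
            return "block dimension exceeds device limit";
        if (cfg.grid[i] > lim.maxGridSize[i])
            return "grid dimension exceeds device limit";
        threads *= cfg.block[i];
        if (threads > lim.maxThreadsPerBlock)
            return "too many threads per block";
    }
    if (bytesNeeded > lim.totalGlobalMem)
        return "request exceeds total device memory";
    return nullptr;
}

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Fills 'out' from the CUDA runtime for the given device index.
// Returns false, leaving 'out' untouched, if the query fails.
bool queryLimits(int device, DeviceLimits* out) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;
    out->maxThreadsPerBlock = prop.maxThreadsPerBlock;
    for (int i = 0; i < 3; ++i) {
        out->maxThreadsDim[i] = prop.maxThreadsDim[i];
        out->maxGridSize[i] = prop.maxGridSize[i];
    }
    out->totalGlobalMem = prop.totalGlobalMem;
    return true;
}
#endif
```

In the MEX gateway you would call something like this before allocating anything, and bail out with mexErrMsgTxt and the returned message rather than launching a kernel the card cannot run.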
The description is somewhat misleading. The function (and Mr. Buchgraber's) only interpolates to a set of regularly spaced points. It is equivalent to the ZI = interp2(Z,ntimes) call of interp2. It does not do 2D interpolation onto a general set of points, i.e. points that are not equally spaced, as can also be done with interp2.