Great job!
Thanks to mtimesx I replaced a huge for loop with a 3D matrix computation, cutting script execution time from 14 s to 1.5 s!
Compiled and tested on Gentoo Linux (64-bit), kernel 2.6.33, on an amd64 system. Compiling with gcc 4.3.* (not the deprecated 4.2 suggested by MathWorks), I had to delete all comments from the C source files. Probably the "//" sequence is not accepted by the newer gcc; maybe you could replace it with " /* comment */ ", which is correctly ignored. Also, to compile successfully I had to use:
which is somewhat different from the default. Both '-lmwlapack' and '-lmwblas' are the options suggested for compiling on UNIX systems (see "Building MEX-Files" in the doc), but that was enough. I can say it works correctly, but I didn't run the benchmark files.
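A note on the "//" problem: line comments were only standardized in C99, so one plausible explanation (an assumption, not confirmed for this build) is that gcc was invoked in a strict C89 mode such as -ansi. A minimal sketch of code that sticks to block comments and C89-style declarations, and so should compile in either mode; `dot` is a hypothetical helper for illustration, not a function from mtimesx:

```c
/* Sketch only: dot() is a hypothetical helper, not part of mtimesx. */
#include <stddef.h>

/* Returns the dot product of two vectors of length n. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;   /* accumulator; block comment instead of a line comment */
    size_t i;           /* declared at the top of the block, as C89 requires */

    for (i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

If the strict-mode assumption holds, building the unmodified sources with a C99 dialect option (for example gcc's -std=c99) would be an alternative to editing the comments.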
Hope this helps!
Feel free to ask if any testing on 64-bit Linux is needed. I'm going to run tests soon.