std::nth_element from the C++ standard library is an efficient algorithm for selecting the element of a given rank from a vector of data. It is typically implemented as a variant of quickselect, also known as Hoare's selection algorithm. The mex-file in this package runs nth_element column-wise over a 2D array. See the C++ documentation and http://en.wikipedia.org/wiki/Selection_algorithm for more information.
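To make the column-wise idea concrete, here is a minimal sketch in plain C++. It is not the package's source: the function name nth_element_columns and the std::vector storage are assumptions for illustration, and the rank n is 0-based (unlike MATLAB indexing). It assumes the array is stored column-major, as MATLAB stores it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the package): for each column of a
// column-major rows x cols array, partially reorder the column so that
// the element of 0-based rank n ends up at row n. Elements above row n
// are <= it and elements below are >= it, but neither side is sorted.
void nth_element_columns(std::vector<double>& data,
                         std::size_t rows, std::size_t cols, std::size_t n) {
    assert(data.size() == rows * cols);
    assert(n < rows);
    for (std::size_t c = 0; c < cols; ++c) {
        // Column c occupies the contiguous range [c*rows, (c+1)*rows).
        auto first = data.begin() + static_cast<std::ptrdiff_t>(c * rows);
        std::nth_element(first,
                         first + static_cast<std::ptrdiff_t>(n),
                         first + static_cast<std::ptrdiff_t>(rows));
    }
}
```

For a 5 x 2 array with columns {5, 1, 4, 2, 3} and {9, 7, 8, 6, 10}, calling nth_element_columns(data, 5, 2, 2) leaves 3 at row 2 of the first column and 8 at row 2 of the second.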
I have added (as of v0.84) the ability to operate on data in-place. This potentially saves an array copy, so it can be significantly more efficient. I see about another 2x speedup in my tests. I've tested this somewhat, but you should still consider it experimental. It may also break on future versions of MATLAB. Please give it a try and send feedback.
I have also added (as of v0.85) OpenMP pragmas to support operating on multiple columns in parallel. See nth_element.m or below for instructions on compiling the mex files with OpenMP support.
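Because each column is processed independently, a single pragma on the column loop is enough to parallelize the work. The following is a sketch under assumptions, not the package's actual code: the function name nth_element_columns_omp and the raw-pointer, column-major interface are illustrative. If the compiler is not given an OpenMP flag, the pragma is ignored and the loop runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the package): places the element of
// 0-based rank n at row n of every column of a column-major
// rows x cols array, one column per loop iteration.
void nth_element_columns_omp(double* data, std::ptrdiff_t rows,
                             std::ptrdiff_t cols, std::ptrdiff_t n) {
    assert(data != nullptr && n >= 0 && n < rows);
    // Columns do not overlap in memory, so iterations share no state
    // and can safely run on different threads.
    #pragma omp parallel for
    for (std::ptrdiff_t c = 0; c < cols; ++c) {
        double* first = data + c * rows;
        std::nth_element(first, first + n, first + rows);
    }
}
```

A signed loop index is used here because some older OpenMP implementations only accept signed integer loop variables.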
One example calculation based on nth_element is also included: a mex-file for fast_median. In my benchmarks, fast_median is roughly twice as fast as MATLAB's native median function. MATLAB's median relies on sort, but sorting the entire input just to get the median is inefficient. The theoretical average complexity of fast_median is O(n), compared to O(n log n) for an approach based on a full comparison sort.
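The selection-based median can be sketched as follows. This is an illustration of the technique rather than the code in fast_median.cpp: the name median_nth is hypothetical, the input is copied so the caller's data is left untouched, and NaN handling is not addressed. For an even number of elements it averages the two middle values, which is the usual convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the package): median of v in O(n)
// average time using selection instead of a full sort.
// v is taken by value because nth_element reorders its input.
double median_nth(std::vector<double> v) {
    assert(!v.empty());
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(v.size() / 2);
    // After this call v[mid] is the element a full sort would put there,
    // and every element before it is <= v[mid].
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    const double upper = v[static_cast<std::size_t>(mid)];
    if (v.size() % 2 == 1) {
        return upper;
    }
    // Even length: the lower middle value is the largest element in the
    // left partition, found with one extra linear scan.
    const double lower = *std::max_element(v.begin(), v.begin() + mid);
    return 0.5 * (lower + upper);
}
```

For example, median_nth({3, 1, 2}) returns 2 and median_nth({4, 1, 3, 2}) returns 2.5.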
Median calculations are particularly important in robust statistics, for example the median absolute deviation (MAD).
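As an illustration, the MAD is the median of the absolute deviations from the median, so it needs two median computations. The sketch below is not part of the package: the names median_inplace and mad are hypothetical, and the result is the raw, unscaled MAD (a consistency factor is often applied on top, depending on the assumed distribution).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: selection-based median that reorders v.
double median_inplace(std::vector<double>& v) {
    assert(!v.empty());
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(v.size() / 2);
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    const double upper = v[static_cast<std::size_t>(mid)];
    if (v.size() % 2 == 1) {
        return upper;
    }
    // Even length: average the two middle values.
    const double lower = *std::max_element(v.begin(), v.begin() + mid);
    return 0.5 * (lower + upper);
}

// Hypothetical helper: unscaled median absolute deviation,
// median(|x_i - median(x)|). x is copied because it is overwritten.
double mad(std::vector<double> x) {
    const double med = median_inplace(x);
    for (double& xi : x) {
        xi = std::fabs(xi - med);  // replace each value by its deviation
    }
    return median_inplace(x);
}
```

For the data {1, 2, 3, 4, 100} the median is 3, the absolute deviations are {2, 1, 0, 1, 97}, and mad returns 1, which shows how little the outlier 100 affects the result.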
To install, unpack the zip, change to the resulting directory in MATLAB, and run:
> mex nth_element.cpp
> mex fast_median.cpp
To get parallel processing of independent columns, you need to compile with OpenMP support. See your compiler's documentation for the required flags. With GCC, for example, you would run:
> mex nth_element.cpp CXXFLAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
Then put the resulting binaries on your MATLAB path.