4.5

4.5 | 2 ratings Rate this file 20 Downloads (last 30 days) File Size: 10.1 KB File ID: #29453

nth_element

by Peter Li

 

19 Nov 2010 (Updated 27 Jun 2013)

MEX wrap of C++ nth_element. Plus fast_median, a faster median function. In-place and parallel.

| Watch this File

File Information
Description

C++ std::nth_element is an efficient algorithm for selecting a ranked element from a vector of data. Typically it is implemented as a variant of quickselect, AKA Hoare's Selection Algorithm. The mex-file in this package will run nth_element over a 2D array column-wise. See C++ documentation and http://en.wikipedia.org/wiki/Selection_algorithm for more information.

I have added (as of v0.84) the ability to operate on data in-place. This potentially saves an array copy so can be significantly more efficient. I see about another 2x speedup in my tests. I've tested this somewhat, but you should still consider it experimental. It may also break on future versions of Matlab. Please give it a try and send feedback.

I have also added (as of v0.85) OpenMP pragmas to support operating on multiple columns in parallel. See nth_element.m or below for instructions on compiling the mex files with OpenMP support.

One example calculation based on nth_element is also included, a mex-file for fast_median. In my benchmarks, fast_median is roughly twice as fast as MatLab's native median function. MatLab's median relies on sort, but sorting the entire input data to get the median is inefficient. Theoretical average complexity of fast_median is O(n), compared to best case complexity of O(n log n) for a full sort based approach.

Median calculations are particularly important in robust statistics, for example the median absolute deviation (MAD).

To install, unpack the zip, go to the directory from MatLab, and run:
  > mex nth_element.cpp
  > mex fast_median.cpp

To get parallel processing of independent columns, you need to compile with OpenMP support. See your compiler instructions. With GCC, e.g., you would do:
  > mex nth_element.cpp CXXFLAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"

Then put the resulting binaries on your MatLab path.

MATLAB release MATLAB 7.10 (R2010a)
Other requirements MEX setup with a C++ compiler. OpenMP supporting compiler for parallel column processing.
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (16)
26 Jun 2013 Peter Li

Well, this reply is very late, but Matthias I don't quite understand your issue. fast_median does work on the first dimension by default and seems to do the right thing with your example: med = fast_median(data(:));

The thing fast_median does not do is operate on row vectors, which is a common Matlab behavior I have not chosen to support; is that what you meant to ask about?

15 Jun 2013 Jaehwan  
27 Feb 2012 Matthias Klemm

As a follow up on WurmD's comment: "all" Matlab functions work on the first dimension by default so I think a "replacement" method should do so too.
The current behavior of fast_median breaks things like:
med = median(data(:));

Currently I can't use fast_median in my code as I need to be able to change the filtering function and I don't want to code in some special treatment for fast_median... :/

Would that be possible?

07 Dec 2011 Peter Li

Hi jianbo, I think this is the same as Martijn's request? I may have time to give this a try in the next month or two. Unfortunately I expect it to be significantly slower than just returning the partially-sorted output. This is a limitation of the way the C++ standard library is written and I'm not sure I can get around it. Bruno's version may have more flexibility to be efficient here.

05 Dec 2011 jianbo

Hi, I am interested to have index, rather of the n-th element output, thx.

25 Aug 2011 Peter Li

Hi LI, this uses a very similar algorithm to built-in sort, except that if you are only trying to know the "nth highest element" it is wasteful to sort the entire data. So it is faster than sort for finding out the "nth highest element" as long as the data are fairly large (for shorter data it performs about equally to sort).

Of course, if you are trying to find many different "nth highest elements" out of the same dataset you might be better off sorting the whole thing. One way to improve on the performance in that case would be to use the output array from nth_element as the input to the next call to nth_element; the array will become more fully sorted each iteration so the performance will get better. If you use the latest version (about to upload 0.84) you can use the inplace version of nth_element to make this very efficient.

Another caveat is that if you are trying to get, for example, the highest or lowest value then of course a simple max() or min() is better. This holds true if you are trying to get, for example, only the top three values out of a large dataset too.

19 Jun 2011 LI

It's interesting. I am wondering if it is faster than built-in function sort in MATLAB in some sceanrios.

12 Jun 2011 Peter Li

@Martijn: Good question; I believe I could implement this but it would require a significant rewrite. I doubt I'll have time in the short term.

If someone wants to take a stab at it, download my source and then change the call to std::nth_element so that instead of sorting a simple vector of numbers you are sorting a vector of number pairs where the first number of each pair is the value and the second number is the original index. You will also have to change the call to std:nth_element to the version that takes a Compare argument and write a custom Compare to simply take the pair and only use the first entry for sorting.

In the meantime, I recommend you look into Bruno Luong's implementation. He wrote the algorithm by hand in C, so probably has more flexibility to include features like this; it may already be included.

12 Jun 2011 Martijn Steenwijk

Nice work Peter. Is there a way to get out the (old) indices of the N ranked items?

19 May 2011 Peter Li

Thanks for the suggestion. I'm currently sitting on some other suggested updates too so I will try to look through things and release an update next week.

19 May 2011 WurmD

Thank you for your work Peter,

I would suggest fast_median to receive the same arguments as median, namely "M = median(A,dim)"
for it to be trivially swappable with median, and ensure its eventual inclusion into standard MatLab.

Cheers,

22 Feb 2011 Peter Li

Fixed version 0.83 has been uploaded, may take some time to post. I'll PM it to you to test. Thanks!

18 Feb 2011 Dun Kirk

Yes, i am running them from that directory with all *.h and *.cpp.
Microsoft Visual C++ 2008 SP1
is my compiler
and btw, I'm using win 7 64-bit with matlab r2010b 64-bit.

17 Feb 2011 Peter Li

Hmm, are you running these from the directory with the nth_element.h file? Can you tell me what compiler you're using?

17 Feb 2011 Dun Kirk

mex fast_median.cpp
mex nth_element.cpp

both of them return errors:
nth_element_0.81\nth_element.h(79) : error C2561: 'run_nth_element' : function must return a value

08 Dec 2010 Peter Li

Note Bruno Luong's very similar post here: http://www.mathworks.com/matlabcentral/fileexchange/23576-minmax-selection

His version is written in C. He wrote the partial quicksort algorithm himself and it looks like very nice work. I find that relying on the C++ standard library gives slightly faster performance, but your mileage may vary depending on your compiler.

Updates
19 Nov 2010

Just updating licenses to match FreeBSD required by MatLab central. Also added a little documentation to source files.

22 Feb 2011

Fixed void pointer return to void so that fancier C++ compilers will not complain about missing return type.

25 Aug 2011

Added in-place editing.

27 Jun 2013

Added OpenMP pragmas to support operating on multiple columns in parallel. Tried to clean up includes without complicating mex build process.

27 Jun 2013

Corrected OpenMP compile help in nth_element.m and added similar info in description here.

Contact us