This function executes a fast version of the non-parametric Theil-Sen robust linear regression algorithm by finding the median slope between all pairwise combinations of points in a given data set.
For my application I needed to run a robust regression on large data sets (many thousands of points), but the implementations I found on the File Exchange were far too slow (see figure). This code is substantially faster, and for large data sets can be two orders of magnitude faster than those currently available.
I believe the 'nanmedian' function requires a toolbox?
You can make this function toolbox free if you use the following code
b = median(D(~isnan(D)));
Replace D with whatever the parameter of nanmedium was previously.
Great script btw!
@Ian: Absolutely right, I have updated the submission with your suggestion.
Fantastic script, but you can squeeze even more speed out of this. When you calculate the slopes, you are building a symmetric matrix and calculating each slope twice, when you only need to do it once. Change the calculation of C to:
C(i,i:end) = (data(i,2)-data(i:end,2))./(data(i,1) - data(i:end,1));
(That still calculates unnecessary 'NaN' values for the diagonal, but the indexing is easy.)
Using timeit to time both functions with various values of n up to 8192, I get an almost 2x speedup (not surprising because I'm only doing half the calculations). For 8192 points on my machine, the time goes from 11.278s to 6.3985s.
@Dillon: You are absolutely right, the legend entries are swapped.
There is a small error in the example code you have in the documentation. I believe you mislabeled the Least Squares line and the the Theil-Sen line. LS should be blue and TS should be red.
This is the best of the three Theil-Sen scripts. Can return both m and b, faster by 30x, and handles multi-dimensional input.
- updated help
Updated code to function with higher dimensional data sets.
Updated units on associated figure.