This function executes a fast version of the non-parametric Theil-Sen robust linear regression algorithm by finding the median slope between all pairwise combinations of points in a given data set.
For my application I needed to run a robust regression on large data sets (many thousands of points), but the implementations I found on the File Exchange were far too slow (see figure). This code is substantially faster, and for large data sets can be two orders of magnitude faster than those currently available.
Zachary Danziger (2021). Theil-Sen Robust Linear Regression (https://www.mathworks.com/matlabcentral/fileexchange/48294-theil-sen-robust-linear-regression), MATLAB Central File Exchange. Retrieved .
Thank you for your effort, I cant calculate the intercept?? Would you tell me how to calculate it please??
Great routine, thanks!
Minor correction: Citation should be to Gilbert (1987) §16.5, not 6.5
I am a total newbie to the field so there are a few things I am missing here.
I want to estimate a linear model such y = a + b1*X1 + b2*X2. My input array should be data = [ones(n,1), X1, X2, y]?
What is the meaning of the offset output? What about standard errors? Any help and/or references would be highly appreciated. Thanks
Then how would you plot the line on the time series data after you have estimated the slope?
Very good contrib
I believe the 'nanmedian' function requires a toolbox?
You can make this function toolbox free if you use the following code
b = median(D(~isnan(D)));
Replace D with whatever the parameter of nanmedium was previously.
Great script btw!
@Ian: Absolutely right, I have updated the submission with your suggestion.
Fantastic script, but you can squeeze even more speed out of this. When you calculate the slopes, you are building a symmetric matrix and calculating each slope twice, when you only need to do it once. Change the calculation of C to:
C(i,i:end) = (data(i,2)-data(i:end,2))./(data(i,1) - data(i:end,1));
(That still calculates unnecessary 'NaN' values for the diagonal, but the indexing is easy.)
Using timeit to time both functions with various values of n up to 8192, I get an almost 2x speedup (not surprising because I'm only doing half the calculations). For 8192 points on my machine, the time goes from 11.278s to 6.3985s.
@Dillon: You are absolutely right, the legend entries are swapped.
There is a small error in the example code you have in the documentation. I believe you mislabeled the Least Squares line and the the Theil-Sen line. LS should be blue and TS should be red.
This is the best of the three Theil-Sen scripts. Can return both m and b, faster by 30x, and handles multi-dimensional input.
Inspired by: Theil–Sen estimator
Find the treasures in MATLAB Central and discover how the community can help you!Start Hunting!