File Exchange

## Theil-Sen Robust Linear Regression

version 1.2 (2.53 KB) by

Quickly perform linear regression that is robust to outliers.

Updated

This function executes a fast version of the non-parametric Theil-Sen robust linear regression algorithm by finding the median slope between all pairwise combinations of points in a given data set.
For my application I needed to run a robust regression on large data sets (many thousands of points), but the implementations I found on the File Exchange were far too slow (see figure). This code is substantially faster, and for large data sets can be two orders of magnitude faster than those currently available.

T C

### T C (view profile)

I believe the 'nanmedian' function requires a toolbox?

You can make this function toolbox free if you use the following code

b = median(D(~isnan(D)));

Replace D with whatever the parameter of nanmedium was previously.

Great script btw!

Zachary Danziger

### Zachary Danziger (view profile)

@Ian: Absolutely right, I have updated the submission with your suggestion.

Ian Craig

### Ian Craig (view profile)

Fantastic script, but you can squeeze even more speed out of this. When you calculate the slopes, you are building a symmetric matrix and calculating each slope twice, when you only need to do it once. Change the calculation of C to:

C(i,i:end) = (data(i,2)-data(i:end,2))./(data(i,1) - data(i:end,1));

(That still calculates unnecessary 'NaN' values for the diagonal, but the indexing is easy.)

Using timeit to time both functions with various values of n up to 8192, I get an almost 2x speedup (not surprising because I'm only doing half the calculations). For 8192 points on my machine, the time goes from 11.278s to 6.3985s.

Zachary Danziger

### Zachary Danziger (view profile)

@Dillon: You are absolutely right, the legend entries are swapped.

Dillon Amaya

### Dillon Amaya (view profile)

There is a small error in the example code you have in the documentation. I believe you mislabeled the Least Squares line and the the Theil-Sen line. LS should be blue and TS should be red.

Levi Golston

### Levi Golston (view profile)

This is the best of the three Theil-Sen scripts. Can return both m and b, faster by 30x, and handles multi-dimensional input.

 29 Sep 2015 1.2 - updated help - speed increase for 2D case 30 Jan 2015 1.2 Updated code to function with higher dimensional data sets. Included example in documentation. 28 Oct 2014 1.1 Updated units on associated figure.
##### MATLAB Release
MATLAB 8.5 (R2015a)