LOWESS, a robust regression like LOWESS allows detecton of a trend otherwise with too much variance
LOWESS Locally Weighted Scatterplot Smoothing that does not require the statistical toolbox in matlab.
This regression will work on linear and nonlinear relationships between X and Y.
Modifications:
12/19/2008  added upper and lower LOWESS smooths. These additional smooths show how the distribution of Y varies with X. These smooths are simply LOWESS applied to the positive and negative residuals separately, then added to the original lowess of the data. The same smoothing factor is applied to both the upper and lower limits.
2/21/2009  added sorting to the function, data no longer need to be sorted. Also added a routine such that if a user also supplies a second dataset, linear interpolations are done one the lowess and used to predict yvalues for the supplied xvalues.
10/27/2009  modified the second user provided Xdata for obtaining predictions. Matlab function unique sorts by default. It really was not needed in the section of code to perform linear interpolations of the xdata using the ypredicted LOWESS results. If the user does not supply a second xdata set, it will assume to use the supplied xy data set. Thus there is an output (xy) that maintains the original sequence of the input. Additionally, the user can now include a sequence index as the first column of input data. This can be a datenum or some other ordering index. The output will be sequenced using that index. If a sequence index is provided a second subplot will be created show the predicted Yvalues in the order of the included sequence index. I suspect this sequence index most often will be a DateTime (i.e. datenum). Just to the function generic enough, the Xaxis labels are not converted to a nice date format, but the user could easily change that with a datetic attribute in the subplot.
6/15/2012  oddly, when using this routine on data without a time sequence (i.e. a third column), the plotting portions cause an error. Not sure how I would have missed that but...I think I have it fixed.
Using a robust regression like LOWESS allows one the ability to detect a trend in data that may otherwise have too much variance resulting in nonsignificance pvalues.
Yhat (prediction) is computed from a weghted least squares regression whose weights are both a function of distance from X and magnitude from of the residual from the previous regression.
The logic of these functions and subfunctions follow the USGS
Kendall.exe routines. Because matlab is 8byte precision, there are some very small differences between FORTRAN compiled and matlab. Maybe 64bit OS's has 16byte precision in matlab?
Data are expected to be sorted prior to data input for this function. Sorted on first column of datain.
There is a very simple subfucntion to create a plot of the data and regression if the user so choses with a flag in the call to the lowess function. BTW the png file looks much better than what the figure looks like on screen.
There are loops in these routines to keep the memory requirements to a minimum, since it is foreseeable that one may have very large datasets to work with.
f = a smoothing factor between 0 and 1. The closer to one, the more smoothing done.
Syntax:
[dataout lowerLimit upperLimit]
= lowess(datain,f,wantplot,imagefile)
datain = n x 2 matrix
dataout = n x 3 matrix
wantplot = scaler (optional)
if ~= 0 then create plot
imagefile = full path and file name where to output the figure to an
png file type at 600 dpi.
e.g. imagefile = 'd:\temp\lowess.png';
where:
datain(:,1) = x
datain(:,2) = y
f = scaler (0 < f < 1)
wantplot = scaler
imagefile = string
datain must be sorted prior to loading into this function on the
xvalue. This is not done in the function because the user may want to have the end result be unsorted (e.g. time sort).
dataout(:,1) = x
dataout(:,2) = y
dataout(:,3) = yprediction (aka yhat)
lowerLimit(:,1) = x with negative residuals
lowerLimit(:,2) = yprediction of residuals + original yprediction
upperLimit(:,1) = x with positive residuals
upperLimit(:,2) = yprediction of residuals + original yprediction
Requirements: none
Written by
Jeff Burkey
King County Department of Natural Resources and Parks
email: jeff.burkey@kingcounty.gov
12/16/2008
1.4  There wasnt really an issue having zero values; rather, when supplying data with no time sequence, the interpolation would error. 

1.3  Updated graphing. Revised input. Includes example data file. Revised output. 

1.2  Added a sorting call. Also added a few lines of code allow the user to supply a second set of xvalues to be used for ypredictions using the results of the lowess regression. 

1.1  Added computation of upper and lower smooths of the residuals, also will plot on figure. 
StephanieJ (view profile)
Excellent and works as documented. Thank you for sharing Jeff. This has been helpful for me to smooth noisy optical data plots. There is more to explore here for me before I utilize the full capability.
Muhammad Haziq Kamarul Azman (view profile)
Sorry, the equation should've been:
b = (length(x)*wxywx*wy)/(length(x)*wxxwx^2);
Muhammad Haziq Kamarul Azman (view profile)
Very good piece of code.
Just one question though. Why is it, in the function wlsq, that the calculation of b is missing the signal length N? Shouldn't it be:
b = (wxywx*wy)/(length(x)*wxxwx^2);
Szymon Beczkowski (view profile)
Very verbose function. Remove all unnecessary fprintfs and replace the necessary ones with warnings. Do not put tic/toc inside the function.
Ji (view profile)
Alex (view profile)
does anyone know whether this uses linear regression, or polynomial regression with degree 2 to do the smoothing? Is it possible to modify it to use weighted moving averages, i.e. only fit the constant term in regression?
Carl Witthoft (view profile)
Recommend input validation for "f". E.g.,
if ( ~exist('f','var')  f <=0 ),
f = 0.1;
end
And it definitely needs to allow x=0. Imagine a NASA launch timeline, "T minus 5, Tminus 4, 3,2,1,...Oh wait, we can't allow a zero...T plus 1..." :)
Mike Bindschadler (view profile)
Thanks for a great function which generally appears to work very well. However, users should be aware that any data point with an x value of zero will be ignored. It's not clear what the reasoning for this is, but I had to change it to treat points with x values of 0 the same as any other point.
Michael Schwartz (view profile)
Works great  just as described. Well documented. Thanks a lot Jeff.
Yan (view profile)
Say at time point 0, we have measured the concentration of a molecule. What measured is just the initial value.
Yan (view profile)
Many thanks to Jeff for replying me and solving the problem.
Jeff Burkey (view profile)
If one truly measures zero, what would be the expected response? A constant? I can see how that may pose a problem, but what is the question imposed on the data? This is a good question. I did not consider measuring zero. Give me a few days to revisit and have a response.  Jeff
Yan (view profile)
If datain is an n×2 matrix, the first column of datain does not allow 0.
Yan (view profile)
I tried the function, and it worked for part of my problems. It seems that the code does not support if datain contains 0 in it? If there is 0 in the datain, the error report is:
??? Undefined function or variable "xy".
Error in ==> lowess at 269
customplot(dataout,upperLimit,lowerLimit,f,[dte x_data
y_data],xy,imagefile);
ChihWei Tsai (view profile)
Andre Guy Tranquille (view profile)
Jeff Burkey (view profile)
You shouldn’t need to modify the function to not input the “xdata”. If you don’t input it, the function will perform just not that task. If that is the case, don’t specify the “xy” as part of your output either.
The “dataout” variable contains the predicted y for each x value supplied in the “datain” variable. The “xdata” is if you want a different set of ypredictions using the lowess function for other xdata not provided in the “datain” variable.
I don’t remember if the function I posted sorts or not, but if you are wanting to regress as a function of time, you may not want to sort.
I hope this helps.
 Jeff
Kaite (view profile)
awsome! This is exactly what I need. I am a little confused about one thing...I have an n x 2 matrix in datain that consists of x=precip and y=groundwater level that I would like to do the regression on and get the ypredictions using the lowess regression results. I modified the function to not include the "xdata" and "xy" but I am getting errors about the trycatch syntax on line 298 of the mfile.I have version 7.4.0 (R2007a). The error states that it will continue to run but I dont get results. Am I putting the data in the wrong place? Do I need to put a second data set in? I wanted to take the resulting predicted y values and run the MannKendall test to see if i see a significant trend. what is the purpose of the second dataset of xvalues? As you can tell I am new to Matlab and it would be wonderful if I could figure this out! thanks so much!! If i can get this to work for my data this would be AMAZING!
Jeff Evans (view profile)
Fantastic! Worked perfectly the first time. I did modify it just a bit to fit my purposes by suppressing the screen output and only returning the dataout variable, since don't want to plot the confidence limits. Thanks for the contribution.
 Jeff