Hello,
For a couple of days now I have been trying to efficiently apply the lessons and code from the webinar Data Driven Fitting by Richard Willey. I am most interested in how he applied a nonparametric fitting routine to a set of data points.
I am trying to create a nonparametric fit, and interpolate values from it, using the function "smooth" for 218 different utilities, each with many data points. I have tried to break his code apart and have read everything I could find about it, but there are a couple of lines I cannot figure out. I am hoping someone here can explain these sections of code:
%%Fitit
% Copyright (c) 2011, The MathWorks, Inc.
function [myfit,varargout] = fitit(X,Y,varargin)
....
....
% Finding optimal span for lowess
num = 99;
spans = linspace(.01,.99,num);
sse = zeros(size(spans));
cp = cvpartition(numel(X),'k',10);   % partition size must equal the number of data points
for j=1:length(spans)
    f = @(train,test) norm(test(:,2) - mylowess(train,test(:,1),spans(j)))^2;
    sse(j) = sum(crossval(f,[X,Y],'partition',cp));
end
[~,minj] = min(sse);
span = spans(minj);
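As a point of reference, cvpartition(100,'k',10) randomly divides 100 observations into 10 disjoint folds of approximately equal size. The base-MATLAB sketch below builds a comparable random fold assignment by hand, assuming only that description; it is not how cvpartition is implemented, and the variable names (fold, counts, testRows, trainRows) are made up for illustration.

```matlab
% Hand-built stand-in for cvpartition(n,'k',10): a random fold label per row.
rng(0);                            % make the random split reproducible
n = 100;                           % number of observations (rows of [X,Y])
k = 10;                            % number of folds
fold = mod(randperm(n), k) + 1;    % random fold label 1..k for each row
counts = accumarray(fold(:), 1);   % number of observations in each fold
testRows  = find(fold == 1);       % rows held out when fold 1 is the test set
trainRows = find(fold ~= 1);       % the other nine folds form the training set
```

With n = 100 and k = 10 every fold holds exactly 10 rows, so testRows has 10 entries and trainRows has 90, and no row appears in both.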
I have read all the help documentation for "cvpartition" and "crossval", and I have looked at the code for the mylowess function (given further down), but the part I do not understand is this loop:
for j=1:length(spans)
    f = @(train,test) norm(test(:,2) - mylowess(train,test(:,1),spans(j)))^2;
    sse(j) = sum(crossval(f,[X,Y],'partition',cp));
end
I understand he is trying to minimize the sum of squared errors, but how are the variables train and test created? Do they come from cvpartition? Why does he take the norm of test(:,2) minus the result of the mylowess function? What exactly is crossval doing? And why does mylowess linearly interpolate at test(:,1), and what is test(:,1) in the first place?
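Per the crossval documentation, crossval(f,[X,Y],'partition',cp) calls f once per fold of cp, passing the rows outside that fold as train and the rows inside it as test, and returns one value per fold; sum then adds those values. Because the data are passed as [X,Y], test(:,1) holds the held-out x values and test(:,2) the observed y values, so norm(test(:,2) - prediction)^2 is the sum of squared prediction errors on that fold. The sketch below writes the whole span search out by hand under that reading. Its smoother is a hypothetical nearest-neighbour average, not lowess, used only so the example runs without the Curve Fitting Toolbox (which provides smooth) or the Statistics and Machine Learning Toolbox (which provides cvpartition and crossval); the data are made up.

```matlab
% Hand-written version of the span search in fitit, base MATLAB only.
% The smoother is a hypothetical nearest-neighbour average standing in
% for mylowess; it is NOT lowess.
rng(0);
n  = 100;
X  = sort(rand(n, 1));
Y  = sin(2*pi*X) + 0.2*randn(n, 1);   % made-up noisy data
XY = [X, Y];                          % column 1 = x, column 2 = y

k     = 10;
fold  = mod(randperm(n), k) + 1;      % random folds, like cvpartition(n,'k',10)
spans = linspace(0.05, 0.95, 19);     % coarser grid than fitit, for speed
sse   = zeros(size(spans));

for j = 1:numel(spans)
    foldErr = zeros(k, 1);
    for kk = 1:k
        train = XY(fold ~= kk, :);    % every row outside fold kk
        test  = XY(fold == kk, :);    % the held-out rows of fold kk
        m     = max(1, round(spans(j) * size(train, 1)));  % neighbours averaged
        yHat  = zeros(size(test, 1), 1);
        for i = 1:size(test, 1)
            % stand-in for mylowess(train, test(:,1), spans(j)):
            % average the y values of the m training points closest in x
            [~, idx] = sort(abs(train(:, 1) - test(i, 1)));
            yHat(i)  = mean(train(idx(1:m), 2));
        end
        % same criterion as f in fitit: squared 2-norm of the residuals,
        % which equals the sum of squared errors on this fold
        foldErr(kk) = norm(test(:, 2) - yHat)^2;
    end
    sse(j) = sum(foldErr);            % the total that sum(crossval(f, ...)) forms
end

[~, minj] = min(sse);
bestSpan  = spans(minj);
```

Each row is predicted exactly once by a model that never saw it, so sse(j) measures out-of-sample error for span j, and the span with the smallest total is kept, just as [~,minj] = min(sse) does in fitit.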
And the accompanying mylowess function is given by:
function ys=mylowess(xy,xs,span)
%MYLOWESS Lowess smoothing, preserving x values
% YS=MYLOWESS(XY,XS) returns the smoothed version of the x/y data in the
% two-column matrix XY, but evaluates the smooth at XS and returns the
% smoothed values in YS. Any values outside the range of XY are taken to
% be equal to the closest values.
if nargin<3 || isempty(span)
    span = .3;
end
% Sort and get smoothed version of xy data
xy = sortrows(xy);
x1 = xy(:,1);
y1 = xy(:,2);
ys1 = smooth(x1,y1,span,'loess');
% Remove repeats so we can interpolate
t = diff(x1)==0;
x1(t)=[]; ys1(t) = [];
% Interpolate to evaluate this at the xs values
ys = interp1(x1,ys1,xs,'linear',NaN);
% Some of the original points may have x values outside the range of the
% resampled data. Those are now NaN because we could not interpolate them.
% Replace NaN by the closest smoothed value. This amounts to extending the
% smooth curve using a horizontal line.
if any(isnan(ys))
    ys(xs<x1(1)) = ys1(1);
    ys(xs>x1(end)) = ys1(end);
end
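The last steps of mylowess (removing repeated x values so interp1 accepts them, interpolating at new points, then extending the curve flat outside the data range) can be checked on a tiny example. The values in ys1 are made up and stand in for the output of smooth(x1,y1,span,'loess').

```matlab
x1  = [1; 2; 2; 3; 4];           % sorted x values with 2 repeated
ys1 = [10; 20; 20; 30; 40];      % made-up smoothed values at those x
% mylowess uses diff(x1)==0 directly; a trailing false is appended here
% so the mask has the same length as x1
t = [diff(x1) == 0; false];      % true where the next x equals this one
x1(t) = []; ys1(t) = [];         % interp1 needs distinct x values
xs = [0; 2.5; 5];                % query points: left of, inside, right of the data
ys = interp1(x1, ys1, xs, 'linear', NaN);  % NaN outside [x1(1), x1(end)]
ys(xs < x1(1))   = ys1(1);       % extend the curve flat to the left
ys(xs > x1(end)) = ys1(end);     % extend the curve flat to the right
% ys is now [10; 25; 40]
```

This is why mylowess interpolates at all: smooth returns smoothed values only at the training x values, while crossval asks for predictions at the held-out x values in test(:,1), some of which can lie outside the training range.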
If anyone can provide some insight into the exact specifics of this code, that would be great. Thanks.
Kevin
