|
Hi Amrender
The 2009a release of MATLAB just shipped this morning. It has some brand
new functionality that sounds like a very good match for your use case.
The 2009a release includes some major enhancements to the Curve Fitting
Toolbox product. Curve Fitting Toolbox 2.0 now supports surface fitting.
The toolbox includes a surface fitting algorithm called LOWESS (also known
as Locally Weighted Scatterplot Smoothing, aka localized regression)
LOWESS is a nonparametric algorithm. This means that you don't have to
specify any functional form for your model. So long as you have a
reasonably dense cloud of data points, you can normally generate a pretty
good fit.
LOWESS does have some limitations. In particular, you need to specify the
bandwidth for the smoother. If you're bandwidth is too large, your R^2 will
be pretty bad. If your bandwidth is too small, you'll get a great R^2, but
you'll "overfit" the data. Your best option is to use a technique called
Cross Validation to estimate a good bandwidth for the smoother.
The following code will give you a pretty good idea how to use this
functionality.
The code generates a noisy surface using the peaks function inside MATLAB.
The demo then uses cross validation to estimate a smoothing parameter and
then uses a bootstrap to estimate confidence intervals.
This technique is very similar to methods that are described in "The
Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman.
(You'll need Statistics Toolbox and Curve Fitting Toolbox 2.0 to run this
code)
%% Generate a dataset
clear all
a = 3;
b = -10;
c = -1/3;
P = haltonset( 2 );
P = scramble( P, 'RR2' );
X = net( P, 300);
x = ((X(:,1)) * 6) - 3;
y = (X(:,2) * 6) - 3;
z = a*(1-x).^2.*exp(-(x.^2) - (y+1).^2) + b*(x/5 - x.^3 -
y.^5).*exp(-x.^2-y.^2) + c*exp(-(x+1).^2 - y.^2);
% Add noise
noise = (0.05 *(range(z))*randn( size( z ) ));
z = z + noise;
%% Use Cross Validation to estimate the optimal bandwidth for the smoother
tic
nSpans = 49;
spans = log10( logspace( 0.02, 0.5, nSpans ) );
mse = zeros( size( spans ) );
cp = cvpartition( size( X, 1 ), 'k', 10 );
for j = 1:nSpans
f = @( x, y, xi ) feval( fit( x, y, 'lowess', 'span', spans(j) ), xi );
mse(j) = crossval( 'mse', X, z, 'PredFun', f, 'Partition', cp );
end
toc
%% Identify the span that minimizes the Mean Squared Error
[minmse, minj] = min( mse );
span = spans(minj);
semilogx( spans, mse, '-', span, minmse, 'ro' )
title( sprintf( 'Best span is %f', span ) );
%% Plot a Surface with that Span
set( gca, 'XScale', 'Linear' );
sf = fit( X, z, 'lowess', 'span', span )
plot( sf, X, z );
view( 135, 15 )
%% Richard's homemade nonparametric residual bootstrap
tic
nboot = 350;
Zhat = sf(X(:,1),X(:,2));
Residuals = z - Zhat;
dump = zeros(length(X),nboot);
for i = 1:nboot
Eresiduals = randsample(Residuals,length(Residuals),true);
EZhat = Zhat + Eresiduals;
sfb = fit( X, EZhat, 'lowess', 'span', span );
dump(:,i) = sfb(X(:,1),X(:,2));
end
stdloess = std(dump,0,2);
meanloess = mean(dump,2);
plot(Zhat(:),meanloess(:),'o')
toc
%% Generate 95% confidence intervals for the Surface
% 95% of the surfaces generated will fall between +- 2 standard deviations
% of the original surface
sf_lb = fit(X, Zhat - 2*stdloess, 'linearinterp');
sf_ub = fit(X, Zhat + 2*stdloess, 'linearinterp');
plot( sf, X, z );
plot( sf_lb, 'Parent', gca );
plot( sf_ub, 'Parent', gca );
view( 135, 15 )
|