Code covered by the BSD License  

Highlights from
polyfitn

5.0

5.0 | 3 ratings Rate this file 163 Downloads (last 30 days) File Size: 52.58 KB File ID: #34765

polyfitn

by John D'Errico

 

25 Jan 2012

Polynomial modeling in 1 or n dimensions

| Watch this File

File Information
Description

Polyfitn is an extension of polyfit, allowing the user to create models with more than one independent variable. It also allows the user to specify a general model, for example, a quadratic model, with constant and quadratic terms, but no linear term.

For example, to fit a polynomial model to points selected from a cosine curve, we will only need the even ordered terms.

x = -2:.1:2;
y = cos(x);
p = polyfitn(x,y,'constant x^2 x^4 x^6');
p.Coefficients
ans =
    [0.99996 -0.49968 0.041242 -0.0012079]

The coefficients won't be exact of course, as I used only a finite number of terms for what is essentially a truncated Taylor series, and I had only a finite amount of points to build the model from. The exact first 4 coefficients for a cosine series would have been:

>> [1 -1/2 1/24 -1/720]
ans =
            1 -0.5 0.041667 -0.0013889

so we got the expected result.

Of course, polyfitn works in higher dimensions, as it was this problem it was really designed to solve.

x = rand(100,1);
y = rand(100,1);
z = exp(x+y) + randn(100,1)/100;
p = polyfitn([x,y],z,3);

The result can be converted into a symbolic form to view the model more simply. Here I'll use my sympoly toolbox, but there is also a polyn2sym function provided.

polyn2sympoly(p)
ans =
    0.43896*X1^3 + 1.4919*X1^2*X2 + 0.041084*X1^2 + 1.4615*X1*X2^2 - 0.095977*X1*X2 + 1.2799*X1 + 0.56912*X2^3 - 0.15306*X2^2 + 1.361*X2 + 0.94819

And of course, parameter error estimates are generated for those who wish to determine the significance of the terms generated.

I've also provided tools for evaluating these models, and to differentiate a model.

A caveat - beware the use of high order polynomials to fit your data. Just because a low order model works, a high order model is not necessarily better. High order polynomials often suffer from severe ringing between the data points. Always plot your data. Think about the model you will be building. Then plot the resulting model. Use your eyes to validate the result, more so than simply looking at an r-squared coefficient (although I do return that parameter too.)

If you do find that a high order polynomial mode is necessary because your curve is simply too complicated, consider using a regression or smoothing spline model instead.

Acknowledgements
This submission has inspired the following:
gapolyfitn
MATLAB release MATLAB 7.5 (R2007b)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (6)
01 Feb 2012 Mr D

Thanks..your function solve my problem, but honestly I'm curious about the concept of linear regression in 2d surface. Do we have to do polyfit (1st order) on each row to get the best fit plane of the surface?

Thanks.

01 Feb 2012 John D'Errico

A linear regression (which is polyfitn) uses backslash at its core. For example, suppose one wished to use polyfit to estimate a quadratic model in one variable. The model would be

  y = a1*x^2 + a2*x + a3 (+noise)

polyfit is able to estimate the three unknown coefficients [a1,a2,a3] by solving a linear (but overdetermined) system of equations in a way that minimizes the sum of squares of the residuals. Again, backslash is the engine underneath here.

Now, suppose one wished to estimate a more complex model, with two independent variables, z(x,y). I'll write it as a full two variable quadratic model, so there will be six unknown coefficients.

z = a1*x^2 + a2*x*y + a3*y^2 + a4*x + a5*y + a6 + noise

As long as you have sufficient data to estimate the six unknown coefficients, the problem is essentially the same. See that they enter the problem linearly in the same way as for the one variable problem we looked at before. Backslash again can solve the overdetermined problem to estimate the coefficients.

In fact, both polyfit and polyfitn use a QR factorization of the design matrix to achieve their goals. This is because they are both designed to also generate covariance matrix estimates on the parameters. A QR factorization is the easiest way to serve both purposes, but its all just linear algebra.

08 Feb 2012 Slaven

Great function! Thank you. Would there be an easy way to convert it to do weighted least squares. I was looking to try to do it myself, perhaps you have some insights or warnings?

08 Feb 2012 John D'Errico

This should do it...

Look for these two lines below:

[Q,R,E] = qr(M,0);

polymodel.Coefficients(E) = R\(Q'*depvar);

Assume a column vector of weights as W, one for each data point. Change the above lines to read:

[Q,R,E] = qr(bsxfun(@times,W(:),M),0);

polymodel.Coefficients(E) = R\(Q'*(W(:).*depvar));

This should give you weighted estimates of the parameters. I think the statistics on the parameters will work too, as well as R^2, etc. Of course, you will need to pass in the weight vector too as an argument.

The above change assumes you are using a version that has bsxfun available. Its been around for a while so that should not be a problem. Even if not, a call to repmat will replace it.

22 Feb 2012 John

Great set of functions!

Can you suggest any ways to vectorize their use? I have a scalar function that is dependent on its location in space f(x,y,z), and I calculate this function surrounding N bodies. Currently, I run a loop for each body, but is there a way to modify the code so that the backslash operator works across all of them at once, returning N sets of coefficients?

22 Feb 2012 John D'Errico

John - There are two possible questions here. First, if you have a problem with what is known as multiple right hand sides, then yes it is very vectorizable. By this I mean a problem with the same sets of independent variables, but different sets of dependent variables. Then backslash is the way to go, allowing you to factorize the equations once for a huge time savings. You would need to build the design matrix of course, but this is not a difficult thing to do.

If however, you have multiple problems, each with different sets of independent variables, then vectorization would require you to build a sparse block diagonal design matrix. As such, you can gain a bit in this effort, but not as significant of a gain as the first case. You need to build the design matrix as a sparse matrix. The trick here is to supply the blocks to blkdiag as individually sparse matrices. Thus, create the individual (sparse) design matrices in a cell array, then pass it to blkdiag as a comma separated list. Can you do this efficiently enough to make the backslash solve more efficient than a loop around the calls? Yes, but as I said, it will take some effort.

You can probably glean from my response that it will not be a trivial action to vectorize the result from this code, but that you could do things more efficiently with some time invested.

Please login to add a comment or rating.
Tag Activity for this File
Tag Applied By Date/Time
polyfit John D'Errico 25 Mar 2012 06:59:33
modeling John D'Errico 25 Mar 2012 06:59:33
regression John D'Errico 25 Mar 2012 06:59:33
linear regression John D'Errico 25 Mar 2012 06:59:33
approximation John D'Errico 25 Mar 2012 06:59:33
function John D'Errico 25 Mar 2012 06:59:33
curve John D'Errico 25 Mar 2012 06:59:33
multidimensional John D'Errico 25 Mar 2012 06:59:33
curve fitting John D'Errico 25 Mar 2012 06:59:33
polynomial John D'Errico 25 Mar 2012 06:59:33
fit John D'Errico 25 Mar 2012 06:59:33
interpolation John D'Errico 25 Mar 2012 06:59:33
fitting John D'Errico 25 Mar 2012 06:59:33

Contact us at files@mathworks.com