polyfit vs mldivide upon inversion of arrays
OK, this is probably a stupid question, but I'm seeing some unexpected output when trying to do a linear fit to two sets of data. When I invert the x/y axes, the two fits are very different:
>> linearfitcoeff = polyfit(x,y,1)
linearfitcoeff =
    0.9600    6.1664
>> linearfitcoeff = polyfit(y,x,1)
linearfitcoeff =
    0.7659  177.7362
Data sets are large (over 75000 data points each). This is not what I would expect from a least-squares fit. Can someone explain this behaviour to me?
Update: mldivide works as expected, but I'm still curious about why polyfit doesn't.
Thanks,
Matthew
Accepted Answer
More Answers (1)
Walter Roberson
on 25 Jul 2017
Edited: Walter Roberson
on 25 Jul 2017
There is no reason to expect that the coefficients should be nearly the same.
Take the simple case of a straight line, y = m*x + b. Solve for x: x = y/m - b/m. So even in the exact straight-line case the coefficients would not be the same: the swapped fit would be x = (1/m)*y + (-b/m), i.e. slope 1/m and intercept -b/m rather than m and b.
Note: polyfit works internally by forming the Vandermonde matrix V = [x.^n ... x.^2 x ones(size(x))] and using mldivide (except that it takes one extra step to increase precision).
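To make that concrete, here is a minimal sketch of the degree-1 case. The data, variable names, and noise level are made up for illustration (this is not the original 75000-point data set); it just shows that the Vandermonde/mldivide route matches polyfit, and that the swapped fit's slope is not the reciprocal of the original slope once the data are noisy.
% Minimal sketch with synthetic data (illustration only, not the poster's data)
rng(0);
x = linspace(0, 100, 1000).';
y = 0.96*x + 6 + 10*randn(size(x));   % noisy straight line

V = [x ones(size(x))];                % degree-1 Vandermonde matrix
p_backslash = (V \ y).'               % the least-squares problem polyfit solves
p_polyfit   = polyfit(x, y, 1)        % essentially the same coefficients

p_swapped = polyfit(y, x, 1);
[p_polyfit(1), 1/p_swapped(1)]        % slopes differ: not reciprocals with noise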
5 Comments
Matthew Lawrence
on 25 Jul 2017
Edited: Matthew Lawrence
on 25 Jul 2017
Walter Roberson
on 25 Jul 2017
syms A B C m b
X = [-1 0 1];
Y = A.*X.^2 + B*X + C;                          % exact quadratic data at three points
% Fit y = m*x + b by least squares (vertical residuals)
residue = simplify(sum((m*X + b - Y).^2));
bestm = simplify(solve(diff(residue,m),m));
residue_b = simplify(subs(residue,m,bestm));
bestb = simplify(solve(diff(residue_b,b),b));
% Now fit the other way, x = m*y + b (swapped axes)
residue2 = simplify(sum((m*Y + b - X).^2));
bestm2 = simplify(solve(diff(residue2,m),m));
residue_b2 = simplify(subs(residue2,m,bestm2));
bestb2 = simplify(solve(diff(residue_b2,b),b));
Now under your hypothesis, bestm and bestm2 should be inverses of each other. Let's look:
>> bestm, bestm2
bestm =
B
bestm2 =
-(2*A*b - 2*B + 3*C*b)/(2*A^2 + 4*A*C + 2*B^2 + 3*C^2)
Doesn't look like it to me. How about bestb and bestb2, are they negative inverses?
>> bestb,bestb2
bestb =
(2*A)/3 + C
bestb2 =
-(B*(2*A + 3*C))/(A^2 + 3*B^2)
Nope.
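A quick numeric spot check of the same point (my own addition, plugging in the example coefficients A=1, B=2, C=3):
X = [-1 0 1];
Y = X.^2 + 2*X + 3;      % the quadratic with A=1, B=2, C=3, so Y = [2 3 6]
polyfit(X, Y, 1)         % slope 2, intercept 11/3 -- matches bestm = B, bestb = 2*A/3 + C
polyfit(Y, X, 1)         % slope 6/13, about 0.46 -- not 1/2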
Walter Roberson
on 25 Jul 2017
Let us look at the fitting of a straight line with a small tweak to the X values:
syms m b mhat bhat
dx = sym('dx',[1 3]);                 % symbolic perturbations dx1, dx2, dx3
X = [-1 0 1];
Y = m*(X+dx)+b;                       % a straight line sampled at slightly tweaked X values
% Fit y = mhat*x + bhat against the nominal X (vertical residuals)
residue = sum((mhat * X + bhat - Y).^2);
bestmhat = simplify(solve(diff(residue,mhat),mhat));
residue_bhat = simplify(subs(residue,mhat,bestmhat));
bestbhat = simplify(solve(diff(residue_bhat,bhat),bhat));
bestmhat = simplify(subs(bestmhat, bhat, bestbhat));
% Fit the swapped problem, x = mhat*y + bhat
residue2 = sum((mhat * Y + bhat - X).^2);
bestmhat2 = simplify(solve(diff(residue2,mhat),mhat));
residue_bhat2 = simplify(subs(residue2,mhat,bestmhat2));
bestbhat2 = simplify(solve(diff(residue_bhat2,bhat),bhat));
bestmhat2 = simplify(subs(bestmhat2, bhat, bestbhat2));
Under your hypothesis, bestbhat and bestbhat2 would have to be pretty close to negative inverses of each other, but:
>> bestbhat, bestbhat2
bestbhat =
b + (dx1*m)/3 + (dx2*m)/3 + (dx3*m)/3
bestbhat2 =
-((dx3 - dx1 + 2)*(3*b + dx1*m + dx2*m + dx3*m))/(2*m*(dx1^2 - dx1*dx2 - dx1*dx3 - 3*dx1 + dx2^2 - dx2*dx3 + dx3^2 + 3*dx3 + 3))
... they aren't. Let's take out some of the complication by supposing that the first two points were exact and there was just a tweak to the third:
>> simplify(subs(bestbhat,dx(1:2),[0 0]))
ans =
b + (dx3*m)/3
>> simplify(subs(bestbhat2,dx(1:2),[0 0]))
ans =
-((3*b + dx3*m)*(dx3 + 2))/(2*m*(dx3^2 + 3*dx3 + 3))
Nope. There are some structural similarities, certainly, but the change is non-linear (and not just inverse linear) with changes in X.
You could run the same thing with tweaks to Y instead of tweaks to X, but your hypothesis requires that tweaks to Y would have the same effect as tweaks to X.
Matthew Lawrence
on 25 Jul 2017
John D'Errico
on 26 Jul 2017
Edited: John D'Errico
on 26 Jul 2017
Draw a picture. As I said, polyfit does NOT minimize the distances to the line.
polyfit minimizes the sum of squares of VERTICAL distances to the line. This is very different.
When you swap x and y, polyfit still essentially minimizes vertical distances, but now they are what would before have been HORIZONTAL distances.
The point is that polyfit assumes the noise all lies in the second variable passed in, thus the dependent variable. y is the dependent variable in a regression. Polyfit assumes the independent variable (x) has no noise at all in it. Therefore the distance to the line is computed as a vertical distance.
Yes, I know that some may feel this is not fully intuitive. But that is how linear regression works. In these methods, it is assumed that you can control the value of x that is used. Thus an experiment. Then stuff happens, and you get out a value of y for each x. y is assumed to have some noise in that value. As I said, stuff happens.
You are thinking about the orthogonal regression case. In orthogonal regression, it is assumed that you measure the value of TWO variables. Call one x, the other y. There, both variables may be assumed to have some noise in them, some uncertainty in their value. In that case, it makes some amount of sense to minimize the orthogonal distances to the line, assuming both variables had noise in them. That is VERY different from the polyfit style of regression.
The two cases are really very different. So why does polyfit solve the vertical error problem? It does so because the orthogonal problem becomes MUCH more difficult to solve, even for a quadratic polynomial. Standard linear regression is trivial to compute for general polynomials if we assume the noise occurs only in y. It is difficult as hell for the orthogonal case on a quadratic or above.
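For contrast, here is a rough sketch of what an orthogonal (total least squares) fit of a straight line looks like, done with an SVD of the centered data. This is only an illustration of the distinction John describes, on synthetic data; it is not what polyfit does. Note that it treats x and y symmetrically, so swapping them gives the same line back.
% Sketch of an orthogonal (total least squares) straight-line fit (illustration only)
rng(1);
x = linspace(0, 100, 1000).';
y = 0.96*x + 6 + 10*randn(size(x));   % synthetic noisy line

xy = [x - mean(x), y - mean(y)];      % center both variables
[~, ~, V] = svd(xy, 'econ');          % first right singular vector = line direction
pc1 = V(:,1);
slope_tls = pc1(2)/pc1(1);            % orthogonal-fit slope
intercept_tls = mean(y) - slope_tls*mean(x);

% Swapping x and y only swaps the components of the direction vector,
% so the swapped slope is exactly 1/slope_tls -- the fitted line is the same.
[~, ~, Vs] = svd([y - mean(y), x - mean(x)], 'econ');
slope_swapped = Vs(2,1)/Vs(1,1);      % equals 1/slope_tls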