polyfit vs mldivide upon inversion of arrays

OK, this is probably a stupid question, but I'm seeing some unexpected output when trying to do a linear fit to two sets of data. When I invert the x/y axes, the two fits are very different:
linearfitcoeff=polyfit(x,y,1)
linearfitcoeff =
0.9600 6.1664
>> linearfitcoeff=polyfit(y,x,1)
linearfitcoeff =
0.7659 177.7362
Data sets are large (over 75000 data points each). This is not what I would expect from a least-squares fit. Can someone explain this behaviour to me?
Update: mldivide works as expected, but I'm still curious about why polyfit doesn't
Thanks,
Matthew

 Accepted Answer

Sigh. This is, in fact, EXTREMELY common, and exactly as expected. If there is any noise at all, swapping x and y will NOT produce a fit whose slope is the inverse of the original slope. In fact, you should expect the slope estimates to be biased when you swap the variables.
Let's make up some data.
x = rand(20,1);
y = x + randn(size(x))/5;
polyfit(x,y,1)
ans =
1.1032 -0.10593
polyfit(y,x,1)
ans =
0.67926 0.1779
So the first fit had a slope of 1.1. The second fit, if the slope was the inverse, would be roughly 0.9. But it was around 0.68.
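This asymmetry is easy to reproduce outside MATLAB. Below is a minimal pure-Python sketch (all names are mine) using the closed-form least-squares slope cov(x,y)/var(x). The product of the two slopes equals the squared correlation r², which is at most 1, so the swapped slope always lands at or below the reciprocal of the original:

```python
import random

random.seed(0)
n = 2000
x = [random.random() for _ in range(n)]
y = [xi + random.gauss(0, 0.2) for xi in x]  # noise in y only, as above

def ols_slope(u, v):
    """Ordinary least-squares slope of v regressed on u: cov(u,v)/var(u)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    return suv / suu

m_xy = ols_slope(x, y)  # fit y on x: close to the true slope of 1
m_yx = ols_slope(y, x)  # fit x on y: biased BELOW 1/m_xy
print(m_xy, m_yx, m_xy * m_yx)  # the product is r**2, strictly below 1
```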
Hmm. WHY? Why does this make complete sense? Why is it exactly the expected result? Here is the linear fit, essentially polyfit(x,y,1).
plot(x,y,'o')
The reason is leverage.
Look at the points I generated, completely at random. The errors were purely in y.
Leverage tells us that points near the center of the linear fit have very little impact on the estimate of the slope. It is the points near the extremes that sway the fit one way or another.
Now, suppose I have two points near the very end of the curve, at the bottom. To one of those points I add noise of +delta (for some fixed delta); from the other I subtract delta. I'll add the two points in red so you can see the idea.
Since I added the same absolute noise, +delta and -delta, to those y values, they will have relatively little influence on the estimated slope when fit as polyfit(x,y,1).
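Leverage here has a standard formula: for a straight-line fit it is the hat-matrix diagonal, h_i = 1/n + (x_i - mean(x))^2 / sum((x_j - mean(x))^2). A pure-Python sketch with made-up x values, showing that the extremes carry the most leverage and the center almost none:

```python
x = [0.0, 0.1, 0.2, 0.5, 0.8, 0.9, 1.0]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Hat-matrix diagonal for a straight-line (two-parameter) fit
h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
print(h)  # largest at the endpoints, smallest near the mean of x
```

The leverages always sum to the number of fitted parameters (2 for a line), so the influence of the endpoints necessarily comes at the expense of the middle.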
Now, imagine I swap the axes and perform a straight-line fit (here xe and ye hold the two extra points with the ±delta noise):
plot(y,x,'bo')
hold on
plot(ye,xe,'rs')
Look at the red point where I added delta. That point is now near the very middle of the data.
polyfit([y;ye'],[x;xe'],1)
ans =
0.63971 0.1755
Note that I added two points to my data, one with its y noise high and the other low, and the fitted slope came out biased low compared with the previous fit.
What did I say before? Points near the middle have almost no impact on the slope. The point whose noise pushed it low, however, is now a very high leverage point; it has a great deal of impact on the slope.
So what does this tell us? When you do a linear fit with data that has noise in y, then swap the axes and fit again, the noise is now in the wrong variable, and the fit tends to produce a biased estimate of the slope. The new slope will tend to be biased LOW compared to the inverse of the original slope, because the high-leverage points drag the fitted line toward a lower slope than if you had fit it the normal way.
So you saw EXACTLY what you should have expected to see. No surprise at all.
Of course, if you had no noise at all, then the line will be perfect, with the points falling exactly on the line. So no difference would now be expected. As the noise variance increases, the expected bias in the slope will grow, in a very predictable way.
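That "very predictable way" can be made concrete: for a straight line, the product of the two regression slopes (y on x, times x on y) equals r², roughly var(x)/(var(x) + sigma²) when independent noise of standard deviation sigma is added to y. A seeded pure-Python sketch (names mine):

```python
import random

def ols_slope(u, v):
    """Least-squares slope of v regressed on u: cov(u,v)/var(u)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return (sum((a - mu) * (b - mv) for a, b in zip(u, v))
            / sum((a - mu) ** 2 for a in u))

random.seed(1)
n = 5000
x = [random.random() for _ in range(n)]

products = []
for sigma in (0.05, 0.15, 0.45):
    y = [xi + random.gauss(0, sigma) for xi in x]
    # the product of the two slopes is r**2; it shrinks as the noise grows
    products.append(ols_slope(x, y) * ols_slope(y, x))
print(products)  # decreasing toward 0 as sigma grows
```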

5 Comments

I guess I'm not getting it - I would have thought that polyfit produces a line with the minimum sum of the distances from all the points in the xy plot to the line. I would have thought that line/minimization would be independent of x/y flipping (the line would be the same line, just with slope 1/m and intercept -b/m when represented in the flipped-axes graph). Is that not what polyfit does?
NO NO NO NO NO NO!!!!!! That is not at all what polyfit does!
You are thinking about what is called orthogonal regression. Polyfit assumes the noise is entirely in y (the dependent variable.) This is true of backslash too. That is the assumption for ALL standard linear regression tools. Thus polyfit, lsqr, lscov, lsqlin, regress, backslash, etc. I'm sure I missed a few in that list.
So, when you start with a problem with noise in y, then swap the variables, it becomes a problem with noise in what polyfit now sees as the INDEPENDENT variable. But polyfit still assumes the noise is in the dependent variable (the second input to polyfit). Polyfit has no idea that you have swapped x and y.
When that happens you get a result with a biased slope estimate. The constant term may or may not be screwed up, but you should expect it to be wrong too.
In an orthogonal regression, the fit produced tries to project each point by the closest orthogonal distance to the line, assuming errors in BOTH x and y. That is not what polyfit does. In fact, if you have errors ONLY in y, then orthogonal regression is the wrong thing to do.
So now go back and read my (now) complete answer carefully. Think about what I've said.
Note: I did see that you commented that mldivide does seem to work for you. BE CAREFUL THERE!
I think you use mldivide as
S = x\y;
That fits a model with NO intercept term, so it forces the line to go through the origin. In order to do a comparable fit with polyfit, you need to bring a constant term into that model. So simple use of mldivide in that way is NOT the same thing as polyfit at all.
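The difference is easy to see in a sketch (pure Python, made-up data; x\y on column vectors corresponds to the single-coefficient slope sum(x.*y)/sum(x.^2), while polyfit(x,y,1) also estimates an intercept):

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [xi + 10.0 for xi in x]  # true line: slope 1, intercept 10

# Equivalent of S = x\y: one coefficient, line forced through the origin
S = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Equivalent of polyfit(x, y, 1): slope AND intercept
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
m = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
     / sum((a - xbar) ** 2 for a in x))
b = ybar - m * xbar
print(S, m, b)  # S is pulled far from 1; m recovers 1 and b recovers 10
```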
Got it - it's clear that polyfit is not what I'm looking for. I am just looking for the line with the minimal distance from all points. Hence my confusion. Thanks!
The orthogonal regression case CAN be solved. In fact, there are surely tools on the FEX that solve it. I took a look through my unpublished tools, and I wrote one long ago. Nothing fancy, just an orthogonal fit for a straight line. I never saw a reason to post it on the FEX since the method is pretty commonly used.
In the code below, the results will be essentially always identical if you flip x and y, in the sense that the slope will be the reciprocal. So if you start with
y = m*x+b
then flip x and y, we will have:
x = (1/m)*y + (-b/m)
Again, those results will be the same, except for floating point trash in the least significant bits. (Never test for exact equality of floating point numbers.)
Here is my code:
function [P1,meandist] = orthogonalRegress(x,y)
% Simple orthogonal regression between x and y
% usage: P1 = orthogonalRegress(x,y)
%
% x,y - vector input data
%
% P1 - linear (first order) polynomial fit to the data,
% in the form of [slope,intercept].
%
% meandist - measure of the residual error, as computed
% in the form of the mean of the orthogonal distances
% to the computed line.
% mean subtract
mux = mean(x);
muy = mean(y);
xhat = x - mux;
yhat = y - muy;
% use svd to do the regression
A = [xhat(:),yhat(:)];
[U,S,V] = svd(A,0);
% slope of the line is just the negative of the ratio of the components
% in the second singular vector, as found in V.
slope = -V(1,2)/V(2,2);
% recover the y-intercept
yintercept = muy - slope*mux;
% return the line in standard coefficient form
P1 = [slope,yintercept];
% the singular value from the svd itself will give us an error measure,
% but a better measure might be just the mean of the orthogonal
% distances to the line
OD = abs(y - P1(1)*x - P1(2))/sqrt(1 + P1(1)^2);
meandist = mean(OD);
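The reciprocal-slope property of the orthogonal fit can be cross-checked with the closed form for a total-least-squares line, which is equivalent to the SVD route above for the two-variable case (pure-Python sketch, names mine):

```python
import math
import random

def tls_slope(x, y):
    """Closed-form orthogonal (total least squares) slope of a line fit.
    Minimizes perpendicular distances; assumes cov(x, y) != 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # slope of the principal axis of the centered scatter
    return (syy - sxx + math.hypot(syy - sxx, 2 * sxy)) / (2 * sxy)

random.seed(2)
x = [random.random() for _ in range(500)]
y = [0.7 * xi + 0.1 + random.gauss(0, 0.1) for xi in x]

m1 = tls_slope(x, y)
m2 = tls_slope(y, x)
print(m1, m2, m1 * m2)  # m2 is 1/m1 up to floating point: the product is 1
```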


More Answers (1)

Walter Roberson on 25 Jul 2017
Edited: Walter Roberson on 25 Jul 2017
There is no reason to expect that the coefficients should be nearly the same.
Take the simple case of a straight line, y = m*x + b. Solve for x: x = y/m - b/m. So even in this simple case the coefficients would not be expected to be the same: the flipped fit would be x = (1/m)*y + (-b/m).
Note: polyfit works internally by forming the Vandermonde matrix V = [x.^n ... x.^2 x ones(size(x))] and using mldivide (except that it takes one extra step to increase precision.)
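A sketch of that description for degree 1 (pure Python, made-up data; here the 2x2 normal equations are solved by Cramer's rule for brevity, whereas mldivide on a rectangular Vandermonde matrix uses a QR factorization, which is better conditioned but gives the same fitted line on data like this):

```python
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.1, 4.9, 7.0]

# Vandermonde matrix for degree 1: columns [x, ones], unknowns [slope, intercept]
V = [[xi, 1.0] for xi in x]

# Normal equations (V'V) p = V'y, solved by Cramer's rule for the 2x2 case
a = sum(r[0] * r[0] for r in V)
c = sum(r[0] * r[1] for r in V)
d = sum(r[1] * r[1] for r in V)
t = sum(r[0] * yi for r, yi in zip(V, y))
u = sum(r[1] * yi for r, yi in zip(V, y))
det = a * d - c * c
slope = (t * d - c * u) / det
intercept = (a * u - c * t) / det
print(slope, intercept)
```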

5 Comments

Sorry, I wasn't clear - I was expecting the results you mentioned, but you can see that the results aren't those results: 1/0.96 is not 0.77 and the intercept relation doesn't hold either. Far from it.
mldivide on its own does give the inverse slope I was expecting.
syms A B C m b
X = [-1 0 1];
Y = A.*X.^2 + B*X + C;
residue = simplify(sum((m*X + b - Y).^2));
bestm = simplify(solve(diff(residue,m),m));
residue_b = simplify(subs(residue,m,bestm));
bestb = simplify(solve(diff(residue_b,b),b));
residue2 = simplify(sum((m*Y + b - X).^2));
bestm2 = simplify(solve(diff(residue2,m),m));
residue_b2 = simplify(subs(residue2,m,bestm2));
bestb2 = simplify(solve(diff(residue_b2,b),b));
Now under your hypothesis, bestm and bestm2 should be inverses of each other. Let's look:
>> bestm, bestm2
bestm =
B
bestm2 =
-(2*A*b - 2*B + 3*C*b)/(2*A^2 + 4*A*C + 2*B^2 + 3*C^2)
Doesn't look like it to me. How about bestb and bestb2, are they negative inverses?
>> bestb,bestb2
bestb =
(2*A)/3 + C
bestb2 =
-(B*(2*A + 3*C))/(A^2 + 3*B^2)
Nope.
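The symbolic result above is easy to confirm numerically with concrete (made-up) values of A, B, C (pure Python):

```python
# Numeric instance of the symbolic setup: X = [-1, 0, 1],
# Y = A*X^2 + B*X + C, with hypothetical coefficients
A, B, C = 1.0, 2.0, 0.5
X = [-1.0, 0.0, 1.0]
Y = [A * xi ** 2 + B * xi + C for xi in X]

def ols_slope(u, v):
    """Least-squares slope of v regressed on u: cov(u,v)/var(u)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return (sum((a - mu) * (b - mv) for a, b in zip(u, v))
            / sum((a - mu) ** 2 for a in u))

m1 = ols_slope(X, Y)  # bestm: equals B exactly, since X is symmetric about 0
m2 = ols_slope(Y, X)  # bestm2: NOT 1/B
print(m1, m2)
```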
Let us look at the fitting of a straight line with a small tweak to the X values:
syms m b mhat bhat
dx = sym('dx',[1 3]);
X = [-1 0 1];
Y = m*(X+dx)+b;
residue = sum((mhat * X + bhat - Y).^2);
bestmhat = simplify(solve(diff(residue,mhat),mhat));
residue_bhat = simplify(subs(residue,mhat,bestmhat));
bestbhat = simplify(solve(diff(residue_bhat,bhat),bhat));
bestmhat = simplify(subs(bestmhat, bhat, bestbhat));
residue2 = sum((mhat * Y + bhat - X).^2);
bestmhat2 = simplify(solve(diff(residue2,mhat),mhat));
residue_bhat2 = simplify(subs(residue2,mhat,bestmhat2));
bestbhat2 = simplify(solve(diff(residue_bhat2,bhat),bhat));
bestmhat2 = simplify(subs(bestmhat2, bhat, bestbhat2));
Under your hypothesis, bestbhat and bestbhat2 would have to be pretty close to equal to negative inverses of each other, but:
>> bestbhat, bestbhat2
bestbhat =
b + (dx1*m)/3 + (dx2*m)/3 + (dx3*m)/3
bestbhat2 =
-((dx3 - dx1 + 2)*(3*b + dx1*m + dx2*m + dx3*m))/(2*m*(dx1^2 - dx1*dx2 - dx1*dx3 - 3*dx1 + dx2^2 - dx2*dx3 + dx3^2 + 3*dx3 + 3))
... they aren't. Let's take out some of the complication by supposing that the first two points were exact and there was just a tweak to the third:
>> simplify(subs(bestbhat,dx(1:2),[0 0]))
ans =
b + (dx3*m)/3
>> simplify(subs(bestbhat2,dx(1:2),[0 0]))
ans =
-((3*b + dx3*m)*(dx3 + 2))/(2*m*(dx3^2 + 3*dx3 + 3))
Nope. There are some structural similarities, certainly, but the change is non-linear (and not just inverse linear) with changes in X.
You could run the same thing with tweaks to Y instead of tweaks to X, but your hypothesis requires that tweaks to Y would have the same effect as tweaks to X.
But if I have a line which has the minimum sum of distances to all my data points, if I flip all data points AND the line about x=y, won't the distances to the line for all data points remain the same, and therefore won't the sum of distance remain a minimum?
Draw a picture. As I said, polyfit does NOT minimize the distances to the line.
polyfit minimizes the sum of squares of VERTICAL distances to the line. This is very different.
When you swap x and y, polyfit still essentially minimizes vertical distances, but now they are what would before have been HORIZONTAL distances.
The point is that polyfit assumes the noise all lies in the second variable passed in, thus the dependent variable. y is the dependent variable in a regression. Polyfit assumes the independent variable (x) has no noise at all in it. Therefore the distance to the line is computed as a vertical distance.
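For a line of slope m the two distances differ only by a constant factor sqrt(1 + m^2), but minimizing their sums of squares over all candidate lines gives different fits. A sketch of the two distances from one point (made-up numbers):

```python
import math

# Distance from the point (px, py) to the line y = m*x + b,
# measured two different ways
m, b = 2.0, 1.0
px, py = 3.0, 4.0

vertical = abs(py - (m * px + b))             # what polyfit minimizes (squared)
orthogonal = vertical / math.sqrt(1 + m * m)  # what orthogonal regression minimizes
print(vertical, orthogonal)  # 3.0 and 3/sqrt(5)
```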
Yes, I know that some may feel this is not fully intuitive. But that is how linear regression works. In these methods, it is assumed that you can control the value of x that is used. Thus an experiment. Then stuff happens, and you get out a value of y for each x. y is assumed to have some noise in that value. As I said, stuff happens.
You are thinking about the orthogonal regression case. In orthogonal regression, it is assumed that you measure the value of TWO variables. Call one x, the other y. There, both variables may be assumed to have some noise in them, some uncertainty in their value. In that case, it makes some amount of sense to minimize the orthogonal distances to the line, assuming both variables had noise in them. That is VERY different from the polyfit style of regression.
The two cases are really very different. So why does polyfit solve the vertical error problem? It does so because the problem becomes MUCH more difficult to solve for even a quadratic polynomial. Standard linear regression is trivial to compute for general polynomials if we assume the noise occurs only in y. It is difficult as hell for the orthogonal case on a quadratic or above.

