From: "John D'Errico" <woodchips@rochester.rr.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: linear regression - inconsistent results
Date: Thu, 6 Nov 2008 23:12:02 +0000 (UTC)
Organization: John D'Errico (1-3LEW5R)
Message-ID: <gevtk2$t2n$1@fred.mathworks.com>
References: <gevnh1$bnn$1@fred.mathworks.com>
Reply-To: "John D'Errico" <woodchips@rochester.rr.com>


"Russ Scott" <robinandruss@gmail.com> wrote in message <gevnh1$bnn$1@fred.mathworks.com>...
> I've been noticing when using regress or polyfit that I'm getting inconsistent results when I switch the one independent variable with the dependent variable. Mathematically this does not make sense to me.
> 
> If y = mx + b
> 
> then 
> 
> x = y/m - b/m
> 
> But I've found for certain datasets that when I flip the y and x around using either regress (and adding a column of ones to X) or polyfit(x,y,1) I get non-consistent results.
> 

(snip)

> THESE ARE INCONSISTENT RESULTS AREN'T THEY?

NO! Sorry, but they are not. I'll just expand on
the other comments by a bit. What model do
you assume when you fit a linear regression
model? Do you know? If not, then you should
expend the effort to learn, because this is the
cause of your confusion.

You may think that you are fitting the model

  y = a*x + b

but you are not. That model is missing
a term. The model you are actually fitting
is of the form

  y = a*x + b + E_i

where E_i is the error in the i-th data
point, assumed to be normally (Gaussian)
distributed. Thus for each data point,
we assume additive, zero mean errors, but
with an unknown variance. Those errors are
added to the value of y, while x is
treated as known exactly.
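
As a rough sketch of what that assumption
means in practice: with errors only in y,
the fit minimizes the sum of squared
residuals measured in y. The data below are
made up purely for illustration (they are
not from your post), and the variable names
are my own.

```matlab
% Made-up data for illustration only (not the original poster's data)
x = (1:5)';
y = [2.1; 3.9; 6.2; 7.8; 10.1];

% Design matrix for y = a*x + b: a column of x and a column of ones
A = [x, ones(size(x))];

% Backslash solves the least squares problem min ||A*coef - y||^2,
% i.e. it minimizes the squared residuals measured in y
coef = A \ y;

% polyfit solves the same problem; p(1) is the slope, p(2) the intercept
p = polyfit(x, y, 1);

disp([coef, p(:)])   % the two columns should agree to rounding error
```

Nothing in that computation ever looks at
an error in x.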

What happens when you swap the x and y
in your regression? In effect, the model is
suddenly in a different form.

  x = c*y + d + F_i

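Here, the errors are assumed to be in the
x variable, and y is treated as exact.
Least squares now minimizes the residuals
measured in x instead of those measured in
y, so c will generally not equal 1/a, nor
will d equal -b/a. This difference is the
source of your problem. The two models are
truly different, so unless your data fall
exactly on a straight line, the regression
parameters will be different too.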
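
If you want to see this for yourself, here
is a small experiment. It is only a sketch
on synthetic data that I am assuming for
the purpose (a true slope of 2, intercept of
1, and noise added to y only). For a
straight line fit with an intercept, the
product of the two slopes equals the
squared correlation coefficient, so the
inverted slope 1/c can match a only when
the correlation is perfect.

```matlab
% Synthetic data (an assumption for illustration, not the poster's data)
rng(0);                            % fix the seed so the run is repeatable
x = linspace(0, 10, 200)';
y = 2*x + 1 + 3*randn(size(x));    % errors are added to y only

pyx = polyfit(x, y, 1);            % y = a*x + b, residuals measured in y
pxy = polyfit(y, x, 1);            % x = c*y + d, residuals measured in x

a = pyx(1);                        % slope of y on x
c = pxy(1);                        % slope of x on y

R = corrcoef(x, y);                % 2x2 correlation matrix
r = R(1, 2);                       % correlation between x and y

fprintf('slope of y on x:           a   = %.4f\n', a);
fprintf('inverted slope of x on y:  1/c = %.4f\n', 1/c);
fprintf('a*c = %.4f, r^2 = %.4f\n', a*c, r^2);
```

Shrink the noise term and the two slopes
move toward each other; make it larger and
they drift further apart.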

John