Path: news.mathworks.com!not-for-mail
From: "Steven Lord" <slord@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: linear regression - inconsistent results
Date: Thu, 6 Nov 2008 16:51:50 -0500
Organization: The MathWorks, Inc.
Lines: 99
Message-ID: <gevotm$t21$1@fred.mathworks.com>
References: <gevnh1$bnn$1@fred.mathworks.com>
Reply-To: "Steven Lord" <slord@mathworks.com>
NNTP-Posting-Host: lords.dhcp.mathworks.com
X-Trace: fred.mathworks.com 1226008310 29761 144.212.105.187 (6 Nov 2008 21:51:50 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Thu, 6 Nov 2008 21:51:50 +0000 (UTC)
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5512
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5579
Xref: news.mathworks.com comp.soft-sys.matlab:499390



"Russ Scott" <robinandruss@gmail.com> wrote in message 
news:gevnh1$bnn$1@fred.mathworks.com...
> I've been noticing when using regress or polyfit that I'm getting 
> inconsistent result when I switch the 1 independent variable with the 
> dependent variable. Mathematically this does not make sense to me.
>
> If y = mx + b
>
> then
>
> x = y/m - b/m
>
> But I've found for certain datasets that when I flip the y and x around 
> using either regress (and adding a column of ones to X) or polyfit(x,y,1) 
> I get non-consistent results.
>
> This is a short example to illustrate my problem.
>
>>> d=[    0.0074143     0.052035
>    0.0076173    0.0014361
>    0.0077408   -0.0041507
>     0.013317     0.054487
>     0.013289     0.061777
>     0.013346     0.055137
>      0.01397    -0.046578
>     0.014114    -0.026229
>     0.014658     0.042499
>     0.020282   -0.0010642];

Let's take a look at your data:

x = d(:, 1);
y = d(:, 2);
plot(x, y, 'go')
hold on

Do the green circles look like they're arranged on a line?  Does it make 
sense to fit a line to this data set, given how the points are distributed?

Pictures are sometimes truly worth a thousand words.

>>> polyfit(d(:,2),d(:,1),1)
>
> ans =
>
>    -0.011263     0.012788

mb1 = polyfit(y, x, 1);
plot(polyval(mb1, y), y, 'r+')

When we evaluate that fit at the y values you used to fit the line, the red 
+'s roughly follow the vertical line around x = 0.013.

>>> polyfit(d(:,1),d(:,2),1)
>
> ans =
>
>      -1.0641     0.032315

mb2 = polyfit(x, y, 1);
plot(x, polyval(mb2, x), 'kx')

The black x's look to be a rough fit to the horizontal line "splitting the 
difference" between the points around y = 0 and the points around y = 0.05.

> THESE ARE INCONSISTENT RESULTS AREN'T THEY?
> e.g.,
> 1/-0.011263 ~=  -1.0641

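If your data were more "linear", then you might expect the relationship you 
proposed above to hold, at least approximately.  polyfit(x, y, 1) minimizes 
the vertical distances from the points to the line, while polyfit(y, x, 1) 
minimizes the horizontal ones, so the two fits only coincide when the points 
lie exactly on a line.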


figure
x = 1:10; y = 7*(x+rand(size(x)));
plot(x, y, 'go');
hold on

mb1 = polyfit(y, x, 1); % x = mb1(1)*y + mb1(2)
plot(polyval(mb1, y), y, 'r+')

mb2 = polyfit(x, y, 1); % y = mb2(1)*x + mb2(2)
plot(x, polyval(mb2, x), 'kx')

compareLinearCoeff = [mb1(1), 1./mb2(1)]
compareConstantCoeff = [mb1(2), -mb2(2)./mb2(1)]


On these graphs, the black x's and the red +'s are much closer to one 
another and to the green circles.  While the relationship you proposed above 
doesn't exactly hold, in this case the data is much better fit by the lines 
and so the relationship is much closer to being satisfied.
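
One way to put a number on "how close" is the following sketch.  For two 
least-squares lines fit to the same data, the product of the two slopes 
equals the square of the correlation coefficient, so it is 1 only for 
perfectly collinear points.  The variable names below (R, r2, slopeProduct) 
are just ones I picked for illustration.

% Original data from the post
d = [0.0074143     0.052035
     0.0076173    0.0014361
     0.0077408   -0.0041507
      0.013317     0.054487
      0.013289     0.061777
      0.013346     0.055137
       0.01397    -0.046578
      0.014114    -0.026229
      0.014658     0.042499
      0.020282   -0.0010642];
x = d(:, 1);
y = d(:, 2);

mb1 = polyfit(y, x, 1); % x = mb1(1)*y + mb1(2)
mb2 = polyfit(x, y, 1); % y = mb2(1)*x + mb2(2)

R = corrcoef(x, y);     % 2-by-2 matrix; R(1, 2) is the correlation of x and y
r2 = R(1, 2)^2;
slopeProduct = mb1(1)*mb2(1);

% The two entries should agree up to roundoff
compareSlopeProductToR2 = [slopeProduct, r2]

From the coefficients quoted above, (-0.011263)*(-1.0641) is roughly 0.012, 
which is nowhere near 1.  If you rerun the same lines with the x and y from 
the 7*(x+rand(size(x))) example, you should see a value much closer to 1, 
and that is why the two fits nearly agree there.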

-- 
Steve Lord
slord@mathworks.com