Thread Subject: linear regression - inconsistent results

From: Russ Scott

Date: 6 Nov, 2008 21:28:01

Message: 1 of 6

I've noticed that regress and polyfit give inconsistent results when I swap the single independent variable with the dependent variable. Mathematically, this does not make sense to me.

If y = mx + b

then

x = y/m - b/m

But for certain datasets, when I swap x and y using either regress (with a column of ones added to X) or polyfit(x,y,1), I get inconsistent results.

This is a short example to illustrate my problem.

>> d=[ 0.0074143 0.052035
    0.0076173 0.0014361
    0.0077408 -0.0041507
     0.013317 0.054487
     0.013289 0.061777
     0.013346 0.055137
      0.01397 -0.046578
     0.014114 -0.026229
     0.014658 0.042499
     0.020282 -0.0010642];
>> polyfit(d(:,2),d(:,1),1)

ans =

    -0.011263 0.012788

>> polyfit(d(:,1),d(:,2),1)

ans =

      -1.0641 0.032315

THESE ARE INCONSISTENT RESULTS AREN'T THEY?
e.g.,
1/-0.011263 ~= -1.0641

From: Steven Lord

Date: 6 Nov, 2008 21:51:50

Message: 2 of 6


"Russ Scott" <robinandruss@gmail.com> wrote in message
news:gevnh1$bnn$1@fred.mathworks.com...
> I've been noticing when using regress or polyfit that I'm getting
> inconsistent result when I switch the 1 independent variable with the
> dependent variable. Mathematically this does not make sense to me.
>
> If y = mx + b
>
> then
>
> x = y/m - b/m
>
> But I've found for certain datasets that when I flip the y and x around
> using either regress (and adding a column of ones to X) or polyfit(x,y,1)
> I get non-consistent results.
>
> This is a short example to illustrate my problem.
>
>>> d=[ 0.0074143 0.052035
> 0.0076173 0.0014361
> 0.0077408 -0.0041507
> 0.013317 0.054487
> 0.013289 0.061777
> 0.013346 0.055137
> 0.01397 -0.046578
> 0.014114 -0.026229
> 0.014658 0.042499
> 0.020282 -0.0010642];

Let's take a look at your data:

x = d(:, 1);
y = d(:, 2);
plot(x, y, 'go')
hold on

Do the green circles look like they're arranged on a line? Does it make
sense to fit a line to this data set, given how the points are distributed?

Pictures are sometimes truly worth a thousand words.

>>> polyfit(d(:,2),d(:,1),1)
>
> ans =
>
> -0.011263 0.012788

mb1 = polyfit(y, x, 1);
plot(polyval(mb1, y), y, 'r+')

When we evaluate the red line for the y values you used to fit the line, it
seems to be a rough approximation to the vertical line around x = 0.014.

>>> polyfit(d(:,1),d(:,2),1)
>
> ans =
>
> -1.0641 0.032315

mb2 = polyfit(x, y, 1);
plot(x, polyval(mb2, x), 'kx')

The black x's roughly follow the horizontal line "splitting the
difference" between the points around y = 0 and the points around y = 0.05.

> THESE ARE INCONSISTENT RESULTS AREN'T THEY?
> e.g.,
> 1/-0.011263 ~= -1.0641

If your data were more "linear", then you might expect the relationship you
proposed above to hold.


figure
x = 1:10; y = 7*(x+rand(size(x)));
plot(x, y, 'go');
hold on

mb1 = polyfit(y, x, 1); % x = mb1(1)*y + mb1(2)
plot(polyval(mb1, y), y, 'r+')

mb2 = polyfit(x, y,1); % y = mb2(1)*x + mb2(2)
plot(x, polyval(mb2, x), 'kx')

compareLinearCoeff = [mb1(1), 1./mb2(1)]
compareConstantCoeff = [mb1(2), -mb2(2)./mb2(1)]


On these graphs, the black x's and the red +'s are much closer to one
another and to the green circles. While the relationship you proposed above
doesn't exactly hold, in this case the data is much better fit by the lines
and so the relationship is much closer to being satisfied.
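
One way to quantify how "linear" the data is: for two simple least-squares lines, the product of the two slopes equals the squared correlation coefficient r^2, so the swapped slopes are exact reciprocals only when r^2 = 1. The following is a minimal sketch on the data from the original post; the variable names (p_yx, p_xy, r2) are illustrative and do not appear elsewhere in the thread.

```matlab
% Data from the original post: column 1 is x, column 2 is y
d = [0.0074143  0.052035
     0.0076173  0.0014361
     0.0077408 -0.0041507
     0.013317   0.054487
     0.013289   0.061777
     0.013346   0.055137
     0.01397   -0.046578
     0.014114  -0.026229
     0.014658   0.042499
     0.020282  -0.0010642];
x = d(:, 1);
y = d(:, 2);

p_yx = polyfit(x, y, 1);   % y = p_yx(1)*x + p_yx(2)
p_xy = polyfit(y, x, 1);   % x = p_xy(1)*y + p_xy(2)

R  = corrcoef(x, y);       % 2-by-2 correlation matrix
r2 = R(1, 2)^2;            % squared correlation coefficient

% The two entries agree up to rounding; they equal 1 only for collinear data
slopeProductVsR2 = [p_yx(1)*p_xy(1), r2]
```

With the coefficients posted above, the product is (-1.0641)*(-0.011263), roughly 0.012, so either line explains only about 1% of the variance.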

--
Steve Lord
slord@mathworks.com

From: Ken Campbell

Date: 6 Nov, 2008 22:11:02

Message: 3 of 6

In addition to the points made by Steve, note that regression minimizes

sum((y_data-y_prediction).^2)

which doesn't have to be the same as

sum((x_data-x_prediction).^2)

so swapping x and y and repeating the fit won't normally give a line that is the algebraic inverse of the first one.
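
The two objectives can be compared directly. The sketch below, using the data from the original post, rewrites each fitted line in the other line's form and evaluates both sums of squares; the names p_yx, p_xy, sseVertical and sseHorizontal are illustrative only.

```matlab
% Data from the original post: column 1 is x, column 2 is y
d = [0.0074143  0.052035
     0.0076173  0.0014361
     0.0077408 -0.0041507
     0.013317   0.054487
     0.013289   0.061777
     0.013346   0.055137
     0.01397   -0.046578
     0.014114  -0.026229
     0.014658   0.042499
     0.020282  -0.0010642];
x = d(:, 1);
y = d(:, 2);

p_yx = polyfit(x, y, 1);   % minimizes sum((y - y_prediction).^2)
p_xy = polyfit(y, x, 1);   % minimizes sum((x - x_prediction).^2)

% Rewrite each line in the other form: y = m*x + b  <=>  x = y/m - b/m
p_xy_as_y = [1/p_xy(1), -p_xy(2)/p_xy(1)];   % x-on-y line written as y = f(x)
p_yx_as_x = [1/p_yx(1), -p_yx(2)/p_yx(1)];   % y-on-x line written as x = g(y)

% Columns: [y-on-x line, x-on-y line]
sseVertical   = [sum((y - polyval(p_yx, x)).^2), sum((y - polyval(p_xy_as_y, x)).^2)]
sseHorizontal = [sum((x - polyval(p_yx_as_x, y)).^2), sum((x - polyval(p_xy, y)).^2)]
```

Each line wins on the objective it was fitted with and loses on the other, which is why the two fits describe different lines.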

Ken

From: John D'Errico

Date: 6 Nov, 2008 23:12:02

Message: 4 of 6

"Russ Scott" <robinandruss@gmail.com> wrote in message <gevnh1$bnn$1@fred.mathworks.com>...
> I've been noticing when using regress or polyfit that I'm getting inconsistent result when I switch the 1 independent variable with the dependent variable. Mathematically this does not make sense to me.
>
> If y = mx + b
>
> then
>
> x = y/m - b/m
>
> But I've found for certain datasets that when I flip the y and x around using either regress (and adding a column of ones to X) or polyfit(x,y,1) I get non-consistent results.
>

(snip)

> THESE ARE INCONSISTENT RESULTS AREN'T THEY?

NO! Sorry, but they are not. I'll just expand on
the other comments by a bit. What model do
you assume when you fit a linear regression
model? Do you know? If not, then you should
expend the effort to learn, because this is the
cause of your confusion.

You may think that you are fitting the model

  y = a*x + b

but you are not. That model is actually missing
a term. In fact, your true model is of the form

  y = a*x + b + E_i

where E_i is assumed to be normally (Gaussian)
distributed error. Thus for each data point,
we assume additive, zero mean errors, but
with an unknown variance. Those errors are
added to the value of y.

What happens when you swap the x and y
in your regression? In effect, the model is
suddenly in a different form.

  x = c*y + d + F_i

Here, the errors are assumed to be in the
x variable. This difference is the source
of your problem. The model is truly
different, so those regression parameters
will certainly be different too.
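
The effect of where the errors live can be seen in a small simulation. This is only a sketch under assumed values: the true slope 2, intercept 1, noise standard deviation 5 and sample size 1000 are made up for illustration, and the errors are added to y only.

```matlab
rng(0);                          % fixed seed so the run is repeatable
n = 1000;
x = 10*rand(n, 1);               % x is known exactly
y = 2*x + 1 + 5*randn(n, 1);     % errors E_i are added to y only

p_yx = polyfit(x, y, 1);         % fits the model y = a*x + b + E_i
p_xy = polyfit(y, x, 1);         % fits the model x = c*y + d + F_i

slopeDirect   = p_yx(1);         % estimate of a, close to the true value 2
slopeInverted = 1/p_xy(1);       % 1/c, never smaller in magnitude than slopeDirect
compareSlopes = [slopeDirect, slopeInverted]
```

Because the swapped fit treats the noise in y as if it were exact, its inverted slope overshoots the true slope, and the gap grows with the noise level.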

John

From: Scott Seidman

Date: 6 Nov, 2008 23:36:33

Message: 5 of 6

"Ken Campbell" <campbeks@gmail.com> wrote in
news:gevq1m$g80$1@fred.mathworks.com:

> In addition to the points made by Steve, note that regression
> minimizes
>
> sum((y_data-y_prediction).^2)
>
> which doesn't have to be the same as
>
> sum((x_data-x_prediction).^2)
>
> so transposing your data and repeating the fit won't normally give
> related regression parameters.
>
> Ken
>

Exactly. The independent variable is called the independent variable for
a reason: you know that this isn't where your errors are. If both
variables have errors, the appropriate fit is an orthogonal
regression, or an "errors-in-variables" model.
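
A minimal sketch of an orthogonal (total least squares) fit using only base MATLAB follows. It assumes the errors in x and y have equal variance, and the result depends on the units of x and y; the synthetic data and all variable names are illustrative, not from the thread.

```matlab
rng(0);                          % fixed seed so the run is repeatable
n = 200;
t = linspace(0, 10, n)';         % underlying error-free values
x = t + randn(n, 1);             % errors in x
y = 2*t + 1 + randn(n, 1);       % errors in y, same standard deviation

xm = mean(x);
ym = mean(y);
[U, S, V] = svd([x - xm, y - ym], 0);   % principal directions of centered data
normal = V(:, 2);                % least-variance direction = normal to the line

% Line through the centroid: normal(1)*(x - xm) + normal(2)*(y - ym) = 0
slopeOrth     = -normal(1)/normal(2);
interceptOrth = ym - slopeOrth*xm;

p_yx = polyfit(x, y, 1);         % errors assumed in y only
p_xy = polyfit(y, x, 1);         % errors assumed in x only
slopes = [p_yx(1), slopeOrth, 1/p_xy(1)]   % orthogonal slope lies in between
```

The orthogonal line minimizes perpendicular distances, so swapping x and y gives the same line back, unlike the two ordinary fits.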

--
Scott
Reverse name to reply

From: Russ Scott

Date: 6 Nov, 2008 23:48:02

Message: 6 of 6


THANKS ALL for your help and information. Got it.
