<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/238802</link>
    <title>MATLAB Central Newsreader - linear regression - inconsistent results</title>
    <description>Feed for thread: linear regression - inconsistent results</description>
    <language>en-us</language>
    <copyright>© 1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Thu, 06 Nov 2008 21:28:01 -0500</pubDate>
      <title>linear regression - inconsistent results</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/238802#609460</link>
      <author>Russ Scott</author>
      <description>I've noticed that when using regress or polyfit I get inconsistent results when I swap the single independent variable with the dependent variable. Mathematically this does not make sense to me.&lt;br&gt;
&lt;br&gt;
If y = mx + b&lt;br&gt;
&lt;br&gt;
then &lt;br&gt;
&lt;br&gt;
x = y/m - b/m&lt;br&gt;
&lt;br&gt;
But I've found for certain datasets that when I swap y and x, using either regress (with a column of ones added to X) or polyfit(x,y,1), I get inconsistent results.&lt;br&gt;
&lt;br&gt;
This is a short example to illustrate my problem.&lt;br&gt;
&lt;br&gt;
&amp;gt;&amp;gt; d=[    0.0074143     0.052035&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.0076173    0.0014361&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.0077408   -0.0041507&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.013317     0.054487&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.013289     0.061777&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.013346     0.055137&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.01397    -0.046578&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.014114    -0.026229&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.014658     0.042499&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.020282   -0.0010642];&lt;br&gt;
&amp;gt;&amp;gt; polyfit(d(:,2),d(:,1),1)&lt;br&gt;
&lt;br&gt;
ans =&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-0.011263     0.012788&lt;br&gt;
&lt;br&gt;
&amp;gt;&amp;gt; polyfit(d(:,1),d(:,2),1)&lt;br&gt;
&lt;br&gt;
ans =&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-1.0641     0.032315&lt;br&gt;
&lt;br&gt;
THESE ARE INCONSISTENT RESULTS AREN'T THEY?&lt;br&gt;
e.g., &lt;br&gt;
1/(-0.011263) = -88.79 ~= -1.0641</description>
    </item>
    <item>
      <pubDate>Thu, 06 Nov 2008 21:51:50 -0500</pubDate>
      <title>Re: linear regression - inconsistent results</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/238802#609467</link>
      <author>Steven Lord</author>
      <description>&lt;br&gt;
&quot;Russ Scott&quot; &amp;lt;robinandruss@gmail.com&amp;gt; wrote in message &lt;br&gt;
news:gevnh1$bnn$1@fred.mathworks.com...&lt;br&gt;
&amp;gt; I've been noticing when using regress or polyfit that I'm getting &lt;br&gt;
&amp;gt; inconsistent result when I switch the 1 independent variable with the &lt;br&gt;
&amp;gt; dependent variable. Mathematically this does not make sense to me.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; If y = mx + b&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; then&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; x = y/m - b/m&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; But I've found for certain datasets that when I flip the y and x around &lt;br&gt;
&amp;gt; using either regress (and adding a column of ones to X) or polyfit(x,y,1) &lt;br&gt;
&amp;gt; I get non-consistent results.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; This is a short example to illustrate my problem.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt;&amp;gt; d=[    0.0074143     0.052035&lt;br&gt;
&amp;gt;    0.0076173    0.0014361&lt;br&gt;
&amp;gt;    0.0077408   -0.0041507&lt;br&gt;
&amp;gt;     0.013317     0.054487&lt;br&gt;
&amp;gt;     0.013289     0.061777&lt;br&gt;
&amp;gt;     0.013346     0.055137&lt;br&gt;
&amp;gt;      0.01397    -0.046578&lt;br&gt;
&amp;gt;     0.014114    -0.026229&lt;br&gt;
&amp;gt;     0.014658     0.042499&lt;br&gt;
&amp;gt;     0.020282   -0.0010642];&lt;br&gt;
&lt;br&gt;
Let's take a look at your data:&lt;br&gt;
&lt;br&gt;
x = d(:, 1);&lt;br&gt;
y = d(:, 2);&lt;br&gt;
plot(x, y, 'go')&lt;br&gt;
hold on&lt;br&gt;
&lt;br&gt;
Do the green circles look like they're arranged on a line?  Does it make &lt;br&gt;
sense to fit a line to this data set, given how the points are distributed?&lt;br&gt;
&lt;br&gt;
Pictures are sometimes truly worth a thousand words.&lt;br&gt;
&lt;br&gt;
&amp;gt;&amp;gt;&amp;gt; polyfit(d(:,2),d(:,1),1)&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; ans =&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;    -0.011263     0.012788&lt;br&gt;
&lt;br&gt;
mb1 = polyfit(y, x, 1);&lt;br&gt;
plot(polyval(mb1, y), y, 'r+')&lt;br&gt;
&lt;br&gt;
When we evaluate the red line for the y values you used to fit the line, it &lt;br&gt;
seems to be a rough approximation to the vertical line around x = 0.014.&lt;br&gt;
&lt;br&gt;
&amp;gt;&amp;gt;&amp;gt; polyfit(d(:,1),d(:,2),1)&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; ans =&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;      -1.0641     0.032315&lt;br&gt;
&lt;br&gt;
mb2 = polyfit(x, y, 1);&lt;br&gt;
plot(x, polyval(mb2, x), 'kx')&lt;br&gt;
&lt;br&gt;
These look to be a rough fit to the horizontal line &quot;splitting the &lt;br&gt;
difference&quot; between the points around y = 0 and the points around y = 0.05.&lt;br&gt;
&lt;br&gt;
&amp;gt; THESE ARE INCONSISTENT RESULTS AREN'T THEY?&lt;br&gt;
&amp;gt; e.g.,&lt;br&gt;
&amp;gt; 1/-0.011263 ~=  -1.0641&lt;br&gt;
&lt;br&gt;
If your data were more &quot;linear&quot;, then you might expect the relationship you &lt;br&gt;
proposed above to hold, at least approximately.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
figure&lt;br&gt;
x = 1:10; y = 7*(x+rand(size(x)));&lt;br&gt;
plot(x, y, 'go');&lt;br&gt;
hold on&lt;br&gt;
&lt;br&gt;
mb1 = polyfit(y, x, 1); % x = mb1(1)*y + mb1(2)&lt;br&gt;
plot(polyval(mb1, y), y, 'r+')&lt;br&gt;
&lt;br&gt;
mb2 = polyfit(x, y,1); % y = mb2(1)*x + mb2(2)&lt;br&gt;
plot(x, polyval(mb2, x), 'kx')&lt;br&gt;
&lt;br&gt;
compareLinearCoeff = [mb1(1), 1./mb2(1)]&lt;br&gt;
compareConstantCoeff = [mb1(2), -mb2(2)./mb2(1)]&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
On these graphs, the black x's and the red +'s are much closer to one &lt;br&gt;
another and to the green circles.  While the relationship you proposed above &lt;br&gt;
doesn't exactly hold, in this case the data is much better fit by the lines &lt;br&gt;
and so the relationship is much closer to being satisfied.&lt;br&gt;
&lt;br&gt;
-- &lt;br&gt;
Steve Lord&lt;br&gt;
slord@mathworks.com </description>
    </item>
    <item>
      <pubDate>Thu, 06 Nov 2008 22:11:02 -0500</pubDate>
      <title>Re: linear regression - inconsistent results</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/238802#609475</link>
      <author>Ken Campbell</author>
      <description>&quot;Russ Scott&quot; &amp;lt;robinandruss@gmail.com&amp;gt; wrote in message &amp;lt;gevnh1$bnn$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; I've been noticing when using regress or polyfit that I'm getting inconsistent result when I switch the 1 independent variable with the dependent variable. Mathematically this does not make sense to me.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; If y = mx + b&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; then &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; x = y/m - b/m&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; But I've found for certain datasets that when I flip the y and x around using either regress (and adding a column of ones to X) or polyfit(x,y,1) I get non-consistent results.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; This is a short example to illustrate my problem.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; d=[    0.0074143     0.052035&lt;br&gt;
&amp;gt;     0.0076173    0.0014361&lt;br&gt;
&amp;gt;     0.0077408   -0.0041507&lt;br&gt;
&amp;gt;      0.013317     0.054487&lt;br&gt;
&amp;gt;      0.013289     0.061777&lt;br&gt;
&amp;gt;      0.013346     0.055137&lt;br&gt;
&amp;gt;       0.01397    -0.046578&lt;br&gt;
&amp;gt;      0.014114    -0.026229&lt;br&gt;
&amp;gt;      0.014658     0.042499&lt;br&gt;
&amp;gt;      0.020282   -0.0010642];&lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; polyfit(d(:,2),d(:,1),1)&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; ans =&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt;     -0.011263     0.012788&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; polyfit(d(:,1),d(:,2),1)&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; ans =&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt;       -1.0641     0.032315&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; THESE ARE INCONSISTENT RESULTS AREN'T THEY?&lt;br&gt;
&amp;gt; e.g., &lt;br&gt;
&amp;gt; 1/-0.011263 ~=  -1.0641&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
In addition to the points made by Steve, note that regression minimizes&lt;br&gt;
&lt;br&gt;
sum((y_data-y_prediction).^2)&lt;br&gt;
&lt;br&gt;
which doesn't have to be the same as&lt;br&gt;
&lt;br&gt;
sum((x_data-x_prediction).^2)&lt;br&gt;
&lt;br&gt;
so transposing your data and repeating the fit won't normally give related regression parameters.&lt;br&gt;
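&lt;br&gt;
A minimal sketch of this point, using the data from the original post (the variable names are illustrative and not part of the thread): each slope comes from minimizing a different sum of squares, and their product equals the squared correlation coefficient r^2. They are reciprocals only when r^2 = 1, that is, when the points lie exactly on a line. Judging from the slopes quoted above, r^2 is only about 0.012 for this data set.&lt;br&gt;
&lt;br&gt;
```matlab
% Data from the original post: column 1 is x, column 2 is y.
d = [0.0074143   0.052035
     0.0076173   0.0014361
     0.0077408  -0.0041507
     0.013317    0.054487
     0.013289    0.061777
     0.013346    0.055137
     0.01397    -0.046578
     0.014114   -0.026229
     0.014658    0.042499
     0.020282   -0.0010642];
x = d(:,1);  y = d(:,2);
xc = x - mean(x);            % centered x
yc = y - mean(y);            % centered y

% Slope of y on x: minimizes sum((y_data - y_prediction).^2)
slope_yx = sum(xc.*yc) / sum(xc.^2);
% Slope of x on y: minimizes sum((x_data - x_prediction).^2)
slope_xy = sum(xc.*yc) / sum(yc.^2);
% Squared correlation coefficient
r2 = sum(xc.*yc)^2 / (sum(xc.^2) * sum(yc.^2));

disp([slope_yx, slope_xy])       % compare with the polyfit slopes quoted above
disp([slope_yx*slope_xy, r2])    % the two entries agree
```
&lt;br&gt;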
&lt;br&gt;
Ken</description>
    </item>
    <item>
      <pubDate>Thu, 06 Nov 2008 23:12:02 -0500</pubDate>
      <title>Re: linear regression - inconsistent results</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/238802#609485</link>
      <author>John D'Errico</author>
      <description>&quot;Russ Scott&quot; &amp;lt;robinandruss@gmail.com&amp;gt; wrote in message &amp;lt;gevnh1$bnn$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; I've been noticing when using regress or polyfit that I'm getting inconsistent result when I switch the 1 independent variable with the dependent variable. Mathematically this does not make sense to me.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; If y = mx + b&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; then &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; x = y/m - b/m&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; But I've found for certain datasets that when I flip the y and x around using either regress (and adding a column of ones to X) or polyfit(x,y,1) I get non-consistent results.&lt;br&gt;
&amp;gt; &lt;br&gt;
&lt;br&gt;
(snip)&lt;br&gt;
&lt;br&gt;
&amp;gt; THESE ARE INCONSISTENT RESULTS AREN'T THEY?&lt;br&gt;
&lt;br&gt;
NO! Sorry, but they are not. I'll just expand on&lt;br&gt;
the other comments by a bit. What model do&lt;br&gt;
you assume when you fit a linear regression&lt;br&gt;
model? Do you know? If not, then you should&lt;br&gt;
expend the effort to learn, because this is the&lt;br&gt;
cause of your confusion.&lt;br&gt;
&lt;br&gt;
You may think that you are fitting the model&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;y = a*x + b&lt;br&gt;
&lt;br&gt;
but, you are not. That model is actually missing&lt;br&gt;
a term. In fact, your true model is of the form&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;y_i = a*x_i + b + E_i&lt;br&gt;
&lt;br&gt;
where E_i is assumed to be normally (Gaussian)&lt;br&gt;
distributed error. Thus for each data point,&lt;br&gt;
we assume additive, zero mean errors, but&lt;br&gt;
with an unknown variance. Those errors are&lt;br&gt;
added to the value of y.&lt;br&gt;
&lt;br&gt;
What happens when you swap the x and y&lt;br&gt;
in your regression? In effect, the model is&lt;br&gt;
suddenly in a different form.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;x_i = c*y_i + d + F_i&lt;br&gt;
&lt;br&gt;
Here, the errors are assumed to be in the&lt;br&gt;
x variable. This difference is the source&lt;br&gt;
of your problem. The model is truly&lt;br&gt;
different, so those regression parameters&lt;br&gt;
will certainly be different too.&lt;br&gt;
&lt;br&gt;
John</description>
    </item>
    <item>
      <pubDate>Thu, 06 Nov 2008 23:36:33 -0500</pubDate>
      <title>Re: linear regression - inconsistent results</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/238802#609488</link>
      <author>Scott Seidman</author>
      <description>&quot;Ken Campbell&quot; &amp;lt;campbeks@gmail.com&amp;gt; wrote in&lt;br&gt;
news:gevq1m$g80$1@fred.mathworks.com: &lt;br&gt;
&lt;br&gt;
&amp;gt; In addition to the points made by Steve, note that regression&lt;br&gt;
&amp;gt; minimizes &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; sum((y_data-y_prediction).^2)&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; which doesn't have to be the same as&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; sum((x_data-x_prediction).^2)&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; so transposing your data and repeating the fit won't normally give&lt;br&gt;
&amp;gt; related regression parameters. &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Ken&lt;br&gt;
&amp;gt; &lt;br&gt;
&lt;br&gt;
Exactly.  The independent variable is called the independent variable for &lt;br&gt;
a reason: ordinary least squares assumes that this isn't where your errors &lt;br&gt;
are.  If you have two variables with errors, the correct optimization is an &lt;br&gt;
orthogonal regression, or an &quot;errors-in-variables&quot; model.&lt;br&gt;
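&lt;br&gt;
As a rough sketch of what an orthogonal regression could look like (one common approach via the SVD, not necessarily what any toolbox function does): fit the line along the direction of largest spread of the centered points. This assumes the errors in x and y have equal variance and comparable units. The data and variable names below are made up for illustration. Unlike ordinary least squares, swapping x and y here gives the same line, so the swapped slope is the reciprocal.&lt;br&gt;
&lt;br&gt;
```matlab
% Hypothetical data: a line with small made-up perturbations.
x = (1:10)';
y = 7*x + [0.3 -0.2 0.5 -0.4 0.1 0.6 -0.5 0.2 -0.1 0.4]';

% Center the points; each row of P is one (x, y) pair.
P = [x - mean(x), y - mean(y)];

% The first right singular vector is the direction of largest spread,
% which minimizes the sum of squared perpendicular distances to the line.
[~, ~, V] = svd(P, 0);
m_tls = V(2,1) / V(1,1);              % slope of the orthogonal-regression line
b_tls = mean(y) - m_tls*mean(x);      % the line passes through the centroid

% Repeat with the roles of x and y swapped.
[~, ~, W] = svd([y - mean(y), x - mean(x)], 0);
m_swapped = W(2,1) / W(1,1);

disp([m_tls, 1/m_swapped])            % the two entries agree
```
&lt;br&gt;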
&lt;br&gt;
-- &lt;br&gt;
Scott&lt;br&gt;
Reverse name to reply</description>
    </item>
    <item>
      <pubDate>Thu, 06 Nov 2008 23:48:02 -0500</pubDate>
      <title>Re: linear regression - inconsistent results</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/238802#609491</link>
      <author>Russ Scott</author>
      <description>&lt;br&gt;
THANKS ALL for your help and information.  Got it.</description>
    </item>
  </channel>
</rss>

