<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/169345</link>
    <title>MATLAB Central Newsreader - Variance?</title>
    <description>Feed for thread: Variance?</description>
    <language>en-us</language>
    <copyright>© 1994-2008 by The MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>The MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Thu, 15 May 2008 20:43:21 -0400</pubDate>
      <title>Variance?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/169345#432356</link>
      <author>saneman</author>
      <description>I have a vector that contains:&lt;br&gt;
&lt;br&gt;
v = 0.5677    0.4792    0.4844    0.4870    0.5104    0.4870    0.4792 &lt;br&gt;
0.4974    0.4688    0.4870&lt;br&gt;
&lt;br&gt;
Now I would like to know how much this data varies. I was thinking:&lt;br&gt;
&lt;br&gt;
a = max(v) - min(v)&lt;br&gt;
&lt;br&gt;
But if just one sample is very different (like 10.0) then the above &lt;br&gt;
procedure will not give a realistic result.&lt;br&gt;
&lt;br&gt;
I have also tried to use the matlab var function:&lt;br&gt;
&lt;br&gt;
&amp;gt;&amp;gt; var(v)&lt;br&gt;
&lt;br&gt;
ans =&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;7.7977e-004&lt;br&gt;
&lt;br&gt;
But 0.0007 is not where most of the data lies. Is there a better approach &lt;br&gt;
to this problem? &lt;br&gt;
&lt;br&gt;
&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 15 May 2008 21:24:02 -0400</pubDate>
      <title>Re: Variance?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/169345#432362</link>
      <author>Roger Stafford</author>
      <description>"saneman" &amp;lt;asd@ad.com&amp;gt; wrote in message &amp;lt;g0i78i$ltc$1@news.net.uni-c.dk&amp;gt;...&lt;br&gt;
&amp;gt; I have a vector that contains:&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; v = 0.5677    0.4792    0.4844    0.4870    0.5104    0.4870    0.4792 &lt;br&gt;
&amp;gt; 0.4974    0.4688    0.4870&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Now I would like to know how much this data varies. I was thinking:&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; a = max(v) - min(v)&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; But if just one sample is very different (like 10.0) then the above &lt;br&gt;
&amp;gt; procedure will not give a realistic result.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I have also tried to use the matlab var function:&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; var(v)&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; ans =&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt;   7.7977e-004&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; But 0.0007 is not where most of the data belongs. Is there better approach &lt;br&gt;
&amp;gt; to this problem? &lt;br&gt;
------------&lt;br&gt;
&amp;nbsp;&amp;nbsp;Remember, the 'var' function returns the mean of the *squares* of the &lt;br&gt;
differences between your numbers and their mean value (by default it divides &lt;br&gt;
by n-1 rather than n).  Your differences from their mean are somewhere in the &lt;br&gt;
neighborhood of .03, so the mean square of these differences would be in the &lt;br&gt;
neighborhood of .0009.  (Actually you got .0008.)  To get a value which is &lt;br&gt;
comparable to these differences, you should either call 'std' or take the &lt;br&gt;
square root of the variance.&lt;br&gt;
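&lt;br&gt;
&amp;nbsp;&amp;nbsp;As a rough sketch of that point on the vector from the original post (the &lt;br&gt;
variable names s2 and s are just for illustration), the following compares &lt;br&gt;
'var', 'std', and the square root of the variance:&lt;br&gt;
&lt;pre&gt;
```matlab
% Data from the original post
v = [0.5677 0.4792 0.4844 0.4870 0.5104 0.4870 0.4792 0.4974 0.4688 0.4870];

s2 = var(v);    % variance: units are the square of the data's units
s  = std(v);    % standard deviation: same units as the data

fprintf('var(v)       = %.4e\n', s2);
fprintf('std(v)       = %.4f\n', s);
fprintf('sqrt(var(v)) = %.4f\n', sqrt(s2));
```
&lt;/pre&gt;
The standard deviation should come out near .028, which is the same order of &lt;br&gt;
magnitude as the deviations from the mean.&lt;br&gt;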
&lt;br&gt;
Roger Stafford&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 15 May 2008 21:38:36 -0400</pubDate>
      <title>Re: Variance?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/169345#432367</link>
      <author>saneman</author>
      <description>&lt;br&gt;
"Roger Stafford" &amp;lt;ellieandrogerxyzzy@mindspring.com.invalid&amp;gt; skrev i en &lt;br&gt;
meddelelse news:g0i9li$qum$1@fred.mathworks.com...&lt;br&gt;
&amp;gt; "saneman" &amp;lt;asd@ad.com&amp;gt; wrote in message &amp;lt;g0i78i$ltc$1@news.net.uni-&lt;br&gt;
&amp;gt; c.dk&amp;gt;...&lt;br&gt;
&amp;gt;&amp;gt; I have a vector that contains:&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; v = 0.5677    0.4792    0.4844    0.4870    0.5104    0.4870    0.4792&lt;br&gt;
&amp;gt;&amp;gt; 0.4974    0.4688    0.4870&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; Now I would like to know how much this data varies. I was thinking:&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; a = max(v) - min(v)&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; But if just one sample is very different (like 10.0) then the above&lt;br&gt;
&amp;gt;&amp;gt; procedure will not give a realistic result.&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; I have also tried to use the matlab var function:&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&amp;gt; var(v)&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; ans =&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt;   7.7977e-004&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; But 0.0007 is not where most of the data belongs. Is there better &lt;br&gt;
&amp;gt;&amp;gt; approach&lt;br&gt;
&amp;gt;&amp;gt; to this problem?&lt;br&gt;
&amp;gt; ------------&lt;br&gt;
&amp;gt;  Remember, the 'var' function returns the mean of the *squares* of the&lt;br&gt;
&amp;gt; differences between your numbers and their mean value.  Your differences&lt;br&gt;
&amp;gt; from their mean are somewhere in the neighborhood of .03 so the mean&lt;br&gt;
&amp;gt; square of these differences would be in the neighborhood of .0009. &lt;br&gt;
&amp;gt; (Actually&lt;br&gt;
&amp;gt; you got .0008 .)  To get a value which is comparable to these differences, &lt;br&gt;
&amp;gt; you&lt;br&gt;
&amp;gt; should either call on 'std' or take the square root of the variance.&lt;br&gt;
&amp;gt;&lt;br&gt;
&lt;br&gt;
How is it possible to use these functions on the data without supplying &lt;br&gt;
information about the probabilities? &lt;br&gt;
&lt;br&gt;
&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 15 May 2008 22:33:02 -0400</pubDate>
      <title>Re: Variance?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/169345#432382</link>
      <author>Roger Stafford</author>
      <description>"saneman" &amp;lt;asd@ad.com&amp;gt; wrote in message &amp;lt;g0iag5$m5r$1@news.net.uni-c.dk&amp;gt;...&lt;br&gt;
&amp;gt; How is it possible to use these functions on the data without supplying &lt;br&gt;
&amp;gt; information about the probabilities? &lt;br&gt;
---------&lt;br&gt;
&amp;nbsp;&amp;nbsp;The 'mean', 'var', and 'std' functions all make the assumption that all values &lt;br&gt;
they receive in vectors are the result of some stationary random process.  They &lt;br&gt;
apply equal weighting to each data point and simply produce what is termed &lt;br&gt;
"sample" means and variances.  The probabilities involved in the random &lt;br&gt;
process can be of any statistical kind.  Of course these sample means and &lt;br&gt;
variances are only estimates of the true underlying probabilistic means and &lt;br&gt;
variances, but they are the best that can be obtained from a finite sample.  If you &lt;br&gt;
have two sample results of 2 and 4, the best you can do is to estimate the mean &lt;br&gt;
as 3, but for such a small sample, this is an unreliable estimate.  If you have a &lt;br&gt;
billion values and their average is 3, that is much more likely to be close to the &lt;br&gt;
correct answer.&lt;br&gt;
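&lt;br&gt;
&amp;nbsp;&amp;nbsp;As a minimal sketch of what "equal weighting" means here, the sample mean &lt;br&gt;
and variance can be written out by hand and compared with the built-in &lt;br&gt;
functions.  The names n, m, and s2 are illustrative only, and the comparison &lt;br&gt;
assumes the default normalization of 'var' by n-1:&lt;br&gt;
&lt;pre&gt;
```matlab
% Two sample values, as in the example above
x = [2 4];
n = numel(x);

m  = sum(x) / n;                 % every point gets the same weight 1/n
s2 = sum((x - m).^2) / (n - 1);  % default 'var' normalization uses n-1

fprintf('mean: %g (built-in %g)\n', m, mean(x));
fprintf('var:  %g (built-in %g)\n', s2, var(x));
```
&lt;/pre&gt;
No probabilities are supplied anywhere; the data values themselves are all that &lt;br&gt;
these estimates use.&lt;br&gt;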
&lt;br&gt;
Roger Stafford&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Fri, 16 May 2008 10:45:51 -0400</pubDate>
      <title>Re: Variance?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/169345#432459</link>
      <author>saneman</author>
      <description>&lt;br&gt;
"Roger Stafford" &amp;lt;ellieandrogerxyzzy@mindspring.com.invalid&amp;gt; skrev i en &lt;br&gt;
meddelelse news:g0idmt$id$1@fred.mathworks.com...&lt;br&gt;
&amp;gt; "saneman" &amp;lt;asd@ad.com&amp;gt; wrote in message &amp;lt;g0iag5$m5r$1@news.net.uni-&lt;br&gt;
&amp;gt; c.dk&amp;gt;...&lt;br&gt;
&amp;gt;&amp;gt; How is it possible to use these functions on the data without supplying&lt;br&gt;
&amp;gt;&amp;gt; information about the probabilities?&lt;br&gt;
&amp;gt; ---------&lt;br&gt;
&amp;gt;  The 'mean', 'var', and 'std' functions all make the assumption that all &lt;br&gt;
&amp;gt; values&lt;br&gt;
&amp;gt; they receive in vectors are the result of some stationary random process. &lt;br&gt;
&amp;gt; They&lt;br&gt;
&amp;gt; apply equal weighting to each data point and simply produce what is termed&lt;br&gt;
&amp;gt; "sample" means and variances.  The probabilities involved in the random&lt;br&gt;
&amp;gt; process can be of any statistical kind.  Of course these samples means and&lt;br&gt;
&amp;gt; variances are only estimations for the true underlying probabilistic means &lt;br&gt;
&amp;gt; and&lt;br&gt;
&amp;gt; variances, but they are the best that can be obtained from a finite &lt;br&gt;
&amp;gt; sample.  If you&lt;br&gt;
&amp;gt; have two sample results of 2 and 4, the best you can do is to estimate the &lt;br&gt;
&amp;gt; mean&lt;br&gt;
&amp;gt; as 3, but for such a small sample, this is an unreliable estimate.  If you &lt;br&gt;
&amp;gt; have a&lt;br&gt;
&amp;gt; billion values and their average is 3, that is much more likely to be &lt;br&gt;
&amp;gt; close to the&lt;br&gt;
&amp;gt; correct answer.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Roger Stafford&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;&lt;br&gt;
&lt;br&gt;
But is the standard deviation (std) an absolute value? As I understand:&lt;br&gt;
&lt;br&gt;
min value &amp;lt; std &amp;lt; max value&lt;br&gt;
&lt;br&gt;
If I only know that std = 4.5 is it possible to say anything about the &lt;br&gt;
density of the data? It seems that it only makes sense to compare std with &lt;br&gt;
other datasets. &lt;br&gt;
&lt;br&gt;
&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Fri, 16 May 2008 17:29:02 -0400</pubDate>
      <title>Re: Variance?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/169345#432561</link>
      <author>Roger Stafford</author>
      <description>"saneman" &amp;lt;asd@ad.com&amp;gt; wrote in message &amp;lt;g0jok6$qms$1@news.net.uni-c.dk&amp;gt;...&lt;br&gt;
&amp;gt; But is the standard deviation (std) an absolute value? As I understand:&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; min value &amp;lt; std &amp;lt; max value&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; If I only know that std = 4.5 is it possible to say anything about the &lt;br&gt;
&amp;gt; density of the data? It seems that it only makes sense to compare std with &lt;br&gt;
&amp;gt; other datasets. &lt;br&gt;
-------------&lt;br&gt;
&amp;nbsp;&amp;nbsp;If by "value" you mean the values that are input to 'std', then it is certainly &lt;br&gt;
NOT true that&lt;br&gt;
&lt;br&gt;
&amp;nbsp;min value &amp;lt; std &amp;lt; max value&lt;br&gt;
&lt;br&gt;
I don't know where you might have gotten that idea.  Standard deviation only &lt;br&gt;
has to do with differences among values.  Remember the word 'deviation'.  It &lt;br&gt;
means what it says.  It is the square root of the unbiased mean value of the &lt;br&gt;
squares of the differences between each element and their mean value.  &lt;br&gt;
Therefore it does not depend on the sizes of the values themselves, only on the &lt;br&gt;
differences among them.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;To illustrate this, here is a standard deviation calculation that can be done &lt;br&gt;
with pen and paper.  Let x1 = 1000000, x2 = 1000002, x3 = 1000004, and &lt;br&gt;
x4 = 1000006.  The mean value of these four numbers is:&lt;br&gt;
&lt;br&gt;
&amp;nbsp;(1000000+1000002+1000004+1000006)/4 = 1000003&lt;br&gt;
&lt;br&gt;
The unbiased mean of the squares of the differences between them and this &lt;br&gt;
mean is&lt;br&gt;
&lt;br&gt;
&amp;nbsp;((1000000-1000003)^2 + (1000002-1000003)^2 + ...&lt;br&gt;
&amp;nbsp;&amp;nbsp;(1000004-1000003)^2 + (1000006-1000003)^2)/3 =&lt;br&gt;
&amp;nbsp;((-3)^2 + (-1)^2 + 1^2 + 3^2)/3 = (9+1+1+9)/3 = 20/3 = 6.6667&lt;br&gt;
&lt;br&gt;
This is the unbiased variance.  The standard deviation is the square root of &lt;br&gt;
this:&lt;br&gt;
&lt;br&gt;
&amp;nbsp;std = sqrt(6.6667) = 2.5820&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;Notice that the figure 2.5820 is unrelated to the sizes 1000000, &lt;br&gt;
1000002, 1000004, and 1000006; it reflects only the "typical" magnitude of &lt;br&gt;
their differences from their mean value, 1000003, namely 3, 1, 1, and 3.&lt;br&gt;
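&lt;br&gt;
&amp;nbsp;&amp;nbsp;The same hand calculation can be checked in MATLAB.  This is only a sketch, &lt;br&gt;
and the variable names x and shifted are chosen for illustration.  Subtracting &lt;br&gt;
a constant from every element leaves the standard deviation unchanged:&lt;br&gt;
&lt;pre&gt;
```matlab
x = [1000000 1000002 1000004 1000006];
shifted = x - 1000000;          % [0 2 4 6]: same spacing, much smaller values

fprintf('std(x)       = %.4f\n', std(x));
fprintf('std(shifted) = %.4f\n', std(shifted));
fprintf('sqrt(20/3)   = %.4f\n', sqrt(20/3));
```
&lt;/pre&gt;
All three lines should print 2.5820.&lt;br&gt;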
&lt;br&gt;
&amp;nbsp;&amp;nbsp;If you are going to be doing much work with statistical entities, I would &lt;br&gt;
strongly recommend an extended perusal of a good elementary book on the &lt;br&gt;
subject so as to obtain a better grasp of some of these notions.  You have a &lt;br&gt;
number of mistaken ideas that need to be resolved.&lt;br&gt;
&lt;br&gt;
Roger Stafford&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
</description>
    </item>
  </channel>
</rss>
