Thread Subject: Variance?

Subject: Variance?

From: saneman

Date: 15 May, 2008 20:43:21

Message: 1 of 6

I have a vector that contains:

v = [0.5677 0.4792 0.4844 0.4870 0.5104 0.4870 0.4792 ...
     0.4974 0.4688 0.4870];

Now I would like to know how much this data varies. I was thinking:

a = max(v) - min(v)

But if just one sample is very different (like 10.0) then the above
procedure will not give a realistic result.
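
A minimal sketch of that sensitivity, assuming the ten values listed above are stored in a row vector v. The appended 10.0 is only the hypothetical outlier mentioned above, not real data, and the names w, a, and b are illustrative:

```matlab
% Values from the post above.
v = [0.5677 0.4792 0.4844 0.4870 0.5104 0.4870 0.4792 ...
     0.4974 0.4688 0.4870];

a = max(v) - min(v);          % range of the original data

% Hypothetical outlier: a single sample of 10.0 appended to the data.
w = [v 10.0];
b = max(w) - min(w);          % range is now dominated by that one sample

fprintf('range without outlier: %.4f\n', a);
fprintf('range with outlier:    %.4f\n', b);
```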

I have also tried to use the matlab var function:

>> var(v)

ans =

  7.7977e-004

But 0.0007 is not where most of the data belongs. Is there a better approach
to this problem?


Subject: Re: Variance?

From: Roger Stafford

Date: 15 May, 2008 21:24:02

Message: 2 of 6

"saneman" <asd@ad.com> wrote in message <g0i78i$ltc$1@news.net.uni-c.dk>...
> I have a vector that contains:
>
> v = 0.5677 0.4792 0.4844 0.4870 0.5104 0.4870 0.4792
> 0.4974 0.4688 0.4870
>
> Now I would like to know how much this data varies. I was thinking:
>
> a = max(v) - min(v)
>
> But if just one sample is very different (like 10.0) then the above
> procedure will not give a realistic result.
>
> I have also tried to use the matlab var function:
>
> >> var(v)
>
> ans =
>
> 7.7977e-004
>
> But 0.0007 is not where most of the data belongs. Is there a better approach
> to this problem?
------------
  Remember, the 'var' function returns the mean of the *squares* of the
differences between your numbers and their mean value (by default it divides
by N-1 rather than N). Your differences from their mean are somewhere in the
neighborhood of .03, so the mean square of these differences would be in the
neighborhood of .0009. (Actually you got .0008.) To get a value which is
comparable to these differences, you should either call 'std' or take the
square root of the variance.
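
As a minimal sketch of that comparison, assuming the ten values from the original post are stored in a row vector v (the names s2, s, and d are illustrative):

```matlab
% The ten values from the original post.
v = [0.5677 0.4792 0.4844 0.4870 0.5104 0.4870 0.4792 ...
     0.4974 0.4688 0.4870];

s2 = var(v);              % sample variance, in squared units of v
s  = std(v);              % standard deviation, in the same units as v
d  = abs(v - mean(v));    % absolute deviations of each value from the mean

fprintf('var       = %.4e\n', s2);
fprintf('std       = %.4f\n', s);
fprintf('sqrt(var) = %.4f\n', sqrt(s2));
fprintf('largest deviation from the mean = %.4f\n', max(d));
```

The point of the sketch is only that std(v) and sqrt(var(v)) agree, and that both are on the same scale as the deviations in d, whereas var(v) is on the scale of their squares.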

Roger Stafford


Subject: Re: Variance?

From: saneman

Date: 15 May, 2008 21:38:36

Message: 3 of 6


"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in
message news:g0i9li$qum$1@fred.mathworks.com...
> "saneman" <asd@ad.com> wrote in message <g0i78i$ltc$1@news.net.uni-c.dk>...
>> I have a vector that contains:
>>
>> v = 0.5677 0.4792 0.4844 0.4870 0.5104 0.4870 0.4792
>> 0.4974 0.4688 0.4870
>>
>> Now I would like to know how much this data varies. I was thinking:
>>
>> a = max(v) - min(v)
>>
>> But if just one sample is very different (like 10.0) then the above
>> procedure will not give a realistic result.
>>
>> I have also tried to use the matlab var function:
>>
>> >> var(v)
>>
>> ans =
>>
>> 7.7977e-004
>>
>> But 0.0007 is not where most of the data belongs. Is there a better
>> approach to this problem?
> ------------
> Remember, the 'var' function returns the mean of the *squares* of the
> differences between your numbers and their mean value. Your differences
> from their mean are somewhere in the neighborhood of .03 so the mean
> square of these differences would be in the neighborhood of .0009.
> (Actually you got .0008.) To get a value which is comparable to these
> differences, you should either call on 'std' or take the square root of
> the variance.
>

How is it possible to use these functions on the data without supplying
information about the probabilities?


Subject: Re: Variance?

From: Roger Stafford

Date: 15 May, 2008 22:33:02

Message: 4 of 6

"saneman" <asd@ad.com> wrote in message <g0iag5$m5r$1@news.net.uni-c.dk>...
> How is it possible to use these functions on the data without supplying
> information about the probabilities?
---------
  The 'mean', 'var', and 'std' functions all make the assumption that all values
they receive in vectors are the result of some stationary random process. They
apply equal weighting to each data point and simply produce what are termed
"sample" means and variances. The probabilities involved in the random
process can be of any statistical kind. Of course these sample means and
variances are only estimates of the true underlying probabilistic means and
variances, but they are the best that can be obtained from a finite sample. If you
have two sample results of 2 and 4, the best you can do is to estimate the mean
as 3, but for such a small sample, this is an unreliable estimate. If you have a
billion values and their average is 3, that is much more likely to be close to the
correct answer.
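
To make the equal weighting concrete, here is a minimal sketch using the two sample results 2 and 4 mentioned above. The variable names (p, m, s2) are illustrative, and the N-1 divisor is the default normalization of 'var':

```matlab
% Two sample results, as in the example above.
x = [2 4];
n = numel(x);

% Equal weighting: every data point is given the same weight 1/n,
% so no probabilities need to be supplied by the caller.
p = ones(1, n) / n;
m = sum(p .* x);                   % sample mean, same as mean(x)

% Sample variance with the N-1 divisor, same as var(x).
s2 = sum((x - m).^2) / (n - 1);

fprintf('sample mean     = %g (mean(x) = %g)\n', m, mean(x));
fprintf('sample variance = %g (var(x)  = %g)\n', s2, var(x));
```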

Roger Stafford


Subject: Re: Variance?

From: saneman

Date: 16 May, 2008 10:45:51

Message: 5 of 6


"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in
message news:g0idmt$id$1@fred.mathworks.com...
> "saneman" <asd@ad.com> wrote in message <g0iag5$m5r$1@news.net.uni-c.dk>...
>> How is it possible to use these functions on the data without supplying
>> information about the probabilities?
> ---------
> The 'mean', 'var', and 'std' functions all make the assumption that all
> values they receive in vectors are the result of some stationary random
> process. They apply equal weighting to each data point and simply produce
> what is termed "sample" means and variances. The probabilities involved in
> the random process can be of any statistical kind. Of course these sample
> means and variances are only estimations for the true underlying
> probabilistic means and variances, but they are the best that can be
> obtained from a finite sample. If you have two sample results of 2 and 4,
> the best you can do is to estimate the mean as 3, but for such a small
> sample, this is an unreliable estimate. If you have a billion values and
> their average is 3, that is much more likely to be close to the correct
> answer.
>
> Roger Stafford
>
>

But is the standard deviation (std) an absolute value? As I understand:

min value < std < max value

If I only know that std = 4.5 is it possible to say anything about the
density of the data? It seems that it only makes sense to compare std with
other datasets.


Subject: Re: Variance?

From: Roger Stafford

Date: 16 May, 2008 17:29:02

Message: 6 of 6

"saneman" <asd@ad.com> wrote in message <g0jok6$qms$1@news.net.uni-c.dk>...
> But is the standard deviation (std) an absolute value? As I understand:
>
> min value < std < max value
>
> If I only know that std = 4.5 is it possible to say anything about the
> density of the data? It seems that it only makes sense to compare std with
> other datasets.
-------------
  If by "value" you mean the values that are input to 'std', then it is certainly
NOT true that

 min value < std < max value

I don't know where you might have gotten that idea. Standard deviation only
has to do with differences among values. Remember the word 'deviation'. It
means what it says. It is the square root of the unbiased mean value of the
squares of the differences between each element and their mean value.
Therefore it is totally unrelated to the values themselves but only to
differences among them.

  To illustrate this, here is a standard deviation calculation that can be done
with pen and paper. Let x1 = 1000000, x2 = 1000002, x3 = 1000004, and
x4 = 1000006. The mean value of these four numbers is:

 (1000000+1000002+1000004+1000006)/4 = 1000003

The unbiased mean of the squares of the differences between them and this
mean is

 ((1000000-1000003)^2 + (1000002-1000003)^2 + ...
  (1000004-1000003)^2 + (1000006-1000003)^2)/3 =
 ((-3)^2 + (-1)^2 + 1^2 + 3^2)/3 = (9+1+1+9)/3 = 20/3 = 6.6667

This is the unbiased variance. The standard deviation is the square root of
this:

 std = sqrt(6.6667) = 2.5820

  Notice that the figure 2.5820 is totally unrelated to the sizes 1000000,
1000002, 1000004, and 1000006, but only to the "typical" magnitude of
their differences from their mean value, 1000003, namely, 3, 1, 1, and 3.
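
The pen-and-paper calculation above can be checked with a short sketch. The variable names are illustrative, and the shifted vector y is an added assumption used only to show that a constant offset does not change the result:

```matlab
% The four numbers from the hand calculation.
x = [1000000 1000002 1000004 1000006];

m  = mean(x);                              % 1000003
s2 = sum((x - m).^2) / (numel(x) - 1);     % unbiased variance, 20/3
s  = sqrt(s2);                             % standard deviation

% Removing the common offset leaves the differences, and so the
% standard deviation, unchanged.
y = x - 1000000;

fprintf('hand-computed std = %.4f\n', s);
fprintf('std(x)            = %.4f\n', std(x));
fprintf('std(y)            = %.4f\n', std(y));

% The claim "min value < std < max value" fails here: std(x) is far
% below min(x).
fprintf('std(x) < min(x)?    %d\n', std(x) < min(x));
```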

  If you are going to be doing much work with statistical entities, I would
strongly recommend an extended perusal of a good elementary book on the
subject so as to obtain a better grasp of some of these notions. You have a
number of mistaken ideas that need to be resolved.

Roger Stafford

