Thread Subject:
Old bug in corrcoef not yet fixed

Subject: Old bug in corrcoef not yet fixed

From: Pasco Alquim

Date: 8 Jan, 2009 01:14:02

Message: 1 of 7

Help info of corrcoef says that

% 'alpha' A number between 0 and 1 to specify a confidence
% level of 100*(1-ALPHA)%. Default is 0.05 for 95%
% confidence intervals.

However, the 'alpha' value is simply ignored (as far back as R13), as the example below shows.
(Sorry for the repetition, but I forgot one very important thing in the subject: the word BUG.)

xy = rand(10,2);
[r,p]=corrcoef(xy)

r =
    1.0000 0.3724
    0.3724 1.0000
p =

    1.0000 0.2892
    0.2892 1.0000


[r,p]=corrcoef(xy,'alpha',0.5)
r =
    1.0000 0.3724
    0.3724 1.0000
p =
    1.0000 0.2892
    0.2892 1.0000

Subject: Old bug in corrcoef not yet fixed

From: Jiro Doke

Date: 8 Jan, 2009 02:14:02

Message: 2 of 7

"Pasco Alquim" <pasquimm@yahoo.com> wrote in message <gk3k0q$hko$1@fred.mathworks.com>...
> Help info of corrcoef says that
>
> % 'alpha' A number between 0 and 1 to specify a confidence
> % level of 100*(1-ALPHA)%. Default is 0.05 for 95%
> % confidence intervals.
>
> However, the 'alpha' value is simply ignored (as far back as R13), as the example below shows.
> (Sorry for the repetition, but I forgot one very important thing in the subject: the word BUG.)
>
> xy = rand(10,2);
> [r,p]=corrcoef(xy)
>
> r =
> 1.0000 0.3724
> 0.3724 1.0000
> p =
>
> 1.0000 0.2892
> 0.2892 1.0000
>
>
> [r,p]=corrcoef(xy,'alpha',0.5)
> r =
> 1.0000 0.3724
> 0.3724 1.0000
> p =
> 1.0000 0.2892
> 0.2892 1.0000

I believe you need to compare the 3rd and 4th output argument:

[r,p,rlo,rup]=corrcoef(xy)

[r,p,rlo,rup]=corrcoef(xy,'alpha',0.5)
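
A minimal sketch of that comparison follows. The fixed seed and the numbered variable names are assumptions added for repeatability, not part of the thread, and rng needs R2011a or later (on older releases, drop that line).

```matlab
% Sketch: check whether 'alpha' changes r and p, or only the bounds.
rng(0);                  % fixed seed (assumption, for repeatable numbers)
xy = rand(10,2);

[r1,p1,rlo1,rup1] = corrcoef(xy);               % default alpha = 0.05 (95% bounds)
[r2,p2,rlo2,rup2] = corrcoef(xy,'alpha',0.5);   % alpha = 0.5 (50% bounds)

sameR = isequal(r1,r2)   % true if r does not depend on 'alpha'
sameP = isequal(p1,p2)   % true if p does not depend on 'alpha'

% Width of the interval for the off-diagonal correlation at each level.
width95 = rup1(1,2) - rlo1(1,2)
width50 = rup2(1,2) - rlo2(1,2)
```

If 'alpha' is honored, width95 should come out larger than width50 while r and p stay the same.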

Subject: Old bug in corrcoef not yet fixed

From: Roger Stafford

Date: 8 Jan, 2009 02:35:03

Message: 3 of 7

"Pasco Alquim" <pasquimm@yahoo.com> wrote in message <gk3k0q$hko$1@fred.mathworks.com>...
> Help info of corrcoef says that
>
> % 'alpha' A number between 0 and 1 to specify a confidence
> % level of 100*(1-ALPHA)%. Default is 0.05 for 95%
> % confidence intervals.
>
> However, the 'alpha' value is simply ignored (as far back as R13), as the example below shows.
> (Sorry for the repetition, but I forgot one very important thing in the subject: the word BUG.)
>
> xy = rand(10,2);
> [r,p]=corrcoef(xy)
>
> r =
> 1.0000 0.3724
> 0.3724 1.0000
> p =
>
> 1.0000 0.2892
> 0.2892 1.0000
>
>
> [r,p]=corrcoef(xy,'alpha',0.5)
> r =
> 1.0000 0.3724
> 0.3724 1.0000
> p =
> 1.0000 0.2892
> 0.2892 1.0000

  As I understand it, the confidence level set with the 'alpha' parameter is used in computing the confidence interval bounds, 'rlo' and 'rup', returned in the third and fourth output values of 'corrcoef':

 [r,p,rlo,rup] = corrcoef(...)

It has nothing to do with the 'r' and 'p' values returned. In particular the 'p' values, which are based on a statistical normality assumption about the data, do not involve a confidence level setting. To quote Mathworks' documentation, "Each p-value is the probability of getting a correlation as large as the observed value by random chance, when the true correlation is zero." You don't need a confidence level to make such a probability assessment, just a reference to an assumed probability distribution.

  To show any differences between the default 'alpha' value .05 and the .5 value you entered, you would have to display these confidence interval bounds. Do you detect any effect there?

Roger Stafford

Subject: Old bug in corrcoef not yet fixed

From: Pasco Alquim

Date: 8 Jan, 2009 03:43:02

Message: 4 of 7

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message
> As I understand it, the confidence level set with the 'alpha' parameter is used in computing the confidence interval bounds, 'rlo' and 'rup', returned in the third and fourth output values of 'corrcoef':
>
> [r,p,rlo,rup] = corrcoef(...)
>
> It has nothing to do with the 'r' and 'p' values returned. In particular the 'p' values, which are based on a statistical normality assumption about the data, do not involve a confidence level setting. To quote Mathworks' documentation, "Each p-value is the probability of getting a correlation as large as the observed value by random chance, when the true correlation is zero." You don't need a confidence level to make such a probability assessment, just a reference to an assumed probability distribution.

-----------------
Sorry, but what I see in the docs is

[...]=corrcoef(...,'param1',val1,'param2',val2,...) specifies additional parameters and their values. Valid parameters are the following.
  'alpha' A number between 0 and 1 to specify a confidence level of 100*(1 - alpha)%. Default is 0.05 for 95% confidence intervals.

There is no mention that, when 'alpha' is provided, one must request 4 output arguments.

And why should p be the probability for the 95% confidence only? That is in fact what it does.

We can confirm that by hacking the code and changing the value of the variable rv before it calls tpvalue. We can then use the values from the table at
http://physics.mercer.edu/Younce/pearson.html
to confirm that what is returned is the 95% confidence value.

Why the preference for 95%?
Why not let the user choose the preferred confidence level, as the documentation suggests?

Subject: Old bug in corrcoef not yet fixed

From: Peter Perkins

Date: 8 Jan, 2009 07:19:49

Message: 5 of 7

Pasco Alquim wrote:

> Sorry, but what I see in the docs is
>
> [...]=corrcoef(...,'param1',val1,'param2',val2,...) specifies additional parameters and their values. Valid parameters are the following.
> 'alpha' A number between 0 and 1 to specify a confidence level of 100*(1 - alpha)%. Default is 0.05 for 95% confidence intervals.
>
> There is no mention that, when 'alpha' is provided, one must request 4 output arguments.

What exactly would you expect the confidence level argument to _do_ if you do not compute the confidence bounds?

> And why should p be the probability for the 95% confidence only? That is in fact what it does.

You seem to have a fundamental misunderstanding of what a p-value is. Choosing a confidence level _in advance_ for a confidence interval has absolutely nothing to do with a p-value, which is the probability of an observed event given an assumed null model. At best, you could describe the p-value as the smallest significance level at which the null hypothesis "H0: correlation = 0" would have been rejected.

The second output argument from corrcoef is exactly what the help describes it to be:

"Each p-value is the probability of getting a correlation as large as the observed value by random chance, when the true correlation is zero."

Subject: Old bug in corrcoef not yet fixed

From: Roger Stafford

Date: 8 Jan, 2009 07:28:06

Message: 6 of 7

"Pasco Alquim" <pasquimm@yahoo.com> wrote in message <gk3so5$h1e$1@fred.mathworks.com>...
> Sorry, but what I see in the docs is
>
> [...]=corrcoef(...,'param1',val1,'param2',val2,...) specifies additional parameters and their values. Valid parameters are the following.
> 'alpha' A number between 0 and 1 to specify a confidence level of 100*(1 - alpha)%. Default is 0.05 for 95% confidence intervals.
>
> There is no mention that, when 'alpha' is provided, one must request 4 output arguments.
>
> And why should p be the probability for the 95% confidence only? That is in fact what it does.
>
> We can confirm that by hacking the code and changing the value of the variable rv before it calls tpvalue. We can then use the values from the table at
> http://physics.mercer.edu/Younce/pearson.html
> to confirm that what is returned is the 95% confidence value.
>
> Why the preference for 95%?
> Why not let the user choose the preferred confidence level, as the documentation suggests?
--------------
  I strongly disagree with what you are saying about the documentation, Pasco. Where it describes the 'p' output, there is no mention of any confidence level except the indirect reference, "If P(i,j) is small, say less than 0.05, then the correlation R(i,j) is significant," and this is simply a conclusion that is left for the user to draw and not a statement that p depends upon a confidence level input. They state further that "The p-value is computed by transforming the correlation to create a t statistic having n-2 degrees of freedom, where n is the number of rows of X." There is nothing in such a computation that involves any notion of a confidence level. Your statement "why should p be the probability for the 95% confidence only?" is just plain wrong; 'p' has nothing to do with a confidence level, 95% or otherwise.
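
  The computation quoted from the documentation can be checked by hand. The sketch below is illustrative, not the corrcoef source: it plugs in r = 0.3724 and n = 10 from the first post, forms the t statistic with n-2 degrees of freedom, and takes the two-sided tail probability from base MATLAB's betainc (an assumed stand-in for whatever corrcoef calls internally). The names t, v, and pManual are made up for the example. No 'alpha' appears anywhere in it.

```matlab
% Sketch: p-value for "true correlation is zero", from r and n alone.
r = 0.3724;              % off-diagonal correlation from the first post
n = 10;                  % number of rows in xy = rand(10,2)

v = n - 2;                        % degrees of freedom
t = r * sqrt(v / (1 - r^2));      % t statistic derived from the correlation

% Two-sided tail probability of Student's t with v degrees of freedom,
% written with the regularized incomplete beta function.
pManual = betainc(v / (v + t^2), v/2, 1/2)
```

If the two-sided reading is right, pManual should land close to the 0.2892 shown in the first post.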

  It is only when the specific confidence bounds outputs 'rlo' and 'rup' are called for that a confidence level is needed. If you don't specify 'alpha', they assume it is .05 for 95% confidence as default. I refer you to a couple of websites below for an explanation of the procedure involved in deducing these confidence bounds from each correlation value and, given the 'alpha' level, the appropriate values for 'rlo' and 'rup'. This is the only place the 'alpha' quantity input is used. The 'p' quantities are not affected by its setting. You can do your own independent calculation using the Fisher transform and its inverse to see if Mathworks does it right. The web sites are located at:

http://en.wikipedia.org/wiki/Fisher_transformation

http://www.stat.psu.edu/online/development/stat505/06_propmean/06_propmean_print.html

I am sure there are a great many more sites that give a good explanation of this procedure. Use a Google search for something like "Fisher Transform" and "confidence levels".

Roger Stafford
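
The independent calculation suggested above might look like the sketch below. It assumes the usual Fisher z approximation with standard error 1/sqrt(n-3); whether corrcoef uses exactly this is not confirmed in the thread. The helper name fisherCI is hypothetical, and r and n are taken from the first post.

```matlab
% Sketch: confidence bounds for a correlation via the Fisher transform.
r = 0.3724;              % correlation from the first post
n = 10;                  % sample size

% Normal quantile z(1 - alpha/2), written with erfinv so that no toolbox
% is needed, then mapped back from z space with tanh.
fisherCI = @(r,n,alpha) tanh(atanh(r) + [-1 1] * sqrt(2)*erfinv(1 - alpha) / sqrt(n - 3));

ci95 = fisherCI(r, n, 0.05)   % [lower upper] at the default alpha
ci50 = fisherCI(r, n, 0.5)    % [lower upper] at alpha = 0.5
```

These can be compared with rlo(1,2) and rup(1,2) from [r,p,rlo,rup] = corrcoef(xy,'alpha',...) on the same data. The 50% interval should sit inside the 95% one.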

Subject: Old bug in corrcoef not yet fixed

From: Pasco Alquim

Date: 8 Jan, 2009 19:16:03

Message: 7 of 7

Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com> wrote in message <gk49el$784$1@fred.mathworks.com>...

> You seem to have a fundamental misunderstanding of what a p-value is.

Sadly (for me) that was an important part of the problem.
Thanks (not forgetting Roger) for the help.
