<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264743</link>
    <title>MATLAB Central Newsreader - p-values</title>
    <description>Feed for thread: p-values</description>
    <language>en-us</language>
    <copyright>&amp;copy;1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Mon, 02 Nov 2009 12:16:27 -0500</pubDate>
      <title>p-values</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264743#691466</link>
      <author>arun</author>
      <description>Hi,&lt;br&gt;
&lt;br&gt;
I have two random variables x and y. I calculate the correlation between&lt;br&gt;
x and y. Then I permute y 1000 times and compute the correlation&lt;br&gt;
each time between x and the permuted y (a permutation, or resampling, approach). Could anyone&lt;br&gt;
suggest how to compute p-values from this?</description>
    </item>
    <item>
      <pubDate>Mon, 02 Nov 2009 22:15:53 -0500</pubDate>
      <title>Re: p-values</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264743#691593</link>
      <author>Tom Lane</author>
      <description>&amp;gt; I have 2 random variables x and y. I calculate the correlation between&lt;br&gt;
&amp;gt; x and y. Then I permute y a 1000 times and compute the correlation&lt;br&gt;
&amp;gt; each time between x and permuted y (bootstrap approach). Could anyone&lt;br&gt;
&amp;gt; suggest how to compute the p-values from this??&lt;br&gt;
&lt;br&gt;
Well, the corr (Statistics Toolbox) and corrcoef (MATLAB) functions will &lt;br&gt;
compute p-values for you:&lt;br&gt;
&lt;br&gt;
&amp;gt;&amp;gt; x = randn(10,1);&lt;br&gt;
&amp;gt;&amp;gt; y = .6*x + randn(size(x));&lt;br&gt;
&amp;gt;&amp;gt; [r,p] = corr(x,y)&lt;br&gt;
r =&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.7564&lt;br&gt;
p =&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.0114&lt;br&gt;
&lt;br&gt;
But if you want to do this by simulation, notice that if the y values are &lt;br&gt;
permuted randomly, there should be no correlation with x. This gives you a &lt;br&gt;
random set of sample correlations with a distribution under the null &lt;br&gt;
hypothesis of no correlation. You could just see what proportion of them &lt;br&gt;
exceed, in absolute value, the actual correlation you measured for your data:&lt;br&gt;
&lt;br&gt;
&amp;gt;&amp;gt; rv = zeros(1000,1);&lt;br&gt;
&amp;gt;&amp;gt; for j=1:1000; rv(j) = corr(x,y(randperm(numel(y)))); end&lt;br&gt;
&amp;gt;&amp;gt; mean(abs(rv)&amp;gt;abs(r))&lt;br&gt;
ans =&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0.0110&lt;br&gt;
&lt;br&gt;
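A self-contained sketch of the simulation described above, assuming base MATLAB only (corrcoef in place of corr) and R2011a or later for rng. The names rObs, rPerm, nPerm and pPerm are illustrative, and the +1 terms are an added convention that is not part of the commands above; they keep the estimate from being exactly zero.&lt;br&gt;
&lt;br&gt;
```matlab
% Permutation p-value for a correlation, base MATLAB only.
rng(0);                                   % fixed seed, for repeatability
x = randn(10,1);
y = 0.6*x + randn(size(x));

c = corrcoef(x, y);
rObs = c(1,2);                            % observed correlation

nPerm = 1000;                             % number of permutations
rPerm = zeros(nPerm,1);
for j = 1:nPerm
    cj = corrcoef(x, y(randperm(numel(y))));
    rPerm(j) = cj(1,2);                   % correlation under the null
end

% Two-sided p-value: share of permuted |r| at least as large as observed.
% The +1 terms count the observed arrangement itself (an added convention).
pPerm = (sum(abs(rPerm) >= abs(rObs)) + 1) / (nPerm + 1);
fprintf('r = %.4f, permutation p = %.4f\n', rObs, pPerm);
```
&lt;br&gt;
With only 1000 permutations the estimate is itself noisy, so it will vary a little from run to run when the seed is not fixed.&lt;br&gt;
&lt;br&gt;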
-- Tom </description>
    </item>
    <item>
      <pubDate>Tue, 03 Nov 2009 17:19:23 -0500</pubDate>
      <title>Re: p-values</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264743#691770</link>
      <author>arun</author>
      <description>On Nov 2, 11:15&#160;pm, &quot;Tom Lane&quot; &amp;lt;tl...@mathworks.com&amp;gt; wrote:&lt;br&gt;
&amp;gt; &amp;gt; I have 2 random variables x and y. I calculate the correlation between&lt;br&gt;
&amp;gt; &amp;gt; x and y. Then I permute y a 1000 times and compute the correlation&lt;br&gt;
&amp;gt; &amp;gt; each time between x and permuted y (bootstrap approach). Could anyone&lt;br&gt;
&amp;gt; &amp;gt; suggest how to compute the p-values from this??&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Well, the corr (Statistics Toolbox) and corrcoef (MATLAB) functions will&lt;br&gt;
&amp;gt; compute p-values for you:&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; x = randn(10,1);&lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; y = .6*x + randn(size(x));&lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; [r,p] = corr(x,y)&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; r =&lt;br&gt;
&amp;gt; &#160; &#160; 0.7564&lt;br&gt;
&amp;gt; p =&lt;br&gt;
&amp;gt; &#160; &#160; 0.0114&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; But if you want to do this by simulation, notice that if the y values are&lt;br&gt;
&amp;gt; permuted randomly, there should be no correlation with x. This gives you a&lt;br&gt;
&amp;gt; random set of sample correlations with a distribution under the null&lt;br&gt;
&amp;gt; hypothesis of no correlation. You could just see what proportion of them&lt;br&gt;
&amp;gt; exceed the actual correlation you measured for your data:&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; rv = zeros(1000,1);&lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; for j=1:1000; rv(j) = corr(x,y(randperm(numel(y)))); end&lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; mean(abs(rv)&amp;gt;.7564)&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; ans =&lt;br&gt;
&amp;gt; &#160; &#160; 0.0110&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; -- Tom&lt;br&gt;
&lt;br&gt;
Hi Tom,&lt;br&gt;
I understand. I have another question that goes a little deeper&lt;br&gt;
than this. Suppose I have two vectors x1 and x2 and another vector y.&lt;br&gt;
If x1 and x2 are independent of each other (meaning corr(x1,x2) =&lt;br&gt;
0, say), then I could find the correlation between each of my so-called&lt;br&gt;
&quot;features&quot; x1 and x2 and the &quot;label&quot; y separately in a straightforward&lt;br&gt;
fashion. However, my question is how to find the correlation if x1 and&lt;br&gt;
x2 do depend on each other. Wouldn't the correlation&lt;br&gt;
measures calculated as corr(x1,y) and corr(x2,y) be biased&lt;br&gt;
or incorrect in this case?&lt;br&gt;
Thank you,&lt;br&gt;
best, arun.</description>
    </item>
    <item>
      <pubDate>Tue, 03 Nov 2009 18:04:41 -0500</pubDate>
      <title>Re: p-values</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264743#691782</link>
      <author>Tom Lane</author>
      <description>&amp;gt; I understand. I have another question which is a little more deeper&lt;br&gt;
&amp;gt; than this. Suppose I have two vectors x1 and x2 and another vector y,&lt;br&gt;
&amp;gt; now if x1 and x2 are independent of each other, (meaning corr(x,y) =&lt;br&gt;
&amp;gt; 0, say), then I could find the correlation between my so called&lt;br&gt;
&amp;gt; &quot;features&quot; x1 and x2 and &quot;label&quot; y separately in a straightforward&lt;br&gt;
&amp;gt; fashion. However, my question is how to find the correlation if x1 and&lt;br&gt;
&amp;gt; x2 are indeed dependent on each other. Wouldn't the correlation&lt;br&gt;
&amp;gt; measure in this case calculated as corr(x1,y) and corr(x2,y) be biased&lt;br&gt;
&amp;gt; or incorrect in this case??&lt;br&gt;
&lt;br&gt;
Arun, I don't think I understand your concern.&lt;br&gt;
&lt;br&gt;
Suppose you are interested in corr(x1,y). I could always generate another x2 &lt;br&gt;
that is either correlated with x1 or not. How would my doing that cause your &lt;br&gt;
correlation to become biased?&lt;br&gt;
&lt;br&gt;
There is a notion of multiple correlation. Its squared value is the R^2 &lt;br&gt;
statistic for a regression. It measures the correlation between y and the &lt;br&gt;
linear combination of the x's obtained by regressing y on the x's.&lt;br&gt;
&lt;br&gt;
There's also the notion of the partial correlation, where you measure the &lt;br&gt;
correlation between two variables after &quot;removing&quot; the effect of another &lt;br&gt;
variable.&lt;br&gt;
&lt;br&gt;
I'm not sure if these two things are related to your concern, though.&lt;br&gt;
&lt;br&gt;
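Both notions can be sketched with ordinary least squares, assuming base MATLAB only and R2011a or later for rng. The data are synthetic and every variable name is illustrative. If the Statistics Toolbox is available, partialcorr(x1,y,x2) should give the same partial correlation.&lt;br&gt;
&lt;br&gt;
```matlab
% Multiple and partial correlation via least squares, base MATLAB only.
rng(1);                                   % fixed seed, for repeatability
n  = 200;
x2 = randn(n,1);
x1 = 0.8*x2 + 0.6*randn(n,1);             % x1 and x2 are correlated
y  = x1 + 0.5*x2 + randn(n,1);

% Multiple correlation: correlate y with its fitted values.
X    = [ones(n,1) x1 x2];
yhat = X * (X \ y);
c    = corrcoef(y, yhat);
R2   = c(1,2)^2;                          % R^2 of the regression

% Partial correlation of x1 and y given x2: correlate the residuals
% left after regressing each on x2 (with an intercept).
Z  = [ones(n,1) x2];
e1 = x1 - Z * (Z \ x1);
ey = y  - Z * (Z \ y);
c  = corrcoef(e1, ey);
rPartial = c(1,2);
fprintf('R^2 = %.3f, partial r(x1,y | x2) = %.3f\n', R2, rPartial);
```
&lt;br&gt;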
-- Tom </description>
    </item>
    <item>
      <pubDate>Sat, 07 Nov 2009 14:26:29 -0500</pubDate>
      <title>Re: p-values</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264743#692891</link>
      <author>arun</author>
      <description>On Nov 3, 7:04&#160;pm, &quot;Tom Lane&quot; &amp;lt;tl...@mathworks.com&amp;gt; wrote:&lt;br&gt;
&amp;gt; &amp;gt; I understand. I have another question which is a little more deeper&lt;br&gt;
&amp;gt; &amp;gt; than this. Suppose I have two vectors x1 and x2 and another vector y,&lt;br&gt;
&amp;gt; &amp;gt; now if x1 and x2 are independent of each other, (meaning corr(x,y) =&lt;br&gt;
&amp;gt; &amp;gt; 0, say), then I could find the correlation between my so called&lt;br&gt;
&amp;gt; &amp;gt; &quot;features&quot; x1 and x2 and &quot;label&quot; y separately in a straightforward&lt;br&gt;
&amp;gt; &amp;gt; fashion. However, my question is how to find the correlation if x1 and&lt;br&gt;
&amp;gt; &amp;gt; x2 are indeed dependent on each other. Wouldn't the correlation&lt;br&gt;
&amp;gt; &amp;gt; measure in this case calculated as corr(x1,y) and corr(x2,y) be biased&lt;br&gt;
&amp;gt; &amp;gt; or incorrect in this case??&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Arun, I don't think I understand your concern.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Suppose you are interested in corr(x1,y). I could always generate another x2&lt;br&gt;
&amp;gt; that is either correlated with x1 or not. How would my doing that cause your&lt;br&gt;
&amp;gt; correlation to become biased?&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; There is a notion of multiple correlation. Its squared value is the R^2&lt;br&gt;
&amp;gt; statistic for a regression. It measures the correlation between y and the&lt;br&gt;
&amp;gt; linear combination of the x's obtained by regressing y on the x's.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; There's also the notion of the partial correlation, where you measure the&lt;br&gt;
&amp;gt; correlation between two variables after &quot;removing&quot; the effect of another&lt;br&gt;
&amp;gt; variable.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I'm not sure if these two things are related to your concern, though.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; -- Tom&lt;br&gt;
&lt;br&gt;
Hi Tom,&lt;br&gt;
Thank you once again. I am sorry for replying late. I will try to&lt;br&gt;
explain my problem in detail.&lt;br&gt;
&lt;br&gt;
I have a set of random variables [x1,x2,...,xn], each of which is an m*1&lt;br&gt;
column vector. Let's call them *features*. I have another random&lt;br&gt;
variable y (an m*1 column vector), which we call the *label*. My idea is to&lt;br&gt;
find out which of these features (1...n) is most strongly associated with&lt;br&gt;
y. That is, I would like to find, say, the top 10 most significant&lt;br&gt;
features.&lt;br&gt;
Let me explain what these features are. Taken together, the features&lt;br&gt;
form an m*n matrix. Each *ROW* represents, say, a&lt;br&gt;
student from a particular region who is or isn't affected by a&lt;br&gt;
particular disorder (some students might be from very close regions, from&lt;br&gt;
the same place, or from very different places). For each student, the n&lt;br&gt;
entries are the nucleotides (A, C, G or T)&lt;br&gt;
at the most important locations on the chromosome. These&lt;br&gt;
are the potential locations of interest in all these students for this&lt;br&gt;
particular disorder. Each entry of the label denotes the corresponding&lt;br&gt;
outcome, that is, whether the student has this disorder or not.&lt;br&gt;
Each column therefore represents one particular chromosome location&lt;br&gt;
across all the students. So, basically, my idea is to find&lt;br&gt;
which chromosome location is most responsible for the disorder.&lt;br&gt;
&lt;br&gt;
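The ranking described above can be sketched as follows, assuming base MATLAB only (R2011a or later for rng) and assuming each location has already been coded numerically, here as a hypothetical 0/1 value. The sizes, the synthetic label and all variable names are made up for illustration. The sketch does not address the dependence between subjects raised below.&lt;br&gt;
&lt;br&gt;
```matlab
% Rank features by absolute correlation with the label, base MATLAB only.
rng(2);                                    % fixed seed, for repeatability
m = 100;  n = 50;                          % m subjects, n features (made-up sizes)
F = randi([0 1], m, n);                    % hypothetical 0/1 coding of each location
y = double(F(:,7) + 0.5*randn(m,1) > 0.5); % synthetic label driven by feature 7

r = zeros(n,1);
for k = 1:n
    c = corrcoef(F(:,k), y);
    r(k) = c(1,2);                         % correlation of feature k with the label
end

[~, order] = sort(abs(r), 'descend');
top10 = order(1:10);                       % indices of the 10 strongest features
disp(top10');
```
&lt;br&gt;
A constant column would give a NaN correlation, which sort places first in descending order, so such columns are best dropped beforehand.&lt;br&gt;
&lt;br&gt;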
My problem is that the samples (students) come from different regions&lt;br&gt;
or from very close regions, which might make them somewhat dependent on each&lt;br&gt;
other. This leads to a population bias, so the&lt;br&gt;
dependency among the students has to be removed when checking which locations the&lt;br&gt;
disorder depends on. For example, if two students are brothers, then the&lt;br&gt;
disorder is likely to be present in both of them. I hope it's a bit&lt;br&gt;
clearer now?&lt;br&gt;
&lt;br&gt;
My question is: when I compute the correlation between the different chromosome&lt;br&gt;
locations and the disorder, how do I make sure that the dependency&lt;br&gt;
among the students (or subjects) is minimized?&lt;br&gt;
&lt;br&gt;
Thank you very much once again. I would appreciate it if you could&lt;br&gt;
provide me with some ideas. I have already read about partial correlation; maybe&lt;br&gt;
it is a bit close to what I seek...&lt;br&gt;
&lt;br&gt;
best, arun.</description>
    </item>
  </channel>
</rss>

