Path: news.mathworks.com!newsfeed-00.mathworks.com!newsfeed2.dallas1.level3.net!news.level3.com!postnews.google.com!v36g2000yqv.googlegroups.com!not-for-mail
From: arun <aragorn168b@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: p-values
Date: Tue, 3 Nov 2009 09:20:00 -0800 (PST)
Organization: http://groups.google.com
Lines: 46
Message-ID: <68f39d38-3dd5-4bce-a514-3950873fbccd@v36g2000yqv.googlegroups.com>
References: <f5889743-7afb-4534-ae77-66c0026c0ce2@c3g2000yqd.googlegroups.com> 
	<hcnlmp$o19$1@fred.mathworks.com>
NNTP-Posting-Host: 192.124.26.250
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: posting.google.com 1257268800 9773 127.0.0.1 (3 Nov 2009 17:20:00 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Tue, 3 Nov 2009 17:20:00 +0000 (UTC)
Complaints-To: groups-abuse@google.com
Injection-Info: v36g2000yqv.googlegroups.com; posting-host=192.124.26.250; 
	posting-account=fyqXpgoAAABqt-0BifyaNxmZhzggFACu
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; 
	rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4,gzip(gfe),gzip(gfe)
Xref: news.mathworks.com comp.soft-sys.matlab:582100


On Nov 2, 11:15 pm, "Tom Lane" <tl...@mathworks.com> wrote:
> > I have 2 random variables x and y. I calculate the correlation between
> > x and y. Then I permute y 1000 times and compute the correlation
> > each time between x and permuted y (bootstrap approach). Could anyone
> > suggest how to compute the p-values from this?
>
> Well, the corr (Statistics Toolbox) and corrcoef (MATLAB) functions will
> compute p-values for you:
>
> >> x = randn(10,1);
> >> y = .6*x + randn(size(x));
> >> [r,p] = corr(x,y)
>
> r =
>     0.7564
> p =
>     0.0114
>
> But if you want to do this by simulation, notice that if the y values are
> permuted randomly, there should be no correlation with x. This gives you a
> random set of sample correlations with a distribution under the null
> hypothesis of no correlation. You could just see what proportion of them
> exceed the actual correlation you measured for your data:
>
> >> rv = zeros(1000,1);
> >> for j=1:1000; rv(j) = corr(x,y(randperm(numel(y)))); end
> >> mean(abs(rv)>.7564)
>
> ans =
>     0.0110
>
> -- Tom

Hi Tom,
I understand. I have another question that goes a little deeper.
Suppose I have two vectors x1 and x2 and another vector y. If x1 and
x2 are independent of each other (meaning corr(x1,x2) = 0, say), then
I can find the correlation between each of my so-called "features" x1
and x2 and the "label" y separately in a straightforward fashion.
However, my question is how to find the correlation if x1 and x2 are
in fact dependent on each other. Wouldn't the correlations calculated
as corr(x1,y) and corr(x2,y) be biased or incorrect in that case?
thank you,
best, arun.
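
A minimal sketch of the situation described above, using synthetic data.
It is not from the thread: the sample size, the coefficients 0.8 and 0.6,
and the choice to make y depend on x1 only are all assumptions made up
for illustration, and rng requires a reasonably recent MATLAB release.
corr(x1,y) and corr(x2,y) are still valid estimates of the marginal
correlations, but when x1 and x2 are dependent each one also picks up
the effect of the other feature. One common way to separate the two is
the partial correlation, available as partialcorr in Statistics Toolbox.

```matlab
% Synthetic data: x2 is built from x1, so the two "features" are correlated.
% All sizes and coefficients here are assumptions chosen for illustration.
rng(0);                           % fix the seed so runs are repeatable
n  = 200;
x1 = randn(n,1);
x2 = 0.8*x1 + 0.6*randn(n,1);     % population corr(x1,x2) is 0.8
y  = x1 + randn(n,1);             % y depends on x1 only

% Marginal correlations: each one ignores the other feature.
r1 = corr(x1,y);
r2 = corr(x2,y);                  % nonzero only because x2 tracks x1

% Partial correlations: association with y after removing the linear
% effect of the other feature (partialcorr is in Statistics Toolbox).
[pr1,pp1] = partialcorr(x1,y,x2);
[pr2,pp2] = partialcorr(x2,y,x1);

fprintf('corr(x1,y) = %.3f   partial given x2 = %.3f (p = %.3g)\n', r1, pr1, pp1);
fprintf('corr(x2,y) = %.3f   partial given x1 = %.3f (p = %.3g)\n', r2, pr2, pp2);
```

In this construction the marginal corr(x2,y) should come out clearly
nonzero while the partial correlation of x2 with y given x1 should be
close to zero, which is the distinction the question is getting at.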