Code covered by the BSD License  

Highlights from
Fisher’s Exact Test

Be the first to rate this file! 56 Downloads (last 30 days) File Size: 5.82 KB File ID: #22550

Fisher’s Exact Test

by Michael Boedigheimer

 

29 Dec 2008 (Updated 17 Jan 2012)

Hypergeometric cumulative distribution (left, right and two-tailed) including permutation testing.

| Watch this File

File Information
Description

Matlab's implementation of Fisher's exact test only supports one-tailed tests. Other implementations on the file exchange are reportedly buggy and are far too slow for problems I'm working on. This implementation does 2.5e6 one or two-tailed tests per second and is ideally suited for genome-wide association studies or other large problems where 100s of samples and 10s to 100s of 1000s of tests are being performed.
It has been verified correct for samp les sizes up to 100 and spot checked beyond that up to 1000s of samples.

MATLAB release MATLAB 7.11 (2010b)
Other requirements Requires a large amount of memory for genome wide assocation studies (>=4Gb)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (5)
13 Jan 2011 James Herman

I'm really confused as to how to use this function. The help is not clear at all :-(

I've got a contingency table, but it's entirely unclear how I should translate it into the variables: a,M,K,N.

Please help!

16 Jun 2011 Tobias

I second James Herman's comment. It would be great if someone could explain how to input a contingency table into this function.

16 Jun 2011 Michael Boedigheimer

fexact can use vector inputs and it will calculate contingency tables for you, or it can take the same input as Matlab's hygecdf. For your convenience, from fexact's help, here is a 2x2 contingency table, with a,b,c and d being integer counts
a c
b d

K = a+c;
N = a+b;
M = a+b+c+d

then use fexact( a,M,K,N)
enjoy

17 Jan 2012 Giuseppe Cardillo

I compared your function with the mine using this matrix:
x=[7 12; 8 2].*1000000

Using Myfisher22(x), the function written by me, the result was:
Elapsed time is 5.715080 seconds.

Using your
fexact(x(1),sum(x(:)),x(1)+x(3),x(1)+x(2)) the results were:

??? Maximum variable size allowed by the program is exceeded.

Error in ==> fexact>tri2sqind at 325
    k = (1:max_k)';
Error in ==> fexact>getLookup at 211
[k n ind] = tri2sqind(M);
Error in ==> fexact at 142
    P = getLookup( M,N,tail);

Same results reducing the complexity to
x=[7 12; 8 2].*100000 and
x=[7 12; 8 2].*10000

When I tried on x=[7 12; 8 2].*1000 the pc looped and after 15 minutes I switched it off.

Finally I tried it on x=[7 12; 8 2].*100
Elapsed time is 6.019058 seconds; using Myfisher22 on this "simple" matrix
Elapsed time is 0.008264 seconds.

So before to write "Other implementations on the file exchange are reportedly buggy and are far too slow for problems I'm working on."
I think that you should test deeper your function.

Regards

17 Jan 2012 Michael Boedigheimer

Giuseppe
I am sorry you are having difficulty with the function. My function is very well tested and as far as I can tell works as described. The problem I am working on is detailed in the documentation of the function. I have hundreds of samples and 10s of thousands or more replicates. In the example you give there are many more samples than that. In your smallest example you have 2900. In that case solving for 1 replicate takes 3s on my computer, but so does solving 10,000 replicates.
y = unidrnd(2,2900,1)-1;
x = unidrnd(2, 2900, 1)-1;
Elapsed time is 2.984087 seconds.
>> x = unidrnd(2, 2900, 10000)-1;
>> tic; z = fexact(x,y); toc
Elapsed time is 2.947621 seconds.

I mean no disrespect to you or your many contributions. I just didn't see how your program can be used to solve the problem I just described.

Please login to add a comment or rating.
Updates
23 Dec 2011

Improved documentation and help. Improved performance for single tests.

17 Jan 2012

Now can handle large number of observations in a single trial with minimal footprint.
If X = [7.9 7.911; 8 8]*1e6
fexact(x(1),sum(x(:)),x(1)+x(3),x(1)+x(2)) returns 0.0498 for a two-sided test;

17 Jan 2012

Has been updated to work with a large sample size if performing a single test.

Tag Activity for this File
Tag Applied By Date/Time
probability Cristina McIntire 30 Dec 2008 15:38:17
hypergeometric Michael Boedigheimer 30 Dec 2008 15:38:31
genetics Michael Boedigheimer 30 Dec 2008 15:38:31
biotech Michael Boedigheimer 30 Dec 2008 15:38:31
statistics Michael Boedigheimer 30 Dec 2008 15:38:31
statistics Zhen Qian 04 Jun 2009 12:11:32
fisher exact Michael Boedigheimer 27 Dec 2011 08:41:18
contingency Michael Boedigheimer 27 Dec 2011 08:41:18

Contact us at files@mathworks.com