Code covered by the BSD License  

Highlights from
Fisher’s Exact Test

2.0

2.0 | 1 rating Rate this file 34 Downloads (last 30 days) File Size: 6.01 KB File ID: #22550

Fisher’s Exact Test

by Michael Boedigheimer

 

29 Dec 2008 (Updated 19 Apr 2012)

Hypergeometric cumulative distribution (left, right and two-tailed) including permutation testing.

| Watch this File

File Information
Description

Matlab's implementation of Fisher's exact test only supports one-tailed tests. Other implementations on the file exchange are reportedly buggy and are far too slow for problems I'm working on. This implementation does 2.5e6 one or two-tailed tests per second and is ideally suited for genome-wide association studies or other large problems where 100s of samples and 10s to 100s of 1000s of tests are being performed.
It has been verified correct for samples sizes up to 100 and spot checked beyond that up to 1000s of samples.

MATLAB release MATLAB 7.11 (2010b)
Other requirements Requires a large amount of memory for genome wide assocation studies (>=4Gb)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (7)
13 Jan 2011 James Herman

I'm really confused as to how to use this function. The help is not clear at all :-(

I've got a contingency table, but it's entirely unclear how I should translate it into the variables: a,M,K,N.

Please help!

16 Jun 2011 Tobias

I second James Herman's comment. It would be great if someone could explain how to input a contingency table into this function.

16 Jun 2011 Michael Boedigheimer

fexact can use vector inputs and it will calculate contingency tables for you, or it can take the same input as Matlab's hygecdf. For your convenience, from fexact's help, here is a 2x2 contingency table, with a,b,c and d being integer counts
a c
b d

K = a+c;
N = a+b;
M = a+b+c+d

then use fexact( a,M,K,N)
enjoy

17 Jan 2012 Giuseppe Cardillo

I compared your function with the mine using this matrix:
x=[7 12; 8 2].*1000000

Using Myfisher22(x), the function written by me, the result was:
Elapsed time is 5.715080 seconds.

Using your
fexact(x(1),sum(x(:)),x(1)+x(3),x(1)+x(2)) the results were:

??? Maximum variable size allowed by the program is exceeded.

Error in ==> fexact>tri2sqind at 325
    k = (1:max_k)';
Error in ==> fexact>getLookup at 211
[k n ind] = tri2sqind(M);
Error in ==> fexact at 142
    P = getLookup( M,N,tail);

Same results reducing the complexity to
x=[7 12; 8 2].*100000 and
x=[7 12; 8 2].*10000

When I tried on x=[7 12; 8 2].*1000 the pc looped and after 15 minutes I switched it off.

Finally I tried it on x=[7 12; 8 2].*100
Elapsed time is 6.019058 seconds; using Myfisher22 on this "simple" matrix
Elapsed time is 0.008264 seconds.

So before to write "Other implementations on the file exchange are reportedly buggy and are far too slow for problems I'm working on."
I think that you should test deeper your function.

Regards

17 Jan 2012 Michael Boedigheimer

Giuseppe
I am sorry you are having difficulty with the function. My function is very well tested and as far as I can tell works as described. The problem I am working on is detailed in the documentation of the function. I have hundreds of samples and 10s of thousands or more replicates. In the example you give there are many more samples than that. In your smallest example you have 2900. In that case solving for 1 replicate takes 3s on my computer, but so does solving 10,000 replicates.
y = unidrnd(2,2900,1)-1;
x = unidrnd(2, 2900, 1)-1;
Elapsed time is 2.984087 seconds.
>> x = unidrnd(2, 2900, 10000)-1;
>> tic; z = fexact(x,y); toc
Elapsed time is 2.947621 seconds.

I mean no disrespect to you or your many contributions. I just didn't see how your program can be used to solve the problem I just described.

18 Apr 2012 Vince

Thank you for very much contributing this function to the community! I'd like to report some cases I've found that are giving nonsensical p-values through 2 examples:
- Case 1:
>> a = 8, M = 24, K = 24, N = 8;
>> fexact( a, M, K, N, 'tail', 'b' )

ans =

    2.0000

>> fexact( a, M, K, N, 'tail', 'l' )

ans =

  256.0000

>> fexact( a, M, K, N, 'tail', 'r' )

ans =

     1

- Case 2:
>> a = 6, M = 24, K = 18, N = 8
>> fexact( a, M, K, N, 'tail', 'b' )

ans =

    1.0001

>> fexact( a, M, K, N, 'tail', 'r' )

ans =

    0.6977

>> fexact( a, M, K, N, 'tail', 'l' )

ans =

    0.6809

Kind Regards,

Vince

18 Apr 2012 Michael Boedigheimer

Vince,

Thank you for showing these examples. I understand how you could lose confidence when a p-value is larger than 1. I have reviewed the procedure and believe you can safely round the p-value down to 1. I have or will have submitted a new version of the code that will no longer return values greater than 1. The problem should affect only the scalar version of the routines and not the vectorized version. I believe all the p-values in your examples that were less than one are correct. Please let me know if you have other questions or concerns.

Mike

Please login to add a comment or rating.
Updates
23 Dec 2011

Improved documentation and help. Improved performance for single tests.

17 Jan 2012

Now can handle large number of observations in a single trial with minimal footprint.
If X = [7.9 7.911; 8 8]*1e6
fexact(x(1),sum(x(:)),x(1)+x(3),x(1)+x(2)) returns 0.0498 for a two-sided test;

17 Jan 2012

Has been updated to work with a large sample size if performing a single test.

19 Apr 2012

Fixed a reporting error that may affect results when a single test is done. The error that allowed p-values to be larger than 1 to be reported. p-values smaller than one should not be affected.

19 Apr 2012

fixed bug in detecting error conditions

Tag Activity for this File
Tag Applied By Date/Time
probability Cristina McIntire 30 Dec 2008 15:38:17
hypergeometric Michael Boedigheimer 30 Dec 2008 15:38:31
genetics Michael Boedigheimer 30 Dec 2008 15:38:31
biotech Michael Boedigheimer 30 Dec 2008 15:38:31
statistics Michael Boedigheimer 30 Dec 2008 15:38:31
statistics Zhen Qian 04 Jun 2009 12:11:32
fisher exact Michael Boedigheimer 27 Dec 2011 08:41:18
contingency Michael Boedigheimer 27 Dec 2011 08:41:18
hypergeometric John 07 May 2012 15:26:32

Contact us at files@mathworks.com