Code covered by the BSD License  

Highlights from
Fisher’s Exact Test

4.25

4.2 | 5 ratings Rate this file 70 Downloads (last 30 days) File Size: 6.77 KB File ID: #22550
image thumbnail

Fisher’s Exact Test

by

 

29 Dec 2008 (Updated )

Hypergeometric cumulative distribution (left, right and two-tailed) including permutation testing.

| Watch this File

File Information
Description

Matlab's implementation of Fisher's exact test only supports one-tailed tests. Other implementations on the file exchange are reportedly buggy and are far too slow for problems I'm working on. This implementation does 2.5e6 one or two-tailed tests per second and is ideally suited for genome-wide association studies or other large problems where 100s of samples and 10s to 100s of 1000s of tests are being performed.
It has been verified correct for samples sizes up to 100 and spot checked beyond that up to 1000s of samples.

MATLAB release MATLAB 8.2 (R2013b)
Other requirements Requires a large amount of memory for genome wide assocation studies (>=4Gb)
Tags for This File   Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (12)
14 Apr 2013 Nolan C  
01 Oct 2012 Yoshiaki

Thank you very much.
I will try.

28 Sep 2012 Michael Boedigheimer

Yosiaki. Sorry you are having difficulty. inputParser is a built-in Matlab function that was introduced around 2010. As stated in the information section of this function, the program was tested on Matlab 2010b. You will need to upgrade your Matlab software to a more recent version.

28 Sep 2012 Yoshiaki

I use Matlab (R2006b).
When I use fexact.m, an error message, "inputParser is not defined" is displayed.
Please tell me, how to settle the problem.

28 Jun 2012 Minli

very useful and well written. Thank you !

18 Apr 2012 Michael Boedigheimer

Vince,

Thank you for showing these examples. I understand how you could lose confidence when a p-value is larger than 1. I have reviewed the procedure and believe you can safely round the p-value down to 1. I have or will have submitted a new version of the code that will no longer return values greater than 1. The problem should affect only the scalar version of the routines and not the vectorized version. I believe all the p-values in your examples that were less than one are correct. Please let me know if you have other questions or concerns.

Mike

18 Apr 2012 Vince

Thank you for very much contributing this function to the community! I'd like to report some cases I've found that are giving nonsensical p-values through 2 examples:
- Case 1:
>> a = 8, M = 24, K = 24, N = 8;
>> fexact( a, M, K, N, 'tail', 'b' )

ans =

2.0000

>> fexact( a, M, K, N, 'tail', 'l' )

ans =

256.0000

>> fexact( a, M, K, N, 'tail', 'r' )

ans =

1

- Case 2:
>> a = 6, M = 24, K = 18, N = 8
>> fexact( a, M, K, N, 'tail', 'b' )

ans =

1.0001

>> fexact( a, M, K, N, 'tail', 'r' )

ans =

0.6977

>> fexact( a, M, K, N, 'tail', 'l' )

ans =

0.6809

Kind Regards,

Vince

17 Jan 2012 Michael Boedigheimer

Giuseppe
I am sorry you are having difficulty with the function. My function is very well tested and as far as I can tell works as described. The problem I am working on is detailed in the documentation of the function. I have hundreds of samples and 10s of thousands or more replicates. In the example you give there are many more samples than that. In your smallest example you have 2900. In that case solving for 1 replicate takes 3s on my computer, but so does solving 10,000 replicates.
y = unidrnd(2,2900,1)-1;
x = unidrnd(2, 2900, 1)-1;
Elapsed time is 2.984087 seconds.
>> x = unidrnd(2, 2900, 10000)-1;
>> tic; z = fexact(x,y); toc
Elapsed time is 2.947621 seconds.

I mean no disrespect to you or your many contributions. I just didn't see how your program can be used to solve the problem I just described.

17 Jan 2012 Giuseppe Cardillo

I compared your function with the mine using this matrix:
x=[7 12; 8 2].*1000000

Using Myfisher22(x), the function written by me, the result was:
Elapsed time is 5.715080 seconds.

Using your
fexact(x(1),sum(x(:)),x(1)+x(3),x(1)+x(2)) the results were:

??? Maximum variable size allowed by the program is exceeded.

Error in ==> fexact>tri2sqind at 325
k = (1:max_k)';
Error in ==> fexact>getLookup at 211
[k n ind] = tri2sqind(M);
Error in ==> fexact at 142
P = getLookup( M,N,tail);

Same results reducing the complexity to
x=[7 12; 8 2].*100000 and
x=[7 12; 8 2].*10000

When I tried on x=[7 12; 8 2].*1000 the pc looped and after 15 minutes I switched it off.

Finally I tried it on x=[7 12; 8 2].*100
Elapsed time is 6.019058 seconds; using Myfisher22 on this "simple" matrix
Elapsed time is 0.008264 seconds.

So before to write "Other implementations on the file exchange are reportedly buggy and are far too slow for problems I'm working on."
I think that you should test deeper your function.

Regards

16 Jun 2011 Michael Boedigheimer

fexact can use vector inputs and it will calculate contingency tables for you, or it can take the same input as Matlab's hygecdf. For your convenience, from fexact's help, here is a 2x2 contingency table, with a,b,c and d being integer counts
a c
b d

K = a+c;
N = a+b;
M = a+b+c+d

then use fexact( a,M,K,N)
enjoy

16 Jun 2011 Tobias

I second James Herman's comment. It would be great if someone could explain how to input a contingency table into this function.

13 Jan 2011 James Herman

I'm really confused as to how to use this function. The help is not clear at all :-(

I've got a contingency table, but it's entirely unclear how I should translate it into the variables: a,M,K,N.

Please help!

Updates
23 Dec 2011

Improved documentation and help. Improved performance for single tests.

17 Jan 2012

Now can handle large number of observations in a single trial with minimal footprint.
If X = [7.9 7.911; 8 8]*1e6
fexact(x(1),sum(x(:)),x(1)+x(3),x(1)+x(2)) returns 0.0498 for a two-sided test;

17 Jan 2012

Has been updated to work with a large sample size if performing a single test.

19 Apr 2012

Fixed a reporting error that may affect results when a single test is done. The error that allowed p-values to be larger than 1 to be reported. p-values smaller than one should not be affected.

19 Apr 2012

fixed bug in detecting error conditions

12 Jun 2012

changed to allow permutation testing in 2012a. Added a graphic to help explain contingency table

05 Dec 2013

Chris Rorden added support for single-sided Leibermeister tests, which corrects for uncertaintity in the marginal sums.
Fixed bug in some two-tailed tests when performing tests on a single observation (doesn't affect vectorized tests)

Contact us