File Exchange

image thumbnail

fexact( varargin )

version 1.12 (15.8 KB) by

Hypergeometric cumulative distribution (left, right and two-tailed) including permutation testing.

3 Downloads

Updated

View License

Matlab now (2014b) has a two-tailed Fisher exact test (fishertest)! However, I still need a version that can be vectorized to perform 1000s of tests. This implementation performs 2.5e6 one or two-tailed tests per second and is ideally suited for genome-wide association studies or other large problems where 100s of samples and up to 100s of 1000s of tests are being performed.
It has been verified correct for samples sizes up to 100 and spot checked beyond that up to 1000s of samples.

Comments and Ratings (13)

ind2logical routine is missing.
I suppose it should be something like:

function logind = ind2logical(ind, len)
logind = false(1, len);
logind(ind) = true;

Nolan Conaway

Yoshiaki

Thank you very much.
I will try.

Yosiaki. Sorry you are having difficulty. inputParser is a built-in Matlab function that was introduced around 2010. As stated in the information section of this function, the program was tested on Matlab 2010b. You will need to upgrade your Matlab software to a more recent version.

Yoshiaki

I use Matlab (R2006b).
When I use fexact.m, an error message, "inputParser is not defined" is displayed.
Please tell me, how to settle the problem.

Minli

Minli (view profile)

very useful and well written. Thank you !

Vince,

Thank you for showing these examples. I understand how you could lose confidence when a p-value is larger than 1. I have reviewed the procedure and believe you can safely round the p-value down to 1. I have or will have submitted a new version of the code that will no longer return values greater than 1. The problem should affect only the scalar version of the routines and not the vectorized version. I believe all the p-values in your examples that were less than one are correct. Please let me know if you have other questions or concerns.

Mike

Vince

Vince (view profile)

Thank you for very much contributing this function to the community! I'd like to report some cases I've found that are giving nonsensical p-values through 2 examples:
- Case 1:
>> a = 8, M = 24, K = 24, N = 8;
>> fexact( a, M, K, N, 'tail', 'b' )

ans =

    2.0000

>> fexact( a, M, K, N, 'tail', 'l' )

ans =

  256.0000

>> fexact( a, M, K, N, 'tail', 'r' )

ans =

     1

- Case 2:
>> a = 6, M = 24, K = 18, N = 8
>> fexact( a, M, K, N, 'tail', 'b' )

ans =

    1.0001

>> fexact( a, M, K, N, 'tail', 'r' )

ans =

    0.6977

>> fexact( a, M, K, N, 'tail', 'l' )

ans =

    0.6809

Kind Regards,

Vince

Giuseppe
I am sorry you are having difficulty with the function. My function is very well tested and as far as I can tell works as described. The problem I am working on is detailed in the documentation of the function. I have hundreds of samples and 10s of thousands or more replicates. In the example you give there are many more samples than that. In your smallest example you have 2900. In that case solving for 1 replicate takes 3s on my computer, but so does solving 10,000 replicates.
y = unidrnd(2,2900,1)-1;
x = unidrnd(2, 2900, 1)-1;
Elapsed time is 2.984087 seconds.
>> x = unidrnd(2, 2900, 10000)-1;
>> tic; z = fexact(x,y); toc
Elapsed time is 2.947621 seconds.

I mean no disrespect to you or your many contributions. I just didn't see how your program can be used to solve the problem I just described.

Giuseppe Cardillo

I compared your function with the mine using this matrix:
x=[7 12; 8 2].*1000000

Using Myfisher22(x), the function written by me, the result was:
Elapsed time is 5.715080 seconds.

Using your
fexact(x(1),sum(x(:)),x(1)+x(3),x(1)+x(2)) the results were:

??? Maximum variable size allowed by the program is exceeded.

Error in ==> fexact>tri2sqind at 325
    k = (1:max_k)';
Error in ==> fexact>getLookup at 211
[k n ind] = tri2sqind(M);
Error in ==> fexact at 142
    P = getLookup( M,N,tail);

Same results reducing the complexity to
x=[7 12; 8 2].*100000 and
x=[7 12; 8 2].*10000

When I tried on x=[7 12; 8 2].*1000 the pc looped and after 15 minutes I switched it off.

Finally I tried it on x=[7 12; 8 2].*100
Elapsed time is 6.019058 seconds; using Myfisher22 on this "simple" matrix
Elapsed time is 0.008264 seconds.

So before to write "Other implementations on the file exchange are reportedly buggy and are far too slow for problems I'm working on."
I think that you should test deeper your function.

Regards

fexact can use vector inputs and it will calculate contingency tables for you, or it can take the same input as Matlab's hygecdf. For your convenience, from fexact's help, here is a 2x2 contingency table, with a,b,c and d being integer counts
a c
b d

K = a+c;
N = a+b;
M = a+b+c+d

then use fexact( a,M,K,N)
enjoy

Tobias

Tobias (view profile)

I second James Herman's comment. It would be great if someone could explain how to input a contingency table into this function.

I'm really confused as to how to use this function. The help is not clear at all :-(

I've got a contingency table, but it's entirely unclear how I should translate it into the variables: a,M,K,N.

Please help!

Updates

1.12

Improved speed for large number of observations.
Slightly improved performance for large number of separate tests.
Updated documentation

1.11

Improved speed for large number of observations.
Slightly improved performance for large number of separate tests.
Updated documentation

1.10

Chris Rorden added support for single-sided Leibermeister tests, which corrects for uncertaintity in the marginal sums.
Fixed bug in some two-tailed tests when performing tests on a single observation (doesn't affect vectorized tests)

1.8

changed to allow permutation testing in 2012a. Added a graphic to help explain contingency table

1.6

fixed bug in detecting error conditions

1.5

Fixed a reporting error that may affect results when a single test is done. The error that allowed p-values to be larger than 1 to be reported. p-values smaller than one should not be affected.

1.4

Has been updated to work with a large sample size if performing a single test.

1.3

Now can handle large number of observations in a single trial with minimal footprint.
If X = [7.9 7.911; 8 8]*1e6
fexact(x(1),sum(x(:)),x(1)+x(3),x(1)+x(2)) returns 0.0498 for a two-sided test;

1.2

Improved documentation and help. Improved performance for single tests.

MATLAB Release
MATLAB 8.2 (R2013b)

Download apps, toolboxes, and other File Exchange content using Add-On Explorer in MATLAB.

» Watch video