MATLAB Answers

NaN in ranksum test

I know there is a nanmedian function, but it seems there is no nanranksum function. I have tried the ranksum function with and without NaN values filling in empty data and I get different p values. Am I wrong? Is there a Wilcoxon ranksum / Mann-Whitney U test that ignores NaN values?
Thanks for your help!

Accepted Answer

Daniel Golden on 31 May 2013
I have the same problem in MATLAB R2012a. My version of the ranksum() documentation doesn't mention NaNs at all, but the latest (R2013a) documentation states, "ranksum treats NaNs in x and y as missing values and ignores them." This is clearly not true in R2012a, which probably has a bug in its treatment of NaN values. For example, try the following:
>> ranksum([1 2 3 nan], [4 5 6])
ans =
>> ranksum([1 2 3], [4 5 6])
ans =
Obviously, the NaNs are not being ignored.
Other times, NaN inputs will result in NaN outputs:
>> ranksum([-14.44 NaN 5.97 -117.55 -77.56 -45.00], [-78.59 -101.04 -26.15 -79.51 -48.10 -23.45 -42.18 -76.75 -55.42 -135.18 70.02 -57.44 -31.69 -146.01])
ans =
But this isn't consistent. For example, removing any one of the vector values in the above example, even the non-NaN values, will result in a non-NaN output.
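If you are unsure how your own installation behaves, a quick check along these lines may help. It is only a sketch: the variable names (pWithNaN, pWithoutNaN, nanIgnored) are made up for illustration, and it assumes ranksum from the Statistics Toolbox is on your path.

```matlab
% Small example samples; the NaN marks a missing observation.
x = [1 2 3 NaN];
y = [4 5 6];

% p-value with the NaN left in place.
pWithNaN = ranksum(x, y);

% p-value after removing the NaN by hand.
pWithoutNaN = ranksum(x(~isnan(x)), y);

% If ranksum ignores NaNs, both calls see the same data and agree.
% isequal returns false when pWithNaN is NaN, which also counts as
% "not ignored".
nanIgnored = isequal(pWithNaN, pWithoutNaN);

if nanIgnored
    disp('ranksum appears to ignore NaNs in this version.');
else
    disp('ranksum does NOT ignore NaNs here; remove them manually.');
end
```

A single pair of samples can't prove the behavior in general, so treat a "does NOT ignore" result as the more reliable signal of the two.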
Here's a simple workaround if your inputs might have NaNs. If your input vectors are x and y, and you're running ranksum like:
p = ranksum(x, y)
Then just run ranksum like this:
p = ranksum(x(~isnan(x)), y(~isnan(y)))
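If you call ranksum in several places, the same filtering can be wrapped once in an anonymous function. This is just a sketch: ranksumNoNaN is a hypothetical name, not a built-in, and the data are the vectors from the NaN-output example above.

```matlab
% Hypothetical helper (not a built-in): drops NaNs from each sample
% independently, then calls ranksum from the Statistics Toolbox.
% The samples are filtered separately because the test is unpaired.
ranksumNoNaN = @(x, y) ranksum(x(~isnan(x)), y(~isnan(y)));

% Data from the example above that returned NaN in R2012a.
x = [-14.44 NaN 5.97 -117.55 -77.56 -45.00];
y = [-78.59 -101.04 -26.15 -79.51 -48.10 -23.45 -42.18 ...
     -76.75 -55.42 -135.18 70.02 -57.44 -31.69 -146.01];

% The NaN in x is excluded before the ranks are computed.
p = ranksumNoNaN(x, y);
```

Filtering this way is harmless in versions that already ignore NaNs, so the wrapper can stay in place after an upgrade.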

More Answers (1)

Açmae on 3 Jun 2013
@ Daniel and Eric:
In R2012b and later, the function RANKSUM removes any missing data internally, which is equivalent to calling:
p = ranksum(x(~isnan(x)), y(~isnan(y)))
so NaNs are taken care of automatically. If you are using a version older than R2012b, apply this workaround yourself.

