# NaN in ranksum test

Erin MacMillan on 27 Mar 2012
Hi,
I know there is a nanmedian function, but there seems to be no nanranksum function. I have run the ranksum function on the same data both with and without NaN values padding the empty entries, and I get different p-values. Am I doing something wrong? Is there a Wilcoxon rank-sum / Mann-Whitney U test that ignores NaN values?

Daniel Golden on 31 May 2013
I have the same problem in MATLAB R2012a. My version of the ranksum() documentation doesn't mention NaNs at all, but the latest (R2013a) documentation at http://www.mathworks.com/help/stats/ranksum.html states, "ranksum treats NaNs in x and y as missing values and ignores them." That is clearly not the case in R2012a, which probably has a bug in its handling of NaN values. For example, try the following:
```matlab
>> ranksum([1 2 3 nan], [4 5 6])
ans =
0.0571
>> ranksum([1 2 3], [4 5 6])
ans =
0.1000
```
Obviously, the NaNs are not being ignored.
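The two p-values above are exactly the smallest ones an exact two-sided rank-sum test can return for those sample sizes. With 3 and 3 observations there are nchoosek(6,3) = 20 equally likely rank assignments, so complete separation gives 2/20 = 0.1000; with 4 and 3 observations there are nchoosek(7,4) = 35, giving 2/35 ≈ 0.0571. So the R2012a result is consistent with the NaN being counted as an extra observation of x. The sketch below checks that arithmetic by brute-force enumeration. It assumes no ties and small samples, and the helper names (`untied_ranks`, `two_sided_p`) are hypothetical, not ranksum's internals:

```matlab
% Rank of each value = number of values <= it (valid only without ties)
untied_ranks = @(v) arrayfun(@(a) sum(v <= a), v);

% Exact two-sided p-value: enumerate every way to give n of the ranks 1..N
% to the first sample, then double the smaller tail probability (capped at 1)
two_sided_p = @(w, n, N) min(1, 2 * min( ...
    mean(sum(nchoosek(1:N, n), 2) <= w), ...
    mean(sum(nchoosek(1:N, n), 2) >= w)));

x = [1 2 3];
y = [4 5 6];
r = untied_ranks([x y]);
w = sum(r(1:numel(x)));                              % rank-sum of x: 1+2+3 = 6
p3v3 = two_sided_p(w, numel(x), numel(x) + numel(y)) % 2/20 = 0.1000

% Same complete separation, but with x counted as 4 observations
p4v3 = two_sided_p(1 + 2 + 3 + 4, 4, 7)              % 2/35 = 0.0571
```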
In other cases, NaN inputs produce a NaN output:
```matlab
K>> ranksum([-14.44 NaN 5.97 -117.55 -77.56 -45.00], [-78.59 -101.04 -26.15 -79.51 -48.10 -23.45 -42.18 -76.75 -55.42 -135.18 70.02 -57.44 -31.69 -146.01])
ans =
NaN
```
This isn't consistent, though. For example, removing any single value from the vectors above, even a non-NaN one, produces a non-NaN output.
Here's a simple workaround if your inputs might contain NaNs. If your input vectors are x and y, then instead of running ranksum like this:
```matlab
p = ranksum(x, y)
```
run it like this:
```matlab
p = ranksum(x(~isnan(x)), y(~isnan(y)))
```
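If the NaNs come from padding unequal-length groups into one matrix, as in the original question, the same masking can be wrapped once and applied column by column. This is a minimal sketch: `nanranksum` is a hypothetical name (not a MathWorks function), `data` is a made-up example matrix, and the actual test call needs the Statistics Toolbox, so it is guarded:

```matlab
% Hypothetical wrapper (not a MathWorks function): drop NaNs from each
% sample, then pass any extra options straight through to ranksum
nanranksum = @(x, y, varargin) ranksum(x(~isnan(x)), y(~isnan(y)), varargin{:});

% Two groups of unequal length, NaN-padded to share one matrix
data = [1    4
        2    5
        3    6
        NaN  7];

% What the wrapper actually hands to ranksum for this data
g1 = data(~isnan(data(:, 1)), 1);   % [1; 2; 3]
g2 = data(~isnan(data(:, 2)), 2);   % [4; 5; 6; 7]

if exist('ranksum', 'file')         % ranksum ships with the Statistics Toolbox
    p = nanranksum(data(:, 1), data(:, 2))
end
```

Per Açmae's reply, R2012b and later perform this NaN removal inside ranksum itself, so a wrapper like this only matters on older releases.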

Açmae on 3 Jun 2013
@Daniel and Erin:
In R2012b and later, RANKSUM removes missing data internally, which is equivalent to running:
```matlab
p = ranksum(x(~isnan(x)), y(~isnan(y)))
```
so NaNs are handled for you. If you are using a release older than R2012b, this expression is the workaround.