Code covered by the BSD License  

Rating: 5.0 (4 ratings) | File Size: 4.74 KB | File ID: #25830 | Version: 1.4

mwwtest(x1,x2)

by

 

13 Nov 2009 (Updated )

Mann-Whitney-Wilcoxon non parametric test for two unpaired groups.


File Information
Description

This file runs the nonparametric Mann-Whitney-Wilcoxon test to evaluate the difference between two unpaired samples. If the number of combinations is less than 20000, the algorithm calculates the exact distribution of the ranks; otherwise it uses a normal approximation. The result does not differ from that of MATLAB's RANKSUM function, but more output information is provided. An alternative formulation of this test yields a statistic commonly denoted by U; the U statistic is also computed.
Syntax: STATS=MWWTEST(X1,X2)

Inputs:
X1 and X2 - data vectors.
Outputs:
- T and U values and p-value when exact ranks distribution is used.
- T and U values, mean, standard deviation, Z value, and p-value when normal distribution is used.
If an output argument is requested, the results are also stored in the STATS struct.
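
To illustrate what "exact ranks distribution" means in the description, here is a minimal sketch that enumerates every possible assignment of the pooled ranks to group 1. It is not taken from mwwtest.m: the variable names and the two-sided p-value definition (distance from the mean rank sum) are assumptions, so the result need not reproduce either mwwtest or ranksum. The data are the gen1/gen2 vectors quoted in the comments; tiedrank requires the Statistics Toolbox.

```matlab
% Small samples taken from the comment thread on this page
x1 = [2.85 2.50 3.45 3.35];
x2 = [3.30 3.35 3.15 2.90 3.00 2.75 2.50];
n1 = numel(x1);
n2 = numel(x2);

r = tiedrank([x1 x2]);        % mid-ranks of the pooled sample
T = sum(r(1:n1));             % observed rank sum of group 1

% Rank sum of every possible choice of n1 ranks out of n1+n2
allSums = sum(nchoosek(r, n1), 2);

% Assumed two-sided definition: fraction of assignments whose rank sum
% is at least as far from the mean as the observed one
mu = n1*(n1 + n2 + 1)/2;
p2 = mean(abs(allSums - mu) >= abs(T - mu) - 1e-12);
fprintf('T = %.1f, exact two-sided p = %.4f\n', T, p2);
```

With 4 and 7 observations there are only nchoosek(11,4) = 330 assignments, which is why full enumeration is cheap for small samples and is replaced by the normal approximation for large ones.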

Example:

X1=[181 183 170 173 174 179 172 175 178 176 158 179 180 172 177];

X2=[168 165 163 175 176 166 163 174 175 173 179 180 176 167 176];

Calling the function in MATLAB: mwwtest(X1,X2)

The output is:

MANN-WHITNEY-WILCOXON TEST
---------------------------------------------------------------------------
                                          Group 1        Group 2
numerosity                                15             15
Sum of Ranks (W)                          270.0          195.0
Mean rank                                 18.0           13.0
Test variable (U)                         75.0           150.0
---------------------------------------------------------------------------
Sample size is large enough to use the normal distribution approximation
Mean                                      112.5
Standard deviation corrected for ties     24.0474
Z corrected for continuity                1.5386         1.5386
p-value (1-tailed)                        0.06195
p-value (2-tailed)                        0.12389
---------------------------------------------------------------------------
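
The figures in this output can be cross-checked with a few lines of MATLAB. The sketch below is not the code of mwwtest.m; it uses the textbook tie-corrected normal approximation with a 0.5 continuity correction, which is an assumption about how the function works, although it does agree with the values printed above. tiedrank requires the Statistics Toolbox; the normal tail is computed with erfc from base MATLAB.

```matlab
X1 = [181 183 170 173 174 179 172 175 178 176 158 179 180 172 177];
X2 = [168 165 163 175 176 166 163 174 175 173 179 180 176 167 176];
n1 = numel(X1);
n2 = numel(X2);
N  = n1 + n2;

r  = tiedrank([X1 X2]);           % mid-ranks of the pooled sample
W1 = sum(r(1:n1));                % rank sum of group 1
W2 = sum(r(n1+1:end));            % rank sum of group 2
U1 = n1*n2 + n1*(n1+1)/2 - W1;    % U as listed for group 1
U2 = n1*n2 + n2*(n2+1)/2 - W2;    % U as listed for group 2

% Tie correction: t holds the size of each group of equal values
[~, ~, idx] = unique([X1 X2]);
t  = accumarray(idx(:), 1);
mU = n1*n2/2;                                              % mean of U
sU = sqrt(n1*n2/12 * ((N + 1) - sum(t.^3 - t)/(N*(N-1)))); % tie-corrected SD

z  = (abs(U1 - mU) - 0.5) / sU;   % continuity-corrected Z
p1 = 0.5*erfc(z/sqrt(2));         % upper tail of the standard normal
p2 = 2*p1;                        % two-tailed p-value
fprintf('W = %.1f / %.1f, U = %.1f / %.1f\n', W1, W2, U1, U2);
fprintf('mean = %.1f, sd = %.4f, z = %.4f, p1 = %.5f, p2 = %.5f\n', mU, sU, z, p1, p2);
```

Note that U1 + U2 = n1*n2, so either U gives the same |U - mean| and therefore the same Z.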

Required Products Statistics and Machine Learning Toolbox
MATLAB release MATLAB 7.6 (R2008a)
MATLAB Search Path
/
Comments and Ratings (25)
28 Apr 2015 Remi Chaussenot

Hello Giuseppe,

It seems the forum erased my first message and replaced it with the second.

I was saying many thanks: the new version is really nice. I have already integrated it into my script and it is working properly :)
It is really a nice file, so, again, congratulations :)

Sometimes my files contain missing values ("NaN"). anova1() automatically excludes the NaNs, whereas your function crashes.

It is not a problem (I manually remove the NaNs and print a warning that a NaN was found), but if you want to improve your function, maybe add a check for NaNs. Just a suggestion, because I saw this in my data :-)

But, again, for me, now your function is really perfect :-)

Thanks !!

Comment only
23 Apr 2015 Giuseppe Cardillo

I don't know what var is, but you are probably losing information. I think you should either compute the fifth MDX value correctly (if that is possible) or delete it.

Comment only
23 Apr 2015 Remi Chaussenot

Well, I finally came back :)
It is still working well, but I noticed something.

I use the built-in anova1 function before your function.
Sometimes I have a NaN; anova1 simply drops the NaN, but your function crashes.

I will check with isnan() and delete the NaN, but if you have some time and want to improve your function, maybe you can add this :)

Here, is a sample of my code :
% Extract the genotype column and the measurement for the ANOVA
genotype = alldata(2:end, 1)
var = ndata(1:end, j)
measurename = cell2mat(alldata(1, j+2))

[p,table,stats] = anova1(var, genotype)
clear table;

% Prepare the data for the MWW test
if isequal(cell2mat(stats.gnames(2)),'WT')
alldata_wt = [alldata(1:1,1:end);alldata(stats.n(2)+2:end,1:end)]
alldata_ko = [alldata(1:1,1:end);alldata(2:stats.n(1)+1,1:end)]
else
alldata_wt = [alldata(1:1,1:end);alldata(2:stats.n(1)+1,1:end)]
alldata_ko = [alldata(1:1,1:end);alldata(stats.n(2)+2:end,1:end)]
end;

ndata_wt = cell2mat(alldata_wt(2:end,3:end));
ndata_ko = cell2mat(alldata_ko(2:end,3:end));
var_wt = transpose(ndata_wt(1:end, j))
var_ko = transpose(ndata_ko(1:end, j))

mww = mwwtest(var_wt,var_ko)

fprintf(log, '<tr>')
fprintf(log, '<td>%s</td>',measurename)
if p > 0.05
fprintf(log, '<td>p = %f</td>',p);
fprintf(log, '<td>p = %f</td>',mww.p(2));
else
fprintf(log, '<td><span style="color:red">p = %f</span></td>',p);
fprintf(log, '<td><span style="color:red">p = %f</span></td>',mww.p(2));
end;
close all
fprintf(log, '</tr>')

And the result :
genotype =

'MDX'
'MDX'
'MDX'
'MDX'
'MDX'
'MDX'
'MDX'
'WT'
'WT'
'WT'
'WT'
'WT'
'WT'
'WT'

var =

2.6500
0.6900
1.1000
1.4300
NaN
0.5000
1.6900
0.5600
0.6300
0.3200
2.2900
0.6700
1.2700
1.1900

measurename =

Amp - III - 30dB
above auditory threshold

p =

0.3968

table =

'Source' 'SS' 'df' 'MS' 'F' 'Prob>F'
'Groups' [0.4033] [ 1] [0.4033] [0.7775] [0.3968]
'Error' [5.7065] [11] [0.5188] [] []
'Total' [6.1099] [12] [] [] []

stats =

gnames: {2x1 cell}
n: [6 7]
source: 'anova1'
means: [1.3433 0.9900]
df: 11
s: 0.7203

alldata_wt =

'Génotype' 'Numéro' 'Amp - III - 70dB' [1x42 char] 'L-I' 'L-II' 'L-III' 'L-IV'
'WT' [ 452] [ 1.6700] [ 0.5600] [0.7500] [1.5500] [2.4500] [3.3000]
'WT' [ 453] [ 1.2400] [ 0.6300] [0.8500] [1.6500] [2.5500] [3.3500]
'WT' [ 456] [ 0.2600] [ 0.3200] [0.7500] [1.4500] [2.1000] [2.7500]
'WT' [ 466] [ 0.2400] [ 2.2900] [0.8000] [1.2500] [1.9500] [2.5000]
'WT' [ 467] [ 2.0300] [ 0.6700] [0.8500] [1.6000] [2.5000] [3.1500]
'WT' [ 468] [ 1.2800] [ 1.2700] [0.7500] [1.5500] [2.2500] [2.9000]
'WT' [ 470] [ 2.0800] [ 1.1900] [0.8500] [1.4000] [2.4000] [ 3]

alldata_ko =

'Génotype' 'Numéro' 'Amp - III - 70dB' [1x42 char] 'L-I' 'L-II' 'L-III' 'L-IV'
'MDX' [ 464] [ 2.8500] [ 2.6500] [0.3500] [1.5000] [2.3500] [3.4500]
'MDX' [ 455] [ 2.0600] [ 0.6900] [0.7000] [1.4500] [2.4000] [3.0500]
'MDX' [ 465] [ 0.8900] [ 1.1000] [0.8000] [1.4500] [2.4500] [2.9500]
'MDX' [ 469] [ 1.1700] [ 1.4300] [0.8000] [1.4500] [2.3000] [3.1500]
'MDX' [ 471] [ 0.9900] [ NaN] [1.0500] [1.7000] [2.6000] [3.3500]
'MDX' [ 472] [ 2.1700] [ 0.5000] [0.5500] [1.4500] [2.2000] [2.8500]

var_wt =

0.5600 0.6300 0.3200 2.2900 0.6700 1.2700 1.1900

var_ko =

2.6500 0.6900 1.1000 1.4300 NaN 0.5000

Error using mwwtest (line 60)
Warning: all X1 and X2 values must be numeric and finite

Error in Routine_v1a (line 132)
mww = mwwtest(var_wt,var_ko)
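
The error above is triggered by the NaN in var_ko. A minimal workaround, sketched here with the values from this comment, is to drop NaNs (with a warning) before the call. The warning identifier and the variable names nanMask and var_ko_clean are made up for the example.

```matlab
var_ko = [2.65 0.69 1.10 1.43 NaN 0.50];   % values shown in the output above

nanMask = isnan(var_ko);
if any(nanMask)
    % 'Routine:nanRemoved' is a hypothetical warning identifier
    warning('Routine:nanRemoved', ...
            '%d NaN value(s) removed before the test.', nnz(nanMask));
end
var_ko_clean = var_ko(~nanMask);           % safe to pass to mwwtest
```

The cleaned vector can then be passed as mwwtest(var_wt, var_ko_clean). Because the two groups are unpaired, removing a value from one group does not require removing anything from the other.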

Comment only
13 Apr 2015 Giuseppe Cardillo

Dear Remi, I have just updated the function after reading the Bergmann paper. It should now also handle your case.

Comment only
07 Apr 2015 Giuseppe Cardillo

You are welcome. To cite the function, look at the help section ;-)

Comment only
07 Apr 2015 Remi Chaussenot

Thanks again, it is clear in my mind now :)

Again, really nice job Giuseppe, good continuation :)

I may well use your script for my analysis; how should I cite it in my paper?

Thanks !

Comment only
07 Apr 2015 Giuseppe Cardillo

Bergmann, Reinhard; Ludbrook, John; Spooren, Will P. J. M. (2000). "Different Outcomes of the Wilcoxon-Mann-Whitney Test from Different Statistics Packages". The American Statistician 54 (1): 72–77

Comment only
07 Apr 2015 Giuseppe Cardillo

Since the test variable is the same, I think the procedure for the exact distribution is slightly different between my function and ranksum.

Comment only
07 Apr 2015 Remi Chaussenot

Dear Giuseppe,

Thank you for your very quick answer :-)

Can you explain a little more?
I trust you when you say there is no difference between 0.75, 0.78 and 0.70 (they are all non-significant), but what if I get 0.10, 0.06 and 0.02? Would one be significant and not the others?
Why does the same test return different values? O_o

Thank you !

Rémi

Comment only
07 Apr 2015 Giuseppe Cardillo

Dear Remi, the values are effectively the same: there is no practical difference between 0.75, 0.78 and 0.70.

Comment only
04 Apr 2015 Remi Chaussenot

Hi Giuseppe,

I am currently using the StatView software (http://en.wikipedia.org/wiki/StatView), which is old and cannot be scripted.

I am trying to write some MATLAB scripts to replace and automate it for my needs: ANOVA, repeated-measures ANOVA, Mann-Whitney U and plots.

Using :
gen1 =
2.8500 2.5000 3.4500 3.3500
>> gen2
gen2 =
3.3000 3.3500 3.1500 2.9000 3.0000 2.7500 2.5000

I have :
>> p = ranksum(gen1,gen2)
p =
0.7515

>> test = mwwtest(gen1,gen2)
MANN-WHITNEY-WILCOXON TEST
--------------------------------------------------------------------------------
The exact Mann-Whitney-Wilcoxon distribution was used

T U p-value (2-tailed)
--------------------------------------------------------------------------------
26.0000 16.0000 0.7879
--------------------------------------------------------------------------------
(I read your remark about 1-tailed and 2-tailed values, so I edited your script to return the 2-tailed value, as follows:
line 91 : p=(1-normcdf(zT))*2; %p-value
line 113 : p=(length(pdf(pdf>=T))/length(pdf))*2;

So, first problem: even when I multiply the p of your script by 2, I do not get the same value as ranksum.

And, as you can see here: http://whirlwind.fr/mww.png

StatView does not give the same results as your script or the ranksum() function.

Which one should I trust?

Thank you for your work !!

R.

Comment only
14 Mar 2013 JR King


Thank you for your quick response.

Your roc function is indeed very good.

For those who just want a quick measure of effect size, it is easy to compute the empirical AUC from Giuseppe Cardillo's function by adding:
STATS.auc = U / (L(1)*L(2));

More information see:
http://www.mathworks.com/matlabcentral/fileexchange/30424-colauc
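
As a self-contained illustration of the AUC relation in this comment, the sketch below counts pairs directly instead of reading U and L from inside mwwtest.m. The data are the X1/X2 vectors from the file description; the variable names are illustrative, and which of the two U values mwwtest would use in the one-line patch above is not checked here.

```matlab
X1 = [181 183 170 173 174 179 172 175 178 176 158 179 180 172 177];
X2 = [168 165 163 175 176 166 163 174 175 173 179 180 176 167 176];
n1 = numel(X1);
n2 = numel(X2);

% Compare every X1 value with every X2 value (n1-by-n2 logical matrices)
wins = bsxfun(@gt, X1(:), X2(:).');   % pairs where the X1 value is larger
ties = bsxfun(@eq, X1(:), X2(:).');   % tied pairs count as one half

U   = sum(wins(:)) + 0.5*sum(ties(:));
auc = U / (n1*n2);                    % estimate of P(X1 > X2) + 0.5*P(X1 == X2)
fprintf('U = %.1f, AUC = %.4f\n', U, auc);
```

Using the other U (n1*n2 minus this one) gives 1 - auc, so it is worth checking which group the reported U refers to before interpreting the value.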

Comment only
13 Mar 2013 Giuseppe Cardillo

The results are equivalent: ranksum gives a 2-tailed p-value and mwwtest gives a 1-tailed p-value. If you need a 2-tailed p-value, simply multiply the 1-tailed p-value by 2 (or, vice versa, divide the 2-tailed p-value by 2 if you need the 1-tailed one).
For the area under the curve, please look at the roc.m function that I wrote.
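
A minimal sketch of that conversion, using the one-tailed value quoted in the question below; the cap at 1 is an added safeguard and the variable names are illustrative.

```matlab
p1 = 0.0102;          % one-tailed p-value quoted for mwwtest(1:100,11:110)
p2 = min(1, 2*p1);    % two-tailed equivalent, capped at 1
fprintf('two-tailed p = %.4f\n', p2);
```

The small gap between this value and the 0.0203 quoted for ranksum is presumably due to rounding of the displayed one-tailed value. Halving a two-tailed p-value is only valid when the observed difference lies in the direction hypothesized in advance.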

Comment only
13 Mar 2013 JR King


Thank you very much for this useful function.

Is it normal that the p-values it returns differ from those obtained with MATLAB's ranksum function?
[p h stats] = ranksum(1:100,11:110)
p: 0.0203

stats = mwwtest(1:100,11:110)
stats.p: 0.0102

If so, which one should we report?

As an addition, I think it would be great to output an effect size based on the area under the curve, together with its confidence interval.

08 Jul 2012 Giuseppe Cardillo

1) If you have to compare more than 2 groups, mwwtest is not the test for you: you should use the Kruskal-Wallis test. If instead you have repeated measures of the same group at different times, you should use Friedman's test.

2) The idea behind mwwtest is that, if the medians are equal, the ranks will be distributed equally between the two groups and U will have a mean of n1*n2/2 (so you can use a Z-score). The acceptance region is defined as for Student's t-test, because T (or U) is asymptotically normally distributed. The magnitude of T is not the only parameter: if you have two very large samples, mean(T) will also be very large.

Comment only
08 Jul 2012 Menno

Thanks, Giuseppe, for your reply. I didn't know there were different computational methods.

About testing medians: of course I know how to use the median command ;). My research is in finance, and I test whether the activity of a specific type of trader is larger in one period than in another. I use the Mann-Whitney test in addition to a one-tailed independent-samples t-test, because some samples violate the normality assumption severely.

I have now corrected all significant test results that were in the wrong tail (according to the calculated medians), for which the null hypothesis therefore holds. In contrast, the t-test is symmetric and indicates the direction of the result.

What, then, is the acceptance region of the null hypothesis for the MWW test? I guess it is around n1*n2/2? What if the U statistic is much larger?

Comment only
08 Jul 2012 Giuseppe Cardillo

There are two ways to compute T: I implemented the one described by Stanton Glantz in "Primer of Biostatistics". U is related to T by a subtraction factor. The Z-score is the same because it is (T-mean(T))/std(T): if you use the other computational method, mean(T) and std(T) change accordingly. This means that the two methods are equivalent.

To answer your question I first have to clear up a basic error that many people make. The question median(1)>median(2) has to be asked BEFORE performing the test, during the experimental design. An example: you discover a new diuretic drug. You want to demonstrate that group 1, treated with the drug, has a higher median urine volume (mL/24h) than group 2, treated with placebo. So you ask mwwtest whether median(1)>median(2). Of course you could decide to give the drug to group 2 instead: the result of mwwtest will be the same.
In many situations you cannot know BEFORE doing the test which median should be greater: if you treat group 1 with drug 1 and group 2 with drug 2, you should not hypothesize which drug will be more effective. So you will ask mwwtest: are median(1) and median(2) different? In this case you perform a 2-tailed test, and so you have to multiply the result of mwwtest by 2.

Anyway, don't tell me that you don't know how to ask MATLAB to compute medians....

Comment only
07 Jul 2012 Menno

The test statistics (both U and T) differ from those produced by SPSS. However, the Z-score is the same.

In addition, I have a question about using the MWW test to check whether the median of group 1 is larger than that of group 2 (1-tailed). How can you conclude from the test results which group has the larger median?

Comment only
18 May 2012 Giuseppe Cardillo

If you like it...
But I could decide to set another alpha value for my test (e.g. 0.01), and then your code would give a wrong answer...

Comment only
18 May 2012 Ipek

Thank you very much for this work.
I needed an explicit decision on the hypothesis, so I added the following code at the end of yours:

if p<0.05
STATS.H=1; %reject H0
else
STATS.H=0; %fail to reject H0
end
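
Since the reply to this comment points out that a hard-coded 0.05 breaks when another significance level is chosen, the same decision can be written with alpha as a variable. This is only a sketch: alpha, p and the STATS.alpha field are illustrative names, not part of mwwtest, and the p-value is the 2-tailed one from the example in the file description.

```matlab
alpha = 0.05;                  % significance level chosen by the user
p     = 0.12389;               % 2-tailed p-value from the description example

STATS.alpha = alpha;           % record the level used for the decision
STATS.H = double(p < alpha);   % 1: reject H0, 0: fail to reject H0
```

Storing alpha next to H keeps the decision reproducible when the level is changed to, say, 0.01.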

25 Apr 2012 Trevor Agus  
14 Apr 2011 Giuseppe Cardillo

A) I'm trying to use your suggested MATLAB code, mwwtest(x1,x2), but my 2010a MATLAB version doesn't recognize it. How is this possible?
R) I use the same version of MATLAB (although on Linux) and mwwtest works properly. Please check that your data are correct (see help).

A) Or is it possible that the mwwtest code was deleted, since the result is equivalent to ranksum(x1,x2)?
R) If you are writing this comment, it means that mwwtest wasn't deleted. A question: did you download mwwtest from this page before using it?

A) If that is the case, can I replace mwwtest with ranksum?
R) You can use ranksum because, as I wrote in the Description, the results are the same.

A) What is the difference between the Mann-Whitney U test and the Wilcoxon rank-sum test?
R) There is no difference: they are two sides of the same coin. In fact, mwwtest computes both T and U.

A) My advisor is convinced that Mann-Whitney tests the difference between means, while ranksum tests the difference of medians or distribution ranks.
R) Your advisor is wrong. Nonparametric tests test differences between medians, while parametric tests test differences between means.

A) My final question: if I switch to ranksum instead of your suggested mwwtest, ranksum doesn't support a 1-tailed test. Is this true? How can I run a 1-tailed test with ranksum or mwwtest in MATLAB?
R) Ranksum gives you a 2-tailed p. If you want a 1-tailed p, simply divide the 2-tailed p by 2.

Comment only
13 Apr 2011 nuntinee

I'm trying to use your suggested MATLAB code, mwwtest(x1,x2), but my 2010a MATLAB version doesn't recognize it. How is this possible? Or is it possible that the mwwtest code was deleted, since the result is equivalent to ranksum(x1,x2)?

If that is the case, can I replace mwwtest with ranksum? What is the difference between the Mann-Whitney U test and the Wilcoxon rank-sum test? My advisor is convinced that Mann-Whitney tests the difference between means, while ranksum tests the difference of medians or distribution ranks.

My final question: if I switch to ranksum instead of your suggested mwwtest, ranksum doesn't support a 1-tailed test. Is this true? How can I run a 1-tailed test with ranksum or mwwtest in MATLAB? I see there is a way to run a 1-tailed test in ALGLIB (http://www.alglib.net/hypothesistesting/mannwhitneyu.php).

thank you so much in advance for your reply.

nuntinee

Comment only
19 Mar 2010 Arsen Arakelyan

Thanks, very nice

Updates
16 Nov 2009 1.1

change in the help section

25 Nov 2009 1.2

bug fixed in T computation when n2<n1

23 Dec 2009 1.3

Changes in description

13 Apr 2015 1.4

clearer output; improved computations
