Fleiss' kappa is a generalization of Scott's pi statistic, a statistical measure of inter-rater reliability. It is also related to Cohen's kappa statistic. Whereas Scott's pi and Cohen's kappa work for only two raters, Fleiss' kappa works for any number of raters giving categorical ratings (see nominal data) to a fixed number of items. It can be interpreted as expressing the extent to which the observed amount of agreement among raters exceeds what would be expected if all raters made their ratings completely randomly. Agreement can be thought of as follows: if a fixed number of people assign ratings to a number of items, then kappa measures how consistent the ratings are. Kappa is at most 1 (complete agreement); values at or below 0 indicate agreement no better than expected by chance.
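
As a rough illustration of the definition above, here is a minimal MATLAB sketch of the textbook Fleiss' kappa computation. It is not the code of this submission: the count matrix X and all variable names are made up for the example, and X(i,j) is assumed to hold the number of raters who put item i into category j.

```matlab
% Hypothetical counts: 5 items (rows), 3 categories (columns), 4 raters per item
X = [4 0 0; 0 4 0; 2 2 0; 0 1 3; 0 0 4];

[n, k] = size(X);                 % n items, k categories
m = sum(X, 2);                    % number of ratings per item
assert(all(m == m(1)), 'Every item needs the same number of ratings');
m = m(1);

pj   = sum(X, 1) / (n * m);                   % share of all ratings per category
Pi   = (sum(X.^2, 2) - m) / (m * (m - 1));    % agreement within each item
Pbar = mean(Pi);                              % observed agreement
Pe   = sum(pj.^2);                            % agreement expected by chance
kappa = (Pbar - Pe) / (1 - Pe);

fprintf('Fleiss kappa = %.4f\n', kappa);      % about 0.649 for this X
```

Here three of the five items are rated unanimously, so the observed agreement (about 0.77) is well above the chance level of 0.335.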

Thanks for contributing this, Giuseppe. I moved the submission to my own git repository in order to make a few changes and facilitate making changes in the future: https://github.com/drdan14/matlab_fleiss_kappa

Let me know if that's OK, or whether you'd prefer to use your own git repository that others can fork.

My modifications are in the current master and your FEX version from 23 Dec 2009 is https://github.com/drdan14/matlab_fleiss_kappa/tree/v2009.12.23

Sorry, my mistake: the pj values are indeed different. But kj and zj are not.

With two categories (j=2), sum(x.*(m-x)) yields two identical values. Since observers can choose only between category 1 and category 2, n votes for category 1 imply m-n votes for category 2.

The term b=pj.*(1-pj) also yields two identical values when j=2.
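
The symmetry described above can be checked on a small made-up example. The sketch below assumes x is an items-by-categories count matrix and m is the number of raters per item, as in the expressions quoted in the comment. The data are hypothetical, and the formula used for kj is the textbook per-category kappa, which may not match the submission's code line for line.

```matlab
% Hypothetical data: 4 items, m = 5 raters, 2 categories
x = [5 0; 3 2; 1 4; 2 3];
m = 5;
n = size(x, 1);

a  = sum(x .* (m - x));     % each row contributes x(i,1)*x(i,2) to both columns
pj = sum(x) / (n * m);      % category proportions; these do differ
b  = pj .* (1 - pj);        % both entries are p1*p2, equal up to rounding

% Textbook per-category kappa: the same for both categories when j = 2
kj = 1 - a ./ (n * m * (m - 1) * b);

disp(a); disp(pj); disp(b); disp(kj);
```

With only two categories, 1-pj(1) is pj(2), so both a and b are symmetric and every quantity derived from them alone comes out the same for both categories.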

Whenever I input any matrix other than a 5 x 10 matrix into MATLAB using your function "fleiss(X)", it gives the following error message:

EDU>> fleiss(X)
??? Error using ==> fleiss at 107
The raters are not the same for each rows

Can you tell me how to fix this?
Thx
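
A possible way to track this down, assuming fleiss expects a matrix of counts (rows are items, columns are categories, and each row sums to the number of raters): list the rows whose totals differ from the first row. If the data are raw ratings instead (one column per rater), they would first have to be tallied into counts. All variable names and data below are hypothetical.

```matlab
% Hypothetical count matrix: row 3 has only 2 ratings instead of 3
X = [3 0 0; 2 1 0; 1 1 0];
rowTotals = sum(X, 2);
badRows = find(rowTotals ~= rowTotals(1));   % rows whose total differs from row 1
fprintf('Rows with a different number of ratings: %s\n', mat2str(badRows));

% Hypothetical raw ratings: rows = items, columns = raters, values = category
R = [1 1 2; 2 2 2; 3 1 3; 1 1 1];
k = max(R(:));                     % number of categories
C = zeros(size(R, 1), k);
for j = 1:k
    C(:, j) = sum(R == j, 2);      % how many raters chose category j
end
disp(C);                           % every row of C sums to the 3 raters
```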

28 Jun 2007

Giuseppe Cardillo

Fleiss' kappa is an overall measure of agreement. It doesn't recognize differences among raters. I think that this can be done using Cohen's kappa.
An example of the use of Fleiss' kappa may be the following: suppose 14 psychiatrists are asked to look at ten patients. Each psychiatrist gives one of five possible diagnoses to each patient. Fleiss' kappa can then be computed to show the degree of agreement among the psychiatrists above the level of agreement expected by chance.

26 Jun 2007

Amy Graham

I think this m-file is meant to work with rates, not raters.

Updates

28 Jun 2007

Corrections in help lines

26 Sep 2007

new output edited

27 May 2008

There is some numerical inaccuracy, so r*(1/r)' isn't numerically equal to a square matrix of ones even if all elements in r are equal. I have therefore changed the test that checks that the raters are the same for each row.
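
The rounding issue can be reproduced with a short sketch. It assumes r holds the row totals and that the element-wise reciprocal 1./r is what was meant. The value 49 is chosen only because 49*(1/49) is a well-known case that does not round to exactly 1 in double precision. Comparing against the first element is one robust alternative, not necessarily the test now used in the function.

```matlab
r = [49; 49; 49];                      % hypothetical row totals, all equal
M = r * (1 ./ r)';                     % outer product, ones(3) in exact arithmetic
oldTestPasses = isequal(M, ones(3));   % exact comparison after a division is fragile
newTestPasses = all(r == r(1));        % compares the integer totals directly
fprintf('old test passes: %d, new test passes: %d\n', oldTestPasses, newTestPasses);
```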

12 Jun 2008

NORMCDF was replaced by ERFC, so the Statistics Toolbox is no longer needed.
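
For reference, the standard normal CDF can be written with the built-in erfc, which is presumably the identity behind this change. The sketch below uses a made-up z value and hypothetical variable names; it does not reproduce the function's actual code.

```matlab
z = 1.96;                              % hypothetical test statistic

Phi      = 0.5 * erfc(-z / sqrt(2));   % standard normal CDF, same value as normcdf(z)
pOneTail = 0.5 * erfc( z / sqrt(2));   % upper-tail probability, 1 - Phi
pTwoTail = erfc(abs(z) / sqrt(2));     % two-sided p-value

fprintf('Phi = %.4f, one-tailed p = %.4f, two-tailed p = %.4f\n', ...
        Phi, pOneTail, pTwoTail);
```

Using erfc for the tail directly, rather than computing 1 - Phi, also avoids cancellation when z is large.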