Fleiss' kappa is a generalization of Scott's pi statistic, a statistical measure of inter-rater reliability. It is also related to Cohen's kappa statistic. Whereas Scott's pi and Cohen's kappa work for only two raters, Fleiss' kappa works for any number of raters assigning categorical ratings (see nominal data) to a fixed number of items. It can be interpreted as expressing the extent to which the observed amount of agreement among raters exceeds what would be expected if all raters made their ratings completely at random. Put another way: if a fixed number of people assign categorical ratings to a number of items, kappa measures how consistent those ratings are. Kappa is at most 1: a value of 1 means complete agreement, 0 means agreement no better than chance, and negative values mean less agreement than chance would produce.
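
For reference, the usual definition can be written as below. The symbols are notation chosen here for illustration, not variable names from the m-file: N subjects, n raters per subject, k categories, and n_{ij} the number of raters who put subject i in category j.

```latex
\begin{aligned}
p_j &= \frac{1}{Nn}\sum_{i=1}^{N} n_{ij}, \qquad
P_i = \frac{1}{n(n-1)}\sum_{j=1}^{k} n_{ij}\,(n_{ij}-1),\\
\bar{P} &= \frac{1}{N}\sum_{i=1}^{N} P_i, \qquad
\bar{P}_e = \sum_{j=1}^{k} p_j^{2}, \qquad
\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}.
\end{aligned}
```

Here \bar{P} is the observed agreement and \bar{P}_e the agreement expected by chance, so kappa equals 1 only when every subject's ratings are unanimous, and it is negative when observed agreement falls below chance.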

Sorry, my mistake: the pj values are indeed different. But kj and zj are not.

With j=2 (two categories), sum(x.*(m-x)) yields two identical values. Since raters can choose only between category 1 and category 2, n votes for category 1 imply m-n votes for category 2.

The parameter b=pj.*(1-pj) also yields two identical values when j=2.
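
A quick numerical check of the two observations above, on a made-up table with 4 subjects and 6 raters. The data, m, and the helper names are hypothetical and are not taken from fleiss.m; Fraction is used so that equal values compare exactly equal.

```python
from fractions import Fraction

# Hypothetical two-category table: one row per subject, one column per
# category; each entry counts how many of the m raters chose that category.
m = 6  # raters per subject (assumed for illustration)
x = [
    [6, 0],
    [4, 2],
    [1, 5],
    [3, 3],
]


def sum_x_times_m_minus_x(table, m):
    """Per category j: sum over subjects of x[i][j] * (m - x[i][j])."""
    k = len(table[0])
    return [sum(row[j] * (m - row[j]) for row in table) for j in range(k)]


def p_times_one_minus_p(table, m):
    """Per category j: p_j * (1 - p_j), where p_j is category j's share of all ratings.

    Fractions keep the arithmetic exact, so equal values compare equal.
    """
    total = len(table) * m
    k = len(table[0])
    p = [Fraction(sum(row[j] for row in table), total) for j in range(k)]
    return [pj * (1 - pj) for pj in p]


s = sum_x_times_m_minus_x(x, m)
b = p_times_one_minus_p(x, m)
# With two categories, x[i][1] == m - x[i][0] and p_2 == 1 - p_1, so both
# x*(m-x) and p*(1-p) give the same value for the two columns.
print(s)  # [22, 22]
print(b)
```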

Whenever I input any matrix other than a 5 x 10 matrix into MATLAB using your function "fleiss(X)", it gives the following error message:

EDU>> fleiss(X)
??? Error using ==> fleiss at 107
The raters are not the same for each rows

Can you tell me how to fix this?
Thx

28 Jun 2007

Giuseppe Cardillo

Fleiss' kappa is an overall measure of agreement; it does not distinguish among individual raters. I think that can be done using Cohen's kappa on pairs of raters.
An example use of Fleiss' kappa is the following. Suppose 14 psychiatrists are asked to examine ten patients, and each psychiatrist gives one of five possible diagnoses to each patient. Fleiss' kappa can then be computed to show the degree of agreement among the psychiatrists above the level of agreement expected by chance.
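
A minimal sketch of the computation for a table shaped like this example, written in Python rather than MATLAB and not a copy of fleiss.m. The function name and the counts are hypothetical: each row is a patient, each column a diagnosis, and each entry the number of psychiatrists who chose that diagnosis, so every row sums to 14.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    table[i][j] is the number of raters who put subject i in category j.
    Every row must sum to the same number of raters n, with n >= 2.
    """
    N = len(table)
    k = len(table[0])
    n = sum(table[0])
    if n < 2 or any(sum(row) != n for row in table):
        raise ValueError("each subject needs the same number (>= 2) of raters")
    # p_j: share of all N*n ratings that fall in category j.
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    # P_i: fraction of agreeing rater pairs for subject i.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P) / N               # observed agreement
    P_e = sum(pj * pj for pj in p)   # agreement expected by chance
    # Undefined (division by zero) if every rating falls in one category.
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical counts: 10 patients (rows) x 5 diagnoses (columns),
# 14 psychiatrists per patient, so every row sums to 14.
ratings = [
    [0, 0, 0, 0, 14],
    [0, 2, 6, 4, 2],
    [0, 0, 3, 5, 6],
    [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1],
    [7, 7, 0, 0, 0],
    [3, 2, 6, 3, 0],
    [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0],
    [0, 2, 2, 3, 7],
]

print(round(fleiss_kappa(ratings), 3))  # 0.21
```

For these made-up counts the result is about 0.21: agreement is above chance, but far from unanimous.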

26 Jun 2007

Amy Graham

I think this m-file is to work with rates not raters.

Updates

28 Jun 2007

Corrections in help lines

26 Sep 2007

new output edited

27 May 2008

Because of numerical inaccuracy, r*(1/r)' is not always exactly equal to a square matrix of ones, even when all elements of r are equal. So I have changed the test to check directly that the number of raters is the same for each row.
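
A minimal sketch of such a row check, written in Python rather than MATLAB and not taken from fleiss.m (the function name is hypothetical). It compares integer row totals directly, so no floating-point division is involved. An input whose rows sum to different totals, which is presumably what triggers the "The raters are not the same for each rows" error quoted above, is rejected with a message naming the offending rows.

```python
def raters_per_subject(table):
    """Return the common number of raters n, or raise if the rows disagree.

    table[i][j] counts the raters who put subject i in category j, so each
    row total is the number of raters for that subject. Comparing integer
    totals is exact, unlike testing whether ratios such as r_i * (1 / r_j)
    come out as exactly 1.0.
    """
    totals = [sum(row) for row in table]
    n = totals[0]
    mismatched = [i for i, t in enumerate(totals) if t != n]
    if mismatched:
        raise ValueError(
            f"rows {mismatched} do not have the same number of raters as row 0 ({n})"
        )
    return n


print(raters_per_subject([[2, 1, 0], [0, 0, 3], [1, 1, 1]]))  # 3
```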

12 Jun 2008

NORMCDF was replaced by ERFC, so the Statistics Toolbox is no longer needed.
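
The identity behind this change is Phi(x) = 0.5 * erfc(-x / sqrt(2)), where Phi is the standard normal CDF; erfc is part of core MATLAB, while normcdf comes from the Statistics Toolbox. How fleiss.m applies it to its z statistic is not shown on this page, so the Python sketch below only illustrates the identity using the standard-library math.erfc; the helper names are hypothetical.

```python
import math


def normcdf(x, mu=0.0, sigma=1.0):
    """Normal CDF without a statistics library: 0.5 * erfc(-(x - mu) / (sigma * sqrt(2)))."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z, i.e. 1 - normcdf(z).

    Using erfc directly avoids the cancellation in 1 - normcdf(z) for large z.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(normcdf(0.0))  # 0.5
print(upper_tail_p(1.96))
```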