Here are two good links that concisely yet systematically review the past and present of the Kappa statistic; without further ado, here they are:
http://www.john-uebersax.com/stat/kappa.htm#procon
http://www.agreestat.com/research_papers/kappa_statistic_is_not_satisfactory.pdf
A common scenario for applying the Kappa test:
- Kappa statistics are appropriate for testing whether agreement exceeds chance levels for binary and nominal ratings.
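For reference (this is the standard definition of Cohen's kappa, not taken from the links above): kappa compares the observed agreement $p_o$ with the agreement $p_e$ expected by chance when the two raters rate independently,

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

so agreement is measured only in excess of what chance alone would produce.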
Interpreting kappa
How should one interpret a computed Kappa value? Many researchers consider publishing a scale like the one below to be quite dangerous, because it can easily lead users to misapply the value; moreover, the scale is not universal, and the appropriate cutoffs may differ drastically from one scenario to another. The scale below is therefore offered only as a reference, to help you understand what the Kappa test does; please do not treat it as a best practice. A short computational sketch follows the scale.
Kappa measures the strength of agreement of the row and column variables, which typically represent the same categorical rating variable as applied by two raters to a set of subjects or items. Note that the minimum value of kappa, when there is complete disagreement, is negative. When there is perfect agreement, all cell counts off the diagonal are 0 and kappa is 1. Kappa is zero when there is no more agreement than would be expected under independence of the row and column variables. Landis and Koch (Biometrics, 1977) give this interpretation of the range of kappa:
- <= 0: Poor
- 0.00-0.20: Slight
- 0.21-0.40: Fair
- 0.41-0.60: Moderate
- 0.61-0.80: Substantial
- 0.81-1.00: Almost perfect
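To make this concrete, here is a minimal sketch, assuming scikit-learn is available; the two rating arrays are made-up illustrative data, and the thresholds are simply the Landis-Koch cutoffs listed above.

```python
# A minimal sketch: compute Cohen's kappa for two raters and map it onto
# the Landis-Koch scale. The ratings below are made-up illustrative data.
from sklearn.metrics import cohen_kappa_score

# Binary ratings from two raters on the same 10 subjects (hypothetical).
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(rater_a, rater_b)

# Landis-Koch interpretation (reference only, not a best practice).
if kappa <= 0:
    label = "Poor"
elif kappa <= 0.20:
    label = "Slight"
elif kappa <= 0.40:
    label = "Fair"
elif kappa <= 0.60:
    label = "Moderate"
elif kappa <= 0.80:
    label = "Substantial"
else:
    label = "Almost perfect"

print(f"kappa = {kappa:.3f} ({label})")
```

With these made-up ratings, observed agreement is 0.8 and chance agreement is 0.52 (both raters give 60% positives), so kappa = 0.28 / 0.48 ≈ 0.583, which the scale labels "Moderate".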