http://people.dbmi.columbia.edu/homepages/chuangj/kappa/
To assess the accuracy of any particular measuring 'instrument', it is usual to distinguish between the reliability of the data collected and their validity. Reliability is essentially the extent of the agreement between repeated measurements, and validity is the extent to which a method of measurement provides a true assessment of that which it purports to measure. When studying the variability of observers' categorical ratings, two components of possible lack of accuracy must be distinguished. The first is inter-observer bias, which is reflected in differences in the marginal distributions of the response variable for each of the observers (Cochran's Q-test is the appropriate test for the hypothesis of no inter-observer bias). The second is observer disagreement, which is indicated by the extent to which observers classify individual subjects into the same category on the measurement scale (the Kappa coefficient is one of the most common approaches). In this part, we will focus on the Kappa coefficient (or Kappa statistic).
Kappa statistic: an index that compares the observed agreement against the agreement that might be expected by chance alone. Kappa can be thought of as the chance-corrected proportional agreement, and possible values range from +1 (perfect agreement) through 0 (no agreement above that expected by chance) to -1 (complete disagreement).
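Expressed as a formula, with the observed and chance agreement calculated as in the worked example below:

Kappa = (observed agreement - chance agreement) / (1 - chance agreement)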
Hypothetical Example: 29 patients are examined by two independent doctors (see the table below). 'Yes' denotes that the patient is diagnosed with disease X by a doctor; 'No' denotes that the patient is classified as free of disease X by a doctor.
|          |       | Doctor A: No | Doctor A: Yes | Total      |
|----------|-------|--------------|---------------|------------|
| Doctor B | No    | 10 (34.5%)   | 7 (24.1%)     | 17 (58.6%) |
| Doctor B | Yes   | 0 (0.0%)     | 12 (41.4%)    | 12 (41.4%) |
|          | Total | 10 (34.5%)   | 19 (65.5%)    | 29         |
Observed agreement = (10 + 12)/29 = 0.76

Chance agreement = 0.586 * 0.345 + 0.414 * 0.655 = 0.474 (for each category, the product of the two doctors' marginal proportions, summed over both categories)

Kappa = (0.76 - 0.474)/(1 - 0.474) = 0.54
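As a quick check of the arithmetic, here is a minimal sketch in plain Python that recomputes kappa directly from the 2x2 counts in the table above (small differences from the hand calculation are due to rounding):

```python
# Cohen's kappa for the hypothetical 2x2 example above, computed in plain Python.
# The counts are taken directly from the table (rows: Doctor B, columns: Doctor A).

table = [
    [10, 7],   # Doctor B "No":  Doctor A No = 10, Doctor A Yes = 7
    [0, 12],   # Doctor B "Yes": Doctor A No = 0,  Doctor A Yes = 12
]

n = sum(sum(row) for row in table)                 # total patients = 29
observed = sum(table[i][i] for i in range(2)) / n  # (10 + 12) / 29 ~ 0.76

# Chance agreement: for each category, multiply Doctor B's marginal proportion
# by Doctor A's marginal proportion, then sum over the categories.
row_totals = [sum(row) for row in table]           # Doctor B margins: 17, 12
col_totals = [sum(col) for col in zip(*table)]     # Doctor A margins: 10, 19
chance = sum((row_totals[k] / n) * (col_totals[k] / n) for k in range(2))  # ~ 0.473

kappa = (observed - chance) / (1 - chance)         # ~ 0.54
print(f"observed = {observed:.3f}, chance = {chance:.3f}, kappa = {kappa:.2f}")
```

If the two doctors' per-patient ratings are available as lists rather than a pre-tabulated table, scikit-learn's sklearn.metrics.cohen_kappa_score should give the same value.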