Expert cupping scores are used extensively in the coffee industry, in applications ranging from quality control to judging coffee competitions. In this paper, we examined the inter‐rater reliability (IRR) of “clean cup” ratings by coffee experts (“cuppers”) in two studies. In both studies, IRR was low, denoting a lack of concept alignment among experts. Remarkably, however, within‐assessor reproducibility was high, suggesting that each expert cupper has an individual understanding of “clean cup.”
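To make the contrast between the two reliability measures concrete, the sketch below computes a one-way intraclass correlation, ICC(1,1), on a small invented data set. It is an illustration only: the statistic actually used in the two studies is not stated here, ICC(1,1) is assumed as one common choice, and the scores, the 0 to 10 scale, the cupper labels, and the function name `icc_oneway` are all hypothetical.

```python
from statistics import mean


def icc_oneway(table):
    """One-way random-effects ICC(1,1).

    Each row is one coffee; each column is one repeated score of that
    coffee (different cuppers, or repeat sessions of the same cupper).
    """
    n = len(table)       # number of coffees
    k = len(table[0])    # number of scores per coffee
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(table, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical "clean cup" scores, NOT data from the studies:
# 3 cuppers x 4 coffees x 2 sessions, on an assumed 0-10 scale.
scores = {
    "A": [(10, 9), (8, 8), (6, 6), (4, 5)],
    "B": [(4, 5), (6, 6), (8, 8), (10, 9)],
    "C": [(6, 6), (10, 9), (4, 5), (8, 8)],
}

# Within-assessor reproducibility: does each cupper agree with themselves?
within = {cupper: icc_oneway(s) for cupper, s in scores.items()}

# Inter-rater reliability: do cuppers agree with each other?
# Rows are coffees, columns are each cupper's mean over the two sessions.
n_coffees = len(scores["A"])
inter_table = [[mean(scores[c][i]) for c in scores] for i in range(n_coffees)]
between = icc_oneway(inter_table)

print({c: round(v, 2) for c, v in within.items()})
print(round(between, 2))
```

In this invented data set each cupper repeats their own scores almost exactly across sessions, yet the three cuppers rank the coffees differently, so the within-assessor values come out high while the inter-rater value is low (here even negative). That is the qualitative pattern the paragraph above describes.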
The results suggest that “clean cup” scores are fundamentally subjective. Since cupping scores are routinely used to drive business decisions (particularly in quality control), it would be advisable to anchor such attributes in a precise definition (in the case of “clean cup,” of what constitutes a defect from a sensory point of view) developed from properly conducted sensory studies.