Coffee quality is an extremely elusive concept, despite the rigorous quality scoring protocol of the Specialty Coffee Association (SCA). What is quality really, and to whom?

For many years, coffee quality has been evaluated with a strict grading protocol in which expert judges score the quality of specific product attributes. However, such protocols require extensive training of a small panel, yet they seem to produce inconsistent results when applied worldwide. Judge calibration and reproducibility are low, and worst of all, the voice of the consumer is completely disregarded. Let’s see if we can understand the issue a little better.

Our scientific publication on the lack of calibration between coffee experts can be downloaded for free here:

Inter-rater reliability of ‘clean cup’ scores by coffee experts
in Journal of Sensory Studies, Wiley

Quality scoring in the coffee industry

The SCA cupping protocol aims to provide an accurate assessment of coffee quality. More precisely, “The purpose of this cupping protocol is the determination of the cupper’s perception of quality” (SCA, 2019, https://sca.coffee/research/protocols-best-practices). A long list of attributes is evaluated on a quality scale, based on the judge’s previous experience. The layout of the form is shown below.

The form is mainly used to give farmers feedback on the quality of their coffee and to determine appropriate prices, but it is also commonly used in competitions, in company quality control and in green coffee importing. It is a core element of the SCA Sensory Skills programme, where trainers must devote several hours to teaching students both the setup and the practical use of the protocol.
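To make the arithmetic behind a final cupping score concrete, here is a minimal sketch. It assumes the point rules of the SCA cupping form as we understand them: seven attributes scored directly by the judge (up to 10 each), three attributes (uniformity, clean cup, sweetness) worth 2 points per qualifying cup over five cups, and defects subtracted at intensity 2 per tainted cup and 4 per faulty cup. The names `CuppingSheet` and `final_score` are hypothetical, and the rules should be checked against the official protocol at the SCA link above before relying on them.

```python
from dataclasses import dataclass

# Assumption: attribute list and point rules follow our reading of the SCA form.
QUALITY_ATTRIBUTES = (
    "fragrance_aroma", "flavor", "aftertaste", "acidity",
    "body", "balance", "overall",
)
CUPS_PER_SAMPLE = 5


@dataclass
class CuppingSheet:
    quality: dict[str, float]              # judge-scored attributes, up to 10 each
    uniform_cups: int = CUPS_PER_SAMPLE    # cups judged uniform, 2 points each
    clean_cups: int = CUPS_PER_SAMPLE      # cups judged clean, 2 points each
    sweet_cups: int = CUPS_PER_SAMPLE      # cups judged sweet, 2 points each
    taint_cups: int = 0                    # cups with a taint (intensity 2)
    fault_cups: int = 0                    # cups with a fault (intensity 4)


def final_score(sheet: CuppingSheet) -> float:
    """Hypothetical helper: total of attribute points minus defect points."""
    missing = set(QUALITY_ATTRIBUTES) - set(sheet.quality)
    if missing:
        raise ValueError(f"missing attribute scores: {sorted(missing)}")
    counts = (sheet.uniform_cups, sheet.clean_cups, sheet.sweet_cups,
              sheet.taint_cups, sheet.fault_cups)
    if any(not 0 <= c <= CUPS_PER_SAMPLE for c in counts):
        raise ValueError(f"cup counts must lie between 0 and {CUPS_PER_SAMPLE}")
    # Seven attributes scored directly from the judge's perception.
    total = sum(sheet.quality[name] for name in QUALITY_ATTRIBUTES)
    # Uniformity, clean cup and sweetness: 2 points per qualifying cup.
    total += 2 * (sheet.uniform_cups + sheet.clean_cups + sheet.sweet_cups)
    # Defects: intensity 2 per tainted cup, 4 per faulty cup.
    defects = 2 * sheet.taint_cups + 4 * sheet.fault_cups
    return total - defects


sheet = CuppingSheet(quality={name: 7.5 for name in QUALITY_ATTRIBUTES})
print(final_score(sheet))  # 82.5
```

Note that most of the points come from attributes the judge scores from personal perception, so a judge’s view of quality feeds straight into the final number.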

The voice of the consumer

… is non-existent in this form of quality grading. This does not necessarily make it a bad tool, but it severely restricts the potential use scenarios for the cupping form, particularly in product development.
The final quality score of a coffee is based on the judges’ personal perception of quality and their previous experience; in other words, it is extremely subjective. By definition, a subjective judgement means that we are fundamentally unlikely to agree on what good quality is. Calibrating judges on a subjective quality concept goes against the very principle of a subjective evaluation, just as you would never try to calibrate consumers into liking the same product in a consumer test. It is certainly possible to calibrate on objective characteristics of high-quality products, but that is not the approach of the SCA Cupping Protocol.

Judges can, of course, show decent alignment in their scores or in the rank order of quality after a lot of training. Even then, the cupping score tells you little about whether a consumer will like the coffee. It shows you how a select group of individuals scored the coffees based on their experiences and preferences, which may or may not be useful to you.
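One common way to put a number on how well judges agree on the rank order of several coffees is Kendall’s coefficient of concordance, W, which runs from 0 (no agreement) to 1 (identical rankings). This is a minimal sketch, not necessarily the statistic used in the publication linked above; the helper names are hypothetical, tied scores get average ranks, and the tie correction to W is deliberately omitted for brevity.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(scores: list[list[float]]) -> float:
    """scores[j][i] is the score judge j gave coffee i (no tie correction)."""
    m = len(scores)
    n = len(scores[0]) if scores else 0
    if m < 2 or n < 2 or any(len(row) != n for row in scores):
        raise ValueError("need at least 2 judges scoring the same 2+ coffees")
    rank_rows = [average_ranks(row) for row in scores]
    # Sum of ranks each coffee received across all judges.
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three judges who rank three coffees identically, despite different scores.
print(kendalls_w([[80, 84, 86], [79, 83, 88], [81, 85, 87]]))  # 1.0
```

The example also shows why rank agreement and score agreement are different things: the judges disagree on the exact scores but still reach W = 1.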

Diversity in evaluations

The protocol works perfectly for showcasing the diversity of opinions on coffee quality. A perfect example comes from the SCA Coffee Freshness Handbook.
The handbook presents fantastic and relevant data on chemical markers of coffee aging. Investigating coffee freshness using the cupping form is a different story, though…
The quality of three coffees was evaluated over time to show how quality decreases as freshness declines. The three coffees were expected to score between 83 and 86 at baseline, yet the test revealed something else.

‘Experienced’ cuppers, who had been using the protocol professionally at least three times per week, were recruited to take part in the experiment, and the result of the first test is highlighted by the red circle. See anything interesting?
The spread of the scores was remarkable. Scores for the same (!) coffee ranged from around 65, far below the 80-point threshold for specialty coffee, all the way up to 88. Suffice it to say that there was no calibration between the cuppers. At least 50% of the cuppers appear to have scored the coffees below 80, meaning that they did not rate them as specialty coffees! This is a serious issue for many reasons, and it truly makes me question the method.
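To put numbers on a spread like this, the scores that different cuppers gave a single coffee can be summarised as below. This is a minimal sketch: `summarise_spread` is a hypothetical helper, the 80-point cut-off is the specialty threshold mentioned above, and the scores in the usage example are made up for illustration, not taken from the handbook.

```python
from statistics import median

SPECIALTY_THRESHOLD = 80  # scores of 80 and above count as specialty


def summarise_spread(scores: list[float],
                     threshold: float = SPECIALTY_THRESHOLD) -> dict[str, float]:
    """Summarise how far apart cuppers' scores for one coffee are."""
    if not scores:
        raise ValueError("need at least one score")
    below = sum(1 for s in scores if s < threshold)
    return {
        "min": min(scores),
        "max": max(scores),
        "range": max(scores) - min(scores),
        "median": median(scores),
        # Fraction of cuppers who did not rate the coffee as specialty.
        "share_below_threshold": below / len(scores),
    }


# Illustrative, made-up scores for one coffee (NOT the handbook data).
print(summarise_spread([65, 72, 78, 81, 88]))
# {'min': 65, 'max': 88, 'range': 23, 'median': 78, 'share_below_threshold': 0.6}
```

Reporting the share below the threshold alongside the range makes it explicit how many cuppers would have excluded the coffee from the specialty category altogether.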

Be critical

The intention here is not to point fingers at the assessors who participated in the test or at the people behind it, but to highlight the fact that we do NOT agree on what good quality is, and that the protocol is NOT a universal tool for everything taste-related in the industry. This is, above all, a fundamental problem with the method.

The protocol is essentially a consumer test of specialty coffee, run on a very select audience (= specialty coffee people), and not an objective quality evaluation. This is extremely important to keep in mind, as ‘expert’ cuppers are drastically different ‘consumers’ from the customers in your shop.
As the data above show, there is no clear consensus on what good quality is, even among ‘calibrated’ experts, and including non-experts or ordinary consumers would likely spread the scores even further. So what’s quality to you?

TL;DR

The SCA Cupping Protocol determines the cupper’s perception of quality
The protocol and the resulting scores reflect the judges’ perception of quality
The judges’ perception of quality is likely to differ from that of your customers
Quality evaluations are subjective, and likely to vary greatly even between ‘expert’ cuppers
You need to be critical of cupping score information. How is it of use to you?