Quality does not sell itself

Great to have you here! In this blog post I want to share some background on the studies we did on behavioral economics, as a lot of things were going on that led to the studies that are now published in The British Food Journal (http://www.emeraldinsight.com/doi/pdfplus/10.1108/BFJ-03-2016-0127), Cafe Europa Magazine (September 2016) and Reco (https://youtu.be/3Jb03RWYrQ4).

The first study was carried out by Imane Bouzidi, with Thomas Zoëga Ramsøy from the Decision Neuroscience Research Group at Copenhagen Business School (now Neurons Inc (http://neuronsinc.com/)) and myself as supervisors. This study is explained in detail in SCAE’s members archive, but here is a summary of the research design and the results.

A high quality (HQ) and a low quality (LQ) coffee were selected (a premium coffee from Kontra and a commodity coffee called Artnok, which is Kontra’s commodity range (Kontra spelled backwards!)) and served to random customers in a shopping centre in Copenhagen. The coffee was served in cups with brand labels to influence the customer cognitively through the brand equity (https://en.wikipedia.org/wiki/Brand_equity), but the cups did not contain coffee from any of those brands, just either the HQ or the LQ based on random selection, as seen in the figure below.


Before tasting the coffees the customers filled out a questionnaire about their expectations for each coffee based on the brand, and later they rated the coffees after having tasted them. After the tasting, the amount consumed from each cup was measured, and then they were allowed to choose a coffee to take with them as a small thank-you for their time. So in conclusion the effect measures in the study were:

  1. Brand expectations (‘liking’ [conscious])
  2. Rating of coffee samples (‘liking’)
  3. Measure amount consumed (‘wanting’ [sub-conscious])
  4. Final choice of coffee brand (‘behaviour’)

A summary of the results

  1. High brand equity gave
    1. higher tasting scores
    2. lower difference between HQ and LQ scores
  2. Sensory Scores: LQ was preferred! (P< 0,001)
  3. Consumption: LQ was preferred! (P< 0,001)
  4. HQ was preferred without milk

So if the brand had high brand equity, people scored the coffee higher when tasting it (1a) but also distinguished less between HQ and LQ (1b), both of which might be expected. Slightly more surprising (and a bit of a disappointment for a specialty coffee professional) was how strongly the data showed that consumers preferred the low quality coffee (points 2 and 3, both highly significant with P<0,001). But the real surprise and source of wonder for me was that, despite 2 and 3, consumers clearly pointed to the HQ when asked which coffee they could enjoy without milk! Although the data cannot support it directly, this gave me a hint for a hypothesis: the consumers preferred the LQ out of habit, but when asked which coffee they could drink without milk they were able to taste that the HQ had fewer of the unpleasant flavours they would want to remove, which in my mind is what milk does. I believe that there is a physiological aversion response to the unpleasant flavours in coffee that we in the specialty coffee business do our best to remove by selecting defect-free green beans, roasting slowly to avoid burnt and bitter flavours, and brewing less aggressively (20% extraction rather than the 30% that is the norm in commodity coffee). You can get used to these bad flavours to the degree that you develop a preference for the LQ when offered a choice between HQ and LQ, but you are still able to recognize that the HQ is the most pleasant to drink if you are not adding milk.

This led us to another study that is strictly a pilot, in the sense that we did not have a big enough cohort of subjects; we just wanted to make a small test that we could do in a few hours to get ideas for future studies. At this point Thomas Ramsøy had left Copenhagen Business School to start his own consumer research company Neurons Inc. (neuronsinc.com), and I was lucky enough to meet Toke Fosgaard (https://dk.linkedin.com/in/tokefosgaard), who is now my playmate when it comes to studies in behavioral economics. Toke Fosgaard, Ida Steen and I had a cohort of 11 of Toke’s students aged 22 to 28, and we selected a high quality coffee and a low quality coffee. For this study we knew that we did not have enough consumers, so in order to increase the probability of getting useful data we selected an extreme HQ and an extreme LQ. The HQ coffee was one of my favorites, namely Coffee Collective’s (http://coffeecollective.dk/da/) Kenya, and the LQ was the coffee from the 20 liter batch brewer in the university canteen, which is for sure the worst green, roasted in no time and extracted from here to hell, as Ida and I could confirm by tasting it. It is a strategic decision whether to choose an HQ and an LQ within a very similar flavour range or an HQ that goes far beyond the traditional LQ/commodity flavour profile. LQ is traditionally rich in bitterness, chocolate, nutty and other non-fruity flavours, whereas HQ chosen from the elite roasteries is rich in acidity and fruitiness, which the average consumer considers strange. In Imane’s study the HQ was chocolaty and nutty and not acidic, so the consumers could concentrate on the quality of the beans rather than being confused by low bitterness and high acidity and fruitiness. But in this study we wanted to go from extreme LQ to extreme HQ, which led us to the above decisions on samples.

So with the samples at hand we had a room where the students would come in one by one, and we alternated between the two setups described below so that half of the students would experience one setup and the other half would experience the other.

Setup 1: Served with the full sales pitch


In setup 1 we prepared two cupping setups, one for me and one for the student (consumer), each with one cup of the HQ and one cup of the LQ. I presented myself as an external lecturer at Food Science with coffee as my full focus area, and I told them about my involvement in SCAE education and research, my many years as a consultant worldwide choosing high quality green coffee, and how I design product ranges for clients, so that it was very clear to the student that I was an international authority on coffee quality. After that introduction we did a cupping where I took my time to point out specifically everything about the low quality coffee that I did not like and that was a consequence of rotten beans, a cheap and fast roast profile and an outrageously bad brew. I also pointed to all the nice, elegant and juicy notes of the HQ, with no attic/basement off-flavours and no burntness or bitterness, and we went back and forth between the samples to make sure they really tasted for themselves all the bad stuff about the LQ and all the good stuff about the HQ.

After the introduction pitch and the thorough tasting the students were told that they could choose to get one full cup of coffee to go of one of the two coffees as a small gift for their time, and this final choice was the ‘endpoint’ for this study since it was a study in Behavioral Economics where you measure behavior rather than asking for opinion. As a small little extra endpoint we did ask them what they liked about the coffee they chose.

Setup 2: Served with no comments


In setup 2 the HQ and LQ coffees were poured into two cups, and when the student entered we did not tell them anything about the coffee at all. We only told them that we would like them to taste from both cups and choose which one they would prefer to have as a free gift for the time they spent on this study. We also asked them what they liked about the coffee they chose.

So what were the results? (drum roll please..)

Which coffee would you like to walk away with?


When the students had the sales pitch, where I did EVERYTHING I COULD to heavily nudge them to prefer the HQ, 67% still chose to WALK AWAY WITH THE LQ!!! They were nudged by my pitch about myself and the coffees to the degree that most of them excused themselves when choosing the LQ in front of me. That was really interesting, since even the embarrassment of openly choosing the LQ in front of an expert was not enough to shift their preference away from the LQ!

Again, with only 11 consumers in this study we can’t really calculate any valid statistics, but I still think it is surprising that 11 university students do not have a higher preference for HQ, since I would expect this part of the population to be specialty coffee drinkers. Since the statistics could not really be relied on, we found it interesting to hear the students’ comments when tasting and choosing between the LQ and HQ:

“I just really like a black coffee [the LQ]!”

“I like strong coffee [the LQ]”

“It [the HQ] does not taste like coffee”

“It [the HQ] tastes like tea”

“Is it [the HQ] a thin version of the canteen coffee?”

“This [the HQ] is not coffee this is something else”

These comments are really interesting, I think. They point to the extreme HQ as being outside the category of coffee for these consumers, which is often what I experience when people are new to the specialty coffee culture. It is one of the things I keep a keen eye on when I, as a consultant, help new roasteries design a product range (Online Lean Startup Process, https://coffee-mind.com/product/onlineleanstartup/). I try to make my clients choose a product range where they can show their customers something new without pushing them off the cliff. That takes careful preference mapping with surveys, focus groups and consumer studies, since what counts as ‘too light roast’ differs between areas of the world, and even between city and countryside within one country.


Sensory Science and Common Business Practices

Last Wednesday CoffeeMind held a presentation at Square Mile Coffee Roasters in London. The presentation focused on quality control and how to improve sensory skills.

Presented by Ida Steen, MSc in Sensory Science from the Department of Food Science in Copenhagen, and Morten Münchow, Lecturer in Food Science at the University of Copenhagen, this two-hour presentation inspired the participants to take a more scientific approach to their quality control program and new product development, and showed possible approaches to judging their own sensory skills. Through an introduction to statistics and a brief overview of different sensory methods, we showed the different biases and sources of random decisions that you face as a cupper. We explained the principles behind our innovative sensory training program as well as some quick methods to develop a more evidence-based approach to quality control and product development.

Over the summer Square Mile Coffee Roasters will host a series of courses focused on sensory training in coffee, and this event explained the principles behind the research. We will run the first of these courses already on the 11th and 12th of May. The course will focus on your skills as a taster in a highly innovative way, so that you will be trained directly based on your strengths and weaknesses to speed up the development of your personal skills.

Even though we live and breathe coffee, the focus on your skills as a taster makes this course relevant and applicable for people in other areas of food and drink such as beer, wine, spirits, chocolate etc.

See the presentations: Statistics and Sensory methodology


Roast profile analysis

This blog post sketches out the basics of a roast profile analysis and introduces the concepts and basic calculations that are part of the exam for the Roasting Professional in the SCAE Coffee Diploma System.

A roast profile is a graph that shows the temperature development during a roast cycle, and preferably both bean and air temperature are measured and logged. An idealized profile is illustrated here:


During a roast you have different events, and the next illustration plots in these events:


As you can see, when the green coffee at room temperature is added to a preheated roaster the temperature drops quickly, but after some time you get a turning point where the temperature starts rising rather than falling. After the turning point comes the period of the roast where the temperature increases fastest. The speed of the roast is often called Rate of Rise (RoR) in the business, and the maximum rate of rise is a good thing to log; it will be explained in detail later in this post. After a while you get 1st crack, and the last event is when the roast ends. The ‘development’ time is the time from 1st crack to the end of the roast.
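As a minimal sketch of how these events could be read from a logged profile: the turning point is simply the lowest logged bean temperature, and the development time is the end time minus the 1st crack time. The log values, the 1st crack time and the function names below are made up for illustration and do not come from a real roast.

```python
# Hypothetical roast log: (time in minutes, bean temperature in °C).
# All numbers are invented for illustration only.
roast_log = [
    (0.0, 200.0),
    (0.5, 130.0),
    (1.0, 95.0),
    (1.5, 90.0),
    (2.0, 98.0),
    (4.0, 140.0),
    (6.0, 170.0),
    (8.0, 190.0),
    (9.0, 197.0),
    (10.5, 206.0),
]
first_crack_time = 9.0  # assumed: noted by ear during the roast


def turning_point(log):
    """Return the (time, temperature) sample with the lowest bean temperature."""
    return min(log, key=lambda sample: sample[1])


def development_time(log, first_crack):
    """Time in minutes from 1st crack to the last logged sample (end of roast)."""
    end_time = log[-1][0]
    return end_time - first_crack


tp_time, tp_temp = turning_point(roast_log)
dev_time = development_time(roast_log, first_crack_time)
print(f"Turning point: {tp_temp} °C at {tp_time} min")
print(f"Development time: {dev_time} min")
```

With a coarse log like this the turning point is only as precise as the logging interval, so a real logger with readings every few seconds would give a better estimate.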

The temperature difference between air and bean is an interesting measure since it gives an indication of how much convection drives the roast at any part of the roast process. I recommend calculating the temperature difference at the turning point, at 1st crack and at the end, as shown in the illustration below:


Another interesting reading from the graph is the rate of rise, as shown below:


The rate of rise is the speed of the roast (degrees per minute) at any given time in the roast. Three relevant points in the roast are defined above, namely the max RoR, the RoR at 1st crack and the RoR when the roast finishes.
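If the profile is logged as numbers rather than read from a graph, the RoR can be approximated as the temperature change between two consecutive readings divided by the time between them. The sketch below assumes a hypothetical log with one reading per minute; the values and the function name are illustrative only.

```python
def rate_of_rise(log):
    """Approximate RoR in °C per minute between consecutive log samples.

    `log` is a list of (time in minutes, bean temperature in °C).
    Returns a list of (midpoint time, RoR) pairs.
    """
    rates = []
    for (t0, temp0), (t1, temp1) in zip(log, log[1:]):
        rates.append(((t0 + t1) / 2, (temp1 - temp0) / (t1 - t0)))
    return rates


# Hypothetical bean temperature readings after the turning point.
log = [(2.0, 100.0), (3.0, 118.0), (4.0, 134.0),
       (5.0, 148.0), (6.0, 160.0), (7.0, 170.0)]

rates = rate_of_rise(log)
max_time, max_ror = max(rates, key=lambda pair: pair[1])
print(f"Max RoR: {max_ror} °C/min around {max_time} min")
```

The same list can be looked up at the 1st crack time and at the end of the roast to get the other two RoR values.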

If you understand everything so far, there is no need to read further unless you are preparing for the Roasting Professional Certification exam, where the calculations are part of the certification process.

Geometrically a rate is the slope of a tangent at a given point on a curve:


A tangent only touches the curve in one point, and it is exactly in this point that the tangent says something interesting about the slope, namely how quickly the curve ‘changes’ in this point. The tangent represents the speed of the curve in this particular point, so we would like to calculate the inclination of the tangent, because it equals the inclination of the curve in that point. For a roast profile, the tangent at any given point therefore shows the speed of the roast at that point.

To measure the slope of a tangent you can just choose any range of the tangent, like in this example:


On the above illustration we have °C on the vertical axis and time on the horizontal axis, just like on a roast profile. As an example, the slope is calculated from how much the temperature rises in 3 minutes. As you can see, this specific tangent rises 42°C during 3 minutes, which gives 14 degrees per minute. This is the Rate of Rise of the curve (the circle in this example) in exactly the point where the tangent touches the curve.

If we choose to calculate the RoR at another point of the curve it could be a little later in the process, like this:


Here you can see that the slope of the curve is different and the inclination of the tangent in the point is less steep. Here the time period is chosen to be 4 minutes, and during the 4 minutes the temperature rises by 22°C, giving a Rate of Rise of 22°C / 4 minutes = 5,5°C per minute. I prefer to always look at the inclination of the tangent over a 4 minute period because I find it easy to divide the corresponding temperature range by 4 in my head.

But let us look at an example that is relevant to the RoR of a roast profile:


As with the other tangents, we would like to know how many °C the temperature changes during the 4 minutes of the tangent that represents the inclination of the curve in the one point where the tangent touches the curve. The challenge here is that the tangent is completely horizontal, so there is no change in temperature, which means that the temperature difference is 0! If we divide 0 by 4 we still have 0, so the RoR of the curve in the point where the tangent touches the curve is 0. In other words: the temperature is not going up anymore, so the roast has stalled, which the RoR announces by taking on the value 0.

There is one last state of the RoR worth a mention and that is when the slope of the tangent is negative:


When the curve starts to go down after having stalled, the RoR becomes negative. When you consider a change you always calculate it by subtracting the initial state from the resulting state, and if the process has gone in the reverse direction you subtract a bigger number from a smaller number and the result is negative (see the blogpost about change for a deeper explanation). So here it is -22°C / 4 min = -5,5°C/min. A negative RoR is in general something you don’t want in your roast profile. For the last few seconds of the roast some coffees work with a very low RoR or even a stalling RoR (= 0), but only in a few profiles would you ever find a negative RoR at the end of the roast!
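The four tangent examples above all use the same calculation, temperature change divided by elapsed time, so they can be checked with a few lines of code. The function name is just an illustration; the numbers are the ones from the examples in this post.

```python
def tangent_slope(temp_change_c, minutes):
    """Rate of Rise in °C per minute: temperature change divided by elapsed time."""
    return temp_change_c / minutes


# The four cases discussed in the text.
examples = {
    "steep tangent (42 °C in 3 min)": tangent_slope(42, 3),
    "flatter tangent (22 °C in 4 min)": tangent_slope(22, 4),
    "stalled roast (0 °C in 4 min)": tangent_slope(0, 4),
    "falling temperature (-22 °C in 4 min)": tangent_slope(-22, 4),
}

for label, ror in examples.items():
    print(f"{label}: RoR = {ror} °C/min")
```

The sign of the result carries the meaning: positive while the roast is progressing, zero when it stalls, negative when the temperature drops.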

So to conclude this post the following is an illustration of all the main points of a roast profile:


Variation analysis of roast colour measurement

This article basically gives you a tool to answer these two crucial questions concerning your way of handling colour measurement equipment:

  1. Which grind size gives me the best results?
  2. How many replicates of each sample are ‘enough’?

I recently had the pleasure of roasting coffees for a university project where consistency in the roast was key to minimizing variation in the overall research setup, and since I had just got my new colour measurement apparatus I did not know the answers to the above questions.

My new colour measurement apparatus is the JAVALYTICS™ JAV-RDA-D, which suits my needs well in my lab. One handy thing about it is that the makers did not invent a new arbitrary measurement scale but have made a great effort to work with different roast degree scales, so it is quite versatile in its ability to synchronize with other brands of measurement standards.


One of the most important aspects to standardize was the colour of the roasted beans. The beans happened to be extremely dry (5% moisture) and extremely non-uniform in terms of size and even moisture, so colour measurement of the final ground product was important to establish some consistency in an otherwise quite diverse setup! And to determine how precisely I was working with such a diverse product I needed to do some calculations.

The technology sends light to a ground sample of the coffee and measures how much is reflected. So the number gets SMALLER when the sample is DARKER as less light is reflected on a dark roast and vice versa for a light roast. If light is not reflected into the sensor it could be that

  1. it is absorbed by the darkness of the sample or
  2. it can be reflected but not perpendicular to the surface and then it is lost on the sides inside the measurement chamber

You would like to avoid (2), as this is an error that does not reflect the darkness of the sample but a flaw in the setup. If the grounds are too fine you get a ‘hilly terrain’ on the surface of the sample when you try to smooth it out, and the hillsides send the light into category (2). At the other extreme you have big grounds that also reflect light ‘non-perpendicularly’ and so also lose a lot of light in category (2).

So the best grind size is in between the two extremes. But practically where?

To answer this question before starting to roast the samples, I chose a coffee that was roasted as a filter coffee (around 72 on the Agtron Gourmet scale, as you can see in the spreadsheet below) and measured samples at different grind sizes. You can see the result in this spreadsheet:


(If you have problems with the functions it might be because I have a Danish MS Excel where they, for ridiculous reasons, have translated the function names to Danish! If so, you might be able to get help translating the names of the Danish functions to English here)

So I measured 5 replicates of 4 different grind sizes but the document is prepared to do 10 different grind sizes and 8 measures per grind size. You can just add your own values in the spreadsheet and get the results immediately.

I have the most important value on the left, and it is generated by information on the right-hand side of the document, so let me explain it starting from the right side. ‘Measure 1 … Measure 8’ are the individual measures that you do on each grind size (replicates per grind size). You can choose the number of measures as you want, but I would recommend more than 3 to get more precise results. The ‘number of samples’ column is just a trivial count of the number of measures you do on each grind size, which is needed by the function that calculates the 95% confidence interval (explained in a minute..). The first numbers to calculate are the average reading of the colour measurements for the grind size and the standard deviation. The standard deviation indicates how big the variation is at each grind size, and we are looking for the grind size that gives the most precise (least variation) results. From the numbers and the graph we can conclude that we get the most precise readings on grind size 2. But we still don’t know how precise it is.
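The same first step can be sketched outside the spreadsheet. The readings below are invented for illustration (they are not the values from the spreadsheet), and the grind size numbering is only a label; the idea is simply to compute the average and the sample standard deviation per grind size and pick the grind size with the smallest standard deviation.

```python
from statistics import mean, stdev

# Hypothetical replicate readings per grind size (invented numbers).
readings = {
    1: [70.1, 71.0, 70.4, 71.2, 70.6],
    2: [71.8, 72.1, 71.9, 72.2, 72.0],
    3: [73.0, 74.2, 72.6, 73.9, 73.3],
    4: [74.5, 75.9, 74.0, 75.2, 76.1],
}

# Average and sample standard deviation for each grind size.
summary = {
    grind: (mean(values), stdev(values))
    for grind, values in readings.items()
}

# The grind size with the least variation between replicates.
best_grind = min(summary, key=lambda grind: summary[grind][1])

for grind, (avg, sd) in summary.items():
    print(f"Grind {grind}: average {avg:.2f}, standard deviation {sd:.2f}")
print(f"Least variation at grind size {best_grind}")
```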

In itself the standard deviation does not lend itself to useful practical interpretation other than comparing variation between sample pools, so this is where the 95% confidence interval comes in. Based on the given samples, it tells you the interval in which you would, with 95% certainty (which is conventionally treated as sufficient certainty), estimate the ‘real value’ of the colour of the coffee to be. The 95% confidence interval depends on the variation and the number of replicates, and it gets more precise as the variation gets smaller and/or if you take more replicates. So with the given variation we would like to know how many replicates we need in order to get a satisfying precision (a narrow enough 95% confidence interval).

So to determine the width of the 95% confidence interval we have to calculate the lower end (column C) and the upper end (column D) and subtract the lower from the upper, which gives us the width (column B).
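A minimal sketch of the lower end, upper end and width calculation follows. It assumes a normal-approximation interval (average ± z × standard deviation / √n), which is what Excel's CONFIDENCE function computes; the exact formula used in the spreadsheet is not shown in this post, so treat this as an assumption. The readings are invented, and with as few as 3 to 5 replicates a t-based interval would come out somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(readings, level=0.95):
    """Return (lower, upper, width) of a normal-approximation confidence interval."""
    n = len(readings)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * stdev(readings) / sqrt(n)
    centre = mean(readings)
    lower = centre - half_width
    upper = centre + half_width
    return lower, upper, upper - lower


# Hypothetical replicates for one grind size (invented numbers).
replicates = [71.8, 72.1, 71.9, 72.2, 72.0]
lower, upper, width = confidence_interval(replicates)
print(f"95% CI: {lower:.2f} to {upper:.2f} (width {width:.2f})")
```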

As you can see I did 5 replicates per grind size to have enough data, and clearly 5 replicates give sufficiently narrow confidence intervals (sensory relevant differences are in the area of 2-3 on the Agtron Gourmet scale). But would fewer than 5 replicates be enough? To test this I tried to include only 3 replicates, and that gave 95% confidence interval widths of 1.6, 1.2, 2.6 and 2.8 for grind sizes 1, 2, 3 and 4 respectively. That is narrow enough at least for grind #2, so 3 replicates is more than enough (it seems that often just 1 measurement would work at this level of variance). In the lower left graph I have plotted the 95% confidence interval widths for 5 and 3 replicates respectively, which shows the tendency that the confidence interval is always more precise (narrower) with more replicates.
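The relationship between number of replicates and interval width can also be turned around: given a standard deviation and a target width, how many replicates are needed? The sketch below again assumes the normal-approximation formula, width = 2 × z × standard deviation / √n, and uses made-up input values; for very small numbers of replicates it will be slightly optimistic compared with a t-based interval.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def interval_width(sd, n):
    """Width of the 95% confidence interval for n replicates (normal approximation)."""
    return 2 * Z_95 * sd / sqrt(n)


def replicates_needed(sd, target_width):
    """Smallest number of replicates whose 95% interval is at most target_width wide."""
    return ceil((2 * Z_95 * sd / target_width) ** 2)


# Hypothetical example: standard deviation of 1.0 scale units,
# and we want the interval to be no wider than 2.0 units.
print(replicates_needed(1.0, 2.0))
print(f"{interval_width(1.0, 3):.2f} wide with 3 replicates")
print(f"{interval_width(1.0, 5):.2f} wide with 5 replicates")
```

Because the width shrinks with the square root of the number of replicates, halving the width takes four times as many measurements.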

The diagram in the lower right plots the 95% confidence interval around the average value, so here you can see how the estimated ‘real value’ of the roast colour varies with grind size. The measurement method tends to estimate the roast degree as lighter and lighter the coarser you grind. I interpret this as follows: the coarser you grind, the more light is reflected non-perpendicularly and hence lost to the sides in the measurement chamber (I wonder if this could be compensated for if the sides were constructed as a mirror). At very fine grinds the variation is bigger due to the uneven surface, but the average is still measured as darker because much more light is absorbed by the sample itself. I don’t know if these interpretations are correct so please challenge me 🙂

The above variation analysis is based on a standard filter coffee roast and not the actual samples that I needed to measure. I trusted that the optimum grind size also holds for my coffee samples, but I was not sure the variation was the same on my samples as on the standard roasted filter coffee. My challenge was that my samples were dehulled in a German rice polishing factory, because the university had parchment rather than green beans shipped to Denmark, so we had to be creative to get it dehulled. They did a great job, but there was significantly more silverskin on the beans, and as I chose to roast light to expose the differences between the samples more, I struggled with a lot of silverskin in the roasted and ground samples, which added a lot to the variation in the colour measurements. So I needed to check that 3 replicates was still enough. Remember that the precision, i.e. the width of the 95% confidence interval, depends on the number of replicates AND the variation in the readings. The latter depends on the evenness of the roast, and the silverskin gave a lot of variation because it is much lighter than the roasted coffee and is scattered amongst the much darker grounds, which is very visible on the surface. It turned out that even in a system with much more variation, 3 replicates gave a 95% confidence interval width of between 0.6 and 2.4, and even the latter is good enough I think.