Does bottled water actually taste better? Attribute Agreement Analysis

To answer this question, we set up an experiment to see whether people could tell the difference between four types of water: Filtered Tap Water, Fiji©, Zephyrhills©, and a generic brand purchased at 7-11©.

[Photos of the four waters tested: Fiji©, Zephyrhills©, 7-11© Generic, Filtered Tap Water]

At the beginning of the test, each person was given a sample of each of the four waters and told which was which, so they knew how each water tasted. At any time during the test, they were allowed to go back to these reference samples and re-taste them.

After tasting the reference samples, each person was given 12 unmarked cups of water and asked to identify the water in each cup based on its taste and smell. Each of the four waters appeared three times in the study (12 cups total, laid out as shown below).

	12	11	10	9	8	7
	1	2	3	4	5	6
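The article does not say how the brands were assigned to the 12 cups. One common way to keep a blind test like this fair is to shuffle the assignment so a tester cannot infer a brand from a cup’s position. The following is a minimal Python sketch of that idea, assuming a shuffled pool with each brand exactly three times; `assign_cups`, `BRANDS`, and `REPEATS` are hypothetical names, not part of the original study.

```python
import random

# The four waters in the study; each one fills three of the 12 cups.
BRANDS = ["Fiji", "Zephyrhills", "Generic", "Tap"]
REPEATS = 3


def assign_cups(seed=None):
    """Randomly assign brands to cups 1-12, each brand exactly REPEATS times.

    Hypothetical helper: the article does not say how cups were assigned.
    Passing a seed makes the assignment reproducible for record keeping.
    """
    rng = random.Random(seed)
    pool = BRANDS * REPEATS  # new list of 12 entries, 3 of each brand
    rng.shuffle(pool)
    return {cup: brand for cup, brand in enumerate(pool, start=1)}


if __name__ == "__main__":
    # Print an answer key for whoever pours the cups (keep it hidden from testers).
    for cup, brand in assign_cups(seed=2024).items():
        print(f"Cup {cup:2d}: {brand}")
```

Seeding a private `random.Random` instance, rather than the global generator, keeps the answer key reproducible without affecting any other randomness in the program.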

The correct answer for each cup, along with each of the three testers’ answers, is shown below in Table 1.

Table 1. Correct and Chosen Answers for Water Test

Cup #	Actual	Tester #1	Tester #2	Tester #3	% of Testers Correct
1	Generic	Generic	Tap	Fiji	33%
2	Tap	Zephyrhills	Generic	Tap	33%
3	Fiji	Fiji	Fiji	Generic	67%
4	Zephyrhills	Fiji	Fiji	Generic	0%
5	Fiji	Tap	Tap	Zephyrhills	0%
6	Tap	Zephyrhills	Zephyrhills	Tap	33%
7	Generic	Fiji	Fiji	Zephyrhills	0%
8	Zephyrhills	Tap	Generic	Fiji	0%
9	Tap	Tap	Tap	Zephyrhills	67%
10	Generic	Generic	Generic	Generic	100%
11	Fiji	Generic	Zephyrhills	Zephyrhills	0%
12	Zephyrhills	Fiji	Fiji	Zephyrhills	33%
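Overall		33% (4)	25% (3)	33% (4)	8% (1)

In the Overall row, each tester column shows the share (and number) of the 12 cups that tester identified correctly; the last column shows the share of cups that all three testers identified correctly.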

Having each brand appear more than once allows us to test how repeatable each tester is. For example, if a tester correctly identifies the Fiji water the first time but misidentifies it the other two times, the first correct selection was more likely a lucky guess than strong evidence that the tester can tell the waters apart.
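Because the tester percentages are just tallies of Table 1, they can be checked mechanically. The following Python sketch transcribes Table 1 and counts the cups each tester got right, the cups all three testers got right, and, to illustrate the repeatability idea, how many of each brand’s three cups a given tester identified. The function names are illustrative, not part of any tool.

```python
# Table 1 transcribed: (actual, tester1, tester2, tester3) for cups 1-12.
TABLE_1 = [
    ("Generic", "Generic", "Tap", "Fiji"),
    ("Tap", "Zephyrhills", "Generic", "Tap"),
    ("Fiji", "Fiji", "Fiji", "Generic"),
    ("Zephyrhills", "Fiji", "Fiji", "Generic"),
    ("Fiji", "Tap", "Tap", "Zephyrhills"),
    ("Tap", "Zephyrhills", "Zephyrhills", "Tap"),
    ("Generic", "Fiji", "Fiji", "Zephyrhills"),
    ("Zephyrhills", "Tap", "Generic", "Fiji"),
    ("Tap", "Tap", "Tap", "Zephyrhills"),
    ("Generic", "Generic", "Generic", "Generic"),
    ("Fiji", "Generic", "Zephyrhills", "Zephyrhills"),
    ("Zephyrhills", "Fiji", "Fiji", "Zephyrhills"),
]
TESTERS = ["Tester1", "Tester2", "Tester3"]


def matches_per_tester(rows):
    """Count the cups each tester identified correctly."""
    return {
        name: sum(row[0] == row[i] for row in rows)
        for i, name in enumerate(TESTERS, start=1)
    }


def cups_all_correct(rows):
    """Count the cups that every tester identified correctly."""
    return sum(all(answer == row[0] for answer in row[1:]) for row in rows)


def hits_by_brand(rows, tester_index):
    """For one tester (column 1-3), count correct picks among each brand's cups."""
    hits = {}
    for row in rows:
        actual = row[0]
        hits[actual] = hits.get(actual, 0) + int(row[tester_index] == actual)
    return hits


if __name__ == "__main__":
    n = len(TABLE_1)
    for name, k in matches_per_tester(TABLE_1).items():
        print(f"{name}: {k}/{n} = {100 * k / n:.2f}%")
    k = cups_all_correct(TABLE_1)
    print(f"All testers: {k}/{n} = {100 * k / n:.2f}%")
    print("Tester1 hits per brand:", hits_by_brand(TABLE_1, 1))
```

The counts come out to 4, 3, and 4 correct cups for the three testers and 1 cup that everyone got right, which agrees with the “# Matched” columns in the Minitab output below. Tester #1, for example, identified two of the three Generic cups but none of the three Zephyrhills cups.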

To apply statistical analysis to this experiment, we used Minitab’s Attribute Agreement Analysis. For those of you not familiar with this technique, it is a method for determining how well people (appraisers) assign items to the correct category from a list of choices, measured against a known standard and against each other.

Here is the Minitab analysis of the results, summarized to highlight the key points:

Attribute Agreement Analysis for Tester1, Tester2, Tester3
Each Appraiser vs Standard Assessment Agreement
Appraiser	# Inspected	# Matched	Percent	95% CI
Tester1	12	4	33.33	(9.92, 65.11)
Tester2	12	3	25.00	(5.49, 57.19)
Tester3	12	4	33.33	(9.92, 65.11)

# Matched: Appraiser’s assessment across trials agrees with the known standard.

All Appraisers vs Standard Assessment Agreement
# Inspected	# Matched	Percent	95% CI
12	1	8.33	(0.21, 38.48)

# Matched: All appraisers’ assessments agree with the known standard.
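The output does not state how the 95% confidence intervals are computed, but the values shown agree with the exact (Clopper-Pearson) binomial interval for x matches out of 12 cups. Assuming that method, the following Python sketch reproduces them with only the standard library. The helper names are illustrative, and bisection on the binomial CDF stands in for the inverse beta function a statistics package would normally use.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f, iters=100):
    """Bisection on [0, 1] for a function that is positive at 0 and decreasing."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # the crossing point lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf=0.95):
    """Exact two-sided CI for a proportion with x successes in n trials."""
    tail = (1 - conf) / 2
    # Lower bound: the p at which P(X >= x | p) equals the tail probability.
    lower = 0.0 if x == 0 else _root(lambda p: tail - (1 - binom_cdf(x - 1, n, p)))
    # Upper bound: the p at which P(X <= x | p) equals the tail probability.
    upper = 1.0 if x == n else _root(lambda p: binom_cdf(x, n, p) - tail)
    return lower, upper


if __name__ == "__main__":
    # Match counts from the Minitab output (12 cups inspected each).
    for label, matched in [("Tester1", 4), ("Tester2", 3), ("Tester3", 4), ("All", 1)]:
        lo, hi = clopper_pearson(matched, 12)
        print(f"{label}: {100 * matched / 12:.2f}% ({100 * lo:.2f}, {100 * hi:.2f})")
```

The intervals are wide because 12 cups is a small sample: even the tester who matched 4 of 12 is only known, at 95% confidence, to be somewhere between roughly 10% and 65% accurate.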

Fleiss’ Kappa Statistics
Response	Kappa	SE Kappa	Z	P(vs > 0)
Fiji	-0.093322	0.166667	-0.55993	0.7122
Generic	0.259259	0.166667	1.55556	0.0599
Tap	0.323197	0.166667	1.93918	0.0262
Zephyrhills	-0.217105	0.166667	-1.30263	0.9036
Overall	0.066972	0.096912	0.69106	0.2448

* NOTE * Single trial within each appraiser. No percentage of assessment agreement within appraiser is plotted.
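The output does not show how the all-appraiser kappa is built. One reading that reproduces every value in the Kappa column is: for each tester, compute Fleiss’ kappa treating that tester and the known standard as two raters of the 12 cups, then average over the three testers. The overall kappa is (Po - Pe) / (1 - Pe), where Po is the observed agreement and Pe is the agreement expected by chance from the mix of answers, so kappa goes negative whenever agreement falls below chance. The following Python sketch follows that reading and also computes the per-brand SE, Z, and one-sided P columns from the usual null-hypothesis standard error for a category kappa. The overall SE uses a longer formula and is left out. Function names are illustrative; this is not Minitab’s code.

```python
import math

# Table 1 transcribed: (actual, tester1, tester2, tester3) for cups 1-12.
TABLE_1 = [
    ("Generic", "Generic", "Tap", "Fiji"),
    ("Tap", "Zephyrhills", "Generic", "Tap"),
    ("Fiji", "Fiji", "Fiji", "Generic"),
    ("Zephyrhills", "Fiji", "Fiji", "Generic"),
    ("Fiji", "Tap", "Tap", "Zephyrhills"),
    ("Tap", "Zephyrhills", "Zephyrhills", "Tap"),
    ("Generic", "Fiji", "Fiji", "Zephyrhills"),
    ("Zephyrhills", "Tap", "Generic", "Fiji"),
    ("Tap", "Tap", "Tap", "Zephyrhills"),
    ("Generic", "Generic", "Generic", "Generic"),
    ("Fiji", "Generic", "Zephyrhills", "Zephyrhills"),
    ("Zephyrhills", "Fiji", "Fiji", "Zephyrhills"),
]
CATEGORIES = ["Fiji", "Generic", "Tap", "Zephyrhills"]


def kappa_vs_standard(standard, ratings, category=None):
    """Fleiss' kappa with the standard and one appraiser as the two raters.

    category=None gives the overall kappa; a brand name gives that brand's
    one-vs-rest kappa.
    """
    n = len(standard)
    labels = list(standard) + list(ratings)  # 2n ratings in total
    pairs = list(zip(standard, ratings))
    if category is None:
        p_obs = sum(a == b for a, b in pairs) / n  # observed agreement
        p_exp = sum((labels.count(c) / (2 * n)) ** 2 for c in set(labels))  # chance
        return (p_obs - p_exp) / (1 - p_exp)
    p_j = labels.count(category) / (2 * n)
    if p_j in (0.0, 1.0):
        raise ValueError(f"kappa undefined for category {category!r}")
    # With two raters, a cup counts against the category only when exactly
    # one of them chose it.
    split = sum((a == category) != (b == category) for a, b in pairs)
    return 1 - split / (n * 2 * p_j * (1 - p_j))


def mean_kappa(rows, category=None):
    """Average each tester's kappa against the standard (column 0)."""
    standard = [row[0] for row in rows]
    kappas = [
        kappa_vs_standard(standard, [row[i] for row in rows], category)
        for i in range(1, len(rows[0]))
    ]
    return sum(kappas) / len(kappas)


def category_test(rows, category):
    """Kappa, null SE, Z, and one-sided p-value (H1: kappa > 0) for one brand."""
    n, k = len(rows), len(rows[0]) - 1
    kappa = mean_kappa(rows, category)
    # Null SE of a two-rater category kappa is sqrt(2 / (n * 2 * 1)) = sqrt(1 / n);
    # averaging k independent appraisers divides the variance by k.
    se = math.sqrt(1 / (n * k))
    z = kappa / se
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return kappa, se, z, p


if __name__ == "__main__":
    for brand in CATEGORIES:
        kappa, se, z, p = category_test(TABLE_1, brand)
        print(f"{brand:<12} {kappa:9.6f} {se:.6f} {z:8.5f} {p:.4f}")
    print(f"{'Overall':<12} {mean_kappa(TABLE_1):9.6f}")
```

Running it gives kappas of about -0.0933 (Fiji), 0.2593 (Generic), 0.3232 (Tap), and -0.2171 (Zephyrhills), with an overall kappa of about 0.067, matching the table above.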

To summarize the analysis above, the key numbers are the values in the Kappa column. A kappa value greater than 0.7 is considered acceptable, meaning that the testers can reliably pick out that brand from the others. No brand has a kappa value greater than 0.7, and the overall kappa value is only 0.067, so we conclude that the testers cannot tell the brands of water apart. In fact, a kappa value close to zero means the testers did no better than they would have by guessing (random chance). The kappa values for Fiji and Zephyrhills were actually below zero, meaning the testers matched the correct answer less often than random chance would predict; they would have done better by simply guessing.

Bottom line: Stop buying bottled water and just reuse your water bottles (not recommended for long-term use) by filling them with filtered tap water. Not only will it help your pocketbook, but you’ll also help the environment by preventing the creation of new bottles and reducing the transportation costs of getting the bottles to your local store.

Conclusion: So how does this study apply to your company? Most processes collect some kind of data, and typically codes are assigned to designate the type of transaction, the type of defect, or some other reason. Without validating that people can correctly classify items into the right codes, there is a risk that the codes are being used incorrectly and that people are misinformed about what is really going on in the process.

Let’s say you are collecting data on reasons for late payments from your customers. You generate a report that shows the Top 5 reasons for late payments.

Reason	Percentage
Missing Paperwork	33%
Problem with Service Provided	25%
No Reason Provided by Customer	18%
Wrong Information on Invoice	13%
Wrong Amount on Invoice	5%

Naturally, you would start working on the “Missing Paperwork” category, but that assumes you have a good measurement system, one that assigns each late payment to the correct defect code. The only way to know is by performing an Attribute Agreement Analysis. If it does not pass (poor kappa values), you must conclude that the defect codes are not being applied accurately and that their definitions must be clarified to get a “true” picture of which issue to focus on.

Let’s assume that your coding criteria are clarified for your people and the data is cleaned up using these criteria. Now let’s look at the Top 5 issues…

Reason	Percentage
Wrong Information on Invoice	42%
Missing Paperwork	23%
Problem with Service Provided	15%
No Reason Provided by Customer	12%
Wrong Amount on Invoice	5%

As you can see, the order of reasons changed after the criteria were improved, so now you can correctly investigate why there is “Wrong Information on Invoice” instead of the previously reported top problem, “Missing Paperwork.”

Attribute Agreement Analysis gives you confidence that your attribute (coding, pass/fail) data is accurate, so you can make good decisions and prioritize your efforts in the right direction.
