ASTM E2943-15(2021)
Standard Guide for Two-Sample Acceptance and Preference Testing With Consumers
SIGNIFICANCE AND USE
5.1 Acceptance and preference are the key measurements taken in consumer product testing as either a new product idea is developed into testable prototypes or existing products are evaluated for potential improvements, cost reductions, or other business reasons. Developing products that are preferred overall, or liked as well as, or better, on average, compared to a standard or a competitor, among a defined target consumer group, is usually the main goal of the product development process. Thus, it is necessary to test the consumer acceptability or the preference of a product or prototype compared to other prototypes or potential products, a standard product, or other products in the market. The researcher, with input from her/his stakeholders, has the responsibility to choose appropriate comparison products and scaling or test methods to evaluate them. In the case of a new-to-the-world product, there may or may not be a relevant product for comparison. In this case, a benchmark score or rating may be used to determine acceptability. A product or prototype that is acceptable to the target consumer is one that meets a minimum criterion for liking, and a product that is preferred over an existing product has the potential to be chosen more often than the less-preferred product by the consumer in the marketplace, when all other factors are equal.
5.2 The external validity (the extent to which the results of a study can be generalized) of both acceptance and preference measures to manage decision risk at all stages of the development cycle is dependent on the ability of the researcher to generalize the results from the respondent sample to the target population at large. This depends both upon the sample of respondents and the way the test is constructed. Within the context of a single test, acceptance measures tell the relative hedonic status of the two samples, quantitatively, as well as where on the hedonic continuum each of the samples falls, that is, “disliked,”...
SCOPE
1.1 This guide covers acceptance and preference measures when each is used in an unbranded, two-sample, product test. Each measure, acceptance, and preference, may be used alone or together in a single test or separated by time. This guide covers how to establish a product’s hedonic or choice status based on sensory attributes alone, rather than brand, positioning, imagery, packaging, pricing, emotional-cultural responses, or other nonsensory aspects of the product. The most commonly used measures of acceptance and preference will be covered, that is, product liking overall as measured by the nine-point hedonic scale and preference measured by choice, either two-alternative forced choice or two-alternative with a “no preference” option.
1.2 Three of the biggest challenges in measuring a product’s hedonic (overall liking or acceptability) or choice status (preference selection) are determining how many respondents and who to include in the respondent sample, setting up the questioning sequence, and interpreting the data to make product decisions.
1.3 This guide covers:
1.3.1 Definition of each type of measure,
1.3.2 Discussion of the advantages and disadvantages of each,
1.3.3 When to use each,
1.3.4 Practical considerations in test execution,
1.3.5 Risks associated with each,
1.3.6 Relationship between the two when administered in the same test, and
1.3.7 Recommended interpretations of results for product decisions.
1.4 The intended audience for this guide is the sensory consumer professional or marketing research professional (“the researcher”) who is designing, executing, and interpreting data from product tests with acceptance or choice measures, or both.
1.5 Only two-sample product tests will be covered in this guide. However, the issues and recommended practices raised in this guide often apply to multi-sample tests as well. Detailed coverage of execution tactics, optional types of s...
General Information
- Status: Published
- Publication Date: 31-Dec-2020
- Technical Committee: E18 - Sensory Evaluation
- Drafting Committee: E18.04 - Test Methods
Relations
- ASTM E456-13a(2022)e1 - Effective Date: 01-Apr-2022
- ASTM E253-19 - Effective Date: 15-Oct-2019
- ASTM E1958-19a - Effective Date: 01-Aug-2019
- ASTM E1958-19 - Effective Date: 01-Mar-2019
- ASTM E253-18a - Effective Date: 01-Oct-2018
- ASTM E2263-12(2018) - Effective Date: 01-Aug-2018
- ASTM E253-18 - Effective Date: 15-Jun-2018
- ASTM E1958-18 - Effective Date: 01-Jun-2018
- ASTM E456-13A(2017)e3 - Effective Date: 01-Oct-2017
- ASTM E456-13A(2017)e1 - Effective Date: 01-Oct-2017
- Refers: ASTM E1871-17 - Standard Guide for Serving Protocol for Sensory Evaluation of Foods and Beverages - Effective Date: 01-Sep-2017
- ASTM E253-17 - Effective Date: 01-May-2017
- ASTM E1958-16a - Effective Date: 01-Oct-2016
- ASTM E1958-16 - Effective Date: 01-Aug-2016
- ASTM E253-16 - Effective Date: 01-Jun-2016
Overview
ASTM E2943-15(2021): Standard Guide for Two-Sample Acceptance and Preference Testing With Consumers provides guidance for conducting effective consumer product testing using unbranded, two-sample comparisons. Developed by ASTM International, this guide is vital for sensory consumer professionals and marketing research professionals who are responsible for designing, executing, and interpreting data from product tests. The main objective when developing or improving products is to ensure consumer preference and acceptance among defined target groups, using validated, repeatable test methods.
Key Topics
Acceptance vs. Preference Measures
- Acceptance: Measures the degree of liking or overall hedonic response on scales such as the nine-point hedonic scale.
- Preference: Determines which product is chosen over another, often using two-alternative forced choice tests or with a “no preference” option.
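As a hedged illustration of how a two-alternative forced choice result might be screened, the sketch below computes an exact two-sided binomial p-value against a 50:50 null (no preference). The function name and the counts are hypothetical and not taken from the guide, and a “no preference” option would need its own handling rule before the counts reach this step.

```python
from math import comb


def preference_p_value(n_prefer_a: int, n_prefer_b: int) -> float:
    """Two-sided exact binomial p-value for H0: no preference (p = 0.5)."""
    n = n_prefer_a + n_prefer_b
    if n == 0:
        raise ValueError("need at least one expressed preference")
    k = max(n_prefer_a, n_prefer_b)
    # Probability of a split at least this lopsided in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Under p = 0.5 the distribution is symmetric, so double the tail
    # and cap at 1 (doubling overcounts when the split is exactly even).
    return min(1.0, 2.0 * upper_tail)


# Hypothetical counts: 100 respondents, 62 prefer A and 38 prefer B.
p = preference_p_value(62, 38)
print(f"two-sided p-value: {p:.4f}")
```

The p-value is then compared with the alpha risk chosen before the test; a one-sided version would use the single tail instead of doubling it.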
Test Design and Execution
- Focuses on unbranded, two-sample product tests to measure sensory attributes only.
- Guides on choosing appropriate comparison products and methods, excluding factors like branding, packaging, and price.
- Emphasizes the importance of a representative respondent sample and challenges such as determining sample size and questioning sequence.
Risks and Statistical Considerations
- Addresses statistical errors: alpha risk (Type I error) and beta risk (Type II error).
- Discusses the importance of setting action standards, hypothesis direction, and ensuring statistical power through appropriate sample sizes.
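To make the link between alpha risk, beta risk, and sample size concrete, here is a minimal sketch using the textbook normal approximation for a test of one proportion against 0.5. The function name `respondents_needed`, its defaults, and the choice of approximation are illustrative assumptions; published tables or exact binomial calculations may give somewhat different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def respondents_needed(p_max: float, alpha: float = 0.05, beta: float = 0.10,
                       two_sided: bool = True) -> int:
    """Approximate respondents needed to detect a preference split of p_max.

    p_max is the preference proportion the test should detect with
    probability 1 - beta; alpha is the accepted Type I error rate.
    """
    if not 0.5 < p_max < 1.0:
        raise ValueError("p_max must lie strictly between 0.5 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_beta = std_normal.inv_cdf(1 - beta)
    # Standard deviation is 0.5 under the null (p = 0.5) and
    # sqrt(p_max * (1 - p_max)) under the alternative.
    numerator = z_alpha * 0.5 + z_beta * sqrt(p_max * (1 - p_max))
    return ceil((numerator / (p_max - 0.5)) ** 2)


# A 60:40 split detected with 90 % probability (beta = 0.10), alpha = 0.05.
print(respondents_needed(0.60, alpha=0.05, beta=0.10))
print(respondents_needed(0.60, alpha=0.05, beta=0.10, two_sided=False))
```

Tightening either risk, or aiming to detect a split closer to 50:50, raises the required number of respondents.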
Interpretation and Decision-Making
- Provides recommendations on how to interpret results, understand the relationship between acceptance and preference, and make informed product development decisions.
- Stresses that acceptance and preference are not interchangeable: products may be acceptable to consumers but not preferred, or vice versa.
Applications
Product Development and Improvement
- Supports businesses in comparing prototypes to established products or competitors, enabling data-driven decisions for product launches and optimizations.
- Helps determine whether a new or reformulated product meets consumer expectations for liking or preference.
Market Research and Competitive Benchmarking
- Used by sensory and marketing professionals to assess consumer-driven product qualities, manage decision risk, and inform strategic product positioning.
- Facilitates testing at various development stages, from idea generation to post-launch assessment.
Quality Control
- Can be employed to ensure ongoing consumer acceptance and preference for existing products, informing quality improvements or cost-saving changes.
Related Standards
The effectiveness of two-sample acceptance and preference testing is enhanced by referencing and aligning with other ASTM standards, including:
- ASTM E253 - Terminology Relating to Sensory Evaluation of Materials and Products: Provides standardized definitions for sensory evaluation terms.
- ASTM E1871 - Guide for Serving Protocol for Sensory Evaluation of Foods and Beverages: Details best practices for sample presentation and preparation.
- ASTM E1958 - Guide for Sensory Claim Substantiation: Supports substantiating sensory-related product claims.
- ASTM E2263 - Test Method for Paired Preference Test: Offers methods for conducting paired preference sensory testing.
- ASTM E2299 - Guide for Sensory Evaluation of Products by Children and Minors: Addresses sensory testing involving younger populations.
ASTM E2943-15(2021) is an essential resource for conducting objective, statistically valid acceptance and preference tests that inform better product development and business decisions. By focusing solely on sensory attributes in an unbranded context, the standard helps organizations gain actionable consumer insights, increasing the likelihood of market success.
Frequently Asked Questions
ASTM E2943-15(2021) is a guide published by ASTM International. Its full title is "Standard Guide for Two-Sample Acceptance and Preference Testing With Consumers". This standard covers acceptance and preference measures used in unbranded, two-sample product tests with consumers; its Significance and Use and Scope sections are given at the top of this page.
ASTM E2943-15(2021) is classified under the following ICS (International Classification for Standards) categories: 19.020 - Test conditions and procedures in general. The ICS classification helps identify the subject area and facilitates finding related standards.
ASTM E2943-15(2021) has the following relationships with other standards: it is linked to ASTM E456-13a(2022)e1, ASTM E253-19, ASTM E1958-19a, ASTM E1958-19, ASTM E253-18a, ASTM E2263-12(2018), ASTM E253-18, ASTM E1958-18, ASTM E456-13A(2017)e3, ASTM E456-13A(2017)e1, ASTM E1871-17, ASTM E253-17, ASTM E1958-16a, ASTM E1958-16, ASTM E253-16. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the
Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: E2943 − 15 (Reapproved 2021)
Standard Guide for
Two-Sample Acceptance and Preference Testing With
Consumers
This standard is issued under the fixed designation E2943; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
INTRODUCTION
This guide is intended to be used by sensory consumer and marketing research professionals
(referred to as the “researcher” or “research professional”) as an aid to understanding issues associated
with and to conducting two-sample acceptance and preference tests with consumers. This guide
includes a general summary of considerations and practices for conducting hedonic tests followed by
specific considerations and practices for both acceptance and preference testing, including pros and
cons of each method. Final sections consider the incorporation of both acceptance and preference
testing into the research plan and discuss potential lack of linkage in output/results between them. A
flowchart outlining summary of these methods and references for further reading are also included.
This guide is under the jurisdiction of ASTM Committee E18 on Sensory Evaluation and is the direct responsibility of Subcommittee E18.04 on Fundamentals of Sensory.
Current edition approved Jan. 1, 2021. Published April 2021. Originally approved in 2014. Last previous edition approved in 2015 as E2943 – 15. DOI: 10.1520/E2943-15R21.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
1. Scope
1.1 This guide covers acceptance and preference measures when each is used in an unbranded, two-sample, product test. Each measure, acceptance, and preference, may be used alone or together in a single test or separated by time. This guide covers how to establish a product’s hedonic or choice status based on sensory attributes alone, rather than brand, positioning, imagery, packaging, pricing, emotional-cultural responses, or other nonsensory aspects of the product. The most commonly used measures of acceptance and preference will be covered, that is, product liking overall as measured by the nine-point hedonic scale and preference measured by choice, either two-alternative forced choice or two-alternative with a “no preference” option.
1.2 Three of the biggest challenges in measuring a product’s hedonic (overall liking or acceptability) or choice status (preference selection) are determining how many respondents and who to include in the respondent sample, setting up the questioning sequence, and interpreting the data to make product decisions.
1.3 This guide covers:
1.3.1 Definition of each type of measure,
1.3.2 Discussion of the advantages and disadvantages of each,
1.3.3 When to use each,
1.3.4 Practical considerations in test execution,
1.3.5 Risks associated with each,
1.3.6 Relationship between the two when administered in the same test, and
1.3.7 Recommended interpretations of results for product decisions.
1.4 The intended audience for this guide is the sensory consumer professional or marketing research professional (“the researcher”) who is designing, executing, and interpreting data from product tests with acceptance or choice measures, or both.
1.5 Only two-sample product tests will be covered in this guide. However, the issues and recommended practices raised in this guide often apply to multi-sample tests as well. Detailed coverage of execution tactics, optional types of scales, various approaches to data analysis, and extensive discussions of the reliability and validity of these measures are all outside of the scope of this guide.
1.6 Units: The values stated in SI units are to be regarded as the standard. No other units of measurement are included in this standard.
1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.8 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the
Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
E253 Terminology Relating to Sensory Evaluation of Materials and Products
E456 Terminology Relating to Quality and Statistics
E1871 Guide for Serving Protocol for Sensory Evaluation of Foods and Beverages
E1958 Guide for Sensory Claim Substantiation
E2263 Test Method for Paired Preference Test
E2299 Guide for Sensory Evaluation of Products by Children and Minors
3. Terminology
3.1 Definitions:
3.1.1 For definitions of terms relating to sensory analysis, see Terminology E253.
3.1.2 For terms relating to statistics, see Terminology E456.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 α (alpha) risk, n: probability of concluding that a difference in liking or preference exists, when, in reality, one does not.
3.2.1.1 Discussion: Also known as Type I error or significance level.
3.2.2 β (beta) risk, n: probability of concluding that no difference in liking or preference exists, when, in reality, one does.
3.2.2.1 Discussion: Also known as Type II error.
3.2.3 hedonic continuum, n: hypothesized underlying continuous dimension measured by acceptance scales.
3.2.3.1 Discussion: It is presumed to run from strong disliking through a neutral region and onto strong liking.
3.2.4 labeled affective magnitude scale, n: labeled magnitude scale (LMS) is a hybrid scaling technique using a verbally labeled line with quasi-logarithmic spacing between each label and the scale consists of a vertical line, which is marked with verbal anchors describing different intensities (for example, “weak,” “strong”).
3.2.4.1 Discussion: Typically, subjects are instructed to place a mark on the line where their perceived intensity of sensation lies, with the upper limit of the scale being the strongest imaginable sensation (1).
3.2.5 Likert scale, n: attitude scales that can be constructed in an “agree-disagree” format (2).
3.2.5.1 Discussion: The Likert-type scale calls for a graded response to each statement. The response is usually expressed in terms of the following five categories: strongly agree (SA), agree (A), undecided (U), disagree (D), and strongly disagree (SD). The individual statements are either clearly favorable or clearly unfavorable (2 and 3).
3.2.6 Pmax, n: used in forced choice preference measures; a test sensitivity parameter established before testing and used along with the selected values of α and β to determine the number of respondents needed in a study.
3.2.6.1 Discussion: Pmax is the proportion of common responses that the researcher wants the test to be able to detect with a probability of 1 − β. For example, if a researcher wants to have a 90 % confidence level of detecting a 60:40 split in preference, then Pmax = 60 % and β = 0.10.
3.2.7 risk, n: possible consequences to the researcher’s client when the test leads to an incorrect conclusion.
3.2.7.1 Discussion: Risk around decisions made based on research test results can be grouped into two types, loosely called a “false positive” (when the test detects a difference that does not exist) and a “false negative” (when the study does not detect a true difference). In the case of a false positive, the company spends development time and resources on an alternative that does not deliver the intended effect. In the case of a false negative, the product developer or the company will miss a product opportunity and waste resources developing alternatives.
3.2.8 sequential monadic, adj: refers to the presentation or ordering in which respondents evaluate products or stimuli.
3.2.8.1 Discussion: In a sequential monadic test, the respondent is presented with one product at a time to evaluate.
3.2.9 sign test, n: statistical hypothesis test that can be used to compare two samples or a sample with a standard.
3.2.9.1 Discussion: No assumption is made about the shape or parameters of the population frequency distribution with the sign test and only the sign of the difference is considered.
3.2.10 Student’s t test, n: statistical hypothesis test used to compare the means of two samples or a sample mean to a standard value.
3.2.10.1 Discussion: It is appropriate when the measure of interest is normally distributed in small samples and, more generally, for continuous, unbounded, symmetric measurements when the sample size is larger. Assumptions include no ties in the data.
3.2.11 Type I error, n: see alpha risk.
3.2.12 Type II error, n: see beta risk.
3.2.13 Wilcoxon-Mann-Whitney test, WMW, n: rank-based independent sampling alternative to the Student’s t-test that is appropriate when the data are measured on a common continuous scale that is not normally distributed.
3.2.13.1 Discussion: In these situations, it can be more efficient (increased statistical power to find a difference at a given sample size) than a Student’s t-test. Like the Student’s t-test, it requires the assumption that the data have no ties.
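The Wilcoxon-Mann-Whitney definition above assumes no ties, yet nine-point hedonic ratings from two independent respondent groups almost always contain them. The sketch below shows one common workaround, assumed here rather than prescribed by the guide: midranks plus a tie-corrected normal approximation, with no continuity correction. The function name `wmw_test` and the ratings are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def wmw_test(sample_a, sample_b):
    """Return (U for sample_a, z, two-sided p) using midranks and tie correction."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")
    n = n_a + n_b
    counts = Counter(list(sample_a) + list(sample_b))
    # Midrank of a value = average of the 1-based positions it occupies
    # in the pooled, sorted data.
    midrank = {}
    position = 0
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = position + (t + 1) / 2
        position += t
    rank_sum_a = sum(midrank[x] for x in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    # Tie correction shrinks the variance when many ratings coincide.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return u_a, 0.0, 1.0  # every rating identical: no evidence of a difference
    z = (u_a - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, z, p


# Hypothetical nine-point liking scores from two independent groups.
product_a = [7, 8, 6, 9, 7, 8, 7, 6]
product_b = [5, 6, 6, 7, 4, 5, 6, 5]
u, z, p = wmw_test(product_a, product_b)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

When the same respondents rate both products, the data are paired, and a paired test such as the sign test defined above is the more natural fit.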
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
The boldface numbers in parentheses refer to a list of references at the end of this standard.
4. Summary of Guide
4.1 This guide covers the similarities and differences between acceptance and preference measures when used alone and together in a two-sample test (see Fig. 1). The two measures provide different information about respondents’
subjective responses to products and should be deployed to meet different research or business objectives. Acceptance measures are recommended when there is a need to obtain information on intensity of liking/disliking and determine the relative hedonic status of two products. Preference measures are recommended when there is a need to obtain information on choice behavior or determine an ordinal relationship between two products. Correct sampling of respondents is critical in both types of test. The researcher shall carefully prepare the research learning plan and thoroughly review the pros and cons of the specific research design chosen (that is, measuring acceptance, measuring preference, measuring both) against the decision risks associated with each measurement. Acceptance and preference measures, while imperfect, continue to be extremely useful in managing the risk in developing and delivering new products to the marketplace.
5. Significance and Use
5.1 Acceptance and preference are the key measurements taken in consumer product testing as either a new product idea is developed into testable prototypes or existing products are evaluated for potential improvements, cost reductions, or other business reasons. Developing products that are preferred overall, or liked as well as, or better, on average, compared to a standard or a competitor, among a defined target consumer group, is usually the main goal of the product development process. Thus, it is necessary to test the consumer acceptability or the preference of a product or prototype compared to other prototypes or potential products, a standard product, or other products in the market. The researcher, with input from her/his stakeholders, has the responsibility to choose appropriate comparison products and scaling or test methods to evaluate them. In the case of a new-to-the-world product, there may or may not be a relevant product for comparison. In this case, a benchmark score or rating may be used to determine acceptability. A product or prototype that is acceptable to the target consumer is one that meets a minimum criterion for liking, and a product that is preferred over an existing product has the potential to be chosen more often than the less-preferred product by the consumer in the marketplace, when all other factors are equal.
5.2 The external validity (the extent to which the results of a study can be generalized) of both acceptance and preference measures to manage decision risk at all stages of the development cycle is dependent on the ability of the researcher to ...
... identification, control, measurement, and tracking of variables that may influence results across tests (for example, production location, sample age, and storage conditions) are the responsibility of the researcher.
5.3 While measures of acceptance and preference are both subjective responses to products, and can be somewhat related, they provide different information. A product may be “acceptable” but still not be preferred by the consumer over other alternatives, and conversely, a product may be preferred over another but still not be acceptable to the consumer. These two terms, therefore, should not be used interchangeably. When a bipolar hedonic scale with multipoint options is used, the researcher should specifically refer to “liking,” “acceptance,” or “hedonic ratings.” When preference measures are used, the researcher should refer to “preference,” “product selection,” or “choice.” Research professionals themselves should be precise in their usage of the terms “acceptance” and “liking,” to refer only to scaling of liking. These researchers should use the terms “preference” and “choice” to refer to two (“Prefer A” or “Prefer B”) or three-choice (“Prefer A” or “Prefer B” or “No Preference”) response options given in a preference test. In addition to having different meanings, the two measures also do not always provide similar results. This guide will cover the similarities and differences in information each provides, some guidelines around implementation, and interpretation of findings. This guide will thus give users an understanding of the issues at hand when planning, designing, implementing, and interpreting results from acceptance and preference tests with consumers.
5.4 While both measures are commonly used to provide information for product development decisions and evaluating a product’s competitive status, it is important to remember that pricing, positioning, competitive options, product availability, and other marketplace factors also impact a product’s success.
6. Hedonic Testing: Steps in Planning and Conducting an Acceptance or Preference Test
6.1 Decide on the Key Question to be Answered: Liking or Choice or Both. Before planning and implementing a test, the researcher should determine what is needed to be learned from the research and what decisions will be made based on the outcome. The researcher would be wise to consider overall business strategies and the wider context of the project before test implementation. Additional considerations include stakeholder alignment, resource availability, and the actionability of
generalize the results from the respondent sample to the target
potential outcomes. The researcher translates the stakeholder’s
population at large. This depends both upon the sample of
desired learning into a testable hypothesis, defines the test
respondents and the way the test is constructed. Within the
object and decision criteria, and confirms the objective and
context of a single test, acceptance measures tell the relative
criteria with stakeholders before collecting data. Both types of
hedonic status of the two samples, quantitatively, as well as
tests may be done at all project stages—to get a product’s
where on the hedonic continuum each of the samples falls, that
baseline measure early in development, to gauge progress later
is, “disliked,” “neutral,” or “liked.” In contrast, preference
in development, or when a product is already in the market.
measures tell the relative choice status of two samples within
6.2 Set Decision Criteria: Action Standards, Hypothesis
a specific respondent group. Results from these measures can
Direction, Sample Size, and Risk Levels:
and will vary from test to test depending on the number and
type of respondents serving in each test, the size and nature of 6.2.1 Action Standards—The action standard determines
the sensory differences between the two samples, the method whether the product meets the success criterion set in advance
of executing the test, and any error present in the test. The for success. In the case of acceptance testing, the action
standard is set based on the product of interest’s hedonic score relative to that of the second product. In the case of preference testing, the action standard is set based on the product of interest’s preference score relative to that of the second product. The type and direction of the primary question, on which the action standard is based, factor heavily into the setting of the action standard.
6.2.2 Determine Type and Direction of Question—In general, there are two classes of questions associated with these types of evaluations: difference (directional or nondirectional) and parity questions.
6.2.2.1 Directional—One-sided Hypothesis Testing—The test hypothesis is often that a new version of a product will be better liked or preferred compared to the current product or that a given brand of product will be better liked or preferred compared to another brand. These are examples of one-sided tests. Note that if the goal is not achieved (the new product is not better liked or preferred compared to the current product), it cannot be determined whether the new product is at parity or less liked or preferred compared to the current product. One-sided tests require fewer respondents and, thus, can be the most cost-effective approach to evaluating the hedonic status of two products in an acceptance or preference test when the goal is to outperform another product. However, if the goal is not achieved, the relative status of one product versus another cannot be determined.
6.2.2.2 Nondirectional—Two-sided Hypothesis Testing—The classical two-sided test is most appropriate when the business or researcher wishes to know “which product is liked better?” or “which product is preferred?” when, for example, it is possible a new product may be either less liked or preferred or more liked or preferred than a comparison product. The advantage of this type of test is that it allows for a finding on either side of parity. However, two-sided tests require a larger sample size to achieve the same power as a one-sided test.
6.2.2.3 Parity—Hedonic parity, “equivalence in liking or preference,” “just as good as in (liking or preference),” are studies in which the objective is to demonstrate that the two products’ hedonic status is the same. Hedonic parity does not include superiority. “Unsurpassed” tests are those in which the goal is to establish that the product of interest is not less liked or less preferred than a comparison product. The “unsurpassed” test objective is to obtain support that the test product is comparable, or, possibly even higher, in liking or preference versus another product. Parity or unsurpassed test results may be used to support communications to the consumer. Regardless of the end use of the data generated in a hedonic test or parity, the researcher will need to sample substantially more respondents than is needed in tests for difference. Estimated respondent sample sizes to yield results sufficiently robust to support parity in liking or preference are between 200 and 500, depending on the size of the differences between the two products, the standard against which the test result is measured, and the variance associated with the liking scores. See Test Method E2263 and Guide E1958 for more detailed information on sample size requirements in preference tests when support for parity is the test objective.
6.2.3 Review Previous Testing Results and Evaluate Risk Levels Appropriate to Project’s Objectives and Decision Risks—The researcher evaluates risk by gathering information about the status of the project that includes this particular research, estimating the resource risk around the results, and the impact of a false positive (“α-risk”) or a false negative (“β-risk”) test result. Alpha risk is the risk that arises from falsely declaring two products to be different when they are truly at parity, while beta risk arises from falsely declaring two products to be at parity when they are truly different. As an example, finding a difference when products are actually at parity could impact the business by leading it to launch a product it believes has a competitive edge when in fact no competitive advantage exists. Similarly, failing to detect a difference when products are, in fact, different could lead the company to spend unnecessary development time and resources to improve a product further, when, in fact, it already is liked more than a standard. Further, using lack of significance in a preference test as the rationale for stating that parity in preference exists is not correct and can lead, in the short term, to launching an inferior product.
6.2.4 Set Sample Sizes Based on Direction and Risk Levels—For both acceptance and preference tests, a sufficient sample size shall be used to ensure enough test power. Practically, the researcher will need to strike a balance between test power and the number of respondents one can afford to employ. Commercial software for such calculations includes, but is not limited to, SAS, SPSS, JMP, Stata and Minitab. Free calculations are available at http://statpages.org/ or http://www.stat.uiowa.edu/~rlenth/Power/index.html. Sample sizes for preference tests at different risk levels can be found in Test Method E2263.
6.3 Plan Data Analysis—It is critical to determine how the data will be analyzed before data collection as the method of analysis will impact power and variability calculations needed to determine sample size. It is best to outline the decision criteria as they relate to the specific measures used in the test in advance and gain alignment among stakeholders. Following this, the researcher should outline the possible outcomes of the test before the data are collected, as unexpected results will be challenged on many different levels: “Was the test executed properly?” “Was the right method/measure used?,” and so forth.
6.4 Define Respondent Sample—For both acceptance and preference studies, it is important that the results from the sampled respondents reflect the target market, current category, or brand users for the product. For both acceptance and preference testing, respondents should include those most relevant to the question under study: specific brand users, product category users, or targeted non-category users. This recommendation is particularly true when the research question is hedonic in nature. When the research question is functional, or performance related, it may be appropriate to use employees or non-target consumers to screen products for attributes such as “easy to open,” “dispenses uniformly,” “covers completely,” and so forth.
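The trade-off described in 6.2.2 and 6.2.4 between test direction, risk levels, and number of respondents can be illustrated with a rough calculation. The following is a minimal sketch only, assuming a paired (sequential monadic) liking test and a normal approximation; the function names `respondents_needed` and `beta_risk`, the 0.5-unit difference, and the standard deviation of 1.5 are illustrative assumptions and are not taken from this guide or from Test Method E2263. An exact t-based calculation from the software listed in 6.2.4 would give slightly larger numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def respondents_needed(delta, sd_diff, alpha=0.05, power=0.80, two_sided=True):
    """Approximate respondents for a paired (sequential monadic) liking test.

    delta   -- smallest mean liking difference worth detecting, in scale units
    sd_diff -- assumed standard deviation of within-respondent differences
    Normal approximation: n = ((z_alpha + z_beta) * sd_diff / delta) ** 2.
    """
    tail = alpha / 2 if two_sided else alpha
    z_alpha = _Z.inv_cdf(1 - tail)
    z_beta = _Z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


def beta_risk(n, delta, sd_diff, alpha=0.05, two_sided=True):
    """Approximate chance of missing a true difference of size delta with n respondents."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = _Z.inv_cdf(1 - tail)
    return 1 - _Z.cdf(delta * sqrt(n) / sd_diff - z_alpha)


# Hypothetical planning values: detect 0.5 scale units, SD of differences 1.5.
n_two_sided = respondents_needed(0.5, 1.5)                   # 71
n_one_sided = respondents_needed(0.5, 1.5, two_sided=False)  # 56
print(n_two_sided, n_one_sided, round(beta_risk(n_two_sided, 0.5, 1.5), 3))
```

Under these assumed inputs the one-sided plan needs fewer respondents than the two-sided plan at the same alpha and power, which is the point made in 6.2.2.1 and 6.2.2.2. Parity objectives (6.2.2.3) are not covered by this sketch.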
TABLE 1 Types of Respondents
Respondent Sample Type | Recommended? | Rationale
Target users—Currently using the product, flavor/form users, would purchase/use again | Yes | Differences in hedonic responses among a sample of such respondents are most likely to reflect that of the population of target users, assuming that the sampling plan includes a sufficient number of respondents and the appropriate selection criteria have been applied.
“Convenience” sample—Category users who are positive toward the concept, and so forth, and positive to the flavor in the case of a food product | “Qualified yes,” with associated risks | Liking or preference response likely to mirror that of the target consumer up to a point: if product differences are small, or there is sensory segmentation in the target group, hedonic responses might mislead the researcher.
“Convenience” sample—External respondents, not current users, not users of the category, or even rejectors of the category. | No | Liking or preference response to the two products may not mirror that of the target consumer.
Non-R&D and project team, for example, marketing, sales, and plant personnel | No | Bias toward own product.
Research and Development personnel, not on project team | No | Knowledgeable about project objectives, technical knowledge about product, bias toward own project.
Project team/stakeholders | No | Knowledgeable about project objectives, technical knowledge about product, bias toward own product.
Trained or experienced panelists used in discrimination or descriptive tests | No | Testing and training experiences lead this group of respondents to evaluate the products objectively rather than the subjective evaluations required in hedonic tests.
6.4.1 Target user selection criteria may be based on a number of criteria: demographics, geography, psychographics, proprietary segmentation information, or product usage behavior, or combinations thereof. For existing products or line extensions, a sample of current users of the product or brand is recommended to assess a product’s suitability for the brand. Additionally, if the product is intended to attract competitive users or new users, then respondent samples from the group(s) is/are needed, since the study results can vary significantly across different subgroups of brand users within the category. Based on the degree of consumer segmentation within a category or the presence of a small number of competitors, the selection of respondents can greatly influence the study results, particularly for preference studies conducted with in-market products. It is generally accepted that loyal or heavy users of a product may recognize their product, even in an unbranded product test, and are biased toward rating it more favorably than the other product within the study. After the acceptance or preference measure is completed, the researcher can ask respondents to postulate the brand identity of the products. Clear documentation of respondent selection criterion is required so that this information is available for any subsequent related consumer studies.
6.4.2 External Respondents: Minimum Respondent Requirement for Acceptance and Preference Testing—It is highly recommended that respondents be recruited and selected from a population of target users for the products being tested. By doing so, the researcher should be able to generalize findings. While some debate exists as to the suitability of using employees to obtain products’ hedonic information as a best practice, use of employees as respondents for either acceptance or preference testing is strongly discouraged as there may not be a meaningful relationship between employees’ and external target users’ responses to the tested products. “Convenience” samples (typically small samples of respondents drawn from one source, such as a church or a university that may not be users of the products, category acceptors, or even familiar with the product category) are recommended with reservations, only if they are concept positive and flavor positive if a food product is to be tested. These reservations are based on the common convenience sampling practice of obtaining a small number of consumers (for example, less than 100) when using a local area source, coupled with the possibility that drawing respondents from a single area might not include consumers representing different sensory segments. Results from respondents drawn from a convenience sampling method may not represent consumers who are actual users. See Table 1, which outlines recommendations for obtaining different consumer samples.
6.4.3 Trained descriptive, discrimination panelists or frequently used internal panelists drawn from the technical areas of a company should not be used as respondents in an acceptance or preference test. Because of their training and analytical orientation and their knowledge of the product’s technical features, these panelists are likely to respond to products differently from untrained consumers. See Table 1, which lists the various types of respondent samples that might be considered for an acceptance or a preference test, recommended usage, and rationale.
6.4.4 For new product categories, it may be difficult to identify the criteria for selecting the target consumer. For new products, the researcher may want to select category acceptors who are also early adopters, consumers who actively seek and purchase new products in the category, or those that are positive to the idea or concept of the new product (concept acceptors).
6.5 Record Product Information—The researcher needs to record the product information on the package. Most researchers take a picture of the product or remove the label and photograph the front label information, ingredients, and nutritional facts. The lot number and “use by” dates also need to be recorded. If the product is not on the market, then the formula or composition and information needed for retrieval of
the ingredients, processing, and manufacturing location should be recorded. Preparation or other usage instructions and carriers used should also be documented. These records will allow future researchers to compare results from the same product if needed.
6.6 Develop Questionnaire—Diagnostic information (intensity, just-about-right (JAR), “Check All That Apply” (CATA)), open-ended likes and dislikes, or other measures that help explain product performance may be included in both acceptance and preference tests. The recommended practice is to ask the overall liking or preference question first, before diagnostic questions, if the hedonic question is going to be used for decision-making. If a preference question is to be included, the option of including a “no preference” response shall be considered (see 8.5).
6.7 Collect Data—Present the proper set of products in a manner that ensures unbiased responses. Checks and balances need to be implemented to ensure that data collected provide actionable results. For unbranded testing, sensory information that allows a product’s brand to be identified should be eliminated or reduced as much as possible. Likewise, the ages, the condition, and the handling of the samples being tested should be comparable. The method of sample presentation should be balanced to reduce order and context effects. See Practice E1871, Guide E1958, Test Method E2263, and ASTM Manual 26 (4) for more complete descriptions of methods to manage or eliminate bias in sensory tests. Samples are typically served in sequential monadic fashion when conducting acceptance testing, while sequential monadic or simultaneous presentation are both common modes of sample presentation in preference testing. While a somewhat less sensitive determination of the relative hedonic status of two products may also be obtained via monadic testing (different respondent groups evaluate each of two samples), this guide has, as its focus, the more common sequential monadic presentation.
6.8 Analyze Data and Interpret Results—Determine Whether Action Standard Has Been Met:
6.8.1 Data Analysis Information for Both Acceptance and Preference Measures—The research plan for the specific analysis when both acceptance and preference are measured should specify in advance the alpha level, beta level, and direction (one-sided or two-sided) of the statistical tests. For preference tests, the plan should also include information on the number of common response (Pmax) and the size of difference to be detected. For acceptance tests, the size of difference to be detected and the estimated variability in liking of both products should also be included. The results are compared with the decision criteria for interpretation.
6.9 Report and Communicate the Results—Derive a Message About Product’s Relative Hedonic or Preference Status—Once the mechanics of the test are complete and data are collected, analyzed, and reviewed, the researcher has the job of communicating what the results mean: which product is liked better; which product was selected more often over the other; and the evidence, if any, for consumer segments; limitations of generalizing to other respondent groups; and how the results compare to previous findings. Caution, however, should be taken in comparing results to prior findings as consumer response is often context dependent. For example, other products included in the research may influence ratings for the products of interest. Recommendations as to next steps, based on test findings as related to business strategy, should be included.
7. Acceptance Testing
7.1 Definition of Acceptance Testing: Affective Continuum—The nine-point hedonic scale is a bipolar scale with the same format as Likert scales. Three broad categories are represented: “like,” “neutral,” and “dislike.” This type of hedonic scale is used when the primary goal of the research is to learn where two products fall on this hedonic continuum and the size of the hedonic differences between them. The nine-point hedonic scale provides degree and direction from the neutral point “neither like/nor dislike.” The original nine-point hedonic scale was constructed empirically and, while the verbal anchors have been shown to have equal interval properties for the original stimuli (5), some researchers do not accept the equality of the categories (6).
7.2 Set Decision Criteria: Action Standards, Hypothesis Direction, Sample Size, and Risk Levels:
7.2.1 Use acceptance measures when there is a need to identify the two products’ relative status on the hedonic continuum, that is, where on the scale each is rated, that is, whether consumers “like,” “dislike,” or are “neutral” toward each one of two products and when the interval relationship between the two samples needs to be quantified.
7.2.2 The hypothesis to be tested will state either that there is some difference in liking between the samples or that there is no difference in liking between the samples. The action standard will be based on whether the obtained results are consistent with the hypothesis at a prespecified probability level. It is typical to test at the 90 or 95 % confidence level.
7.2.3 The number of consumers to be included in the research will depend on several factors: (1) the consumer sample size used historically in the company, (2) the minimum size of the sensory difference in liking (in scale units) desired to be detected between the two products, and (3) the variability in liking ratings among the respondents. If consumer-liking data exist from previous testing of the same products, this historical data can be used to estimate the variability that is likely to be found in a consumer test of the same products (standard deviation/standard error). For many U.S. consumer products companies, sample sizes between 100 and 150 are common when the test hypothesis is to establish differences in liking. In acceptance tests, it is possible to gauge in advance the risk of missing a true difference in liking between two samples (beta) if one knows the size of the difference one wishes to detect (if, for example, one wishes to be able to detect a difference of 0.3 hedonic units on a 9-point hedonic scale) and knows the variance in liking ratings for the samples before conducting the test. As an example, 130 people are required to have an 80 % chance of detecting a 0.5 difference with 95 % confidence when using a 9-point hedonic scale with a standard deviation of 1 unit in a 2-sample test. For acceptance tests with the 9-point hedonic scale, a sample size of 112 respondents is
needed to detect a 10 % difference in the scale given the variability in the data in this meta-analysis (7).
7.3 Plan Data Analysis—Data analysis for a two-sample acceptance test is typically a dependent (related) samples t-test. For a finding of one product being liked more or less than another, the researcher only needs to set the confidence level in advance. See 6.2.2.3 for a discussion of parity.
7.4 Define Respondent Sample—See 6.4.
7.5 Develop Questionnaire:
7.5.1 General Considerations—The questionnaire for an acceptance test will consist of one or more liking scales for overall and possibly attribute ratings of acceptance, and could also include diagnostic scales such as intensity or just about right. Scale format options vary widely.
7.5.2 Scale Format Options—The nine-point hedonic category scale may be presented in either a horizontal or vertical layout, with categories labeled as follows: “9” Like Extremely, “8” Like Very Much, “7” Like Moderately, “6” Like Slightly, “5” Neither Like Nor Dislike, “4” Dislike Slightly, “3” Dislike Moderately, “2” Dislike Very Much, and “1” Dislike Extremely. The scaling numbers may or may not be included with the scale anchors. Other options include the hedonic scale as a line scale (usually 15 cm), labeled affective magnitude scale (7-9) or ratio scale (10). Each of these options has relative advantages and disadvantages, which vary depending on the research objective and respondent sample. If results will be compared across tests, it is important to use the same scale consistently. Extrapolating results from one scale to another is not recommended as end-point effects and other psychological issues make this imprecise at best and grossly incorrect at worst. The Office of Scale Research at Southern Illinois University can assist researchers with scale identification and usage. See http://scaleresearch.siu.edu/.
7.5.3 Number of Scale Points—An odd number of categories, or scale points, with a “neutral” midpoint and a balanced number of categories on either side of the midpoint are typical of hedonic rating scales. Unbalanced scales will not fairly represent the range of hedonic responses consumers might have. More scale points provide the advantage of increased sensitivity in finding liking differences between two products. End-point avoidance means that an N-point scale is effectively an N minus two-point scale to the extent that respondents avoid using the end points. For example, a nine-point scale is often effectively a seven-point scale, and a five-point scale is often effectively a three-point scale (11).
7.5.4 Inclusion of Diagnostic Scales—Although the liking rating is the primary response with acceptance scales, further diagnostic questions may be included in the questionnaire. Researchers frequently ask consumers to either (1) rate the intensity or liking of the product on specific attributes, or (2) indicate the extent to which the product is “Just About Right” (JAR) on specific attributes, or both, (that is, opportunity analysis). Both intensity attributes and JAR ratings are diagnostic. They are intended to provide the researcher with information to interpret the liking status and provide guidance as to how to improve it. JAR data are used to explain why products are liked or how the product can be improved or both. Note that the response to these questions may be biased by a halo effect as the respondent may be justifying their prior choices/ratings. For more information on JAR scales see ASTM Manual 63 (12).
7.5.5 Ask Acceptance before Diagnostic Questions—The first question asked is generally thought to be the most unbiased. Placing the acceptance question first is recommended if that is the primary measure of interest. Placing the acceptance question after the attribute questions may change (usually lower) the mean overall liking ratings. The diagnostic questions should use consumer language and refer to attributes that consumers would typically notice. For example, asking about “glue lines” (in a cardboard package) in a consumer product is too technical, while asking how difficult it was to open the package is not. It is hypothesized that focusing on specific attributes before the overall acceptance question may prompt consumers to pay closer attention to certain product characteristics that they might otherwise ignore and, therefore, cause them to be more critical when answering later questions. In monadic sequential designs, the second-sample acceptance result may be influenced by diagnostic questions asked in the first sample (13). This is one reason that the order of product evaluation is carefully balanced across samples.
7.6 Collect Data—See 6.7.
7.7 Analyze Data and Interpret Results—Determine Whether Action Standard Has Been Met—Once data have been collected and checked for correctness, the statistical analysis of the data may be done using the actual variability measures. Parametric analyses, such as a dependent t-test in a two-product, one-respondent group test, are typically done with acceptance data, although nonparametric alternatives such as sign or signed rank tests on the differences should be considered when the data fail the parametric assumptions. After the data have been collected, they should be reviewed to determine if the variability and distribution assumptions used in planning the test were met. If not, a prespecified action standard may not have the desired risk levels. Since the business goal, the analysis, and the desired risk levels determine the action standard, it may be necessary to adjust these to attain the desired properties. If the true variation in liking is not known, either the action standard or the desired risk levels can be set before the test is conducted, not both. This is because a measured variation in liking that is larger than that assumed pre-testing will result in either greater risk levels associated with a given action standard or a more stringent action standard to maintain the prespecified risk levels.
7.7.1 Plot Data, Review Variability, and Measures of Central Tendency—It is critical that the researcher examine not just the mean score or the summary liking or preference data from a test but also the distribution of responses and the relationship of these responses to characteristics of the panel sample, for example, segmentation. It is also good practice to determine how well the data meet the requirements of any statistical tests that will be performed. As an example, examine the skewness, kurtosis, and normality of the distributions for each of the products. If the acceptance ratings are bimodal for both products, the researcher can do a cluster analysis to determine what the mean liking is for each product for each cluster and to
identify what demographic variables are associated with each group of consumers. If only one product’s rating is bimodal, the researcher may need to consider conducting a nonparametric statistical analysis to determine if there are liking differences (14). Finally, the researcher should examine the effect of order on the test results: was the product in the first position rated differently from the same product in the second position? Also, examine the difference in impact of order between the products. As an example, a new version of a product may perform well in first position compared to the current product but may drop more in liking when evaluated after the current product than the current product drops when evaluated after the new product.
7.7.2 Standard Error Varies With Mean of Each Product—The researcher should calculate, along with the means and mean difference
...
that may or may not be associated with preferences that would be revealed in preference testing.
7.9 Disadvantages of Acceptance Testing:
7.9.1 Acceptance measures do not necessarily model consumer behavior. Consumers choose products in the market based on a range of variables, many of which are not based on the product’s sensory characteristics and many of which are not rational. Almost never, in real life, does a consumer rate a product on a scale. Consequently, it is difficult to translate differences between samples evaluated using acceptance tests into real-world consumer behavior. For example, the meaning of a one-point difference can only be determined with a large database of past results that relate differences in acceptability to differences in consumer choice. Part of the researchers’ value is in providing that reference/interpretive information.
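The analysis named in 7.3 and 7.7, a dependent (related) samples t-test with a sign test on the differences as a nonparametric alternative, can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: the helper names `paired_t_statistic` and `sign_test_p` and the ten pairs of nine-point ratings are hypothetical. The sketch returns the t statistic and its degrees of freedom only; the p-value would come from a t table or from the statistical software listed in 6.2.4.

```python
from math import comb, sqrt
from statistics import mean, stdev


def paired_t_statistic(ratings_a, ratings_b):
    """Return (t, degrees of freedom) for within-respondent liking differences."""
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / standard_error, n - 1


def sign_test_p(ratings_a, ratings_b):
    """Exact two-sided sign test on the differences; tied pairs are dropped."""
    diffs = [a - b for a, b in zip(ratings_a, ratings_b) if a != b]
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    smaller = min(positives, n - positives)
    one_tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Hypothetical nine-point liking ratings from the same ten respondents.
product_a = [7, 8, 6, 7, 9, 6, 7, 8, 5, 7]
product_b = [6, 7, 6, 5, 8, 6, 6, 7, 6, 6]

t_value, df = paired_t_statistic(product_a, product_b)
print(round(t_value, 2), df)               # t statistic and degrees of freedom
print(sign_test_p(product_a, product_b))   # exact two-sided sign-test p-value
```

In this made-up data set seven respondents rate product A higher, one rates it lower, and two give tied ratings, so the sign test uses eight pairs. Ten respondents is far below the sample sizes discussed in 7.2.3 and is used here only to keep the arithmetic easy to follow.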
...



