ISO/IEC 29170-3:2026
Information technology — JPEG AIC Assessment of image coding — Part 3: Subjective quality assessment of high-fidelity images
This document specifies a subjective image quality assessment methodology that covers a range from good quality up to mathematically lossless. This document is applicable to the assessment of distortions due to image coding (i.e. lossy compression) and not necessarily other kinds of distortions (e.g. capture, sensor or rendering artefacts).
General Information
- Status
- Published
- Publication Date
- 02-Feb-2026
- Current Stage
- 6060 - International Standard published
- Start Date
- 03-Feb-2026
- Due Date
- 21-Jul-2027
- Completion Date
- 03-Feb-2026
Overview
ISO/IEC 29170-3 - "Information technology - JPEG AIC Assessment of image coding - Part 3: Subjective quality assessment of high-fidelity images" specifies a subjective image quality assessment methodology targeted at the high-quality range from “good” up to mathematically lossless. The standard is part of the ISO/IEC 29170 series and focuses on fidelity (perceptual closeness to the source image) for distortions primarily due to image coding (lossy compression). It provides procedures to obtain fine-grained, reliable measurements expressed in just noticeable difference (JND) units.
Key topics and requirements
- Triplet comparison: Psychophysical test design where two distorted stimuli and the pivot (source image) are presented; observers indicate which test image differs more from the pivot. This method improves sensitivity for small fidelity differences.
- Boosted vs. plain comparisons:
- Boosted Triplet Comparison (BTC) uses artifact amplification, zooming and flickering to make minute differences more perceivable.
- Plain Triplet Comparison (PTC) presents stimuli without boosting. Results are statistically rescaled to a unified JND scale.
- JND scale reconstruction: Uses Thurstone’s Case V psychometric scaling (latent Gaussian quality scale) to convert observer responses into linear JND units.
- Data cleansing and analysis: Procedures for filtering unreliable observers/assignments (accuracy and consistency checks, trap questions, iterative outlier detection) before psychometric scaling.
- Observer and viewing conditions: Normative guidance on participant selection, controlled viewing conditions and study design to ensure reproducibility and validity.
- Normative annexes: Practical guidance on generation of stimuli, triplet selection, batch generation, BTC/PTC implementation, data cleansing, and an interchange format for experiment data.
Practical applications and users
- Image codec developers and researchers use ISO/IEC 29170-3 to evaluate and compare high-fidelity compression algorithms where traditional 5‑point ACR methods lack resolution.
- Quality assurance teams in imaging, camera, and display companies employ the methodology for fine-grained fidelity testing and regression checks.
- Streaming and content delivery services can validate perceptual thresholds and bitrate-quality trade-offs near visually lossless regions.
- Academic researchers and perceptual metrics developers use the JND-based data to train and validate objective image quality models.
- Standards bodies and test labs use the specified procedures for inter-lab reproducibility when assessing JPEG AIC and related high-quality image coding technologies.
Related standards
- Part of the ISO/IEC 29170 series; references established subjective test guidance such as ITU‑R BT.500 and complements ISO/IEC 29170‑2 (flicker test for visually lossless threshold).
Keywords: ISO/IEC 29170-3, JPEG AIC, subjective quality assessment, high-fidelity images, JND, triplet comparison, boosted triplet comparison, image coding, fidelity, psychometric scaling.
Frequently Asked Questions
ISO/IEC 29170-3:2026 is a standard published jointly by ISO and IEC. Its full title is "Information technology — JPEG AIC Assessment of image coding — Part 3: Subjective quality assessment of high-fidelity images". It specifies a subjective image quality assessment methodology that covers a range from good quality up to mathematically lossless, applicable to distortions due to image coding (i.e. lossy compression) and not necessarily to other kinds of distortions (e.g. capture, sensor or rendering artefacts).
ISO/IEC 29170-3:2026 is classified under the following ICS (International Classification for Standards) categories: 35.040.30 - Coding of graphical and photographical information. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
International Standard
ISO/IEC 29170-3
First edition
2026-02
Information technology — JPEG AIC Assessment of image coding —
Part 3:
Subjective quality assessment of high-fidelity images
© ISO/IEC 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
1 Scope
2 Normative references
3 Terms, definitions, symbols and abbreviated terms
3.1 Terms and definitions
3.2 Symbols
3.3 Abbreviated terms
4 Methodological overview
Annex A (normative) Generation of stimuli
Annex B (normative) Triplet selection and batch generation
Annex C (normative) Observers and viewing conditions
Annex D (normative) Boosted and plain triplet comparison (BTC/PTC)
Annex E (normative) Data cleansing procedures
Annex F (normative) JND scale reconstruction
Annex G (informative) Interchange format
Annex H (informative) Application range and design rationale
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/
IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 29170 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
International Standard ISO/IEC 29170-3:2026(en)
Information technology — JPEG AIC Assessment of image
coding —
Part 3:
Subjective quality assessment of high-fidelity images
1 Scope
This document specifies a subjective image quality assessment methodology that covers a range from good
quality up to mathematically lossless.
This document is applicable to the assessment of distortions due to image coding (i.e. lossy compression)
and not necessarily other kinds of distortions (e.g. capture, sensor or rendering artefacts).
2 Normative references
There are no normative references in this document.
3 Terms, definitions, symbols and abbreviated terms
3.1 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org
3.1.1
image quality
degree to which an image is free of undesirable aesthetic flaws or noticeable artefacts, as judged by observers
Note 1 to entry: This is an umbrella term that can refer to fidelity, appeal, or a combination of both.
3.1.2
appeal
aesthetic pleasingness of a stimulus under consideration, regardless of its difference with the source image
Note 1 to entry: It is possible to obtain distorted images with a better appeal than the source image (e.g. by applying
denoising, sharpening, contrast or saturation stretching, or other image enhancement methods). Appeal of a stimulus
can be assessed without reference to the source image.
3.1.3
fidelity
truthfulness to the source image, i.e. the degree to which there is no noticeable difference between the
source image and the stimulus under consideration
Note 1 to entry: By definition, the source image itself has the highest possible fidelity; no distorted image can have
a higher fidelity. Fidelity of a stimulus can only be assessed in relation to the source image. In this document, image
quality refers specifically to fidelity.
3.1.4
just noticeable difference
unit of difference between stimuli (distortion magnitude) corresponding to a 50 % detection rate, as
estimated with Thurstone’s Case V model
Note 1 to entry: The source image, by definition, is at the zero point of the JND scale.
3.1.5
observer
human with normal visual acuity and colour vision, who is a participant in a crowd-sourced or controlled-
environment subjective experiment
3.1.6
stimulus
image to be judged by an observer, either a source image or a distorted image
3.1.7
distorted image
image that is an imperfect reproduction of a source image, i.e. it is based on a source image but to some
extent differs from it
Note 1 to entry: The imperfections primarily considered in this document are those originating from lossy compression artefacts, but they can also include other types of information loss, sensor noise, capture artefacts, transmission errors or display issues.
3.1.8
source image
pristine image, which is the original from which distorted images are derived, for example by a lossy
encoding/decoding process
3.1.9
assessment methodology
experimental methodology aimed at accurately estimating image quality
3.1.10
triplet comparison
psychophysical method that involves three stimuli derived from the same source, where the pivot stimulus
is the reference image (source image) and the question asked is which of the other two stimuli (the test
images) shows the stronger differences to the pivot
3.1.11
batch
series of study questions presented to observers to respond to
3.1.12
assignment
instance of a batch for an observer that is performing the test
3.1.13
session
one-time participation of an observer performing a set of assignments
3.1.14
study question
specific instance of a triplet comparison question
3.1.15
trap question
triplet comparison designed to help filter out unreliable observers, such as random clickers, where the
maximum distortion level is compared to the undistorted source image, making the distortions relative to
the corresponding source image readily discernible
3.1.16
plain comparison
triplet questions in which the test images are not subjected to any boosting technique
3.1.17
boosted comparison
triplet questions in which the distortion of the test images is made more pronounced using techniques such
as zooming, amplification, and flickering
3.1.18
distortion type
method with a parameter that corresponds to an expected distortion magnitude (e.g. a specific encoder/
decoder implementation, with a quality setting, quantization parameter or bitrate target), which can be
used to produce a sequence of distorted images from a given source image
3.1.19
codec
image encoder and decoder process that yields a distortion type
3.1.20
data analysis
statistical procedures to process raw observer responses, filtering out unreliable assignments and
combining results from boosted and plain comparisons in order to obtain a unified reconstructed JND scale
3.1.21
raw experiment data
unprocessed observer responses including all recorded information relevant for data analysis
3.1.22
data cleansing
pre-processing steps to filter the raw experiment data of unreliable assignments and outliers, which is the
initial stage of data analysis
3.1.23
JND scale reconstruction
statistical procedures to derive quality scale values from the cleansed data
3.2 Symbols
C_{S,q}   distorted image derived from source image S by applying distortion type C with distortion magnitude parameter q
D   distortion magnitude in JND units
R   bit rate in bits per pixel
S   a source image
Φ   the cumulative normal distribution
3.3 Abbreviated terms
ACR absolute category rating
ACR-HR absolute category rating with hidden reference
bpp bits per pixel
BTC boosted triplet comparison
CCR comparison category rating
DCR degradation category rating
DSCQS double stimulus continuous quality scale
DSCS double stimulus comparison scale, alternative name for CCR
DSIS double stimulus impairment scale, alternative name for DCR
IQA image quality assessment
JND just noticeable difference
MLE maximum likelihood estimation
MOS mean opinion score
PC pairwise comparison
PTC plain triplet comparison
VAS visual analog scale
4 Methodological overview
Image compression generally causes undesirable artefacts that reduce the perceived visual quality. The most
commonly used subjective image quality assessment methodologies apply 5-level absolute category rating (ACR), which was designed primarily to evaluate heavily or moderately compressed images. These methods are therefore not accurate when evaluating images in the range from high to visually lossless quality.
Visually lossless quality is the quality range from 0 to 1 just noticeable difference (JND) units as measured
by the flicker test methodology defined in ISO/IEC 29170-2.
The subjective image quality assessment methodologies defined in this document cover a range from good
quality up to mathematically lossless. Good quality, here, is defined as the visual quality level corresponding
to a mean opinion score of 4 on a 5-point scale in a side-by-side comparison as in ITU-R BT.500. The threshold
of visually lossless quality, as determined by an AIC-2 flicker test (ISO/IEC 29170-2), is included in the range
covered by this document. For mathematically lossless compression, image quality assessment does not
apply, but the methods described in this document can provide meaningful differentiation (in fractional JND
units) between images that are visually lossless.
This document introduces several methodologies for reliable fine-grained and precise subjective quality
assessment in this high-quality range. The target is visual quality in the sense of fidelity, i.e., perceptual
closeness to the corresponding source image, which is the primary traditional concern in image codec
development.
For this purpose, this document specifies a methodology for subjective image quality assessment using
an approach in which the fidelity of two distorted images, derived from the same source, are compared
with each other. The human visual system is able to more sensitively detect minute differences in fidelity
in such comparisons than by direct rating of each test image independently. Observers should focus on
fidelity without being influenced by image aesthetics and occasional side effects of image compression
that effectively enhance the visual image quality (e.g. amounting to denoising). For that purpose, the
corresponding source image is included in each comparison. This creates triplet comparisons in which the
source image serves as the ‘pivot’.
The second methodology specified in this document is a boosting technique that helps human observers distinguish minute differences in fidelity more clearly. It combines artefact amplification, zooming, and flickering. Artefact amplification of a distorted image is the linear scaling of its pixel-wise differences to the corresponding source image in all three RGB colour components. Zooming magnifies differences spatially. In a flicker test, two distorted images alternate rapidly in place with the common source image. Boosting artificially increases perceived distortion magnitudes. To allow these to be scaled back to the original perceived distortion magnitudes under normal viewing conditions, the quality assessment also includes a smaller number of plain triplet comparisons without boosting and a statistical numerical method to derive the desired unified quality scale. This rescaling technique significantly increases the precision and granularity of the assessed distortion magnitudes.
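The artefact amplification step can be sketched as follows; a minimal illustration in Python using NumPy, where the boosting factor k and the 8-bit value range are assumptions for illustration, not normative values:

```python
import numpy as np

def amplify_artefacts(source, distorted, k=2.0):
    """Linearly scale the pixel-wise differences between a distorted
    image and its source in all three RGB components, clipping the
    boosted result to the valid 8-bit range."""
    src = source.astype(np.float64)
    dst = distorted.astype(np.float64)
    boosted = src + k * (dst - src)
    return np.clip(boosted, 0, 255).astype(np.uint8)
```

With k = 1 the distorted image is returned unchanged; larger factors make minute coding artefacts easier to perceive.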
Another fundamental methodological component specified in this document is a dataset cleansing
procedure. It is based on thresholding the average of weighted accuracy and consistency of batches of triplet
responses (task assignments), followed by a robust iterative outlier detection procedure. Each such batch
consists of multiple responses corresponding to one observer session of triplet comparisons. The accuracy
of such a batch of responses can be estimated from the ratio of same-codec responses that match with the
bitrate order of the test stimuli, thus correctly identifying the stimulus that can be assumed to generate the
stronger perceived distortion assuming monotonic encoder behaviour. Similarly, consistency of responses
in a batch can be estimated by comparing responses for symmetrically reversed triplet questions.
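A minimal sketch of these two checks, assuming a simplified representation of the responses (the data layout is illustrative, not the normative Annex E procedure):

```python
def batch_accuracy(responses):
    """Fraction of same-codec responses that agree with the bitrate
    order of the two test stimuli: the lower-bitrate stimulus is
    assumed to show the stronger distortion (monotonic encoder).

    Each response is (bitrate_left, bitrate_right, picked_left), where
    picked_left is True if the observer judged the left stimulus as
    more distorted."""
    correct = sum(1 for bl, br, picked_left in responses
                  if (bl < br) == picked_left)
    return correct / len(responses)

def batch_consistency(answers):
    """Fraction of symmetric triplet pairs answered consistently.

    answers maps a triplet (L, P, R) to the chosen side, 'L' or 'R';
    a consistent observer picks the opposite side for (R, P, L)."""
    flip = {"L": "R", "R": "L"}
    pairs = consistent = 0
    for (l, p, r), side in answers.items():
        mirror = (r, p, l)
        if mirror in answers and (l, p, r) < mirror:  # count pairs once
            pairs += 1
            if answers[mirror] == flip[side]:
                consistent += 1
    return consistent / pairs if pairs else 1.0
```

A weighted average of the two values, thresholded, would then decide whether the assignment is kept for scaling.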
This document describes a method of psychometric scaling of perceptual image quality from collected responses to triplet questions, designed to quantify quality differences linearly and faithfully. This method is known as Case V of Thurstonian scaling and is based on the model assumption of a latent quality scale on which the perceived image qualities of all stimuli are given by Gaussian random variables of equal variance. This assumption may not hold in all applications of image quality assessment; however, for the special case of distortions from image codecs restricted to a narrow range in the high-quality region, it is reasonable. The variance of the random variables can be chosen such that a difference of one unit in quality scale values between two compressed images means that the probability of a random observer detecting the difference in distortions is equal to one half. Such a difference of perceptual distortion magnitude is called the JND. Thus, the final reconstructed scores are expressed on a JND scale for quality assessment of compressed images in the targeted high-quality range.
The JND scale quantifies the relationship between stimulus intensity and the perceived differences. In
psychophysics, Thurstone’s Case V model is commonly employed to analyse binary responses in pairwise
comparison experiments. This model assumes that the perceptual differences between compared stimuli
follow a normal distribution with a fixed variance of 1.
Under this framework, one JND unit corresponds to the level of distortion at which an observer has a 50 %
probability of detecting a difference relative to the source image. In a pairwise comparison experiment with
forced choice, this equates to a proportion of correct responses of 0,75.
The relationship between the JND scale and the probability of correct responses is described mathematically
in Thurstone’s Case V as:
p_d = Φ(Φ⁻¹(0,75) · d)   (1)

where p_d is the probability of correct responses, d represents the perceptual difference between stimuli in JND units, and Φ is the cumulative distribution function (CDF) of the standard normal distribution. The multiplication by Φ⁻¹(0,75) ensures that d is expressed in JND units.
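This relationship can be evaluated directly with the standard normal CDF; a small Python sketch using only the standard library:

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def p_correct(d):
    """Probability of a correct response at a perceptual difference of
    d JND units, following Thurstone's Case V relationship
    p_d = Phi(Phi^-1(0.75) * d)."""
    return _N.cdf(_N.inv_cdf(0.75) * d)
```

At d = 0 this gives 0,5 (pure guessing) and at d = 1 exactly 0,75, matching the JND definition above.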
In traditional quality assessment, the quality of each stimulus is estimated independently of all others.
However, for the case of perceived quality of a series of images compressed by the same codec at different
bitrates from the same source image this ignores the prior knowledge that the reconstructed scales lie on
the corresponding distortion-rate curve. Therefore, this document specifies a method for direct estimation
of distortion-rate curves, that is, the estimation of the corresponding parameters that identify the proper model from a family of suitable functions.
NOTE Subjective and objective image quality assessments typically only provide an ordering of test images with respect to perceived image quality. This is why, when competing objective quality metrics are compared for linearity by means of Pearson correlation with subjective ground truth, the quality predictions of each metric are first subjected to a fitting procedure relative to the subjective ground truth before the correlation is computed. In subjective quality assessment using the ACR scale, the perceived difference in quality between stimuli rated as ‘poor’ and ‘bad’ is typically much smaller than between ‘good’ and ‘fair’, even though the corresponding differences in mean opinion scores (MOS) are the same, namely 1 unit.
An overarching principle for the reconstruction of image quality on the latent scale is maximum likelihood estimation (MLE). It provides a general, well-founded statistical method that applies equally to the estimation of individual quality scales, the estimation of entire distortion-rate functions, and the joint data analysis of BTC and PTC triplet responses for the unified quality scale. Moreover, MLE enables new and better ways to assess the performance of objective metrics.
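As an illustration of MLE on the Thurstonian model, the following Python sketch fits latent scale values to pairwise response counts by gradient ascent on the log-likelihood. It is a hypothetical toy fit, not the normative Annex F procedure (which also covers distortion-rate models and BTC/PTC unification):

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def thurstone_mle(wins, n, iters=2000, lr=0.05):
    """Fit latent scale values s[0..n-1] under Thurstone's Case V.

    wins[(i, j)] is the number of responses judging stimulus i as more
    distorted than stimulus j; P(i over j) is modelled as Phi(s_i - s_j).
    s[0] is anchored at 0.  Toy gradient ascent, for illustration only."""
    total = max(sum(wins.values()), 1)
    s = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for (i, j), w in wins.items():
            d = s[i] - s[j]
            # derivative of w * log(Phi(d)) with respect to d
            g = w * _N.pdf(d) / max(_N.cdf(d), 1e-12)
            grad[i] += g
            grad[j] -= g
        for k in range(1, n):  # skip k = 0 to keep the anchor fixed
            s[k] += lr * grad[k] / total
    return s
```

For two stimuli where one is judged more distorted in 75 % of responses, the fitted scale difference approaches Φ⁻¹(0,75) ≈ 0,674, i.e. one JND after rescaling.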
In the following, the main ingredients of the methodology are presented; the remaining details and specifications, as well as a demonstration case, are given in the annexes.
The set of source images and distorted images shall be generated according to Annex A.
The selection of triplets and generation of batches shall be performed as specified in Annex B.
Observer selection and viewing conditions shall be as specified in Annex C.
The collection of observer responses shall be performed using the test protocols specified in Annex D.
Observer responses shall be filtered according to the procedures specified in Annex E.
After data cleansing, the observer responses obtained from the two test protocols (boosted and plain triplet comparisons) shall be modelled and processed according to the procedure specified in Annex F. The end result is a reconstructed JND scale.
Raw results (before data cleansing) as well as reconstructed scores should be represented in the interchange
format specified in Annex G.
Annex A
(normative)
Generation of stimuli
A.1 Selection of source images
The procedures laid out in this document are intended for the assessment of perceptual quality in the high-
fidelity range. Therefore, it should be ensured that the source images are pristine. In particular, source
images should not contain any compression artefacts. Preferably, photographic source images are shot with a professional camera and created from the uncompressed raw camera data. It is also acceptable to use high-quality lossy images if they are downsampled by a sufficiently large factor to effectively eliminate the (mild) compression artefacts.
The selection of source images should cover the potential applications for the codecs that will be tested. The
image content should be as diverse and representative as possible, within the constraints imposed by the
feasible size of the experiment. For general-purpose image codecs, the following image categories should be
considered:
— natural images of people and animals;
— natural images of scenery, landscapes and still life;
— photo-realistic rendered images (e.g. game screenshots or ray traced scenes);
— screen content, including text, icons, web pages, UI elements, etc.;
— digital art;
— artificial test patterns, in particular ‘challenge images’ that represent worst-case scenarios for lossy
compression.
If the coding system is intended for specific image types or applications, such as medical imaging, the source
images should be a set of images appropriate to the application.
The original image dimensions (width and height in pixels) of the selected images should be diverse and
representative for the application. The final source images should however be cropped to dimensions that
can be displayed 1:1 given the (minimum) test display resolution, taking into account the space taken by the
test interface.
A.2 Preparing distorted images
For each source image and each codec, a set of distorted images shall be created. In the typical case of
assessment of image coding, the distorted images are the result of applying an encoding step followed by a
decoding step.
Ideally, the most distorted images for each codec have a (roughly) equivalent distortion magnitude, and the intermediate distortion levels are (roughly) equally spaced. The relevant range of codec quality parameters should be determined in one of the following ways (in order of decreasing accuracy and effort):
— A small pilot study is conducted, to determine distortion levels on a per-source and per-codec basis, for
example as it was done in Reference [13].
— By expert viewing, suitable codec parameters are determined on a per-source, per-codec basis.
— By expert viewing (or small pilot study), suitable parameters are determined for one anchor codec. For
the other codecs, parameters are chosen to approximate the bit rates (in bits per pixel [bpp]) of the
anchor codec.
— A suitable range of bit rates is selected, with (approximately) logarithmically spaced steps, e.g. 3 bpp,
2 bpp, 1.3 bpp, 0.9 bpp, 0.6 bpp.
The latter two approaches are only applicable in the case of assessment of image coding, where bit rates can
be computed.
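The last option, approximately logarithmically spaced bit rates, can be sketched as follows; the default endpoint values mirror the example above but are not normative:

```python
def log_spaced_bitrates(r_max=3.0, r_min=0.6, steps=5):
    """Approximately logarithmically spaced bit rates in bpp, from
    highest to lowest; values are rounded for use as encoder targets."""
    ratio = (r_min / r_max) ** (1 / (steps - 1))
    return [round(r_max * ratio ** k, 2) for k in range(steps)]
```

With the defaults this yields a sequence close to the example given above (3 bpp down to 0.6 bpp in five steps).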
For each source image S and codec C, a sequence of n + 1 images shall be created, denoted by C_{S,0}, C_{S,1}, …, C_{S,n}, with increasing distortion levels 0, 1, …, n, where level 0 refers to the source image, that is, C_{S,0} = S. The number of distortion levels n and the distortion magnitude of the most distorted images can vary depending on the goals of the experiment, but they should be within the range indicated in Table A.1.
Table A.1 — Recommended range of parameters for the preparation of sequences of distorted images

Parameter | Description | Minimum value | Maximum value | Example value
D_max | Largest distortion magnitude, in JND units | 1 | 5 | 2.5
n | Number of distortion levels per codec | 4 | 20 | 4·D_max
Annex B
(normative)
Triplet selection and batch generation
B.1 Triplet notation
The notation (L, P, R) denotes a triplet consisting of three stimuli.
— L is the stimulus displayed on the left side of the screen and is typically a distorted image.
— P is the pivot image, which is the reference to which the two other images are compared. It is the source
image.
— R is the stimulus displayed on the right side of the screen and is typically a distorted image.
All three images are derived from the same source image. In this document, the pivot image shall always be
equal to the source image.
B.2 Triplet selection
The set of triplets shall be constructed as follows.
For each source image S, the following two types of triplets, all of the form (C_{S,i}, S, C′_{S,j}), shall be included in the experiment:
— Same-codec comparisons: These are comparisons of different distortion levels within the same codec, i.e., C = C′ and i ≠ j. The purpose of these comparisons is the fine-grained quality assessment of each sequence of compressed images for each codec. All possible same-codec comparisons should be evaluated.
— Cross-codec comparisons: In order to increase the accuracy of the scale reconstruction, cross-codec comparisons, namely comparisons of test images derived from the same source but compressed with different codecs (C ≠ C′), should be introduced. The recommended number of cross-codec comparisons is 20 % of the total number of comparisons (1 cross-codec comparison for every 4 same-codec comparisons). The selection of the cross-codec comparisons should be random but not uniform: it should be based on approximate equality of bitrates or other assumptions of expected similarity of distortion magnitude.
The same-codec and cross-codec triplets are used together to reconstruct the perceived quality differences
in JND units. The inclusion of cross-codec triplets enhances the accuracy of the reconstructed scales and
ensures better alignment of the scale across different codecs.
Additionally, trap questions may be added, in particular in a crowdsourcing environment. A trap question
is a question with a known answer, used to filter observers according to their attention level. These triplets
should be of the form (C_{S,m}, S, S), where m is either the maximum distortion level considered (i.e. it is
also a same-codec comparison with j = 0) or an even larger distortion level (in case the considered distortion
range is narrow). In other words, the triplet contains one image with defects that a typical observer is certainly
able to detect, and a copy of the source image.
For every triplet (L, P, R), the symmetric triplet (R, P, L) shall also be included.
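As a non-normative illustration, the construction above can be sketched as follows. The function name and stimulus encoding are hypothetical, levels are assumed to run from 0 (the source) to a maximum distortion level, and the bitrate-based cross-codec pairing is crudely approximated by pairing equal level indices:

```python
import itertools
import random

def build_triplets(source, codecs, levels, seed=0):
    """Build same-codec, cross-codec and trap triplets for one source image.

    A stimulus is identified as (codec, level); level 0 is the source itself.
    Returns a list of (left, pivot, right) triplets, including, for every
    triplet (L, P, R), the symmetric triplet (R, P, L).
    """
    rng = random.Random(seed)
    triplets = []

    # Same-codec comparisons: all pairs of distinct levels within one codec.
    for c in codecs:
        for i, j in itertools.combinations(range(levels + 1), 2):
            triplets.append(((c, i), source, (c, j)))

    # Cross-codec comparisons: roughly 1 for every 4 same-codec comparisons,
    # pairing stimuli of expected similar distortion magnitude (here crudely
    # approximated by equal level indices instead of equal bitrates).
    for _ in range(len(triplets) // 4):
        c1, c2 = rng.sample(codecs, 2)
        lvl = rng.randint(1, levels)
        triplets.append(((c1, lvl), source, (c2, lvl)))

    # Trap questions: the maximum distortion level vs. a copy of the source.
    for c in codecs:
        triplets.append(((c, levels), source, (c, 0)))

    # For every triplet (L, P, R), include the symmetric triplet (R, P, L).
    triplets += [(r, p, l) for (l, p, r) in triplets]
    return triplets
```

For example, two codecs with four distortion levels each yield 20 same-codec, 5 cross-codec and 2 trap triplets, doubled to 54 by symmetrization.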
© ISO/IEC 2026 – All rights reserved
B.3 Batch generation, test duration and timing
If the total number of test questions is too large, the experiment shall be divided into multiple batches. Each
batch shall satisfy the following constraints: it shall contain each of the three triplet types in the same
proportions as in the total set of triplets, and for all triplets in a batch, the symmetric triplet shall also be
included in the same batch.
The generation of the batches shall be randomized, within the constraints mentioned above. The order in
which the questions are presented to the observers should be randomized. The assignment of batches to
observers shall be randomized. In addition, consecutive stimuli derived from the same source image should
be avoided as much as possible.
The time required for an observer to complete one batch, t_batch, depends on how fast the observer is able to
submit their answer, t_answer, at each step:

t_batch = (N_s + N_c + N_t) × t_answer

where N_s is the number of same-codec comparisons, N_c is the number of cross-codec comparisons, and N_t
is the number of trap questions in the batch.
The maximum batch duration t_batch-max is computed taking into consideration the maximum time allowed
to answer a study question. The maximum allowed time to submit an answer, t_answer-max, shall be enforced
by the interface, and it should be set as follows: 11 s for the BTC protocol; 30 s for the PTC protocol.

t_batch-max = (N_s + N_c + N_t) × t_answer-max
In order to minimize the observers’ stress, each batch should not be longer than 25 min. If observers are
requested to complete multiple batches during one viewing session, they shall take a mandatory break of
minimum 3 min between batches. The total time of a viewing session should not exceed one hour and there
should be no more than one viewing session per observer per day.
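As a non-normative sketch, the maximum batch duration and the 25 min batch limit above can be checked as follows (function names are illustrative):

```python
def max_batch_duration_s(n_same, n_cross, n_trap, protocol):
    """t_batch-max = (N_s + N_c + N_t) * t_answer-max, in seconds.

    t_answer-max is 11 s for the BTC protocol and 30 s for PTC.
    """
    t_answer_max = {"BTC": 11, "PTC": 30}[protocol]
    return (n_same + n_cross + n_trap) * t_answer_max

def batch_fits(n_same, n_cross, n_trap, protocol, limit_s=25 * 60):
    """True if the batch always completes within the 25 min recommendation."""
    return max_batch_duration_s(n_same, n_cross, n_trap, protocol) <= limit_s
```

For example, a BTC batch with 100 same-codec, 25 cross-codec and 5 trap questions has a maximum duration of 130 × 11 s = 1430 s, i.e. just under 24 min; the same batch under PTC would exceed the limit and would need to be split.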
Annex C
(normative)
Observers and viewing conditions
C.1 Observer selection
C.1.1 General
The observers should be selected from a general population. The observers for the experiment shall not
include evaluators who participated in the media selection for the experiment being conducted.
The observer population should include variations in gender, ethnicity and age. The experiment is visual in
nature and age can strongly correlate with visual acuity; therefore, this procedure favours observers in the
age range from 18 to 30 years old. The age, gender and country of residence of the observers participating in
the experiment should be collected and reported along with the results.
C.1.2 Controlled environment
The following selection criteria shall apply:
— Observers should have normal or corrected-to-normal visual acuity; this may be verified by using a
Snellen or Landolt C vision test (see ISO 8596).
— Observers should have normal colour vision; this may be verified by using an Ishihara test.[11]
C.1.3 Crowdsourcing environment
The following selection criteria shall apply:
— Observers should have normal colour vision; this may be verified by using an Ishihara test.[11]
C.2 Instructions to the observers
C.2.1 Controlled environment
Evaluators shall provide the same instructions to all observers, covering the following points:
— Explain the main goal of the experiment.
— Explain the use of the user interface.
— Explain the time limitations.
— Explain where to sit and how to arrange the chair.
Instructions should be provided in both written and oral form.
C.2.2 Crowdsourcing environment
Evaluators shall provide the same instructions to all observers, covering the following points:
— Report the minimum system requirements for the experiment.
— Explain the main goal of the experiment.
— Explain the use of the user interface.
— Explain the time limitations.
— Recommend a viewing position.
— Inform the observers about the conditions of payment.
— Explain the procedure for receiving the payment after completing the experiment.
C.3 Training session
The evaluator should use a training session for observers new to the procedure used in the subjective quality
assessment. For a training session the following criteria shall be observed.
— Use of images with content different from those images used during the test session.
— No inclusion of data from the training session in the output results.
— Use of the same viewing time limit as in the main part of the experiment.
— Feedback for the observers in case a wrong answer is given. Observers should retry until a correct
answer is given.
A minimum number of six examples should be used for the training session. After the training session at the
start of the main session, a number of triplet questions may be discarded from the data analysis for scale
reconstruction. The purpose of this is to allow observers to go through the transition phase to become fully
accustomed to the study interface.
C.4 Viewing conditions
C.4.1 Controlled environment
Viewing conditions should be consistent with those reported in ITU-R BT.500:
— Monitor calibration.
— Room light conditions.
— Viewing distance.
— Viewing time.
In reporting of the results, the details of the viewing conditions shall be mentioned explicitly.
C.4.2 Crowdsourcing environment
Specific viewing conditions cannot be enforced in a crowdsourcing environment, but observers should be
requested to respect the following viewing conditions:
— Set the display brightness to a comfortable setting.
— Make sure that the ambient light is not too bright.
— Clear screen and (if applicable) glasses before starting the experiment.
These viewing conditions should be adapted to the context of the assessment, for example, when image
quality is to be tested outdoors on a smartphone, or for high dynamic range content. In reporting of the
results, any specific instructions related to viewing conditions shall be mentioned explicitly in case they
deviate from the above.
Annex D
(normative)
Boosted and plain triplet comparison (BTC / PTC)
D.1 General
Each test question corresponds to a triplet of the form (I_L, I_P, I_R). Each observer has to observe the two
stimuli I_L and I_R presented on the screen and then select the stimulus with the strongest perceived
distortion compared to the pivot (source) image I_P.
Three different answers should be presented to the observers: ‘left’, ‘right’, ‘not sure’. The ‘not sure’ option is
introduced to reduce the mental demand of the observers.
A progress bar should also be displayed to let the observers keep track of their progress.
All images should be shown with a 1:1 mapping from the image pixels to the native display pixels.
Two different protocols shall be applied: BTC (Clause D.2) and PTC (Clause D.3). The number of responses
collected with BTC should be 80 % of the total number of responses; the number of responses collected with
PTC should be 20 % of the total.
D.2 Boosted triplet comparison (BTC)
D.2.1 Boosting techniques
In this comparison method, the perceptual impact of the distortions is boosted in one to three different
ways.
The first boosting technique is called flicker. It shall always be used. The test images are temporally
interleaved with the pivot (source) image. The change rate should be 10 Hz (the original image is shown for
100 ms, then the distorted image for 100 ms).
The two other boosting techniques apply a pre-processing step to the triplet (I_L, I_P, I_R) to produce a
boosted triplet (B(I_L), B(I_P), B(I_R)).
The second boosting technique is called zooming. It is optional. All three images are upsampled by a factor
of two, such that one image pixel corresponds to 2×2 display pixels. The upsampling filter that should be
used is simple pixel duplication (nearest neighbour upscaling). In case zooming is used, the selection of the
source images (see Clause A.1) should take into account that the test display has to be able to fit two zoomed
images.
The third boosting technique is called artefact amplification. It is optional. The two stimuli are modified
as follows. On a per-pixel basis, the difference between the source sample values and the stimulus sample
values is multiplied by a constant factor. The recommended factor is two and the amplification may be
performed in a perceptually uniform colour space. The sample values are clamped to the nominal range if
needed.
If both artefact amplification and zooming are applied, then the order of pre-processing is as follows: first
artefact amplification, then zooming. If simple pixel duplication is used as an upsampling filter, the order of
operations does not matter.
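The pre-processing chain above (artefact amplification with the recommended factor of two, then nearest-neighbour zooming) might be sketched as follows for a single-channel image held as nested lists. Operating directly on 8-bit sample values rather than in a perceptually uniform colour space is a simplification, and the function names are illustrative:

```python
def amplify_artefacts(source, stimulus, factor=2.0):
    """Per pixel: boosted = source + factor * (stimulus - source),
    clamped to the nominal 8-bit range [0, 255]."""
    return [
        [max(0, min(255, round(s + factor * (t - s)))) for s, t in zip(srow, trow)]
        for srow, trow in zip(source, stimulus)
    ]

def zoom_2x(image):
    """Nearest-neighbour upscaling: one image pixel -> 2x2 display pixels."""
    doubled_cols = [[px for px in row for _ in range(2)] for row in image]
    return [row for row in doubled_cols for _ in range(2)]

def boost(source, stimulus, factor=2.0):
    """Artefact amplification first, then zooming; with nearest-neighbour
    duplication the order of the two steps does not affect the result."""
    return zoom_2x(amplify_artefacts(source, stimulus, factor))
```

For instance, a pixel of value 110 against a source value of 100 is amplified to 120 (the difference of +10 is doubled), and the zoomed output has twice the width and height.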
NOTE Generally, the more boosting techniques are used, the more fine-grained the resulting reconstructed JND
scale would be.
D.2.2 Stimuli presentation
Each triplet (B(I_L), B(I_P), B(I_R)) of images processed as in D.2.1 and selected as in Annex B shall be
presented to the observers in a side-by-side fashion as follows. The two boosted test stimuli B(I_L) and
B(I_R) shall be displayed side-by-side in the central part of the interface, while the pivot image B(I_P)
(corresponding to the source image, possibly zoomed) shall be temporally interleaved with the test stimuli to
create flicker at 10 Hz, that is, for 100 ms both test stimuli are shown, then for 100 ms in the same locations
two copies of the pivot image are shown, then for 100 ms both test stimuli are shown again, etc.
The question asked to the observers should be: ‘Which image has a stronger flicker effect?’.
Each triplet should be shown for 8 s, and subsequently hidden. Afterwards, the observers should have an
additional 3 s to submit their answer, without the possibility of further inspecting the images. If an observer
fails to submit an answer within this time, the question shall be marked as ‘skipped’. The experiment shall
temporarily pause and the observer will need to press a button in order to continue viewing the next triplets.
D.3 Plain triplet comparison (PTC)
In this comparison method, the perceptual impact of the distortions shall not be boosted as in Clause D.2.
The two (non-boosted) test stimuli I_L and I_R shall be displayed side-by-side in the central part of the
interface. Additionally, there shall be a button labelled ‘Show original’. For as long as this button is pressed,
both images shall be replaced (in-place) by the pivot (source) image I_P. This allows the observer to assess
the fidelity of each stimulus through an in-place comparison to the source image.
NOTE Allowing such in-place comparison can be seen as a mild form of boosting, relative to allowing only a side-
by-side comparison.
The observer should be allowed to press the ‘Show original’ button any number of times, but it shall be
enforced by the interface that it is pressed at least once (per question) before submitting a response. The
minimum delay between two consecutive presses of the button should be 500 ms, effectively limiting the
‘manual flicker’ rate to 2 Hz.
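The two interface requirements above (at least one press before submission, and a 500 ms minimum interval between presses) could be sketched as follows; the class and method names are illustrative, and the clock is injectable only to make the logic testable:

```python
import time

class ShowOriginalButton:
    """Illustrative interface logic for the 'Show original' button: a response
    can only be submitted after at least one press, and consecutive presses
    are at least 500 ms apart (limiting the 'manual flicker' rate to 2 Hz)."""

    def __init__(self, min_interval_s=0.5, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock  # injectable for testing
        self.last_press = None
        self.pressed_once = False

    def press(self):
        """Return True if the press is accepted, False if rate-limited."""
        now = self.clock()
        if self.last_press is not None and now - self.last_press < self.min_interval_s:
            return False
        self.last_press = now
        self.pressed_once = True
        return True

    def can_submit(self):
        """A response may only be submitted after at least one accepted press."""
        return self.pressed_once
```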
The question asked to the observers should be ‘Which image has a stronger distortion?’.
Each triplet should be shown for 30 s. If an observer fails to submit an answer within this time limit, the
question shall be marked as ‘skipped’. The experiment shall temporarily pause, and the observer will need
to press a button in order to continue viewing the next triplets.
Annex E
(normative)
Data cleansing procedures
E.1 General
The data cleansing procedure shall proceed in two stages. In the first stage (Clause E.2), thresholding with
respect to the combined accuracy and consistency shall be applied, as well as thresholding with respect to
order bias. In the second stage (Clause E.3), outlier detection shall be applied.
In reporting of the results, the details of the data cleansing procedures followed shall be mentioned explicitly
in case they deviate from the procedures described below.
E.2 Data cleansing stage one
Accuracy: Assignments shall be assigned an accuracy value in [0,1], based on all responses for same-codec
and trap questions. The accuracy of an assignment is defined as the ratio of correct responses, according to
the following algorithm:
— Each response contributes a value which shall be determined as follows: a correct response is either ‘left’
or ‘right’, depending on the involved distortion levels, and has a score of 1. An incorrect response has a
score of 0. A response ‘not sure’ has a score of 0.5.
— Response values shall be weighted. The weight for a triplet question is defined as the absolute difference
between the distortion levels of the left and right stimulus (an integer from 0 to m ).
— The accuracy shall be computed as the mean of the weighted response values.
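The accuracy algorithm above can be sketched as follows, with each response encoded as the two distortion levels and the answer string. The function names are illustrative, and the aggregation is interpreted as a weighted mean (weighted sum of response values divided by the sum of weights):

```python
def response_value(level_left, level_right, answer):
    """1 for a correct response, 0 for incorrect, 0.5 for 'not sure'.
    The stimulus with the higher distortion level is the correct answer."""
    if answer == "not sure":
        return 0.5
    correct = "left" if level_left > level_right else "right"
    return 1.0 if answer == correct else 0.0

def accuracy(responses):
    """responses: iterable of (level_left, level_right, answer).
    Each response is weighted by |level_left - level_right| and the
    accuracy is the weighted mean of the response values."""
    num = den = 0.0
    for ll, lr, ans in responses:
        w = abs(ll - lr)
        num += w * response_value(ll, lr, ans)
        den += w
    return num / den if den else 0.0
```

For example, a correct answer at levels (3, 1), an incorrect answer at (1, 3) and a ‘not sure’ at (2, 1) yield (2·1 + 2·0 + 1·0.5) / 5 = 0.5.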
Consistency: Assignments shall be assigned a consistency value in [0,1], based on all responses for same-
codec, cross-codec, and trap questions which occur in symmetric pairs, i.e., with distortion levels (i,0,k) and
(k,0,i) in the triplets. It shall be computed according to the following algorithm:
— For each symmetric pair of triplets, a score shall be computed. If the responses are consistent (i.e. one
‘left’ and the other ‘right’, or both ‘not sure’), the score shall be 1. If both responses are ‘left’ or both are
‘right’, the score shall be 0. If one response is ‘not sure’ and the other is not, the score shall be 0.375.
— The weights for each pair of triplets shall be computed in the same way as for accuracy.
— The consistency shall be computed as the mean of the weighted scores.
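The consistency scoring can be sketched in the same style as the accuracy computation, with each symmetric pair encoded as the two distortion levels and the two answers; the function names are illustrative and the aggregation is again interpreted as a weighted mean:

```python
def pair_score(answer_a, answer_b):
    """Score for a symmetric pair of triplets (i,0,k) and (k,0,i):
    1 if consistent (one 'left' and one 'right', or both 'not sure'),
    0 if both answers point the same way, 0.375 if exactly one is 'not sure'."""
    answers = {answer_a, answer_b}
    if answers == {"left", "right"} or answers == {"not sure"}:
        return 1.0
    if "not sure" in answers:
        return 0.375
    return 0.0

def consistency(pairs):
    """pairs: iterable of (level_i, level_k, answer_a, answer_b).
    Each pair is weighted by |level_i - level_k|, as for accuracy, and the
    consistency is the weighted mean of the pair scores."""
    num = den = 0.0
    for i, k, a, b in pairs:
        w = abs(i - k)
        num += w * pair_score(a, b)
        den += w
    return num / den if den else 0.0
```

For example, a consistent pair at levels (3, 1) and an inconsistent pair at (2, 1) yield (2·1 + 1·0) / 3 = 2/3.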