Information technology — Scalable compression and coding of continuous-tone still images — Part 6: IDR Integer Coding

ISO/IEC 18477-6:2016 specifies a coding format, referred to as JPEG XT, which is designed primarily for continuous-tone photographic content.

Technologies de l'information — Compression échelonnable et codage d'images plates en ton continu — Partie 6: Codage de nombre entier par IDR

General Information

Status: Published
Publication Date: 27-Jan-2016
Current Stage: 9093 - International Standard confirmed
Start Date: 23-Jun-2021
Completion Date: 30-Oct-2025

Standard
ISO/IEC 18477-6:2016 - Information technology -- Scalable compression and coding of continuous-tone still images
English language
34 pages

Standards Content (Sample)


FINAL DRAFT INTERNATIONAL STANDARD
ISO/IEC FDIS 18477-6
ISO/IEC JTC 1/SC 29
Secretariat: JISC
Voting begins on: 2015-10-14
Voting terminates on: 2015-12-14
Information technology — Scalable compression and coding of continuous-tone still images —
Part 6: IDR Integer Coding
Technologies de l’information — Compression échelonnable et codage d’images plates en ton continu —
Partie 6: Codage de nombre entier par IDR
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number: ISO/IEC FDIS 18477-6:2015(E)
© ISO/IEC 2015

© ISO/IEC 2015, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO/IEC 2015 – All rights reserved

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions, abbreviated terms, and symbols
3.1 Terms and definitions
3.2 Symbols
3.3 Abbreviated terms
4 Conventions
4.1 Conformance language
4.2 Operators
4.2.1 Arithmetic operators
4.2.2 Logical operators
4.2.3 Relational operators
4.2.4 Precedence order of operators
4.2.5 Mathematical functions
5 General
5.1 High level overview on JPEG XT ISO/IEC 18477-6
5.2 Profiles
5.3 Encoder requirements
5.4 Decoder requirements
Annex A (normative) Encoding and decoding process
Annex B (normative) Boxes
Annex C (normative) Multi-component decorrelation
Annex D (normative) Entropy coding of refinement data
Annex E (normative) Profiles
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical
Barriers to Trade (TBT) see the following URL: Foreword - Supplementary information
The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 29, Coding of
audio, picture, multimedia and hypermedia information.
ISO/IEC 18477 contains the following parts under the general title Information technology — Scalable
compression and coding of continuous-tone still images:
— Part 1: Scalable compression and coding of continuous-tone still images
— Part 2: Extensions for high dynamic range images
— Part 3: Box file format
— Part 6: IDR Integer Coding
— Part 7: HDR Floating-Point Coding
— Part 8: Lossless and Near-lossless Coding
— Part 9: Alpha Channel Coding
The following parts are under preparation:
— Part 4: Conformance testing
— Part 5: Reference software

Introduction
This part of ISO/IEC 18477 specifies a coded codestream format for storage of continuous-tone high and
low dynamic range photographic content. JPEG XT part 6 is a scalable image coding system supporting
multiple component images consisting of integer samples of a bit precision between 9 and 16 bits. The
format itself is based on the Box Based format specified in ISO/IEC 18477-3, which ensures that legacy
applications conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1 are able to reconstruct a lower quality,
low dynamic range, eight bits per sample version of the image.
Today, the most widely used digital photography format, a minimal implementation of JPEG (specified
in ITU Recommendation T.81 | ISO/IEC 10918-1), uses a bit depth of 8; each of the three channels that
together compose an image pixel is represented by 8 bits, providing 256 representable values per
channel. For more demanding applications, it is not uncommon to use a bit depth of 16, providing 65 536
representable values to describe each channel within a pixel, resulting in over 2,8 × 10^14 representable
colour values.
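The figures quoted in this paragraph follow directly from the bit depth. The short Python sketch below reproduces them; the function names are illustrative only and are not part of this part of ISO/IEC 18477.

```python
def values_per_channel(bit_depth: int) -> int:
    """Number of distinct integer codes one channel can take at this bit depth."""
    return 2 ** bit_depth


def colours_per_pixel(bit_depth: int, channels: int = 3) -> int:
    """Number of distinct channel combinations available to a single pixel."""
    return values_per_channel(bit_depth) ** channels


# 8 bits give 256 values per channel; 16 bits give 65 536 values per channel
# and 65 536 ** 3 = 281 474 976 710 656 combinations, i.e. about 2,8 x 10^14.
for depth in (8, 16):
    print(depth, values_per_channel(depth), colours_per_pixel(depth))
```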
Most common photo and image formats use an 8-bit or 16-bit unsigned integer value to represent some
function of the intensity of each colour channel. While it might be theoretically possible to agree on
one method for assigning specific numerical values to real world colours, doing so is not practical.
Since any specific device has its own limited range for colour reproduction, the device’s range may be a
small portion of the agreed-upon universal colour range. As a result, such an approach is an extremely
inefficient use of the available numerical values, especially when using only 8 bits (or 256 unique
values) per channel. To represent pixel values as efficiently as possible, devices use a numeric encoding
optimized for their own range of possible colours or gamut.
JPEG XT is primarily designed to provide coded data containing intermediate dynamic range and wide
colour gamut content while simultaneously providing 8 bits per pixel low dynamic range images using
tools defined in ISO/IEC 18477-1, which is itself a subset of Rec. ITU-T T.81 | ISO/IEC 10918-1. The goal
is to provide a backwards compatible coding specification that allows legacy applications and existing
toolchains to continue to operate on codestreams conforming to this part of ISO/IEC 18477.
JPEG XT has been designed to be backwards compatible to legacy applications while at the same time
having a small coding complexity; JPEG XT uses, whenever possible, functional blocks of Rec. ITU-T T.81
| ISO/IEC 10918-1 to extend the functionality of the legacy JPEG Coding System. It is optimized for
storage and transmission of intermediate dynamic range and wide colour gamut images while also
enabling low-complexity encoder and decoder implementations.
This part of ISO/IEC 18477 is an extension of ISO/IEC 18477-1, a compression system for continuous
tone digital still images which is backwards compatible with Rec. ITU-T T.81 | ISO/IEC 10918-1. That
is, legacy applications conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1 will be able to reconstruct
streams generated by an encoder conforming to this part of ISO/IEC 18477, though they will possibly not
be able to reconstruct such streams in full dynamic range, in full quality, or with other features defined in this
Recommendation | International Standard.
This part of ISO/IEC 18477 is itself based on ISO/IEC 18477-3 which defines a box-based file format
similar to other JPEG standards. The aim of this part of ISO/IEC 18477 is to provide a migration path
for legacy applications to support, potentially in a limited way, coding of intermediate dynamic range
images, that is images represented by sample values requiring 9 to 16 bits precision. While the legacy
Rec. ITU-T T.81 | ISO/IEC 10918-1 already defines a coding mode for 12 bit sample precision, images
encoded in this mode cannot be decoded by applications implementing only the 8 bit mode. Unlike
the legacy standard, this part of ISO/IEC 18477 defines a scalable coding engine supporting all bit
depths between 9 and 16 bits per sample while also staying compatible with legacy applications. Such
applications will continue to work, but will only be able to reconstruct an 8 bit standard low dynamic
range (LDR) version of the full image contained in the codestream. This part of ISO/IEC 18477 specifies
a coded file format, referred to as JPEG XT, which is designed primarily for storage and interchange of
continuous-tone photographic content.

FINAL DRAFT INTERNATIONAL STANDARD ISO/IEC FDIS 18477-6:2015(E)
Information technology — Scalable compression and
coding of continuous-tone still images —
Part 6:
IDR Integer Coding
1 Scope
This part of ISO/IEC 18477 specifies a coding format, referred to as JPEG XT, which is designed primarily
for continuous-tone photographic content.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 18477-1:2015, Information technology — Scalable compression and coding of continuous-tone still
images — Part 1: Scalable compression and coding of continuous-tone still images
ISO/IEC 18477-3:—¹⁾, Information technology — Scalable compression and coding of continuous-tone still
images — Part 3: Box-based file format
Rec. ITU-T T.81 | ISO/IEC 10918-1, Information technology — Digital compression and coding of continuous
tone still images — Requirements and guidelines
Rec. ITU-T BT.601, Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9
aspect ratios
3 Terms and definitions, abbreviated terms, and symbols
3.1 Terms and definitions
For the purposes of this document, the following definitions apply.
3.1.1
AC coefficient
any DCT coefficient for which the frequency is not zero in at least one dimension
3.1.2
ASCII encoding
encoding of text characters and text strings according to ISO/IEC 10646-1
3.1.3
base decoding path
process of decoding legacy codestream and refinement data to the base image, jointly with all further
steps until residual data is added to the values obtained from the residual codestream
1) To be published.
3.1.4
base image
collection of sample values obtained by entropy decoding the DCT coefficients of the legacy codestream
and the refinement codestream, and inversely DCT transforming them jointly
3.1.5
binary decision
choice between two alternatives
3.1.6
bitstream
partially encoded or decoded sequence of bits comprising an entropy-coded segment
3.1.7
block
8 × 8 array of samples or an 8 × 8 array of DCT coefficient values of one component
3.1.8
box
structured collection of data describing the image or the image decoding process embedded into one or
multiple APP marker segments
Note 1 to entry: See ISO/IEC 18477-3:—, Annex B for the definition of boxes.
3.1.9
byte
group of 8 bits
3.1.10
coder
embodiment of a coding process
3.1.11
coding
encoding or decoding
3.1.12
coding model
procedure used to convert input data into symbols to be coded
3.1.13
(coding) process
general term for referring to an encoding process, a decoding process, or both
3.1.14
compression
reduction in the number of bits used to represent source image data
3.1.15
component
two-dimensional array of samples having the same designation in the output or display device
Note 1 to entry: An image typically consists of several components, e.g. red, green, and blue.
3.1.16
continuous-tone image
image whose components have more than one bit per sample
3.1.17
DC coefficient
DCT coefficient for which the frequency is zero in both dimensions
3.1.18
decoder
embodiment of a decoding process
3.1.19
decoding process
process which takes as its input compressed image data and outputs a continuous-tone image
3.1.20
dequantization
inverse procedure to quantization by which the decoder recovers a representation of the DCT coefficients
3.1.21
discrete cosine transform
DCT
either the forward discrete cosine transform or the inverse discrete cosine transform
3.1.22
downsampling
procedure by which the spatial resolution of a component is reduced
3.1.23
encoder
embodiment of an encoding process
3.1.24
encoding process
process which takes as its input a continuous-tone image and outputs compressed image data
3.1.25
entropy-coded (data) segment
independently decodable sequence of entropy encoded bytes of compressed image data
3.1.26
entropy decoder
embodiment of an entropy decoding procedure
3.1.27
entropy decoding
lossless procedure which recovers the sequence of symbols from the sequence of bits produced by the
entropy encoder
3.1.28
entropy encoder
embodiment of an entropy encoding procedure
3.1.29
entropy encoding
lossless procedure which converts a sequence of input symbols into a sequence of bits such that the
average number of bits per symbol approaches the entropy of the input symbols
3.1.30
grayscale image
continuous-tone image that has only one component
3.1.31
high dynamic range
image or image data comprised of more than eight bits per sample
3.1.32
Huffman decoder
embodiment of a Huffman decoding procedure
3.1.33
Huffman decoding
entropy decoding procedure which recovers the symbol from each variable length code produced by
the Huffman encoder
3.1.34
Huffman encoder
embodiment of a Huffman encoding procedure
3.1.35
Huffman encoding
entropy encoding procedure which assigns a variable length code to each input symbol
3.1.36
intermediate dynamic range
image or image data comprised of more than eight bits per sample
3.1.37
joint photographic experts group
JPEG
informal name of the committee which created this part of ISO/IEC 18477
Note 1 to entry: The “joint” comes from the ITU-T and ISO/IEC collaboration.
3.1.38
legacy codestream
collection of markers and syntax elements defined by Rec. ITU-T T.81 | ISO/IEC 10918-1 without any
additional syntax elements defined by the ISO/IEC 18477 family of standards, i.e. the legacy codestream
consists of the collection of all markers except those APP markers that describe JPEG XT boxes by the
syntax defined in ISO/IEC 18477-3:—, Annex A
3.1.39
legacy decoding path
collection of operations to be performed on the entropy coded data as described by Rec. ITU-T T.81 |
ISO/IEC 10918-1 jointly with the Legacy Refinement scans before this data is merged with the residual
data to form the final output image
3.1.40
legacy decoder
embodiment of a decoding process conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1, confined to the
lossy DCT process and the baseline, sequential, or progressive modes, decoding at most four components
to eight bits per component
3.1.41
legacy image
arrangement of sample values as described by applying the decoding process described by Rec.
ITU-T T.81 | ISO/IEC 10918-1 on the entropy coded data as defined by said standard
3.1.42
lossless
descriptive term for encoding and decoding processes and procedures in which the output of the
decoding procedure(s) is identical to the input to the encoding procedure(s)
3.1.43
lossless coding
mode of operation which refers to any one of the coding processes defined in this part of ISO/IEC 18477
in which all of the procedures are lossless
Note 1 to entry: See Annex H.
3.1.44
lossy
descriptive term for encoding and decoding processes which are not lossless
3.1.45
low dynamic range
image or image data comprised of data with no more than eight bits per sample
3.1.46
marker
two-byte code in which the first byte is hexadecimal FF and the second byte is a value between 1 and
hexadecimal FE
3.1.47
marker segment
marker together with its associated set of parameters
3.1.48
pixel
collection of sample values in the spatial image domain having all the same sample coordinates, e.g. a
pixel may consist of three samples describing its red, green, and blue value
3.1.49
precision
number of bits allocated to a particular sample or DCT coefficient
3.1.50
procedure
set of steps which accomplishes one of the tasks which comprise an encoding or decoding process
3.1.51
quantization value
integer value used in the quantization procedure
3.1.52
quantize
act of performing the quantization procedure for a DCT coefficient
3.1.53
residual decoding path
collection of operations applied to the entropy coded data contained in the residual data box and
residual refinement scan boxes up to the point where this data is merged with the base image to form
the final output image
3.1.54
residual image
extension image
sample values as reconstructed by inverse quantization and inverse DCT transformation applied to the
entropy-decoded coefficients described by the residual scan and residual refinement scans
3.1.55
residual scan
additional pass over the image data invisible to legacy decoders which provides additive and/or
multiplicative correction data of the legacy scans to allow reproduction of high dynamic range or wide
colour gamut data
3.1.56
refinement scan
additional pass over the image data invisible to legacy decoders which provides additional least
significant bits to extend the precision of the DCT transformed coefficients
3.1.57
sample
one element in the two-dimensional image array which comprises a component
3.1.58
sample grid
common coordinate system for all samples of an image
Note 1 to entry: The samples at the top left edge of the image have the coordinates (0,0), the first coordinate
increases towards the right, the second towards the bottom.
3.1.59
scan
single pass through the data for one or more of the components in an image
3.1.60
scan header
marker segment that contains a start-of-scan marker and associated scan parameters that are coded at
the beginning of a scan
3.1.61
superbox
box that carries other boxes as payload data
3.1.62
table specification data
coded representation from which the tables used in the encoder and decoder are generated and their
destinations specified
3.1.63
(uniform) quantization
procedure by which DCT coefficients are linearly scaled in order to achieve compression
3.1.64
upsampling
procedure by which the spatial resolution of a component is increased
3.1.65
vertical sampling factor
relative number of vertical data units of a particular component with respect to the number of vertical
data units in the other components in the frame
3.1.66
zero byte
0x00 byte
3.1.67
zig-zag sequence
specific sequential ordering of the DCT coefficients from (approximately) lowest spatial frequency to
highest
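The zig-zag sequence defined in 3.1.67 can be illustrated with a short Python sketch. It walks the anti-diagonals of a block and alternates direction, which reproduces the ordering commonly used by Rec. ITU-T T.81 implementations. The function name and the (row, column) convention are assumptions made for this illustration; the normative ordering is the one given in Rec. ITU-T T.81 | ISO/IEC 10918-1.

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """Return (row, column) positions of an n x n block in zig-zag order.

    Row is the vertical frequency index and column the horizontal one
    (an assumed convention for this sketch).
    """
    order = []
    for s in range(2 * n - 1):  # s = row + column indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            # Even diagonals run from bottom-left to top-right.
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


# Flattened indices (row * 8 + column) of the first ten positions.
flat = [r * 8 + c for r, c in zigzag_order()]
print(flat[:10])
```

The DC coefficient (position 0,0) comes first and the highest-frequency coefficient (position 7,7) last.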

3.2 Symbols
X          Width of the sample grid in positions
Y          Height of the sample grid in positions
Nf         Number of components in an image
s_{i,x}    Subsampling factor of component i in horizontal direction
s_{i,y}    Subsampling factor of component i in vertical direction
H_i        Subsampling indicator of component i in the frame header
V_i        Subsampling indicator of component i in the frame header
v_{x,y}    Sample value at the sample grid position x,y
R_h        Additional number of DCT coefficient bits represented by refinement scans in the legacy decoding path. 8+R_h is the number of non-fractional bits (i.e. bits in front of the “binary dot”) of the output of the inverse DCT process in the legacy decoding path.
R_r        Additional number of DCT coefficient bits represented by refinement scans in the residual decoding path. P+R_r is the number of non-fractional bits of the output of the inverse DCT process in the residual decoding path, where P is the frame-precision of the residual image as recorded in the frame header of the residual codestream.
R_b        Additional bits in the HDR image. 8+R_b is the sample precision of the reconstructed HDR image.
3.3 Abbreviated terms
For the purposes of this part of ISO/IEC 18477, the following abbreviated terms apply.
ASCII American Standard Code for Information Interchange
LSB Least Significant Bit
MSB Most Significant Bit
HDR High Dynamic Range
IDR Intermediate Dynamic Range
LDR Low Dynamic Range
TMO Tone Mapping Operator
DCT Discrete Cosine Transformation
4 Conventions
4.1 Conformance language
This part of ISO/IEC 18477 consists of normative and informative text.
Normative text is that text which expresses mandatory requirements. The word “shall” is used to express
mandatory requirements strictly to be followed in order to conform to this part of ISO/IEC 18477 and
from which no deviation is permitted. A conforming implementation is one that fulfils all mandatory
requirements.
Informative text is text that is potentially helpful to the user, but not indispensable and can be removed,
changed, or added editorially without affecting interoperability. All text in this part of ISO/IEC 18477
is normative, with the following exceptions: The Introduction, any parts of the text that are explicitly
labelled as “informative”, and statements appearing with the preamble “NOTE” and behaviour described
using the word “should”. The word “should” is used to describe behaviour that is encouraged but is not
required for conformance to this part of ISO/IEC 18477.
The keywords “may” and “need not” indicate a course of action that is permissible in a conforming
implementation.
The keyword “reserved” indicates a provision that is not specified at this time, shall not be used, and
may be specified in the future. The keyword “forbidden” indicates “reserved” and in addition indicates
that the provision will never be specified in the future.
4.2 Operators
NOTE Many of the operators used in this part of ISO/IEC 18477 are similar to those used in the C
programming language.
4.2.1 Arithmetic operators
+ Addition
− Subtraction (as a binary operator) or negation (as a unary prefix operator)
* Multiplication
/ Division without truncation or rounding
4.2.2 Logical operators
|| Logical OR
&& Logical AND
! Logical NOT
∈ x ∈ {A, B} is defined as (x == A || x == B)
∉ x ∉ {A, B} is defined as (x != A && x != B)
4.2.3 Relational operators
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
!= Not equal to
4.2.4 Precedence order of operators
Operators are listed below in descending order of precedence. If several operators appear in the same
line, they have equal precedence. When several operators of equal precedence appear at the same level
in an expression, evaluation proceeds according to the associativity of the operator either from right to
left or from left to right.
Operators        Type of operation           Associativity
(), [ ], .       Expression                  Left to Right
−                Unary negation
*, /             Multiplication              Left to Right
+, −             Addition and Subtraction    Left to Right
<, >, <=, >=     Relational                  Left to Right
4.2.5 Mathematical functions
⌈x⌉ Ceil of x. Returns the smallest integer that is greater than or equal to x.
⌊x⌋ Floor of x. Returns the largest integer that is less than or equal to x.
|x| Absolute value, is −x for x < 0, otherwise x.
sign(x) Sign of x, 0 if x is zero, +1 if x is positive, −1 if x is negative.
clamp(x,min,max) Clamps x to the range [min,max]: Returns min if x < min, max if x > max or otherwise x.
x^a Raises the value of x to the power of a. x is a non-negative real number, a is a real number. x^a is equal to exp[a × log(x)] where exp is the exponential function and log() the natural logarithm. If x is 0 and a is positive, x^a is defined to be 0.
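The functions listed in 4.2.5 map onto a few lines of Python. The sketch below is illustrative rather than normative: the names `sign`, `clamp`, and `power` mirror the text, while raising an error for a negative base, or for a zero base with a non-positive exponent, is an assumption of this sketch, since the text leaves those cases undefined.

```python
import math


def sign(x: float) -> int:
    """0 if x is zero, +1 if x is positive, -1 if x is negative."""
    return (x > 0) - (x < 0)


def clamp(x, minimum, maximum):
    """Return minimum if x < minimum, maximum if x > maximum, otherwise x."""
    if x < minimum:
        return minimum
    if x > maximum:
        return maximum
    return x


def power(x: float, a: float) -> float:
    """x raised to a, computed as exp(a * log(x)) for x > 0."""
    if x < 0:
        raise ValueError("x must be a non-negative real number")
    if x == 0:
        if a > 0:
            return 0.0  # convention stated in the text
        # Assumption: the text does not define this case, so reject it.
        raise ValueError("x == 0 is only defined for positive a")
    return math.exp(a * math.log(x))


# Ceil and floor are available directly from the standard library.
print(math.ceil(1.2), math.floor(-1.2), sign(-3), clamp(300, 0, 255))
```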
5 General
The purpose of this Clause is to give an informative overview of the elements specified in this part of
ISO/IEC 18477. Another purpose is to introduce many of the terms which are defined in Clause 3. These
terms are printed in italics upon first usage in this Clause.
There are three elements specified in this part of ISO/IEC 18477.
a) An encoder is an embodiment of an encoding process. An encoder takes as input digital source
image data and encoder specifications, and by means of a specified set of procedures generates a
codestream as output.
b) A decoder is an embodiment of a decoding process. A decoder takes as input a codestream, and by
means of a specified set of procedures generates as output digital reconstructed image data.
c) The codestream is a compressed image data representation which includes all necessary data to
allow a (full or approximate) reconstruction of the sample values of a digital image. Additional data
might be required that define the interpretation of the sample data, such as colour space or the
spatial dimensions of the samples.

5.1 High level overview on JPEG XT ISO/IEC 18477-6
This part of ISO/IEC 18477 allows lossy coding of intermediate dynamic range photographic images
in a way that is backwards compatible with Rec. ITU-T T.81 | ISO/IEC 10918-1. Decoders compliant with the
latter standard will be able to parse codestreams conforming to this part of ISO/IEC 18477 correctly,
albeit with a limited dynamic range and reduced sample bit precision.
This part of ISO/IEC 18477 includes multiple tools to reach the above functionality, defined in Annex B
and the annexes that follow. It is itself based on the file format specified in ISO/IEC 18477-3 and uses
the syntax elements and tools defined there. While ISO/IEC 18477-3 only defines a syntax, this part of
ISO/IEC 18477 extends that syntax to allow the representation of intermediate dynamic range images. It
also defines a decoding process that reconstructs sample values from conforming files. A high level
overview of both the syntax and the decoding process is given in this section.
The syntax of an ISO/IEC 18477-6 compliant codestream is specified in ISO/IEC 18477-3; that is, this
part of ISO/IEC 18477 uses a syntax element denoted as “Box” to annotate its syntactical elements. The
definition of the box syntax element is not repeated here, and readers are referred to ISO/IEC 18477-3
for further details. Additional boxes besides those already specified in ISO/IEC 18477-3 are defined
here, specifically the Residual Data Box, the Refinement Data Box, and the Residual Refinement Box at
the top level of the file, and various sub-boxes of the Merging Specification Box that define the decoding
process. The Merging Specification Superbox is already defined in ISO/IEC 18477-3; all additional box
types are specified in Annex B.
[Figure 1: block diagram, image not reproduced. Block labels: LDR Image; Base Image Decoder; Chroma Upsampling; Inverse Decorrelation; Base Mapping; Output Conversion; +; Range adjustment; HDR Image; Residual Image Decoder; Chroma Upsampling; Residual Mapping; Residual Decorrelation.]
Figure 1 — Overview on the decoding process
This part of ISO/IEC 18477 extends the legacy decoding process by two mechanisms (see Figure 1).
The Refinement Scan, specified in Annex D, increases the bit precision of the DCT coefficients,
i.e. it operates in the DCT domain. The mechanism used here is very similar to that of the successive
approximation scans specified in Rec. ITU-T T.81 | ISO/IEC 10918-1:1994, Annex G: a baseline,
extended or progressive Huffman scan as defined in Annex G of the legacy standard defines the 12 most
significant bits of the DCT coefficients. These initial scans are represented in the legacy codestream
and are visible to any ISO/IEC 18477-1 compliant decoder. Refinement scans then decode into up to
four additional least significant bits, in the same way that successive approximation scans within
Rec. ITU-T T.81 | ISO/IEC 10918-1 decode the least significant bits of a progressive scan pattern. The
only difference between Refinement scans and successive approximation scans is that in the latter case
the number of least significant bits is annotated in the scan header of the legacy codestream, whereas
Refinement scans are hidden from legacy applications and do not alter the scan header of the legacy
codestream. Their number is indicated in the Refinement Specification Box within the Merging
Specification Box and not in the legacy codestream.
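The effect of a refinement scan on a single coefficient can be pictured as appending least significant bits to a value that the legacy scans have already decoded. The Python sketch below shows only that idea. It is a simplification under stated assumptions: the bits are appended to the coefficient magnitude, most significant refinement bit first, and the handling of coefficients that are still zero after the legacy scans is ignored. The normative entropy decoding procedure is the one in Annex D, which is not reproduced in this excerpt; the function name is hypothetical.

```python
def append_refinement_bits(coefficient: int, refinement_bits: list[int]) -> int:
    """Extend the magnitude of a decoded DCT coefficient by extra LSBs.

    coefficient      value decoded from the legacy codestream
    refinement_bits  up to four bits, most significant first (assumed order)
    """
    if len(refinement_bits) > 4:
        raise ValueError("at most four refinement bits are described in the text")
    magnitude = abs(coefficient)
    for bit in refinement_bits:
        if bit not in (0, 1):
            raise ValueError("refinement bits must be 0 or 1")
        # Shift the magnitude left by one position and fill in the new LSB.
        magnitude = (magnitude << 1) | bit
    return -magnitude if coefficient < 0 else magnitude


# Magnitude 5 (binary 101) extended by the bits 1, 0 becomes binary 10110 = 22.
print(append_refinement_bits(5, [1, 0]))
```

A legacy decoder never sees the appended bits, so it reconstructs the image from the unrefined coefficient values.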
While Refinement Scans extend the bit precision within the DCT domain by up to four bits and hence
allow backwards compatible coding of images of up to 12 bits sample precision, Residual Scans extend
the sample precision in the spatial (image) domain. While the entropy coded data of Residual Scans is
hidden in the Residual Data Box from legacy applications, its decoding process is identical to that of
the legacy data: A baseline, extended or progressive Huffman scan decodes the data in the Residual
Data Box to DCT coefficients, inverse quantization and inverse Discrete Cosine Transformation (DCT)
compute from these coefficients the residual image data. An image merging process, defined in
Annex A, computes from the precursor image reconstructed from the base image and the residual
image a final IDR output image. This merging process first performs chroma upsampling to reconstruct
a single sample on each point of the sample grid of the base image. Chroma upsampling is specified in
ISO/IEC 18477-1:2015, Annex A. It then converts the colour space of the base image first from YCbCr into
the Rec. ITU-T BT.601 colourspace, followed by an additional linear transformation transforming the
Rec. ITU-T BT.601 primary colours into the primary colours of the target IDR colourspace. For practical
reasons, these two transformations are combined into a single linear transformation matrix. This linear
transformation is followed by a nonlinear point transformation acting separately on each of the output
channels sample by sample. This point transformation can be either specified by a parametric curve or
by an explicit lookup table. The output of this decoding path is transformed again by an optional colour
transformation, forming the precursor image, which is a rough approximation of
the final IDR image that is already in the correct IDR output colour space.
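As an illustration of why the two linear transformations can be folded into one, the following Python sketch multiplies two 3 × 3 matrices and checks that the combined matrix gives the same result as applying them one after the other. It then applies a per-channel lookup table as the point transformation. The matrices `M_FIRST` and `M_SECOND`, the table `LUT` and all function names are hypothetical placeholders chosen so that the arithmetic stays in small integers; they are not coefficients taken from this part of ISO/IEC 18477.

```python
# Illustrative sketch only: the matrices and the lookup table below are
# made-up integer placeholders, not values from ISO/IEC 18477.

def mat_mul(a, b):
    """Return the 3x3 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def mat_vec(m, v):
    """Apply the 3x3 matrix m to the three-component sample v."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]

def point_transform(sample, lut):
    """Apply a point transformation to each channel separately via a table."""
    return [lut[value] for value in sample]

# Hypothetical stand-in for the first linear transformation.
M_FIRST = [[1, 0, 2],
           [1, -1, -1],
           [1, 2, 0]]
# Hypothetical stand-in for the second linear transformation.
M_SECOND = [[2, 1, 0],
            [0, 1, 0],
            [0, 1, 3]]

# Applying M_FIRST and then M_SECOND equals applying M_SECOND * M_FIRST once.
M_COMBINED = mat_mul(M_SECOND, M_FIRST)

pixel = [3, 1, 2]
two_steps = mat_vec(M_SECOND, mat_vec(M_FIRST, pixel))
one_step = mat_vec(M_COMBINED, pixel)

# Hypothetical explicit lookup table: maps each index to its square.
LUT = [i * i for i in range(64)]
transformed = point_transform(one_step, LUT)
```

Precomputing the combined matrix saves one matrix-vector product per pixel, which is presumably the practical reason referred to above.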
Processing continues with the decoding of the Residual Image: DCT coefficients of the residual
image are decoded from the information in the Residual Data box, and their bit precision is extended
by additional refinement scans decoded from the data in the Residual Refinement Box. Processing
proceeds with inverse quantization and inverse DCT transformation. The output undergoes chroma
upsampling to generate a single sample per sample grid coordinate. The next processing step performs
a nonlinear point transformation on each of the reconstructed channels, separately for each sample,
resulting in an error image in a YCbCr type of colour space. Samples then undergo an inverse linear
decorrelation transformation to map the sample values from the intermediate YCbCr colourspace into
the target colourspace. This transformation is typically identical to the transformation matrix in the
base decoding path, but does not need to be. The result of this operation is the residual image.
To form the final output image, sample values of the precursor image and the residual image are
added together, plus an offset to make the residual image symmetric around zero. Results are then
clamped to the range of the intermediate range output image.
The detailed specification of the decoding and merging process is found in Annex A.
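The final addition and clamping step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the offset is passed in as a signed value because its actual definition belongs to Annex A and is not reproduced here, and the 12-bit output range and the bias of 2048 in the example are invented for demonstration.

```python
def clamp(x, lo, hi):
    """Clamp x to the closed range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x

def merge_sample(precursor, residual, offset, output_bits):
    """Add precursor and residual samples plus an offset, then clamp.

    `offset` is a signed value supplied by the caller. Its real value is
    defined in Annex A and is only assumed in this sketch.
    """
    max_value = (1 << output_bits) - 1
    return clamp(precursor + residual + offset, 0, max_value)

# Hypothetical example: 12-bit output, residual stored with a bias of 2048,
# so an offset of -2048 re-centres the residual around zero.
pairs = [(1000, 2048), (4000, 2300), (10, 2000)]
merged = [merge_sample(p, r, -2048, 12) for p, r in pairs]
```

In this example the first sample passes through unchanged, the second is clamped to the top of the range and the third to zero.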
5.2 Profiles
The Profiles define the implementation of a particular technology within the functional blocks of
Figure 1. The profiles are described in Annex E.
5.3 Encoder requirements
There is no requirement in this part of ISO/IEC 18477 that any encoder shall support all profiles. An
encoder is only required to meet the compliance tests and to generate the codestream according to the
syntax and to limit the coding parameters to those valid within the profile it conforms to. Profiles are
defined in Annex E.
5.4 Decoder requirements
A decoding process converts compressed image data to reconstructed image data. It may follow the
decoding operation specified in the Recommendation | International Standard and ISO/IEC 18477-1
to generate an LDR image from the legacy codestream, and it shall follow the operations in this part
of ISO/IEC 18477 to decode an IDR image from the data in the full file. The Decoder shall parse the
codestream syntax to extract the parameters, the residual image and the legacy image. The parameters
shall be used to merge the residual image with the base image into the reconstructed IDR Image.
In order to comply with this part of ISO/IEC 18477, a decoder
a) may convert a codestream conforming to this part of ISO/IEC 18477 without considering the
information in any box into a low dynamic range image, and
b) shall convert a conforming codestream within the profile it claims to be conforming to into an
intermediate dynamic range image.

Annex A
(normative)
Encoding and decoding process
A.1 Decoding process (normative)
The decoding process relies on a layered approach to extend JPEG’s capabilities. The encoder decomposes
an IDR image into a base layer, which consists of a tone-mapped version of the IDR image, and an IDR residual
layer. In addition to the residual layer, the codestream includes a description of an approximate inverse
tone mapping operation that allows the decoder to reconstruct an approximate IDR image from the LDR
image; the errors of this approximation process are corrected by the residual codestream included in the
residual data box and residual refinement box (see Annex B). Both the description of the tone mapping
and the residual image are included in boxes invisible to legacy decoders. Such decoders will thus only
see the tone mapped LDR image. While the base image complies with ISO/IEC 18477-1 and thus supports
only the 8-bit baseline, extended or progressive Huffman modes, the residual image may
optionally be encoded in the 12-bit Huffman or progressive modes.
Figure A.1 illustrates the functionality of a compliant decoder:
[Figure A.1: block diagram. Base path: T.81 | 10918-1 decoder for the base image (B1) with refinement scan (B1a), chroma upsampling (B2), base transformation (B3), base NLT point transformation (B4), colour transformation (B4a). Residual path: T.81 | 10918-1 decoder for the residual image (B5) with residual refinement scan (B5a), chroma upsampling (B6), residual NLT point transformation (B7), residual transformation (B8), 2nd residual NLT point transformation (B8d). The two paths are added (B9) and passed through output conversion (B10) to give the HDR image. Signal names shown: $O_i$, $J_i$, $H_i$, $F_i$, $Q_i$, $R_i$, $P_i$.]
NOTE Bold lines carry three (or one, for grey scale) components. Round boxes implement point-transformations,
square boxes (except B1, B1a, B5, B5a) multiplications by 3 × 3 matrices. Letters denote signal names.
Figure A.1 — High level overview of the decoding process of a compliant decoder
This subclause specifies the process that decoders shall follow to reconstruct an intermediate dynamic
range image from an LDR image and a residual image. This process consists of the following steps; see
also Figure A.1.
— In steps B1 and B1a, reconstruct the base image from the legacy codestream and the refinement
codestream if a Refinement Data box is present. Refinement coding is specified in Annex D.
— In steps B1 and B1a, apply the Inverse Quantization and Inverse Discrete Cosine Transformation as
in Rec. ITU-T T.81 | ISO/IEC 10918-1.
— In step B2, the upsampling process specified in ISO/IEC 18477-1:2015, Annex A shall be followed to
generate samples for all positions on the sample grid.
— In step B3, the linear transformation as selected by the Base Transformation Box defined in
Annex B shall be applied to inversely decorrelate the image components. Table C.1 defines which
transformation to pick. The output of this block consists of either one or three samples $O_i$ per grid
point, depending on the number of components in the base image. The output of this transformation is
rounded to integers and clipped to $[0, 2^{R_h+8}-1]$, where $R_h$ is the number of refinement scans in the
base image (see Annex D).
— In step B4, a nonlinear point transformation shall be applied to each of the output components
$O_i$. This process is selected according to the Base nonlinear Point Transformation subbox of the
Merging Specification box, implementing the $L_i$ Luts of Figure 1 and following the specifications
of ISO/IEC 18477-3:—, Annex C. The outputs of this process are the predicted high dynamic range
samples $J_i$. As above, i = 1..Nf.
— Also in step B4, a colour transformation is applied to the input values $J_i$ resulting in the output
pixel values $H_i$. The transformation is selected by the Colour Transformation subbox of the Merging
Specification Box, which selects one of the transformations defined in Annex C. If Nf equals 1, no
transformation is performed.
— In steps B5 and B5a, the residual image shall be reconstructed from the data
...
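The rounding and clipping of the step B3 output described in the list above can be sketched in Python as follows. The function name is hypothetical, and the rounding rule (halves rounded upwards) is an assumption made for illustration, since the text only states that values are rounded to integers.

```python
import math

def round_and_clip(value, r_h):
    """Round to an integer and clip to [0, 2**(r_h + 8) - 1].

    r_h is the number of refinement scans in the base image.
    """
    upper = (1 << (r_h + 8)) - 1
    rounded = math.floor(value + 0.5)  # assumed rounding rule, see lead-in
    return max(0, min(upper, rounded))

# With no refinement scans the range is [0, 255]; with four it is [0, 4095].
examples = [
    round_and_clip(255.6, 0),
    round_and_clip(-3.2, 2),
    round_and_clip(1000.4, 2),
    round_and_clip(5000.0, 4),
]
```

Each refinement scan adds one bit to the upper bound, so the permitted range doubles with every increment of `r_h`.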


INTERNATIONAL STANDARD ISO/IEC 18477-6
First edition
2016-02-01
Information technology — Scalable compression and coding of continuous-tone still images —
Part 6: IDR Integer Coding
Technologies de l’information — Compression échelonnable et codage d’images plates en ton continu —
Partie 6: Codage de nombre entier par IDR
© ISO/IEC 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions, abbreviated terms, and symbols
3.1 Terms and definitions
3.2 Symbols
3.3 Abbreviated terms
4 Conventions
4.1 Conformance language
4.2 Operators
4.2.1 Arithmetic operators
4.2.2 Logical operators
4.2.3 Relational operators
4.2.4 Precedence order of operators
4.2.5 Mathematical functions
5 General
5.1 High level overview on JPEG XT ISO/IEC 18477-6
5.2 Profiles
5.3 Encoder requirements
5.4 Decoder requirements
Annex A (normative) Encoding and decoding process
Annex B (normative) Boxes
Annex C (normative) Multi-component decorrelation
Annex D (normative) Entropy coding of refinement data
Annex E (normative) Profiles
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical
Barriers to Trade (TBT) see the following URL: Foreword - Supplementary information
The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 29, Coding of
audio, picture, multimedia and hypermedia information.
ISO/IEC 18477 contains the following parts under the general title Information technology — Scalable
compression and coding of continuous-tone still images:
— Part 1: Scalable compression and coding of continuous-tone still images
— Part 2: Extensions for high dynamic range images
— Part 3: Box file format
— Part 6: IDR Integer Coding
— Part 7: HDR Floating-Point Coding
— Part 8: Lossless and Near-lossless Coding
— Part 9: Alpha Channel Coding
The following parts are under preparation:
— Part 4: Conformance testing
— Part 5: Reference software

Introduction
This part of ISO/IEC 18477 specifies a coded codestream format for storage of continuous-tone high and
low dynamic range photographic content. JPEG XT part 6 is a scalable image coding system supporting
multiple component images consisting of integer samples of a bit precision between 9 and 16 bits. The
format itself is based on the Box Based format specified in ISO/IEC 18477-3, which ensures that legacy
applications conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1 are able to reconstruct a lower quality,
low dynamic range, eight bits per sample version of the image.
Today, the most widely used digital photography format, a minimal implementation of JPEG (specified
in ITU Recommendation T.81 | ISO/IEC 10918-1), uses a bit depth of 8; each of the three channels that
together compose an image pixel is represented by 8 bits, providing 256 representable values per
channel. For more demanding applications, it is not uncommon to use a bit depth of 16, providing 65 536
representable values to describe each channel within a pixel, resulting in over $2{,}8 \times 10^{14}$ representable
colour values.
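The figures in this paragraph follow directly from the bit depth, as the short Python sketch below shows. The function names are hypothetical and the calculation assumes three channels per pixel, as in the text.

```python
def values_per_channel(bit_depth):
    """Number of representable values for one channel at the given bit depth."""
    return 1 << bit_depth

def representable_colours(bit_depth, channels=3):
    """Number of distinct pixel values for the given number of channels."""
    return values_per_channel(bit_depth) ** channels

values_8 = values_per_channel(8)        # 256
values_16 = values_per_channel(16)      # 65536
colours_8 = representable_colours(8)    # 16777216
colours_16 = representable_colours(16)  # 281474976710656, about 2.8e14
```

The 16-bit case yields roughly 16.8 million times as many colour values as the 8-bit case, since each of the three channels gains a factor of 256.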
Most common photo and image formats use an 8-bit or 16-bit unsigned integer value to represent some
function of the intensity of each colour channel. While it might be theoretically possible to agree on
one method for assigning specific numerical values to real world colours, doing so is not practical.
Since any specific device has its own limited range for colour reproduction, the device’s range may be a
small portion of the agreed-upon universal colour range. As a result, such an approach is an extremely
inefficient use of the available numerical values, especially when using only 8 bits (or 256 unique
values) per channel. To represent pixel values as efficiently as possible, devices use a numeric encoding
optimized for their own range of possible colours or gamut.
JPEG XT is primarily designed to provide coded data containing intermediate dynamic range and wide
colour gamut content while simultaneously providing 8 bits per pixel low dynamic range images using
tools defined in ISO/IEC 18477-1, which is itself a subset of Rec. ITU-T T.81 | ISO/IEC 10918-1. The goal
is to provide a backwards compatible coding specification that allows legacy applications and existing
toolchains to continue to operate on codestreams conforming to this part of ISO/IEC 18477.
JPEG XT has been designed to be backwards compatible to legacy applications while at the same time
having a small coding complexity; JPEG XT uses, whenever possible, functional blocks of Rec. ITU-T T.81
| ISO/IEC 10918-1 to extend the functionality of the legacy JPEG Coding System. It is optimized for
storage and transmission of intermediate dynamic range and wide colour gamut images while also
enabling low-complexity encoder and decoder implementations.
This part of ISO/IEC 18477 is an extension of ISO/IEC 18477-1, a compression system for continuous
tone digital still images which is backwards compatible with Rec. ITU-T T.81 | ISO/IEC 10918-1. That
is, legacy applications conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1 will be able to reconstruct
streams generated by an encoder conforming to this part of ISO/IEC 18477, though they may not
be able to reconstruct such streams with the full dynamic range, full quality or other features defined in this
Recommendation | International Standard.
This part of ISO/IEC 18477 is itself based on ISO/IEC 18477-3 which defines a box-based file format
similar to other JPEG standards. The aim of this part of ISO/IEC 18477 is to provide a migration path
for legacy applications to support, potentially in a limited way, coding of intermediate dynamic range
images, that is images represented by sample values requiring 9 to 16 bits precision. While the legacy
Rec. ITU-T T.81 | ISO/IEC 10918-1 already defines a coding mode for 12 bit sample precision, images
encoded in this mode cannot be decoded by applications implementing only the 8 bit mode. Unlike
the legacy standard, this part of ISO/IEC 18477 defines a scalable coding engine supporting all bit
depths between 9 and 16 bits per sample while also staying compatible with legacy applications. Such
applications will continue to work, but will only be able to reconstruct an 8 bit standard low dynamic
range (LDR) version of the full image contained in the codestream. This part of ISO/IEC 18477 specifies
a coded file format, referred to as JPEG XT, which is designed primarily for storage and interchange of
continuous-tone photographic content.

INTERNATIONAL STANDARD ISO/IEC 18477-6:2016(E)
Information technology — Scalable compression and
coding of continuous-tone still images —
Part 6:
IDR Integer Coding
1 Scope
This part of ISO/IEC 18477 specifies a coding format, referred to as JPEG XT, which is designed primarily
for continuous-tone photographic content.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 18477-1:2015, Information technology — Scalable compression and coding of continuous-tone still
images — Part 1: Scalable compression and coding of continuous-tone still images
ISO/IEC 18477-3:2015, Information technology — Scalable compression and coding of continuous-tone still
images — Part 3: Box-based file format
Rec. ITU-T T.81 | ISO/IEC 10918-1, Information technology — Digital compression and coding of continuous
tone still images — Requirements and guidelines
Rec. ITU-T BT.601, Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9
aspect ratios
3 Terms and definitions, abbreviated terms, and symbols
3.1 Terms and definitions
For the purposes of this document, the following definitions apply.
3.1.1
AC coefficient
any DCT coefficient for which the frequency is not zero in at least one dimension
3.1.2
ASCII encoding
encoding of text characters and text strings according to ISO/IEC 10646-1
3.1.3
base decoding path
process of decoding legacy codestream and refinement data to the base image, jointly with all further
steps until residual data is added to the values obtained from the residual codestream
3.1.4
base image
collection of sample values obtained by entropy decoding the DCT coefficients of the legacy codestream
and the refinement codestream, and inversely DCT transforming them jointly
3.1.5
binary decision
choice between two alternatives
3.1.6
bitstream
partially encoded or decoded sequence of bits comprising an entropy-coded segment
3.1.7
block
8 × 8 array of samples or an 8 × 8 array of DCT coefficient values of one component
3.1.8
box
structured collection of data describing the image or the image decoding process embedded into one or
multiple APP marker segments
Note 1 to entry: See ISO/IEC 18477-3:2015, Annex B for the definition of boxes.
3.1.9
byte
group of 8 bits
3.1.10
coder
embodiment of a coding process
3.1.11
coding
encoding or decoding
3.1.12
coding model
procedure used to convert input data into symbols to be coded
3.1.13
(coding) process
general term for referring to an encoding process, a decoding process, or both
3.1.14
compression
reduction in the number of bits used to represent source image data
3.1.15
component
two-dimensional array of samples having the same designation in the output or display device
Note 1 to entry: An image typically consists of several components, e.g. red, green, and blue.
3.1.16
continuous-tone image
image whose components have more than one bit per sample
3.1.17
DC coefficient
DCT coefficient for which the frequency is zero in both dimensions
3.1.18
decoder
embodiment of a decoding process
3.1.19
decoding process
process which takes as its input compressed image data and outputs a continuous-tone image
3.1.20
dequantization
inverse procedure to quantization by which the decoder recovers a representation of the DCT coefficients
3.1.21
discrete cosine transform
DCT
either the forward discrete cosine transform or the inverse discrete cosine transform
3.1.22
downsampling
procedure by which the spatial resolution of a component is reduced
3.1.23
encoder
embodiment of an encoding process
3.1.24
encoding process
process which takes as its input a continuous-tone image and outputs compressed image data
3.1.25
entropy-coded (data) segment
independently decodable sequence of entropy encoded bytes of compressed image data
3.1.26
entropy decoder
embodiment of an entropy decoding procedure
3.1.27
entropy decoding
lossless procedure which recovers the sequence of symbols from the sequence of bits produced by the
entropy encoder
3.1.28
entropy encoder
embodiment of an entropy encoding procedure
3.1.29
entropy encoding
lossless procedure which converts a sequence of input symbols into a sequence of bits such that the
average number of bits per symbol approaches the entropy of the input symbols
3.1.30
grayscale image
continuous-tone image that has only one component
3.1.31
high dynamic range
image or image data comprised of more than eight bits per sample
3.1.32
Huffman decoder
embodiment of a Huffman decoding procedure
3.1.33
Huffman decoding
entropy decoding procedure which recovers the symbol from each variable length code produced by
the Huffman encoder
3.1.34
Huffman encoder
embodiment of a Huffman encoding procedure
3.1.35
Huffman encoding
entropy encoding procedure which assigns a variable length code to each input symbol
3.1.36
intermediate dynamic range
image or image data comprised of more than eight bits per sample
3.1.37
joint photographic experts group
JPEG
informal name of the committee which created this part of ISO/IEC 18477
Note 1 to entry: The “joint” comes from the ITU-T and ISO/IEC collaboration.
3.1.38
legacy codestream
collection of markers and syntax elements defined by Rec. ITU-T T.81 | ISO/IEC 10918-1, excluding any
additional syntax elements defined by the ISO/IEC 18477 family of standards, i.e. the legacy codestream
consists of the collection of all markers except those APP markers that describe JPEG XT boxes by the
syntax defined in ISO/IEC 18477-3:2015, Annex A
3.1.39
legacy decoding path
collection of operations to be performed on the entropy coded data as described by Rec. ITU-T T.81 |
ISO/IEC 10918-1 jointly with the Legacy Refinement scans before this data is merged with the residual
data to form the final output image
3.1.40
legacy decoder
embodiment of a decoding process conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1, confined to the
lossy DCT process and the baseline, sequential, or progressive modes, decoding at most four components
to eight bits per component
3.1.41
legacy image
arrangement of sample values as described by applying the decoding process described by Rec.
ITU-T T.81 | ISO/IEC 10918-1 on the entropy coded data as defined by said standard
3.1.42
lossless
descriptive term for encoding and decoding processes and procedures in which the output of the
decoding procedure(s) is identical to the input to the encoding procedure(s)
3.1.43
lossless coding
mode of operation which refers to any one of the coding processes defined in this part of ISO/IEC 18477
in which all of the procedures are lossless
Note 1 to entry: See ISO/IEC 18477-8.
3.1.44
lossy
descriptive term for encoding and decoding processes which are not lossless
3.1.45
low dynamic range
image or image data comprised of data with no more than eight bits per sample
3.1.46
marker
two-byte code in which the first byte is hexadecimal FF and the second byte is a value between 1 and
hexadecimal FE
3.1.47
marker segment
marker together with its associated set of parameters
3.1.48
pixel
collection of sample values in the spatial image domain that all have the same sample coordinates, e.g. a
pixel may consist of three samples describing its red, green, and blue value
3.1.49
precision
number of bits allocated to a particular sample or DCT coefficient
3.1.50
procedure
set of steps which accomplishes one of the tasks which comprise an encoding or decoding process
3.1.51
quantization value
integer value used in the quantization procedure
3.1.52
quantize
act of performing the quantization procedure for a DCT coefficient
3.1.53
residual decoding path
collection of operations applied to the entropy coded data contained in the residual data box and
residual refinement scan boxes up to the point where this data is merged with the base image to form
the final output image
3.1.54
residual image
extension image
sample values as reconstructed by inverse quantization and inverse DCT transformation applied to the
entropy-decoded coefficients described by the residual scan and residual refinement scans
3.1.55
residual scan
additional pass over the image data invisible to legacy decoders which provides additive and/or
multiplicative correction data of the legacy scans to allow reproduction of high dynamic range or wide
colour gamut data
3.1.56
refinement scan
additional pass over the image data invisible to legacy decoders which provides additional least
significant bits to extend the precision of the DCT transformed coefficients
3.1.57
sample
one element in the two-dimensional image array which comprises a component
3.1.58
sample grid
common coordinate system for all samples of an image
Note 1 to entry: The samples at the top left edge of the image have the coordinates (0,0), the first coordinate
increases towards the right, the second towards the bottom.
3.1.59
scan
single pass through the data for one or more of the components in an image
3.1.60
scan header
marker segment that contains a start-of-scan marker and associated scan parameters that are coded at
the beginning of a scan
3.1.61
superbox
box that carries other boxes as payload data
3.1.62
table specification data
coded representation from which the tables used in the encoder and decoder are generated and their
destinations specified
3.1.63
(uniform) quantization
procedure by which DCT coefficients are linearly scaled in order to achieve compression
3.1.64
upsampling
procedure by which the spatial resolution of a component is increased
3.1.65
vertical sampling factor
relative number of vertical data units of a particular component with respect to the number of vertical
data units in the other components in the frame
3.1.66
zero byte
0x00 byte
3.1.67
zig-zag sequence
specific sequential ordering of the DCT coefficients from (approximately) lowest spatial frequency to
highest
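For illustration, the zig-zag sequence of an 8 × 8 block can be generated by walking the anti-diagonals of the block and reversing direction on each one. The Python sketch below does this and returns row-major indices. The function name is hypothetical, and the normative ordering remains the one given in Rec. ITU-T T.81 | ISO/IEC 10918-1.

```python
def zigzag_order(n=8):
    """Return the row-major indices of an n x n block in zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Cells on the same anti-diagonal share r + c. Odd diagonals are walked
    # with the row index increasing, even diagonals with it decreasing.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in cells]

order = zigzag_order()
first_ten = order[:10]
```

The sequence starts at the DC coefficient (index 0) and ends at the highest-frequency AC coefficient (index 63).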
3.2 Symbols
X Width of the sample grid in positions
Y Height of the sample grid in positions
Nf Number of components in an image
$s_{i,x}$ Subsampling factor of component i in horizontal direction
$s_{i,y}$ Subsampling factor of component i in vertical direction
$H_i$ Subsampling indicator of component i in the frame header
$V_i$ Subsampling indicator of component i in the frame header
$v_{x,y}$ Sample value at the sample grid position x,y
$R_h$ Additional number of DCT coefficient bits represented by refinement scans in the legacy decoding path. $8+R_h$ is the number of non-fractional bits (i.e. bits in front of the “binary dot”) of the output of the inverse DCT process in the legacy decoding path.
$R_r$ Additional number of DCT coefficient bits represented by refinement scans in the residual decoding path. $P+R_r$ is the number of non-fractional bits of the output of the inverse DCT process in the residual decoding path, where P is the frame-precision of the residual image as recorded in the frame header of the residual codestream.
$R_b$ Additional bits in the HDR image. $8+R_b$ is the sample precision of the reconstructed HDR image.
3.3 Abbreviated terms
For the purposes of this part of ISO/IEC 18477, the following abbreviated terms apply.
ASCII American Standard Code for Information Interchange
LSB Least Significant Bit
MSB Most Significant Bit
HDR High Dynamic Range
IDR Intermediate Dynamic Range
LDR Low Dynamic Range
TMO Tone Mapping Operator
DCT Discrete Cosine Transformation
4 Conventions
4.1 Conformance language
This part of ISO/IEC 18477 consists of normative and informative text.
Normative text is that text which expresses mandatory requirements. The word “shall” is used to express
mandatory requirements strictly to be followed in order to conform to this part of ISO/IEC 18477 and
from which no deviation is permitted. A conforming implementation is one that fulfils all mandatory
requirements.
Informative text is text that is potentially helpful to the user, but not indispensable and can be removed,
changed, or added editorially without affecting interoperability. All text in this part of ISO/IEC 18477
is normative, with the following exceptions: The Introduction, any parts of the text that are explicitly
labelled as “informative”, and statements appearing with the preamble “NOTE” and behaviour described
using the word “should”. The word “should” is used to describe behaviour that is encouraged but is not
required for conformance to this part of ISO/IEC 18477.
The keywords “may” and “need not” indicate a course of action that is permissible in a conforming
implementation.
The keyword “reserved” indicates a provision that is not specified at this time, shall not be used, and
may be specified in the future. The keyword “forbidden” indicates “reserved” and in addition indicates
that the provision will never be specified in the future.
4.2 Operators
NOTE Many of the operators used in this part of ISO/IEC 18477 are similar to those used in the C
programming language.
4.2.1 Arithmetic operators
+ Addition
− Subtraction (as a binary operator) or negation (as a unary prefix operator)
* Multiplication
/ Division without truncation or rounding
4.2.2 Logical operators
|| Logical OR
&& Logical AND
! Logical NOT
∈ x ∈ {A, B} is defined as (x == A || x == B)
∉ x ∉ {A, B} is defined as (x != A && x != B)
4.2.3 Relational operators
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
!= Not equal to
4.2.4 Precedence order of operators
Operators are listed below in descending order of precedence. If several operators appear in the same
line, they have equal precedence. When several operators of equal precedence appear at the same level
in an expression, evaluation proceeds according to the associativity of the operator either from right to
left or from left to right.
Operators Type of operation Associativity
(), [ ], . Expression Left to Right
− Unary negation
*, / Multiplication Left to Right

+, − Addition and Subtraction Left to Right
<, >, <=, >= Relational Left to Right
4.2.5 Mathematical functions
Ceil of x. Returns the smallest integer that is greater than or equal to x.
x
 
 
Floor of x. Returns the largest integer that is lesser than or equal to x.
x
 
 
|x| Absolute value, is –x for x < 0, otherwise x.
sign(x) Sign of x, 0 if x is zero, +1 if x is positive, −1 if x is negative.
clamp(x,min,max) Clamps x to the range [min,max]: Returns min if x < min, max if x > max or otherwise x.
x^a Raises the value of x to the power of a. x is a non-negative real number, a is a real number.
x^a is equal to exp[a × log(x)] where exp is the exponential function and log() the natural
logarithm. If x is 0 and a is positive, x^a is defined to be 0.
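The functions of 4.2.5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names sign, clamp and power mirror the notation above, and raising an error for a negative base or for a zero base with a non-positive exponent is a choice made for this sketch, since the text does not define those cases.

```python
import math

def sign(x):
    """Return 0 if x is zero, +1 if x is positive, -1 if x is negative."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0

def clamp(x, lo, hi):
    """Return lo if x < lo, hi if x > hi, otherwise x."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def power(x, a):
    """Compute x^a as exp(a * log(x)) for non-negative real x and real a."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        if a > 0:
            return 0.0  # defined to be 0 when x is 0 and a is positive
        # Assumption of this sketch: the remaining zero-base cases are rejected.
        raise ValueError("0 raised to a non-positive power is not defined here")
    return math.exp(a * math.log(x))

# Ceil and floor correspond to math.ceil and math.floor.
```

For example, clamp(300, 0, 255) yields 255 and power(0, 2.2) yields 0.0.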
5 General
The purpose of this Clause is to give an informative overview of the elements specified in this part of
ISO/IEC 18477. Another purpose is to introduce many of the terms which are defined in Clause 3. These
terms are printed in italics upon first usage in this Clause.
There are three elements specified in this part of ISO/IEC 18477.
a) An encoder is an embodiment of an encoding process. An encoder takes as input digital source
image data and encoder specifications, and by means of a specified set of procedures generates as
output codestream.
b) A decoder is an embodiment of a decoding process. A decoder takes as input a codestream, and by
means of a specified set of procedures generates as output digital reconstructed image data.
c) The codestream is a compressed image data representation which includes all necessary data to
allow a (full or approximate) reconstruction of the sample values of a digital image. Additional data
might be required that define the interpretation of the sample data, such as colour space or the
spatial dimensions of the samples.
5.1 High level overview on JPEG XT ISO/IEC 18477-6
This part of ISO/IEC 18477 allows lossy coding of intermediate dynamic range photographic images
in a way that is backwards compatible with Rec. ITU-T T.81 | ISO/IEC 10918-1. Decoders compliant with the
latter standard will be able to parse codestreams conforming to this part of ISO/IEC 18477 correctly,
albeit with a limited dynamic range and reduced sample bit precision.
This part of ISO/IEC 18477 includes multiple tools to provide the above functionality, defined in Annex B
and the annexes that follow. It is based on the file format specified in ISO/IEC 18477-3 and uses the
syntax elements and tools defined there. While ISO/IEC 18477-3 only defines a syntax, this part of
ISO/IEC 18477 extends that syntax to allow the representation of intermediate dynamic range images. It
also defines a decoding process that reconstructs sample values from conforming files. A high level
overview of both the syntax and the decoding is given in this subclause.
The syntax of an ISO/IEC 18477-6 compliant codestream is specified in ISO/IEC 18477-3, that is, this
part of ISO/IEC 18477 uses a syntax element denoted as “box” to annotate its syntactical elements. The
definition of the box syntax element is not repeated here, and readers are referred to ISO/IEC 18477-3
for further details. Additional boxes besides those already specified in ISO/IEC 18477-3 are defined
here, specifically the Residual Data box, the Refinement Data box, and the Residual Refinement box at
the top level of the file, and various sub-boxes of the Merging Specification box defining the decoding
process. The Merging Specification Superbox is already defined in ISO/IEC 18477-3; all additional box
types are specified in Annex B.
[Figure 1: block diagram. Block labels: Base Image Decoder, Chroma Upsampling, Inverse Decorrelation,
LDR Image, Base Mapping, Residual Image Decoder, Chroma Upsampling, Residual Mapping, Residual
Decorrelation, Range adjustment, adder (+), Output Conversion, HDR Image.]
Figure 1 — Overview on the decoding process
This part of ISO/IEC 18477 extends the legacy decoding process by two mechanisms (see Figure 1).
The Refinement Scan, specified in Annex D, increases the bit precision of the DCT coefficients,
i.e. it operates in the DCT domain. The mechanism is very similar to that of the Subsequent
Approximation scans specified in Rec. ITU-T T.81 | ISO/IEC 10918-1:1994, Annex G. A baseline,
extended or progressive Huffman scan as defined in Annex G of the legacy standard defines the 12 most
significant bits of the DCT coefficients. These initial scans are represented in the legacy codestream
and are visible to any ISO/IEC 18477-1 compliant decoder. Refinement scans then decode up to
four additional least significant bits, in the same way that Subsequent Approximation scans within
Rec. ITU-T T.81 | ISO/IEC 10918-1 decode the least significant bits of a progressive scan pattern. The
only difference between Refinement scans and Subsequent Approximation scans is that in the latter case
the number of least significant bits is annotated in the scan header of the legacy codestream, whereas
Refinement scans are hidden from legacy applications and do not alter the scan header of the legacy
codestream. Their number is indicated in the Refinement Specification box within the Merging
Specification box and not in the legacy codestream.
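The bit arithmetic behind this description can be illustrated with a minimal Python sketch. It is not the decoding procedure of Annex D: it only shows how up to four refinement bits extend the magnitude of a coefficient whose most significant bits came from the legacy scans. The names refine_magnitude and coefficient_precision are hypothetical, and sign handling and entropy decoding are deliberately left out.

```python
MAX_REFINEMENT_BITS = 4    # "up to four additional least significant bits"
LEGACY_COEFFICIENT_BITS = 12  # most significant bits defined by the legacy scans

def refine_magnitude(msb_magnitude, refinement_bits):
    """Append refinement bits (most significant first) below a coefficient magnitude.

    Assumption of this sketch: the coefficient is handled as a non-negative
    magnitude; its sign is tracked elsewhere.
    """
    if len(refinement_bits) > MAX_REFINEMENT_BITS:
        raise ValueError("at most four refinement bits are supported")
    value = msb_magnitude
    for bit in refinement_bits:
        if bit not in (0, 1):
            raise ValueError("refinement bits must be 0 or 1")
        value = (value << 1) | bit
    return value

def coefficient_precision(num_refinement_bits):
    """Total number of coefficient bits after refinement."""
    return LEGACY_COEFFICIENT_BITS + num_refinement_bits
```

With a legacy magnitude of 5 (binary 101) and refinement bits 0, 0, 1, 1, the refined magnitude is binary 1010011, i.e. 83.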
While Refinement Scans extend the bit precision within the DCT domain by up to four bits and hence
allow backwards compatible coding of images of up to 12 bits sample precision, Residual Scans extend
the sample precision in the spatial (image) domain. While the entropy coded data of Residual Scans is
hidden from legacy applications in the Residual Data box, its decoding process is identical to that of
the legacy data: a baseline, extended or progressive Huffman scan decodes the data in the Residual
Data box to DCT coefficients, and inverse quantization and the inverse Discrete Cosine Transformation
(DCT) compute the residual image data from these coefficients. An image merging process, defined in
Annex A, computes the final IDR output image from the residual image and the precursor image, which
is reconstructed from the base image. This merging process first performs chroma upsampling to
reconstruct a single sample at each point of the sample grid of the base image. Chroma upsampling is
specified in ISO/IEC 18477-1:2015, Annex A. It then converts the colour space of the base image from
YCbCr into the Rec. ITU-R BT.601 colour space, followed by an additional linear transformation that maps
the Rec. ITU-R BT.601 primary colours to the primary colours of the target IDR colour space. For practical
reasons, these two transformations are combined into a single linear transformation matrix. This linear
transformation is followed by a nonlinear point transformation acting separately on each of the output
channels, sample by sample. This point transformation can be specified either by a parametric curve or
by an explicit lookup table. The output of this decoding path is transformed again by an optional colour
transformation, forming the precursor image, which is a rough approximation of
the final IDR image, already in the correct IDR output colour space.
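The base decoding path described above, a single combined linear transformation followed by a per-channel point transformation, can be sketched as follows. This is a minimal illustration only: the function names, the matrix and the lookup tables are hypothetical placeholders, the point transformation is shown in its lookup-table form, the rounding and index clamping are assumptions of the sketch, and the optional colour transformation is omitted. The normative process is the one in Annex A.

```python
import math

def apply_matrix(matrix, sample):
    """Multiply a 3 x 3 matrix (list of rows) by a three-component sample."""
    return [sum(matrix[r][c] * sample[c] for c in range(3)) for r in range(3)]

def apply_lut(lut, value):
    """Point transformation by table lookup.

    Assumption of this sketch: the input is rounded half up and the index is
    clamped to the table range.
    """
    index = math.floor(value + 0.5)
    index = min(max(index, 0), len(lut) - 1)
    return lut[index]

def precursor_sample(sample, matrix, luts):
    """Linear transformation, then one lookup table per output channel."""
    linear = apply_matrix(matrix, sample)
    return [apply_lut(luts[i], linear[i]) for i in range(3)]

# Hypothetical example data: identity matrix and a table that doubles its input.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
DOUBLING_LUT = [2 * v for v in range(256)]
```

Calling precursor_sample([10, 20, 30], IDENTITY, [DOUBLING_LUT] * 3) returns [20, 40, 60].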
Processing continues with the decoding of the Residual Image: DCT coefficients of the residual
image are decoded from the information in the Residual Data box, and their bit precision is extended
by additional refinement scans decoded from the data in the Residual Refinement box. Processing
proceeds with inverse quantization and the inverse DCT. The output undergoes chroma
upsampling to generate a single sample per sample grid coordinate. The next processing step performs
a nonlinear point transformation on each of the reconstructed channels, separately for each sample,
resulting in an error image in a YCbCr type of colour space. Samples then undergo an inverse linear
decorrelation transformation to map the sample values from the intermediate YCbCr colour space into
the target colour space. This transformation is typically identical to the transformation matrix in the
base decoding path, but does not need to be. The result of this operation is the residual image.
To form the final output image, sample values of the precursor image and the residual image are
added together, plus an offset to make the residual image symmetric around zero. Results are then
clamped to the range of the intermediate range output image.
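A minimal sketch of this final merging step for a single sample value is given below. The function name merge_sample and its parameters are hypothetical. In particular, the offset value and the output bit depth used in the example are illustrative assumptions and are not taken from this part of ISO/IEC 18477; the sketch assumes that the offset is a signed value, so that a negative offset re-centres a residual stored as a non-negative number.

```python
def merge_sample(precursor, residual, offset, output_bits):
    """Add precursor, residual and offset, then clamp to [0, 2^output_bits - 1]."""
    max_value = (1 << output_bits) - 1
    total = precursor + residual + offset
    return min(max(total, 0), max_value)

# Illustrative values only: 12-bit output, residual centred on 2048.
EXAMPLE_BITS = 12
EXAMPLE_OFFSET = -2048
```

With these example values, a residual of 2048 leaves a precursor sample of 1000 unchanged, while sums that fall outside the output range are clamped to 0 or 4095.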
The detailed specification of the decoding and merging process is found in Annex A.
5.2 Profiles
The Profiles define the implementation of a particular technology within the functional blocks of
Figure 1. The profiles are described in Annex E.
5.3 Encoder requirements
There is no requirement in this part of ISO/IEC 18477 that any encoder shall support all profiles. An
encoder is only required to meet the compliance tests and to generate the codestream according to the
syntax and to limit the coding parameters to those valid within the profile it conforms to. Profiles are
defined in Annex E.
5.4 Decoder requirements
A decoding process converts compressed image data to reconstructed image data. It may follow the
decoding operation specified in the Recommendation | International Standard and ISO/IEC 18477-1
to generate an LDR image from the legacy codestream, and it shall follow the operations in this part
of ISO/IEC 18477 to decode an IDR image from the data in the full file. The decoder shall parse the
codestream syntax to extract the parameters, the residual image and the legacy image. The parameters
shall be used to merge the residual image with the base image into the reconstructed IDR Image.
In order to comply with this part of ISO/IEC 18477, a decoder
a) may convert a codestream conforming to this part of ISO/IEC 18477 into a low dynamic range image
without considering the information in any box, and
b) shall convert a conforming codestream within the profile it claims to be conforming to into an
intermediate dynamic range image.
Annex A
(normative)
Encoding and decoding process
A.1 Decoding process (normative)
The decoding process relies on a layered approach to extend JPEG’s capabilities. The encoder decomposes
an IDR image into a base layer, which consists of a tone-mapped version of the IDR image, and an IDR residual
layer. In addition to the residual layer, the codestream includes a description of an approximate inverse
tone mapping operation that allows the decoder to reconstruct an approximate IDR image from the LDR
image; the errors of this approximation process are corrected by the residual codestream included in the
Residual Data box and Residual Refinement box (see Annex B). Both the description of the tone mapping
and the residual image are included in boxes invisible to legacy decoders. Such decoders will thus only
see the tone-mapped LDR image. While the base image complies with ISO/IEC 18477-1 and thus supports
only the 8-bit baseline, extended or progressive Huffman modes, the residual image may
optionally be encoded in the 12-bit Huffman or progressive modes.
Figure A.1 illustrates the functionality of a compliant decoder:
[Figure A.1: block diagram. Blocks: T.81 | 10918-1 Decoder for the Base Image (B1), Refinement Scan (B1a),
Chroma Upsampling (B2), Base Transformation, YCbCr to RGB (B3), Base NLT Point Trafo (B4), Color
Transformation (B4a), T.81 | 10918-1 Decoder for the Residual Image (B5), Residual Refinement Scan (B5a),
Chroma Upsampling (B6), Residual NLT Point Trafo, Residual Transformation, 2nd Residual NLT Point Trafo,
adder (+), Output Conversion. Remaining step labels: B7, B8, B8d, B9, B10. Signal names: O_i, J_i, H_i,
F_i, Q_i, R_i, P_i. Output: HDR Image.]
NOTE Bold lines carry three (or one, for grey scale) components. Round boxes implement point-transformations,
square boxes (except B1, B1a, B5, B5a) multiplications by 3 × 3 matrices. Letters denote signal names.
Figure A.1 — High level overview of the decoding process of a compliant decoder
This subclause specifies the process that decoders shall follow to reconstruct an intermediate dynamic
range image from an LDR image and a residual image. This process consists of the following steps (see
also Figure A.1).
— In steps B1 and B1a, reconstruct the base image from the legacy codestream and, if a Refinement
Data box is present, the refinement codestream. Refinement coding is specified in Annex D.
— In steps B1 and B1a, apply the Inverse Quantization and Inverse Discrete Cosine Transformation as
in Rec. ITU-T T.81 | ISO/IEC 10918-1.
— In step B2, the upsampling process specified in ISO/IEC 18477-1:2015, Annex A shall be followed to
generate samples for all positions on the sample grid.
— In step B3, the linear transformation as selected by the Base Transformation box defined in
Annex B shall be applied to inversely decorrelate the image components. Table C.1 defines which
transformation to pick. The output of this block consists of either one or three samples per grid point
O_i, depending on the number of components in the base image. The output of this transformation is
rounded to integers and clipped to [0, 2^(R_h+8) − 1] where R_h is the number of refinement scans in the
base image (see Annex D).
— In step B4, a nonlinear point transformation shall be applied to each of the output components
O_i. This process is selected according to the Base nonlinear Point Transformation subbox of the
Merging Specification box, implementing the L_i Luts of Figure 1 and following the specifications of
ISO/IEC 18477-3:2015, Annex C. The outputs of this process are the predicted high dynamic range
samples J_i. As above, i = 1..Nf.
— In step B4a, a colour transformation is applied to the input values J_i resulting in the output pixel
values H_i. The transformation is selected by the Colour Transformation subbox of the Merging
Specification box, which selects one of the transformations defined in Annex C. If Nf equals 1, no
transformation is performed.
— In steps B5 and B5a, the residual image shall be reconstructed from the data contained in the
Residual Codestream box and the Residual Refinement box. The codestream contained in this box
follows the specifications defined in Rec. ITU-T T.81 | ISO/IEC 10918-1. If a Residual Refinement box
is present, the precision of the samples of the residual codestream shall be extended by refinement
coding as specified in Annex D. The number of components of the residual image shall be equal to
the number of components signalled in the base image.
— In steps B5 and B5a, apply Inverse Quantization and Inverse Discrete Cosine Transformation as in
Rec. ITU-T T.81 | ISO/IEC 10918-1.
— In step B6, residual data is upsampled to the common sample grid following the specificat
...
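The rounding and clipping at the output of step B3 can be sketched as follows. The function name clip_base_output is hypothetical, and rounding half up is an assumption of this sketch, since the text only states that the output is rounded to integers. The upper bound is 2 raised to the power R_h + 8, minus 1, as stated for step B3.

```python
import math

def clip_base_output(value, r_h):
    """Round to an integer and clip to [0, 2^(r_h + 8) - 1].

    r_h is the number of refinement scans in the base image.
    Assumption of this sketch: rounding is round-half-up.
    """
    upper = (1 << (r_h + 8)) - 1
    rounded = math.floor(value + 0.5)
    return min(max(rounded, 0), upper)
```

Without refinement scans (r_h = 0) the range is [0, 255], so 300.2 is clipped to 255. With r_h = 2 the range is [0, 1023] and the same value is rounded to 300.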
