Information technology — JPEG XL image coding system — Part 1: Core coding system

This document specifies a set of compression methods for coding one or more images of bi-level, continuous-tone greyscale, or continuous-tone colour, or multichannel digital samples. This document: — specifies decoding processes for converting compressed image data to reconstructed image data; — specifies a codestream syntax containing information for interpreting the compressed image data; — provides guidance on encoding processes for converting source image data to compressed image data.

Technologies de l'information — Système de codage d'images JPEG XL — Partie 1: Système de codage de noyau

General Information

Status
Published
Publication Date
24-Jul-2024
Current Stage
6060 - International Standard published
Start Date
25-Jul-2024
Due Date
12-May-2025
Completion Date
25-Jul-2024
Standard
ISO/IEC 18181-1:2024 - Information technology — JPEG XL image coding system — Part 1: Core coding system Released:25. 07. 2024
English language
91 pages

Standards Content (Sample)


International Standard
ISO/IEC 18181-1
Second edition
2024-07
Information technology — JPEG XL image coding system —
Part 1:
Core coding system
Technologies de l'information — Système de codage d'images JPEG XL —
Partie 1: Système de codage de noyau
Reference number
© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO/IEC 2024 – All rights reserved
Contents
Foreword
1 Scope
2 Normative references
3 Terms, definitions and abbreviated terms
3.1 Terms and definitions
3.2 Inputs
3.3 Processes
3.4 Image and codestream organization
3.5 Abbreviated terms
4 Conventions
4.1 Mathematical symbols
4.2 Functions
4.3 Operators
4.4 Pseudocode
5 Functional concepts
5.1 Image organization
5.2 Mirroring
5.3 Group splitting
5.4 Codestream organization
6 Encoder requirements
7 Decoder requirements
Annex A (normative) Codestream overview
Annex B (normative) Header syntax
Annex C (normative) Entropy decoding
Annex D (normative) Image header
Annex E (normative) Colour encoding
Annex F (normative) Frame header
Annex G (normative) Frame data sections
Annex H (normative) Modular
Annex I (normative) VarDCT
Annex J (normative) Restoration filters
Annex K (normative) Image features
Annex L (normative) Colour transforms
Annex M (normative) Profiles and levels
Annex N (normative) Extensions
Annex O (informative) Encoder overview
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
This second edition cancels and replaces the first edition (ISO/IEC 18181-1:2022), which has been technically
and editorially revised. It also incorporates the Amendment ISO/IEC 18181-1:2022/Amd 1:2022.
The main changes are as follows:
— technical corrections and clarifications, in particular to correct the values of various constants, correct
errors in pseudocode, and clarify ambiguities in order to remove discrepancies between this document
and ISO/IEC 18181-3 and ISO/IEC 18181-4;
— a thorough update of the document structure in order to improve clarity of presentation and to obtain a
more logical ordering of the material from the point of view of decoder implementation.
A list of all parts in the ISO/IEC 18181 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.

International Standard ISO/IEC 18181-1:2024(en)
Information technology — JPEG XL image coding system —
Part 1:
Core coding system
1 Scope
This document specifies a set of compression methods for coding one or more images of bi-level,
continuous-tone greyscale, or continuous-tone colour, or multichannel digital samples.
This document:
— specifies decoding processes for converting compressed image data to reconstructed image data;
— specifies a codestream syntax containing information for interpreting the compressed image data;
— provides guidance on encoding processes for converting source image data to compressed image data.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 15076-1, Image technology colour management — Architecture, profile format and data structure — Part 1:
Based on ICC.1:2022
ISO/IEC 60559, Information technology — Microprocessor Systems — Floating-Point arithmetic
IEC 61966-2-1, Multimedia systems and equipment — Colour measurement and management — Part 2-1: Colour
management — Default RGB colour space — sRGB
ITU-R BT.2100-2, Image parameter values for high dynamic range television for use in production and
international programme exchange
ITU-R BT.709-6, Parameter values for the HDTV standards for production and international programme
exchange
IETF RFC 7932:2016, Brotli Compressed Data Format
SMPTE ST 428-1, D-Cinema Distribution Master — Image Characteristics
3 Terms, definitions and abbreviated terms
3.1 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org

3.1.1
bitstream
sequence of bytes
3.1.2
codestream
bitstream representing compressed image data
3.1.3
bundle
structured data consisting of one or more fields
3.1.4
field
numerical value or bundle, or an array of either
3.1.5
histogram
array of unsigned integers representing a probability distribution, used for entropy coding
3.2 Inputs
3.2.1
pixel
vector of dimension corresponding to the number of channels, consisting of samples
3.2.2
sample
integer or real value, of which there is one per channel per pixel
3.2.3
grid
2-dimensional array
3.2.4
sample grid
common coordinate system for all samples of an image, with top-left coordinates (0, 0), the first coordinate
increasing towards the right, and the second increasing towards the bottom
3.2.5
channel
component
rectangular array of samples having the same designation, regularly aligned along a sample grid
3.2.6
rectangle
rectangular area within a channel or grid
3.2.7
width
width in samples (number of columns) of a sample grid or a rectangle
3.2.8
height
height in samples (number of rows) of a sample grid or a rectangle
3.2.9
frame
single image (possibly part of an animation, composite, or multi-page image), i.e. a 2-dimensional array of pixels

3.2.10
greyscale
image representation in which each pixel is defined by a single sample representing intensity (either
luminance or luma depending on the ICC profile)
3.2.11
opsin
photosensitive pigments in the human retina, having dynamics approximated by the XYB colour space
3.2.12
animation
series of pictures and timing delays to display as a video medium
3.2.13
tick
unit of time such that animation frame durations are integer multiples of the tick duration
3.2.14
composite
series of images that are superimposed
3.2.15
multi-page image
sequence of pictures to display in a paged way
3.2.16
preview
lower-fidelity rendition of one of the frames (e.g., lower resolution), or a frame that represents the entire
content of all frames
3.3 Processes
3.3.1
decoding process
process which takes as its input a codestream and outputs an image
3.3.2
decoder
embodiment of a decoding process
3.3.3
encoding process
process which takes as its input image(s) and outputs compressed image data in the form of a codestream
3.3.4
encoder
embodiment of an encoding process
3.3.5
lossless
descriptive term for encoding and decoding processes in which the output of a decoding procedure is
identical to the input to the encoding procedure
3.3.6
lossy
descriptive term for encoding and decoding processes which are not lossless
3.3.7
upsampling
procedure by which the (nominal) spatial resolution of a channel is increased

3.3.8
downsampling
procedure by which the spatial resolution of a channel is reduced
3.3.9
entropy encoding
lossless procedure designed to convert a sequence of input symbols into a sequence of bits such that the
average number of bits per symbol approaches the entropy of the input symbols
3.3.10
entropy encoder
embodiment of an entropy encoding procedure
3.3.11
entropy decoding
lossless procedure which recovers the sequence of symbols from the sequence of bits produced by the
entropy encoder
3.3.12
entropy decoder
embodiment of an entropy decoding procedure
3.3.13
channel decorrelation
method of reducing total encoded entropy by removing correlations between channels
3.3.14
channel correlation factor
factor by which a channel is multiplied before it is added to another channel to undo the channel
decorrelation process
3.3.15
VarDCT
lossy encoding of a frame that applies DCT to varblocks
3.3.16
quantization
method of reducing the precision of (DCT) coefficients
3.3.17
quantization weight
factor that a quantized coefficient is multiplied by prior to application of the inverse DCT in the decoding process
3.4 Image and codestream organization
3.4.1
raster order
access pattern from left to right in the top row, then in the row below and so on
3.4.2
naturally aligned
positioning of a power-of-two sized rectangle such that its top and left coordinates are divisible by its width
and height, respectively
3.4.3
block
naturally aligned square rectangle covering up to 8 × 8 input pixels
3.4.4
varblock
variable-size rectangle of one or more blocks

3.4.5
group
naturally aligned rectangle covering up to 2^n × 2^n input pixels, with n between 7 and 10, inclusive
3.4.6
table of contents
data structure that enables seeking to a group or the next frame within a codestream
3.4.7
section
part of the codestream with an offset and length that are stored in a frame’s table of contents
3.4.8
coefficient
input value to the inverse DCT
3.4.9
pass
data enabling decoding of successively more detail
3.4.10
LF group
LF values from a naturally aligned rectangle covering up to 2^(n+3) × 2^(n+3) input pixels
3.5 Abbreviated terms
DCT discrete cosine transform (as specified in I.7)
HF all DCT coefficients apart from the LF coefficients, i.e. the high frequency coefficients
IDCT inverse discrete cosine transform (as specified in I.7)
LF lowest frequency coefficients of DCT coefficients, i.e. block averages
RGB additive colour model with red, green, blue channels
XYB absolute colour space based on gamma-corrected LMS (a colour space representing the response
of cone cells in the human eye), in which X is derived from the difference between L and M (red-
green), Y is an average of L and M (behaves similarly to luminance), and B is derived from S (blue)
4 Conventions
4.1 Mathematical symbols
[a, b], (c, d), [e, f) closed or open or half-open intervals containing all integers or real numbers x
(depending on context) such that a ≤ x ≤ b, c < x < d, e ≤ x < f
{a, b, c} ordered sequence of elements
pi the smallest positive zero of the sine function (π)

4.2 Functions
pow(x,y)
exponentiation, x to the power of y.
sqrt(x)
square root, such that pow(sqrt(x),2) == x and sqrt(x) >= 0. Undefined for x < 0.
cbrt(x)
cube root, such that pow(cbrt(x),3) == x.
cos(r)
cosine of the angle r (in radians).
erf(x)
Gauss error function: $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt$
log(x)
natural logarithm of x. Undefined for x <= 0.
log2(x)
base-two logarithm of x. Undefined for x <= 0.
floor(x)
the largest integer that is less than or equal to x.
ceil(x)
the smallest integer that is greater than or equal to x.
abs(x)
absolute value of x: equal to -x if x < 0, otherwise x.
sign(x)
sign of x, 0 if x is 0, +1 if x is positive, −1 if x is negative.
len(a)
length (number of elements) of array a.
sum(a)
sum of all elements of the array/tuple/sequence a.
max(a)
maximal element of the array/tuple/sequence a.
min(a)
smallest element of the array/tuple/sequence a.
UnpackSigned(u)
equivalent to u/2 if u is even, and -(u + 1)/2 if u is odd.
clamp(x, lo, hi)
equivalent to min(max(lo, x), hi).
InterpretAsF16(u)
the real number resulting from interpreting the unsigned 16-bit integer u as a
binary16 floating-point number representation (cf. ISO/IEC 60559)
InterpretAsF32(u)
the real number resulting from interpreting the unsigned 32-bit integer u as a
binary32 floating-point number representation (cf. ISO/IEC 60559)
NarrowToI32(x)
the signed 32-bit integer corresponding to the 32 least significant bits of the two's
complement representation of x
BitSet(u,b)
the bit corresponding to b is 1 in u: true if and only if u & b (bitwise AND)
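As an illustration only, a few of the functions above can be sketched in Python. The snake_case names are choices of this sketch rather than identifiers from this document, and integer arguments are assumed for unpack_signed and narrow_to_i32.

```python
import struct


def unpack_signed(u: int) -> int:
    """UnpackSigned(u): u/2 if u is even, -(u + 1)/2 if u is odd."""
    if u % 2 == 0:
        return u // 2
    return -((u + 1) // 2)


def clamp(x, lo, hi):
    """clamp(x, lo, hi), equivalent to min(max(lo, x), hi)."""
    return min(max(lo, x), hi)


def narrow_to_i32(x: int) -> int:
    """NarrowToI32(x): keep the 32 least significant two's complement bits
    of x and read them back as a signed 32-bit integer."""
    low = x & 0xFFFFFFFF  # Python integers behave as infinite two's complement
    return low - (1 << 32) if low >= (1 << 31) else low


def interpret_as_f32(u: int) -> float:
    """InterpretAsF32(u): reinterpret an unsigned 32-bit pattern as binary32."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def bit_set(u: int, b: int) -> bool:
    """BitSet(u, b): true if and only if u & b is non-zero."""
    return (u & b) != 0
```

With these definitions, the unsigned values 0, 1, 2, 3, 4 unpack to 0, -1, 1, -2, 2, so small magnitudes of either sign map to small unsigned codes.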
4.3 Operators
This document uses the operators defined by the C++ programming language (ISO/IEC 14882), with the
following differences:
/
division of real numbers without truncation or rounding. Division by zero is undefined.
<<
left shift: x << s is defined as x * pow(2,s)
>>
right shift: x >> s is defined as floor(x / pow(2,s))
Umod
remainder: a Umod d is the unique integer r in [0, d) for which a == r + q * d for a suitable
integer q
Idiv
integer division: a Idiv b is equivalent to a / b, rounded towards zero to an integer value
and
logical AND (&& in C++), true if and only if both operands are true, with short-circuit evaluation
or
logical OR (|| in C++), false if and only if both operands are false, with short-circuit evaluation
The order of precedence for these operators is listed in Table 1 in descending order. If several operators
appear in the same line, they have equal precedence. When several operators of equal precedence appear at
the same level in an expression, evaluation proceeds according to the associativity of the operator (either
from right to left or from left to right).
Table 1 — Operator precedence
Operators               Type of operation                                       Associativity
++x, --x                prefix increment/decrement                              left to right
x++, x--                postfix increment/decrement                             left to right
!, ~                    logical/bitwise NOT                                     right to left
*, /, Idiv, Umod        multiplication, division, integer division, remainder   left to right
+, -                    addition and subtraction                                left to right
<<, >>                  left shift and right shift                              left to right
<, >, <=, >=, ==, !=    relational                                              left to right
&                       bitwise AND                                             left to right
^                       bitwise XOR                                             left to right
|                       bitwise OR                                              left to right
and                     logical AND                                             left to right
or                      logical OR                                              left to right
=                       assignment                                              right to left
+=, -=, *=              compound assignment                                     right to left
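The operators of 4.3 that differ from C++ can be illustrated with a short Python sketch. The function names are hypothetical, integer operands are assumed, and umod assumes d > 0.

```python
def shift_left(x: int, s: int) -> int:
    """x << s, defined as x * pow(2, s)."""
    return x * (1 << s)


def shift_right(x: int, s: int) -> int:
    """x >> s, defined as floor(x / pow(2, s)); rounds towards minus infinity."""
    return x // (1 << s)


def umod(a: int, d: int) -> int:
    """a Umod d: the unique r in [0, d) with a == r + q * d for an integer q."""
    return a % d  # for d > 0, Python's % already yields a value in [0, d)


def idiv(a: int, b: int) -> int:
    """a Idiv b: a / b rounded towards zero (b must be non-zero)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q
```

The difference between the two roundings shows up for negative operands: idiv(-7, 2) gives -3, whereas shift_right(-7, 1) gives -4.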
4.4 Pseudocode
This document describes functionality using pseudocode formatted as follows:
// Informative comment
var = u(8);
if (var == 1) return; // Stop executing this code snippet
/* Normative specification: var != 0 */
(out1, out2) = Function(var, kConstant);

Variables such as var are typically referenced by text outside the source code.
The semantics of this pseudocode are those of the C++ programming language (ISO/IEC 14882), except that:
— Symbols from 4.1 and functions from 4.2 are allowed;
— Division, remainder and exponentiation are expressed as specified in 4.3;
— Functions can return tuples which unpack to variables as in the above example;
— /* */ enclose normative directives specified using prose;
— All integers are stored using two’s complement;
— Expressions and variables whose types are omitted are understood as real numbers.
Where unsigned integer wraparound and truncated division are required, Umod and Idiv (see 4.3) are used
for those purposes.
5 Functional concepts
5.1 Image organization
A channel is defined as a rectangular array of (integer or real) samples regularly aligned along a sample grid
of width sample positions horizontally and height sample positions vertically. The number of channels may
be 1 to 4099 (see num_extra in D.3).
A pixel is defined as a vector of dimension corresponding to the number of channels, consisting of samples
with a position matching that of the pixel.
An image is defined as a two-dimensional array of pixels, and its width is width and height is height. Unless
otherwise mentioned, 'raster order' is used: left to right within the topmost row, then left to right
within the row below it, and so on until the rightmost column of the bottom row.
A codestream may contain multiple image frames. These can constitute an animation (a sequence of images)
or a composite still image. Frames can have dimensions that differ from the image dimensions, and they can
be blended over preceding frames. The frame that is being decoded is referred to as the 'current' frame.
5.2 Mirroring
Some operations access samples with coordinates cx, cy that are outside the current frame bounds. The
decoder redirects such accesses to a valid sample at the coordinates Mirror(cx, cy) as defined in the
following code:
Mirror1D(coord, size) {
  if (coord < 0) return Mirror1D(-coord - 1, size);
  else if (coord >= size) return Mirror1D(2 * size - 1 - coord, size);
  else return coord;
}
Mirror(x, y) {
  return (Mirror1D(x, width), Mirror1D(y, height));
}
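The mirroring rule can be tried out with the following Python sketch. It replaces the recursion of the pseudocode with an equivalent loop, takes width and height as explicit parameters (an assumption of this sketch, since the pseudocode reads them from the frame), and assumes size > 0.

```python
def mirror_1d(coord: int, size: int) -> int:
    """Reflect coord back into [0, size), repeating until it is in range."""
    while coord < 0 or coord >= size:
        if coord < 0:
            coord = -coord - 1            # reflect across the left/top edge
        else:
            coord = 2 * size - 1 - coord  # reflect across the right/bottom edge
    return coord


def mirror(x: int, y: int, width: int, height: int):
    """Mirror both coordinates independently."""
    return mirror_1d(x, width), mirror_1d(y, height)
```

For a row of 4 samples, coordinates -2, -1, 4 and 5 map to 1, 0, 3 and 2 respectively, so the edge sample is repeated once before the reflection continues inwards.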
5.3 Group splitting
Channels are logically partitioned into naturally-aligned groups of group_dim × group_dim samples. The effective
dimension of a group (i.e. how many pixels to read) can be smaller than group_dim for groups on the right or
bottom of the image. The decoder ensures the decoded image has the dimensions specified in SizeHeader (D.2)
by cropping at the right and bottom as necessary. Unless otherwise specified, group_dim is 256.
LF groups likewise consist of group_dim × group_dim LF samples, with the possibility of a smaller effective
size on the right and bottom of the image.
Groups can be decoded independently. A 'table of contents' (F.3) stores the size (in bytes) of each group
to allow seeking to any group. An optional permutation allows arbitrary reordering of groups within the
codestream.
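As a rough sketch of the partitioning described in this subclause, the following Python function lists the rectangle covered by each group, with smaller effective sizes on the right and bottom edges. The function name and the (x0, y0, w, h) tuple layout are assumptions of this sketch, and listing the groups row by row is merely a convenient choice here.

```python
def group_rects(width: int, height: int, group_dim: int = 256):
    """Return (x0, y0, w, h) for every group of a width x height channel."""
    rects = []
    for y0 in range(0, height, group_dim):
        for x0 in range(0, width, group_dim):
            w = min(group_dim, width - x0)   # narrower on the right edge
            h = min(group_dim, height - y0)  # shorter on the bottom edge
            rects.append((x0, y0, w, h))
    return rects
```

A 600 × 300 channel with group_dim 256 yields 3 × 2 groups, the last of which covers only 88 × 44 samples.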
5.4 Codestream organization
A bitstream is a sequence of bytes. A codestream is a bitstream that represents compressed image data and
metadata. N bytes can also be viewed as 8 × N bits. The first 8 bits are the bits constituting the first byte,
in least to most significant order, the next eight bits (again in least to most significant order) constitute the
second byte, and so on. Unless otherwise specified, bits are read from the codestream as specified in B.2.1.
NOTE Ordering bits from least to most significant allows using special CPU instructions to isolate the least-
significant bits.
6 Encoder requirements
An encoder is an embodiment of the encoding process. This document does not specify an encoding process,
and any encoding process is acceptable as long as the codestream conforms to the codestream syntax
specified in this document. Annex O provides an informative description of such an encoding process.
7 Decoder requirements
A decoder is an embodiment of the decoding process. The decoder reconstructs sample values (arranged
on a rectangular sampling grid) from a codestream as specified in this document. Annexes A to N define an
output that alternative implementations shall duplicate.

Annex A
(normative)
Codestream overview
A.1 Codestream structure
The codestream consists of image headers (Annexes D and N) and an ICC profile, if present (E.4), followed
by one or more frames (Annex F), as shown in Table A.1. This table and those it references, together with the
syntax description (B.1), specify how a decoder reads the headers and frame(s).
Table A.1 — Codestream structure
| condition | type | name | Annex |
| | Headers | headers | Annex D, Annex N |
| headers.metadata.colour_encoding.want_icc | | icc | E.4 |
| headers.metadata.have_preview | Frame | preview_frame | Annex F |
| | Frame | frames[0] | Annex F |
| for (i = 1; !frames[i - 1].frame_header.is_last; ++i) | Frame | frames[i] | Annex F |
The preview_frame is a self-contained frame whose FrameHeader (F.2) specifies lf_level=0,
frame_type=kRegularFrame and save_as_reference=0.
A.2 Decoding process
Annex B specifies how the decoder parses header information.
Annex C specifies how the decoder reads entropy-coded data.
After reading the image header (Annexes D and E) and the frame header (Annex F), the decoder reads the frame
data (Annex G). The reconstruction of the decoded image can be viewed as a pipeline with multiple stages.
Some Annexes or subclauses begin with a condition; the decoder only applies the processing steps described
in such an Annex or subclause if its condition holds true.
Annex H specifies the kModular frame encoding and Modular sub-bitstreams.
Annex I specifies the kVarDCT frame encoding.
The decoder applies restoration filters as specified in Annex J.
The presence/absence of additional image features (patches, splines and noise) is indicated in the frame
header. The decoder draws these as specified in Annex K. Image features (if present) are rendered after
restoration filters (if enabled), in the listed order.
Finally, the decoder performs colour transforms as specified in Annex L.
NOTE For an introduction to the coding tools and additional background, please refer to Reference [3].

Annex B
(normative)
Header syntax
B.1 General
The codestream is organized into “bundles” consisting of one or more conceptually related “fields”. A field
can be a bundle, or an array/value of a type from B.2. The graph of contained-within relationships between
types of bundles is acyclic. This document specifies the structure of each bundle using tables structured as
in Table B.1, with one row per field (in top to bottom order).
Table B.1 — Example structure of a table describing a bundle
| condition | type | default | name |
| | Bool() | false | div8 |
| div8 | 1 + u(5) | 0 | h_div8 |
| !div8 | U32(1 + u(9), 1 + u(13), 1 + u(18), 1 + u(30)) | 8 * h_div8 | height |
| | u(3) | 0 | ratio |
| div8 and !ratio | 1 + u(5) | 0 | w_div8 |
| !div8 and !ratio | U32(1 + u(9), 1 + u(13), 1 + u(18), 1 + u(30)) | d_width | width |
If the condition is blank or evaluates to true, the field with the name in the name column is read from the
codestream (B.1.1) according to the field type in the type column. Otherwise, the field is instead initialized
to the default value given in the default column (B.1.2).
A condition of the form for(i = 0; condition; ++i) is equivalent to replacing this row with a sequence of
rows obtained from the current one by removing its condition and replacing the value of i in each column
with the consecutive values assumed by the loop-variable i until condition is false. If condition is initially
false, the current row has no effect.
If name ends with [n] then the field is a fixed-length array with n entries. If condition is blank or evaluates
to true, each array element is read from the codestream (B.1.1) in order of increasing index. Otherwise,
each element is initialized to default (B.1.2). The name is potentially referenced in the condition or type of a
subsequent row, or the condition of the same row.
B.1.1 Reading a field
If a field is to be read from the codestream, the type entry determines how to do so. If it is a basic field
type, it is read as described in B.2. Otherwise type is a (nested) bundle denoted Nested, residing within a
parent bundle Parent. Nested is read as if the rows of the table defining Nested were inserted into the table
defining Parent in place of the row that defined Nested. This principle is applied recursively, corresponding
to a depth-first traversal of all fields.
B.1.2 Initializing a field
If a field is to be initialized to default, the type entry determines how to do so. If it is a bundle, then default
is blank and each field of the bundle is (recursively) initialized to the default specified within the bundle’s
table. Otherwise, the field is set to default, which is a valid value of the same type.

B.2 Field types
B.2.1 u(n)
u(0) evaluates to the value zero without reading any bits. For n > 0, u(n) reads n bits from the codestream,
advances the position accordingly, and returns the value in the range [0, 2^n) represented by the bits. The
decoder first reads the least-significant bit of the value, from the least-significant not yet consumed bit in
the first not yet fully consumed byte of the codestream. The next bit of the value (in increasing order of
significance) is read from the next (more significant) bit of the same byte unless all its bits have already
been consumed, in which case the decoder reads from the least-significant bit of the next byte, and so on.
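The bit order described above can be illustrated with a minimal Python bit reader. The class name BitReader and its pos attribute are assumptions of this sketch, and reading past the end of the data simply raises IndexError here.

```python
class BitReader:
    """Reads bits from a byte string, least-significant bit of each byte first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # number of bits consumed so far

    def u(self, n: int) -> int:
        """u(n): read n bits; the first bit read is the least significant."""
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]      # first not fully consumed byte
            bit = (byte >> (self.pos & 7)) & 1   # its lowest unconsumed bit
            value |= bit << i                    # bits arrive in increasing significance
            self.pos += 1
        return value
```

For the single byte 0xB4 (binary 10110100), u(3) returns 4 (the three low bits 100) and a following u(5) returns 22 (the remaining bits 10110).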
B.2.2 U32(d0, d1, d2, d3)
In this subclause, “distribution” refers to one of the following three encodings of a range of values: u, u(n), or
offset + u(n). The d0, d1, d2, d3 parameters represent distributions.
U32(d0, d1, d2, d3) reads an unsigned 32-bit integer value in [0, 2^32) as follows. The decoder first reads a
u(2) from the codestream indicating which distribution to use (0 selects d0, 2 selects d2 etc.). Let d denote
this distribution, which determines how to decode value.
If d is u, value is the integer u. If d is u(n), value is read as u(n). If d is offset + u(n), the decoder reads v = u(n).
The resulting value is (offset + v) Umod (1 << 32).
NOTE The value of u is implicitly defined and not stored explicitly in the codestream.
EXAMPLE For a field of type U32(8, 16, 32, u(7)), the bits 10 result in value = 32. For a U32(u(2), u(4), u(6), u(8))
field, the bits 010111 result in value = 7.
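A possible Python sketch of U32 follows. As a modelling assumption, each distribution is written as a pair (offset, n): a constant u becomes (u, 0), u(n) becomes (0, n) and offset + u(n) becomes (offset, n), which works because u(0) evaluates to zero. The BitReader class and the function name read_u32 are likewise hypothetical.

```python
class BitReader:
    """Minimal LSB-first bit reader (see B.2.1)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def u(self, n: int) -> int:
        value = 0
        for i in range(n):
            value |= ((self.data[self.pos >> 3] >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def read_u32(reader: BitReader, d0, d1, d2, d3) -> int:
    """Read a 2-bit selector, then decode with the selected (offset, n) pair."""
    offset, n = (d0, d1, d2, d3)[reader.u(2)]
    return (offset + reader.u(n)) % (1 << 32)  # Umod (1 << 32)


# First case of the EXAMPLE: selector 2 picks the constant 32.
first = read_u32(BitReader(bytes([0b10])), (8, 0), (16, 0), (32, 0), (0, 7))
# Second case: selector 1 picks u(4), whose four bits hold the value 7.
second = read_u32(BitReader(bytes([0b011101])), (0, 2), (0, 4), (0, 6), (0, 8))
```

The byte values in the two calls assume that the EXAMPLE writes each field as a binary numeral in reading order, so the selector occupies the two lowest bits of the first byte.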
B.2.3 U64()
U64() reads an unsigned 64-bit integer value in [0, 2^64) using a single variable-length encoding. The decoder
first reads a u(2) selector s. If s == 0, value is 0. If s == 1, value is 1 + u(4). If s == 2, value is 17 + u(8).
Otherwise (s == 3), value is read as specified by the following code:
value = u(12); shift = 12;
while (u(1) == 1) {
if (shift == 60) {
value += u(4) << shift; // only 4, we already read 60
break;
}
value += u(8) << shift; shift += 8;
}
EXAMPLE The largest possible value 2^64 − 1 is encoded as 73 consecutive 1-bits.
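The U64 decoding steps, including the EXAMPLE above, can be checked with the following Python sketch. BitReader and read_u64 are hypothetical names introduced for this illustration.

```python
class BitReader:
    """Minimal LSB-first bit reader (see B.2.1)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def u(self, n: int) -> int:
        value = 0
        for i in range(n):
            value |= ((self.data[self.pos >> 3] >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def read_u64(reader: BitReader) -> int:
    s = reader.u(2)  # selector
    if s == 0:
        return 0
    if s == 1:
        return 1 + reader.u(4)
    if s == 2:
        return 17 + reader.u(8)
    value = reader.u(12)
    shift = 12
    while reader.u(1) == 1:        # continuation bit
        if shift == 60:
            value += reader.u(4) << shift  # only 4 bits remain of the 64
            break
        value += reader.u(8) << shift
        shift += 8
    return value


reader = BitReader(bytes([0xFF] * 10))  # 80 one-bits, more than needed
largest = read_u64(reader)
bits_used = reader.pos
```

Here largest equals 2^64 − 1 and bits_used equals 73: 2 selector bits, 12 initial bits, six rounds of 1 + 8 bits, and a final 1 + 4 bits.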
B.2.4 F16()
F16() reads a real value in [−65504, 65504], as specified by the following code:
bits16 = u(16);
biased_exp = (bits16 >> 10) & 0x1F;
value = InterpretAsF16(bits16);

The value of biased_exp is not 31.
NOTE This rules out NaN and infinities.
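A small Python sketch of the F16 interpretation is given below. It starts from the already-read 16-bit pattern, uses the binary16 format code "e" of the standard struct module for InterpretAsF16, and treats biased_exp == 31 as an error. Raising ValueError is a choice of this sketch, as is the function name.

```python
import struct


def f16_from_bits(bits16: int) -> float:
    """Interpret a 16-bit pattern as binary16, rejecting NaN and infinities."""
    biased_exp = (bits16 >> 10) & 0x1F
    if biased_exp == 31:
        raise ValueError("biased_exp is 31: NaN or infinity is not allowed")
    # Reinterpret the same 16 bits as an IEEE binary16 value.
    return struct.unpack("<e", struct.pack("<H", bits16))[0]
```

The pattern 0x3C00 yields 1.0 and 0x7BFF yields 65504.0, the largest finite binary16 value, which matches the stated range.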
B.2.5 Bool()
Bool() reads a boolean value as u(1) ? true : false.

B.2.6 Enum(EnumTable)
Enum(EnumTable) reads v = U32(0, 1, 2 + u(4), 18 + u(6)). The value v does not exceed 63 and is defined by
the table titled EnumTable. Such tables are structured according to Table B.2, with one row per unique value.
Table B.2 — Example structure of a table describing an enumerated type
name value meaning
kD65 1 CIE Standard Illuminant D65: 0.3127, 0.3290
kCustom 2 Custom white point stored in colour_encoding.white
kE 10 CIE Standard Illuminant E (equal-energy): 1/3, 1/3
kDCI 11 DCI-P3 from SMPTE ST 428-1: 0.314, 0.351
An enumerated type is interpreted as having the meaning of the row with the corresponding value. The value
in the name column (or where ambiguous, EnumTable.name) is an identifier used to refer to a particular
enumerated value, and begins with the letter k, e.g., kRGB.
B.2.7 ZeroPadToByte()
The decoder skips to the next byte boundary if not already at a byte boundary. All skipped bits have value 0.
NOTE Since any padding bits are zero, they can serve as an additional indicator of codestream integrity.
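The skipping step might be sketched in Python as follows, tracking the read position as a bit count. The function name is hypothetical, and rejecting non-zero padding with ValueError is an assumption of this sketch that follows the integrity-check idea in the NOTE.

```python
def zero_pad_to_byte(data: bytes, bit_pos: int) -> int:
    """Return the next byte-aligned bit position at or after bit_pos,
    checking that every skipped bit is zero."""
    while bit_pos % 8 != 0:
        bit = (data[bit_pos >> 3] >> (bit_pos & 7)) & 1  # LSB-first order
        if bit != 0:
            raise ValueError("non-zero padding bit at position %d" % bit_pos)
        bit_pos += 1
    return bit_pos
```

If the position is already a multiple of 8, no bits are skipped and the position is returned unchanged.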
B.3 Extensions
This bundle, specified in Table B.3, is a field in bundles (ImageMetadata, FrameHeader, RestorationFilter)
which can be extended (cf. Annex N).
Table B.3 — Extensions bundle
| condition | type | default | name |
| | U64() | 0 | extensions |
| extensions != 0 | U64() | 0 | extension_bits[NumExt] |
An extension consists of zero or more bits whose interpretation is established by Annex N. Each extension is
identified by an ext_id in the range [0, 63), assigned in increasing sequential order. extensions is a bit array
where the i-th bit (least-significant == 0) indicates whether the extension with ext_id = i is present. NumExt
denotes the number of extensions present, i.e. number of 1-bits in extensions. Extensions that are present
are stored in ascending order of their ext_id. extension_bits[i] indicates the number of bits stored for the
extension whose ext_id = i, starting after extension_bits has been read. The decoder reads all these bits
for all extensions which are present.
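How the extensions bit array determines NumExt and the order of the stored extensions can be shown with a few lines of Python. The function name is an assumption of this sketch, and the interpretation of the extension payloads (Annex N) is not covered.

```python
def present_extension_ids(extensions: int) -> list:
    """Return the ext_id of every 1-bit in extensions, in ascending order
    (bit 0 is the least-significant bit)."""
    return [i for i in range(extensions.bit_length()) if (extensions >> i) & 1]


ids = present_extension_ids(0b1010)
num_ext = len(ids)  # NumExt: number of 1-bits in extensions
```

For extensions = 0b1010, the extensions with ext_id 1 and 3 are present, so NumExt is 2 and extension_bits has two entries.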

Annex C
(normative)
Entropy decoding
C.1 Overview
This Annex specifies an entropy decoder. Other Annexes and their subclauses indicate how it is used for
various elements of the codestream.
The codestream contains multiple independently entropy-coded streams. During decoding of such a stream,
the decoder maintains an entropy coder state, which is initialized as specified in C.2. This state has to persist
and can be updated during the decoding of the entropy-coded stream, as specified in C.3.
The following variables (which are specified in the following clauses) are part of the entropy decoder state:
— num_dist, the pre-clustering number of distributions (context identifiers), which is given;
— num_clusters, the post-clustering number of distributions;
— lz77, a bundle with LZ77 settings;
— clusters[num_dist], describing the context clustering;
— configs[num_clusters], hybrid integer configurations;
— use_prefix_code, a Boolean flag indicating if prefix coding or ANS is used;
— lz_len_conf, a hybrid integer configuration used to decode LZ77 lengths;
— window[1 << 20], the LZ77 window of previously-decoded symbols (initialized to all zeroes);
— num_to_copy, copy_pos, and num_decoded: counters used for copying symbols from the LZ77 window;
— If use_prefix_code: a set of num_clusters post-clustered prefix codes;
— If !use_prefix_code, i.e. when the stream uses ANS:
    — log_alphabet_size, indicating the alphabet size of the entropy-coded ANS symbols;
    — a set of num_clusters post-clustered probability distributions D[1 << log_alphabet_size];
    — the alias mapping for each post-clustered distribution (log_bucket_size, bucket_size, symbols,
      offsets, and cutoffs);
    — state, a 32-bit unsigned integer that represents the ANS state.
NOTE As an implementation optimization, the variables lz_len_conf, window, num_to_copy, copy_pos, and
num_decoded can be ignored in case lz77.enabled is false.
C.2 Distribution decoding
C.2.1 General
This subclause describes how to decode the probability distributions and other stream initialization
information that is needed before the actual entropy decoding can start. When this Annex is referenced,
num_dist is given; it denotes the number of pre-clustered probability distributions.

The decoder first reads the LZ77 settings lz77 as specified in Table C.1.
Table C.1 — LZ77Params bundle
condition  type                            name
           Bool()                          enabled
enabled    U32(224, 512, 4096, 8 + u(15))  min_symbol
enabled    U32(3, 4, 5 + u(2), 9 + u(8))   min_length
If lz77.enabled, the decoder sets lz_dist_ctx = num_dist++, and reads lz_len_conf = HybridUintConfig(8)
as specified in C.2.3.
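The reading order of Table C.1 and the num_dist increment can be sketched as follows. The sketch assumes that U32(d0, d1, d2, d3), which is defined elsewhere in Annex B, reads a u(2) selector and then evaluates the selected entry. The callable u (standing in for the bit reader) and both function names are hypothetical. Fields that are not read are returned as None because this excerpt shows no default values for them.

```python
def read_u32(u, options):
    """U32 as assumed here: a u(2) selector picks one of four (offset, nbits) entries."""
    offset, nbits = options[u(2)]
    return offset + (u(nbits) if nbits else 0)


def read_lz77_params(u, num_dist):
    """Read LZ77Params; returns (params, lz_dist_ctx, updated num_dist)."""
    params = {"enabled": bool(u(1)), "min_symbol": None, "min_length": None}
    lz_dist_ctx = None
    if params["enabled"]:
        # U32(224, 512, 4096, 8 + u(15))
        params["min_symbol"] = read_u32(u, [(224, 0), (512, 0), (4096, 0), (8, 15)])
        # U32(3, 4, 5 + u(2), 9 + u(8))
        params["min_length"] = read_u32(u, [(3, 0), (4, 0), (5, 2), (9, 8)])
        # lz_dist_ctx = num_dist++
        lz_dist_ctx = num_dist
        num_dist += 1
    return params, lz_dist_ctx, num_dist


# Usage with pre-decoded field values standing in for a real bit reader:
values = iter([1, 3, 100, 2, 1])
params, lz_dist_ctx, num_dist = read_lz77_params(lambda n: next(values), 4)
print(params["min_symbol"], params["min_length"], lz_dist_ctx, num_dist)  # 108 6 4 5
```

Reading lz_len_conf = HybridUintConfig(8) would follow the increment; it is omitted here because C.2.3 is not reproduced in this excerpt.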
The decoder then reads the mapping clusters of the num_dist distributions into num_clusters clusters
as specified in C.2.2. It then reads use_prefix_code as a Bool(). If use_prefix_code is false, the decoder
sets log_alphabet_size to 5 + u(2); otherwise, it sets log_alphabet_size to 15. Then for each
post-clustered distribution, in increasing order of index i in [0, num_clusters), the decoder reads configs[i] =
HybridUintConfig(log_alphabet_size) as specified in C.2.3.
The decoder then reads num_clusters post-clustered distributions D[i] as follows. If use_prefix_code is
true, then for i in [0, num_clusters): if Bool() is false, count[i] = 1, otherwise n = u(4) is read, and count[i]
= 1 + (1 << n) + u(n), which is at most 1 << 15. After reading the counts, the decoder reads each D[i]
(implicitly described by a prefix code) as specified in C.2.4, with alphabet_size = count[i]. If
use_prefix_code is false, the distributions are read as specified in C.2.5.
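The computation of count[i] in the prefix-code case can be sketched as follows. The callable u stands in for the bit reader and is hypothetical, as is the function name. Rejecting values above 1 << 15 is an illustrative way of enforcing the stated bound; with this formula every value with n = 15 exceeds it.

```python
def read_prefix_alphabet_size(u):
    """Read count[i]: 1 if Bool() is false, otherwise 1 + (1 << n) + u(n) with n = u(4)."""
    if u(1) == 0:
        return 1
    n = u(4)
    extra = u(n) if n else 0  # u(0) reads nothing and yields 0
    count = 1 + (1 << n) + extra
    if count > (1 << 15):
        raise ValueError("alphabet size exceeds 1 << 15")
    return count


# Usage with pre-decoded field values standing in for a real bit reader:
values = iter([1, 3, 5])  # Bool() = true, n = 3, u(3) = 5
print(read_prefix_alphabet_size(lambda n: next(values)))  # 14
```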
C.2.2 Distribution clustering
The probability distributions that symbols belong to can be clustered together. This subclause specifies how
this clustering is read by the decoder.
The output of this procedure is an array clusters with num_dist entries (one for each pre-clustered context),
with values in the range [0, num_clusters). All integers in [0, num_clusters) are present in this array, and
num_clusters < 256. Position i in this array indicates that context i is merged into the corresponding cluster.
If num_dist == 1, then num_clusters = 1 and clusters[0] = 0, and the remainder of this subclause is
skipped.
The decoder first reads a Bool() is_simple indicating a simple clustering. If is_simple is true, it then decodes
a u(2) representing the number of bits per entry nbits. For each pre-clustered distribution, the decoder
reads a u(nbits) value that indicates the cluster that the given distribution belongs to.
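The simple clustering path can be sketched as follows. The callable u and the function name are hypothetical. num_clusters is taken as the largest cluster index plus one, which follows from the requirement that every integer in [0, num_clusters) appears in clusters. The non-simple path is described in the next paragraph and is left out of this sketch.

```python
def read_simple_clustering(u, num_dist):
    """Return (clusters, num_clusters) for the num_dist == 1 and is_simple cases."""
    if num_dist == 1:
        return [0], 1  # nothing is read in this case
    is_simple = u(1)
    if not is_simple:
        raise NotImplementedError("non-simple clustering is outside this sketch")
    nbits = u(2)
    clusters = [u(nbits) if nbits else 0 for _ in range(num_dist)]
    num_clusters = max(clusters) + 1
    # Every integer in [0, num_clusters) has to be present in the array.
    if set(clusters) != set(range(num_clusters)):
        raise ValueError("cluster indices are not contiguous")
    return clusters, num_clusters


# Usage with pre-decoded field values standing in for a real bit reader:
values = iter([1, 1, 0, 1, 1])  # is_simple, nbits = 1, then three cluster indices
print(read_simple_clustering(lambda n: next(values), 3))  # ([0, 1, 1], 2)
```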
Otherwise, if is_simple is false, the decoder reads a Bool() use_mtf. The decoder then initializes a symbol
decoder using a single distribution D, as specified in C.2.1, where if num_dist == 2 then in the recursive
distribution decoding, lz77.enabled is false. For each pre-clustered distribution i, the decoder reads an
integer as specified in C.3.3. Finally, if use_mtf, the decoder applies an inverse move-to-front transform to
the cluster mapping (see code below).
MTF(v[256], index) {
  value = v[index];
  for (i = index; i; --i) v[i] = v[i - 1];
  v[0] = value;
}
InverseMoveToFrontTransform(clusters[num_dist]) {
  for (i = 0; i < 256; ++i) mtf[i] = i;
  for (i = 0; i < num_dist; ++i) {
    index = clusters[i]; /* index < 256 */
    clusters[i] = mtf[index];
    if (index != 0) MTF(mtf, index);
  }
}
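For experimentation, the pseudocode above can be transcribed into a small runnable sketch. The function name is hypothetical, and the function returns a new list rather than modifying clusters in place.

```python
def inverse_move_to_front(clusters):
    """Apply the inverse move-to-front transform to a list of indices (each < 256)."""
    mtf = list(range(256))
    result = []
    for index in clusters:
        value = mtf[index]
        result.append(value)
        if index != 0:
            # Move the selected value to the front, shifting the earlier entries by one.
            del mtf[index]
            mtf.insert(0, value)
    return result


print(inverse_move_to_front([2, 2, 0]))  # [2, 1, 1]
```

In this example the first index 2 selects the value 2 and moves it to the front. The second index 2 then selects the value 1, which has shifted into position 2, and the final index 0 repeats the most recently used value.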
C.2.3 Hybrid integer configuration
The configuration for the hybrid unsigned integer decoder of C.3.3 is read from the bitstream as follows:

...
