ISO/IEC 21794-6:2025
International Standard
ISO/IEC 21794-6
First edition
2025-07

Information technology — Plenoptic image coding system (JPEG Pleno) —
Part 6: Learning-based point cloud coding

Technologies de l'information — Système de codage d'images plénoptiques (JPEG Pleno) —
Partie 6: Codage de nuages de points basé sur l’apprentissage

Reference number
ISO/IEC 21794-6:2025(en)
© ISO/IEC 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Symbols and abbreviated terms
4.1 Symbols
4.2 Abbreviated terms
5 Conventions
5.1 Naming conventions for numerical values
5.2 Operators
5.2.1 Arithmetic operators
5.2.2 Logical operators
5.2.3 Relational operators
5.2.4 Set operators
5.2.5 Precedence order of operators
5.2.6 Mathematical functions
6 General
6.1 Point cloud geometry representations
6.2 Multiple learnable neural network models
6.3 Functional description of the encoding process
6.4 Functional description of the decoding process
6.5 Encoder requirements
6.6 Decoder requirements
6.7 Trained models and parameters
7 Organization of the document
Annex A (normative) File format
Annex B (normative) Geometry codestream syntax
Annex C (normative) Deep learning-based decoding
Annex D (normative) Colour decoding
Annex E (normative) Synthesis transform
Annex F (normative) Hyper decoders
Annex G (normative) Entropy decoding
Annex H (normative) Binarization
Annex I (normative) Block merging
Annex J (normative) Up-sampling
Annex K (normative) Deep learning-based super-resolution
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/
IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 21794 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
This document is part of a series of standards for a system known as JPEG Pleno. This set of standards
facilitates the capture, representation, exchange and visualization of plenoptic imaging modalities. A
plenoptic image modality can be a light field, point cloud or hologram, which are sampled representations
of the plenoptic function in the form of, respectively, a vector function that represents the radiance of a
discretized set of light rays, a collection of points with position and attribute information, or a complex
wavefront. The plenoptic function describes the radiance in time and in space obtained by positioning a
pinhole camera at every viewpoint in 3D spatial coordinates, every viewing angle and every wavelength,
resulting in a 7D function.
JPEG Pleno specifies tools for coding these modalities while providing advanced functionality at system level,
such as support for data and metadata manipulation, editing, random access and interaction, protection of
privacy and ownership rights.
The scope of this document is the specification of a learning-based coding standard for point clouds and
associated attributes, offering a single-stream, compact compressed domain representation, supporting
advanced flexible data access functionalities. In this context, learning-based refers to the use of machine
learning technologies to learn an optimal compressed domain representation from supplied training data.
International Standard ISO/IEC 21794-6:2025(en)
Information technology — Plenoptic image coding system
(JPEG Pleno) —
Part 6:
Learning-based point cloud coding
1 Scope
This document defines the JPEG Pleno framework for learning-based point cloud coding.
This document is applicable to interactive human visualization, with competitive compression efficiency
compared to state-of-the-art point cloud coding solutions in common use, and effective performance for
3D processing and machine-related computer vision tasks, and has the goal of supporting a royalty-free
baseline.
This document specifies a coded codestream format for storage of point clouds. It provides information
on the encoding tools. It also defines extensions to the JPEG Pleno File Format and associated metadata
descriptors that are specific to point cloud modalities.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 6048-1:— 1), Information technology — JPEG AI learning-based image coding system — Part 1: Core coding system
ISO/IEC 15444-1 2), Information technology — JPEG 2000 image coding system — Part 1: Core coding system
ISO/IEC 15444-2 3), Information technology — JPEG 2000 image coding system — Part 2: Extensions
ISO/IEC 21794-1, Information technology — Plenoptic image coding system (JPEG Pleno) — Part 1: Framework
ISO/IEC 21794-2, Information technology — Plenoptic image coding system (JPEG Pleno) — Part 2: Light field coding
ISO/IEC 21794-3, Information technology — Plenoptic image coding system (JPEG Pleno) — Part 3: Conformance testing
ISO/IEC 21794-4, Information technology — Plenoptic image coding system (JPEG Pleno) — Part 4: Reference software
ISO/IEC 21794-5, Information technology — Plenoptic image coding system (JPEG Pleno) — Part 5: Holography
ISO/IEC 23090-5, Information technology — Coded representation of immersive media — Part 5: Visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC)
ISO/IEC 23090-9, Information technology — Coded representation of immersive media — Part 9: Geometry-based point cloud compression
ISO/IEC 60559, Information technology — Microprocessor Systems — Floating-Point arithmetic
1) Under preparation. Stage at the time of publication: ISO/IEC PRF 6048-1:2025.
2) Similar to Rec. ITU-T T.800 | ISO/IEC 15444-1.
3) Similar to Rec. ITU-T T.801 | ISO/IEC 15444-2.
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 21794-1, ISO/IEC 21794-2,
ISO/IEC 21794-3, ISO/IEC 21794-4, ISO/IEC 21794-5, ISO/IEC 23090-5, ISO/IEC 23090-9, ISO/IEC 6048-1,
ISO/IEC 15444-1, ISO/IEC 15444-2, ISO/IEC 60559 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
point
fundamental element of a point cloud comprising a position specified as 3D spatial coordinates and colour
attributes
3.2
point cloud
unordered list of points
3.3
neural network layer
tensor operation that contains trainable parameters and that receives and outputs a tensor
3.4
neural network module
set of neural network layers
3.5
neural network model
specified sequence of neural network modules (also called architecture) and corresponding trained
parameters
3.6
trainable parameters
parameters of a neural network layer whose values require a training process based on input ground truth
data to be set
3.7
trained parameters
parameters of a neural network model whose values have been set by a training process based on input
ground truth data
3.8
dense tensor
representation of a 3D block as a regular array with four dimensions: horizontal, vertical, depth and channel
dimension
3.9
sparse tensor
representation of a 3D block, where only non-zero elements are represented as a set of indices (or
coordinates) C and associated values (or features) F
Note 1 to entry: The set of coordinates C is represented as a matrix C ∈ ℤ^(P×3) and the associated features F are represented as a matrix F ∈ ℝ^(P×N), where P is the number of non-zero elements and N is the number of channels. The remaining elements of a sparse tensor are zeros.
Note 2 to entry: Opposed to a dense tensor.
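The following non-normative Python sketch (using NumPy; all array names are illustrative only) shows the relationship between a dense tensor and the sparse tensor pair (C, F) defined above.

    import numpy as np

    # Dense tensor of a 4x4x4 block with a single feature channel (N = 1).
    dense = np.zeros((4, 4, 4, 1), dtype=np.float32)
    dense[0, 1, 2, 0] = 1.0          # occupied voxel
    dense[3, 3, 0, 0] = 1.0          # occupied voxel

    # Sparse representation: C holds the P x 3 coordinates of the non-zero
    # elements, F holds the corresponding P x N features.
    C = np.argwhere(dense[..., 0] != 0)          # shape (P, 3), integer coordinates
    F = dense[C[:, 0], C[:, 1], C[:, 2], :]      # shape (P, N), associated features

    # All elements of the sparse tensor not listed in C are zeros.
    assert C.shape == (2, 3) and F.shape == (2, 1)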
3.10
latent tensor
intermediate representation of point cloud data during encoding or decoding processes, as a sparse tensor
3.11
standard deviations tensor
tensor of unsigned 16-bit integers, used for entropy coding, denoted as σ
3.12
concatenation of sparse tensors
process where the features of two sparse tensors are concatenated
3.13
element-wise addition of sparse tensors
process where the features of two sparse tensors are added element-wise
3.14
sparse convolution layer
three-dimensional sparse convolution denoted as SpConv(K_ver × K_hor × K_dep, N_in, N_out, s↓)
3.15
transposed sparse convolution layer
three-dimensional transposed sparse convolution denoted as TSpConv(K_ver × K_hor × K_dep, N_in, N_out, s↑)
3.16
generative transposed sparse convolution layer
three-dimensional generative transposed sparse convolution denoted as GTSpConv(K_ver × K_hor × K_dep, N_in, N_out, s↑)
3.17
quantized sparse convolution layer
three-dimensional quantized sparse convolution is denoted as qSpConv(K_ver × K_hor × K_dep, N_in, N_out, s↓, d, p)
3.18
quantized generative transposed sparse convolution layer
three-dimensional quantized generative transposed sparse convolution is denoted as qGTSpConv(K_ver × K_hor × K_dep, N_in, N_out, s↑, d, p)
3.19
matrix multiplication
matrix multiplication (denoted as ×) receives two two-dimensional arrays input1[h_out, L] and input2[L, w_out]
Note 1 to entry: This module produces a two-dimensional array output of size [h_out, w_out]:
For i = 0, …, h_out − 1 and j = 0, …, w_out − 1:
output[i, j] = Σ_{l=0}^{L−1} input1[i, l] · input2[l, j].
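As a non-normative illustration, the short Python sketch below (NumPy-based, with illustrative array names) evaluates the summation above and checks it against a library matrix product.

    import numpy as np

    # Non-normative check of the matrix multiplication definition in 3.19.
    h_out, L, w_out = 2, 3, 4
    input1 = np.arange(h_out * L, dtype=np.float64).reshape(h_out, L)
    input2 = np.arange(L * w_out, dtype=np.float64).reshape(L, w_out)

    output = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            output[i, j] = sum(input1[i, l] * input2[l, j] for l in range(L))

    assert np.allclose(output, input1 @ input2)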
3.20
rectified linear unit
rectified linear unit is denoted as ReLU
Note 1 to entry: This element-wise function is defined as:
ReLU(x) = x, if x ≥ 0; 0, otherwise.
3.21
sigmoid
sigmoid operation is denoted as Sigmoid
Note 1 to entry: This element-wise function is defined as:
Sigmoid(x) = 1 / (1 + e^(−x)).
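The element-wise functions of 3.20 and 3.21 can be illustrated with the following non-normative Python sketch (NumPy-based; the function names are illustrative only).

    import numpy as np

    def relu(x):
        # ReLU(x) = x if x >= 0, 0 otherwise, applied element-wise.
        return np.maximum(x, 0.0)

    def sigmoid(x):
        # Sigmoid(x) = 1 / (1 + e^(-x)), applied element-wise.
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([-2.0, 0.0, 3.0])
    print(relu(x))      # [0. 0. 3.]
    print(sigmoid(x))   # [0.119... 0.5 0.952...]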
4 Symbols and abbreviated terms
4.1 Symbols
bias bias parameters of a convolutional layer
BlockPosition global coordinates of a point cloud block origin point in the input source point cloud
BS size of the 3D block in voxels
C coordinates of sparse tensor
c   a specific coordinate within C_in or C_out
C_in   coordinates of a sparse tensor input to a process
C_out   coordinates of a sparse tensor output from a process
Col   colour of original point cloud
Col_i   colour of point i in decoded point cloud
Col_knn   colour values corresponding to points in Geo_knn
Col_SR_i   colour of point i in decoded point cloud with super-resolution
d_j   clipping value for quantized sparse convolution layer j in hyper scale decoder network
D   depth of the point cloud volume in voxels
F   features of a sparse tensor
F_in   features of a sparse tensor input to a process
F_out   features of a sparse tensor output from a process
Geo   geometry of original point cloud
Geo   decoded point cloud geometry
Geo_SR   decoded point cloud geometry with super-resolution
Geo_SR_i   point i in decoded point cloud geometry with super-resolution Geo_SR
Geo_knn_i   20 nearest neighbouring points of Geo_SR_i from decoded point cloud geometry Geo
h height of 2D image produced by 3D to 2D projection module
H height of point cloud volume in voxels
Im_near   near layer image produced by 3D to 2D projection module
Im_far   far layer image produced by 3D to 2D projection module
k   number of points to be decoded
K   convolutional kernel
K_ver   vertical dimension of a convolutional kernel
K_hor   horizontal dimension of a convolutional kernel
K_dep   depth dimension of a convolutional kernel
m   sampling factor index for super-resolution network
modelIdx   index of the point cloud coding model used for encoding and decoding
N   number of points in point cloud
N_in   number of input channels of a sparse convolution layer
N_out   number of output channels of a sparse convolution layer
N_σ   number of scales of the Gaussian distributions for the entropy coding of r̂_F
P   number of points in original point cloud block
p_j   de-scaling shift parameter values for quantized sparse convolution layer j in hyper scale decoder network
P̂   number of points in decoded point cloud block
QS   user-defined quantisation step
r   residual of subtraction of latent representation y_QS and means μ
r_C   coordinates of residual r
r_F   features of residual r
r̂   quantized residual r
r̂_F   features of r̂
s   stride of sparse convolution layers
SF   user-defined down-sampling factor for point cloud block down-sampling
w   width of 2D image produced by 3D to 2D projection module
W   width of point cloud volume in voxels
weights   weights of a convolutional layer
x   point cloud block to be transformed by the analysis transform represented as a sparse tensor
x_C   coordinates of x
x_F   features of x
X_global   global point cloud coordinates
X_internal   internal point cloud block coordinates given that each block has its own origin
x̂   decoded point cloud block x
x̂_C   decoded coordinates of x
x̂_SR   decoded point cloud block with super-resolution
x̂_SR_C   coordinates of x̂_SR
x̂_SR_F   features of x̂_SR
y   latent representation of point cloud block x created by the analysis transform
y_C   coordinates of y
y_F   features of y
ŷ   decoded latent representation y
y_QS   latent representation y scaled by QS
ŷ_QS   decoded scaled latent representation y_QS
z   hyper latent tensor produced by hyper encoder
z_C   coordinates of z
z_F   features of z
ẑ   quantized hyper latent tensor z
ẑ_C   coordinates of ẑ
ẑ_F   features of ẑ
β   ratio of number of points to be decoded over number of points in original point cloud block
μ   means of latent representation prediction created by the hyper mean decoder
σ   standard deviations (or scales) of the Gaussian distributions for the entropy coding of r̂_F, created by the hyper scale decoder
σ_min   minimum scale value of the Gaussian distributions for the entropy coding of r̂_F
σ_max   maximum scale value of the Gaussian distributions for the entropy coding of r̂_F
Res residual produced by subtraction of far layer image from near layer image
TRes trimmed residual image Res
Left left position of TRes within Res
Top top position of TRes within Res
Right right position of TRes within Res
Bottom bottom position of TRes within Res
4.2 Abbreviated terms
2D two-dimensional
3D three-dimensional
CDF cumulative distribution function
CSV comma separated values
HTTP hypertext transfer protocol
IPR intellectual property rights
IRB inception-residual block
JPEG Joint Photographic Experts Group
JPL JPEG Pleno file format
LIRB lightweight inception-residual block
LSB least significant bit
MSB most significant bit
PC point cloud
rANS range variant of asymmetric numeral systems
SR super resolution
XML extensible markup language
5 Conventions
5.1 Naming conventions for numerical values
Integer numbers are expressed as bit patterns, hexadecimal values or decimal numbers. Bit patterns and
hexadecimal values have both a numerical value and an associated length in bits.
Hexadecimal notation, indicated by prefixing the hexadecimal number by "0x", may be used instead of
binary notation to denote a bit pattern having a length that is an integer multiple of 4. For example, 0x41
represents an eight-bit pattern having only its second most significant bit and its least significant bit equal
to 1. Numerical values that are specified under a "Code" heading in tables that are referred to as "code tables"
are bit pattern values (specified as a string of digits equal to 0 or 1 in which the left-most bit is considered
the most-significant bit). Other numerical values not prefixed by "0x" are decimal values. When used in
expressions, a hexadecimal value is interpreted as having a value equal to the value of the corresponding bit
pattern evaluated as a binary representation of an unsigned integer (i.e. as the value of the number formed
by prefixing the bit pattern with a sign bit equal to 0 and interpreting the result as a two's complement
representation of an integer value). For example, the hexadecimal value 0xF is equivalent to the 4-bit pattern
'1111' and is interpreted in expressions as being equal to the decimal number 15.
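The following non-normative Python sketch illustrates these conventions for the two values used as examples above.

    # Non-normative illustration of the numerical value conventions in 5.1.
    code = 0x41                              # eight-bit pattern 0100 0001
    assert format(code, "08b") == "01000001"

    # A hexadecimal value used in an expression equals the unsigned integer
    # value of the corresponding bit pattern.
    assert 0xF == int("1111", 2) == 15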
5.2 Operators
NOTE Many of the operators used in this document are similar to those used in the C programming language.
5.2.1 Arithmetic operators
+ addition
− subtraction (as a binary operator) or negation (as a unary prefix operator)
× multiplication
/ division without truncation or rounding
<< left shift; x<<s is defined as x·2^s
>> right shift; x>>s is defined as ⎿x/2^s⏌
++ increment with 1
-- decrement with 1
umod unsigned modulo operator; x umod a is the unique value y between 0 and a–1
for which y+Na = x with a suitable integer N
& bitwise AND operator; compares each bit of the first operand to the corresponding bit of
the second operand
If both bits are 1, the corresponding result bit is set to 1. Otherwise, the corresponding result
bit is set to 0.
^ bitwise XOR operator; compares each bit of the first operand to the corresponding bit of the
second operand
If both bits are equal, the corresponding result bit is set to 0. Otherwise, the corresponding
result bit is set to 1.
5.2.2 Logical operators
|| logical OR
&& logical AND
! logical NOT
a ? b : c if condition a is true, then the result is equal to b; otherwise the result is equal to c
5.2.3 Relational operators
> greater than
>= greater than or equal to
< less than
<= less than or equal to
== equal to
!= not equal to
5.2.4 Set operators
ℝ set of all real numbers
ℤ set of all integers
∀ for all
∊ is an element of
5.2.5 Precedence order of operators
Operators are listed in descending order of precedence. If several operators appear in the same line,
they have equal precedence. When several operators of equal precedence appear at the same level in an
expression, evaluation proceeds according to the associativity of the operator either from right to left or
from left to right.
Operators Type of operation Associativity
() Expression left to right
[] indexing of arrays left to right
++, -- increment, decrement left to right
!, – logical not, unary negation
×, / multiplication, division left to right
umod unsigned modulo (remainder) left to right
+, − addition and subtraction left to right
& bitwise AND left to right
^ bitwise XOR left to right
&& logical AND left to right
|| logical OR left to right
<<, >> left shift and right shift left to right
< , >, <=, >= relational left to right
5.2.6 Mathematical functions
|x| absolute value, is –x for x < 0, otherwise x
sign(x) sign of x, zero if x is zero, +1 if x is positive, -1 if x is negative
clamp(x,min,max) clamps x to the range [min,max]: returns min if x < min, max if x > max or otherwise x
⎾x⏋ ceiling of x; returns the smallest integer that is greater than or equal to x
⎿x⏌ floor of x; returns the largest integer that is less than or equal to x
⎿x⏋ rounding of x to the nearest integer, equivalent to sign(x)·⎿|x| + 0.5⏌. In the case that the operand is a vector, the operation is performed separately on each element
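The clamp and rounding functions defined above can be illustrated with the following non-normative Python sketch (function names are illustrative only).

    import math

    # Non-normative implementations of the functions in 5.2.6.
    def clamp(x, lo, hi):
        # returns lo if x < lo, hi if x > hi, otherwise x
        return lo if x < lo else hi if x > hi else x

    def round_half_away(x):
        # rounding to the nearest integer: sign(x) * floor(|x| + 0.5)
        return int(math.copysign(math.floor(abs(x) + 0.5), x)) if x != 0 else 0

    assert clamp(7, 0, 5) == 5
    assert round_half_away(2.5) == 3 and round_half_away(-2.5) == -3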
6 General
6.1 Point cloud geometry representations
A point cloud is defined as a set of points in the 3D space characterized by their positions, expressed in a
given 3D coordinate system, the so-called geometry. This geometrical data may be accompanied with per-
point colour samples with three components – red, green and blue – the so-called colour attributes.
There are three main types of point cloud geometry representations: point-based, volumetric-based, and
sparse tensor-based.
In a point-based representation, the point cloud geometry data corresponds to an unordered list of
coordinates for each point, in an N×3 array. The number of points N in each point cloud may be different.
In a volumetric-based representation, the point cloud geometry data corresponds to a 3D volume defined as a
rectangular array with a regular grid of W × H × D sample positions. W is called the width, H is called the height,
and D is called the depth of the volume. Each of the volume samples is referred to as a voxel. The geometry
information is represented as a binary signal, where the value ‘1’ marks an existing 3D point in its corresponding
position – the voxel is occupied – and a value ‘0’ marks an absence of 3D points – the voxel is empty.
The sparse tensor-based representation is functionally equivalent to the volumetric-based representation.
The difference is that only the occupied voxels are explicitly represented, while the voxels that are not
represented are considered to be empty. The point cloud geometry data is represented in an N×3 array
containing the coordinates of the occupied voxels, and an N×1 array containing the features or values of the
corresponding occupied voxels, here simply equal to ‘1’.
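The following non-normative Python sketch (NumPy-based; all variable names are illustrative) shows the same small point cloud geometry expressed in the three representations described above.

    import numpy as np

    # Point-based representation: an unordered N x 3 list of voxelized coordinates.
    points = np.array([[0, 1, 2],
                       [3, 3, 0],
                       [3, 3, 1]], dtype=np.int64)
    W = H = D = 4                                    # volume dimensions in voxels

    # Volumetric-based representation: a W x H x D binary occupancy grid.
    volume = np.zeros((W, H, D), dtype=np.uint8)
    volume[points[:, 0], points[:, 1], points[:, 2]] = 1

    # Sparse tensor-based representation: coordinates of the occupied voxels
    # plus an N x 1 feature array whose entries are simply '1'.
    coords = np.argwhere(volume == 1)                # N x 3
    feats = np.ones((coords.shape[0], 1), dtype=np.float32)

    # The occupied voxels recovered from the volume match the original points.
    assert set(map(tuple, coords)) == set(map(tuple, points))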
6.2 Multiple learnable neural network models
In order to ensure variable rate support and efficient compression performance at different ranges of quality,
five different learnable neural network models are supported. The selection of models is defined by a model
indicator (modelIdx = 0, …, 4, from lowest to highest quality). According to the selected modelIdx, one out of
the five models will be loaded to perform the encoding or decoding process.
6.3 Functional description of the encoding process
Encoder operations are non-normative and are described here only to facilitate the understanding of the normative decoder operations.
An instantiation of the JPEG Pleno point cloud encoder architecture is presented in Figure 1.
The input source point cloud data shall have a point-based representation and shall be voxelized. This means that, for a given precision (bit-depth), each of the 3D point coordinates lies on the uniform grid {0, 1, …, 2^precision − 1}.
The input source point cloud data may be encoded using only the Geometry Coding Mode or using the
Geometry and Colour Coding Mode.
In the Geometry Coding Mode, the input source point cloud geometry data is encoded in a five-step process.
First, the point cloud geometry data is divided into fixed-sized blocks to be independently encoded according
to a raster scanning order. All blocks are then down-sampled according to a user-defined sampling factor.
Each down-sampled block is independently coded with a deep learning coding model, generating a latent
representation, which is then quantized, and entropy coded. Subsequently, each block is also decoded via
the same deep learning coding model, in order to optimize the binarization process to be performed at the
decoder.
During the encoding, the point cloud data is converted to a sparse tensor-based representation, to be
processed by the deep learning coding model. The deep learning coding model will generate, for each PC
block, a separate bit-stream embedded in the codestream, which can be independently decoded in support
of random access.
When using the Geometry and Colour Coding Mode, after encoding and decoding the geometry for each
block using the deep learning coding model, all blocks are merged back into the full decoded point cloud
geometry in a point-based representation. Then, the decoded geometry is recoloured using the colour data
from the input source point cloud, and subsequently projected from 3D onto 2D images. Finally, the 2D
images containing the point cloud colour data are encoded using the JPEG AI image encoder.
The JPEG AI image encoder will generate a bit-stream for the whole colour data to be embedded in the
codestream.
Figure 1 — Generic JPEG Pleno point cloud encoder architecture
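The following non-normative Python sketch illustrates only the block partitioning and block down-sampling steps described above, under the assumption of a cubic block size and integer voxel coordinates; all function names are illustrative placeholders, and the normative encoding behaviour of the deep learning coding model is not reproduced here.

    import numpy as np

    def partition_into_blocks(points, block_size):
        # Group points by the fixed-sized block they fall into; the block
        # origin plays the role of BlockPosition, and the coordinates inside
        # each block are expressed relative to that origin.
        block_ids = points // block_size
        blocks = {}
        for pt, bid in zip(points, map(tuple, block_ids)):
            blocks.setdefault(bid, []).append(pt - np.array(bid) * block_size)
        # Return (block origin, internal coordinates) pairs in raster order.
        return [(np.array(bid) * block_size, np.array(pts))
                for bid, pts in sorted(blocks.items())]

    def downsample(block_points, sf):
        # Divide coordinates by the sampling factor SF and drop duplicates.
        return np.unique(block_points // sf, axis=0)

    points = np.array([[0, 0, 1], [0, 0, 2], [65, 3, 4]], dtype=np.int64)
    for origin, block in partition_into_blocks(points, block_size=64):
        print(origin, downsample(block, sf=2))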
6.4 Functional description of the decoding process
This clause specifies the JPEG Pleno point cloud decoding algorithm.
The overall architecture (see Figure 2) provides the flexibility to configure the encoding and decoding
system depending on the requirements of the addressed use case.
In the Geometry Coding Mode, from the codestream, a deep learning model decodes each PC block
independently, generating a 3D volume containing the probabilities of each voxel being occupied. These
probabilities are then binarized to determine the points’ locations. Each block is up-sampled by the same
sampling factor used in the encoder, restoring the original point cloud precision. Optionally, a deep learning
model may be used to perform super-resolution on the up-sampled blocks, which also requires using
another binarization process. Finally, all coding units (blocks) are merged to generate the decoded point
cloud geometry data.
When decoding the codestream, the deep learning models process the point cloud data in a sparse tensor-
based representation. Finally, when merging all blocks, the decoded point cloud geometry data is converted
to a point-based representation.
When using the Geometry and Colour Coding Mode, from the codestream, JPEG AI first decodes the images
containing the colour data. Then, after merging all decoded blocks containing the geometry data, the
decoded geometry is used to compute the inverse projection of the colour data from 2D images onto a 3D
point cloud. Finally, the decoded colour data is assigned to the final decoded point cloud geometry data,
using interpolation when a direct correspondence does not exist.
Figure 2 — Generic JPEG Pleno Point Cloud decoder architecture
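The following non-normative Python sketch illustrates, in simplified form, the decoder-side steps that do not involve the deep learning model: a simple highest-probability selection stands in for the binarization specified in Annex H, coordinate scaling stands in for the up-sampling of Annex J, and the block merging of Annex I is reduced to a coordinate offset and concatenation; all names are illustrative only, and k (the number of points to be decoded) is taken as given.

    import numpy as np

    def binarize(probabilities, k):
        # Keep the k voxels with the highest occupancy probability.
        flat = np.argsort(probabilities, axis=None)[::-1][:k]
        return np.stack(np.unravel_index(flat, probabilities.shape), axis=1)

    def upsample(block_points, sf):
        # Undo the encoder down-sampling by scaling coordinates by SF.
        return block_points * sf

    def merge_blocks(decoded_blocks):
        # decoded_blocks: list of (block origin, internal coordinates) pairs.
        return np.concatenate([origin + pts for origin, pts in decoded_blocks])

    probs = np.random.default_rng(0).random((8, 8, 8))
    block = upsample(binarize(probs, k=5), sf=2)
    print(merge_blocks([(np.array([0, 0, 0]), block)]))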
6.5 Encoder requirements
An encoding process converts source point cloud data to coded point cloud data.
In order to conform with this document, an encoder shall provide a codestream that conforms with the
codestream format syntax and file format syntax specified in the annexes for the encoding process(es)
embodied by the encoder.
6.6 Decoder requirements
A decoding process converts coded point cloud data to reconstructed point cloud data. The decoding process
shall be as specified in Annexes A to J.
A decoder is an embodiment of the decoding process. In order to conform to this document, a decoder
shall convert all, or specific parts of, any coded point cloud data that conform to the file format syntax and
codestream syntax specified in Annexes A to J to a reconstructed point cloud.
6.7 Trained models and parameters
The trained models and parameters can be found in the electronic attachment https://standards.iso.org/iso-iec/21794/-6/ed-1/en/.
The decoder’s trained parameters (weights and biases) are stored in the PyTorch® model format 4), version 1.13. The cumulative distribution function (CDF) tables for the entropy coder are stored in comma separated values (CSV) format.
NOTE More information about the PyTorch® model format is available at https://pytorch.org/tutorials/beginner/saving_loading_models.html.
The directory structure is shown in Table 1. Folders are indicated by a trailing forward slash. The folder
‘Codec_quantized’ contains the deep learning-based decoding models. The folder ‘SR’ contains the deep
learning-based super resolution models. The folder ‘rANS’ contains the CDF tables common to all models and
‘model_modelIdx’ directories contain the CDF tables specific to each deep learning-based decoding model.
Table 1 — Directory structure in electronic attachment
Codec_quantized/
    model_0.pth
    model_1.pth
    model_2.pth
    model_3.pth
    model_4.pth
SR/
    SF_2.pth
    SF_4.pth
rANS/
    model_0/cdf_z.csv
    model_1/cdf_z.csv
    model_2/cdf_z.csv
    model_3/cdf_z.csv
    model_4/cdf_z.csv
    length_cdf_z.csv
    offset_z.csv
    cdf_r.csv
    length_cdf_r.csv
    offset_r.csv
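The following non-normative Python sketch shows one possible way to load the trained parameters and CDF tables, assuming the directory structure of Table 1 has been extracted to the working directory; reading the CSV values as generic numbers is an assumption of this sketch, and the use of the loaded data is specified in the annexes.

    import csv
    import torch  # PyTorch, version 1.13 as stated above

    model_idx = 2  # modelIdx in 0..4, from lowest to highest quality

    # Decoder trained parameters (weights and biases), PyTorch model format.
    state_dict = torch.load(f"Codec_quantized/model_{model_idx}.pth",
                            map_location="cpu")

    # Super-resolution model for down-sampling factor SF = 2.
    sr_state_dict = torch.load("SR/SF_2.pth", map_location="cpu")

    def load_csv_table(path):
        # CDF tables are stored as comma separated numeric values; their
        # interpretation by the rANS entropy decoder is specified in Annex G.
        with open(path, newline="") as f:
            return [[float(v) for v in row] for row in csv.reader(f) if row]

    cdf_z = load_csv_table(f"rANS/model_{model_idx}/cdf_z.csv")   # model-specific
    cdf_r = load_csv_table("rANS/cdf_r.csv")                      # common to all models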
7 Organization of the document
This document specifies the decoding process of point cloud data in eleven successive annexes, Annex A to Annex K. Annex A specifies the file format, Annex B specifies the codestream syntax, Annex C specifies the geometry encoding and decoding architecture, Annex D specifies the colour encoding and decoding architecture, Annex E specifies the synthesis transform, Annex F specifies the hyper decoders, Annex G specifies the entropy decoder, Annex H specifies the binarization scheme, Annex I specifies the block merging process, Annex J specifies the up-sampling process and, finally, Annex K specifies the deep learning-based super-resolution architecture.
4) PyTorch® is the trademark of The Linux Foundation®. This information is given for the convenience of users of this
document and does not constitute an endorsement by ISO/IEC of the product named. Equivalent products may be used if
they can be shown to lead to the same results.
Annex A
(normative)
File format
A.1 General
This annex specifies the use of the JPEG Pleno Point Cloud superbox which is designed to contain compressed
point cloud data and associated metadata. The listed boxes are defined as part of the JPL file format specified
in ISO/IEC 21794-1.
A.2 Organization of the JPEG Pleno Point Cloud superbox
Figure A.1 shows the hierarchical organization of the JPEG Pleno Point Cloud superbox contained by a JPL
file. This illustration does not specify nor imply a specific order to these boxes. In many cases, the file will
contain several boxes of a particular box type. The meaning of each of those boxes is dependent on the
placement and order of that particular box within the file.
This superbox shall contain the following core elements:
— a JPEG Pleno Point Cloud Header box containing parameterization information about the point cloud
such as geometry and colour parameters;
— a JPEG Pleno Point Cloud Geometry Data box containing the compressed geometry data of the point cloud;
— a JPEG Pleno Point Cloud Attribute Data box containing the compressed colour attributes data of the
point cloud.
Table A.1 lists all boxes defined as part of this clause. A box that is listed in Table A.1 as “Required” shall exist
within all conforming JPL files. For the placement of and restrictions on each box, see the relevant subclause
defining that box.
The IPR, XML, UUID and UUID Info boxes introduced in Annex A can also be signalled at the level of the JPEG Pleno Point Cloud box, to carry point cloud specific metadata.
Figure A.1 — Hierarchical organization of a JPEG Pleno Point Cloud superbox
A.3 Defined boxes
A.3.1 General description
The following boxes shall properly be interpreted by all conforming readers. Each of these boxes conforms
to the standard box structure as defined in ISO/IEC 21794-1:2020, Annex A. The following clauses define
the value of the DBox field. It is assumed that the Lbox, Tbox and XLBox fields exist for each box in the file as
defined in ISO/IEC 21794-1:2020, Annex A.
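As a non-normative aid, the Python sketch below reads the generic box header fields (LBox, TBox and, when LBox equals 1, XLBox) assumed above; the field sizes used here (32-bit big-endian LBox and TBox, 64-bit XLBox) follow the box structure of ISO/IEC 21794-1:2020, Annex A, and are given for illustration only.

    import io
    import struct

    def read_box_header(stream):
        # LBox (32-bit big-endian length) and TBox (4-character box type).
        header = stream.read(8)
        if len(header) < 8:
            return None                          # end of stream
        lbox, tbox = struct.unpack(">I4s", header)
        header_len = 8
        if lbox == 1:                            # length carried in 64-bit XLBox
            (length,) = struct.unpack(">Q", stream.read(8))
            header_len = 16
        elif lbox == 0:                          # box extends to the end of the file
            length = None
        else:
            length = lbox
        return tbox.decode("ascii"), length, header_len

    # Example: the JPEG Pleno Point Cloud box type is 'jppc' (0x6A70 7063).
    stream = io.BytesIO(struct.pack(">I4s", 16, b"jppc") + bytes(8))
    print(read_box_header(stream))               # ('jppc', 16, 8)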
Table A.1 — Defined boxes
Box name: JPEG Pleno Point Cloud box
Type: 'jppc' (0x6A70 7063)
Superbox: Yes
Required: Yes
Comments: This box contains a series of boxes that contain the encoded point cloud, its parameterization and associated metadata. (Defined in ISO/IEC 21794-1:2020, Annex A)

Box name: JPEG Pleno Point Cloud Header box (A.3.3)
Type: 'jpph' (0x6A70 7068)
Superbox: Yes
Required: Yes
Comments: This box contains generic information about the point cloud, such as component information, geometry information and colour information. (Defined in subclause A.3.2)

Box name: Point Cloud Header box (A.3.3.2)
Type: 'phdr' (0x7068 6472)
Superbox: No
Required: Yes
Comments: This box contains fixed length generic information about the point clouds, such as point cloud dimensions, number of points, number of components, and bits per component. (Defined in subclause A.3.3)

Box name: JPEG Pleno Point Cloud Geometry Data box
Type: 'pcgd' (0x7063 6764)
Superbox: Yes
Required: Yes
Comments: This box contains a box that contains the encoded point cloud geometry data. (Defined in Annex C)

Box name: JPEG Pleno Point Cloud Attribute Data box
Type: 'pcad' (0x7063 6164)
Superbox: Yes
Required: No
Comments: This box contains a box that contains the encoded point cloud attribute data. (Defined in Annex C)
A.3.2 JPEG Pleno Point Cloud Header box
A.3.2.1 General
The JPEG Pleno Point Cloud Header box contains generic information about the file, such as the number of
components, bits per component, colour space, and geometry information. This box is a superbox. Within a
JPL file, there shall be one JPEG Pleno Point Cloud Header box. The JPEG Pleno Point Cloud Header box shall
be located anywhere within the file after the File Type box but shall be before the JPEG Pleno Point Cloud
Geometry Data box and the JPEG Pleno Point Cloud Attribute Data box. It also shall be at the same level as the
JPEG Pleno Signature and File Type boxes (it shall not be inside any other superbox within the file).
The type of the JPEG Pleno Point Cloud Header box shall be ’jpph’ (0x6A70 7068).
This box contains several boxes. Other boxes may be defined in other standards and may be ignored by
conforming readers. Those boxes contained within the JPEG Pleno Point Cloud Header box, which are defined
within this clause, are as in Figure A.2.
Key
phdr point cloud header box
This box specifies information about the geometry bit depth and the number of components. This box shall be the
first box in the JPEG Pleno Header box and is specified in A.3.3.
Colr_i colour specification boxes
These boxes specify the colour space of the decompressed point cloud attributes. Their structures are specified in Rec. ITU-T T.801 | ISO/IEC 15444-2. There shall be at least one
...







