Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio - Amendment 3: MPEG-H 3D Audio Phase 2

Technologies de l'information — Codage à haute efficacité et livraison des medias dans des environnements hétérogènes — Partie 3: Audio 3D — Amendement 3: Phase 2 de l'audio 3D MPEG-H

General Information

Status
Withdrawn
Publication Date
09-Jan-2017
Withdrawal Date
09-Jan-2017
Current Stage
9599 - Withdrawal of International Standard
Start Date
28-Feb-2019
Completion Date
30-Oct-2025
Ref Project

Relations

Standard
ISO/IEC 23008-3:2015/Amd 3:2017 - Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio — Amendment 3: MPEG-H 3D Audio Phase 2 Released:1/10/2017
English language
454 pages

Frequently Asked Questions

ISO/IEC 23008-3:2015/Amd 3:2017 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio - Amendment 3: MPEG-H 3D Audio Phase 2". This standard covers: Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio - Amendment 3: MPEG-H 3D Audio Phase 2

ISO/IEC 23008-3:2015/Amd 3:2017 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 23008-3:2015/Amd 3:2017 is linked to the following other standards: ISO/IEC 23008-3:2015 and ISO/IEC 23008-3:2019. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

Standards Content (Sample)


INTERNATIONAL STANDARD ISO/IEC 23008-3
First edition 2015-10-15
AMENDMENT 3 2017-01
Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio
AMENDMENT 3: MPEG-H 3D Audio Phase 2
Technologies de l’information — Codage à haute efficacité et livraison des medias dans des environnements hétérogènes — Partie 3: Audio 3D
AMENDEMENT 3: Phase 2 de l’audio 3D MPEG-H
Reference number
ISO/IEC 23008-3:2015/Amd.3:2017(E)
© ISO/IEC 2017, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO/IEC 2017 – All rights reserved

Contents
Foreword
Introduction
1 Profiles and Levels
2 Technical Overview - Update
3 MPEG Surround
4 3D Audio Phase II – HOA (Subband Directional Prediction, Parametric Ambiance Replication, Phase-based Decorrelation, HOA Layered Coding)
5 Optimizations and Improvements for Low Bitrate Coding
6 Joint Channels for Low Bitrate Coding
7 Discrete Multi-Channel Coding Tool
8 Updates to MHAS
9 Metadata Updates
9.1 Update of mae_Data() syntax and semantics
9.2 Update of OAM data transmission and processing
9.2.1 OAM syntax and semantics
9.2.2 2D spread rendering
9.2.3 Informative distance and depth spread rendering
9.3 Signaling and Processing of Scene Displacement Angles for CO content
9.4 Extension of screen-related processing for off-centered screens
9.5 Update of closest speaker playout for the conditioned case
9.6 Processing of excluded sectors
9.7 Interface for channel-based, object-based, and HOA metadata and audio
9.8 Diffuseness Rendering
9.8.1 Diffuseness Processing
9.8.2 Informative decorrelation filtering for diffuseness processing
9.9 Updates of the element metadata preprocessor
9.10 Review of Metadata
9.11 References
10 Improvements for use in broadcast ecosystems
10.1 Order of elements in mpegh3daDecoderConfig() and mpegh3daFrame()
10.2 Overall delay alignment and constant decoder delay
10.3 Broadcast Contribution Mode Operation of MPEG-H
10.4 Audio Pre-Roll
10.5 Multi-stream Handling
11 SAOC signaling update
12 Tool for Advanced Loudness Control
13 Frequency-Domain Prediction and Time-Domain Post-Filtering
14 Sample Rate Converter
15 Low Complexity Downmix
16 Tonal Component Coding
17 Internal Channel on MPS212 for Low Complexity Format Conversion
18 Low Complexity HOA Spatial Decoding and Rendering
19 High Resolution Envelope Processing (HREP)
20 Signaling of IGF start and stop bands
21 Consolidated Tables for Configuration Extensions, mpegh3daConfigExtension(), usacConfigExtType
22 Consolidated Tables for Extensions Element Configuration and Payload, mpegh3daExtElementConfig(), usacExtElementType
23 Consolidated Tables for MAE Data Types, mae_data(), mae_dataType
24 Consolidated Table for tcx_coding()
25 Peak Limiter
26 Informative Annex on screen-related adaptation of HOA content in complexity constrained implementations
27 Further Changes, Not Categorized
28 Retaining original file length with MPEG-H 3D Audio
AMD.OFL.1 General
AMD.OFL.2 Avoiding Leading Zero Samples
AMD.OFL.3 Avoiding Trailing Zero Samples
29 Enhanced Noise Filling
30 Scope
31 Main Profile
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees established
by the respective organization to deal with particular fields of technical activity. ISO and IEC technical
committees collaborate in fields of mutual interest. Other international organizations, governmental and non-
governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO
and IEC have established a joint technical committee, ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are described in
the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of
document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC
Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any
patent rights identified during the development of the document will be in the Introduction and/or on the ISO list
of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not constitute
an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as
well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical
Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
Amendment 3 to ISO/IEC 23008-3:2015 was prepared by Joint Technical Committee ISO/IEC JTC 1,
Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia
information.
Introduction
The following text describes the Amendment 3 to the specification ISO/IEC 23008-3:2015 MPEG-H 3D Audio
in an "Amendment"-style, i.e. in "Replace A with B"-style. It includes additions and changes that serve a
number of purposes:
• improving the coding efficiency especially for low bitrate coding modes (for scene based as well as for
object based and for multichannel based content)
• adding descriptive metadata
• updating the MHAS description
• some improvements for usage of MPEG-H 3D Audio in broadcasting applications
• a tool for Advanced Loudness Control
• a layered coding mode for coding of scene based content
It is envisioned that this amendment will be merged with the current version of the MPEG-H 3D Audio
specification resulting in a second edition of the standard. Text with yellow highlight shall be adjusted by the
editor of a new edition of ISO/IEC 23008-3.
For ease of review, the document is structured by clauses, each of which reflects an approved set of changes.
New Clauses, Tables and Figures are typically labelled "AMDX.Y", where X is the number of the clause of this
document in which they appear and Y is an increasing integer counter.

AMENDMENT ISO/IEC 23008-3:2015/Amd. 3:2017(E)
Information technology — High efficiency coding and media
delivery in heterogeneous environments — Part 3: 3D audio,
AMENDMENT 3: MPEG-H 3D Audio Phase 2
1 Profiles and Levels
Add the following definition of profiles and levels to clause 4 Technical Overview:
4.X MPEG-H 3D Audio profiles and levels
4.X.1 Introduction
This subclause defines profiles and their levels for MPEG-H 3D Audio.
Complexity units are defined to approximate the decoder complexity in terms of the processing power
required for the decoding process. The approximated processing power is given in "Processor Complexity
Units" (PCU), specified in Million Operations Per Second (MOPS).
4.X.2 Profiles
The following Audio Profiles are defined:
1. The Main Profile of MPEG-H 3D Audio provides a complete set of features for low-bitrate and high-quality coding, and rendering for all playback scenarios, exclusively based on the first edition of the MPEG-H 3D Audio specification ISO/IEC 23008-3:2015 3D Audio.
2. The High Profile of MPEG-H 3D Audio provides a complete set of features for low-bitrate and high-quality coding, and rendering for all playback scenarios.
The High Profile is a superset of the Low Complexity Profile.
3. The Low Complexity Profile provides features for broadcasting and streaming with a reduced complexity of the decoder.
Table P1 — Summary of the Location of and Normative Reference to the Definitions of MPEG-H 3D Audio profiles. USAC and MPEG-H 3DA Main Profile are provided for information only
Tool / Module | defined in ISO/IEC | subclause | USAC 23003-3 | MPEG-H 3DA Main Profile | MPEG-H 3DA High Profile | MPEG-H 3DA Low-Complexity Profile
block
14496-3 4.6.11 X X X X
switching
window AAC based 14496-3 4.6.11 X X X X
shapes
additional windows 23003-3 6.2.9.3 X X X X
AAC based 14496-3 4.6.11 X X X X
filter bank
additional USAC 23003-3 7.9 X X X X
TNS 14496-3 4.6.9 X X X X
intensity 14496-3 4.6.8.2
coupling 14496-3 4.6.8.3
perceptual PNS 14496-3 4.6.13
noise
noise filling
23003-3 7.2 X X X X
synthesis
basic mid/side coding 14496-3 4.6.8.1 X X X X
MS
MDCT based complex
23003-3 7.7.2 X X X X
prediction
non-uniform 14496-3 4.6.1 X X X X
quantization
uniform 23003-3 7.1 X X X X
Huffman 14496-3 4.6.3
entropy
context adaptive
coding 23003-3 7.4 X X X X
arithmetic coding
base 14496-3 4.6.18 X X X
SBR
enhanced 23003-3 7.5 X X X
Parametric Stereo 8.6.4 /
14496-3
8.A
parametric
stereo MPEG Surround 2-1-2
23003-3 6.2.13 X X X
extension (incl. residual coding)
Quad Channel Element 23008-3 5.5 X X
ACELP 23003-3 7.14 X X X X
frequency scale factor based 14496-3 4.6.2 X X X X
domain
LPC based
noise 23003-3 X X X X
shaping
Intelligent IGF for FD
23008-3 X X X
Gap Filling
IGF for TCX and 23008-3
X X
Improved TBE in ACELP  Amd3
LPD coding LPD stereo 23008-3
X X
Amd3
Predictors frequency-domain
23008-3
for FD and prediction and time- X X
Amd3
TCX domain post-filtering
Discrete MCT
Multi- 23008-3
X X
channel Amd3
coding
Format Generic downmix X
10,
23008-3 X X
Amd3.1
Converter (Note 4)
Immersive Immersive rendering 11, X
23008-3 X X
Rendering within format converter Amd3.2 (Note 4)
Metadata Audio
Elements (MAE) and
Static
Audio Scene 23008-3 15 X X X
metadata
Information (ASI)
Decoder and Renderer
Dynamic Object Audio Metadata
object (OAM) 23008-3 7, 8 X X X
metadata Decoder and Renderer
MPEG Surround 23003-1
9 X
Extension Amd 3
SAOC-3D Decoder and Renderer 23008-3 9 X X
Decoder and Renderer 23008-3
X
HOA and 12 X X
(Note 5)
Amd3
Near Field X
23008-3 X X
Compensation (Note1)
Subband Directional 23008-3
X
Prediction Amd3
Parametric Ambiance 23008-3
X
Replication (PAR) Amd3
Phase-based 23008-3
X
decorrelation Amd3
FD-binaural, TD-
23008-3 13 X X X (Note2)
Binaural binaural
HOA2Binaural H2B 23008-3 X X X (Note2)
DRC-1 23003-4 X X X (Note3)
DRC-2 (single band) 23003-4 X X X
DRC
DRC-2 (multi band) 23003-4
DRC-3 (single band) 23003-4 X X X
Sample Rate 23008-3 Amd3.
X X
Converter Amd3 3
Unguided clipping 23008-3
Peak Limiter D X X
prevention 23003-4
Loudness metadata and
23003-4 6 X X X
handling
Loudness
Loudness compensation 23008-3
X X
Amd3
MPEG-H 3D audio
23008-3 14 X X X
stream
MHAS
Truncation message
23008-3
and CRC packet type, X X
Amd3
ASI packet type
Carriage of MPEG-H 3D
23008-3
File Format Audio in ISO base (Note 6)
Amd2
media file format
Interfaces and
Interfaces
processing for
and 23008-3 17,18 X X X
Interaction data and
processing
local setup info
Carriage of System
Carriage of 23008-3
Data for the interaction X X
system data Amd4
with System Engine
Tonal Component 23008-3
TCC X
Coding Amd3
Internal Channel 23008-3
IC X
Amd3
High Resolution 23008-3
HREP X
Envelope Processing Amd3
Note 1: Restrictions apply dependent on the levels
Note 2: Implementation of binaural rendering is only mandated if headphone reproduction is supported.
Note 3: Multi-band DRC-1 shall be applied in the STFT domain of the TD format converter.
Note 4: The TD format converter downmix shall be applied for downmixing.
Note 5: In order to achieve target complexity for the LC profile at a given level implementers should study Annex G.
Note 6: File Format encapsulation is independent of the profile that is used for the bitstream. A profile level indicator is
part of the file format specification (see XXX).
4.X.2.1 Levels of the Low Complexity Profile
Table P2 — Levels and their corresponding restrictions for the Low Complexity Profile
Level | Max. sampling rate | Max. no. of core ch. in compressed data stream | Max. no. of decoder processed core ch. | Max. no. of loudspeaker output ch. | Example of max. loudspeaker configuration | Max. no. of decoded objects | Example of a max. config C+O | Max. HOA order | Example of max. HOA order + O
1 | 48000 | 10 | 5 | 2 | 2.0 | 5 | 2 ch. + 3 static obj. (NOTE 1) | 2 | 2nd order + 3 static obj. (NOTE 1)
2 | 48000 | 18 | 9 | 8 | 7.1 | 9 | 6 ch. + 3 static obj. (NOTE 1) | 4 | 4th order + 3 static obj. (NOTE 1)
3 | 48000 | 32 | 16 | 12 | 11.1 | 16 | 12 ch. + 4 obj. | 6 | 6th order + 4 obj.
4 | 48000 | 56 | 28 | 24 | 22.2 | 28 | 24 ch. + 4 obj. | 6 | 6th order + 4 obj.
5 | 96000 | 56 | 28 | 24 | 22.2 | 28 | 24 ch. + 4 obj. | 6 | 6th order + 4 obj.
NOTE 1 – In this context "static objects" are understood as channel-based signals without accompanying OAM data which
are not also associated to a channel bed
— The use of switch groups determines the subset of core channels out of the core channels in the
bitstream that shall be decoded.
— If the mae_AudioSceneInfo() contains switch groups (mae_numSwitchGroups>0), then the
elementLengthPresent flag shall be 1
— The number of channels of the signaled referenceLayout shall not exceed the maximum number of
loudspeaker output channels as defined in the levels Table P2
Table P3 — Approximated worst case processing power (PCU) of decoder modules and the whole
decoder for the different Levels of the Low Complexity Profile given in MOPS
Level | Core LC | Format Converter | Object Renderer | HOA | Objects only Renderer | DRC | Limiter | Binaural | Worst case PCU
1 | 33 | 3 | 0 | 3 | 9 | 6 | 4 | 7 | 58
2 | 59 | 10 | 0 | 17 | 16 | 18 | 5 | 19 | 118
3 | 106 | 36 | 7 | 36 | 29 | 24 | 6 | 27 | 206
4 | 186 | 113 | 7 | 93 | 50 | 30 | 9 | 46 | 392
5 | 373 | 226 | 14 | 186 | 50 | 34 | 19 | 92 | 758
NOTE: The complexity numbers for binaural processing are calculated on the basis of BRIR filters of 1
second length measured in a BS.1116 compliant room.
NOTE: The complexity numbers for the HOA spatial decoding and rendering are based on the Low
Complexity Combined HOA Spatial Decoding and Rendering described in Annex G.
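As an informal illustration of how the level limits of Table P2 might be applied, the following Python sketch looks up the lowest Low Complexity Profile level whose numeric limits accommodate a given configuration. The class name, field names and function name are hypothetical and are not part of the specification; only the numeric limits are taken from Table P2, and the sketch ignores all other restrictions of this subclause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LcLevelLimits:
    """Numeric limits of one Low Complexity Profile level (names are illustrative)."""
    max_sampling_rate: int
    max_core_ch_in_stream: int
    max_decoded_core_ch: int
    max_loudspeaker_ch: int
    max_decoded_objects: int
    max_hoa_order: int


# Values transcribed from Table P2.
LC_LEVEL_LIMITS = {
    1: LcLevelLimits(48000, 10, 5, 2, 5, 2),
    2: LcLevelLimits(48000, 18, 9, 8, 9, 4),
    3: LcLevelLimits(48000, 32, 16, 12, 16, 6),
    4: LcLevelLimits(48000, 56, 28, 24, 28, 6),
    5: LcLevelLimits(96000, 56, 28, 24, 28, 6),
}


def lowest_lc_level(sampling_rate: int,
                    core_ch_in_stream: int,
                    decoded_core_ch: int,
                    loudspeaker_ch: int,
                    decoded_objects: int = 0,
                    hoa_order: int = 0) -> Optional[int]:
    """Return the lowest level whose Table P2 limits fit the configuration, or None."""
    for level in sorted(LC_LEVEL_LIMITS):
        lim = LC_LEVEL_LIMITS[level]
        if (sampling_rate <= lim.max_sampling_rate
                and core_ch_in_stream <= lim.max_core_ch_in_stream
                and decoded_core_ch <= lim.max_decoded_core_ch
                and loudspeaker_ch <= lim.max_loudspeaker_ch
                and decoded_objects <= lim.max_decoded_objects
                and hoa_order <= lim.max_hoa_order):
            return level
    return None
```

For example, a 48 kHz stream with 16 core channels, 12 decoded core channels, 12 loudspeaker outputs and 4 objects does not fit Level 2 (8 loudspeaker outputs at most) but fits Level 3.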
4.X.2.2 Restrictions for the Low Complexity Profile and Levels
In the low complexity profile the core decoder, format converter, object renderer, HOA renderer and DRC and
peak limiter operate in time domain, MDCT-domain or STFT-domain.
The following restrictions apply for HOA renderer and decoder:
Table P4 — Restrictions for the HOA spatial Decoding and Rendering according to the Level of the
Low Complexity Profile
(Maximum allowed value depending on Mpegh3daProfileLevelIndication)
Restriction applies to | Lvl 1 | Lvl 2 | Lvl 3 | Lvl 4 | Lvl 5
HOA order (max) | 2 | 4 | 6 | 6 | 6
Number of Predominant Sounds (max) | 3 | 5 | 7 | 8 | 8
Number of directional signals used in prediction (max) | 2 | 3 | 4 | 5 | 5
The Near Field Compensation (NFC) processing may be applied to HOA content of an order which is smaller or equal to: | N/A (NFC not allowed) | 1 | 2 | 3 | 3
NFC may be employed in not more than one signal group of type SignalGroupTypeHOA.
The following restrictions apply to MPEG-D DRC (ISO/IEC 23003-4) when employed as part of MPEG-H 3D
audio:
— drcFrameSizePresent and timeDeltaMinPresent shall be set to 0.
— gainInterpolationType shall be set to 1.
— dependsOnDrcSetPresent shall be set to 0 for drcInstructionsUniDrc() with downmixId == 0.
— HOA signal groups shall be restricted to one drcChannelGroup and DRC gains shall be applied to the
HOA core channels (HOATransportChannels).
— The values of bsSequenceIndex within drcInstructionsUniDrc() shall be unique in simultaneously
applied DRC sets except for bsSequenceIndex == 0.
— Multiband DRC shall be restricted to drcInstructionsUniDrc() with downmixId == 0. If the bitstream
contains multiband DRC, the number of multiband DRC core channels shall be restricted as
follows:
(numAudioChannels + numAudioObjects + numAudioObjectsMB + numHOATransportChannels + numHOATransportChannelsMB)
≤ (numCoreChannelsMax(Lvl) – dependsOnDrcSetPresentFlag – 1),
where:
— numAudioChannels, numAudioObjects and numHOATransportChannels are the number of C,
O and HOA core channels as specified in Table 4.
— numAudioObjectsMB and numHOATransportChannelsMB are the number of O and HOA core
channels out of numAudioObjects and numHOATransportChannels that contain multiband
DRC
— numCoreChannelsMax is the maximum number of core channels ("No of Core ch") depending
on the Mpegh3daProfileLevelIndication field as defined in Table P2
— dependsOnDrcSetPresentFlag is set to one if the bitstream contains any configuration with
dependsOnDrcSetPresent==1 (otherwise zero).
— nNodes shall be restricted to a maximum value of 32, where nNodes is the number of encoded gain
values in the current DRC frame.
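The multiband DRC channel restriction above can be sketched as a simple check. This is an illustrative reading only: the function and parameter names are hypothetical, and the value of numCoreChannelsMax for the level is passed in by the caller rather than hard-coded, because this sketch makes no assumption about which column of Table P2 it is read from.

```python
def multiband_drc_within_limit(num_audio_channels: int,
                               num_audio_objects: int,
                               num_audio_objects_mb: int,
                               num_hoa_transport_channels: int,
                               num_hoa_transport_channels_mb: int,
                               num_core_channels_max: int,
                               depends_on_drc_set_present: bool) -> bool:
    """Evaluate the multiband DRC core channel inequality for one bitstream.

    The *_mb arguments count the object and HOA core channels that carry
    multiband DRC. depends_on_drc_set_present is True if any configuration
    in the bitstream has dependsOnDrcSetPresent == 1.
    """
    lhs = (num_audio_channels
           + num_audio_objects + num_audio_objects_mb
           + num_hoa_transport_channels + num_hoa_transport_channels_mb)
    # The flag contributes 1 when set and 0 otherwise.
    rhs = num_core_channels_max - int(depends_on_drc_set_present) - 1
    return lhs <= rhs
```

With 2 channels, 4 objects of which 2 carry multiband DRC, no HOA transport channels, a limit of 16 and no dependent DRC set, the left side is 8 and the right side is 15, so the restriction is met.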
Table P5 — Restrictions applying to DRC processing according to the Levels of the Low Complexity
Profile
(Maximum allowed value depending on Mpegh3daProfileLevelIndication)
Restriction applies to | Lvl 1 | Lvl 2 | Lvl 3 | Lvl 4 | Lvl 5
nDrcChannelGroupsTotal (Note 1) | 5 | 9 | 16 | 28 | 28
drcCoefficientsUniDrcCount | 4 | 4 | 4 | 4 | 4
bandCount (Note 2) | 2 | 4 | 4 | 4 | 4
sequenceCountTotal (Note 3) | 24 | 28 | 32 | 48 | 63
drcInstructionsUniDrcCount | 16 | 16 | 32 | 32 | 32
Note 1: Maximum allowed number of simultaneously active DRC channel groups in all applied DRC sets.
Note 2: Maximum allowed number of DRC bands for multiband DRC.
Note 3: Sum of all nDrcBands in drcGainSequence() structures plus number of sequences with gainCodingProfile=3.
The following tool specific restrictions apply:
— If the independent noise filling (INF) of the intelligent gap filling (IGF) is activated (i.e. if igfUseEnf==1),
then the Complex Prediction tool shall be restricted to real-only prediction, i.e. complex_coef shall be
0.
— If Stereo Filling is activated (i.e. if stereo_filling==1), then the Complex Prediction tool shall be
restricted to real-only prediction, i.e. complex_coef shall be 0.
— The independent noise filling of the intelligent gap filling shall not be employed in cases where igfBgn
corresponds to an audio frequency higher than 8 kHz.
— The LPD mode shall only be employed at 3DA core coder sampling rates (as defined in Table 2 —
Syntax of mpegh3daConfig()) ≤ 32000 Hz
EXAMPLE For a 48 kHz input signal, the encoder resamples the signal to a 32 kHz core coder sampling rate and the
LPD decoder operates at this lower sampling rate. After the core decoding the signal is resampled to 48 kHz.
— The multi-channel coding tool (MCT) shall not employ more stereo boxes than specified in Table P9
Table P9 — Restrictions applying to MCT processing according to the Levels of the Low Complexity
Profile
(Maximum allowed value depending on Mpegh3daProfileLevelIndication)
Restriction applies to | Lvl 1 | Lvl 2 | Lvl 3 | Lvl 4 | Lvl 5
Number of stereo boxes in MCT | 5 | 9 | 16 | 28 | 28
The following restrictions apply to coding of audio objects and the associated OAM data:
Table P10 — Restrictions applying to Object Processing According to the Levels of the Low
Complexity Profile
(Maximum allowed value n depending on Mpegh3daProfileLevelIndication)
Restriction applies to | Lvl 1 | Lvl 2 | Lvl 3 | Lvl 4 | Lvl 5
(number of objects without divergence) + 3·(number of objects with divergence > 0) ≤ n | 5 | 9 | 16 | 28 | 28
— Efficient Object Metadata Decoding is not permitted, i.e. lowDelayMetadataCoding shall be 1.
— Furthermore, the OAM frame length shall comply with:
OAMFrameLength = outputFrameLength / n,
with n being a positive integer in the range {1,…,4}
— Objects shall not employ divergence and spread at the same time.
— If an object is defined with a spatial extent (spread α > 0.0° for uniform spread, spread_width α_width
> 0.0° for non-uniform spread), it shall have a divergence value equal to zero.
— If an object is defined with a divergence value > 0, it shall not have a spatial extent (spread α
shall be equal to 0.0° for uniform spread, spread_width α_width shall be equal to 0.0° for non-uniform spread).
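The object restrictions above lend themselves to a short worked check. The following Python sketch is illustrative only: the function names are hypothetical, the per-level budget n is transcribed from Table P10, and the sketch assumes that an OAM frame length must be a whole number of samples, which the text does not state explicitly.

```python
# Per-level budget n from Table P10.
LC_OBJECT_BUDGET = {1: 5, 2: 9, 3: 16, 4: 28, 5: 28}


def objects_within_budget(level: int,
                          objects_without_divergence: int,
                          objects_with_divergence: int) -> bool:
    """An object with divergence > 0 counts three times against the budget."""
    weighted = objects_without_divergence + 3 * objects_with_divergence
    return weighted <= LC_OBJECT_BUDGET[level]


def candidate_oam_frame_lengths(output_frame_length: int) -> list:
    """OAMFrameLength = outputFrameLength / n for n in {1,...,4}.

    Assumption: only integer results are kept.
    """
    return [output_frame_length // n
            for n in range(1, 5)
            if output_frame_length % n == 0]


def spread_and_divergence_exclusive(spread_deg: float, divergence: float) -> bool:
    """An object shall not employ divergence and spread at the same time."""
    return not (spread_deg > 0.0 and divergence > 0.0)
```

At Level 1, two objects without divergence plus one object with divergence use the full budget of 5 (2 + 3·1). For an output frame length of 1024 samples, the integer candidates are 1024, 512 and 256.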
The following restrictions apply to binaural rendering:
The value of bsBinauralDataFormatID in BinauralRendering() should be set to 1 (if the FD Binaural renderer is
implemented) or to 2 (if the TD Binaural renderer is implemented). The value of bsBinauralDataFormatID can
be set to 0 if the Parameterization of Binaural Room Impulse Responses according to 13.2.3 or 13.3.3 is
implemented.
The number of BRIR sets is restricted to a maximum number of 3.
In case of H2B filters, the number of BRIR filter pairs to be provided shall correspond to ‘Maximum H2B filter
order’ column in Table P8. In the other cases, the following applies:
The number of BRIR pairs in each BRIR set shall correspond to the number indicated in the relevant level-
dependent row of Table P8. The measured BRIR positions shall correspond to all nominal geometric positions
corresponding to the list of LoudspeakerGeometry indices in Table P8. The correspondence between
LoudspeakerGeometry index and nominal geometric position is defined in ISO/IEC 23001-8. Thereby, it is
ensured that one BRIR pair is available for each possible regular input channel configuration that can be used
within the indicated level.
An input channel configuration is regular if it is defined by means of an ISO/IEC 23001-8
ChannelConfiguration or a list of ISO/IEC 23001-8 LoudspeakerGeometry (CICPspeakerIdx).
If binaural rendering is activated, the measured BRIR positions shall be passed to the
mpegh3daLocalSetupInformation(). Thus, all renderer stages are set to the target layout that is equal to the
transmitted channel configuration. As one BRIR is available per regular input channel, the Format Converter
can be passed through in case regular input channel positions are used.
Table P8 — The binaural restrictions for the LC profile
Level | Number of BRIR pairs | Maximum H2B filter order | BRIR positions by means of Loudspeaker Position Abbreviation | BRIR positions by means of LoudspeakerGeometry according to ISO/IEC 23001-8
1 | 3 | 1 | L, R, C | 0, 1, 2
2 | 10 | 2 | L, R, C, Ls, Rs, Lc, Rc, Lsr, Rsr, Cs | 0, 1, 2, 4, 5, 6, 7, 8, 9, 10
3 | 21 | 3 | L, R, C, Ls, Rs, Lc, Rc, Lsr, Rsr, Cs, Lss, Rss, Lv, Rc, Cv, Lvr, Rvr, Cvr, Rs, Lvs, Rvs | 0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 13, 14, 17, 18, 19, 20, 21, 22, 25, 30, 31
4 | 28 | 5 | L, R, C, Ls, Rs, Lc, Rc, Lsr, Rsr, Cs, Lss, Rss, Lv, Rc, Cv, Lvr, Rvr, Cvr, Lvss, Rvss, Ts, Lb, Rb, Cb, Lvs, Rvs, Lbs, Rbs | 0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 13, 14, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 37, 38
5 | 28 | 6 | L, R, C, Ls, Rs, Lc, Rc, Lsr, Rsr, Cs, Lss, Rss, Lv, Rc, Cv, Lvr, Rvr, Cvr, Lvss, Rvss, Ts, Lb, Rb, Cb, Lvs, Rvs, Lbs, Rbs | 0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 13, 14, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 37, 38
The following additional parameter values restrictions apply:
— The value of kMax in FdBinauralRendererParam() shall be equal to or less than 48 (bands).
— The value of kConv in FdBinauralRendererParam() shall be equal to 32.
— The values of rt60[k] in SfrBrirParam() shall be less than or equal to 1.0 (sec).
— The average of the values of nFilter[k] shall be less than or equal to 64.
— The values of nFilter[k] in VoffBrirParam() should be less than or equal to 256.
The following coding tools, modules, or features shall not be employed:
— Time warped filterbank
— 768 sample outputFrameLength, i.e. coreSbrFrameLengthIndex shall not be 0
The following text describes restrictions dependent on the length of the arithmetic coder codeword,
arith_data(). For this text the following definitions apply:
$F_{\mathrm{sOut}}$: core coder sampling rate as indicated by means of usacSamplingFrequencyIndex or usacSamplingFrequency in mpegh3daConfig()
$F_{\mathrm{sMax}}$: maximum allowed sampling rate of a given level in this profile
$N_{\mathrm{chMax}}$: maximum number of decoder processed core channels of a given level in this profile according to Table P2.
$N_{\mathrm{chLtpf}}$: number of core coder channels in which the long term post filter (LTPF) is applied
$N_{\mathrm{chInf}}$: number of core coder channels in which the independent noise filling (INF) is applied
$\mathrm{Nbits}_{\mathrm{arith\_data()}}(ch)$: number of bits used for arithmetic coding of spectral data, arith_data(), for core coder channel ch for a given frame
$\mathrm{Nbits}_{\mathrm{arith\_all}} = \sum_{\text{all channels}} \mathrm{Nbits}_{\mathrm{arith\_data()}}(ch)$, i.e. the sum of all bits used for the arithmetic coding of spectral data of all core coder channels
— In any given audio frame $\mathrm{Nbits}_{\mathrm{arith\_all}}$ shall comply with the following restriction:
$$\mathrm{Nbits}_{\mathrm{arith\_all}} < \frac{\left(3072 \cdot N_{\mathrm{chMax}} - 2048 \cdot N_{\mathrm{chLtpf}} - 2048 \cdot N_{\mathrm{chInf}}\right) \cdot F_{\mathrm{sMax}}}{F_{\mathrm{sOut}}}$$
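As a worked illustration of the arithmetic coder restriction, the following Python sketch computes the per-frame bound (3072 per decoder processed core channel, reduced by 2048 for each channel using LTPF and for each channel using INF, scaled by the ratio of the maximum allowed sampling rate to the core coder sampling rate) and compares it with the summed arith_data() bits of a frame. The function names are hypothetical; the constants are those of the restriction above.

```python
from typing import Iterable


def arith_bits_limit(n_ch_max: int,
                     n_ch_ltpf: int,
                     n_ch_inf: int,
                     f_s_max: int,
                     f_s_out: int) -> float:
    """Upper bound (exclusive) on the arith_data() bits summed over all channels."""
    return (3072 * n_ch_max - 2048 * n_ch_ltpf - 2048 * n_ch_inf) * f_s_max / f_s_out


def frame_complies(nbits_per_channel: Iterable[int],
                   n_ch_max: int,
                   n_ch_ltpf: int,
                   n_ch_inf: int,
                   f_s_max: int,
                   f_s_out: int) -> bool:
    """True if the summed arith_data() bits stay strictly below the bound."""
    nbits_arith_all = sum(nbits_per_channel)
    return nbits_arith_all < arith_bits_limit(
        n_ch_max, n_ch_ltpf, n_ch_inf, f_s_max, f_s_out)
```

For Level 3 (16 decoder processed core channels, 48000 Hz maximum) at a 48000 Hz core coder rate without LTPF or INF, the bound is 3072 · 16 = 49152 bits. With two LTPF channels, one INF channel and a 32000 Hz core coder rate, it becomes (49152 − 4096 − 2048) · 1.5 = 64512 bits.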
The following restrictions apply to the AudioPreRoll() extension:
— Decoders conforming to this profile shall support the full decoding and correct handling of the
AudioPreRoll() extension.
— The number of pre-roll frames, numPreRollFrames, in an AudioPreRoll() extension payload shall not
exceed 1 (one).
— In access units that are embedded as pre-roll in an AudioPreRoll() extension the
usacExtElementPresent field for extensions of type ID_EXT_ELE_AUDIOPREROLL shall be 0.
The following restrictions apply to the employed sampling rate and the resampler block:
— The sampling rate that is signaled by means of usacSamplingFrequencyIndex or
usacSamplingFrequency shall be one of the values in the first column of Table P6.
— Depending on the above mentioned sampling rate and the profile level the resampler may employ one
of the resampling ratios indicated in Table P6.
Table P6 — Allowed Sampling Rates and Resampling Ratios
(Allowed resampling ratio depending on Mpegh3daProfileLevelIndication)
Allowed sampling rate | Lvl 1 | Lvl 2 | Lvl 3 | Lvl 4 | Lvl 5
96000 | N/A | N/A | N/A | N/A | 1
88200 | N/A | N/A | N/A | N/A | 1
64000 | N/A | N/A | N/A | N/A | 1.5
58800 | N/A | N/A | N/A | N/A | 1.5
48000 | 1 | 1 | 1 | 1 | 1 or 2
44100 | 1 | 1 | 1 | 1 | 1 or 2
32000 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 or 3
29400 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 or 3
24000 | 2 | 2 | 2 | 2 | 2
22050 | 2 | 2 | 2 | 2 | 2
16000 | 3 | 3 | 3 | 3 | 3
14700 | 3 | 3 | 3 | 3 | 3
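Table P6 can be read as a lookup from signaled sampling rate and level to the permitted resampling ratios. The following Python sketch encodes the table that way; the function and helper names are hypothetical, and an empty result stands for the "N/A" entries.

```python
from typing import Dict, Tuple


def _build_table() -> Dict[int, Dict[int, Tuple[float, ...]]]:
    """Encode Table P6 as {sampling_rate: {level: allowed ratios}}."""
    table: Dict[int, Dict[int, Tuple[float, ...]]] = {}
    for rate in (96000, 88200):
        table[rate] = {5: (1.0,)}
    for rate in (64000, 58800):
        table[rate] = {5: (1.5,)}
    for rate in (48000, 44100):
        table[rate] = {lvl: (1.0,) for lvl in (1, 2, 3, 4)}
        table[rate][5] = (1.0, 2.0)
    for rate in (32000, 29400):
        table[rate] = {lvl: (1.5,) for lvl in (1, 2, 3, 4)}
        table[rate][5] = (1.5, 3.0)
    for rate in (24000, 22050):
        table[rate] = {lvl: (2.0,) for lvl in (1, 2, 3, 4, 5)}
    for rate in (16000, 14700):
        table[rate] = {lvl: (3.0,) for lvl in (1, 2, 3, 4, 5)}
    return table


ALLOWED_RESAMPLING_RATIOS = _build_table()


def allowed_ratios(sampling_rate: int, level: int) -> Tuple[float, ...]:
    """Return the allowed ratios, or an empty tuple if the combination is N/A."""
    return ALLOWED_RESAMPLING_RATIOS.get(sampling_rate, {}).get(level, ())
```

For instance, a 32000 Hz core coder rate at Level 1 permits only the ratio 1.5, which matches the earlier example of a 32 kHz core coder signal being resampled to 48 kHz after decoding.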
The following restrictions apply to the coding of the audio scene information structure:
Table P7 — ProfileLevel dependent restrictions to selected fields of the mae_AudioSceneInfo()
(Allowed maximum value depending on Mpegh3daProfileLevelIndication)
Restriction applies to | Lvl 1 | Lvl 2 | Lvl 3 | Lvl 4 | Lvl 5
mae_numGroups | 5 | 9 | 16 | 28 | 28
mae_numSwitchGroups | 2 | 4 | 8 | 14 | 14
mae_numGroupPresets | 4 | 4 | 8 | 16 | 31
mae_bsGroupPresetNumConditions | 5 | 9 | 16 | 16 | 16
mae_numDownmixIdGroupPresetExtensions per mae_groupPresetID | 4 | 4 | 8 | 16 | 31
mae_bsNumDescLanguages | 4 | 4 | 4 | 8 | 16
mae_bsDescriptionDataLength | 256 | 256 | 256 | 256 | 256
— If mae_numSwitchGroups > 0, then elementLengthPresent shall be set to 1.
4.X.2.3 Levels of the Main Profile
Currently blank — Placeholder for Main Profile
4.X.2.4 Levels of the High Profile
Currently blank — Placeholder for High Profile
In 5.3.2 replace Table 39 — Value of mpegh3daProfileLevelIndication
Table 39 — Value of mpegh3daProfileLevelIndication
value Indication of Profile Indication of Level
0x00 reserved for ISO use -
0x01-0xff reserved for future profile definition
with:
Table 39 — Value of mpegh3daProfileLevelIndication
value Indication of Profile Indication of Level
0x00 reserved for ISO use -
0x01 Main Profile L1
0x02 Main Profile L2
0x03 Main Profile L3
0x04 Main Profile L4
0x05 Main Profile L5
0x06 High Profile L1
0x07 High Profile L2
0x08 High Profile L3
0x09 High Profile L4
0x0A High Profile L5
0x0B Low Complexity Profile L1
0x0C Low Complexity Profile L2
0x0D Low Complexity Profile L3
0x0E Low Complexity Profile L4
0x0F Low Complexity Profile L5
0x10-0xFF reserved for future profile definition
Please also study Clause 31 of this Amendment document.
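The replacement Table 39 assigns the values 0x01 to 0x0F in three consecutive blocks of five levels (Main, High, Low Complexity). The following Python sketch decodes a value on that basis; the function name and the returned tuple layout are hypothetical, while the value assignments are those listed in the table.

```python
from typing import Optional, Tuple

_PROFILES = ("Main Profile", "High Profile", "Low Complexity Profile")


def decode_profile_level(value: int) -> Tuple[str, Optional[int]]:
    """Map an mpegh3daProfileLevelIndication value to (profile, level).

    The level is None for reserved values.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("mpegh3daProfileLevelIndication is expected in 0x00..0xFF")
    if value == 0x00:
        return ("reserved for ISO use", None)
    if value >= 0x10:
        return ("reserved for future profile definition", None)
    # 0x01..0x05 Main, 0x06..0x0A High, 0x0B..0x0F Low Complexity.
    profile_index, level_index = divmod(value - 1, 5)
    return (_PROFILES[profile_index], level_index + 1)
```

For example, 0x0D falls in the third block at position three and decodes to the Low Complexity Profile, Level 3.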
2 Technical Overview - Update
Replace Figure 1 — Block diagram of the 3D-Audio decoder. (DRC: Dynamic Range Control,
SAOC: Spatial Audio Object Coding, HOA: Higher Order Ambisonics, LN: Loudness Normalization,
PL: Peak Limiter) with:
Add the following descriptive text after the description of the "HOA Decoder and Renderer" in
clause 4.2:
The MPEG Surround Decoder takes the downmix signals coming from the MPEG-H 3D Audio decoder and
performs the guided MPEG Surround upmix using the MPEG Surround side information to reproduce the
multichannel signal for the transmitted loudspeaker layout.
In Table 1 — MPEG-H 3DA functional blocks and internal processing domain replace:
Audio Core MPEG-H 3D Audio Core Coder FD or TD 1)
with:
Audio Core MPEG-H 3D Audio Core Coder FD or TD 1)
Audio Core MPEG Surround FD
Replace Figure 2 — MPEG-H 3D audio decoder overview with signal processing context with:
In clause “4.2 Overview over the codec building blocks”, after the first paragraph, add the following text
and figure:
[Block diagram not reproduced. Legible block labels: Bit Demux, Arithm. Decod., Inv. Quant., Noise Filling, Scaling, FDP, MCT, M/S, IGF, TNS, Block Switching, IMDCT Filter Bank, LPD, LPC Coeff Dequant., ACELP, FAC, FDNS, TBE, Stereo, Transition Windowing, Pitch Enhancement, LTPF, eSBR, sbrRatio, MPEG Surround, Output Time Signal. Legend entries: Signal, Control, Functional Unit.]
Figure AMD3.XX — Simplified block diagram of the typical MPEG-H core decoder configuration
Figure AMD3.XX shows a simplified block diagram of the typical MPEG-H core decoder building blocks. The
major differences compared to MPEG-D USAC technology are highlighted in yellow. The highlighted tools are
described in detail in section “MPEG-H 3D Audio Core decoder”.
3 MPEG Surround
Add the following new clause:
x MPEG Surround
x.1 Technical Overview
The output of the 3D Audio core decoder of the "channels / prerendered objects" path may be further
processed by MPEG Surround (MPS). Figure AMD1.1 shows a schematic of a combined 3D Audio core
decoder and an MPS decoder.
Figure AMD1.1: Block Diagram of a 3D Audio Core Decoder with an MPEG Surround decoder
If the SBR tool is active, a 3D Audio core decoder can typically be efficiently combined with a subsequent
MPS decoder by connecting them in the QMF domain in the same way as described for USAC in clause
4.3 of ISO/IEC 23003-3:2012, MPEG-D (MPEG audio technologies), Part 3: Unified Speech and Audio Coding.
If a connection in the QMF domain is not possible, the tools need to be connected in the time domain.
If MPS side information is embedded into a 3D Audio bitstream by means of the usacExtElement mechanism
(with usacExtElementType being ID_EXT_ELE_MPEGS), the time-alignment between the 3D Audio data and
the MPS data assumes the most efficient connection between the 3D Audio core decoder and the MPS
decoder. If the SBR tool in the 3D Audio core decoder is active and if MPS employs a 64-band QMF domain
representation (see 6.6.3 in ISO/IEC 23003-1:2007, Information technology — MPEG audio technologies —
Part 1: MPEG Surround), the most efficient connection is in the QMF domain. Otherwise, the most efficient
connection is in the time domain. This corresponds to the time-alignment for the combination of HE-AAC and
MPS as defined in 4.4, 4.5, and 7.2.1 of ISO/IEC 23003-1:2007, Information technology — MPEG audio
technologies — Part 1: MPEG Surround.
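The rule in the preceding paragraph for choosing the connection that the time-alignment assumes can be summarized as a small decision function. This is a minimal sketch for illustration; the function name, the parameter names and the string return values are assumptions of this example and do not appear in the standard.

```python
def mps_connection_domain(sbr_active: bool, mps_qmf_bands: int) -> str:
    """Return the connection assumed for time-alignment between the
    3D Audio core decoder and the MPS decoder.

    "qmf"  : SBR is active in the core decoder AND MPS uses a
             64-band QMF domain representation.
    "time" : every other case.
    Hypothetical helper, for illustration only.
    """
    if sbr_active and mps_qmf_bands == 64:
        return "qmf"
    return "time"
```

Both conditions have to hold for the QMF domain connection; an active SBR tool alone is not sufficient if MPS does not use the 64-band representation.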
The additional delay introduced by adding MPS decoding after 3D Audio decoding is given by clause 4.5 of
ISO/IEC 23003-1:2007, Information technology — MPEG audio technologies — Part 1: MPEG Surround and
depends on whether HQ MPS or LP MPS is used, and whether MPS is connected to the 3D Audio core
decoder in the QMF domain or in the time domain.
If multiple signal groups of type SignalGroupTypeChannels are present in the bitstream, one extension
element conveying MPEG Surround data shall refer to exactly one signal group. As per 5.3.1, the
corresponding channel elements shall directly follow that extension element.
If the MPEG Surround tool shall be used for one signal group of type SignalGroupTypeChannels, all core
coder channels belonging to that signal group shall be fed to the MPEG Surround tool.
x.2 Syntax and Data Structure
The bitstream syntax and data structure are identical to the definitions in ISO/IEC 23003-1:2007, Information
technology — MPEG audio technologies — Part 1: MPEG Surround and ISO/IEC 23003-1:2007/Amd.3:201x,
MPEG Surround extension for 3D Audio.
x.3 Tool Description
The processing of the MPEG Surround tools is fully specified in ISO/IEC 23003-1:2007, Information
technology — MPEG audio technologies — Part 1: MPEG Surround and ISO/IEC 23003-1:2007/Amd.3:201x,
MPEG Surround extension for 3D Audio.
4 3D Audio Phase II – HOA (Subband Directional Prediction, Parametric Ambiance
Replication, Phase-based Decorrelation, HOA Layered Coding)
The following changes are related to HOA Matrix Encoder / Decoder
In Table 23 — Syntax of HOARenderingMatrix() replace:
precisionLevel 2 uimsbf
if (gainLimitPerHoaOrder) { 1 uimsbf
with:
maxHoaOrder = sqrt(NumOfHoaCoeffs)-1;
precisionLevel 2 uimsbf
isNormalized 1 uimsbf
if (gainLimitPerHoaOrder) { 1 uimsbf
In the same Table, replace
if (isFullMatrix) { 1 uimsbf
firstSparseOrder 1 uimsbf
with:
if (isFullMatrix==0) { 1 uimsbf
nbitsHoaOrder = ceil(log2(maxHoaOrder+1));
firstSparseOrder nbitsHoaOrder uimsbf
In the same Table, replace
} else { /* isAnyValueSymmetric==0 */
if { isAnySignSymmetric) { 1 uimsbf
for (i=0; i signSymmetricPairs[i] = boolVal 1 uimsbf
with:
} else {
if (isAnySignSymmetric) { 1 uimsbf
for (i=0; i signSymmetricPairs[i] = boolVal; 1 uimsbf
In Table 25 — Syntax of DecodeHoaMatrixData() replace:
for (i = 0; i < inputCount; ++i) {
for (j = 0; j < outputCount; ++j) {
hoaMatrix[i *outputCount+j] *= signMatrix[i*outputCount+j];
hoaMatrix[i *outputCount+j] /= sqrt(2 × ceil(sqrt(i+1)-1) + 1);
}
}
with:
for (i = 0; i < inputCount; ++i) {
  for (j = 0; j < outputCount; ++j) {
    hoaMatrix[i * outputCount + j] *= signMatrix[i * outputCount + j];
    hoaMatrix[i * outputCount + j] /= sqrt(2 * ceil(sqrt(i + 1) - 1) + 1);
  }
}
if (isNormalized) {
  /* accumulate the energy of all entries that feed non-LFE loudspeakers */
  currentScalar = 0.0;
  for (i = 0; i < inputCount; ++i) {
    for (j = 0; j < outputCount; ++j) {
      if (!outputConfig[j].isLFE)
        currentScalar += hoaMatrix[i * outputCount + j] *
                         hoaMatrix[i * outputCount + j];
    }
  }
  /* scale those entries so that their Frobenius norm becomes 1 */
  currentScalar = 1.0 / sqrt(currentScalar);
  for (i = 0; i < inputCount; ++i) {
    for (j = 0; j < outputCount; ++j) {
      if (!outputConfig[j].isLFE)
        hoaMatrix[i * outputCount + j] *= currentScalar;
    }
  }
}
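The divisor in the first loop above depends only on the coefficient index i. Arithmetically, ceil(sqrt(i+1)-1) equals n for every index in the range n² ≤ i < (n+1)², so the divisor is sqrt(2n+1) for the block of 2n+1 coefficients that belong to the same n. The sketch below illustrates this relationship; reading n as the HOA order of coefficient i is an interpretation made for this example, and the function names are hypothetical.

```python
import math


def order_of_coefficient(i: int) -> int:
    """Evaluate ceil(sqrt(i + 1) - 1) as written in the pseudocode."""
    return math.ceil(math.sqrt(i + 1) - 1)


def order_divisor(i: int) -> float:
    """Divisor applied to every matrix entry of coefficient index i."""
    return math.sqrt(2 * order_of_coefficient(i) + 1)


# Indices 0..8 cover n = 0 (1 coefficient), n = 1 (3) and n = 2 (5).
orders = [order_of_coefficient(i) for i in range(9)]
```

For small indices the same value can be obtained with integer arithmetic as `math.isqrt(i)`, which avoids floating-point rounding altogether.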
In subclause 5.3.6 HOA Rendering Matrix Data Elements add before precisionLevel:
maxHoaOrder HOA Order of the transmitted matrix.
isNormalized
Indicates if the HOA rendering matrix $\mathbf{D}$ is energy normalized,
so that $\|\mathbf{D}\|_f = \sqrt{\sum_{l=1}^{L} \sum_{n=1}^{(N+1)^2} D_{l,n}^2} = 1$ with $l$ being the non-
LFE loudspeakers in the outputConfig.
In the same subclause before firstSparseOrder:
nbitsHoaOrder Number of bits used for reading firstSparseOrder.
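The two derived values used in the amended Table 23, maxHoaOrder = sqrt(NumOfHoaCoeffs)-1 and nbitsHoaOrder = ceil(log2(maxHoaOrder+1)), can be checked with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and the rejection of coefficient counts that are not perfect squares is an assumption of this example rather than a statement from the standard.

```python
import math


def max_hoa_order(num_of_hoa_coeffs: int) -> int:
    """maxHoaOrder = sqrt(NumOfHoaCoeffs) - 1, using integer arithmetic."""
    root = math.isqrt(num_of_hoa_coeffs)
    if num_of_hoa_coeffs < 1 or root * root != num_of_hoa_coeffs:
        # Assumption of this sketch: only (N+1)^2 coefficient counts are valid.
        raise ValueError("NumOfHoaCoeffs is expected to be a perfect square")
    return root - 1


def nbits_hoa_order(max_order: int) -> int:
    """nbitsHoaOrder = ceil(log2(maxHoaOrder + 1)).

    For a non-negative integer m, ceil(log2(m + 1)) equals m.bit_length(),
    which avoids floating-point logarithms.
    """
    return max_order.bit_length()
```

With 16 HOA coefficients this gives maxHoaOrder = 3 and a 2-bit firstSparseOrder field; with 49 coefficients it gives maxHoaOrder = 6 and 3 bits.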
In Table 24 5.4.3.3 Decoding of HOA Rendering Matrix Coefficients after:
In this case the code words to decode the individual matrix elements for the left
loudspeaker are reduced or completely omitted accordingly.
Add:
If the bitfield isNormalized was set to 1 the final HOA rendering matrix $\mathbf{D}$ is created by
dividing each weighting value in the $L$ rows of the HOA rendering matrix that are associated
with non-LFE loudspeakers by the matrix’s Frobenius Norm $\sqrt{\sum_{l=1}^{L} \sum_{n=1}^{(N+1)^2} D_{l,n}^2}$ computed from
its $L$ rows associated with non-LFE loudspeakers.
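The normalization described above can be illustrated numerically. The following is a minimal sketch, assuming the matrix is stored as one row per loudspeaker (as in the prose, and unlike the flat index layout of the pseudocode in Table 25); the function name and the `is_lfe` flag list are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_rendering_matrix(
    matrix: Sequence[Sequence[float]], is_lfe: Sequence[bool]
) -> List[List[float]]:
    """Scale all non-LFE rows so that their joint Frobenius norm is 1.

    matrix[l][n] is the weight of HOA coefficient n for loudspeaker l.
    Rows flagged as LFE are neither counted in the norm nor rescaled.
    """
    energy = sum(
        v * v for row, lfe in zip(matrix, is_lfe) if not lfe for v in row
    )
    if energy == 0.0:
        raise ValueError("non-LFE rows are all zero; norm is undefined")
    scale = 1.0 / math.sqrt(energy)
    return [
        list(row) if lfe else [v * scale for v in row]
        for row, lfe in zip(matrix, is_lfe)
    ]


# Two regular loudspeakers and one LFE; the non-LFE energy is 9 + 16 = 25.
example = normalize_rendering_matrix(
    [[3.0, 0.0], [0.0, 4.0], [7.0, 7.0]], [False, False, True]
)
```

In this example the non-LFE rows are divided by 5, while the LFE row is passed through unchanged.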
The following changes are related to the chapter 12 Higher Order Ambisonics (HOA).
Replace Figure 33 — Simplified block diagram of the decoder with:
In subclause 12.1.1 Block Diagram replace:
The fixed subset of the (I−M) HOA coefficient signals is re-correlated, this means the decorrelation at the
HOA encoding stage is reversed. Next, all of the (I−M
...
