ISO/IEC 23008-3:2022
Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio
This document specifies technology that supports the efficient transmission of immersive audio signals and flexible rendering for the playback of immersive audio in a wide variety of listening scenarios. These include home theatre setups with 3D loudspeaker configurations, 22.2 loudspeaker systems, automotive entertainment systems and playback over headphones connected to a tablet or smartphone.
Technologies de l'information — Codage à haute efficacité et livraison des médias dans des environnements hétérogènes — Partie 3: Audio 3D
General Information
Standards Content (Sample)
INTERNATIONAL STANDARD
ISO/IEC 23008-3
Third edition
2022-08
Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio
Technologies de l'information — Codage à haute efficacité et livraison des médias dans des environnements hétérogènes — Partie 3: Audio 3D
© ISO/IEC 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions, symbols, abbreviated terms and mnemonics
3.1 Terms, definitions, symbols and abbreviated terms
3.2 Mnemonics
4 Technical overview
4.1 Decoder block diagram
4.2 Overview over the codec building blocks
4.3 Efficient combination of decoder processing blocks in the time domain and QMF domain
4.4 Rule set for determining processing domains
4.4.1 Audio core codec processing domain
4.4.2 Mixing
4.4.3 DRC-1 Operation domains (DRC in rendering context)
4.4.4 Audio core codec interface domain to rendering
4.4.5 Rendering context
4.4.6 Post-processing context
4.4.7 End-of-chain context
4.5 Sample rate converter
4.6 Decoder delay
4.7 Contribution mode of MPEG-H 3D audio
4.8 MPEG-H 3D audio profiles and levels
4.8.1 General
4.8.2 Profiles
5 MPEG-H 3D audio core decoder
5.1 Definitions
5.1.1 Joint stereo
5.1.2 MPEG surround based stereo (MPS 212)
5.2 Syntax
5.2.1 General
5.2.2 Decoder configuration
5.2.3 MPEG-H 3D audio core bitstream payloads
5.3 Data structure
5.3.1 General
5.3.2 General configuration data elements
5.3.3 Loudspeaker configuration data elements
5.3.4 Core decoder configuration data elements
5.3.5 Downmix matrix data elements
5.3.6 HOA rendering matrix data elements
5.3.7 Signal group information elements
5.3.8 Low frequency enhancement (LFE) channel element, mpegh3daLfeElement()
5.3.9 Compatible profile and levels sets
5.4 Configuration element descriptions
5.4.1 General
5.4.2 Downmix configuration
5.4.3 HOA rendering matrix configuration
5.5 Tool descriptions
5.5.1 General
5.5.2 Quad channel element
5.5.3 Transform splitting
5.5.4 MPEG surround for mono to stereo upmixing
5.5.5 Enhanced noise filling
5.5.6 Audio pre-roll
5.5.7 Fullband LPD
5.5.8 Time-domain bandwidth extension
5.5.9 LPD stereo coding
5.5.10 Multichannel coding tool
5.5.11 Filterbank and block switching
5.5.12 Frequency domain prediction
5.5.13 Long-term postfilter
5.5.14 Tonal component coding
5.5.15 Internal channel on MPS212 for low complexity format conversion
5.5.16 High resolution envelope processing (HREP) tool
5.6 Buffer requirements
5.6.1 Minimum decoder input buffer
5.6.2 Bit reservoir
5.6.3 Maximum bit rate
5.7 Stream access point requirements and inter-frame dependency
6 Dynamic range control and loudness processing
6.1 General
6.2 Description
6.3 Syntax
6.3.1 Loudness metadata
6.3.2 Dynamic range control metadata
6.3.3 Data elements
6.4 Decoding process
6.4.1 General
6.4.2 Dynamic range control
6.4.3 Usage of downmixId in MPEG-H
6.4.4 DRC set selection process
6.4.5 DRC-1 for SAOC 3D Content
6.4.6 DRC-1 for HOA content
6.4.7 Loudness normalization
6.4.8 Peak limiter
6.4.9 Time-synchronization of DRC gains
6.4.10 Default parameters
7 Object metadata decoding
7.1 General
7.2 Description
7.3 Syntax
7.3.1 Object metadata configuration
7.3.2 Top level object metadata syntax
7.3.3 Subsidiary payloads for efficient object metadata decoding
7.3.4 Subsidiary payloads for object metadata decoding with low delay
7.3.5 Enhanced object metadata configuration
7.4 Data structure
7.4.1 Definition of ObjectMetadataConfig() payloads
7.4.2 Efficient object metadata decoding
7.4.3 Object metadata decoding with low delay
7.4.4 Enhanced object metadata
8 Object rendering
8.1 Description
8.2 Terms and definitions
8.3 Input data
8.4 Processing
8.4.1 General remark
8.4.2 Imaginary loudspeakers
8.4.3 Dividing the loudspeaker setup into a triangle mesh
8.4.4 Rendering algorithm
9 SAOC 3D
9.1 Description
9.2 Definitions
9.3 Delay and synchronization
9.4 Syntax
9.4.1 Payloads for SAOC 3D
9.4.2 Definition of SAOC 3D payloads
9.5 SAOC 3D processing
9.5.1 Compressed data stream decoding and dequantization of SAOC 3D data
9.5.2 Time/frequency transforms
9.5.3 Signals and parameters
9.5.4 SAOC 3D decoding
9.5.5 Dual mode
10 Generic loudspeaker rendering/format conversion
10.1 Description
10.2 Definitions
10.2.1 General remarks
10.2.2 Variable definitions
10.3 Processing
10.3.1 Application of transmitted downmix matrices
10.3.2 Application of transmitted equalizer settings
10.3.3 Downmix processing involving multiple channel groups
10.3.4 Initialization of the format converter
10.3.5 Audio signal processing
11 Immersive loudspeaker rendering/format conversion
11.1 Description
11.2 Syntax
11.3 Definitions
11.3.1 General remarks
11.3.2 Variable definitions
11.4 Processing
11.4.1 Initialization of the format converter
11.4.2 Audio signal processing
12 Higher order ambisonics (HOA)
12.1 Technical overview
12.1.1 Block diagram
12.1.2 Overview of the decoder tools
12.2 Syntax
12.2.1 Configuration of HOA elements
12.2.2 Payloads of HOA elements
12.3 Data structure
12.3.1 Definitions of HOA Config
12.3.2 Syntax of getSubbandBandwidths()
12.3.3 Definitions of HOA payload
12.4 HOA tool description
12.4.1 HOA frame converter
12.4.2 Spatial HOA decoding
12.4.3 HOA renderer
12.4.4 Layered coding for HOA
13 Binaural renderer
13.1 General
13.2 Frequency-domain binaural renderer
13.2.1 General
13.2.2 Definitions
13.2.3 Parameterization of binaural room impulse responses
13.2.4 Frequency-domain binaural processing
13.3 Time-domain binaural renderer
13.3.1 General
13.3.2 Definitions
13.3.3 Parameterization of binaural room impulse responses
13.3.4 Time-domain binaural processing
14 MPEG-H 3D audio stream (MHAS)
14.1 Overview
14.2 Syntax
14.2.1 Main MHAS syntax elements
14.2.2 Subsidiary MHAS syntax elements
14.3 Semantics
14.3.1 mpeghAudioStreamPacket()
14.3.2 MHASPacketPayload()
14.3.3 Subsidiary MHAS packets
14.4 Description of MHASPacketTypes
14.4.1 PACTYP_FILLDATA
14.4.2 PACTYP_MPEGH3DACFG
14.4.3 PACTYP_MPEGH3DAFRAME
14.4.4 PACTYP_SYNC
14.4.5 PACTYP_SYNCGAP
14.4.6 PACTYP_MARKER
14.4.7 PACTYP_CRC16 and PACTYP_CRC32
14.4.8 PACTYP_DESCRIPTOR
14.4.9 PACTYP_USERINTERACTION
14.4.10 PACTYP_LOUDNESS_DRC
14.4.11 PACTYP_BUFFERINFO
14.4.12 PACTYP_GLOBAL_CRC16 and PACTYP_GLOBAL_CRC32
14.4.13 PACTYP_AUDIOTRUNCATION
14.4.14 PACTYP_AUDIOSCENEINFO
14.4.15 PACTYP_EARCON
14.4.16 PACTYP_PCMCONFIG
14.4.17 PACTYP_PCMDATA
14.4.18 PACTYP_LOUDNESS
14.4.19 MHASPacketType specific requirements for MHASPacketLabel
14.5 Application examples
14.5.1 Light-weighted broadcast
14.5.2 MPEG-2 transport stream
14.5.3 CRC error detection
14.5.4 Audio sample truncation
14.6 Multi-stream delivery and interface
14.7 Carriage of generic data
14.7.1 Syntax
14.7.2 Semantics
14.7.3 Processing at the MPEG-H 3D audio decoder
15 Metadata audio elements (MAE)
15.1 General
15.2 Syntax
15.3 Semantics
15.4 Definition of mae_metaDataElementIDs
15.5 Loudness compensation after gain interactivity
16 Loudspeaker distance compensation
17 Interfaces to the MPEG-H 3D audio decoder
17.1 General
17.2 Interface for local setup information
17.2.1 General
17.2.2 WIRE output
17.2.3 Syntax for local setup information
17.2.4 Semantics for local setup information
17.3 Interface for local loudspeaker setup and rendering
17.3.1 General
17.3.2 Syntax for local loudspeaker signalling
17.3.3 Semantics for local loudspeaker signalling
17.4 Interface for binaural room impulse responses (BRIRs)
17.4.1 General
17.4.2 Syntax of binaural renderer interface
17.4.3 Semantics
17.5 Interface for local screen size information
17.5.1 General
17.5.2 Syntax
17.5.3 Semantics
17.6 Interface for signaling of local zoom area
17.6.1 General
17.6.2 Syntax
17.6.3 Semantics
17.7 Interface for user interaction
17.7.1 General
17.7.2 Definition of user interaction categories
17.7.3 Definition of an interface for user interaction
17.7.4 Syntax of interaction interface
17.7.5 Semantics of interaction interface
17.8 Interface for loudness normalization and dynamic range control (DRC)
17.9 Interface for scene displacement data
17.9.1 General
17.9.2 Definition of an interface for scene-displacement data
17.9.3 Syntax of the scene displacement interface
17.9.4 Semantics of the scene displacement interface
17.10 Interfaces for channel-based, object-based, and HOA metadata and audio data
17.10.1 General
17.10.2 Expectations on external renderers
17.10.3 Object-based metadata and audio data (object output interface)
17.10.4 Channel-based metadata and audio data
17.10.5 HOA metadata and audio data
17.10.6 Audio PCM data
17.11 Interface for positional scene displacement data
17.11.1 General
17.11.2 Syntax of the positional scene displacement interface
17.11.3 Semantics of the positional scene displacement interface
17.11.4 Processing
18 Application and processing of local setup information and interaction data and scene displacement data
18.1 Element metadata preprocessing
18.2 Interactivity limitations and restrictions
18.2.1 General information
18.2.2 WIRE interactivity
18.2.3 Position interactivity
18.2.4 Screen-related element remapping and object remapping for zooming
18.2.5 Closest loudspeaker playout
18.3 Screen-related element remapping
18.4 Screen-related adaptation and zooming for higher order ambisonics (HOA)
18.5 Object remapping for zooming
18.6 Determination of the closest loudspeaker
18.7 Determination of a list of loudspeakers for conditioned closest loudspeaker playback
18.8 Processing of scene displacement angles for channels and objects (CO)
18.9 Processing of scene displacement angles for scene-based content (HOA)
...







