Information technology - MPEG audio technologies - Part 2: Spatial Audio Object Coding (SAOC)

This document specifies the reference model of the spatial audio object coding (SAOC) technology that is capable of recreating, modifying and rendering a number of audio objects based on a smaller number of transmitted channels and additional parametric data.

Technologies de l'information — Technologies audio MPEG — Partie 2: Codage d'objet audio spatial (SAOC)

General Information

Status: Published
Publication Date: 11-Dec-2018
Current Stage: 9060 - Close of review
Completion Date: 04-Jun-2029

Relations

Effective Date
28-Jan-2017

Overview

ISO/IEC 23003-2:2018 - Spatial Audio Object Coding (SAOC) is part of the MPEG audio technologies series. The standard specifies a reference model for SAOC: a parametric, object-based approach that can recreate, modify and render multiple audio objects from a smaller number of transmitted channels plus compact side‑information. SAOC enables efficient delivery of object-based and immersive audio while preserving compatibility with legacy channel-based receivers.

Key topics and technical requirements

  • Reference model and architecture - Defines the SAOC transcoder/decoder behavior (reconstruction, modification, rendering) and interaction with MPEG Surround (MPS) decoders.
  • Parametric coding - Uses low‑rate parametric data (parameter bands, parameter time slots, parameter sets) to represent object spatial attributes instead of transmitting full multichannel streams.
  • Time/frequency processing - Includes hybrid filterbank concepts (QMF + hybrid subbands) and frame-based processing for time/frequency transforms.
  • Profiles and modes - Specifies profiles and levels (e.g., baseline, low‑delay, SAOC‑DE, low‑power) and corresponding processing modes and constraints.
  • Transcoding and rendering - SAOC can act as a transcoder to MPS for multichannel setups (e.g., 5.1) or as a decoder/renderer for binaural or stereo outputs.
  • Stream syntax and transport - Defines SAOC payloads, side‑information transport (including PCM channels and MPEG environments), and rendering information description files.
  • Inter-stream merging - Parametric streams can be merged (MCU‑like functionality) to combine multiple SAOC sources at parameter level.
  • Conformance and reference software - Includes conformance testing requirements and reference software structure to support implementation and interoperability.
  • Ancillary features - Effects processing, MCU combiner, low‑power and low‑delay modes are described; normative tables and annexes provide implementation guidance.

Applications and practical value

  • Efficient delivery of immersive and object-based audio in streaming, broadcast and conferencing.
  • Enabling device-adaptive rendering: same transmitted bitstream supports legacy stereo/mono devices and advanced multichannel or binaural renderers.
  • Interactive or multi-source scenarios (e.g., multi-party conferencing, multi-stream mixing) where parametric merging and real‑time modification are needed.
  • Low‑delay and low‑power use cases for mobile and real‑time communications.

Who should use this standard

  • Audio codec and player developers implementing MPEG audio technologies.
  • Broadcast, streaming and conferencing platform engineers seeking object-based spatial audio.
  • Manufacturers of home-theatre, VR/AR, and mobile audio devices requiring scalable, compatible spatial rendering.
  • Standardization and interoperability test teams using the conformance and reference software artifacts.

Related standards

  • ISO/IEC 23003-1:2007 (MPEG Surround / MPS) - closely related for multichannel rendering and integration with SAOC.

Keywords: ISO/IEC 23003-2:2018, SAOC, Spatial Audio Object Coding, MPEG audio technologies, object-based audio, parametric coding, MPS, spatial audio.

Standard

ISO/IEC 23003-2:2018 - Information technology — MPEG audio technologies — Part 2: Spatial Audio Object Coding (SAOC) Released:12/12/2018

English language
172 pages

Frequently Asked Questions

ISO/IEC 23003-2:2018 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - MPEG audio technologies - Part 2: Spatial Audio Object Coding (SAOC)". This standard covers: This document specifies the reference model of the spatial audio object coding (SAOC) technology that is capable of recreating, modifying and rendering a number of audio objects based on a smaller number of transmitted channels and additional parametric data.

ISO/IEC 23003-2:2018 is classified under the following ICS (International Classification for Standards) categories: 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 23003-2:2018 has the following relationships with other standards: it is linked to ISO/IEC 23003-2:2010, ISO/IEC 23003-2:2010/Cor 1:2012, ISO/IEC 23003-2:2010/Amd 1:2015, ISO/IEC 23003-2:2010/Amd 2:2015, ISO/IEC 23003-2:2010/Amd 3:2015, ISO/IEC 23003-2:2010/Amd 4:2016 and ISO/IEC 23003-2:2010/Amd 5:2016. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

Standards Content (Sample)


INTERNATIONAL STANDARD ISO/IEC 23003-2
Second edition, 2018-12
Information technology — MPEG
audio technologies —
Part 2:
Spatial Audio Object Coding (SAOC)
Technologies de l'information — Technologies audio MPEG —
Partie 2: Codage d'objet audio spatial (SAOC)
© ISO/IEC 2018
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Notations and abbreviated terms
4.1 Notation
4.2 Operations
4.3 Constants
4.4 Variables
4.5 Abbreviated terms
5 SAOC overview
5.1 General
5.2 Basic structure of the SAOC transcoder/decoder
5.3 Tools and functionality
5.4 Delay and synchronization
5.5 SAOC Profiles and levels
6 Syntax
6.1 Payloads for SAOC
6.2 Definition
7 SAOC processing
7.1 Compressed data stream decoding and dequantization of SAOC data
7.2 Compressed data stream encoding and quantization of MPS data
7.3 Time/frequency transforms
7.4 Signals and parameters
7.5 SAOC transcoding/decoding modes for baseline and LD profiles
7.6 EAO processing for baseline and LD profiles
7.7 SAOC-DE profile decoding modes
7.8 DCU processing
7.9 Modification range control for SAOC-DE processing modes
7.10 MBO processing
7.11 MCU Combiner
7.12 Effects
7.13 Low power SAOC processing
7.14 Low delay SAOC processing
8 Transport of SAOC side information
8.1 Overview
8.2 Transport and signalling in an MPEG environment
8.3 Transport of SAOC data over PCM channels
9 Transport of predefined rendering information
9.1 General
9.2 Rendering information description file format
10 Conformance testing
10.1 General
10.2 Terms and definitions
10.3 SAOC conformance testing
10.4 Bitstreams
10.5 SAOC decoder/transcoder
11 Reference software
11.1 Reference software structure
Annex A (normative) Tables
Annex B (normative) Low delay MPEG surround
Annex C (informative) Effects processing
Annex D (informative) Encoder
Annex E (informative) Guidelines for rendering matrix specification
Annex F (informative) MCU combiner
Annex G (informative) Reference software


Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent
declarations received (see http://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see:
www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
This second edition cancels and replaces the first edition (ISO/IEC 23003-2:2010), which has been
technically revised. It also incorporates the Amendments ISO/IEC 23003-2:2010/Amd 1:2015,
ISO/IEC 23003-2:2010/Amd 2:2015, ISO/IEC 23003-2:2010/Amd 3:2015, ISO/IEC 23003-2:2010/
Amd 4:2016 and ISO/IEC 23003-2:2010/Amd 5:2016 and the Technical Corrigenda
ISO/IEC 23003-2:2010/Cor 1:2012 and ISO/IEC 23003-2:2010/Cor 2:2014.
The main changes compared to the previous edition are as follows:
— clarifications on SAOC-DE profile description;
— corrections to SAOC-DE profile specification;
— corrections to SAOC-DE profile;
— corrections to MPEG SAOC IS text;
— corrections to the low power mode.
A list of all parts in the ISO/IEC 23003 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
In the preferred modes of operating the SAOC system, the transmitted signal can be either mono, stereo
or 3-channel. The audio objects can be represented by a mono, stereo, or 3-channel signal or have the
MPEG surround (MPS) multi-channel background object (MBO) format. The additional parametric data
exhibits a significantly lower data rate than required for transmitting all objects individually, making
the coding very efficient. At the same time, this ensures compatibility of the transmitted signal with
legacy devices.
When a multi-channel rendering setup (e.g. a 5.1 loudspeaker setup) is required, the SAOC system acts
as a transcoder, converting the additional parametric data to MPS parameters, and interfaces to the
MPS decoder that acts as rendering device. For certain rendering setups (e.g. a binaural or plain stereo
setup), the SAOC system behaves as a decoder, using its own rendering engine. Another key feature is
that the SAOC parametric data from different streams can be merged at parameter level to allow for the
combination of SAOC streams, similar to the functionality of a multi-point control unit (MCU).
The International Organization for Standardization (ISO) and International Electrotechnical
Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may
involve the use of a patent.
ISO and IEC take no position concerning the evidence, validity and scope of this patent right. The holder
of this patent right has assured ISO and IEC that he/she is willing to negotiate licences under reasonable
and non-discriminatory terms and conditions with applicants throughout the world. In this respect, the
statement of the holder of this patent right is registered with ISO and IEC. Information may be obtained
from:
Qualcomm Incorporated
6455 Lusk Blvd
US-San Diego, CA 92121-2779
Fraunhofer Institute for Integrated Circuits IIS
Leonrodstrasse 68
DE-80636 München
LG Electronics
16 Woomyeon-Dong Seocho-Gu
KR-Seoul 137-724
Koninklijke Philips Electronics N.V.
High Tech Campus 44
NL-5656 AE Eindhoven
Electronics and Telecommunications Research Institute
161 Gajeong-dong Yuseong-gu
KR-Daejeon 305-350
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights other than those identified above. ISO and IEC shall not be held responsible for identifying
any or all such patent rights.

Information technology — MPEG audio technologies — Part 2:
Spatial Audio Object Coding (SAOC)
1 Scope
This document specifies the reference model of the spatial audio object coding (SAOC) technology that
is capable of recreating, modifying and rendering a number of audio objects based on a smaller number
of transmitted channels and additional parametric data.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 23003-1:2007, Information technology — MPEG audio technologies — Part 1: MPEG Surround
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
audio object
input audio signal consisting of one, two or multiple channels, including multi-channel background
object (MBO)
3.2
frame
time segment (3.15) to which SAOC processing is applied according to the data conveyed in the
corresponding SAOCFrame() or SAOCDEFrame() syntax elements
3.3
hybrid filterbank
structure, consisting of a quadrature mirror filter (QMF) bank and oddly modulated Nyquist filter banks,
used to transform time domain signals into hybrid subband (3.5) samples
3.4
hybrid filtering
filtering step on a quadrature mirror filter (QMF) subband signal resulting in multiple hybrid subbands
(3.5)
Note 1 to entry: The resulting hybrid subbands can be non-consecutive in frequency.
3.5
hybrid subband
subband obtained after hybrid filtering (3.4) of a quadrature mirror filter (QMF) subband
Note 1 to entry: The hybrid subband can have the same time/frequency resolution as a QMF subband.

3.6
input channel
input audio channel corresponding to the channels of an audio object (3.1)
3.7
output channel
audio channel corresponding to a specific speaker
Note 1 to entry: Channel abbreviations and loudspeaker positions are given in Table 1.
3.8
parameter band
one or more hybrid subbands (3.5) applicable to one parameter
3.9
parameter time slot
specific time slot (3.16) for which the parameter is defined
3.10
parameter set
parameters associated with a specific parameter time slot (3.9)
3.11
parameter subset
parameters associated with a specific parameter time slot (3.9) and a specific one-to-two (OTT) box or
two-to-three (TTT) box
3.12
processing band
one or more hybrid subbands (3.5) defining the finest frequency resolution that could be controlled by
the parameters
3.13
QMF bank
bank of complex exponentially modulated filters
3.14
QMF subband
subband obtained after QMF filtering of a time-domain signal, without any additional hybrid filtering
stage
3.15
time segment
group of consecutive time slots (3.16)
3.16
time slot
finest resolution in time for spatial audio object (3.1) coding (SAOC) time borders
Note 1 to entry: One time slot equals one subsample in the hybrid quadrature mirror filter (QMF) domain.
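As a minimal sketch of how a value defined per parameter band (3.8) applies to the hybrid subbands (3.5) it covers, the following Python snippet expands per-band values to per-subband values. The function name and the mapping table `subband_to_band` are hypothetical and chosen only for illustration; the mapping actually used by SAOC is defined in the standard and is not reproduced here.

```python
def expand_to_subbands(band_values, subband_to_band):
    """Return one value per hybrid subband.

    band_values     -- one parameter value per parameter band
    subband_to_band -- for each hybrid subband, the index of the parameter
                       band that covers it (hypothetical example table)
    """
    for band in subband_to_band:
        if not 0 <= band < len(band_values):
            raise ValueError("subband refers to an unknown parameter band")
    return [band_values[band] for band in subband_to_band]


# Three parameter bands covering one, two and three hybrid subbands.
per_subband = expand_to_subbands([0.5, 1.0, 0.25], [0, 1, 1, 2, 2, 2])
print(per_subband)
```

A lookup table is used instead of band widths because, as noted in 3.4, hybrid subbands are not necessarily consecutive in frequency.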

4 Notations and abbreviated terms
4.1 Notation
The description of the SAOC system uses the following notations:
— vectors are indicated by bold lower-case names, e.g. vector;
— matrices (and vectors of vectors) are indicated by bold upper-case single letter names, e.g. M;
— variables are indicated by italic, e.g. variable;
— functions are indicated as func(x).
For equations (and flowcharts), normal mathematical (and pseudo-code) interpretation is assumed
with no rounding or truncation unless explicitly stated.
4.2 Operations
4.2.1 Scalar operations
$x^{*}$ is the complex conjugate of $x$.
$y = \log_{10}(x)$ is the base-10 logarithm of $x$.
$y = \min(\cdot,\cdot)$ is the minimum value in the argument list.
$y = \max(\cdot,\cdot)$ is the maximum value in the argument list.
$\exp(x)$ is the exponential function of $x$.
4.2.2 Vector and matrix operations
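$\mathbf{m} = \operatorname{diag}(\mathbf{M})$ is the main diagonal of matrix $\mathbf{M}$.
$\mathbf{y} = \operatorname{sort}(\mathbf{x})$ is equal to the sorted vector $\mathbf{x}$, where the elements of $\mathbf{x}$ are sorted in ascending order.
$y = \operatorname{trace}(\mathbf{M})$ is the sum of all diagonal elements of matrix $\mathbf{M}$.
$\mathbf{M}^{*}$ is the complex conjugate transpose of $\mathbf{M}$.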
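The operations listed in 4.2.2 can be illustrated with a short, dependency-free Python sketch. The function names below simply mirror the notation and are not taken from the reference software; matrices are represented as lists of rows.

```python
def diag(M):
    """Main diagonal of matrix M (a list of rows)."""
    return [M[i][i] for i in range(min(len(M), len(M[0])))]


def trace(M):
    """Sum of all diagonal elements of matrix M."""
    return sum(diag(M))


def conj_transpose(M):
    """Complex conjugate transpose of matrix M."""
    rows, cols = len(M), len(M[0])
    return [[complex(M[r][c]).conjugate() for r in range(rows)]
            for c in range(cols)]


M = [[1 + 2j, 3],
     [4, 5 - 1j]]

print(diag(M))               # main diagonal
print(sorted([3, 1, 2]))     # sort(): elements in ascending order
print(trace(M))              # sum of the diagonal
print(conj_transpose(M))     # rows and columns swapped, entries conjugated
```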
4.3 Constants
$\varepsilon$ is a constant to avoid division by and logarithm of zero, e.g. $\varepsilon = 10^{-9}$.
$\mathbf{0}_{A \times B}$ is a matrix of size $A \times B$ consisting of zeros.
$\mathbf{I}_{A}$ is an identity matrix of size $A \times A$.
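To show why the small constant ε is useful, here is a hedged Python sketch of a level ratio in decibels that stays finite when either energy is zero. The helper name `safe_ratio_db` and the use of a 10·log10 power ratio are assumptions made for illustration; only the example value of 10^-9 for ε comes from the text above.

```python
import math

EPSILON = 1e-9  # example value given in 4.3


def safe_ratio_db(numerator, denominator):
    """Energy ratio in dB, guarded against division by and logarithm of zero."""
    return 10.0 * math.log10((numerator + EPSILON) / (denominator + EPSILON))


print(safe_ratio_db(0.0, 0.0))  # both energies zero: no exception is raised
print(safe_ratio_db(0.0, 1.0))  # silent numerator: large negative, but finite
```

Without the guard, the first call would divide by zero and the second would take the logarithm of zero.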
4.4 Variables
$a_{i,y}^{l,m}$ is the virtual speaker transfer function, defined for binaural output channel, $i$, audio object, $y$, and all parameter time slots, $l$, and processing bands, $m$.
$\mathbf{D}$ is the downmix matrix.

$\mathbf{D}_{\mathrm{CLD}}$ is the three-dimensional matrix holding the dequantized and mapped CLD data for every OTT box, every parameter set and $M_{\mathrm{proc}}$ bands.
$\mathbf{D}_{\mathrm{ICC}}$ is the three-dimensional matrix holding the dequantized and mapped ICC data for every OTT or TTT box, every parameter set and $M_{\mathrm{proc}}$ bands.
$\mathbf{D}_{\mathrm{CPC\_1}}$, $\mathbf{D}_{\mathrm{CPC\_2}}$ are the three-dimensional matrices holding the dequantized and mapped first and second CPC data for every TTT box, every parameter set and $M_{\mathrm{proc}}$ bands.
$\mathbf{D}_{\mathrm{CLD\_1}}$, $\mathbf{D}_{\mathrm{CLD\_2}}$ are the three-dimensional matrices holding the dequantized and mapped first and second CLD data for every TTT box, every parameter set and $M_{\mathrm{proc}}$ bands.
$\mathbf{D}_{\mathrm{DCLD}}$ is the matrix holding the dequantized and mapped DCLD data for every input channel and every parameter set.
$\mathbf{D}_{\mathrm{DMG}}$ is the matrix holding the dequantized and mapped DMG data for every input channel and every parameter set. If DMG data contains information for more than one downmix channel, $\mathbf{D}_{\mathrm{DMG}}$ is a three-dimensional matrix holding the dequantized and mapped DMG data for every input channel, every downmix channel and every parameter set.
$\mathbf{D}_{\mathrm{IOC}}$ is the four-dimensional matrix holding the dequantized and mapped IOC data for every input channel pair, every parameter set and $M_{\mathrm{proc}}$ bands.
$\mathbf{D}_{\mathrm{NRG}}$ is the two-dimensional matrix holding the dequantized and mapped NRG data for the highest energy within every parameter set and $M_{\mathrm{proc}}$ bands.
$\mathbf{D}_{\mathrm{OLD}}$ is the three-dimensional matrix holding the dequantized and mapped OLD data for every input channel, every parameter set and $M_{\mathrm{proc}}$ bands.
$\mathbf{D}_{\mathrm{PDG}}$ is the three-dimensional matrix holding the dequantized and mapped PDG data for every downmix channel, every parameter set and $M_{\mathrm{proc}}$ bands.
$\mathbf{D}_{\mathrm{BGO}}$ is the downmix sub-matrix for BGOs.
$\mathbf{D}_{\mathrm{FGO}}$ is the downmix sub-matrix for FGOs.
$H_{i,\{L,R\}}^{m}$ is the HRTF parameter which represents the average level with respect to the left and right ear $\{L, R\}$ for the HRTF database index $i$, and all processing bands $m$.
idxXXX(.,.) is a three-dimensional matrix holding the Huffman and delta decoded indices. XXX can be any of OLD, IOC, NRG, DCLD, DMG, PDG.
$K$ is the number of hybrid subbands.
$L$ is the number of parameter sets.
$M$ is the number of downmix channels.
$M_{\mathrm{proc}}$ is the number of processing bands.
$M_{\mathrm{QMF}}$ is the number of QMF subbands depending on sampling frequency.

$\mathbf{M}^{l,m}$ is the OTN/TTN upmix matrix for the prediction mode of operation.
$\mathbf{M}_{\mathrm{Energy}}^{l,m}$ is the OTN/TTN upmix matrix for the energy mode of operation.
$\mathbf{M}_{1}^{n,k}$, $\mathbf{M}_{2}^{n,k}$ are the time and frequency variant pre-matrices, defined for all time slots, $n$, and all hybrid subbands, $k$.
$\mathbf{M}_{\mathrm{ren}}^{l,m}$ is the time and frequency variant rendering matrix, defined for all parameter time slots, $l$, and all processing bands, $m$.
$\mathbf{G}_{\mathrm{DE}}^{l,m}$ is the time and frequency variant parametric processing matrix, defined for all parameter time slots, $l$, and all processing bands, $m$.
$\mathbf{M}_{\mathrm{DE}}^{l,m}$ is the time and frequency variant residual processing matrix, defined for all parameter time slots, $l$, and all processing bands, $m$.
$m_{\mathrm{BGO}}$ is the modification gain for BGOs.
$m_{\mathrm{FGO}}$ is the modification gain for FGOs.
$m_{G}$ is the decoder limited modification gain.
$m_{G}^{\mathrm{input}}$ is the input modification gain.
$N$ is the number of SAOC input channels of audio objects.
$N_{\mathrm{FGO}}$ is the number of FGOs.
$N_{\mathrm{EAO}}$ is the number of EAO channels.
$N_{\mathrm{MPS}}$ is the number of MPS output channels.
$N_{\mathrm{HRTF}}$ is the number of different HRTFs in the HRTF database.
$N_{g}$ is the number of groups of downmix signals.
$N_{g}^{q}$ is the number of downmix signals assigned to group $\mathbf{g}_{q}$, defined for all group indices, $q$.
$\mathbf{g}_{q}$ is a vector with the indices of the downmix signals assigned to the same group, defined for all group indices, $q$.
$P$ is the frame length.
$\mathbf{W}_{\mathrm{ADG}}^{l,m}$ is the time and frequency variant matrix including ADGs, defined for all parameter time slots, $l$, and all processing bands, $m$.
$\mathbf{W}_{h}^{l,m}$ is the time and frequency variant sub-rendering matrix, defined for OTT box, $h$, (of the MPS “5-1-5” tree-structure), all parameter time slots, $l$, and all processing bands, $m$.

$\mathbf{W}_{\mathrm{PDG}}^{l,m}$ is the time and frequency variant matrix including PDGs, defined for all parameter time slots, $l$, and all processing bands, $m$.
$\mathbf{s}^{n,k}$ is a vector with the hybrid subband (encoder) input channels, defined for all time slots, $n$, and all hybrid subbands, $k$.
$\mathbf{x}^{n,k}$ is a vector with the hybrid subband (transcoder/decoder) input signals (downmix and residuals), defined for all time slots, $n$, and all hybrid subbands, $k$.
$\mathbf{y}^{n,k}$ is a vector with the (transcoder/decoder) output hybrid subband signals, which are fed into the hybrid synthesis filter banks, defined for all time slots, $n$, and all hybrid subbands, $k$.
$\varphi_{i}^{m}$ is the HRTFs parametric representation of the average phase difference, defined for the HRTF database index, $i$, and all processing bands, $m$.
4.5 Abbreviated terms
ADG arbitrary downmix gain
BGO background object
CLD channel level difference; describes the energy difference between two channels
CPC channel prediction coefficient; used for recreating three or more channels from two
channels
DCLD downmix channel level difference; describes the gain differences of objects contributing to
the left and right downmix channel in case of a stereo downmix
DCU distortion control unit
DE dialogue enhancement
DMG downmix gain; gains applied to each object before downmixing
EAO enhanced audio object
FGO foreground object
HRTF head related transfer function
ICC inter channel correlation; describes the correlation between two channels
IOC inter object correlation; describes the correlation between two channels of audio objects
LD low delay
MBO multi-channel background object
MCU multi-point control unit
MPS MPEG Surround

N/A not applicable
NRG absolute object energy; specifies the absolute energy of the object with the highest energy
for the corresponding parameter band
OLD object level difference; describes intensity differences between one object and the object
with the highest energy for the corresponding parameter band
OTN conceptual “One-To-N” unit that takes one channel as input and produces N channels as
output
OTT conceptual “One-To-Two” unit that takes one channel as input and produces two channels
as output
PDG post(processing) downmix gains; describes intensity differences between the encoder-
generated downmix and the post(processed) downmix for the corresponding parameter
band
QMF quadrature mirror filter
SAC spatial audio coding
SAOC spatial audio object coding
TTN conceptual "Two-To-N" unit that takes two channels as input and produces N channels as
output
TTT conceptual "Two-To-Three" unit that takes two channels as input and produces three
channels as output
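The relationship between the OLD and NRG entries above can be sketched in a few lines of Python for a single parameter band. This is only an illustration under stated assumptions: the function name is hypothetical, energies are treated as linear values, and the quantization and coding steps of the standard are not shown.

```python
EPSILON = 1e-9  # guard against division by zero, as in 4.3


def object_level_parameters(object_energies):
    """Return (nrg, olds) for one parameter band.

    nrg  -- absolute energy of the object with the highest energy
    olds -- each object's energy relative to that highest energy
            (linear ratio, an assumption of this sketch)
    """
    nrg = max(object_energies)
    olds = [energy / (nrg + EPSILON) for energy in object_energies]
    return nrg, olds


nrg, olds = object_level_parameters([4.0, 1.0, 2.0])
print(nrg)
print(olds)
```

In this sketch the strongest object always gets an OLD close to 1, and the absolute scale is carried separately by NRG.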
Table 1 — Channel abbreviations and loudspeaker positions
Channel abbreviation | Loudspeaker position
L | Left front
R | Right front
C | Center front
LFE | Low frequency enhancement
Ls | Left surround
Rs | Right surround
[Figure column: loudspeaker layout diagram showing L, C, R, LFE, Ls and Rs]
5 SAOC overview
5.1 General
Spatial audio object coding (SAOC) is a parametric multiple object coding technique. It is designed to
transmit a number of audio objects in an audio signal that comprises M channels. Together with this
backwards compatible downmix signal, object parameters are transmitted that allow for recreation and
manipulation of the original object signals. An SAOC encoder produces a downmix of the object signals
at its input and extracts these object parameters. The number of objects that can be handled is in
principle not limited.

The object parameters are quantized and coded efficiently into an SAOC bitstream.
The downmix signal can be compressed and transmitted without the need to update existing coders and
infrastructures. The object parameters, or SAOC side information, are transmitted in a low bitrate side
channel, e.g. the ancillary data portion of the downmix bitstream.
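The encoder-side downmix described above can be pictured as a matrix applied to the object samples. The following Python sketch assumes the simple relation x = D·s for one sample instant, using the downmix matrix D from 4.4; the matrix entries and sample values are made up for illustration, and the encoder in the standard (Annex D, informative) is not reproduced here.

```python
def downmix(D, s):
    """Apply an M x N downmix matrix D to a vector s of N object samples."""
    for row in D:
        if len(row) != len(s):
            raise ValueError("each row of D needs one gain per object")
    return [sum(gain * sample for gain, sample in zip(row, s)) for row in D]


# Hypothetical stereo downmix (M = 2) of three mono objects (N = 3):
# object 0 goes left, object 2 goes right, object 1 is split equally.
D = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
object_samples = [0.2, 0.4, -0.1]

downmix_samples = downmix(D, object_samples)
print(downmix_samples)
```

The resulting M channels are what a legacy device would play back directly, while the object parameters travel separately as side information.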
On the decoder side, the input objects are reconstructed and at the same time rendered to a certain
number of playback channels. The rendering information containing reproduction level and panning
position for each object is user supplied or can be extracted from the SAOC bitstream (e.g. preset
information). The rendering information can be time variant. Output scenarios can range from mono to
multi-channel (e.g. 5.1) and are independent of both the number of input objects and the number of
downmix channels. Binaural rendering of objects is possible including azimuth and elevation of virtual
object positions. An optional effects interface allows for advanced manipulation of object signals,
besides level and panning modification.
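As an illustration of rendering information made of a reproduction level and a panning position per object, the sketch below builds a 2 x N stereo rendering matrix in Python. The constant-power sine/cosine panning law, the pan range of -1 to +1 and all function names are assumptions of this example and are not taken from the standard (Annex E gives the standard's own guidelines for rendering matrix specification).

```python
import math


def stereo_gains(level, pan):
    """Left/right gains for one object.

    level -- linear reproduction level
    pan   -- position from -1.0 (full left) to +1.0 (full right);
             a constant-power law is assumed here
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie in [-1, 1]")
    angle = (pan + 1.0) * math.pi / 4.0
    return level * math.cos(angle), level * math.sin(angle)


def rendering_matrix(objects):
    """Build a 2 x N matrix from a list of (level, pan) pairs."""
    gains = [stereo_gains(level, pan) for level, pan in objects]
    return [[g[0] for g in gains], [g[1] for g in gains]]


matrix = rendering_matrix([(1.0, -1.0), (1.0, 0.0), (0.5, 1.0)])
for row in matrix:
    print(row)
```

Because the rendering information can be time variant, such a matrix would in practice be recomputed whenever the user changes a level or position.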
The objects themselves can be mono signals, stereophonic signals, as well as multi-channel signals (e.g.
5.1 channels). Typical downmix configurations are mono and stereo.
5.2 Basic structure of the SAOC transcoder/decoder
The SAOC transcoder/decoder module described below may act either as a stand-alone decoder or as a
transcoder from an SAOC to an MPS bitstream, depending on the intended output channel configuration.
Table 2 illustrates the differences between the two modes of operation.
Table 2 — Operation modes of the SAOC
Output signal configuration | # of output channels | # of input channels | SAOC module mode | SAOC module output | MPS decoder required
Mono/stereo/binaural/3-channel configuration | 1, 2 or 3 | 1, 2 or 3 | Decoder | PCM output | No
Multi-channel configuration | >2 | 1 or 2 | Transcoder | MPS bitstream, downmix signal | Yes
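The two rows of Table 2 can be read as a simple selection rule. The Python sketch below encodes that rule; the function name, the string labels for the output configurations and the dictionary keys are hypothetical and serve only to restate the table.

```python
DECODER_OUTPUTS = {"mono", "stereo", "binaural", "3-channel"}


def select_saoc_mode(output_configuration):
    """Return the SAOC module mode for a given output configuration (Table 2)."""
    if output_configuration in DECODER_OUTPUTS:
        return {
            "mode": "decoder",
            "module_output": "PCM output",
            "mps_decoder_required": False,
        }
    if output_configuration == "multi-channel":
        return {
            "mode": "transcoder",
            "module_output": "MPS bitstream, downmix signal",
            "mps_decoder_required": True,
        }
    raise ValueError("unknown output configuration: " + output_configuration)


print(select_saoc_mode("binaural"))
print(select_saoc_mode("multi-channel"))
```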
Figure 1 shows the basic structure of the SAOC transcoder/decoder architecture. The residual
processor extracts the EAOs from the incoming downmix using the residual information contained in
the SAOC bitstream. The downmix pre-processor processes the regular audio objects. The EAOs and
processed regular audio objects are combined to the output signal for the SAOC decoder mode or to the
MPS downmix signal for the SAOC transcoder mode. Detailed descriptions of these processing
blocks are given in the corresponding subclauses: 7.5 and 7.5.4 describe the SAOC
transcoder/decoder functionality, and 7.6 explains the handling of enhanced audio objects and
residual processing.

[Figure 1: block diagram. Block labels: downmix processor, residual processor, SAOC downmix pre-processor, parameter processor, SAOC parameter processor. Signal labels: downmix signal, EAOs, output / MPS downmix signal, SAOC bitstream, MPS bitstream, rendering matrix, HRTF parameters.]
Figure 1 — Overall structure of the SAOC transcoder/decoder architecture
Figure 2 (left) shows a block diagram of an SAOC transcoder unit. It consists of an SAOC parameter
processor and a downmix processor module. The SAOC parameter processor decodes the SAOC
bitstream and has furthermore a user interface from which it receives additional input in form of
generally time variant rendering information. It provides steering information for the downmix
processor. The SAOC transcoder outputs an MPS bitstream and downmix signal, as an input to the MPS
decoder. In case of a mono downmix, the downmix pre-processor leaves the downmix signal unchanged.
However, in case of a stereo downmix, it pre-processes the downmix signal to allow more
flexible object panning than is supported by the MPS rendering engine alone. In case of a
mono/stereo/binaural/multi-channel output configuration, the SAOC system works in decoder mode
and MPS decoding is omitted [see Figure 2 (right)]. Here, the downmix processing module directly
provides the output signal.
[Figure 2: block diagrams. SAOC transcoder (left): blocks downmix processor, SAOC parameter processor and MPS decoder; signals downmix, SAOC bitstream, rendering matrix, MPS downmix, MPS bitstream, output. SAOC decoder (right): blocks downmix processor and SAOC parameter processor; signals downmix, SAOC bitstream, rendering matrix/gain, HRTF parameters, output.]
Figure 2 — Block diagrams of the SAOC transcoder (left) and decoder (right) processing modes

5.3 Tools and functionality
5.3.1 General SAOC tools
5.3.1.1 Overview
The SAOC system incorporates a number of tools that allow for flexible complexity and/or quality trade-
off, as well as a diverse set of functionality. In the following subclauses, some key-features of SAOC are
briefly outlined.
5.3.1.2 Binaural decoding
The SAOC system can be operated in a binaural mode. This enables a multi-channel impression over
headphones by means of head related transfer function (HRTF) filtering.
5.3.1.3 Efficient multipoint control unit support
In order to use the SAOC concept for teleconferencing applications, a multipoint control unit (MCU)
functionality of combining the signals of several communication partners without decoding/re-
encoding the corresponding audio objects is provided. The MCU combines the input SAOC side
information streams into one common SAOC bitstream in a way that the parameters representing all
audio objects from the input bitstreams are included in the resulting output bitstream. These
calculations are performed in the parameter domain without the need to analyse the downmix signals
and, therefore, introduce no additional delay in the signal processing chain.
5.3.1.4 External downmix
The SAOC system is capable of handling not only encoder-generated downmixes but also
post-processed downmixes supplied to the encoder in addition to the input audio object signals. In this
case, post downmix gains (PDGs) are calculated in the encoder and conveyed as a part of the SAOC
bitstream. The difference between the downmix signals is compensated for at the SAOC decoder side.
5.3.1.5 Multichannel background object
The audio input to an SAOC encoder can contain a so-called multi-channel background object (MBO).
Generally, the MBO can be considered as a complex sound scene involving a large and often unknown
number of sound sources, for which no controllable rendering functionality is required. The MBO is
represented by a downmix of the MPS-encoded complex sound scene and the corresponding MPS
parameters.
5.3.1.6 Enhanced audio object processing
A special "Karaoke-type" application scenario requires a total suppression of specific objects, typically
the lead vocals, while keeping the perceptual quality of the background sound scene unharmed. High
sound quality is assured by the incorporation of residual coding enabling a better separation of the
background object and foreground objects. The enhanced audio object (EAO) processing mode supports
reproduction of EAOs only, of regular objects only, and of arbitrary mixtures of these object groups.
5.3.1.7 Distortion control unit
The distortion control unit is incorporated into the SAOC system in order to provide a flexible control
for users and audio content providers over the SAOC rendering functionality and audio output quality.
5.3.1.8 Predefined rendering information
The SAOC system is capable of starting playback with some initial predefined settings which can be
stored and/or transmitted in the SAOC bitstream. These settings can be dynamically updated. The SAOC
system allows instantaneous switching between them if more than one set of predefined settings is
available.
5.3.1.9 Effects interface
The SAOC effects interface operates on the downmix and therefore is part of the downmix processor of
the SAOC transcoder or decoder. The effects interface allows objects or linear combinations of objects to
be extracted from the downmix for effects processing. There are two types of effects processing
interfaces. The first type, referred to as the insert effects interface, allows effects processing of
individual objects in the downmix. The second type, referred to as the send effects interface, allows
effects processing of individual objects or linear combinations thereof.
5.3.2 High quality, low power and low delay
The SAOC decoder can be implemented in a high quality (HQ) version, a low power (LP) version and a
low delay (LD) version. The main differences are outlined in Table 3 and given in detail in 7.13 and
7.14.
Table 3 — Outline of difference between the HQ, LP and LD SAOC system
Tool or functionality | HQ system | LP system | LD system
Filterbank | Complex valued QMF | Partially complex valued QMF | LD QMF, no Nyquist filterbank
Aliasing reduction | Not applicable | Tool operates on the real-valued part of the frequency range | Not applicable
Residual coding | Supported over the entire frequency range | Only supported over the complex valued part of the frequency range | Not supported
Decorrelators | Supported | Not supported for stereo downmix | Supported

5.4 Delay and synchronization
5.4.1 Overview
The SAOC decoder introduces a delay when processing the time domain signal coming from a downmix
decoder. Depending on whether the SAOC module is working as a decoder or as a transcoder for a
multichannel renderer (MC-SAOC), i.e. MPS decoder, two different cases are to be taken into account.
For all the different cases described in this subclause, the transmission of the SAOC side information
with respect to the transmission of the coded downmix signal is done in such a manner that there is no
need to further delay the downmix signal before the SAOC processing. This means that synchronization
of the spatial data and downmix is achieved at the SAOC decoder/transcoder, following the temporal
relationships described in Clause 8.
5.4.2 High quality and low power processing
5.4.2.1 SAOC decoding mode
The SAOC decoder (mono, stereo, 3-channel or binaural up-mix modes) introduces a total delay of
1 281 time domain samples for the high quality (HQ) mode and 1 601 samples for the low power (LP)
mode. As shown in Figure 3, the analysis filterbank as outlined in 7.3 introduces a delay of 704 samples,
while the synthesis filterbank introduces 257 samples for HQ and 577 samples for LP. The analysis
filterbank processing delay consists of the QMF and hybrid processing delays of 320 and 384 time
domain samples, respectively. If no real to complex conversion is performed, it shall be replaced by a
delay line. The conversion, or the delay line that replaces it, adds 320 time domain samples on top of
the analysis processing delay for both HQ and LP decoders. The 257 samples of the synthesis filterbank
delay are introduced by the QMF synthesis filtering; the hybrid synthesis does not introduce further
delay. For the LP decoder, the complex to real conversion adds 320 time domain samples to the
synthesis processing delay.
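[Figure 3 image: SAOC decoder chain with per-stage delays in time domain samples: T/F QMF analysis 320, cos to exp 320, hybrid analysis 384, SAOC processing 0, hybrid synthesis 0, exp to cos 0 (HQ) or 320 (LP), F/T QMF synthesis 257. The SAOC spatial parameters pass through a matching delay.]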
Figure 3 — Delay for the different parts of the SAOC decoder
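The delay budget described in 5.4.2.1 can be checked with simple arithmetic. The following is a minimal sketch, not part of the standard: the per-stage values are taken from the text and Figure 3, while the constant and function names are hypothetical and chosen only for illustration.

```python
# Per-stage delays in time domain samples, as stated in 5.4.2.1.
# All identifiers below are illustrative names, not defined by the standard.
QMF_ANALYSIS = 320       # T/F QMF analysis
REAL_TO_COMPLEX = 320    # "cos to exp" conversion (or the delay line replacing it)
HYBRID_ANALYSIS = 384    # hybrid analysis filtering
HYBRID_SYNTHESIS = 0     # hybrid synthesis adds no delay
QMF_SYNTHESIS = 257      # F/T QMF synthesis
COMPLEX_TO_REAL = {"HQ": 0, "LP": 320}  # "exp to cos" conversion, LP only


def analysis_delay():
    """Analysis filterbank delay: QMF plus hybrid processing."""
    return QMF_ANALYSIS + HYBRID_ANALYSIS


def synthesis_delay(mode):
    """Synthesis filterbank delay for mode 'HQ' or 'LP'."""
    return HYBRID_SYNTHESIS + COMPLEX_TO_REAL[mode] + QMF_SYNTHESIS


def saoc_decoder_delay(mode):
    """Total SAOC decoder delay for mode 'HQ' or 'LP'."""
    return analysis_delay() + REAL_TO_COMPLEX + synthesis_delay(mode)


if __name__ == "__main__":
    for mode in ("HQ", "LP"):
        print(mode, saoc_decoder_delay(mode))
```

Summing the stages reproduces the totals quoted in the text: 704 + 320 + 257 = 1 281 samples for HQ and 704 + 320 + 577 = 1 601 samples for LP.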
5.4.2.2 SAOC transcoding mode
The MPS renderer introduces a delay when processing the downmix from the SAOC transcoder. Two
cases may be distinguished: mono and stereo downmix.
In case of a mono downmix, the signal from the downmix decoder is passed directly to the MPS decoder
as no further processing is applied to the downmix, as depicted in Figure 4. The MC-SAOC processing
delay shall be equal to the one given by the SAOC decoder, as the SAOC to MPS parameter processing
does not introduce any additional delay.
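[Figure 4 image: MC-SAOC decoder chain, in which the SAOC transcoder converts the SAOC spatial parameters for the MPS decoder. Per-stage delays in time domain samples: T/F QMF analysis 320, cos to exp 320, hybrid analysis 384, hybrid synthesis 0, exp to cos 0 (HQ) or 320 (LP), F/T QMF synthesis 257.]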
Figure 4 — Delay for the different parts of the MC-SAOC decoder with mono downmix signal
In case of a stereo downmix, the SAOC transcoder processes the core coder signal to adapt it to the
subsequent MPS decoding, as shown in Figure 5. The delay of the MPS renderer shall be added on top of
the SAOC processing delay. Both the SAOC transcoder and the MPS decoder introduce a delay of
1 281 samples for the HQ mode or of 1 601 samples for the LP mode. Their analysis/synthesis delay is
distributed as described above for the SAOC decoder.
If the two modules are not integrated, i.e. they interface via the time domain, the total processing
delay shall be the sum of the delays introduced by each module plus the buffering needed for
synchronization of the MPS parameters to the downmix.
The size of buffer B (spatial parameters buffer) is a multiple of the frame length. The downmix buffer A
is needed to synchronize the delayed bitstream and processed downmix. To achieve synchronization,
Formula (1) should be met; both buffer sizes are given in time domain samples:
B = N × frame length, such that B ≥ 1 281 (HQ) or B ≥ 1 601 (LP); N = 0, 1, 2, 3, …    (1)
A(HQ) = B − 1 281 or A(LP) = B − 1 601
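As an illustration of Formula (1), the sketch below computes buffer sizes for a given frame length. It rests on an assumption that the formula itself does not state: it picks the smallest N that satisfies the inequality, whereas Formula (1) only requires B to be a multiple of the frame length that is at least as large as the delay. The function name and the example frame lengths are hypothetical.

```python
# SAOC processing delay in time domain samples, per mode (from the text).
SAOC_DELAY = {"HQ": 1281, "LP": 1601}


def buffer_sizes(frame_length, mode):
    """Return (N, B, A) for Formula (1).

    Assumption: the smallest N with N * frame_length >= delay is chosen;
    the formula would also admit any larger N.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    delay = SAOC_DELAY[mode]
    n = -(-delay // frame_length)  # integer ceiling division
    b = n * frame_length           # spatial parameters buffer B
    a = b - delay                  # downmix buffer A
    return n, b, a


if __name__ == "__main__":
    # Frame lengths below are example values only.
    for frame_length in (512, 1024, 2048):
        for mode in ("HQ", "LP"):
            print(frame_length, mode, buffer_sizes(frame_length, mode))
```

With an example frame length of 2 048 samples, a single frame already covers the delay in both modes, so B = 2 048 and the downmix buffer A is 767 samples for HQ and 447 samples for LP.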
Given an interface via the hybrid QMF domain, the overall processing delay of the multi-channel
rendering is equal to the delay of the SAOC decoding mode.
[Figure 5 image: SAOC transcoder chain (T/F QMF analysis 320, cos to exp 320, hybrid analysis 384, SAOC processing 0, hybrid synthesis 0, exp to cos 0 for HQ or 320 for LP, F/T QMF synthesis 257) followed by the MPS decoder chain with the same per-stage delays, with spatial parameters buffer B and downmix buffer A between them.]
Time domain i
...
