Information technology - MPEG audio technologies - Part 2: Spatial Audio Object Coding (SAOC)

ISO/IEC 23003-2:2010 specifies the reference model of MPEG Spatial Audio Object Coding (SAOC): an efficient parametric coding technology designed to encode, transmit, and interactively render multiple audio objects for playback over various channel configurations (mono, stereo, 5.1, headphones/binaural). Rather than coding the individual audio input signals discretely, MPEG SAOC captures the perceptually relevant properties of the audio signals in a compact set of parameters that is used to synthesize a flexibly rendered audio scene from a transmitted downmix signal.

MPEG SAOC extends MPEG Surround in a way that provides significant additional functionality to users: it allows the user on the decoding side to interactively control the multi-channel rendering of each individual audio object on different kinds of sound reproduction setups. In addition, MPEG SAOC inherits many advantages of MPEG Surround technology, such as backward-compatible transmission of complex multi-object audio content at bitrates not much higher than those required for its mono or stereo downmix. MPEG SAOC processing reuses the multi-channel rendering functionality of MPEG Surround in a computationally efficient manner. SAOC technology can therefore be used directly to extend MPEG Surround and to upgrade existing distribution infrastructures for stereo or mono audio content (teleconferencing systems, music downloads, Internet streaming, etc.) towards the delivery of such audio content while retaining full compatibility with existing receivers. Rendering can be interactively controlled by the end user and is independent of the playback setup.

Key features of MPEG SAOC are:
- interactive rendering of audio objects on the decoder/receiver side;
- a transmitted SAOC bit stream that is independent of the loudspeaker (or headphone) configuration;
- a low-power processing mode (e.g. for applications on portable devices);
- a low-delay processing mode (e.g. for communication applications);
- a flexibly selectable bitrate overhead, allowing scalability from low-bitrate applications such as Internet streaming to high-quality applications such as custom remixing of music;
- applicability to audio coded with any coding scheme;
- backward compatibility: the default downmix is always available for legacy playback devices.

Technologies de l'information — Technologies audio MPEG — Partie 2: Codage d'objet audio spatial (SAOC)

General Information

Status: Withdrawn
Publication Date: 05-Oct-2010
Withdrawal Date: 05-Oct-2010
Current Stage: 9599 - Withdrawal of International Standard
Start Date: 12-Dec-2018
Completion Date: 30-Oct-2025

Standard

ISO/IEC 23003-2:2010 — Information technology — MPEG audio technologies — Part 2: Spatial Audio Object Coding (SAOC)

English language
130 pages

Frequently Asked Questions

ISO/IEC 23003-2:2010 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology — MPEG audio technologies — Part 2: Spatial Audio Object Coding (SAOC)". Its scope is summarized in the abstract above.


ISO/IEC 23003-2:2010 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 23003-2:2010 has the following relationships with other standards: it is linked to ISO/IEC 23003-2:2010/Amd 1:2015, ISO/IEC 23003-2:2010/Amd 2:2015, ISO/IEC 23003-2:2010/Amd 3:2015, ISO/IEC 23003-2:2010/Amd 4:2016, ISO/IEC 23003-2:2010/Amd 5:2016, ISO/IEC 23003-2:2010/Cor 1:2012, and to the subsequent edition ISO/IEC 23003-2:2018. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

You can purchase ISO/IEC 23003-2:2010 directly from iTeh Standards. The document is available in PDF format and is delivered instantly after payment. Add the standard to your cart and complete the secure checkout process. iTeh Standards is an authorized distributor of ISO standards.

Standards Content (Sample)


INTERNATIONAL STANDARD ISO/IEC 23003-2
First edition
2010-10-01

Information technology — MPEG audio technologies —
Part 2: Spatial Audio Object Coding (SAOC)

Technologies de l'information — Technologies audio MPEG —
Partie 2: Codage d'objet audio spatial (SAOC)

Reference number: ISO/IEC 23003-2:2010(E)
© ISO/IEC 2010
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

©  ISO/IEC 2010
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Symbols, notation and abbreviated terms
4.1 Notation
4.2 Operations
4.3 Constants
4.4 Variables
4.5 Abbreviated terms
5 SAOC overview
5.1 Introduction
5.2 Basic structure of the SAOC transcoder/decoder
5.3 Tools and functionality
5.4 Delay and synchronization
5.5 SAOC Profiles and Levels
6 Syntax
6.1 Payloads for SAOC
6.2 Definition
7 SAOC processing
7.1 Compressed data stream decoding and dequantization of SAOC data
7.2 Compressed data stream encoding and quantization of MPS data
7.3 Time/frequency transforms
7.4 Post(processing) downmix compensation
7.5 Signals and parameters
7.6 Transcoding modes
7.7 Decoding modes
7.8 EAO processing
7.9 DCU processing
7.10 MBO processing
7.11 MCU Combiner
7.12 Effects
7.13 Low Power SAOC processing
7.14 Low Delay SAOC processing
8 Transport of SAOC side information
8.1 Overview
8.2 Transport and signalling in an MPEG environment
8.3 Transport of SAOC data over PCM channels
9 Transport of predefined rendering information
9.1 Introduction
9.2 Rendering information description file format
Annex A (normative) Tables
Annex B (normative) Low Delay MPEG Surround
Annex C (informative) Effects processing
Annex D (informative) Encoder
Annex E (informative) Guidelines for rendering matrix specification
Annex F (informative) MCU Combiner
Annex G (informative) Patent statement
Bibliography


Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
ISO/IEC 23003-2 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
ISO/IEC 23003 consists of the following parts, under the general title Information technology — MPEG audio
technologies:
— Part 1: MPEG Surround
— Part 2: Spatial Audio Object Coding (SAOC)


Introduction
The International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC)
draw attention to the fact that it is claimed that compliance with this document may involve the use of patents.
ISO and IEC take no position concerning the evidence, validity and scope of these patent rights.
The holders of these patent rights have assured ISO and IEC that they are willing to negotiate licences under
reasonable and non-discriminatory terms and conditions with applicants throughout the world. In this respect,
the statements of the holders of these patent rights are registered with ISO and IEC. Information may be
obtained from the companies listed in Annex G.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights other than those identified in Annex G. ISO and IEC shall not be held responsible for identifying any or
all such patent rights.

INTERNATIONAL STANDARD ISO/IEC 23003-2:2010(E)

Information technology — MPEG audio technologies —
Part 2:
Spatial Audio Object Coding (SAOC)
1 Scope
This part of ISO/IEC 23003 specifies the reference model of the Spatial Audio Object Coding (SAOC)
technology that is capable of recreating, modifying and rendering a number of audio objects based on a
smaller number of transmitted channels and additional parametric data. In the preferred modes of operating
the SAOC system, the transmitted signal can be either mono or stereo. The audio objects can be represented
by mono or stereo signals, or can have the MPEG Surround (MPS) Multi-channel Background Object (MBO)
format. The additional parametric data exhibits a significantly lower data rate than required for transmitting all
objects individually, making the coding very efficient. At the same time this ensures compatibility of the
transmitted signal with legacy devices.
When a multi-channel rendering setup (e.g. a 5.1 loudspeaker setup) is required, the SAOC system acts as a
transcoder, converting the additional parametric data to MPS parameters, and interfaces to the MPS decoder
that acts as rendering device. For certain rendering setups (e.g. a binaural or plain stereo setup), the SAOC
system behaves as a decoder, using its own rendering engine. Another key feature is that the SAOC
parametric data from different streams can be merged at parameter level to allow for the combination of
SAOC streams, similar to the functionality of a Multi-point Control Unit (MCU).
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO/IEC 13818-7:2006, Information technology — Generic coding of moving pictures and associated audio
information — Part 7: Advanced Audio Coding (AAC)
ISO/IEC 14496-3:2009, Information technology — Coding of audio-visual objects — Part 3: Audio
ISO/IEC 23003-1:2007, Information technology — MPEG audio technologies — Part 1: MPEG Surround
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
audio object
input audio signal consisting of one, two or multiple channels, including Multi-channel Background Object
(MBO)
3.2
frame
time segment to which SAOC processing is applied according to the data conveyed in the corresponding
SAOCFrame() syntax element

3.3
hybrid filterbank
hybrid filter bank structure, consisting of a quadrature mirror filter (QMF) bank and oddly modulated Nyquist
filter banks, used to transform time domain signals into hybrid subband samples
3.4
hybrid filtering
filtering step on a quadrature mirror filter (QMF) subband signal resulting in multiple hybrid subbands
NOTE The resulting hybrid subbands can be non-consecutive in frequency.
3.5
hybrid subband
subband obtained after hybrid filtering of a quadrature mirror filter (QMF) subband
NOTE The hybrid subband can have the same time/frequency resolution as a QMF subband.
3.6
input channel
input audio channel corresponding to the channels of an audio object
3.7
output channel
audio channel corresponding to a specific speaker
NOTE Channel abbreviations and loudspeaker positions are given in Table 1.
3.8
parameter band
one or more hybrid subbands applicable to one parameter
3.9
parameter time slot
specific time slot for which the parameter is defined
3.10
parameter set
parameters associated with a specific parameter time slot
3.11
parameter subset
parameters associated with a specific parameter time slot and a specific One-To-Two (OTT) box or
Two-To-Three (TTT) box
3.12
processing band
one or more hybrid subbands defining the finest frequency resolution that could be controlled by the
parameters
3.13
QMF bank
bank of complex exponentially modulated filters
3.14
QMF subband
subband obtained after QMF filtering of a time-domain signal, without any additional hybrid filtering stage
3.15
time segment
group of consecutive time slots

3.16
time slot
finest resolution in time for spatial audio coding (SAC) time borders
NOTE One time slot equals one subsample in the hybrid quadrature mirror filter (QMF) domain.
4 Symbols, notation and abbreviated terms
4.1 Notation
The description of the SAOC system uses the following notation:
• Vectors are indicated by bold lower-case names, e.g. vector.
• Matrices (and vectors of vectors) are indicated by bold upper-case single letter names, e.g. M.
• Variables are indicated by italic, e.g. variable.
• Functions are indicated as func(x).
For equations (and flowcharts), normal mathematical (and pseudo-code) interpretation is assumed with no
rounding or truncation unless explicitly stated.
4.2 Operations
4.2.1 Scalar operations
X* is the complex conjugate of X.
y = log(x) is the base-10 logarithm of x.
y = min(…) is the minimum value in the argument list.
y = max(…) is the maximum value in the argument list.
4.2.2 Vector and matrix operations
m = diag(M) is the diagonal of matrix M.
y = sort(x) is equal to the sorted vector x, where the elements of x are sorted in ascending order.
y = trace(M) is the sum of all diagonal elements of matrix M.
4.3 Constants
ε is an additive constant to avoid division by zero, e.g. ε = 10^{-9}.
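For readers implementing these operations, the following is a small non-normative sketch in Python/NumPy; the names EPS, diag_vec, sort_asc, trace_sum and safe_ratio are illustrative and not part of the standard.

```python
import numpy as np

# Additive constant used to avoid division by zero (clause 4.3).
EPS = 1e-9

def diag_vec(M: np.ndarray) -> np.ndarray:
    """m = diag(M): vector containing the diagonal of matrix M."""
    return np.diag(M)

def sort_asc(x: np.ndarray) -> np.ndarray:
    """y = sort(x): elements of x sorted in ascending order."""
    return np.sort(x)

def trace_sum(M: np.ndarray) -> float:
    """y = trace(M): sum of all diagonal elements of matrix M."""
    return float(np.trace(M))

def safe_ratio(num: float, den: float) -> float:
    """Example use of the additive constant EPS to avoid division by zero."""
    return num / (den + EPS)
```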
4.4 Variables
a_{i,y}^{l,m} is the virtual speaker transfer function, defined for binaural output channel i, audio object y and all parameter time slots l and processing bands m.
D is the downmix matrix.
D_{CLD} is the three-dimensional matrix holding the dequantized and mapped CLD data for every OTT box, every parameter set, and M_{proc} bands.
D_{ICC} is the three-dimensional matrix holding the dequantized and mapped ICC data for every OTT or TTT box, every parameter set, and M_{proc} bands.
D_{CPC_1}, D_{CPC_2} are the three-dimensional matrices holding the dequantized and mapped first and second CPC data for every TTT box, every parameter set, and M_{proc} bands.
D_{CLD_1}, D_{CLD_2} are the three-dimensional matrices holding the dequantized and mapped first and second CLD data for every TTT box, every parameter set, and M_{proc} bands.
D_{DCLD} is the two-dimensional matrix holding the dequantized and mapped DCLD data for every input channel and every parameter set.
D_{DMG} is the two-dimensional matrix holding the dequantized and mapped DMG data for every input channel and every parameter set.
D_{IOC} is the four-dimensional matrix holding the dequantized and mapped IOC data for every input channel pair, every parameter set, and M_{proc} bands.
D_{NRG} is the two-dimensional matrix holding the dequantized and mapped NRG data for the highest energy within every parameter set and M_{proc} bands.
D_{OLD} is the three-dimensional matrix holding the dequantized and mapped OLD data for every input channel, every parameter set, and M_{proc} bands.
D_{PDG} is the three-dimensional matrix holding the dequantized and mapped PDG data for every downmix channel, every parameter set, and M_{proc} bands.
H_{i,{L,R}}^{m} is the HRTF parameter which represents the average level with respect to the left and right ear {L, R} for the HRTF database index i and all processing bands m.
idx_{XXX}(…) is a three-dimensional matrix holding the Huffman- and delta-decoded indices; XXX can be any of OLD, IOC, NRG, DCLD, DMG, PDG.
K is the number of hybrid subbands.
L is the number of parameter sets.
M is the number of downmix channels.
M_{proc} is the number of processing bands.
M_{QMF} is the number of QMF subbands, depending on the sampling frequency.
M^{l,m} is the OTN/TTN upmix matrix for the prediction mode of operation.
M_{Energy}^{l,m} is the OTN/TTN upmix matrix for the energy mode of operation.
M_1^{n,k}, M_2^{n,k} are the time and frequency variant pre-matrices, defined for all time slots n and all hybrid subbands k.
M_{ren}^{l,m} is the time and frequency variant rendering matrix, defined for all parameter time slots l and all processing bands m.
N is the number of SAOC input channels of audio objects.
N_{EAO} is the number of EAO channels.
N_{MPS} is the number of MPS output channels.
N_{HRTF} is the number of different HRTFs in the HRTF database.
P is the frame length.
W_{ADG}^{l,m} is the time and frequency variant matrix including ADGs, defined for all parameter time slots l and all processing bands m.
W_h^{l,m} is the time and frequency variant sub-rendering matrix, defined for OTT box h (of the MPS "5-1-5" tree structure), all parameter time slots l and all processing bands m.
W_{PDG}^{l,m} is the time and frequency variant matrix including PDGs, defined for all parameter time slots l and all processing bands m.
s^{n,k} is a vector with the hybrid subband (encoder) input channels, defined for all time slots n and all hybrid subbands k.
x^{n,k} is a vector with the hybrid subband (transcoder/decoder) input signals (downmix and residuals), defined for all time slots n and all hybrid subbands k.
y^{n,k} is a vector with the (transcoder/decoder) output hybrid subband signals, which are fed into the hybrid synthesis filter banks, defined for all time slots n and all hybrid subbands k.
φ_i^{m} is the parametric HRTF representation of the average phase difference, defined for the HRTF database index i and all processing bands m.
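Purely as an illustration of the dimensionalities listed above (not of the normative data structures), the parameter arrays could be held in a container such as the following; the class name, field names and axis ordering are assumptions made for this sketch.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SaocParameters:
    """Hypothetical container mirroring the dimensionality described in 4.4.

    N      : number of SAOC input channels of audio objects
    L      : number of parameter sets
    M_proc : number of processing bands
    (the axis ordering is illustrative, not taken from the standard)
    """
    OLD: np.ndarray    # shape (N, L, M_proc)    - object level differences
    IOC: np.ndarray    # shape (N, N, L, M_proc) - inter-object correlations
    NRG: np.ndarray    # shape (L, M_proc)       - absolute object energies
    DMG: np.ndarray    # shape (N, L)            - downmix gains
    DCLD: np.ndarray   # shape (N, L)            - downmix channel level differences

def allocate(N: int, L: int, M_proc: int) -> SaocParameters:
    """Allocate zero-filled parameter arrays with the shapes listed above."""
    return SaocParameters(
        OLD=np.zeros((N, L, M_proc)),
        IOC=np.zeros((N, N, L, M_proc)),
        NRG=np.zeros((L, M_proc)),
        DMG=np.zeros((N, L)),
        DCLD=np.zeros((N, L)),
    )
```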

4.5 Abbreviated terms
ADG Arbitrary Downmix Gain
CLD Channel Level Difference, describes the energy difference between two
channels
CPC Channel Prediction Coefficient, used for recreating three or more channels
from two channels
DCLD Downmix Channel Level Difference describes the gain differences of
objects contributing to the left and right downmix channel in case of a
stereo downmix
DCU Distortion Control Unit
DMG DownMix Gain, gains applied to each object before downmixing
EAO Enhanced Audio Object
HRTF Head Related Transfer Function
ICC Inter Channel Correlation, describes the correlation between two channels
IOC Inter Object Correlation, describes the correlation between two channels of
audio objects
LD Low Delay

MBO Multi-channel Background Object
MCU Multi-point Control Unit
MPS MPEG Surround
N/A Not Applicable
NRG absolute object eNeRGy, specifies the absolute energy of the object with
the highest energy for the corresponding frequency band
OLD Object Level Difference, describes intensity differences between one
object and the object with the highest energy for the corresponding
frequency band
OTN conceptual “One-To-N” unit that takes one channel as input and produces
N channels as output
OTT conceptual “One-To-Two” unit that takes one channel as input and
produces two channels as output
PDG Post(processing) Downmix Gains, describes intensity differences between
the encoder-generated downmix and the post(processed) downmix for the
corresponding frequency band
QMF Quadrature Mirror Filter
SAC Spatial Audio Coding
SAOC Spatial Audio Object Coding
TTN conceptual "Two-To-N" unit that takes two channels as input and produces
N channels as output
TTT conceptual "Two-To-Three" unit that takes two channels as input and
produces three channels as output
Table 1 – Channel abbreviations and loudspeaker positions
Channel abbreviation: Loudspeaker position
L: Left Front
R: Right Front
C: Center Front
LFE: Low Frequency Enhancement
Ls: Left Surround
Rs: Right Surround

5 SAOC overview
5.1 Introduction
Spatial Audio Object Coding (SAOC) is a parametric multiple object coding technique. It is designed to
transmit a number of audio objects in an audio signal that comprises M channels. Together with this
backwards compatible downmix signal, object parameters are transmitted that allow for recreation and
manipulation of the original object signals. An SAOC encoder produces a downmix of the object signals at its
input and extracts these object parameters. The number of objects that can be handled is in principle not
limited.
The object parameters are quantized and coded efficiently into an SAOC bitstream.
The downmix signal can be compressed and transmitted without the need to update existing coders and
infrastructures. The object parameters, or SAOC side information, are transmitted in a low bitrate side channel,
e.g. the ancillary data portion of the downmix bitstream.
On the decoder side, the input objects are reconstructed and at the same time rendered to a certain number
of playback channels. The rendering information containing reproduction level and panning position for each
object is user supplied or can be extracted from the SAOC bitstream (e.g. preset information). The rendering
information can be time variant. Output scenarios can range from mono to multi-channel (e.g. 5.1) and are
independent of both the number of input objects and the number of downmix channels. Binaural rendering
of objects is possible including azimuth and elevation of virtual object positions. An optional effects interface
allows for advanced manipulation of object signals, besides level and panning modification.
The objects themselves can be mono signals, stereophonic signals, as well as multi-channel signals (e.g. 5.1
channels). Typical downmix configurations are mono and stereo.
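As a non-normative illustration of such rendering information, the sketch below builds a rendering matrix for mono objects panned onto a stereo output; the constant-power panning law and all names are assumptions made for the example (Annex E gives the actual guidelines for rendering matrix specification).

```python
import numpy as np

def stereo_rendering_matrix(gains_db, pan_positions):
    """Build an illustrative rendering matrix of shape (2, N_objects).

    gains_db      : playback level per object in dB
    pan_positions : pan per object in [-1, 1]; -1 = full left, +1 = full right
    Constant-power panning is used here purely as an example.
    """
    gains = 10.0 ** (np.asarray(gains_db, dtype=float) / 20.0)
    theta = (np.asarray(pan_positions, dtype=float) + 1.0) * np.pi / 4.0
    left = gains * np.cos(theta)
    right = gains * np.sin(theta)
    return np.vstack([left, right])   # rows: output channels, columns: objects

# Example: three objects - vocals centred, guitar left, bass slightly right.
M_ren = stereo_rendering_matrix(gains_db=[0.0, -3.0, -6.0],
                                pan_positions=[0.0, -0.8, 0.3])
print(M_ren.shape)  # (2, 3)
```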
5.2 Basic structure of the SAOC transcoder/decoder
The SAOC transcoder/decoder module described below may act either as a stand-alone decoder or as a
transcoder from an SAOC to an MPS bitstream, depending on the intended output channel configuration. The
following table illustrates the differences between the two modes of operation:
Table 2 – Operation modes of the SAOC
Output signal configuration | # of output channels | SAOC module mode | SAOC module output | MPS decoder required
mono/stereo/binaural | 1 or 2 | Decoder | PCM output | No
multi-channel configuration | > 2 | Transcoder | MPS bitstream, downmix signal | Yes
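For illustration only, the mode selection implied by Table 2 can be expressed as a small Python helper; the function and field names are not taken from the standard.

```python
def saoc_operation_mode(num_output_channels: int) -> dict:
    """Return the SAOC module mode implied by Table 2 for a given output setup."""
    if num_output_channels <= 2:
        # mono, stereo or binaural output: SAOC acts as a decoder with PCM output
        return {"mode": "decoder", "output": "PCM", "mps_decoder_required": False}
    # multi-channel output (> 2 channels): SAOC acts as a transcoder to MPS
    return {"mode": "transcoder",
            "output": "MPS bitstream + downmix signal",
            "mps_decoder_required": True}

print(saoc_operation_mode(2))   # decoder mode
print(saoc_operation_mode(6))   # transcoder mode (e.g. 5.1 setup)
```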
Figure 1 shows the basic structure of the SAOC transcoder/decoder architecture. The residual processor
extracts the EAOs from the incoming downmix using the residual information contained in the SAOC bitstream.
The downmix pre-processor processes the regular audio objects. The EAOs and the processed regular audio objects are combined into the output signal for the SAOC decoder mode, or into the MPS downmix signal for the SAOC transcoder mode. Detailed descriptions of these processing blocks are given in the corresponding subclauses: 7.6 and 7.7 describe the SAOC transcoder/decoder functionality, and 7.8 explains the handling of enhanced audio objects and residual processing.

Figure 1 – Overall structure of the SAOC transcoder/decoder architecture
Figure 2 (left) shows a block diagram of an SAOC transcoder unit. It consists of an SAOC parameter
processor and a downmix processor module. The SAOC parameter processor decodes the SAOC bitstream
and has furthermore a user interface from which it receives additional input in form of generally time variant
rendering information. It provides steering information for the downmix processor. The SAOC transcoder
outputs an MPS bitstream and downmix signal, as an input to the MPS decoder. In case of a mono downmix,
the downmix pre-processor leaves the downmix signal unchanged. However, in case of a stereo downmix, it is
functional to pre-process the downmix signal to allow more flexible object panning than is supported by the
MPS rendering engine alone. In case of a mono/stereo/binaural output configuration the SAOC system works
in decoder mode and MPS decoding is omitted, see Figure 2 (right). Here the downmix processing module
directly provides the output signal.
Figure 2 – Block diagrams of the SAOC transcoder (left) and decoder (right) processing modes


5.3 Tools and functionality
5.3.1 General SAOC tools
5.3.1.1 Introduction
The SAOC system incorporates a number of tools that allow for flexible complexity and/or quality trade-off, as
well as a diverse set of functionality. In the following subclauses some key-features of SAOC are briefly
outlined.
5.3.1.2 Binaural decoding
The SAOC system can be operated in a binaural mode. This enables a multi-channel impression over
headphones by means of Head Related Transfer Function (HRTF) filtering.
5.3.1.3 Efficient multipoint control unit support
In order to use the SAOC concept for teleconferencing applications a Multipoint Control Unit (MCU)
functionality of combining the signals of several communication partners without decoding/re-encoding the
corresponding audio objects is provided. The MCU combines the input SAOC side information streams into
one common SAOC bitstream in a way that the parameters representing all audio objects from the input
bitstreams are included in the resulting output bitstream. These calculations are performed in the parameter
domain without the need to analyze the downmix signals and, therefore, introduce no additional delay in the
signal processing chain.
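Conceptually, this combination takes place entirely in the parameter domain. The following non-normative Python sketch illustrates the idea with invented data structures; it does not reflect the actual SAOCFrame() bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObjectParams:
    """Invented stand-in for the per-object SAOC side information."""
    name: str
    old: list          # object level differences per band (illustrative)

@dataclass
class SaocStream:
    """Invented stand-in for one SAOC side-information stream."""
    objects: List[ObjectParams] = field(default_factory=list)

def mcu_combine(streams: List[SaocStream]) -> SaocStream:
    """Merge several streams so the output carries the objects of all inputs.

    The real MCU operates on the SAOC parameter syntax; here we simply
    concatenate per-object parameters to illustrate that no audio
    decoding/re-encoding is involved.
    """
    combined = SaocStream()
    for stream in streams:
        combined.objects.extend(stream.objects)
    return combined
```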
5.3.1.4 External downmix
The SAOC system is capable of handling not only encoder-generated downmixes but also post(processed)
downmixes supplied to the encoder in addition to the input audio object signals. In this case, Post Downmix
Gains (PDGs) are calculated in the encoder and conveyed as a part of the SAOC bitstream. The difference of
the downmix signals is compensated for at the SAOC decoder side.
5.3.1.5 Multichannel background object
The audio input to a SAOC encoder can contain a so-called Multi-channel Background Object (MBO).
Generally, the MBO can be considered as a complex sound scene involving a large and often unknown
number of sound sources, for which no controllable rendering functionality is required. The MBO is
represented by a downmix of the MPS encoded complex sound scene and corresponding MPS parameters.
5.3.1.6 Enhanced audio object processing
A special "Karaoke-type" application scenario requires a total suppression of specific objects, typically the
lead vocals, while keeping the perceptual quality of the background sound scene unharmed. High sound
quality is assured by the incorporation of residual coding enabling a better separation of the background
object and foreground objects. The EAO processing mode supports exclusive reproduction of either the EAOs or the regular objects, as well as arbitrary mixtures of these object groups.
5.3.1.7 Distortion control unit
The distortion control unit is incorporated into the SAOC system in order to provide a flexible control for users
and audio content providers over the SAOC rendering functionality and audio output quality.
5.3.1.8 Predefined rendering information
The SAOC system is capable of starting playback with some initial predefined settings which can be stored
and/or transmitted in SAOC bitstream. These settings can be dynamically updated. The SAOC system allows
instantaneous switching between them if more than one set of predefined settings is available.

5.3.1.9 Effects interface
The SAOC effects interface operates on the downmix and therefore is part of the downmix processor of the
SAOC transcoder or decoder. The effects interface allows objects or linear combinations of objects to be
extracted from the downmix for effects processing. Two types of effects processing interfaces are supported.
The first type, referred to as the insert effects interface, allows effects processing of individual objects in the
downmix. The second type, referred to as the send effects interface, allows effect processing on individual
objects or linear combinations thereof.
5.3.2 High Quality, Low Power and Low Delay
The SAOC decoder can be implemented in a High Quality (HQ) version, a Low Power (LP) version, and a Low
Delay (LD) version. The main differences are outlined in Table 3 and described in detail in 7.13 and 7.14.
Table 3 – Outline of differences between the HQ, LP and LD SAOC systems
Filterbank: HQ: complex valued QMF; LP: partially complex valued QMF; LD: LD QMF, no Nyquist FB.
Aliasing reduction: HQ: not applicable; LP: tool operates on the real-valued part of the frequency range; LD: not applicable.
Residual coding: HQ: supported over the entire frequency range; LP: only supported over the complex valued part of the frequency range; LD: not supported.
Decorrelators: no decorrelator for stereo downmix.
5.4 Delay and synchronization
5.4.1 Overview
The SAOC decoder introduces a delay when processing the time domain signal coming from a downmix
decoder. Depending on whether the SAOC module is working as a decoder or as a transcoder for a
Multichannel (MC) renderer, i.e. MPS decoder, two different cases are to be taken into account.
For all the different cases described in this subclause, the transmission of the SAOC side information with
respect to the transmission of the coded downmix signal is done in such a manner that there is no need to
further delay the downmix signal before the SAOC processing. This means that synchronization of the spatial
data and downmix is achieved at the SAOC decoder/transcoder, following the temporal relationships
described in Clause 8.
5.4.2 High Quality and Low Power processing
5.4.2.1 SAOC Decoding mode
The SAOC decoder (mono, stereo or binaural upmix modes) introduces a total delay of 1281 time domain samples for the High Quality (HQ) mode and 1601 samples for the Low Power (LP) mode. As shown in Figure 3, the analysis filterbank as outlined in 7.3 introduces a delay of 704 samples, while the synthesis filterbank introduces 257 samples for HQ and 577 samples for LP. The analysis filterbank processing delay consists of the QMF and hybrid processing delays, 320 and 384 time domain samples respectively. If no real-to-complex conversion is performed, this delay shall be compensated for by an equivalent delay; this adds 320 time domain samples on top of the analysis processing delay for both HQ and LP decoders. The synthesis filterbank delay of 257 samples is introduced by the QMF synthesis filtering. The hybrid synthesis does
not introduce further delay. For the LP decoder the complex to real conversion adds 320 time domain samples
to the synthesis processing delay.

Figure 3 – Delay for the different parts of the SAOC decoder
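For orientation, the delay contributions quoted above can be summed to reproduce the stated totals; this is a non-normative arithmetic check only.

```python
# Non-normative check of the SAOC decoder delay figures quoted in 5.4.2.1
# (all values in time domain samples).
QMF_ANALYSIS = 320        # QMF analysis filterbank
HYBRID_ANALYSIS = 384     # hybrid (Nyquist) analysis filtering
COMPENSATION = 320        # delay compensation / real-to-complex conversion
QMF_SYNTHESIS = 257       # QMF synthesis filterbank
LP_COMPLEX_TO_REAL = 320  # extra complex-to-real conversion in the LP decoder

hq_total = QMF_ANALYSIS + HYBRID_ANALYSIS + COMPENSATION + QMF_SYNTHESIS
lp_total = hq_total + LP_COMPLEX_TO_REAL

assert hq_total == 1281   # High Quality decoding mode
assert lp_total == 1601   # Low Power decoding mode
```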
5.4.2.2 SAOC transcoding mode
The MPS renderer introduces a delay when processing the downmix from the SAOC transcoder. Two cases
may be distinguished, mono and stereo downmix.
In case of a mono downmix the signal from the downmix decoder is passed directly to the MPS decoder as no
further processing is applied to the downmix, depicted in Figure 4. The MC SAOC processing delay shall be
equal to the one given by the SAOC decoder, as the SAOC to MPS parameter processing does not introduce
any additional delay.
Figure 4 – Delay for the different parts of the MC-SAOC decoder with mono downmix signal
In case of a stereo downmix the SAOC transcoder processes the core coder signal to adapt it to the
subsequent MPS decoding, as shown in Figure 5. The delay of the MPS renderer shall be added on top of the
SAOC processing delay. Both the SAOC transcoder and the MPS decoder introduce a delay of 1281 samples
for the HQ mode or of 1601 samples for the LP mode. Their analysis/synthesis delay is distributed as
described above for the SAOC decoder.
If both modules are not integrated, i.e. they interface via the time domain, the total processing delay shall be
the sum of the delays introduced by each module plus buffering needed for synchronization of the MPS
parameters to the downmix.
The size of buffer B (spatial parameters buffer) is a multiple of the frame length. The downmix buffer A is
needed to synchronize the delayed bitstream and the processed downmix. To achieve synchronization, the
following equations must be met (both buffer sizes are given in time domain samples):
B = N × (frame length), such that B ≥ 1281 (HQ) or B ≥ 1601 (LP), with N = 0, 1, 2, 3, …
A(HQ) = B − 1281 or A(LP) = B − 1601
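These constraints can be turned into a small non-normative helper that picks the smallest admissible buffer sizes; the function name and the example frame length of 2048 samples are illustrative assumptions.

```python
import math

def sync_buffer_sizes(frame_length: int, mode: str = "HQ") -> tuple:
    """Return (B, A) in time domain samples for time-domain interfacing.

    B must be a multiple of the frame length and at least the processing
    delay (1281 samples for HQ, 1601 for LP); A = B - delay.
    """
    delay = 1281 if mode == "HQ" else 1601
    n = math.ceil(delay / frame_length)   # smallest N with N*frame_length >= delay
    B = n * frame_length
    A = B - delay
    return B, A

# Example with an (assumed) frame length of 2048 samples:
print(sync_buffer_sizes(2048, "HQ"))  # (2048, 767)
print(sync_buffer_sizes(2048, "LP"))  # (2048, 447)
```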
Given an interface via the hybrid QMF domain, the overall processing delay of the SAOC MC rendering is
equal to the delay of the SAOC decoding mode.
Figure 5 – Delay for the different parts of the MC-SAOC decoder with stereo downmix signal
5.4.2.3 Connection to an arbitrary core coder
If the (MC-)SAOC decoder is connected with an arbitrary downmix coder (including HE-AAC) via the time
domain, as shown in Figure 6, the additional delay introduced by the (MC-)SAOC decoder processing is
described in the previous subclauses.
Figure 6 – Delay when connecting (MC-)SAOC in the time-domain for an arbitrary core codec (including HE-AAC)

If the (MC-)SAOC decoder is directly connected with a High Efficiency AAC decoder via the QMF domain, as
shown in Figure 7, the delay shall be reduced as outlined in 4.5 of ISO/IEC 23003-1:2007 for an MPS decoder.
The only additional delay is introduced by the real-to-complex converter for an LP decoder or the delay
compensation for a HQ decoder. No hybrid analysis delay is introduced due to the fact that the look-ahead of
384 time domain samples is already available on the SBR tool of HE-AAC, as described in 8.A.3 of
ISO/IEC 14496-3:2009.
Figure 7 – Delay when connecting (MC-)SAOC with HE-AAC in the QMF domain
5.4.3 LD processing
5.4.3.1 SAOC Decoding
As outlined in 7.13, for LD SAOC processing the hybrid QMF filterbank is substituted by an LD QMF; in
addition, no LP mode is allowed and hence no complex conversion is to be performed. As a result, the
analysis and synthesis processing delays shall only depend on the LD QMF filtering, i.e. 160 time domain
samples for analysis and 96 for synthesis. This is shown in Figure 8.
Figure 8 – Delay for the different parts of the LD SAOC decoder
5.4.3.2 SAOC Transcoding
As for the non-LD MC decoders, the processing for the LD-SAOC MC rendering scenarios is performed differently depending on the number of downmix channels (mono/stereo). Due to the delay constraints of an LD-SAOC system, no time domain interface is allowed for the stereo processing case. Hence, both the mono and the stereo downmix cases can be described equally with respect to the system delay, matching the SAOC decoder delay distribution, as depicted in Figure 9 and Figure 10.

Figure 9 – Delay for the different parts of the LD-MC-SAOC decoder with mono downmix signal
Figure 10 – Delay for the different parts of the LD-MC-SAOC decoder with stereo downmix signal
5.4.3.3 LD-(MC-) SAOC connection to LD core codec
If the LD-(MC-)SAOC decoder is connected with an LD downmix coder (e.g. AAC-LD or AAC-ELD, with or
without SBR) via the time domain, as shown in Figure 11, the additional delay introduced by the LD-(MC-)SAOC
decoder processing is described above.
Figure 11 – Delay when connecting LD-(MC-)SAOC in the time-domain for a LD core codec (e.g. AAC-LD or AAC-ELD)

If the LD-(MC-) SAOC decoder is connected with a LD core coder via the frequency domain (e.g. AAC-ELD
with SBR), as shown in Figure 12, the additional delay introduced by the LD-(MC-) SAOC decoder processing
shall be reduced with respect to the description above, as no analysis is performed in the LD-(MC-) SAOC
decoder.
Figure 12 – Delay when connecting LD-(MC-) SAOC in the frequency-domain for a LD core codec (e.g. AAC-ELD with SBR)
5.5 SAOC Profiles and Levels
5.5.1 Introduction
This Subclause defines profiles and their levels for SAOC.
Complexity units are defined to give an approximation of the decoder complexity in terms of processing power
and RAM usage required for the SAOC decoding/transcoding process. The approxima
...

Questions, Comments and Discussion

Ask a question and the Technical Secretary will try to provide an answer. You can also use this section to discuss the standard.
