Speech and multimedia Transmission Quality (STQ); Speech quality performance in the presence of background noise; Part 1: Background noise simulation technique and background noise database

RES/STQ-308

Speech and multimedia Transmission Quality (STQ) - Speech quality performance in the presence of background noise - Part 1: Background noise simulation technique and background noise database

The quality of background noise transmission is an important factor that contributes significantly to the perceived overall quality of speech. Both existing and, even more so, newer generations of terminals, networks and system configurations, including broadband services, can be greatly improved through proper design that takes the presence of background noise into account. This document:
• describes a noise simulation environment that uses realistic background noise scenarios for laboratory use;
• contains a database including the relevant background noise samples for subjective and objective evaluation.
This document provides information on the recording techniques needed for background noise recordings and discusses the advantages and drawbacks of existing methods. It also describes the requirements for laboratory conditions, as well as the loudspeaker setup and the loudspeaker calibration and equalization procedures. The specified simulation environment can be used for the evaluation and optimization of terminals and of complex configurations including terminals, networks and other configurations. The main application areas should be: office, home and car environment.
The setup and database described in this document are applicable for:
• objective performance evaluation of terminals in different (simulated) background noise environments;
• speech processing evaluation using the pre-processed speech signal in the presence of background noise, recorded by a terminal;
• subjective evaluation of terminals by performing conversational tests, specific double talk tests or talking and listening tests in the presence of background noise;
• subjective evaluation in third-party listening tests by recording the speech samples of terminals in the presence of background noise.

General Information

Status
Not Published
Current Stage
12 - Citation in the OJ (auto-insert)
Due Date
23-May-2023
Completion Date
10-May-2023
Standard
ETSI ES 202 396-1 V1.9.1 (2023-03) - Speech and multimedia Transmission Quality (STQ); Speech quality performance in the presence of background noise; Part 1: Background noise simulation technique and background noise database
English language
62 pages
Standard
ETSI ES 202 396-1 V1.9.1 (2023-05) - Speech and multimedia Transmission Quality (STQ); Speech quality performance in the presence of background noise; Part 1: Background noise simulation technique and background noise database
English language
62 pages
Standardization document
ES 202 396-1 V1.9.1:2024 - BARVE
English language
62 pages

Standards Content (Sample)


Final draft ETSI ES 202 396-1 V1.9.1 (2023-03)

ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Speech quality performance
in the presence of background noise;
Part 1: Background noise simulation technique
and background noise database

Reference
RES/STQ-308
Keywords
noise, performance, quality, speech
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - APE 7112B
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° w061004871

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
If you find a security vulnerability in the present document, please report it through our
Coordinated Vulnerability Disclosure Program:
https://www.etsi.org/standards/coordinated-vulnerability-disclosure
Notice of disclaimer & limitation of liability
The information provided in the present deliverable is directed solely to professionals who have the appropriate degree of
experience to understand and interpret its content in accordance with generally accepted engineering or
other professional standard and applicable regulations.
No recommendation as to products and services or vendors is made or should be implied.
No representation or warranty is made that this deliverable is technically accurate or sufficient or conforms to any law
and/or governmental rule and/or regulation and further, no representation or warranty is made of merchantability or fitness
for any particular purpose or against infringement of intellectual property rights.
In no event shall ETSI be held liable for loss of profits or any other incidental or consequential damages.

Any software contained in this deliverable is provided "AS IS" with no warranties, express or implied, including but not
limited to, the warranties of merchantability, fitness for a particular purpose and non-infringement of intellectual property
rights and ETSI shall not be held liable in any event for any damages whatsoever (including, without limitation, damages
for loss of profits, business interruption, loss of information, or any other pecuniary loss) arising out of or related to the use
of or inability to use the software.
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2023.
All rights reserved.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definition of terms, symbols and abbreviations
3.1 Terms
3.2 Symbols
3.3 Abbreviations
4 Overview of existing methods for realistic sound reproduction
4.1 Introduction
4.2 Surround Sound Techniques
4.3 IOSONO®
4.4 Eidophonie
4.5 Four-loudspeaker arrangement for playback of binaurally recorded signals
4.6 NTT Background-Noise Database
4.7 General conclusions
5 Recording arrangement
5.1 Binaural recordings
5.2 Equalization procedure
6 Loudspeaker Setup for Background Noise Simulation
6.1 Test Room Requirements
6.2 Loudspeaker Positioning
6.3 Equalization and Calibration
6.3.1 Overview
6.3.2 Separate equalization for each of the four loudspeakers
6.3.3 Separate level adjustment for each loudspeaker
6.3.4 Equalization for the two left-hand and the two right-hand loudspeakers
6.3.5 Equalization and level adjustment for the subwoofer
6.3.6 Delay compensation
6.3.7 Overall equalization for all loudspeakers
6.3.8 Troubleshooting failed equalizations
6.3.9 Automated equalization and calibration procedure
6.4 Accuracy of the reproduction arrangement
6.4.0 Introduction
6.4.1 Comparison between original sound field and simulated sound field
6.4.2 Displacement of the test arrangement in the simulated sound field
6.4.3 Transmission of background noise: Comparison of terminal performance in the original sound field and the simulated sound field
6.5 Simulation of additional acoustic conditions
7 Background Noise Simulation in cars
7.1 General setup
7.2 Recording arrangement
7.2.0 Introduction
7.2.1 Recording setup with the terminal's microphone
7.2.2 Recording setup with a pair of cardioid microphones
7.3 Equalization and calibration with the terminal's microphone
7.3.1 Overview
7.3.2 Separate equalization for each of the four loudspeakers
7.3.3 Separate level adjustment for each loudspeaker
7.3.4 Equalization for the two front and the two rear loudspeakers
7.3.5 Equalization and level adjustment for the subwoofer
7.3.6 Delay adjustment
7.3.7 Overall equalization
7.3.8 Troubleshooting failed equalizations
7.4 Equalization and Calibration with a pair of cardioid microphones
7.4.1 Overview
7.4.2 Pair-wise equalization for the left-hand loudspeakers
7.4.3 Separate level adjustment for the left-hand loudspeakers
7.4.4 Pair-wise equalization and separate level adjustment for the right-hand loudspeakers
7.4.5 Equalization and level adjustment for the subwoofer
7.4.6 Delay compensation
7.4.7 Overall equalization
7.4.8 Troubleshooting failed equalizations
7.5 Accuracy of the reproduction arrangement
7.5.1 Comparison between original sound field and simulated sound field
7.5.2 Transmission of background noise: Comparison of terminal performance in the original sound field and the simulated sound field
7.6 Automated equalization and calibration procedure
8 Background Noise Database
8.0 Introduction
8.1 Binaural signals
8.2 Binaural signals identical to the background noise recordings provided in ETSI TS 103 224
8.3 Stereophonic signals
Annex A (informative): Comparison of Tests in Sending Direction and D-Values Conducted in Different Rooms
A.0 Introduction
A.1 Test Setup
A.2 Results of the Tests
A.2.0 Introduction
A.2.1 Sending Frequency Response Characteristics and SLR
A.2.2 D-Value with Pink Noise
A.2.3 D-Value with Cafeteria Noise
A.3 Conclusions
Annex B (informative): Graphs of test results in Annex A
History

Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The declarations
pertaining to these essential IPRs, if any, are publicly available for ETSI members and non-members, and can be
found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to
ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the
ETSI Web server (https://ipr.etsi.org/).
Pursuant to the ETSI Directives including the ETSI IPR Policy, no investigation regarding the essentiality of IPRs,
including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not
referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become,
essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its
Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP
Organizational Partners. oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and of the
oneM2M Partners. GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
Foreword
This final draft ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia
Transmission Quality (STQ), and is now submitted for the ETSI standards Membership Approval Procedure.
The present document is part 1 of a multi-part deliverable covering Speech and multimedia Transmission Quality
(STQ); Speech quality performance in the presence of background noise, as identified below:
ETSI ES 202 396-1: "Background noise simulation technique and background noise database";
ETSI EG 202 396-2: "Background noise transmission - Network simulation - Subjective test database and results";
ETSI EG 202 396-3: "Background noise transmission - Objective test methods".
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Background noise is present in most conversations today and may significantly affect the speech
communication performance of terminal and network equipment. Testing and optimization of such equipment
with realistic background noises is therefore necessary. Furthermore, the tests require reproducible conditions,
which can be guaranteed only under lab-type conditions.
The present document addresses this issue by describing a methodology for recording and playback of background
noises under well-defined and calibratable conditions in a lab-type environment. Furthermore, a database with real
background noises is included.
1 Scope
The quality of background noise transmission is an important factor that contributes significantly to the perceived
overall quality of speech. Both existing and, even more notably, the new generation of terminals, networks and system
configurations, including broadband services, can be greatly improved when they are designed properly, taking the
presence of background noise into account. The present document:
• describes a noise simulation environment using realistic background noise scenarios for laboratory use;
• contains a database including the relevant background noise samples for subjective and objective evaluation.
The present document provides information about the recording techniques needed for background noise recordings and
discusses the advantages and drawbacks of existing methods. Additionally, the present document describes the
requirements for laboratory conditions. The loudspeaker setup and the loudspeaker calibration and equalization
procedure are described. The simulation environment specified can be used for the evaluation and optimization of
terminals and of complex configurations including terminals, networks and other configurations. The main application
areas should be: office, home and car environment.
The setup and database as described in the present document are applicable for:
• Objective performance evaluation of terminals in different (simulated) background noise environments.
• Speech processing evaluation by using the pre-processed speech signal in the presence of background noise,
recorded by a terminal.
• Subjective evaluation of terminals by performing conversational tests, specific double talk tests or talking and
listening tests in the presence of background noise.
• Subjective evaluation in third party listening tests by recording the speech samples of terminals in the presence
of background noise.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
https://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] Recommendation ITU-T P.57: "Artificial ears".
[2] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[3] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] Surround Sound Past, Present, and Future: "A history of multichannel audio from mag stripe to
Dolby Digital", Joseph Hull - Dolby Laboratories Inc.
[i.2] AES preprint 3332 (1992): "Improved Possibilities of Binaural Recording and Playback
Techniques", K. Genuit, H.W. Gierlich; U. Künzli.
[i.3] AES preprint 3732 (1993): "A System for the Reproduction Technique for Playback of Binaural
Recordings", N. Xiang, K. Genuit, H.W. Gierlich.
[i.4] NTTAT Database: "Ambient Noise Database CD-ROM".
[i.5] ISO 11904-1: "Acoustics - Determination of sound immission from sound sources placed close to
the ear - Part 1: Technique using a microphone in a real ear (MIRE technique)".
[i.6] J. Blauert: "The psychophysics of human sound localization", Spatial Hearing.
[i.7] Void.
[i.8] Void.
[i.9] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[i.10] Recommendation ITU-T P.64: "Determination of sensitivity/frequency characteristics of local
telephone systems".
[i.11] Recommendation ITU-T G.722: "7 kHz audio-coding within 64 kbit/s".
[i.12] Genuit, K.: "A Description of the Human Outer Ear Transfer Function by Elements of
Communication Theory (No. B6-8)".
NOTE: Proceedings of the 12th International Congress on Acoustics. Toronto published on behalf of the
Technical Program Committee by the Executive Committee of the 12th International Congress on
Acoustics.
[i.13] IEC 60050-722: "International Electrotechnical Vocabulary - Chapter 722: Telephony".
[i.14] "Wellenfeldsynthese - Eine neue Dimension der 3D-Audiowiedergabe"; Fernseh- und Kino-
Technik, Nr. 11/2002, pp. 735-738.
[i.15] N.Lee: "IOSONO" Computers in Entertainment, volume 2, issue 3 (2004).
[i.16] P. Scherer: "Ein neues Verfahren der raumbezogenen Stereophonie mit verbesserter Übertragung
der Rauminformation"; Rundfunktechnische Mitteilungen, 1977, pp. 196-204.
[i.17] Void.
[i.18] ETSI TS 151 010-1: "Digital cellular telecommunications system (Phase 2+) (GSM); Mobile
Station (MS) conformance specification; Part 1: Conformance specification (3GPP TS 51.010-1)".
[i.19] Void.
[i.20] ETSI TR 126 921: "Universal Mobile Telecommunications System (UMTS); LTE; 5G;
Investigations on ambient noise reproduction systems for acoustic testing of terminals (3GPP
TR 26.921)".
3 Definition of terms, symbols and abbreviations
3.1 Terms
For the purposes of the present document, the following terms apply:
cross-talk: appearance of undesired energy in a channel, owing to the presence of a signal in another channel, caused
by, for example, induction, conduction or non-linearity
NOTE: See IEC 60050-722 [i.13].
3.2 Symbols
Void.
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
CD Compact Disc
DF Diffuse Field
EQ Equalization
FF Free Field
FFT Fast Fourier Transform
FIR Finite Impulse Response
HATS Head And Torso Simulator
ID Independent of Direction
IIR Infinite Impulse Response
MIRE Microphone In Real Ear
MRP Mouth Reference Point
NTT Nippon Telegraph and Telephone corporation
SLR Send Loudness Rating
VHF Very High Frequency
4 Overview of existing methods for realistic sound
reproduction
4.1 Introduction
In general, the existing methods for close to original sound recording and reproduction are aimed at different applications:
• Techniques intending to reproduce the actual sound field.
• Techniques providing hearing adequate (ear related) signals in the human ear canal.
• Techniques generating artificial acoustical environments.
Within this clause the different methods are briefly described and their applicability for close to original sound-field
reproduction is discussed. A variety of methods have been studied; the following is a summary of the most important
ones relevant to the present document. The different methods were analysed on the basis of the following
requirements:
• The background noise recording technique should be:
- easy to use;
- easy to calibrate;
- capable of wideband recording;
- available at reasonable costs;
- mostly compatible with existing standards and procedures used in telecommunications testing;
- applicable to different environments (at least office, home and car).
• The background noise simulation arrangement should:
- be easy to set up;
- not require any specific acoustical treatment for the simulation requirement;
- provide a mostly realistic background noise simulation for all typical background noises encountered in
telecommunication applications;
- be easy to calibrate;
- be mostly insensitive to the positioning of (test)-objects in the simulated sound field;
- be applicable to all typical terminals used in telecommunication;
- be available at reasonable costs.
4.2 Surround Sound Techniques
The basics of surround techniques are found in cinema applications. The virtual image provided by stereophonic
presentation of sounds seemed not to be sufficient for the large screen display in cinema. In the 1950s, 4-channel and
6-channel sound tracks recorded on magnetic stripes associated with the films were developed, and 4-channel and 6-channel
loudspeaker systems were installed in cinemas to reproduce the multichannel sounds. The newer techniques were
mostly developed and marketed by Dolby® [i.1]: Dolby Surround, Dolby Surround Pro Logic, Dolby Digital and Dolby
Digital Surround are examples of the techniques introduced more recently. The most common configuration is the
"5.1-configuration", used in cinemas and in home applications as well. The reproduction system consists of left and right
channels, a centre speaker, two surround channels (left and right, arranged behind the listener) and a low
frequency channel for low frequency effects.
The aim of all surround systems is to create an artificial acoustical image in the recording studio rather than to record a
real acoustical scenario and provide true to original playback possibilities.
On the recording side, special surround encoders are used that allow the 5-channel signal to be encoded from a special
mixing console to the 5.1 digital data stream. The playback system consists of a special decoder that separates the
5 channels again and distributes them to the 5.1 loudspeaker playback system. The systems are mono and stereo
compatible and can handle the older 4-channel surround techniques by a specific decoder.
Applications:
Typical applications for surround systems are cinemas and home theatres. The source material is produced by
professional recording studios using multi-channel mixing consoles and specific 5.1 decoding techniques. In almost all
cases virtual environments are created which support the visual image by an appropriate acoustical image.
Conclusion:
Surround techniques are designed for creating acoustical images rather than for close to original recording and
reproduction. Although the spatial impression provided by surround techniques is sometimes remarkable, the acoustical
image created is always artificial. Due to the lack of easy to use recording techniques allowing a spatial recording of a
sound field, surround sound techniques are not suitable for the creation of a background noise database with realistic
background noises and for calibrated background noise simulation in a lab.
4.3 IOSONO®
The IOSONO® sound system (see [i.14], [i.15] and [i.16]) is based on Wave-Field Synthesis. It employs Huygens'
principle of wave theory. Applied to acoustics, this principle means that it is possible to reproduce any form of wave
front with an array of loudspeakers, so that virtual sound sources can be placed anywhere within a listening area. For
practical use it is necessary to position loudspeakers all around the playback room. In order to generate realistic sound
fields, the input signal for each loudspeaker has to be calculated separately. For this purpose each single sound source
(e.g. voices) has to be recorded individually. If the recordings are done in a room, the characteristics (like reverberation)
of the recording room also have to be recorded separately. All resulting sound tracks are then mixed and manipulated
during the post-editing process and the reproduction.
The natural and realistic spatial sound reproduction is then achieved in a wide area of the playback room. Common 5.1
stereo systems achieve a "realistic" sound reproduction only in a small area of the reproduction room.
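As a rough illustration of why each loudspeaker needs its own input signal, the sketch below computes a delay and a gain per loudspeaker for a single virtual point source behind a linear loudspeaker array. It is a simplified delay-and-attenuate model and not the driving function of any commercial system; the function name, the array geometry, the speed of sound of 343 m/s and the 1/sqrt(r) amplitude law are assumptions made for this example only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def driving_parameters(source_xy, speaker_positions):
    """Return a (delay_s, gain) pair per loudspeaker for one virtual source.

    Each loudspeaker replays the source signal delayed by the travel time
    from the virtual source to that loudspeaker, so that the superposed
    wavelets approximate the wave front of the virtual source.
    """
    params = []
    for sx, sy in speaker_positions:
        r = math.hypot(sx - source_xy[0], sy - source_xy[1])
        delay = r / SPEED_OF_SOUND
        gain = 1.0 / math.sqrt(max(r, 1e-3))  # assumed amplitude law
        params.append((delay, gain))
    return params


# Hypothetical array: 5 loudspeakers on the x-axis, 0.5 m apart.
speakers = [(-1.0 + 0.5 * i, 0.0) for i in range(5)]
# Virtual source 2 m behind the centre of the array.
params = driving_parameters((0.0, -2.0), speakers)
```

The centre loudspeaker is closest to the virtual source and therefore gets the shortest delay; the outer loudspeakers fire later, which is what bends the reproduced wave front. With many recorded sources and many loudspeakers, this calculation has to be repeated for every source and loudspeaker pair.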
Applications:
Typical applications are sound systems for home use, cinemas and other entertainment events. The IOSONO® sound
system is also able to play back recordings made in common stereo or 5.1 stereo techniques.
Conclusion:
The drawbacks of this method are the components needed: a sophisticated recording system, a powerful computing unit
for real-time mixing of the large number of recorded sound tracks and the number of loudspeakers that have to be installed
in the listening room. In a common size cinema, for example, about 200 loudspeakers are needed.
The advantage is that with the IOSONO® sound system a very realistic sound reproduction is possible, but it requires an
enormous effort, which is too high for daily use in laboratories.
4.4 Eidophonie
This method was developed for realistic sound reproduction using the VHF transmission technique. The main principle
is to separate the base signal from the part of the signal that contains the information about the direction of sound
incidence.
For recording, a 1st order gradient microphone with a cardioid directivity is used. During the recordings its directivity
rotates at 38 kHz in the recording plane. This "turning microphone" provides an amplitude-modulated signal at its
electrical output. The resulting side bands are outside the transmitted frequency range, but they contain the
information on the direction of sound incidence. Using the VHF transmission techniques this phase information can be
transmitted within the 2nd audio-frequency channel.
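The following sketch illustrates how a rotating cardioid turns the direction of incidence into the phase of a 38 kHz component. It is a simplified simulation under stated assumptions and not the Eidophonie encoder: the source is a constant-amplitude signal arriving from a single direction, the cardioid is ideal and frequency independent, and the sampling rate, sample count and function names are chosen for this example only.

```python
import numpy as np

FS = 1_000_000      # simulation sampling rate in Hz (assumption)
F_ROT = 38_000      # rotation frequency of the directivity, from the text
N_SAMPLES = 10_000  # 10 ms, i.e. exactly 380 rotation periods


def rotating_cardioid_output(source_signal, source_angle, t):
    """Output of an ideal cardioid whose main axis rotates at F_ROT."""
    axis_angle = 2 * np.pi * F_ROT * t
    gain = 0.5 * (1.0 + np.cos(axis_angle - source_angle))
    return gain * source_signal


def estimate_angle(mic_output, t):
    """Recover the incidence angle from the phase of the 38 kHz component."""
    reference = np.exp(1j * 2 * np.pi * F_ROT * t)
    return float(np.angle(np.sum(mic_output * reference)))


t = np.arange(N_SAMPLES) / FS
source = np.ones_like(t)  # constant-amplitude source keeps the example simple
true_angle = np.deg2rad(60.0)
mic = rotating_cardioid_output(source, true_angle, t)
estimated = estimate_angle(mic, t)
```

In this idealized case the estimated angle matches the true angle of incidence. With a real audio signal, the modulation would produce side bands around 38 kHz rather than a single spectral line, and with several simultaneous sources the phases superimpose.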
The sound reproduction is made by a spatial demodulation: a switch is positioned before each loudspeaker and each
switches synchronously with the turning directivity. So a low pass filtered short-term section of the signal containing
the information of the direction of sound incidence is played back on each loudspeaker. The loudspeakers are positioned
all around the playback room.
Applications:
Eidophonie was developed to provide a realistic sound environment using a signal received from a VHF broadcast
station. The technique was intended to improve on common stereo sound reproduction. Nevertheless, Eidophonie is
also compatible with common mono and stereo recordings.
Conclusion:
A benefit of this system is that three loudspeakers are sufficient to produce a realistic sound field. With more
loudspeakers (e.g. 16), the spatial sound reproduction becomes increasingly independent of the listening position.
Moreover, the independence of the transmitted sound from the acoustics of the reproduction room increases with the
number of loudspeakers used. However, the method has significant limitations. The microphone directivity is
frequency dependent and not ideal, so interference between the different channels is created. A second
problem is the loudspeaker directivity, which does not fit the microphone directivity. This problem could be reduced if
the number of channels were increased. This, however, is not possible due to the limited directivity of the
microphone arrangement used.
Localization of sound sources is hardly possible due to the interference effects of the microphone signals and the
loudspeakers. Whether a close to original reproduction is achieved depends on the number and distribution of sound
sources present. For most of the sound source combinations this goal cannot be achieved.
In general, the coding technique needed to record the sound field by a "turning microphone" is complicated and not
available commercially. A further drawback of this method is the complicated decoding technique needed on the
reproduction side, which is also not commercially available.
4.5 Four-loudspeaker arrangement for playback of binaurally
recorded signals
This reproduction procedure was originally investigated to reproduce signals recorded binaurally using artificial head
technology. It improves the impressions of direction and distance. Four loudspeakers are typically positioned in a
square formation around a central point (listening point) equidistantly, e.g. 2 m. The binaural recordings are reproduced
as follows: the two left-hand loudspeakers receive the same free-field equalized artificial head signal of the left-hand
channel only. The right-hand side is arranged similarly. For equalization, the transfer function from the two left-hand
loudspeakers is measured at the artificial head's left ear channel. With this result, several IIR and FIR filters are
designed, with which the input signal of the left-hand loudspeakers during the play back is filtered in such a way that
the transfer function then measured at the artificial head's left-hand channel is spectrally flat. The equalization for the
right-hand loudspeakers is done similarly.
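The idea of filtering the loudspeaker input so that the response measured at the artificial head's ear becomes spectrally flat can be sketched as follows. This is a minimal sketch using a single FIR filter obtained by regularized frequency-domain inversion, which is an assumption for illustration; the text above only states that several IIR and FIR filters are designed. The function name, the filter length, the regularization constant and the example impulse response are hypothetical.

```python
import numpy as np


def design_inverse_fir(measured_ir, n_taps=256, regularization=1e-3):
    """Design an FIR equalizer whose cascade with measured_ir is nearly flat.

    Uses a regularized inversion of the measured transfer function; the
    regularization limits the gain where the measured response is weak.
    """
    n_fft = 4 * n_taps
    h = np.fft.rfft(measured_ir, n_fft)
    inverse = np.conj(h) / (np.abs(h) ** 2 + regularization)
    eq = np.fft.irfft(inverse, n_fft)
    # Delay by half the filter length so that the equalizer is causal.
    return np.roll(eq, n_taps // 2)[:n_taps]


# Hypothetical measured impulse response (loudspeaker pair to left ear).
measured_ir = np.array([1.0, 0.5])
eq = design_inverse_fir(measured_ir)

# Response of loudspeaker path and equalizer in cascade, in dB.
cascade = np.convolve(measured_ir, eq)
cascade_db = 20 * np.log10(np.abs(np.fft.rfft(cascade, 2048)))
ripple_db = float(cascade_db.max() - cascade_db.min())
```

For this mild example response the remaining ripple of the equalized path is far below 1 dB. A measured room response is much less benign, which is one reason why a practical procedure may combine several filter stages.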
The equalization procedure does not take into account the correction of cross-talk. This means, the left-hand channel of
the artificial head is only equalized for the left-hand loudspeakers, but during the reproduction this left-hand channel
will also receive a signal from the right-hand loudspeakers. But despite this simplification, the equalization procedure
provides a realistic binaural listening impression.
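The effect of this simplification can be written down for a single frequency. The symbols below are introduced for illustration only and do not appear in the present document: S_L and S_R are the recorded left and right artificial head signals, H_LL is the transfer function from the left-hand loudspeaker pair to the left ear, H_RL the one from the right-hand pair to the left ear, H_RR the one from the right-hand pair to the right ear, and P_L is the resulting signal at the left ear. Assuming ideal equalizers 1/H_LL and 1/H_RR:

```latex
P_L = H_{LL}\,\frac{S_L}{H_{LL}} + H_{RL}\,\frac{S_R}{H_{RR}}
    = S_L + \frac{H_{RL}}{H_{RR}}\,S_R
```

The first term is the intended signal; the second term is the cross-talk contribution that the equalization procedure leaves uncorrected.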
Investigations [i.2] and [i.3] carried out in different rooms have shown that directional hearing and distance localization
by sound reproduction with this four-loudspeaker arrangement are comparable to those with sound reproduction by
headphones.
For equalization there are several strategies. The equalization can either be done for each loudspeaker individually or by
pairs left - right or by pairs front - rear.
Applications:
A practical application is the sound reproduction for binaural recordings in a typical office-type room, e.g. for listening
tests, but for objective tests as well. The investigations shown in [i.2] and [i.3] indicate that the subjective impression
provided by the arrangement corresponds to the headphone reproduction of binaural recordings with respect to the
perception of the sound colour, the distance perception and (with some limitations) with respect to the sound source
localization. The data provided indicate that the setup could be used for the objective measurements of e.g.
telecommunication terminals.
This four-loudspeaker arrangement is also used in advanced driving simulators - which typically provide a visual
simulation of the driving situation in addition to the acoustical playback system.
ETSI
13 Final draft ETSI ES 202 396-1 V1.9.1 (2023-03)
Conclusion:
The advantage of this arrangement on the recording side is the compatibility with standard mono and stereo recordings.
Due to this it is easily possible to play back either binaural recordings for subjective and objective experiments or
mono/stereo recordings in cases where a less realistic reproduction is sufficient. With slight modifications the
geometrical setup can be transferred to other environments, such as cars. Another benefit of this arrangement is
the moderate hardware effort.
Concerning the sound reproduction, a drawback of this arrangement is that interference appears due to the superimposed
loudspeaker signals, mostly in anechoic rooms. This effect obliges test subjects to keep their heads in a mostly fixed
position during hearing tests, and also means that an exact sound reproduction is only possible in a small area in
anechoic rooms. Another drawback for binaural reproduction is the fact that no exact cross-talk cancellation is possible
with this arrangement. In general, however, this technique seems to be the most promising under the restrictions given
(moderate hardware effort on the recording and especially on the reproduction side, close to original reproduction of the
scenarios recorded without additional adjustment of the reproduction arrangement).
4.6 NTT Background-Noise Database
The NTT Background-Noise Database [i.4], which is commercially available from NTT, is typically used for codec tests. The
database contains noise files that were recorded as 4-channel recordings using 4 directional microphones. The
microphones were arranged at an angle of 90 degrees with a 70 cm diagonal. Although the original signals were recorded
with 20 kHz bandwidth, the signals commercially available are specially coded on a CD providing a bandwidth of
11 kHz/15 bit for each channel. A special decoder is needed if the signals are to be presented acoustically over
loudspeakers. The loudspeaker arrangement is suggested in a condition list. A calibration tone is provided for
calibration. Only level calibration is performed; no equalization procedure is described.
For the electrical evaluation of systems a downmix of the 4-channel recording to a mono channel with 8 kHz sampling
rate is available. This signal is mostly used for the evaluation of new speech codecs in order to evaluate the influence of
background noise on the coder performance.
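The downmix described above can be illustrated with a minimal sketch. The text does not specify how the mono signal is derived, so everything here is an assumption for illustration: equal channel weights, a 48 kHz input rate, an integer decimation factor and a simple windowed-sinc anti-aliasing filter. The function name `downmix_to_mono_8k` is hypothetical.

```python
import numpy as np

def downmix_to_mono_8k(channels, fs_in=48000, fs_out=8000, n_taps=241):
    """Average the channels and decimate to fs_out (illustrative sketch only).

    channels: array of shape (n_channels, n_samples), e.g. (4, n_samples).
    fs_in is an assumed input rate; it is not taken from the database.
    """
    if fs_in % fs_out != 0:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = fs_in // fs_out
    mono = np.mean(np.asarray(channels, dtype=float), axis=0)  # equal weights (assumption)
    # Windowed-sinc low-pass with its cutoff slightly below the new Nyquist frequency.
    cutoff = 0.45 / factor                       # in cycles per input sample
    n = np.arange(n_taps) - (n_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(n_taps)
    taps /= np.sum(taps)                         # unity gain at DC
    filtered = np.convolve(mono, taps, mode="same")
    return filtered[::factor]                    # keep every factor-th sample
```

With four identical channels, a component inside the 4 kHz band passes essentially unchanged, while a component above it is attenuated before decimation instead of aliasing into the narrowband signal.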
Applications:
The NTT database is mostly used for the evaluation of speech coders using the 8 kHz downsampled signals. When
using the acoustical 4-channel playback, the limitation is mostly due to the bandwidth limitation of 11 kHz, which may
not be sufficient for future wideband applications.
Conclusion:
The disadvantage of this arrangement is found on the recording and on the playback side. For recording, a special
microphone arrangement and a special coding technique are needed. The signal on the reproduction side is band-limited
and may not be used in future wideband applications. Currently, it is not clear how the playback system can be
calibrated and equalized in order to achieve a close to original sound field and which procedure should be followed
during recording in order to get the right calibration and setup for the recording. Furthermore, the chosen microphone
arrangement seems to be impractical for recording in smaller enclosures, e.g. in a car.
Another drawback is that the background noise database, including a special decoder, needs to be purchased for each
application separately. It is not clear to what extent the recording and coding technique is commercially available.
4.7 General conclusions
Although a variety of reproduction techniques exists, the usability of such methods within the constraints of
laboratory use is limited by the following requirements:
• easy and well described recording technique;
• easy to install and easy to use playback technique at reasonable costs;
• applicability of the playback technique in a variety of different rooms with different acoustical conditions.
In consequence, a four-channel loudspeaker setup with associated subwoofer based on binaurally recorded material is
selected as the basis for the ETSI background noise simulation arrangement.
...


ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Speech quality performance
in the presence of background noise;
Part 1: Background noise simulation technique
and background noise database
2 ETSI ES 202 396-1 V1.9.1 (2023-05)

Reference
RES/STQ-308
Keywords
noise, performance, quality, speech
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - APE 7112B
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° w061004871

Important notice
The present document can be downloaded from:
https://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
If you find a security vulnerability in the present document, please report it through our
Coordinated Vulnerability Disclosure Program:
https://www.etsi.org/standards/coordinated-vulnerability-disclosure
Notice of disclaimer & limitation of liability
The information provided in the present deliverable is directed solely to professionals who have the appropriate degree of
experience to understand and interpret its content in accordance with generally accepted engineering or
other professional standard and applicable regulations.
No recommendation as to products and services or vendors is made or should be implied.
No representation or warranty is made that this deliverable is technically accurate or sufficient or conforms to any law
and/or governmental rule and/or regulation and further, no representation or warranty is made of merchantability or fitness
for any particular purpose or against infringement of intellectual property rights.
In no event shall ETSI be held liable for loss of profits or any other incidental or consequential damages.

Any software contained in this deliverable is provided "AS IS" with no warranties, express or implied, including but not
limited to, the warranties of merchantability, fitness for a particular purpose and non-infringement of intellectual property
rights and ETSI shall not be held liable in any event for any damages whatsoever (including, without limitation, damages
for loss of profits, business interruption, loss of information, or any other pecuniary loss) arising out of or related to the use
of or inability to use the software.
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2023.
All rights reserved.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definition of terms, symbols and abbreviations
3.1 Terms
3.2 Symbols
3.3 Abbreviations
4 Overview of existing methods for realistic sound reproduction
4.1 Introduction
4.2 Surround Sound Techniques
4.3 IOSONO®
4.4 Eidophonie
4.5 Four-loudspeaker arrangement for playback of binaurally recorded signals
4.6 NTT Background-Noise Database
4.7 General conclusions
5 Recording arrangement
5.1 Binaural recordings
5.2 Equalization procedure
6 Loudspeaker Setup for Background Noise Simulation
6.1 Test Room Requirements
6.2 Loudspeaker Positioning
6.3 Equalization and Calibration
6.3.1 Overview
6.3.2 Separate equalization for each of the four loudspeakers
6.3.3 Separate level adjustment for each loudspeaker
6.3.4 Equalization for the two left-hand and the two right-hand loudspeakers
6.3.5 Equalization and level adjustment for the subwoofer
6.3.6 Delay compensation
6.3.7 Overall equalization for all loudspeakers
6.3.8 Troubleshooting failed equalizations
6.3.9 Automated equalization and calibration procedure
6.4 Accuracy of the reproduction arrangement
6.4.0 Introduction
6.4.1 Comparison between original sound field and simulated sound field
6.4.2 Displacement of the test arrangement in the simulated sound field
6.4.3 Transmission of background noise: Comparison of terminal performance in the original sound field and the simulated sound field
6.5 Simulation of additional acoustic conditions
7 Background Noise Simulation in cars
7.1 General setup
7.2 Recording arrangement
7.2.0 Introduction
7.2.1 Recording setup with the terminal's microphone
7.2.2 Recording setup with a pair of cardioid microphones
7.3 Equalization and calibration with the terminal's microphone
7.3.1 Overview
7.3.2 Separate equalization for each of the four loudspeakers
7.3.3 Separate level adjustment for each loudspeaker
7.3.4 Equalization for the two front and the two rear loudspeakers
7.3.5 Equalization and level adjustment for the subwoofer
7.3.6 Delay adjustment
7.3.7 Overall equalization
7.3.8 Troubleshooting failed equalizations
7.4 Equalization and Calibration with a pair of cardioid microphones
7.4.1 Overview
7.4.2 Pair-wise equalization for the left-hand loudspeakers
7.4.3 Separate level adjustment for the left-hand loudspeakers
7.4.4 Pair-wise equalization and separate level adjustment for the right-hand loudspeakers
7.4.5 Equalization and level adjustment for the subwoofer
7.4.6 Delay compensation
7.4.7 Overall equalization
7.4.8 Troubleshooting failed equalizations
7.5 Accuracy of the reproduction arrangement
7.5.1 Comparison between original sound field and simulated sound field
7.5.2 Transmission of background noise: Comparison of terminal performance in the original sound field and the simulated sound field
7.6 Automated equalization and calibration procedure
8 Background Noise Database
8.0 Introduction
8.1 Binaural signals
8.2 Binaural signals identical to the background noise recordings provided in ETSI TS 103 224
8.3 Stereophonic signals
Annex A (informative): Comparison of Tests in Sending Direction and D-Values Conducted in Different Rooms
A.0 Introduction
A.1 Test Setup
A.2 Results of the Tests
A.2.0 Introduction
A.2.1 Sending Frequency Response Characteristics and SLR
A.2.2 D-Value with Pink Noise
A.2.3 D-Value with Cafeteria Noise
A.3 Conclusions
Annex B (informative): Graphs of test results in Annex A
History

Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The declarations
pertaining to these essential IPRs, if any, are publicly available for ETSI members and non-members, and can be
found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to
ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the
ETSI Web server (https://ipr.etsi.org/).
Pursuant to the ETSI Directives including the ETSI IPR Policy, no investigation regarding the essentiality of IPRs,
including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not
referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become,
essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its
Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP
Organizational Partners. oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and of the
oneM2M Partners. GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
Foreword
This ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia Transmission
Quality (STQ).
The present document is part 1 of a multi-part deliverable covering Speech and multimedia Transmission Quality
(STQ); Speech quality performance in the presence of background noise, as identified below:
ETSI ES 202 396-1: "Background noise simulation technique and background noise database";
ETSI EG 202 396-2: "Background noise transmission - Network simulation - Subjective test database and results";
ETSI EG 202 396-3: "Background noise transmission - Objective test methods".
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Background noise is present in most conversations today. Background noise may significantly impact the speech
communication performance of terminal and network equipment. Therefore testing and optimization of
such equipment using realistic background noises is necessary. Furthermore, reproducible conditions for the tests are
required, which can be guaranteed only under lab-type conditions.
The present document addresses this issue by describing a methodology for recording and playback of background
noises under well-defined and calibratable conditions in a lab-type environment. Furthermore a database with real
background noises is included.
1 Scope
The quality of background noise transmission is an important factor, which significantly contributes to the perceived
overall quality of speech. Both existing and, even more notably, the new generation of terminals, networks and system
configurations, including broadband services, can be greatly improved when designed properly, with due consideration
of the presence of background noise. The present document:
• describes a noise simulation environment using realistic background noise scenarios for laboratory use;
• contains a database including the relevant background noise samples for subjective and objective evaluation.
The present document provides information about the recording techniques needed for background noise recordings and
discusses the advantages and drawbacks of existing methods. Additionally, the present document describes the
requirements for laboratory conditions. The loudspeaker setup and the loudspeaker calibration and equalization
procedure are described. The simulation environment specified can be used for the evaluation and optimization of
terminals and of complex configurations including terminals, networks and other configurations. The main application
areas should be: office, home and car environment.
The setup and database as described in the present document are applicable for:
• Objective performance evaluation of terminals in different (simulated) background noise environments.
• Speech processing evaluation by using the pre-processed speech signal in the presence of background noise,
recorded by a terminal.
• Subjective evaluation of terminals by performing conversational tests, specific double talk tests or talking and
listening tests in the presence of background noise.
• Subjective evaluation in third party listening tests by recording the speech samples of terminals in the presence
of background noise.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
https://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] Recommendation ITU-T P.57: "Artificial ears".
[2] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[3] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] Surround Sound Past, Present, and Future: "A history of multichannel audio from mag stripe to
Dolby Digital", Joseph Hull - Dolby Laboratories Inc.
[i.2] AES preprint 3332 (1992): "Improved Possibilities of Binaural Recording and Playback
Techniques", K. Genuit, H.W. Gierlich; U. Künzli.
[i.3] AES preprint 3732 (1993): "A System for the Reproduction Technique for Playback of Binaural
Recordings", N. Xiang, K. Genuit, H.W. Gierlich.
[i.4] NTTAT Database: "Ambient Noise Database CD-ROM".
[i.5] ISO 11904-1: "Acoustics - Determination of sound immission from sound sources placed close to
the ear - Part 1: Technique using a microphone in a real ear (MIRE technique)".
[i.6] J. Blauert: "The psychophysics of human sound localization", Spatial Hearing.
[i.7] Void.
[i.8] Void.
[i.9] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[i.10] Recommendation ITU-T P.64: "Determination of sensitivity/frequency characteristics of local
telephone systems".
[i.11] Recommendation ITU-T G.722: "7 kHz audio-coding within 64 kbit/s".
[i.12] Genuit, K.: "A Description of the Human Outer Ear Transfer Function by Elements of
Communication Theory (No. B6-8)".
NOTE: Proceedings of the 12th International Congress on Acoustics. Toronto published on behalf of the
Technical Program Committee by the Executive Committee of the 12th International Congress on
Acoustics.
[i.13] IEC 60050-722: "International Electrotechnical Vocabulary - Chapter 722: Telephony".
[i.14] "Wellenfeldsynthese - Eine neue Dimension der 3D-Audiowiedergabe"; Fernseh- und
Kino-Technik, Nr. 11/2002, pp. 735-738.
[i.15] N.Lee: "IOSONO" Computers in Entertainment, volume 2, issue 3 (2004).
[i.16] P. Scherer: "Ein neues Verfahren der raumbezogenen Stereophonie mit verbesserter Übertragung
der Rauminformation"; Rundfunktechnische Mitteilungen, 1977, pp. 196-204.
[i.17] Void.
[i.18] ETSI TS 151 010-1: "Digital cellular telecommunications system (Phase 2+) (GSM); Mobile
Station (MS) conformance specification; Part 1: Conformance specification (3GPP TS 51.010-1)".
[i.19] Void.
[i.20] ETSI TR 126 921: "Universal Mobile Telecommunications System (UMTS); LTE; 5G;
Investigations on ambient noise reproduction systems for acoustic testing of terminals (3GPP
TR 26.921)".
3 Definition of terms, symbols and abbreviations
3.1 Terms
For the purposes of the present document, the following terms apply:
cross-talk: appearance of undesired energy in a channel, owing to the presence of a signal in another channel, caused
by, for example, induction, conduction or non-linearity
NOTE: See IEC 60050-722 [i.13].
3.2 Symbols
Void.
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
CD Compact Disc
DF Diffuse Field
EQ Equalization
FF Free Field
FFT Fast Fourier Transform
FIR Finite Impulse Response
HATS Head And Torso Simulator
ID Independent of Direction
IIR Infinite Impulse Response
MIRE Microphone In Real Ear
MRP Mouth Reference Point
NTT Nippon Telegraph and Telephone corporation
SLR Send Loudness Rating
VHF Very High Frequency
4 Overview of existing methods for realistic sound
reproduction
4.1 Introduction
In general the existing methods for close to original sound recording and reproduction are aimed at different applications:
• Techniques intending to reproduce the actual sound field.
• Techniques providing hearing adequate (ear related) signals in the human ear canal.
• Techniques generating artificial acoustical environments.
Within this clause the different methods are briefly described and their applicability for close to original sound-field
reproduction is discussed. A variety of methods have been studied; in the following, a summary of the most important
ones relevant to the present document is given. The different methods were analysed on the basis of the following
requirements:
• The background noise recording technique should be:
- easy to use;
- easy to calibrate;
- capable of wideband recording;
- available at reasonable costs;
- mostly compatible to existing standards and procedures used in telecommunications testing;
- applicable to different environments (at least office, home and car).
• The background noise simulation arrangement should:
- be easy to setup;
- not require any specific acoustical treatment for the simulation requirement;
- provide a mostly realistic background noise simulation for all typical background noises encountered in
telecommunication applications;
- be easy to calibrate;
- be mostly insensitive against the positioning of (test)-objects in the simulated sound field;
- be applicable to all typical terminals used in telecommunication;
- be available at reasonable costs.
4.2 Surround Sound Techniques
The basics of surround techniques are found in cinema applications. The virtual image provided by stereophonic
presentation of sounds seemed not to be sufficient for the large screen display in cinema. In the 1950s, 4-channel and
6-channel sound tracks recorded on magnetic stripes associated with the films were developed, and 4-channel and 6-channel
loudspeaker systems were installed in cinemas to reproduce the multichannel sounds. The newer techniques were
mostly developed and marketed by Dolby® [i.1]: Dolby Surround, Dolby Surround Pro Logic, Dolby Digital and Dolby
Digital Surround are examples of the techniques introduced more recently. The most common configuration is the
"5.1-configuration", used in cinema but in home applications as well. The reproduction system consists of left and right
channels, a centre speaker, two surround channels (left and right, arranged behind the listener) and a low
frequency channel for low frequency effects.
The aim of all surround systems is to create an artificial acoustical image in the recording studio rather than to record a
real acoustical scenario and provide true to original playback possibilities.
On the recording side, special surround encoders are used, allowing the 5-channel signal to be encoded from a special
mixing console to the 5.1 digital data stream. The playback system consists of a special decoder that separates the
5 channels again and distributes them to the 5.1 loudspeaker playback system. The systems are mono and stereo
compatible and can handle the older 4-channel surround techniques by a specific decoder.
Applications:
Typical applications for surround systems are cinemas and home theatres. The source material is produced by
professional recording studios using multi-channel mixing consoles and specific 5.1 decoding techniques. In almost all
cases virtual environments are created which support the visual image by an appropriate acoustical image.
Conclusion:
Surround techniques are designed for creating acoustical images rather than for close to original recording and
reproduction. Although the spatial impression provided by surround techniques is sometimes remarkable, the acoustical
image created is always artificial. Due to the lack of easy to use recording techniques allowing a spatial recording of a
sound field, surround sound techniques are not suitable for the creation of a background noise database with realistic
background noises and calibrated background noise simulation in a lab.
4.3 IOSONO®
The IOSONO® sound system (see [i.14], [i.15] and [i.16]) is based on Wave-Field Synthesis. It employs Huygens'
principle of wave theory. Applied to acoustics this principle means that it is possible to reproduce any form of wave
front with an array of loudspeakers, so that virtual sound sources can be placed anywhere within a listening area. For
practical use it is necessary to position loudspeakers all-round the playback room. In order to generate realistic sound
fields the input signal for each loudspeaker has to be calculated separately. For this purpose each single sound source
(e.g. voices) has to be recorded individually. If the recordings are done in a room, the characteristics (like reverberation)
of the recording room also have to be recorded separately. All resulting sound tracks are then mixed and manipulated
during the post-editing process and the reproduction.
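The idea that each loudspeaker needs its own input signal can be illustrated with a strongly simplified sketch: for one virtual point source, every loudspeaker receives the source signal with a delay and gain derived from its distance to the virtual source. This is only a delay-and-gain approximation chosen for illustration; it is not the driving function of an actual wave-field synthesis system. The function name, the 1/distance gain law and the speed of sound value are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def loudspeaker_delays_and_gains(source_xy, speaker_xy):
    """Per-loudspeaker delay (s) and relative gain for one virtual point source.

    source_xy:  (x, y) position of the virtual source in metres.
    speaker_xy: sequence of (x, y) loudspeaker positions in metres.
    Delays are relative to the nearest loudspeaker; gains are normalised to 1.
    """
    source_xy = np.asarray(source_xy, dtype=float)
    speaker_xy = np.asarray(speaker_xy, dtype=float)
    distances = np.linalg.norm(speaker_xy - source_xy, axis=1)
    delays = distances / SPEED_OF_SOUND
    gains = 1.0 / np.maximum(distances, 1e-3)   # simple 1/r law (assumption)
    return delays - delays.min(), gains / gains.max()
```

For a virtual source 2 m behind the middle of a three-loudspeaker line, the centre loudspeaker plays first and loudest, and the outer two follow slightly later and quieter, which approximates a curved wave front. Repeating this for every recorded source and every loudspeaker is what makes the computing effort grow with the number of sound tracks and loudspeakers.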
The natural and realistic spatial sound reproduction is then achieved in a wide area of the play back room. Common 5.1
stereo systems achieve a "realistic" sound reproduction only in a small area of the reproduction room.
Applications:
Typical applications are sound systems for home use, cinemas and other entertainment events. The IOSONO® sound
system is also able to play back recordings made in common stereo or 5.1 stereo techniques.
Conclusion:
The drawbacks of this method are the components needed: a sophisticated recording system, a powerful computing unit
for real-time mixing the large number of recorded sound tracks and the number of loudspeakers that have to be installed
in the listening room. In a common size cinema for example about 200 loudspeakers are needed.
The advantage is that with the IOSONO® sound system a very realistic sound reproduction is possible, but it requires an
enormous effort, which is too high for daily use in laboratories.
4.4 Eidophonie
This method was developed for realistic sound reproduction using the VHF transmission technique. The main principle
is to separate the base signal from the part of the signal, which contains the information about the direction of sound
incidence.
For recording a 1st order gradient microphone with a cardioid directivity is used. During the recordings its directivity
rotates with 38 kHz in the recording plane. This "turning microphone" provides an amplitude-modulated signal at its
electrical output. The resulting side bands are out of the transmitted frequency range. But these side bands contain the
information of the direction of sound incidence. Using the VHF transmission techniques this phase information can be
transmitted within the 2nd audio-frequency channel.
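How the side bands carry the direction of incidence can be shown with a small numerical sketch. It assumes an ideal cardioid, 0.5 · (1 + cos α), a single source, and a sampling rate that is an integer multiple of the 38 kHz rotation rate; the function names are hypothetical and the demodulation shown is just one simple way to read out the phase, not the Eidophonie decoder.

```python
import numpy as np

F_ROT = 38_000.0  # rotation rate of the directivity in Hz (value from the text)

def rotating_cardioid_output(source, azimuth, fs):
    """Signal of an ideal rotating cardioid for one source at `azimuth` (rad)."""
    t = np.arange(len(source)) / fs
    directivity = 0.5 * (1.0 + np.cos(2 * np.pi * F_ROT * t - azimuth))
    return source * directivity

def estimate_azimuth(mic_signal, fs):
    """Recover the azimuth from the phase of the 38 kHz side bands."""
    t = np.arange(len(mic_signal)) / fs
    period = int(round(fs / F_ROT))            # samples per rotation
    kernel = np.ones(period) / period
    # Averaging over one rotation suppresses the side bands: base signal (~0.5 * s).
    base = np.convolve(mic_signal, kernel, mode="same")
    # Shift the side bands to baseband and average again (~0.25 * s * exp(-j*azimuth)).
    shifted = mic_signal * np.exp(-2j * np.pi * F_ROT * t)
    side = np.convolve(shifted, kernel, mode="same")
    return -np.angle(np.sum(side * base))
```

In this idealized model the base signal and the demodulated side-band signal differ only by a constant factor and the phase term, so their correlation yields the angle of incidence directly. With several simultaneous sources or a non-ideal, frequency-dependent directivity the phases mix, which is consistent with the localization limitations discussed for this method.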
The sound reproduction is made by a spatial demodulation: a switch is positioned before each loudspeaker and each
switches synchronously with the turning directivity. So a low pass filtered short-term section of the signal containing
the information of the direction of sound incidence is played back on each loudspeaker. The loudspeakers are positioned
all around the playback room.
Applications:
Eidophonie was developed to provide a realistic sound environment using a signal received from a VHF broadcast
station. With this technique the common stereo sound reproduction should be improved. Nevertheless Eidophonie is
also compatible to common mono and stereo recordings.
Conclusion:
A benefit of this system is that three loudspeakers are sufficient to produce a realistic sound field. Using more
loudspeakers (e.g. 16), the spatial sound reproduction becomes increasingly independent of the listening position.
Moreover, the independence of the transmitted sound from the acoustics of the reproduction room increases with the
number of loudspeakers used. But there are significant limitations of the method: The microphone directivity is
frequency dependent and not ideal. Therefore interference between the different channels is created. A second
problem is the loudspeaker directivity, which does not fit the microphone directivity. This problem could be reduced if
the number of channels were increased. This however is not possible due to the limited directivity of the
microphone arrangement used.
Localization of sound sources is hardly possible due to the interference effects of the microphone signals and the
loudspeakers. A close to original reproduction depends on the number and distribution of sound sources present. For
most of the sound source combinations this goal cannot be achieved.
In general the coding technique needed to record the sound field by a "turning microphone", is complicated and not
available commercially. A further drawback of this method is the complicated decoding technique needed on the
reproduction side, which is also not commercially available.
4.5 Four-loudspeaker arrangement for playback of binaurally
recorded signals
This reproduction procedure was originally investigated as a way to reproduce signals recorded binaurally using artificial head
technology. It improves the impressions of direction and distance. Four loudspeakers are typically positioned in a
square formation around a central point (the listening point), each at the same distance from it, e.g. 2 m. The binaural
recordings are reproduced
as follows: the two left-hand loudspeakers receive the same free-field equalized artificial head signal of the left-hand
channel only. The right-hand side is arranged similarly. For equalization, the transfer function from the two left-hand
loudspeakers is measured at the artificial head's left ear channel. From this result, several IIR and FIR filters are
designed and applied to the input signal of the left-hand loudspeakers during playback, so that
the transfer function then measured at the artificial head's left-hand channel is spectrally flat. The equalization for the
right-hand loudspeakers is done similarly.
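As an illustration of the equalization target described above, the following minimal sketch computes per-band correction gains that make a measured response spectrally flat. It is not the filter design procedure of the present document: the band levels, the function name `correction_gains_db` and the 12 dB gain limits are assumptions chosen for the example, and the design of actual IIR and FIR filters from such gains is not shown.

```python
def correction_gains_db(measured_db, max_boost_db=12.0, max_cut_db=12.0):
    """Per-band gains (dB) that flatten a measured response around its mean level.

    measured_db: levels measured at the artificial head's ear, one per band.
    The boost and cut limits are assumed safety limits, not values from the standard.
    """
    if not measured_db:
        raise ValueError("measured_db must not be empty")
    target = sum(measured_db) / len(measured_db)
    gains = []
    for level in measured_db:
        gain = target - level
        # Limit the correction so that deep notches are not boosted without bound.
        gain = max(-max_cut_db, min(max_boost_db, gain))
        gains.append(gain)
    return gains

# Hypothetical band levels in dB, for illustration only.
measured = [-3.0, -1.0, 0.0, 2.0, 2.0]
gains = correction_gains_db(measured)
equalized = [m + g for m, g in zip(measured, gains)]
print(gains)
print(equalized)
```

In this sketch the same gains would be applied to both left-hand loudspeakers, since they receive the same signal; the right-hand side would be treated in the same way with its own measurement.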
The equalization procedure does not correct for cross-talk. That is, the left-hand channel of
the artificial head is equalized only for the left-hand loudspeakers, although during reproduction this left-hand channel
also receives a signal from the right-hand loudspeakers. Despite this simplification, the equalization procedure
provides a realistic binaural listening impression.
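The effect of this simplification can be sketched at a single frequency. The example below assumes a simple linear model with hypothetical complex transfer factors. Each loudspeaker pair is equalized only for its own ear, so the signal of the opposite pair still arrives at the ear with a relative level given by the ratio of the cross path to the direct path. The function name and the numerical values are illustrative only.

```python
import math

def residual_crosstalk_db(h_cross: complex, h_direct_opposite: complex) -> float:
    """Relative level (dB) of the opposite channel at one ear.

    h_cross: assumed transfer factor from the right-hand loudspeakers to the left ear.
    h_direct_opposite: assumed transfer factor from the right-hand loudspeakers
        to the right ear, which the right-hand equalizer inverts.
    """
    if h_direct_opposite == 0:
        raise ValueError("direct path must be non-zero to be equalized")
    ratio = abs(h_cross / h_direct_opposite)
    if ratio == 0:
        return float("-inf")
    return 20.0 * math.log10(ratio)

# Hypothetical values: the cross path has a quarter of the direct path amplitude.
level_db = residual_crosstalk_db(h_cross=0.25 + 0j, h_direct_opposite=1.0 + 0j)
print(f"residual cross-talk: {level_db:.1f} dB")
```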
Investigations [i.2] and [i.3] carried out in different rooms have shown that directional hearing and distance localization
by sound reproduction with this four-loudspeaker arrangement are comparable to those with sound reproduction by
headphones.
For equalization there are several strategies. The equalization can either be done for each loudspeaker individually or by
pairs left - right or by pairs front - rear.
Applications:
A practical application is the reproduction of binaural recordings in a typical office-type room, e.g. for listening
tests but also for objective tests. The investigations reported in [i.2] and [i.3] indicate that the subjective impression
provided by the arrangement corresponds to the headphone reproduction of binaural recordings with respect to the
perception of the sound colour, the distance perception and (with some limitations) the sound source
localization. The data provided indicate that the setup could be used for objective measurements of e.g.
telecommunication terminals.
This four-loudspeaker arrangement is also used in advanced driving simulators - which typically provide a visual
simulation of the driving situation in addition to the acoustical playback system.
Conclusion:
The advantage of this arrangement on the recording side is the compatibility with standard mono and stereo recordings.
It is therefore easily possible to play back either binaural recordings for subjective and objective experiments or
mono/stereo recordings in cases where a less realistic reproduction is sufficient. With slight modifications the
geometrical setup can be transferred to other environments, for example cars. Another benefit of this arrangement is
the moderate hardware effort.
Concerning the sound reproduction, a drawback of this arrangement is that the superimposed loudspeaker signals
interfere with one another, mostly in anechoic rooms. This effect obliges test subjects to keep their heads in a mostly fixed
position during hearing tests, and also means that an exact sound reproduction is only possible in a small area in
anechoic rooms. Another drawback for binaural reproduction is the fact that no exact cross-talk cancellation is possible
with this arrangement. In general, however, this technique seems to be the most promising under the restrictions given
(moderate hardware effort on the recording and especially on the reproduction side, close to original reproduction of the
scenarios recorded without additional adjustment of the reproduction arrangement).
4.6 NTT Background-Noise Database
The NTT Background Database [i.4], which is commercially available from NTT, is typically used for codec tests. The
database contains noise files made as 4-channel recordings using 4 directional microphones. The
microphones were arranged at an angle of 90 degrees to each other with a 70 cm diagonal. Although the original signals
were recorded
with 20 kHz bandwidth, the commercially available signals are specially coded on a CD providing a bandwidth of
11 kHz/15 bit for each channel. A special decoder is needed if the signals are to be presented acoustically over
loudspeakers. The loudspeaker arrangement is suggested in a condition list. For calibration a calibration tone is
provided; only level calibration is performed and no equalization procedure is described.
For the electrical evaluation of systems a downmix of the 4-channel recording to a mono channel with 8 kHz sampling
rate is available. This signal is mostly used for the evaluation of new speech codecs in order to evaluate the influence of
background noise on the coder performance.
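The kind of processing involved in such a downmix can be sketched as follows. The actual procedure used for the NTT database is not described here, so this is only a generic illustration: the channels are averaged, and the result is low-pass filtered and decimated. The 48 kHz input rate, the filter length and all function names are assumptions.

```python
import math

def downmix_to_mono(channels):
    """Average equally long channels sample by sample."""
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(length)]

def lowpass_taps(factor, num_taps=61):
    """Hamming-windowed sinc low-pass with cutoff at the new Nyquist frequency."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        sinc = 1.0 / factor if t == 0 else math.sin(math.pi * t / factor) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [tap / total for tap in taps]  # normalize to unity gain at DC

def decimate(signal, factor, num_taps=61):
    """Low-pass filter the signal and keep every factor-th output sample."""
    taps = lowpass_taps(factor, num_taps)
    out = []
    for start in range(0, len(signal), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = start - k
            if 0 <= idx < len(signal):
                acc += tap * signal[idx]
        out.append(acc)
    return out

# Hypothetical input: four constant channels at an assumed rate of 48 kHz.
channels = [[1.0] * 480 for _ in range(4)]
mono = downmix_to_mono(channels)
mono_8k = decimate(mono, factor=6)  # 48 kHz / 6 = 8 kHz
print(len(mono), len(mono_8k))
```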
Applications:
The NTT database is mostly used for the evaluation of speech coders using the signals downsampled to 8 kHz. When
the acoustical 4-channel playback is used, the main limitation is the 11 kHz bandwidth, which may
not be sufficient for future wideband applications.
Conclusion:
The disadvantages of this arrangement are found on both the recording and the playback side. For recording, a special
microphone arrangement and a special coding technique are needed. The signal on the reproduction side is band-limited
and may not be used in future wideband applications. Currently, it is not clear how the playback system can be
calibrated and equalized in order to achieve a close to original sound field and which procedure should be followed
during recording in order to get the right calibration and setup for the recording. Furthermore, the chosen microphone
arrangement seems to be impractical for recording in smaller enclosures, e.g. in a car.
Another drawback is that the background noise database, including a special decoder, needs to be purchased for each
application separately. It is not clear to what extent the recording and coding technique is commercially available.
4.7 General conclusions
Although a variety of reproduction techniques exists, the usability of such methods within the constraints of
laboratory use is limited, because laboratory use requires:
• easy and well described recording technique;
• easy to install and easy to use playback technique at reasonable costs;
• applicability of the playback technique in a variety of different rooms with different acoustical conditions.
In consequence, a four-channel loudspeaker setup with associated subwoofer based on binaurally recorded material is
selected as the basis for the ETSI background noise simulation arrangement.
5 Recording arrangement
5.1 Binaural recordings
The sound field simulation technique described in the present document is generally based on the binaural recording
and reproduction technique as has been known for many years (see [i.6]). The general principle of t
...


SLOVENIAN STANDARD
1 February 2024
Speech and multimedia Transmission Quality (STQ) - Speech quality performance in the
presence of background noise - Part 1: Background noise simulation technique and
background noise database
This Slovenian standard is identical to: ETSI ES 202 396-1 V1.9.1 (2023-05)
ICS:
33.040.35 Telephone networks
2003-01. Slovenian Institute for Standardization (Slovenski inštitut za standardizacijo). Reproduction of this standard in whole or in part is not permitted.

ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Speech quality performance
in the presence of background noise;
Part 1: Background noise simulation technique
and background noise database
2 ETSI ES 202 396-1 V1.9.1 (2023-05)

Reference
RES/STQ-308
Keywords
noise, performance, quality, speech
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - APE 7112B
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° w061004871

Important notice
The present document can be downloaded from:
https://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
If you find a security vulnerability in the present document, please report it through our
Coordinated Vulnerability Disclosure Program:
https://www.etsi.org/standards/coordinated-vulnerability-disclosure
Notice of disclaimer & limitation of liability
The information provided in the present deliverable is directed solely to professionals who have the appropriate degree of
experience to understand and interpret its content in accordance with generally accepted engineering or
other professional standard and applicable regulations.
No recommendation as to products and services or vendors is made or should be implied.
No representation or warranty is made that this deliverable is technically accurate or sufficient or conforms to any law
and/or governmental rule and/or regulation and further, no representation or warranty is made of merchantability or fitness
for any particular purpose or against infringement of intellectual property rights.
In no event shall ETSI be held liable for loss of profits or any other incidental or consequential damages.

Any software contained in this deliverable is provided "AS IS" with no warranties, express or implied, including but not
limited to, the warranties of merchantability, fitness for a particular purpose and non-infringement of intellectual property
rights and ETSI shall not be held liable in any event for any damages whatsoever (including, without limitation, damages
for loss of profits, business interruption, loss of information, or any other pecuniary loss) arising out of or related to the use
of or inability to use the software.
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2023.
All rights reserved.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definition of terms, symbols and abbreviations
3.1 Terms
3.2 Symbols
3.3 Abbreviations
4 Overview of existing methods for realistic sound reproduction
4.1 Introduction
4.2 Surround Sound Techniques
4.3 IOSONO®
4.4 Eidophonie
4.5 Four-loudspeaker arrangement for playback of binaurally recorded signals
4.6 NTT Background-Noise Database
4.7 General conclusions
5 Recording arrangement
5.1 Binaural recordings
5.2 Equalization procedure
6 Loudspeaker Setup for Background Noise Simulation
6.1 Test Room Requirements
6.2 Loudspeaker Positioning
6.3 Equalization and Calibration
6.3.1 Overview
6.3.2 Separate equalization for each of the four loudspeakers
6.3.3 Separate level adjustment for each loudspeaker
6.3.4 Equalization for the two left-hand and the two right-hand loudspeakers
6.3.5 Equalization and level adjustment for the subwoofer
6.3.6 Delay compensation
6.3.7 Overall equalization for all loudspeakers
6.3.8 Troubleshooting failed equalizations
6.3.9 Automated equalization and calibration procedure
6.4 Accuracy of the reproduction arrangement
6.4.0 Introduction
6.4.1 Comparison between original sound field and simulated sound field
6.4.2 Displacement of the test arrangement in the simulated sound field
6.4.3 Transmission of background noise: Comparison of terminal performance in the original sound field and the simulated sound field
6.5 Simulation of additional acoustic conditions
7 Background Noise Simulation in cars
7.1 General setup
7.2 Recording arrangement
7.2.0 Introduction
7.2.1 Recording setup with the terminal's microphone
7.2.2 Recording setup with a pair of cardioid microphones
7.3 Equalization and calibration with the terminal's microphone
7.3.1 Overview
7.3.2 Separate equalization for each of the four loudspeakers
7.3.3 Separate level adjustment for each loudspeaker
7.3.4 Equalization for the two front and the two rear loudspeakers
7.3.5 Equalization and level adjustment for the subwoofer
7.3.6 Delay adjustment
7.3.7 Overall equalization
7.3.8 Troubleshooting failed equalizations
7.4 Equalization and Calibration with a pair of cardioid microphones
7.4.1 Overview
7.4.2 Pair-wise equalization for the left-hand loudspeakers
7.4.3 Separate level adjustment for the left-hand loudspeakers
7.4.4 Pair-wise equalization and separate level adjustment for the right-hand loudspeakers
7.4.5 Equalization and level adjustment for the subwoofer
7.4.6 Delay compensation
7.4.7 Overall equalization
7.4.8 Troubleshooting failed equalizations
7.5 Accuracy of the reproduction arrangement
7.5.1 Comparison between original sound field and simulated sound field
7.5.2 Transmission of background noise: Comparison of terminal performance in the original sound field and the simulated sound field
7.6 Automated equalization and calibration procedure
8 Background Noise Database
8.0 Introduction
8.1 Binaural signals
8.2 Binaural signals identical to the background noise recordings provided in ETSI TS 103 224
8.3 Stereophonic signals
Annex A (informative): Comparison of Tests in Sending Direction and D-Values Conducted in Different Rooms
A.0 Introduction
A.1 Test Setup
A.2 Results of the Tests
A.2.0 Introduction
A.2.1 Sending Frequency Response Characteristics and SLR
A.2.2 D-Value with Pink Noise
A.2.3 D-Value with Cafeteria Noise
A.3 Conclusions
Annex B (informative): Graphs of test results in Annex A
History

Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The declarations
pertaining to these essential IPRs, if any, are publicly available for ETSI members and non-members, and can be
found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to
ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the
ETSI Web server (https://ipr.etsi.org/).
Pursuant to the ETSI Directives including the ETSI IPR Policy, no investigation regarding the essentiality of IPRs,
including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not
referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become,
essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its
Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP
Organizational Partners. oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and of the
oneM2M Partners. GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
Foreword
This ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia Transmission
Quality (STQ).
The present document is part 1 of a multi-part deliverable covering Speech and multimedia Transmission Quality
(STQ); Speech quality performance in the presence of background noise, as identified below:
ETSI ES 202 396-1: "Background noise simulation technique and background noise database";
ETSI EG 202 396-2: "Background noise transmission - Network simulation - Subjective test database and results";
ETSI EG 202 396-3: "Background noise transmission - Objective test methods".
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Background noise is present in most conversations today. Background noise may significantly impact the speech
communication performance of terminal and network equipment. Therefore testing and optimization of
such equipment using realistic background noises is necessary. Furthermore, reproducible test conditions are
required, which can be guaranteed only under lab-type conditions.
The present document addresses this issue by describing a methodology for recording and playback of background
noises under well-defined and calibratable conditions in a lab-type environment. Furthermore a database with real
background noises is included.
1 Scope
The quality of background noise transmission is an important factor, which significantly contributes to the perceived
overall quality of speech. Both existing and, even more notably, the new generation of terminals, networks and system
configurations, including broadband services, can be greatly improved when designed properly, taking the
presence of background noise into consideration. The present document:
• describes a noise simulation environment using realistic background noise scenarios for laboratory use;
• contains a database including the relevant background noise samples for subjective and objective evaluation.
The present document provides information about the recording techniques needed for background noise recordings and
discusses the advantages and drawbacks of existing methods. Additionally, the present document describes the
requirements for laboratory conditions. The loudspeaker setup and the loudspeaker calibration and equalization
procedure are described. The simulation environment specified can be used for the evaluation and optimization of
terminals and of complex configurations including terminals, networks and other configurations. The main application
areas should be: office, home and car environment.
The setup and database as described in the present document are applicable for:
• Objective performance evaluation of terminals in different (simulated) background noise environments.
• Speech processing evaluation by using the pre-processed speech signal in the presence of background noise,
recorded by a terminal.
• Subjective evaluation of terminals by performing conversational tests, specific double talk tests or talking and
listening tests in the presence of background noise.
• Subjective evaluation in third party listening tests by recording the speech samples of terminals in the presence
of background noise.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
https://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] Recommendation ITU-T P.57: "Artificial ears".
[2] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[3] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] Surround Sound Past, Present, and Future: "A history of multichannel audio from mag stripe to
Dolby Digital", Joseph Hull - Dolby Laboratories Inc.
[i.2] AES preprint 3332 (1992): "Improved Possibilities of Binaural Recording and Playback
Techniques", K. Genuit, H.W. Gierlich; U. Künzli.
[i.3] AES preprint 3732 (1993): "A System for the Reproduction Technique for Playback of Binaural
Recordings", N. Xiang, K. Genuit, H.W. Gierlich.
[i.4] NTTAT Database: "Ambient Noise Database CD-ROM".
[i.5] ISO 11904-1: "Acoustics - Determination of sound immission from sound sources placed close to
the ear - Part 1: Technique using a microphone in a real ear (MIRE technique)".
[i.6] J. Blauert: "The psychophysics of human sound localization", Spatial Hearing.
[i.7] Void.
[i.8] Void.
[i.9] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[i.10] Recommendation ITU-T P.64: "Determination of sensitivity/frequency characteristics of local
telephone systems".
[i.11] Recommendation ITU-T G.722: "7 kHz audio-coding within 64 kbit/s".
[i.12] Genuit, K.: "A Description of the Human Outer Ear Transfer Function by Elements of
Communication Theory (No. B6-8)".
NOTE: Proceedings of the 12th International Congress on Acoustics. Toronto published on behalf of the
Technical Program Committee by the Executive Committee of the 12th International Congress on
Acoustics.
[i.13] IEC 60050-722: "International Electrotechnical Vocabulary - Chapter 722: Telephony".
[i.14] "Wellenfeldsynthese - Eine neue Dimension der 3D-Audiowiedergabe"; Fernseh- und
Kino-Technik, Nr. 11/2002, pp. 735-738.
[i.15] N.Lee: "IOSONO" Computers in Entertainment, volume 2, issue 3 (2004).
[i.16] P. Scherer: "Ein neues Verfahren der raumbezogenen Stereophonie mit verbesserter Übertragung
der Rauminformation"; Rundfunktechnische Mitteilungen, 1977, pp. 196-204.
[i.17] Void.
[i.18] ETSI TS 151 010-1: "Digital cellular telecommunications system (Phase 2+) (GSM); Mobile
Station (MS) conformance specification; Part 1: Conformance specification (3GPP TS 51.010-1)".
[i.19] Void.
[i.20] ETSI TR 126 921: "Universal Mobile Telecommunications System (UMTS); LTE; 5G;
Investigations on ambient noise reproduction systems for acoustic testing of terminals (3GPP
TR 26.921)".
3 Definition of terms, symbols and abbreviations
3.1 Terms
For the purposes of the present document, the following terms apply:
cross-talk: appearance of undesired energy in a channel, owing to the presence of a signal in another channel, caused
by, for example, induction, conduction or non-linearity
NOTE: See IEC 60050-722 [i.13].
3.2 Symbols
Void.
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
CD Compact Disc
DF Diffuse Field
EQ Equalization
FF Free Field
FFT Fast Fourier Transform
FIR Finite Impulse Response
HATS Head And Torso Simulator
ID Independent of Direction
IIR Infinite Impulse Response
MIRE Microphone In Real Ear
MRP Mouth Reference Point
NTT Nippon Telegraph and Telephone corporation
SLR Send Loudness Rating
VHF Very High Frequency
4 Overview of existing methods for realistic sound
reproduction
4.1 Introduction
In general, the existing methods for close to original sound recording and reproduction are aimed at different applications:
• Techniques intending to reproduce the actual sound field.
• Techniques providing hearing adequate (ear related) signals in the human ear canal.
• Techniques generating artificial acoustical environments.
Within this clause the different methods are briefly described and their applicability for close to original sound-field
reproduction is discussed. A variety of methods have been studied; a summary of the most important
ones relevant to the present document is given in the following. The different methods were analysed on the basis of the
following requirements:
• The background noise recording technique should be:
- easy to use;
- easy to calibrate;
- capable of wideband recording;
- available at reasonable costs;
- mostly compatible with existing standards and procedures used in telecommunications testing;
- applicable to different environments (at least office, home and car).
• The background noise simulation arrangement should:
- be easy to setup;
- not require any specific acoustical treatment for the simulation requirement;
- provide a mostly realistic background noise simulation for all typical background noises encountered in
telecommunication applications;
- be easy to calibrate;
- be mostly insensitive to the positioning of (test)-objects in the simulated sound field;
- be applicable to all typical terminals used in telecommunication;
- be available at reasonable costs.
4.2 Surround Sound Techniques
The basics of surround techniques are found in cinema applications. The virtual image provided by stereophonic
presentation of sounds seemed not to be sufficient for the large screen display in cinema. In the 1950s, 4-channel and
6-channel sound tracks recorded on magnetic stripes associated with the films were developed, and 4-channel and 6-channel
loudspeaker systems were installed in cinemas to reproduce the multichannel sounds. The newer techniques were
mostly developed and marketed by Dolby® [i.1]: Dolby Surround, Dolby Surround Pro Logic, Dolby Digital and Dolby
Digital Surround are examples of the techniques introduced more recently. The most common configuration is the
"5.1-configuration", used in cinemas but in home applications as well. The reproduction system consists of left and right
channels, a centre speaker, two surround channels (left and right, arranged behind the listener) and a low
frequency channel for low frequency effects.
The aim of all surround systems is to create an artificial acoustical image in the recording studio rather than to record a
real acoustical scenario and provide true to original playback.
On the recording side, special surround encoders are used that allow the 5-channel signal from a special
mixing console to be encoded into the 5.1 digital data stream. The playback system consists of a special decoder that
separates the 5 channels again and distributes them to the 5.1 loudspeaker playback system. The systems are mono and stereo
compatible and can handle the older 4-channel surround techniques by means of a specific decoder.
Applications:
Typical applications for surround systems are cinemas and home theatres. The source material is produced by
professional recording studios using multi-channel mixing consoles and specific 5.1 decoding techniques. In almost all
cases virtual environments are created which support the visual image by an appropriate acoustical image.
Conclusion:
Surround techniques are designed for creating acoustical images rather than for close to original recording and
reproduction. Although the spatial impression provided by surround techniques is sometimes remarkable, the acoustical
image created is always artificial. Due to the lack of easy to use recording techniques allowing a spatial recording of a
sound field, surround sound techniques are not suitable for the creation of a background noise database with realistic
background noises or for calibrated background noise simulation in a lab.
4.3 IOSONO®
The IOSONO® sound system (see [i.14], [i.15] and [i.16]) is based on Wave-Field Synthesis, which employs Huygens'
principle of wave theory. Applied to acoustics, this principle means that it is possible to reproduce any form of wave
front with an array of loudspeakers, so that virtual sound sources can be placed anywhere within a listening area. For
practical use it is necessary to position loudspeakers all around the playback room. In order to generate realistic sound
fields, the input signal for each loudspeaker has to be calculated separately. For this purpose each single sound source
(e.g. voices) has to be recorded individually. If the recordings are made in a room, the characteristics (like reverberation)
of the recording room also have to be recorded separately. All resulting sound tracks are then mixed and manipulated
during the post-editing process and the reproduction.
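The need to calculate each loudspeaker signal separately can be illustrated with a strongly simplified model in which every loudspeaker replays the source signal with a delay and an attenuation derived from its distance to the virtual source. This is a sketch of the underlying idea only, not the driving function of Wave-Field Synthesis or of the IOSONO® system; the speed of sound, the geometry and the function name are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees Celsius

def driving_parameters(source_xy, speaker_positions):
    """Return one (delay in seconds, gain) pair per loudspeaker.

    Delays are relative to the loudspeaker nearest to the virtual source and
    gains follow a 1/distance law normalized to that loudspeaker.
    """
    distances = [math.dist(source_xy, pos) for pos in speaker_positions]
    nearest = min(distances)
    if nearest == 0.0:
        raise ValueError("virtual source must not coincide with a loudspeaker")
    return [((d - nearest) / SPEED_OF_SOUND_M_S, nearest / d) for d in distances]

# Hypothetical linear array of three loudspeakers, virtual source 1 m behind it.
speakers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
params = driving_parameters((0.0, -1.0), speakers)
for (delay_s, gain), pos in zip(params, speakers):
    print(f"speaker at {pos}: delay {delay_s * 1000:.2f} ms, gain {gain:.2f}")
```

Even in this reduced form, every additional virtual source adds one delayed and scaled copy per loudspeaker, which indicates why the computing effort grows with the number of sound tracks and loudspeakers.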
The natural and realistic spatial sound reproduction is then achieved in a wide area of the playback room. Common 5.1
stereo systems achieve a "realistic" sound reproduction only in a small area of the reproduction room.
Applications:
Typical applications are sound systems for home use, cinemas and other entertainment events. The IOSONO® sound
system is also able to play back recordings made in common stereo or 5.1 stereo techniques.
Conclusion:
The drawbacks of this method are the components needed: a sophisticated recording system, a powerful computing unit
for real-time mixing of the large number of recorded sound tracks, and the number of loudspeakers that have to be installed
in the listening room. In a common size cinema, for example, about 200 loudspeakers are needed.
The advantage is that with the IOSONO® sound system a very realistic sound reproduction is possible, but it requires an
enormous effort, which is too high for daily use in laboratories.
4.4 Eidophonie
This method was developed for realistic sound reproduction using the VHF transmission technique. The main principle
is to separate the base signal from the part of the signal that contains the information about the direction of sound
incidence.
For recording, a 1st order gradient microphone with a cardioid directivity is used. During the recordings its directivity
rotates at 38 kHz in the recording plane. This "turning microphone" provides an amplitude-modulated signal at its
electrical output. The resulting side bands lie outside the transmitted frequency range, but they contain the
information on the direction of sound incidence. Using the VHF transmission techniques this phase information can be
transmitted within the 2nd audio-frequency channel.
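The way a rotating directivity encodes direction in the phase of the modulation can be sketched numerically. The example below is an illustration under stated assumptions, not the Eidophonie coding technique itself: it assumes an ideal, frequency-independent cardioid, a single source of constant amplitude, and a sampling rate chosen only for convenience. The function names are hypothetical.

```python
import math

F_ROT = 38_000.0   # rotation frequency of the directivity (from the text)
FS = 16 * F_ROT    # sampling rate, assumed for this sketch only

def rotating_cardioid(theta_src, n_samples, amplitude=1.0):
    """Output of an ideal cardioid whose main axis rotates at F_ROT.

    The source has constant amplitude and arrives from angle theta_src
    (radians). The cardioid gain is 0.5 * (1 + cos(axis - theta_src)).
    """
    out = []
    for n in range(n_samples):
        axis = 2.0 * math.pi * F_ROT * n / FS
        out.append(amplitude * 0.5 * (1.0 + math.cos(axis - theta_src)))
    return out

def estimate_direction(samples):
    """Recover the angle of incidence from the phase of the F_ROT component."""
    i_sum = q_sum = 0.0
    for n, y in enumerate(samples):
        phase = 2.0 * math.pi * F_ROT * n / FS
        i_sum += y * math.cos(phase)
        q_sum += y * math.sin(phase)
    return math.atan2(q_sum, i_sum)

# Source at 60 degrees, observed over 100 full rotations of the directivity
signal = rotating_cardioid(math.radians(60.0), n_samples=1600)
estimated_deg = math.degrees(estimate_direction(signal))
```

In this idealized model the microphone output is the source signal plus an amplitude modulation at 38 kHz whose phase equals the angle of incidence, which is why the side bands carry the directional information.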
The sound reproduction is made by a spatial demodulation: a switch is placed before each loudspeaker, and each
switch operates synchronously with the turning directivity. A low-pass filtered short-term section of the signal
containing the information of the direction of sound incidence is thus played back on each loudspeaker. The
loudspeakers are positioned all around the playback room.
Applications:
Eidophonie was developed to provide a realistic sound environment using a signal received from a VHF broadcast
station. The technique was intended to improve on common stereo sound reproduction. Eidophonie is nevertheless
also compatible with common mono and stereo recordings.
Conclusion:
A benefit of this system is that three loudspeakers are sufficient to produce a realistic sound field. With more
loudspeakers (e.g. 16) the spatial sound reproduction becomes increasingly independent of the listening position.
Moreover, the independence of the transmitted sound from the acoustics of the reproduction room increases with the
number of loudspeakers used. There are, however, significant limitations of the method. The microphone directivity is
frequency dependent and not ideal, so interference between the different channels is created. A second
problem is the loudspeaker directivity, which does not match the microphone directivity. This problem could be reduced
if the number of channels were increased. This, however, is not possible due to the limited directivity of the
microphone arrangement used.
Localization of sound sources is hardly possible due to the interference effects of the microphone signals and the
loudspeakers. A close-to-original reproduction depends on the number and distribution of sound sources present. For
most sound source combinations this goal cannot be achieved.
In general, the coding technique needed to record the sound field with a "turning microphone" is complicated and not
available commercially. A further drawback of this method is the complicated decoding technique needed on the
reproduction side, which is also not commercially available.
4.5 Four-loudspeaker arrangement for playback of binaurally recorded signals
This reproduction procedure was originally investigated to reproduce signals recorded binaurally using artificial head
technology. It improves the impressions of direction and distance. Four loudspeakers are typically positioned in a
square formation, equidistant from a central point (the listening point), e.g. at 2 m. The binaural recordings are
reproduced as follows: the two left-hand loudspeakers receive the same free-field equalized artificial head signal of
the left-hand channel only. The right-hand side is arranged similarly. For equalization, the transfer function from the
two left-hand loudspeakers is measured at the artificial head's left ear channel. With this result, several IIR and FIR
filters are designed, with which the input signal of the left-hand loudspeakers is filtered during playback in such a
way that the transfer function then measured at the artificial head's left-hand channel is spectrally flat. The
equalization for the right-hand loudspeakers is done similarly.
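The idea of filtering the loudspeaker input so that the response measured at the ear becomes spectrally flat can be sketched per frequency band. The example is a simplification under stated assumptions: it works on band levels in dB rather than designing the IIR and FIR filters mentioned above, it takes the mean band level as the target, and it limits boost and cut to hypothetical values. The function name and the measured levels are invented for illustration.

```python
def equalization_gains_db(measured_db, max_boost_db=12.0, max_cut_db=12.0):
    """Per-band gains (dB) that flatten a measured magnitude response.

    measured_db: level per frequency band, measured at one ear of the
    artificial head for one loudspeaker pair. The target is the mean level,
    so only the spectral shape is corrected, not the overall level.
    """
    target = sum(measured_db) / len(measured_db)
    gains = []
    for level in measured_db:
        gain = target - level
        # Limit the correction so deep notches are not boosted without bound
        gains.append(max(-max_cut_db, min(max_boost_db, gain)))
    return gains

# Hypothetical band levels (dB) at the left ear for the left-hand loudspeaker pair
measured_left = [-3.0, -1.0, 0.0, 2.0, 4.0, -2.0]
gains_left = equalization_gains_db(measured_left)
equalized_left = [m + g for m, g in zip(measured_left, gains_left)]
```

With the values above every band ends up at the same level after equalization. The same calculation would be repeated with the right-ear measurement for the right-hand loudspeaker pair.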
The equalization procedure does not take into account the correction of cross-talk. This means that the left-hand
channel of the artificial head is only equalized for the left-hand loudspeakers, but during the reproduction this
left-hand channel will also receive a signal from the right-hand loudspeakers. Despite this simplification, the
equalization procedure provides a realistic binaural listening impression.
Investigations [i.2] and [i.3] carried out in different rooms have shown that directional hearing and distance localization
by sound reproduction with this four-loudspeaker arrangement are comparable to those with sound reproduction by
headphones.
There are several strategies for equalization: it can be done for each loudspeaker individually, by left-right pairs,
or by front-rear pairs.
Applications:
A practical application is the sound reproduction of binaural recordings in a typical office-type room, e.g. for
listening tests, but also for objective tests. The investigations shown in [i.2] and [i.3] indicate that the subjective
impression provided by the arrangement corresponds to the headphone reproduction of binaural recordings with respect
to the perception of the sound colour, the distance perception and (with some limitations) the sound source
localization. The data provided indicate that the setup could be used for objective measurements of, e.g.,
telecommunication terminals.
This four-loudspeaker arrangement is also used in advanced driving simulators - which typically provide a visual
simulation of the driving situation in addition to the acoustical playback system.
Conclusion:
The advantage of this arrangement on the recording side is the compatibility with standard mono and stereo recordings.
It is therefore easily possible to play back either binaural recordings for subjective and objective experiments or
mono/stereo recordings in cases where a less realistic reproduction is sufficient. With slight modifications the
geometrical setup can be transferred to other environments, for example cars. Another benefit of this arrangement is
the moderate hardware effort.
Concerning sound reproduction, a drawback of this arrangement is that interference appears due to the superimposed
loudspeaker signals, mostly in anechoic rooms. This effect obliges test subjects to keep their heads in a mostly fixed
position during hearing tests, and also means that an exact sound reproduction is only possible in a small area in
anechoic rooms. Another drawback for binaural reproduction is the fact that no exact cross-talk cancellation is possible
with this arrangement. In general, however, this technique seems to be the most promising under the given restrictions
(moderate hardware effort on the recording and especially on the reproduction side, close-to-original reproduction of
the scenarios recorded without additional adjustment of the reproduction arrangement).
4.6 NTT Background-Noise Database
The NTT Background-Noise Database [i.4], which is commercially available from NTT, is typically used for codec tests.
The database contains noise files that were recorded as 4-channel recordings using 4 directional microphones. The
microphones were arranged at an angle of 90 degrees with a 70 cm diagonal. Although the original signals were recorded
with 20 kHz bandwidth, the signals commercially available are specially coded on a CD providing a bandwidth of
11 kHz/15 bit for each channel. A special decoder is needed if the signals are to be presented acoustically over
loudspeakers. The loudspeaker arrangement is suggested in a condition list. A calibration tone is provided for
calibration; only level calibration is performed, and no equalization procedure is described.
For the electrical evaluation of systems a downmix of the 4-channel recording to a mono channel with 8 kHz sampling
rate is available. This signal is mostly used for the evaluation of new speech codecs in order to evaluate the influence of
background noise on the coder performance.
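A downmix of this kind can be sketched as follows. The source does not state how the NTT downmix was produced, so everything here is an assumption for illustration: equal channel weights, an integer decimation factor, a hypothetical 32 kHz input rate, and block averaging as a crude stand-in for a proper anti-aliasing filter.

```python
def downmix_to_mono(channels):
    """Equal-weight average of the channels, sample by sample (assumed weights)."""
    return [sum(frame) / len(frame) for frame in zip(*channels)]

def decimate(samples, factor):
    """Crude decimation: average each block of `factor` samples.

    The block average acts as a simple low-pass filter; a real resampler
    would use a properly designed anti-aliasing filter.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]

# Hypothetical input: four channels at 32 kHz, reduced to 8 kHz (factor 4)
four_channels = [
    [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0],
    [0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0],
    [4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0],
]
mono = downmix_to_mono(four_channels)
mono_8k = decimate(mono, factor=4)
```

A mono signal obtained this way no longer carries any spatial information, which is why it is suited to electrical codec evaluation rather than to acoustical playback.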
Applications:
The NTT database is mostly used for the evaluation of speech coders using the 8 kHz downsampled signals. When
using the acoustical 4-channel playback, the limitation is mostly due to the bandwidth limitation of 11 kHz, which may
not be sufficient for future wideband applications.
Conclusion:
The disadvantages of this arrangement are found on both the recording and the playback side. For recording, a special
microphone arrangement and a special coding technique are needed. The signal on the reproduction side is band-limited
and may not be usable in future wideband applications. Currently, it is not clear how the playback system can be
calibrated and equalized in order to achieve a close-to-original sound field, or which procedure should be followed
during recording in order to get the right calibration and setup for the recording. Furthermore, the chosen microphone
arrangement seems to be impr
...
