ETSI TS 103 224 V1.5.1 (2020-03)
Speech and multimedia Transmission Quality (STQ); A sound field reproduction method for terminal testing including a background noise database
RTS/STQ-289
TECHNICAL SPECIFICATION
Speech and multimedia Transmission Quality (STQ);
A sound field reproduction method for terminal testing
including a background noise database
Reference
RTS/STQ-289
Keywords
noise, quality, speech, terminal
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88
Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.
© ETSI 2020.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and
of the oneM2M Partners. ®
GSM and the GSM logo are trademarks registered and owned by the GSM Association.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definition of terms, symbols and abbreviations
3.1 Terms
3.2 Symbols
3.3 Abbreviations
4 Methods for realistic sound reproduction
5 Recording arrangement
5.0 General
5.1 Microphone array setup
5.1.1 Principle limitations
5.1.2 Microphone calibration
5.2 Microphone array setup for handset-type and headset terminals
5.3 Microphone array setup for hands-free terminals
5.4 Microphone array setup for binaural applications
6 Loudspeaker setup for background noise simulation
6.0 General setup
6.1 Test room requirements
6.2 Equalization and calibration
6.2.0 Overview of the equalization procedure
6.2.1 Separate level adjustment for each loudspeaker
6.2.2 System identification
6.2.3 Pre-processing of the impulse responses
6.2.4 Calculation of the inversion filters
6.2.4.0 Overview
6.2.4.1 Inversion procedure
6.2.4.2 Different microphones for different frequency bands
6.2.4.3 Search for the optimum regularization factor
6.2.4.3.0 Introduction
6.2.4.3.1 Basic methodology to find the optimum regularization factor
6.2.4.3.2 Extended methodology to find the optimum regularization factor for frequencies above 2 kHz
6.2.5 First test of equalization and filter adjustment for inversion error compensation
6.2.6 Accuracy of the equalization
6.3 Accuracy of the reproduction arrangement
6.3.0 Introduction
6.3.1 Comparison between original sound field and simulated sound field
6.3.2 Impact of handset positioner and phone on the simulated sound field
6.3.3 Comparison of terminal performance in the original sound field and the simulated sound field
6.3.3.1 Introduction
6.3.3.2 Background noise transmission
6.3.3.2.0 Validation Procedure
6.3.3.2.1 Handset
6.3.3.2.2 Handheld Hands-free
6.3.3.2.3 Desktop Hands-Free
6.3.3.3 S-/N-/G-MOS Analysis according to ETSI TS 103 106
6.3.3.3.1 Handset
6.3.3.3.2 Hands-free
7 Generalization of the method for a more flexible loudspeaker and microphone arrangement
7.0 Introduction
7.1 Loudspeaker configuration
7.2 Microphone setup
7.3 Background noise recordings and reference noise
7.4 Equalization and calibration
7.5 Accuracy of the equalization
7.6 Example use case: equalization inside a vehicle
7.6.0 Introduction
7.6.1 Loudspeaker configuration
7.6.2 Microphone setup
7.6.3 Equalization
8 Background noise database
8.0 Introduction
8.1 Reference noise recording
8.2 Background noise signals for terminal testing
8.3 Background noise signals for binaural applications
8.4 Background noise signals in a home-like test environment
Annex A (informative): Home-like test environment
History
Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
Foreword
This Technical Specification (TS) has been produced by ETSI Technical Committee Speech and multimedia
Transmission Quality (STQ).
The present document describes a sound field recording and reproduction technique which can be applied for all types
of terminals but is especially suitable for modern multi-microphone terminals including array techniques. The present
document provides an additional simulation technique which can be used instead of the part 1 of ETSI multi-part
deliverable ES/EG 202 396 "Speech quality performance in the presence of background noise", as identified below:
• ETSI ES 202 396-1: "Background noise simulation technique and background noise database" [i.7];
• ETSI EG 202 396-2: "Background noise transmission - Network simulation - Subjective test database and
results" [i.8];
• ETSI EG 202 396-3: "Background noise transmission - Objective test methods" [i.9].
The background noise simulation can be used in conjunction with the objective test methods as described in ETSI
EG 202 396-3 [i.9] and ETSI TS 103 106 [i.10].
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Background noise is present in most conversations today and may significantly impact the speech
communication performance of terminal and network equipment. Testing and optimizing such equipment with
realistic background noises is therefore necessary. Furthermore, the tests require reproducible conditions,
which can be guaranteed only under laboratory-type conditions. Since modern terminals incorporate more advanced
noise cancellation techniques, such as multi-microphone based noise cancellation, microphone-array
recording techniques and more realistic noise field simulations (compared to the method described in ETSI
ES 202 396-1 [i.7]) are required.
The present document addresses this topic by specifying a methodology for recording and playback of realistic
background noise fields under well-defined conditions that can be calibrated in a laboratory-type environment.
Furthermore, a database of real background noises is included.
1 Scope
The quality of background noise transmission is an important factor that contributes significantly to the perceived
overall quality of speech. Terminals, networks, and system configurations including wideband, superwideband, and
fullband speech services can be greatly improved by properly designing terminals and systems for operation in the
presence of background noise. The present document:
• describes a sound field simulation technique that allows the real environment to be simulated in the
laboratory using realistic background noise scenarios;
• contains a database including relevant background noise samples for subjective and objective evaluation.
The present document describes the recording technique used for the sound field simulation, the loudspeaker setup, and
the loudspeaker calibration and equalization procedures. Furthermore the present document specifies the test room
requirements for laboratory conditions.
The simulation environment specified can be used for the evaluation and optimization of terminals and of complex
configurations including terminals, networks and others. The main application areas are: outdoor, office, home and car
environment.
The setup and database as described in the present document are applicable for:
• Objective performance evaluation of terminals in different (simulated) background noise environments.
• Speech processing evaluation by using the pre-processed speech signals in the presence of background noise,
recorded by a terminal.
• Subjective evaluation of terminals by performing conversational tests, specific double talk tests, or talking and
listening tests in the presence of background noise.
• Subjective evaluation in third party listening tests by recording the speech samples of terminals in the presence
of background noise.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
https://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
Not applicable.
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] Berkhout A. J., de Vries D., Vogel P.: "Acoustic control by wave field synthesis", J. Acoust.
Soc. Am., p. 2764-2778, May 1993.
[i.2] Gerzon M. A.: "Periphony: With-Height Sound Reproduction", Journal of the Audio Engineering
Society 21, 1973.
[i.3] Ward D. B., Abhayapala T. D.: "Reproduction of a Plane-Wave Sound Field Using an Array of
Loudspeakers", IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 6, p. 697-707,
September 2001.
[i.4] Kirkeby O., Nelson P. A., Orduna-Bustamante F., Hamada H.: "Local sound field reproduction
using digital signal processing", J. Acoust. Soc. Am. 100(3), p. 1584-1593, September 1996.
[i.5] Kirkeby O., Nelson P. A., Hamada H., Orduna-Bustamante F.: "Fast Deconvolution of
Multichannel Systems Using Regularization", IEEE Transactions on Speech and Audio Processing,
Vol. 6, No. 2, p. 189-195, March 1998.
[i.6] Recommendation ITU-T P.58: "Head and Torso Simulator for Telephonometry".
[i.7] ETSI ES 202 396-1: "Speech and multimedia Transmission Quality (STQ); Speech quality
performance in the presence of background noise; Part 1: Background noise simulation technique
and background noise database".
[i.8] ETSI EG 202 396-2: "Speech Processing, Transmission and Quality Aspects (STQ); Speech
quality performance in the presence of background noise; Part 2: Background noise transmission -
Network simulation - Subjective test database and results".
[i.9] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality
performance in the presence of background noise; Part 3: Background noise transmission -
Objective test methods".
[i.10] ETSI TS 103 106: "Speech and multimedia Transmission Quality (STQ); Speech quality
performance in the presence of background noise: Background noise transmission for mobile
terminals-objective test methods".
[i.11] ISO 3382-1: "Measurement of room acoustic parameters -- Part 1: Performance spaces".
3 Definition of terms, symbols and abbreviations
3.1 Terms
Void.
3.2 Symbols
For the purposes of the present document, the following symbols apply:
c Sound velocity
C Matrix of FFT coefficients of Compensation Filters
H Matrix of FFT coefficients of Impulse Responses
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
DUT Device Under Test
EEP Ear canal Entrance Point
FFT Fast Fourier Transform
HATS Head And Torso Simulator
IR Impulse Response
MLS Maximum Length Sequence
MOS Mean Opinion Score
SNR Signal to Noise Ratio
SPL Sound Pressure Level
4 Methods for realistic sound reproduction
A variety of methods exists for reproducing real-world sound fields; two of them are wave field
synthesis [i.1] and Ambisonics [i.2]. Both methods, however, require a large number of microphones and loudspeakers
to achieve a sound field reproduction which is sufficiently good for testing purposes. A wave field synthesis setup is
so complex and expensive that it is not practical for laboratory purposes. Ambisonics, for example, has to be
performed using 43 microphones and 43 loudspeakers to reach a good sound field reproduction up to 2 kHz in a sweet
spot with a radius of 15 cm (using the rule of thumb in [i.3]). Furthermore, it cannot take individual room
characteristics or insufficiencies into account, because it is designed only for rooms offering pure free-field
conditions. If, e.g. for testing purposes, a HATS is positioned in the artificial noise field, the reproduction quality
is reduced by an unknown amount. In summary, the Ambisonics approach is, by design, not feasible for the intended
testing scenario.
The present document introduces an alternative least mean squares method [i.4], which requires eight recording
channels and eight loudspeakers in order to achieve reasonably good reproduction results. The method is based on eight
sweet spots at important testing positions e.g. near the HATS, mainly at the microphone positions of modern phones.
A reasonable reproduction of the recorded sound field at the corresponding eight points in the reproduction situation
also yields good reproduction accuracy in between these points. This well-known property of sound fields is limited to
an upper cut-off frequency which depends on the distances between the recording microphones (see clause 5.1.1).
In clause 5, the recording technique required for this new method is described, while the setup allowing the
reproduction in laboratories and the different steps of the equalization procedure are introduced in clause 6.
5 Recording arrangement
5.0 General
The sound field recording technique (multi-point sound field recording technique) is based on optimizing the sound
field reproduction at different points in space. The optimization criterion is the minimization of the reproduction
error at each microphone position. Based on this principle, the microphone locations, and as a consequence the points
in space for which the sound field reproduction is most accurate, can be chosen within a wide range. The advantage of
the method is that these locations can be adapted to the type of device to be tested. For example, if the Device Under
Test (DUT) incorporates a microphone array, the multi-point sound field recording microphones can be positioned in the
area of the microphones of the DUT. If a hands-free device is to be tested, the multi-point sound field recording
microphones are positioned in the area of the hands-free device.
The setup described in detail in clause 5 is optimized for the testing of handset or headset terminals using HATS
according to Recommendation ITU-T P.58 [i.6] and for hands-free testing. The procedure described here can be
followed in the same way for other microphone setups.
In this clause the setups for the microphone arrangements as used in the present document are described. The
background noise recordings based on these different recording setups are described in clause 8.
5.1 Microphone array setup
5.1.1 Principle limitations
With a perfect sound field reproduction at two closely spaced points, the cut-off frequency up to which the sound field
in between those two points is also correctly reproduced depends on their distance. This upper cut-off frequency can be
estimated as:
$$f_{lim} = \frac{c}{2\,d_{max}} \qquad (1)$$
where d_max is the maximum distance between two microphones and c is the sound velocity.
EXAMPLE: For the eight microphones in Figure 1, f_lim depends on the distance of the microphone pair
considered and is about 1,7 kHz in the region of sparsely spaced microphones and approximately
3 kHz in the region of densely spaced microphones. Note that at the microphone positions themselves
the reproduction quality is optimal across the whole frequency range. In between these positions,
accurate spatial reproduction can only be guaranteed up to f_lim.
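The estimate in this clause can be tried out numerically. The following is a minimal sketch, assuming the half-wavelength form f_lim = c / (2 · d_max), which is consistent with the values quoted in the example. The function name, the sound velocity of 343 m/s and the two spacings are illustrative assumptions and are not taken from the present document.

```python
def upper_cutoff_hz(d_max_m: float, c_m_per_s: float = 343.0) -> float:
    """Estimate the upper cut-off frequency as c / (2 * d_max).

    d_max_m   : largest distance between two microphones, in metres
    c_m_per_s : sound velocity (343 m/s is an assumed room-temperature value)
    """
    if d_max_m <= 0:
        raise ValueError("d_max_m must be positive")
    return c_m_per_s / (2.0 * d_max_m)


# Hypothetical spacings, chosen only to illustrate a sparse and a dense region.
for label, spacing_m in [("sparse region", 0.100), ("dense region", 0.057)]:
    print(f"{label}: f_lim = {upper_cutoff_hz(spacing_m):.0f} Hz")
```

Halving the microphone spacing doubles the estimated cut-off frequency, which is why the densely spaced region of the array supports accurate reproduction up to a higher frequency.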
5.1.2 Microphone calibration
In order to yield a good sound field reproduction at the defined positions, the microphone array for recording of the real
sound field and the microphone array for equalization and calibration of the reproduction setup have to match. In detail,
the frequency/phase response and the directional sensitivity of the corresponding microphones of the two arrays have to
be identical. As a consequence, each microphone has to be calibrated individually with regard to frequency response,
phase response and level.
The supplier of such devices should provide information regarding the sensitivity of the individual microphones
constituting the microphone array for verification purposes. The calibration data provided need to be suitable to ensure:
• a proper phase calibration up to at least 3 kHz;
• a proper frequency response calibration between 50 Hz and at least 3 kHz with an accuracy of < 0,5 dB in
1/12th octave, between 3 kHz and 10 kHz with an accuracy of < 0,5 dB in 1/3rd octave, and between 10 kHz and
20 kHz with an accuracy of < 3 dB in 1/3rd octave;
• a proper level calibration (at 250 Hz or 1 kHz) with an accuracy of < 0,1 dB.
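The frequency response tolerances above lend themselves to a simple lookup when supplier calibration data are verified. The sketch below is illustrative only: the function names are hypothetical, and assigning the band edges at exactly 3 kHz and 10 kHz to the lower band is an assumption, since the text does not say which band an edge frequency belongs to.

```python
def calibration_limits(freq_hz: float) -> tuple[float, int]:
    """Return (maximum deviation in dB, octave fraction) for a frequency.

    The octave fraction is 12 for 1/12th octave and 3 for 1/3rd octave.
    Band edges are assigned to the lower band (an assumption of this sketch).
    """
    if 50.0 <= freq_hz <= 3000.0:
        return 0.5, 12
    if 3000.0 < freq_hz <= 10000.0:
        return 0.5, 3
    if 10000.0 < freq_hz <= 20000.0:
        return 3.0, 3
    raise ValueError("frequency outside the 50 Hz to 20 kHz calibration range")


def within_tolerance(freq_hz: float, deviation_db: float) -> bool:
    """Check one measured deviation against the limit for its band."""
    limit_db, _ = calibration_limits(freq_hz)
    return abs(deviation_db) < limit_db


# Hypothetical deviations of one microphone from its nominal response.
print(within_tolerance(1000.0, 0.3), within_tolerance(15000.0, 3.5))
```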
5.2 Microphone array setup for handset-type and headset
terminals
Figure 1 shows the configuration of microphones located around an artificial head. The locations of the microphones
define the sweet spots where the reproduction of the recorded signals is optimal for all frequencies. Consequently, the
majority of these points are at relevant positions where the microphones of the test devices are usually located (see
Figure 1, top left). The exact positions of the eight recording microphones are given in Figure 1 (bottom). Eight
additional positions are defined by clockwise rotation of the microphone array by 10 degrees (Figure 1, top right, in
dark) around the axis of rotation of the HATS as defined in Recommendation ITU-T P.58 [i.6]. This set of positions is
called the "fine tuning set" and is used for optimization and verification of the equalization.
Figure 1: Positions of the recording microphones (Pos 1 to Pos 8)
Vertical positions are related to the vertical position of the EEP
5.3 Microphone array setup for hands-free terminals
In general, different microphone arrays could be used for hands-free terminals as well as for handsets and headsets.
However, to increase reusability and reduce efforts, the same microphone array can be used in both cases. The setup of
the array for measuring hands-free terminals is shown in Figure 2.
For the hands-free equalization, the DUT is first positioned at its testing position, which is defined in the relevant
standards. Then, the main microphone position of the terminal is determined. For terminals using multi-microphone
techniques, the main microphone is chosen; for terminals using array techniques, the acoustical centre
of the array (typically identical to the centre of the array) is used.
In the setup for hand-held and tablet terminals, the microphone array is positioned such that, in top view, microphone 5
is at a right angle in front of the main microphone position at a distance of 25 mm (Figure 2, right) and microphone 6
is at the height of the main microphone position (Figure 2, left).
For desktop operated hands-free terminals, microphone 5 of the array is positioned at a right angle in front of the main
microphone position at a distance of 25 mm (Figure 3, right) and 25 mm above the table (Figure 3, left).
Note that the DUT is absent during the equalization procedure itself.
The "fine-tuning set" is realized the same way as described in clause 5.2, rotating the microphone array clockwise by
10 degrees.
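The rotation that produces the "fine-tuning set" can be sketched as a plane rotation about a vertical axis. This is an illustration under stated assumptions: the coordinate system (x and y horizontal, z vertical, origin on the axis of rotation, right-handed, viewed from above) and the sample coordinates are hypothetical, since the actual microphone positions are given only in the figures.

```python
import math


def rotate_clockwise(points_mm, angle_deg=10.0):
    """Rotate (x, y, z) points clockwise, seen from above, about the z axis.

    In a right-handed system with z pointing up, a clockwise rotation seen
    from above corresponds to a negative mathematical angle.
    """
    a = math.radians(-angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [
        (x * cos_a - y * sin_a, x * sin_a + y * cos_a, z)
        for x, y, z in points_mm
    ]


# Hypothetical microphone coordinates in mm (not the values of Figure 1).
recording_set = [(90.0, 0.0, 0.0), (60.0, 70.0, -30.0)]
fine_tuning_set = rotate_clockwise(recording_set)
for original, rotated in zip(recording_set, fine_tuning_set):
    print(original, "->", tuple(round(v, 1) for v in rotated))
```

The rotation leaves the height and the distance of each microphone from the axis unchanged, so the fine-tuning positions sample the sound field between the positions used for the equalization itself.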
Figure 2: Positions of the recording microphones in a hands-free setup
for hand-held and tablet terminals
Figure 3: Positions of the recording microphones in a hands-free setup
for desktop operated hands-free terminals
5.4 Microphone array setup for binaural applications
Figure 4 shows the configuration of microphones located around an artificial head. The locations of the microphones
define the sweet spots where the reproduction of the recorded signals is optimal for all frequencies. In consequence
these points are in the direct vicinity of the ears where the microphones of binaural test devices are usually located. The
exact positions for the eight recording microphones are given in Figure 4.
Figure 4: Positions of the recording microphones for binaural applications
Vertical positions are related to the vertical position of the EEP
6 Loudspeaker setup for background noise simulation
6.0 General setup
It should be noted that the position height of the loudspeakers as well as the exact spacing between them in general is
not critical since the equalization procedure described below accounts for the individual loudspeaker positions. The
difference which might be observed between different loudspeaker positions is a different deviation from the original
sound field at the intermediate positions of the microphone array. In order to allow better inter-lab accuracy of the
sound field reproduction the following positioning arrangement should be followed if the room allows.
Figure 5 shows the setup of the eight loudspeakers for the desired sound field reproduction. The vertical position of the
loudspeakers is adjusted so that the centre of every other loudspeaker (e.g. 1, 3, 5 and 7) is about 15 cm above the
HATS reference plane [i.6] and the centre of the remaining four loudspeakers (e.g. 2, 4, 6 and 8) is about 15 cm below
the HATS reference plane. The distance from the loudspeakers to the HATS as well as the horizontal distribution of
the loudspeakers can be selected depending on the room; hence the spacing between the loudspeakers does not have to
be exactly equal. The setup may be a square or a circle around the HATS, or a setup in between, depending on what fits
the room best.
The distance between the surface of the artificial head and the loudspeaker fronts should be at least 50 cm and should
not exceed 2,5 m. Note, that the maximum distance is also limited by the maximum sound pressure level which can be
produced by the loudspeakers. For the application of reproducing realistic background noises the reproduction of a
maximum sound pressure level of 105 dB SPL in the frequency range from 50 Hz to 5 kHz is considered to be
sufficient. Due to the typically much lower signal energy from 5 kHz to 20 kHz, the sound pressure level produced at
such frequencies may be lower. In general it is advisable to select high quality loudspeakers with a mostly flat
free-field response characteristic and low distortion at the maximum desired sound pressure level.
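One possible layout that follows the guidance above can be generated programmatically. This sketch assumes a circular arrangement with equal angular spacing, which is only one of the permitted options, and it treats the radius as the distance from the origin rather than from the surface of the artificial head. The function name and default values are hypothetical.

```python
import math


def loudspeaker_layout(radius_m=1.5, count=8, offset_m=0.15):
    """Return (x, y, z) loudspeaker centres on a circle around the HATS.

    z is given relative to the HATS reference plane. Loudspeakers 1, 3, 5
    and 7 are placed above the plane, the others below it.
    """
    if not 0.5 <= radius_m <= 2.5:
        raise ValueError("radius should lie between 0.5 m and 2.5 m")
    layout = []
    for k in range(count):
        azimuth = 2.0 * math.pi * k / count
        z = offset_m if k % 2 == 0 else -offset_m
        layout.append(
            (radius_m * math.cos(azimuth), radius_m * math.sin(azimuth), z)
        )
    return layout


for number, (x, y, z) in enumerate(loudspeaker_layout(), start=1):
    print(f"loudspeaker {number}: x={x:+.2f} m, y={y:+.2f} m, z={z:+.2f} m")
```

Because the equalization accounts for the individual loudspeaker positions, such a generated layout is a starting point for placement, not a tolerance requirement.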
Figure 5: General loudspeaker setup and principle of the equalization paths
for the handset and headset measurement setup
6.1 Test room requirements
The room required by the reproduction technique may vary from acoustically treated office rooms to anechoic rooms.
The playback room should meet the following requirements:
• Room size:
The room size should be in the range from 1,8 m × 2,4 m × 2,1 m to 8 m × 9 m × 4,5 m (L × W × H).
• Room acoustic parameter clarity 80:
The most important criterion a room has to fulfil depends on the clarity 80 (C_80) [i.11]. This parameter is
defined as the signal energy of the first 80 ms of the impulse response (IR), h(t), in relation to the remaining
energy of the impulse response, expressed in dB:
$$C_{80} = 10 \log_{10} \left( \frac{\int_{t_0}^{t_{80}} h^2(t)\,dt}{\int_{t_{80}}^{t_{end}} h^2(t)\,dt} \right) \text{dB} \qquad (2)$$
where t_0 is the arrival time of the direct sound wave of the impulse response, t_80 is 80 ms after the arrival of the
direct sound wave, and t_end is the effective length of the impulse response. For sound field reproduction
systems, t_end = 1 000 ms shall be used. Impulse responses shorter than 1 000 ms shall be zero-padded to
1 000 ms prior to C_80 calculation.
For a noise field reproduction setup with l = 1...L loudspeakers and i = 1...N microphone positions, the system
identification (see clause 6.2.2) provides L·N impulse responses, h_li. For each impulse response, the clarity
80 can be determined according to equation (2). The average clarity C_80 of the reproduction setup is
determined as the arithmetic mean across all L·N C_80 values in dB.
Suitable reproduction setups shall have an average C_80 > 20 dB.
• Treatment of the room:
Office type rooms should be equipped with a carpet on the floor and some acoustical damping in the ceiling as
typically found in office rooms. A curtain should cover one or two walls in order to avoid strong reflections by
hard surfaces in the room. Additional damping materials may need to be applied in order to reach the C_80
value given above.
For anechoic or semi-anechoic chambers no additional treatment is needed.
• Noise floor:
In order to reduce the influence of external noise, the noise floor measured in the room should be less than
30 dB_SPL(A).
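The clarity criterion in the list above can be evaluated from measured impulse responses. The following is a minimal sketch, assuming the usual early-to-late energy ratio with a split 80 ms after the direct sound. Taking the sample with the largest magnitude as the arrival of the direct sound is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np


def clarity_80_db(h, fs_hz, total_ms=1000.0):
    """Clarity 80 of one impulse response, in dB.

    h        : impulse response samples
    fs_hz    : sampling rate in Hz
    total_ms : effective length of the impulse response (1 000 ms)
    """
    h = np.asarray(h, dtype=float)
    n_total = int(round(total_ms * 1e-3 * fs_hz))
    if h.size < n_total:
        h = np.pad(h, (0, n_total - h.size))  # zero-pad short responses
    h = h[:n_total]
    onset = int(np.argmax(np.abs(h)))  # assumed arrival of the direct sound
    split = onset + int(round(0.080 * fs_hz))
    early = np.sum(h[onset:split] ** 2)
    late = np.sum(h[split:] ** 2)
    if late == 0.0:
        return float("inf")
    return float(10.0 * np.log10(early / late))


def average_clarity_db(impulse_responses, fs_hz):
    """Arithmetic mean of the C80 values (in dB) of all impulse responses."""
    return float(np.mean([clarity_80_db(h, fs_hz) for h in impulse_responses]))


# Synthetic example: direct sound plus one weak reflection after 100 ms.
fs = 1000
ir = np.zeros(200)
ir[0], ir[100] = 1.0, 0.1
print(clarity_80_db(ir, fs) > 20.0)
```

In a real setup, `average_clarity_db` would be fed with all L·N measured impulse responses and the result compared against the 20 dB limit.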
6.2 Equalization and calibration
6.2.0 Overview of the equalization procedure
For equalization the same microphone array setup with the same microphone position has to be used as for the
recording setup described in clause 5.2. Accordingly, the microphone array has to be calibrated as described in
clause 5.1.2. The equalization itself can then be performed completely automated - independent of the microphone array
setup.
Figure 6: Blocks of the equalization procedure
Figure 6 shows the overview of the complete equalization procedure, which consists of the following steps:
1) separate level adjustment for every loudspeaker;
2) system identification;
3) pre-processing of the impulse responses for equalization;
4) calculation of the inversion filters;
5) first test of inversion with recorded noise and adjusting the filters to compensate possible inversion errors.
Each of these steps is described in clause 6.2.
6.2.1 Separate level adjustment for each loudspeaker
First, the sound pressure level of each loudspeaker is adjusted to be the same for all loudspeakers. To achieve this,
the average sound pressure level is measured and calculated across the whole frequency range and across the eight
microphones. The average sound pressure level for every loudspeaker should be at least 70 dB SPL, which is necessary
to ensure a sufficient SNR for measuring the impulse responses in the next step. Care should be taken not to overload
the loudspeakers; a headroom of at least 30 dB should be left for the equalization procedure. The test signal used for
the loudspeaker level adjustment is a logarithmic sweep signal (or MLS signal) with a constant amplitude. The signal
level is determined by averaging the signal level over the entire sweep (or MLS) signal.
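The level adjustment can be sketched numerically. The example below assumes that the microphone signals are available as calibrated sound pressure in pascal and that the average across microphones is formed energetically (mean of the squared pressure), which the text does not specify. The function names and the 70 dB SPL target are illustrative.

```python
import numpy as np

P_REF_PA = 20e-6  # reference sound pressure for dB SPL


def average_spl_db(mic_signals_pa):
    """Average level in dB SPL across all microphones and all samples.

    mic_signals_pa : array of shape (microphones, samples), in pascal
    """
    x = np.asarray(mic_signals_pa, dtype=float)
    return float(10.0 * np.log10(np.mean(x ** 2) / P_REF_PA ** 2))


def gain_corrections_db(levels_db, target_db=70.0):
    """Gain change per loudspeaker (in dB) that brings each level to target_db."""
    return target_db - np.asarray(levels_db, dtype=float)


# Hypothetical measured average levels for eight loudspeakers, in dB SPL.
measured = [68.0, 71.5, 70.2, 69.1, 72.0, 70.0, 67.4, 70.9]
print(gain_corrections_db(measured))
```

When applying positive corrections, the headroom recommendation above still has to be respected.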
6.2.2 System identification
In this step the impulse responses between all combinations of loudspeakers and microphones are measured. Figure 5
shows the eight-microphone/eight-loudspeaker setup. H_ri represents the impulse response (frequency response) from
loudspeaker r to microphone i.
There exist different possibilities for measuring impulse responses, e.g. using maximum length sequences (MLS) or
using swept-sines (sweeps). The advantage of sweeps is that non-linearities can easily be observed and that the SNR in
lower frequencies is higher than with MLS. Using sweeps is therefore recommended for system identification. Using
the sweep S(f), which is played back with the loudspeaker, and the recorded microphone signal Y(f), the frequency
response H_ri(f) can be calculated as:
$$H_{ri}(f) = \frac{Y(f)}{S(f)} \qquad (3)$$
For sufficient system identification, the response should be calculated in the frequency range 20 Hz to 20 kHz.
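The spectral division used for system identification can be illustrated with a simulated loudspeaker-to-microphone path. This is a sketch under stated assumptions: the sweep generator, the sampling rate and the synthetic impulse response are hypothetical, and a small constant `eps` is added to the denominator to guard against division by zero, which is a safeguard of this sketch and not part of the text above.

```python
import numpy as np


def log_sweep(f_start_hz, f_stop_hz, duration_s, fs_hz):
    """Logarithmic sine sweep with constant amplitude."""
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    k = np.log(f_stop_hz / f_start_hz)
    phase = 2.0 * np.pi * f_start_hz * duration_s / k * (np.exp(t * k / duration_s) - 1.0)
    return np.sin(phase)


def frequency_response(sweep, recorded, eps=1e-12):
    """Estimate H(f) = Y(f) / S(f) on the FFT grid of the recording.

    The recording has to be at least as long as the sweep plus the
    impulse response, so that the circular division matches a linear one.
    """
    n = len(recorded)
    S = np.fft.rfft(sweep, n)
    Y = np.fft.rfft(recorded, n)
    return Y * np.conj(S) / (np.abs(S) ** 2 + eps)


# Simulated measurement: a synthetic path with a direct sound and one reflection.
fs = 48000
sweep = log_sweep(20.0, 20000.0, 0.5, fs)
h_true = np.zeros(64)
h_true[3], h_true[20] = 1.0, -0.4
recorded = np.convolve(sweep, h_true)

H_est = frequency_response(sweep, recorded)
h_est = np.fft.irfft(H_est, len(recorded))[: len(h_true)]
print(np.round(h_est[[3, 20]], 3))
```

Outside the swept frequency range the excitation carries little energy, so the estimate is only meaningful between the start and stop frequencies of the sweep.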
6.2.3 Pre-processing of the impulse responses
As motivated in clause 5.1.1, the sound field is only correctly reproduced up to a cut-off frequency, which is in the
range between 2 kHz and 3 kHz for the given microphone setup. For higher frequencies, the tail of the impulse response
degrades the quality of the inversion. To cope with that, a low pass filter with a time-variant cut-off frequency is applied
to the measured impulse response.
1) Filter the impulse response in the frequency domain by multiplying it with a window function which has the value 1
for frequencies smaller than f_pass and 0 for frequencies greater than f_stop. Between those two frequencies the
window function has a cosine (Hann) characteristic.
2) The pre-processed impulse response is calculated in the time domain as the weighted sum of the original
impulse response and the lowpass-filtered impulse response from step 1).
The time-variant weighting of the original impulse response is 1 up to t_min after the start of the impulse
response and falls to 0 with a cosine (Hann) characteristic until t_max after the start of the impulse
response. The time-variant weighting of the lowpass-filtered impulse response from step 1) is 1 minus the
weighting of the original impulse response.
The values for these parameters are given in Table 1.
Table 1: Parameters for pre-processing the impulse responses
between loudspeakers and microphones
Parameter   Value     Parameter   Value
f_pass      2 kHz     t_min       1 ms
f_stop      4 kHz     t_max       4 ms
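The two pre-processing steps and the parameters of Table 1 can be sketched as follows. This is an illustration, not a reference implementation: the function name is hypothetical, and taking the sample with the largest magnitude as the start of the impulse response is an assumption of this sketch.

```python
import numpy as np


def preprocess_ir(h, fs_hz, f_pass=2000.0, f_stop=4000.0,
                  t_min=1e-3, t_max=4e-3, start_index=None):
    """Cross-fade from the original IR to a lowpass-filtered IR over time."""
    h = np.asarray(h, dtype=float)
    n = h.size

    # Step 1: frequency-domain window, 1 below f_pass, 0 above f_stop,
    # with a raised-cosine transition in between.
    f = np.fft.rfftfreq(n, 1.0 / fs_hz)
    window = np.ones_like(f)
    trans = (f > f_pass) & (f < f_stop)
    window[trans] = 0.5 * (1.0 + np.cos(np.pi * (f[trans] - f_pass) / (f_stop - f_pass)))
    window[f >= f_stop] = 0.0
    h_lowpass = np.fft.irfft(np.fft.rfft(h) * window, n)

    # Step 2: time-variant weight of the original IR, 1 up to t_min after
    # the start, falling to 0 at t_max with a raised-cosine shape.
    if start_index is None:
        start_index = int(np.argmax(np.abs(h)))  # assumed start of the IR
    t = (np.arange(n) - start_index) / fs_hz
    weight = np.ones(n)
    fade = (t > t_min) & (t < t_max)
    weight[fade] = 0.5 * (1.0 + np.cos(np.pi * (t[fade] - t_min) / (t_max - t_min)))
    weight[t >= t_max] = 0.0

    return weight * h + (1.0 - weight) * h_lowpass


# Synthetic IR: direct sound followed by a late 8 kHz component.
fs = 48000
ir = np.zeros(4800)
ir[10] = 1.0
tail = np.arange(1000, 2000)
ir[tail] += 0.1 * np.sin(2.0 * np.pi * 8000.0 * tail / fs)
processed = preprocess_ir(ir, fs)
print(np.sum(processed[tail] ** 2) / np.sum(ir[tail] ** 2))
```

The direct sound keeps its full bandwidth, while everything arriving later than t_max after the start is limited to frequencies below f_stop.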
Figure 7: Comparison of spectrum vs. time analysis (f/Hz over t/s, level in dB): original (measured) impulse response (upper)
and pre-processed impulse response (lower)
Figure 8 shows the result of this procedure for a typical test room.
Figure 8: Original (measured) impulse response (left) and processed impulse response (right), p/Pa over t/s
6.2.4 Calculation of the inversion filters
6.2.4.0 Overview
Figure 9 provides a block diagram of the inversion process of the impulse responses.
Figure 9: Block diagram of the inversion process of the impulse responses
NOTE: Every subband is processed with a subset of impulse responses according to the microphones used in each
subband as defined in Table 2. The matrix-size of the matrix which has to be inverted is given in brackets,
e.g. "Subband 1 (3×8)".
All impulse responses for all combinations of loudspeakers and microphones are inverted individually. Each impulse
response is first segmented into four subbands as specified in Figure 9. As described in Table 2, only a subset of all
microphones is used in each subband.
After the regularization process has been completed, the resulting inverted impulse responses are combined across the
different frequency bands by applying the appropriate frequency weighting (bandpass filtering) shown in the filter
blocks after the inversion blocks in Figure 9.
6.2.4.1 Inversion procedure
The starting point for developing the inversion procedure is writing the signal P_i(f) arriving at microphone i as a linear
combination of the signals X_l(f) played back with loudspeaker l multiplied with a transfer function H_li(f), which models
the acoustic path between loudspeaker l and microphone i. This can easily be seen in Figure 5.
$$P_i(f) = \sum_{l=1}^{L} X_l(f) \cdot H_{li}(f) \qquad (4)$$
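The linear combination in equation (4) is a matrix-vector product per frequency bin. The sketch below evaluates it for all bins at once. The array layout (frequency bins first, then loudspeakers, then microphones) and the function name are assumptions chosen for illustration, and random numbers stand in for measured transfer functions.

```python
import numpy as np


def microphone_spectra(H, X):
    """Evaluate P_i(f) as the sum over l of X_l(f) * H_li(f).

    H : complex array of shape (F, L, N), transfer function from
        loudspeaker l to microphone i at each of F frequency bins
    X : complex array of shape (F, L), loudspeaker signal spectra
    Returns P with shape (F, N).
    """
    return np.einsum("fln,fl->fn", np.asarray(H), np.asarray(X))


# Placeholder data: 4 frequency bins, 8 loudspeakers, 8 microphones.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8, 8)) + 1j * rng.standard_normal((4, 8, 8))
X = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
P = microphone_spectra(H, X)
print(P.shape)
```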
This linear combina
...







