Speech and multimedia Transmission Quality (STQ) - Transmission requirements for narrowband VoIP terminals (handset and headset) from a QoS perspective as perceived by the user

The present document provides speech transmission performance requirements for 4 kHz narrowband VoIP handset and
headset terminals; it addresses all types of IP based terminals, including wireless and soft phones.
In contrast to other standards, which define minimum performance requirements, the present document is intended
to specify terminal equipment requirements that allow manufacturers and service providers to achieve good
quality end-to-end speech performance as perceived by the user.
In addition to basic testing procedures, the present document describes advanced testing procedures taking into account
further quality parameters as perceived by the user.
It is the intention of the present document to describe terminal performance parameters in such a way that the remaining
variation of parameters can be assessed purely by the E-model.

General Information

Status: Published
Publication Date: 17-Jan-2017
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 09-Jan-2017
Due Date: 16-Mar-2017
Completion Date: 18-Jan-2017
Standard
ETSI ES 202 737 V1.5.1 (2016-10) - Speech and multimedia Transmission Quality (STQ); Transmission requirements for narrowband VoIP terminals (handset and headset) from a QoS perspective as perceived by the user
English language
48 pages
Standard
ETSI ES 202 737 V1.5.1 (2017-01) - Speech and multimedia Transmission Quality (STQ); Transmission requirements for narrowband VoIP terminals (handset and headset) from a QoS perspective as perceived by the user
English language
49 pages
Standardization document
SIST ES 202 737 V1.5.1:2017
English language
49 pages

Standards Content (Sample)


Final draft ETSI ES 202 737 V1.5.1 (2016-10)

ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Transmission requirements for narrowband
VoIP terminals (handset and headset)
from a QoS perspective as perceived by the user

Reference
RES/STQ-242
Keywords
narrowband, quality, speech, telephony, terminal,
VoIP
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the
print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2016.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are Trade Marks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 General considerations
4.1 Default coding algorithm
4.2 End-to-end considerations
5 Test equipment
5.1 IP half channel measurement adaptor
5.2 Environmental conditions for tests
5.3 Accuracy of measurements and test signal generation
5.4 Network impairment simulation
5.5 Acoustic environment
5.6 Influence of terminal delay on measurements
6 Requirements and associated measurement methodologies
6.1 Notes
6.2 Test setup
6.2.1 General
6.2.2 Setup for handsets and headsets
6.2.3 Position and calibration of HATS
6.2.4 Test signal levels
6.2.5 Setup of background noise simulation
6.3 Coding independent parameters
6.3.1 Send frequency response
6.3.2 Send Loudness Rating (SLR)
6.3.3 Mic mute
6.3.4 Linearity range for SLR
6.3.5 Send distortion
6.3.6 Out-of-band signals in send direction
6.3.7 Send noise
6.3.8 Sidetone Masking Rating STMR (mouth to ear)
6.3.9 Sidetone delay
6.3.10 Terminal Coupling Loss weighted (TCLw)
6.3.11 Stability loss
6.3.12 Receive frequency response
6.3.13 Receive Loudness Rating (RLR)
6.3.14 Receive distortion
6.3.15 Out-of-band signals in receive direction
6.3.16 Minimum activation level and sensitivity in receive direction
6.3.17 Receive noise
6.3.18 Automatic level control in receive
6.3.19 Double talk performance
6.3.19.1 General
6.3.19.2 Attenuation range in send direction during double talk A_{H,S,dt}
6.3.19.3 Attenuation range in receive direction during double talk A_{H,R,dt}
6.3.19.4 Detection of echo components during double talk
6.3.19.5 Minimum activation level and sensitivity of double talk detection
6.3.20 Switching characteristics
6.3.20.1 Note
6.3.20.2 Activation in send direction
6.3.20.3 Silence suppression and comfort noise generation
6.3.21 Background noise performance
6.3.21.1 Performance in send direction in the presence of background noise
6.3.21.2 Speech quality in the presence of background noise
6.3.21.3 Quality of background noise transmission (with far end speech)
6.3.22 Quality of echo cancellation
6.3.22.1 Temporal echo effects
6.3.22.2 Spectral echo attenuation
6.3.22.3 Occurrence of artefacts
6.3.22.4 Variable echo path
6.3.23 Variant impairments; network dependant
6.3.23.1 Clock accuracy send
6.3.23.2 Clock accuracy receive
6.3.23.3 Send delay variation
6.3.24 Send and receive delay - round trip delay
6.4 Codec specific requirements
6.4.1 Objective listening speech quality MOS-LQO in send direction
6.4.2 Objective listening quality MOS-LQO in receive direction
6.4.3 Quality of jitter buffer adjustment
Annex A (informative): Processing delays in VoIP terminals
Annex B (informative): Example IP delay variation
Annex C (informative): Bibliography
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This final draft ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia
Transmission Quality (STQ), and is now submitted for the ETSI standards Membership Approval Procedure.
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Traditionally, analogue and digital telephones interfaced with circuit-switched 64 kbit/s PCM networks. With the
fast growth of IP networks, terminals that interface directly with packet-switched networks (VoIP) are being rapidly
introduced. Such IP network edge devices may include gateways, specifically designed IP phones, soft phones or other
devices connected to IP based networks and providing telephony service. Since IP networks will in many
cases interwork with the traditional PSTN and private networks, many of the basic transmission requirements have
to be harmonised with specifications for traditional digital terminals. However, due to the unique characteristics of
IP networks, including packet loss, delay, etc., new performance specifications, as well as appropriate measuring
methods, will have to be developed. Terminals are becoming increasingly complex, and advanced signal processing is
used to address the IP specific issues. Also, VoIP terminals may use speech coding algorithms other than 64 kbit/s PCM
(Recommendation ITU-T G.711 [7]).
The advanced signal processing of terminals is targeted at speech signals. Therefore, wherever possible, speech signals
are used for testing in order to achieve the most realistic test conditions and meaningful results.
The present document provides speech transmission performance requirements for narrowband VoIP handset and
headset terminals.
NOTE: Requirement limits are given in tables; the associated curve, when provided, is given for illustration.
1 Scope
The present document provides speech transmission performance requirements for 4 kHz narrowband VoIP handset and
headset terminals; it addresses all types of IP based terminals, including wireless and soft phones.
In contrast to other standards, which define minimum performance requirements, the present document is intended
to specify terminal equipment requirements that allow manufacturers and service providers to achieve good
quality end-to-end speech performance as perceived by the user.
In addition to basic testing procedures, the present document describes advanced testing procedures taking into account
further quality parameters as perceived by the user.
It is the intention of the present document to describe terminal performance parameters in such a way that the remaining
variation of parameters can be assessed purely by the E-model.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
http://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] ETSI EN 300 726: "Digital cellular telecommunications system (Phase 2+) (GSM); Enhanced Full
Rate (EFR) speech transcoding (GSM 06.60)".
[2] ETSI TS 126 171: "Digital cellular telecommunications system (Phase 2+); Universal Mobile
Telecommunications System (UMTS); LTE; Speech codec speech processing functions; Adaptive
Multi-Rate - Wideband (AMR-WB) speech codec; General description (3GPP TS 26.171)".
[3] Recommendation ITU-T G.107: "The E-model: a computational model for use in transmission
planning".
[4] Recommendation ITU-T G.108: "Application of the E-model: A planning guide".
[5] Recommendation ITU-T G.109: "Definition of categories of speech transmission quality".
[6] Recommendation ITU-T G.122: "Influence of national systems on stability and talker echo in
international connections".
[7] Recommendation ITU-T G.711: "Pulse code modulation (PCM) of voice frequencies".
[8] Recommendation ITU-T G.723.1: "Dual rate speech coder for multimedia communications
transmitting at 5.3 and 6.3 kbit/s".
[9] Recommendation ITU-T G.726: "40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code
Modulation (ADPCM)".
[10] Recommendation ITU-T G.729: "Coding of speech at 8 kbit/s using conjugate-structure algebraic-
code-excited linear prediction (CS-ACELP)".
[11] Recommendation ITU-T G.729.1: "G.729-based embedded variable bit-rate coder: An 8-32 kbit/s
scalable wideband coder bitstream interoperable with G.729".
[12] Recommendation ITU-T P.56: "Objective measurement of active speech level".
[13] Recommendation ITU-T P.57: "Artificial ears".
[14] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[15] Recommendation ITU-T P.64: "Determination of sensitivity/frequency characteristics of local
telephone systems".
[16] Recommendation ITU-T P.79: "Calculation of loudness ratings for telephone sets".
[17] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[18] Recommendation ITU-T P.380: "Electro-acoustic measurements on headsets".
[19] Recommendation ITU-T P.501: "Test signals for use in telephonometry".
[20] Recommendation ITU-T P.502: "Objective test methods for speech communication systems using
complex test signals".
[21] Recommendation ITU-T P.581: "Use of head and torso simulator for hands-free and handset
terminal testing".
[22] IEC 61260-1: "Electroacoustics - Octave-band and fractional-octave-band filters - Part 1:
Specifications".
[23] Recommendation ITU-T P.800.1: "Mean Opinion Score (MOS) terminology".
[24] ETSI ES 202 739: "Speech and multimedia Transmission Quality (STQ); Transmission
requirements for wideband VoIP terminals (handset and headset) from a QoS perspective as
perceived by the user".
[25] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
[26] Recommendation ITU-T P.863: "Perceptual objective listening quality assessment".
[27] Recommendation ITU-T P.863.1: "Application guide for Recommendation ITU-T P.863".
[28] Recommendation ITU-T P.1010: "Fundamental voice transmission objectives for VoIP terminals
and gateways".
[29] IETF RFC 3550: "RTP: A Transport Protocol for Real-Time Applications".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] ETSI EG 201 377-1: "Speech and multimedia Transmission Quality (STQ); Specification and
measurement of speech transmission quality; Part 1: Introduction to objective comparison
measurement methods for one-way speech quality across networks".
[i.2] ETSI EG 202 425: "Speech Processing, Transmission and Quality Aspects (STQ); Definition and
implementation of VoIP reference point".
[i.3] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality
performance in the presence of background noise; Part 3: Background noise transmission -
Objective test methods".
[i.4] NIST Net.
NOTE: Available at http://snad.ncsl.nist.gov/itg/nistnet/.
[i.5] Netem.
NOTE: Available at http://www.linuxfoundation.org/en/Net:Netem.
[i.6] Trace Control for Netem (TCN): "A. Keller, Trace Control for Netem, Semester Thesis
SA-2006-15, ETH Zürich, 2006".
3 Definitions and abbreviations
3.1 Definitions
For the purposes of the present document, the following terms and definitions apply:
artificial ear: device for the calibration of earphones incorporating an acoustic coupler and a calibrated microphone for
the measurement of the sound pressure and having an overall acoustic impedance similar to that of the median adult
human ear over a given frequency band
codec: combination of an analogue-to-digital encoder and a digital-to-analogue decoder operating in opposite directions
of transmission in the same equipment
Composite Source Signal (CSS): signal composed in time by various signal elements
diffuse field equalization: equalization of the HATS sound pick-up, equalization of the difference, in dB, between the
spectrum level of the acoustic pressure at the ear Drum Reference Point (DRP) and the spectrum level of the acoustic
pressure at the HATS Reference Point (HRP) in a diffuse sound field with the HATS absent using the reverse nominal
curve given in table 3 of Recommendation ITU-T P.58 [14]
Ear Reference Point (ERP): virtual point for geometric reference located at the entrance to the listener's ear,
traditionally used for calculating telephonometric loudness ratings
ear-Drum Reference Point (DRP): point located at the end of the ear canal, corresponding to the ear-drum position
freefield reference point: point located in the free sound field, at a distance of at least 1,5 m from a sound source
radiating in free air
NOTE: In the case of a Head And Torso Simulator (HATS), the point at the centre of the artificial head, with no
artificial head present.
Head And Torso Simulator (HATS) for telephonometry: manikin extending downward from the top of the head to
the waist, designed to simulate the sound pick-up characteristics and the acoustic diffraction produced by a median
human adult and to reproduce the acoustic field generated by the human mouth
Mouth Reference Point (MRP): point located on axis and 25 mm in front of the lip plane of a mouth simulator
nominal setting of the volume control: when a receive volume control is provided, the setting which is closest to the
nominal RLR of 2 dB
3.2 Abbreviations
For the purposes of the present document, the following abbreviations apply:
AM-FM Amplitude Modulation-Frequency Modulation
AMR Adaptive Multi-Rate
AMR-NB Adaptive Multi-Rate NarrowBand
CS Composite Source
CSS Composite Source Signal
DRP ear Drum Reference Point
EC Echo Canceller
EFR Enhanced Full Rate
EL Echo Loss
ERP Ear Reference Point
ETH Eidgenössische Technische Hochschule
FFT Fast Fourier Transform
GSM Global System for Mobile communications
HATS Head And Torso Simulator
IEC International Electrotechnical Commission
IP Internet Protocol
IPDV IP Packet Delay Variation
ITU-T International Telecommunication Union - Telecommunication standardization sector
MOS Mean Opinion Score
MOS-LQOy Mean Opinion Score - Listening Quality Objective, y being N for narrow-band, M for mixed and S
for superwideband
NOTE: See Recommendation ITU-T P.800.1 [23].
MRP Mouth Reference Point
NIST National Institute of Standards and Technology
NLP Non Linear Processor
PBX Private Branch eXchange
PC Personal Computer
PCM Pulse Code Modulation
POLQA Perceptual Objective Listening Quality Assessment
PLC Packet Loss Concealment
PN Pseudo-random Noise
POI Point Of Interconnect
PSTN Public Switched Telephone Network
QoS Quality of Service
RLR Receive Loudness Rating
RMS Root Mean Square
RTP Real-time Transport Protocol
SLR Send Loudness Rating
STMR SideTone Masking Rating
TCLw Terminal Coupling Loss (weighted)
TCN Trace Control for Netem
TDM Time Division Multiplex
TOSQA Telecommunication Objective Speech Quality Assessment
VAD Voice Activity Detector
4 General considerations
4.1 Default coding algorithm
VoIP terminals shall support the coding algorithm according to Recommendation ITU-T G.711 [7] (both µ-law and
A-law). VoIP terminals may support other coding algorithms.
NOTE: Associated Packet Loss Concealment (PLC) e.g. as defined in Recommendation ITU-T G.711 [7]
appendix I should be used.
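The two companding laws named above can be illustrated with a minimal sketch. It assumes the commonly quoted continuous curves (µ = 255 for µ-law, A = 87,6 for A-law) on samples normalized to the range -1 to 1; it is not the bit-exact segmented 8-bit encoder specified in Recommendation ITU-T G.711 [7], and the function names are chosen for illustration only.

```python
import math

MU = 255.0   # mu-law parameter (assumed continuous-curve value)
A = 87.6     # A-law parameter (assumed continuous-curve value)


def _clamp(x: float) -> float:
    """Limit a normalized sample to the range [-1, 1]."""
    return max(-1.0, min(1.0, x))


def mu_law_compress(x: float) -> float:
    """Continuous mu-law compression of a normalized sample."""
    x = _clamp(x)
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    y = _clamp(y)
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def a_law_compress(x: float) -> float:
    """Continuous A-law compression of a normalized sample."""
    x = _clamp(x)
    ax = abs(x)
    denom = 1.0 + math.log(A)
    if ax < 1.0 / A:
        y = A * ax / denom                    # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / denom  # logarithmic segment
    return math.copysign(y, x)
```

Both curves expand low-level samples before quantization, which is why a small input such as 0,01 maps to a much larger compressed value.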
4.2 End-to-end considerations
In order to achieve a desired end-to-end speech transmission performance (mouth-to-ear) it is recommended that the
general rules of transmission planning are carried out with the E-model of Recommendation ITU-T G.107 [3] taking
into account that the E-model does not yet address headsets; this includes the a-priori determination of the desired
category of speech transmission quality as defined in Recommendation ITU-T G.109 [5].
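As an illustration of how an E-model result relates to a quality category, the following sketch converts a transmission rating factor R into an estimated MOS using the conversion formula commonly quoted from Recommendation ITU-T G.107 [3], and maps R onto the R-value ranges commonly tabulated for Recommendation ITU-T G.109 [5]. The formula, the range boundaries and the function names are stated here as assumptions and should be checked against the applicable editions of those Recommendations.

```python
def r_to_mos(r: float) -> float:
    """Estimated MOS for a transmission rating factor R (assumed G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def quality_category(r: float) -> str:
    """Speech transmission quality category for R (assumed G.109 ranges)."""
    if r >= 90:
        return "best"
    if r >= 80:
        return "high"
    if r >= 70:
        return "medium"
    if r >= 60:
        return "low"
    if r >= 50:
        return "poor"
    return "not recommended"
```

A transmission planner would first fix the desired category and then check whether the R value computed for the planned connection falls within the corresponding range.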
While, in general, the transmission characteristics of single circuit-oriented network elements, such as switches or
terminals can be assumed to have a single input value for the planning tasks of Recommendation ITU-T G.108 [4], this
approach is not applicable in packet based systems and thus there is a need for the transmission planner's specific
attention.
In particular, the decision as to which delay measured according to the present document is acceptable or
representative for the specific configuration is the responsibility of the individual transmission planner.
Recommendation ITU-T G.108 with its amendments [4] provides further guidance on this important issue.
The following optimum terminal parameters from a user's perspective need to be considered:
• Minimized delay in send and receive direction.
• Optimum loudness ratings (RLR, SLR).
• Compensation for network delay variation.
• Packet loss recovery performance.
• Maximized terminal coupling loss.
5 Test equipment
5.1 IP half channel measurement adaptor
The IP half channel measurement adaptor is described in ETSI EG 202 425 [i.2].
5.2 Environmental conditions for tests
The following conditions shall apply for the testing environment:
a) Ambient temperature: 15 °C to 35 °C (inclusive);
b) Relative humidity: 5 % to 85 %;
c) Air pressure: 86 kPa to 106 kPa (860 mbar to 1 060 mbar).
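A test report tool might check logged ambient conditions against these ranges. The sketch below is one possible way to do so; the class and field names are hypothetical, and it assumes that the humidity and pressure limits are inclusive in the same way as the temperature limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Environment:
    temperature_c: float          # ambient temperature in degrees Celsius
    relative_humidity_pct: float  # relative humidity in percent
    air_pressure_kpa: float       # air pressure in kPa


# Limits taken from items a) to c) above; inclusive bounds are an assumption
# for humidity and pressure.
LIMITS = {
    "temperature_c": (15.0, 35.0),
    "relative_humidity_pct": (5.0, 85.0),
    "air_pressure_kpa": (86.0, 106.0),
}


def out_of_range(env: Environment) -> List[str]:
    """Return the names of all quantities that violate the test conditions."""
    return [
        name
        for name, (low, high) in LIMITS.items()
        if not low <= getattr(env, name) <= high
    ]
```

An empty list indicates that the logged conditions are within the ranges required for testing.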
5.3 Accuracy of measurements and test signal generation
Unless specified otherwise, the accuracy of measurements made by test equipment shall be equal to or better than:
Table 1: Measurement accuracy
Item                          Accuracy
Electrical signal level       ±0,2 dB for levels ≥ -50 dBV
                              ±0,4 dB for levels < -50 dBV
Sound pressure                ±0,7 dB
Frequency                     ±0,2 %
Time                          ±0,2 %
Application force             ±2 N
Measured maximum frequency    20 kHz

Unless specified otherwise, the accuracy of the signals generated by the test equipment shall be better than:
Table 2: Accuracy of test signal generation
Quantity                        Accuracy
Sound pressure level at         ±3 dB for frequencies from 100 Hz to 200 Hz
Mouth Reference Point (MRP)     ±1 dB for frequencies from 200 Hz to 4 000 Hz
                                ±3 dB for frequencies from 4 000 Hz to 14 000 Hz
Electrical excitation levels    ±0,4 dB across the whole frequency range
Frequency generation            ±2 % (see note)
Time                            ±0,2 %
Specified component values      ±1 %
NOTE: This tolerance may be used to avoid measurements at critical frequencies, e.g. those
due to sampling operations within the terminal under test.
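The level-dependent and frequency-dependent entries of tables 1 and 2 can be expressed as simple lookups, for example when validating a test system automatically. The following sketch is illustrative only; the function names are hypothetical, and because the tables list 200 Hz and 4 000 Hz in two adjacent ranges, the sketch assumes that these boundary frequencies take the tighter ±1 dB tolerance.

```python
from typing import Optional


def electrical_level_accuracy_db(level_dbv: float) -> float:
    """Required measurement accuracy (± dB) for an electrical level, per table 1."""
    return 0.2 if level_dbv >= -50.0 else 0.4


def mrp_level_tolerance_db(freq_hz: float) -> Optional[float]:
    """Tolerance (± dB) of the generated sound pressure level at the MRP, per table 2.

    Returns None outside the 100 Hz to 14 000 Hz range covered by the table.
    """
    if 200.0 <= freq_hz <= 4000.0:
        return 1.0  # boundary frequencies assigned here by assumption
    if 100.0 <= freq_hz < 200.0 or 4000.0 < freq_hz <= 14000.0:
        return 3.0
    return None
```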

For terminal equipment which is directly powered from the mains supply, all tests shall be carried out within ±5 % of
the rated voltage of that supply. If the equipment is powered by other means and those means are not supplied as part of
the apparatus, all tests shall be carried out within the power supply limit declared by the supplier. If the power supply is
a.c. the test shall be conducted within ±4 % of the rated frequency.
5.4 Network impairment simulation
At least one set of requirements is based on the assumption of an error free packet network, and at least one other set of
requirements is based on a defined simulated malperformance of the packet network.
An appropriate network simulator has to be used, for example NIST Net [i.4] (http://snad.ncsl.nist.gov/itg/nistnet/) or
Netem [i.5].
Based on the positive experience STQ has gained with "NIST Net" during the ETSI Speech Quality Test Events, NIST Net
will be taken as the basis to express and describe the variations of packet network parameters for the appropriate tests.
A brief description of NIST Net follows:
The NIST Net network emulator is a general-purpose tool for emulating performance dynamics in IP
networks. The tool is designed to allow controlled, reproducible experiments with network performance
sensitive/adaptive applications and control protocols in a simple laboratory setting. By operating at the IP
level, NIST Net can emulate the critical end-to-end performance characteristics imposed by various wide area
network situations (e.g. congestion loss) or by various underlying sub network technologies (e.g. asymmetric
bandwidth situations of xDSL and cable modems).
NIST Net is implemented as a kernel module extension to the Linux™ operating system and an X Window
System-based user interface application. In use, the tool allows an inexpensive PC-based router to emulate
numerous complex performance scenarios, including: tunable packet delay distributions, congestion and
background loss, bandwidth limitation, and packet reordering/duplication. The X interface allows the user to
select and monitor specific traffic streams passing through the router and to apply selected performance
"effects" to the IP packets of the stream. In addition to the interactive interface, NIST Net can be driven by
traces produced from measurements of actual network conditions. NIST Net also provides support for user
defined packet handlers to be added to the system. Examples of the use of such packet handlers include: time
stamping/data collection, interception and diversion of selected flows, generation of protocol responses from
emulated clients.
The key points of Netem can be summarized as follows:
• Netem is nowadays part of most Linux™ distributions; it only has to be switched on when compiling a kernel.
Netem offers the same possibilities as NIST Net: loss, duplication, delay and jitter can be generated (and the
distribution can be chosen at runtime). Netem can be run on a Linux™ PC operating as a bridge or a router
(NIST Net only runs on routers).
• With an extension of Netem, TCN (Trace Control for Netem) [i.6], which was developed by ETH Zurich, it
is even possible to control the behaviour of single packets via a trace file. It is thus possible, for example, to
generate a single packet loss or a specific delay pattern. This extension is planned to be included in new
Linux™ kernels; at present it is available as a patch to a specific kernel and to the iproute2 tool (iproute2
contains Netem).
• It is not advised to define specific distortion patterns for testing in standards, because it will be easy to adapt
devices to these patterns (as is already done for test signals). But if a pattern is unknown to a manufacturer,
the same pattern can be used by a test lab for different devices and gives comparable results. It is also possible
to take a trace of NIST Net distortions, generate a file from it and play back exactly the same distortions with
Netem.
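As an illustration of how loss, duplication, delay and jitter might be configured with Netem, the sketch below assembles a `tc qdisc ... netem` command line for the iproute2 `tc` tool. The helper name, the interface name and the impairment values are hypothetical and are not taken from the present document; the sketch only builds the argument list and does not execute it, since applying it requires administrator rights on the test router or bridge.

```python
from typing import List


def netem_command(
    interface: str,
    delay_ms: float = 0.0,
    jitter_ms: float = 0.0,
    loss_pct: float = 0.0,
    duplicate_pct: float = 0.0,
) -> List[str]:
    """Build the argument list for a `tc qdisc add ... netem` invocation."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms > 0:
        cmd += ["delay", f"{delay_ms:g}ms"]
        if jitter_ms > 0:
            # Jitter follows the delay value; a normal distribution is
            # chosen here purely as an example.
            cmd += [f"{jitter_ms:g}ms", "distribution", "normal"]
    if loss_pct > 0:
        cmd += ["loss", f"{loss_pct:g}%"]
    if duplicate_pct > 0:
        cmd += ["duplicate", f"{duplicate_pct:g}%"]
    return cmd


# Example: 50 ms delay with 10 ms jitter and 1 % packet loss on "eth0".
print(" ".join(netem_command("eth0", delay_ms=50, jitter_ms=10, loss_pct=1)))
```

The resulting list could be passed to `subprocess.run` on the machine acting as network simulator.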
NOTE: NIST Net™, NETEM™, Linux™ and X Window System™ are examples of suitable products available
commercially. This information is given for the convenience of users of the present document and does
not constitute an endorsement by ETSI of these product(s).
5.5 Acoustic environment
Unless stated otherwise measurements shall be conducted under quiet and "anechoic" conditions. Depending on the
distance of the transducers from mouth and ear a quiet office room may be sufficient e.g. for handsets where artificial
mouth and artificial ear are located close to the acoustical transducers.
However, for some headsets or handset terminals with smaller dimension an anechoic room will be required.
In cases where real or simulated background noise is used as part of the testing environment, the original background
noise shall not be noticeably influenced by the acoustical properties of the room.
In all cases where the performance of acoustic echo cancellers is to be tested, a realistic room which represents the
typical user environment for the terminal shall be used.
Standardized measurement methods for measurements with variable echo paths are for further study.
5.6 Influence of terminal delay on measurements
As delay is introduced by the terminal, care shall be taken for all measurements where exact position of the analysis
window is required. It shall be checked that the test is performed on the test signal and not on any other signal.
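One possible way to position the analysis window is to estimate the terminal delay from the recorded signal and shift the window by that amount. The sketch below uses a plain, unnormalized cross-correlation over a limited lag range; this method and the function names are assumptions made for illustration and are not prescribed by the present document.

```python
from typing import List, Sequence


def estimate_delay(reference: Sequence[float],
                   measured: Sequence[float],
                   max_lag: int) -> int:
    """Return the lag (in samples) at which `measured` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(measured) - lag)
        if overlap <= 0:
            break
        # Unnormalized cross-correlation at this lag.
        score = sum(reference[i] * measured[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def analysis_window(measured: Sequence[float],
                    start: int, length: int, delay: int) -> List[float]:
    """Extract the analysis window from `measured`, shifted by the estimated delay."""
    begin = start + delay
    return list(measured[begin:begin + length])
```

Comparing the extracted window with the corresponding part of the test signal helps confirm that the analysis is performed on the test signal and not on any other signal.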
6 Requirements and associated measurement methodologies
6.1 Notes
NOTE 1: In general the test methods as described in the present document apply. If alternative methods exist they
may be used if they have been proven to give the same result as the method described in the present
document. This will be indicated in the test report.
NOTE 2: Due to the time variant nature of IP connections delay variation may impair the measurements. In such
cases the measurement has to be repeated until a valid measurement result is achieved.
6.2 Test setup
6.2.1 General
The preferred acoustical access to terminals is the most realistic simulation of the "average" subscriber. This can be
achieved by using a HATS (Head And Torso Simulator) with appropriate ear simulation and appropriate means to fix
handset and headset terminals to the HATS in a realistic and reproducible way. The HATS is described in
Recommendation ITU-T P.58 [14]; appropriate ears are described in Recommendation ITU-T P.57 [13] (type 3.3 and
type 3.4 ear); proper positioning of handsets under realistic conditions is described in Recommendation ITU-T P.64 [15].
The preferred way of testing a terminal is to connect it to a network simulator with exactly defined settings and access
points. The test sequences are fed in either electrically (using a reference codec or the direct signal processing
approach) or acoustically (using ITU-T specified devices).
When a coder with variable bit rate is used for testing terminal electro acoustical parameters, the bit rate known to
give the best characteristics should be selected, e.g.:
• AMR-NB (ETSI TS 126 171 [2]): 12,2 kbit/s.
• Recommendation ITU-T G.729.1 [11]: 32 kbit/s.
Figure 1: Half channel terminal measurement
(Block diagram not reproduced. Labels in the figure: VoIP terminal under test; path through IP network; network
simulator (delay, jitter, packet loss); IP half channel measurement adapter (VoIP reference point); gateway simulation;
POI electrical reference point; measurement system.)
6.2.2 Setup for handsets and headsets
When testing a handset telephone, the handset is placed in the HATS position as described in Recommendation
ITU-T P.64 [15]. The artificial mouth shall conform to Recommendation ITU-T P.58 [14]. The artificial ear shall
conform to Recommendation ITU-T P.57 [13]; type 3.3 or type 3.4 ears shall be used.
Recommendations for positioning headsets are given in Recommendation ITU-T P.380 [18]. If not stated otherwise
headsets shall be placed in their recommended wearing position. Further information about setup and the use of HATS
can be found in Recommendation ITU-T P.380 [18].
Unless stated otherwise, if a volume control is provided, the setting is chosen such that the nominal RLR is met as
closely as possible.
Unless stated otherwise, an application force of 8 N is used for handset testing. No application force is used for
headsets.
6.2.3 Position and calibration of HATS
All the send and receive characteristics shall be tested with the HATS; the type of ear used and the application force
shall be indicated. For handsets, if not stated otherwise, an application force of 8 N shall be used.
The horizontal positioning of the HATS reference plane shall be guaranteed within ±2°.
The HATS shall be equipped with a type 3.3 or type 3.4 artificial ear for handsets. For binaural headsets two artificial
ears are required. The type 3.3 or type 3.4 artificial ears as specified in Recommendation ITU-T P.57 [13] shall be
used. The artificial ear shall be positioned on the HATS according to Recommendation ITU-T P.58 [14].
The exact calibration and equalization can be found in Recommendation ITU-T P.581 [21]. If not stated otherwise, the
HATS shall be diffuse-field equalized. The inverse nominal diffuse field curve as found in table 3 of Recommendation
ITU-T P.58 [14] shall be used.
NOTE: The inverse average diffuse field response characteristics of HATS as found in Recommendation
ITU-T P.58 [14] is used and not the specific one corresponding to the HATS used. Instead of using the
individual diffuse field correction, the average correction function is used because, for handset and
headset measurements, mostly the artificial ear, ear canal and ear impedance simulation are effective. The
individual diffuse-field correction function of HATS includes all diffraction and reflection effects of the
complete individual HATS which are not effective in the measurement and potentially would lead to
bigger measurement uncertainties than using the average correction.
6.2.4 Test signal levels
Unless specified otherwise, the test signal level shall be -4,7 dBPa at the MRP.
Unless specified otherwise, the applied test signal level at the digital input shall be -16 dBm0.
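The acoustic level above is stated in dBPa (dB relative to 1 Pa). As a minimal sketch, assuming the usual 20 µPa reference for dB SPL, the following shows how such a level maps to an RMS sound pressure and to dB SPL. The function names are illustrative only and are not taken from the present document.

```python
import math

P_REF_PA = 20e-6  # conventional reference pressure for dB SPL (20 µPa)


def dbpa_to_pascal(level_dbpa: float) -> float:
    """Convert a level in dBPa (dB re 1 Pa) to RMS sound pressure in Pa."""
    return 10.0 ** (level_dbpa / 20.0)


def dbpa_to_db_spl(level_dbpa: float) -> float:
    """Convert a level in dBPa to dB SPL (dB re 20 µPa)."""
    return level_dbpa + 20.0 * math.log10(1.0 / P_REF_PA)


mrp_level_dbpa = -4.7  # default test signal level at the MRP
print(f"{dbpa_to_pascal(mrp_level_dbpa):.3f} Pa")      # about 0.582 Pa
print(f"{dbpa_to_db_spl(mrp_level_dbpa):.1f} dB SPL")  # about 89.3 dB SPL
```

The electrical level of -16 dBm0 is not converted here, because its relation to digital full scale depends on the codec in use.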
6.2.5 Setup of background noise simulation
A setup for simulating realistic background noises in a lab-type environment is described in ETSI TS 103 224 [25].
If not stated otherwise this setup is used in all measurements where background noise simulation is required.
The following noises of ETSI TS 103 224 [25] shall be used.
Table 2a
Pub Noise (Pub): HATS and microphone array in a pub, 30 seconds
    1: 77,2 dB    2: 76,6 dB    3: 75,7 dB    4: 76,0 dB
    5: 76,0 dB    6: 76,3 dB    7: 76,0 dB    8: 76,4 dB
Sales Counter (SalesCounter): HATS and microphone array in a supermarket, 30 seconds
    1: 66,6 dB    2: 66,1 dB    3: 65,7 dB    4: 66,5 dB
    5: 66,3 dB    6: 66,8 dB    7: 66,6 dB    8: 67,1 dB
Callcenter 2 (Callcenter): HATS and microphone array in business office, 30 seconds
    1: 60,2 dB    2: 60,0 dB    3: 60,1 dB    4: 60,8 dB
    5: 60,2 dB    6: 60,6 dB    7: 60,2 dB    8: 60,7 dB
6.3 Coding independent parameters
6.3.1 Send frequency response
Requirement
The send frequency response of the handset or the headset shall be within a mask as defined in table 3 and shown in
figure 2. This mask shall be applicable for all types of handsets and headsets.
Table 3: Send frequency response
Frequency    Upper Limit                     Lower Limit
100 Hz       -10 dB (see notes 2 and 3)
300 Hz       5 dB                            -5 dB
3 400 Hz     5 dB                            -5 dB
4 000 Hz     5 dB
NOTE 1: The limits for intermediate frequencies lie on a straight line drawn between the
given values on a linear (dB) - logarithmic (Hz) scale.
NOTE 2: Under conditions of high background noise a limit of -18 dB is recommended.
NOTE 3: In ETSI ES 202 739 [24], the limit is 0 dB.
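NOTE 1 of table 3 defines the mask between the tabulated frequencies by straight lines on a dB versus log-frequency scale. The following is a minimal sketch of that interpolation. It assumes that the -10 dB value at 100 Hz and the single 5 dB value at 4 000 Hz belong to the upper limit, and that the measured response has already been normalised as the mask requires; the function names are illustrative.

```python
import math

# Corner points read from table 3: (frequency in Hz, limit in dB).
# Assumption: -10 dB at 100 Hz and 5 dB at 4 000 Hz are upper limits.
UPPER_MASK = [(100.0, -10.0), (300.0, 5.0), (3400.0, 5.0), (4000.0, 5.0)]
LOWER_MASK = [(300.0, -5.0), (3400.0, -5.0)]


def mask_limit(points, freq_hz):
    """Interpolate a limit linearly in dB over log10(frequency).

    Returns None outside the frequency range covered by the corner points.
    """
    if freq_hz < points[0][0] or freq_hz > points[-1][0]:
        return None
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        if f0 <= freq_hz <= f1:
            t = (math.log10(freq_hz) - math.log10(f0)) / (math.log10(f1) - math.log10(f0))
            return l0 + t * (l1 - l0)
    return None


def within_mask(freq_hz, response_db):
    """Check one measured point against whichever limits exist at that frequency."""
    upper = mask_limit(UPPER_MASK, freq_hz)
    lower = mask_limit(LOWER_MASK, freq_hz)
    if upper is not None and response_db > upper:
        return False
    if lower is not None and response_db < lower:
        return False
    return True


# Halfway between 100 Hz and 300 Hz on a log axis the upper limit is halfway in dB.
print(mask_limit(UPPER_MASK, math.sqrt(100.0 * 300.0)))
```

Interpolating over log10(f) rather than f matters mainly on the rising edge between 100 Hz and 300 Hz, where a linear-frequency interpolation would give a different limit.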

Figure 2: Send frequency response mask
NOTE: The basis for the target frequency responses in send and receive is the orthotelefonic reference response
which is measured between 2 sub
...


ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Transmission requirements for narrowband
VoIP terminals (handset and headset)
from a QoS perspective as perceived by the user

2 ETSI ES 202 737 V1.5.1 (2017-01)

Reference
RES/STQ-242
Keywords
narrowband, quality, speech, telephony, terminal,
VoIP
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the
print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2017.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are Trade Marks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 General considerations
4.1 Default coding algorithm
4.2 End-to-end considerations
5 Test equipment
5.1 IP half channel measurement adaptor
5.2 Environmental conditions for tests
5.3 Accuracy of measurements and test signal generation
5.4 Network impairment simulation
5.5 Acoustic environment
5.6 Influence of terminal delay on measurements
6 Requirements and associated measurement methodologies
6.1 Notes
6.2 Test setup
6.2.1 General
6.2.2 Setup for handsets and headsets
6.2.3 Position and calibration of HATS
6.2.4 Test signal levels
6.2.5 Setup of background noise simulation
6.3 Coding independent parameters
6.3.1 Send frequency response
6.3.2 Send Loudness Rating (SLR)
6.3.3 Mic mute
6.3.4 Linearity range for SLR
6.3.5 Send distortion
6.3.6 Out-of-band signals in send direction
6.3.7 Send noise
6.3.8 Sidetone Masking Rating STMR (mouth to ear)
6.3.9 Sidetone delay
6.3.10 Terminal Coupling Loss weighted (TCLw)
6.3.11 Stability loss
6.3.12 Receive frequency response
6.3.13 Receive Loudness Rating (RLR)
6.3.14 Receive distortion
6.3.15 Out-of-band signals in receive direction
6.3.16 Minimum activation level and sensitivity in receive direction
6.3.17 Receive noise
6.3.18 Automatic level control in receive
6.3.19 Double talk performance
6.3.19.1 General
6.3.19.2 Attenuation range in send direction during double talk A_H,S,dt
6.3.19.3 Attenuation range in receive direction during double talk A_H,R,dt
6.3.19.4 Detection of echo components during double talk
6.3.19.5 Minimum activation level and sensitivity of double talk detection
6.3.20 Switching characteristics
6.3.20.1 Note
6.3.20.2 Activation in send direction
6.3.20.3 Silence suppression and comfort noise generation
6.3.21 Background noise performance
6.3.21.1 Performance in send direction in the presence of background noise
6.3.21.2 Speech quality in the presence of background noise
6.3.21.3 Quality of background noise transmission (with far end speech)
6.3.22 Quality of echo cancellation
6.3.22.1 Temporal echo effects
6.3.22.2 Spectral echo attenuation
6.3.22.3 Occurrence of artefacts
6.3.22.4 Variable echo path
6.3.23 Variant impairments; network dependant
6.3.23.1 Clock accuracy send
6.3.23.2 Clock accuracy receive
6.3.23.3 Send delay variation
6.3.24 Send and receive delay - round trip delay
6.4 Codec specific requirements
6.4.1 Objective listening speech quality MOS-LQO in send direction
6.4.2 Objective listening quality MOS-LQO in receive direction
6.4.3 Quality of jitter buffer adjustment
Annex A (informative): Processing delays in VoIP terminals
Annex B (informative): Example IP delay variation
Annex C (informative): Bibliography
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia Transmission
Quality (STQ).
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Traditionally, the analogue and digital telephones were interfacing switched-circuit 64 kbit/s PCM networks. With the
fast growth of IP networks, terminals directly interfacing packet-switched networks (VoIP) are being rapidly
introduced. Such IP network edge devices may include gateways, specifically designed IP phones, soft phones or other
devices connected to the IP based networks and providing telephony service. Since the IP networks will be in many
cases interworking with the traditional PSTN and private networks, many of the basic transmission requirements have
to be harmonised with specifications for traditional digital terminals. However, due to the unique characteristics of the
IP networks including packet loss, delay, etc. new performance specifications, as well as appropriate measuring
methods, will have to be developed. Terminals are getting increasingly complex, advanced signal processing is used to
address the IP specific issues. Also, the VoIP terminals may use other than 64 kbit/s PCM (Recommendation ITU-T
G.711 [7]) speech algorithms.
The advanced signal processing of terminals is targeted to speech signals. Therefore, wherever possible speech signals
are used for testing in order to achieve mostly realistic test conditions and meaningful results.
The present document provides speech transmission performance for narrowband VoIP handset and headset terminals.
NOTE: Requirement limits are given in tables, the associated curve when provided is given for illustration.
1 Scope
The present document provides speech transmission performance requirements for 4 kHz narrowband VoIP handset and
headset terminals; it addresses all types of IP based terminals, including wireless and soft phones.
In contrast to other standards which define minimum performance requirements it is the intention of the present
document to specify terminal equipment requirements which enable manufacturers and service providers to enable good
quality end-to-end speech performance as perceived by the user.
In addition to basic testing procedures, the present document describes advanced testing procedures taking into account
further quality parameters as perceived by the user.
It is the intention of the present document to describe terminal performance parameters in such way that the remaining
variation of parameters can be assessed purely by the E-model.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
http://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] ETSI EN 300 726: "Digital cellular telecommunications system (Phase 2+) (GSM); Enhanced Full
Rate (EFR) speech transcoding (GSM 06.60)".
[2] ETSI TS 126 171: "Digital cellular telecommunications system (Phase 2+); Universal Mobile
Telecommunications System (UMTS); LTE; Speech codec speech processing functions; Adaptive
Multi-Rate - Wideband (AMR-WB) speech codec; General description (3GPP TS 26.171)".
[3] Recommendation ITU-T G.107: "The E-model: a computational model for use in transmission
planning".
[4] Recommendation ITU-T G.108: "Application of the E-model: A planning guide".
[5] Recommendation ITU-T G.109: "Definition of categories of speech transmission quality".
[6] Recommendation ITU-T G.122: "Influence of national systems on stability and talker echo in
international connections".
[7] Recommendation ITU-T G.711: "Pulse code modulation (PCM) of voice frequencies".
[8] Recommendation ITU-T G.723.1: "Dual rate speech coder for multimedia communications
transmitting at 5.3 and 6.3 kbit/s".
[9] Recommendation ITU-T G.726: "40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code
Modulation (ADPCM)".
[10] Recommendation ITU-T G.729: "Coding of speech at 8 kbit/s using conjugate-structure algebraic-
code-excited linear prediction (CS-ACELP)".
[11] Recommendation ITU-T G.729.1: "G.729-based embedded variable bit-rate coder: An 8-32 kbit/s
scalable wideband coder bitstream interoperable with G.729".
[12] Recommendation ITU-T P.56: "Objective measurement of active speech level".
[13] Recommendation ITU-T P.57: "Artificial ears".
[14] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[15] Recommendation ITU-T P.64: "Determination of sensitivity/frequency characteristics of local
telephone systems".
[16] Recommendation ITU-T P.79: "Calculation of loudness ratings for telephone sets".
[17] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[18] Recommendation ITU-T P.380: "Electro-acoustic measurements on headsets".
[19] Recommendation ITU-T P.501: "Test signals for use in telephonometry".
[20] Recommendation ITU-T P.502: "Objective test methods for speech communication systems using
complex test signals".
[21] Recommendation ITU-T P.581: "Use of head and torso simulator for hands-free and handset
terminal testing".
[22] IEC 61260-1: "Electroacoustics - Octave-band and fractional-octave-band filters - Part 1:
Specifications".
[23] Recommendation ITU-T P.800.1: "Mean Opinion Score (MOS) terminology".
[24] ETSI ES 202 739: "Speech and multimedia Transmission Quality (STQ); Transmission
requirements for wideband VoIP terminals (handset and headset) from a QoS perspective as
perceived by the user".
[25] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
[26] Recommendation ITU-T P.863: "Perceptual objective listening quality assessment".
[27] Recommendation ITU-T P.863.1: "Application guide for Recommendation ITU-T P.863".
[28] Recommendation ITU-T P.1010: "Fundamental voice transmission objectives for VoIP terminals
and gateways".
[29] IETF RFC 3550: "RTP: A Transport Protocol for Real-Time Applications".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] ETSI EG 201 377-1: "Speech and multimedia Transmission Quality (STQ); Specification and
measurement of speech transmission quality; Part 1: Introduction to objective comparison
measurement methods for one-way speech quality across networks".
[i.2] ETSI EG 202 425: "Speech Processing, Transmission and Quality Aspects (STQ); Definition and
implementation of VoIP reference point".
[i.3] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality
performance in the presence of background noise; Part 3: Background noise transmission -
Objective test methods".
[i.4] NIST Net.
NOTE: Available at https://www-x.antd.nist.gov/itg/nistnet/.
[i.5] Netem.
NOTE: Available at http://www.linuxfoundation.org/en/Net:Netem.
[i.6] Trace Control for Netem (TCN): "A. Keller, Trace Control for Netem, Semester Thesis
SA-2006-15, ETH Zürich, 2006".
3 Definitions and abbreviations
3.1 Definitions
For the purposes of the present document, the following terms and definitions apply:
artificial ear: device for the calibration of earphones incorporating an acoustic coupler and a calibrated microphone for
the measurement of the sound pressure and having an overall acoustic impedance similar to that of the median adult
human ear over a given frequency band
codec: combination of an analogue-to-digital encoder and a digital-to-analogue decoder operating in opposite directions
of transmission in the same equipment
Composite Source Signal (CSS): signal composed in time by various signal elements
diffuse field equalization: equalization of the HATS sound pick-up, equalization of the difference, in dB, between the
spectrum level of the acoustic pressure at the ear Drum Reference Point (DRP) and the spectrum level of the acoustic
pressure at the HATS Reference Point (HRP) in a diffuse sound field with the HATS absent using the reverse nominal
curve given in table 3 of Recommendation ITU-T P.58 [14]
Ear Reference Point (ERP): virtual point for geometric reference located at the entrance to the listener's ear,
traditionally used for calculating telephonometric loudness ratings
ear-Drum Reference Point (DRP): point located at the end of the ear canal, corresponding to the ear-drum position
freefield reference point: point located in the free sound field, at a distance of at least 1,5 m from a sound source
radiating in free air
NOTE: In case of a head and torso simulator (HATS) in the centre of the artificial head with no artificial head
present.
Head And Torso Simulator (HATS) for telephonometry: manikin extending downward from the top of the head to
the waist, designed to simulate the sound pick-up characteristics and the acoustic diffraction produced by a median
human adult and to reproduce the acoustic field generated by the human mouth
Mouth Reference Point (MRP): point located on axis and 25 mm in front of the lip plane of a mouth simulator
nominal setting of the volume control: when a receive volume control is provided, the setting which is closest to the
nominal RLR of 2 dB
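The definition of the nominal volume setting amounts to a nearest-value selection. The sketch below illustrates it, assuming the RLR has already been measured at every available volume step; the measurement values and the function name are hypothetical.

```python
NOMINAL_RLR_DB = 2.0  # nominal RLR from the definition above


def nominal_volume_setting(rlr_by_setting):
    """Return the volume setting whose measured RLR is closest to the nominal RLR.

    rlr_by_setting maps a setting label to the RLR (in dB) measured at that setting.
    If two settings are equally close, the first one in the mapping is returned.
    """
    return min(rlr_by_setting, key=lambda s: abs(rlr_by_setting[s] - NOMINAL_RLR_DB))


# Hypothetical measurement data, for illustration only.
measured = {1: 11.5, 2: 8.0, 3: 4.5, 4: 1.0, 5: -2.5}
print(nominal_volume_setting(measured))  # prints 4
```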
3.2 Abbreviations
For the purposes of the present document, the following abbreviations apply:
AM-FM Amplitude Modulation-Frequency Modulation
AMR Adaptive Multi-Rate
AMR-NB Adaptive Multi-Rate NarrowBand
CS Composite Source
CSS Composite Source Signal
DRP ear Drum Reference Point
EC Echo Canceller
EFR Enhanced Full Rate
EL Echo Loss
ERP Ear Reference Point
ETH Eidgenössische Technische Hochschule
FFT Fast Fourier Transform
GSM Global System for Mobile communications
HATS Head And Torso Simulator
IEC International Electrotechnical Commission
IP Internet Protocol
IPDV IP Packet Delay Variation
ITU-T International Telecommunication Union - Telecommunication standardization sector
MOS Mean Opinion Score
MOS-LQOy Mean Opinion Score - Listening Quality Objective
NOTE: y being N for narrow-band, M for mixed and S for superwideband. See Recommendation
ITU-T P.800.1 [23].
MRP Mouth Reference Point
NIST National Institute of Standards and Technology
NLP Non Linear Processor
PBX Private Branch eXchange
PC Personal Computer
PCM Pulse Code Modulation
POLQA Perceptual Objective Listening Quality Assessment
PLC Packet Loss Concealment
PN Pseudo-random Noise
POI Point Of Interconnect
PSTN Public Switched Telephone Network
QoS Quality of Service
RLR Receive Loudness Rating
RMS Root Mean Square
RTP Real-time Transport Protocol
SLR Send Loudness Rating
STMR SideTone Masking Rating
TCLw Terminal Coupling Loss (weighted)
TCN Trace Control for Netem
TDM Time Division Multiplex
TOSQA Telecommunication Objective Speech Quality Assessment
VAD Voice Activity Detector
4 General considerations
4.1 Default coding algorithm
VoIP terminals shall support the coding algorithm according to Recommendation ITU-T G.711 [7] (both µ-law and
A-law). VoIP terminals may support other coding algorithms.
NOTE: Associated Packet Loss Concealment (PLC) e.g. as defined in Recommendation ITU-T G.711 [7]
appendix I should be used.
4.2 End-to-end considerations
In order to achieve a desired end-to-end speech transmission performance (mouth-to-ear) it is recommended that the
general rules of transmission planning are carried out with the E-model of Recommendation ITU-T G.107 [3] taking
into account that the E-model does not yet address headsets; this includes the a-priori determination of the desired
category of speech transmission quality as defined in Recommendation ITU-T G.109 [5].
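For orientation, the sketch below shows how an E-model transmission rating factor R is commonly mapped to an estimated mean opinion score and to a quality category. It assumes the R-to-MOS conversion given in Recommendation ITU-T G.107 [3] and the category boundaries of Recommendation ITU-T G.109 [5]; the function names are illustrative, and the cited Recommendations remain the authoritative source for the formula and the boundaries.

```python
def r_to_mos(r: float) -> float:
    """Estimated MOS for a rating factor R (conversion assumed from ITU-T G.107)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def g109_category(r: float) -> str:
    """Speech transmission quality category (boundaries assumed from ITU-T G.109)."""
    for lower_bound, name in [(90, "best"), (80, "high"), (70, "medium"),
                              (60, "low"), (50, "poor")]:
        if r >= lower_bound:
            return name
    return "not recommended"


for r in (93.2, 85.0, 72.0, 45.0):
    print(f"R = {r:5.1f}  MOS = {r_to_mos(r):.2f}  category = {g109_category(r)}")
```

A planner would first fix the desired category and then check whether the terminal parameters measured according to the present document allow the resulting R to stay within it.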
While, in general, the transmission characteristics of single circuit-oriented network elements, such as switches or
terminals can be assumed to have a single input value for the planning tasks of Recommendation ITU-T G.108 [4], this
approach is not applicable in packet based systems and thus there is a need for the transmission planner's specific
attention.
In particular the decision as to which delay measured according to the present document is acceptable or
representative for the specific configuration is the responsibility of the individual transmission planner.
Recommendation ITU-T G.108 with its amendments [4] provides further guidance on this important issue.
The following optimum terminal parameters from a user's perspective need to be considered:
• Minimized delay in send and receive direction.
• Optimum Loudness Rating (RLR, SLR).
• Compensation for network delay variation.
• Packet loss recovery performance.
• Maximized terminal coupling loss.
5 Test equipment
5.1 IP half channel measurement adaptor
The IP half channel measurement adaptor is described in ETSI EG 202 425 [i.2].
5.2 Environmental conditions for tests
The following conditions shall apply for the testing environment:
a) Ambient temperature: 15 °C to 35 °C (inclusive);
b) Relative humidity: 5 % to 85 %;
c) Air pressure: 86 kPa to 106 kPa (860 mbar to 1 060 mbar).
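A test system may want to verify these conditions automatically before a measurement run. The following is a small sketch of such a check. The data structure and names are hypothetical, and treating the humidity and pressure limits as inclusive is an assumption, since only the temperature range is stated as inclusive.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    temperature_c: float
    relative_humidity_pct: float
    pressure_kpa: float


# Limits from items a) to c) above. Assumption: all bounds are inclusive.
LIMITS = {
    "temperature_c": (15.0, 35.0),
    "relative_humidity_pct": (5.0, 85.0),
    "pressure_kpa": (86.0, 106.0),
}


def violations(env: Environment) -> list:
    """Return the names of all quantities that fall outside the permitted range."""
    failed = []
    for name, (low, high) in LIMITS.items():
        if not low <= getattr(env, name) <= high:
            failed.append(name)
    return failed


print(violations(Environment(22.0, 40.0, 101.3)))  # prints []
print(violations(Environment(36.0, 90.0, 101.3)))
```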
5.3 Accuracy of measurements and test signal generation
Unless specified otherwise, the accuracy of measurements made by test equipment shall be equal to or better than:
Table 1: Measurement accuracy
Item                           Accuracy
Electrical signal level        ±0,2 dB for levels ≥ -50 dBV
                               ±0,4 dB for levels < -50 dBV
Sound pressure                 ±0,7 dB
Frequency                      ±0,2 %
Time                           ±0,2 %
Application force              ±2 N
Measured maximum frequency     20 kHz

Unless specified otherwise, the accuracy of the signals generated by the test equipment shall be better than:
Table 2: Accuracy of test signal generation
Quantity                          Accuracy
Sound pressure level at           ±3 dB for frequencies from 100 Hz to 200 Hz
Mouth Reference Point (MRP)       ±1 dB for frequencies from 200 Hz to 4 000 Hz
                                  ±3 dB for frequencies from 4 000 Hz to 14 000 Hz
Electrical excitation levels      ±0,4 dB across the whole frequency range
Frequency generation              ±2 % (see note)
Time                              ±0,2 %
Specified component values        ±1 %
NOTE: This tolerance may be used to avoid measurements at critical frequencies, e.g. those
due to sampling operations within the terminal under test.
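The frequency-dependent tolerance for the sound pressure level at the MRP in table 2 can be expressed as a simple lookup. The sketch below is illustrative only. Table 2 lists 200 Hz and 4 000 Hz in two adjacent ranges, so applying the tighter ±1 dB tolerance at those two frequencies is an assumption made here, and the function names are hypothetical.

```python
def mrp_level_tolerance_db(freq_hz: float):
    """Permitted deviation (± dB) of the generated level at the MRP, or None if
    the frequency lies outside the range covered by table 2.

    Assumption: the tighter tolerance applies at the shared band edges
    (200 Hz and 4 000 Hz).
    """
    if 200.0 <= freq_hz <= 4000.0:
        return 1.0
    if 100.0 <= freq_hz < 200.0 or 4000.0 < freq_hz <= 14000.0:
        return 3.0
    return None


def mrp_level_ok(freq_hz: float, target_db: float, measured_db: float) -> bool:
    """Check a generated level against the tolerance for its frequency."""
    tolerance = mrp_level_tolerance_db(freq_hz)
    return tolerance is not None and abs(measured_db - target_db) <= tolerance


print(mrp_level_ok(1000.0, -4.7, -5.5))  # deviation 0,8 dB within ±1 dB: True
print(mrp_level_ok(1000.0, -4.7, -6.0))  # deviation 1,3 dB exceeds ±1 dB: False
```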

For terminal equipment which is directly powered from the mains supply, all tests shall be carried out within ±5 % of
the rated voltage of that supply. If the equipment is powered by other means and those means are not supplied as part of
the apparatus, all tests shall be carried out within the power supply limit declared by the supplier. If the power supply is
a.c. the test shall be conducted within ±4 % of the rated frequency.
5.4 Network impairment simulation
At least one set of requirements is based on the assumption of an error free packet network, and at least one other set of
requirements is based on a defined simulated malperformance of the packet network.
An appropriate network simulator has to be used, for example NIST Net [i.4] (https://www-x.antd.nist.gov/itg/nistnet/)
or Netem [i.5].
Based on the positive experience STQ has had with NIST Net during the ETSI Speech Quality Test Events, NIST Net is
taken as the basis to express and describe the variations of packet network parameters for the appropriate tests.
The following is a brief description of NIST Net:
The NIST Net network emulator is a general-purpose tool for emulating performance dynamics in IP
networks. The tool is designed to allow controlled, reproducible experiments with network performance
sensitive/adaptive applications and control protocols in a simple laboratory setting. By operating at the IP
level, NIST Net can emulate the critical end-to-end performance characteristics imposed by various wide area
network situations (e.g. congestion loss) or by various underlying sub network technologies (e.g. asymmetric
bandwidth situations of xDSL and cable modems).
NIST Net is implemented as a kernel module extension to the Linux™ operating system and an X Window
System-based user interface application. In use, the tool allows an inexpensive PC-based router to emulate
numerous complex performance scenarios, including: tunable packet delay distributions, congestion and
background loss, bandwidth limitation, and packet reordering/duplication. The X interface allows the user to
select and monitor specific traffic streams passing through the router and to apply selected performance
"effects" to the IP packets of the stream. In addition to the interactive interface, NIST Net can be driven by
traces produced from measurements of actual network conditions. NIST Net also provides support for user
defined packet handlers to be added to the system. Examples of the use of such packet handlers include: time
stamping/data collection, interception and diversion of selected flows, generation of protocol responses from
emulated clients.
The key points of Netem can be summarized as follows:
• Netem is nowadays part of most Linux™ distributions; it only has to be enabled when compiling a kernel.
Netem offers the same possibilities as NIST Net: loss, duplication, delay and jitter can be generated (and the
distribution can be chosen at runtime). Netem can be run on a Linux™ PC operating as a bridge or a router
(NIST Net only runs on routers).
• With an extension of Netem, TCN (Trace Control for Netem) [i.6], which was developed by ETH Zurich, it
is even possible to control the behaviour of single packets via a trace file. It is thus possible, for example, to
generate a single packet loss or a specific delay pattern. This extension is planned to be included in new
Linux™ kernels; currently it is available as a patch to a specific kernel and to the iproute2 tool (iproute2
contains Netem).
• It is not advised to define specific distortion patterns for testing in standards, because it would be easy to adapt
devices to these patterns (as is already done for test signals). However, if a pattern is unknown to a manufacturer,
the same pattern can be used by a test lab for different devices and gives comparable results. It is also possible
to take a trace of NIST Net distortions, generate a file from it and play back exactly the same distortions with
Netem.
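As an illustration of the Netem capabilities listed above, the following sketch assembles a `tc` command line (from iproute2) that adds delay, jitter with a normal distribution and random packet loss on one interface. The interface name and the impairment values are hypothetical and are not settings required by the present document; the helper function is likewise illustrative.

```python
def netem_command(interface: str, delay_ms: int, jitter_ms: int, loss_pct: float) -> list:
    """Build the argument list for a `tc qdisc add ... netem` invocation."""
    return [
        "tc", "qdisc", "add", "dev", interface, "root", "netem",
        "delay", f"{delay_ms}ms", f"{jitter_ms}ms", "distribution", "normal",
        "loss", f"{loss_pct}%",
    ]


# Hypothetical impairment profile: 50 ms mean delay, 10 ms jitter, 1 % loss.
cmd = netem_command("eth0", 50, 10, 1)
print(" ".join(cmd))

# Applying the command requires root privileges on a Linux host with iproute2:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it keeps the impairment profile easy to log in a test report, which supports reproducibility across devices.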
NOTE: NIST Net™, NETEM™, Linux™ and X Window System™ are examples of suitable products available
commercially. This information is given for the convenience of users of the present document and does
not constitute an endorsement by ETSI of these product(s).
5.5 Acoustic environment
Unless stated otherwise measurements shall be conducted under quiet and "anechoic" conditions. Depending on the
distance of the transducers from mouth and ear a quiet office room may be sufficient e.g. for handsets where artificial
mouth and artificial ear are located close to the acoustical transducers.
However, for some headsets or handset terminals with smaller dimension an anechoic room will be required.
In cases where real or simulated background noise is used as part of the testing environment, the original background
noise shall not be noticeably influenced by the acoustical properties of the room.
In all cases where the performance of acoustic echo cancellers shall be tested a realistic room which represents the
typical user environment for the terminal shall be used.
Standardized measurement methods for measurements with variable echo paths are for further study.
5.6 Influence of terminal delay on measurements
As delay is introduced by the terminal, care shall be taken for all measurements where exact position of the analysis
window is required. It shall be checked that the test is performed on the test signal and not on any other signal.
ETSI
13 ETSI ES 202 737 V1.5.1 (2017-01)
6 Requirements and associated measurement methodologies
6.1 Notes
NOTE 1: In general the test methods as described in the present document apply. If alternative methods exist they
may be used if they have been proven to give the same result as the method described in the present
document. This will be indicated in the test report.
NOTE 2: Due to the time variant nature of IP connections delay variation may impair the measurements. In such
cases the measurement has to be repeated until a valid measurement result is achieved.
6.2 Test setup
6.2.1 General
The preferred acoustical access to terminals is the most realistic simulation of the "average" subscriber. This can be
achieved by using a HATS (Head And Torso Simulator) with appropriate ear simulation and appropriate means to fix
handset and headset terminals in a realistic and reproducible way to the HATS. The HATS is described in Recommendation
ITU-T P.58 [14], appropriate ears are described in Recommendation ITU-T P.57 [13] (type 3.3 and type 3.4 ear), and
the proper positioning of handsets under realistic conditions is described in Recommendation ITU-T P.64 [15].
The preferred way of testing a terminal is to connect it to a network simulator with exactly defined settings and access
points. The test sequences are fed in either electrically, using a reference codec or the direct signal processing
approach, or acoustically, using ITU-T specified devices.
When a coder with variable bit rate is used for testing terminal electroacoustic parameters, the bit rate known to
give the best characteristics should be selected, e.g.:
• AMR-NB (ETSI TS 126 171 [2]): 12,2 kbit/s.
• Recommendation ITU-T G.729.1 [11]: 32 kbit/s.
[Figure: block diagram with the elements: Measurement System; POI (Electrical Reference Point); IP-Half-Channel
Measurement Adapter (VoIP Reference Point) with Gateway Simulation; Network simulator (delay, jitter, packet loss),
with a path through the IP network on either side; VoIP Terminal under test.]
Figure 1: Half channel terminal measurement
6.2.2 Setup for handsets and headsets
When using a handset telephone, the handset is placed in the HATS position as described in Recommendation ITU-T
P.64 [15]. The artificial mouth shall conform to Recommendation ITU-T P.58 [14]. The artificial ear shall
conform to Recommendation ITU-T P.57 [13]; type 3.3 or type 3.4 ears shall be used.
Recommendations for positioning headsets are given in Recommendation ITU-T P.380 [18]. If not stated otherwise,
headsets shall be placed in their recommended wearing position. Further information about setup and the use of HATS
can be found in Recommendation ITU-T P.380 [18].
Unless stated otherwise, if a volume control is provided, the setting is chosen such that the nominal RLR is met as
closely as possible.
Unless stated otherwise, an application force of 8 N is used for handset testing. No application force is used for
headsets.
6.2.3 Position and calibration of HATS
All send and receive characteristics shall be tested with the HATS. It shall be indicated what type of ear was used and at
what application force. For handsets, if not stated otherwise, an application force of 8 N shall be used.
The horizontal positioning of the HATS reference plane shall be guaranteed within ±2°.
The HATS shall be equipped with a type 3.3 or type 3.4 artificial ear for handsets. For binaural headsets two artificial
ears are required. The type 3.3 or type 3.4 artificial ears as specified in Recommendation ITU-T P.57 [13] shall be used.
The artificial ear shall be positioned on the HATS according to Recommendation ITU-T P.58 [14].
The exact calibration and equalization can be found in Recommendation ITU-T P.581 [21]. If not stated otherwise, the
HATS shall be diffuse-field equalized. The inverse nominal diffuse field curve as found in table 3 of Recommendation
ITU-T P.58 [14] shall be used.
NOTE: The inverse average diffuse field response characteristics of HATS as found in Recommendation ITU-T
P.58 [14] is used and not the specific one corresponding to the HATS used. Instead of using the
individual diffuse field correction, the average correction function is used because, for handset and
headset measurements, mostly the artificial ear, ear canal and ear impedance simulation are effective. The
individual diffuse-field correction function of HATS includes all diffraction and reflection effects of the
complete individual HATS which are not effective in the measurement and potentially would lead to
bigger measurement uncertainties than using the average correction.
6.2.4 Test signal levels
Unless specified otherwise, the test signal level shall be -4,7 dBPa at the MRP.
Unless specified otherwise, the applied test signal level at the digital input shall be -16 dBm0.
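To make the acoustic level above easier to relate to sound level meter readings, the following sketch converts a level in dBPa (referred to 1 Pa) into a sound pressure in pascal and into dB SPL (referred to 20 µPa). The function names are assumptions introduced for this illustration; the only value taken from the present document is -4,7 dBPa.

```python
import math

P_REF_PA = 20e-6  # reference sound pressure for dB SPL (20 µPa)


def dbpa_to_pascal(level_dbpa):
    """RMS sound pressure in Pa for a level given in dBPa (0 dBPa = 1 Pa)."""
    return 10.0 ** (level_dbpa / 20.0)


def dbpa_to_dbspl(level_dbpa):
    """Convert dBPa to dB SPL; the offset is 20*log10(1 Pa / 20 µPa), about 93,98 dB."""
    return level_dbpa + 20.0 * math.log10(1.0 / P_REF_PA)


# Test signal level at the MRP, as specified above.
print(round(dbpa_to_pascal(-4.7), 3))  # about 0.582 Pa
print(round(dbpa_to_dbspl(-4.7), 1))   # about 89.3 dB SPL
```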
6.2.5 Setup of background noise simulation
A setup for simulating realistic background noises in a lab-type environment is described in ETSI TS 103 224 [25].
If not stated otherwise this setup is used in all measurements where background noise simulation is required.
The following noises of ETSI TS 103 224 [25] shall be used.
Table 2a
Name                           Description                                      Duration     Levels
Pub Noise (Pub)                HATS and microphone array in a pub               30 seconds   1: 77,2 dB  2: 76,6 dB  3: 75,7 dB  4: 76,0 dB
                                                                                             5: 76,0 dB  6: 76,3 dB  7: 76,0 dB  8: 76,4 dB
Sales Counter (SalesCounter)   HATS and microphone array in a supermarket       30 seconds   1: 66,6 dB  2: 66,1 dB  3: 65,7 dB  4: 66,5 dB
                                                                                             5: 66,3 dB  6: 66,8 dB  7: 66,6 dB  8: 67,1 dB
Callcenter 2 (Callcenter)      HATS and microphone array in business office     30 seconds   1: 60,2 dB  2: 60,0 dB  3: 60,1 dB  4: 60,8 dB
                                                                                             5: 60,2 dB  6: 60,6 dB  7: 60,2 dB  8: 60,7 dB
6.3 Coding independent parameters
6.3.1 Send frequency response
Requirement
The send frequency response of the handset or the headset shall be within a mask as defined in table 3 and shown in
figure 2. This mask shall be applicable for all types of handsets and headsets.
Table 3: Send frequency response
Frequency    Upper Limit                    Lower Limit
100 Hz       -10 dB (see notes 2 and 3)
300 Hz       5 dB                           -5 dB
3 400 Hz     5 dB                           -5 dB
4 000 Hz     5 dB
NOTE 1: The limits for intermediate frequencies lie on a straight line drawn between the
given values on a linear (dB) - logarithmic (Hz) scale.
NOTE 2: Under conditions of high background noise a limit of -18 dB is recommended.
NOTE 3: In ETSI ES 202 739 [24], the limit is 0 dB.
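The interpolation rule of NOTE 1 (straight lines on a linear dB versus logarithmic frequency scale) can be sketched as follows. This is an illustrative reading of table 3 and not a normative implementation: it assumes that the single -10 dB entry at 100 Hz belongs to the upper limit, and the names `UPPER`, `LOWER`, `mask_limit` and `within_mask` are introduced here for the example only.

```python
import math

# Mask breakpoints as read from table 3: (frequency in Hz, limit in dB).
# Assumption: the single -10 dB value at 100 Hz is an upper limit.
UPPER = [(100.0, -10.0), (300.0, 5.0), (3400.0, 5.0), (4000.0, 5.0)]
LOWER = [(300.0, -5.0), (3400.0, -5.0)]


def mask_limit(points, freq_hz):
    """Interpolate linearly in dB over log10(frequency).

    Returns None when freq_hz lies outside the range covered by `points`.
    """
    if freq_hz < points[0][0] or freq_hz > points[-1][0]:
        return None
    for (f1, l1), (f2, l2) in zip(points, points[1:]):
        if f1 <= freq_hz <= f2:
            t = (math.log10(freq_hz) - math.log10(f1)) / (math.log10(f2) - math.log10(f1))
            return l1 + t * (l2 - l1)
    return None


def within_mask(freq_hz, response_db):
    """True if the response at freq_hz violates neither limit that applies there."""
    upper = mask_limit(UPPER, freq_hz)
    lower = mask_limit(LOWER, freq_hz)
    ok_upper = upper is None or response_db <= upper
    ok_lower = lower is None or response_db >= lower
    return ok_upper and ok_lower
```

Halfway between 100 Hz and 300 Hz on the logarithmic axis (about 173 Hz), the interpolated upper limit is -2,5 dB, the midpoint of -10 dB and 5 dB.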


Figure 2: Send frequency response mask
NOTE: The basis for the target frequency responses in send and receive is the orthotelephonic reference response,
which is measured between two subjects at 1 m distance under free-field conditions and assumes an ideal
receive characteristic. Under these conditions the overall frequency response
...


SLOVENIAN STANDARD
01 March 2017
Speech and multimedia Transmission Quality (STQ) - Transmission requirements for
narrowband VoIP terminals (handset and headset) from a QoS perspective as perceived
by the user
This Slovenian standard is identical to: ETSI ES 202 737 V1.5.1 (2017-01)
ICS:
33.050.01 Telecommunication terminal equipment in general
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Transmission requirements for narrowband
VoIP terminals (handset and headset)
from a QoS perspective as perceived by the user


Reference
RES/STQ-242
Keywords
narrowband, quality, speech, telephony, terminal,
VoIP
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the
Sous-Préfecture de Grasse (06) No. 7803/88

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the
print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2017.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are Trade Marks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 General considerations
4.1 Default coding algorithm
4.2 End-to-end considerations
5 Test equipment
5.1 IP half channel measurement adaptor
5.2 Environmental conditions for tests
5.3 Accuracy of measurements and test signal generation
5.4 Network impairment simulation
5.5 Acoustic environment
5.6 Influence of terminal delay on measurements
6 Requirements and associated measurement methodologies
6.1 Notes
6.2 Test setup
6.2.1 General
6.2.2 Setup for handsets and headsets
6.2.3 Position and calibration of HATS
6.2.4 Test signal levels
6.2.5 Setup of background noise simulation
6.3 Coding independent parameters
6.3.1 Send frequency response
6.3.2 Send Loudness Rating (SLR)
6.3.3 Mic mute
6.3.4 Linearity range for SLR
6.3.5 Send distortion
6.3.6 Out-of-band signals in send direction
6.3.7 Send noise
6.3.8 Sidetone Masking Rating STMR (mouth to ear)
6.3.9 Sidetone delay
6.3.10 Terminal Coupling Loss weighted (TCLw)
6.3.11 Stability loss
6.3.12 Receive frequency response
6.3.13 Receive Loudness Rating (RLR)
6.3.14 Receive distortion
6.3.15 Out-of-band signals in receive direction
6.3.16 Minimum activation level and sensitivity in receive direction
6.3.17 Receive noise
6.3.18 Automatic level control in receive
6.3.19 Double talk performance
6.3.19.1 General
6.3.19.2 Attenuation range in send direction during double talk A_H,S,dt
6.3.19.3 Attenuation range in receive direction during double talk A_H,R,dt
6.3.19.4 Detection of echo components during double talk
6.3.19.5 Minimum activation level and sensitivity of double talk detection
6.3.20 Switching characteristics
6.3.20.1 Note
6.3.20.2 Activation in send direction
6.3.20.3 Silence suppression and comfort noise generation
6.3.21 Background noise performance
6.3.21.1 Performance in send direction in the presence of background noise
6.3.21.2 Speech quality in the presence of background noise
6.3.21.3 Quality of background noise transmission (with far end speech)
6.3.22 Quality of echo cancellation
6.3.22.1 Temporal echo effects
6.3.22.2 Spectral echo attenuation
6.3.22.3 Occurrence of artefacts
6.3.22.4 Variable echo path
6.3.23 Variant impairments; network dependant
6.3.23.1 Clock accuracy send
6.3.23.2 Clock accuracy receive
6.3.23.3 Send delay variation
6.3.24 Send and receive delay - round trip delay
6.4 Codec specific requirements
6.4.1 Objective listening speech quality MOS-LQO in send direction
6.4.2 Objective listening quality MOS-LQO in receive direction
6.4.3 Quality of jitter buffer adjustment
Annex A (informative): Processing delays in VoIP terminals
Annex B (informative): Example IP delay variation
Annex C (informative): Bibliography
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia Transmission
Quality (STQ).
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Traditionally, analogue and digital telephones interfaced with circuit-switched 64 kbit/s PCM networks. With the
fast growth of IP networks, terminals directly interfacing packet-switched networks (VoIP) are being rapidly
introduced. Such IP network edge devices may include gateways, specifically designed IP phones, soft phones or other
devices connected to IP based networks and providing telephony service. Since IP networks will in many
cases interwork with the traditional PSTN and private networks, many of the basic transmission requirements have
to be harmonised with specifications for traditional digital terminals. However, due to the unique characteristics of
IP networks, including packet loss, delay, etc., new performance specifications, as well as appropriate measuring
methods, will have to be developed. Terminals are becoming increasingly complex, and advanced signal processing is
used to address the IP specific issues. Also, VoIP terminals may use speech coding algorithms other than 64 kbit/s PCM
(Recommendation ITU-T G.711 [7]).
The advanced signal processing of terminals is targeted at speech signals. Therefore, wherever possible, speech signals
are used for testing in order to achieve the most realistic test conditions and meaningful results.
The present document provides speech transmission performance requirements for narrowband VoIP handset and headset
terminals.
NOTE: Requirement limits are given in tables, the associated curve when provided is given for illustration.
1 Scope
The present document provides speech transmission performance requirements for 4 kHz narrowband VoIP handset and
headset terminals; it addresses all types of IP based terminals, including wireless and soft phones.
In contrast to other standards which define minimum performance requirements, it is the intention of the present
document to specify terminal equipment requirements which enable manufacturers and service providers to provide good
quality end-to-end speech performance as perceived by the user.
In addition to basic testing procedures, the present document describes advanced testing procedures taking into account
further quality parameters as perceived by the user.
It is the intention of the present document to describe terminal performance parameters in such a way that the remaining
variation of parameters can be assessed purely by the E-model.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
http://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] ETSI EN 300 726: "Digital cellular telecommunications system (Phase 2+) (GSM); Enhanced Full
Rate (EFR) speech transcoding (GSM 06.60)".
[2] ETSI TS 126 171: "Digital cellular telecommunications system (Phase 2+); Universal Mobile
Telecommunications System (UMTS); LTE; Speech codec speech processing functions; Adaptive
Multi-Rate - Wideband (AMR-WB) speech codec; General description (3GPP TS 26.171)".
[3] Recommendation ITU-T G.107: "The E-model: a computational model for use in transmission
planning".
[4] Recommendation ITU-T G.108: "Application of the E-model: A planning guide".
[5] Recommendation ITU-T G.109: "Definition of categories of speech transmission quality".
[6] Recommendation ITU-T G.122: "Influence of national systems on stability and talker echo in
international connections".
[7] Recommendation ITU-T G.711: "Pulse code modulation (PCM) of voice frequencies".
[8] Recommendation ITU-T G.723.1: "Dual rate speech coder for multimedia communications
transmitting at 5.3 and 6.3 kbit/s".
[9] Recommendation ITU-T G.726: "40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code
Modulation (ADPCM)".
[10] Recommendation ITU-T G.729: "Coding of speech at 8 kbit/s using conjugate-structure algebraic-
code-excited linear prediction (CS-ACELP)".
[11] Recommendation ITU-T G.729.1: "G.729-based embedded variable bit-rate coder: An 8-32 kbit/s
scalable wideband coder bitstream interoperable with G.729".
[12] Recommendation ITU-T P.56: "Objective measurement of active speech level".
[13] Recommendation ITU-T P.57: "Artificial ears".
[14] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[15] Recommendation ITU-T P.64: "Determination of sensitivity/frequency characteristics of local
telephone systems".
[16] Recommendation ITU-T P.79: "Calculation of loudness ratings for telephone sets".
[17] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[18] Recommendation ITU-T P.380: "Electro-acoustic measurements on headsets".
[19] Recommendation ITU-T P.501: "Test signals for use in telephonometry".
[20] Recommendation ITU-T P.502: "Objective test methods for speech communication systems using
complex test signals".
[21] Recommendation ITU-T P.581: "Use of head and torso simulator for hands-free and handset
terminal testing".
[22] IEC 61260-1: "Electroacoustics - Octave-band and fractional-octave-band filters - Part 1:
Specifications".
[23] Recommendation ITU-T P.800.1: "Mean Opinion Score (MOS) terminology".
[24] ETSI ES 202 739: "Speech and multimedia Transmission Quality (STQ); Transmission
requirements for wideband VoIP terminals (handset and headset) from a QoS perspective as
perceived by the user".
[25] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
[26] Recommendation ITU-T P.863: "Perceptual objective listening quality assessment".
[27] Recommendation ITU-T P.863.1: "Application guide for Recommendation ITU-T P.863".
[28] Recommendation ITU-T P.1010: "Fundamental voice transmission objectives for VoIP terminals
and gateways".
[29] IETF RFC 3550: "RTP: A Transport Protocol for Real-Time Applications".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] ETSI EG 201 377-1: "Speech and multimedia Transmission Quality (STQ); Specification and
measurement of speech transmission quality; Part 1: Introduction to objective comparison
measurement methods for one-way speech quality across networks".
[i.2] ETSI EG 202 425: "Speech Processing, Transmission and Quality Aspects (STQ); Definition and
implementation of VoIP reference point".
[i.3] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality
performance in the presence of background noise; Part 3: Background noise transmission -
Objective test methods".
[i.4] NIST Net.
NOTE: Available at https://www-x.antd.nist.gov/itg/nistnet/.
[i.5] Netem.
NOTE: Available at http://www.linuxfoundation.org/en/Net:Netem.
[i.6] Trace Control for Netem (TCN): "A. Keller, Trace Control for Netem, Semester Thesis
SA-2006-15, ETH Zürich, 2006".
3 Definitions and abbreviations
3.1 Definitions
For the purposes of the present document, the following terms and definitions apply:
artificial ear: device for the calibration of earphones incorporating an acoustic coupler and a calibrated microphone for
the measurement of the sound pressure and having an overall acoustic impedance similar to that of the median adult
human ear over a given frequency band
codec: combination of an analogue-to-digital encoder and a digital-to-analogue decoder operating in opposite directions
of transmission in the same equipment
Composite Source Signal (CSS): signal composed in time by various signal elements
diffuse field equalization: equalization of the HATS sound pick-up, equalization of the difference, in dB, between the
spectrum level of the acoustic pressure at the ear Drum Reference Point (DRP) and the spectrum level of the acoustic
pressure at the HATS Reference Point (HRP) in a diffuse sound field with the HATS absent using the reverse nominal
curve given in table 3 of Recommendation ITU-T P.58 [14]
Ear Reference Point (ERP): virtual point for geometric reference located at the entrance to the listener's ear,
traditionally used for calculating telephonometric loudness ratings
ear-Drum Reference Point (DRP): point located at the end of the ear canal, corresponding to the ear-drum position
freefield reference point: point located in the free sound field, at a distance of at least 1,5 m from a sound source
radiating in free air
NOTE: In case of a head and torso simulator (HATS) in the centre of the artificial head with no artificial head
present.
Head And Torso Simulator (HATS) for telephonometry: manikin extending downward from the top of the head to
the waist, designed to simulate the sound pick-up characteristics and the acoustic diffraction produced by a median
human adult and to reproduce the acoustic field generated by the human mouth
Mouth Reference Point (MRP): point located on axis and 25 mm in front of the lip plane of a mouth simulator
nominal setting of the volume control: when a receive volume control is provided, the setting which is closest to the
nominal RLR of 2 dB
3.2 Abbreviations
For the purposes of the present document, the following abbreviations apply:
AM-FM Amplitude Modulation-Frequency Modulation
AMR Adaptive Multi-Rate
AMR-NB Adaptive Multi-Rate NarrowBand
CS Composite Source
CSS Composite Source Signal
DRP ear Drum Reference Point
EC Echo Canceller
EFR Enhanced Full Rate
EL Echo Loss
ERP Ear Reference Point
ETH Eidgenössische Technische Hochschule
FFT Fast Fourier Transform
GSM Global System for Mobile communications
HATS Head And Torso Simulator
IEC International Electrotechnical Commission
IP Internet Protocol
IPDV IP Packet Delay Variation
ITU-T International Telecommunication Union - Telecommunication standardization sector
MOS Mean Opinion Score
MOS-LQOy Mean Opinion Score - Listening Quality Objective
NOTE: y being N for narrow-band, M for mixed and S for superwideband. See Recommendation
ITU-T P.800.1 [23].
MRP Mouth Reference Point
NIST National Institute of Standards and Technology
NLP Non Linear Processor
PBX Private Branch eXchange
PC Personal Computer
PCM Pulse Code Modulation
POLQA Perceptual Objective Listening Quality Assessment
PLC Packet Loss Concealment
PN Pseudo-random Noise
POI Point Of Interconnect
PSTN Public Switched Telephone Network
QoS Quality of Service
RLR Receive Loudness Rating
RMS Root Mean Square
RTP Real-time Transport Protocol
SLR Send Loudness Rating
STMR SideTone Masking Rating
TCLw Terminal Coupling Loss (weighted)
TCN Trace Control for Netem
TDM Time Division Multiplex
TOSQA Telecommunication Objective Speech Quality Assessment
VAD Voice Activity Detector
4 General considerations
4.1 Default coding algorithm
VoIP terminals shall support the coding algorithm according to Recommendation ITU-T G.711 [7] (both µ-law and
A-law). VoIP terminals may support other coding algorithms.
NOTE: Associated Packet Loss Concealment (PLC) e.g. as defined in Recommendation ITU-T G.711 [7]
appendix I should be used.
4.2 End-to-end considerations
In order to achieve a desired end-to-end speech transmission performance (mouth-to-ear) it is recommended that the
general rules of transmission planning are carried out with the E-model of Recommendation ITU-T G.107 [3] taking
into account that the E-model does not yet address headsets; this includes the a-priori determination of the desired
category of speech transmission quality as defined in Recommendation ITU-T G.109 [5].
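As a planning aid, the mapping from an E-model rating R to a speech transmission quality category and to an estimated MOS can be sketched as below. The category lower bounds (90, 80, 70, 60, 50) and the R-to-MOS formula are quoted here from memory of Recommendations ITU-T G.109 and ITU-T G.107 respectively and are not restated in the present document, so they should be verified against those Recommendations before use. The function names are assumptions made for this illustration, and the E-model caveat about headsets stated above still applies.

```python
# Lower bounds of R for the speech transmission quality categories,
# as commonly quoted from Recommendation ITU-T G.109 (to be verified).
CATEGORIES = [
    (90.0, "best"),
    (80.0, "high"),
    (70.0, "medium"),
    (60.0, "low"),
    (50.0, "poor"),
]


def quality_category(r):
    """Return the category name for rating R, or 'not recommended' below 50."""
    for lower_bound, name in CATEGORIES:
        if r >= lower_bound:
            return name
    return "not recommended"


def r_to_mos(r):
    """Estimated MOS for rating R, using the conversion given in ITU-T G.107."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Example: a rating of 93.2 falls into the "best" category.
print(quality_category(93.2), round(r_to_mos(93.2), 2))
```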
While, in general, the transmission characteristics of single circuit-oriented network elements, such as switches or
terminals can be assumed to have a single input value for the planning tasks of Recommendation ITU-T G.108 [4], this
approach is not applicable in packet based systems and thus there is a need for the transmission planner's specific
attention.
In particular, the decision as to which delay measured according to the present document is acceptable or
representative for the specific configuration is the responsibility of the individual transmission planner.
Recommendation ITU-T G.108 with its amendments [4] provides further guidance on this important issue.
The following optimum terminal parameters from a user's perspective need to be considered:
• Minimized delay in send and receive direction.
• Optimum Loudness Rating (RLR, SLR).
• Compensation for network delay variation.
• Packet loss recovery performance.
• Maximized terminal coupling loss.
5 Test equipment
5.1 IP half channel measurement adaptor
The IP half channel measurement adaptor is described in ETSI EG 202 425 [i.2].
5.2 Environmental conditions for tests
The following conditions shall apply for the testing environment:
a) Ambient temperature: 15 °C to 35 °C (inclusive);
b) Relative humidity: 5 % to 85 %;
c) Air pressure: 86 kPa to 106 kPa (860 mbar to 1 060 mbar).
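A test log can be checked against these ranges automatically. The sketch below is one possible way to do so; the dictionary keys and the function name are assumptions introduced here, and treating the humidity and air pressure bounds as inclusive is also an assumption, since only the temperature range is stated to be inclusive.

```python
# Allowed ranges taken from items a) to c) above.
ENVIRONMENT_LIMITS = {
    "temperature_c": (15.0, 35.0),   # ambient temperature in °C
    "humidity_pct": (5.0, 85.0),     # relative humidity in %
    "pressure_kpa": (86.0, 106.0),   # air pressure in kPa
}


def environment_violations(readings):
    """Return the names of all readings that fall outside their allowed range."""
    violations = []
    for name, (low, high) in ENVIRONMENT_LIMITS.items():
        value = readings[name]
        if not (low <= value <= high):
            violations.append(name)
    return violations


# Example readings (hypothetical lab conditions).
print(environment_violations(
    {"temperature_c": 22.5, "humidity_pct": 40.0, "pressure_kpa": 101.3}))
```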
5.3 Accuracy of measurements and test signal generation
Unless specified otherwise, the accuracy of measurements made by test equipment shall be equal to or better than:
Table 1: Measurement accuracy
Item                          Accuracy
Electrical signal level       ±0,2 dB for levels ≥ -50 dBV
                              ±0,4 dB for levels < -50 dBV
Sound pressure                ±0,7 dB
Frequency                     ±0,2 %
Time                          ±0,2 %
Application force             ±2 N
Measured maximum frequency    20 kHz

Unless specified otherwise, the accuracy of the signals generated by the test equipment shall be better than:
Table 2: Accuracy of test signal generation
Quantity                        Accuracy
Sound pressure level at         ±3 dB for frequencies from 100 Hz to 200 Hz
Mouth Reference Point (MRP)     ±1 dB for frequencies from 200 Hz to 4 000 Hz
                                ±3 dB for frequencies from 4 000 Hz to 14 000 Hz
Electrical excitation levels    ±0,4 dB across the whole frequency range
Frequency generation            ±2 % (see note)
Time                            ±0,2 %
Specified component values      ±1 %
NOTE: This tolerance may be used to avoid measurements at critical frequencies, e.g. those
due to sampling operations within the terminal under test.

For terminal equipment which is directly powered from the mains supply, all tests shall be carried out within ±5 % of
the rated voltage of that supply. If the equipment is powered by other means and those means are not supplied as part of
the apparatus, all tests shall be carried out within the power supply limit declared by the supplier. If the power supply is
a.c. the test shall be conducted within ±4 % of the rated frequency.
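The mains supply tolerances above translate into absolute limits as in the following sketch. The rated values of 230 V and 50 Hz are example inputs only and are not taken from the present document; the function name is likewise an assumption.

```python
def supply_limits(rated_voltage_v, rated_frequency_hz=None):
    """Return the permitted (min, max) voltage and, for a.c., frequency ranges.

    Voltage tolerance is ±5 % and frequency tolerance ±4 % of the rated
    values, as stated in the paragraph above.
    """
    voltage = (rated_voltage_v * 0.95, rated_voltage_v * 1.05)
    frequency = None
    if rated_frequency_hz is not None:
        frequency = (rated_frequency_hz * 0.96, rated_frequency_hz * 1.04)
    return voltage, frequency


# Example: a terminal rated for 230 V, 50 Hz (assumed values).
voltage_range, frequency_range = supply_limits(230.0, 50.0)
print(voltage_range, frequency_range)  # roughly 218.5 V to 241.5 V, 48 Hz to 52 Hz
```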
5.4 Network impairment simulation
At least one set of requirements is based on the assumption of an error free packet network, and at least one other set of
requirements is based on a defined simulated malperformance of the packet network.
An appropriate network simulator has to be used, for example NIST net [i.4] (https://www-x.antd.nist.gov/itg/nistnet/)
or Netem [i.5].
Based on the positive experience STQ has gained with "NIST Net" during the ETSI Speech Quality Test Events, NIST Net
is taken as the basis for expressing and describing the variations of packet network parameters for the appropriate tests.
The following is a brief description of NIST Net:
The NIST Net network emulator is a general-purpose tool for emulating performance dynamics in IP
networks. The tool is designed to allow controlled, reproducible experiments with network performance
sensitive/adaptive applications and control protocols in a simple laboratory setting. By operating at the IP
level, NIST Net can emulate the critical end-to-end performance characteristics imposed by various wide area
network situations (e.g. congestion loss) or by various underlying sub network technologies (e.g. asymmetric
bandwidth situations of xDSL and cable modems).
NIST Net is implemented as a kernel module extension to the Linux operating system and an X Window
System-based user interface application. In use, the tool allows an inexpensive PC-based router to emulate
numerous complex performance scenarios, including: tunable packet delay distributions, congestion and
background loss, bandwidth limitation, and packet reordering/duplication. The X interface allows the user to
select and monitor specific traffic streams passing through the router and to apply selected performance
"effects" to the IP packets of the stream. In addition to the interactive interface, NIST Net can be driven by
traces produced from measurements of actual network conditions. NIST Net also provides support for user
defined packet handlers to be added to the system. Examples of the use of such packet handlers include: time
stamping/data collection, interception and diversion of selected flows, generation of protocol responses from
emulated clients.
The key points of Netem can be summarized as follows:
• Netem is now part of most Linux™ distributions; it only has to be enabled when compiling a kernel.
Netem offers the same possibilities as NIST Net: loss, duplication, delay and jitter can be generated
(and the distribution can be chosen at runtime). Netem can run on a Linux™ PC operating as a
bridge or a router (NIST Net only runs on routers).
• With TCN (Trace Control for Netem) [i.6], an extension of Netem developed by ETH Zurich, it
is even possible to control the behaviour of single packets via a trace file. It is thus possible, for example, to
generate a single packet loss or a specific delay pattern. This extension is planned to be included in new
Linux™ kernels; at present it is available as a patch to a specific kernel and to the iproute2 tool (iproute2
contains Netem).
• It is not advisable to define specific distortion patterns for testing in standards, because it would be easy to adapt
devices to these patterns (as is already done for test signals). If a pattern is unknown to a manufacturer, however,
a test lab can use the same pattern for different devices and obtain comparable results. It is also possible
to take a trace of NIST Net distortions, generate a file from it and play back exactly the same distortions with
Netem.
NOTE: NIST Net™, Netem™, Linux™ and X Window System™ are examples of suitable products available
commercially. This information is given for the convenience of users of the present document and does
not constitute an endorsement by ETSI of these product(s).
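The Netem impairments listed above (delay, jitter, loss, duplication) are normally configured through the `tc` tool of iproute2. The following minimal sketch only assembles such a command line; the interface name and all parameter values are assumptions chosen for illustration and are not prescribed by the present document.

```python
def netem_command(dev, delay_ms=0, jitter_ms=0, loss_pct=0,
                  duplicate_pct=0, distribution=None):
    """Build a 'tc qdisc add ... netem' argument list.

    The list is only constructed here, not executed. Running it requires
    root privileges on a Linux host with the netem qdisc available.
    """
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms > 0:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms > 0:
            # Jitter is given as a second value after the mean delay.
            cmd.append(f"{jitter_ms}ms")
            if distribution:
                cmd += ["distribution", distribution]
    if loss_pct > 0:
        cmd += ["loss", f"{loss_pct}%"]
    if duplicate_pct > 0:
        cmd += ["duplicate", f"{duplicate_pct}%"]
    return cmd


# Hypothetical example: 100 ms delay, 10 ms jitter, 1 % loss on "eth0".
print(" ".join(netem_command("eth0", delay_ms=100, jitter_ms=10, loss_pct=1)))
```

The resulting list could be passed to `subprocess.run` on the host acting as bridge or router between the measurement system and the terminal under test.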
5.5 Acoustic environment
Unless stated otherwise, measurements shall be conducted under quiet and "anechoic" conditions. Depending on the
distance of the transducers from mouth and ear, a quiet office room may be sufficient, e.g. for handsets where the artificial
mouth and artificial ear are located close to the acoustical transducers.
However, for some headsets or handset terminals with smaller dimensions an anechoic room will be required.
In cases where real or simulated background noise is used as part of the testing environment, the original background
noise shall not be noticeably influenced by the acoustical properties of the room.
In all cases where the performance of acoustic echo cancellers is to be tested, a realistic room which represents the
typical user environment for the terminal shall be used.
Standardized measurement methods for measurements with variable echo paths are for further study.
5.6 Influence of terminal delay on measurements
As delay is introduced by the terminal, care shall be taken for all measurements where the exact position of the analysis
window is required. It shall be checked that the analysis is performed on the test signal and not on any other signal.
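One common way to position the analysis window is to estimate the terminal delay by cross-correlating the transmitted test signal with the recorded signal and then shifting the window by that delay. The following is a minimal sketch of this idea; it is an illustrative assumption, not a method defined in the present document, and the signals shown are hypothetical sample values.

```python
def estimate_delay(reference, recorded, max_lag):
    """Return the lag in samples at which 'recorded' best matches 'reference'.

    Uses a plain (unnormalized) cross-correlation over lags 0..max_lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(recorded) - lag)
        if n <= 0:
            break
        score = sum(reference[i] * recorded[lag + i] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def analysis_window(start, length, delay):
    """Shift an analysis window (start, end) in samples by the estimated delay."""
    return start + delay, start + delay + length


# Hypothetical test signal delayed by 5 samples in the recording.
reference = [0, 1, 0, -1, 2, 0, 0, 1]
recorded = [0] * 5 + reference + [0] * 3
delay = estimate_delay(reference, recorded, max_lag=8)
print(delay, analysis_window(100, 50, delay))
```

A large correlation peak at the estimated lag also gives some indication that the analysed segment is the test signal rather than another signal such as noise or echo.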
6 Requirements and associated measurement methodologies
6.1 Notes
NOTE 1: In general the test methods as described in the present document apply. If alternative methods exist they
may be used if they have been proven to give the same result as the method described in the present
document. This will be indicated in the test report.
NOTE 2: Due to the time variant nature of IP connections delay variation may impair the measurements. In such
cases the measurement has to be repeated until a valid measurement result is achieved.
6.2 Test setup
6.2.1 General
The preferred acoustical access to terminals is the most realistic simulation of the "average" subscriber. This can be
achieved by using a HATS (Head And Torso Simulator) with appropriate ear simulation and appropriate means to fix
handset and headset terminals to the HATS in a realistic and reproducible way. HATS is described in Recommendation
ITU-T P.58 [14]; appropriate ears (type 3.3 and type 3.4) are described in Recommendation ITU-T P.57 [13]; and the
proper positioning of handsets under realistic conditions is described in Recommendation ITU-T P.64 [15].
The preferred way of testing a terminal is to connect it to a network simulator with exactly defined settings and access
points. The test sequences are fed in either electrically, using a reference codec or the direct signal processing
approach, or acoustically, using ITU-T specified devices.
When a coder with variable bit rate is used for testing terminal electro-acoustical parameters, the bit rate recognized
as giving the best characteristics should be selected, e.g.:
• AMR-NB (ETSI TS 126 171 [2]): 12,2 kbit/s.
• Recommendation ITU-T G.729.1 [11]: 32 kbit/s.
[Block diagram not reproduced. Labelled elements: IP half channel; VoIP gateway simulation and measurement adapter
(VoIP reference point); network simulator (delay, jitter, packet loss); path through IP network; terminal under test;
POI (electrical reference point); measurement system.]
Figure 1: Half channel terminal measurement
6.2.2 Setup for handsets and headsets
When testing a handset telephone, the handset is placed in the HATS position as described in Recommendation ITU-T
P.64 [15]. The artificial mouth shall conform to Recommendation ITU-T P.58 [14]. The artificial ear shall
conform to Recommendation ITU-T P.57 [13]; type 3.3 or type 3.4 ears shall be used.
Recommendations for positioning headsets are given in Recommendation ITU-T P.380 [18]. If not stated otherwise
headsets shall be placed in their recommended wearing position. Further information about setup and the use of HATS
can be found in Recommendation ITU-T P.380 [18].
Unless stated otherwise, if a volume control is provided, the setting is chosen such that the nominal RLR is met as closely
as possible.
Unless stated otherwise the application force of 8 N is used for handset testing. No application force is used for
headsets.
6.2.3 Position and calibration of HATS
All send and receive characteristics shall be tested with the HATS. It shall be indicated which type of ear was used and
at which application force. For handsets, if not stated otherwise, 8 N application force shall be used.
The horizontal positioning of the HATS reference plane shall be guaranteed within ±2°.
The HATS shall be equipped with a type 3.3 or type 3.4 artificial ear for handsets. For binaural headsets two artificial
ears are required. The type 3.3 or type 3.4 artificial ears as specified in Recommendation ITU-T P.57 [13] shall be used. The
artificial ear shall be positioned on HATS according to Recommendation ITU-T P.58 [14].
The exact calibration and equalization can be found in Recommendation ITU-T P.581 [21]. If not stated otherwise, the
HATS shall be diffuse-field equalized. The inverse nominal diffuse field curve as found in table 3 of Recommendation
ITU-T P.58 [14] shall be used.
NOTE: The inverse average diffuse field response characteristics of HATS as found in Recommendation ITU-T
P.58 [14] is used and not the specific one corresponding to the HATS used. Instead of using the
individual diffuse field correction, the average correction function is used because, for handset and
headset measurements, mostly the artificial ear, ear canal and ear impedance simulation are effective. The
individual diffuse-field correction function of HATS includes all diffraction and reflection effects of the
complete individual HATS which are not effective in the measurement and potentially would lead to
bigger measurement uncertainties than using the average correction.
6.2.4 Test signal levels
Unless specified otherwise, the test signal level shall be -4,7 dBPa at the MRP.
Unless specified otherwise, the applied test signal level at the digital input shall be -16 dBm0.
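As a worked illustration of the acoustic level above, a level in dBPa can be converted to sound pressure in pascals and to dB SPL (referred to 20 µPa). The sketch below is for orientation only; the function names are assumptions and do not appear in the present document.

```python
import math

P_REF_PA = 20e-6  # reference sound pressure for dB SPL, 20 micropascals


def dbpa_to_pascal(level_dbpa):
    """Convert a level in dBPa (dB relative to 1 Pa) to pressure in Pa."""
    return 10 ** (level_dbpa / 20)


def dbpa_to_dbspl(level_dbpa):
    """Convert a level in dBPa to dB SPL (dB relative to 20 uPa)."""
    return level_dbpa + 20 * math.log10(1.0 / P_REF_PA)


# Nominal test signal level at the MRP as stated in the text: -4,7 dBPa.
print(round(dbpa_to_pascal(-4.7), 3))  # about 0.582 Pa
print(round(dbpa_to_dbspl(-4.7), 2))   # about 89.28 dB SPL
```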
6.2.5 Setup of background noise simulation
A setup for simulating realistic background noises in a lab-type environment is described in ETSI TS 103 224 [25].
If not stated otherwise this setup is used in all measurements where background noise simulation is required.
The following noises of ETSI TS 103 224 [25] shall be used.
Table 2a
Pub Noise (Pub) | HATS and microphone array in a pub | 30 seconds | 1: 77,2 dB; 2: 76,6 dB; 3: 75,7 dB; 4: 76,0 dB; 5: 76,0 dB; 6: 76,3 dB; 7: 76,0 dB; 8: 76,4 dB
Sales Counter (SalesCounter) | HATS and microphone array in a supermarket | 30 seconds | 1: 66,6 dB; 2: 66,1 dB; 3: 65,7 dB; 4: 66,5 dB; 5: 66,3 dB; 6: 66,8 dB
...
