Speech Processing, Transmission and Quality Aspects (STQ); QoS and network performance metrics and measurement methods Part 2 : Transmission Quality Indicator combining Voice Quality Metrics

DEG/STQ-00104-2

Speech processing, transmission and quality aspects (STQ) - Metrics and measurement methods for quality of service (QoS) and network performance - Part 2: Transmission quality indicator, including voice quality metrics

General Information

Status
Published
Publication Date
22-Feb-2009
Current Stage
12 - Completion
Due Date
27-Feb-2009
Completion Date
23-Feb-2009
Standard
ETSI EG 202 765-2 V1.1.1 (2008-12) - Speech Processing, Transmission and Quality Aspects (STQ); QoS and network performance metrics and measurement methods Part 2 : Transmission Quality Indicator combining Voice Quality Metrics
English language
37 pages
Standard
ETSI EG 202 765-2 V1.1.1 (2009-02) - Speech Processing, Transmission and Quality Aspects (STQ); QoS and network performance metrics and measurement methods Part 2 : Transmission Quality Indicator combining Voice Quality Metrics
English language
37 pages
Guide
V ETSI/EG 202 765-2 V1.1.1:2009 - BARVE
English language
37 pages

Standards Content (Sample)


Final draft ETSI EG 202 765-2 V1.1.1 (2008-12)
ETSI Guide
Speech Processing, Transmission and Quality Aspects (STQ);
QoS and network performance metrics and measurement
methods
Part 2: Transmission Quality Indicator combining
Voice Quality Metrics

Reference
DEG/STQ-00104-2
Keywords
performance, QoS, voice
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
Individual copies of the present document can be downloaded from:
http://www.etsi.org
The present document may be made available in more than one electronic version or in print. In any case of existing or
perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF).
In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive
within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
http://portal.etsi.org/tb/status/status.asp
If you find errors in the present document, please send your comment to one of the following services:
http://portal.etsi.org/chaircor/ETSI_support.asp
Copyright Notification
No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2008.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™, TIPHON™, the TIPHON logo and the ETSI logo are Trade Marks of ETSI registered
for the benefit of its Members.
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
Contents
Intellectual Property Rights
Foreword
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Abbreviations
4 Introduction
5 Measurement type
6 Voice quality scale
7 List of indicators
7.1 Post Dialling Delay
7.2 Media establishment delay
7.3 Unsuccessful call ratio
7.4 Premature release probability
7.5 Level of active speech signal at reception
7.6 Noise level at reception
7.7 Noise to signal ratio at reception
7.8 Speech signal attenuation (or gain) after transmission
7.9 Talker echo delay
7.10 Talker echo attenuation
7.11 Listening speech quality
7.12 Listening speech quality stability
7.13 End to end delay
7.14 End to end delay variation
7.15 Frequency responses at the reception
8 Measurement frequency
9 Duration of test calls
10 Measurement configurations
10.1 VoIP services
10.2 VoIP services in triple play context
11 Measurement locations and their distribution
11.1 Measurement location requirements
11.2 Method to determine measurement locations
12 Results presentation
12.1 One-view visualization of performances
12.1.1 Pie diagram with all indicators
12.1.2 Pie diagram with mandatory indicators
12.2 Non-compliant limits for result visualization
13 Publication of the results
Annex A: Indicator stability formulation
A.1 Presentation
A.2 Formulation
A.3 Graphic illustration of the formulation
A.4 Some examples of stability indicator calculated on Listening Speech Quality
Annex B: Calibration to take into account the frequency response of transducers
B.1 Method presentation
B.1.1 Sending
B.1.2 Sending
B.1.3 Global communication
B.1.4 Applications
Annex C: Echo presentation
C.1 Talker echo
C.2 Listener echo
Annex D: Examples of measurement point distribution
D.1 Example of France
D.2 Example of Switzerland
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (http://webapp.etsi.org/IPR/home.asp).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This ETSI Guide (EG) has been produced by ETSI Technical Committee Speech Processing, Transmission and Quality
Aspects (STQ), and is now submitted for the ETSI standards Membership Approval Procedure.
1 Scope
The present document aims at identifying and defining indicators and methodologies for use in the context of end-user
quality characterization and supervision of voice telephony services.
In this context the measurements and metric determinations are performed by analysing signals accessible at the user end
of the services and not in the network. In order to mirror the reality in terms of access to the services at the user end,
measurements and analyses are performed on electrical signals that exclude the electro-acoustic part of the end equipment,
but the adaptation of the probe to the electrical interface of the end-user equipment must take into account the
electro-acoustic characteristics of this terminal.
2 References
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific.
• For a specific reference, subsequent revisions do not apply.
• Non-specific reference may be made only to a complete document or a part thereof and only in the following
cases:
- if it is accepted that it will be possible to use all future changes of the referenced document for the
purposes of the referring document;
- for informative references.
Referenced documents which are not found to be publicly available in the expected location might be found at
http://docbox.etsi.org/Reference.
For online referenced documents, information sufficient to identify and locate the source shall be provided. Preferably,
the primary source of the referenced document should be cited, in order to ensure traceability. Furthermore, the
reference should, as far as possible, remain valid for the expected life of the document. The reference shall include the
method of access to the referenced document and the full network address, with the same punctuation and use of upper
case and lower case letters.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
2.1 Normative references
The following referenced documents are indispensable for the application of the present document. For dated
references, only the edition cited applies. For non-specific references, the latest edition of the referenced document
(including any amendments) applies.
Not applicable.
2.2 Informative references
The following referenced documents are not essential to the use of the present document but they assist the user with
regard to a particular subject area. For non-specific references, the latest version of the referenced document (including
any amendments) applies.
[i.1] ITU-T Recommendation P.800: "Methods for subjective determination of transmission quality".
[i.2] ITU-T Recommendation P.862: "Perceptual evaluation of speech quality (PESQ): An objective
method for end-to-end speech quality assessment of narrow-band telephone networks and speech
codecs".
[i.3] ITU-T Recommendation P.862.1: "Mapping function for transforming P.862 raw result scores to
MOS-LQO".
[i.4] ITU-T Recommendation P.862.2: "Wideband extension to Recommendation P.862 for the
assessment of wideband telephone networks and speech codecs".
[i.5] ITU-T Recommendation P.862.3: "Application guide for objective quality measurement based on
Recommendations P.862, P.862.1 and P.862.2".
[i.6] ITU-T Recommendation P.800.1: "Mean Opinion Score (MOS) terminology".
[i.7] ITU-T Recommendation E.800: "Terms and definitions related to quality of service and network
performance including dependability".
[i.8] ITU-T Recommendation E.845: "Connection accessibility objective for the international telephone
service".
[i.9] ETSI EG 201 769: "Speech Processing, Transmission and Quality Aspects (STQ); QoS parameter
definitions and measurements; Parameters for voice telephony service required under the ONP
Voice Telephony Directive 98/10/EC".
[i.10] ITU-T Recommendation P.56: "Objective measurement of active speech level".
[i.11] ITU-T Recommendation O.41: "Psophometer for use on telephone-type circuits".
[i.12] ITU-T Recommendation G.131: "Talker echo and its control".
[i.13] ITU-T Recommendation G.168: "Digital network echo cancellers".
[i.14] ITU-T Recommendation G.114: "One-way transmission time".
[i.15] ITU-T Recommendation P.505: "One-view visualization of speech quality measurement results".
[i.16] ETSI EG 201 377 (all parts): "Speech Processing, Transmission and Quality Aspects (STQ);
Specification and measurement of speech transmission quality".
[i.17] ITU-T Recommendation H.323: "Packet-based multimedia communications systems".
[i.18] ITU-T Recommendation H.225.0: "Call signalling protocols and media stream packetization for
packet-based multimedia communication systems".
[i.19] ITU-T Recommendation P.50: "Artificial voices".
[i.20] ITU-T Recommendation P.501: "Test signals for use in telephonometry".
3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
ADSL Asymmetrical Digital Subscriber Line
ATA Analog Telephone Adapter
IP Internet Protocol
ISDN Integrated Services Digital Network
ITU-T International Telecommunication Union - Telecommunication standardization sector
GPS Global Positioning System
GSM Global System for Mobile communications
HATS Head And Torso Simulator
MGCP Media Gateway Control Protocol
MOS Mean Opinion Score
MOS-LQOM Mean Opinion Score-Listening Quality Objective Mixed bandwidths
PDD Post Dialling Delay
PESQ Perceptual Evaluation of Speech Quality
PSTN Public Switched Telephone Network
RTP Real-time Transport Protocol
SIP Session Initiation Protocol
UMTS Universal Mobile Telecommunications System
VoIP Voice over Internet Protocol
4 Introduction
The assessment of transmission quality based on voice quality metrics is already addressed in several standards at ETSI
(e.g. the EG 201 377 [i.16] series) and elsewhere (mostly ITU-T recommendations from the P and G series). These
documents address measurement methodologies in terms of metrics, thresholds, data acquisition or
modelling of subjective opinion.
The objective of the present document is to complement this material with practical requirements of use in the context
of service verification and benchmark on a large and representative scale from the point of view of the end-users or of
the regulatory authorities. This has been made necessary by the current or recent evolutions of the telecommunication
sector:
• the competitive environment, in particular in voice services, where public protocols with high quality services
have been replaced by a multitude of service providers with fewer guarantees, and where clients can very easily
change their service providers;
• the development of time-varying quality in telecommunications, first in mobile offers (due to mobility and
irregular network coverage), but now also for fixed services (mostly VoIP);
• the cohabitation, interaction and competition between services based on different technologies.
Voice transmission quality is now recognized as a differentiating factor, but it remains very difficult to quantify.
Several approaches exist to achieve the goal mentioned above, none of them fully satisfactory:
• Customer surveys. This is by far the cheapest way to assess the perception of end users. However, the bias
introduced by other factors such as price, and the fact that voice quality is rarely addressed on its own
or in a satisfactory way (one never knows before a survey which problems end users have encountered),
make this source not really reliable.
• Pseudo-subjective tests, with a few human testers assessing the quality of real links in several situations. The
major drawback of this method is its lack of reproducibility, and it is often applied without using the standard
metrics and quality scales that can be found in standards like ITU-T Recommendation P.800 [i.1]. It also
takes a long time to run and is not cheap in the current competitive context where so many offers have to be
assessed. Finally, it is not easily applicable in a context of quality changing over time.
• Objective tests. This is the most reliable way, although it is also based on sampling and can be costly
in the case of a large deployment of probes or robots.
The present document assumes that this last family of methods answers the need for a reliable comparison of
telephony offers and is applied without combination with other methods.
What definitely matters is the point of view of the end-users. What they perceive is not only the result of the
transmission of a signal across a network; the processing of this signal at the sending and receiving sides is also
of great importance. Therefore, end-to-end voice quality should be assessed not with passive network monitoring
systems but with active systems simulating the behaviour of the end users, including the terminal. A big advantage of
such an approach is that it is largely technology and protocol agnostic, and therefore consistent with the expectations of
users, who do not judge the voice quality of PSTN, GSM or VoIP services according to different criteria.
The last important aspect addressed in the present document is the practical organization of measurement campaigns
in order to get a realistic and reliable view of the services as perceived by the end-users, in particular the questions of
the periodicity of measurements and of the geographical coverage (i.e. more generally the sampling approach).
In order to mirror the reality in terms of access to the services, a reliable measurement or supervision system should
provide the possibility to collect information from probes or robots adapted to the most common interfaces available.
This includes:
• analogue access (for the simulation of PSTN or of analogue phones behind an ATA box or an ADSL modem);
• ISDN access;
• handset (for any wireline terminal);
• electrical input and output (for PC soundcards or for any wireless terminal);
• GSM;
• UMTS;
• ethernet with IP phone termination (SIP, ITU-T Recommendation H.323 [i.17], MGCP, etc.).
Any combination of end-to-end connection between the types of access mentioned here has to be considered when a
measurement campaign is scheduled. Nevertheless, there are practical limitations:
• the number of measurements for a given type of access should be in proportion with its level of use in the real
life;
• the number of probes and of measurement results available will be adapted to the real needs as well as to the
capacity (mostly in terms of cost and of processing capability) of the entity running these measurements.
Figure 4.1 shows these different configurations and interfaces.

[Figure 4.1, diagram not reproduced. Labels: GSM and UMTS terminals and electrical access on mobile terminal (radio network); analogue access, ISDN access, handset access on analogue terminal, handset access on ISDN terminal and electrical access on wireless terminal connected to PSTN; Ethernet access, ADSL modem with analogue access, handset access on terminal, electrical access on wireless terminal and electrical access on PC connected to IP network.]
Figure 4.1: Possible configurations and interfaces in context of user characterization
5 Measurement type
To perform service quality assessments, there are two different methods: intrusive and non-intrusive measurements.
Non-intrusive measurements are not well suited to end user surveys because they require probes to be installed at the
user's terminals.
Intrusive measurements are better suited to end user surveys because connecting probes to end user terminals is
easier. Compared to non-intrusive measurements, intrusive methods have a further advantage: voice
quality can be assessed using reference-based models such as ITU-T Recommendation P.862 [i.2]
(see also ITU-T Recommendations P.862.1 [i.3], P.862.2 [i.4] and P.862.3 [i.5] concerning mapping functions and
application guide), which give results close to the subjective perception of speech quality.
In this context, intrusive measurements using reference-based models for speech quality assessment will be
performed for end user surveys.
6 Voice quality scale
It is important to consider that telephony has entered an era where traditional narrowband services will
coexist with new services offering wideband audio capabilities. For end-users, these are not separate kinds of services.
Therefore, the assessment of the transmission quality of voice should now be based on common metrics and objective
quality levels and scales, replacing the existing narrowband-only ones. In this context, it is appropriate to use
the MOS-LQOM scale to characterize the voice quality of both narrowband and wideband services.
See ITU-T Recommendation P.800.1 [i.6] for more information on MOS terminology.
7 List of indicators
The indicators proposed for the context of end-user quality survey of voice services are:
7.1 Post Dialling Delay
Definition Post Dialling Delay (PDD) evaluates the ability of the service to set up calls within an acceptable
delay. It is linked to the complexity of the service architecture and to the performance of the
constituting network elements.
Post Dialling Delay is the time interval between the end of dialling by the caller and the
reception by the caller of the appropriate ringing tone or recorded announcement.
Metric determined at one of the two accesses of the communication.
Assessment method Indicator determined sequentially from each of the two accesses of the call configuration. This
indicator characterizes only the caller side of the configuration.
Unit Millisecond with an integer value.
Standardization reference
Significant Mandatory.
Comment This indicator has to be separated by call type (IP to IP, IP to PSTN, IP to mobile,
etc.) for a detailed analysis.
The objective for the universal telephony service has been set to 2 900 ms in the
French regulator recommendation.
7.2 Media establishment delay
Definition Time, determined at one of the two accesses of the communication, between the off-hook of the
called party and the beginning of reception of the voice signal.
Assessment method Indicator determined sequentially from each of the two accesses of the call configuration.
On an IP access this indicator may be assessed by using a non-intrusive probe, such as a
protocol analyser. Media establishment delay may be evaluated through the analysis of
media flows and signalling. For the ITU-T Recommendation H.323 [i.17] protocol the flow
establishment delay corresponds to the time elapsed between the emission of the
ITU-T Recommendation H.225.0 [i.18] "CONNECT" message and the arrival of the first
IP packet containing speech signal.
Unit Millisecond with an integer value.
Standardization reference
Significant Optional.
Comment This indicator has to be separated between caller and called sites for a detailed analysis.
7.3 Unsuccessful call ratio
Definition Ratio of unsuccessful calls to the total number of call attempts in a specified time period.
An unsuccessful call is a call attempt to a valid number, properly dialled following dial
tone, where neither called party busy tone, nor ringing tone, nor answer signal, is
recognized on the access line of the calling user within 30 seconds from the instant when
the address information required for setting up a call is received by the network.
Assessment method Indicator determined sequentially from each of the two accesses of the call configuration.
Unit % with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation E.800 [i.7], ITU-T Recommendation E.845 [i.8],
EG 201 769 [i.9].
Significant Mandatory.
Comment The limit of 30 seconds is the default set-up of a timer in SS7 protocol.
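The definition above can be illustrated with a minimal Python sketch. The function names, the representation of a call attempt as the time to the first recognized tone or signal, and the use of `None` for "nothing recognized" are assumptions made for illustration; they are not part of the present document.

```python
from typing import Optional

TIMEOUT_S = 30.0  # limit stated in the definition of an unsuccessful call


def is_unsuccessful(seconds_to_first_signal: Optional[float],
                    timeout_s: float = TIMEOUT_S) -> bool:
    """Classify one call attempt to a valid, properly dialled number.

    seconds_to_first_signal is the time until busy tone, ringing tone or
    answer signal was recognized, or None if nothing was recognized
    (hypothetical representation chosen for this sketch).
    """
    return seconds_to_first_signal is None or seconds_to_first_signal > timeout_s


def unsuccessful_call_ratio(unsuccessful_calls: int, call_attempts: int) -> float:
    """Return the ratio in % with a resolution of 1 digit after the decimal point."""
    if call_attempts <= 0:
        raise ValueError("call_attempts must be positive")
    if not 0 <= unsuccessful_calls <= call_attempts:
        raise ValueError("unsuccessful_calls must lie between 0 and call_attempts")
    return round(100.0 * unsuccessful_calls / call_attempts, 1)
```

For instance, 3 unsuccessful calls out of 200 attempts in the observation period give a ratio of 1.5 %.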
7.4 Premature release probability
Definition This indicator characterizes the ability to release a service. It is based on the measurement
of the number of prematurely released communications in comparison with the number of established
communications.
Prematurely released communications are defined as communications released before any voluntary action
from one of the ends of the transmission.
Assessment method
Unit % with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation E.800 [i.7].
Significant Optional.
7.5 Level of active speech signal at reception
Definition Level of speech signal received after transmission.
The level of the signal heard by the user has an impact on the perceived quality. A
signal that is too low will be hardly audible and masked by the noise, while a level that is too high will
be painful.
Therefore, a measurement of the speech signal level is necessary to ensure good listening
comfort.
Assessment method The received decoded signal used for instance for ITU-T Recommendation P.862 [i.2] can
also be used to assess this parameter.
A typical method for the measurement of this parameter, based on a sample-by-sample
approach and a moving threshold between noise and speech, is given in
ITU-T Recommendation P.56 [i.10].
Unit dBm with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation P.56 [i.10].
Significant Optional.
Comment Each signal sample has a level, generally expressed in mV. The mean speech level is the
transformation onto an appropriate logarithmic scale of the mean signal voltage.
The samples taken into account for this measurement are the ones identified as speech (the
others are taken into account for noise measurements).
It is recommended to fall within classical speech level values, i.e. between -25 dBm and
-10 dBm.
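The conversion from voltage samples to a level in dBm can be sketched as follows. This is a simplified illustration, not the ITU-T Recommendation P.56 [i.10] algorithm: it assumes the samples have already been classified as speech, it uses the root-mean-square voltage as the "mean signal voltage", and it assumes a 600 ohm reference impedance, which the present document does not state. All names are hypothetical.

```python
import math
from typing import Sequence

REFERENCE_IMPEDANCE_OHM = 600.0  # assumption: classical telephony reference


def mean_level_dbm(samples_mv: Sequence[float],
                   impedance_ohm: float = REFERENCE_IMPEDANCE_OHM) -> float:
    """Mean level in dBm (1 decimal) of samples given in mV.

    The samples are assumed to be those already identified as speech.
    """
    if not samples_mv:
        raise ValueError("at least one sample is required")
    # Mean square voltage in V^2 (samples are converted from mV to V).
    mean_square_v2 = sum((s / 1000.0) ** 2 for s in samples_mv) / len(samples_mv)
    if mean_square_v2 == 0.0:
        return float("-inf")
    # Mean power in mW dissipated in the reference impedance.
    power_mw = 1000.0 * mean_square_v2 / impedance_ohm
    return round(10.0 * math.log10(power_mw), 1)


def within_recommended_speech_level(level_dbm: float,
                                    low_dbm: float = -25.0,
                                    high_dbm: float = -10.0) -> bool:
    """Check the level against the range recommended in the Comment row."""
    return low_dbm <= level_dbm <= high_dbm
```

Under these assumptions, samples with an RMS value of about 77,46 mV correspond to -20 dBm, which falls within the recommended range.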
7.6 Noise level at reception
Definition Level of noise determined at reception in the non-speech segments of the speech sample.
The noise present alongside the speech signal can have characteristics that make it
annoying, for instance a varying spectrum (crowd noise, for instance). But
the most important source of annoyance due to noise is simply its level.
Assessment method The received decoded signal used for instance for ITU-T Recommendation P.862 [i.2] can
also be used to assess this parameter.
The measurement of this parameter is normally performed as for the speech signal level
(see clause 7.5), but on the samples identified as non-speech.
Unit dBm0p with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation O.41 [i.11].
Significant Optional.
Comment Each signal sample has a level, generally expressed in mV. The mean noise level is the
transformation onto an appropriate logarithmic scale of the mean signal voltage of the noise
samples.
To get a more accurate noise level measure, a frequency transform needs to be done in
order to apply a psophometric weighting (see ITU-T Recommendation O.41 [i.11]).
It is recommended not to have noise louder than -50 dBm0p.
7.7 Noise to signal ratio at reception
Definition Difference between the active vocal level and the level of noise at the reception.
The noise present alongside the speech signal can have characteristics that make it
annoying, for instance a varying spectrum (crowd noise, for instance). But
the most important source of annoyance due to noise is simply its level, and particularly its
level relative to speech.
Assessment method The combination of speech signal level (see clause 7.5) and noise level (see clause 7.6) can
replace the noise to signal ratio indicator.
The received decoded signal used for instance for ITU-T Recommendation P.862 [i.2] can
also be used to assess this parameter.
SNR = speech signal level - noise level (ratio of the two levels, expressed as a difference in dB).
Indicator determined in the two directions of transmission.
Unit dB with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation P.56 [i.10].
Significant Optional.
Comment It is recommended not to have SNR lower than 30 dB.
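As a small illustration, the indicator can be derived from the two levels of clauses 7.5 and 7.6. The sketch below assumes that both levels are expressed on comparable logarithmic scales, so that their difference is a ratio in dB; the function names are hypothetical.

```python
def snr_db(speech_level_db: float, noise_level_db: float) -> float:
    """Signal to noise ratio in dB (1 decimal): difference of the two levels."""
    return round(speech_level_db - noise_level_db, 1)


def meets_snr_recommendation(snr: float, minimum_db: float = 30.0) -> bool:
    """Check the SNR against the 30 dB minimum recommended in the Comment row."""
    return snr >= minimum_db
```

A speech level of -20 and a noise level of -55 on the same scale give an SNR of 35 dB, which satisfies the recommendation; the same speech level with a noise level of -45 gives 25 dB and does not.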
7.8 Speech signal attenuation (or gain) after transmission
Definition Variant metric of the speech signal level (if the sending speech level is known, the two are
even redundant).
Speech signal attenuation after transmission is the difference between the active vocal levels
at the receiving and sending accesses.
Assessment method The received decoded signal used for instance for ITU-T Recommendation P.862 [i.2] can
also be used to assess this parameter. Once the speech signal level has been computed
(see clause 7.5), it is compared with the level of the sent signal. The attenuation is the
difference between these two levels.
There are other methods to compute this parameter, based for instance on intrusive
measurements made with sine waves and a specific weighting function.
Indicator determined in the two directions of transmission.
Unit dB with the resolution of 1 digit after the decimal point.
Standardization reference
Significant Optional.
Comment It is recommended to comply with PSTN attenuation rules, i.e. an attenuation between
6 dB and 10 dB.
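The comparison of sent and received levels can be sketched as below. The sign convention (a positive value means the received level is lower than the sent level, a negative value means gain) and the function names are assumptions made for illustration.

```python
def attenuation_db(sent_level_dbm: float, received_level_dbm: float) -> float:
    """Attenuation in dB (1 decimal); negative values indicate a gain."""
    return round(sent_level_dbm - received_level_dbm, 1)


def within_pstn_attenuation_rules(attenuation: float,
                                  low_db: float = 6.0,
                                  high_db: float = 10.0) -> bool:
    """Check the attenuation against the 6 dB to 10 dB range of the Comment row."""
    return low_db <= attenuation <= high_db
```

A signal sent at -15 dBm and received at -23 dBm has an attenuation of 8 dB, which lies within the recommended range.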
7.9 Talker echo delay
Definition In telecommunications, the term echo describes delayed and unwanted feedback of the
sent signal into the receive path. The so-called echo source is the reflection point between
send and receive directions, which can have one of the following causes:
• 4-wire/2-wire Hybrid Circuits (multiple reflections possible);
• coupling in handset cords;
• structure borne coupling in handsets;
• acoustical coupling between earpiece and microphone.
This phenomenon is characterized by two parameters: its attenuation and its delay.
See annex C for a more detailed discussion of talker echo and the related listener echo.
With the increased delays present in today's IP networks, echo has the potential to be much
more perceivable and annoying than in classical PSTN.
In order to achieve a similar user perception with higher delays the attenuation of the talker
echo should be increased, i.e. active echo cancellation is necessary.
In practice it can be observed that in some cases the cancellation either does not occur or
does not perform fully.
Echo is characterized by two parameters: its attenuation and its delay. The less attenuation
and/or the more delay, the more the echo will become annoying.
Echo delay is the time it takes for the speech signal to go from the mouth of a subscriber
back to the ear of the same subscriber, with one or more reflections occurring along the
transmission path.
Assessment method Indicator determined sequentially from each of the two accesses of the call configuration.
Unit Milliseconds with an integer value.
Standardization reference ITU-T Recommendation G.131 [i.12].
Significant Optional.
Comment For fully digital networks the talker echo delay can be assumed to be equivalent to twice
the mean one-way delay.
7.10 Talker echo attenuation
Definition In telecommunications, the term echo describes delayed and unwanted feedback of the sent
signal into the receive path. The so-called echo source is the reflection point between send
and receive directions, which can have one of the following causes:
• 4-wire/2-wire Hybrid Circuits (multiple reflections possible);
• coupling in handset cords;
• structure borne coupling in handsets;
• acoustical coupling between earpiece and microphone.
This phenomenon is characterized by two parameters: its attenuation and its delay.
See annex C for a more detailed discussion of talker echo and the related listener echo.
With the increased delays present in today's IP networks, echo has the potential to be much
more perceivable and annoying than in classical PSTN.
In order to achieve a similar user perception with higher delays the attenuation of the talker
echo should be increased, i.e. active echo cancellation is necessary.
In practice it can be observed that in some cases the cancellation either does not occur or
does not perform fully.
Echo is characterized by two parameters: its attenuation and its delay. The less attenuation
and/or the more delay, the more the echo will become annoying.
Echo attenuation is the difference of level between the sending level and the (delayed)
receiving level both measured at the same subscriber while he is talking.
Assessment method Indicator determined sequentially from each of the two accesses of the call configuration.
Unit dB with a resolution of one decimal.
Standardization reference ITU-T Recommendation G.131 [i.12], ITU-T Recommendation G.168 [i.13].
Significant Optional.
Comment If active echo cancellation is used, ITU-T Recommendation G.168 [i.13] should be
applied; VoIP should provide echo attenuation of 55 dB. If the delay under all
circumstances is known not to exceed 50 ms, lower values for the talker echo attenuation
(down to 35 dB) may be acceptable (see annex C).
Echo annoyance depends on two metrics: the attenuation and the delay.
For a given attenuation level, the greater the delay, the greater the annoyance
for the user(s).
Echo-Annoyance factor is defined as K.
K = EA - 40 × Log [(1 + delay/10) / (1 + delay/150)] + 6 × exp (-0,3 × delay²).
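The formula for K can be evaluated with a short Python sketch. Three points are assumptions, since the text does not state them: EA is taken to be the echo attenuation in dB, Log is taken to be the base-10 logarithm, and the delay is taken to be in milliseconds (this reading matches the form of the talker echo term used in the ITU-T E-model). The function name is hypothetical.

```python
import math


def echo_annoyance_factor(echo_attenuation_db: float, delay_ms: float) -> float:
    """Evaluate K = EA - 40*log10((1 + d/10) / (1 + d/150)) + 6*exp(-0.3*d^2).

    Assumptions: EA is the echo attenuation in dB, the logarithm is base 10
    and the delay d is expressed in milliseconds.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")
    delay_term = 40.0 * math.log10((1.0 + delay_ms / 10.0) / (1.0 + delay_ms / 150.0))
    short_delay_term = 6.0 * math.exp(-0.3 * delay_ms ** 2)
    return echo_attenuation_db - delay_term + short_delay_term
```

With a zero delay the logarithmic term vanishes and K equals EA + 6. For a fixed attenuation, K decreases as the delay grows, which is consistent with the statement that a longer delay makes the same echo more annoying.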
7.11 Listening speech quality
Definition Represents the intrinsic quality of speech signal after transmission. This indicator takes
into account the degradations generated on the signal by the transmission links.
Assessment method Voice quality is evaluated using ITU-T Recommendation P.862 [i.2] with
the mapping functions of ITU-T Recommendation P.862.1 [i.3] and
ITU-T Recommendation P.862.2 [i.4].
The MOS or Mean Opinion Score (calculated using the Perceptual Evaluation of Speech
Quality, or PESQ, method) provides an objective view of the quality of the voice signal as
it may be perceived by the customer.
The MOS score is obtained by comparing two speech samples:
• the original signal sent by the far end of the connection;
• the degraded signal received at the local end, where the measurement is applied.
The voice quality indicator is determined in both directions of transmission.
Several MOS scores are determined in series during the same call. For a given
transmission direction, the listening speech quality performance during the call is defined as
the mean value of the MOS-LQOM measurements (in that direction).
Unit Score between 1 (= very bad) and 5 (= excellent), determined on the MOS-LQOM scale with a
resolution of two digits after the decimal point.
Standardization reference ITU-T Recommendation P.800 [i.1], ITU-T Recommendation P.800.1 [i.6],
ITU-T Recommendation P.862 [i.2], ITU-T Recommendation P.862.2 [i.4],
ITU-T Recommendation P.862.3 [i.5].
Significant Mandatory.
Comment The value of this indicator depends on the codec used, but also on impairments such as IP
packet loss or a low signal-to-noise ratio.
To make result comparisons easier, it is recommended to use the same speech samples in
all test configurations, or at least to choose different speech samples through a thorough
selection and validation process.
ITU-T Recommendation P.862 [i.2] measures listening quality and does not take into
account impairments affecting conversational quality, such as end-to-end delay or echo.
It is also a one-way indicator; it therefore has to be measured separately in each
transmission direction, and the two results are not averaged afterwards.
This indicator may be broken down by call type (IP to IP, IP to PSTN, IP to Mobile,
etc.) for a detailed analysis.
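As an illustration of the per-call aggregation described in this clause, the sketch below averages the MOS-LQOM values measured in series during one call, separately for each direction, and reports the result with two decimals. The function name, the direction labels and the sample values are hypothetical; the MOS-LQOM values themselves are assumed to come from a P.862 measurement that is outside the scope of the sketch.

```python
from statistics import mean


def call_mos_lqom(scores):
    """Mean MOS-LQOM of one call in one direction, with two decimals."""
    if not scores:
        raise ValueError("at least one MOS-LQOM measurement is required")
    for score in scores:
        if not 1.0 <= score <= 5.0:
            raise ValueError(f"MOS-LQOM value out of the 1 to 5 scale: {score}")
    return round(mean(scores), 2)


# Hypothetical measurements taken in series during the same call.
measurements = {
    "A to B": [4.12, 4.05, 3.98, 4.10],
    "B to A": [3.60, 3.72, 3.55],
}

# One result per direction; the two directions are never averaged together.
per_direction = {name: call_mos_lqom(values) for name, values in measurements.items()}
print(per_direction)
```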
7.12 Listening speech quality stability
Definition It is well known that in IP networks the delay within the network can vary significantly, due
to congestion or to route changes during a session. Delay variations in the
network can be compensated to some extent by good jitter buffer design in the
receiver. Furthermore, packet loss can occur in the network, due either to severe congestion
(buffer overflows) or to routing problems, or in a receiving terminal due to jitter buffer
overflow. These factors affect the quality of the service.
For voice over IP, a single measurement of speech quality at the very
beginning of a call is therefore not enough. Speech quality should be analysed throughout the
call, typically over several minutes.
This metric represents the stability of the voice quality during a communication lasting several
minutes. This indicator takes into account the degradations generated on the signal by
the transmission links.
Assessment method Several measurements of the MOS-LQOM score, made with
ITU-T Recommendation P.862 [i.2] and the appropriate mapping function (see clause 7.11),
are performed in series within the same call. Typically, one measurement every 20 s is
enough. The results are reported in terms of statistics.
The assessment of Listening Speech Quality Stability is performed in 5 steps. The generic
formulation is presented in annex A.
For the stability indicator on Listening Speech Quality, THRESHOLD1 = 0,1 and the linear
weighting function applies in order to express Stability (ST-MOS) on a 0 to 100 scale. By
definition, ST-MOS equals 100 when no instability occurs and ST-MOS equals 0
when the instability (INS_MOS) is equal to or greater than 0,4.
ST-MOS is calculated as:
• ST-MOS = 100 - (250 × INS_MOS), and
• ST-MOS = 0 if [100 - (250 × INS_MOS)] < 0.
This indicator is determined in both directions of transmission.
Unit Statistics on MOS score variation are plotted on a 0 to 100 scale.
Standardization
reference
Significant Mandatory.
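The linear weighting of clause 7.12 can be sketched as follows. The instability value INS_MOS is taken as an input here, because its computation follows the five steps of annex A and is not reproduced in this sketch. The function name is illustrative.

```python
def st_mos(ins_mos: float) -> float:
    """Map the instability INS_MOS to the 0 to 100 stability scale.

    Implements ST-MOS = 100 - 250 x INS_MOS, floored at 0.
    INS_MOS itself is assumed to have been derived as described in annex A.
    """
    if ins_mos < 0:
        raise ValueError("INS_MOS must not be negative")
    return max(0.0, 100.0 - 250.0 * ins_mos)


# No instability, moderate instability, and instability beyond the 0.4 limit.
for ins in (0.0, 0.2, 0.5):
    print(ins, st_mos(ins))
```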
7.13 End to end delay
Definition Represents the overall delay from one access to the other. This indicator takes into
account the transmission delay in the networks as well as the processing delay in the sending and
receiving terminals.
End to end delay is one of the components of perceived voice quality, and possibly the major
one in VoIP. A long delay has a negative impact on the QoS perceived by the customer.
Assessment method The end to end delay is the delay from mouth to ear, i.e. the transmission delay
over the whole transmission path. For the purposes of the present document, end to end delay does
not include the delay of the transducers (loudspeaker and microphone), since
measurements are made at the electrical interfaces of the end terminals.
To measure end to end delay, the two ends of the measurement device need to be
synchronized. This synchronization may be done with GPS clocks when
the two ends are distant. When the ends are co-located, synchronization may be done directly
by the analyser.
To assess the metric, the clock accuracy of the analyser (or of the two analyser parts) should be
better than 10 ppm.
The end to end delay is determined in both directions of transmission.
Several measurements of delay are performed in series during the same call. So for a given
transmission way, end to end delay performance during the call
...


ETSI Guide
Speech Processing, Transmission and Quality Aspects (STQ);
QoS and network performance metrics and measurement methods
Part 2: Transmission Quality Indicator combining
Voice Quality Metrics
ETSI EG 202 765-2 V1.1.1 (2009-02)

Reference
DEG/STQ-00104-2
Keywords
performance, QoS, voice
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
Individual copies of the present document can be downloaded from:
http://www.etsi.org
The present document may be made available in more than one electronic version or in print. In any case of existing or
perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF).
In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive
within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
http://portal.etsi.org/tb/status/status.asp
If you find errors in the present document, please send your comment to one of the following services:
http://portal.etsi.org/chaircor/ETSI_support.asp
Copyright Notification
No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2009.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™, TIPHON™, the TIPHON logo and the ETSI logo are Trade Marks of ETSI registered
for the benefit of its Members.
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
LTE™ is a Trade Mark of ETSI currently being registered
for the benefit of its Members and of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.
Contents
Intellectual Property Rights
Foreword
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Abbreviations
4 Introduction
5 Measurement type
6 Voice quality scale
7 List of indicators
7.1 Post Dialling Delay
7.2 Media establishment delay
7.3 Unsuccessful call ratio
7.4 Premature release probability
7.5 Level of active speech signal at reception
7.6 Noise level at reception
7.7 Noise to signal ratio at reception
7.8 Speech signal attenuation (or gain) after transmission
7.9 Talker echo delay
7.10 Talker echo attenuation
7.11 Listening speech quality
7.12 Listening speech quality stability
7.13 End to end delay
7.14 End to end delay variation
7.15 Frequency responses at the reception
8 Measurement frequency
9 Duration of test calls
10 Measurement configurations
10.1 VoIP services
10.2 VoIP services in triple play context
11 Measurement locations and their distribution
11.1 Measurement location requirements
11.2 Method to determine measurement locations
12 Results presentation
12.1 One-view visualization of performances
12.1.1 Pie diagram with all indicators
12.1.2 Pie diagram with mandatory indicators
12.2 Non-compliant limits for result visualization
13 Publication of the results
Annex A: Indicator stability formulation
A.1 Presentation
A.2 Formulation
A.3 Graphic illustration of the formulation
A.4 Some examples of stability indicator calculated on Listening Speech Quality
Annex B: Calibration to take into account the frequency response of transducers
B.1 Method presentation
B.1.1 Sending
B.1.2 Sending
B.1.3 Global communication
B.1.4 Applications
Annex C: Echo presentation
C.1 Talker echo
C.2 Listener echo
Annex D: Examples of measurement point distribution
D.1 Example of France
D.2 Example of Switzerland
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (http://webapp.etsi.org/IPR/home.asp).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This ETSI Guide (EG) has been produced by ETSI Technical Committee Speech and multimedia Transmission Quality
(STQ).
1 Scope
The present document aims at identifying and defining indicators and methodologies for use in the context of end-user
quality characterization and supervision of voice telephony services.
In this context, the measurements and metric determinations are performed by analysing signals accessible at the user end
of the services and not in the network. In order to mirror the reality of access to the services at the user end,
measurements and analyses are performed on electrical signals, which excludes the electro-acoustic part of the end equipment;
however, the adaptation of the probe to the electrical interface of the end-user equipment must take into account the
electro-acoustic characteristics of this terminal.
2 References
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific.
• For a specific reference, subsequent revisions do not apply.
• Non-specific reference may be made only to a complete document or a part thereof and only in the following
cases:
- if it is accepted that it will be possible to use all future changes of the referenced document for the
purposes of the referring document;
- for informative references.
Referenced documents which are not found to be publicly available in the expected location might be found at
http://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
2.1 Normative references
The following referenced documents are indispensable for the application of the present document. For dated
references, only the edition cited applies. For non-specific references, the latest edition of the referenced document
(including any amendments) applies.
Not applicable.
2.2 Informative references
The following referenced documents are not essential to the use of the present document but they assist the user with
regard to a particular subject area. For non-specific references, the latest version of the referenced document (including
any amendments) applies.
[i.1] ITU-T Recommendation P.800: "Methods for subjective determination of transmission quality".
[i.2] ITU-T Recommendation P.862: "Perceptual evaluation of speech quality (PESQ): An objective
method for end-to-end speech quality assessment of narrow-band telephone networks and speech
codecs".
[i.3] ITU-T Recommendation P.862.1: "Mapping function for transforming P.862 raw result scores to
MOS-LQO".
[i.4] ITU-T Recommendation P.862.2: "Wideband extension to Recommendation P.862 for the
assessment of wideband telephone networks and speech codecs".
[i.5] ITU-T Recommendation P.862.3: "Application guide for objective quality measurement based on
Recommendations P.862, P.862.1 and P.862.2".
[i.6] ITU-T Recommendation P.800.1: "Mean Opinion Score (MOS) terminology".
[i.7] ITU-T Recommendation E.800: "Terms and definitions related to quality of service and network
performance including dependability".
[i.8] ITU-T Recommendation E.845: "Connection accessibility objective for the international telephone
service".
[i.9] ETSI EG 201 769: "Speech Processing, Transmission and Quality Aspects (STQ); QoS parameter
definitions and measurements; Parameters for voice telephony service required under the ONP
Voice Telephony Directive 98/10/EC".
[i.10] ITU-T Recommendation P.56: "Objective measurement of active speech level".
[i.11] ITU-T Recommendation O.41: "Psophometer for use on telephone-type circuits".
[i.12] ITU-T Recommendation G.131: "Talker echo and its control".
[i.13] ITU-T Recommendation G.168: "Digital network echo cancellers".
[i.14] ITU-T Recommendation G.114: "One-way transmission time".
[i.15] ITU-T Recommendation P.505: "One-view visualization of speech quality measurement results".
[i.16] ETSI EG 201 377 (all parts): "Speech Processing, Transmission and Quality Aspects (STQ);
Specification and measurement of speech transmission quality".
[i.17] ITU-T Recommendation H.323: "Packet-based multimedia communications systems".
[i.18] ITU-T Recommendation H.225.0: "Call signalling protocols and media stream packetization for
packet-based multimedia communication systems".
[i.19] ITU-T Recommendation P.50: "Artificial voices".
[i.20] ITU-T Recommendation P.501: "Test signals for use in telephonometry".
3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
ADSL Asymmetrical Digital Subscriber Line
ATA Analog Telephone Adapter
IP Internet Protocol
ISDN Integrated Services Digital Network
ITU-T International Telecommunication Union - Telecommunication standardization sector
GPS Global Positioning System
GSM Global System for Mobile communications
HATS Head And Torso Simulator
MGCP Media Gateway Control Protocol
MOS Mean Opinion Score
MOS-LQOM Mean Opinion Score-Listening Quality Objective Mixed bandwidths
PDD Post Dialling Delay
PESQ Perceptual Evaluation of Speech Quality
PSTN Public Switched Telephone Network
RTP Real-time Transport Protocol
SIP Session Initiation Protocol
UMTS Universal Mobile Telecommunications System
VoIP Voice over Internet Protocol
4 Introduction
The assessment of transmission quality based on voice quality metrics is already addressed in several standards at ETSI
(e.g. EG 201 377 [i.16] series) and elsewhere (mostly ITU-T recommendations from the P and G series). These
documents address measurement methodologies in terms of metrics, thresholds, data acquisition or
modelling of subjective opinion.
The objective of the present document is to complement this material with practical requirements for use in the context
of service verification and benchmarking on a large and representative scale, from the point of view of the end-users or of
the regulatory authorities. This has been made necessary by current or recent evolutions of the telecommunication
sector:
• the competitive environment, in particular in voice services, where public protocols with high quality services
have been replaced by a multitude of service providers with fewer guarantees, and where clients can very easily
change their service providers;
• the development of time varying quality in telecommunications, first in mobile offers (due to mobility and
irregular network coverage), but now also for fixed services (mostly VoIP);
• the cohabitation, interaction and competition between services based on different technologies.
Voice transmission quality is now recognized as a differentiating factor, but it remains very difficult to quantify.
To achieve the goal mentioned above, several possibilities exist, none of them fully satisfactory:
• Customer surveys. This is by far the cheapest way to assess the perception of end users. But the bias
introduced by other factors such as price, and the fact that voice quality itself is rarely asked about specifically
or in a satisfactory way (one never knows before a survey which problems end users have encountered),
make this source not really reliable.
• Pseudo-subjective tests, with a few human testers assessing the quality of real links in several situations. This
method has the major drawback of lacking reproducibility, and is often applied without using the standard
metrics and quality scales that can be found in standards like ITU-T Recommendation P.800 [i.1]. It is also
time-consuming and not cheap in the current competitive context where so many offers have to be
assessed. And it is not easily applicable in a context of quality changing over time.
• Objective tests. This is the most reliable way, although it is also based on sampling and can be costly
in the case of a large deployment of probes or robots.
The present document assumes that this last family of methods meets the need for a reliable comparison of
telephony offers and is applied without combination with other methods.
What definitely matters is the point of view of the end-users. What they perceive is not only the result of the
transmission of a signal across a network; the processing of this signal at the sending and receiving sides is also
very important. Therefore, end-to-end voice quality should be assessed not with passive network monitoring systems,
but rather with active systems simulating the behaviour of the end users, including the terminal. A big advantage of
such an approach is that it is largely technology and protocol agnostic, and therefore consistent with the expectations of
users, who do not judge the voice quality of PSTN, GSM or VoIP services by different criteria.
The last important aspect addressed in the present document is the practical organization of measurement campaigns
in order to get a realistic and reliable view of the services as perceived by the end-users. This concerns in particular
the periodicity of measurements and the geographical coverage (i.e. more generally the sampling approach).
In order to mirror the reality in terms of access to the services, a reliable measurement or supervision system should
provide the possibility to collect information from probes or robots adapted to the most common interfaces available.
This includes:
• analogue access (for the simulation of PSTN or of analogue phones behind an ATA box or an ADSL modem);
• ISDN access;
• handset (for any wireline terminal);
• electrical input and output (for PC soundcards or for any wireless terminal);
• GSM;
• UMTS;
• ethernet with IP phone termination (SIP, ITU-T Recommendation H.323 [i.17], MGCP, etc.).
Any combination of end-to-end connection between the types of access mentioned here has to be considered when a
measurement campaign is scheduled. Nevertheless, there are practical limitations:
• the number of measurements for a given type of access should be in proportion to its level of use in real
life;
• the number of probes and of measurement results available will be adapted to the real needs as well as to the
capacity (mostly in terms of cost and of processing capability) of the entity running these measurements.
Figure 4.1 shows these different configurations and interfaces.

[Figure 4.1 not reproduced. The diagram shows the access types listed above (analogue, ISDN, handset, electrical, GSM and UMTS, and Ethernet accesses) connected through the PSTN, the radio network and the IP network.]
Figure 4.1: Possible configurations and interfaces in context of user characterization
5 Measurement type
To perform service quality assessments, there are two different methods: intrusive and non-intrusive measurements.
Non-intrusive measurements are not well suited to end-user surveys because they require probes to be installed at the
users' terminals.
Intrusive measurements are better suited to end-user surveys because connecting probes to end-user terminals is
easier. Compared to non-intrusive measurements, intrusive methods have one advantage: for voice
quality assessment they can use reference-based models such as ITU-T Recommendation P.862 [i.2]
(see also ITU-T Recommendations P.862.1 [i.3], P.862.2 [i.4] and P.862.3 [i.5] concerning mapping functions and
the application guide), which give results close to the subjective perception of speech quality.
In this context, intrusive measurements using reference-based models for speech quality assessment will be
performed for end-user surveys.
6 Voice quality scale
It is important to consider that telephony has now entered an era where traditional narrowband services will
coexist with new services offering wideband audio capabilities. For end-users, these are not separate kinds of services.
Therefore, the assessment of transmission quality of voice should now be based on common metrics and objective
quality levels and scales, replacing the existing narrow-band only ones. In this context, it is appropriate to use
the MOS-LQOM scale to characterize the voice quality of both narrow-band and wideband services.
See ITU-T Recommendation P.800.1 [i.6] for more information on MOS terminology.
7 List of indicators
The indicators proposed for the context of end-user quality survey of voice services are:
7.1 Post Dialling Delay
Definition Post Dialling Delay (PDD) evaluates the ability of the service to set up calls within an acceptable
delay. It is linked to the complexity of the service architecture and to the performance of its
constituent network elements.
Post Dialling Delay is the time interval between the end of dialling by the caller and the
reception by the caller of the appropriate ringing tone or recorded announcement.
The metric is determined at one of the two accesses of the communication.
Assessment method The indicator is determined sequentially from each of the two accesses of the call configuration. This indicator
characterizes only the caller side of the configuration.
Unit Millisecond with an integer value.
Standardization reference
Significant Mandatory.
Comment This indicator has to be broken down by call type (IP to IP, IP to PSTN, IP to mobile,
etc.) for a detailed analysis.
The objective for the universal telephony service has been set to 2 900 ms in the
French regulator's recommendation.
7.2 Media establishment delay
Definition Time, determined at one of the two accesses of the communication, between the called party going off-hook
and the beginning of reception of the voice signal.
Assessment method The indicator is determined sequentially from each of the two accesses of the call configuration.
On an IP access this indicator may be assessed by using a non-intrusive probe, such as a
protocol analyser. Media establishment delay may be evaluated through the analysis of
media flows and signalling. For the ITU-T Recommendation H.323 [i.17] protocol, the flow
establishment delay corresponds to the time elapsed between the emission of the
ITU-T Recommendation H.225.0 [i.18] "CONNECT" message and the arrival of the first
IP packet containing speech signal.
Unit Millisecond with an integer value.
Standardization reference
Significant Optional.
Comment This indicator has to be reported separately for the caller and called sites for a detailed analysis.
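For the H.323 case described in this clause, the computation reduces to a difference between two timestamps taken from a capture. The sketch below assumes that a protocol analyser has already provided the time of the H.225.0 "CONNECT" message and the arrival time of the first IP packet carrying speech, both in seconds on the same clock. The function and parameter names are hypothetical.

```python
def media_establishment_delay_ms(connect_sent_s: float, first_speech_packet_s: float) -> int:
    """Delay between the CONNECT message and the first speech packet, in whole ms.

    Both timestamps are assumed to be in seconds and to share one time base.
    """
    if first_speech_packet_s < connect_sent_s:
        raise ValueError("the first speech packet cannot precede the CONNECT message")
    return round((first_speech_packet_s - connect_sent_s) * 1000.0)


# Hypothetical capture: CONNECT at t = 12.000 s, first speech packet at t = 12.184 s.
print(media_establishment_delay_ms(12.000, 12.184))
```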
7.3 Unsuccessful call ratio
Definition Ratio of unsuccessful calls to the total number of call attempts in a specified time period.
An unsuccessful call is a call attempt to a valid number, properly dialled following dial
tone, where neither called party busy tone, nor ringing tone, nor answer signal, is
recognized on the access line of the calling user within 30 seconds from the instant when
the address information required for setting up a call is received by the network.
Assessment method The indicator is determined sequentially from each of the two accesses of the call configuration.
Unit % with a resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation E.800 [i.7], ITU-T Recommendation E.845 [i.8],
EG 201 769 [i.9].
Significant Mandatory.
Comment The limit of 30 seconds is the default setting of a timer in the SS7 protocol.
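A minimal sketch of the ratio defined in this clause is given below. It assumes that, for each call attempt to a valid number, the test system records the time at which a busy tone, ringing tone or answer signal was recognized, or nothing if none was recognized. The record layout and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TIMEOUT_S = 30.0  # limit given in the definition of an unsuccessful call


@dataclass
class CallAttempt:
    # Seconds between reception of the address information by the network and
    # recognition of busy tone, ringing tone or answer signal; None if none was recognized.
    response_delay_s: Optional[float]


def is_unsuccessful(attempt: CallAttempt) -> bool:
    return attempt.response_delay_s is None or attempt.response_delay_s > TIMEOUT_S


def unsuccessful_call_ratio(attempts) -> float:
    """Percentage of unsuccessful calls, with one digit after the decimal point."""
    if not attempts:
        raise ValueError("at least one call attempt is required")
    failed = sum(1 for attempt in attempts if is_unsuccessful(attempt))
    return round(100.0 * failed / len(attempts), 1)


# Hypothetical campaign: one attempt without any recognized tone, two successful attempts.
campaign = [CallAttempt(2.4), CallAttempt(None), CallAttempt(29.0)]
print(unsuccessful_call_ratio(campaign))
```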
7.4 Premature release probability
Definition This indicator characterizes the ability to release a service. It is based on the measurement
of the number of prematurely released communications in comparison with the number of established
communications.
Prematurely released communications are defined as communications released before any voluntary action
from one of the ends of the transmission.
Assessment method
Unit % with a resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation E.800 [i.7].
Significant Optional.
7.5 Level of active speech signal at reception
Definition Level of the speech signal received after transmission.
The level of the signal heard by the user has an impact on the perceived quality. A
signal that is too low will be barely audible and masked by noise, while a signal that is too high will
be painful.
Therefore, a measurement of the speech signal level is necessary to ensure good listening
comfort.
Assessment method The received decoded signal used, for instance, for ITU-T Recommendation P.862 [i.2] can
also be used to assess this parameter.
A typical method for the measurement of this parameter, based on a sample by sample
approach and a moving threshold between noise and speech, is given in
ITU-T Recommendation P.56 [i.10].
Unit dBm with a resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation P.56 [i.10].
Significant Optional.
Comment Each sample of the signal has a level, generally expressed in mV. The mean speech level is the
transformation onto an appropriate logarithmic scale of the mean signal voltage.
The samples taken into account for this measurement are the ones identified as speech (the
others are taken into account for noise measurements).
It is recommended to stay within classical speech level values, i.e. between -25 dBm and
-10 dBm.
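The conversion from sample voltages to a level in dBm mentioned in the comment can be illustrated as follows. This is only a simplified sketch and not the ITU-T Recommendation P.56 method: the speech/non-speech classification is taken as a given input, the level is computed from the mean square voltage of the speech samples, and a 600 ohm reference impedance is assumed because the clause does not state one.

```python
import math

REFERENCE_IMPEDANCE_OHM = 600.0  # assumption: not specified in the clause


def mean_level_dbm(samples_mv, is_speech) -> float:
    """Level in dBm (one decimal) of the samples flagged as speech.

    samples_mv: sample voltages in millivolts.
    is_speech:  one boolean per sample, produced by an external speech detector.
    """
    active = [v for v, flag in zip(samples_mv, is_speech) if flag]
    if not active:
        raise ValueError("no sample was identified as speech")
    mean_square_v2 = sum((v / 1000.0) ** 2 for v in active) / len(active)
    if mean_square_v2 == 0.0:
        raise ValueError("speech samples carry no energy")
    power_mw = mean_square_v2 / REFERENCE_IMPEDANCE_OHM * 1000.0
    return round(10.0 * math.log10(power_mw), 1)


def within_recommended_range(level_dbm: float) -> bool:
    """Check the recommended range of -25 dBm to -10 dBm."""
    return -25.0 <= level_dbm <= -10.0


# Hypothetical signal: alternating +/-77.46 mV speech samples, then silence.
samples = [77.46, -77.46] * 4 + [0.0] * 4
flags = [True] * 8 + [False] * 4
level = mean_level_dbm(samples, flags)
print(level, within_recommended_range(level))
```

The same computation applied to the samples flagged as non-speech would give an unweighted noise level; the psophometric weighting of ITU-T Recommendation O.41 is not covered by this sketch.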
7.6 Noise level at reception
Definition Level of noise determined at reception in the non-speech segments of the speech sample.
The noise present alongside the speech signal can have characteristics that make it
disturbing, for instance a varying spectrum (crowd noise, for instance). But
the most important source of annoyance due to noise is simply its level.
Assessment method The received decoded signal used, for instance, for ITU-T Recommendation P.862 [i.2] can
also be used to assess this parameter.
The measurement of this parameter is normally performed as for the speech signal level
(see clause 7.5), but on the samples identified as non-speech.
Unit dBm0p with a resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation O.41 [i.11].
Significant Optional.
Comment Each sample of the signal has a level, generally expressed in mV. The mean noise level is the
transformation onto an appropriate logarithmic scale of the mean signal voltage of the noise
samples.
To get a more accurate noise level measurement, a frequency transform needs to be done in
order to apply a psophometric weighting (see ITU-T Recommendation O.41 [i.11]).
It is recommended not to have noise louder than -50 dBm0p.
7.7 Noise to signal ratio at reception
Definition Difference between the active speech level and the level of noise at reception.
The noise present alongside the speech signal can have characteristics that make it
disturbing, for instance a varying spectrum (crowd noise, for instance). But
the most important source of annoyance due to noise is simply its level, and particularly its
level relative to speech.
Assessment method The combination of speech signal level (see clause 7.5) and noise level (see clause 7.6) can
replace the noise to signal ratio indicator.
The received decoded signal used, for instance, for ITU-T Recommendation P.862 [i.2] can
also be used to assess this parameter.
SNR = speech signal level/noise level, i.e. the difference between the two levels when they are expressed in dB.
The indicator is determined in both directions of transmission.
Unit dB with a resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation P.56 [i.10].
Significant Optional.
Comment It is recommended not to have an SNR lower than 30 dB.
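When the speech level and the noise level are already available on a logarithmic scale, the SNR is their difference. The sketch below assumes that both levels refer to the same measurement point and ignores the effect of psophometric weighting on the noise level. The names are illustrative.

```python
MIN_RECOMMENDED_SNR_DB = 30.0  # limit recommended in the comment of this clause


def snr_db(speech_level_db: float, noise_level_db: float) -> float:
    """SNR in dB (one decimal) from two levels expressed on the same logarithmic scale."""
    return round(speech_level_db - noise_level_db, 1)


def meets_snr_recommendation(snr: float) -> bool:
    return snr >= MIN_RECOMMENDED_SNR_DB


# Hypothetical levels for the two directions of one call.
for direction, speech, noise in (("A to B", -18.0, -55.0), ("B to A", -20.0, -45.0)):
    value = snr_db(speech, noise)
    print(direction, value, meets_snr_recommendation(value))
```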
7.8 Speech signal attenuation (or gain) after transmission
Definition A variant of the speech signal level metric (if the sending speech level is known, the two are
even redundant).
Speech signal attenuation after transmission is the difference between the active speech level
at the receiving access and at the sending access.
Assessment method The received decoded signal used, for instance, for ITU-T Recommendation P.862 [i.2] can
also be used to assess this parameter. Once the speech signal level has been computed
(see clause 7.5), it is compared with the level of the sent signal. The attenuation is the
difference between these two levels.
There are other methods to compute this parameter, based for instance on intrusive
measurements made with sine waves and a specific weighting function.
The indicator is determined in both directions of transmission.
Unit dB with a resolution of 1 digit after the decimal point.
Standardization reference
Significant Optional.
Comment It is recommended to comply with PSTN attenuation rules, i.e. an attenuation between
6 dB and 10 dB.
7.9 Talker echo delay
Definition In telecommunications, the term echo describes delayed and unwanted feedback of the
send signal into the receive path. The so-called echo source is the reflection point between
send and receive directions, which could be one of the following causes:
• 4-wire/2-wire Hybrid Circuits (multiple reflections possible);
• coupling in handset cords;
• structure borne coupling in handsets;
• acoustical coupling between earpiece and microphone.
This phenomenon is characterized by two parameters: its attenuation and its delay.
See annex C for a more detailed discussion of talker echo and the related listener echo.
With the increased delays present in today's IP networks, echo has the potential to be much
more perceivable and annoying than in classical PSTN.
In order to achieve a similar user perception with higher delays the attenuation of the talker
echo should be increased, i.e. active echo cancellation is necessary.
In practice it can be observed that in some cases, either echo cancellation does not occur, or it
is not fully effective.
Echo is characterized by two parameters: its attenuation and its delay. The lower the attenuation
and/or the longer the delay, the more annoying the echo becomes.
Echo delay is the time it takes for the speech signal to go from the mouth of a subscriber
back to the ear of the same subscriber, with one or more reflections occurring along the
transmission path.
Assessment method The indicator is determined sequentially from the two accesses of the call configuration.
Unit Milliseconds with an integer value.
Standardization reference ITU-T Recommendation G.131 [i.12].
Significant Optional.
Comment For fully digital networks the talker echo delay can be assumed to be equivalent to twice
the mean one-way delay.
7.10 Talker echo attenuation
Definition In telecommunications, the term echo describes delayed and unwanted feedback of the sent
signal into the receive path. The so-called echo source is the reflection point between send
and receive directions, which could be one of the following causes:
• 4-wire/2-wire Hybrid Circuits (multiple reflections possible);
• coupling in handset cords;
• structure borne coupling in handsets;
• acoustical coupling between earpiece and microphone.
This phenomenon is characterized by two parameters: its attenuation and its delay.
See annex C for a more detailed discussion of talker echo and the related listener echo.
With the increased delays present in today's IP networks, echo has the potential to be much
more perceivable and annoying than in classical PSTN.
In order to achieve a similar user perception with higher delays the attenuation of the talker
echo should be increased, i.e. active echo cancellation is necessary.
In practice it can be observed that in some cases, either echo cancellation does not occur, or it
is not fully effective.
Echo is characterized by two parameters: its attenuation and its delay. The lower the attenuation
and/or the longer the delay, the more annoying the echo becomes.
Echo attenuation is the difference of level between the sending level and the (delayed)
receiving level both measured at the same subscriber while he is talking.
Assessment method The indicator is determined sequentially from the two accesses of the call configuration.
Unit dB with a resolution of one decimal.
Standardization reference ITU-T Recommendation G.131 [i.12], ITU-T Recommendation G.168 [i.13].
Significant Optional.
Comment If active echo cancellation is used, ITU-T Recommendation G.168 [i.13] should be
applied; VoIP should provide echo attenuation of 55 dB. If the delay under all
circumstances is known to not exceed 50 ms lower values for the talker echo attenuation
(down to 35 dB) may be acceptable (see annex C).
Echo annoyance depends on two metrics: the attenuation and the delay.
For a similar attenuation level, the greater the delay, the greater the annoyance will be
for the user(s).
The Echo-Annoyance factor is defined as K.
K = EA - 40 × Log [(1 + delay/10) / (1 + delay/150)] + 6 × exp (-0,3 × delay²).
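The formula for K can be evaluated numerically as in the sketch below. Several points are assumptions, since the clause does not state them explicitly: EA is taken to be the talker echo attenuation in dB, the delay is taken to be the talker echo delay in milliseconds, and "Log" is taken to be the base-10 logarithm. The function name is hypothetical.

```python
import math


def echo_annoyance_factor(echo_attenuation_db: float, echo_delay_ms: float) -> float:
    """Evaluate K = EA - 40 x Log[(1 + d/10) / (1 + d/150)] + 6 x exp(-0,3 x d^2).

    Assumptions: EA in dB, delay d in milliseconds, Log in base 10.
    """
    delay_term = 40.0 * math.log10(
        (1.0 + echo_delay_ms / 10.0) / (1.0 + echo_delay_ms / 150.0)
    )
    short_delay_term = 6.0 * math.exp(-0.3 * echo_delay_ms ** 2)
    return echo_attenuation_db - delay_term + short_delay_term


# Same attenuation, two hypothetical delays: K decreases as the delay grows.
for delay in (20.0, 100.0):
    print(delay, round(echo_annoyance_factor(55.0, delay), 1))
```

With a delay of 0 the logarithmic term vanishes and the exponential term equals 6, so K = EA + 6. For a fixed attenuation, K decreases as the delay increases, which is consistent with the statement that a greater delay gives a greater annoyance.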
7.11 Listening speech quality
Definition Represents the intrinsic quality of speech signal after transmission. This indicator takes
into account the degradations generated on the signal by the transmission links.
Assessment method Voice quality is evaluated by using the ITU-T Recommendation P.862 [i.2] standard with
the mapping functions according to ITU-T Recommendation P.862.1 [i.3] and
ITU-T Recommendation P.862.2 [i.4] standards.
MOS or Mean Opinion Score (calculated using the Perceptual Evaluation of Speech
Quality, or PESQ method) provides an objective view on the quality of the voice signal as
it may be perceived by the customer.
The MOS score is obtained by comparing speech samples:
• the original signal sent by the far end of the connection;
• the degraded signal received at the local end, where the measurement is applied.
The voice quality indicator is determined in the two directions of transmission.
Several MOS scores are determined in series during the same call. So for a given
transmission direction, listening speech quality performance during the call is defined by the
mean value of the MOS-LQOM measurements (in the same direction).
Unit Score between 1 (= very bad) and 5 (= excellent), determined on the MOS-LQOM scale with a
resolution of two digits after the decimal point.
Standardization reference ITU-T Recommendation P.800 [i.1], ITU-T Recommendation P.800.1 [i.6],
ITU-T Recommendation P.862 [i.2], ITU-T Recommendation P.862.2 [i.4],
ITU-T Recommendation P.862.3 [i.5].
Significant Mandatory.
Comment The value of this indicator depends on the used codec, but also on impairments like IP
packet loss or low signal to noise ratio.
To make result comparisons easier, it is recommended to use the same speech samples in
all test configurations, or at least to select different speech samples based on a thorough
selection and validation process.
ITU-T Recommendation P.862 [i.2] measures listening quality and does not take into
account impairments affecting conversational quality, like the end-to-end delay or echo.
It is also a one-way indicator; therefore it has to be measured separately in both
transmission directions, with no averaging between them afterwards.
This indicator may be separated between call types (IP to IP, IP to PSTN, IP to Mobile,
etc.) for a detailed analysis.
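The per-call aggregation described in this clause can be sketched as follows: the MOS-LQOM values measured in series during one call are averaged separately for each direction, and the two directions are never averaged together. The MOS-LQOM values themselves are assumed to come from an ITU-T Recommendation P.862 [i.2] measurement tool; the function name, the direction labels and the example scores are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def call_listening_quality(scores: Sequence[float]) -> float:
    """Mean of the MOS-LQOM values measured in one direction during one call."""
    if not scores:
        raise ValueError("at least one MOS-LQOM measurement is required")
    # Resolution of two digits after the decimal point.
    return round(mean(scores), 2)


# Hypothetical measurements made in series during the same call.
measurements: Dict[str, Sequence[float]] = {
    "A_to_B": [3.9, 4.1, 4.0],
    "B_to_A": [3.2, 3.4, 3.3, 3.1],
}

# One value per direction; the directions are kept separate.
per_direction = {d: call_listening_quality(s) for d, s in measurements.items()}
print(per_direction)
```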
7.12 Listening speech quality stability
Definition It is well known that for IP networks, delay within the network can vary significantly, due
to congestion or indeed due to route changes during a session. Delay variations in the
network can be compensated to some extent through good jitter buffer design in the
receiver. Furthermore, packet loss in the network can occur due either to severe congestion
(buffer overflows) or routing problems, or in a receiver terminal due to jitter buffer
overflow. These factors will impact on the quality of the service.
Concerning voice over IP, a single measurement of speech quality once at the very
beginning of a call is not enough. Speech quality should be analysed throughout the duration
of the call, typically several minutes.
This metric represents the stability of the voice quality during a communication several
minutes long. This indicator takes into account the degradations generated on the signal by
the transmission links.
Assessment method Several measurements of the MOS-LQOM score, made with
ITU-T Recommendation P.862 [i.2] and the adapted mapping function (see clause 7.11),
are performed in series within the same call. Typically, one measurement every 20 s is
enough. The results are reported in terms of statistics.
The assessment of Listening Speech Quality Stability is performed in 5 steps. The generic
formulation is presented in annex A.
For the stability indicator of Listening Speech Quality, THRESHOLD1 = 0,1 and the linear
weighting function applies in order to express Stability (ST-MOS) on a 0 to 100 scale. By
definition, Stability equals 100 when no instability occurs and Stability ST-MOS equals 0
when instability is equal to or greater than 0,4.
ST-MOS is calculated as:
• ST-MOS = 100 - ( 250 × INS_MOS), and
• ST-MOS = 0 if [100-(250 × INS_MOS)] < 0.
This indicator is determined in the two directions of transmission.
Unit Statistics on MOS score variation are plotted on a 0 to 100 scale.
Standardization reference
Significant Mandatory.
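The linear weighting given for ST-MOS in this clause can be written as a short function. This sketch covers only the final mapping from instability to the 0 to 100 scale; INS_MOS is assumed to have been computed beforehand according to the formulation in annex A, which is not reproduced here. The function name is hypothetical.

```python
def st_mos(ins_mos: float) -> float:
    """Map the instability INS_MOS to the stability ST-MOS on a 0 to 100 scale.

    ST-MOS = 100 - (250 x INS_MOS), floored at 0.
    """
    return max(0.0, 100.0 - 250.0 * ins_mos)


# Hypothetical instability values, from no instability to beyond the 0,4 limit.
for instability in (0.0, 0.2, 0.4, 0.6):
    print(instability, st_mos(instability))
```

The factor 250 follows from the two end points stated above: a stability of 100 for no instability and a stability of 0 for an instability of 0,4 (100 / 0,4 = 250).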
7.13 End to end delay
Definition Represents the overall delay from one access to the other. This indicator takes into
account the transmission delay on networks but also the processing delay in the sending and
receiving terminals.
End to end delay is one of the components of perceived voice quality, possibly the major
one in VoIP. A long delay has a negative impact on the QoS perceived by the customer.
Assessment method The end to end delay is the delay from mouth to ear, which means the transmission delay
over the whole transmission path. For the purpose of this document, end to end delay does
not take into account the delay of the transducers (loudspeaker and microphone), since
measurements are made at the electrical interfaces of the end terminals.
To measure end to end delay, both ends of the measurement device need to be
synchronized. This synchronization may be done by GPS clocks when
the two ends are distant. When the ends are co-located, synchronization may be done directly
by the analyser.
To assess the metric, the clock accuracy of the analyser (or of the two analyser parts) should be
better than 10 ppm.
The end to end delay is determined in the two directions of transmission.
Several measurements of delay are performed in series during the same call. So for a given
transmission direction, end to end delay performance during the call is defined by the mean
value of the delay measurements (in the same direction).
Unit Millisecond with an integer value.
Standardization reference ITU-T Recommendation G.114 (session 1) [i.14].
Significant Mandatory.
Comment The metric measurement needs time synchronization between the two parts of the analyser
connected to both accesses of the communication path.
The standards (ITU-T Recommendation G.114 [i.14] in particular) recommend not going
beyond 150 ms in one-w
...


SLOVENIAN STANDARD
01-May-2009
Speech Processing, Transmission and Quality Aspects (STQ) - QoS and network
performance metrics and measurement methods - Part 2: Transmission Quality Indicator
combining Voice Quality Metrics
This Slovenian standard is identical to: EG 202 765-2 Version 1.1.1
ICS:
33.040.35 Telephone networks
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.

ETSI Guide
Speech Processing, Transmission and Quality Aspects (STQ);
QoS and network performance metrics and measurement methods
Part 2: Transmission Quality Indicator combining
Voice Quality Metrics
ETSI EG 202 765-2 V1.1.1 (2009-02)

Reference
DEG/STQ-00104-2
Keywords
performance, QoS, voice
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
Individual copies of the present document can be downloaded from:
http://www.etsi.org
The present document may be made available in more than one electronic version or in print. In any case of existing or
perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF).
In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive
within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
http://portal.etsi.org/tb/status/status.asp
If you find errors in the present document, please send your comment to one of the following services:
http://portal.etsi.org/chaircor/ETSI_support.asp
Copyright Notification
No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2009.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™, TIPHON™, the TIPHON logo and the ETSI logo are Trade Marks of ETSI registered
for the benefit of its Members.
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
LTE™ is a Trade Mark of ETSI currently being registered
for the benefit of its Members and of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.
Contents
Intellectual Property Rights
Foreword
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Abbreviations
4 Introduction
5 Measurement type
6 Voice quality scale
7 List of indicators
7.1 Post Dialling Delay
7.2 Media establishment delay
7.3 Unsuccessful call ratio
7.4 Premature release probability
7.5 Level of active speech signal at reception
7.6 Noise level at reception
7.7 Noise to signal ratio at reception
7.8 Speech signal attenuation (or gain) after transmission
7.9 Talker echo delay
7.10 Talker echo attenuation
7.11 Listening speech quality
7.12 Listening speech quality stability
7.13 End to end delay
7.14 End to end delay variation
7.15 Frequency responses at the reception
8 Measurement frequency
9 Duration of test calls
10 Measurement configurations
10.1 VoIP services
10.2 VoIP services in triple play context
11 Measurement locations and their distribution
11.1 Measurement location requirements
11.2 Method to determine measurement locations
12 Results presentation
12.1 One-view visualization of performances
12.1.1 Pie diagram with all indicators
12.1.2 Pie diagram with mandatory indicators
12.2 Non-compliant limits for result visualization
13 Publication of the results
Annex A: Indicator stability formulation
A.1 Presentation
A.2 Formulation
A.3 Graphic illustration of the formulation
A.4 Some examples of stability indicator calculated on Listening Speech Quality
Annex B: Calibration to take into account the frequency response of transducers
B.1 Method presentation
B.1.1 Sending
B.1.2 Sending
B.1.3 Global communication
B.1.4 Applications
Annex C: Echo presentation
C.1 Talker echo
C.2 Listener echo
Annex D: Examples of measurement point distribution
D.1 Example of France
D.2 Example of Switzerland
History

Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (http://webapp.etsi.org/IPR/home.asp).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This ETSI Guide (EG) has been produced by ETSI Technical Committee Speech and multimedia Transmission Quality
(STQ).
1 Scope
The present document aims at identifying and defining indicators and methodologies for use in the context of end-user
quality characterization and supervision of voice telephony services.
In this context the measurements and metric determinations are performed by analysing signals accessible at the user end
of the services and not on the network. In order to mirror the reality in terms of access to the services at the user end,
measurements and analysis are performed on electrical signals that exclude the electro-acoustic part of the end equipment,
but the adaptation of the probe to the electrical interface of the end-user equipment must take into account the
electro-acoustic characteristics of this terminal.
2 References
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific.
• For a specific reference, subsequent revisions do not apply.
• Non-specific reference may be made only to a complete document or a part thereof and only in the following
cases:
- if it is accepted that it will be possible to use all future changes of the referenced document for the
purposes of the referring document;
- for informative references.
Referenced documents which are not found to be publicly available in the expected location might be found at
http://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
2.1 Normative references
The following referenced documents are indispensable for the application of the present document. For dated
references, only the edition cited applies. For non-specific references, the latest edition of the referenced document
(including any amendments) applies.
Not applicable.
2.2 Informative references
The following referenced documents are not essential to the use of the present document but they assist the user with
regard to a particular subject area. For non-specific references, the latest version of the referenced document (including
any amendments) applies.
[i.1] ITU-T Recommendation P.800: "Methods for subjective determination of transmission quality".
[i.2] ITU-T Recommendation P.862: "Perceptual evaluation of speech quality (PESQ): An objective
method for end-to-end speech quality assessment of narrow-band telephone networks and speech
codecs".
[i.3] ITU-T Recommendation P.862.1: "Mapping function for transforming P.862 raw result scores to
MOS-LQO".
[i.4] ITU-T Recommendation P.862.2: "Wideband extension to Recommendation P.862 for the
assessment of wideband telephone networks and speech codecs".
[i.5] ITU-T Recommendation P.862.3: "Application guide for objective quality measurement based on
Recommendations P.862, P.862.1 and P.862.2".
[i.6] ITU-T Recommendation P.800.1: "Mean Opinion Score (MOS) terminology".
[i.7] ITU-T Recommendation E.800: "Terms and definitions related to quality of service and network
performance including dependability".
[i.8] ITU-T Recommendation E.845: "Connection accessibility objective for the international telephone
service".
[i.9] ETSI EG 201 769: "Speech Processing, Transmission and Quality Aspects (STQ); QoS parameter
definitions and measurements; Parameters for voice telephony service required under the ONP
Voice Telephony Directive 98/10/EC".
[i.10] ITU-T Recommendation P.56: "Objective measurement of active speech level".
[i.11] ITU-T Recommendation O.41: "Psophometer for use on telephone-type circuits".
[i.12] ITU-T Recommendation G.131: "Talker echo and its control".
[i.13] ITU-T Recommendation G.168: "Digital network echo cancellers".
[i.14] ITU-T Recommendation G.114: "One-way transmission time".
[i.15] ITU-T Recommendation P.505: "One-view visualization of speech quality measurement results".
[i.16] ETSI EG 201 377 (all parts): "Speech Processing, Transmission and Quality Aspects (STQ);
Specification and measurement of speech transmission quality".
[i.17] ITU-T Recommendation H.323: "Packet-based multimedia communications systems".
[i.18] ITU-T Recommendation H.225.0: "Call signalling protocols and media stream packetization for
packet-based multimedia communication systems".
[i.19] ITU-T Recommendation P.50: "Artificial voices".
[i.20] ITU-T Recommendation P.501: "Test signals for use in telephonometry".
3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
ADSL Asymmetrical Digital Subscriber Line
ATA Analog Telephone Adapter
IP Internet Protocol
ISDN Integrated Services Digital Network
ITU-T International Telecommunication Union - Telecommunication standardization sector
GPS Global Positioning System
GSM Global System for Mobile communications
HATS Head And Torso Simulator
MGCP Media Gateway Control Protocol
MOS Mean Opinion Score
MOS-LQOM Mean Opinion Score - Listening Quality Objective Mixed bandwidths
PDD Post Dialling Delay
PESQ Perceptual Evaluation of Speech Quality
PSTN Public Switched Telephone Network
RTP Real-time Transport Protocol
SIP Session Initiation Protocol
UMTS Universal Mobile Telecommunications System
VoIP Voice over Internet Protocol
4 Introduction
The assessment of transmission quality based on voice quality metrics is already addressed in several standards at ETSI
(e.g. EG 201 377 [i.16] series) and elsewhere (mostly ITU-T recommendations from the P and G series). These
different documents are addressing the measurement methodologies in terms of metrics, threshold, data acquisition or
modelling of subjective opinion.
The objective of the present document is to complement this material with practical requirements of use in the context
of service verification and benchmark on a large and representative scale from the point of view of the end-users or of
the regulatory authorities. This has been made necessary by the current or recent evolutions of the telecommunication
sector:
• the competitive environment, in particular in voice services, where public protocols with high quality services
have been replaced by a multitude of service providers with fewer guarantees, and where clients can very easily
change their service providers;
• the development of time varying quality in telecommunications, first in mobile offers (due to mobility and
irregular network coverage), but now also for fixed services (mostly VoIP);
• the cohabitation, interaction and competition between services based on different technologies.
Voice transmission quality is now recognized as a differentiating factor, but it remains very difficult to quantify.
To achieve the goal mentioned beforehand, there are several existing possibilities, none of them fully satisfactory:
• Customer surveys. This is by far the cheapest way to assess the perception of end users. But the bias
introduced by other factors like price, as well as the fact that voice quality itself is rarely asked about
specifically or in a satisfactory way (one never knows before a survey what problems end users have
encountered), makes this source not really reliable.
• Pseudo-subjective tests, with a few human testers assessing the quality of real links in several situations. This
method has the major drawback of its lack of reproducibility, and is often applied without using the standard
metrics and quality scales that can be found in standards like ITU-T Recommendation P.800 [i.1]. It also
takes a long time to run and is not really cheap in the current competitive context where so many offers have to be
assessed. And it is not easily applicable in a context of quality changing over time.
• Objective tests. This is the most reliable way, although it is also based on sampling and can cost a lot of money
in the case of a large deployment of probes or robots.
The present document assumes that this last family of methodology answers the needs of a reliable comparison of
telephony offers and is applied without combination with other methods.
What definitely matters is the point of view of the end-users. What they perceive is not only the result of the
transmission of a signal across a network; the processing of this signal at the sending and at the receiving sides is also
of great importance. Therefore, it seems obvious not to use passive network monitoring systems to assess end-to-end voice
quality, but rather active systems simulating the behaviour of the end users, including the terminal. A big advantage of
such an approach is that it is highly technology and protocol agnostic, and therefore consistent with the expectations of
users, who do not judge the voice quality of PSTN, GSM or VoIP services according to different criteria.
The last important aspect addressed in the present document is the practical organization of measurement campaigns
in order to get a realistic and reliable view of the services as perceived by the end-users, in particular the questions of
the periodicity of measurement and of the geographical coverage (i.e. more generally the sampling approach).
In order to mirror the reality in terms of access to the services, a reliable measurement or supervision system should
provide the possibility to collect information from probes or robots adapted to the most common interfaces available.
This includes:
• analogue access (for the simulation of PSTN or of analogue phones behind an ATA box or an ADSL modem);
• ISDN access;
• handset (for any wireline terminal);
• electrical input and output (for PC soundcards or for any wireless terminal);
• GSM;
• UMTS;
• ethernet with IP phone termination (SIP, ITU-T Recommendation H.323 [i.17], MGCP, etc.).
Any combination of end-to-end connection between the types of access mentioned here has to be considered when a
measurement campaign is scheduled. Nevertheless, of course, there are practical limitations:
• the number of measurements for a given type of access should be in proportion with its level of use in the real
life;
• the number of probes and of measurement results available will be adapted to the real needs as well as to the
capacity (mostly in terms of cost and of processing capability) of the entity running these measurements.
Figure 4.1 shows these different configurations and interfaces.

[Figure: diagram of accesses interconnected through the Radio Network, the IP Network and the PSTN. Labels: GSM and UMTS terminals; electrical access on mobile terminal; Ethernet access on IP Network; analogue access (ADSL modem); handset access on analogue terminal; handset access on terminal connected to IP Network; handset access on ISDN terminal; analogue access; ISDN access; electrical access on wireless terminal connected to IP Network; electrical access on PC; electrical access on wireless terminal connected to PSTN.]
Figure 4.1: Possible configurations and interfaces in context of user characterization
5 Measurement type
To perform service quality assessments, there are two different methods: intrusive and non-intrusive measurements.
Non-intrusive measurements are not really suited to end-user surveys because they require probes to be installed at the
user's terminals.
Intrusive measurements are better suited to end-user surveys because connecting probes to end-user terminals is
easier. Compared to non-intrusive measurements, intrusive methods have an advantage: voice quality assessment
can use reference-based models such as ITU-T Recommendation P.862 [i.2]
(see also ITU-T Recommendations P.862.1 [i.3], P.862.2 [i.4] and P.862.3 [i.5] concerning mapping functions and
application guide), which give results close to the subjective perception of speech quality.
In this context, intrusive measurements using reference-based models for speech quality assessment will be
performed for end-user surveys.
6 Voice quality scale
It is important to consider that nowadays telephony has entered an era where traditional narrowband services will
cohabit with new services offering wideband audio capabilities. For end-users, these are not separate kinds of services.
Therefore, the assessment of transmission quality of voice should now be based on common metrics and objective
quality levels and scales, in replacement of the existing narrow-band only ones. In this context, it is appropriate to use
the MOS-LQOM scale to characterize voice quality of narrow-band services and wideband services.
See ITU-T Recommendation P.800.1 [i.6] for more information on MOS terminology.
7 List of indicators
The indicators proposed for the context of end-user quality survey of voice services are:
7.1 Post Dialling Delay
Definition Post Dialling Delay (PDD) evaluates the ability of the service to set up calls within
an acceptable delay. It is linked to the complexity of the service architecture and to the
performance of the constituent network elements.
Post Dialling Delay is the time interval between the end of dialling by the caller and the
reception by the caller of the appropriate ringing tone or recorded announcement.
The metric is determined at one of the two accesses of the communication.
Assessment method The indicator is determined sequentially from the two accesses of the call
configuration. This indicator characterizes only the caller part of the configuration.
Unit Millisecond with an integer value.
Standardization reference
Significant Mandatory.
Comment This indicator has to be separated between call types (IP to IP, IP to PSTN, IP to mobile,
etc.) for a detailed analysis.
The objective for the universal telephony service has been set to 2 900 ms in the
French regulator recommendation.
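The definition of Post Dialling Delay can be illustrated with a minimal sketch, assuming that the probe records two timestamps (in seconds, on the same clock) at the caller access: the end of dialling and the detection of the ringing tone or recorded announcement. The function names and the example timestamps are hypothetical; the 2 900 ms default is the objective quoted in the comment of this clause.

```python
def post_dialling_delay_ms(end_of_dialling_s: float, tone_detected_s: float) -> int:
    """Return the Post Dialling Delay in milliseconds as an integer value."""
    if tone_detected_s < end_of_dialling_s:
        raise ValueError("the tone cannot be detected before the end of dialling")
    return round((tone_detected_s - end_of_dialling_s) * 1000.0)


def meets_pdd_objective(pdd_ms: int, objective_ms: int = 2900) -> bool:
    """Compare a PDD value with the 2 900 ms objective quoted in the comment."""
    return pdd_ms <= objective_ms


# Hypothetical timestamps recorded by the probe at the caller access.
pdd = post_dialling_delay_ms(end_of_dialling_s=10.0, tone_detected_s=12.35)
print(pdd, meets_pdd_objective(pdd))
```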
7.2 Media establishment delay
Definition Time, determined at one of the two accesses of the communication, between the
off-hook of the called party and the beginning of voice signal reception.
Assessment method The indicator is determined sequentially from the two accesses of the call
configuration.
On an IP access this indicator may be assessed by using a non-intrusive probe, such as a
protocol analyser. Media establishment delay may be evaluated through the analysis of
media flows and signalling. For the ITU-T Recommendation H.323 protocol [i.17], the flow
establishment delay corresponds to the time elapsed between the emission of the
ITU-T Recommendation H.225.0 [i.18] "CONNECT" message and the arrival of the first
IP packet containing speech signal.
Unit Millisecond with an integer value.
Standardization reference
Significant Optional.
Comment This indicator has to be separated between caller and called site for a detailed analysis.
7.3 Unsuccessful call ratio
Definition Ratio of unsuccessful calls to the total number of call attempts in a specified time period.
An unsuccessful call is a call attempt to a valid number, properly dialled following dial
tone, where neither called party busy tone, nor ringing tone, nor answer signal, is
recognized on the access line of the calling user within 30 seconds from the instant when
the address information required for setting up a call is received by the network.
Assessment method The indicator is determined sequentially from the two accesses of the call configuration.
Unit % with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation E.800 [i.7], ITU-T Recommendation E.845 [i.8],
EG 201 769 [i.9].
Significant Mandatory.
Comment The limit of 30 seconds is the default set-up of a timer in SS7 protocol.
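The ratio defined in this clause can be sketched as below. The input representation is an assumption made for illustration: for each call attempt to a valid number, the probe is assumed to report the time in seconds until a busy tone, ringing tone or answer signal was recognized, or None when nothing was recognized. The function name and the example values are hypothetical.

```python
from typing import Optional, Sequence


def unsuccessful_call_ratio(
    response_times_s: Sequence[Optional[float]], timeout_s: float = 30.0
) -> float:
    """Return the unsuccessful call ratio in %, with 1 digit after the decimal point.

    An attempt counts as unsuccessful when no tone or answer signal was
    recognized (None) or when it was recognized after the 30 s limit.
    """
    attempts = len(response_times_s)
    if attempts == 0:
        raise ValueError("at least one call attempt is required")
    unsuccessful = sum(
        1 for t in response_times_s if t is None or t > timeout_s
    )
    return round(100.0 * unsuccessful / attempts, 1)


# Hypothetical campaign: four attempts, one timeout and one late response.
print(unsuccessful_call_ratio([2.1, None, 31.0, 5.0]))
```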
7.4 Premature release probability
Definition This indicator characterizes the ability to release a service. It is based on the number
of prematurely released communications in comparison with the number of established
communications.
Prematurely released communications are defined as communications released before a voluntary
action by one of the ends of the transmission.
Assessment method
Unit % with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation E.800 [i.7].
Significant Optional.
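Since the assessment method of this indicator is left empty in the clause, the following is only one possible reading of the definition: the number of communications released before a voluntary action from either end, divided by the number of established communications, expressed in %. The function name, the input counters and the example values are hypothetical.

```python
def premature_release_probability(established: int, prematurely_released: int) -> float:
    """Return the premature release probability in %, with 1 digit after the decimal point."""
    if established <= 0:
        raise ValueError("at least one established communication is required")
    if not 0 <= prematurely_released <= established:
        raise ValueError("released count must lie between 0 and the established count")
    return round(100.0 * prematurely_released / established, 1)


# Hypothetical campaign: 500 established communications, 6 released prematurely.
print(premature_release_probability(established=500, prematurely_released=6))
```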
7.5 Level of active speech signal at reception
Definition Level of speech signal received after transmission.
The level of the signal heard by the user has an impact on the quality he will perceive. A
signal that is too low will be barely audible and masked by the noise, while a level that is
too high will be painful.
Therefore, a measurement of the speech signal level is necessary to ensure a good listening
comfort.
Assessment method The received decoded signal, used for instance for ITU-T Recommendation P.862 [i.2], can
also be used to assess this parameter.
A typical method for the measurement of this parameter, based on a sample by sample
approach and a moving threshold between noise and speech, is given in
ITU-T Recommendation P.56 [i.10].
Unit dBm with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation P.56 [i.10].
Significant Optional.
Comment Each sample of the signal has a level, generally expressed in mV. The mean speech level is
the transformation of the mean signal voltage onto an appropriate logarithmic scale.
The samples taken into account for this measurement are the ones seen as speech (the
others are taken into account for noise measurements).
It is recommended to fall within classical speech levels values, i.e. between -25 dBm and
-10 dBm.
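A simplified illustration of the level computation is sketched below in Python. It is not the ITU-T Recommendation P.56 [i.10] method: it assumes a fixed (rather than moving) threshold between noise and speech, an RMS average over the samples classified as speech, and a 600 ohm reference impedance for the conversion from mV to dBm. All names are hypothetical.

```python
import math
from typing import Sequence

REFERENCE_IMPEDANCE_OHM = 600.0  # assumption: classical telephony reference


def level_dbm(samples_mv: Sequence[float]) -> float:
    """RMS level of samples given in mV, in dBm with 1 decimal digit."""
    if not samples_mv:
        raise ValueError("no samples to measure")
    # Mean square voltage in V^2 (samples are converted from mV to V).
    mean_square_v2 = sum((s / 1000.0) ** 2 for s in samples_mv) / len(samples_mv)
    if mean_square_v2 == 0.0:
        raise ValueError("signal is silent, level is undefined")
    power_mw = mean_square_v2 / REFERENCE_IMPEDANCE_OHM * 1000.0
    return round(10.0 * math.log10(power_mw), 1)


def active_speech_level_dbm(samples_mv: Sequence[float], threshold_mv: float) -> float:
    """Level of the samples whose magnitude reaches the fixed threshold."""
    speech = [s for s in samples_mv if abs(s) >= threshold_mv]
    return level_dbm(speech)


def within_recommended_range(level: float) -> bool:
    """Check against the -25 dBm to -10 dBm range given in the Comment."""
    return -25.0 <= level <= -10.0
```

The samples rejected by the threshold would be the ones used for the noise measurement of clause 7.6.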
7.6 Noise level at reception
Definition Level of noise determined at reception in the non-speech segments of the speech sample.
The noise present alongside the speech signal can have characteristics that make it
unpleasant, for instance a varying spectrum (crowd noise, for example). But
the most important source of annoyance due to noise is simply its level.
Assessment method The received decoded signal, used for instance for ITU-T Recommendation P.862 [i.2], can
also be used to assess this parameter.
The measurement of this parameter is normally performed as for the speech signal level
(see clause 7.5), but on the samples identified as non-speech.
Unit dBm0p with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation O.41 [i.11].
Significant Optional.
Comment Each sample of the signal has a level, generally expressed in mV. The mean noise level is
the transformation of the mean signal voltage of the noise samples onto an appropriate
logarithmic scale.
To get a more accurate noise level measure, a frequency transform needs to be done in
order to apply a psophometric weighting (see ITU-T Recommendation O.41 [i.11]).
It is recommended not to have noise louder than -50 dBm0p.
7.7 Noise to signal ratio at reception
Definition Difference between the active speech level and the level of noise at reception.
The noise present alongside the speech signal can have characteristics that make it
unpleasant, for instance a varying spectrum (crowd noise, for example). But
the most important source of annoyance due to noise is simply its level, and particularly its
level relative to speech.
Assessment method The combination of speech signal level (see clause 7.5) and noise level (see clause 7.6) can
replace the noise to signal ratio indicator.
The received decoded signal, used for instance for ITU-T Recommendation P.862 [i.2], can
also be used to assess this parameter.
SNR = speech signal level / noise level, i.e. the difference between the two levels when
both are expressed in dB.
The indicator is determined in both directions of transmission.
Unit dB with the resolution of 1 digit after the decimal point.
Standardization reference ITU-T Recommendation P.56 [i.10].
Significant Optional.
Comment It is recommended not to have SNR lower than 30 dB.
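The arithmetic of this clause can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the speech and noise levels are directly comparable, i.e. it ignores any difference in weighting or reference point between the units of clauses 7.5 and 7.6.

```python
import math

MIN_RECOMMENDED_SNR_DB = 30.0  # recommendation given in the Comment of clause 7.7


def snr_db(speech_level_db: float, noise_level_db: float) -> float:
    """SNR in dB as the difference of two levels already expressed in dB."""
    return round(speech_level_db - noise_level_db, 1)


def snr_db_from_powers(speech_power_mw: float, noise_power_mw: float) -> float:
    """Same SNR computed from linear powers: a ratio becomes 10*log10(ratio)."""
    if speech_power_mw <= 0.0 or noise_power_mw <= 0.0:
        raise ValueError("powers must be strictly positive")
    return round(10.0 * math.log10(speech_power_mw / noise_power_mw), 1)


def meets_snr_recommendation(snr: float) -> bool:
    """True when the SNR is not lower than the recommended 30 dB."""
    return snr >= MIN_RECOMMENDED_SNR_DB
```

Both functions give the same value for the same signal: a ratio of linear powers corresponds to a difference of levels on the logarithmic scale.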
7.8 Speech signal attenuation (or gain) after transmission
Definition Variant metric of the speech signal level (if the sending speech level is known, the two are
even redundant).
Speech signal attenuation after transmission is the difference between the active speech
levels at the receiving and sending accesses.
Assessment method The received decoded signal, used for instance for ITU-T Recommendation P.862 [i.2], can
also be used to assess this parameter. Once the speech signal level has been computed
(see clause 7.5), it is compared with the level of the sent signal. The attenuation is the
difference between these two levels.
There are other methods to compute this parameter, based for instance on intrusive
measurements made with sine waves and a specific weighting function.
The indicator is determined in both directions of transmission.
Unit dB with the resolution of 1 digit after the decimal point.
Standardization reference
Significant Optional.
Comment It is recommended to comply with PSTN attenuation rules, i.e. an attenuation between
6 dB and 10 dB.
7.9 Talker echo delay
Definition In telecommunications, the term echo describes delayed and unwanted feedback of the
sent signal into the receive path. The so-called echo source is the reflection point between
send and receive directions, which can have one of the following causes:
• 4-wire/2-wire Hybrid Circuits (multiple reflections possible);
• coupling in handset cords;
• structure borne coupling in handsets;
• acoustical coupling between earpiece and microphone.
This phenomenon is characterized by two parameters: its attenuation and its delay.
See annex C for a more detailed discussion of talker echo and the related listener echo.
With the increased delays present in today's IP networks, echo has the potential to be much
more perceivable and annoying than in classical PSTN.
In order to achieve a similar user perception with higher delays the attenuation of the talker
echo should be increased, i.e. active echo cancellation is necessary.
In practice it can be observed that in some cases either the cancellation does not occur or it
is not fully effective.
The lower the attenuation and/or the longer the delay, the more annoying the echo
becomes.
Echo delay is the time it takes for the speech signal to go from the mouth of a subscriber
back to the ear of the same subscriber, with one or more reflections occurring along the
transmission path.
Assessment method The indicator is determined sequentially from each of the two accesses of the call configuration.
Unit Milliseconds with an integer value.
Standardization reference ITU-T Recommendation G.131 [i.12].
Significant Optional.
Comment For fully digital networks the talker echo delay can be assumed to be equivalent to twice
the mean one-way delay.
7.10 Talker echo attenuation
Definition In telecommunications, the term echo describes delayed and unwanted feedback of the sent
signal into the receive path. The so-called echo source is the reflection point between send
and receive directions, which can have one of the following causes:
• 4-wire/2-wire Hybrid Circuits (multiple reflections possible);
• coupling in handset cords;
• structure borne coupling in handsets;
• acoustical coupling between earpiece and microphone.
This phenomenon is characterized by two parameters: its attenuation and its delay.
See annex C for a more detailed discussion of talker echo and the related listener echo.
With the increased delays present in today's IP networks, echo has the potential to be much
more perceivable and annoying than in classical PSTN.
In order to achieve a similar user perception with higher delays the attenuation of the talker
echo should be increased, i.e. active echo cancellation is necessary.
In practice it can be observed that in some cases either the cancellation does not occur or it
is not fully effective.
The lower the attenuation and/or the longer the delay, the more annoying the echo
becomes.
Echo attenuation is the difference of level between the sending level and the (delayed)
receiving level both measured at the same subscriber while he is talking.
Assessment method The indicator is determined sequentially from each of the two accesses of the call configuration.
Unit dB with a resolution of one decimal.
Standardization reference ITU-T Recommendation G.131 [i.12], ITU-T Recommendation G.168 [i.13].
Significant Optional.
Comment If active echo cancellation is used, ITU-T Recommendation G.168 [i.13] should be
applied; VoIP should provide echo attenuation of 55 dB. If the delay is known not to
exceed 50 ms under any circumstances, lower values for the talker echo attenuation
(down to 35 dB) may be acceptable (see annex C).
Echo annoyance depends on two metrics: the attenuation and the delay.
For a given attenuation level, the greater the delay, the greater the annoyance
for the user(s).
The echo-annoyance factor K is defined as:
K = EA - 40 × Log [(1 + delay/10) / (1 + delay/150)] + 6 × exp (-0,3 × delay²).
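The formula for K can be evaluated with the short Python sketch below. It rests on assumptions that the text does not state explicitly: EA is taken to be the talker echo attenuation in dB, Log is taken to be the base-10 logarithm, and the delay is taken to be expressed in milliseconds. Which delay is meant (echo delay or one-way delay) is not specified in this clause and is left to the caller. The function name is hypothetical.

```python
import math


def echo_annoyance_factor(echo_attenuation_db: float, delay_ms: float) -> float:
    """Evaluate K = EA - 40*Log[(1 + d/10)/(1 + d/150)] + 6*exp(-0.3*d^2)."""
    if delay_ms < 0.0:
        raise ValueError("delay must not be negative")
    # Penalty growing with delay (assumed base-10 logarithm).
    delay_term = 40.0 * math.log10((1.0 + delay_ms / 10.0) / (1.0 + delay_ms / 150.0))
    # Correction that only matters for very short delays.
    short_delay_term = 6.0 * math.exp(-0.3 * delay_ms ** 2)
    return echo_attenuation_db - delay_term + short_delay_term
```

With these assumptions, K decreases as the delay grows for a fixed attenuation, which matches the statement that a longer delay makes the same echo more annoying.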
7.11 Listening speech quality
Definition Represents the intrinsic quality of speech signal after transmission. This indicator takes
into account the degradations generated on the signal by the transmission links.
Assessment method Voice quality is evaluated by using the ITU-T Recommendation P.862 [i.2] standard with
the mapping functions according to ITU-T Recommendation P.862.1 [i.3] and
ITU-T Recommendation P.862.2 [i.4] standards.
MOS or Mean Opinion Score (calculated using the Perceptual Evaluation of Speech
Quality, or PESQ method) provides an objective view on the quality of the voice signal as
it may be perceived by the customer.
The MOS score is obtained by comparing speech samples:
• the original signal sent by the far end of the connection;
• the degraded signal received at the local end, where the measurement is applied.
The voice quality indicator is determined in the two directions of transmission.
Several MOS scores are determined in series during the same call. So for a given
transmission direction, listening speech quality performance during the call is defined by the
mean value of the MOS-LQOM measurements (in the same direction).
Unit Score between 1 (= very bad) and 5 (= excellent), determined on the MOS-LQOM scale with a
resolution of two digits after the decimal point.
Standardization reference ITU-T Recommendation P.800 [i.1], ITU-T Recommendation P.800.1 [i.6],
ITU-T Recommendation P.862 [i.2], ITU-T Recommendation P.862.2 [i.4],
ITU-T Recommendation P.862.3 [i.5].
Significant Mandatory.
Comment The value of this indicator depends on the codec used, but also on impairments like IP
packet loss or low signal to noise ratio.
To make result comparisons easier, it is recommended to use the same speech samples in
all test configurations, or at least to select different speech samples based on a thorough
selection and validation process.
ITU-T Recommendation P.862 [i.2] measures listening quality and does not take into
account impairments affecting conversational quality, like the end-to-end delay or echo.
It is also a one-way indicator, therefore it has to be measured separately in both
transmission directions, with no averaging between them afterwards.
This indicator may be separated between call types (IP to IP, IP to PSTN, IP to Mobile,
etc.) for a detailed analysis.
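The per-direction averaging described in this clause can be sketched in Python as follows. The MOS-LQOM values are assumed to have been produced beforehand by an ITU-T Recommendation P.862 [i.2] tool; the function name and the direction labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_mos_per_direction(measurements: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Mean MOS-LQOM of one call, kept separate for each transmission direction."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for direction, mos in measurements:
        if not 1.0 <= mos <= 5.0:
            raise ValueError("MOS score outside the 1 to 5 scale")
        scores[direction].append(mos)
    # Two digits after the decimal point; the directions are never averaged together.
    return {d: round(sum(v) / len(v), 2) for d, v in scores.items()}
```

For example, a call with scores 4.0 and 3.5 in one direction and 4.25 twice in the other is reported as two separate means, one per direction.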
7.12 Listening speech quality stability
Definition It is well known that for IP networks, delay within the network can vary significantly, due
to congestion or indeed due to route changes during a session. Delay variations in the
network can be compensated to some extent through good jitter buffer design in the
receiver. Furthermore, packet loss in the network can occur due either to severe congestion
(buffer overflows) or routing problems, or in a receiver terminal due to jitter buffer
overflow. These factors will impact on the quality of the service.
Concerning voice over IP, a single measurement of speech quality at the very
beginning of a call is not enough. Speech quality should be analysed throughout the
duration of the call, typically several minutes.
This metric represents the stability of the voice quality during a communication several
minutes long. This indicator takes into account the degradations generated on the signal by
the transmission links.
Assessment method Several measurements of the MOS-LQOM score, made with
ITU-T Recommendation P.862 [i.2] and the adapted mapping function (see clause 7.11),
are performed in series within the same call. Typically, one measurement every 20 s is
enough. The results are reported in terms of statistics.
The assessment of Listening Speech Quality Stability is performed in 5 steps. The generic
formulation is presented in annex A.
For the stability indicator of Listening Speech Quality, THRESHOLD1 = 0,1 and the linear
weighting function applies in order to express Stability (ST-MOS) on a 0 to 100 scale. By
definition, ST-MOS equals 100 when no instability occurs and equals 0
when the instability is equal to or greater than 0,4.
ST-MOS is calculated as:
• ST-MOS = 100 - ( 250 × INS_MOS), and
• ST-MOS = 0 if [100-(250 × INS_MOS)] < 0.
This indicator is determined in the two directions of transmission.
Unit Statistics on MOS score variation are plotted on a 0 to 100 scale.
Standardization reference
Significant Mandatory.
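The linear weighting used for ST-MOS can be illustrated with the Python sketch below. The instability value INS_MOS is taken as an input: its computation follows the generic formulation of annex A, which is not reproduced here. The function name is hypothetical.

```python
def st_mos(ins_mos: float) -> float:
    """ST-MOS = 100 - 250 * INS_MOS, floored at 0 (0 to 100 scale)."""
    if ins_mos < 0.0:
        raise ValueError("instability must not be negative")
    return max(0.0, 100.0 - 250.0 * ins_mos)
```

With this weighting, an instability of 0 gives a stability of 100, and any instability of 0,4 or more gives a stability of 0.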
7.13 End to end delay
Definition Represents the overall delay from one access to the other. This indicator takes into
account the transmission delay on networks but also processing delay in sending and
receiving terminals.
End to end delay is one of the components of perceived voice quality, perhaps the major
one in VoIP. A long delay impacts negatively on Q
...
