EN ISO 9921:2003
Ergonomics - Assessment of speech communication (ISO 9921:2003)
ISO 9921:2003 specifies the requirements for the performance of speech communication for verbal alert and danger signals, information messages, and speech communication in general. Methods to predict and to assess the performance in practical applications are described and examples are given.
Ergonomie - Beurteilung der Sprachkommunikation (ISO 9921:2003)
This International Standard specifies the requirements for the performance of speech communication for verbal alert and danger signals, information messages, and speech communication in general. Methods to predict and to assess the subjective and objective performance in practical applications are described and examples are given.
In order to obtain optimal performance in a specific application, three stages can be considered:
a) specification of the application and definition of the corresponding performance criteria;
b) design of a communication system and prediction of the performance;
c) assessment of the performance for in situ conditions.
The use of auditory warning signals other than speech is not included in this International Standard but is covered by ISO 7731.
Ergonomie - Evaluation de la communication parlée (ISO 9921:2003)
ISO 9921:2003 specifies the requirements for the performance of speech communication for verbal alert and danger signals, information messages, and speech communication in general. Subjective and objective methods to predict and to assess the performance in practical applications are described, with supporting examples.
Ergonomija – Ocena govorne komunikacije (ISO 9921:2003)
General Information
- Status: Published
- Publication Date: 14-Oct-2003
- Technical Committee: CEN/TC 122 - Ergonomics
- Drafting Committee: CEN/TC 122/WG 11 - Ergonomics of the Physical Environment
- Current Stage: 9093 - Decision to confirm - Review Enquiry
- Start Date: 10-Jun-2008
- Completion Date: 10-Jun-2008
Overview
EN ISO 9921:2003 - Ergonomics: Assessment of speech communication specifies requirements and methods for evaluating the performance of spoken messages used as verbal alert and danger signals, information messages, and general speech communication. The standard describes practical ways to predict and assess both subjective and objective speech performance for direct person-to-person exchange, public address (PA) systems, and personal communication systems (telephones, intercoms, mobile devices).
Key concepts covered include speech intelligibility, speech quality, vocal effort, and environmental effects such as the Lombard effect. The document supports the three design stages of a communication solution: (a) define application and performance criteria, (b) predict performance during system design, and (c) assess in-situ performance after installation.
Key Topics and Requirements
- Scope and purpose: Defines when speech should be used (warnings, information) and when non-voice signals may be preferable.
- Three communication modes: Direct communication, public address, and personal communication systems - each with distinct assessment needs.
- Performance metrics and objective methods:
  - Speech Transmission Index (STI) - objective prediction/measurement of intelligibility.
  - Speech Intelligibility Index (SII) - objective method related to the Articulation Index.
  - Speech Interference Level (SIL) - simple metric using A-weighted speech level and octave-band ambient noise.
- Subjective assessment: Guidance on listening tests and intelligibility scoring; Annexes provide test methods and rating scales.
- Speaker and listener factors: Vocal effort, gender, accents, non-native speakers, distance, and hearing protection effects are described.
- Design and prediction: Practical predictive methods and examples (informative annexes) to estimate intelligibility before installation.
- Normative/informative annexes: Speaker/listener characteristics, STI details, SIL definition, test examples, and application case studies.
Applications
EN ISO 9921 is practical for:
- Designing and commissioning public address systems in transport hubs, auditoria, factories.
- Specifying and testing alarm and emergency voice messages to ensure warnings are intelligible.
- Evaluating workplace communication where verbal instructions are critical (control rooms, shop floors).
- Assessing personal communication devices (intercoms, telephones, mobile handsets) for intelligibility and user effort.
- Supporting occupational safety decisions: when to use speech vs. non-speech signals.
Who uses this standard
- Ergonomists and human factors engineers
- Acoustic and audio-system consultants
- Safety and facility managers
- System integrators for PA and intercom systems
- Telecom and product designers concerned with speech intelligibility
Related Standards
- ISO/TR 4870 (speech intelligibility test construction)
- IEC 60268-16 (objective STI measurement)
- ISO 7731 (auditory warning signals)
- ISO 11429, IEC 60849 (related alarm/public address guidance)
EN ISO 9921:2003 is a practical, standards-based reference for ensuring clear, effective speech communication in safety-critical and everyday environments. Keywords: ergonomics, speech communication, speech intelligibility, STI, SIL, SII, vocal effort, public address, alarm messages.
Frequently Asked Questions
EN ISO 9921:2003 is a standard published by the European Committee for Standardization (CEN). Its full title is "Ergonomics - Assessment of speech communication (ISO 9921:2003)". This standard covers: ISO 9921:2003 specifies the requirements for the performance of speech communication for verbal alert and danger signals, information messages, and speech communication in general. Methods to predict and to assess the performance in practical applications are described and examples are given.
EN ISO 9921:2003 is classified under the following ICS (International Classification for Standards) categories: 13.180 - Ergonomics. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
SLOVENIAN STANDARD
01 June 2004
Ergonomija – Ocena govorne komunikacije (ISO 9921:2003)
Ergonomics - Assessment of speech communication (ISO 9921:2003)
Ergonomie - Beurteilung der Sprachkommunikation (ISO 9921:2003)
Ergonomie - Evaluation de la communication parlée (ISO 9921:2003)
This Slovenian standard is identical to: EN ISO 9921:2003
ICS:
13.180 Ergonomics
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
EUROPEAN STANDARD
EN ISO 9921
NORME EUROPÉENNE
EUROPÄISCHE NORM
October 2003
ICS 13.180
English version
Ergonomics - Assessment of speech communication (ISO 9921:2003)
Ergonomie - Evaluation de la communication parlée (ISO 9921:2003)
Ergonomie - Beurteilung der Sprachkommunikation (ISO 9921:2003)
This European Standard was approved by CEN on 1 October 2003.
CEN members are bound to comply with the CEN/CENELEC Internal Regulations which stipulate the conditions for giving this European
Standard the status of a national standard without any alteration. Up-to-date lists and bibliographical references concerning such national
standards may be obtained on application to the Management Centre or to any CEN member.
This European Standard exists in three official versions (English, French, German). A version in any other language made by translation
under the responsibility of a CEN member into its own language and notified to the Management Centre has the same status as the official
versions.
CEN members are the national standards bodies of Austria, Belgium, Czech Republic, Denmark, Finland, France, Germany, Greece,
Hungary, Iceland, Ireland, Italy, Luxembourg, Malta, Netherlands, Norway, Portugal, Slovakia, Spain, Sweden, Switzerland and United
Kingdom.
EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG
Management Centre: rue de Stassart, 36 B-1050 Brussels
© 2003 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.
Ref. No. EN ISO 9921:2003 E
CORRECTED 2003-12-03
Foreword
This document (EN ISO 9921:2003) has been prepared by Technical Committee ISO/TC 159
"Ergonomics" in collaboration with Technical Committee CEN/TC 122 "Ergonomics", the
secretariat of which is held by DIN.
This European Standard shall be given the status of a national standard, either by publication of
an identical text or by endorsement, at the latest by April 2004, and conflicting national
standards shall be withdrawn at the latest by April 2004.
According to the CEN/CENELEC Internal Regulations, the national standards organizations of
the following countries are bound to implement this European Standard: Austria, Belgium, Czech
Republic, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy,
Luxembourg, Malta, Netherlands, Norway, Portugal, Slovakia, Spain, Sweden, Switzerland and
the United Kingdom.
Endorsement notice
The text of ISO 9921:2003 has been approved by CEN as EN ISO 9921:2003 without any
modifications.
INTERNATIONAL STANDARD ISO 9921
First edition
2003-10-15
Ergonomics — Assessment of speech
communication
Ergonomie — Évaluation de la communication parlée
Reference number: ISO 9921:2003(E)
© ISO 2003
© ISO 2003
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Descriptions of speech communications
4.1 General
4.2 Speaker
4.3 Transmission channel
4.4 Listener
5 Performance of speech communications
5.1 General
5.2 Alert and warning situations
5.3 Person-to-person communications
5.4 Public address in public areas
5.5 Personal communication systems
5.6 Summary of recommended minimum performance
6 Assessment and prediction
6.1 General
6.2 Subjective assessment methods
6.3 Objective assessment and prediction methods
Annex A (normative) Speaker and listener characteristics
Annex B (informative) Subjective speech-intelligibility tests
Annex C (informative) Speech transmission index, STI
Annex D (informative) Overview of the means of communication and related parameters
Annex E (normative) Speech interference level, SIL
Annex F (informative) Intelligibility ratings for speech communications
Annex G (normative) Definition of symbols
Annex H (informative) Examples of applications of predictive intelligibility methods
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 9921 was prepared by Technical Committee ISO/TC 159, Ergonomics, Subcommittee SC 5, Ergonomics
of the physical environment.
This first edition of ISO 9921 cancels and replaces ISO 9921-1:1996.
Introduction
The aim of standardization in the field of the ergonomic assessment of speech-communication is to
recommend the levels of speech-communication quality required for conveying comprehensive messages in
different applications. The quality of speech communication is assessed for the following cases:
- warning of hazard;
- warning of danger;
- information messages for work places, public areas, meeting rooms, and auditoria.
For some applications, direct communication between humans is considered while, in others, the use of
electro-acoustic systems (e.g. PA systems) or personal communication equipment (e.g. telephone, intercom)
will be the most convenient means of informing and instructing or exchanging information.
The use of auditory warning signals other than speech is not included in this International Standard but is
covered by ISO 7731.
Acoustical danger and warning signals are in general omni-directional and therefore may be universal in many
situations. Auditory warnings are of great benefit in situations where smoke, darkness or other obstructions
interfere with visual warnings.
It is essential that, in the case of verbal messages, a sufficient level of intelligibility is achieved in the
coverage area. If this cannot be achieved, non-voice warning signals (see ISO 7731, IEC 60849 and [4] in the
Bibliography) or visual warning signals (see ISO 11429) may be preferable.
If acoustical signals are too loud, hearing damage or environmental problems may occur (e.g. noise nuisance
to dwellings near railway platforms, road traffic, airports, etc.). Good design can minimize these negative
aspects. In addition, prediction methods with sufficient accuracy are useful for consultants, suppliers and
end-users and may thus reduce costs of necessary adjustments after installation of a system.
The communications might be directly between humans, through public address or intercom systems or by
pre-recorded messages. In general, text-to-speech systems are not recommended because of the low
intelligibility of these systems.
It is recognized that, in a general-purpose document, simple to apply and easily available tools for prediction
and assessment should be described, as well as more sophisticated advanced technological methodologies.
INTERNATIONAL STANDARD ISO 9921:2003(E)
Ergonomics — Assessment of speech communication
1 Scope
This International Standard specifies the requirements for the performance of speech communication for
verbal alert and danger signals, information messages, and speech communication in general. Methods to
predict and to assess the subjective and objective performance in practical applications are described and
examples are given.
In order to obtain optimal performance in a specific application, three stages can be considered:
a) specification of the application and definition of the corresponding performance criteria;
b) design of a communication system and prediction of the performance;
c) assessment of the performance for in situ conditions.
The use of auditory warning signals other than speech is not included in this International Standard but is
covered by ISO 7731.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO/TR 4870:1991, Acoustics — The construction and calibration of speech intelligibility tests
IEC 60268-16:1998, Sound system equipment — Part 16: Objective rating of speech intelligibility by speech
transmission index
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
alarm
warning of existing or approaching danger
3.2
danger
risk of harm or damage
3.3
effective signal-to-noise ratio
measure to express the (combined) effect of various types of distortions on the intelligibility of a speech signal
in terms of the effect of a masking noise resulting in a speech signal having the same intelligibility
3.4
emergency
imminent risk or serious threat to persons or property
3.5
Lombard effect
spontaneous increase of the vocal effort induced by the increase of the ambient noise level at the speaker’s
ear
3.6
non-native speaker
person speaking a language which is different from the language that was learned as the primary language
during the childhood of the speaker
3.7
speech communication
conveying or exchanging information using speech, speaking, hearing modalities, and understanding
NOTE Speech communication may involve brief texts, sentences, groups of words and/or isolated words.
3.8
speech communicability
rating of the ease with which speech communication is performed
NOTE Speech communicability includes speech intelligibility, speech quality, vocal effort, and delays.
3.9
speech intelligibility
rating of the proportion of speech that is understood
NOTE Speech intelligibility is usually quantified as the percentage of a message understood correctly.
3.10
speech intelligibility index
SII
objective method for prediction of intelligibility based on the Articulation Index
NOTE See [1] in the Bibliography.
3.11
speech interference level
SIL
difference between A-weighted speech level and the arithmetic average of sound-pressure levels of ambient
noise in four octave bands with central frequencies of 500 Hz, 1 000 Hz, 2 000 Hz and 4 000 Hz
3.12
speech quality
rating of sound quality of a speech signal
NOTE Speech quality characterizes the amount of audible distortion of a speech signal and is usually rated by a
description.
3.13
speech transmission index
STI
objective method for prediction and measurement of speech intelligibility
3.14
vocal effort
exertion of the speaker, quantified objectively by the A-weighted speech level at 1 m distance in front of the
mouth and qualified subjectively by a description
3.15
warning
important notice concerning any change of status that demands attention or activity
4 Descriptions of speech communications
4.1 General
Speech communication requires three sequential components: speaker, transmission channel and listener(s).
Based on this concept, three means of communication are identified.
a) Direct communication. This is typical for person-to-person communications, where both persons are in
the same environment without making use of electro-acoustic means.
b) Public address. In general, an electro-acoustic system that is used to address a group of people in one
or more environments.
c) Personal communication systems. These include the use of mobile telephones and handheld
transceivers and the use of normal telephones, intercoms and hands-free telephones.
4.2 Speaker
Several speaker-related parameters define the contribution of the speaker to the performance of a
communication. These parameters include vocal effort, speaking quality, gender, accents, non-native speech,
speaking disorders, and distance from the listener or microphone.
Vocal effort is expressed by the equivalent A-weighted sound-pressure level at a distance of 1 m in front of the
mouth. The ambient noise level at the speaker's position (causing the Lombard effect) and the wearing of a
hearing protector influence the vocal effort. The relation between these parameters and the effect on the
speech quality is described in Annex A.
The frequency spectrum of the speech is related to the gender of the speaker and the vocal effort. This may
result, in combination with a specific type of noise, in a gender-related performance [see Annex B (B.3) and
Annex C].
The effects of strong accents and non-native speakers and listeners reduce the performance of a
communication; quantitative data are given in A.6.
4.3 Transmission channel
The transmission path between the speaker’s mouth and the listener’s ear is described by the distribution of
the speech signal in a room or by an electro-acoustic system. It affects the deterioration of the speech signal.
Important influences are ambient noise, reverberation, echoes, sound radiation, limitation in the frequency
response, and non-linearities. In Annex D, an overview is given of the means of communication and related
parameters.
4.4 Listener
For the listener, hearing aspects (directional hearing, masking, hearing disorders, reception threshold) and the
use of hearing protection define the deterioration. In Annexes A, C, D and E, these listener-related parameters
are considered, except for that of directional hearing, which is not considered in this International Standard.
5 Performance of speech communications
5.1 General
A correct recognition of each utterance is required for the understanding of spoken messages. In technical
terms, this means that an intelligibility score of 100 % is required for sentences. A sentence intelligibility score
of 100 % does not imply that each individual word is clearly understood or that the listening situation is
comfortable and relaxed; there are many situations in which a better performance is required. In alert
situations under adverse conditions, it is sufficient to fully understand a short message, even if correct
understanding requires some effort from the listener. In a meeting room, an auditorium, or at work places
where speech communication is a part of the task and where people are normally present for a longer period
of time, a more relaxed speaking condition and a good listening condition are required. For the speaker, this is
reflected by the low vocal effort required to be understood (see Table A.1). For the listener, the listening effort
may be primarily related to the speech intelligibility and speech quality at the listening position (see Table F.1).
The range of the classification scales and the number of the intervals is large enough to discriminate between
conditions required for different applications (see Table F.1 and Figure F.1).
The quality of speech communication is expressed in terms of intelligibility and vocal effort. In this
International Standard, various application and environmental conditions are identified. For each of them,
minimal performance criteria are recommended, covering the range from short alert and warning messages
under adverse conditions to relaxed communications in a meeting room or auditorium. People with a slight
hearing disorder (in general the elderly) or non-native listeners require a higher signal-to-noise ratio
(approximately 3 dB).
The different fields of application are described in 5.2 to 5.5 and summarized in 5.6.
5.2 Alert and warning situations
In general, clearly pronounced short messages are required for alert and warning situations, in order to
provide guidance for safe evacuation or clearance with minimal risk of panic. Hence, simple sentences should
be understood correctly even under adverse conditions, high environmental-noise levels, the speaker shouting,
etc.
As seen in Annex F (Figure F.1), the qualification “poor” is just adequate for alert and warning situations. This
criterion represents a mean value for listeners with normal hearing (50 % coverage). For 96 % coverage of
the population, an improvement is required that can be expressed by an increase of the signal-to-noise ratio
by 3 dB. Therefore, the recommended criterion should be at least “poor”.
With the use of a public-address system, poor-to-fair intelligibility may be recommended in adverse conditions.
However, distortions introduced by the electro-acoustic systems and/or the environment (band-pass limiting,
non-linear distortion, noise, reverberation and echoes) may also affect the speech intelligibility. This generally
results in the need for a better signal-to-noise ratio.
In order to include effects of all the distortions and environmental conditions on the overall intelligibility rating,
it is necessary to assess the system performance under representative (in situ) conditions.
5.3 Person-to-person communications
For communication in work situations, offices, meeting rooms, auditoria, and in critical situations (ambulance
personnel, firemen, etc.), a different level of intelligibility is required depending on the purpose of the
communication. In critical situations, generally short messages are exchanged which also include a certain
number of known critical words. For such communication conditions, at least a “fair” intelligibility is
recommended at an increased vocal effort (loud).
In situations of a relaxed type of communication, for example, occurring in offices, during meetings, lectures
and performances, which take place over a longer period of time, a good level of intelligibility is recommended
allowing for a normal vocal effort.
5.4 Public address in public areas
In public areas, general announcements are made with a short to medium duration at a normal vocal effort.
The content of the announcements may consist of numbers, names of destinations, names of persons, etc.
For these purposes, a fair-to-good intelligibility is recommended. Typical areas are shopping centres, railway
stations, within transportation means, and stadiums.
5.5 Personal communication systems
Communication systems are generally limited in bandwidth and may be used in noisy environments.
Examples are the outdoor use of mobile telephones and handheld transceivers, and the indoor use of normal
telephones and hands-free telephones. Depending on the type of the communication (complexity of the
messages) and intensity of the use, a fair-to-good intelligibility is recommended at a normal vocal effort.
5.6 Summary of recommended minimum performance
The recommended minimal performance rating is summarized in Table 1. However, in certain circumstances,
it is advisable to have a higher rating.
Table 1 — Recommended minimal performance ratings for intelligibility and vocal effort in four
applications (for examples of rating see Table A.1)
Application | Minimum intelligibility rating | Maximum vocal effort | Description
Alert and warning situations (correct understanding of simple sentences) | Poor | Loud | 5.2
Alert and warning situations (correct understanding of critical words) | Fair | Loud | 5.2
Person-to-person communications (critical) | Fair | Loud | 5.3
Person-to-person communications (prolonged normal communication) | Good | Normal | 5.3
Public address in public areas | Fair | Normal | 5.4
Personal communication systems | Fair | Normal | 5.5
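The recommendations in Table 1 can be held in a small lookup structure. The following is a minimal sketch, not part of the standard: the dictionary keys and the function name are hypothetical, and the ordering of the intelligibility ratings (bad, poor, fair, good, excellent) is an assumption about the Annex F scale, which is not reproduced in this sample. The vocal-effort ordering follows Table A.1.

```python
# Illustrative lookup of the Table 1 recommendations (all names are hypothetical).
# Assumed rating order, lowest to highest; the Annex F scale is not shown in this sample.
INTELLIGIBILITY_ORDER = ["bad", "poor", "fair", "good", "excellent"]
# Vocal-effort order, lowest to highest, following Table A.1.
VOCAL_EFFORT_ORDER = ["relaxed", "normal", "raised", "loud", "very loud"]

# application -> (minimum intelligibility rating, maximum vocal effort)
TABLE_1 = {
    "alert_simple_sentences": ("poor", "loud"),
    "alert_critical_words": ("fair", "loud"),
    "person_to_person_critical": ("fair", "loud"),
    "person_to_person_prolonged": ("good", "normal"),
    "public_address": ("fair", "normal"),
    "personal_communication": ("fair", "normal"),
}


def meets_table_1(application, intelligibility, vocal_effort):
    """Return True if the rating pair satisfies the Table 1 minimum for the application."""
    min_rating, max_effort = TABLE_1[application]
    rating_ok = (INTELLIGIBILITY_ORDER.index(intelligibility)
                 >= INTELLIGIBILITY_ORDER.index(min_rating))
    effort_ok = (VOCAL_EFFORT_ORDER.index(vocal_effort)
                 <= VOCAL_EFFORT_ORDER.index(max_effort))
    return rating_ok and effort_ok
```

For example, `meets_table_1("person_to_person_prolonged", "fair", "normal")` is false, because prolonged normal communication calls for at least a good rating.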
6 Assessment and prediction
6.1 General
Assessment of speech communication includes speech quality, speech intelligibility, speech communicability
and vocal effort. For the purpose of this International Standard, only speech intelligibility and vocal effort are
considered. The intelligibility can be determined by subjective methods (making use of speakers and listeners)
and by objective methods (making use of physical properties and the physical description of the speaking and
listening process).
6.2 Subjective assessment methods
Subjective intelligibility tests require trained speakers to read lists of test words and listeners who write down
what they thought they heard. Normally lists are 50 words long and the result is scored out of 100. Test words
should be embedded in a carrier phrase in order
a) to let the speaker control his vocal effort,
b) to account for temporal distortion during pronunciation of the test word, and
c) to get the attention of the listener at each utterance.
Test words may be meaningful words or nonsensical words, and phonetically balanced (phoneme distribution
representative for the language) or equally balanced (phoneme distribution equal for all phonemes). The type
of words used in the test defines the relation with other types of tests such as STI (Speech Transmission
Index) or SIL (Speech Interference Level). An informative description of subjective intelligibility tests is given in
Annex B and ISO/TR 4870.
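As an illustration of how a word-list result could be expressed as a score out of 100, the sketch below counts exact matches between the words presented and the words written down. The function name and the exact-match scoring rule (case-insensitive, ignoring surrounding whitespace) are assumptions made for this example; the actual scoring procedure should be taken from Annex B and ISO/TR 4870.

```python
def word_score_percent(presented, responses):
    """Percentage of test words reported correctly (assumed exact-match scoring)."""
    if not presented:
        raise ValueError("the word list must not be empty")
    if len(presented) != len(responses):
        raise ValueError("one response is expected per presented word")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(presented, responses)
    )
    return 100.0 * correct / len(presented)


# Hypothetical four-word excerpt; a full list would normally contain 50 words.
score = word_score_percent(["cat", "dog", "sun", "map"],
                           ["cat", "dog", "son", "map"])
```

With a 50-word list, each correctly reported word contributes 2 points to the score.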
6.3 Objective assessment and prediction methods
There are several objective methods to predict speech intelligibility. Depending on the method, either results
of objective measurements or specifications of a system and space are used to calculate an index to predict
intelligibility. These may include
- spectrum of the speech signal,
- spectrum of environmental noise,
- spatial distribution of these sound fields,
- reverberation,
- associated selection of listener positions, and
- evaluation of the resulting intelligibility score.
Commonly used methods are the Speech Interference Level (SIL), the Speech Transmission Index (STI), and
the Speech Intelligibility Index (SII). A normative description of the SIL is given in Annex E, a normative
description of STI is given in IEC 60268-16 and an informative description in Annex C. The SII is described in
ANSI S3.5 [1].
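The SIL as defined in 3.11 can be computed directly from an A-weighted speech level and four octave-band noise levels. The sketch below follows that definition only; the function and parameter names are hypothetical, and it is assumed here that the speech level is the one at the listener's position. Annex E, which is normative for the SIL, is not reproduced in this sample and takes precedence.

```python
SIL_BANDS_HZ = (500, 1000, 2000, 4000)


def speech_interference_level(speech_level_dba, noise_octave_levels_db):
    """SIL per definition 3.11: A-weighted speech level minus the arithmetic
    average of the ambient-noise levels in the four octave bands."""
    missing = [band for band in SIL_BANDS_HZ if band not in noise_octave_levels_db]
    if missing:
        raise ValueError(f"missing octave-band levels for: {missing}")
    noise_average = sum(noise_octave_levels_db[band] for band in SIL_BANDS_HZ) / len(SIL_BANDS_HZ)
    return speech_level_dba - noise_average


# Hypothetical input: 60 dB speech level against noise averaging 50 dB over the four bands.
sil = speech_interference_level(60.0, {500: 55.0, 1000: 52.0, 2000: 48.0, 4000: 45.0})
```

In this hypothetical case the four noise levels average 50 dB, so the SIL is 10 dB.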
Annex A
(normative)
Speaker and listener characteristics
A.1 Vocal effort
The level of the speech signal depends on the vocal effort of the speaker. The vocal effort is expressed by the
equivalent continuous A-weighted sound-pressure level of speech measured at a distance of 1 m in front of
the mouth. The relation between vocal effort and the corresponding level is given in Table A.1 for a typical
male speaker.
Table A.1 — Vocal effort of a male speaker and related A-weighted
speech level (dB re 20 µPa) at 1 m in front of the mouth
Vocal effort | $L_{S,A,1\,\mathrm{m}}$ (dB)
Very loud | 78
Loud | 72
Raised | 66
Normal | 60
Relaxed | 54
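For illustration, the values of Table A.1 can be used to attach a vocal-effort label to a measured speech level. The nearest-value rule in the sketch below is an assumption made for this example, not a rule stated in the standard, and the names are hypothetical.

```python
# A-weighted speech level at 1 m (dB) for a typical male speaker, from Table A.1.
VOCAL_EFFORT_LEVELS_DB = {
    "very loud": 78,
    "loud": 72,
    "raised": 66,
    "normal": 60,
    "relaxed": 54,
}


def nearest_vocal_effort(level_db):
    """Return the Table A.1 label whose level is closest to level_db.

    Assumed rule: nearest tabulated value; an exact tie goes to the louder
    label because of the dictionary order above.
    """
    return min(
        VOCAL_EFFORT_LEVELS_DB,
        key=lambda label: abs(VOCAL_EFFORT_LEVELS_DB[label] - level_db),
    )
```

A level of 70 dB would be labelled "loud" under this rule, since it lies 2 dB from 72 dB and 4 dB from 66 dB.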
A.2 Effect of ambient noise on vocal effort
Ambient noise above a certain level influences the vocal effort (this is known as the Lombard effect). In
Figure A.1 the relation between speech level and ambient-noise level is given. The hatched area indicates the
variability of the Lombard effect among speakers.
A.3 Decrease of speech quality with loud speech
The quality of loud speech, above the level of $L_{S,A,1m}$ = 75 dB, is substantially reduced, making it more
difficult to understand in comparison with speech produced at a lower vocal effort. This is taken into account
by reduction of the speech level in calculations: $L_{S,A,1m}$ shall be reduced by $\Delta L = 0{,}4\,(L_{S,A,1m} - 75)$ dB for
$L_{S,A,1m} > 75$ dB.
NOTE Certain symbols used in this annex are defined in Annex G.
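The loud-speech correction in A.3 can be sketched in a few lines. This is a minimal illustration, assuming the rule is applied directly to the A-weighted level at 1 m; the function name and parameter names are hypothetical and not part of the standard.

```python
def effective_speech_level(level_db: float,
                           threshold_db: float = 75.0,
                           slope: float = 0.4) -> float:
    """Return the speech level to use in calculations, in dB.

    Above the threshold the level is reduced by slope * (level - threshold),
    following the rule stated in A.3. At or below it, the level is unchanged.
    """
    if level_db <= threshold_db:
        return level_db
    return level_db - slope * (level_db - threshold_db)


# "Very loud" speech from Table A.1 (78 dB) is reduced by 0.4 * 3 = 1.2 dB.
print(round(effective_speech_level(78.0), 1))
# "Normal" speech (60 dB) is below the threshold and is returned unchanged.
print(effective_speech_level(60.0))
```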
A.4 Effect of hearing protection on vocal effort
A speaker wearing hearing protectors will reduce his vocal effort by about 3 dB compared to the unprotected
situation, if the ambient noise level $L_{N,A}$ exceeds 75 dB.
Figure A.1 — Relation between the range of vocal effort (equivalent continuous speech sound level)
and the ambient-noise level at the speaker’s position
A.5 Effect of distance between speaker and listener
From the speech level at the speaker position ($L_{S,A,1m}$), the speech level at the listener position ($L_{S,A,L}$)
may be approximated using the equation:
$$L_{S,A,L} = L_{S,A,1m} - 20 \lg \frac{r}{r_0}$$
where
$r$ is the distance in metres between the speaker and listener;
$r_0$ = 1 m.
Hence, the decrease in speech level is assumed to be 6 dB for each doubling of the distance. This relation is
valid for indoor and outdoor conditions up to about 2 m. For conditions with a reverberation time smaller than
2 s at 500 Hz, a maximum distance of 8 m is valid.
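As a worked illustration of the distance relation in A.5, the following sketch computes the level at the listener from the level at 1 m. The function name is hypothetical, and the sketch does not check the validity limits on distance and reverberation time given above; that is left to the caller.

```python
import math


def speech_level_at_listener(level_1m_db: float,
                             distance_m: float,
                             r0_m: float = 1.0) -> float:
    """Approximate the A-weighted speech level at the listener, in dB.

    Implements L_listener = L_1m - 20 * lg(r / r0) with r0 = 1 m.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return level_1m_db - 20.0 * math.log10(distance_m / r0_m)


# Normal vocal effort (60 dB at 1 m) heard at 2 m: about 6 dB lower.
print(round(speech_level_at_listener(60.0, 2.0), 1))
```

Doubling the distance again (4 m) lowers the result by a further 20 lg 2, roughly 6 dB, which is the behaviour described in the text.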
A.6 Effect of non-native speakers and listeners
A reduced intelligibility is observed with non-native but fluent speakers and listeners of a second language.
For non-native speakers or listeners, or for both in combination, a 4 dB to 5 dB improvement in the signal-to-noise
ratio is required for a similar intelligibility as is obtained with native speakers and/or listeners [15]. This
4 dB signal-to-noise ratio improvement corresponds with an improvement of the STI of 0,13 and of the SIL of
4 dB.
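The correspondence between 4 dB and an STI change of 0,13 can be checked with a short calculation. The sketch below assumes the linear mapping commonly used in the STI method, in which an effective signal-to-noise ratio range of 30 dB (from -15 dB to +15 dB) spans the index from 0 to 1. That mapping is not stated in this annex, so it is an assumption here, and the function name is hypothetical.

```python
def sti_shift_for_snr_shift(snr_shift_db: float,
                            snr_range_db: float = 30.0) -> float:
    """Convert a signal-to-noise ratio change (dB) into an STI change.

    Assumes the index varies linearly over snr_range_db (assumed 30 dB).
    """
    return snr_shift_db / snr_range_db


# A 4 dB improvement corresponds to 4 / 30, which rounds to 0.13.
print(round(sti_shift_for_snr_shift(4.0), 2))
```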
Annex B
(informative)
Subjective speech-intelligibility tests
B.1 Basic conditions for testing
The speaking ability of speakers and the hearing capacity of listeners shall be sufficient to provide an efficient
direct communication, communication by means of a public-address system or personal communication
device (see Figure D.1).
The speakers and the listeners shall be familiar with the language used, as far as to pronounce and
understand a verbal message. It is best to use native speakers of the language.
Listeners should be protected from risks to health and safety. This means that a safe speech level should not
be exceeded. The recommended maximum speech level is 80 dB A-weighted for an exposure of maximum
8 h per working day.
B.2 Test material
B.2.1 General
The speech-intelligibility test should be such as to obtain valid, reliable results allowing for an analysis of
errors in listeners’ responses. The test material must use samples of speech sounds, which are typical for the
communication system being tested, and representative of the type of message transmitted through the
system. Economy of testing should be considered, i.e., possible automation to simplify test administration.
A number of methods have been proposed for the measurement of speech intelligibility (see F.4). In this
document, three types of intelligibility tests are included:
an open-set nonsensical $\mathrm{CVC}_{\mathrm{EQB}}$ word test;
an open-set meaningful PB-word test;
a sentence test.
B.2.2 Open-set lists
Open-set lists of test items are made using items drawn randomly from a total set of test items. In the case of
nonsensical CVC word tests, a test item is generated randomly from a set of initial consonants, vowels and
final consonants. The $\mathrm{CVC}_{\mathrm{EQB}}$ nonsensical words are balanced to represent all phonemes of the test
language in equal proportion. In the CVC test generation, language-dependent restrictions may apply in the
conjunction of specific phonemes.
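The generation of an equally balanced open-set CVC list can be sketched as follows. This is only an illustration under stated assumptions: the phoneme inventories are hypothetical placeholders, balancing is done per position (each initial consonant, vowel and final consonant used equally often), and language-dependent restrictions are modelled by a caller-supplied predicate `is_allowed`. None of these names come from the standard.

```python
import random
from collections import Counter

# Hypothetical phoneme inventories, for illustration only; a real test uses
# the inventory of the test language.
INITIALS = ["b", "d", "k", "m", "s"]
VOWELS = ["a", "e", "i", "o", "u"]
FINALS = ["f", "l", "n", "p", "t"]


def balanced_column(phonemes, n_items, rng):
    """Return n_items phonemes with each one used (nearly) equally often."""
    repeats, remainder = divmod(n_items, len(phonemes))
    column = list(phonemes) * repeats + rng.sample(list(phonemes), remainder)
    rng.shuffle(column)
    return column


def make_cvc_list(n_items=50, seed=None,
                  is_allowed=lambda c1, v, c2: True, max_attempts=1000):
    """Draw one list of CVC items, redrawing if any item is not allowed."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        items = list(zip(balanced_column(INITIALS, n_items, rng),
                         balanced_column(VOWELS, n_items, rng),
                         balanced_column(FINALS, n_items, rng)))
        if all(is_allowed(*item) for item in items):
            return items
    raise RuntimeError("no list satisfied the restrictions")


cvc_list = make_cvc_list(n_items=50, seed=1)
# With 50 items and 5 initial consonants, each consonant is used 10 times.
print(Counter(c1 for c1, _, _ in cvc_list))
```

Redrawing the whole list is the simplest way to honour a restriction while keeping the balance intact; for strict restrictions a constructive method would be more efficient.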
The meaningful phonetically balanced word test (PB-words) is constructed as a set of monosyllabic words.
For phonetically balanced tests, different phonemes occur in the test in the proportion in which they occur in
natural language.
The nonsensical CVC-word test and the meaningful phonetically balanced word test (PB-words) typically
comprise 50 words per list. The total number of required test items is at least 1 000 words, to avoid listeners
adapting to frequently used lists (see ISO/TR 4870). The $\mathrm{CVC}_{\mathrm{EQB}}$ test requires about a 6 dB higher signal-to-noise
ratio to obtain a similar percentage correct score as does the meaningful PB-word test (see Figure F.1).
The lists spoken by speakers are presented to a panel of listeners. Since open format is used, the listeners
typically respond by writing down the response on a response sheet (or using a silent keyboard). The
intelligibility score is the percentage of words correctly identified in the test. With nonsensical CVC-words,
separate scores for the initial consonant, the vowel, and the final consonant can also be determined, which
allows a confusion matrix to be constructed. For details see Annex F, ISO/TR 4870 and [13].
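The scoring described above can be sketched as follows. The sketch assumes each presented item and each response is available as an (initial, vowel, final) tuple, which is a representation chosen here for illustration; the function name and the returned dictionary layout are hypothetical.

```python
from collections import Counter

POSITIONS = ("initial", "vowel", "final")


def score_cvc(presented, responses):
    """Score CVC items given as (initial, vowel, final) tuples.

    Returns the word score and per-position scores in percent, plus one
    Counter of (presented, reported) phoneme pairs per position, which is
    the raw material for a confusion matrix.
    """
    if len(presented) != len(responses) or not presented:
        raise ValueError("need equally long, non-empty item lists")
    n = len(presented)
    words_correct = 0
    position_correct = dict.fromkeys(POSITIONS, 0)
    confusions = {name: Counter() for name in POSITIONS}
    for target, answer in zip(presented, responses):
        words_correct += target == answer
        for name, t, a in zip(POSITIONS, target, answer):
            position_correct[name] += t == a
            confusions[name][(t, a)] += 1
    return {
        "word_score": 100.0 * words_correct / n,
        "position_scores": {k: 100.0 * v / n
                            for k, v in position_correct.items()},
        "confusions": confusions,
    }


presented = [("b", "a", "t"), ("k", "i", "n"), ("s", "o", "l"), ("m", "e", "p")]
responses = [("b", "a", "t"), ("k", "i", "m"), ("f", "o", "l"), ("m", "e", "p")]
result = score_cvc(presented, responses)
print(result["word_score"], result["position_scores"])
```

In this made-up data two of four words are fully correct, while the vowel is always identified, which shows why position scores can be more informative than the word score alone.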
B.2.3 Sentence tests
Usually, sentence tests are not recommended for evaluating transmission systems because the listener’s
knowledge of grammar, meaning and syntax of the sentence influences the results. Another difficulty is
creating a large number of sentences that are phonetically representative of speech and with a well-defined
complexity. However, for specific uses, the SRT method [10] can be used, which determines the noise level
that provides 50 % sentence intelligibility. Depending on the speech material, this corresponds to a signal-to-noise
ratio of −4 dB to −6 dB (see Figure F.1). Hence, conversion to other conditions is possible.
B.3 Speakers and listeners
Speakers and listeners should be selected to be representative of the user population of a system under test.
In selecting the speakers and listeners, age, gender, education, relevant experience and linguistic background
should be taken into account. The group of speakers and listeners, the size and training shall be selected in
accordance with ISO/TR 4870.
ISO/TR 4870 recommends the following:
at least one male and one female speaker typical of a given nationality and language;
five well-motivated listeners for small closed-set test formats, and ten for large open-format tests;
normal experience in use and spelling of the language to be used, good hearing, that is a pure tone
audiogram not exceeding a hearing level of 10 dB at any test frequency up to 4 000 Hz, and 15 dB at any
frequency up to 6 000 Hz;
training time between 5 min and 24 h depending on the test format, see ISO/TR 4870:1991, 3.10.
The speech samples may be spoken directly, or prerecorded. Recordings of test material should be made
according to ISO/TR 4870. The electrical parameters of a recording system such as frequency response, non-
linear distortions, and the signal-to-noise ratio should be good enough to be considered ideal in comparison
with the respective parameters of the system under test. For recording, the speaker should be placed in a
quiet and sound absorbing environment. The distance of the speaker’s mouth to the microphone should be
reported.
The speaker should be familiar with the grammar of the text material. The speaker should be given visual
feedback to control the level, and timing, of spoken items. The same kind of feedback should be used in the
case of live and recorded speech. Speakers must be trained until they attain a stable sound-pressure level of
pronounced speech (65 dB ± 3 dB) on the average, at a distance of 1 m in front of the speaker’s lips. For
details see ISO/TR 4870.
Listeners should be familiar with the communication system under testing. They must also become familiar
with the test procedure. The listeners should be given written instructions.
The listeners should be trained until they become familiar with the test procedure and the test words. The
training should include hearing all the words from a list under quiet conditions, using an undistorted
communication system. The training should be conducted until listeners achieve 100 %, or nearly 100 %
performance in ideal conditions. Listeners should be trained by hearing the voices of all the speakers used.
There should be no visual contact between the speaker and the listener in order to prevent the listener from lip
reading.
B.4 Administration of the intelligibility test
Usually, intelligibility testing involves a number of test conditions because several communication systems or
several states of a communication system (e.g. various speech-to-noise ratios) are to be measured, resulting
in different intelligibility ratings. However, if only one test condition is to be assessed, the use of reference
conditions is recommended.
If several conditions are measured, they should be presented using a balanced experimental design that will
neutralize the influence of random factors that are not fully controlled in the measurements, such as the
effect of learning by the listeners. Other information relevant to the listener’s performance should be collected.
This includes information about the confidence of the listener's responses as well as the listener's opinions
about the measured system. All variables important for the conditions of testing should be chosen in advance
or measured.
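One common way to obtain a balanced presentation order is a Latin square, in which every condition occurs once in every serial position across listeners. The sketch below builds a simple cyclic Latin square; this particular design is an assumption chosen for illustration and is not prescribed by the text. The function name and condition labels are hypothetical.

```python
def latin_square_orders(conditions):
    """Return one presentation order per listener (or listener group).

    Row i is the condition list rotated by i places, so each condition
    appears exactly once in each serial position across the rows.
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


for order in latin_square_orders(["SNR +6 dB", "SNR 0 dB", "SNR -6 dB"]):
    print(order)
```

A cyclic square balances serial position, which addresses learning effects, but it does not balance which condition precedes which; a different design would be needed if carry-over between conditions is a concern.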
In the case of live speech, the speaking level, rate of speech and vocal effort should be controlled and
reported. The speech and noise level both on the speaker’s side and at the listener's ears should be
measured and reported. In the case of prerecorded speech, the speech and noise level at the listener’s ears
should be measured and reported.
If the communication device creates constraints of the mouth and lips (e.g. special helmet with a microphone),
it should be reported and described.
B.5 Statistical analysis and documenting results
For a simple test, the mean score (percent correct responses) and the corresponding standard deviation
should be calculated, thus allowing for prediction of the 96 % confidence interval. Depending on the
construction of the test (i.e., number of speakers, number of listeners, number of conditions, number of
replicas), statistical analysis such as an analysis of variance (ANOVA) can be applied.
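For a simple test, the calculation can be sketched as follows. The sketch assumes the confidence interval is taken for the mean score and uses a normal approximation; with small listener panels a Student t quantile would give a wider and more appropriate interval. The function name is hypothetical and the confidence level is passed in explicitly.

```python
import math
from statistics import NormalDist, mean, stdev


def score_summary(scores, confidence):
    """Return (mean, standard deviation, confidence interval of the mean).

    scores are percent-correct values, one per listener or list.
    The interval uses a normal approximation (an assumption of this sketch).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * s / math.sqrt(n)
    return m, s, (m - half_width, m + half_width)


# Made-up scores from five listeners, using the confidence level named above.
print(score_summary([80, 84, 78, 82, 76], confidence=0.96))
```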
Annex C
(informative)
Speech transmission index, STI
The STI-method [7], [11], [12], [14] assumes that the intelligibility of a transmitted speech signal is related to
the preservation of the original spectral differences between speech sounds. These spectral differences may
be reduced by band-pass limiting, masking noise, temporal distortion (echoes, reverberation, and automatic
gain control), and non-linear distortion (system overload, quantization noise). The reduction of these spectral
differences can be quantified by the effective signal-to-noise ratio obtained for a number of frequency bands.
Also human-related hearing aspects such as masking, the reception threshold, hearing disorders, and non-
native speakers and listeners may reduce the effective signal-to-noise ratio. The method is based on the
calculation of the effective signal-to-noise ratio in seven relevant frequency bands (octave bands, centre
frequencies ranging from 125 Hz to 8 kHz). Weighted contributions of the quantified information transfer
function in seven octave bands result in a single index, the $\mathrm{STI}_r$.
Originally the STI-method was developed for measurements. For this purpose, a specific test signal was
designed, which, after transmission through the channel under test, was analysed in order to determine the
effective signal-to-noise ratios in different frequency bands and to calculate the $\mathrm{STI}_r$. The test signal was so
designed that, after analysis, information could be obtained on most types of distortion mentioned above. In
particular, temporal distortion and non-linear distortion require a specific test signal and analysis.
It is possible to predict the $\mathrm{STI}_r$ value for transmission channels with band-pass limiting and noise, based on
the signal-to-noise ratio in the seven octave bands. However, the prediction of the effect of temporal distortion
on the $\mathrm{STI}_r$ is limited to single echoes and reverberation. For reverberation, a simple algorithm is used and
only continuous exponential decay curves can be accounted for. This excludes prediction for acoustically
coupled enclosures and very complex environments 1). The effect of non-linear distortion on the $\mathrm{STI}_r$ cannot be
predicted by a simple algorithm.
The measurement of the STI is described in IEC 60268-16.
Prediction of the STI-value can be performed in the following nine steps.
Step 1: Determine the speech spectrum in seven octave bands at the listener’s ear.
This includes the determination of the vocal effort (including the Lombard effect and the effect of wearing a
hearing protector, see Annex A), the male/female speech spectrum, the distance between speaker and
listener, and the effect of band-pass limiting.
Step 2: Determine the noise spectrum in seven octave bands at the listener’s ear.
Step 3: For each band, determine the signal-to-noise ratio, based on the speech and noise spectra and
convert these signal-to-noise ratios to the corresponding m-values.
$$m = \frac{10^{S/10}}{10^{S/10} + 10^{N/10}}$$
where
S is the speech level, in decibels;
N is the noise level, in decibels.
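Step 3 can be illustrated with a short sketch. It assumes the usual relation between signal-to-noise ratio and m-value, m = 1 / (1 + 10^(-(S - N)/10)), which is the same as the ratio of speech intensity to the sum of speech and noise intensities when S and N are levels in decibels. The function name and the example levels are hypothetical.

```python
def m_value(speech_db: float, noise_db: float) -> float:
    """Return the m-value for one octave band from levels in dB.

    Equivalent to 10**(S/10) / (10**(S/10) + 10**(N/10)), written in terms
    of the signal-to-noise ratio to avoid very large intermediate values.
    """
    snr_db = speech_db - noise_db
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


# Made-up octave-band levels (dB) for speech and noise at the listener's ear.
speech = [52.0, 58.0, 60.0, 55.0, 50.0, 45.0, 40.0]
noise = [50.0, 50.0, 50.0, 45.0, 45.0, 45.0, 45.0]
print([round(m_value(s, n), 2) for s, n in zip(speech, noise)])
```

When speech and noise levels are equal the m-value is 0.5, and it approaches 1 as the signal-to-noise ratio grows.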
1) With prediction algorithms such as ray-tracing, more complex environments can be included.
If no temporal distortion has to be accounted for, then proceed with Step 6.
Step 4: Determine the early decay reverberation time for
...