ISO 532-3:2023
Acoustics — Methods for calculating loudness — Part 3: Moore-Glasberg-Schlittenlacher method
This document specifies a method for estimating the loudness and loudness level of both stationary and time-varying sounds as perceived by otologically normal adult listeners under specific listening conditions. The sounds may be recorded using a single microphone, using a head and torso simulator, or, for sounds presented via earphones, the electrical signal delivered to the earphones may be used. The method is based on the Moore-Glasberg-Schlittenlacher algorithm. NOTE 1 Users who wish to study the details of the calculation method can review or implement the source code which is entirely informative and provided with the standard for the convenience of the user. This method can be applied to any sounds, including tones, broadband noises, complex sounds with sharp line spectral components, musical sounds, speech, and impact sounds such as gunshots and sonic booms. Calculation of a single value for the overall loudness over the entire period of a time-varying signal lasting more than 5 s is outside the scope of this document. NOTE 2 It has been shown that, for steady tones, this method provides a good match to the contours of equal loudness level as defined in ISO 226:2003[18] and the reference threshold of hearing as defined in ISO 389-7:2019[19].
INTERNATIONAL STANDARD ISO 532-3
First edition
2023-07
Acoustics — Methods for calculating loudness —
Part 3:
Moore-Glasberg-Schlittenlacher method
Reference number
© ISO 2023
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 General
5 Input signal
5.1 Single microphone
5.2 Two microphones in the ear canals or microphones in a head and torso simulator
5.3 Earphone presentation
6 Instrumentation
7 Description of the method
7.1 General
7.2 Transfer of sound through the outer and middle ear
7.2.1 General
7.2.2 Free-field transfer function
7.2.3 Diffuse-field transfer function
7.2.4 Signal recorded using microphones in the ear canals or using a Head and Torso Simulator
7.2.5 Earphone presentation
7.3 Calculation of the running short-term spectrum
7.4 Calculation of the running short-term excitation pattern
7.5 Transformation of excitation into specific loudness
7.5.1 General
7.5.2 Reference excitation at the reference threshold of hearing
7.5.3 Gain of the cochlear amplifier for inputs with low sound pressure levels
7.5.4 Calculation of specific loudness from excitation when $E_{\mathrm{THRQ}}/E_0 \le E/E_0$
7.5.5 Calculation of specific loudness from excitation when $E_{\mathrm{THRQ}}/E_0 > E/E_0$
7.5.6 Calculation of specific loudness from excitation when $E/E_0 > 10^{10}$
7.6 Calculation of short-term specific loudness
7.7 Smoothing of short-term specific loudness and application of binaural inhibition
7.8 Calculation of short-term loudness
7.9 Calculation of long-term loudness
7.10 Relationship between loudness level and loudness
7.11 Calculation of the reference threshold of hearing
8 Uncertainty of calculated loudness sounds
9 Data reporting
Annex A (informative) Software for the calculation of loudness according to the method in this document
Annex B (informative) Test signals used for verification of this document
Annex C (informative) Test signals used for verification of equivalence with ISO 532-2
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use
of (a) patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed
patent rights in respect thereof. As of the date of publication of this document, ISO had/had not received
notice of (a) patent(s) which may be required to implement this document. However, implementers are
cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents. ISO shall not be held responsible for identifying any or all
such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 43, Acoustics.
A list of all parts in the ISO 532 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Loudness and loudness level are two perceptual attributes of sound describing absolute and relative
sensations of sound strength perceived by a listener under specific listening conditions. Due to inherent
individual differences among people, both loudness and loudness level have the nature of statistical
estimators characterized by their respective measures of central tendency and dispersion determined
for a specific sample of the general population.
The object of this document is to specify a calculation procedure based on the physical properties of
sound for estimating loudness and loudness level of sound as perceived by listeners with otologically
normal hearing under specific listening conditions. This procedure seeks numbers that can be used
in many scientific and technical applications to estimate the perceived loudness and loudness level of
sound without conducting separate human observer studies for each application. Because loudness
is a perceived quantity, the perception of which may vary among people, any calculated loudness
value represents only an estimate of the average loudness as perceived by a group of individuals with
otologically normal hearing.
This document describes a method for calculating the loudness of time-varying sounds from the
input signal, which may differ for the two ears. This calculation method is based on
Moore-Glasberg-Schlittenlacher loudness calculation algorithms [1] to [5]. The method allows
calculation of two quantities:
a) The short-term loudness, which is the momentary loudness of a short segment of a sound, such as a
word in a speech sound or a single note in a piece of music.
b) The long-term loudness, which is the loudness of a longer segment of sound, such as a whole
sentence or a musical phrase.
For most everyday sounds, both the short-term loudness and the long-term loudness vary over time.
The loudness of sounds with durations up to 2 s or 3 s is well predicted from the maximum value of the
long-term loudness reached during presentation of the sound [4][6] to [8]. For long-duration stationary
sounds, the long-term loudness based on the method described in this document is very close to the
loudness determined using the method described in ISO 532-2 [9]. Deviations can occur for sounds with
strong amplitude fluctuations, such as noises with narrow bandwidth; for such sounds the loudness
calculated using this document is more accurate than that calculated using ISO 532-2.
The method of loudness calculation described in this standard can be applied to signals of any duration.
However, it does not directly give an output corresponding to the overall loudness impression of a sound
scene or soundscape over a period of minutes, hours, or days, which is called the “overall loudness” in
this standard. The output of the method of loudness calculation described in this standard can be post-
processed to estimate the overall loudness of a sound scene.
NOTE Post-processing is outside the scope of this document, but some possible methods have been
described [10] to [13].
This document describes the calculation procedure leading to estimation of the loudness and loudness
level of time-varying sounds and provides executable computer programs. The software provided with
this document is entirely informative and provided for the convenience of the user. Use of the provided
software is not required for conformity with the document.
NOTE Equipment or machinery noise emissions/immissions can also be judged by other quantities defined
in various International Standards (see e.g. ISO 1996-1 [14], ISO 3740 [15], ISO 9612 [16], and ISO 11200 [17]).
INTERNATIONAL STANDARD ISO 532-3:2023(E)
Acoustics — Methods for calculating loudness —
Part 3:
Moore-Glasberg-Schlittenlacher method
1 Scope
This document specifies a method for estimating the loudness and loudness level of both stationary
and time-varying sounds as perceived by otologically normal adult listeners under specific listening
conditions. The sounds may be recorded using a single microphone, using a head and torso simulator,
or, for sounds presented via earphones, the electrical signal delivered to the earphones may be used.
The method is based on the Moore-Glasberg-Schlittenlacher algorithm.
NOTE 1 Users who wish to study the details of the calculation method can review or implement the source
code which is entirely informative and provided with the standard for the convenience of the user.
This method can be applied to any sounds, including tones, broadband noises, complex sounds with
sharp line spectral components, musical sounds, speech, and impact sounds such as gunshots and sonic
booms.
Calculation of a single value for the overall loudness over the entire period of a time-varying signal
lasting more than 5 s is outside the scope of this document.
NOTE 2 It has been shown that, for steady tones, this method provides a good match to the contours
of equal loudness level as defined in ISO 226:2003 [18] and the reference threshold of hearing as defined in
ISO 389-7:2019 [19].
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
IEC 60318-7, Electroacoustics – Simulators of human head and ear – Part 7: Head and torso simulator for
the measurement of sound sources close to the ear
IEC 61672-1, Electroacoustics - Sound level meters - Part 1: Specifications
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
sound pressure level
$L_p$
ten times the logarithm to the base 10 of the ratio of the square of the sound pressure, $p$, to the square
of a reference value, $p_0$, expressed in decibels
$$L_p = 10 \lg \frac{p^2}{p_0^2}\ \mathrm{dB}$$
where the reference value, $p_0$, in air is 20 μPa
Note 1 to entry: Because of practical limitations of the measuring instruments, $p^2$ is always understood to
denote the square of a frequency-weighted, frequency-band-limited or time-weighted sound pressure. If specific
frequency and time weightings as specified in IEC 61672-1 and/or specific frequency bands are applied, this
should be indicated by appropriate subscripts; e.g. $L_{p,\mathrm{AS}}$ denotes the A-weighted sound pressure level with
time weighting S (slow). Frequency weightings such as A-weighting should not be used when specifying sound
pressure levels for the purpose of loudness calculation using the current procedure.
Note 2 to entry: This definition is technically in accordance with ISO 80000-8:2020, 8-22 [20].
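The definition in 3.1 can be illustrated with a minimal Python sketch. The function name `sound_pressure_level` and the use of an unweighted RMS pressure as input are assumptions made for this illustration; they are not part of this document.

```python
import math

P0_AIR = 20e-6  # reference sound pressure in air, in Pa (20 µPa, see 3.1)


def sound_pressure_level(p_rms: float, p0: float = P0_AIR) -> float:
    """Return the sound pressure level, in dB, for a sound pressure in Pa.

    Assumption for this sketch: p_rms is an unweighted RMS sound pressure.
    """
    if p_rms <= 0.0:
        raise ValueError("sound pressure must be positive")
    return 10.0 * math.log10(p_rms ** 2 / p0 ** 2)


if __name__ == "__main__":
    # A pressure equal to the reference value corresponds to 0 dB.
    print(sound_pressure_level(20e-6))
    # A pressure of 1 Pa corresponds to roughly 94 dB.
    print(round(sound_pressure_level(1.0), 2))
```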
3.2
filter
any device or mathematical operation which, when applied to a complex signal, passes energy of signal
components of certain frequencies while substantially attenuating energy of signal components of all
other frequencies
3.3
band-pass filter
filter (3.2) that passes signal energy within a certain frequency band and rejects most of the signal
energy outside of this frequency band
3.4
sound spectrum
representation of the magnitudes (and sometimes of the phases) of the components of a complex sound
as a function of frequency
3.5
auditory filter
filter (3.2) within the human cochlea describing the frequency resolution of the auditory system, whose
characteristics are usually estimated from the results of masking experiments
3.6
$ERB_n$
equivalent rectangular bandwidth of the auditory filter for otologically normal persons
width of an idealised rectangular band-pass filter (3.3) that has the same peak transmission as the
auditory filter (3.5) at the same centre frequency and that passes the same power for a white noise input
(in Hz)
Note 1 to entry: The subscript n indicates that the value applies for listeners with otologically normal hearing.
Note 2 to entry: The unconventional use of a multiletter abbreviated term presented in italics and with a subscript
is used here in the place of a symbol to maintain the use of an established notation and to avoid confusion.
3.7
$ERB_n$-number scale
equivalent rectangular bandwidth number scale
transformation of the frequency scale constructed such that an increase in frequency equal to one $ERB_n$
(Hz) (3.6) leads to an increase of one unit on the $ERB_n$-number scale
Note 1 to entry: The unit of the $ERB_n$-number scale is the Cam. For example, the value of $ERB_n$ for a centre
frequency of 1 000 Hz is approximately 132 Hz, so an increase in frequency from 934 Hz to 1 066 Hz corresponds
to a step of one Cam. The equation relating $ERB_n$-number to frequency is given in 7.4.
3.8
loudness level
sound pressure level of a frontally incident, sinusoidal plane progressive wave, presented binaurally
at a frequency of 1 000 Hz that is judged by otologically normal persons as being as loud as the given
sound
Note 1 to entry: Loudness level is expressed in phons.
3.9
loudness
perceived magnitude of a sound, which depends on the acoustic properties of the sound and the specific
listening conditions, as estimated by otologically normal listeners
Note 1 to entry: Loudness is expressed in sones.
Note 2 to entry: Loudness depends primarily upon the sound pressure although it also depends upon the
frequency, waveform, bandwidth, and duration of the sound.
Note 3 to entry: One sone is the loudness of a sound whose loudness level is 40 phon.
Note 4 to entry: A sound that is twice as loud as another sound is characterized by doubling the number of sones.
3.10
short-term loudness
loudness of an individual brief segment of sound, such as a syllable in speech, a single musical note, or a
short burst of a sound, typically lasting up to 500 ms
3.11
long-term loudness
loudness of a long sound, such as a whole sentence, a musical phrase, or a continuous noise, typically
lasting up to 5 s
Note 1 to entry: The overall loudness of a sound or soundscape lasting longer than 5 s can be estimated by post-
processing of the long-term loudness as a function of time. Such post-processing is outside the scope of this
standard, but some possible methods are described in References [10] to [13].
3.12
excitation
E
output of an auditory filter (3.5) centred at a given frequency, specified in units that are linearly related
to power
Note 1 to entry: An excitation of 1 unit is produced at the output of an auditory filter centred at 1 000 Hz by a tone
with a frequency of 1 000 Hz with a sound pressure level of 0 dB presented in a free field with frontal incidence.
3.13
excitation level
$L_E$
ten times the logarithm to the base 10 of the ratio of the excitation (3.12) at the output of an auditory
filter (3.5) centred at the frequency of interest to the reference excitation (3.12), $E_0$
$$L_E = 10 \lg \frac{E}{E_0}\ \mathrm{dB}$$
where the reference excitation $E_0$ is the excitation produced by a 1 000 Hz tone with a sound pressure
level of 0 dB presented in a free field with frontal incidence
3.14
specific loudness
N'
calculated loudness evoked over a frequency band with a bandwidth of 1 $ERB_n$ centred on the frequency
of interest
4 General
This document specifies a method for calculating the loudness and loudness level
of any sound based on the Moore-Glasberg-Schlittenlacher procedure.
The method involves a sequence of stages. Each stage is described below. However, it is envisaged
that those wishing to calculate loudness using this procedure will use one of the computer programs
(see Annex A) provided with this document that implements the described procedure. It is not expected
that the procedure will be implemented “by hand”. Such computations would be very time consuming.
The source code provided in Annex A gives an example of the implementation of the method. Other
implementations using different software are possible.
NOTE 1 The computational procedure described in this document is an updated version of procedures
published earlier elsewhere in References [1] to [5].
NOTE 2 Uncertainties are addressed in Clause 8.
5 Input signal
The signal that is used as input to the algorithm is the waveform for each ear (left and right), sampled
using a 32 kHz sampling rate. If the Matlab® 1) code described in Annex C is used, higher sampling rates
for the signal are allowed. These are automatically converted by the Matlab® software to a 32 kHz
sampling rate. The signal can be obtained in three ways.
5.1 Single microphone
The sound can be recorded using a single microphone placed at the centre of the position of the listener’s
head, after the listener has been removed from the sound field. In this case, the sound would be diotic
(the same at the two ears) and the single recorded signal would be presented to both input channels of
the algorithm.
5.2 Two microphones in the ear canals or microphones in a head and torso simulator
The sound can be recorded using two small probe microphones with the tips placed close to each ear
drum (left and right) or using the two ear simulators (left and right) in a head and torso simulator.
5.3 Earphone presentation
If the sound is delivered via earphones, the input signals for the algorithm correspond to the electrical
signals delivered to the earphones, but with allowance for the transfer function from each earphone to
the eardrum; see 7.2.5.
1) Matlab® is a trademark of MathWorks. This information is given for the convenience of users of this document
and does not constitute an endorsement by ISO of the product named. Equivalent products may be used if they can
be shown to lead to the same results.
6 Instrumentation
Measuring instrumentation used to acquire a signal to be used as an input for method 5.1 and 5.2
shall conform to IEC 61672-1. The microphone(s) used for method 5.1 shall have an omnidirectional
characteristic or a free-field characteristic. If a head and torso simulator is used it shall conform to
IEC 60318-7. For signals acquired using a head and torso simulator, the transfer function of the simulator
as supplied by the equipment manufacturer or acquisition software shall be allowed for.
7 Description of the method
7.1 General
The procedure involves a sequence of processing operations, as illustrated in Figure 1.
For each ear, the processing operations are:
a) a filter to allow for the effects of transfer of sound through the outer and middle ear;
b) a short-term spectral analysis of the sound spectrum with greater frequency resolution at low than
at high frequencies;
c) calculation of an excitation pattern, representing the magnitudes of the outputs of the auditory
filters as a function of centre frequency;
d) application of a compressive nonlinearity to the output of each auditory filter to transform
excitation to specific loudness;
e) smoothing over time of the resulting instantaneous specific loudness pattern using an averaging
process resembling an automatic gain control (AGC) to give short-term specific loudness.
Subsequent stages are:
f) the short-term specific loudness patterns for each ear are used to calculate broadly-tuned binaural
inhibition functions, the amount of inhibition depending on the relative short-term specific
loudness at the two ears;
g) the inhibited specific loudness patterns are summed across frequency to give an estimate of the
short-term loudness for each ear;
h) the binaural short-term loudness is calculated as the sum of the short-term loudness values for the
two ears;
i) the long-term loudness for each ear is calculated by smoothing the short-term loudness for that ear,
again by a process resembling AGC;
j) the binaural long-term loudness is obtained by summing the long-term loudness across ears.
These steps are described sequentially in 7.2 to 7.9.
Figure 1 — Flow chart illustrating the sequence of processing operations in the method
7.2 Transfer of sound through the outer and middle ear
7.2.1 General
The transfer of sound through the outer and middle ear is modelled using one of three finite impulse
response (FIR) filters with 4 097 coefficients. Different filters are used depending on the method by
which the sound was picked up and the method by which the sound was delivered to the listeners. Each
filter represents the combined effect of the outer ear and the middle ear. The transfer function for the
middle ear is the same as for ISO 532-2:2017 [9], 7.3 and is specified in column 4 of Table 1. This transfer
function is referred to as “middle ear only”.
Table 1 — Transfer functions
Column 1: Frequency, in Hz
Column 2: Difference between the sound pressure level at the tympanic membrane and the sound pressure level measured in the free field (in the absence of a listener), in dB
Column 3: Difference between the sound pressure level at the tympanic membrane and the sound pressure level measured in the diffuse field (in the absence of a listener), in dB
Column 4: Scaled transfer function value for the middle ear, in dB

Frequency    Free field    Diffuse field    Middle ear
Hz           dB            dB               dB
20           0,0           0,0              −39,6
25           0,0           0,0              −32,0
31,5         0,0           0,0              −25,85
40           0,0           0,0              −21,4
50           0,0           0,0              −18,5
63           0,0           0,0              −15,9
80           0,0           0,0              −14,1
100          0,0           0,0              −12,4
125          0,1           0,1              −11,0
160          0,3           0,3              −9,6
200          0,5           0,4              −8,3
250          0,9           0,5              −7,4
315          1,4           1,0              −6,2
400          1,6           1,6              −4,8
500          1,7           1,7              −3,8
630          2,5           2,2              −3,3
750          2,7           2,7              −2,9
800          2,6           2,9              −2,6
1 000        2,6           3,8              −2,6
1 250        3,2           5,3              −4,5
1 500        5,2           6,8              −5,4
1 600        6,6           7,2              −6,1
2 000        12,0          10,2             −8,5
2 500        16,8          14,9             −10,4
3 000        15,3          14,5             −7,3
3 150        15,2          14,4             −7,0
4 000        14,2          12,7             −6,6
5 000        10,7          10,8             −7,0
6 000        7,1           8,9              −9,2
6 300        6,4           8,7              −10,2
8 000        1,8           8,5              −12,2
9 000        −0,9          6,2              −10,8
10 000       −1,6          5,0              −10,1
11 200       1,9           4,5              −12,7
12 500       4,9           4,0              −15,0
14 000 a     2,0           3,3              −18,2
15 000 a     −2,0          2,6              −23,8
16 000 a     2,5           2,0              −32,3
a Values are in a range that has not been validated.
7.2.2 Free-field transfer function
For sounds presented in a free field from a frontal direction, it is assumed that the transformation from
free-field sound pressure (measured in the absence of the listener at the position corresponding to
the centre of the listener’s head) to eardrum sound pressure is as specified in Shaw [21]. This transfer
function is the same as for ISO 532-2:2017 [9], 7.3 and is specified in column 2 of Table 1. The overall
transfer function for this option is the sum of the transfer functions in columns 2 and 4 of Table 1.
The free-field option can be used for sound picked up by a single microphone placed at the centre of the
position of the listener’s head, after the listener has been removed from the sound field. In this case, the
sound would be diotic (the same at the two ears).
7.2.3 Diffuse-field transfer function
For diffuse-field presentation, the transfer function is derived by averaging the sound-field-to-eardrum
transfer function over many directions of incidence. The values used are based on the average of
measurements given in the literature [21] to [23]. The diffuse-field transfer function is the same as
for ISO 532-2:2017 [9], 7.3 and is specified in column 3 of Table 1. The overall transfer function for this
option is the sum of the transfer functions in columns 3 and 4 of Table 1.
The diffuse-field option can be used for diffuse-field listening conditions using sound picked up by a
single microphone placed at the centre of the position of the listener’s head, after the listener has been
removed from the sound field. In this case, the sound would be diotic (the same at the two ears).
7.2.4 Signal recorded using microphones in the ear canals or using a Head and Torso Simulator
The sound waveforms can be recorded using small microphones placed in each ear canal close to
the eardrum or they can be recorded using a Head and Torso simulator (also called a dummy head
or acoustic manikin) that mimics the acoustic properties of the torso, head, pinna and ear canal, as
specified in IEC 60318-7. In this case, the middle-ear only option is used.
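The three filter options described in 7.2.2 to 7.2.4 combine the columns of Table 1 by simple addition in decibels. The following Python sketch shows this for a few rows copied from Table 1. The dictionary `TABLE_1_ROWS`, the function `overall_transfer_db` and the option strings are hypothetical names chosen for illustration; interpolation between the tabulated frequencies and the design of the FIR filter are not shown.

```python
# Selected rows of Table 1, with decimal points instead of decimal commas.
# Key: frequency in Hz. Value: (column 2, column 3, column 4), all in dB.
TABLE_1_ROWS = {
    500: (1.7, 1.7, -3.8),
    1000: (2.6, 3.8, -2.6),
    2000: (12.0, 10.2, -8.5),
    4000: (14.2, 12.7, -6.6),
}


def overall_transfer_db(frequency_hz: int, option: str) -> float:
    """Return the overall outer- and middle-ear transfer value in dB."""
    free_field, diffuse_field, middle_ear = TABLE_1_ROWS[frequency_hz]
    if option == "free-field":
        return free_field + middle_ear        # columns 2 and 4
    if option == "diffuse-field":
        return diffuse_field + middle_ear     # columns 3 and 4
    if option == "middle-ear only":
        return middle_ear                     # column 4 alone
    raise ValueError(f"unknown option: {option!r}")


if __name__ == "__main__":
    for option in ("free-field", "diffuse-field", "middle-ear only"):
        print(option, round(overall_transfer_db(1000, option), 1))
```

For the free-field option at 1 000 Hz the two tabulated values cancel (2,6 dB and −2,6 dB), so the overall transfer value is 0,0 dB.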
7.2.5 Earphone presentation
For sounds presented via earphones, the input to the algorithm can be based on the electrical signals
delivered to the earphones. The sensitivity level of the earphones (the sound pressure level produced
for a given applied voltage) at a given frequency (e.g. 1 000 Hz) shall be specified. The transfer function
chosen depends on the characteristics of the earphones and on the way that the electrical input to the
earphones is processed:
a) When the earphones have a free-field response, i.e. when the transfer function from the
electrical signal at the input to the earphones to the sound pressure at the eardrum approximates
the transfer function in column 2 of Table 1, the free-field option is used.
b) When the waveforms of the sounds have been pre-processed using a free-field equalizer prior to
delivery to the earphones [24], the free-field option is used.
c) When the earphones have a diffuse-field response, i.e. when the transfer function from the
electrical signal at the input to the earphones to the sound pressure at the eardrum approximates
the transfer function in column 3 of Table 1, the diffuse-field option is used.
d) When the waveforms of the sounds have been pre-processed using a diffuse-field equalizer prior to
delivery to the earphones, the diffuse-field option is used.
e) When sounds are delivered via earphones with a “flat” response at the eardrum or when the
electrical signal delivered to the earphones is digitally filtered to simulate the response of the
earphone at the eardrum prior to being used as input to the model, the middle-ear only option is
used.
For sounds presented via earphones, the sounds can be identical at the two ears (diotic) or they can
differ at the two ears (dichotic).
7.3 Calculation of the running short-term spectrum
A running estimate of the sound spectrum at the output of the FIR filter for each ear is obtained by
calculating six Fast Fourier Transforms (FFTs) in parallel, using signal segment durations that decrease
with increasing centre frequency. This is done to give sufficient spectral resolution at low frequencies
and sufficient temporal resolution at high frequencies. The six FFTs are based on Hann-windowed
segments with durations of 2 ms, 4 ms, 8 ms, 16 ms, 32 ms and 64 ms, all aligned at their temporal
centres. The windowed segments are zero padded and all FFTs are based on 2 048 sample points. All
FFTs are updated every 1 ms. Hence, the overlap of successive segments is at least 50 %.
The use of a Hann window reduces the effective level relative to what would be obtained without a
window by 4,26 dB. However, this effect is partly offset by the fact that for a sinusoidal input signal,
the calculated short-term spectrum has significant energy in more than one bin. To allow for these
effects the level of each component in the calculated spectrum is increased by 3,32 dB. This leads to
a calculated loudness of exactly 1 sone for a 1 000 Hz sinusoid presented binaurally in free field with
frontal incidence with a sound pressure level of 40 dB.
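The 4,26 dB figure quoted above can be checked numerically: the mean of the squared Hann window is 3/8, and 10 lg(3/8) is approximately −4,26 dB. The sketch below is a minimal Python check. The periodic form of the Hann window and the function name `hann_level_change_db` are assumptions made for this illustration and are not taken from this document.

```python
import math


def hann_level_change_db(n_samples: int) -> float:
    """Level change, in dB, caused by Hann-windowing a stationary signal.

    Assumption for this sketch: the periodic Hann window
    w[k] = 0.5 - 0.5*cos(2*pi*k/n) is used.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n_samples)
              for k in range(n_samples)]
    mean_square = sum(w * w for w in window) / n_samples
    return 10.0 * math.log10(mean_square)


if __name__ == "__main__":
    # 64 ms at a 32 kHz sampling rate is 2 048 samples.
    print(round(hann_level_change_db(2048), 2))
```

The difference between this value and the 3,32 dB correction corresponds to the partial offset described above, which arises because a sinusoid contributes energy to more than one bin.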
Each FFT is used to calculate spectral magnitudes over a specific frequency range; values outside that
range are discarded. These ranges are 20 Hz to 80 Hz, 80 Hz to 500 Hz, 500 Hz to 1 250 Hz, 1 250 Hz to
2 540 Hz, 2 540 Hz to 4 050 Hz and 4 050 Hz to 15 000 Hz, for segment durations of 64 ms, 32 ms, 16 ms,
8 ms, 4 ms and 2 ms, respectively.
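The relation between segment duration, segment length in samples and frequency range can be tabulated with a short Python sketch. The values are those stated in 7.3 and Clause 5. The names `BANDS`, `segment_samples`, `bin_spacing_hz` and `segment_duration_for` are hypothetical, and the treatment of a frequency that falls exactly on a boundary between two ranges (assigned here to the higher range) is an assumption, since this subclause does not specify it.

```python
SAMPLING_RATE_HZ = 32_000  # sampling rate of the input signal (Clause 5)
FFT_LENGTH = 2_048         # number of points of every FFT after zero padding

# (segment duration in ms, lower edge in Hz, upper edge in Hz), from 7.3
BANDS = [
    (64, 20, 80),
    (32, 80, 500),
    (16, 500, 1_250),
    (8, 1_250, 2_540),
    (4, 2_540, 4_050),
    (2, 4_050, 15_000),
]


def segment_samples(duration_ms: int) -> int:
    """Number of signal samples in a segment of the given duration."""
    return SAMPLING_RATE_HZ * duration_ms // 1000


def bin_spacing_hz() -> float:
    """Frequency spacing of the FFT bins, identical for all six FFTs."""
    return SAMPLING_RATE_HZ / FFT_LENGTH


def segment_duration_for(frequency_hz: float) -> int:
    """Segment duration, in ms, of the FFT that supplies this frequency.

    Assumption: a frequency on a shared edge belongs to the higher range.
    """
    for duration_ms, low_hz, high_hz in BANDS:
        if low_hz <= frequency_hz < high_hz or frequency_hz == high_hz == 15_000:
            return duration_ms
    raise ValueError("frequency outside the 20 Hz to 15 000 Hz range")


if __name__ == "__main__":
    for duration_ms, low_hz, high_hz in BANDS:
        n = segment_samples(duration_ms)
        print(duration_ms, "ms:", n, "samples,", FFT_LENGTH - n, "zeros,",
              low_hz, "Hz to", high_hz, "Hz")
    print("bin spacing:", bin_spacing_hz(), "Hz")
```

Only the 64 ms segment fills all 2 048 points; the shorter segments are zero padded, so all six spectra share the same bin spacing while differing in their effective resolution.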
7.4 Calculation of the running short-term excitation pattern
An excitation pattern is calculated from the short-term sound spectrum at 1-ms intervals, using the
same method as described in ISO 532-2:2017 [9], 7.4, but with slightly coarser spacing of points along
the frequency axis. The excitation pattern is defined as the output of the auditory filters, plotted as a
function of centre frequency. The bandwidths and shapes of the filters depend on both the input sound
pressure level and the centre frequency, $f_c$, of the filter. The excitation pattern of a given sound is defined
as the output of the auditory filters represented as a function of $f_c$. The output can be specified either as
excitation ratio, $E/E_0$, or as excitation level, $L_E$, as defined in 3.12 and 3.13.
The equivalent rectangular bandwidth, $ERB_n$ (in Hz), of the auditory filter for otologically normal
persons and for an input sound pressure level to the cochlea of 51 dB, is specified as a function of the
centre frequency of the band pass auditory filter, $f_c$ (in Hz), by Formula (1):
$$ERB_n = 24{,}673\,(0{,}004368\,f_c + 1) \qquad (1)$$
The characteristics of the auditory filter for other input levels are derived as described below.
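Formula (1) can be evaluated with a one-line function. The Python sketch below uses the hypothetical name `erb_n`; for a centre frequency of 1 000 Hz it gives a value consistent with the approximately 132 Hz quoted in 3.7.

```python
def erb_n(centre_frequency_hz: float) -> float:
    """ERB_n in Hz at an input level of 51 dB, according to Formula (1)."""
    return 24.673 * (0.004368 * centre_frequency_hz + 1.0)


if __name__ == "__main__":
    for fc in (100.0, 1000.0, 4000.0):
        print(fc, round(erb_n(fc), 1))
```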
For a given auditory filter (with a specific $f_c$), the excitation is calculated by summing the power of the
output in response to all of the different frequency components in the input. A first stage in this process
is to sum the powers of the components of the input spectrum in 1-$ERB_n$-wide bands, where the width
of the bands is defined by Formula (1). The resulting power, converted to decibels (using the reference
excitation defined in 3.13), is referred to as the level per $ERB_n$ and is denoted $X$. It is assumed that the
sharpness of the auditory filter depends on $X$.
The value of X is calculated using a rounded-exponential weighting function (called hereafter a filter)
rather than a rectangular weighting function. The rounded-exponential filter is defined by Formula (2):
$$W(g) = (1 + pg)\,\exp(-pg) \qquad (2)$$
where the value of g at frequency f is given by Formula (3):
$$g = |f - f_c| / f_c \qquad (3)$$
and p is a dimensionless parameter determining the bandwidth and slope of the filter. For calculating
X, the value of p in Formula (2) is set to $4f_c/\mathrm{ERB_n}$. The output power of the rounded-exponential filter is
calculated over the following ranges given by Formula (4):
$$\begin{cases} g = 0 \text{ to } 1 & \text{for } f < f_c \\ g = 0 \text{ to } 4 & \text{for } f > f_c \end{cases} \qquad (4)$$
The calculation of X is performed with the filter centred in turn on every component in the input
spectrum.
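The calculation of X can be sketched as follows. This is an illustrative reading of Formulae (1) to (4), not the informative source code supplied with the standard: it assumes that the component powers are linear values relative to the reference excitation of 3.13, and that X is obtained by summing the filter-weighted powers and converting to decibels. All function names are hypothetical.

```python
import math


def erb_n(fc_hz):
    """Formula (1): equivalent rectangular bandwidth in Hz."""
    return 24.673 * (0.004368 * fc_hz + 1.0)


def roex_weight(f_hz, fc_hz, p):
    """Formulae (2) and (3): rounded-exponential weight at frequency f_hz."""
    g = abs(f_hz - fc_hz) / fc_hz
    return (1.0 + p * g) * math.exp(-p * g)


def level_per_erb_db(freqs_hz, powers, fc_hz):
    """Level per ERB_n, X (dB), for a filter centred on fc_hz.

    Assumption: powers are linear and relative to the reference excitation.
    """
    p = 4.0 * fc_hz / erb_n(fc_hz)  # value of p used when calculating X
    total = 0.0
    for f, power in zip(freqs_hz, powers):
        g = abs(f - fc_hz) / fc_hz
        g_max = 1.0 if f < fc_hz else 4.0  # ranges of Formula (4)
        if g <= g_max:
            total += roex_weight(f, fc_hz, p) * power
    return 10.0 * math.log10(total) if total > 0.0 else float("-inf")


def levels_per_erb_db(freqs_hz, powers):
    """Centre the filter in turn on every component of the input spectrum."""
    return [level_per_erb_db(freqs_hz, powers, fc) for fc in freqs_hz]
```

With p set to $4f_c/\mathrm{ERB_n}$ the weight is 1 at the centre frequency, so a single isolated component contributes its own level to X.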
To calculate the output of a given auditory filter in response to a given group of frequency components,
it is first necessary to specify the shape of the filter. Each side of the filter is specified to have the
form given in Formula (2). The value of p for the lower side of the filter (frequencies below the centre
frequency) is denoted $p_l$, while the value of p for the upper side of the filter (frequencies above the
centre frequency) is denoted $p_u$. The value of $p_u$ is invariant with level and is equal to $4f_c/\mathrm{ERB_n}$. The
value of $p_l$ is calculated as follows.
The value of $p_l$ for X = 51 dB is set equal to $4f_c/\mathrm{ERB_n}$. Let $p_l(X, f_c)$ denote the value of $p_l$ at level X and
centre frequency $f_c$. Then $p_l(X, f_c)$ is given by Formula (5):
$$p_l(X, f_c) = p_l(51, f_c) - D\,\left[ p_l(51, f_c) / p_l(51, 1000) \right]\,(X - 51) \qquad (5)$$
where $p_l(51, f_c)$ is the value of $p_l$ at centre frequency $f_c$ for X = 51 dB, $p_l(51, 1000)$ denotes the value of
$p_l$ at 1 000 Hz for X = 51 dB and D is a constant with unit 1/dB and a value of 0,35.
l
The final excitation pattern is plotted with the scale of centre frequency, $f_c$, transformed to an
$\mathrm{ERB_n}$-number scale. An increase in frequency equal to 1 $\mathrm{ERB_n}$ corresponds to a step of one unit on the
$\mathrm{ERB_n}$-number scale. For brevity, the unit of the $\mathrm{ERB_n}$-number scale is denoted the Cam, and the scale is
denoted the Cam scale.
EXAMPLE According to Formula (1), the value of $\mathrm{ERB_n}$ for $f_c$ = 1 000 Hz is approximately 132 Hz, so an
increase in frequency from 934 Hz to 1 066 Hz corresponds to a step of 1 Cam.
The relationship of $\mathrm{ERB_n}$-number i to $f_c$ is given by an equation derived from Formula (1):
$$i = 21{,}366\,\lg(0{,}004368\,f_c + 1) \qquad (6)$$
The excitation pattern is calculated for $\mathrm{ERB_n}$-numbers i from 1,75 to 39 in steps of 0,25. The
corresponding centre frequencies $f_c$ are calculated by inverting Formula (6).
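Formula (6) and its inverse can be sketched as below, together with the grid of Cam values used for the excitation pattern. The names `erb_number`, `centre_frequency`, `ERB_NUMBERS` and `CENTRE_FREQUENCIES` are illustrative assumptions, not identifiers from the standard's source code.

```python
import math


def erb_number(fc_hz):
    """Formula (6): ERB_n-number (Cam) for centre frequency fc_hz (Hz)."""
    return 21.366 * math.log10(0.004368 * fc_hz + 1.0)


def centre_frequency(i_cam):
    """Inverse of Formula (6): centre frequency (Hz) for ERB_n-number i_cam."""
    return (10.0 ** (i_cam / 21.366) - 1.0) / 0.004368


# ERB_n-numbers from 1,75 to 39 in steps of 0,25 (150 points).
ERB_NUMBERS = [1.75 + 0.25 * k for k in range(150)]
CENTRE_FREQUENCIES = [centre_frequency(i) for i in ERB_NUMBERS]
```

Consistent with the EXAMPLE, `erb_number(1066.0) - erb_number(934.0)` comes out very close to 1 Cam.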
7.5 Transformation of excitation into specific loudness
7.5.1 General
The short-term excitation at each centre frequency is transformed to specific loudness, using exactly
the same processing steps as described in ISO 532-2:2017[9], 7.5. The specific loudness pattern for a
given ear at this stage is what would occur if there were no input to the other ear.
The excitation ratio $E/E_0$ is transformed to specific loudness N′ in sone/$\mathrm{ERB_n}$. The calculation of specific
loudness depends on two properties of the cochlea:
— excitation at the reference threshold of hearing, and
— gain of the cochlea for inputs with low sound pressure levels,
which are described in 7.5.2 and 7.5.3. Subclauses 7.5.4 to 7.5.7 describe the calculation procedure
based on the excitation ratio $E/E_0$.
7.5.2 Reference excitation at the reference threshold of hearing
The reference threshold of hearing is the lowest detectable sound pressure level of a sound in the
absence of any other sounds. The function relating the excitation level at the reference threshold of
hearing to frequency for monaural listening is specified in Table 2 (binaural listening is considered in
7.7). Interpolation is used to determine values at frequencies between those shown in the table. The
values given in Table 2 are values at the peak of the excitation pattern for sinusoidal signals, i.e., values
at the output of the auditory filter centred at the signal frequency. Above 500 Hz, the excitation ratio at
the reference threshold of hearing is constant. The peak excitation ratio produced by a sinusoidal signal
at threshold (for monaural listening) is denoted $E_\mathrm{THRQ}/E_0$. For frequencies of 500 Hz and above, the
value of $E_\mathrm{THRQ}/E_0$ is 2,307 (equivalent to an excitation level, $L_E$, of 3,63 dB).
Table 2 — Excitation level and value of 10lg G at the reference threshold of hearing for monaural
listening
(Values are constant for all centre frequencies above 500 Hz.)

Centre frequency    Excitation level at       10lg G at
                    reference threshold       reference threshold
Hz                  dB                        dB
50                  28,18                     -24,55
63                  23,90                     -20,27
80                  19,20                     -15,57
100                 15,68                     -12,05
125                 12,67                     -9,04
160                 10,09                     -6,46
200                 8,08                      -4,45
250                 6,30                      -2,67
315                 5,30                      -1,67
400                 4,50                      -0,87
500                 3,63                      0,00
630                 3,63                      0,00
750                 3,63                      0,00
800                 3,63                      0,00
1 000               3,63                      0,00
7.5.3 Gain of the cochlear amplifier for inputs with low sound pressure levels
The term G represents the gain of the cochlea for inputs with low sound pressure levels at a specific
frequency, relative to the gain at 500 Hz and above (which is assumed to be constant). The product of G
and $E_\mathrm{THRQ}/E_0$ at a specific frequency is independent of frequency. Column 3 of Table 2 shows the value
of G, expressed in decibels, for different frequencies.
EXAMPLE If $E_\mathrm{THRQ}/E_0$ is a factor of ten higher than the value at 500 Hz and above, then G is equal to 0,1. More
generally, if $E_\mathrm{THRQ}/E_0$ is a factor K higher than the value at 500 Hz and above, then G is equal to 1/K.
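The relationship between the two data columns of Table 2 can be illustrated as follows. The sketch assumes linear interpolation of the excitation level on a logarithmic frequency axis and clamping below 50 Hz; the text only states that interpolation is used, so both choices are assumptions, and the function names are hypothetical.

```python
import bisect
import math

# Table 2: centre frequency (Hz) and excitation level at reference threshold (dB).
FREQS_HZ = [50, 63, 80, 100, 125, 160, 200, 250, 315, 400, 500]
LEVELS_DB = [28.18, 23.90, 19.20, 15.68, 12.67, 10.09, 8.08, 6.30, 5.30, 4.50, 3.63]


def threshold_excitation_level_db(fc_hz):
    """Excitation level at the reference threshold (dB), monaural listening."""
    if fc_hz >= 500.0:
        return 3.63  # constant for 500 Hz and above
    if fc_hz <= 50.0:
        return LEVELS_DB[0]  # assumption: clamp below the lowest table entry
    k = bisect.bisect_right(FREQS_HZ, fc_hz)
    f0, f1 = FREQS_HZ[k - 1], FREQS_HZ[k]
    # Assumption: linear interpolation on a log-frequency axis.
    t = (math.log10(fc_hz) - math.log10(f0)) / (math.log10(f1) - math.log10(f0))
    return LEVELS_DB[k - 1] + t * (LEVELS_DB[k] - LEVELS_DB[k - 1])


def gain_db(fc_hz):
    """10lg G: gain relative to the value at 500 Hz and above, in dB."""
    return 3.63 - threshold_excitation_level_db(fc_hz)


def gain(fc_hz):
    """G as a linear factor."""
    return 10.0 ** (gain_db(fc_hz) / 10.0)
```

Because `gain_db` is the difference between 3,63 dB and the threshold excitation level, the product of G and $E_\mathrm{THRQ}/E_0$ stays at about 2,307 at every frequency, as stated in 7.5.3.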
7.5.4 Calculation of specific loudness from excitation when $E_\mathrm{THRQ}/E_0 \le E/E_0$
When the excitation evoked by the signal of interest at a specific centre frequency is greater than or
equal to the value of $E_\mathrm{THRQ}/E_0$ for that frequency, but less than or equal to $10^{10}$, which covers the range
of most practical applications, the specific loudness is calculated by Formula (7):
$$N' = C\left[ \left( G\,E/E_0 + A \right)^{\alpha} - A^{\alpha} \right] \qquad (7)$$
where C = 0,063 sone/Cam. The quantity $E/E_0$ is dimensionless. For frequencies of 500 Hz and above,
the value of α is equal to 0,2 and the value of A is equal to $2E_\mathrm{THRQ}/E_0$. A is dimensionless. Below 500 Hz,
the values of α and A are related to the value of G. The relationship of α to G is specified in Table 3. The
relationship of A to G is specified in Table 4. In these tables, G has been converted to decibel units.
Interpolation is used to determine values at frequencies between those shown in the table.
7.5.5 Calculation of specific loudness from excitation when $E_\mathrm{THRQ}/E_0 > E/E_0$
When the excitation evoked by the signal of interest at a specific centre frequency is less than the value
of $E_\mathrm{THRQ}/E_0$ for that frequency, the specific loudness is calculated as shown in Formula (8):
$$N' = C \left( \frac{2E}{E + E_\mathrm{THRQ}} \right)^{1{,}5} \left[ \left( G\,\frac{E}{E_0} + A \right)^{\alpha} - A^{\alpha} \right] \qquad (8)$$
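Formulae (7) and (8) can be illustrated together for centre frequencies of 500 Hz and above, where G = 1, α = 0,2 and A = $2E_\mathrm{THRQ}/E_0$. The sketch below is restricted to that case because Tables 3 and 4, which give α and A at lower frequencies, are not reproduced here. Excitation ratios above $10^{10}$ are handled by 7.5.6 and are rejected rather than computed. The function name is hypothetical.

```python
C = 0.063              # sone/Cam
E_THRQ_RATIO = 2.307   # E_THRQ/E_0 for 500 Hz and above
ALPHA = 0.2            # value of alpha for 500 Hz and above
A = 2.0 * E_THRQ_RATIO
G = 1.0                # gain relative to 500 Hz and above


def specific_loudness(e_ratio):
    """Specific loudness N' (sone/Cam) from the excitation ratio E/E_0.

    Valid only for centre frequencies of 500 Hz and above and E/E_0 <= 1e10.
    """
    if e_ratio > 1e10:
        raise ValueError("E/E_0 > 1e10 is covered by 7.5.6, not by this sketch")
    n = C * ((G * e_ratio + A) ** ALPHA - A ** ALPHA)  # Formula (7)
    if e_ratio < E_THRQ_RATIO:
        # Formula (8): extra factor below threshold, written with ratios to E_0.
        n *= (2.0 * e_ratio / (e_ratio + E_THRQ_RATIO)) ** 1.5
    return n
```

The additional factor in Formula (8) equals 1 when $E = E_\mathrm{THRQ}$, so the two formulae join continuously at threshold, and it drives the specific loudness to zero as the excitation approaches zero.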
7.5.6 Calculation of specific loudness from excitation when $E/E_0 > 10^{10}$
When the excitation ratio evoked by the signal of interest at a specific centre fre
...







