SIST ETS 300 726 E1:2003
Digital cellular telecommunications system; Enhanced Full Rate (EFR) speech transcoding (GSM 06.60)
The transcoding procedure specified in this ETS is applicable for the enhanced full rate speech traffic channel (TCH) in the GSM system. In GSM 06.51, a reference configuration for the speech transmission chain of the GSM EFR system is shown. According to this reference configuration, the speech encoder takes its input as a 13-bit uniform PCM signal either from the audio part of the Mobile Station or, on the network side, from the PSTN via an 8-bit/A-law to 13-bit uniform PCM conversion. The encoded speech at the output of the speech encoder is delivered to a channel encoder unit which is specified in GSM 05.03. In the receive direction, the inverse operations take place. This European Telecommunication Standard (ETS) describes the detailed mapping between input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 244 bits and from encoded blocks of 244 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 8 000 sample/s leading to a bit rate for the encoded bit stream of 12.2 kbit/s. The coding scheme is the so-called Algebraic Code Excited Linear Prediction Coder, hereafter referred to as ACELP. This ETS also specifies the conversion between A-law PCM and 13-bit uniform PCM. Performance requirements for the audio input and output parts are included only to the extent that they affect the transcoder performance. This part also describes the codec down to the bit level, thus enabling the verification of compliance to the part to a high degree of confidence by use of a set of digital test sequences. These test sequences are also described and are available on disks.
SLOVENIAN STANDARD
1 December 2003
Digital cellular telecommunications system; Enhanced Full Rate (EFR) speech
transcoding (GSM 06.60)
This Slovenian standard is identical to: ETS 300 726 Edition 1
ICS:
33.070.50 Global System for Mobile Communication (GSM)
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
EUROPEAN TELECOMMUNICATION STANDARD
ETS 300 726
March 1997
Source: ETSI TC-SMG
Reference: DE/SMG-020660
ICS: 33.020
Key words: EFR, digital cellular telecommunications system, Global System for Mobile communications (GSM), speech
GLOBAL SYSTEM FOR MOBILE COMMUNICATIONS
Digital cellular telecommunications system;
Enhanced Full Rate (EFR) speech transcoding
(GSM 06.60)
ETSI
European Telecommunications Standards Institute
ETSI Secretariat
Postal address: F-06921 Sophia Antipolis CEDEX - FRANCE
Office address: 650 Route des Lucioles - Sophia Antipolis - Valbonne - FRANCE
X.400: c=fr, a=atlas, p=etsi, s=secretariat - Internet: secretariat@etsi.fr
Tel.: +33 4 92 94 42 00 - Fax: +33 4 93 65 47 16
Copyright Notification: No part may be reproduced except as authorized by written permission. The copyright and the
foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 1997. All rights reserved.
ETS 300 726 (GSM 06.60 version 5.1.2): March 1997
Whilst every care has been taken in the preparation and publication of this document, errors in content,
typographical or otherwise, may occur. If you have comments concerning its accuracy, please write to
"ETSI Editing and Committee Support Dept." at the address shown on the title page.
Contents
Foreword
1 Scope
2 Normative references
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 Outline description
4.1 Functional description of audio parts
4.2 Preparation of speech samples
4.2.1 PCM format conversion
4.3 Principles of the GSM enhanced full rate speech encoder
4.4 Principles of the GSM enhanced full rate speech decoder
4.5 Sequence and subjective importance of encoded parameters
5 Functional description of the encoder
5.1 Pre-processing
5.2 Linear prediction analysis and quantization
5.2.1 Windowing and auto-correlation computation
5.2.2 Levinson-Durbin algorithm
5.2.3 LP to LSP conversion
5.2.4 LSP to LP conversion
5.2.5 Quantization of the LSP coefficients
5.2.6 Interpolation of the LSPs
5.3 Open-loop pitch analysis
5.4 Impulse response computation
5.5 Target signal computation
5.6 Adaptive codebook search
5.7 Algebraic codebook structure and search
5.8 Quantization of the fixed codebook gain
5.9 Memory update
6 Functional description of the decoder
6.1 Decoding and speech synthesis
6.2 Post-processing
6.2.1 Adaptive post-filtering
6.2.2 Up-scaling
7 Variables, constants and tables in the C-code of the GSM EFR codec
7.1 Description of the constants and variables used in the C code
8 Homing sequences
8.1 Functional description
8.2 Definitions
8.3 Encoder homing
8.4 Decoder homing
8.5 Encoder home state
8.6 Decoder home state
9 Bibliography
History
Foreword
This European Telecommunication Standard (ETS) has been produced by the Special Mobile Group
(SMG) Technical Committee of the European Telecommunications Standards Institute (ETSI).
This ETS describes the detailed mapping between input blocks of 160 speech samples in 13-bit uniform
PCM format to encoded blocks of 244 bits and from encoded blocks of 244 bits to output blocks of 160
reconstructed speech samples within the digital cellular telecommunications system.
This ETS corresponds to GSM technical specification, GSM 06.60, version 5.1.2.
Transposition dates
Date of adoption: 28 February 1997
Date of latest announcement of this ETS (doa): 30 June 1997
Date of latest publication of new National Standard or endorsement of this ETS (dop/e): 31 December 1997
Date of withdrawal of any conflicting National Standard (dow): 31 December 1997
1 Scope
This European Telecommunication Standard (ETS) describes the detailed mapping between input blocks
of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 244 bits and from encoded
blocks of 244 bits to output blocks of 160 reconstructed speech samples. The sampling rate is
8 000 sample/s leading to a bit rate for the encoded bit stream of 12,2 kbit/s. The coding scheme is the
so-called Algebraic Code Excited Linear Prediction Coder, hereafter referred to as ACELP.
This ETS also specifies the conversion between A-law PCM and 13-bit uniform PCM. Performance
requirements for the audio input and output parts are included only to the extent that they affect the
transcoder performance. This part also describes the codec down to the bit level, thus enabling the
verification of compliance to the part to a high degree of confidence by use of a set of digital test
sequences. These test sequences are described in GSM 06.54 [7] and are available on disks.
In case of discrepancy between the requirements described in this ETS and the fixed point computational
description (ANSI-C code) of these requirements contained in GSM 06.53 [6], the description in
GSM 06.53 [6] will prevail.
The transcoding procedure specified in this ETS is applicable for the enhanced full rate speech traffic
channel (TCH) in the GSM system.
In GSM 06.51 [5], a reference configuration for the speech transmission chain of the GSM enhanced full
rate (EFR) system is shown. According to this reference configuration, the speech encoder takes its input
as a 13-bit uniform PCM signal either from the audio part of the Mobile Station or on the network side,
from the PSTN via an 8-bit/A-law to 13-bit uniform PCM conversion. The encoded speech at the output of
the speech encoder is delivered to a channel encoder unit which is specified in GSM 05.03 [3]. In the
receive direction, the inverse operations take place.
2 Normative references
This ETS incorporates by dated and undated reference, provisions from other publications. These
normative references are cited at the appropriate places in the text and the publications are listed
hereafter. For dated references, subsequent amendments to or revisions of any of these publications
apply to this ETS only when incorporated in it by amendment or revision. For undated references, the
latest edition of the publication referred to applies.
[1] GSM 01.04 (ETR 100): "Digital cellular telecommunications system (Phase 2);
Abbreviations and acronyms".
[2] GSM 03.50 (ETS 300 540): "Digital cellular telecommunications system
(Phase 2); Transmission planning aspects of the speech service in the GSM
Public Land Mobile Network (PLMN) system".
[3] GSM 05.03 (ETS 300 575): "Digital cellular telecommunications system
(Phase 2); Channel coding".
[4] GSM 06.32 (ETS 300 580-6): "Digital cellular telecommunications system
(Phase 2); Voice Activity Detection (VAD)".
[5] GSM 06.51 (ETS 300 723): "Digital cellular telecommunications system;
Enhanced Full Rate (EFR) speech processing functions; General description".
[6] GSM 06.53 (ETS 300 724): "Digital cellular telecommunications system; ANSI-C
code for the GSM Enhanced Full Rate (EFR) speech codec".
[7] GSM 06.54 (ETS 300 725): "Digital cellular telecommunications system; Test
vectors for the GSM Enhanced Full Rate (EFR) speech codec".
[8] ITU-T Recommendation G.711 (1988): "Coding of analogue signals by pulse
code modulation; Pulse code modulation (PCM) of voice frequencies".
[9] ITU-T Recommendation G.726: "40, 32, 24, 16 kbit/s adaptive differential pulse
code modulation (ADPCM)".
3 Definitions, symbols and abbreviations
3.1 Definitions
For the purposes of this ETS, the following definitions apply:
adaptive codebook: The adaptive codebook contains excitation vectors that are adapted for every
subframe. The adaptive codebook is derived from the long term filter state. The
lag value can be viewed as an index into the adaptive codebook.
adaptive postfilter: This filter is applied to the output of the short term synthesis filter to enhance the
perceptual quality of the reconstructed speech. In the GSM enhanced full rate
codec, the adaptive postfilter is a cascade of two filters: a formant postfilter and
a tilt compensation filter.
algebraic codebook: A fixed codebook where an algebraic code is used to populate the excitation
vectors (innovation vectors). The excitation contains a small number of nonzero
pulses with predefined interlaced sets of positions.
closed-loop pitch analysis: This is the adaptive codebook search, i.e., a process of estimating the pitch
(lag) value from the weighted input speech and the long term filter state. In the
closed-loop search, the lag is searched using an error minimization loop
(analysis-by-synthesis). In the GSM enhanced full rate codec, closed-loop pitch
search is performed for every subframe.
direct form coefficients: One of the formats for storing the short term filter parameters. In the GSM
enhanced full rate codec, all filters which are used to modify speech samples
use direct form coefficients.
fixed codebook: The fixed codebook contains excitation vectors for speech synthesis filters. The
contents of the codebook are non-adaptive (i.e., fixed). In the GSM enhanced
full rate codec, the fixed codebook is implemented using an algebraic codebook.
fractional lags: A set of lag values having sub-sample resolution. In the GSM enhanced full rate
codec a sub-sample resolution of 1/6th of a sample is used.
frame: A time interval equal to 20 ms (160 samples at an 8 kHz sampling rate).
integer lags: A set of lag values having whole sample resolution.
interpolating filter: An FIR filter used to produce an estimate of sub-sample resolution samples,
given an input sampled with integer sample resolution.
inverse filter: This filter removes the short term correlation from the speech signal. The filter
models an inverse frequency response of the vocal tract.
lag: The long term filter delay. This is typically the true pitch period, or a multiple or
sub-multiple of it.
Line Spectral Frequencies: (see Line Spectral Pair)
Line Spectral Pair: Transformation of LPC parameters. Line Spectral Pairs are obtained by
decomposing the inverse filter transfer function A(z) into a set of two transfer
functions, one having even symmetry and the other having odd symmetry. The
Line Spectral Pairs (also called Line Spectral Frequencies) are the roots of
these polynomials on the z-unit circle.
LP analysis window: For each frame, the short term filter coefficients are computed using the high
pass filtered speech samples within the analysis window. In the GSM enhanced
full rate codec, the length of the analysis window is 240 samples. For each
frame, two asymmetric windows are used to generate two sets of LP
coefficients. No samples of the future frames are used (no lookahead).
LP coefficients: Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding
(LPC) coefficients) is a generic descriptive term for describing the short term
filter coefficients.
open-loop pitch search: A process of estimating the near optimal lag directly from the weighted speech
input. This is done to simplify the pitch analysis and confine the closed-loop
pitch search to a small number of lags around the open-loop estimated lags. In
the GSM enhanced full rate codec, open-loop pitch search is performed every
10 ms.
residual: The output signal resulting from an inverse filtering operation.
short term synthesis filter: This filter introduces, into the excitation signal, short term correlation which
models the impulse response of the vocal tract.
perceptual weighting filter: This filter is employed in the analysis-by-synthesis search of the codebooks.
The filter exploits the noise masking properties of the formants (vocal tract
resonances) by weighting the error less in regions near the formant frequencies
and more in regions away from them.
subframe: A time interval equal to 5 ms (40 samples at an 8 kHz sampling rate).
vector quantization: A method of grouping several parameters into a vector and quantizing them
simultaneously.
zero input response: The output of a filter due to past inputs, i.e. due to the present state of the filter,
given that an input of zeros is applied.
zero state response: The output of a filter due to the present input, given that no past inputs have
been applied, i.e., given the state information in the filter is all zeroes.
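The last two definitions can be illustrated with a small sketch. It assumes a hypothetical one-pole recursive filter (not one of the codec's filters) and shows that, for a linear filter, the actual output is the sum of the zero input response and the zero state response. The function name `one_pole` and the numeric values are illustrative only.

```python
def one_pole(x, a, state=0.0):
    """Run y[n] = x[n] + a * y[n-1], starting from y[-1] = state."""
    y = []
    prev = state
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y

x = [1.0, 0.5, -0.25, 0.0]   # illustrative input
a, state = 0.5, 2.0          # illustrative coefficient and filter state

zir = one_pole([0.0] * len(x), a, state)   # zero input, present state
zsr = one_pole(x, a, 0.0)                  # present input, zero state
full = one_pole(x, a, state)               # present input, present state

# Superposition: full[n] equals zir[n] + zsr[n] for every n.
```

This is why the zero input response can be subtracted from a signal ahead of a codebook search, leaving only the part that depends on the new excitation.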
3.2 Symbols
For the purposes of this ETS, the following symbols apply:
$A(z)$    The inverse filter with unquantized coefficients
$\hat{A}(z)$    The inverse filter with quantified coefficients
$H(z) = \dfrac{1}{\hat{A}(z)}$    The speech synthesis filter with quantified coefficients
$a_i$    The unquantized linear prediction parameters (direct form coefficients)
$\hat{a}_i$    The quantified linear prediction parameters
$m$    The order of the LP model
$\dfrac{1}{B(z)}$    The long-term synthesis filter
$W(z)$    The perceptual weighting filter (unquantized coefficients)
$\gamma_1, \gamma_2$    The perceptual weighting factors
$F_E(z)$    Adaptive pre-filter
$T$    The nearest integer pitch lag to the closed-loop fractional pitch lag of the subframe
$\beta$    The adaptive pre-filter coefficient (the quantified pitch gain)
$H_f(z) = \dfrac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)}$    The formant postfilter
$\gamma_n$    Control coefficient for the amount of the formant post-filtering
$\gamma_d$    Control coefficient for the amount of the formant post-filtering
$H_t(z)$    Tilt compensation filter
$\gamma_t$    Control coefficient for the amount of the tilt compensation filtering
$\mu = \gamma_t k_1'$    A tilt factor, with $k_1'$ being the first reflection coefficient
$h_f(n)$    The truncated impulse response of the formant postfilter
$L_h$    The length of $h_f(n)$
$r_h(i)$    The auto-correlations of $h_f(n)$
$\hat{A}(z/\gamma_n)$    The inverse filter (numerator) part of the formant postfilter
$1/\hat{A}(z/\gamma_d)$    The synthesis filter (denominator) part of the formant postfilter
$\hat{r}(n)$    The residual signal of the inverse filter $\hat{A}(z/\gamma_n)$
$h_t(z)$    Impulse response of the tilt compensation filter
$\beta_{sc}(n)$    The AGC-controlled gain scaling factor of the adaptive postfilter
$\alpha$    The AGC factor of the adaptive postfilter
$H_{h1}(z)$    Pre-processing high-pass filter
$w_I(n), w_{II}(n)$    LP analysis windows
$L_1^{(I)}$    Length of the first part of the LP analysis window $w_I(n)$
$L_2^{(I)}$    Length of the second part of the LP analysis window $w_I(n)$
$L_1^{(II)}$    Length of the first part of the LP analysis window $w_{II}(n)$
$L_2^{(II)}$    Length of the second part of the LP analysis window $w_{II}(n)$
$r_{ac}(k)$    The auto-correlations of the windowed speech $s'(n)$
$w_{lag}(i)$    Lag window for the auto-correlations (60 Hz bandwidth expansion)
$f_0$    The bandwidth expansion in Hz
$f_s$    The sampling frequency in Hz
$r'_{ac}(k)$    The modified (bandwidth expanded) auto-correlations
$E_{LD}(i)$    The prediction error in the ith iteration of the Levinson algorithm
$k_i$    The ith reflection coefficient
$a_j^{(i)}$    The jth direct form coefficient in the ith iteration of the Levinson algorithm
$F_1'(z)$    Symmetric LSF polynomial
$F_2'(z)$    Antisymmetric LSF polynomial
$F_1(z)$    Polynomial $F_1'(z)$ with root $z = -1$ eliminated
$F_2(z)$    Polynomial $F_2'(z)$ with root $z = 1$ eliminated
$q_i$    The line spectral pairs (LSPs) in the cosine domain
$\mathbf{q}$    An LSP vector in the cosine domain
$\hat{\mathbf{q}}_i^{(n)}$    The quantified LSP vector at the ith subframe of the frame n
$\omega_i$    The line spectral frequencies (LSFs)
$T_m(x)$    An mth order Chebyshev polynomial
$f_1(i), f_2(i)$    The coefficients of the polynomials $F_1(z)$ and $F_2(z)$
$f_1'(i), f_2'(i)$    The coefficients of the polynomials $F_1'(z)$ and $F_2'(z)$
$f(i)$    The coefficients of either $F_1(z)$ or $F_2(z)$
$C(x)$    Sum polynomial of the Chebyshev polynomials
$x$    Cosine of angular frequency $\omega$
$\lambda_k$    Recursion coefficients for the Chebyshev polynomial evaluation
$f_i$    The line spectral frequencies (LSFs) in Hz
$\mathbf{f}^t = [f_1\ f_2\ \ldots\ f_{10}]$    The vector representation of the LSFs in Hz
$\mathbf{z}^{(1)}(n), \mathbf{z}^{(2)}(n)$    The mean-removed LSF vectors at frame n
$\mathbf{r}^{(1)}(n), \mathbf{r}^{(2)}(n)$    The LSF prediction residual vectors at frame n
$\mathbf{p}(n)$    The predicted LSF vector at frame n
$\hat{\mathbf{r}}^{(2)}(n-1)$    The quantified second residual vector at the past frame
$\hat{\mathbf{f}}^k$    The quantified LSF vector at quantization index k
$E_{LSP}$    The LSP quantization error
$w_i,\ i = 1, \ldots, 10$    LSP-quantization weighting factors
$d_i$    The distance between the line spectral frequencies $f_{i+1}$ and $f_{i-1}$
$h(n)$    The impulse response of the weighted synthesis filter
$O_k$    The correlation maximum of open-loop pitch analysis at delay k
$O_{t_i},\ i = 1, \ldots, 3$    The correlation maxima at delays $t_i,\ i = 1, \ldots, 3$
$(M_i, t_i),\ i = 1, \ldots, 3$    The normalized correlation maxima $M_i$ and the corresponding delays $t_i,\ i = 1, \ldots, 3$
$H(z)W(z) = \dfrac{A(z/\gamma_1)}{\hat{A}(z)A(z/\gamma_2)}$    The weighted synthesis filter
$A(z/\gamma_1)$    The numerator of the perceptual weighting filter
$1/A(z/\gamma_2)$    The denominator of the perceptual weighting filter
$T_1$    The nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe
$s'(n)$    The windowed speech signal
$s_w(n)$    The weighted speech signal
$\hat{s}(n)$    Reconstructed speech signal
$\hat{s}'(n)$    The gain-scaled post-filtered signal
$\hat{s}_f(n)$    Post-filtered speech signal (before scaling)
$x(n)$    The target signal for adaptive codebook search
$x_2(n), \mathbf{x}_2^t$    The target signal for algebraic codebook search
$res_{LP}(n)$    The LP residual signal
$c(n)$    The fixed codebook vector
$v(n)$    The adaptive codebook vector
$y(n) = v(n) * h(n)$    The filtered adaptive codebook vector
$y_k(n)$    The past filtered excitation
$u(n)$    The excitation signal
$\hat{u}(n)$    The emphasized adaptive codebook vector
$\hat{u}'(n)$    The gain-scaled emphasized excitation signal
$T_{op}$    The best open-loop lag
$t_{min}$    Minimum lag search value
$t_{max}$    Maximum lag search value
$R(k)$    Correlation term to be maximized in the adaptive codebook search
$b_{24}$    The FIR filter for interpolating the normalized correlation term $R(k)$
$R(k)_t$    The interpolated value of $R(k)$ for the integer delay k and fraction t
$b_{60}$    The FIR filter for interpolating the past excitation signal $u(n)$ to yield the adaptive codebook vector $v(n)$
$A_k$    Correlation term to be maximized in the algebraic codebook search at index k
$C_k$    The correlation in the numerator of $A_k$ at index k
$E_{D_k}$    The energy in the denominator of $A_k$ at index k
$\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$    The correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, i.e., backward filtered target
$\mathbf{H}$    The lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \ldots, h(39)$
$\Phi = \mathbf{H}^t \mathbf{H}$    The matrix of correlations of $h(n)$
$d(n)$    The elements of the vector $\mathbf{d}$
$\phi(i,j)$    The elements of the symmetric matrix $\Phi$
$\mathbf{c}_k$    The innovation vector
$C$    The correlation in the numerator of $A_k$
$m_i$    The position of the ith pulse
$\vartheta_i$    The amplitude of the ith pulse
$N_p$    The number of pulses in the fixed codebook excitation
$E_D$    The energy in the denominator of $A_k$
$res_{LTP}(n)$    The normalized long-term prediction residual
$b(n)$    The sum of the normalized $d(n)$ vector and normalized long-term prediction residual $res_{LTP}(n)$
$s_b(n)$    The sign signal for the algebraic codebook search
$d'(n)$    Sign extended backward filtered target
$\phi'(i,j)$    The modified elements of the matrix $\Phi$, including sign information
$\mathbf{z}^t, z(n)$    The fixed codebook vector convolved with $h(n)$
$E(n)$    The mean-removed innovation energy (in dB)
$\bar{E}$    The mean of the innovation energy
$\tilde{E}(n)$    The predicted energy
$[b_1\ b_2\ b_3\ b_4]$    The MA prediction coefficients
$\hat{R}(k)$    The quantified prediction error at subframe k
$E_I$    The mean innovation energy
$R(n)$    The prediction error of the fixed-codebook gain quantization
$E_Q$    The quantization error of the fixed-codebook gain quantization
$e(n)$    The states of the synthesis filter $1/\hat{A}(z)$
$e_w(n)$    The perceptually weighted error of the analysis-by-synthesis search
$\eta$    The gain scaling factor for the emphasized excitation
$g_c$    The fixed-codebook gain
$g_c'$    The predicted fixed-codebook gain
$\hat{g}_c$    The quantified fixed codebook gain
$g_p$    The adaptive codebook gain
$\hat{g}_p$    The quantified adaptive codebook gain
$\gamma_{gc} = g_c / g_c'$    A correction factor between the gain $g_c$ and the estimated one $g_c'$
$\hat{\gamma}_{gc}$    The optimum value for $\gamma_{gc}$
$\gamma_{sc}$    Gain scaling factor
3.3 Abbreviations
For the purposes of this ETS, the following abbreviations apply. Further GSM related abbreviations may
be found in GSM 01.04 [1].
ACELP Algebraic Code Excited Linear Prediction
AGC Adaptive Gain Control
CELP Code Excited Linear Prediction
FIR Finite Impulse Response
ISPP Interleaved Single-Pulse Permutation
LP Linear Prediction
LPC Linear Predictive Coding
LSF Line Spectral Frequency
LSP Line Spectral Pair
LTP Long Term Predictor (or Long Term Prediction)
MA Moving Average
4 Outline description
This ETS is structured as follows:
Section 4.1 contains a functional description of the audio parts including the A/D and D/A functions.
Section 4.2 describes the conversion between 13-bit uniform and 8-bit A-law samples. Sections 4.3 and
4.4 present a simplified description of the principles of the GSM EFR encoding and decoding process
respectively. In subclause 4.5, the sequence and subjective importance of encoded parameters are given.
Section 5 presents the functional description of the GSM EFR encoding, whereas clause 6 describes the
decoding procedures. Section 7 describes variables, constants and tables of the C-code of the GSM EFR
codec.
4.1 Functional description of audio parts
The analogue-to-digital and digital-to-analogue conversion will in principle comprise the following
elements:
1) Analogue to uniform digital PCM
− microphone;
− input level adjustment device;
− input anti-aliasing filter;
− sample-hold device sampling at 8 kHz;
− analogue−to−uniform digital conversion to 13−bit representation.
The uniform format shall be represented in two's complement.
2) Uniform digital PCM to analogue
− conversion from 13−bit/8 kHz uniform PCM to analogue;
− a hold device;
− reconstruction filter including x/sin( x ) correction;
− output level adjustment device;
− earphone or loudspeaker.
In the terminal equipment, the A/D function may be achieved either
− by direct conversion to 13-bit uniform PCM format;
− or by conversion to 8-bit/A-law companded format, based on a standard A-law codec/filter
according to ITU-T Recommendations G.711 [8] and G.714, followed by the 8-bit to 13-bit
conversion as specified in subclause 4.2.1.
For the D/A operation, the inverse operations take place.
In the latter case it should be noted that the specifications in ITU-T G.714 (superseded by G.712) are
concerned with PCM equipment located in the central parts of the network. When used in the terminal
equipment, this ETS does not on its own ensure sufficient out-of-band attenuation. The specification of
out-of-band signals is defined in GSM 03.50 [2] in clause 2.
4.2 Preparation of speech samples
The encoder is fed with data comprising samples with a resolution of 13 bits left justified in a 16-bit
word. The three least significant bits are set to '0'. The decoder outputs data in the same format. Outside
the speech codec further processing must be applied if the traffic data occurs in a different representation.
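The sample format described above can be sketched as follows. The helper names `left_justify_13bit` and `right_justify_13bit` are hypothetical, and Python integers stand in for 16-bit words; this is an illustration of the layout, not part of the reference C code.

```python
def left_justify_13bit(sample_13):
    """Place a 13-bit two's complement sample in the top of a 16-bit word."""
    if not -4096 <= sample_13 <= 4095:
        raise ValueError("sample outside the 13-bit two's complement range")
    return sample_13 << 3        # the three least significant bits become '0'

def right_justify_13bit(word_16):
    """Recover the 13-bit sample; the arithmetic shift preserves the sign."""
    return word_16 >> 3
```

A sample that arrives right-justified would therefore need a shift by three bits before it is handed to the encoder.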
4.2.1 PCM format conversion
The conversion between 8-bit A-Law compressed data and linear data with 13-bit resolution at the speech
encoder input shall be as defined in ITU-T Rec. G.711 [8].
ITU-T Rec. G.711 [8] specifies the A-Law to linear conversion and vice versa by providing table entries.
Examples on how to perform the conversion by fixed-point arithmetic can be found in ITU-T Rec. G.726
[9]. Section 4.2.1 of G.726 [9] describes A-Law to linear expansion and subclause 4.2.7 of G.726 [9]
provides a solution for linear to A-Law compression.
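As an illustration of the A-law to linear expansion, the sketch below uses a widely circulated bit-manipulation form of the G.711 decoder. It is an assumption that this form reproduces the G.711 table entries; the tables in G.711 [8] remain the normative definition, and the function name `alaw_to_linear` is hypothetical. The result is a signed value whose three least significant bits are zero, which matches the left-justified 13-bit format of subclause 4.2.

```python
def alaw_to_linear(code):
    """Expand one 8-bit A-law code to a signed 16-bit word (13-bit precision)."""
    code ^= 0x55                       # undo the even-bit inversion of A-law
    mantissa = (code & 0x0F) << 4      # 4-bit step within the segment
    segment = (code & 0x70) >> 4       # 3-bit segment (chord) number
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # After the inversion, a set sign bit denotes a positive sample.
    return magnitude if code & 0x80 else -magnitude
```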
4.3 Principles of the GSM enhanced full rate speech encoder
The codec is based on the code-excited linear predictive (CELP) coding model. A 10th order linear
prediction (LP), or short-term, synthesis filter is used which is given by:
$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}, \qquad (1)$$
where $\hat{a}_i$, $i = 1, \ldots, m$, are the (quantified) linear prediction (LP) parameters, and $m = 10$ is the predictor
order. The long-term, or pitch, synthesis filter is given by:
$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}, \qquad (2)$$
where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented using the
so-called adaptive codebook approach.
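The two synthesis filters of equations (1) and (2) can be written as difference equations. The following is a floating-point sketch only: it assumes an integer pitch delay and zero initial filter memory, whereas the codec uses fractional lags, an adaptive codebook and fixed-point arithmetic. The function names are hypothetical.

```python
def lp_synthesis(excitation, a_hat):
    """Filter through 1/A(z), with A(z) = 1 + sum_i a_hat[i-1] * z^-i."""
    memory = [0.0] * len(a_hat)        # s(n-1), ..., s(n-m), initially zero
    out = []
    for u in excitation:
        s = u - sum(a * past for a, past in zip(a_hat, memory))
        out.append(s)
        memory = [s] + memory[:-1]     # newest output sample first
    return out

def pitch_synthesis(excitation, gain, lag):
    """Filter through 1/B(z) = 1 / (1 - gain * z^-lag), integer lag only."""
    out = []
    for n, u in enumerate(excitation):
        past = out[n - lag] if n >= lag else 0.0
        out.append(u + gain * past)
    return out
```

With a unit impulse as input, `pitch_synthesis` repeats the pulse every `lag` samples with geometrically decaying amplitude, which is the periodicity the long-term filter models.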
The CELP speech synthesis model is shown in figure 2. In this model, the excitation signal at the input of
the short-term LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed
(innovative) codebooks. The speech is synthesized by feeding the two properly chosen vectors from these
codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is
chosen using an analysis-by-synthesis search procedure in which the error between the original and
synthesized speech is minimized according to a perceptually weighted distortion measure.
The perceptual weighting filter used in the analysis-by-synthesis search technique is given by:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad (3)$$
where $A(z)$ is the unquantized LP filter and $0 < \gamma_2 < \gamma_1 \le 1$ are the perceptual weighting factors. The
values $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$ are used. The weighting filter uses the unquantized LP parameters while
the formant synthesis filter uses the quantified ones.
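Since $A(z/\gamma) = 1 + \sum_{i=1}^{m} a_i \gamma^i z^{-i}$, the coefficients of the numerator and denominator of $W(z)$ follow from the LP coefficients by scaling the ith coefficient with $\gamma^i$. A minimal sketch, with hypothetical function names and floating-point values rather than the fixed-point format of the reference code:

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i is scaled by gamma**i for i = 1..m."""
    return [coef * gamma ** (i + 1) for i, coef in enumerate(a)]

def weighting_filter_coeffs(a, gamma1=0.9, gamma2=0.6):
    """Return (numerator, denominator) coefficients of W(z), leading 1 omitted."""
    return bandwidth_expand(a, gamma1), bandwidth_expand(a, gamma2)
```

Because $\gamma_2 < \gamma_1$, the denominator coefficients decay faster with the index than those of the numerator.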
The coder operates on speech frames of 20 ms corresponding to 160 samples at the sampling frequency
of 8 000 sample/s. At each 160 speech samples, the speech signal is analysed to extract the parameters
of the CELP model (LP filter coefficients, adaptive and fixed codebooks' indices and gains). These
parameters are encoded and transmitted. At the decoder, these parameters are decoded and speech is
synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.
The signal flow at the encoder is shown in figure 3. LP analysis is performed twice per frame. The two
sets of LP parameters are converted to line spectrum pairs (LSP) and jointly quantified using split matrix
quantization (SMQ) with 38 bits. The speech frame is divided into 4 subframes of 5 ms each
(40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The two sets
of quantified and unquantized LP filters are used for the second and fourth subframes while in the first and
third subframes interpolated LP filters are used (both quantified and unquantized). An open-loop pitch lag
is estimated twice per frame (every 10 ms) based on the perceptually weighted speech signal.
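The frame structure described above can be sketched as follows. The names are hypothetical, and the labels returned by `lp_filter_source` only restate which subframes use the analysed LP sets directly and which use interpolated ones; the interpolation itself is not shown.

```python
FRAME_LEN = 160      # 20 ms at 8 000 sample/s
SUBFRAME_LEN = 40    # 5 ms

def split_into_subframes(frame):
    """Split one 160-sample frame into four 40-sample subframes."""
    if len(frame) != FRAME_LEN:
        raise ValueError("expected a 160-sample frame")
    return [frame[i:i + SUBFRAME_LEN] for i in range(0, FRAME_LEN, SUBFRAME_LEN)]

def lp_filter_source(subframe_index):
    """Zero-based index: 2nd and 4th subframes use the analysed LP sets."""
    return "analysed" if subframe_index % 2 == 1 else "interpolated"
```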
Then the following operations are repeated for each subframe:
- The target signal $x(n)$ is computed by filtering the LP residual through the weighted synthesis filter $W(z)H(z)$ with the initial states of the filters having been updated by filtering the error between LP residual and excitation (this is equivalent to the common approach of subtracting the zero input response of the weighted synthesis filter from the weighted speech signal).
- The impulse response, $h(n)$, of the weighted synthesis filter is computed.
- Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target $x(n)$ and impulse response $h(n)$, by searching around the open-loop pitch lag. Fractional pitch with 1/6th of a sample resolution is used. The pitch lag is encoded with 9 bits in the first and third subframes and relatively encoded with 6 bits in the second and fourth subframes.
- The target signal $x(n)$ is updated by removing the adaptive codebook contribution (filtered adaptive codevector), and this new target, $x_2(n)$, is used in the fixed algebraic codebook search (to find the optimum innovation). An algebraic codebook with 35 bits is used for the innovative excitation.
- The gains of the adaptive and fixed codebook are scalar quantified with 4 and 5 bits respectively (with moving average (MA) prediction applied to the fixed codebook gain).
- Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in the next subframe.
The bit allocation of the codec is shown in table 1. In each 20 ms speech frame, 244 bits are produced,
corresponding to a bit rate of 12.2 kbit/s. More detailed bit allocation is available in table 6. Note that the
most significant bits (MSB) are always sent first.
Table 1: Bit allocation of the 12.2 kbit/s coding algorithm for 20 ms frame

Parameter        1st & 3rd subframes   2nd & 4th subframes   Total per frame
2 LSP sets                                                    38
Pitch delay      9                     6                      30
Pitch gain       4                     4                      16
Algebraic code   35                    35                     140
Codebook gain    5                     5                      20
Total                                                         244
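The totals in table 1 can be checked with a few lines of arithmetic. The dictionary layout and names below are illustrative; the bit counts are those of the table.

```python
BITS_PER_SUBFRAME = {            # (1st & 3rd subframes, 2nd & 4th subframes)
    "pitch_delay":    (9, 6),
    "pitch_gain":     (4, 4),
    "algebraic_code": (35, 35),
    "codebook_gain":  (5, 5),
}
LSP_BITS = 38                    # two LSP sets, quantified jointly per frame
FRAME_MS = 20

def bits_per_frame():
    """Per-parameter bit totals for one 20 ms frame (four subframes)."""
    totals = {name: 2 * odd + 2 * even
              for name, (odd, even) in BITS_PER_SUBFRAME.items()}
    totals["lsp_sets"] = LSP_BITS
    return totals

total_bits = sum(bits_per_frame().values())
bit_rate_kbps = total_bits / FRAME_MS     # bits per millisecond equals kbit/s
```

The sum is 38 + 30 + 16 + 140 + 20 = 244 bits, and 244 bits every 20 ms gives 12.2 kbit/s.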
4.4 Principles of the GSM enhanced full rate speech decoder
The signal flow at the decoder is shown in figure 4. At the decoder, the transmitted indices are extracted
from the received bitstream. The indices are decoded to obtain the coder parameters at each
transmission frame. These parameters are the two LSP vectors, the 4 fractional pitch lags, the 4
innovative codevectors, and the 4 sets of pitch and innovative gains. The LSP vectors are converted to the
LP filter coefficients and interpolated to obtain LP filters at each subframe. Then, at each 40-sample
subframe:
- the excitation is constructed by adding the adaptive and innovative codevectors scaled by their
respective gains;
- the speech is reconstructed by filtering the excitation through the LP synthesis filter.
Finally, the reconstructed speech signal is passed through an adaptive postfilter.
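The two per-subframe decoder steps can be sketched in floating point as follows. This is an illustration only: it assumes the LP filter is written as A(z) = 1 + sum of a_k z^-k, so that the synthesis filter is 1/A(z); the function name and argument layout are hypothetical; and the adaptive postfilter is not included.

```python
def synthesize_subframe(v, c, g_p, g_c, a, mem):
    """Reconstruct one subframe of speech (floating-point sketch).

    v, c     : adaptive and innovative codevectors (equal length)
    g_p, g_c : pitch (adaptive) gain and innovative (fixed codebook) gain
    a        : LP coefficients [a_1, ..., a_m] of A(z) = 1 + sum a_k z^-k
    mem      : last m reconstructed samples, most recent last
    """
    # Excitation: the two codevectors scaled by their respective gains.
    u = [g_p * vn + g_c * cn for vn, cn in zip(v, c)]

    # LP synthesis filter 1/A(z): s(n) = u(n) - sum_k a_k s(n - k).
    history = list(mem)
    out = []
    for un in u:
        sn = un - sum(ak * history[-k] for k, ak in enumerate(a, start=1))
        out.append(sn)
        history.append(sn)

    # Return the speech and the updated filter memory for the next subframe.
    return out, history[-len(a):]
```

The returned memory carries the synthesis filter state across subframe boundaries, so that consecutive 40-sample subframes join without discontinuity.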
4.5 Sequence and subjective importance of encoded parameters
The encoder will produce the output information in a unique sequence and format, and the decoder must
receive the same information in the same way. In table 6, the sequence of output bits s1 to s244 and the
bit allocation for each parameter is shown.
The different parameters of the encoded speech and their individual bits have unequal importance with
respect to subjective quality. Before being submitted to the channel encoding function the bits have to be
rearranged in the sequence of importance as given in table 6 in GSM 05.03 [3].
5 Functional description of the encoder
In this clause, the different functions of the encoder represented in figure 3 are described.
5.1 Pre-processing
Two pre-processing functions are applied prior to the encoding process: high-pass filtering and signal
down-scaling.
Down-scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the
fixed-point implementation.
The high-pass filter serves as a precaution against undesired low frequency components. A filter with a
cut-off frequency of 80 Hz is used, and it is given by:
$$H_{h1}(z) = \frac{0.92727435 - 1.8544941\, z^{-1} + 0.92727435\, z^{-2}}{1 - 1.9059465\, z^{-1} + 0.9114024\, z^{-2}}. \tag{4}$$
Down-scaling and high-pass filtering are combined by dividing the coefficients at the numerator of
$H_{h1}(z)$ by 2.
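A floating-point sketch of the combined pre-processing step is given below. It assumes a direct-form realisation of equation (4) with the numerator halved; the codec itself is specified in fixed-point arithmetic, which this sketch does not reproduce, and the function name `preprocess` is illustrative.

```python
# Coefficients of H_h1(z) from equation (4).
B = (0.92727435, -1.8544941, 0.92727435)   # numerator
A = (1.0, -1.9059465, 0.9114024)           # denominator


def preprocess(samples, state=(0.0, 0.0, 0.0, 0.0)):
    """High-pass filter and down-scale by 2 (floating-point sketch).

    state holds (x[n-1], x[n-2], y[n-1], y[n-2]) from the previous call.
    """
    # Down-scaling is folded into the filter by halving the numerator.
    b0, b1, b2 = (b / 2.0 for b in B)
    x1, x2, y1, y2 = state
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - A[1] * y1 - A[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out, (x1, x2, y1, y2)
```

Passing the returned state into the next call lets the filter run continuously across frame boundaries.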
5.2 Linear prediction analysis and quantization
Short-term prediction, or linear prediction (LP), analysis is performed twice per speech frame using the
auto-correlation approach with 30 ms asymmetric windows. No lookahead is used in the auto-correlation
computation.
The auto-correlations of windowed speech are converted to the LP coefficients using the Levinson-Durbin
algorithm. Then the LP coefficients are transformed to the Line Spectral Pair (LSP) domain for
quantization and interpolation purposes. The interpolated quantized and unquantized filter coefficients are
converted back to the LP filter coefficients (to construct the synthesis and weighting filters at each
subframe).
5.2.1 Windowing and auto-correlation computation
LP analysis is performed twice per frame using two different asymmetric windows. The first window has its
weight concentrated at the second subframe and it consists of two halves of Hamming windows with
different sizes. The window is given by:
$$w_I(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{\pi n}{L_1^{(I)} - 1}\right), & n = 0, \ldots, L_1^{(I)} - 1, \\[2ex] 0.54 + 0.46 \cos\left(\dfrac{\pi \left(n - L_1^{(I)}\right)}{L_2^{(I)} - 1}\right), & n = L_1^{(I)}, \ldots, L_1^{(I)} + L_2^{(I)} - 1. \end{cases} \tag{5}$$
The values $L_1^{(I)} = 160$ and $L_2^{(I)} = 80$ are used. The second window has its weight concentrated at
the fourth subframe and it consists of two parts: the first part is half a Hamming window and the second
part is a quarter of a cosine function cycle. The window is given by:
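$$w_{II}(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2 \pi n}{2 L_1^{(II)} - 1}\right), & n = 0, \ldots, L_1^{(II)} - 1, \\[2ex] \cos\left(\dfrac{2 \pi \left(n - L_1^{(II)}\right)}{4 L_2^{(II)} - 1}\right), & n = L_1^{(II)}, \ldots, L_1^{(II)} + L_2^{(II)} - 1, \end{cases} \tag{6}$$
where the values $L_1^{(II)} = 232$ and $L_2^{(II)} = 8$ are used.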
Note that both LP analyses are performed on the same set of speech samples. The windows are applied
to 80 samples from the past speech frame in addition to the 160 samples of the present speech frame. No
samples from future frames are used (no lookahead). A diagram of the two LP analysis windows is
depicted below.
[Figure: the two LP analysis windows $w_I(n)$ and $w_{II}(n)$ plotted against time $t$ over frame n-1 and frame n; each frame is 20 ms (160 samples) and each subframe is 5 ms (40 samples).]
Figure 1: LP analysis windows
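The two analysis windows of equations (5) and (6) can be generated directly from their definitions. The sketch below uses floating point and the window lengths stated in the text; the function names `window_I` and `window_II` are illustrative.

```python
import math


def window_I(L1=160, L2=80):
    """First LP analysis window, equation (5): two Hamming halves."""
    rising = [0.54 - 0.46 * math.cos(math.pi * n / (L1 - 1))
              for n in range(L1)]
    falling = [0.54 + 0.46 * math.cos(math.pi * (n - L1) / (L2 - 1))
               for n in range(L1, L1 + L2)]
    return rising + falling


def window_II(L1=232, L2=8):
    """Second LP analysis window, equation (6): Hamming half + cosine quarter."""
    rising = [0.54 - 0.46 * math.cos(2 * math.pi * n / (2 * L1 - 1))
              for n in range(L1)]
    falling = [math.cos(2 * math.pi * (n - L1) / (4 * L2 - 1))
               for n in range(L1, L1 + L2)]
    return rising + falling


w_I = window_I()
w_II = window_II()
```

Both windows span 240 samples (80 from the past frame plus the 160 of the present frame). With these lengths, `w_I` peaks around samples 159 and 160, at the end of the second subframe of the present frame, and `w_II` peaks at sample 232, inside the fourth subframe.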
The auto-correlations of the windowed speech $s'(n)$, $n = 0, \ldots, 239$, are computed by:
$$r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \qquad k = 0, \ldots, 10, \tag{7}$$
and a 60 Hz bandwidth expansion is used by lag windowing the auto-correlations using the window:
$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2 \pi f_0 i}{f_s}\right)^2\right], \qquad i = 1, \ldots, 10, \tag{8}$$
where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 8000$ Hz is the sampling frequency. Further,
$r_{ac}(0)$ is multiplied by the white noise correction factor 1.0001 which is equivalent to adding a noise floor
at -40 dB.
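Equations (7) and (8), together with the white noise correction, can be sketched as follows. This is a floating-point illustration under the stated constants; the function names are hypothetical and the fixed-point scaling of the specified codec is not modelled.

```python
import math


def autocorrelation(s_w, order=10):
    """Auto-correlations r_ac(0..order) of the windowed speech, equation (7)."""
    n_samples = len(s_w)
    return [sum(s_w[n] * s_w[n - k] for n in range(k, n_samples))
            for k in range(order + 1)]


def lag_window(order=10, f0=60.0, fs=8000.0):
    """Lag window w_lag(1..order), equation (8)."""
    return [math.exp(-0.5 * (2.0 * math.pi * f0 * i / fs) ** 2)
            for i in range(1, order + 1)]


def modified_autocorrelation(s_w, order=10):
    """Apply white noise correction to lag 0 and lag windowing to lags 1..order."""
    r = autocorrelation(s_w, order)
    w = lag_window(order)
    return [1.0001 * r[0]] + [r[k] * w[k - 1] for k in range(1, order + 1)]
```

The lag window decreases monotonically with the lag index, so higher-order auto-correlations are attenuated slightly more, which widens the spectral peaks of the resulting LP filter.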
5.2.2 Levinson-Durbin algorithm
The modified auto-correlations $r'_{ac}(0) = 1.0001\, r_{ac}(0)$ and $r'_{ac}(k) = r_{ac}(k)\, w_{lag}(k)$, $k = 1, \ldots, 10$, are
used to obtain the direct form LP filter coefficients $a_k$, $k = 1, \ldots, 10$, by solving the set of equations
$$\sum_{k=1}^{10} a_k\, r'_{ac}(|i - k|) = -r'_{ac}(i), \qquad i = 1, \ldots, 10. \tag{9}$$
The set of equations in (9) is solved using the Levinson-Durbin algorithm. This algorithm uses the
following recursion:

    $E_{LD}(0) = r'_{ac}(0)$
    for $i = 1$ to $10$ do
        $a_0^{(i-1)} = 1$
        $k_i = -\left[ \sum_{j=0}^{i-1} a_j^{(i-1)}\, r'_{ac}(i - j) \right] / E_{LD}(i - 1)$
        $a_i^{(i)} = k_i$
        for $j = 1$ to $i - 1$ do
            $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$
        end
        $E_{LD}(i) = (1 - k_i^2)\, E_{LD}(i - 1)$
    end

The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \ldots, 10$.
The LP filter coefficients are converted to the line spectral pair (LSP) representation for quantization and
interpolation purposes. The conversions to the LSP domain and back to the LP filter coefficient domain
are described in the next clause.
5.2.3 LP to LSP conversion
The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for
quantization and interpolation purposes. For a 10th order LP filter, the LSPs are defined as the roots of the
sum and difference polynomials:
$$F'_1(z) = A(z) + z^{-11} A(z^{-1}) \tag{10}$$
and
$$F'_2(z) = A(z) - z^{-11} A(z^{-1}), \tag{11}$$
respectively. The polynomials $F'_1(z)$ and $F'_2(z)$ are symmetric and anti-symmetric, respectively. It can
be proven that all roots of these polynomials are on the unit circle and that they alternate with each other. $F'_1(z)$
has a root $z = -1$ ($\omega = \pi$) and $F'_2(z)$ has a root $z = 1$ ($\omega = 0$). To eliminate these two roots, we
define the new polynomials:
$$F_1(z) = F'_1(z) / (1 + z^{-1}) \tag{12}$$
and
$$F_2(z) = F'_2(z) / (1 - z^{-1}). \tag{13}$$
Each polynomial has 5 conjugate roots on the unit circle ($e^{\pm j\omega_i}$), therefore, the polynomials can be written
as
$$F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2 q_i z^{-1} + z^{-2}\right) \tag{14}$$
and
$$F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \tag{15}$$
where $q_i = \cos(\omega_i)$ with $\omega_i$ being the line spectral frequencies (LSF) and they satisfy the ordering
property $0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. We refer to $q_i$ as the LSPs in the cosine domain.
Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric only the first 5 coefficients of each polynomial
need to be computed. The coefficients of these polynomials are found by the recursive relations (for $i = 0$
to 4):
$$\begin{aligned} f_1(i+1) &= a_{i+1} + a_{m-i} - f_1(i), \\ f_2(i+1) &= a_{i+1} - a_{m-i} + f_2(i), \end{aligned} \tag{16}$$
where $m = 10$ is the predictor order.
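The recursion in (16) can be sketched as below. The starting values f_1(0) = f_2(0) = 1 are an assumption of this sketch: they are not stated in the text above, but follow from A(z) having a leading coefficient of 1 in equations (10) to (13). The function name is illustrative.

```python
def sum_diff_coefficients(a, m=10):
    """Coefficients f1(0..5) and f2(0..5) of F_1(z) and F_2(z), equation (16).

    a : LP coefficients [1, a_1, ..., a_m] (a[0] is the leading 1 of A(z)).
    """
    # Assumed initial values: f1(0) = f2(0) = 1 (leading coefficient of A(z)).
    f1 = [1.0]
    f2 = [1.0]
    for i in range(m // 2):
        f1.append(a[i + 1] + a[m - i] - f1[i])
        f2.append(a[i + 1] - a[m - i] + f2[i])
    return f1, f2


# Example: A(z) = 1, i.e. all prediction coefficients are zero.
f1, f2 = sum_diff_coefficients([1.0] + [0.0] * 10)
```

For A(z) = 1 the sum polynomial reduces to (1 + z^-11)/(1 + z^-1), whose coefficients alternate between 1 and -1, and the difference polynomial reduces to (1 - z^-11)/(1 - z^-1), whose coefficients are all 1.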
The LSPs are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 points equally spaced
between 0 and $\pi$ and checking for sign changes. A sign change signifies the existence of a root and the
sign change interval is then divided 4 times to better track the root. The Chebyshev polynomials are used
to evaluate $F_1(z)$ and $F_2(z)$. In this method the roots are found directly in the cosine domain $\{q_i\}$. The
polynomials $F_1(z)$ or $F_2(z)$ evaluated at $z = e^{j\omega}$ can be written as:
$$F(\omega) = 2 e^{-j5\omega}\, C(x),$$
with:
$$C(x) = T_5(x) + f(1) T_4(x) + f(2) T_3(x) + f(3) T_2(x) + f(4) T_1(x) + f(5)/2, \tag{17}$$
where $T_m(x) = \cos(m\omega)$ is the mth order Chebyshev polynomial, and $f(i)$, $i = 1, \ldots, 5$, are the
coefficients of either $F_1(z)$ or $F_2(z)$, computed using the equations in (16). The polynomial $C(x)$ is
evaluated at a certain value of $x = \cos(\omega)$ using the recursive relation:

    for $k = 4$ down to $1$
        $\lambda_k = 2x\lambda_{k+1} - \lambda_{k+2} + f(5 - k)$
    end
    $C(x) = x\lambda_1 - \lambda_2 + f(5)/2$,

with initial values $\lambda_5 = 1$ and $\lambda_6 = 0$. The details of the Chebyshev polynomial evaluation method are
found in P. Kabal and R.P. Ramachandran [6].
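The evaluation recursion and the grid search can be sketched together as follows. Several choices here are assumptions of the sketch rather than statements of this ETS: the grid is taken as 60 equal intervals in the frequency variable, the interval is bisected in the cosine domain, the midpoint of the final interval is returned as the root, and each polynomial is searched independently. All function names are illustrative.

```python
import math


def chebyshev_eval(x, f):
    """Evaluate C(x) of equation (17); f = [f(0), f(1), ..., f(5)]."""
    lam_next, lam_next2 = 1.0, 0.0               # lambda_5, lambda_6
    for k in range(4, 0, -1):
        lam = 2.0 * x * lam_next - lam_next2 + f[5 - k]
        lam_next, lam_next2 = lam, lam_next
    # Here lam_next holds lambda_1 and lam_next2 holds lambda_2.
    return x * lam_next - lam_next2 + f[5] / 2.0


def find_roots(f, grid_intervals=60, bisections=4):
    """Roots of C(x) in the cosine domain, in order of decreasing x."""
    roots = []
    x_hi = 1.0                                   # cos(0)
    y_hi = chebyshev_eval(x_hi, f)
    for j in range(1, grid_intervals + 1):
        x_lo = math.cos(math.pi * j / grid_intervals)
        y_lo = chebyshev_eval(x_lo, f)
        if y_lo * y_hi < 0.0:                    # sign change: root inside
            a, ya, b = x_hi, y_hi, x_lo
            for _ in range(bisections):
                mid = 0.5 * (a + b)
                y_mid = chebyshev_eval(mid, f)
                if ya * y_mid <= 0.0:
                    b = mid                      # root lies between a and mid
                else:
                    a, ya = mid, y_mid           # root lies between mid and b
            roots.append(0.5 * (a + b))
        x_hi, y_hi = x_lo, y_lo
    return roots


def lsps_from_polynomials(f1, f2):
    """Merge the roots of F_1 and F_2 into q_1 > q_2 > ... > q_10."""
    return sorted(find_roots(f1) + find_roots(f2), reverse=True)


# Example: the coefficient sets obtained from (16) when A(z) = 1.
q = lsps_from_polynomials([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], [1.0] * 6)
```

For this example the line spectral frequencies are uniformly spaced at multiples of pi/11, and the sketch recovers their cosines to within the resolution left after four bisections.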
5.2.4 LSP to LP conversion
Once the LSPs are quantized and interpolated, they are converted back to the LP coefficient domain
$\{a_k\}$. The conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ or $F_2(z)$ are found
by expanding equations (14) and (15) knowing the quantized and interpolated LSPs $q_i$, $i = 1, \ldots, 10$. The
following recursive relation is used to compute $f_1(i)$:

    for $i = 1$ to $5$
...







