5G; Update to fixed-point basic operators (3GPP TR 26.973 version 15.2.0 Release 15)

RTR/TSGS-0426973vf20

General Information

Status
Published
Publication Date
10-Mar-2019
Technical Committee
Current Stage
12 - Completion
Completion Date
11-Mar-2019
Ref Project
Standard
ETSI TR 126 973 V15.2.0 (2019-03) - 5G; Update to fixed-point basic operators (3GPP TR 26.973 version 15.2.0 Release 15)
English language
49 pages



TECHNICAL REPORT
5G;
Update to fixed-point basic operators
(3GPP TR 26.973 version 15.2.0 Release 15)


3GPP TR 26.973 version 15.2.0 Release 15 1 ETSI TR 126 973 V15.2.0 (2019-03)

Reference
RTR/TSGS-0426973vf20
Keywords
5G
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2019.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and
of the oneM2M Partners.
GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
Foreword
This Technical Report (TR) has been produced by ETSI 3rd Generation Partnership Project (3GPP).
The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or
GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.
The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under
http://webapp.etsi.org/key/queryform.asp.
Modal verbs terminology
In the present document "should", "should not", "may", "need not", "will", "will not", "can" and "cannot" are to be
interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
Introduction
1 Scope
2 References
3 Abbreviations
4 Extension to the STL2009 Basic Operators
4.1 Analysis of the gap between current basic operators and modern DSP architectures
4.2 Test methodology for validating the extended basic operators
4.2.0 General
4.2.1 Test methodology
4.2.2 Test results for basic operator Mpy_32_16_1
4.2.3 Test results
4.2.4 Test results conclusion
5 Alternative EVS Implementation Using the Extended Basic Operators
5.1 Merits of an alternative EVS implementation using the extended basic operators
5.2 Example pseudo code to illustrate some of the benefits of modern DSP architectures
5.3 Validation of an alternative EVS implementation using updated basic operators
5.3.1 C-code inspection
5.3.2 Objective performance evaluation of the alternative EVS implementation
5.3.3 Subjective performance evaluation of the alternative EVS implementation
6 Conclusions
Annex A: Extended Basic Operators
A.1 Basic operators that use 64 bit registers/accumulators
A.2 Basic operators which use 32 bit precision multiply
A.3 Basic operators which use complex data types
A.4 Basic operators for control operation
A.5 Basic operators for unsigned data types
Annex B: Weights of the STL basic operators
Annex C: Change history
History

Foreword
This Technical Report has been produced by the 3rd Generation Partnership Project (3GPP).
The contents of the present document are subject to continuing work within the TSG and may change following formal
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an
identifying change of release date and an increase in version number as follows:
Version x.y.z
where:
x the first digit:
1 presented to TSG for information;
2 presented to TSG for approval;
3 or greater indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections,
updates, etc.
z the third digit is incremented when editorial only changes have been incorporated in the document.
Introduction
The last major update to the ITU-T Basic Operators [6] was in 2005, with a follow-on update in 2009. These basic
operators serve as a foundation for reference software of codecs specified by 3GPP. During the last several years,
processors with wide accumulators and support for single-instruction-multiple-data (SIMD) and very long instruction
word (VLIW) features have become prevalent. The basic operators of 2009 now need to be extended to leverage these
capabilities of modern processors so that implementations with lower mega-cycles-per-second (MCPS) and lower
power may be realized.
Enhanced Voice Services (EVS) is one of the recent codecs defined by 3GPP that can leverage these features of modern
processors. The existing EVS reference software would have to be appropriately modified to leverage these extended
basic operators without changing the underlying algorithm. This is referred to as an alternative EVS implementation
using the extended basic operators.
This alternative EVS implementation would have to be evaluated to ensure that inter-operability is maintained in
addition to ensuring that voice quality is not impacted.
1 Scope
The present document covers the following topics:
1) Assessment of the gaps between modern processors and the existing set of basic operators (STL2009) [6].
2) Proposal of an extended set of operators addressing modern DSP architectures as an extension to STL2009.
3) Assessment of merits of an alternative EVS implementation using extended STL2009 Basic Operators.
4) Proposal for validation of an alternative EVS implementation using extended STL2009 Basic Operators.
2 References
The following documents contain provisions which, through reference in this text, constitute provisions of the present
document.
- References are either specific (identified by date of publication, edition number, version number, etc.) or
non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same
Release as the present document.
[1] 3GPP TR 21.905: "Vocabulary for 3GPP Specifications".
[2] 3GPP TS 26.442: "Codec for Enhanced Voice Services (EVS); ANSI C code (fixed-point)".
[3] Recommendation ITU-T P.800 (08/1996): "Methods for subjective determination of transmission
quality".
[4] Recommendation ITU-T P.863 (09/2014): "Perceptual objective listening quality assessment".
[5] 3GPP TS 26.443: "Codec for Enhanced Voice Services (EVS); ANSI C code (floating-point)".
[6] Recommendation ITU-T G.191 (03/10): "Software tools for speech and audio coding
standardization".
[7] 3GPP TR 26.952: "Codec for Enhanced Voice Services (EVS); Performance Characterization
(Release 14)".
3 Abbreviations
For the purposes of the present document, the abbreviations given in 3GPP TR 21.905 [1] and the following apply. An
abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in
3GPP TR 21.905 [1].
SIMD Single Instruction Multiple Data
STL Software tools for speech and audio coding standardization
VLIW Very Long Instruction Word.
4 Extension to the STL2009 Basic Operators
4.1 Analysis of the gap between current basic operators and
modern DSP architectures
State-of-the-art processor architectures, such as the recent ones from Intel, ARM, QUALCOMM, Texas Instruments
etc., support wide accumulators, SIMD and VLIW capabilities. The last major update to the ITU-T Basic Operators was
in 2005, with a follow-on update in 2009 [6]. It appears that these earlier versions of the Basic Operators (2009 and
earlier) were influenced by older DSP architectures such as the Texas Instruments TMS320C5x and TMS320C54x
processors where the accumulator was 40 bits wide.
However, a survey of the state-of-the-art processor architectures shows that most of them support the following
capabilities:
- Wider (64 bit) accumulators and registers.
- Wider accumulators enable additional guard bits which eliminate the need for checking for saturation after every
basic operation.
- SIMD (Single Instruction Multiple Data) instructions which can process vector data. For example, a single
instruction can process two 32-bit data elements or four 16-bit elements in parallel.
- VLIW (Very Long Instruction Word) enables several operations to be executed in parallel in a single cycle.
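The headroom that guard bits provide can be estimated with a few lines of C. This is a minimal sketch, not part of STL2009 or of the extended basic operators: the helper name `max_safe_macs` is hypothetical, and the sketch assumes plain integer 16-bit x 16-bit products, whose magnitude is at most 2^30.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): largest number of worst-case
 * products that a signed accumulator of acc_bits can sum without
 * overflowing, when each product has magnitude at most
 * 2^product_mag_bits (2^30 for an integer 16-bit x 16-bit product).
 */
static uint64_t max_safe_macs(unsigned acc_bits, unsigned product_mag_bits)
{
    assert(acc_bits >= 2 && acc_bits <= 64);
    assert(product_mag_bits < acc_bits - 1);

    /* Largest positive value of the accumulator: 2^(acc_bits-1) - 1. */
    uint64_t max_positive = ((uint64_t)1 << (acc_bits - 1)) - 1;

    /* Each worst-case term adds 2^product_mag_bits. */
    return max_positive / ((uint64_t)1 << product_mag_bits);
}
```

Under these assumptions a 32-bit accumulator can absorb only one worst-case product, a 40-bit accumulator 511, and a 64-bit accumulator 8589934591, which is why saturation checks after every operation become unnecessary for realistic loop lengths.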
Basic operators that are friendlier to compilers, and enable SIMD and VLIW features to be leveraged, can significantly
reduce implementation time. Improved compiler technology and software development tools interpret data types and
associated basic operators to map them to a processor architecture for better Out-of-box (OOB) performance. Without
this computer assisted optimization, an engineer would have to hand-optimize the code which would result in increased
engineering effort and longer time to market.
Many recent audio/hybrid codecs make extensive use of 16-bit x 32-bit MAC (multiply and accumulate) and 32-bit x
32-bit MAC operations which are realized quite differently between VLIW and SIMD architectures and the current
Basic Operators:
- Current STL2009 Basic operators require saturation and truncation after every multiply-accumulate (MAC)
operation to maintain bit-exactness.
- The current Basic operator saturation checks prevent use of SIMD parallelism.
- To maintain bit-exactness, cycles are wasted resulting in higher MCPS and power on VLIW and SIMD capable
devices.
- Higher precision variables, such as 64-bit operands, are partitioned into smaller-width operands, processed, and
then put back to the original width. This results in overhead and wasted processor cycles.
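The interaction between saturation and parallelism listed above can be sketched in C. The function names below are hypothetical and are not STL2009 operators; the input array is assumed to hold already-computed 32-bit product terms. The sketch shows that splitting the terms over two saturating accumulators (as a 2-lane SIMD unit would) can change the result, while the same split with 64-bit accumulators does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a 64-bit value to the signed 32-bit range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Sequential accumulation with saturation after every step. */
static int32_t acc_sat_sequential(const int32_t *p, size_t n)
{
    int32_t acc = 0;
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        acc = sat32((int64_t)acc + p[i]);
    return acc;
}

/* Even and odd terms go to two separate saturating accumulators. */
static int32_t acc_sat_two_lanes(const int32_t *p, size_t n)
{
    int32_t even = 0, odd = 0;
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0)
            even = sat32((int64_t)even + p[i]);
        else
            odd = sat32((int64_t)odd + p[i]);
    }
    return sat32((int64_t)even + odd);
}

/* Same lane split with 64-bit accumulators: no intermediate saturation. */
static int64_t acc_wide_two_lanes(const int32_t *p, size_t n)
{
    int64_t even = 0, odd = 0;
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0)
            even += p[i];
        else
            odd += p[i];
    }
    return even + odd;
}
```

For the terms {2000000000, 2000000000, -2000000000, -2000000000}, whose exact sum is 0, the sequential saturating version returns -1852516353 because the second addition saturates, whereas both two-lane versions return 0. A compiler that must stay bit-exact with the sequential version therefore cannot reorder the additions.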
Considering the capabilities of modern processor architectures, as well as the characteristics of the latest speech and
audio codecs, there is a need for extending STL2009 with additional basic operators & data types to better leverage the
capabilities of state-of-the-art processor architectures and characteristics of DSP algorithms.
4.2 Test methodology for validating the extended basic
operators
4.2.0 General
This clause describes a test framework that will compare the fixed-point arithmetic accuracy of the extended basic
operators against a floating-point implementation of the extended basic operators. Each basic operator will be tested for
4 different data patterns.
In table 1 below, the extended basic operators have been classified into four main classes. The test patterns used for
testing and the build options of the test framework are also shown below.
Table 1: Classification of the extended basic operators
Test framework for extended basic operators

Main class          Subclass             Total basops   Covered basops
64 bit accumulator  64-bit Integer Mac   4              4
                    64-bit Mac           7              7
                    64-bit Math          12             12
                    64-bit scale         7              7
                    64-bit move          5              0
Complex             Complex Math         7              7
                    Complex Mac          9              9
                    Complex Move         10             0
                    Complex Scale        9              9
Enhanced 32 bit     32*16 bit Enh MAC    6              6
                    32*32 bit Enh MAC    6              6
Control code ops                         18             0
Total                                    100            67
Test data patterns:
- -1.0 to 1.0 float range with configurable interval.
- Random numbers.
- Special values: very low level values (e.g., in the range of 1e-3, 1e-6 etc.), nominal and large values
- Custom mode: users can specify their customized array of size N.
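The first data pattern, a sweep over the float range with a configurable interval, might be generated as follows. This is only a sketch: the function name `sweep_fill` and its parameters are hypothetical and are not taken from the test framework.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sweep generator (illustration only): writes
 * lo, lo + step, lo + 2*step, ... up to and including hi (if reached)
 * into out, and returns the number of values written (at most cap).
 */
static size_t sweep_fill(double lo, double hi, double step,
                         double *out, size_t cap)
{
    size_t n = 0;
    assert(out != NULL && step > 0.0 && lo <= hi);

    /* Index-based stepping avoids accumulating rounding error. */
    while (n < cap) {
        double v = lo + (double)n * step;
        if (v > hi)
            break;
        out[n++] = v;
    }
    return n;
}
```

With lo = -1.0, hi = 1.0 and step = 0.25 the sketch produces the nine values -1.0, -0.75, ..., 1.0.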
Build options:
Two build options are provided:
- MSVC 2017 and MSVC 2013 project workspaces.
- GCC based makefiles.
4.2.1 Test methodology
In Figure 1 below, a block diagram explains how to validate the extended STL2009 Basic Operators implementation
against a reference floating-point implementation. A data generator generates floating-point notation data values that are
then converted into fixed-point notation and these are input to the design under test (DUT) implementation of the
extended STL2009 Basic Operators implementation. The same fixed-point data is converted into floating-point
notation, and then input to a reference floating-point implementation of the extended STL2009 Basic Operators. The
fixed-point output of the DUT is converted to floating-point notation and then compared against the output of the
reference floating-point implementation; an error value is generated and logged.
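The validation flow described above can be sketched in C as follows. This is an illustration under stated assumptions, not the test framework itself: `dut_mul_q31_q15` is a hypothetical stand-in for a design under test (a Q31 x Q15 multiply with saturation), not the definition of any operator in Annex A, and all helper names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DUT: Q31 x Q15 -> Q31 multiply with saturation. */
static int32_t dut_mul_q31_q15(int32_t a, int16_t b)
{
    int64_t p = ((int64_t)a * b) / 32768;   /* truncates toward zero */
    if (p > INT32_MAX) p = INT32_MAX;
    if (p < INT32_MIN) p = INT32_MIN;
    return (int32_t)p;
}

/* Floating-point reference of the same operation. */
static double ref_mul(double a, double b) { return a * b; }

/* Float -> fixed conversions with rounding and saturation. */
static int32_t to_q31(double x)
{
    double s = x * 2147483648.0;
    if (s >= 2147483647.0) return INT32_MAX;
    if (s <= -2147483648.0) return INT32_MIN;
    return (int32_t)(s < 0.0 ? s - 0.5 : s + 0.5);
}

static int16_t to_q15(double x)
{
    double s = x * 32768.0;
    if (s >= 32767.0) return INT16_MAX;
    if (s <= -32768.0) return INT16_MIN;
    return (int16_t)(s < 0.0 ? s - 0.5 : s + 0.5);
}

/* Fixed -> float conversions. */
static double from_q31(int32_t v) { return (double)v / 2147483648.0; }
static double from_q15(int16_t v) { return (double)v / 32768.0; }

/*
 * Feed the same quantized inputs to the DUT and to the reference,
 * convert the DUT output back to floating point, and return the
 * largest absolute error observed.
 */
static double max_abs_error(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        int32_t fa = to_q31(a[i]);
        int16_t fb = to_q15(b[i]);
        double dut = from_q31(dut_mul_q31_q15(fa, fb));
        double ref = ref_mul(from_q31(fa), from_q15(fb));
        double err = dut > ref ? dut - ref : ref - dut;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

For this stand-in operator the error stays within one Q31 least significant bit (2^-31), including the saturating case of -1.0 x -1.0. A real harness would log the error per sample rather than only the maximum.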

Figure 1: Block diagram illustrating how the fixed-point implementation is validated against a
floating-point reference implementation of the extended STL2009 basic operators
In the following clauses, the test results for an example basic operator, Mpy_32_16_1 are reported.
4.2.2 Test results for basic operator Mpy_32_16_1
The setup in figure 1 was used for testing with four different types of data:
1) Random input numbers
2) A sweep from a negative number to a positive number
3) A piecewise sweep from a negative number to a positive number
4) A custom input where a user can specify an array of size N with custom inputs
Figures 2, 3, 4 and 5 illustrate the results of the test for the above four data types. The error between the fixed-point
implementation and the floating-point implementation is extremely small, thereby validating the fixed-point
implementation.

Figure 2: Test results for basic operator Mpy_32_16_1 using random input. The error between the
fixed-point output and floating-point output is very small.

Figure 3: Test results for basic operator Mpy_32_16_1 using a sweep input. The error between the
fixed-point output and floating-point output is very small.


Figure 4: Test results for basic operator Mpy_32_16_1 using a piecewise sweep input. The error
between the fixed-point output and floating-point output is very small.


Figure 5: Test results for basic operator Mpy_32_16_1 using a user defined custom input. The error
between the fixed-point output and floating-point output is very small.
4.2.3 Test results
For a complete report of the framework used, as well as the results of the test, please see the attachment
"Baseop_tst_frmwork.zip".
NOTE: The unsigned basic operators in clause A.5 were verified separately and are used by the EVS codec in TS
26.442 [2].
4.2.4 Test results conclusion
Based on the results reported in "precision_abs_err_report.csv", it can be concluded that the fixed-point implementation
of the extended basic operators all pass against the reference floating-point implementation of the same extended basic
operators.
5 Alternative EVS Implementation Using the Extended
Basic Operators
5.1 Merits of an alternative EVS implementation using the
extended basic operators
EVS [2] is a sophisticated hybrid audio-speech codec with several modes of operation. As such it has a large number of
functions. Manually optimizing this large set of functions is prohibitive from an effort (and therefore time) perspective.
Implementers will have to rely on computer-assisted tools and compilers to get them as close to a final implementation as
possible, and spend the last mile on manual optimization to reach the final target performance. It is therefore imperative
that the basic operators are defined in such a manner that they lend themselves to better leverage the features and
capabilities of modern DSP architectures. Data types need to be mapped to match the processor registers or operand
widths of data used in SIMD (Single Instruction Multiple Data) processing; basic operators need to be mapped to
processor instructions. A standard reference C code written with these aspects in mind will result in an implementation
that better leverages the SIMD and VLIW (Very Long Instruction Word) features of the processor and results in an
out-of-the-box (OOB) performance that is quite close to the final desired performance. The compiler can optimize the code
across all the files and functions thereby significantly reducing manual optimization effort. Implementers can go to
market faster.
Figure 6 shows the benefits of creating an alternate reference C code for EVS using the updated basic operators:
1) Reduced hand-optimization effort leads to reduced total engineering effort, and hence improved time to market.
2) Improved MCPS numbers in OOB and final hand-optimized code.
3) Reduced code size. Reduced MCPS and memory reduce overall power used. This should facilitate extended
battery life.
Figure 6: Benefits of proposed alternate reference C for EVS
Using the existing standard EVS Reference code version 14.0.0 as a starting point, an alternative C code that leverages
the proposed basic operators has been created. During this creation process, step by step, several key parameters have
been monitored such as the engineering effort spent expressed as time (days, weeks, months), and corresponding
reduction in MCPS.
Figure 7 shows the optimization level achieved versus engineering effort measured in units of time. As the figure
shows, the OOB performance of the existing reference C is at 269 MCPS, while the OOB performance of the proposed
alternative EVS reference C code is at 162 MCPS. This is a gain of 1.66x achieved in a matter of a few days of
engineering effort. Next, time is spent restructuring the code and hand optimizing. The final hand-optimized version is
at 61.9 MCPS compared to 77.5 MCPS for the existing EVS reference implementation. This is a gain of 1.25x.
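The gain factors quoted above are simple ratios of baseline cost to improved cost. The following sketch reproduces the arithmetic; the helper names `gain` and `reduction_percent` are hypothetical and serve only as an illustration.

```c
#include <assert.h>

/* Gain factor: baseline cost divided by improved cost (e.g. MCPS). */
static double gain(double baseline, double improved)
{
    assert(baseline > 0.0 && improved > 0.0);
    return baseline / improved;
}

/* The same comparison expressed as a percentage reduction in cost. */
static double reduction_percent(double baseline, double improved)
{
    assert(baseline > 0.0 && improved >= 0.0);
    return 100.0 * (baseline - improved) / baseline;
}
```

With the figures from the text, gain(269, 162) is about 1.66 (roughly a 40% reduction in OOB MCPS) and gain(77.5, 61.9) is about 1.25 (roughly a 20% reduction in final MCPS).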

Figure 7: Impact of alternate reference C at different phases of the implementation process
In table 2, the improvement in weighted million operations per second (WMOPS) of the alternative EVS
implementation using extended basic operators is compared against the WMOPS of the existing EVS standard reference
code using STL2009 basic operators as a baseline. The second row shows a benefit of 1.07x from changing the weights of
the STL2009 basic operators. The third row shows the total benefit of 1.17x from using the extended basic operators
together with the weight change of the existing STL2009 basic operators.
Table 2: WMOPS based Comparison of the alternative EVS implementation with existing EVS
implementation

EVS Code Base - 14.0.0             STL_basops complexity weights                                          Average WMOPS               Improvement
                                                                                                          Encoder   Decoder   Total   Over Reference
Reference with STL2009             STL2009 weights as is                                                  53.3      24.2      77.5    1.00x
Reference with STL2009             With new proposed weights for STL2009                                  50.6      22.1      72.7    1.07x
Alternate Reference with STL2017   With new proposed weights for STL2009 & for extended basic operators   47.1      18.9      66      1.17x
The following test cases were used for WMOPS and MCPS calculation:
- Encoder test case: -rf HI 3 13200 32 stv32n2.INP stv32n2_rfHI3_13200_32kHz.COD
- Decoder test case: 32 stv32c_rfHI3_13200_32kHz.COD stv32c_rfHI3_13200.out
The WMOPS numbers reported in table 2 are average WMOPS for this worst case complexity test vector. Please refer
to 3GPP TR 26.952: "Codec for Enhanced Voice Services (EVS); Performance Characterization (Release 14)" [7], for a
more detailed explanation of WMOPS for EVS.
In table 3, the improvement in million cycles per second (MCPS) of the alternative EVS implementation is compared
against the MCPS of the existing EVS standard reference code on a specific DSP platform using STL2009 basic
operators as a baseline. A gain of 1.25x is observed.
The gain in final MCPS of 1.25x is significantly more than the gain of 1.17x in WMOPS. The explanation is that the
existing method of computing WMOPS does not capture the cycles saved with VLIW, where multiple instructions are
executed in parallel. In addition, the currently assigned integer weights of 1 or higher for SIMD and VLIW friendly
instructions do not account for the inherent parallelism of processing multiple operands in a single cycle on
modern processors.
Table 3: MCPS based Comparison of the alternative EVS implementation with existing EVS
implementation on a Cadence Tensilica HiFi DSP

Perf parameter                 REFC with STL2009    ALT_REFC with STL2017   Performance improvement
                               Total (Enc + Dec)    Total (Enc + Dec)
OOB MCPS                       269.3                162.5                   1.66x
Final MCPS                     77.5                 61.9                    1.25x
Code size – OOB (in K Bytes)   2117.3               2036.6                  1.04x
5.2 Example pseudo code to illustrate some of the benefits of
modern DSP architectures
The following examples illustrate the benefits of VLIW and SIMD features of modern DSP architectures. The existing
reference code needs to be changed to leverage the extended basic operators that exploit the features of modern DSP
architectures. The following examples with pseudo code show that cycles are reduced from 4 to 2.
Example 1:
Original Reference C Code –
for (i = 0; i < N; i++) {
acc = acc + a[i]*b[i]; /* multiply, truncate, and saturate are happening */
}
/* Regular implementation */
/* Multiply, truncate, and saturate are happening for each element. */
/* Truncate and saturate here imply that order of execution is important. Compiler cannot change this order of execution
without violating bit-exactness */

Int_32 acc;
acc = a[0]*b[0]; /* cycle 1 */
acc = acc + a[1]*b[1]; /* cycle 2 */
acc = acc + a[2]*b[2]; /* cycle 3 */
acc = acc + a[3]*b[3]; /* cycle 4 */

/* total cycles = 4: For processing 4 elements of array a and b */
/* For N elements it will take N cycles */

Example 2:
Explanation of:
- How SIMD/VLIW friendly REFC code helps to reduce cycles.
- Why bit-exactness is violated when VLIW, SIMD features are used.

/* Example 2-A: Implementation on a 2-slot VLIW architecture */
/* Since truncation and saturation are not required, Acc1 and Acc2 are updated in 1 cycle in two different slots */
/* The final result in Acc is NOT guaranteed to match acc in the regular implementation */
/* This 2-slot implementation is not bit-exact with the regular implementation, hence the need to define an alternate
set of bit-streams */
/* Therefore the reference code has to be changed to benefit from the 2-slot architecture */
Int_64 Acc1, Acc2, Acc;
Acc1 = a[0]*b[0]; /* slot 0, cycle 1 */
Acc2 = a[1]*b[1]; /* slot 1, cycle 1 */
Acc1 = Acc1 + a[2]*b[2]; /* slot 0, cycle 2 */
Acc2 = Acc2 + a[3]*b[3]; /* slot 1, cycle 2 with VLIW supported. Alternatively, this can be slot 0, cycle 2 if 2-way
SIMD is supported, as illustrated in Example 2-B */

Acc = Acc1 + Acc2; /* slot 0, cycle 3. This will be done outside the loop, only once */

/* Total cycles for 4 elements = 3 */
/* For N elements it will take (N/2 + 1) cycles */

/* Example 2-B: Implementation on a 2-slot VLIW and 2-way SIMD architecture */
/* Since truncation and saturation are not required, Acc1 and Acc2 are updated in 1 cycle in two different slots */
/* In a 2-way SIMD architecture, 2 MAC operations can be done in a single cycle in a single slot on two 32-bit elements
stored in a 64-bit register */
/* This SIMD/VLIW implementation is not bit-exact with the regular implementation, hence the need to define an
alternate set of bit-streams */
/* Therefore the reference code has to be changed to benefit from the 2-way SIMD and 2-slot architecture */
Int_64 Acc1, Acc2, Acc;
/* One 64-bit register holds two 32 bit elements a[0] and a[1]. Another 64-bit register holds two 32 bit elements b[0]
and b[1]*/
Acc1 = a[0]*b[0] + a[1]*b[1]; /* slot 0, cycle 1 2-way SIMD mac */
Acc2 = a[2]*b[2] + a[3]*b[3]; /* slot 1, cycle 1 2-way SIMD mac*/
Acc = Acc1 + Acc2; /* slot 0, cycle 2. This will be done outside the loop, only once */

/* Total cycles for 4 elements = 2 */
/* For N elements it will take (N/4 + 1) cycles */

In conclusion, for a loop processing N elements,
Example 1 will consume: N cycles.
Example 2A will consume: (N/2 + 1) cycles.
Example 2B will consume: (N/4 + 1) cycles.
Hence a 2-way SIMD, 2-slot architecture, will provide close to 4X improvement in cycles for operations in a loop as
shown above.
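The cycle counts above can be written as a small cost model. This is a sketch of the idealized counts stated in the examples only; the function names are hypothetical, and real cycle counts also depend on loads, stores and loop overhead, which are ignored here.

```c
#include <assert.h>

/* Example 1: one saturating MAC per cycle. */
static unsigned long cycles_scalar(unsigned long n)
{
    return n;
}

/* Example 2-A: two VLIW slots, plus one final add of the partial sums. */
static unsigned long cycles_vliw2(unsigned long n)
{
    assert(n % 2 == 0);   /* the model assumes n is a multiple of 2 */
    return n / 2 + 1;
}

/* Example 2-B: two VLIW slots, each doing a 2-way SIMD MAC, plus one final add. */
static unsigned long cycles_vliw2_simd2(unsigned long n)
{
    assert(n % 4 == 0);   /* the model assumes n is a multiple of 4 */
    return n / 4 + 1;
}
```

For N = 4 the model gives 4, 3 and 2 cycles. For N = 1024 it gives 1024, 513 and 257 cycles, a ratio of about 3.98 between Example 1 and Example 2-B, which is the "close to 4X" improvement; the final add keeps the ratio just below 4.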
5.3 Validation of an alternative EVS implementation using
updated basic operators
5.3.1 C-code inspection
Before starting the performance evaluation, the C-code of the alternative EVS implementation will be shared for
inspection, upon request, under NDA. The verification will be reported to SA4.
5.3.2 Objective performance evaluation of the alternative EVS
implementation
For the objective performance validation of the alternative implementation of EVS using the updated set of basic
operators, it is proposed to use the same procedure as has been used to validate the EVS floating-point implementation. Namely, it is
proposed to process a P.800 compatible database [3] [exact database tbd] including speech and music and mixed test
samples by the following 4 combinations of the legacy fixed-point EVS [2] encoder and decoder (Ref_fxd) and the
evaluated EVS encoder and decoder (CuT):
a) Ref_fxd encoder – Ref_fxd decoder
b) CuT encoder – CuT decoder
c) Ref_fxd encoder – CuT decoder
d) CuT encoder – Ref_fxd decoder
The processing is performed according to EVS-7c and the resulting stimuli are evaluated using POLQA [4] with the
reference item being the direct item of the respective bandwidth and the test items being the EVS conditions. In other
words all stimuli are evaluated against the original signal.
For each condition and for each P.800 sample, the individual POLQA MOS-LQO scores are computed and the
differences for [a) – b)], [a) – c)] and [a) – d)] compared, both for the samples individually, and averaged for each test
condition. The proposed alternative EVS implementation and the standardized fixed-point implementation are
considered to perform equivalently if the difference values are within reasonable bounds.
It is further proposed to also objectively validate the performance of interoperation of this new EVS implementation
with the standardized EVS floating-point implementation [5] (Ref_flt) to make sure that there are no interoperability
issues when interoperating with the standardized floating-point EVS code. Consequently, two additional combinations
are added:
e) Ref_flt encoder – CuT decoder
f) CuT encoder – Ref_flt decoder
It is proposed that the objective evaluation is performed for all the conditions that were subjectively evaluated in the
EVS Selection Tests and for all conditions that were subjectively evaluated in the EVS Characterization Tests.
The analysis would follow the template in Table 4 for all the individual samples and all conditions. Additionally, for
each test condition, as well as for all the conditions combined, the following statistics will also be provided: average
difference, minimum difference, maximum difference, standard deviation and 95% confidence interval. For better
visualization, histograms or cumulative distribution functions of the differences may also be provided.
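The per-condition statistics of the MOS-LQO differences could be computed along the following lines. This is a sketch under assumptions that the present document does not specify: it uses the sample standard deviation (n - 1 in the denominator) and a normal-approximation 95% confidence interval (half-width 1.96 * stddev / sqrt(n)). The type and function names are hypothetical, and a small Newton iteration replaces the library square root only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of a set of score differences, e.g. a) - b) for one condition. */
typedef struct {
    double mean;
    double min;
    double max;
    double stddev;           /* sample standard deviation (assumption) */
    double ci95_half_width;  /* normal approximation (assumption) */
} diff_stats;

/* Square root by Newton iteration; returns 0 for non-positive input. */
static double newton_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;
    if (x <= 0.0)
        return 0.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

static diff_stats summarize_differences(const double *d, size_t n)
{
    diff_stats s;
    double sum = 0.0, sq = 0.0;

    assert(d != NULL && n >= 2);

    s.min = s.max = d[0];
    for (size_t i = 0; i < n; i++) {
        sum += d[i];
        if (d[i] < s.min) s.min = d[i];
        if (d[i] > s.max) s.max = d[i];
    }
    s.mean = sum / (double)n;

    for (size_t i = 0; i < n; i++) {
        double e = d[i] - s.mean;
        sq += e * e;
    }
    s.stddev = newton_sqrt(sq / (double)(n - 1));
    s.ci95_half_width = 1.96 * s.stddev / newton_sqrt((double)n);
    return s;
}
```

The same routine can be applied to each difference column of Table 4, once per test condition and once over all conditions combined.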
Table 4: Template for result presentation
Input   Bandwidth   Bit rate   DTX   Level   FER/Profile   a) - b)   a) - c)   a) - d)   a) - e)   a) - f)
5.3.3 Subjective performance evaluation of the alternative EVS
implementation
The goal of the subjective performance evaluation of the alternative EVS implementation is to complement the
objective validation as a sanity check. It covers all relevant configurations, with emphasis on the most relevant ones, to
minimize the number of subjective tests. In particular:
1) Bitrates: All EVS bitrates are included, both of the EVS native modes (5.9, 7.2, 8, 13.2, 13.2 CAM, 16.4, 24.4,
32, 48, 56, 96, 128 kb/s) and the AMR-WB IO modes (23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85 and
6.6 kb/s). This is done through constant bitrate conditions or bitrate switching conditions in order to minimize
the necessary number of subjective experiments, and yet cover all the bitrates.
2) Bandwidth: It is proposed to include only WB and SWB experiments in the subjective evaluation, as these are the
most relevant for EVS operation. Further, it is assumed that most of the NB technologies are also included within WB
or SWB EVS operation. Finally, FB operation is algorithmically very similar to SWB operation.
3) Input levels: -16, -26 and -36 dBov input levels are tested.
4) Noisy speech is evaluated in one experiment.
5) Mixed & Music inputs are evaluated in one experiment.
6) Impaired channel & Jitter Buffer Management (JBM) conditions are spread across all experiments. The Frame
Erasure Rates (FERs) or network error profiles have been selected so that they allow any issues in operation over
impaired channels to be uncovered, while not being so severe as to significantly affect the test resolution for
clean channel conditions.
7) Rate switching is included, as mentioned above.
8) Tandem conditions are not included in the test, as it is assumed that any implementation issues would be
uncovered in conditions without tandeming. Further, tandem operation is not foreseen as a major use case
for EVS.
The methodology used is P.800 ACR or DCR, reflecting the EVS Selection and Characterization tests. It is proposed to
use 4 different talkers (two male and two female) and 6 panels of 4 listeners. This set-up gives 96 votes per
condition (6 panels × 4 talkers × 4 listeners).
Similarly to the objective tests, the following 4 configurations will be tested in all experiments:
a) Ref_fxd encoder – Ref_fxd decoder
b) CuT encoder – CuT decoder
c) Ref_fxd encoder – CuT decoder
d) CuT encoder – Ref_fxd decoder

Experiment 1 - WB clean speech ACR (17 conditions per codec configuration):
-16 dBov clean channel - 5.9 kb/s, switching: 7.2-9.6 kb/s, 13.2-96 kb/s, AMR-WB IO, DTX ON
-26 dBov clean channel - 7.2 kb/s, 13.2 kb/s, 13.2 kb/s Channel-Aware Mode (CAM), 24.4 kb/s, DTX ON
-36 dBov clean channel - 5.9 kb/s DTX ON, switching: 7.2-9.6 kb/s, 13.2-96 kb/s, AMR-WB IO, DTX OFF
-26 dBov random 3% FER - 5.9 kb/s, switching: 7.2-9.6 kb/s, 13.2-96 kb/s, AMR-WB IO, DTX ON
-26 dBov Profile 8(6.2%) – 13.2 kb/s Channel-Aware Mode (CAM), DTX ON

Experiment 2 - SWB clean speech DCR (6 conditions per codec configuration):
-16 dBov clean channel - 9.6 kb/s, 13.2 kb/s, DTX OFF
-36 dBov clean channel - 24.4 kb/s, switching 32-128 kb/s, DTX ON
-26 dBov Profile 7(3.3%) - switching 9.6 - 24.4 kb/s, DTX ON
-26 dBov Profile 8(6.2%) - 13.2 kb/s CAM, DTX ON

Experiment 3 - SWB noisy speech DCR - 26 dBov, Street noise at 20 dB SNR (6 conditions per codec
configuration):
clean channel - 9.6 kb/s, DTX ON
clean channel - 13.2 kb/s, DTX ON
clean channel - 24.4 kb/s, DTX ON
3% random FER - switching 9.6 - 24.4 kb/s, DTX ON
3% random FER - switching 32 - 128 kb/s, DTX ON
Profile 8(6.2%) - 13.2 kb/s CAM, DTX ON

Experiment 4 - SWB mixed and music DCR (6 conditions per codec configuration):
-16 dBov clean channel - 9.6 kb/s, DTX ON
-26 dBov clean channel - 13.2 kb/s, DTX ON
-36 dBov clean channel - 24.4 kb/s, DTX ON
-26 dBov 3% random FER - switching 9.6 - 24.4 kb/s, DTX ON
-26 dBov 3% random FER - switching 32 - 128 kb/s, DTX ON
-26 dBov Profile 8(6.2%) - 13.2 kb/s CAM, DTX ON
6 Conclusions
In recent years, processors with wide accumulators, SIMD support and VLIW features have become
prevalent. On the other hand, the latest major update to the ITU-T Basic Operators [6], which serve as a foundation for
the reference software of codecs specified by 3GPP, occurred in 2005, with a subsequent update in 2009. EVS [2], the
latest speech and audio codec standardized by 3GPP, was specified using those operators.
Given the information collected during the study, it is recommended to submit the proposed new set of basic operators
to the STL GitHub open source environment as an extension of the current ITU-T Basic Operators. It is further
recommended to inform ITU-T Study Group 12 of the new set of basic operators, including the updated weights for the
current set of basic operators, which have been agreed in 3GPP SA4, and to request that it update Recommendation
ITU-T G.191 (STL) [6] accordingly, so that the new set of basic operators will be available for future codec
standardization.
It was further shown that by implementing the EVS codec using the new set of basic operators, a complexity gain of
about 25% can be obtained for a given example of a modern processor. The corresponding decrease in WMOPS was
about 10%. It is thus recommended to begin normative work with the objective of specifying an alternative
implementation of the EVS codec using the new set of basic operators. The evaluation of the alternative implementation
should follow the guidelines outlined in clause 5.3 of the present document.
Annex A:
Extended Basic Operators
Name: enh64.c, enh32.c, complex_basop.c, enhUL32.c
Associated header files: enh64.h, enh32.h, complex_basop.h, enhUL32.h
Variable definitions:
C_var1, C_var2: 16 bit complex variables
CL_var1, CL_var2: 32 bit complex variables
W_var1, W_var2: 64 bit variables
L_var1, L_var2: 32 bit variables
UL_var1, UL_var2, UL_varout_h, UL_varout_l: 32 bit unsigned variables
var1, var2: 16 bit variables
U_var1, U_varout_l: 16 bit unsigned variables
A.1 Basic operators that use 64-bit registers/accumulators
W_add_nosat (W_var1, W_var2)
Adds the two 64-bit variables W_var1 and W_var2 without saturation control on 64 bits.

W_sub_nosat (W_var1, W_var2)
Subtracts the 64-bit variable W_var2 from the 64-bit variable W_var1 without saturation control on 64 bits.
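A minimal C sketch of what "without saturation control" can mean in practice is given below. It assumes that an out-of-range result simply wraps modulo 2^64, as on typical two's-complement hardware. The function names are hypothetical stand-ins and this is not the STL reference code.

```c
#include <stdint.h>

/* Hypothetical stand-in for W_add_nosat: 64-bit addition that wraps on overflow.
 * The arithmetic is done on unsigned values because signed overflow is undefined
 * behaviour in C, whereas unsigned arithmetic wraps modulo 2^64. Converting the
 * result back to int64_t is implementation-defined for large values, but yields
 * the two's-complement result on mainstream compilers. */
int64_t w_add_nosat_sketch(int64_t w_var1, int64_t w_var2)
{
    return (int64_t)((uint64_t)w_var1 + (uint64_t)w_var2);
}

/* Hypothetical stand-in for W_sub_nosat: computes w_var1 - w_var2 with wrap-around. */
int64_t w_sub_nosat_sketch(int64_t w_var1, int64_t w_var2)
{
    return (int64_t)((uint64_t)w_var1 - (uint64_t)w_var2);
}
```

With these definitions, adding 1 to the largest 64-bit value wraps to the smallest one instead of clipping, so the caller is responsible for keeping enough headroom.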

W_shl (W_var1, var2)
Arithmetically shifts left the 64-bit variable W_var1 by var2 positions:
if var2 is negative, W_var1 is shifted to the least significant bits by (–var2) positions with extension of the
sign bit.
if var2 is positive, W_var1 is shifted to the most significant bits by (var2) positions with saturation control
on 64 bits.

W_shr (W_var1, var2)
Arithmetically shifts right the 64-bit variable W_var1 by var2 positions:
if var2 is negative, W_var1 is shifted to the most significant bits by (–var2) positions with saturation control
on 64 bits.
if var2 is positive, W_var1 is shifted to the least significant bits by (var2) positions with extension of the
sign bit.
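The behaviour of the two saturating shifts described above can be sketched in C as follows. The function names are hypothetical, the handling of shift counts beyond 63 is an assumption, and the code is an illustration rather than the STL reference implementation.

```c
#include <stdint.h>

int64_t w_shl_sketch(int64_t w_var1, int16_t var2);

/* Right shift with sign extension; a negative count becomes a saturating left shift. */
int64_t w_shr_sketch(int64_t w_var1, int16_t var2)
{
    if (var2 < 0)
        return w_shl_sketch(w_var1, (int16_t)(var2 < -64 ? 64 : -var2));
    int n = var2 > 63 ? 63 : var2;      /* a shift by 63 already leaves only the sign */
    /* Portable arithmetic shift: complement, shift the non-negative value,
     * complement back. */
    return w_var1 < 0 ? ~(~w_var1 >> n) : (w_var1 >> n);
}

/* Left shift with saturation on 64 bits; a negative count becomes a right shift. */
int64_t w_shl_sketch(int64_t w_var1, int16_t var2)
{
    if (var2 < 0)
        return w_shr_sketch(w_var1, (int16_t)(var2 < -63 ? 63 : -var2));
    if (var2 > 63)                      /* assumption: every significant bit is lost */
        return w_var1 == 0 ? 0 : (w_var1 > 0 ? INT64_MAX : INT64_MIN);

    int64_t hi = INT64_MAX >> var2;     /* largest input that still fits after the shift */
    int64_t lo = -hi - 1;               /* smallest input that still fits */
    if (w_var1 > hi) return INT64_MAX;  /* positive overflow: clip */
    if (w_var1 < lo) return INT64_MIN;  /* negative overflow: clip */
    /* Shift as unsigned to avoid undefined behaviour on negative operands. */
    return (int64_t)((uint64_t)w_var1 << var2);
}
```

In this sketch, shifting INT64_MAX left by one position returns INT64_MAX rather than a wrapped negative value, which is the effect of saturation control.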

W_shl_nosat (W_var1, var2)
Arithmetically shifts left the 64-bit variable W_var1 by var2 positions:
if var2 is negative, W_var1 is shifted to the least significant bits by (–var2) positions with extension of the
sign bit.
if var2 is positive, W_var1 is shifted to the most significant bits by (var2) positions without saturation control
on 64 bits.

W_shr_nosat (W_var1, var2)
Arithmetically shifts right the 64-bit variable W_var1 by var2 positions:
if var2 is negative, W_var1 is shifted to the most significant bits by (–var2) positions without saturation control
on 64 bits.
if var2 is positive, W_var1 is shifted to the least significant bits by (var2) positions with extension of the
sign bit.
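For contrast with the saturating variants, a possible C sketch of the non-saturating shifts is shown here. The function names are hypothetical, and treating counts beyond 63 as "all bits shifted out" is an assumption that the text above does not state.

```c
#include <stdint.h>

/* Hypothetical sketch: left shift that silently drops bits shifted out of 64 bits. */
int64_t w_shl_nosat_sketch(int64_t w_var1, int16_t var2)
{
    if (var2 < 0) {                     /* negative count: arithmetic right shift */
        int n = var2 < -63 ? 63 : -var2;
        /* Complement trick keeps the shift portable for negative values. */
        return w_var1 < 0 ? ~(~w_var1 >> n) : (w_var1 >> n);
    }
    if (var2 > 63) return 0;            /* assumption: all bits shifted out */
    return (int64_t)((uint64_t)w_var1 << var2);
}

/* Hypothetical sketch: right shift expressed through the left shift above. */
int64_t w_shr_nosat_sketch(int64_t w_var1, int16_t var2)
{
    /* Clamp before negating so the negated count always fits in 16 bits. */
    int n = var2 > 64 ? 64 : (var2 < -64 ? -64 : var2);
    return w_shl_nosat_sketch(w_var1, (int16_t)(-n));
}
```

Here a left shift of INT64_MAX by one position yields -2, because the bit moved into the sign position is kept rather than clipped.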

W_mult_32_16 (L_var1, var2)
Multiplies the signed 32-bit variable L_var1 with the signed 16-bit variable var2. Shifts the product left by 1 and
sign-extends it to 64 bits without saturation control.
The operation is performed in fractional mode.
For example, if L_var1 is in 1Q31 format and var2 is in 1Q15 format,
then the result is produced in 17Q47 format.
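The fractional multiplication described above can be sketched in C as follows. The function name is a hypothetical stand-in for the operator, and the comments work through the Q-format bookkeeping of the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for W_mult_32_16.
 * A 1Q31 value times a 1Q15 value gives a product with 46 fractional bits.
 * Doubling it (the left shift by 1) removes the redundant sign bit and gives
 * 47 fractional bits, i.e. 17Q47 when held in a 64-bit variable.
 * The magnitude never exceeds 2^47, so no saturation is needed. */
int64_t w_mult_32_16_sketch(int32_t l_var1, int16_t var2)
{
    int64_t product = (int64_t)l_var1 * (int64_t)var2;  /* exact, fits in 48 bits */
    return product * 2;                                 /* same as a left shift by 1 */
}
```

For instance, 0.5 in 1Q31 (0x40000000) times 0.5 in 1Q15 (0x4000) gives 2^45, which is 0.25 in 17Q47. Even the corner case (-1) × (-1) gives 2^47, that is +1.0, which is representable thanks to the 16 guard bits.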

W_mac_32_16 (W_acc, L_var1, var2)
Multiplies the signed 32-bit variable L_var1 with the signed 16-bit variable var2. Shifts the product left by 1 and
sign-extends it to 64 bits without saturation control; adds this 64-bit value to the 64-bit W_acc without saturation
control, and returns a 64-bit result.
The operation is performed in fractional mode.
For example, if L_var1 is in 1Q31 format and var2 is in 1Q15 format,
then the product is produced in 17Q47 format, which is then added to
W_acc (in 17Q47 format). The final result is in 17Q47 format.
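A typical use of such a multiply-accumulate operator is a dot product with a wide accumulator. The sketch below assumes wrap-around accumulation and uses hypothetical function names. It is meant only to illustrate the description above, not to reproduce the STL code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for W_mac_32_16: the doubled 32x16 product (17Q47)
 * is added to the 64-bit accumulator without saturation control. */
int64_t w_mac_32_16_sketch(int64_t w_acc, int32_t l_var1, int16_t var2)
{
    int64_t product = (int64_t)l_var1 * var2 * 2;
    /* Unsigned addition wraps modulo 2^64 and avoids signed-overflow
     * undefined behaviour. */
    return (int64_t)((uint64_t)w_acc + (uint64_t)product);
}

/* Hypothetical helper: dot product of a 1Q31 vector and a 1Q15 vector,
 * returned in 17Q47. Each product has a magnitude of at most 1.0 and the
 * accumulator offers 16 integer guard bits, so roughly 2^16 worst-case
 * terms can be summed before the accumulator can overflow. */
int64_t dot_q31_q15_sketch(const int32_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = w_mac_32_16_sketch(acc, x[i], h[i]);
    return acc;
}
```

Summing four products of 0.5 × 0.5 in this way gives 2^47, that is 1.0 in 17Q47, a value that a 32-bit fractional accumulator could not hold without saturating.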

W_msu_32_16 (W_acc, L_var1, var2
...
