Universal serial bus interfaces for data and power - Part 1-7: Common components - USB Audio 3.0 device class definition data formats

IEC 62680-1-7:2019 describes in detail all the Audio Data Formats that are supported by the Audio Device Class. This document is considered an integral part of the Audio Device Class Specification, although subsequent revisions of this document are independent of the revision evolution of the main USB Audio Specification. This is to easily accommodate the addition of new Audio Data Formats without impeding the core USB Audio Specification.

General Information

Status: Published
Publication Date: 18-Sep-2019
Current Stage: PPUB - Publication issued
Start Date: 28-Aug-2019
Completion Date: 19-Sep-2019
Ref Project
Standard
IEC 62680-1-7:2019 - Universal serial bus interfaces for data and power - Part 1-7: Common components - USB Audio 3.0 device class definition data formats
English and French language
55 pages

Standards Content (Sample)


IEC 62680-1-7 ®
Edition 1.0 2019-09
INTERNATIONAL
STANDARD
NORME
INTERNATIONALE
colour
inside
Universal serial bus interfaces for data and power –
Part 1-7: Common components – USB Audio 3.0 device class definition data
formats
Interfaces de bus universel en série pour les données et l'alimentation
électrique –
Partie 1-7: Composants communs – Définition de classes de dispositifs USB
Audio 3.0 pour formats de données

Copyright © 1997-2016 USB-IF
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form
or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from
IEC, or USB-IF at the respective address given below. Any questions about USB-IF copyright should be addressed to
the USB-IF. Enquiries about obtaining additional rights to this publication and other information requests should be
addressed to the IEC or your local IEC member National Committee.

IEC Central Office
3, rue de Varembé
CH-1211 Geneva 20
Switzerland
Tel.: +41 22 919 02 11
info@iec.ch
www.iec.ch

USB Implementers Forum, Inc.
3855 S.W. 153rd Drive
Beaverton, OR 97003
United States of America
Tel: +1 503-619-0426
admin@usb.org
www.usb.org

About the IEC
The International Electrotechnical Commission (IEC) is the leading global organization that prepares and publishes
International Standards for all electrical, electronic and related technologies.

About IEC publications
The technical content of IEC publications is kept under constant review by the IEC. Please make sure that you have the
latest edition; a corrigendum or an amendment might have been published.

IEC publications search - webstore.iec.ch/advsearchform
The advanced search enables users to find IEC publications by a variety of criteria (reference number, text, technical
committee,…). It also gives information on projects, replaced and withdrawn publications.

IEC Just Published - webstore.iec.ch/justpublished
Stay up to date on all new IEC publications. Just Published details all new publications released. Available online and
once a month by email.

IEC Customer Service Centre - webstore.iec.ch/csc
If you wish to give us your feedback on this publication or need further assistance, please contact the Customer Service
Centre: sales@iec.ch.

Electropedia - www.electropedia.org
The world's leading online dictionary on electrotechnology, containing more than 22 000 terminological entries in English
and French, with equivalent terms in 16 additional languages. Also known as the International Electrotechnical Vocabulary
(IEV) online.

IEC Glossary - std.iec.ch/glossary
67 000 electrotechnical terminology entries in English and French extracted from the Terms and Definitions clause of
IEC publications issued since 2002. Some entries have been collected from earlier publications of IEC TC 37, 77, 86 and
CISPR.


INTERNATIONAL
ELECTROTECHNICAL
COMMISSION
COMMISSION
ELECTROTECHNIQUE
INTERNATIONALE
ICS 33.160; 35.100.20 ISBN 978-2-8322-7243-5

– 2 – IEC 62680-1-7:2019
© USB-IF:1997-2016
INTERNATIONAL ELECTROTECHNICAL COMMISSION
____________
UNIVERSAL SERIAL BUS INTERFACES FOR DATA AND POWER –

Part 1-7: Common components –
USB Audio 3.0 device class definition data formats

FOREWORD
1) The International Electrotechnical Commission (IEC) is a worldwide organization for standardization comprising
all national electrotechnical committees (IEC National Committees). The object of IEC is to promote
international co-operation on all questions concerning standardization in the electrical and electronic fields. To
this end and in addition to other activities, IEC publishes International Standards, Technical Specifications,
Technical Reports, Publicly Available Specifications (PAS) and Guides (hereafter referred to as “IEC
Publication(s)”). Their preparation is entrusted to technical committees; any IEC National Committee interested
in the subject dealt with may participate in this preparatory work. International, governmental and non-
governmental organizations liaising with the IEC also participate in this preparation. IEC collaborates closely
with the International Organization for Standardization (ISO) in accordance with conditions determined by
agreement between the two organizations.
2) The formal decisions or agreements of IEC on technical matters express, as nearly as possible, an international
consensus of opinion on the relevant subjects since each technical committee has representation from all
interested IEC National Committees.
3) IEC Publications have the form of recommendations for international use and are accepted by IEC National
Committees in that sense. While all reasonable efforts are made to ensure that the technical content of IEC
Publications is accurate, IEC cannot be held responsible for the way in which they are used or for any
misinterpretation by any end user.
4) In order to promote international uniformity, IEC National Committees undertake to apply IEC Publications
transparently to the maximum extent possible in their national and regional publications. Any divergence
between any IEC Publication and the corresponding national or regional publication shall be clearly indicated in
the latter.
5) IEC itself does not provide any attestation of conformity. Independent certification bodies provide conformity
assessment services and, in some areas, access to IEC marks of conformity. IEC is not responsible for any
services carried out by independent certification bodies.
6) All users should ensure that they have the latest edition of this publication.
7) No liability shall attach to IEC or its directors, employees, servants or agents including individual experts and
members of its technical committees and IEC National Committees for any personal injury, property damage or
other damage of any nature whatsoever, whether direct or indirect, or for costs (including legal fees) and
expenses arising out of the publication, use of, or reliance upon, this IEC Publication or any other IEC
Publications.
8) Attention is drawn to the Normative references cited in this publication. Use of the referenced publications is
indispensable for the correct application of this publication.
9) Attention is drawn to the possibility that some of the elements of this IEC Publication may be the subject of
patent rights. IEC shall not be held responsible for identifying any or all such patent rights.
International Standard IEC 62680-1-7 has been prepared by technical area 18: Multimedia
home systems and applications for end-user networks, of IEC technical committee 100: Audio,
video and multimedia systems and equipment.
The text of this standard was prepared by the USB Implementers Forum (USB-IF). The
structure and editorial rules used in this publication reflect the practice of the organization
which submitted it.
The text of this International Standard is based on the following documents:
CDV             Report on voting
100/3159/CDV    100/3229/RVC
Full information on the voting for the approval of this International Standard can be found in
the report on voting indicated in the above table.
Copyright © 1997-2016 USB Implementers Forum, Inc. All rights reserved.

The committee has decided that the contents of this document will remain unchanged until the
stability date indicated on the IEC website under "http://webstore.iec.ch" in the data related to
the specific document. At this date, the document will be
• reconfirmed,
• withdrawn,
• replaced by a revised edition, or
• amended.
IMPORTANT – The 'colour inside' logo on the cover page of this publication indicates
that it contains colours which are considered to be useful for the correct
understanding of its contents. Users should therefore print this document using a
colour printer.
INTRODUCTION
The IEC 62680 series is based on a series of specifications that were originally developed by the USB
Implementers Forum (USB-IF). These specifications were submitted to the IEC under the auspices of
a special agreement between the IEC and the USB-IF.
This standard is the USB-IF publication USB Device Class Definition for Audio Data Formats Release
3.0.
The USB Implementers Forum, Inc. (USB-IF) is a non-profit corporation founded by the group of
companies that developed the Universal Serial Bus specification. The USB-IF was formed to provide a
support organization and forum for the advancement and adoption of Universal Serial Bus technology.
The Forum facilitates the development of high-quality compatible USB peripherals (devices), and
promotes the benefits of USB and the quality of products that have passed compliance testing.
ANY USB SPECIFICATIONS ARE PROVIDED TO YOU "AS IS," WITH NO WARRANTIES
WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NON-INFRINGEMENT,
OR FITNESS FOR ANY PARTICULAR PURPOSE. THE USB IMPLEMENTERS FORUM AND THE
AUTHORS OF ANY USB SPECIFICATIONS DISCLAIM ALL LIABILITY, INCLUDING LIABILITY
FOR INFRINGEMENT OF ANY PROPRIETARY RIGHTS, RELATING TO USE OR
IMPLEMENTATION OR INFORMATION IN THIS SPECIFICATION.
THE PROVISION OF ANY USB SPECIFICATIONS TO YOU DOES NOT PROVIDE YOU WITH ANY
LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL
PROPERTY RIGHTS.
Entering into USB Adopters Agreements may, however, allow a signing company to participate in a
reciprocal, RAND-Z licensing arrangement for compliant products. For more information, please see:
https://www.usb.org/documents
IEC DOES NOT TAKE ANY POSITION AS TO WHETHER IT IS ADVISABLE FOR YOU TO ENTER
INTO ANY USB ADOPTERS AGREEMENTS OR TO PARTICIPATE IN THE USB IMPLEMENTERS
FORUM.
UNIVERSAL SERIAL BUS
DEVICE CLASS DEFINITION
FOR
AUDIO DATA FORMATS
Release 3.0
September 22, 2016
SCOPE OF THIS RELEASE
This document is the Release 3.0 of this device class definition.
CONTRIBUTORS
Joe Scanlon Advanced Micro Devices
Rhoads Hollowell Apple Inc.
Girault Jones Apple Inc.
Matthew X. Mora Apple Inc.
Tzung-Dar Tsai C-Media Electronics, Inc.
Brad Lambert Cirrus Logic, Inc.
Dan Bogard Conexant Systems, Inc.
Pete Burgers DisplayLink (UK), Ltd.
David Roh Dolby Laboratories, Inc.
Leng Ooi Google, Inc.
Pierre-Louis Bossart Intel Corporation
David Hines Intel Corporation
Abdul Rahman Ismail (Co-Chair) Intel Corporation
Devon Worrell Intel Corporation
Chandrashekhar Rao Logitech, Inc.
Terry Moore MCCI Corporation
Alex Lin MediaTek, Inc.
Bala Sivakumar Microsoft Corporation
Geert Knapen (Co-Chair & Editor) NXP Semiconductors
PL Mobile Audio
411 E. Plumeria drive
San Jose, CA 95134, USA
E-mail: geert.knapen@nxp.com
James Goel Qualcomm, Inc.
Andre Schevciw Qualcomm, Inc.
Jin-Sheng Wang Qualcomm, Inc.
Morten Christiansen Synopsys
REVISION HISTORY
Revision  Date         Filename           Description
1.0       Mar. 18, 98  Frmts10.pdf        Release 1.0
2.0       May. 31, 06  Frmts20 final.pdf  Release 2.0
3.0       Sep. 22, 16  Frmts30.pdf        Release 3.0

Copyright © 1997-2016 USB Implementers Forum, Inc.
All rights reserved.
INTELLECTUAL PROPERTY DISCLAIMER
A LICENSE IS HEREBY GRANTED TO REPRODUCE THIS SPECIFICATION FOR INTERNAL USE ONLY. NO OTHER
LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, IS GRANTED OR INTENDED HEREBY.
USB-IF AND THE AUTHORS OF THIS SPECIFICATION EXPRESSLY DISCLAIM ALL LIABILITY FOR INFRINGEMENT OF
INTELLECTUAL PROPERTY RIGHTS RELATING TO IMPLEMENTATION OF INFORMATION IN THIS SPECIFICATION.
USB-IF AND THE AUTHORS OF THIS SPECIFICATION ALSO DO NOT WARRANT OR REPRESENT THAT SUCH
IMPLEMENTATION(S) WILL NOT INFRINGE THE INTELLECTUAL PROPERTY RIGHTS OF OTHERS.
THIS SPECIFICATION IS PROVIDED “AS IS” AND WITH NO WARRANTIES, EXPRESS OR IMPLIED, STATUTORY OR
OTHERWISE. ALL WARRANTIES ARE EXPRESSLY DISCLAIMED. USB-IF, ITS MEMBERS AND THE AUTHORS OF THIS
SPECIFICATION PROVIDE NO WARRANTY OF MERCHANTABILITY, NO WARRANTY OF NON-INFRINGEMENT, NO
WARRANTY OF FITNESS FOR ANY PARTICULAR PURPOSE, AND NO WARRANTY ARISING OUT OF ANY PROPOSAL,
SPECIFICATION, OR SAMPLE.
IN NO EVENT WILL USB-IF, MEMBERS OR THE AUTHORS BE LIABLE TO ANOTHER FOR THE COST OF PROCURING
SUBSTITUTE GOODS OR SERVICES, LOST PROFITS, LOSS OF USE, LOSS OF DATA OR ANY INCIDENTAL,
CONSEQUENTIAL, INDIRECT, OR SPECIAL DAMAGES, WHETHER UNDER CONTRACT, TORT, WARRANTY, OR
OTHERWISE, ARISING IN ANY WAY OUT OF THE USE OF THIS SPECIFICATION, WHETHER OR NOT SUCH PARTY
HAD ADVANCE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.
NOTE: VARIOUS USB-IF MEMBERS PARTICIPATED IN THE DRAFTING OF THIS SPECIFICATION. CERTAIN OF
THESE MEMBERS MAY HAVE DECLINED TO ENTER INTO A SPECIFIC AGREEMENT LICENSING INTELLECTUAL
PROPERTY RIGHTS THAT MAY BE INFRINGED IN THE IMPLEMENTATION OF THIS SPECIFICATION. PERSONS
IMPLEMENT THIS SPECIFICATION AT THEIR OWN RISK.
Dolby™, AC-3™, Pro Logic™ and Dolby Surround™ are trademarks of Dolby Laboratories, Inc.
All other product names are trademarks, registered trademarks, or service marks of their respective owners.
Please send comments via electronic mail to audio-chair@usb.org

TABLE OF CONTENTS
Scope of This Release
Contributors
Revision History
Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Related Documents
1.2 Terms and Abbreviations
2 Audio Data Formats
2.1 Transfer Delimiter
2.2 Service Interval and Service Interval Packet Definitions
2.3 Simple Audio Data Formats
2.3.1 Type I Formats
2.3.2 Type III Formats
2.3.3 Type IV Formats
2.4 Extended Audio Data Formats
2.4.1 Extended Type I Formats
2.4.2 Extended Type III Formats
2.5 Class-specific AS Interface Descriptor
3 Auxiliary Protocols
3.1 HDCP Protocol
4 Adding New Audio Data Formats
5 Adding New Side Band Protocols
Appendix A. Additional Audio Device Class Codes
A.1 Audio Data Formats Bit Allocations
A.2 SubHeader Codes
A.3 Audio Format General Constants

LIST OF TABLES
Table 2-1: Packetization
Table 2-2: SIPDescriptor Layout
Table 2-3: Class-Specific AS Interface Descriptor
Table 3-1: HDCP SubHeader Layout
Table A-2: Audio Data Formats Bit Allocations in the bmFormats Field and Usage
Table A-3: SubHeader Codes
Table A-4: General Constants

LIST OF FIGURES
Figure 2-1: Type I Audio Stream
Figure 2-2: Extended Type I Format
Figure 2-3: Extended Type III Format
1 INTRODUCTION
The intention of this document is to describe in detail all the Audio Data Formats that are supported by the
Audio Device Class. This document is considered an integral part of the Audio Device Class Specification,
although subsequent revisions of this document are independent of the revision evolution of the main USB
Audio Specification. This is to easily accommodate the addition of new Audio Data Formats without impeding
the core USB Audio Specification.
1.1 RELATED DOCUMENTS
• Universal Serial Bus Specification, Revision 2.0 (referred to in this document as the USB Specification). In
particular, see Chapter 5, “USB Data Flow Model” and Chapter 9, “USB Device Framework.”
• Universal Serial Bus Device Class Definition for Audio Devices (referred to in this document as USB Audio
Device Class).
• Universal Serial Bus Device Class Definition for Terminal Types (referred to in this document as USB Audio
Terminal Types).
• ANSI S1.11-1986 standard.
• MPEG-1 standard ISO/IEC 11172-3 1993. (available from http://www.iso.ch)
• MPEG-2 standard ISO/IEC 13818-3 Feb. 20, 1997. (available from http://www.iso.ch)
• Digital Audio Compression Standard (AC-3), ATSC A/52A Aug. 20, 2001. (available from
http://www.atsc.org)
• Windows Media Audio (WMA) specification. (available from http://www.microsoft.com)
• ANSI/IEEE-754 floating-point standard.
• ISO/IEC 60958 International Standard: Digital Audio Interface and Annexes.
• ISO/IEC 61937 standard.
• ITU G.711 standard.
• ETSI Specification TS 102 114, “DTS Coherent Acoustics; Core and Extensions”. (Available from
http://webapp.etsi.org/action%5CPU/20020827/ts_102114v010101p.pdf)

1.2 TERMS AND ABBREVIATIONS
This section defines terms used throughout this document. For additional terms that pertain to the Universal
Serial Bus, see Chapter 2, “Terms and Abbreviations,” in the USB Specification.
AC-3: Audio compression standard from Dolby Labs.
Audio Slot: A collection of audio subslots, each containing a PCM audio sample of a different physical audio
channel, taken at the same moment in time.
Audio Stream: A concatenation of a potentially very large number of audio slots ordered according to ascending time.
Audio Subslot: Holds a single PCM audio sample.
DTS: Acronym for Digital Theater Systems.
DVD: Acronym for Digital Versatile Disc.
Encoded Audio Bit Stream: A concatenation of a potentially very large number of encoded audio frames, ordered
according to ascending time.
Encoded Audio Frame: A sequence of bits that contains an encoded representation of audio samples from one or more
physical audio channels taken over a fixed period of time.
MPEG: Acronym for Moving Picture Experts Group.
PCM: Acronym for Pulse Code Modulation.
Service Interval: A grouping of USB (micro)frames or Bus Intervals that are related.
Service Interval Packet: A packet that contains all the audio slots that are transferred over the bus during a
Service Interval.
Transfer Delimiter: A unique token that indicates an interruption in an isochronous data packet stream. Can be
either a zero-length data packet or the absence of an isochronous transfer in a certain USB frame.
WMA: Acronym for Windows Media Audio.

2 AUDIO DATA FORMATS
Audio Data Formats can be divided into two main groups:
• Simple Audio Data Formats
• Extended Audio Data Formats
Simple Audio Data Formats can then be subdivided into three groups according to type.
The first group, Type I, deals with audio data streams that are transmitted over USB and are constructed on a
sample-by-sample basis. Each audio sample is represented by a single independent symbol, contained in an
audio subslot. Different compression schemes may be used to transform the audio samples into symbols.
Note: This is different from encoding. Compression is considered to take place on a per-audio-sample
basis. Each audio sample generates one symbol (e.g. A-law compression where a 16-bit audio
sample is compressed into an 8-bit symbol).
If multiple physical audio channels are formatted into a single audio channel cluster, then the samples taken at
time x from successive channels are first placed into audio subslots. These audio subslots are then interleaved,
according to the cluster channel ordering as described in the main USB Audio Specification, and then grouped
into an audio slot. The audio samples, taken at time x+1, are interleaved in the same fashion to generate the
next audio slot and so on. The notion of physical channels is explicitly preserved during transmission. A typical
example of Type I formats is the standard PCM audio data. The following figure illustrates the concept.
Figure 2-1: Type I Audio Stream
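The interleaving described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `interleave_into_slots` and the representation of samples as plain integers are hypothetical, and the channel lists are assumed to be supplied already in cluster channel order.

```python
def interleave_into_slots(channels):
    """Group per-channel sample lists into audio slots.

    `channels` is a list of equal-length sample lists, one per physical
    channel, assumed to be in cluster channel order. Each returned tuple
    models one audio slot: one subslot per channel, all taken at the
    same sampling instant.
    """
    lengths = {len(ch) for ch in channels}
    if len(lengths) != 1:
        raise ValueError("all channels must hold the same number of samples")
    # zip(*channels) walks the channels in lockstep, one sampling instant
    # at a time, which is exactly the slot-by-slot ordering in the text.
    return [tuple(slot) for slot in zip(*channels)]


left = [1, 2, 3]       # samples at times x, x+1, x+2
right = [10, 20, 30]
stream = interleave_into_slots([left, right])
print(stream)
```

Flattening the resulting list of slots in order gives the audio stream as it would be laid out for transmission.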

The second group, Type III, contains Audio Data Formats that use encapsulation as described in the ISO/IEC
61937 standard before being sent over USB. One or more non-PCM encoded audio data streams are packed
into “pseudo-stereo samples” and transmitted as if they were real stereo PCM audio samples. The sampling
frequency of these pseudo samples (transport sampling frequency, as reported by the Clock Frequency Control
of the associated Clock Source Entity) either matches the sampling frequency of the original non-encoded PCM
audio data streams (native sampling frequency) or there is an integer ratio relationship between them.
Therefore, clock recovery at the receiving end is relatively easy. The drawback is that unless multiple non-PCM
encoded streams are packed into one pseudo stereo stream, more bandwidth than necessary is consumed.
The third group, Type IV, deals with audio streams that are not transmitted over USB. Instead, they interface
with the Audio Function through an AudioStreaming interface that does not contain a USB isochronous IN or
OUT endpoint. These streams typically connect via a digital interface like S/PDIF (or some other means of
connectivity) and may require interaction from the Host before they enter or leave the Audio Function. A
typical example is an external S/PDIF connector that can accept an AC-3 encoded audio stream. This stream is
first processed by an AC-3 decoder before the (decoded) logical audio channels enter the Audio Function
through the Input Terminal that represents this S/PDIF connection.
In addition to the Simple Audio Data Formats described above, Extended Audio Data Formats are defined.
These are based on the Simple Audio Data Formats Type I and III definitions but they provide an optional
packet header and for the Extended Audio Data Format Type I, an optional synchronous (i.e. sample accurate)
control channel. Type IV Audio Data Formats do not have an Extended Audio Data Format definition.
The following sections explain the different Audio Data Formats and Format Types in more detail.
2.1 TRANSFER DELIMITER
Isochronous data streams are continuous in nature, although the actual number of bytes sent per packet may
vary throughout the lifetime of the stream (for rate adaptation purposes for instance). To indicate a temporary
stop in the isochronous data stream without closing the pipe (and thus relinquishing the USB bandwidth), an in-
band Transfer Delimiter needs to be defined. This specification considers two situations to be a Transfer
Delimiter. The first is a zero-length data packet and the second is the absence of an isochronous transfer in a
USB (micro)frame that would normally have an isochronous transfer. Both situations are considered equivalent
and the Audio Function is expected to behave the same. However, the second type consumes less isochronous
USB bandwidth (i.e. zero bandwidth). In both cases, this specification considers a Transfer Delimiter to be an
entity that can be sent over the USB.
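As a rough illustration of treating both situations identically, the sketch below models a received isochronous stream as a list in which `None` stands for a (micro)frame with no transfer and `b""` stands for a zero-length data packet. This modelling, and the names `is_transfer_delimiter` and `split_on_delimiters`, are assumptions made for the example and are not defined by the specification.

```python
from typing import List, Optional


def is_transfer_delimiter(payload: Optional[bytes]) -> bool:
    """True for a missing transfer (None) or a zero-length packet (b"")."""
    return payload is None or len(payload) == 0


def split_on_delimiters(payloads: List[Optional[bytes]]) -> List[List[bytes]]:
    """Group consecutive data packets into runs separated by delimiters."""
    runs, current = [], []
    for payload in payloads:
        if is_transfer_delimiter(payload):
            if current:            # a delimiter closes the current run
                runs.append(current)
                current = []
        else:
            current.append(payload)
    if current:
        runs.append(current)
    return runs


received = [b"ab", b"", b"cd", None, None, b"ef"]
print(split_on_delimiters(received))
```

Both kinds of delimiter end a run in the same way, mirroring the statement that the Audio Function is expected to behave the same in either case.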
2.2 SERVICE INTERVAL AND SERVICE INTERVAL PACKET DEFINITIONS
Note: The USB Audio 2.0 Specification used the terms Virtual Frame and Virtual Frame Packet to
describe the same concepts that were called Service Interval and Service Interval Packet in the
USB 3.0 and higher Specifications. In this specification, the terminology has been consolidated to
use the more general terms of Service Interval and Service Interval Packet instead of the USB
Audio 2.0-specific terminology of Virtual Frame and Virtual Frame Packet. Also, the term Bus
Interval is used instead of USB (micro)frame, wherever applicable.
In the following paragraphs, the packetizing process for audio is described in terms of Service Interval and
Service Interval Packets. This provides a consistent model of ‘one Service Interval Packet (SIP) per Service
Interval (SI)’, irrespective of the actual transactions on the USB and the version of USB used.
A Service Interval is defined as:
$$\text{Service Interval} = \text{Bus Interval} \times 2^{(bInterval - 1)}$$
where Bus Interval has a value of 1 ms for full-speed isochronous endpoints and 125 µs for high-speed,
SuperSpeed, and Enhanced SuperSpeed isochronous endpoints and where bInterval is the value specified in the
bInterval field of the standard endpoint descriptor.
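The definition above can be evaluated with a short Python sketch. The speed labels used as dictionary keys and the function name `service_interval_us` are hypothetical; only the Bus Interval durations and the power-of-two relationship come from the text.

```python
# Bus Interval durations in microseconds, as stated in the text above.
# The string keys are illustrative labels, not identifiers from the specification.
BUS_INTERVAL_US = {
    "full-speed": 1000,            # 1 ms
    "high-speed": 125,             # 125 us
    "superspeed": 125,
    "enhanced-superspeed": 125,
}


def service_interval_us(speed: str, b_interval: int) -> int:
    """Return Bus Interval * 2^(bInterval - 1) in microseconds."""
    if b_interval < 1:
        raise ValueError("bInterval must be at least 1")
    return BUS_INTERVAL_US[speed] * 2 ** (b_interval - 1)


print(service_interval_us("full-speed", 1))   # one 1 ms frame
print(service_interval_us("high-speed", 4))   # eight 125 us microframes
```

Under these definitions a high-speed endpoint with bInterval = 4 and a full-speed endpoint with bInterval = 1 both yield a 1 ms Service Interval.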
A Service Interval Packet is defined as the amount of isochronous data that is transported during an entire
Service Interval. For high-speed high-bandwidth endpoints, the Service Interval Packet is the concatenation of
the two or three physical packets that are transferred over the bus in a Bus Interval.
Note: The USB Specification already considers the 2 or 3 transactions of a high-speed high-bandwidth transfer
to be part of a single packet.
2.3 SIMPLE AUDIO DATA FORMATS
2.3.1 TYPE I FORMATS
The following sections describe the Audio Data Formats that belong to Type I. A number of terms and their
definitions are presented.
2.3.1.1 USB PACKETS
Audio data streams that are inherently continuous shall be packetized when sent over the USB. The quality of
the packetizing algorithm directly influences the amount of effort needed to reconstruct a reliable sample clock
at the receiving side.
Furthermore, the chosen size of the Service Interval has a direct impact on the amount of buffer memory
needed on both sides of the pipe and also on the incurred latency over the pipe. Shorter Service Intervals
minimize buffer requirements and therefore also latency at the potential expense of higher power
consumption. Indeed, longer Service Intervals potentially allow the bus (and parts of the sender and receiver’s
hardware) to enter lower power states for longer periods of time, thus conserving more power.
This specification defines two possible modes to transport audio data streams over the USB:
• Continuous Mode
• Burst Mode
Continuous Mode occurs when the Service Interval is smaller than or equal to one USB Frame time of 1 ms. This
mode minimizes buffer requirements and latency at the potential expense of higher power consumption. Note
however that choosing a Service Interval that is smaller than one USB Frame time may result in excessive
system level interrupts.
Burst Mode occurs when the Service Interval is larger than one USB Frame time. This mode provides for
opportunities to save more power by allowing various system components to enter low power states for
extended periods of time at the expense of larger buffer sizes and increased latency.
Devices may choose to expose multiple Alternate Settings of their AudioStreaming interface(s) with different
Service Interval settings for each Alternate Setting, thus allowing the Host to choose a setting that best fits the
desired use case. However, all devices shall expose at least one Alternate Setting (besides the zero bandwidth
Alternate Setting 0) that supports Continuous Mode (Service Interval <= 1 ms).
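A minimal sketch of the mode distinction and of the "at least one Continuous Mode Alternate Setting" requirement follows. The function names and the idea of passing Service Intervals as a list of microsecond values (one per non-zero Alternate Setting) are assumptions made for illustration.

```python
USB_FRAME_US = 1000  # one USB Frame time, 1 ms


def transport_mode(service_interval_us: int) -> str:
    """Classify a Service Interval as Continuous Mode or Burst Mode."""
    return "continuous" if service_interval_us <= USB_FRAME_US else "burst"


def exposes_continuous_mode(alt_setting_intervals_us) -> bool:
    """Check the requirement on the non-zero Alternate Settings.

    `alt_setting_intervals_us` holds the Service Interval (in microseconds)
    of every Alternate Setting except the zero-bandwidth Alternate Setting 0.
    """
    return any(transport_mode(si) == "continuous" for si in alt_setting_intervals_us)


print(transport_mode(125))                    # short interval
print(transport_mode(8000))                   # long interval
print(exposes_continuous_mode([1000, 8000]))  # one continuous setting present
```

A device offering only settings longer than 1 ms would fail this check, which is the situation the final sentence above rules out.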
2.3.1.1.1 SERVICE INTERVAL PACKET SIZE CALCULATION
The goal shall be to keep the instantaneous number of audio slots per SI, $n_i$, as close as possible to the average
number of audio slots per SI, $n_{av}$. The average $n_{av}$ shall be calculated as follows:
$$n_{av} = \frac{T_{SI}}{\Delta t}$$
where $T_{SI}$ is the duration of an SI and $\Delta t$ is the sample time ($1/F_s$). In most cases, $n_{av}$ will be a
number with a fractional part.
If the sampling rate is a constant, the allowable variation on $n_i$ is limited to one audio slot, that is,
$\Delta n_i = 1$. This implies that all SIPs shall either contain $INT(n_{av})$ audio slots (small SIP) or
$INT(n_{av}) + 1$ (large SIP) audio slots. For all $i$:
$$n_i = INT(n_{av}) \mid INT(n_{av}) + 1$$
Note: In the case where $n_{av} = INT(n_{av})$, $n_i$ may vary between $INT(n_{av}) - 1$ (small SIP), $INT(n_{av})$
(medium SIP) and $INT(n_{av}) + 1$ (large SIP).
Furthermore, a large SIP shall be generated as soon as it becomes available. Typically, a source will generate a
number of small SIPs as long as the accumulated fractional part of $n_{av}$ remains < 1. Once the accumulated
fractional part of $n_{av}$ becomes ≥ 1, the source shall send a large SIP and decrement the accumulator by 1.
Due to possible different notions of time in the source and the sink (they might each have their own
independent sampling clock), the (small SIP)/(large SIP) pattern generated by the source may be different from
what the sink expects. Therefore, the sink shall be capable of accepting a large SIP at all times.
Example:
Assume $F_s$ = 44,100 Hz and $T_{SI}$ = 1 ms. Then $n_{av}$ = 44.1 audio slots. Since the source can only send an
integer number of audio slots per SI, it will send small SIPs of 44 audio slots. Each SI, it therefore sends ‘0.1 slot’
too few and it will accumulate this fractional part in an accumulator. After having sent 9 small SIPs of 44 audio
slots, at the tenth SI it will have exactly one audio slot in excess and therefore can send a large SIP containing
45 audio slots. Decrementing the accumulator by 1 brings it back to 0 and the process can start all over again.
The source will thus produce a repetitive pattern of 9 small SIPs of 44 audio slots followed by 1 large SIP of 45
audio slots. The following table illustrates the process:
Table 2-1: Packetization
#SI 𝒏𝒏 𝒏𝒏 Fraction Accumulator
𝒂𝒂𝒂𝒂 𝒊𝒊
n 44.1 44 0.1 0.1
n+1 44.1 44 0.1 0.2
n+2 44.1 44 0.1 0.3
n+3 44.1 44 0.1 0.4
n+4 44.1 44 0.1 0.5
n+5 44.1 44 0.1 0.6
n+6 44.1 44 0.1 0.7
n+7 44.1 44 0.1 0.8
n+8 44.1 44 0.1 0.9
n+9 44.1 45 0.1 1.0 -> 0
n+10 44.1 44 0.1 0.1
n+11 44.1 44 0.1 0.2
… … … … …
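The accumulator process shown in Table 2-1 can be sketched as follows. This is a minimal illustration, not a normative implementation: the function name `sip_sizes` and its parameters are hypothetical, and the fractional part is tracked in integer units (remainder of the division) to avoid floating-point rounding.

```python
def sip_sizes(sample_rate_hz, intervals_per_second, count):
    """Return the number of audio slots in each of `count` consecutive SIPs.

    Hypothetical helper: the fractional part of n_av is accumulated as an
    integer remainder, so 0.1 slot at 44.1 kHz / 1 ms is stored as 100/1000.
    """
    base, remainder = divmod(sample_rate_hz, intervals_per_second)
    accumulator = 0
    sizes = []
    for _ in range(count):
        accumulator += remainder
        if accumulator >= intervals_per_second:
            # Accumulated fraction reached one whole slot: send a large SIP
            # and decrement the accumulator by one slot.
            sizes.append(base + 1)
            accumulator -= intervals_per_second
        else:
            sizes.append(base)  # small SIP
    return sizes


# 44,100 Hz with a 1 ms SI (1000 intervals per second), as in the example.
print(sip_sizes(44100, 1000, 12))
```

For the values of the example, the sketch yields nine SIPs of 44 audio slots followed by one SIP of 45 audio slots, after which the pattern repeats.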
2.3.1.2 PITCH CONTROL
If the sampling rate can be varied (to implement pitch control), the allowable variation on $n_i$ is limited to one
audio slot per SI. For all $i$:
$n_{i+1} = n_i \mid n_i \pm 1$
Pitch control is restricted to adaptive endpoints only. AudioStreaming interfaces that support pitch control on
their isochronous endpoint are required to report this in the class-specific endpoint descriptor. In addition, a
Set/Get Pitch Control request is required to enable or disable the pitch control functionality.
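The limit of one audio slot per SI can be checked on a sequence of SIP sizes as sketched below. The function name `respects_pitch_limit` is hypothetical and only illustrates the constraint on consecutive SIPs.

```python
def respects_pitch_limit(slot_counts):
    """Return True if consecutive SIP sizes differ by at most one audio slot."""
    return all(abs(later - earlier) <= 1
               for earlier, later in zip(slot_counts, slot_counts[1:]))


print(respects_pitch_limit([44, 44, 45, 46, 46, 45]))  # gradual variation
print(respects_pitch_limit([44, 46, 45]))              # jump of two slots
```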
2.3.1.3 AUDIO SUBSLOT
The basic structure used to represent audio data is the audio subslot. An audio subslot holds a single audio
sample. An audio subslot always contains an integer number of bytes.
This specification limits the possible audio subslot sizes (bSubslotSize) to 1, 2, 3, 4, or 8 bytes per audio subslot.
An audio sample is represented using a number of bits (bBitResolution) less than or equal to the total number
of bits available in the audio subslot, i.e. bBitResolution ≤ bSubslotSize*8.
AudioStreaming endpoints shall be constructed in such a way that a valid transfer can take place as long as the
reported audio subslot size (bSubslotSize) is respected during transmission. If the reported bits per sample
(bBitResolution) do not correspond with the number of significant bits actually used during transfer, the device
will either discard trailing significant bits ([actual_bits_per_sample] > bBitResolution) or interpret trailing
zeroes as significant bits ([actual_bits_per_sample] < bBitResolution).
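The relationship between bSubslotSize and bBitResolution, and the effect of discarding trailing bits, can be illustrated as follows. This is a sketch under assumptions: the helper names are hypothetical, the subslot content is handled as an unsigned bit pattern, and the sample is assumed to be left-justified in the subslot.

```python
ALLOWED_SUBSLOT_SIZES = (1, 2, 3, 4, 8)  # bytes per audio subslot


def check_format(subslot_size, bit_resolution):
    """Raise ValueError if the reported sizes violate the limits stated above."""
    if subslot_size not in ALLOWED_SUBSLOT_SIZES:
        raise ValueError(f"unsupported bSubslotSize: {subslot_size}")
    if bit_resolution > subslot_size * 8:
        raise ValueError("bBitResolution exceeds bSubslotSize * 8")


def keep_reported_bits(subslot_value, subslot_size, bit_resolution):
    """Keep only the bit_resolution most significant bits of a subslot.

    Models a device that discards trailing significant bits when more bits
    are sent than bBitResolution reports (left-justified data assumed).
    """
    check_format(subslot_size, bit_resolution)
    unused_bits = subslot_size * 8 - bit_resolution
    return (subslot_value >> unused_bits) << unused_bits


# A 3-byte subslot carrying 24 significant bits, reported as 16-bit resolution.
print(hex(keep_reported_bits(0x123456, 3, 16)))
```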
2.3.1.4 AUDIO SLOT
An audio slot consists of a collection of audio subslots, each containing an audio sample of a different physical
audio channel, taken at the same moment in time. The number of audio subslots in an audio slot equals the
number of logical audio channels in the audio channel cluster. The ordering of the audio subslots in the audio
slot obeys the rules set forth in the USB Audio Specification. All audio subslots shall have the same audio
subslot size.
2.3.1.5 AUDIO STREAMS
An audio stream is a concatenation of a potentially very large number of audio slots, ordered according to
ascending time. Streams are packetized when transported over USB whereby SIPs can only contain an integer
number of audio slots. Each packet always starts with the same channel, and the channel order is respected
throughout the entire transmission. If, for any reason, there are no audio slots available to construct a SIP, a
Transfer Delimiter shall be sent instead.
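The layout of audio slots within a SIP can be sketched as follows. The function name `build_sip` is hypothetical, and little-endian byte order within a subslot is an assumption of this sketch. Samples are assumed to be already left-justified integers, and the Transfer Delimiter case is not modelled.

```python
def build_sip(audio_slots, subslot_size):
    """Serialize a list of audio slots into one SIP payload.

    audio_slots: list of tuples, one signed integer sample per channel,
                 always in the same channel order.
    subslot_size: bytes per audio subslot (identical for all subslots).
    """
    payload = bytearray()
    for slot in audio_slots:          # slots in ascending time order
        for sample in slot:           # channel order is fixed within a slot
            # Little-endian byte order is an assumption of this sketch.
            payload += sample.to_bytes(subslot_size, "little", signed=True)
    return bytes(payload)


# Two stereo audio slots with 2-byte subslots.
packet = build_sip([(1, -1), (2, -2)], subslot_size=2)
print(packet.hex(), len(packet))
```

The payload length is always the number of audio slots times the number of channels times the subslot size, so a SIP carries an integer number of audio slots.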
2.3.1.6 TYPE I SUPPORTED FORMATS
The following paragraphs list all currently supported Type I Audio Data Formats. The bit allocations in the
bmFormats field of the class-specific AS interface descriptor for the different Type I Audio Data Formats can be
found in Appendix A.1, “Audio Data Formats Bit Allocations.” Note that an Alternate Setting of an
AudioStreaming interface can only support one Type I Audio Data Format. Consequently, only one Type I Audio
Data Format bit shall be set in the bmFormats field of the class-specific AS interface descriptor. It is allowed to
support one or more Type III Audio Data Formats in the same AudioStreaming interface if the interface is able
to independently distinguish between the Type I Audio Data Format and any other Type III Audio Data Format.
2.3.1.6.1 PCM FORMAT
The PCM (Pulse Code Modulation) format is the most commonly used audio format to represent audio data
streams. The audio data is not compressed and uses a signed two’s-complement fixed point format. It is left-
justified (the sign bit is the Msb) and data is padded with trailing zeroes to fill the remaining unused bits of the
subslot. The binary point is located to the right of the sign bit so that all values lie within the range [-1, +1).
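The left-justified fixed-point representation can be sketched as follows. The function name `encode_pcm`, the rounding and clamping behaviour, and the little-endian byte order are assumptions of this sketch and are not taken from this specification.

```python
def encode_pcm(value, bit_resolution, subslot_size):
    """Encode a value in [-1, +1) as a left-justified two's-complement subslot."""
    if not -1.0 <= value < 1.0:
        raise ValueError("value must lie within [-1, +1)")
    scale = 1 << (bit_resolution - 1)      # binary point right of the sign bit
    quantized = round(value * scale)       # rounding mode is an assumption
    quantized = min(quantized, scale - 1)  # keep values just below +1 in range
    # Left-justify: pad with trailing zero bits up to the subslot width.
    padded = quantized << (subslot_size * 8 - bit_resolution)
    # Little-endian byte order is an assumption of this sketch.
    return padded.to_bytes(subslot_size, "little", signed=True)


print(encode_pcm(0.5, 24, 4).hex())    # 24-bit sample in a 4-byte subslot
print(encode_pcm(-1.0, 16, 2).hex())   # most negative 16-bit value
```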
2.3.1.6.2 PCM8 FORMAT
The PCM8 format is introduced to be compatible with the legacy 8-bit wave format. Audio data is
uncompressed and uses 8 bits per sample (bBitResolution = 8). In this case, data is unsigned fixed-point, left-
justified in the audio subslot, Msb first. The range is [0,255].
2.3.1.6.3 IEEE_FLOAT FORMAT
The IEEE_FLOAT format is based on the ANSI/IEEE-754 floating-point standard. Audio data is represented using
the basic single-precision format. The basic single-precision number is 32 bits wide and has an 8-bit exponent
and a 24-bit mantissa. Both mantissa and exponent are signed numbers, but neither is represented in two's-
complement format. The mantissa is stored in sign magnitude format and the exponent in biased form (also
called excess-n form). In biased form, there is a positive integer (called the bias) which is subtracted from the
stored number to get the actual number. For example, in an eight-bit exponent, the bias is 127. To represent 0,
the number 127 is stored. To represent -100, 27 is stored. An exponent of all zeroes and an exponent of all
ones are both reserved for special cases, so in an eight-bit field, exponents of -126 to +127 are possible. In the
basic floating-point format, the mantissa is assumed to be normalized so that the most significant bit is always
one, and therefore is not stored. Only the fractional part is stored. Denormalized (exponent = 0) values are
considered to be zero.
The 32-bit IEEE-754 floating-point word is broken into three fields. The most significant bit stores the sign of
the mantissa, the next group of 8 bits stores the exponent in biased form, and the remaining 23 bits store the
magnitude of the fractional portion of the mantissa. For further information, refer to the ANSI/IEEE-754
standard.
The data is conveyed over USB using 32 bits per sample (bBitResolution = 32; bSubslotSize = 4).
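The three fields of the 32-bit word can be extracted as sketched below. The function names are hypothetical. Denormalized values are decoded as zero, following the text above, and the reserved all-ones exponent is rejected rather than interpreted.

```python
import struct


def split_float32(value):
    """Return (sign, stored_exponent, fraction) of a single-precision value."""
    (word,) = struct.unpack("<I", struct.pack("<f", value))
    sign = word >> 31                       # most significant bit
    stored_exponent = (word >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = word & 0x7FFFFF              # 23 stored mantissa bits
    return sign, stored_exponent, fraction


def decode_float32(sign, stored_exponent, fraction):
    """Rebuild the value from its fields; denormals are treated as zero."""
    if stored_exponent == 0:
        return 0.0
    if stored_exponent == 0xFF:
        raise ValueError("exponent of all ones is reserved for special cases")
    mantissa = 1 + fraction / 2**23         # implicit leading one restored
    return (-1) ** sign * mantissa * 2.0 ** (stored_exponent - 127)


fields = split_float32(-0.15625)            # -1.25 * 2**-3
print(fields, decode_float32(*fields))
```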
2.3.1.6.4 ALAW FORMAT AND µLAW FORMAT
Starting from 12- or 16-bit linear PCM samples, simple compression down to 8 bits per sample (one byte per
sample) can be achieved by using logarithmic companding. The compressed audio data uses 8 bits per sample
(bBitsPerSample = 8). Data is signed fixed point, left-justified in the subslot, Msb first. The compressed range is
[-128, 127]. The difference between A-law and µ-law compression lies in the formulae used to achieve the
compression. Refer to the ITU-T G.711 standard for further details.
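The difference between the two companding laws can be illustrated with their commonly cited continuous curves, using µ = 255 and A = 87.6. This sketch is an illustration only: G.711 defines the actual 8-bit code words through segmented approximations of these curves, and that standard remains the reference. The function names are hypothetical.

```python
import math

MU = 255.0   # µ-law parameter commonly associated with G.711
A = 87.6     # A-law parameter commonly associated with G.711


def mu_law_compress(x):
    """Continuous µ-law curve for x in [-1, +1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def a_law_compress(x):
    """Continuous A-law curve for x in [-1, +1]."""
    magnitude = abs(x)
    if magnitude < 1.0 / A:
        y = A * magnitude / (1.0 + math.log(A))                 # linear segment
    else:
        y = (1.0 + math.log(A * magnitude)) / (1.0 + math.log(A))  # log segment
    return math.copysign(y, x)


for level in (0.01, 0.1, 0.5, 1.0):
    print(level, round(mu_law_compress(level), 4), round(a_law_compress(level), 4))
```

Both curves expand small amplitudes and compress large ones, which is why 8 bits per sample suffice for the compressed data.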
2.3.1.6.5 DSD FORMAT
The Direct-Stream Digital (DSD) format uses pulse-density modulation encoding—a technology primarily used
to store audio signals on SACD (Super Audio CD) digital storage media.
Audio data is stored as single-bit delta-sigma modulated digital audio; i.e. a sequence of single-bit values at a
sampling rate of 2.8224 MHz (64 times the CD audio sampling rate of 44.1 kHz) for basic sampling rate DSD64,
5.6448 MHz for DSD128 (2X-rate DSD), 11.2896 MHz for DSD256 (4X-rate DSD), 22.5792 MHz for DSD512 (8X-rate DSD), and 45.1584 MHz for DSD1024 (16X-rate DSD). 48 kHz-based DSD streams also exist. In
that case, the bitstream sampling rates are 3.072 MHz, 6.144 MHz, 12.288 MHz, 24.576 MHz, and 49.152 MHz,
respectively.
No matter what sampling rate the DSD stream uses, the audio subslot size is fixed to 64 bits (bBitResolution =
64; bSubslotSize = 8) so that, at the transport layer, the DSD stream always looks like 64-bit PCM data.
Therefore, the transport sampling rate of the DSD stream, packetized as 64-bit PCM samples, is always 1/64 of
the DSD sampling rate and the Clock Source
...
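The relationship between the DSD bitstream rate and the transport sampling rate (1/64 of the DSD sampling rate) can be checked numerically as sketched below. The function names are hypothetical and the sketch only reproduces the arithmetic stated above.

```python
BITS_PER_SUBSLOT = 64  # DSD audio subslot size is fixed to 64 bits


def dsd_rate_hz(base_rate_hz, multiplier):
    """DSD bitstream rate for a base rate (44,100 or 48,000 Hz) and multiplier."""
    return base_rate_hz * multiplier


def transport_rate_hz(dsd_bit_rate_hz):
    """Transport sampling rate when the stream is packetized as 64-bit samples."""
    if dsd_bit_rate_hz % BITS_PER_SUBSLOT:
        raise ValueError("DSD rate is not a multiple of 64")
    return dsd_bit_rate_hz // BITS_PER_SUBSLOT


for multiplier in (64, 128, 256, 512, 1024):
    rate = dsd_rate_hz(44_100, multiplier)
    print(f"DSD{multiplier}: {rate / 1e6:.4f} MHz -> "
          f"{transport_rate_hz(rate)} Hz transport rate")
```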
