SIST ISO/IEC 13818-2:2005
(Main) Information technology - Generic coding of moving pictures and associated audio information: Video
Information technology - Generic coding of moving pictures and associated audio information: Video
Technologies de l'information - Codage générique des images animées et du son associé: Données vidéo
Informacijska tehnologija - Splošno kodiranje gibljivih slik in pripadajočih avdio informacij: Video
This Recommendation | International Standard specifies the coded representation of picture information for digital storage media and digital video communication and specifies the decoding process. The representation supports constant bitrate transmission, variable bitrate transmission, random access, channel hopping, scalable decoding, bitstream editing, as well as special functions such as fast forward playback, fast reverse playback, slow motion, pause and still pictures. This Recommendation | International Standard is forward compatible with ISO/IEC 11172-2 and upward or downward compatible with EDTV, HDTV and SDTV formats. It is primarily applicable to digital storage media, video broadcast and communication. The storage media may be directly connected to the decoder, or connected via communications means such as busses, LANs, or telecommunications links.
General Information
- Status
- Withdrawn
- Publication Date
- 30-Nov-2005
- Withdrawal Date
- 24-Jul-2018
- Technical Committee
- ITC - Information technology
- Current Stage
- 9900 - Withdrawal (Adopted Project)
- Start Date
- 25-Jul-2018
- Due Date
- 17-Aug-2018
- Completion Date
- 25-Jul-2018
Relations
- Effective Date
- 01-Sep-2018
- Effective Date
- 06-Jun-2022
- Effective Date
- 06-Jun-2022
- Effective Date
- 06-Jun-2022
- Effective Date
- 06-Jun-2022
- Effective Date
- 01-Oct-2010
- Effective Date
- 01-Oct-2010
- Effective Date
- 01-Oct-2010
ISO/IEC 13818-2:2000 - Information technology -- Generic coding of moving pictures and associated audio information: Video
ISO/IEC 13818-2:2000 - Technologies de l'information -- Codage générique des images animées et du son associé: Données vidéo
Frequently Asked Questions
SIST ISO/IEC 13818-2:2005 is a standard published by the Slovenian Institute for Standardization (SIST). Its full title is "Information technology - Generic coding of moving pictures and associated audio information: Video". The standard specifies the coded representation of picture information for digital storage media and digital video communication, together with the corresponding decoding process.
SIST ISO/IEC 13818-2:2005 is classified under the following ICS (International Classification for Standards) categories: 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.
SIST ISO/IEC 13818-2:2005 has the following relationships with other standards: it has inter-standard links to SIST ISO/IEC 13818-2:2018 and to SIST ISO/IEC 13818-2:2005/Amd 1:2010, SIST ISO/IEC 13818-2:2005/Amd 2:2010 and SIST ISO/IEC 13818-2:2005/Amd 3:2010, and it is amended by SIST ISO/IEC 13818-2:2005/Amd 1:2010, SIST ISO/IEC 13818-2:2005/Amd 2:2010 and SIST ISO/IEC 13818-2:2005/Amd 3:2010. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
You can purchase SIST ISO/IEC 13818-2:2005 directly from iTeh Standards. The document is available in PDF format and is delivered instantly after payment. Add the standard to your cart and complete the secure checkout process. iTeh Standards is an authorized distributor of SIST standards.
Standards Content (Sample)
INTERNATIONAL STANDARD ISO/IEC 13818-2
Second edition
2000-12-15
Information technology — Generic coding
of moving pictures and associated audio
information: Video
Technologies de l'information — Codage générique des images animées et
du son associé: Données vidéo
Reference number: ISO/IEC 13818-2:2000(E)
© ISO/IEC 2000
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not
be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this
file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this
area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters
were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event
that a problem relating to it is found, please inform the Central Secretariat at the address given below.
© ISO/IEC 2000
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic
or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body
in the country of the requester.
ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland
CONTENTS
Page
Intro. 1 Purpose. vi
Intro. 2 Application. vi
Intro. 3 Profiles and levels . vi
Intro. 4 The scalable and the non-scalable syntax . vii
1 Scope . 1
2 Normative references. 1
3 Definitions . 2
4 Abbreviations and symbols. 7
4.1 Arithmetic operators. 7
4.2 Logical operators. 8
4.3 Relational operators. 8
4.4 Bitwise operators. 8
4.5 Assignment. 8
4.6 Mnemonics . 8
4.7 Constants . 9
5 Conventions . 9
5.1 Method of describing bitstream syntax. 9
5.2 Definition of functions . 10
5.3 Reserved, forbidden and marker_bit. 10
5.4 Arithmetic precision . 11
6 Video bitstream syntax and semantics. 11
6.1 Structure of coded video data . 11
6.2 Video bitstream syntax . 21
6.3 Video bitstream semantics. 36
7 The video decoding process. 61
7.1 Higher syntactic structures . 61
7.2 Variable length decoding. 62
7.3 Inverse scan . 64
7.4 Inverse quantisation. 66
7.5 Inverse DCT . 69
7.6 Motion compensation . 69
7.7 Spatial scalability. 83
7.8 SNR scalability. 92
7.9 Temporal scalability . 99
7.10 Data partitioning. 102
7.11 Hybrid scalability . 103
7.12 Output of the decoding process . 104
8 Profiles and levels. 106
8.1 ISO/IEC 11172-2 compatibility. 109
8.2 Relationship between defined profiles. 109
8.3 Relationship between defined levels . 111
8.4 Scalable layers. 111
8.5 Parameter values for defined profiles, levels and layers. 114
8.6 Compatibility requirements on decoders. 115
9 Registration of Copyright Identifiers . 117
9.1 General . 117
9.2 Implementation of a Registration Authority (RA). 118
Annex A – Inverse discrete transform . 119
Annex B – Variable length code tables. 121
B.1 Macroblock addressing. 121
B.2 Macroblock type. 122
B.3 Macroblock pattern. 127
B.4 Motion vectors. 128
B.5 DCT coefficients . 129
Annex C – Video buffering verifier . 138
Annex D – Features supported by the algorithm. 143
D.1 Overview . 143
D.2 Video formats . 143
D.3 Picture quality. 144
D.4 Data rate control . 144
D.5 Low delay mode . 144
D.6 Random access/channel hopping. 145
D.7 Scalability. 145
D.8 Compatibility. 151
D.9 Differences between this Specification and ISO/IEC 11172-2. 151
D.10 Complexity . 154
D.11 Editing encoded bitstreams. 154
D.12 Trick modes. 154
D.13 Error resilience . 155
D.14 Concatenated sequences . 162
Annex E – Profile and level restrictions . 163
E.1 Syntax element restrictions in profiles . 163
E.2 Permissible layer combinations . 175
Annex F – Bibliography. 197
Annex G – Registration Procedure . 198
G.1 Procedure for the request of a Registered Identifier (RID). 198
G.2 Responsibilities of the Registration Authority. 198
G.3 Responsibilities of parties requesting an RID. 198
G.4 Appeal procedure for denied applications . 199
Annex H – Registration Application Form . 200
H.1 Contact information of organization requesting a Registered Identifier (RID) . 200
H.2 Statement of an intention to apply the assigned RID. 200
H.3 Date of intended implementation of the RID. 200
H.4 Authorized representative. 200
H.5 For official use only of the Registration Authority. 200
Annex J – 4:2:2 Profile test results . 202
J.1 Introduction . 202
Annex K – Patents. 207
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission)
form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC
participate in the development of International Standards through technical committees established by the
respective organization to deal with particular fields of technical activity. ISO and IEC technical committees
collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in
liaison with ISO and IEC, also take part in the work.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.
In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this part of ISO/IEC 13818 may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
International Standard ISO/IEC 13818-2 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information
technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information, in
collaboration with ITU-T. The identical text is published as ITU-T Rec. H.262.
This second edition cancels and replaces the first edition (ISO/IEC 13818-2:1996), which has been technically
revised.
ISO/IEC 13818 consists of the following parts, under the general title Information technology — Generic coding of
moving pictures and associated audio information:
– Part 1: Systems
– Part 2: Video
– Part 3: Audio
– Part 4: Conformance testing
– Part 5: Software simulation
– Part 6: Extensions for DSM-CC
– Part 7: Advanced Audio Coding (AAC)
– Part 9: Extension for real time interface for systems decoders
– Part 10: Conformance extensions for Digital Storage Media Command and Control (DSM-CC)
Annexes A, B and C form a normative part of this part of ISO/IEC 13818. Annexes D to K are for information only.
Introduction
Intro. 1 Purpose
This Part of this Recommendation | International Standard was developed in response to the growing need for a generic
coding method of moving pictures and of associated sound for various applications such as digital storage media,
television broadcasting and communication. The use of this Specification means that motion video can be manipulated as
a form of computer data and can be stored on various storage media, transmitted and received over existing and future
networks and distributed on existing and future broadcasting channels.
Intro. 2 Application
The applications of this Specification cover, but are not limited to, such areas as listed below:
BSS Broadcasting Satellite Service (to the home)
CATV Cable TV Distribution on optical networks, copper, etc.
CDAD Cable Digital Audio Distribution
DSB Digital Sound Broadcasting (terrestrial and satellite broadcasting)
DTTB Digital Terrestrial Television Broadcasting
EC Electronic Cinema
ENG Electronic News Gathering (including SNG, Satellite News Gathering)
FSS Fixed Satellite Service (e.g. to head ends)
HTT Home Television Theatre
IPC Interpersonal Communications (videoconferencing, videophone, etc.)
ISM Interactive Storage Media (optical disks, etc.)
MMM Multimedia Mailing
NCA News and Current Affairs
NDB Networked Database Services (via ATM, etc.)
RVS Remote Video Surveillance
SSM Serial Storage Media (digital VTR, etc.)
Intro. 3 Profiles and levels
This Specification is intended to be generic in the sense that it serves a wide range of applications, bitrates, resolutions,
qualities and services. Applications should cover, among other things, digital storage media, television broadcasting and
communications. In the course of creating this Specification, various requirements from typical applications have been
considered, necessary algorithmic elements have been developed, and they have been integrated into a single syntax.
Hence, this Specification will facilitate the bitstream interchange among different applications.
Considering the practicality of implementing the full syntax of this Specification, however, a limited number of subsets
of the syntax are also stipulated by means of "profile" and "level". These and other related terms are formally defined in
clause 3.
A "profile" is a defined subset of the entire bitstream syntax that is defined by this Specification. Within the bounds
imposed by the syntax of a given profile it is still possible to require a very large variation in the performance of
encoders and decoders depending upon the values taken by parameters in the bitstream. For instance, it is possible to
specify frame sizes as large as (approximately) 2^14 samples wide by 2^14 lines high. It is currently neither practical nor
economic to implement a decoder capable of dealing with all possible frame sizes.
In order to deal with this problem, "levels" are defined within each profile. A level is a defined set of constraints imposed
on parameters in the bitstream. These constraints may be simple limits on numbers. Alternatively they may take the form
of constraints on arithmetic combinations of the parameters (e.g. frame width multiplied by frame height multiplied by
frame rate).
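
As an informal illustration of how a level constrains both individual parameters and arithmetic combinations of them, the sketch below checks a hypothetical level. The parameter names and numeric limits are placeholders chosen for this example, not the normative values of clause 8.

```python
# Illustrative sketch of a profile/level check; the numeric limits below are
# placeholder example values, not the normative limits from clause 8.

def within_level(frame_width, frame_height, frame_rate,
                 max_width=720, max_height=576, max_luma_rate=10_368_000):
    """Return True if the parameters satisfy this hypothetical level."""
    # Simple limits on individual parameters ...
    if frame_width > max_width or frame_height > max_height:
        return False
    # ... and a constraint on an arithmetic combination of them
    # (frame width x frame height x frame rate), as described above.
    return frame_width * frame_height * frame_rate <= max_luma_rate

print(within_level(720, 576, 25))   # True for these example values
print(within_level(720, 576, 30))   # False: luminance sample rate too high
```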
Bitstreams complying with this Specification use a common syntax. In order to achieve a subset of the complete syntax,
flags and parameters are included in the bitstream that signal the presence or otherwise of syntactic elements that occur
later in the bitstream. In order to specify constraints on the syntax (and hence define a profile), it is thus only necessary to
constrain the values of these flags and parameters that specify the presence of later syntactic elements.
Intro. 4 The scalable and the non-scalable syntax
The full syntax can be divided into two major categories: One is the non-scalable syntax, which is structured as a
superset of the syntax defined in ISO/IEC 11172-2. The main feature of the non-scalable syntax is the extra compression tools
for interlaced video signals. The second is the scalable syntax, the key property of which is to enable the reconstruction
of useful video from pieces of a total bitstream. This is achieved by structuring the total bitstream in two or more layers,
starting from a standalone base layer and adding a number of enhancement layers. The base layer can use the non-
scalable syntax, or in some situations conform to the ISO/IEC 11172-2 syntax.
Intro. 4.1 Overview of the non-scalable syntax
The coded representation defined in the non-scalable syntax achieves a high compression ratio while preserving good
image quality. The algorithm is not lossless as the exact sample values are not preserved during coding. Obtaining good
image quality at the bitrates of interest demands very high compression, which is not achievable with intra picture coding
alone. The need for random access, however, is best satisfied with pure intra picture coding. The choice of the techniques
is based on the need to balance a high image quality and compression ratio with the requirement to make random access
to the coded bitstream.
A number of techniques are used to achieve high compression. The algorithm first uses block-based motion
compensation to reduce the temporal redundancy. Motion compensation is used both for causal prediction of the current
picture from a previous picture, and for non-causal, interpolative prediction from past and future pictures. Motion vectors
are defined for each 16-sample by 16-line region of the picture. The prediction error is further compressed using the
Discrete Cosine Transform (DCT) to remove spatial correlation before it is quantised in an irreversible process that
discards the less important information. Finally, the motion vectors are combined with the quantised DCT information,
and encoded using variable length codes.
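
As an informal sketch of the coding chain just described (prediction error, DCT, quantisation), the Python fragment below codes a single 8 by 8 block against a given prediction. It is a simplification under stated assumptions: motion estimation, visually weighted quantisation and the variable length codes are omitted, and the uniform quantiser step is an arbitrary example value.

```python
# Very high-level sketch of the hybrid coding loop described above; not the
# normative process. Assumes numpy; the quantiser step is an arbitrary choice.
import numpy as np

N = 8  # DCT block size used by this Specification

def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix()

def encode_block(current, prediction, qstep=16):
    """Prediction error -> 8x8 DCT -> uniform quantisation (lossy)."""
    error = current.astype(float) - prediction.astype(float)
    coeffs = C @ error @ C.T
    return np.round(coeffs / qstep).astype(int)

def decode_block(levels, prediction, qstep=16):
    """Inverse quantisation -> inverse DCT -> add the prediction back."""
    coeffs = levels * qstep
    error = C.T @ coeffs @ C
    return prediction.astype(float) + error

rng = np.random.default_rng(0)
cur = rng.integers(0, 256, (N, N))
pred = np.clip(cur + rng.integers(-3, 4, (N, N)), 0, 255)   # imperfect predictor
levels = encode_block(cur, pred)
recon = decode_block(levels, pred)
print("max reconstruction error:", np.abs(recon - cur).max())  # close, but not exact
```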
Intro. 4.1.1 Temporal processing
Because of the conflicting requirements of random access and highly efficient compression, three main picture types are
defined. Intra Coded Pictures (I-Pictures) are coded without reference to other pictures. They provide access points to the
coded sequence where decoding can begin, but are coded with only moderate compression. Predictive Coded Pictures (P-
Pictures) are coded more efficiently using motion compensated prediction from a past intra or predictive coded picture
and are generally used as a reference for further prediction. Bidirectionally-predictive Coded Pictures (B-Pictures)
provide the highest degree of compression but require both past and future reference pictures for motion compensation.
Bidirectionally-predictive coded pictures are never used as references for prediction (except in the case that the resulting
picture is used as a reference in a spatially scalable enhancement layer). The organisation of the three picture types in a
sequence is very flexible. The choice is left to the encoder and will depend on the requirements of the application. Figure
Intro. 1 illustrates an example of the relationship among the three different picture types.
[Figure: I-, P- and B-Pictures linked by prediction and bidirectional interpolation arrows]
Figure Intro. 1 – Example of temporal picture structure
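
One practical consequence of the picture types above is that coded order differs from display order, because a B-Picture needs its future reference to be decoded first. The sketch below reorders an assumed I/B/B/P display-order pattern; the pattern itself is only an example, since the standard leaves the choice of picture types to the encoder.

```python
# Sketch of coded-order reordering for an assumed I/B/B/P display pattern.

def coded_order(display):
    """Move each reference picture (I or P) ahead of the B-Pictures that use it."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] in "IP":          # reference picture
            out.append(pic)         # emit the reference first ...
            out.extend(pending_b)   # ... then the B-Pictures that precede it in display order
            pending_b = []
        else:                       # B-Picture: held back until its future reference is sent
            pending_b.append(pic)
    return out + pending_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(coded_order(display))   # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```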
Intro. 4.1.2 Coding interlaced video
Each frame of interlaced video consists of two fields which are separated by one field-period. The Specification allows
either the frame to be encoded as a picture or the two fields to be encoded as two pictures. Frame encoding or field
encoding can be adaptively selected on a frame-by-frame basis. Frame encoding is typically preferred when the video
scene contains significant detail with limited motion. Field encoding, in which the second field can be predicted from the
first, works better when there is fast movement.
Intro. 4.1.3 Motion representation – Macroblocks
As in ISO/IEC 11172-2, the choice of 16 by 16 macroblocks for the motion-compensation unit is a result of the trade-off
between the coding gain provided by using motion information and the overhead needed to represent it. Each macroblock
can be temporally predicted in one of a number of different ways. For example, in frame encoding, the prediction from
the previous reference frame can itself be either frame-based or field-based. Depending on the type of the macroblock,
motion vector information and other side information is encoded with the compressed prediction error in each
macroblock. The motion vectors are encoded differentially with respect to the last encoded motion vectors using variable
length codes. The maximum length of the motion vectors that may be represented can be programmed, on a picture-by-
picture basis, so that the most demanding applications can be met without compromising the performance of the system
in more normal situations.
It is the responsibility of the encoder to calculate appropriate motion vectors. This Specification does not specify how
this should be done.
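
As a rough, non-normative illustration of the differential motion vector coding mentioned above, the sketch below codes each vector as a wrapped difference from the previous one, with a single programmable range standing in for the f_code mechanism of subclause 7.6.3.

```python
# Simplified sketch of differential motion-vector coding with a programmable
# range; not the normative procedure of subclause 7.6.3.

def encode_mvs(vectors, mv_range=32):
    """Encode each component as a wrapped difference from the previous vector."""
    prev, diffs = (0, 0), []
    for vx, vy in vectors:
        dx = (vx - prev[0] + mv_range) % (2 * mv_range) - mv_range
        dy = (vy - prev[1] + mv_range) % (2 * mv_range) - mv_range
        diffs.append((dx, dy))
        prev = (vx, vy)
    return diffs

def decode_mvs(diffs, mv_range=32):
    prev, out = (0, 0), []
    for dx, dy in diffs:
        vx = (prev[0] + dx + mv_range) % (2 * mv_range) - mv_range
        vy = (prev[1] + dy + mv_range) % (2 * mv_range) - mv_range
        out.append((vx, vy))
        prev = (vx, vy)
    return out

mvs = [(3, -1), (4, -1), (4, 0), (-30, 2)]
assert decode_mvs(encode_mvs(mvs)) == mvs
print(encode_mvs(mvs))
```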
Intro. 4.1.4 Spatial redundancy reduction
Both source pictures and prediction errors have high spatial redundancy. This Specification uses a block-based DCT
method with visually weighted quantisation and run-length coding. After motion compensated prediction or
interpolation, the resulting prediction error is split into 8 by 8 blocks. These are transformed into the DCT domain where
they are weighted before being quantised. After quantisation many of the DCT coefficients are zero in value and so
two-dimensional run-length and variable length coding is used to encode the remaining DCT coefficients efficiently.
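
The run-length stage can be pictured with the short sketch below: the quantised block is read in the conventional zigzag order and only (run of zeros, level) pairs are retained. The variable length code assignment of Annex B is deliberately not reproduced.

```python
# Sketch of zigzag scan plus (run, level) coding of a quantised 8x8 block.
# Only the run-length stage is shown; the codes of Annex B are omitted.

def zigzag_order(n=8):
    """Coordinates of an n x n block in zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level_pairs(block):
    """(run of zeros, non-zero level) pairs over the zigzag-scanned block."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs          # trailing zeros are covered by an end-of-block code

block = [[12, 5, 7, 0, 0, 0, 0, 0],
         [-3, 0, 0, 0, 0, 0, 0, 0]] + [[0] * 8 for _ in range(6)]
print(run_level_pairs(block))   # [(0, 12), (0, 5), (0, -3), (2, 7)]
```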
Intro. 4.1.5 Chrominance formats
In addition to the 4:2:0 format supported in ISO/IEC 11172-2 this Specification supports 4:2:2 and 4:4:4 chrominance
formats.
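
The three formats differ only in how the colour-difference components are subsampled; the sketch below derives the chroma dimensions and the resulting number of blocks per macroblock for each format, assuming 16 by 16 luminance macroblocks.

```python
# Sketch of chroma sample dimensions for the 4:2:0, 4:2:2 and 4:4:4 formats.

CHROMA_SUBSAMPLING = {          # (horizontal divisor, vertical divisor)
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}

def chroma_geometry(luma_width, luma_height, fmt):
    h, v = CHROMA_SUBSAMPLING[fmt]
    cw, ch = luma_width // h, luma_height // v
    blocks_per_mb = 4 + 2 * (16 // h) * (16 // v) // 64   # 4 luma blocks + chroma blocks
    return cw, ch, blocks_per_mb

for fmt in CHROMA_SUBSAMPLING:
    print(fmt, chroma_geometry(720, 576, fmt))
# 4:2:0 -> (360, 288, 6), 4:2:2 -> (360, 576, 8), 4:4:4 -> (720, 576, 12)
```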
Intro. 4.2 Scalable extensions
The scalability tools in this Specification are designed to support applications beyond those supported by single layer
video. Among the noteworthy application areas addressed are video telecommunications, video on Asynchronous
Transfer Mode (ATM) networks, interworking of video standards, video service hierarchies with multiple spatial,
temporal and quality resolutions, HDTV with embedded TV, systems allowing migration to higher temporal resolution
HDTV, etc. Although a simple solution to scalable video is the simulcast technique which is based on
transmission/storage of multiple independently coded reproductions of video, a more efficient alternative is scalable
video coding, in which the bandwidth allocated to a given reproduction of video can be partially re-utilised in coding of
the next reproduction of video. In scalable video coding, it is assumed that given a coded bitstream, decoders of various
complexities can decode and display appropriate reproductions of coded video. A scalable video encoder is likely to have
increased complexity when compared to a single layer encoder. However, this Recommendation | International Standard
provides several different forms of scalabilities that address non-overlapping applications with corresponding
complexities. The basic scalability tools offered are:
– data partitioning;
– SNR scalability;
– spatial scalability; and
– temporal scalability.
Moreover, combinations of these basic scalability tools are also supported and are referred to as hybrid scalability. In the
case of basic scalability, two layers of video referred to as the lower layer and the enhancement layer are allowed,
whereas in hybrid scalability up to three layers are supported. Tables Intro. 1 to Intro. 3 provide a few example
applications of various scalabilities.
Table Intro. 1 – Applications of SNR scalability

| Lower layer | Enhancement layer | Application |
|---|---|---|
| Recommendation ITU-R BT.601 | Same resolution and format as lower layer | Two quality service for Standard TV (SDTV) |
| High Definition | Same resolution and format as lower layer | Two quality service for HDTV |
| 4:2:0 high definition | 4:2:2 chroma simulcast | Video production / distribution |
Table Intro. 2 – Applications of spatial scalability

| Base | Enhancement | Application |
|---|---|---|
| Progressive (30 Hz) | Progressive (30 Hz) | Compatibility or scalability CIF/SCIF |
| Interlace (30 Hz) | Interlace (30 Hz) | HDTV/SDTV scalability |
| Progressive (30 Hz) | Interlace (30 Hz) | ISO/IEC 11172-2/compatibility with this Specification |
| Interlace (30 Hz) | Progressive (60 Hz) | Migration to high resolution progressive HDTV |
Table Intro. 3 – Applications of temporal scalability

| Base | Enhancement | Higher | Application |
|---|---|---|---|
| Progressive (30 Hz) | Progressive (30 Hz) | Progressive (60 Hz) | Migration to high resolution progressive HDTV |
| Interlace (30 Hz) | Interlace (30 Hz) | Progressive (60 Hz) | Migration to high resolution progressive HDTV |
Intro. 4.2.1 Spatial scalable extension
Spatial scalability is a tool intended for use in video applications involving telecommunications, interworking of video
standards, video database browsing, interworking of HDTV and TV, etc., i.e. video systems with the primary common
feature that a minimum of two layers of spatial resolution are necessary. Spatial scalability involves generating two
spatial resolution video layers from a single video source such that the lower layer is coded by itself to provide the basic
spatial resolution and the enhancement layer employs the spatially interpolated lower layer and carries the full spatial
resolution of the input video source. The lower and the enhancement layers may either both use the coding tools in this
Specification, or the ISO/IEC 11172-2 Standard for the lower layer and this Specification for the enhancement layer. The
latter case achieves a further advantage by facilitating interworking between video coding standards. Moreover, spatial
scalability offers flexibility in choice of video formats to be employed in each layer. An additional advantage of spatial
scalability is its ability to provide resilience to transmission errors as the more important data of the lower layer can be
sent over a channel with better error performance, while the less critical enhancement layer data can be sent over a channel
with poor error performance.
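
A minimal sketch of this layering, under simplifying assumptions (nearest-neighbour interpolation, lossless base coding, plain residual coding instead of the tools of 7.7), is given below.

```python
# Illustrative sketch of spatial scalability: code a downsampled base layer,
# upsample its reconstruction, and code only the residual in the enhancement
# layer. Nearest-neighbour resampling and plain rounding stand in for the
# normative tools of subclause 7.7.
import numpy as np

def downsample(picture, factor=2):
    return picture[::factor, ::factor]

def upsample(picture, factor=2):
    return picture.repeat(factor, axis=0).repeat(factor, axis=1)

rng = np.random.default_rng(1)
source = rng.integers(0, 256, (16, 16)).astype(float)

base_recon = np.round(downsample(source))        # "coded" base layer (lossless here)
prediction = upsample(base_recon)                 # spatially interpolated lower layer
enh_residual = np.round(source - prediction)      # enhancement layer carries the detail
full_recon = prediction + enh_residual            # decoder of both layers

print("base-only resolution:", base_recon.shape)              # (8, 8)
print("full-resolution error:", np.abs(full_recon - source).max())
```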
Intro. 4.2.2 SNR scalable extension
SNR scalability is a tool intended for use in video applications involving telecommunications, video services with
multiple qualities, standard TV and HDTV, i.e. video systems with the primary common feature that a minimum of two
layers of video quality are necessary. SNR scalability involves generating two video layers of same spatial resolution but
different video qualities from a single video source such that the lower layer is coded by itself to provide the basic video
quality and the enhancement layer is coded to enhance the lower layer. The enhancement layer when added back to the
lower layer regenerates a higher quality reproduction of the input video. The lower and the enhancement layers may
either both use this Specification, or use the ISO/IEC 11172-2 Standard for the lower layer and this Specification for the enhancement
layer. An additional advantage of SNR scalability is its ability to provide a high degree of resilience to transmission errors
as the more important data of the lower layer can be sent over a channel with better error performance, while the less
critical enhancement layer data can be sent over a channel with poor error performance.
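
A minimal sketch of this idea, modelling the two layers as coarse quantisation plus a finer refinement of the quantisation error, is shown below; the step sizes are arbitrary example values rather than anything taken from the standard.

```python
# Sketch of SNR scalability as coarse quantisation (lower layer) plus a
# refinement of the quantisation error (enhancement layer). Step sizes are
# arbitrary example values.
import numpy as np

def quantise(x, step):
    return np.round(x / step) * step

rng = np.random.default_rng(2)
coeffs = rng.normal(0, 50, size=64)              # stand-in for DCT coefficients

base = quantise(coeffs, step=16)                 # lower layer: basic quality
refinement = quantise(coeffs - base, step=4)     # enhancement layer data
enhanced = base + refinement                     # enhancement added back to lower layer

print("lower-layer error:   ", np.abs(coeffs - base).max())
print("enhanced-layer error:", np.abs(coeffs - enhanced).max())   # smaller
```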
Intro. 4.2.3 Temporal scalable extension
Temporal scalability is a tool intended for use in a range of diverse video applications from telecommunications
to HDTV for which migration to higher temporal resolution systems from that of lower temporal resolution systems may
be necessary. In many cases, the lower temporal resolution video systems may be either the existing systems or the less
expensive early generation systems, with the motivation of introducing more sophisticated systems gradually. Temporal
scalability involves partitioning of video frames into layers, whereby the lower layer is coded by itself to provide the
basic temporal rate and the enhancement layer is coded with temporal prediction with respect to the lower layer; these
layers, when decoded and temporally multiplexed, yield the full temporal resolution of the video source. The lower temporal
resolution systems may only decode the lower layer to provide basic temporal resolution, whereas more sophisticated
systems of the future may decode both layers and provide high temporal resolution video while maintaining interworking
with earlier generation systems. An additional advantage of temporal scalability is its ability to provide resilience to
transmission errors as the more important data of the lower layer can be sent over a channel with better error performance,
while the less critical enhancement layer can be sent over a channel with poor error performance.
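
A minimal sketch of this partitioning and the subsequent temporal multiplexing is given below; the even/odd frame split is an assumed example, not a requirement of the standard.

```python
# Sketch of temporal scalability as a partitioning of frames between a base
# layer and an enhancement layer, followed by temporal multiplexing.

frames = [f"frame{i}" for i in range(8)]           # full temporal rate source

base_layer = frames[0::2]                          # basic temporal rate
enhancement_layer = frames[1::2]                   # remaining pictures

def temporal_multiplex(base, enhancement):
    """Interleave the two layers back into full temporal resolution."""
    out = []
    for b, e in zip(base, enhancement):
        out.extend([b, e])
    return out

print(base_layer)                                  # what a base-only decoder shows
print(temporal_multiplex(base_layer, enhancement_layer) == frames)   # True
```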
Intro. 4.2.4 Data partitioning extension
Data partitioning is a tool intended for use when two channels are available for transmission and/or storage of a
video bitstream, as may be the case in ATM networks, terrestrial broadcast, magnetic media, etc. The bitstream is
partitioned between these channels such that more critical parts of the bitstream (such as headers, motion vectors, low
frequency DCT coefficients) are transmitted in the channel with the better error performance, and less critical data (such
as higher frequency DCT coefficients) is transmitted in the channel with poor error performance. Thus, degradation due to
channel errors is minimised since the critical parts of a bitstream are better protected. Data from neither channel may be
decoded on a decoder that is not intended for decoding data partitioned bitstreams.
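
An informal sketch of such a priority split is shown below; the element names, priorities and break point are illustrative only, and the normative partitioning is defined in 7.10.

```python
# Sketch of data partitioning: route high-priority syntactic elements to the
# channel with better error performance and the rest to the other channel.
# Element names and the priority ordering are illustrative only.

ELEMENT_PRIORITY = {           # smaller number = more critical
    "header": 0,
    "motion_vector": 1,
    "low_freq_dct": 2,
    "high_freq_dct": 3,
}

def partition(elements, break_point=3):
    """Split coded elements into (high-priority, low-priority) partitions."""
    partition0 = [e for e in elements if ELEMENT_PRIORITY[e] < break_point]
    partition1 = [e for e in elements if ELEMENT_PRIORITY[e] >= break_point]
    return partition0, partition1

coded = ["header", "motion_vector", "low_freq_dct",
         "high_freq_dct", "high_freq_dct"]
good_channel, poor_channel = partition(coded)
print(good_channel)   # headers, motion vectors, low-frequency DCT coefficients
print(poor_channel)   # higher-frequency DCT coefficients
```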
INTERNATIONAL STANDARD ISO/IEC 13818-2 : 2000 (E)
ITU-T RECOMMENDATION H.262 (2000 E)
INFORMATION TECHNOLOGY – GENERIC CODING OF MOVING
PICTURES AND ASSOCIATED AUDIO INFORMATION: VIDEO
1 Scope
This Recommendation | International Standard specifies the coded representation of picture information for digital
storage media and digital video communication and specifies the decoding process. The representation supports constant
bitrate transmission, variable bitrate transmission, random access, channel hopping, scalable decoding, bitstream editing,
as well as special functions such as fast forward playback, fast reverse playback, slow motion, pause and still pictures.
This Recommendation | International Standard is forward compatible with ISO/IEC 11172-2 and upward or downward
compatible with EDTV, HDTV, SDTV formats.
This Recommendation | International Standard is primarily applicable to digital storage media, video broadcast and
communication. The storage media may be directly connected to the decoder, or via communications means such as
busses, LANs, or telecommunications links.
2 Normative references
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated
were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent
edition of the Recommendations and Standards indicated below. Members of IEC and ISO maintain registers of currently
valid International Standards. The Telecommunication Standardization Bureau of ITU maintains a list of currently valid
ITU-T Recommendations.
– Recommendation ITU-R BT.601-3 (1992), Encoding parameters of digital television for studios.
– Recommendation ITU-R BR.648 (1986), Digital recording of audio signals.
– Report ITU-R 955-2 (1990), Satellite sound broadcasting with portable receivers and receivers in
automobiles.
– ISO/IEC 11172-1:1993, Information technology – Coding of moving pictures and associated audio for
digital storage media at up to about 1,5 Mbit/s – Part 1 : Systems.
– ISO/IEC 11172-2:1993, Information technology – Coding of moving pictures and associated audio for
digital storage media at up to about 1,5 Mbit/s – Part 2 : Video.
– ISO/IEC 11172-3:1993, Information technology – Coding of moving pictures and associated audio for
digital storage media at up to about 1,5 Mbit/s – Part 3 : Audio.
– IEEE 1180:1990, Standard Specifications for the Implementations of 8 by 8 Inverse Discrete Cosine
Transform.
– IEC 60908 (1999), Audio recording – Compact disc digital audio system.
– IEC 60461 (1986), Time and control code for video tape recorders.
– ITU-T Recommendation H.261 (1993), Video codec for audiovisual services at p × 64 kbit/s.
– ITU-T Recommendation H.320 (1999), Narrow-band visual telephone systems and terminal equipment.
– CCITT Recommendation T.81 (1992) | ISO/IEC 10918-1:1994, Information technology – Digital
compression and coding of continuous-tone still images: Requirements and guidelines. (JPEG.)
3 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply.
3.1 AC coefficient: Any DCT coefficient for which the frequency in one or both dimensions is non-zero.
3.2 big picture: A coded picture that would cause VBV buffer underflow as defined in C.7. Big pictures can only
occur in sequences where low_delay is equal to 1. "Skipped picture" is a term that is sometimes used to describe the
same concept.
3.3 B-field picture: A field structure B-Picture.
3.4 B-frame picture: A frame structure B-Picture.
3.5 B-picture; bidirectionally predictive-coded picture: A picture that is coded using motion compensated
prediction from past and/or future reference fields or frames.
3.6 backward compatibility: A newer coding standard is backward compatible with an older coding standard if
decoders designed to operate with the older coding standard are able to continue to operate by decoding all or part of a
bitstream produced according to the newer coding standard.
3.7 backward motion vector: A motion vector that is used for motion compensation from a reference frame or
reference field at a later time in display order.
3.8 backward prediction: Prediction from the future reference frame (field).
3.9 base layer: First, independently decodable layer of a scalable hierarchy.
3.10 bitstream; stream: An ordered series of bits that forms the coded representation of the data.
3.11 bitrate: The rate at which the coded bitstream is delivered from the storage medium to the input of a decoder.
3.12 block: An 8-row by 8-column matrix of samples, or 64 DCT coefficients (source, quantised or dequantised).
3.13 bottom field: One of two fields that comprise a frame. Each line of a bottom field is spatially located
immediately below the corresponding line of the top field.
3.14 byte aligned: A bit in a coded bitstream is byte-aligned if its position is a multiple of 8 bits from the first bit in
the stream.
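
As a small illustration, the byte-aligned condition in 3.14 amounts to a modulo-8 test on the bit offset counted from the first bit of the stream:

```python
# Minimal sketch of the byte-aligned test in definition 3.14.

def is_byte_aligned(bit_position):
    return bit_position % 8 == 0

print([p for p in range(20) if is_byte_aligned(p)])   # [0, 8, 16]
```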
3.15 byte: Sequence of 8 bits.
3.16 channel: A digital medium that stores or transports a bitstream constructed according to ITU-T Rec. H.262 |
ISO/IEC 13818-2.
3.17 chrominance format: Defines the number of chrominance blocks in a macroblock.
3.18 chroma simulcast: A type of scalability (which is a subset of SNR scalability) where the enhancement layer(s)
contain only coded refinement data for the DC coefficients, and all the data for the AC coefficients, of the chrominance
components.
3.19 chrominance component: A matrix, block or single sample representing one of the two colour difference
signals related to the primary colours in the manner defined in the bitstream. The symbols used for the chrominance
signals are Cr and Cb.
3.20 coded B-frame: A B-frame picture or a pair of B-field pictures.
3.21 coded frame: A coded frame is a coded I-frame, a coded P-frame or a coded B-frame.
3.22 coded I-frame: An I-frame picture or a pair of field pictures, where the first field picture is an I-picture and the
second field picture is an I-picture or a P-picture.
3.23 coded P-frame: A P-frame picture or a pair of P-field pictures.
3.24 coded picture: A coded picture is made of a picture header, the optional extensions immediately following it,
and the following picture data. A coded picture may be a coded frame or a coded field.
3.25 coded video bitstream: A coded representation of a series of one or more pictures as defined in ITU-T
Rec. H.262 | ISO/IEC 13818-2.
3.26 coded order: The order in which the pictures are transmitted and decoded. This order is not necessarily the
same as the display order.
3.27 coded representation: A data element as represented in its encoded form.
3.28 coding parameters: The set of user-definable parameters that characterise a coded video bitstream. Bitstreams
are characterised by coding parameters. Decoders are characterised by the bitstreams that they are capable of decoding.
3.29 component: A matrix, block or single sample from one of the three matrices (luminance and two chrominance)
that make up a picture.
3.30 compression: Reduction in the number of bits used to represent an item of data.
3.31 constant bitrate coded video: A coded video bitstream with a constant bitrate.
3.32 constant bitrate: Operation where the bitrate is constant from start to finish of the coded bitstream.
3.33 data element: An item of data as represented before encoding and after decoding.
3.34 data partitioning: A method for dividing a bitstream into two separate bitstreams for error resilience purposes.
The two bitstreams have to be recombined before decoding.
3.35 D-Picture: A type of picture that shall not be used except in ISO/IEC 11172-2.
3.36 DC coefficient: The DCT coefficient for which the frequency is zero in both dimensions.
3.37 DCT coefficient: The amplitude of a specific cosine basis function.
3.38 decoder input buffer: The First-In First-Out (FIFO) buffer specified in the video buffering verifier.
3.39 decoder: An embodiment of a decoding process.
3.40 decoding (
...
INTERNATIONAL ISO/IEC
STANDARD 13818-2
Second edition
2000-12-15
Information technology — Generic coding
of moving pictures and associated audio
information: Video
Technologies de l'information — Codage générique des images animées et
du son associé: Données vidéo
Reference number
©
ISO/IEC 2000
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not
be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this
file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this
area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters
were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event
that a problem relating to it is found, please inform the Central Secretariat at the address given below.
© ISO/IEC 2000
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic
or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body
in the country of the requester.
ISO copyright office
Case postale 56 CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland
ii © ISO/IEC 2000 – All rights reserved
CONTENTS
Page
Intro. 1 Purpose. vi
Intro. 2 Application. vi
Intro. 3 Profiles and levels . vi
Intro. 4 The scalable and the non-scalable syntax . vii
1 Scope . 1
2 Normative references. 1
3 Definitions . 2
4 Abbreviations and symbols. 7
4.1 Arithmetic operators. 7
4.2 Logical operators. 8
4.3 Relational operators. 8
4.4 Bitwise operators. 8
4.5 Assignment. 8
4.6 Mnemonics . 8
4.7 Constants . 9
5 Conventions . 9
5.1 Method of describing bitstream syntax. 9
5.2 Definition of functions . 10
5.3 Reserved, forbidden and marker_bit. 10
5.4 Arithmetic precision . 11
6 Video bitstream syntax and semantics. 11
6.1 Structure of coded video data . 11
6.2 Video bitstream syntax . 21
6.3 Video bitstream semantics. 36
7 The video decoding process. 61
7.1 Higher syntactic structures . 61
7.2 Variable length decoding. 62
7.3 Inverse scan . 64
7.4 Inverse quantisation. 66
7.5 Inverse DCT . 69
7.6 Motion compensation . 69
7.7 Spatial scalability. 83
7.8 SNR scalability. 92
7.9 Temporal scalability . 99
7.10 Data partitioning. 102
7.11 Hybrid scalability . 103
7.12 Output of the decoding process . 104
8 Profiles and levels. 106
8.1 ISO/IEC 11172-2 compatibility. 109
8.2 Relationship between defined profiles. 109
8.3 Relationship between defined levels . 111
8.4 Scalable layers. 111
8.5 Parameter values for defined profiles, levels and layers. 114
8.6 Compatibility requirements on decoders. 115
9 Registration of Copyright Identifiers . 117
9.1 General . 117
9.2 Implementation of a Registration Authority (RA). 118
© ISO/IEC 2000 – All rights reserved iii
Page
Annex A – Inverse discrete transform . 119
Annex B – Variable length code tables. 121
B.1 Macroblock addressing. 121
B.2 Macroblock type. 122
B.3 Macroblock pattern. 127
B.4 Motion vectors. 128
B.5 DCT coefficients . 129
Annex C – Video buffering verifier . 138
Annex D – Features supported by the algorithm. 143
D.1 Overview . 143
D.2 Video formats . 143
D.3 Picture quality. 144
D.4 Data rate control . 144
D.5 Low delay mode . 144
D.6 Random access/channel hopping. 145
D.7 Scalability. 145
D.8 Compatibility. 151
D.9 Differences between this Specification and ISO/IEC 11172-2. 151
D.10 Complexity . 154
D.11 Editing encoded bitstreams. 154
D.12 Trick modes. 154
D.13 Error resilience . 155
D.14 Concatenated sequences . 162
Annex E – Profile and level restrictions . 163
E.1 Syntax element restrictions in profiles . 163
E.2 Permissible layer combinations . 175
Annex F – Bibliography. 197
Annex G – Registration Procedure . 198
G.1 Procedure for the request of a Registered Identifier (RID). 198
G.2 Responsibilities of the Registration Authority. 198
G.3 Responsibilities of parties requesting an RID. 198
G.4 Appeal procedure for denied applications . 199
Annex H – Registration Application Form . 200
H.1 Contact information of organization requesting a Registered Identifier (RID) . 200
H.2 Statement of an intention to apply the assigned RID. 200
H.3 Date of intended implementation of the RID. 200
H.4 Authorized representative. 200
H.5 For official use only of the Registration Authority. 200
Annex J – 4:2:2 Profile test results . 202
J.1 Introduction . 202
Annex K – Patents. 207
iv © ISO/IEC 2000 – All rights reserved
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission)
form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC
participate in the development of International Standards through technical committees established by the
respective organization to deal with particular fields of technical activity. ISO and IEC technical committees
collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in
liaison with ISO and IEC, also take part in the work.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.
In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this part of ISO/IEC 13818 may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
International Standard ISO/IEC 13818-2 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information
technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information,in
collaboration with ITU-T. The identical text is published as ITU-T Rec. H.262.
This second edition cancels and replaces the first edition (ISO/IEC 13818-2:1996), which has been technically
revised.
ISO/IEC 13818 consists of the following parts, under the general title Information technology — Generic coding of
moving pictures and associated audio information:
Part 1: Systems
Part 2: Video
Part 3: Audio
Part 4: Conformance testing
Part 5: Software simulation
Part 6: Extensions for DSM-CC
Part 7: Advanced Audio Coding (AAC)
Part 9: Extension for real time interface for systems decoders
Part 10: Conformance extensions for Digital Storage Media Command and Control (DSM-CC)
Annexes A, B and C form a normative part of this part of ISO/IEC 13818. Annexes D to K are for information only.
© ISO/IEC 2000 – All rights reserved v
Introduction
Intro. 1 Purpose
This Part of this Recommendation | International Standard was developed in response to the growing need for a generic
coding method of moving pictures and of associated sound for various applications such as digital storage media,
television broadcasting and communication. The use of this Specification means that motion video can be manipulated as
a form of computer data and can be stored on various storage media, transmitted and received over existing and future
networks and distributed on existing and future broadcasting channels.
Intro. 2 Application
The applications of this Specification cover, but are not limited to, such areas as listed below:
BSS Broadcasting Satellite Service (to the home)
CATV Cable TV Distribution on optical networks, copper, etc.
CDAD Cable Digital Audio Distribution
DSB Digital Sound Broadcasting (terrestrial and satellite broadcasting)
DTTB Digital Terrestrial Television Broadcasting
EC Electronic Cinema
ENG Electronic News Gathering (including SNG, Satellite News Gathering)
FSS Fixed Satellite Service (e.g. to head ends)
HTT Home Television Theatre
IPC Interpersonal Communications (videoconferencing, videophone, etc.)
ISM Interactive Storage Media (optical disks, etc.)
MMM Multimedia Mailing
NCA News and Current Affairs
NDB Networked Database Services (via ATM, etc.)
RVS Remote Video Surveillance
SSM Serial Storage Media (digital VTR, etc.)
Intro. 3 Profiles and levels
This Specification is intended to be generic in the sense that it serves a wide range of applications, bitrates, resolutions,
qualities and services. Applications should cover, among other things, digital storage media, television broadcasting and
communications. In the course of creating this Specification, various requirements from typical applications have been
considered, necessary algorithmic elements have been developed, and they have been integrated into a single syntax.
Hence, this Specification will facilitate the bitstream interchange among different applications.
Considering the practicality of implementing the full syntax of this Specification, however, a limited number of subsets
of the syntax are also stipulated by means of "profile" and "level". These and other related terms are formally defined in
clause 3.
A "profile" is a defined subset of the entire bitstream syntax that is defined by this Specification. Within the bounds
imposed by the syntax of a given profile it is still possible to require a very large variation in the performance of
encoders and decoders depending upon the values taken by parameters in the bitstream. For instance, it is possible to
14 14
specify frame sizes as large as (approximately) 2 samples wide by 2 lines high. It is currently neither practical nor
economic to implement a decoder capable of dealing with all possible frame sizes.
In order to deal with this problem, "levels" are defined within each profile. A level is a defined set of constraints imposed
on parameters in the bitstream. These constraints may be simple limits on numbers. Alternatively they may take the form
of constraints on arithmetic combinations of the parameters (e.g. frame width multiplied by frame height multiplied by
frame rate).
Bitstreams complying with this Specification use a common syntax. In order to achieve a subset of the complete syntax,
flags and parameters are included in the bitstream that signal the presence or otherwise of syntactic elements that occur
later in the bitstream. In order to specify constraints on the syntax (and hence define a profile), it is thus only necessary to
constrain the values of these flags and parameters that specify the presence of later syntactic elements.
vi © ISO/IEC 2000 – All rights reserved
Intro. 4 The scalable and the non-scalable syntax
The full syntax can be divided into two major categories: One is the non-scalable syntax, which is structured as a super
set of the syntax defined in ISO/IEC 11172-2. The main feature of the non-scalable syntax is the extra compression tools
for interlaced video signals. The second is the scalable syntax, the key property of which is to enable the reconstruction
of useful video from pieces of a total bitstream. This is achieved by structuring the total bitstream in two or more layers,
starting from a standalone base layer and adding a number of enhancement layers. The base layer can use the non-
scalable syntax, or in some situations conform to the ISO/IEC 11172-2 syntax.
Intro. 4.1 Overview of the non-scalable syntax
The coded representation defined in the non-scalable syntax achieves a high compression ratio while preserving good
image quality. The algorithm is not lossless as the exact sample values are not preserved during coding. Obtaining good
image quality at the bitrates of interest demands very high compression, which is not achievable with intra picture coding
alone. The need for random access, however, is best satisfied with pure intra picture coding. The choice of the techniques
is based on the need to balance a high image quality and compression ratio with the requirement to make random access
to the coded bitstream.
A number of techniques are used to achieve high compression. The algorithm first uses block-based motion
compensation to reduce the temporal redundancy. Motion compensation is used both for causal prediction of the current
picture from a previous picture, and for non-causal, interpolative prediction from past and future pictures. Motion vectors
are defined for each 16-sample by 16-line region of the picture. The prediction error, is further compressed using the
Discrete Cosine Transform (DCT) to remove spatial correlation before it is quantised in an irreversible process that
discards the less important information. Finally, the motion vectors are combined with the quantised DCT information,
and encoded using variable length codes.
Intro. 4.1.1 Temporal processing
Because of the conflicting requirements of random access and highly efficient compression, three main picture types are
defined. Intra Coded Pictures (I-Pictures) are coded without reference to other pictures. They provide access points to the
coded sequence where decoding can begin, but are coded with only moderate compression. Predictive Coded Pictures (P-
Pictures) are coded more efficiently using motion compensated prediction from a past intra or predictive coded picture
and are generally used as a reference for further prediction. Bidirectionally-predictive Coded Pictures (B-Pictures)
provide the highest degree of compression but require both past and future reference pictures for motion compensation.
Bidirectionally-predictive coded pictures are never used as references for prediction (except in the case that the resulting
picture is used as a reference in a spatially scalable enhancement layer). The organisation of the three picture types in a
sequence is very flexible. The choice is left to the encoder and will depend on the requirements of the application. Figure
Intro. 1 illustrates an example of the relationship among the three different picture types.
Bidirectional Interpolation
B
B P BB
I B P
Prediction
Figure Intro. 1 – Example of temporal picture structure
FIGURE Intro. 1/H.262.[D01] = 8 CM
© ISO/IEC 2000 – All rights reserved vii
Intro. 4.1.2 Coding interlaced video
Each frame of interlaced video consists of two fields which are separated by one field-period. The Specification allows
either the frame to be encoded as picture or the two fields to be encoded as two pictures. Frame encoding or field
encoding can be adaptively selected on a frame-by-frame basis. Frame encoding is typically preferred when the video
scene contains significant detail with limited motion. Field encoding, in which the second field can be predicted from the
first, works better when there is fast movement.
Intro. 4.1.3 Motion representation – Macroblocks
As in ISO/IEC 11172-2, the choice of 16 by 16 macroblocks for the motion-compensation unit is a result of the trade-off
between the coding gain provided by using motion information and the overhead needed to represent it. Each macroblock
can be temporally predicted in one of a number of different ways. For example, in frame encoding, the prediction from
the previous reference frame can itself be either frame-based or field-based. Depending on the type of the macroblock,
motion vector information and other side information is encoded with the compressed prediction error in each
macroblock. The motion vectors are encoded differentially with respect to the last encoded motion vectors using variable
length codes. The maximum length of the motion vectors that may be represented can be programmed, on a picture-by-
picture basis, so that the most demanding applications can be met without compromising the performance of the system
in more normal situations.
It is the responsibility of the encoder to calculate appropriate motion vectors. This Specification does not specify how
this should be done.
Intro. 4.1.4 Spatial redundancy reduction
Both source pictures and prediction errors have high spatial redundancy. This Specification uses a block-based DCT
method with visually weighted quantisation and run-length coding. After motion compensated prediction or
interpolation, the resulting prediction error is split into 8 by 8 blocks. These are transformed into the DCT domain where
they are weighted before being quantised. After quantisation many of the DCT coefficients are zero in value and so
two-dimensional run-length and variable length coding is used to encode the remaining DCT coefficients efficiently.
Intro. 4.1.5 Chrominance formats
In addition to the 4:2:0 format supported in ISO/IEC 11172-2 this Specification supports 4:2:2 and 4:4:4 chrominance
formats.
Intro. 4.2 Scalable extensions
The scalability tools in this Specification are designed to support applications beyond that supported by single layer
video. Among the noteworthy applications areas addressed are video telecommunications, video on Asynchronous
Transfer Mode (ATM) networks, interworking of video standards, video service hierarchies with multiple spatial,
temporal and quality resolutions, HDTV with embedded TV, systems allowing migration to higher temporal resolution
HDTV, etc. Although a simple solution to scalable video is the simulcast technique which is based on
transmission/storage of multiple independently coded reproductions of video, a more efficient alternative is scalable
video coding, in which the bandwidth allocated to a given reproduction of video can be partially re-utilised in coding of
the next reproduction of video. In scalable video coding, it is assumed that given a coded bitstream, decoders of various
complexities can decode and display appropriate reproductions of coded video. A scalable video encoder is likely to have
increased complexity when compared to a single layer encoder. However, this Recommendation | International Standard
provides several different forms of scalabilities that address non-overlapping applications with corresponding
complexities. The basic scalability tools offered are:
– data partitioning;
– SNR scalability;
– spatial scalability; and
– temporal scalability.
Moreover, combinations of these basic scalability tools are also supported and are referred to as hybrid scalability. In the
case of basic scalability, two layers of video referred to as the lower layer and the enhancement layer are allowed,
whereas in hybrid scalability up to three layers are supported. Tables Intro. 1 to Intro. 3 provide a few example
applications of various scalabilities.
viii © ISO/IEC 2000 – All rights reserved
Table Intro. 1 – Applications of SNR scalability

  Lower layer                   Enhancement layer                            Application
  Recommendation ITU-R BT.601   Same resolution and format as lower layer    Two quality service for Standard TV (SDTV)
  High Definition               Same resolution and format as lower layer    Two quality service for HDTV
  4:2:0 high definition         4:2:2 chroma simulcast                       Video production / distribution

Table Intro. 2 – Applications of spatial scalability

  Base                  Enhancement           Application
  Progressive (30 Hz)   Progressive (30 Hz)   Compatibility or scalability CIF/SCIF
  Interlace (30 Hz)     Interlace (30 Hz)     HDTV/SDTV scalability
  Progressive (30 Hz)   Interlace (30 Hz)     ISO/IEC 11172-2/compatibility with this Specification
  Interlace (30 Hz)     Progressive (60 Hz)   Migration to high resolution progressive HDTV

Table Intro. 3 – Applications of temporal scalability

  Base                  Enhancement           Higher                Application
  Progressive (30 Hz)   Progressive (30 Hz)   Progressive (60 Hz)   Migration to high resolution progressive HDTV
  Interlace (30 Hz)     Interlace (30 Hz)     Progressive (60 Hz)   Migration to high resolution progressive HDTV
Intro. 4.2.1 Spatial scalable extension
Spatial scalability is a tool intended for use in video applications involving telecommunications, interworking of video
standards, video database browsing, interworking of HDTV and TV, etc., i.e. video systems with the primary common
feature that a minimum of two layers of spatial resolution are necessary. Spatial scalability involves generating two
spatial resolution video layers from a single video source such that the lower layer is coded by itself to provide the basic
spatial resolution and the enhancement layer employs the spatially interpolated lower layer and carries the full spatial
resolution of the input video source. The lower and the enhancement layers may either both use the coding tools in this
Specification, or the ISO/IEC 11172-2 Standard for the lower layer and this Specification for the enhancement layer. The
latter case achieves a further advantage by facilitating interworking between video coding standards. Moreover, spatial
scalability offers flexibility in choice of video formats to be employed in each layer. An additional advantage of spatial
scalability is its ability to provide resilience to transmission errors as the more important data of the lower layer can be
sent over a channel with better error performance, while the less critical enhancement layer data can be sent over a channel
with poor error performance.
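The data flow can be pictured with the informative Python sketch below: the enhancement layer is coded as a residual against a spatially interpolated reconstruction of the lower layer. The nearest-neighbour upsampling and the function names are illustrative assumptions; the normative interpolation and prediction weighting are specified in 7.7.

```python
import numpy as np

def upsample_2x(lower):
    """Nearest-neighbour 2x upsampling of a decoded lower-layer picture
    (a stand-in for the interpolation filter defined by the Specification)."""
    return np.repeat(np.repeat(lower, 2, axis=0), 2, axis=1)

def encode_spatial_enhancement(full_res, lower_decoded):
    """Enhancement layer as the residual against the interpolated lower layer."""
    prediction = upsample_2x(lower_decoded).astype(np.int32)
    return full_res.astype(np.int32) - prediction

def decode_spatial_enhancement(residual, lower_decoded):
    """Full-resolution reconstruction from the two layers."""
    return upsample_2x(lower_decoded).astype(np.int32) + residual
```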
Intro. 4.2.2 SNR scalable extension
SNR scalability is a tool intended for use in video applications involving telecommunications, video services with
multiple qualities, standard TV and HDTV, i.e. video systems with the primary common feature that a minimum of two
layers of video quality are necessary. SNR scalability involves generating two video layers of the same spatial resolution but
different video qualities from a single video source such that the lower layer is coded by itself to provide the basic video
quality and the enhancement layer is coded to enhance the lower layer. The enhancement layer when added back to the
lower layer regenerates a higher quality reproduction of the input video. The lower and the enhancement layers may
either both use this Specification, or use the ISO/IEC 11172-2 Standard for the lower layer and this Specification for the enhancement
layer. An additional advantage of SNR scalability is its ability to provide a high degree of resilience to transmission errors,
as the more important data of the lower layer can be sent over a channel with better error performance, while the less
critical enhancement layer data can be sent over a channel with poor error performance.
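A toy Python illustration of the two-layer idea follows: the lower layer carries a coarse quantisation of the DCT coefficients and the enhancement layer a refinement of the resulting quantisation error. The step sizes and function names are arbitrary assumptions; the normative refinement process is given in 7.8.

```python
import numpy as np

def snr_split(coeffs, coarse_step=16, fine_step=4):
    """Coarse base-layer coefficients plus a refinement layer."""
    base = np.round(coeffs / coarse_step)
    refinement = np.round((coeffs - base * coarse_step) / fine_step)
    return base, refinement

def snr_reconstruct(base, refinement=None, coarse_step=16, fine_step=4):
    """Base-only decoding gives basic quality; adding the refinement
    regenerates a higher quality approximation of the coefficients."""
    recon = base * coarse_step
    if refinement is not None:
        recon = recon + refinement * fine_step
    return recon
```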
Intro. 4.2.3 Temporal scalable extension
Temporal scalability is a tool intended for use in a range of diverse video applications from telecommunications
to HDTV for which migration to higher temporal resolution systems from that of lower temporal resolution systems may
be necessary. In many cases, the lower temporal resolution video systems may be either the existing systems or the less
expensive early generation systems, with the motivation of introducing more sophisticated systems gradually. Temporal
scalability involves partitioning video frames into layers, in which the lower layer is coded by itself to provide the
basic temporal rate and the enhancement layer is coded with temporal prediction with respect to the lower layer; these
layers, when decoded and temporally multiplexed, yield the full temporal resolution of the video source. The lower temporal
resolution systems may only decode the lower layer to provide basic temporal resolution, whereas more sophisticated
systems of the future may decode both layers and provide high temporal resolution video while maintaining interworking
with earlier generation systems. An additional advantage of temporal scalability is its ability to provide resilience to
transmission errors, as the more important data of the lower layer can be sent over a channel with better error performance,
while the less critical enhancement layer can be sent over a channel with poor error performance.
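The layering can be illustrated with the short, informative Python sketch below, in which alternate frames of a 60 Hz source are assigned to a 30 Hz base layer and a 30 Hz enhancement layer and are re-interleaved after decoding. The frame assignment and names are illustrative; the normative enhancement-layer prediction is described in 7.9.

```python
def split_temporal_layers(frames):
    """Alternate frames go to the base (lower temporal rate) and
    enhancement layers."""
    return frames[0::2], frames[1::2]

def merge_temporal_layers(base, enhancement):
    """Interleave decoded layers back to the full temporal rate.
    A base-only decoder simply displays `base` at the lower rate."""
    merged = []
    for b, e in zip(base, enhancement):
        merged.extend([b, e])
    merged.extend(base[len(enhancement):])  # handles odd-length sources
    return merged
```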
Intro. 4.2.4 Data partitioning extension
Data partitioning is a tool intended for use when two channels are available for transmission and/or storage of a
video bitstream, as may be the case in ATM networks, terrestrial broadcast, magnetic media, etc. The bitstream is
partitioned between these channels such that more critical parts of the bitstream (such as headers, motion vectors, low
frequency DCT coefficients) are transmitted in the channel with the better error performance, and less critical data (such
as higher frequency DCT coefficients) is transmitted in the channel with poor error performance. Thus, degradation due to
channel errors is minimised since the critical parts of a bitstream are better protected. Data from neither channel may be
decoded by a decoder that is not intended for decoding data partitioned bitstreams.
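The principle can be shown with the informative Python sketch below, which routes coded elements to one of two partitions according to a priority rule. The element names and the predicate are illustrative assumptions; the normative partitioning, controlled by the priority breakpoint, is specified in 7.10.

```python
def partition_bitstream(coded_elements, is_high_priority):
    """Split coded elements between two channels by priority.

    coded_elements   : iterable of (kind, payload) tuples, e.g. headers,
                       motion vectors, DCT coefficient runs
    is_high_priority : predicate selecting elements for the channel with
                       the better error performance
    """
    partition0, partition1 = [], []
    for element in coded_elements:
        (partition0 if is_high_priority(element) else partition1).append(element)
    return partition0, partition1

# Example: headers, motion vectors and low-frequency coefficients are kept
# in the better-protected partition.
critical = {"header", "motion_vector", "low_freq_coeff"}
p0, p1 = partition_bitstream(
    [("header", b"\x00"), ("motion_vector", b"\x01"), ("high_freq_coeff", b"\x02")],
    lambda element: element[0] in critical,
)
```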
INTERNATIONAL STANDARD ISO/IEC 13818-2 : 2000 (E)
ITU-T RECOMMENDATION H.262 (2000 E)
INFORMATION TECHNOLOGY – GENERIC CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION: VIDEO
1 Scope
This Recommendation | International Standard specifies the coded representation of picture information for digital
storage media and digital video communication and specifies the decoding process. The representation supports constant
bitrate transmission, variable bitrate transmission, random access, channel hopping, scalable decoding, bitstream editing,
as well as special functions such as fast forward playback, fast reverse playback, slow motion, pause and still pictures.
This Recommendation | International Standard is forward compatible with ISO/IEC 11172-2 and upward or downward
compatible with EDTV, HDTV, SDTV formats.
This Recommendation | International Standard is primarily applicable to digital storage media, video broadcast and
communication. The storage media may be directly connected to the decoder, or via communications means such as
busses, LANs, or telecommunications links.
2 Normative references
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated
were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent
edition of the Recommendations and Standards indicated below. Members of IEC and ISO maintain registers of currently
valid International Standards. The Telecommunication Standardization Bureau of ITU maintains a list of currently valid
ITU-T Recommendations.
– Recommendation ITU-R BT.601-3 (1992), Encoding parameters of digital television for studios.
– Recommendation ITU-R BR.648 (1986), Digital recording of audio signals.
– Report ITU-R 955-2 (1990), Satellite sound broadcasting with portable receivers and receivers in
automobiles.
– ISO/IEC 11172-1:1993, Information technology – Coding of moving pictures and associated audio for
digital storage media at up to about 1,5 Mbit/s – Part 1 : Systems.
– ISO/IEC 11172-2:1993, Information technology – Coding of moving pictures and associated audio for
digital storage media at up to about 1,5 Mbit/s – Part 2 : Video.
– ISO/IEC 11172-3:1993, Information technology – Coding of moving pictures and associated audio for
digital storage media at up to about 1,5 Mbit/s – Part 3 : Audio.
– IEEE 1180:1990, Standard Specifications for the Implementations of 8 by 8 Inverse Discrete Cosine
Transform.
– IEC 60908 (1999), Audio recording – Compact disc digital audio system.
– IEC 60461 (1986), Time and control code for video tape recorders.
– ITU-T Recommendation H.261 (1993), Video codec for audiovisual services at p × 64 kbit/s.
– ITU-T Recommendation H.320 (1999), Narrow-band visual telephone systems and terminal equipment.
– CCITT Recommendation T.81 (1992) | ISO/IEC 10918-1:1994, Information technology – Digital
compression and coding of continuous-tone still images: Requirements and guidelines. (JPEG.)
3 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply.
3.1 AC coefficient: Any DCT coefficient for which the frequency in one or both dimensions is non-zero.
3.2 big picture: A coded picture that would cause VBV buffer underflow as defined in C.7. Big pictures can only
occur in sequences where low_delay is equal to 1. "Skipped picture" is a term that is sometimes used to describe the
same concept.
3.3 B-field picture: A field structure B-Picture.
3.4 B-frame picture: A frame structure B-Picture.
3.5 B-picture; bidirectionally predictive-coded picture: A picture that is coded using motion compensated
prediction from past and/or future reference fields or frames.
3.6 backward compatibility: A newer coding standard is backward compatible with an older coding standard if
decoders designed to operate with the older coding standard are able to continue to operate by decoding all or part of a
bitstream produced according to the newer coding standard.
3.7 backward motion vector: A motion vector that is used for motion compensation from a reference frame or
reference field at a later time in display order.
3.8 backward prediction: Prediction from the future reference frame (field).
3.9 base layer: First, independently decodable layer of a scalable hierarchy.
3.10 bitstream; stream: An ordered series of bits that forms the coded representation of the data.
3.11 bitrate: The rate at which the coded bitstream is delivered from the storage medium to the input of a decoder.
3.12 block: An 8-row by 8-column matrix of samples, or 64 DCT coefficients (source, quantised or dequantised).
3.13 bottom field: One of two fields that comprise a frame. Each line of a bottom field is spatially located
immediately below the corresponding line of the top field.
3.14 byte aligned: A bit in a coded bitstream is byte-aligned if its position is a multiple of 8 bits from the first bit in
the stream.
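As a trivial, informative restatement of this definition in code:

```python
def is_byte_aligned(bit_position):
    """True when a bit's offset from the first bit of the stream is a
    multiple of 8 bits (definition 3.14)."""
    return bit_position % 8 == 0
```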
3.15 byte: Sequence of 8 bits.
3.16 channel: A digital medium that stores or transports a bitstream constructed according to ITU-T Rec. H.262 |
ISO/IEC 13818-2.
3.17 chrominance format: Defines the number of chrominance blocks in a macroblock.
3.18 chroma simulcast: A type of scalability (which is a subset of SNR scalability) where the enhancement layer(s)
contain only coded refinement data for the DC coefficients, and all the data for the AC coefficients, of the chrominance
components.
3.19 chrominance component: A matrix, block or single sample representing one of the two colour difference
signals related to the primary colours in the manner defined in the bitstream. The symbols used for the chrominance
signals are Cr and Cb.
3.20 coded B-frame: A B-frame picture or a pair of B-field pictures.
3.21 coded frame: A coded frame is a coded I-frame, a coded P-frame or a coded B-frame.
3.22 coded I-frame: An I-frame picture or a pair of field pictures, where the first field picture is an I-picture and the
second field picture is an I-picture or a P-picture.
3.23 coded P-frame: A P-frame picture or a pair of P-field pictures.
3.24 coded picture: A coded picture is made of a picture header, the optional extensions immediately following it,
and the following picture data. A coded picture may be a coded frame or a coded field.
3.25 coded video bitstream: A coded representation of a series of one or more pictures as defined in ITU-T
Rec. H.262 | ISO/IEC 13818-2.
3.26 coded order: The order in which the pictures are transmitted and decoded. This order is not necessarily the
same as the display order.
3.27 coded representation: A data element as represented in its encoded form.
3.28 coding parameters: The set of user-definable parameters that characterise a coded video bitstream. Bitstreams
are characterised by coding parameters. Decoders are characterised by the bitstreams that they are capable of decoding.
3.29 component: A matrix, block or single sample from one of the three matrices (luminance and two chrominance)
that make up a picture.
3.30 compression: Reduction in the number of bits used to represent an item of data.
3.31 constant bitrate coded video: A coded video bitstream with a constant bitrate.
3.32 constant bitrate: Operation where the bitrate is constant from start to finish of the coded bitstream.
3.33 data element: An item of data as represented before encoding and after decoding.
3.34 data partitioning: A method for dividing a bitstream into two separate bitstreams for error resilience purposes.
The two bitstreams have to be recombined before decoding.
3.35 D-Picture: A type of picture that shall not be used except in ISO/IEC 11172-2.
3.36 DC coefficient: The DCT coefficient for which the frequency is zero in both dimensions.
3.37 DCT coefficient: The amplitude of a specific cosine basis function.
3.38 decoder input buffer: The First-In First-Out (FIFO) buffer specified in the video buffering verifier.
3.39 decoder: An embodiment of a decoding process.
...
NORME ISO/CEI
INTERNATIONALE 13818-2
Deuxième édition
2000-12-15
Technologies de l'information — Codage
générique des images animées et du son
associé: Données vidéo
Information technology — Generic coding of moving pictures and
associated audio information: Video
Numéro de référence
ISO/CEI 13818-2:2000(F)
©
ISO/CEI 2000
ISO/CEI 13818-2:2000(F)
PDF – Exonération de responsabilité
Le présent fichier PDF peut contenir des polices de caractères intégrées. Conformément aux conditions de licence d'Adobe, ce fichier peut
être imprimé ou visualisé, mais ne doit pas être modifiéà moins que l'ordinateur employéà cet effet ne bénéficie d'une licence autorisant
l'utilisation de ces polices et que celles-ci y soient installées. Lors du téléchargement de ce fichier, les parties concernées acceptent de fait la
responsabilité de ne pas enfreindre les conditions de licence d'Adobe. Le Secrétariat central de l'ISO décline toute responsabilité en la
matière.
Adobe est une marque déposée d'Adobe Systems Incorporated.
Les détails relatifs aux produits logiciels utilisés pour la créationduprésent fichier PDF sont disponibles dans la rubrique General Info du
fichier; les paramètres de création PDF ont été optimisés pour l'impression. Toutes les mesures ont été prises pour garantir l'exploitation de
ce fichier par les comités membres de l'ISO. Dans le cas peu probable où surviendrait un problème d'utilisation, veuillez en informer le
Secrétariat central à l'adresse donnée ci-dessous.
© ISO/CEI 2000
Droits de reproduction réservés. Sauf prescription différente, aucune partie de cette publication ne peut être reproduite ni utilisée sous quelque
forme que ce soit et par aucun procédé, électronique ou mécanique, y compris la photocopie et les microfilms, sans l'accord écrit de l’ISO à
l’adresse ci-aprèsouducomité membre de l’ISO dans le pays du demandeur.
ISO copyright office
Case postale 56 � CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax. + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Version française parue en 2001
Imprimé en Suisse
ii © ISO/CEI 2000 – Tous droits réservés
ISO/CEI 13818-2:2000(F)
TABLE DES MATIÈRES
Page
Intro. 1 Objet. vi
Intro. 2 Application . vi
Intro. 3 Profils et niveaux. vi
Intro. 4 Syntaxe échelonnable et syntaxe non échelonnable . vii
1 Domaine d'application . 1
2 Références normatives. 1
3 Définitions . 2
4 Abréviations et symboles. 9
4.1 Opérateurs arithmétiques. 9
4.2 Opérateurs logiques. 9
4.3 Opérateurs relationnels. 10
4.4 Opérateurs binaires. 10
4.5 Affectation. 10
4.6 Mnémoniques . 10
4.7 Constantes . 11
5 Conventions . 11
5.1 Méthode de description de la syntaxe du flux binaire . 11
5.2 Définition des fonctions . 12
5.3 Valeur réservée, valeur interdite et bit marqueur . 12
5.4 Précision arithmétique . 12
6 Syntaxe et sémantique du flux binaire de données vidéo . 13
6.1 Structure des données vidéo codées . 13
6.2 Syntaxe du flux binaire de données vidéo codées . 24
6.3 Sémantique du flux binaire de données vidéo codées . 39
7 Processus de décodage des données vidéo. 67
7.1 Structures syntaxiques supérieures. 68
7.2 Décodage à longueur variable . 68
7.3 Balayage inverse des coefficients. 71
7.4 Quantification inverse . 72
7.5 Transformation DCT inverse. 77
7.6 Compensation de mouvement . 77
7.7 Echelonnabilité spatiale. 92
7.8 Echelonnabilité SNR . 105
7.9 Echelonnabilité temporelle. 111
7.10 Subdivision des données. 114
7.11 Echelonnabilité hybride. 115
7.12 Sortie du processus de décodage . 116
8 Profils et niveaux . 120
8.1 Compatibilité avec l'ISO/CEI 11172-2. 122
8.2 Relation entre profils définis . 122
8.3 Relation entre niveaux définis . 124
8.4 Couches échelonnables. 124
8.5 Valeurs paramétriques pour profils, niveaux et couches définis . 127
8.6 Prescriptions de compatibilité pour les décodeurs. 131
9 Enregistrement des identificateurs de droits d'auteur . 132
9.1 Généralités. 132
9.2 Implémentation d'un organisme d'enregistrement . 132
© ISO/CEI 2000 – Tous droits réservés iii
ISO/CEI 13818-2:2000(F)
Page
Annexe A – Transformation discrète en cosinus inverse . 133
Annexe B – Tables des codes à longueur variable. 135
B.1 Adressage des macroblocs. 135
B.2 Type de macrobloc . 136
B.3 Structure des macroblocs. 141
B.4 Vecteurs de mouvement . 142
B.5 Coefficients DCT . 143
Annexe C – Vérificateur de mémoire vidéo. 152
Annexe D – Caractéristiques supportées par l'algorithme. 157
D.1 Aperçu général . 157
D.2 Formats vidéo. 157
D.3 Qualité d'image. 158
D.4 Contrôle du débit . 158
D.5 Mode à faible délai . 159
D.6 Accès aléatoire/interconnexion des canaux . 159
D.7 Echelonnabilité. 159
D.8 Compatibilité . 167
D.9 Différences entre la présente Spécification et l'ISO/CEI 11172-2. 168
D.10 Complexité . 170
D.11 Edition des flux binaires codés. 170
D.12 Modes d'enrichissement. 171
D.13 Robustesse aux erreurs . 172
D.14 Séquences concaténées. 180
Annexe E – Restrictions de profil et de niveau. 182
E.1 Restrictions applicables aux éléments syntaxiques dans les profils . 182
E.2 Combinaisons de couches autorisées. 196
Annexe F – Bibliographie. 219
Annexe G – Procédure d'enregistrement. 220
G.1 Procédure de demande d'un identificateur enregistré (RID). 220
G.2 Responsabilités de l'organisme d'enregistrement. 220
G.3 Responsabilités des parties demandant un identificateur RID. 220
G.4 Procédure d'appel en cas de refus de demande. 221
Annexe H – Formulaire de demande d'enregistrement . 222
H.1 Renseignements de contact sur l'organisation demandant un identificateur enregistré (RID). 222
H.2 Déclaration d'intention d'appliquer l'identificateur RID assigné . 222
H.3 Date d'implémentation prévue de l'identificateur RID . 222
H.4 Représentant autorisé. 222
H.5 Cadre réservé à l'usage officiel de l'organisme d'enregistrement . 222
Annexe I. 223
Annexe J – Résultats d'essais avec le profil 4:2:2. 224
J.1 Introduction . 224
Annexe K – Patents . 229
iv © ISO/CEI 2000 – Tous droits réservés
ISO/CEI 13818-2:2000(F)
Avant-propos
L'ISO (Organisation internationale de normalisation) et la CEI (Commission électrotechnique internationale)
forment le système spécialisé de la normalisation mondiale. Les organismes nationaux membres de l'ISO ou de la
CEI participent au développement de Normes internationales par l'intermédiaire des comités techniques créés par
l'organisation concernée afin de s'occuper des domaines particuliers de l'activité technique. Les comités
techniques de l'ISO et de la CEI collaborent dans des domaines d'intérêt commun. D'autres organisations
internationales, gouvernementales et non gouvernementales, en liaison avec l'ISO et la CEI participent également
aux travaux.
Les Normes internationales sont rédigées conformément aux règles données dans les Directives ISO/CEI, Partie 3.
Dans le domaine des technologies de l'information, l'ISO et la CEI ont créé un comité technique mixte,
l'ISO/CEI JTC 1. Les projets de Normes internationales adoptés par le comité technique mixte sont soumis aux
organismes nationaux pour vote. Leur publication comme Normes internationales requiert l'approbation de 75 % au
moins des organismes nationaux votants.
L'attention est appelée sur le fait que certains des éléments de la présente partie de l'ISO/CEI 13818 peuvent faire
l'objet de droits de propriété intellectuelle ou de droits analogues. L'ISO et la CEI ne sauraient être tenues pour
responsables de ne pas avoir identifié de tels droits de propriété et averti de leur existence.
La Norme internationale ISO/CEI 13818-2 a été élaborée par le comité technique mixte ISO/CEI JTC 1,
Technologies de l'information, sous-comité SC 29, Codage du son, de l'image, de l'information multimédia et
hypermédia, en collaboration avec l'UIT-T. Le texte identique est publié en tant que Rec. UIT-T H.262.
Cette deuxième édition annule et remplace la première édition (ISO/CEI 13818-2:1996), qui a fait l'objet d'une
révision technique.
L'ISO/CEI 13818 comprend les parties suivantes, présentées sous le titre général Technologies de l'information —
Codage générique des images animées et du son associé:
Partie 1: Systèmes
Partie 2: Données vidéo
Partie 3: Son
Partie 4: Essais de conformité
Partie 5: Simulation de logiciel
Partie 6: Extensions pour DSM-CC
Partie 7: Codage du son avancé (AAC)
Partie 9: Extension pour interface temps réel pour systèmes décodeurs
Partie 10: Extensions de conformité pour commande et contrôle de supports de mémoire numérique (DSM-CC)
Les annexes A, B et C constituent des éléments normatifs de la présente partie de l'ISO/CEI 13818. Les annexes
D à K sont données uniquement à titre d'information.
© ISO/CEI 2000 – Tous droits réservés v
ISO/CEI 13818-2:2000(F)
Introduction
Intro. 1 Purpose
This Part of this Recommendation | International Standard was developed in response to the growing need for a generic coding method for moving pictures and associated audio, for various applications such as digital storage media, television broadcasting and communication. Use of this Specification means that moving picture video data can be handled as computer data, stored on various storage media, transmitted and received over existing and future networks, and distributed on existing and future broadcasting channels.
Intro. 2 Application
Applications of this Specification cover areas such as:
BSS   Broadcasting satellite service (to the home)
CATV  Cable TV distribution on optical networks, copper, etc.
CDAD  Cable digital audio distribution
DSB   Digital sound broadcasting (terrestrial and satellite)
DTTB  Digital terrestrial television broadcasting
EC    Electronic cinema
ENG   Electronic news gathering (including satellite news gathering)
FSS   Fixed satellite service (e.g. to head ends)
HTT   Home television theatre
IPC   Interpersonal communications (videoconferencing, videophone, etc.)
ISM   Interactive storage media (optical discs, etc.)
MMM   Multimedia mailing
NCA   News and current affairs
NDB   Networked database services (via ATM, etc.)
RVS   Remote video surveillance
SSM   Serial storage media (digital VTR, etc.)
Intro. 3 Profiles and levels
This Specification is intended to be generic in the sense that it serves a wide range of applications, bit rates, resolutions, qualities and services. Applications should cover, among other things, digital storage media, television broadcasting and communications. In the course of developing this Specification, various requirements from typical applications were considered, the necessary algorithmic elements were developed, and these were integrated into a single syntax. Hence this Specification will facilitate the exchange of bitstreams among different applications.
Considering the practicality of implementing the full syntax of this Specification, however, a limited number of subsets of the syntax are also stipulated by means of "profiles" and "levels". These terms, and related ones, are formally defined in clause 3.
A "profile" is a defined subset of the entire bitstream syntax that is defined by this Specification. Within the bounds imposed by the syntax of a given profile it is still possible to require a very large variation in the performance of encoders and decoders, depending upon the values taken by parameters in the bitstream. For instance, it is possible to specify picture sizes as large as approximately 2^14 samples wide by 2^14 lines high. It is currently neither practical nor economic to implement a decoder capable of dealing with all possible picture sizes.
In order to deal with this problem, "levels" are defined within each profile. A level is defined as a set of constraints imposed on parameters in the bitstream. These constraints may be simple limits on numbers. Alternatively, they may take the form of constraints on arithmetic combinations of the parameters (for example, picture width multiplied by picture height multiplied by picture rate).
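Purely as an informative illustration of what such a constraint check might look like in a decoder, the sketch below verifies a set of picture parameters against per-level limits. The limit values used here are placeholders chosen for the example, not the normative figures of any particular profile and level.

#include <stdbool.h>
#include <stdio.h>

struct level_limits {
    int max_width;          /* samples per line */
    int max_height;         /* lines per frame */
    double max_frame_rate;  /* frames per second */
    double max_luma_rate;   /* luminance samples per second */
    long max_bit_rate;      /* bits per second */
};

static bool within_level(int width, int height, double frame_rate,
                         long bit_rate, const struct level_limits *lim)
{
    /* A level may constrain single parameters as well as arithmetic
       combinations of them (e.g. width x height x frame rate). */
    double luma_rate = (double)width * (double)height * frame_rate;
    return width <= lim->max_width &&
           height <= lim->max_height &&
           frame_rate <= lim->max_frame_rate &&
           luma_rate <= lim->max_luma_rate &&
           bit_rate <= lim->max_bit_rate;
}

int main(void)
{
    /* Hypothetical limits for a "main"-like level; illustration only. */
    const struct level_limits lim = { 720, 576, 30.0, 10368000.0, 15000000L };
    printf("720x576 at 25 Hz, 6 Mbit/s: %s\n",
           within_level(720, 576, 25.0, 6000000L, &lim) ? "within level" : "exceeds level");
    printf("1920x1152 at 60 Hz, 80 Mbit/s: %s\n",
           within_level(1920, 1152, 60.0, 80000000L, &lim) ? "within level" : "exceeds level");
    return 0;
}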
Bitstreams complying with this Specification use a common syntax. In order to achieve a subset of the complete syntax, flags and parameters are included in the bitstream that signal the presence or absence of syntactic elements occurring later in the bitstream. To specify constraints on the syntax (and hence define a profile), it is therefore sufficient to constrain the values of these flags and parameters, which specify the presence of the later syntactic elements.
Intro. 4 Scalable and non-scalable syntax
The full syntax can be divided into two major categories. One is the non-scalable syntax, which is structured as a superset of the syntax defined in ISO/IEC 11172-2. The main feature of the non-scalable syntax is the presence of additional compression tools for interlaced video signals. The second category is the scalable syntax, whose key property is that it enables the reconstruction of useful video from pieces of a total bitstream. This is achieved by structuring the total bitstream in two or more layers, starting from a standalone base layer and adding a number of enhancement layers. The base layer can use the non-scalable syntax or, in some cases, a syntax conforming to ISO/IEC 11172-2.
Intro. 4.1 Overview of the non-scalable syntax
The coded representation defined in the non-scalable syntax achieves a high compression ratio while preserving good image quality. The algorithm is not lossless, as the exact sample values are not preserved during coding. Obtaining good image quality at the bit rates of interest demands a very high compression ratio, which is not achievable with intra-picture coding alone. The need for random access, however, is best satisfied with pure intra-picture coding. The choice of techniques is therefore based on the need to balance high image quality and a high compression ratio against the requirement to allow random access to the coded bitstream.
A number of techniques are used to achieve a high compression ratio. The algorithm first uses block-based motion compensation to reduce temporal redundancy. Motion compensation is used both for causal prediction of the current picture from a previous picture and for non-causal, interpolative prediction from past and future pictures. Motion vectors are defined for each 16-sample by 16-line region of the picture. The difference signal, i.e. the prediction error, is further compressed using the discrete cosine transform (DCT) to remove spatial correlation before being quantised in an irreversible process that discards the less important information. Finally, the motion vectors are combined with the residual DCT information and coded using variable length codes.
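Purely as an informative illustration of the first of these techniques, the sketch below performs an exhaustive block-matching search for one 16 x 16 macroblock and reports the displacement with the smallest sum of absolute differences. The frame size, search range and synthetic picture content are invented for the example; the Specification itself does not prescribe how an encoder derives its motion vectors (see Intro. 4.1.3).

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define W 64      /* frame width, illustration only */
#define H 64      /* frame height, illustration only */
#define BLK 16    /* macroblock size used for motion compensation */
#define RANGE 8   /* +/- search range in samples, illustration only */

/* Sum of absolute differences between the macroblock of cur at (bx, by)
   and the block of ref displaced by (dx, dy). */
static long sad(unsigned char cur[H][W], unsigned char ref[H][W],
                int bx, int by, int dx, int dy)
{
    long s = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            s += abs((int)cur[by + y][bx + x] - (int)ref[by + y + dy][bx + x + dx]);
    return s;
}

/* Exhaustive search for the displacement that minimises the SAD. */
static void best_vector(unsigned char cur[H][W], unsigned char ref[H][W],
                        int bx, int by, int *mvx, int *mvy)
{
    long best = LONG_MAX;
    for (int dy = -RANGE; dy <= RANGE; dy++)
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            if (bx + dx < 0 || by + dy < 0 ||
                bx + dx + BLK > W || by + dy + BLK > H)
                continue;                       /* stay inside the reference frame */
            long s = sad(cur, ref, bx, by, dx, dy);
            if (s < best) { best = s; *mvx = dx; *mvy = dy; }
        }
}

int main(void)
{
    static unsigned char ref[H][W], cur[H][W];
    /* Synthetic content: a gradient that shifts three samples to the right. */
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            ref[y][x] = (unsigned char)((x + 2 * y) & 0xFF);
            cur[y][x] = (unsigned char)((x - 3 + 2 * y) & 0xFF);
        }
    int mvx = 0, mvy = 0;
    best_vector(cur, ref, 16, 16, &mvx, &mvy);                /* macroblock at (16, 16) */
    printf("estimated motion vector: (%d, %d)\n", mvx, mvy);  /* expect (-3, 0) */
    return 0;
}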
Intro. 4.1.1 Temporal processing
Because of the conflicting requirements of random access and highly efficient compression, three main picture types are defined. Intra coded pictures (I-pictures) are coded without reference to other pictures. They provide access points to the coded sequence where decoding can begin, but are coded with only moderate compression. Predictive coded pictures (P-pictures) are coded more efficiently, using motion compensated prediction from a past intra or predictive coded picture, and are generally used as references for the prediction of subsequent pictures. Bidirectionally-predictive coded pictures (B-pictures) provide the highest degree of compression but require both past and future reference pictures for motion compensation. B-pictures are never used as references for prediction (except where the resulting picture is used as a reference in a spatially scalable enhancement layer). The organisation of the three picture types in a sequence is very flexible; the choice is left to the encoder and will depend on the requirements of the application. Figure Intro. 1 illustrates the relationship between the three picture types.
[Figure Intro. 1 – Example of temporal picture structure: the I- and P-pictures are linked by prediction, while each B-picture is obtained by bidirectional interpolation from the surrounding reference pictures.]
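Because a B-picture depends on a reference picture that follows it in display order, the coded (transmission) order of pictures differs from the display order. The short sketch below prints both orders for an invented group-of-pictures pattern; the pattern itself is only an example, since the organisation of picture types is left to the encoder.

#include <stdio.h>

int main(void)
{
    /* Display order of an example group of pictures: I B B P B B P B B. */
    const char display[] = "IBBPBBPBB";
    const int n = (int)sizeof display - 1;

    printf("display order: ");
    for (int i = 0; i < n; i++) printf("%c%d ", display[i], i);

    /* A reference picture (I or P) must be transmitted before the B-pictures
       that are interpolated from it, so each I/P picture is sent ahead of the
       B-pictures that precede it in display order. */
    printf("\ncoded order:   ");
    int last_ref = -1;                    /* display index of the previous I/P */
    for (int i = 0; i < n; i++) {
        if (display[i] == 'B') continue;  /* emitted after the next reference */
        printf("%c%d ", display[i], i);
        for (int j = last_ref + 1; j < i; j++)
            printf("%c%d ", display[j], j);
        last_ref = i;
    }
    printf("... (the trailing B-pictures wait for the next reference picture)\n");
    return 0;
}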
Intro. 4.1.2 Coding of interlaced video
Each picture of interlaced video consists of two fields, separated by a field synchronisation interval. According to this Specification, the two fields of a complete picture may be coded either as a single picture or as two pictures. Frame (two-field) coding or field (single-field) coding can be selected dynamically, on a picture-by-picture basis. Frame coding is typically preferred when the video scene contains significant detail with little motion. Field coding, in which the second field can be predicted from the first, works better when motion is fast.
Intro. 4.1.3 Motion representation – Macroblocks
As in ISO/IEC 11172-2, the choice of 16 x 16 macroblocks as the motion compensation unit is the result of a trade-off between the coding gain obtained by using motion information and the bit overhead needed to represent it. Each macroblock can be temporally predicted in one of several different ways. For example, in frame coding, the prediction from the previous frame can itself be based on either both fields of that frame or on a single field. Depending on the macroblock type, motion vector information and any other associated side information are coded together with the prediction error signal in each macroblock. Motion vectors are coded differentially with respect to the last coded motion vectors, using variable length codes. The maximum length of the motion vectors can be programmed on a picture-by-picture basis, so that the most demanding applications can be addressed without compromising the performance of the system in more common situations.
It is the responsibility of the encoder to calculate appropriate motion vectors. This Specification does not specify how this calculation should be performed.
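A minimal, non-normative sketch of the differential idea follows: each motion vector component is transmitted as a difference with respect to the previously coded one, which keeps the coded values small (and therefore cheap to represent with variable length codes) when motion is smooth. The vector values below are invented for the example, and the actual prediction and code tables are those defined in the body of this Specification.

#include <stdio.h>

int main(void)
{
    /* Horizontal motion components of successive macroblocks in a slice. */
    const int mv[] = { 12, 13, 13, 14, 20, 21 };
    const int n = (int)(sizeof mv / sizeof mv[0]);

    int pred = 0;                  /* predictor: the last coded component */
    printf("vector  differential\n");
    for (int i = 0; i < n; i++) {
        int diff = mv[i] - pred;   /* the value that is actually entropy-coded */
        printf("%6d  %+12d\n", mv[i], diff);
        pred = mv[i];              /* the decoder rebuilds mv = pred + diff */
    }
    return 0;
}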
Intro. 4.1.4 Reduction of spatial redundancy
Both source pictures and prediction error signals have a high degree of spatial redundancy. This Specification uses a block-based DCT method with visually weighted quantisation and run-length coding. After motion compensated prediction or interpolation, the residual picture is split into 8 x 8 blocks. These blocks are transformed into the DCT domain, where they are weighted before being quantised. After quantisation, many of the coefficients have a value of zero, and two-dimensional run-length and variable length coding is then used to code all the coefficients efficiently.
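For illustration only, the sketch below takes one 8 x 8 block of coefficients (assumed to have already been transformed) through weighting, quantisation and (run, level) event formation. The coefficient values, the weighting matrix, the quantiser step and the simple row-by-row scan standing in for the zig-zag scan are all invented for the example; the normative matrices, scan orders and variable length code tables are defined later in this Specification.

#include <stdio.h>

int main(void)
{
    /* One 8x8 block assumed to have already been through the forward DCT;
       unlisted entries are zero. */
    int coeff[8][8] = { {620, -45, 12, 0}, {-30, 18, 0, 0}, {7, 0, 0, 0} };

    /* Invented perceptual weighting matrix: coarser at higher frequencies. */
    int weight[8][8];
    for (int v = 0; v < 8; v++)
        for (int u = 0; u < 8; u++)
            weight[v][u] = 16 + 2 * (u + v);

    const int qstep = 4;   /* quantiser scale, illustration only */
    int quant[64];
    int idx = 0;
    /* A simple row-by-row scan stands in for the zig-zag scan of the spec. */
    for (int v = 0; v < 8; v++)
        for (int u = 0; u < 8; u++)
            quant[idx++] = (coeff[v][u] * 16) / (weight[v][u] * qstep);

    /* Two-dimensional (run, level) events: count the zeros preceding each
       non-zero coefficient; after quantisation most coefficients are zero. */
    int run = 0;
    for (int i = 0; i < 64; i++) {
        if (quant[i] == 0) { run++; continue; }
        printf("(run = %2d, level = %d)\n", run, quant[i]);
        run = 0;
    }
    printf("end of block\n");
    return 0;
}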
Intro. 4.1.5 Chrominance formats
In addition to the 4:2:0 format supported in ISO/IEC 11172-2, this Specification supports the 4:2:2 and 4:4:4 chrominance formats.
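In practical terms the difference between these formats is the number of chrominance samples carried per frame, as the small sketch below computes for an arbitrary example frame size.

#include <stdio.h>

int main(void)
{
    const int width = 720, height = 576;   /* luminance dimensions, example only */
    struct { const char *name; int hdiv, vdiv; } fmt[] = {
        { "4:2:0", 2, 2 },   /* chrominance halved horizontally and vertically */
        { "4:2:2", 2, 1 },   /* chrominance halved horizontally only */
        { "4:4:4", 1, 1 },   /* chrominance at full luminance resolution */
    };
    for (int i = 0; i < 3; i++)
        printf("%s: each chrominance component is %d x %d samples\n",
               fmt[i].name, width / fmt[i].hdiv, height / fmt[i].vdiv);
    return 0;
}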
Intro. 4.2 Scalable extensions
The scalability tools in this Specification are designed to support applications that can handle more than one video layer. Among the noteworthy application areas are video telecommunications, video communications over asynchronous transfer mode (ATM) networks, interworking of video standards, video service hierarchies with multiple spatial, temporal and quality resolutions, HDTV with embedded TV, and systems allowing migration to higher temporal resolution HDTV. A simple solution for scalable video is the simulcast technique, which is based on the transmission, or storage and playback, of multiple independently coded reproductions of the video; a more efficient alternative is scalable video coding, in which the bandwidth allocated to a given reproduction of the video can be partially re-used in coding the next reproduction. In scalable video coding, it is assumed that, for any given coded bitstream, decoders of various complexities can decode and display appropriately read reproductions of the coded video. A scalable encoder is likely to be more complex than a single-layer encoder. This Recommendation | International Standard nevertheless provides several different forms of scalability, which address non-overlapping applications with corresponding complexities. The basic scalability tools offered are:
– data partitioning;
– SNR scalability;
– spatial scalability;
– temporal scalability.
Moreover, combinations of these basic scalability tools are also possible; such combinations are referred to as hybrid scalability. In the case of basic scalability, two layers of video, the lower layer and the enhancement layer, are allowed, whereas in the case of hybrid scalability up to three layers of data are supported. Tables Intro. 1 to Intro. 3 give a few examples of applications of various degrees of scalability.
Table Intro. 1 – Applications of SNR scalability

Lower layer | Enhancement layer | Application
ITU-R BT.601 | Same resolution and format as the lower layer | Two-quality-level service for conventional television
High definition | Same resolution and format as the lower layer | Two-quality-level service for HDTV
High definition in 4:2:0 chrominance format | Simulcast in 4:2:2 | Video production/distribution

Table Intro. 2 – Applications of spatial scalability

Base | Enhancement | Application
Progressive scan (30 Hz) | Progressive scan (30 Hz) | CIF/SCIF compatibility or scalability
Interlaced scan (30 Hz) | Interlaced scan (30 Hz) | HDTV/SDTV scalability
Progressive scan (30 Hz) | Interlaced scan (30 Hz) | Compatibility with ISO/IEC 11172-2 or with this Specification
Interlaced scan (30 Hz) | Progressive scan (60 Hz) | Migration to high temporal resolution, progressive-scan HDTV

Table Intro. 3 – Applications of temporal scalability

Base | Enhancement | Higher layer | Application
Progressive scan (30 Hz) | Progressive scan (30 Hz) | Progressive scan (60 Hz) | Migration to high temporal resolution, progressive-scan HDTV
Interlaced scan (30 Hz) | Interlaced scan (30 Hz) | Progressive scan (60 Hz) | Migration to high temporal resolution, progressive-scan HDTV
Intro. 4.2.1 Spatial scalable extension
Spatial scalability is a tool intended for use in video applications such as telecommunications, interworking of video standards, video database browsing, TV/HDTV interworking, etc., i.e. video systems whose main common characteristic is the requirement for at least two layers of spatial resolution. Spatial scalability involves generating, from a single video source, two spatial resolution layers such that the lower layer is coded independently to provide the basic spatial resolution and the enhancement layer uses the lower layer as a basis for spatial interpolation in order to carry the full spatial resolution of the input video source. The lower (base) layer and the enhancement layer can either both use the coding tools described in this Specification, or use the tools of ISO/IEC 11172-2 for the lower layer and the tools of this Specification for the enhancement layer. The latter option offers the further advantage of facilitating interworking between video coding standards. In addition, spatial scalability offers the flexibility to choose the video format used in each layer. It also provides improved resilience to transmission errors, since the more important data of the lower layer can be carried over a channel with better error protection, while the less critical enhancement layer data can be carried over a channel with poorer error protection.
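A minimal sketch of the spatial interpolation step, assuming a simple 2x nearest-neighbour upsampling (the actual interpolation filters are those defined in the body of this Specification): the decoded lower layer is up-sampled to the enhancement resolution, where it can serve as a prediction so that the enhancement layer only needs to code the remaining difference.

#include <stdio.h>

#define BW 4            /* lower-layer width and height (tiny example) */
#define EW (2 * BW)     /* enhancement layer has twice the resolution  */

int main(void)
{
    unsigned char base[BW][BW], up[EW][EW];
    for (int y = 0; y < BW; y++)
        for (int x = 0; x < BW; x++)
            base[y][x] = (unsigned char)(10 * (x + y));   /* decoded lower layer */

    /* Spatial interpolation of the lower layer (nearest-neighbour stand-in). */
    for (int y = 0; y < EW; y++)
        for (int x = 0; x < EW; x++)
            up[y][x] = base[y / 2][x / 2];

    /* The enhancement layer then only has to code the difference between its
       full-resolution input and this interpolated prediction. */
    for (int y = 0; y < EW; y++) {
        for (int x = 0; x < EW; x++)
            printf("%4d", up[y][x]);
        printf("\n");
    }
    return 0;
}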
Intro. 4.2.2 SNR scalable extension
SNR (signal-to-noise ratio) scalability is a tool intended for use in video applications such as telecommunications, video services with multiple quality levels, conventional TV and HDTV, i.e. video systems whose main common characteristic is the requirement for at least two layers of video quality. SNR scalability involves generating, from a single video source, two video layers of the same spatial resolution such that the lower layer is coded independently to provide the basic video quality and the enhancement layer is coded so as to refine the lower layer. When added to the lower layer, the enhancement layer provides a higher-quality reproduction of the input video source. The lower (base) layer and the enhancement layer can either both use the coding tools described in this Specification, or use the tools of ISO/IEC 11172-2 for the lower layer and the tools of this Specification for the enhancement layer. SNR scalability also provides improved resilience to transmission errors, since the more important data of the lower layer can be carried over a channel with better error protection, while the less critical enhancement layer data can be carried over a channel with poorer error protection.
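The following non-normative sketch illustrates the refinement idea behind SNR scalability: the lower layer carries values quantised with a coarse step, the enhancement layer carries a finer re-quantisation of what the coarse step discarded, and adding the two reconstructions reduces the quantisation error. The values and quantiser steps are chosen arbitrarily for the example.

#include <stdio.h>

int main(void)
{
    const int original[6] = { 153, -47, 22, 9, -3, 1 };   /* ideal values */
    const int q_base = 16;   /* coarse quantiser step of the lower layer  */
    const int q_enh = 4;     /* finer step used by the enhancement layer  */

    for (int i = 0; i < 6; i++) {
        int base = (original[i] / q_base) * q_base;   /* lower-layer reconstruction    */
        int residual = original[i] - base;            /* error left by the coarse step */
        int enh = (residual / q_enh) * q_enh;         /* refinement carried by the enhancement layer */
        printf("original %4d   base only %4d   base + enhancement %4d\n",
               original[i], base, base + enh);
    }
    return 0;
}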
Intro. 4.2.3 Temporal scalable extension
Temporal scalability is a tool intended for use in a range of diverse video applications, from telecommunications to HDTV, for which migration to systems with a higher temporal resolution than that of comparable systems may be necessary. In many cases, the lower temporal resolution video systems will be either the currently existing systems or the less expensive early-generation systems, the aim being to introduce more sophisticated systems gradually. Temporal scalability involves layering the video pictures: the lower layer is coded independently to provide the basic temporal rate, while the enhancement layer is coded with temporal prediction with respect to the lower layer. Once decoded and temporally remultiplexed, the two layers provide the full temporal resolution of the video source. Lower temporal resolution video systems may decode only the lower layer to obtain the basic temporal resolution, whereas more sophisticated future systems will be able to decode both layers and provide high temporal resolution video while preserving interworking with earlier-generation video systems. A further advantage of temporal scalability is a degree of resilience to transmission errors, since the more important data of the lower layer can be carried over a channel with better error protection, while the less critical enhancement layer data can be carried over a channel with poorer error protection.
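The sketch below illustrates the temporal remultiplexing step under an assumed 2:1 split (even-numbered pictures in the lower layer, odd-numbered pictures in the enhancement layer); the split is an illustration only and is not mandated by this Specification.

#include <stdio.h>

int main(void)
{
    /* Eight pictures of a full-rate source, split 2:1 between the layers:
       even-numbered pictures in the lower layer, odd-numbered pictures in
       the enhancement layer. */
    int base[4], enh[4];
    for (int i = 0; i < 4; i++) { base[i] = 2 * i; enh[i] = 2 * i + 1; }

    /* A basic decoder displays only the lower layer (half temporal rate). */
    printf("lower layer only: ");
    for (int i = 0; i < 4; i++) printf("%d ", base[i]);

    /* An enhanced decoder remultiplexes both layers back into display order. */
    printf("\nboth layers:      ");
    for (int t = 0; t < 8; t++)
        printf("%d ", (t % 2 == 0) ? base[t / 2] : enh[t / 2]);
    printf("\n");
    return 0;
}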
Intro. 4.2.4 Data partitioning extension
Data partitioning is a tool intended for use when two channels are available for the transmission or storage of a video bitstream, as may be the case in ATM networks, terrestrial broadcasting, magnetic media, etc. The bitstream is partitioned between these channels such that its more critical parts (such as headers, motion vectors and DCT coefficients) are transmitted in the channel with the better error protection, and the less critical data (such as higher-order DCT coefficients) are transmitted in the channel with poorer error protection. In this way, the degradation introduced by channel errors is minimised, because the critical parts of the bitstream travel in the better-protected channel. Data from neither channel can be processed by a decoder that is not designed to decode data-partitioned bitstreams.
ISO/IEC 13818-2:2000 | Rec. ITU-T H.262 (2000)
INTERNATIONAL STANDARD
ITU-T RECOMMENDATION
INFORMATION TECHNOLOGY – GENERIC CODING
OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION: VIDEO
1 Scope
This Recommendation | International Standard specifies the coded representation of picture information for digital storage media and digital video communication, and specifies the corresponding decoding process. The representation supports constant bit rate transmission, variable bit rate transmission, random access, channel hopping, scalable decoding and bitstream editing, as well as special functions such as fast forward playback, fast reverse playback, slow motion, pause and still pictures. This Recommendation | International Standard is forward compatible with ISO/IEC 11172-2 and upward or downward compatible with enhanced definition television (EDTV), high definition television (HDTV) and conventional definition television (SDTV) formats.
This Recommendation | International Standard is primarily applicable to digital storage media, video broadcasting and video communications. The storage media may be connected to the decoder directly, or via communications means such as buses, LANs or telecommunications links.
2 Normative references
The following Recommendations and International Standards contain provisions which, through reference in this text, constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent editions of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently valid International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently valid ITU-T Recommendations.
– ITU-R Recommendation BT.601-3 (1992), Encoding parameters of digital television for studios.
– ITU-R Recommendation BR.648 (1986), Digital recording of audio signals.
– ITU-R Report 955-2 (1990), Satellite sound broadcasting for portable and vehicular receivers.
– ISO/IEC 11172-1:1993, Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 1: Systems.
– ISO/IEC 11172-2:1993, Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 2: Video.
– ISO/IEC 11172-3:1993, Information technology –
...
The article discusses the SIST ISO/IEC 13818-2:2005 standard, which is related to the generic coding of moving pictures and audio information in the field of information technology. Specifically, it focuses on video coding. The standard provides guidelines and specifications for the compression and decompression of video data, allowing for efficient storage and transmission. It defines various parameters and algorithms for video coding, enabling compatibility and interoperability between different systems and devices. The SIST ISO/IEC 13818-2:2005 standard plays a crucial role in the development and implementation of digital video technologies.
Questions, Comments and Discussion
Ask a question and the Technical Secretary will try to provide an answer. You can also discuss the standard here.