Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 2: Video (ISO/IEC 11172-2:1993)

Informationstechnik - Codierung von bewegten Bildern und damit verbundenen Tonsignalen für digitale Speichermedien bis zu 1,5 Mbit/s - Teil 2: Video (ISO/IEC 11172-2:1993)

Technologies de l'information - Codage de l'image animée et du son associé pour les supports de stockage numérique jusqu'à environ 1,5 Mbit/s - Partie 2: Vidéo (ISO/IEC 11172-2:1993)

General Information

Status
Withdrawn
Publication Date
23-Feb-1995
Withdrawal Date
08-Jun-2005
Current Stage
9960 - Withdrawal effective - Withdrawal
Start Date
09-Jun-2005
Completion Date
09-Jun-2005
Standard

EN ISO/IEC 11172-2:1997

English language
114 pages

Frequently Asked Questions

EN ISO/IEC 11172-2:1995 is a standard published by the European Committee for Standardization (CEN). Its full title is "Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 2: Video (ISO/IEC 11172-2:1993)".

EN ISO/IEC 11172-2:1995 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

Standards Content (Sample)


SLOVENIAN STANDARD
01-December-1997
Information technology - Coding of moving pictures and associated audio for digital
storage media at up to about 1,5 Mbit/s - Part 2: Video (ISO/IEC 11172-2:1993)
Informationstechnik - Codierung von bewegten Bildern und damit verbundenen
Tonsignalen für digitale Speichermedien bis zu 1,5 Mbit/s - Teil 2: Video (ISO/IEC 11172-2:1993)
Technologies de l'information - Codage de l'image animée et du son associé pour les
supports de stockage numérique jusqu'à environ 1,5 Mbit/s - Partie 2: Vidéo (ISO/IEC 11172-2:1993)
This Slovenian standard is identical to: EN ISO/IEC 11172-2:1995
ICS:
35.040 Character sets and information coding
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

INTERNATIONAL STANDARD
ISO/IEC 11172-2
First edition
1993-08-01
Information technology - Coding of moving pictures and associated audio for
digital storage media at up to about 1,5 Mbit/s -
Part 2:
Video
Technologies de l'information - Codage de l'image animée et du son
associé pour les supports de stockage numérique jusqu'à environ 1,5 Mbit/s -
Partie 2: Vidéo
Reference number
ISO/IEC 11172-2:1993(E)
Contents
Foreword
Introduction
Section 1: General
1.1 Scope
1.2 Normative references
Section 2: Technical elements
2.1 Definitions
2.2 Symbols and abbreviations
2.3 Method of describing bitstream syntax
2.4 Requirements
Annexes
A 8 by 8 Inverse discrete cosine transform
B Variable length code tables
C Video buffering verifier
D Guide to encoding video
E Bibliography
F List of patent holders
© ISO/IEC 1993
All rights reserved. No part of this publication may be reproduced or utilized in any form or by
any means, electronic or mechanical, including photocopying and microfilm, without
permission in writing from the publisher.
ISO/IEC Copyright Office • Case Postale 56 • CH-1211 Genève 20 • Switzerland
Printed in Switzerland.
Foreword
ISO (the International Organization for Standardization) and IEC (the International
Electrotechnical Commission) form the specialized system for worldwide standardization.
National bodies that are members of ISO or IEC participate in the development of
International Standards through technical committees established by the respective
organization to deal with particular fields of technical activity. ISO and IEC technical
committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
In the field of information technology, ISO and IEC have established a joint
technical committee, ISO/IEC JTC 1. Draft International Standards adopted
by the joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least
75 % of the national bodies casting a vote.
International Standard ISO/IEC 11172-2 was prepared by Joint Technical
Committee ISO/IEC JTC 1, Information technology, Sub-Committee SC 29,
Coded representation of audio, picture, multimedia and hypermedia information.
ISO/IEC 11172 consists of the following parts, under the general title
Information technology - Coding of moving pictures and associated audio
for digital storage media at up to about 1,5 Mbit/s:
- Part 1: Systems
- Part 2: Video
- Part 3: Audio
- Part 4: Compliance testing
Annexes A, B and C form an integral part of this part of ISO/IEC 11172.
Annexes D, E and F are for information only.
Introduction
Note -- Readers interested in an overview of the MPEG Video layer should read this Introduction and then
proceed to annex D, before returning to clauses 1 and 2.
0.1 Purpose
This part of ISO/IEC 11172 was developed in response to the growing need for a common format for
representing compressed video on various digital storage media such as CDs, DATs, Winchester disks and
optical drives. This part of ISO/IEC 11172 specifies a coded representation that can be used for
compressing video sequences to bitrates around 1,5 Mbit/s. The use of this part of ISO/IEC 11172 means
that motion video can be manipulated as a form of computer data and can be transmitted and received over
existing and future networks. The coded representation can be used with both 625-line and 525-line
television and provides flexibility for use with workstation and personal computer displays.
This part of ISO/IEC 11172 was developed to operate principally from storage media offering a continuous
transfer rate of about 1,5 Mbit/s. Nevertheless it can be used more widely than this because the approach
taken is generic.
0.1.1 Coding parameters
The intention in developing this part of ISO/IEC 11172 has been to define a source coding algorithm with a
large degree of flexibility that can be used in many different applications. To achieve this goal, a number of
the parameters defining the characteristics of coded bitstreams and decoders are contained in the bitstream
itself. This allows, for example, the algorithm to be used for pictures with a variety of sizes and aspect
ratios and on channels or devices operating at a wide range of bitrates.
Because of the large range of the characteristics of bitstreams that can be represented by this part of ISO/IEC
11172, a sub-set of these coding parameters known as the "Constrained Parameters" has been defined. The
aim in defining the constrained parameters is to offer guidance about a widely useful range of parameters.
Conforming to this set of constraints is not a requirement of this part of ISO/IEC 11172. A flag in the
bitstream indicates whether or not it is a Constrained Parameters bitstream.
Summary of the Constrained Parameters:
Horizontal picture size            Less than or equal to 768 pels
Vertical picture size              Less than or equal to 576 lines
Picture area                       Less than or equal to 396 macroblocks
Pel rate                           Less than or equal to 396x25 macroblocks/s
Picture rate                       Less than or equal to 30 Hz
Motion vector range                Less than -64 to +63,5 pels (using half-pel vectors)
                                   [backward_f_code and forward_f_code <= 4 (see table D.7)]
Input buffer size (in VBV model)   Less than or equal to 327 680 bits
Bitrate                            Less than or equal to 1 856 000 bits/s (constant bitrate)
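As an illustration only, the limits summarized above can be checked mechanically. The sketch below is not part of ISO/IEC 11172; the class name StreamParams and the function names are hypothetical, and it assumes that the picture area is counted in whole 16 by 16 macroblocks, rounding partial macroblocks up.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamParams:
    """Hypothetical container for the parameters named in the summary."""
    width: int                # horizontal picture size, pels
    height: int               # vertical picture size, lines
    picture_rate: float       # Hz
    bitrate: int              # bits/s
    vbv_buffer_bits: int      # input buffer size in the VBV model
    forward_f_code: int = 1
    backward_f_code: int = 1


def macroblock_count(p: StreamParams) -> int:
    # Assumption: partial macroblocks at the picture edge count as whole ones.
    return math.ceil(p.width / 16) * math.ceil(p.height / 16)


def violated_constraints(p: StreamParams) -> list:
    """Return the names of the Constrained Parameters limits that p exceeds."""
    mbs = macroblock_count(p)
    checks = {
        "horizontal picture size": p.width <= 768,
        "vertical picture size": p.height <= 576,
        "picture area": mbs <= 396,
        "pel rate": mbs * p.picture_rate <= 396 * 25,
        "picture rate": p.picture_rate <= 30,
        "motion vector range": max(p.forward_f_code, p.backward_f_code) <= 4,
        "input buffer size": p.vbv_buffer_bits <= 327_680,
        "bitrate": p.bitrate <= 1_856_000,
    }
    return [name for name, ok in checks.items() if not ok]


sif_625 = StreamParams(352, 288, 25, 1_150_000, 327_680)
print(macroblock_count(sif_625), violated_constraints(sif_625))
```

A 352 by 288 picture at 25 Hz sits exactly on the picture area and pel rate limits, which suggests why those two limits are expressed in macroblocks rather than in pels.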
0.2 Overview of the algorithm
The coded representation defined in this part of ISO/IEC 11172 achieves a high compression ratio while
preserving good picture quality. The algorithm is not lossless as the exact pel values are not preserved
during coding. The choice of the techniques is based on the need to balance a high picture quality and
compression ratio with the requirement to make random access to the coded bitstream. Obtaining good
picture quality at the bitrates of interest demands a very high compression ratio, which is not achievable
with intraframe coding alone. The need for random access, however, is best satisfied with pure intraframe
coding. This requires a careful balance between intra- and interframe coding and between recursive and non-
recursive temporal redundancy reduction.
A number of techniques are used to achieve a high compression ratio. The first, which is almost
independent from this part of ISO/IEC 11172, is to select an appropriate spatial resolution for the signal.
The algorithm then uses block-based motion compensation to reduce the temporal redundancy. Motion
compensation is used for causal prediction of the current picture from a previous picture, for noncausal
prediction of the current picture from a future picture, or for interpolative prediction from past and future
pictures. Motion vectors are defined for each 16-pel by 16-line region of the picture. The difference signal,
the prediction error, is further compressed using the discrete cosine transform (DCT) to remove spatial
correlation before it is quantized in an irreversible process that discards the less important information.
Finally, the motion vectors are combined with the DCT information, and coded using variable length codes.
0.2.1 Temporal processing
Because of the conflicting requirements of random access and highly efficient compression, three main
picture types are defined. Intra-coded pictures (I-Pictures) are coded without reference to other pictures.
They provide access points to the coded sequence where decoding can begin, but are coded with only a
moderate compression ratio. Predictive coded pictures (P-Pictures) are coded more efficiently using motion
compensated prediction from a past intra or predictive coded picture and are generally used as a reference for
further prediction. Bidirectionally-predictive coded pictures (B-Pictures) provide the highest degree of
compression but require both past and future reference pictures for motion compensation.
Bidirectionally-predictive coded pictures are never used as references for prediction. The organisation of
the three picture types in a sequence is very flexible. The choice is left to the encoder and will depend on
the requirements of the application. Figure 1 illustrates the relationship between the three different picture types.
Figure 1 -- Example of temporal picture structure (labels: bi-directional prediction, prediction)
The fourth picture type defined in this part of ISO/IEC 11172, the D-picture, is provided to allow a simple,
but limited quality, fast-forward playback mode.
0.2.2 Motion representation - macroblocks
The choice of 16 by 16 macroblocks for the motion-compensation unit is a result of the trade-off between
increasing the coding efficiency provided by using motion information and the overhead needed to store it.
Each macroblock can be one of a number of different types. For example, intra-coded, forward-predictive-coded,
backward-predictive-coded, and bidirectionally-predictive-coded macroblocks are permitted in
bidirectionally-predictive coded pictures. Depending on the type of the macroblock, motion vector
information and other side information are stored with the compressed prediction error signal in each
macroblock. The motion vectors are encoded differentially with respect to the last coded motion vector,
using variable-length codes. The maximum length of the vectors that may be represented can be
programmed, on a picture-by-picture basis, so that the most demanding applications can be met without
compromising the performance of the system in more normal situations.
It is the responsibility of the encoder to calculate appropriate motion vectors. This part of ISO/IEC 11172
does not specify how this should be done.
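The idea of coding each motion vector as a difference from the last coded vector can be sketched as follows. This is a simplified illustration, not the normative procedure: it ignores the variable-length codes, the f_code range handling and the rules for resetting the predictor, and it assumes vectors are given as (horizontal, vertical) pairs in half-pel units. The function names are hypothetical.

```python
def encode_differential(vectors, start=(0, 0)):
    """Replace each vector by its difference from the previously coded one."""
    previous = start
    deltas = []
    for vx, vy in vectors:
        deltas.append((vx - previous[0], vy - previous[1]))
        previous = (vx, vy)
    return deltas


def decode_differential(deltas, start=(0, 0)):
    """Rebuild the vectors by accumulating the differences."""
    previous = start
    vectors = []
    for dx, dy in deltas:
        previous = (previous[0] + dx, previous[1] + dy)
        vectors.append(previous)
    return vectors


vectors = [(4, 2), (5, 2), (5, 3), (3, 3)]
deltas = encode_differential(vectors)
print(deltas)
print(decode_differential(deltas) == vectors)
```

Neighbouring macroblocks often move together, so the differences tend to be small, which is what makes short variable-length codes effective for them.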
0.2.3 Spatial redundancy reduction
Both original pictures and prediction error signals have high spatial redundancy. This part of ISO/IEC
11172 uses a block-based DCT method with visually weighted quantization and run-length coding. Each 8
by 8 block of the original picture for intra-coded macroblocks or of the prediction error for predictive-coded
macroblocks is transformed into the DCT domain where it is scaled before being quantized. After
quantization many of the coefficients are zero in value and so two-dimensional run-length and variable
length coding is used to encode the remaining coefficients efficiently.
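To make the transform step concrete, the following sketch computes a direct (slow) 8 by 8 forward DCT and then applies a plain uniform quantizer. It is an illustration under stated assumptions: the single step size stands in for the visually weighted quantization that this part of ISO/IEC 11172 actually uses, and the function names are hypothetical.

```python
import math

N = 8


def dct_8x8(block):
    """Direct 2-D DCT of an 8 by 8 block; block[y][x] in, coeffs[v][u] out."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs


def quantize(coeffs, step):
    # Assumption: one uniform step for every coefficient (not the weighted
    # matrices of the standard). Python's round() rounds halves to even.
    return [[round(value / step) for value in row] for row in coeffs]


flat = [[128] * N for _ in range(N)]       # a block with no spatial detail
levels = quantize(dct_8x8(flat), step=16)
print(levels[0][0], sum(1 for row in levels for x in row if x != 0))
```

For a flat block all of the energy lands in the single dc coefficient, which is the extreme case of the "many coefficients are zero" observation above.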
0.3 Encoding
This part of ISO/IEC 11172 does not specify an encoding process. It specifies the syntax and semantics of
the bitstream and the signal processing in the decoder. As a result, many options are left open to encoders
to trade off cost and speed against picture quality and coding efficiency. This clause is a brief description of
the functions that need to be performed by an encoder. Figure 2 shows the main functional blocks.
(Figure 2 blocks: source input pictures, motion estimator, DCT, Q, VLC, regulator, picture store and predictor)
Where
DCT is discrete cosine transform
DCT-1 is inverse discrete cosine transform
Q is quantization
Q-1 is dequantization
VLC is variable length coding
Figure 2 -- Simplified video encoder block diagram
The input video signal must be digitized and represented as a luminance and two colour difference signals
(Y, Cb, Cr). This may be followed by preprocessing and format conversion to select an appropriate
window, resolution and input format. This part of ISO/IEC 11172 requires that the colour difference
signals (Cb and Cr) are subsampled with respect to the luminance by 2:1 in both vertical and horizontal
directions and are reformatted, if necessary, as a non-interlaced signal.
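A minimal sketch of 2:1 subsampling in both directions is given below. The choice of filter is an assumption made for illustration: it simply averages each 2 by 2 group of samples with rounding, whereas a real encoder is free to use a better preprocessing filter. The function name is hypothetical and the plane is assumed to have even dimensions.

```python
def subsample_2x2(plane):
    """Halve a colour difference plane horizontally and vertically.

    plane is a list of rows of integer samples with even width and height.
    Each output sample is the rounded mean of a 2 by 2 input group.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


cb = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 50, 60, 60],
    [50, 50, 60, 60],
]
print(subsample_2x2(cb))
```

After this step a 16 by 16 area of luminance corresponds to a single 8 by 8 block in each colour difference plane.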
The encoder must choose which picture type to use for each picture. Having defined the picture types, the
encoder estimates motion vectors for each 16 by 16 macroblock in the picture. In P-Pictures one vector is
needed for each non-intra macroblock and in B-Pictures one or two vectors are needed.
If B-Pictures are used, some reordering of the picture sequence is necessary before encoding. Because
B-Pictures are coded using bidirectional motion compensated prediction, they can only be decoded after the
subsequent reference picture (an I or P-Picture) has been decoded. Therefore the pictures are reordered by the
encoder so that the pictures arrive at the decoder in the order for decoding. The correct display order is
recovered by the decoder.
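The reordering can be pictured with a short sketch. It assumes each picture is labelled by a string whose first character is its type (I, P or B) followed by its display index; that labelling and the function name are hypothetical conveniences, not anything defined by this part of ISO/IEC 11172.

```python
def display_to_coded(pictures):
    """Move each reference picture ahead of the B-pictures that precede it."""
    coded = []
    pending_b = []
    for picture in pictures:
        if picture[0] == "B":
            # A B-picture must wait until its future reference has been sent.
            pending_b.append(picture)
        else:
            coded.append(picture)
            coded.extend(pending_b)
            pending_b = []
    # B-pictures left at the end have no later reference in this input;
    # they are appended unchanged in this simplified sketch.
    coded.extend(pending_b)
    return coded


print(display_to_coded(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
```

The cost of this arrangement is delay: the encoder has to hold B-pictures back until the following I- or P-picture has been coded.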
The basic unit of coding within a picture is the macroblock. Within each picture, macroblocks are encoded
in sequence, left to right, top to bottom. Each macroblock consists of six 8 by 8 blocks: four blocks of
luminance, one block of Cb chrominance, and one block of Cr chrominance. See figure 3. Note that the
picture area covered by the four blocks of luminance is the same as the area covered by each of the
chrominance blocks. This is due to subsampling of the chrominance information to match the sensitivity of
the human visual system.
(Figure 3 labels: four Y blocks, one Cb block, one Cr block)
Figure 3 -- Macroblock structure
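The macroblock layout described above can be expressed as a small address calculation. The sketch assumes macroblocks are numbered from 0 in the left-to-right, top-to-bottom coding order and that block positions are reported as (x, y) origins within the luminance plane or the subsampled colour difference planes; the function name and the returned dictionary are hypothetical.

```python
def macroblock_blocks(mb_index, picture_width):
    """Return the origins of the six 8 by 8 blocks of one macroblock."""
    mbs_per_row = (picture_width + 15) // 16
    mb_y, mb_x = divmod(mb_index, mbs_per_row)
    x0, y0 = mb_x * 16, mb_y * 16
    luminance = [(x0, y0), (x0 + 8, y0), (x0, y0 + 8), (x0 + 8, y0 + 8)]
    # The colour difference planes are half size in both directions, so one
    # 8 by 8 block covers the same picture area as the four luminance blocks.
    chrominance = (mb_x * 8, mb_y * 8)
    return {"Y": luminance, "Cb": chrominance, "Cr": chrominance}


print(macroblock_blocks(23, picture_width=352))
```

For a 352-pel-wide picture there are 22 macroblocks per row, so macroblock 23 is the second macroblock of the second row.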
Firstly, for a given macroblock, the coding mode is chosen. It depends on the picture type, the
effectiveness of motion compensated prediction in that local region, and the nature of the signal within the
block. Secondly, depending on the coding mode, a motion compensated prediction of the contents of the
block based on past and/or future reference pictures is formed. This prediction is subtracted from the actual
data in the current macroblock to form an error signal. Thirdly, this error signal is separated into 8 by 8
blocks (4 luminance and 2 chrominance blocks in each macroblock) and a discrete cosine transform is
performed on each block. Each resulting 8 by 8 block of DCT coefficients is quantized and the two-
dimensional block is scanned in a zig-zag order to convert it into a one-dimensional string of quantized DCT
coefficients. Fourthly, the side-information for the macroblock (mode, motion vectors etc) and the
quantized coefficient data are encoded. For maximum efficiency, a number of variable length code tables are
defined for the different data elements. Run-length coding is used for the quantized coefficient data.
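The zig-zag scan and the run-length step can be sketched together. The scan order below is generated diagonal by diagonal and matches the conventional 8 by 8 zig-zag pattern; the (run, level) pairing is a simplification that treats every coefficient alike and leaves out the variable length code tables and the separate handling of the intra dc coefficient. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n by n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                  # s = row + column
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)               # even diagonals run upwards
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block):
    """Convert a quantized block to (zero run, non-zero level) pairs."""
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs                                # trailing zeros are implied


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 64, -3, 5
print(run_level_pairs(block))
```

Because quantization leaves most high-frequency coefficients at zero, the scan usually ends in a long run of zeros that need not be coded at all.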
A consequence of using different picture types and variable length coding is that the overall data rate is
variable. In applications that involve a fixed-rate channel, a FIFO buffer may be used to match the encoder
output to the channel. The status of this buffer may be monitored to control the number of bits generated
by the encoder. Controlling the quantization process is the most direct way of controlling the bitrate. This
part of ISO/IEC 11172 specifies an abstract model of the buffering system (the Video Buffering Verifier) in
order to constrain the maximum variability in the number of bits that are used for a given picture. This
ensures that a bitstream can be decoded with a buffer of known size.
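The effect of such a buffer model can be illustrated with a very rough simulation. This is not the Video Buffering Verifier of annex C: it assumes that the channel delivers bitrate / picture_rate bits per picture period, that each picture is removed from the buffer instantaneously, and that the buffer starts at a caller-chosen fullness. All names are hypothetical.

```python
def check_buffer(picture_bits, bitrate, picture_rate, buffer_size, initial_fullness):
    """Step through coded picture sizes and report the first buffer violation."""
    fullness = initial_fullness
    bits_per_period = bitrate / picture_rate
    for index, bits in enumerate(picture_bits):
        if bits > fullness:
            return f"underflow at picture {index}"   # picture not fully received
        fullness -= bits                              # decoder removes the picture
        fullness += bits_per_period                   # channel keeps filling
        if fullness > buffer_size:
            return f"overflow at picture {index}"
    return "ok"


# A large I-picture followed by smaller pictures, at 1 150 000 bits/s and 25 Hz.
sizes = [150_000, 20_000, 20_000, 46_000]
print(check_buffer(sizes, 1_150_000, 25, 327_680, 200_000))
```

An encoder can run a check of this kind while coding and coarsen or refine the quantization when the simulated fullness drifts towards either limit.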
At this stage, the coded representation of the picture has been generated. The final step in the encoder is to
regenerate I-Pictures and P-Pictures by decoding the data so that they can be used as reference pictures for
subsequent encoding. The quantized coefficients are dequantized and an inverse 8 by 8 DCT is performed on
each block. The prediction error signal produced is then added back to the prediction signal and limited to
the required range to give a decoded reference picture.
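The final add-and-limit step is simple enough to show directly. The sketch assumes 8-bit samples, so the "required range" is taken to be 0 to 255; that range and the function name are assumptions made for illustration.

```python
def reconstruct(prediction, error, low=0, high=255):
    """Add the decoded prediction error to the prediction and clip each sample."""
    return [
        [min(high, max(low, p + e)) for p, e in zip(pred_row, err_row)]
        for pred_row, err_row in zip(prediction, error)
    ]


print(reconstruct([[250, 10, 100]], [[10, -20, 5]]))
```

Encoder and decoder must perform this step identically, otherwise their reference pictures would drift apart over successive predicted pictures.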
0.4 Decoding
Decoding is the inverse of the encoding operation. It is considerably simpler than encoding as there is no
need to perform motion estimation and there are many fewer options. The decoding process is defined by
this part of ISO/IEC 11172. The description that follows is a very brief overview of one possible way of
decoding a bitstream. Other decoders with different architectures are possible. Figure 4 shows the main
functional blocks.
(Figure 4 labels: coded video bitstream, buffer, quantizer stepsize, motion vectors, picture store, reconstructed output pictures)
Where
DCT-1 is inverse discrete cosine transform
Q-1 is dequantization
MUX-1 is demultiplexing
VLD is variable length decoding
Figure 4 -- Basic video decoder block diagram
For fixed-rate applications, the channel fills a FIFO buffer at a constant rate with the coded bitstream. The
decoder reads this buffer and decodes the data elements in the bitstream according to the defined syntax.
As the decoder reads the bitstream, it identifies the start of a coded picture and then the type of the picture.
It decodes each macroblock in the picture in turn. The macroblock type and the motion vectors, if present,
are used to construct a prediction of the current macroblock based on past and future reference pictures that
have been stored in the decoder. The coefficient data are decoded and dequantized. Each 8 by 8 block of
coefficient data is transformed by an inverse DCT (specified in annex A), and the result is added to the
prediction signal and limited to the defined range.
After all the macroblocks in the picture have been processed, the picture has been reconstructed. If it is an
I-picture or a P-picture it is a reference picture for subsequent pictures and is stored, replacing the oldest stored
reference picture. Before the pictures are displayed they may need to be re-ordered from the coded order to
their natural display order. After reordering, the pictures are available, in digital form, for post-processing
and display in any manner that the application chooses.
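One way to recover display order in a decoder is to hold back the most recent reference picture until the next reference picture arrives, while passing B-pictures straight through. The sketch below assumes pictures are labelled by strings whose first character is the picture type (I, P or B); the labels and the function name are hypothetical.

```python
def coded_to_display(pictures):
    """Reorder pictures from coded order to display order."""
    display = []
    held_reference = None
    for picture in pictures:
        if picture[0] == "B":
            display.append(picture)          # shown before the held reference
        else:
            if held_reference is not None:
                display.append(held_reference)
            held_reference = picture         # wait for any B-pictures first
    if held_reference is not None:
        display.append(held_reference)       # flush at the end of the sequence
    return display


print(coded_to_display(["I0", "P3", "B1", "B2", "P6", "B4", "B5"]))
```

Only one extra picture has to be held for this purpose, in addition to the two reference pictures needed for prediction.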
0.5 Structure of the coded video bitstream
This part of ISO/IEC 11172 specifies a syntax for a coded video bitstream. This syntax contains six layers,
each of which either supports a signal processing or a system function:
Layers of the syntax       Function
Sequence layer             Random access unit: context
Group of pictures layer    Random access unit: video
Picture layer              Primary coding unit
Slice layer                Resynchronization unit
Macroblock layer           Motion compensation unit
Block layer                DCT unit
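The upper four layers can be located in a bitstream by looking for start codes. The sketch below relies on start code values that are defined in the syntax clauses of this part of ISO/IEC 11172 rather than in this Introduction: a three-byte prefix 00 00 01 followed by B3 (sequence header), B8 (group of pictures), 00 (picture) or 01 to AF (slice). The macroblock and block layers have no start codes and are not found this way. Function names are hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def layer_of(code):
    """Map the byte after the prefix to a layer name, or None if not mapped."""
    if code == 0xB3:
        return "sequence"
    if code == 0xB8:
        return "group of pictures"
    if code == 0x00:
        return "picture"
    if 0x01 <= code <= 0xAF:
        return "slice"
    return None                      # other start codes are ignored here


def scan_layers(data):
    """Return (byte offset, layer name) for each recognised start code."""
    found = []
    position = data.find(START_CODE_PREFIX)
    while position != -1 and position + 3 < len(data):
        layer = layer_of(data[position + 3])
        if layer is not None:
            found.append((position, layer))
        position = data.find(START_CODE_PREFIX, position + 3)
    return found


sample = (b"\x00\x00\x01\xb3" + b"\xff" * 4 + b"\x00\x00\x01\xb8" + b"\xff" * 2
          + b"\x00\x00\x01\x00" + b"\xff" + b"\x00\x00\x01\x01" + b"\xff")
print(scan_layers(sample))
```

The sample bytes are filler chosen only to exercise the scanner; they do not form a decodable bitstream.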
0.6 Features supported by the algorithm
Applications using compressed video on digital storage media need to be able to perform a number of
operations in addition to normal forward playback of the sequence. The coded bitstream has been designed
to support a number of these operations.
0.6.1 Random access
Random access is an essential feature for video on a storage medium. Random access requires that any
picture can be decoded in a limited amount of time. It implies the existence of access points in the
bitstream - that is segments of information that are identifiable and can be decoded without reference to other
segments of data. A spacing of two random access points (Intra-Pictures) per second can be achieved
without significant loss of picture quality.
0.6.2 Fast search
Depending on the storage medium, it is possible to scan the access points in a coded bitstream (with the
help of an application-specific directory or other knowledge beyond the scope of this part of ISO/IEC
11172) to obtain a fast-forward and fast-reverse playback effect.
0.6.3 Reverse playback
Some applications may require the video signal to be played in reverse order. This can be achieved in a
decoder by using memory to store entire groups of pictures after they have been decoded before being
displayed in reverse order. An encoder can make this feature easier by reducing the length of groups of
pictures.
0.6.4 Error robustness
Most digital storage media and communication channels are not error-free. Appropriate channel coding
schemes should be used and are beyond the scope of this part of ISO/IEC 11172. Nevertheless the
compression scheme defined in this part of ISO/IEC 11172 is robust to residual errors. The slice structure
allows a decoder to recover after a data error and to resynchronize its decoding. Therefore, bit errors in the
compressed data will cause errors in the decoded pictures to be limited in area. Decoders may be able to use
concealment strategies to disguise these errors.
0.6.5 Editing
There is a conflict between the requirement for high coding efficiency and easy editing. The coding structure
and syntax have not been designed with the primary aim of simplifying editing at any picture. Nevertheless
a number of features have been included that enable editing of coded data.

INTERNATIONAL STANDARD
ISO/IEC 11172-2:1993(E)
Information technology - Coding of moving
pictures and associated audio for digital storage
media at up to about 1,5 Mbit/s -
Part 2:
Video
Section 1: General
1.1 Scope
This part of ISO/IEC 11172 specifies the coded representation of video for digital storage media and
specifies the decoding process. The representation supports normal speed forward playback, as well as
special functions such as random access, fast forward playback, fast reverse playback, normal speed reverse
playback, pause and still pictures. This part of ISO/IEC 11172 is compatible with standard 525- and 625-line
television formats, and it provides flexibility for use with personal computer and workstation displays.
ISO/IEC 11172 is primarily applicable to digital storage media supporting a continuous transfer rate up to
about 1,5 Mbit/s, such as Compact Disc, Digital Audio Tape, and magnetic hard disks. Nevertheless it can
be used more widely than this because of the generic approach taken. The storage media may be directly
connected to the decoder, or via communications means such as busses, LANs, or telecommunications
links. This part of ISO/IEC 11172 is intended for non-interlaced video formats having approximately 288
lines of 352 pels and picture rates around 24 Hz to 30 Hz.
1.2 Normative references
The following International Standards contain provisions which, through reference in this text, constitute
provisions of this part of ISO/IEC 11172. At the time of publication, the editions indicated were valid.
All standards are subject to revision, and parties to agreements based on this part of ISO/IEC 11172 are
encouraged to investigate the possibility of applying the most recent editions of the standards indicated
below. Members of IEC and ISO maintain registers of currently valid International Standards.
ISO/IEC 11172-1:1993 Information technology - Coding of moving pictures and associated audio for digital
storage media at up to about 1,5 Mbit/s - Part 1: Systems.
ISO/IEC 11172-3:1993 Information technology - Coding of moving pictures and associated audio for digital
storage media at up to about 1,5 Mbit/s - Part 3: Audio.
CCIR Recommendation 601-2 Encoding parameters of digital television for studios.
CCIR Report 624-4 Characteristics of systems for monochrome and colour television.
CCIR Recommendation 648 Recording of audio signals.
CCIR Report 955-2 Sound broadcasting by satellite for portable and mobile receivers, including Annex IV
Summary description of Advanced Digital System II.
CCITT Recommendation J.17 Pre-emphasis used on Sound-Programme Circuits.
IEEE Draft Standard P1180/D2 1990 Specification for the implementation of 8x8 inverse discrete cosine
transform.
IEC publication 908:1987 CD Digital Audio System.

Section 2: Technical elements
2.1 Definitions
For the purposes of ISO/IEC 11172, the following definitions apply. If specific to a part, this is noted in
square brackets.
2.1.1 ac coefficient [video]: Any DCT coefficient for which the frequency in one or both dimensions
is non-zero.
2.1.2 access unit [system]: In the case of compressed audio an access unit is an audio access unit. In
the case of compressed video an access unit is the coded representation of a picture.
2.1.3 adaptive segmentation [audio]: A subdivision of the digital representation of an audio signal
in variable segments of time.
2.1.4 adaptive bit allocation [audio]: The assignment of bits to subbands in a time and frequency
varying fashion according to a psychoacoustic model.
2.1.5 adaptive noise allocation [audio]: The assignment of coding noise to frequency bands in a
time and frequency varying fashion according to a psychoacoustic model.
2.1.6 alias [audio]: Mirrored signal component resulting from sub-Nyquist sampling.
2.1.7 analysis filterbank [audio]: Filterbank in the encoder that transforms a broadband PCM audio
signal into a set of subsampled subband samples.
2.1.8 audio access unit [audio]: For Layers I and II an audio access unit is defined as the smallest
part of the encoded bitstream which can be decoded by itself, where decoded means "fully reconstructed
sound”. For Layer III an audio access unit is part of the bitstream that is decodable with the use of
previously acquired main information.
2.1.9 audio buffer [audio]: A buffer in the system target decoder for storage of compressed audio data.
2.1.10 audio sequence [audio]: A non-interrupted series of audio frames in which the following
parameters are not changed:
- ID
- Layer
- Sampling Frequency
- For Layer I and II: Bitrate index
2.1.11 backward motion vector [video]: A motion vector that is used for motion compensation
from a reference picture at a later time in display order.
2.1.12 Bark [audio]: Unit of critical band rate. The Bark scale is a non-linear mapping of the frequency
scale over the audio range closely corresponding with the frequency selectivity of the human ear across the
band.
2.1.13 bidirectionally predictive-coded picture; B-picture [video]: A picture that is coded
using motion compensated prediction from a past and/or future reference picture.
2.1.14 bitrate: The rate at which the compressed bitstream is delivered from the storage medium to the
input of a decoder.
2.1.15 block companding [audio]: Normalizing of the digital representation of an audio signal
within a certain time period.
2.1.16 block [video]: An 8-row by 8-column orthogonal block of pels.
2.1.17 bound [audio]: The lowest subband in which intensity stereo coding is used.

2.1.18 byte aligned: A bit in a coded bitstream is byte-aligned if its position is a multiple of 8-bits
from the first bit in the stream.
2.1.19 byte: Sequence of 8-bits.
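The byte-alignment definition above reduces to a modulo test on the bit position. The following is a minimal sketch, assuming bit positions are counted from 0 at the first bit in the stream; the function names are hypothetical and not part of ISO/IEC 11172.

```python
def is_byte_aligned(bit_position):
    """Return True if the bit at bit_position is byte-aligned.

    Assumption: bit_position counts from 0 at the first bit in the stream.
    """
    return bit_position % 8 == 0


def bits_to_next_alignment(bit_position):
    """Number of bits to skip to reach the next byte-aligned position (0 if aligned)."""
    return (-bit_position) % 8
```

A parser that has consumed 13 bits, for example, would need to skip 3 more bits before the next byte-aligned position.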
2.1.20 channel: A digital medium that stores or transports an ISO/IEC 11172 stream.
2.1.21 channel [audio]: The left and right channels of a stereo signal.
2.1.22 chrominance (component) [video]: A matrix, block or single pel representing one of the
two colour difference signals related to the primary colours in the manner defined in CCIR Rec 601. The
symbols used for the colour difference signals are Cr and Cb.
2.1.23 coded audio bitstream [audio]: A coded representation of an audio signal as specified in
ISO/IEC 11172-3.
2.1.24 coded video bitstream [video]: A coded representation of a series of one or more pictures as
specified in this part of ISO/IEC 11172.
2.1.25 coded order [video]: The order in which the pictures are stored and decoded. This order is not
necessarily the same as the display order.
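The difference between coded order and display order can be illustrated with a short sketch. It assumes the common arrangement in which each B-picture is stored after the later reference picture (I or P) it may be predicted from; the function name and the handling of trailing B-pictures are illustrative assumptions, not requirements taken from this part of ISO/IEC 11172.

```python
def display_to_coded_order(picture_types):
    """Map a display-order list of picture types to coded-order indices.

    picture_types: sequence of "I", "P" or "B" in display order.
    Assumption: B-pictures are emitted right after the next reference
    picture in display order; trailing B-pictures are appended at the end.
    """
    coded = []
    pending_b = []  # B-pictures waiting for their future reference picture
    for index, picture_type in enumerate(picture_types):
        if picture_type == "B":
            pending_b.append(index)
        else:
            coded.append(index)      # reference picture goes first
            coded.extend(pending_b)  # then the B-pictures that precede it in display
            pending_b = []
    coded.extend(pending_b)
    return coded
```

For a display-order pattern I B B P B B P, this sketch yields the coded order I P B B P B B.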
2.1.26 coded representation: A data element as represented in its encoded form.
2.1.27 coding parameters [video]: The set of user-definable parameters that characterize a coded video
bitstream. Bitstreams are characterised by coding parameters. Decoders are characterised by the bitstreams
that they are capable of decoding.
2.1.28 component [video]: A matrix, block or single pel from one of the three matrices (luminance
and two chrominance) that make up a picture.
2.1.29 compression: Reduction in the number of bits used to represent an item of data.
2.1.30 constant bitrate coded video [video]: A compressed video bitstream with a constant
average bitrate.
2.1.31 constant bitrate: Operation where the bitrate is constant from start to finish of the compressed
bitstream.
2.1.32 constrained parameters [video]: The values of the set of coding parameters defined in
2.4.3.2.
2.1.33 constrained system parameter stream (CSPS) [system]: An ISO/IEC 11172
multiplexed stream for which the constraints defined in 2.4.6 of ISO/IEC 11172-1 apply.
2.1.34 CRC: Cyclic redundancy code.
2.1.35 critical band rate [audio]: Psychoacoustic function of frequency. At a given audible
frequency it is proportional to the number of critical bands below that frequency. The units of the
critical band rate scale are Barks.
2.1.36 critical band [audio]: Psychoacoustic measure in the spectral domain which corresponds to the
frequency selectivity of the human ear. This selectivity is expressed in Bark.
2.1.37 data element: An item of data as represented before encoding and after decoding.
2.1.38 dc-coefficient [video]: The DCT coefficient for which the frequency is zero in both
dimensions.
2.1.39 dc-coded picture; D-picture [video]: A picture that is coded using only information from
itself. Of the DCT coefficients in the coded representation, only the dc-coefficients are present.
2.1.40 DCT coefficient: The amplitude of a specific cosine basis function.
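As an illustration of the dc-coefficient and DCT coefficient definitions, the sketch below computes a single coefficient of an 8 by 8 block directly from the cosine basis functions. It assumes the conventional 8 by 8 forward DCT normalization (a factor of 1/4 and a scale of 1/sqrt(2) for zero frequency); the function name is hypothetical, and the normative inverse DCT remains the one defined in annex A.

```python
import math


def dct_coefficient(block, u, v):
    """Amplitude of the (u, v) cosine basis function for an 8x8 block.

    block[y][x] holds the pel value at row y, column x.
    Assumption: conventional 8x8 forward DCT scaling.
    """
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    total = 0.0
    for y in range(8):
        for x in range(8):
            total += (block[y][x]
                      * math.cos((2 * x + 1) * u * math.pi / 16)
                      * math.cos((2 * y + 1) * v * math.pi / 16))
    return 0.25 * scale(u) * scale(v) * total
```

With this scaling, the dc-coefficient (u = 0, v = 0) equals eight times the mean pel value of the block, and every other coefficient of a flat block is zero.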
2.1.41 decoded stream: The decoded reconstruction of a compressed bitstream.
2.1.42 decoder input buffer [video]: The first-in first-out (FIFO) buffer specified in the video
buffering verifier.
2.1.43 decoder input rate [video]: The data rate specified in the video buffering verifier and encoded
in the coded video bitstream.
2.1.44 decoder: An embodiment of a decoding process.
2.1.45 decoding (process): The process defined in ISO/IEC 11172 that reads an input coded bitstream
and produces decoded pictures or audio samples.
2.1.46 decoding time-stamp; DTS [system]: A field that may be present in a packet header that
indicates the time that an access unit is decoded in the system target decoder.
2.1.47 de-emphasis [audio]: Filtering applied to an audio signal after storage or transmission to undo
a linear distortion due to emphasis.
2.1.48 dequantization [video]: The process of rescaling the quantized DCT coefficients after their
representation in the bitstream has been decoded and before they are presented to the inverse DCT.
2.1.49 digital storage media; DSM: A digital storage or transmission device or system.
2.1.50 discrete cosine transform; DCT [video]: Either the forward discrete cosine transform or the
inverse discrete cosine transform. The DCT is an invertible, discrete orthogonal transformation. The
inverse DCT is defined in annex A.
2.1.51 display order [video]: The order in which the decoded pictures should be displayed. Normally
this is the same order in which they were presented at the input of the encoder.
2.1.52 dual channel mode [audio]: A mode, where two audio channels with independent programme
contents (e.g. bilingual) are encoded within one bitstream. The coding process is the same as for the stereo
mode.
2.1.53 editing: The process by which one or more compressed bitstreams are manipulated to produce a
new compressed bitstream. Conforming edited bitstreams must meet the requirements defined in this part of
ISO/IEC 11172.
2.1.54 elementary stream [system]: A generic term for one of the coded video, coded audio or other
coded bitstreams.
2.1.55 emphasis [audio]: Filtering applied to an audio signal before storage or transmission to
improve the signal-to-noise ratio at high frequencies.
2.1.56 encoder: An embodiment of an encoding process.
2.1.57 encoding (process): A process, not specified in ISO/IEC 11172, that reads a stream of input
pictures or audio samples and produces a valid coded bitstream as defined in ISO/IEC 11172.
2.1.58 entropy coding: Variable length lossless coding of the digital representation of a signal to
reduce redundancy.
2.1.59 fast forward playback [video]: The process of displaying a sequence, or parts of a sequence,
of pictures in display-order faster than real-time.
2.1.60 FFT: Fast Fourier Transformation. A fast algorithm for performing a discrete Fourier transform
(an orthogonal transform).
2.1.61 filterbank [audio]: A set of band-pass filters covering the entire audio frequency range.
2.1.62 fixed segmentation [audio]: A subdivision of the digital representation of an audio signal
into fixed segments of time.
2.1.63 forbidden: The term “forbidden” when used in the clauses defining the coded bitstream indicates
that the value shall never be used. This is usually to avoid emulation of start codes.
2.1.64 forced updating [video]: The process by which macroblocks are intra-coded from time-to-time
to ensure that mismatch errors between the inverse DCT processes in encoders and decoders cannot build up
excessively.
2.1.65 forward motion vector [video]: A motion vector that is used for motion compensation from
a reference picture at an earlier time in display order.
2.1.66 frame [audio]: A part of the audio signal that corresponds to audio PCM samples from an
Audio Access Unit.
2.1.67 free format [audio]: Any bitrate other than the defined bitrates that is less than the maximum
valid bitrate for each layer.
2.1.68 future reference picture [video]: The future reference picture is the reference picture that
occurs at a later time than the current picture in display order.
2.1.69 granules [Layer II] [audio]: The set of 3 consecutive subband samples from all 32 subbands
that are considered together before quantization. They correspond to 96 PCM samples.
2.1.70 granules [Layer III] [audio]: 576 frequency lines that carry their own side information.
2.1.71 group of pictures [video]: A series of one or more coded pictures intended to assist random
access. The group of pictures is one of the layers in the coding syntax defined in this part of ISO/IEC
11172.
2.1.72 Hann window [audio]: A time function applied sample-by-sample to a block of audio samples
before Fourier transformation.
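A Hann window can be sketched as follows. The definition above does not give a formula, so the periodic raised-cosine form used here, w(n) = 0.5 (1 - cos(2 pi n / N)), is an assumption, as are the function names; any scaling factor applied by a particular psychoacoustic model is omitted.

```python
import math


def hann_window(n_samples):
    """Return n_samples Hann window values (periodic form, an assumption)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_samples))
            for n in range(n_samples)]


def apply_window(samples):
    """Multiply a block of audio samples by the window, sample by sample."""
    window = hann_window(len(samples))
    return [s * w for s, w in zip(samples, window)]
```

The window is zero at the start of the block and reaches 1 at its centre, which tapers the block edges before the Fourier transformation.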
2.1.73 Huffman coding: A specific method for entropy coding.
2.1.74 hybrid filterbank [audio]: A serial combination of subband filterbank and MDCT.
2.1.75 IMDCT [audio]: Inverse Modified Discrete Cosine Transform.
2.1.76 intensity stereo [audio]: A method of exploiting stereo irrelevance or redundancy in
stereophonic audio programmes based on retaining at high frequencies only the energy envelope of the right
and left channels.
2.1.77 interlace [video]: The property of conventional television pictures where alternating lines of
the picture represent different instances in time.
2.1.78 intra coding [video]: Coding of a macroblock or picture that uses information only from that
macroblock or picture.
2.1.79 intra-coded picture; I-picture [video]: A picture coded using information only from itself.

2.1.80 ISO/IEC 11172 (multiplexed) stream [system]: A bitstream composed of zero or more
elementary streams combined in the manner defined in ISO/IEC 11172-1.
2.1.81 joint stereo coding [audio]: Any method that exploits stereophonic irrelevance or
stereophonic redundancy.
2.1.82 joint stereo mode [audio]: A mode of the audio coding algorithm using joint stereo coding.
2.1.83 layer [audio]: One of the levels in the coding hierarchy of the audio system defined in ISO/IEC
11172-3.
2.1.84 layer [video and systems]: One of the levels in the data hierarchy of the video and system
specifications defined in ISO/IEC 11172-1 and this part of ISO/IEC 11172.
2.1.85 luminance (component) [video]: A matrix, block or single pel representing a monochrome
representation of the signal and related to the primary colours in the manner defined in CCIR Rec 601. The
symbol used for luminance is Y.
2.1.86 macroblock [video]: The four 8 by 8 blocks of luminance data and the two corresponding 8 by
8 blocks of chrominance data coming from a 16 by 16 section of the luminance component of the picture.
Macroblock is sometimes used to refer to the pel data and sometimes to the coded representation of the pel
values and other data elements defined in the macroblock layer of the syntax defined in this part of ISO/IEC
11172. The usage is clear from the context.
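The pel-data sense of a macroblock can be sketched as six 8 by 8 blocks. The example below assumes the luminance blocks are taken in the order top-left, top-right, bottom-left, bottom-right, followed by Cb and then Cr; that ordering and the function names are illustrative assumptions rather than something stated in the definition.

```python
def split_luminance(luma_16x16):
    """Split a 16x16 luminance section into four 8x8 blocks.

    Assumed order: top-left, top-right, bottom-left, bottom-right.
    """
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            blocks.append([row[left:left + 8]
                           for row in luma_16x16[top:top + 8]])
    return blocks


def make_macroblock(luma_16x16, cb_8x8, cr_8x8):
    """Return the six 8x8 blocks of a macroblock: four luminance, then Cb, Cr."""
    return split_luminance(luma_16x16) + [cb_8x8, cr_8x8]
```

Each chrominance block covers the same picture area as the whole 16 by 16 luminance section, which is why only one Cb and one Cr block accompany the four luminance blocks.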
2.1.87 mapping [audio]: Conversion of an audio signal from time to frequency domain by subband
filtering and/or by MDCT.
2.1.88 masking [audio]: A property of the human auditory system by which an audio signal cannot be
perceived in the presence of another audio signal.
2.1.89 masking threshold [audio]: A function in frequency and time below which an audio signal
cannot be perceived by the human auditory system.
2.1.90 MDCT [audio]: Modified Discrete Cosine Transform.
2.1.91 motion compensation [video]: The use of motion vectors to improve the efficiency of the
prediction of pel values. The prediction uses motion vectors to provide offsets into the past and/or future
reference pictures containing previously decoded pel values that are used to form the prediction error signal.
2.1.92 motion estimation [video]: The process of estimating motion vectors during the encoding
process.
2.1.93 motion vector [video]: A two-dimensional vector used for motion compensation that provides
an offset from the coordinate position in the current picture to the coordinates in a reference picture.
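The role of a motion vector as an offset into a reference picture can be shown with a small sketch. It assumes whole-pel vectors and a displaced block that lies entirely inside the reference picture; the function and parameter names are hypothetical.

```python
def predict_block(reference, x, y, mv_x, mv_y, size=8):
    """Fetch the prediction for the block at (x, y) in the current picture.

    reference[row][col] holds previously decoded pel values.
    (mv_x, mv_y) is the offset from the current position to the reference
    position. Assumptions: whole-pel vectors, block fully inside the picture.
    """
    src_x = x + mv_x
    src_y = y + mv_y
    return [row[src_x:src_x + size]
            for row in reference[src_y:src_y + size]]
```

The encoder would subtract this prediction from the current block to obtain the prediction error, and the decoder would add the decoded error back to the same prediction.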
2.1.94 MS stereo [audio]: A method of exploiting stereo irrelevance or redundancy in stereophonic
audio programmes based on coding the sum and difference signal instead of the left and right channels.
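The sum and difference idea behind MS stereo can be sketched as below. The scaling by 1/2 is an assumption chosen so that the round trip is exact; the definition above does not specify a normalization, and the function names are hypothetical.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid and side signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

When the two channels are similar, the side signal is close to zero and therefore cheaper to code than either original channel.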
2.1.95 non-intra coding [video]: Coding of a macroblock or picture that uses information both from
itself and from macroblocks and pictures occurring at other times.
2.1.96 non-tonal component [audio]: A noise-like component of an audio signal.
2.1.97 Nyquist sampling: Sampling at or above twice the maximum bandwidth of a signal.
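The Nyquist criterion, and the alias that results when it is violated, can be checked numerically. This is a minimal sketch with hypothetical function names; it treats the signal as a single real sinusoid and folds its frequency into the range from 0 to half the sampling rate.

```python
def satisfies_nyquist(sampling_rate, bandwidth):
    """True if the sampling rate is at or above twice the maximum bandwidth."""
    return sampling_rate >= 2 * bandwidth


def alias_frequency(frequency, sampling_rate):
    """Apparent frequency of a real sinusoid after sampling.

    Frequencies above half the sampling rate are mirrored back below it.
    """
    folded = frequency % sampling_rate
    if folded > sampling_rate / 2:
        folded = sampling_rate - folded
    return folded
```

For example, a 30 000 Hz component sampled at 44 100 Hz would appear as a mirrored component at 14 100 Hz, while a 10 000 Hz component is unaffected.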
2.1.98 pack [system]: A pack consists of a pack header followed by one or more packets. It is a layer
in the system coding syntax described in ISO/IEC 11172-1.
2.1.99 packet data [system]: Contiguous bytes of data from an elementary stream present in a packet.
2.1.100 packet header [system]: The data structure used to convey information about the elementary
stream data contained in the packet data.
2.1.101 packet [system]: A packet consists of a header followed by a number of contiguous bytes
from an elementary data stream. It is a layer in the system coding syntax described in ISO/IEC 11172-1.
2.1.102 padding [audio]: A method to adjust the average length in time of an audio frame to the
duration of the corresponding PCM samples, by conditionally adding a slot to the audio frame.
2.1.103 past reference picture [video]: The past reference picture is the reference picture that occurs
...
