ISO/IEC 14496-12:2004
Information technology — Coding of audio-visual objects — Part 12: ISO base media file format
ISO/IEC 14496-12:2004 specifies the structure and uses of the ISO base media file format. The identical text is published as ISO/IEC 15444-12:2004. This file format is used to contain time-based media such as video and audio. The storage of particular coding schemes is defined in specifications that derive from and reference ISO/IEC 14496-12:2004 and ISO/IEC 15444-12:2004, such as the MPEG-4 file format specified in ISO/IEC 14496-14, or the Motion JPEG file format specified in ISO/IEC 15444-3:2002/Amd.2. This file format is designed to contain timed media information for a presentation in a flexible, extensible format that facilitates interchange, management, editing and presentation of the media. This presentation may be "local" to the system containing the presentation, or may be via a network or other stream delivery mechanism. The file format is designed to be independent of any particular network protocol while enabling efficient support for them in general. The file structure is object-oriented; a file can be decomposed into constituent objects very simply, and the structure of the objects inferred directly from their type. This technically identical text is published as ISO/IEC 14496-12:2004 for MPEG-4, and as ISO/IEC 15444-12:2004 for JPEG 2000, and reference to this specification should be made accordingly. The recommendation is to reference one, for example ISO/IEC 14496-12:2004, and append to the reference a parenthetical comment identifying the other, for example "(technically identical to ISO/IEC 15444-12:2004)".
Technologies de l'information — Codage des objets audiovisuels — Partie 12: Format ISO de base pour les fichiers médias
INTERNATIONAL STANDARD
ISO/IEC 14496-12
First edition
2004-02-01
Information technology — Coding of
audio-visual objects —
Part 12:
ISO base media file format
Technologies de l'information — Codage des objets audiovisuels —
Partie 12: Format ISO de base pour les fichiers médias
Reference number
ISO/IEC 14496-12:2004(E)
© ISO/IEC 2004
© ISO/IEC 2004
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Object-structured File Organization
4.1 File Structure
4.2 Object Structure
4.3 File Type Box
5 Design Considerations
5.1 Usage
5.1.1 Interchange
5.1.2 Content Creation
5.1.3 Preparation for streaming
5.1.4 Local presentation
5.1.5 Streamed presentation
5.2 Design principles
6 ISO Base Media File organization
6.1 Presentation structure
6.1.1 File Structure
6.1.2 Object Structure
6.1.3 Meta Data and Media Data
6.1.4 Track Identifiers
6.2 Metadata Structure (Objects)
6.2.1 Box
6.2.2 Data Types and fields
6.2.3 Box Order
7 Streaming Support
7.1 Handling of Streaming Protocols
7.2 Protocol ‘hint’ tracks
7.3 Hint Track Format
8 Box Definitions
8.1 Movie Box
8.2 Media Data Box
8.3 Movie Header Box
8.4 Track Box
8.5 Track Header Box
8.6 Track Reference Box
8.7 Media Box
8.8 Media Header Box
8.9 Handler Reference Box
8.10 Media Information Box
8.11 Media Information Header Boxes
8.11.2 Video Media Header Box
8.11.3 Sound Media Header Box
8.11.4 Hint Media Header Box
8.11.5 Null Media Header Box
8.12 Data Information Box
8.13 Data Reference Box
8.14 Sample Table Box
8.15 Time to Sample Boxes
8.15.2 Decoding Time to Sample Box
8.15.3 Composition Time to Sample Box
8.16 Sample Description Box
8.17 Sample Size Boxes
8.17.2 Sample Size Box
8.17.3 Compact Sample Size Box
8.18 Sample To Chunk Box
8.19 Chunk Offset Box
8.20 Sync Sample Box
8.21 Shadow Sync Sample Box
8.22 Degradation Priority Box
8.23 Padding Bits Box
8.24 Free Space Box
8.25 Edit Box
8.26 Edit List Box
8.27 User Data Box
8.28 Copyright Box
8.29 Movie Extends Box
8.30 Movie Extends Header Box
8.31 Track Extends Box
8.32 Movie Fragment Box
8.33 Movie Fragment Header Box
8.34 Track Fragment Box
8.35 Track Fragment Header Box
8.36 Track Fragment Run Box
8.37 Movie Fragment Random Access Box
8.38 Track Fragment Random Access Box
8.39 Movie Fragment Random Access Offset Box
9 Extensibility
9.1 Objects
9.2 Storage formats
9.3 Derived File formats
10 RTP Hint Track Format
10.1 Introduction
10.2 Sample Description Format
10.3 Sample Format
10.3.1 Packet Entry format
10.3.2 Constructor format
10.4 SDP Information
10.4.1 Movie SDP information
10.4.2 Track SDP Information
10.5 Statistical Information
Annex A (informative) Overview and introduction
A.1 Section Overview
A.2 Core Concepts
A.3 Physical structure of the media
A.4 Temporal structure of the media
A.5 Interleave
A.6 Composition
A.7 Random access
A.8 Fragmented movie files
Annex B (informative) Patent statements
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
ISO/IEC 14496-12 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
ISO/IEC 14496 consists of the following parts, under the general title Information technology — Coding of
audio-visual objects:
Part 1: Systems
Part 2: Visual
Part 3: Audio
Part 4: Conformance testing
Part 5: Reference software
Part 6: Delivery Multimedia Integration Framework (DMIF)
Part 7: Optimized reference software for coding of audio-visual objects
Part 8: Carriage of ISO/IEC 14496 contents over IP networks
Part 9: Reference hardware description
Part 10: Advanced Video Coding
Part 11: Scene description and application engine
Part 12: ISO base media file format
Part 13: Intellectual Property Management and Protection (IPMP) extensions
Part 14: MP4 file format
Part 15: Advanced Video Coding file format
Part 16: Animation Framework eXtension (AFX)
Introduction
The ISO Base Media File Format is designed to contain timed media information for a presentation in a
flexible, extensible format that facilitates interchange, management, editing, and presentation of the media.
This presentation may be ‘local’ to the system containing the presentation, or may be via a network or other
stream delivery mechanism.
The file structure is object-oriented; a file can be decomposed into constituent objects very simply, and the
structure of the objects inferred directly from their type.
The file format is designed to be independent of any particular network protocol while enabling efficient
support for them in general.
The ISO Base Media File Format is a base format for media file formats.
It is intended that the ISO Base Media File Format shall be jointly maintained by WG1 and WG11.
Consequently, a subdivision of work created ISO/IEC 15444-12 and ISO/IEC 14496-12 in order to document
the ISO Base Media File Format and to facilitate the joint maintenance.
This technically identical text is published as ISO/IEC 14496-12 for MPEG-4, and as ISO/IEC 15444-12 for
JPEG 2000, and reference to this specification should be made accordingly. The recommendation is to
reference one, for example ISO/IEC 14496-12, and append to the reference a parenthetical comment
identifying the other, for example “(technically identical to ISO/IEC 15444-12)”.
INTERNATIONAL STANDARD ISO/IEC 14496-12:2004(E)
Information technology — Coding of audio-visual objects —
Part 12:
ISO base media file format
1 Scope
This International Standard specifies the ISO base media file format, which is a general format forming the
basis for a number of other more specific file formats. This format contains the timing, structure, and media
information for timed sequences of media data, such as audio/visual presentations.
This part of ISO/IEC 14496 is applicable to MPEG-4, but its technical content is identical to that of
ISO/IEC 15444-12, which is applicable to JPEG 2000.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 639-2:1998, Codes for the representation of names of languages — Part 2: Alpha-3 code
ISO/IEC 11578:1996, Information technology — Open Systems Interconnection — Remote Procedure Call (RPC)
ISO/IEC 14496-1:2001, Information technology — Coding of audio-visual objects — Part 1: Systems 1)
ITU-T Rec. T.800 | ISO/IEC 15444-1, Information technology — JPEG 2000 image coding system: Core coding system
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
Box
An object-oriented building block defined by a unique type identifier and length (called ‘atom’ in some
specifications, including the first definition of MP4).
3.2
Chunk
A contiguous set of samples for one track.
3.3
Container Box
A box whose sole purpose is to contain and group a set of related boxes.
1) Refer, in particular, to Clause 14, Syntactic Description Language (SDL).
3.4
Hint Track
A special track which does not contain media data. Instead it contains instructions for packaging one or more
tracks into a streaming channel.
3.5
Hinter
A tool that is run on a file containing only media, to add one or more hint tracks to the file and so facilitate
streaming.
3.6
Movie Box
A container box whose sub-boxes define the metadata for a presentation (‘moov’).
3.7
Media Data Box
A container box which can hold the actual media data for a presentation (‘mdat’).
3.8
ISO Base Media File
The name of the file format described in this specification.
3.9
Presentation
One or more motion sequences (q.v.), possibly combined with audio.
3.10
Sample
In non-hint tracks, a sample is an individual frame of video, a time-contiguous series of video frames, or a
time-contiguous compressed section of audio. In hint tracks, a sample defines the formation of one or more
streaming packets. No two samples within a track may share the same time-stamp.
3.11
Sample Description
A structure which defines and describes the format of some number of samples in a track.
3.12
Sample Table
A packed directory for the timing and physical layout of the samples in a track.
3.13
Track
A collection of related samples (q.v.) in an ISO base media file. For media data, a track corresponds to a
sequence of images or sampled audio. For hint tracks, a track corresponds to a streaming channel.
4 Object-structured File Organization
4.1 File Structure
Files are formed as a series of objects, called boxes in this specification. All data is contained in boxes; there
is no other data within the file. This includes any initial signature required by the specific file format.
All object-structured files conformant to this section of this specification (all Object-Structured files) shall
contain a File Type Box.
4.2 Object Structure
An object in this terminology is a box.
Boxes start with a header which gives both size and type. The header permits compact or extended size (32
or 64 bits) and compact or extended types (32 bits or full UUIDs). The standard boxes all use compact types
(32-bit) and most boxes will use the compact (32-bit) size. Typically only the Media Data Box(es) need the
64-bit size.
The size is the entire size of the box, including the size and type header, fields, and all contained boxes. This
facilitates general parsing of the file.
The definitions of boxes are given in the syntax description language (SDL) defined in MPEG-4 (see reference
in clause 2). Comments in the code fragments in this specification indicate informative material.
The fields in the objects are stored with the most significant byte first, commonly known as network byte order
or big-endian format.
aligned(8) class Box (unsigned int(32) boxtype,
                      optional unsigned int(8)[16] extended_type) {
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) {
        unsigned int(64) largesize;
    } else if (size==0) {
        // box extends to end of file
    }
    if (boxtype=='uuid') {
        unsigned int(8)[16] usertype = extended_type;
    }
}
The semantics of these two fields are:
size is an integer that specifies the number of bytes in this box, including all its fields and contained
    boxes; if size is 1 then the actual size is in the field largesize; if size is 0, then this box is the last
    one in the file, and its contents extend to the end of the file (normally only used for a Media Data Box).
type identifies the box type; standard boxes use a compact type, which is normally four printable
    characters, to permit ease of identification, and is shown so in the boxes below. User extensions use
    an extended type; in this case, the type field is set to ‘uuid’.
Boxes with an unrecognized type shall be ignored and skipped.
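The header rules above (32-bit size, four-character type, optional 64-bit largesize, optional 16-byte user type, size 0 meaning "to end of file") can be illustrated with a minimal Python sketch. This is not part of the standard; the names BoxHeader, read_box_header and iter_boxes are hypothetical, and the guard against undersized boxes is an added assumption about how a reader might treat malformed input.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple, Optional


class BoxHeader(NamedTuple):
    box_type: bytes            # four-byte compact type, e.g. b"ftyp"
    size: int                  # total box size in bytes; 0 means "to end of file"
    header_len: int            # number of bytes occupied by the header itself
    usertype: Optional[bytes]  # 16-byte extended type when box_type == b"uuid"


def read_box_header(f: BinaryIO) -> Optional[BoxHeader]:
    """Read one box header at the current position; return None at end of data."""
    raw = f.read(8)
    if len(raw) < 8:
        return None
    size, box_type = struct.unpack(">I4s", raw)  # big-endian size, then type
    header_len = 8
    if size == 1:                                # real size is in largesize
        (size,) = struct.unpack(">Q", f.read(8))
        header_len += 8
    usertype = None
    if box_type == b"uuid":                      # extended type follows
        usertype = f.read(16)
        header_len += 16
    return BoxHeader(box_type, size, header_len, usertype)


def iter_boxes(f: BinaryIO) -> Iterator[BoxHeader]:
    """Yield each box header in sequence, skipping payloads by the size field."""
    while True:
        start = f.tell()
        header = read_box_header(f)
        if header is None:
            return
        yield header
        if header.size == 0:                     # last box, runs to end of file
            return
        if header.size < header.header_len:      # assumption: treat as corrupt
            raise ValueError("box size smaller than its own header")
        f.seek(start + header.size)              # size counts from box start


# Demo on a hand-built byte string (hypothetical data, not a real file):
# a 16-byte 'ftyp' box followed by an 'mdat' box that extends to the end.
demo = (struct.pack(">I4s", 16, b"ftyp") + b"isom" + bytes(4)
        + struct.pack(">I4s", 0, b"mdat") + b"payload")
demo_types = [h.box_type for h in iter_boxes(io.BytesIO(demo))]
```

Because the size field covers the whole box, a reader that does not recognize a type can skip it with a single seek, which is what iter_boxes does for every box regardless of type.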
Many objects also contain a version number and flags field:
aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f)
    extends Box(boxtype) {
    unsigned int(8) version = v;
    bit(24) flags = f;
}
The semantics of these two fields are:
version is an integer that specifies the version of this format of the box.
flags is a map of flags.
Boxes with an unrecognized version shall be ignored and skipped.
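As an illustration of the FullBox layout, the following Python sketch splits the four bytes that follow the box header into the 8-bit version and the 24-bit flags. The function names and the supported-versions set are hypothetical and only show one way a reader might apply the skipping rule.

```python
import struct
from typing import Set, Tuple


def parse_fullbox_prefix(payload: bytes) -> Tuple[int, int]:
    """Return (version, flags) from the first four payload bytes of a FullBox."""
    if len(payload) < 4:
        raise ValueError("a FullBox payload needs at least 4 bytes")
    (word,) = struct.unpack(">I", payload[:4])  # big-endian 32-bit word
    version = word >> 24                        # top 8 bits
    flags = word & 0x00FFFFFF                   # low 24 bits
    return version, flags


def should_skip(version: int, supported_versions: Set[int]) -> bool:
    """True when the reader does not recognize this version of the box."""
    return version not in supported_versions


# Example: version 1 with only the lowest flag bit set.
example_version, example_flags = parse_fullbox_prefix(b"\x01\x00\x00\x01")
```

A reader that only implements version 0 of some box would call should_skip(example_version, {0}) and, on a true result, jump past the box using its size field.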
4.3 File Type Box
4.3.1 Definition
Box Type: ‘ftyp’
Container: File
Mandatory: Yes
Quantity: Exactly one
A media-file structured to this part of this specification may be compatible with more than one detailed
specification, and it is therefore not always possible to speak of a single ‘type’ or ‘brand’ for the file. This
means that the utility of the file name extension and MIME type is somewhat reduced.
This box must be placed as early as possible in the file (e.g. after any obligatory signature, but before any
significant variable-size boxes such as a Movie Box, Media Data Box, or Free Space). It identifies which
specification is the ‘best use’ of the file, and a minor version of that specification; and also a set of other
specifications to which the file complies. Readers implementing this format should attempt to read files that
are marked as compatible with any of the specifications that the reader implements. Any incompatible change
in a specification should therefore register a new ‘brand’ identifier to identify files conformant to the new
specification.
The minor version is informative only. It does not appear for compatible-brands, and must not be used to
determine the conformance of a file to a standard. It may allow more precise identification of the major
specification, for inspection, debugging, or improved decoding.
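A reader's brand check can be sketched in Python as follows. The sketch assumes the payload layout given by the full standard's syntax for this box, which is not reproduced in this excerpt: a four-byte major brand, a 32-bit minor version, then four-byte compatible brands up to the end of the box. The names FileType, parse_ftyp_payload and reader_can_try are hypothetical, and counting the major brand among the declared brands is an assumption of this sketch.

```python
import struct
from typing import List, NamedTuple, Set


class FileType(NamedTuple):
    major_brand: bytes            # the 'best use' specification
    minor_version: int            # informative only
    compatible_brands: List[bytes]


def parse_ftyp_payload(payload: bytes) -> FileType:
    """Parse the bytes that follow the box header of a File Type Box."""
    if len(payload) < 8 or len(payload) % 4 != 0:
        raise ValueError("ftyp payload must be 8 bytes plus whole 4-byte brands")
    major_brand, minor_version = struct.unpack(">4sI", payload[:8])
    brands = [payload[i:i + 4] for i in range(8, len(payload), 4)]
    return FileType(major_brand, minor_version, brands)


def reader_can_try(ftyp: FileType, implemented: Set[bytes]) -> bool:
    """True if the file declares at least one brand the reader implements.

    The minor version is deliberately ignored, since it must not be used to
    determine conformance. Including the major brand is an assumption here.
    """
    declared = set(ftyp.compatible_brands) | {ftyp.major_brand}
    return bool(declared & implemented)


# Hypothetical payload: major brand 'mp41', minor version 0, two compatible brands.
example = parse_ftyp_payload(b"mp41" + struct.pack(">I", 0) + b"isom" + b"mp41")
```

With this example, a reader that implements only the base format would call reader_can_try(example, {b"isom"}) and proceed, while a reader that implements none of the declared brands would decline the file.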
The type ‘isom’ (ISO Base Media file) is defined in this section of this specification, as identifying files that
conform to the ISO Base Media File Format. More specific identifiers can be used to identify precise versions
of specifications providing more detail. This brand should not be used as the major brand; this base file
...