ISO/IEC 15444-3:2002
Information technology - JPEG 2000 image coding system - Part 3: Motion JPEG 2000
ISO/IEC 15444-3:2002 specifies the use of the wavelet-based JPEG2000 codec for the coding and display of timed sequences of images (motion sequences), possibly combined with audio, and composed into an overall presentation. In this specification, a file format is defined, and guidelines for the use of the JPEG2000 codec for motion sequences are supplied.
Technologies de l'information — Système de codage d'image JPEG 2000 — Partie 3: Motion JPEG 2000
ISO/IEC 15444-3:2002 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - JPEG 2000 image coding system - Part 3: Motion JPEG 2000".
ISO/IEC 15444-3:2002 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.30 - Coding of graphical and photographical information. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/IEC 15444-3:2002 is related to the following documents: ISO/IEC 15444-3:2002/FDAM 3, ISO/IEC 15444-3:2002/Amd 2:2003, and ISO/IEC 15444-3:2007. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
INTERNATIONAL STANDARD
ISO/IEC 15444-3
First edition
2002-09-01
Information technology — JPEG 2000
image coding system —
Part 3:
Motion JPEG 2000
Technologies de l'information — Système de codage d'image
JPEG 2000 —
Partie 3: Motion JPEG 2000
Reference number
© ISO/IEC 2002
© ISO/IEC 2002
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic
or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body
in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Printed in Switzerland
ii © ISO/IEC 2002 – All rights reserved
CONTENTS
1 SCOPE
2 NORMATIVE REFERENCES
3 DEFINITIONS
4 COMPATIBILITY AND TECHNOLOGY DERIVATION
4.1 FAMILY MEMBERS
4.2 MP4 INHERITANCE AND COMPATIBILITY
4.3 JP2 INHERITANCE AND COMPATIBILITY
4.4 CONFORMANCE
4.5 PROFILES AND LEVELS
5 FILE ORGANIZATION
5.1 PRESENTATION STRUCTURE
5.1.1 File Structure
5.1.2 Object Structure
5.1.3 Meta Data and Media Data
5.1.4 Track Identifiers
5.1.5 Visual Composition
5.2 META-DATA STRUCTURE (OBJECTS)
5.2.1 Box
5.2.2 Data Types and fields
5.2.3 Box Order
5.3 BOX DEFINITIONS
5.3.1 Movie Box
5.3.2 Media Data Box
5.3.3 Movie Header Box
5.3.4 Track Box
5.3.5 Track Header Box
5.3.6 Track Reference Box
5.3.7 Media Box
5.3.8 Media Header Box
5.3.9 Handler Reference Box
5.3.10 Media Information Box
5.3.11 Media Information Header Boxes
5.3.12 Data Information Box
5.3.13 Data Reference Box
5.3.14 Sample Table Box
5.3.15 Time to Sample Box
5.3.16 Sample Description Box
5.3.17 Sample Size Box
5.3.18 Sample To Chunk Box
5.3.19 Chunk Offset Box
5.3.20 Free Space Box
5.3.21 Edit Box
5.3.22 Edit List Box
5.3.23 User Data Box
5.3.24 Movie Extends Box
5.3.25 Track Extends Box
5.3.26 Movie Fragment Box
5.3.27 Movie Fragment Header Box
5.3.28 Track Fragment Box
5.3.29 Track Fragment Header Box
5.3.30 Track Fragment Run Box
6 EXTENSIBILITY
6.1 OBJECTS
6.2 STORAGE FORMATS
ANNEX A: FILE AND CODESTREAM PROFILES
A.1 PROFILE INTRODUCTION
A.2 MOTION JPEG2000 SIMPLE PROFILE
ANNEX B: OVERVIEW AND INTRODUCTION
B.1 SECTION OVERVIEW
B.2 CORE CONCEPTS
B.3 PHYSICAL STRUCTURE OF THE MEDIA
B.4 TEMPORAL STRUCTURE OF THE MEDIA
B.5 INTERLEAVE
B.6 COMPOSITION
B.7 RANDOM ACCESS
B.8 FRAGMENTED MOVIE FILES
ANNEX C: GUIDELINES FOR USE OF THE JPEG2000 CODEC
C.1 INTRODUCTION
C.2 FREQUENCY WEIGHTING FOR MOTION SEQUENCES
C.3 ENCODER SUB-SAMPLING OF COMPONENTS
ANNEX D: INDICATING SUB-SAMPLING CHROMA OFFSET
ANNEX E: BIBLIOGRAPHY
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the
specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the
development of International Standards through technical committees established by the respective organization to deal with
particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In
the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.
The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by
the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires
approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this part of ISO/IEC 15444 may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 15444-3 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29,
Coding of audio, picture, multimedia and hypermedia information, in collaboration with ITU-T, but is not published as
common text at this time.
ISO/IEC 15444 consists of the following parts, under the general title Information technology – JPEG 2000 image coding
system:
Part 1: Core coding system
Part 2: Extensions
Part 3: Motion JPEG 2000
Part 4: Conformance testing
Part 5: Reference software
Part 6: Compound image file format
Annex A forms a normative part of this part of ISO/IEC 15444. Annexes B to E are for information only.
Introduction
This document specifies the use of the wavelet-based JPEG2000 codec for the coding and display of timed sequences of
images. It has been defined by ISO/IEC JTC 1 SC 29/WG 1 as part three of the JPEG2000 International Standard. In
this specification, a file format is defined, and guidelines for the use of the JPEG2000 codec for timed sequences are
supplied. The Motion JPEG2000 file format MJ2 is designed to contain one or more motion sequences of JPEG2000
images, with their timing, and also optional audio annotations, all composed into an overall presentation.
Motion JPEG2000 is expected to be used in a variety of applications, particularly where the codec is already available
for other reasons, or where the high-quality frame-based approach, with no inter-frame coding, is appropriate. These
application areas include:
• digital still cameras,
• error-prone environments such as wireless and the internet,
• PC-based video capturing,
• high quality digital video recording for professional broadcasting and motion picture production from film-based
to digital systems,
• and high-resolution medical and satellite imaging.
Motion JPEG2000 is a flexible format, permitting a wide variety of usages, such as editing, display, interchange, and
streaming.
The file structure is object-oriented; a file can be decomposed into constituent objects very simply, and the structure of
the objects inferred directly from their type.
Media-data is not ‘framed’ by the file format; the file format declarations that give the size, type and position of media
data units are not physically contiguous with the media data. This makes it possible to subset the media-data, and to use
it in its natural state, without requiring it to be copied to make space for framing. The meta-data is used to describe the
media data by reference, not by inclusion.
The file format does not require that a single presentation be in a single file. This enables both sub-setting and re-use of
content. When combined with the non-framing approach, it also makes it possible to include media data in files not
formatted to this specification (e.g. ‘raw’ files containing only media data and no declarative information, or file formats
already in use in the media or computer industries).
The file format is based on a common set of designs and a rich set of possible structures and usages. The same format
serves all usages; translation is not required. However, when used in a particular way (e.g. for local presentation), the
file may need structuring in certain ways for optimal behavior (e.g. time-ordering of the data). No normative structuring
rules are defined by this specification, unless a restricted profile is used.
Motion JPEG2000 is based on the MPEG-4 MP4 file format, and JPEG2000 is represented as a peer coding system to
MPEG4 visual, in this specification.
INTERNATIONAL STANDARD ISO/IEC 15444-3:2002(E)
INFORMATION TECHNOLOGY —
JPEG 2000 IMAGE CODING SYSTEM —
PART 3: MOTION JPEG 2000
1 Scope
This document specifies the use of the wavelet-based JPEG2000 codec for the coding and display of timed sequences of images
(motion sequences), possibly combined with audio, and composed into an overall presentation. In this specification, a file
format is defined, and guidelines for the use of the JPEG2000 codec for motion sequences are supplied.
2 Normative references
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated were
valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this Recommendation |
International Standard are encouraged to investigate the possibility of applying the most recent edition of the Recommendations
and Standards listed below. Members of IEC and ISO maintain registers of currently valid International Standards. The
Telecommunication Standardization Bureau of the ITU maintains a list of currently valid ITU-T Recommendations.
ITU-T Rec.T.800 | ISO/IEC 15444-1, Information technology – JPEG 2000 image coding system – Part 1: Core coding system
ISO/IEC 14496-1:2001, Information technology – Coding of audio-visual objects – Part 1: Systems; particularly the MP4 file
format: clause 13, and the syntax description language (SDL), clause 14
ISO 639-2:1998, Codes for the representation of names of languages – Part 2: Alpha-3 code
3 Definitions
3.1 Box: An object-oriented building block defined by a unique type identifier and length
3.2 Chunk: A contiguous set of samples for one track
3.3 Container Box: A box whose sole purpose is to contain and group a set of related boxes
3.4 Hint Track: A special track which does not contain media data. Instead it contains instructions for packaging one or
more tracks into a streaming channel
3.5 Hinter: A tool that is run on a file containing only media, to add one or more hint tracks to the file and so facilitate
streaming
3.6 Movie Box: A container box whose sub-boxes define the meta-data for a presentation. (‘moov’)
3.7 Media Data Box: A container box which can hold the actual media data for a presentation (‘mdat’)
3.8 Motion sequence: A timed sequence of JPEG2000 images
3.9 MJ2 File: The name of the file format described in this specification
3.10 Presentation: One or more motion sequences (q.v.), possibly combined with audio
3.11 Sample: In non-hint tracks, a sample is an individual frame of video, or a compressed frame of audio. In hint tracks, a
sample defines the formation of one or more streaming packets
3.12 Sample Table: A packed directory for the timing and physical layout of the samples in a track
3.13 Track: A collection of related samples (q.v.) in an MJ2 file. For media data, a track corresponds to a sequence of
images or sampled audio. For hint tracks, a track corresponds to a streaming channel
4 Compatibility and Technology derivation
4.1 Family Members
This is a standalone specification; it defines the file format for MJ2. However, it stands as a member of a family of
specifications with common formatting.
The other family members include:
• The JPEG2000 single image format, JP2.
• The MPEG-4 file format, MP4.
• The QuickTime file format, on which MP4 and this specification are based.
These specifications share a common definition for the structure of a file (a sequence of objects, called boxes here and atoms in
MP4 and QuickTime), and a common definition of the general structure of an object (the size and type).
All these specifications require that readers ignore objects that are unrecognizable to them.
This specification takes precedence over those from which it inherits, in any case where there are differences or conflicts;
however no such conflicts are known to exist.
4.2 MP4 Inheritance and Compatibility
Motion JPEG2000 is represented as a peer coding system to MPEG4 visual, in this specification. Data structures and concepts
that are held in common with these other specifications are defined to be compatible with them. Most boxes (atoms in MP4)
are defined identically; this includes:
Movie, Media Data, Track, Track Reference, Media, Media Header, Handler Reference, Media Information,
Hint Media Header, Data Information, Data Reference, Sample Table, Time to Sample, Sample Size, Sample to
Chunk, Chunk Offset, Free Space, Edit, Edit List, User Data, and Extension (UUID) boxes.
A number of boxes are used in a compatible fashion, but some of their fields, which in MP4 have required initial values and
are ignored on reading, are used here. These boxes are:
Movie Header, Track Header, Video Media Header, Sound Media Header.
The format of the Sample Description Box itself is the same, but a new VideoSampleDescription Box for motion JPEG2000 is
introduced within it; and likewise, a new Audio Sample Description format for raw audio is introduced.
4.3 JP2 Inheritance and Compatibility
The still image format, JP2, defines a number of boxes. The following boxes from that specification shall be present. If the
JP2 specification requires a particular position (e.g. first in the file), that positioning shall be followed here:
1) The JP2 'family' signature box ‘jP  ’ (the characters 'jP' followed by two spaces).
2) The file type compatibility box ‘ftyp’.
In the file type compatibility box, the brand shall be 'mjp2' for files conforming to this specification, and 'mjp2' shall be a
member of the compatibility list.
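As an illustration only, a reader might check these two leading boxes along the following lines. This is a minimal sketch in Python, not part of the specification. It assumes the box header layout given in 5.2.1 with compact 32-bit sizes, and it assumes the File Type box layout of the JP2 specification (a 4-byte brand, a 4-byte minor version, then a list of 4-byte compatible brands). The function name check_mj2_brand is hypothetical.

```python
import struct


def check_mj2_brand(data: bytes) -> bool:
    """Return True if data starts with a 'jP  ' box followed by an 'ftyp'
    box whose brand is 'mjp2' and whose compatibility list contains 'mjp2'.

    Assumption: both boxes use the compact 32-bit size form.
    """
    if len(data) < 8:
        return False
    # First box: the JP2 signature box, type 'jP' followed by two spaces.
    size1, type1 = struct.unpack_from(">I4s", data, 0)
    if type1 != b"jP  " or size1 < 8:
        return False
    # Second box: the file type box (8-byte header, brand, minor version).
    if len(data) < size1 + 16:
        return False
    size2, type2 = struct.unpack_from(">I4s", data, size1)
    if type2 != b"ftyp" or size2 < 16:
        return False
    body = data[size1 + 8 : size1 + size2]
    brand = body[0:4]
    # body[4:8] is the minor version; the rest is the compatibility list.
    compat = [body[i : i + 4] for i in range(8, len(body) - 3, 4)]
    return brand == b"mjp2" and b"mjp2" in compat
```

A file that also adheres to the JP2 specification would simply carry additional brands in the same compatibility list.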
It is permissible under this specification to make a file that adheres to both this specification and the JP2 specification. In that
case:
1) The compatibility list shall include all the compatible brands
2) The objects (boxes or atoms) required by the JP2 specification shall also be present.
3) The objects (boxes or atoms) optional in the JP2 specification may also be present.
A still image reader, reading a file which contains both a presentation (conformant to this specification) and a still image, would
'see' only the still image. Likewise a motion reader would 'see' only the presentation. A more powerful reader may display
both, or offer the user a choice.
The JP2 specification includes an optional IPR (Intellectual Property Rights) box which is therefore also optional in this
specification. Among other issues this addresses unique identification and protection of content.
4.4 Conformance
Implementations of Motion JPEG2000 decoders shall support JPEG2000 image sequences, as well as raw and twos-
complement audio if audio output is available. They may also support compressed audio, using MP4 formats, or other track
types from MPEG-4. The support of such MPEG-4 tracks is not required; however, readers shall not fail if they are present. If
MPEG-4 composition (BIFS) is used, then the simple composition used in this specification should also be set up in such a way
that a reader not implementing BIFS will display a suitable result.
Files conformant with this specification shall contain at least one Motion JPEG2000 video track. They may contain more video
tracks, uncompressed audio, or compressed MP4 audio.
4.5 Profiles and Levels
There are two tools for profiling Motion JPEG2000 files.
The first consists of the optional specification of tools and levels of the JPEG2000 coding system (codestream features). These
are indicated in the optional sample description extension JP2 Profile Box (see below 5.3.16).
The second tool allows a file overall to be identified as belonging to a definition which forms a proper subset of the general
specification. Such definitions might restrict such features as:
• the use of data references, and multiple files
• the layout order of the boxes, and the data within the boxes (e.g. that data is in time order and interleaved)
• the use of profiles of the JPEG2000 codestream
• the existence of other tracks, and their format (e.g. audio, MPEG-7, etc.).
The conformance to these restricted profiles is indicated in the file type box by the addition of the compatible profiles as brands
within the compatibility list. "Annex A File and Codestream profiles" defines the available profiles in this specification.
5 File organization
5.1 Presentation structure
5.1.1 File Structure
A presentation may be contained in several files. One file contains the meta-data for the whole presentation, and is formatted to
this specification. This file may also contain all the media data, whereupon the presentation is self-contained. The other files, if
used, are not required to be formatted to this specification; they are used to contain media data, and may also contain unused
media data, or other information. This specification concerns the structure of the presentation file only. The format of the
media-data files is constrained by this specification only in that the media-data in the media files must be capable of description
by the meta-data defined here.
These other files may be MJ2 files, JPEG2000 image files, MPEG-4 files containing JPEG2000 images, or other formats
containing JPEG2000 images. Only the media data itself, such as the JPEG2000 images, is stored in these other files; all
timing and framing (position and size) information is in the MJ2 file, so the ancillary files are essentially free-format.
If an MJ2 file contains hint tracks, the media tracks that reference the media data from which the hints were built shall remain
in the file, even if the data within them is not directly referenced by the hint tracks.
5.1.2 Object Structure
The file is structured as a sequence of objects; some of these objects may contain other objects. The sequence of objects in the
file shall contain exactly one presentation meta-data wrapper (the Movie Box). It is usually close to the beginning or end of the
file, to permit its easy location. The other objects found at this level may be free space, or media data boxes.
The fields in the objects are stored with the most significant byte first, commonly known as network byte order or big-endian
format.
5.1.3 Meta Data and Media Data
The meta-data is contained within the meta-data wrapper (the Movie Box); the media data is contained either in the same file,
within Media Data Box(es), or in other files. The media data is composed of images or audio data; the media data objects, or
media data files, may contain other un-referenced information.
5.1.4 Track Identifiers
The track identifiers used in an MJ2 file are unique within that file; no two tracks shall use the same identifier.
The next track identifier value in the movie header generally contains a value one greater than the largest track identifier value
found in the file. This enables easy generation of a track identifier under most circumstances. However, if this value is equal to
all ones (0xFFFFFFFF, the 32-bit unsigned maxint), then a search for a free track identifier is needed for all additions.
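The identifier allocation described above could be sketched as follows. This is an illustrative Python sketch, not a normative algorithm; the function name allocate_track_id is hypothetical, and the sketch assumes that identifier 0 is never assigned to a track.

```python
ALL_ONES = 0xFFFFFFFF  # 32-bit unsigned maxint


def allocate_track_id(used_ids, next_track_id):
    """Pick an identifier for a new track.

    used_ids: identifiers already present in the file.
    next_track_id: the next track identifier value from the movie header.
    """
    if next_track_id != ALL_ONES:
        # Normal case: the movie header already names a free identifier.
        return next_track_id
    # The header value is all ones: search for a free identifier.
    # Assumption: identifiers start at 1 (0 is not used).
    used = set(used_ids)
    return next(i for i in range(1, ALL_ONES) if i not in used)
```

After adding the track, a writer would normally update the movie header value to one greater than the largest identifier in use, or leave it at all ones if no such value fits.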
5.1.5 Visual Composition
Composition of multiple image sequences in a 2D environment can be achieved by using multiple video tracks which overlap in
time. Their composition is defined by the following structures:
• The matrix in the track header specifies their positioning and scaling.
• The layer field in the track header specifies the front-to-back ordering of the tracks.
• The graphics mode and opcolor fields in the video media header are used to specify the ways in which each track is
composited onto the existing image (this compositing is performed from back to front).
Applications requiring more complex compositing may optionally use the BIFS system from MPEG-4. The matrix, graphics
mode, and layers should be set up so that a reader not implementing BIFS displays the desired result. Matrix values which occur
in the headers specify a transformation of video images for presentation. The point (p,q) is transformed into (p', q') using the
matrix as follows:
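\[
\begin{pmatrix} p & q & 1 \end{pmatrix}
\begin{pmatrix} a & b & u \\ c & d & v \\ x & y & w \end{pmatrix}
=
\begin{pmatrix} m & n & z \end{pmatrix}
\]
\[
m = ap + cq + x, \qquad n = bp + dq + y, \qquad z = up + vq + w
\]
\[
p' = m/z, \qquad q' = n/z
\]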
The coordinates {p,q} are on the decompressed frame, and {p’, q’} are at the rendering output. Therefore, for example, the
matrix {2,0,0, 0,2,0, 0,0,1} exactly doubles the pixel dimension of an image. The co-ordinates transformed by the matrix are
not normalized in any way, and represent actual sample locations. Therefore {x,y} can, for example, be considered a
translation vector for the image.
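The transformation can be illustrated with a short Python sketch. It is informative only; the function name transform_point is hypothetical, and the matrix is assumed to be passed as nine real numbers in the order {a,b,u, c,d,v, x,y,w}, matching the example matrix above.

```python
def transform_point(p, q, matrix):
    """Map a point (p, q) on the decompressed frame to (p', q') at the
    rendering output, using the matrix {a,b,u, c,d,v, x,y,w}."""
    a, b, u, c, d, v, x, y, w = matrix
    m = a * p + c * q + x
    n = b * p + d * q + y
    z = u * p + v * q + w
    return m / z, n / z


# The doubling matrix from the text maps pixel (10, 5) to (20, 10).
doubled = transform_point(10, 5, (2, 0, 0, 0, 2, 0, 0, 0, 1))
# A pure translation by {20, 30} moves the origin to (20, 30).
moved = transform_point(0, 0, (1, 0, 0, 0, 1, 0, 20, 30, 1))
```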
The co-ordinate origin is located at the upper left corner, and X values increase to the right, and Y values increase downwards.
{p,q} and {p’,q’} are to be taken as absolute pixel locations relative to the upper left hand corner of the original image (after
scaling to the size determined by the track header's width and height) and the transformed (rendering) surface, respectively.
Each track is composed using its matrix as specified into an overall image; this is then transformed and composed according to
the matrix at the movie level in the MovieHeaderBox. It is application-dependent whether the resulting image is ‘clipped’
(to eliminate pixels that have no display) to a vertical rectangular region within a window, for example. So, if only
one video track is displayed, it has a translation to {20,30}, and the MovieHeaderBox contains a unity matrix, an application
may choose not to display the empty “L”-shaped region between the image and the origin.
All the values in a matrix are stored as 16.16 fixed-point values, except for u, v and w, which are stored as 2.30 fixed-point
values. For upwards compatibility into the MPEG-4 BIFS (scene composition) system, matrices used here restrict (u,v,w) to be
(0,0,1), for which the hex values are (0,0,0x40000000). This permits the simple composition used here to be mapped into BIFS
if a scene later requires full scene management.
The values in the matrix are stored in the order {a,b,u, c,d,v, x,y,w}.
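As an informative sketch, the stored form of a matrix could be decoded in Python as shown below. The function name decode_matrix is hypothetical. The sketch assumes the nine values are stored as consecutive big-endian signed 32-bit integers (36 bytes in total), in the order given above.

```python
import struct


def decode_matrix(raw: bytes):
    """Decode a stored matrix into nine floats, in the stored order
    {a,b,u, c,d,v, x,y,w}.

    Assumption: nine big-endian signed 32-bit integers (36 bytes).
    """
    values = struct.unpack(">9i", raw)
    decoded = []
    for index, value in enumerate(values):
        # u, v and w (every third entry) are 2.30; the others are 16.16.
        frac_bits = 30 if index % 3 == 2 else 16
        decoded.append(value / (1 << frac_bits))
    return tuple(decoded)


# The unity matrix: a = d = 1.0 (0x00010000), w = 1.0 (0x40000000).
unity_raw = struct.pack(">9i", 0x10000, 0, 0, 0, 0x10000, 0, 0, 0, 0x40000000)
unity = decode_matrix(unity_raw)
```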
Tracks are composed to the presentation surface from back (highest layer number) to front (lowest layer number), against an
indeterminate initial colour. There are various composition modes available; the backmost (first-rendered) track would
normally use 'copy' as the initial image is indeterminate. Subsequent layers can then be composed on top in a variety of ways.
The following table details the composition modes available. Note that (currently) only the 'transparent' mode uses the opcolor
field.
Table 1 - Graphics Composition Modes
Mode                          Code     Description
Copy                          0x0      Copy the source image over the destination.
Transparent                   0x24     Replace the destination pixel with the source pixel if the source pixel isn't equal to the opcolor. (Also known as 'blue-screen').
Alpha                         0x100    Replace the destination pixel with a blend of the source and destination pixels, with the proportion controlled by the alpha channel. The alpha channel is applied to all channels.
Pre-multiplied black alpha    0x102    Pre-multiplied with black means that the colour components of each pixel have already been blended with a black pixel, based on their alpha channel value. Effectively, this means that the image has already been combined with a black background, which must be removed before composition.
Component alpha               0x110    One or more alpha channels are present, which are applied to individual colour channels, and the image must be composed channel-by-channel.
Images are only alpha-composed if both the graphics composition mode requests alpha composition, and the images contain
alpha channels, as declared by the Channel Definition Box inside the JP2 Header Box. Therefore the graphics mode can be
used to prevent alpha composition of an image with alpha channels, if that is desired.
If there is a single alpha channel applied to the entire image, then the value of the graphics mode must be ‘Alpha’ if that channel is a
straight ‘Opacity’ channel, and must be ‘Pre-multiplied black alpha’ if that channel is a ‘Pre-multiplied’ opacity channel. If
there are one or more alpha channels in the image which are applied to individual channels and not to the whole image, and
alpha composition is desired, then the ‘Component alpha’ value must be used for the graphics mode. Support of ‘Component
alpha’ composition is optional in Part 3 of this specification.
The alpha blending formulas are defined in Part 1 of this specification.
Note: use of the “transparent” mode may yield unexpected results when the image codestreams are compressed in a
non-reversible fashion, or are subject to scaling in quality or resolution, either during or after content production. Such operations
are not guaranteed to preserve individual sample values precisely.
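The back-to-front ordering and the 'copy' and 'transparent' modes of Table 1 can be illustrated with a small Python sketch. It is a simplification, not a reference implementation: it assumes every track covers the whole surface with a unity matrix, it represents a track as a dictionary with the hypothetical keys layer, mode, opcolor and pixels, and it leaves the alpha modes out because their blending formulas are defined in Part 1.

```python
COPY = 0x0
TRANSPARENT = 0x24


def compose(tracks, width, height):
    """Compose tracks onto a surface, from back (highest layer number)
    to front (lowest layer number)."""
    # None stands for the indeterminate initial colour.
    surface = [[None] * width for _ in range(height)]
    for track in sorted(tracks, key=lambda t: t["layer"], reverse=True):
        mode = track["mode"]
        if mode not in (COPY, TRANSPARENT):
            raise NotImplementedError("alpha modes are not sketched here")
        for row in range(height):
            for col in range(width):
                src = track["pixels"][row][col]
                # 'copy' always replaces; 'transparent' replaces only
                # where the source pixel differs from the opcolor.
                if mode == COPY or src != track["opcolor"]:
                    surface[row][col] = src
    return surface
```

Because the 'transparent' test is an exact comparison against the opcolor, any change to individual sample values (as warned in the note above) changes which pixels are treated as transparent.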
5.2 Meta-data Structure (Objects)
5.2.1 Box
The following represents the subset of the QuickTime file specification that is required to define an MJ2 file. An object in this
terminology is a box.
Boxes start with a header which gives both size and type. The header permits compact or extended size (32 or 64 bits) and
compact or extended types (32 bits or full UUIDs). The standard boxes all use compact types (32-bit) and most boxes will use
the compact (32-bit) size. Typically only the media data box(es) need the 64-bit size.
The size is the entire size of the box, including the size and type header, fields, and all contained boxes. This facilitates general
parsing of the file.
The definitions of boxes are given in the syntax description language (SDL) defined in MPEG-4 (see ISO/IEC 14496-1:2001 in clause 2).
Comments in the code fragments in this specification indicate informative material.
A number of boxes contain index values into sequences in other boxes. These indexes start with the value 1 (1 is the first entry
in the sequence).
aligned(8) class Box (unsigned int(32) boxtype,
        optional unsigned int(8)[16] extended-type) {
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) {
        unsigned int(64) largesize;
    } else if (size==0) {
        // box extends to end of file
    }
    if (boxtype=='uuid') {
        unsigned int(8)[16] usertype = extended-type;
    }
}
The semantics of these two fields are:
size is an integer that specifies the number of bytes in this box, including all its fields and contained boxes; if size is 1
then the actual size is in the field largesize; if size is 0, then this box is the last one in the file, and its contents
extend to the end of the file (normally only used for a Media Data Box)
type identifies the box type; standard boxes use a compact type, which is normally four printable characters, to permit
ease of identification, and is shown so in the boxes below. User extensions use an extended type; in this case, the type
field is set to ‘uuid’.
Type fields not defined here are reserved. Private extensions shall be achieved through the ‘uuid’ type. In addition, the
following types are not and will not be used, or used only in their existing sense, in future versions of this specification, to
avoid conflict with existing content using earlier pre-standard versions of this format:
clip, crgn, matt, kmat, pnot, ctab, load, imap; track reference types tmcd, chap, sync, scpt, ssrc.
Boxes not explicitly defined in this standard, or otherwise unrecognized by a reader, may be ignored.
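The header rules above (compact or extended size, 'uuid' extended types, and skipping unrecognized boxes) can be sketched in Python as follows. This is an informative sketch under the assumption that the whole file is held in memory as a bytes object; the names read_box_header and iter_boxes are hypothetical.

```python
import struct


def read_box_header(data: bytes, offset: int = 0):
    """Return (box_type, usertype, header_size, box_size) for the box
    that starts at offset. Fields are big-endian."""
    size, box_type = struct.unpack_from(">I4s", data, offset)
    header_size = 8
    if size == 1:
        # Extended size: the actual size is in the 64-bit largesize field.
        (size,) = struct.unpack_from(">Q", data, offset + 8)
        header_size += 8
    elif size == 0:
        # Last box in the file: it extends to the end of the file.
        size = len(data) - offset
    usertype = None
    if box_type == b"uuid":
        # Extended type: a 16-byte UUID follows the size fields.
        usertype = data[offset + header_size : offset + header_size + 16]
        header_size += 16
    return box_type, usertype, header_size, size


def iter_boxes(data: bytes, start: int = 0, end=None):
    """Yield (box_type, payload_start, payload_end) for each box in a
    container, so a reader can skip types it does not recognize."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        box_type, _usertype, header_size, size = read_box_header(data, offset)
        if size < header_size:
            raise ValueError("box size is smaller than its header")
        yield box_type, offset + header_size, offset + size
        offset += size
```

Because the size covers the whole box, a reader that does not recognize a type only needs to advance by that size to reach the next box.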
Many objects also contain a version number and flags field:
aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f)
        extends Box(boxtype) {
    unsigned int(8) version = v;
    bit(24) flags = f;
}
The semantics of these two fields are:
version is an integer that specifies the version of this format of the box.
flags is a map of flags
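As a small informative sketch in Python, the version and flags fields could be read from the first four bytes that follow the box header. The function name read_fullbox_fields is hypothetical, and the payload argument is assumed to begin immediately after the size and type header.

```python
import struct


def read_fullbox_fields(payload: bytes):
    """Split the first 32 bits of a FullBox payload into
    (version, flags): an 8-bit version and a 24-bit flag map."""
    (word,) = struct.unpack_from(">I", payload, 0)
    version = word >> 24
    flags = word & 0xFFFFFF
    return version, flags
```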
5.2.2 Data Types and fields
In a number of boxes in this specification, there are two variant forms: version 0 using 32-bit fields, and version 1 using 64-bit
sizes for those same fields. In general, if a version 0 box (32-bit field sizes) can be used, it should be; version 1 boxes should
be used only when the 64-bit field sizes they permit, are required.
For convenience during content creation there are creation and modification times stored in the file. These can be 32-bit or 64-
bit numbers, counting seconds since midnight, Jan. 1, 1904, which is a convenient date for leap-year calculations. 32 bits are
sufficient until approximately year 2040.
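The time representation can be illustrated with the following Python sketch. It is informative only; the function name mj2_time_to_datetime is hypothetical, and the sketch assumes that the 1904 epoch is interpreted in UTC, which the text above does not state.

```python
from datetime import datetime, timedelta, timezone

# Assumption: midnight, Jan. 1, 1904 is taken to be in UTC.
EPOCH_1904 = datetime(1904, 1, 1, tzinfo=timezone.utc)


def mj2_time_to_datetime(seconds: int) -> datetime:
    """Convert a creation or modification time (seconds since
    midnight, Jan. 1, 1904) to a datetime."""
    return EPOCH_1904 + timedelta(seconds=seconds)


# The largest 32-bit value falls in early 2040, which is why 32 bits
# are sufficient only until approximately that year.
last_32_bit_time = mj2_time_to_datetime(0xFFFFFFFF)
```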
Fixed-point numbers are signed or unsigned values resulting from dividing an integer by an appropriate power of 2. For
example, a 30.2 fixed-point number is formed by dividing a 32-bit integer by 4.
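The conversion can be sketched in Python as below. This is an informative example; the function names are hypothetical, and the 16.16 case is included purely as a second illustration of the same rule. The raw integer is assumed to have already been read as signed or unsigned, as appropriate for the field.

```python
def fixed_to_float(raw, fraction_bits):
    """Divide the stored integer by 2**fraction_bits."""
    return raw / (1 << fraction_bits)

def float_to_fixed(value, fraction_bits):
    """Scale by 2**fraction_bits and round to the nearest stored integer."""
    return round(value * (1 << fraction_bits))

# 30.2 format: two fraction bits, so the stored integer is divided by 4.
quarter_steps = fixed_to_float(10, 2)
# 16.16 format: sixteen fraction bits, so 0x00010000 represents 1.0.
unity = fixed_to_float(0x00010000, 16)
```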
Fields shown as “pre-defined” in the box descriptions should be initialized to the given value on box creation, copied
uninspected when boxes are copied, and ignored on reading.
5.2.3 Box Order
An overall view of the normal encapsulation structure is provided in the following table.
The table shows those boxes which may occur at the top-level in the left-most column; indentation is used to show possible
containment. Thus, for example, a track header (tkhd) is found in a track (trak), which is found in a movie (moov). Not all
boxes need be used in all files; the mandatory boxes are marked with an asterisk (*). See the description of the individual
boxes for a discussion of what must be assumed if the optional boxes are not present.
User data objects shall be placed only in Movie or Track Boxes, and objects using an extended type may be placed in a wide
variety of containers, not just the top level.
In order to improve interoperability and utility of the files, the following rules and guidelines shall be followed for the order of
boxes:
1) The JP2 Signature Box and File Type Box shall occur first and second in the file (see 4.2).
2) It is strongly recommended that all header boxes be placed first in their container: these boxes are the
Movie Header, Track Header, Media Header, and the specific media headers inside the Media
Information Box (e.g. the Video Media Header).
3) Any Movie Fragment Boxes shall be in sequence order (see 5.3.27).
4) It is recommended that the boxes within the Sample Table Box be in the following order: Sample
Description, Time to Sample, Sample to Chunk, Sample Size, Chunk Offset.
5) It is strongly recommended that the Track Reference Box and Edit List (if any) precede the
Media Box, that the Handler Box precede the Media Information Box, and that the Data Information
Box precede the Sample Table Box.
6) It is recommended that User Data Boxes be placed last in their container, which is either the Movie Box
or Track Box.
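As an informative illustration of rule 4, a tool might check the recommended order of boxes within the Sample Table Box as sketched below in Python. The names RECOMMENDED_STBL_ORDER and follows_recommended_order are hypothetical; the four-character codes are those listed for these boxes in Table 2. Because the order is a recommendation and not a requirement, a failed check would warrant at most a warning.

```python
# Sample Description, Time to Sample, Sample to Chunk, Sample Size, Chunk Offset
RECOMMENDED_STBL_ORDER = ["stsd", "stts", "stsc", "stsz", "stco"]

def follows_recommended_order(child_types, recommended=RECOMMENDED_STBL_ORDER):
    """Return True if the listed boxes that are present keep their relative order.

    Box types not named in the recommendation are ignored.
    """
    positions = [recommended.index(t) for t in child_types if t in recommended]
    return positions == sorted(positions)
```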
Table 2 - Box types, structure, and cross-reference
Box type          *  Subclause  Description
jP                *  4.3        the JP2 family signature
ftyp              *  4.3        file type and compatibility
moov              *  5.3        container for all the meta-data
  mvhd            *  5.3.3      movie header, overall declarations
  trak            *  5.3.4      container for an individual track or stream
    tkhd          *  5.3.5      track header, overall information about the track
    tref             5.3.6      track reference container
    edts             5.3.21     edit list container
      elst           5.3.22     an edit list
    mdia          *  5.3.7      container for the media information in a track
      mdhd        *  5.3.8      media header, overall information about the media
      hdlr        *  5.3.9      handler, declares the media (handler) type
      minf        *  5.3.10     media information container
        vmhd         5.3.11.2   video media header, overall information (video track only)
        smhd         5.3.11.3   sound media header, overall information (sound track only)
        hmhd         5.3.11.4   hint media header, overall information (hint track only)
        dinf      *  5.3.12     data information box, container
          dref    *  5.3.13     data reference box, declares source(s) of media data in track
        stbl      *  5.3.14     sample table box, container for the time/space map
          stsd    *  5.3.16     sample descriptions (codec types, initialization etc.)
          stts    *  5.3.15.1   (decoding) time-to-sample
          stsc    *  5.3.18     sample-to-chunk, partial data-offset information
          stsz    *  5.3.17     sample sizes (framing)
          stco    *  5.3.19     chunk offset, partial data-offset information
  mvex               5.3.24     movie extends box
    trex          *  5.3.25     track extends defaults
moof                 5.3.26     movie fragment
  mfhd            *  5.3.27     movie fragment header
  traf               5.3.28     track fragment
    tfhd          *  5.3.29     track fragment header
    trun             5.3.30     track fragment run
mdat                 5.3.2      media data container
free                 5.3.20     free space
skip                 5.3.20     free space
udta 5.3.23 user-data, copyright etc.
5.3 Box Definitions
5.3.1 Movie Box
5.3.1.1 Definition
Box Type: ‘moov’
Container: File
Mandatory: Yes
Quantity: Exactly one
The meta-data for a presentation is stored in the single Movie Box which occurs at the top-level of a file. Normally this box is
close to the beginning or end of the file, though this is not required.
5.3.1.2 Syntax
aligned(8) class MovieBox extends Box(‘moov’){
}
5.3.2 Media Data Box
5.3.2.1 Definition
Box Type: ‘mdat’
Container: File
Mandatory: No
Quantity: Any number
This box contains the media data. In video tracks, this box would contain JPEG2000 video frames. A presentation may contain
zero or more Media Data Boxes. The actual media data follows the type field; its structure is described by the meta-data (see
particularly the sample table, subclause 5.3.14).
In large presentations, it may be desirable to have more data in this box than a 32-bit size would permit. In this case, the large
variant of the size field, described above in subclause 5.2, is used.
There may be any number of these boxes in the file (including zero, if all the media data is in other files). The meta-data refers
to media data by its absolute offset within the file (see subclause 5.3.19, the Chunk Offset Box); so Media Data Box headers
and free space may easily be skipped, and files without any box structure may also be referenced and used.
5.3.2.2 Syntax
aligned(8) class MediaDataBox extends Box(‘mdat’) {
bit(8) data[];
}
5.3.2.3 Semantics
data is the contained media data
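As an informative illustration, a reader might locate Media Data Boxes by walking the top-level boxes of a file, as in the following Python sketch. The function name iter_top_level_boxes is hypothetical. The sketch assumes the size rules of subclause 5.2 (size 1 selects the 64-bit large size; size 0 means the box extends to the end of the file) and, for brevity, does not treat the 16-byte extended type of ‘uuid’ boxes separately.

```python
import struct

def iter_top_level_boxes(data):
    """Yield (box_type, payload_start, box_end) for each top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # Large variant: the 64-bit size follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = len(data) - offset
        if size < header_len:
            raise ValueError("invalid box size")
        yield box_type, offset + header_len, offset + size
        offset += size

# Usage: a tiny synthetic file with a File Type Box, free space, and media data.
sample = (struct.pack(">I4s4s", 12, b"ftyp", b"mjp2")
          + struct.pack(">I4s", 8, b"free")
          + struct.pack(">I4s", 0, b"mdat") + b"abc")
boxes = list(iter_top_level_boxes(sample))
```

Since the meta-data addresses media data by absolute file offset, such a walk is needed only for inspection; playback can seek directly to the offsets given in the Chunk Offset Box.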
5.3.3 Movie Header Box
5.3.3.1 Definition
Box Type: ‘mvhd’
Container: Movie Box (‘moov’)
Mandatory: Yes
Quantity: Exactly one
This box defines overall information which is media-independent, and relevant to the entire presentation considered as a whole.
...







