ISO/IEC 15444-12:2008/FDAmd 3
Information technology - JPEG 2000 image coding system - Part 12: ISO base media file format - Amendment 3: DASH support and RTP reception hint track processing
Technologies de l'information — Système de codage d'images JPEG 2000 — Partie 12: Format ISO de base pour les fichiers médias — Amendement 3: Processus de suivi de l'encre de réception RTP et du support DASH
ISO/IEC 15444-12:2008/FDAmd 3 is a draft published by the International Organization for Standardization (ISO). Its full title is "Information technology - JPEG 2000 image coding system - Part 12: ISO base media file format - Amendment 3: DASH support and RTP reception hint track processing".
ISO/IEC 15444-12:2008/FDAmd 3 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.30 - Coding of graphical and photographical information. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/IEC 15444-12:2008/FDAmd 3 has the following relationships with other standards: it is linked to ISO/IEC 15444-12:2008 and ISO/IEC 15444-12:2012. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
FINAL DRAFT AMENDMENT
ISO/IEC 15444-12:2008 FDAM 3
ISO/IEC JTC 1
Secretariat: ANSI
Voting begins on: 2011-10-07
Voting terminates on: 2011-12-07
Information technology — JPEG 2000 image coding system —
Part 12:
ISO base media file format
AMENDMENT 3: DASH support and RTP reception hint track processing
Technologies de l'information — Système de codage d'images JPEG 2000 —
Partie 12: Format ISO de base pour les fichiers médias
AMENDEMENT 3: Processus de suivi de l'encre de réception RTP et du support DASH
Please see the administrative notes on page iii
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number
ISO/IEC 15444-12:2008/FDAM 3:2011(E)
© ISO/IEC 2011
Copyright notice
This ISO document is a Draft International Standard and is copyright-protected by ISO. Except as permitted
under the applicable laws of the user's country, neither this ISO draft nor any extract from it may be
reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic,
photocopying, recording or otherwise, without prior written permission being secured.
Requests for permission to reproduce should be addressed to either ISO at the address below or ISO's
member body in the country of the requester.
ISO copyright office
Case postale 56 CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Reproduction may be subject to royalty payments or a licensing agreement.
Violators may be prosecuted.
ii © ISO/IEC 2011 – All rights reserved
In accordance with the provisions of Council Resolution 21/1986, this document is circulated in the
English language only.
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Amendment 3 to ISO/IEC 15444-12:2008 was prepared by Joint Technical Committee ISO/IEC JTC 1,
Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia
information.
Information technology — JPEG 2000 image coding system
Part 12:
ISO base media file format
AMENDMENT 3: DASH support and RTP reception hint track
processing
Add the following normative references to Clause 2:
IETF RFC 3550, RTP: A Transport Protocol for Real-Time Applications, SCHULZRINNE, H. et al., July 2003
IETF RFC 5905, Network Time Protocol Version 4: Protocol and Algorithms Specification, MILLS, D., et al.,
June 2010
Add the following terms to Clause 3 in alphabetical sequence, renumbering subclauses as appropriate:
3.x.x
segment
portion of an ISO base media file format file, consisting of either (a) a movie box, with its associated media
data (if any) and other associated boxes or (b) one or more movie fragment boxes, with their associated
media data, and other associated boxes
3.x.x
subsegment
time interval of a segment formed from movie fragment boxes, that is also a valid segment
3.x.x
leaf subsegment
subsegment that does not contain any indexing information that would enable its further division into
subsegments
At the end of the first paragraph in 6.2.2, add the following:
Values for counters, offsets, times, durations etc. in this format do not ‘wrap’ to 0 when the maximum value
that can be stored in their field is reached; appropriately large fields must be used for all values.
In 6.2.3, add the following to Table 1:
after the first line documenting the ‘subs’ box, add:
saiz 8.7.8 sample auxiliary information sizes
saio 8.7.9 sample auxiliary information offsets
and also, after the line ‘trex’:
leva 8.8.13 level assignment
and also, before the line ‘mfra’:
saiz 8.7.8 sample auxiliary information sizes
saio 8.7.9 sample auxiliary information offsets
tfdt 8.8.12 track fragment decode time
at the end of the table, add:
styp 8.16.2 segment type
sidx 8.16.3 segment index
ssix 8.16.4 subsegment index
prft 8.16.5 producer reference time
Remove the line permitting an sdtp box in a track fragment (indented under ‘traf’, after ‘trun’):
sdtp 8.6.4 independent and disposable samples
In 8.6.1.4.1, add at the end:
When the Composition to Decode Box is included in the Sample Table Box, it documents the composition and
decoding time relations of the samples in the Movie Box only, not including any subsequent movie fragments.
In 8.6.1.4.3, change the definition of compositionEndTime as follows:
compositionEndTime: the composition time plus the composition duration, of the sample with the
largest computed composition time (CTS) in the media of this track; if this field takes the value 0, the
composition end time is unknown.
In 8.6.4.1 change the header to:
Box Types: ‘sdtp’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one
add this paragraph before the paragraph beginning “The size of the table…”:
For tracks with a handler_type that is not ‘vide’, ‘soun’, ‘hint’ or ‘auxv’, if another sample with
sample_depends_on=2 or another sample tagged as a “Sync Sample” has already been processed and
unless specified otherwise, a sample tagged with sample_depends_on=2, and
sample_has_redundancy=1 can be discarded, and its duration added to the duration of the preceding one,
to maintain the timing of subsequent samples.
and delete this line from the text:
A sample dependency Box may also occur in the track fragment Box.
In 8.6.6.1, insert at the end:
A non-empty edit may insert a portion of the media timeline that is not present in the initial movie, and is
present only in subsequent movie fragments. The segment_duration of this edit may be zero, whereupon
the edit provides the offset from media composition time to movie presentation time, for the movie and
subsequent movie fragments. It is recommended that such an edit be used to establish a presentation time of
0 for the first presented sample, when composition offsets are used.
For example, if the composition time of the first composed frame is 20, then the edit that maps the media time
from 20 onwards to movie time 0 onwards, would read:
Entry-count = 1
Segment-duration = 0
Media-Time = 20
Media-Rate = 1
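The effect of the example edit above can be illustrated with a minimal Python sketch. The function name `presentation_time` is hypothetical, and the sketch assumes a single edit with a media rate of 1 and equal media and movie timescales; real files may need a timescale conversion.

```python
def presentation_time(composition_time, media_time=20):
    """Map a media composition time to movie presentation time.

    Hypothetical helper for a single edit with Segment-duration = 0 and
    Media-Rate = 1. Assumes media and movie timescales are equal.
    """
    if composition_time < media_time:
        # The sample lies before the start of the edit, so it is not presented.
        return None
    return composition_time - media_time
```

With the default Media-Time of 20, the first composed frame (composition time 20) is presented at movie time 0, which is the outcome the recommendation is aiming for.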
At the end of 8.7.2.1, add this paragraph:
When a file that has data entries with the flag set indicating that the media data is in the same file, is split into
segments for transport, the value of this flag does not change, as the file is (logically) reassembled after the
transport operation.
Insert the following Subclauses into 8.7, numbered consecutively from the existing Subclauses:
8.7.8 Sample Auxiliary Information Sizes Box
8.7.8.1 Definition
Box Type: ‘saiz’
Container: Sample Table Box (‘stbl’) or Track Fragment Box ('traf')
Mandatory: No
Quantity: Zero or More
Per-sample sample auxiliary information may be stored anywhere in the same file as the sample data itself;
for self-contained media files, this is typically in a MediaData box or a box from a derived specification. It is
stored either (a) in multiple chunks, with the number of samples per chunk, as well as the number of chunks,
matching the chunking of the primary sample data or (b) in a single chunk for all the samples in a movie
sample table (or a movie fragment). The Sample Auxiliary Information for all samples contained within a single
chunk (or track run) is stored contiguously (similarly to sample data).
Sample Auxiliary Information, when present, is always stored in the same file as the samples to which it
relates as they share the same data reference (‘dref’) structure. However, this data may be located
anywhere within this file, using auxiliary information offsets (‘saio’) to indicate the location of the data.
Whether sample auxiliary information is permitted or required may be specified by the brands or the coding
format in use. The format of the sample auxiliary information is determined by aux_info_type. If
aux_info_type and aux_info_type_parameter are omitted then the implied value of aux_info_type
is either (a) in the case of transformed content, such as protected content, the scheme_type included in the
Protection Scheme Information box or otherwise (b) the sample entry type. The default value of the
aux_info_type_parameter is 0. Some values of aux_info_type may be restricted to be used only with
particular track types. A track may have multiple streams of sample auxiliary information of different types.
The types are registered at the registration authority.
While aux_info_type determines the format of the auxiliary information, several streams of auxiliary
information having the same format may be used when their value of aux_info_type_parameter differs.
The semantics of aux_info_type_parameter for a particular aux_info_type value must be specified
along with specifying the semantics of the particular aux_info_type value and the implied auxiliary
information format.
This box provides the size of the auxiliary information for each sample. For each instance of this box, there
must be a matching SampleAuxiliaryInformationOffsetsBox with the same values of
aux_info_type and aux_info_type_parameter, providing the offset information for this auxiliary
information.
NOTE For discussions on the use of sample auxiliary information versus other mechanisms, see
Annex C.8.
8.7.8.2 Syntax
aligned(8) class SampleAuxiliaryInformationSizesBox
   extends FullBox(‘saiz’, version = 0, flags)
{
   if (flags & 1) {
      unsigned int(32) aux_info_type;
      unsigned int(32) aux_info_type_parameter;
   }
   unsigned int(8) default_sample_info_size;
   unsigned int(32) sample_count;
   if (default_sample_info_size == 0)
   {
      unsigned int(8) sample_info_size[ sample_count ];
   }
}
8.7.8.3 Semantics
aux_info_type is an integer that identifies the type of the sample auxiliary information. At most one
occurrence of this box with the same values for aux_info_type and aux_info_type_parameter
shall exist in the containing box.
aux_info_type_parameter identifies the “stream” of auxiliary information having the same value of
aux_info_type and associated to the same track. The semantics of aux_info_type_parameter
are determined by the value of aux_info_type.
default_sample_info_size is an integer specifying the sample auxiliary information size for the case
where all the indicated samples have the same sample auxiliary information size. If the size varies
then this field shall be zero.
sample_count is an integer that gives the number of samples for which a size is defined. For a Sample
Auxiliary Information Sizes box appearing in the Sample Table Box this must be the same as, or less
than, the sample_count within the Sample Size Box or Compact Sample Size Box. For a Sample
Auxiliary Information Sizes box appearing in a Track Fragment box this must be the same as, or less
than, the sum of the sample_count entries within the Track Fragment Run boxes of the Track
Fragment. If this is less than the number of samples, then auxiliary information is supplied for the
initial samples, and the remaining samples have no associated auxiliary information.
sample_info_size gives the size of the sample auxiliary information in bytes. This may be zero to
indicate samples with no associated auxiliary information.
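As a non-normative illustration, the syntax and semantics above could be read with a short Python sketch such as the following. The function name `parse_saiz_payload` and its return layout are assumptions of this sketch; it expects the bytes that follow the FullBox version and flags fields, with the flags value passed separately.

```python
import struct


def parse_saiz_payload(payload, flags):
    """Parse the body of a 'saiz' box (hypothetical helper).

    payload -- bytes following the FullBox version/flags fields
    flags   -- the 24-bit FullBox flags value
    Returns (aux_info_type, aux_info_type_parameter, sizes).
    """
    pos = 0
    aux_info_type = aux_info_type_parameter = None
    if flags & 1:
        aux_info_type, aux_info_type_parameter = struct.unpack_from(">II", payload, pos)
        pos += 8
    default_size, sample_count = struct.unpack_from(">BI", payload, pos)
    pos += 5
    if default_size == 0:
        # Sizes vary: one unsigned byte per sample follows.
        sizes = list(payload[pos:pos + sample_count])
        if len(sizes) != sample_count:
            raise ValueError("truncated sample_info_size table")
    else:
        # Every indicated sample shares the same size.
        sizes = [default_size] * sample_count
    return aux_info_type, aux_info_type_parameter, sizes
```

Expanding the default size into a per-sample list is a convenience of the sketch; an implementation handling very large sample counts might keep the default value instead.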
8.7.9 Sample Auxiliary Information Offsets Box
8.7.9.1 Definition
Box Type: ‘saio’
Container: Sample Table Box (‘stbl’) or Track Fragment Box ('traf')
Mandatory: No
Quantity: Zero or More
For an introduction to sample auxiliary information, see the definition of the Sample Auxiliary Information Sizes
Box.
This box provides the position information for the sample auxiliary information, in a way similar to the chunk
offsets for sample data.
8.7.9.2 Syntax
aligned(8) class SampleAuxiliaryInformationOffsetsBox
   extends FullBox(‘saio’, version, flags)
{
   if (flags & 1) {
      unsigned int(32) aux_info_type;
      unsigned int(32) aux_info_type_parameter;
   }
   unsigned int(32) entry_count;
   if ( version == 0 )
   {
      unsigned int(32) offset[ entry_count ];
   }
   else
   {
      unsigned int(64) offset[ entry_count ];
   }
}
8.7.9.3 Semantics
aux_info_type and aux_info_type_parameter are defined as in the
SampleAuxiliaryInformationSizesBox.
entry_count gives the number of entries in the following table. For a Sample Auxiliary Information
Offsets box appearing in a Sample Table Box this must be equal to one or to the value of the
entry_count field in the Chunk Offset Box or Chunk Large Offset Box. For a Sample Auxiliary
Information Offsets Box appearing in a Track Fragment box, this must be equal to one or to the
number of Track Fragment Run boxes in the Track Fragment Box.
offset gives the position in the file of the Sample Auxiliary Information for each Chunk or Track
Fragment Run. If entry_count is one, then the Sample Auxiliary Information for all Chunks or Runs
is contiguous in the file in chunk or run order. When in the Sample Table Box, the offsets are absolute.
In a track fragment box, this value is relative to the base offset established by the track fragment
header box (‘tfhd’) in the same track fragment (see 8.8.14).
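The way the offsets combine with the per-sample sizes can be sketched as follows. This is an illustrative Python sketch only: the function `aux_info_locations` and its parameter names are hypothetical, and the caller is assumed to have already parsed the offset table, the matching sizes, and the number of samples in each chunk or track run.

```python
def aux_info_locations(offsets, sizes, samples_per_run, base_offset=0):
    """Return a (position, size) pair for each sample's auxiliary information.

    offsets         -- the offset[] table from the offsets box
    sizes           -- per-sample sizes from the matching sizes box
    samples_per_run -- sample count of each chunk or track run, in order
    base_offset     -- 0 in a Sample Table Box (absolute offsets); the track
                       fragment base offset inside a Track Fragment Box
    """
    if not samples_per_run:
        return []
    if len(offsets) not in (1, len(samples_per_run)):
        raise ValueError("entry_count must be 1 or the number of chunks/runs")
    locations = []
    index = 0
    position = base_offset + offsets[0]
    for run, count in enumerate(samples_per_run):
        if len(offsets) > 1:
            # One offset per chunk or run: restart at that run's offset.
            position = base_offset + offsets[run]
        # With a single offset, data continues contiguously across runs.
        for size in sizes[index:index + count]:
            locations.append((position, size))
            position += size
        index += count
    return locations
```

If fewer sizes than samples are supplied, the sketch simply returns fewer locations, matching the rule that only the initial samples then carry auxiliary information.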
At the end of 8.8.4.1, add this note:
NOTE There is no requirement that any particular movie fragment extend all tracks present in the
movie header, and there is no restriction on the location of the media data referred to by the movie
fragments. However, derived specifications may make such restrictions.
In 8.8.5.1, after:
The movie fragment header contains a sequence number, as a safety check. The sequence number usually
starts at 1 and must increase for each movie fragment in the file, in the order in which they occur. This allows
readers to verify integrity of the sequence; it is an error to construct a file where the fragments are out of
sequence.
insert:
NOTE There is no requirement that the sequence numbers be consecutive, only that the value in
a given movie fragment be greater than in any preceding movie fragment.
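A reader-side integrity check along these lines might look like the following Python sketch, where `sequence_numbers_valid` is a hypothetical helper taking the sequence numbers in file order.

```python
def sequence_numbers_valid(sequence_numbers):
    """Return True if every value is greater than all preceding values.

    Gaps are allowed; only the ordering is checked (hypothetical helper).
    """
    return all(later > earlier
               for earlier, later in zip(sequence_numbers, sequence_numbers[1:]))
```

Comparing adjacent pairs is sufficient, because a strictly increasing sequence of neighbours is also greater than every earlier value.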
At the end of 8.8.8.1, add:
The composition offset values in the composition time-to-sample box and in the track run box may be signed
or unsigned. The recommendations given in the composition time-to-sample box concerning the use of signed
composition offsets also apply here.
Replace the contents of 8.8.8.2 with the following:
aligned(8) class TrackRunBox
   extends FullBox(‘trun’, version, tr_flags) {
   unsigned int(32) sample_count;
   // the following are optional fields
   signed int(32) data_offset;
   unsigned int(32) first_sample_flags;
   // all fields in the following array are optional
   {
      unsigned int(32) sample_duration;
      unsigned int(32) sample_size;
      unsigned int(32) sample_flags;
      if (version == 0)
         { unsigned int(32) sample_composition_time_offset; }
      else
         { signed int(32) sample_composition_time_offset; }
   }[ sample_count ]
}
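The version-dependent signedness of the composition offset can be illustrated with a short, non-normative Python sketch. The tr_flags bit values used here are those defined for the Track Fragment Run Box in the base standard and are not restated in this amendment; the function name `parse_trun_payload` and the dictionary keys are assumptions of the sketch.

```python
import struct

# tr_flags bits for the Track Fragment Run Box, taken from the base standard.
DATA_OFFSET_PRESENT = 0x000001
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004
SAMPLE_DURATION_PRESENT = 0x000100
SAMPLE_SIZE_PRESENT = 0x000200
SAMPLE_FLAGS_PRESENT = 0x000400
SAMPLE_CTO_PRESENT = 0x000800


def parse_trun_payload(payload, version, tr_flags):
    """Parse a 'trun' body that follows the FullBox version/flags fields.

    Returns (header, samples). Hypothetical helper, not a library API.
    """
    pos = 0

    def read(fmt):
        # Read one big-endian field and advance the cursor.
        nonlocal pos
        (value,) = struct.unpack_from(fmt, payload, pos)
        pos += struct.calcsize(fmt)
        return value

    header = {"sample_count": read(">I")}
    if tr_flags & DATA_OFFSET_PRESENT:
        header["data_offset"] = read(">i")  # signed
    if tr_flags & FIRST_SAMPLE_FLAGS_PRESENT:
        header["first_sample_flags"] = read(">I")

    # Version 0 carries unsigned composition offsets; otherwise signed.
    cto_format = ">I" if version == 0 else ">i"
    samples = []
    for _ in range(header["sample_count"]):
        sample = {}
        if tr_flags & SAMPLE_DURATION_PRESENT:
            sample["duration"] = read(">I")
        if tr_flags & SAMPLE_SIZE_PRESENT:
            sample["size"] = read(">I")
        if tr_flags & SAMPLE_FLAGS_PRESENT:
            sample["flags"] = read(">I")
        if tr_flags & SAMPLE_CTO_PRESENT:
            sample["composition_time_offset"] = read(cto_format)
        samples.append(sample)
    return header, samples
```

The same four bytes therefore decode differently by version: the bit pattern 0xFFFFFFFF is a large positive offset in version 0 and -1 otherwise.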
In 8.8, add the following:
8.8.12 Track fragment decode time
8.8.12.1 Definition
Box Type: ‘tfdt’
Container: Track Fragment box (‘traf’)
Mandatory: No
Quantity: Zero or one
The Track Fragment Base Media Decode Time Box provides the absolute decode time, measured on the
media timeline, of the first sample in decode order in the track fragment. This can be useful, for example,
when performing random access in a file; it is not necessary to sum the sample durations of all preceding
samples in previous fragments to find this value (where the sample durations are the deltas in the Decoding
Time to Sample Box and the sample_durations in the preceding track runs).
The Track Fragment Base Media Decode Time Box, if present, shall be positioned after the Track Fragment
Header Box and before the first Track Fragment Run box.
NOTE The decode timeline is a media timeline, established before any explicit or implied
mapping of media time to presentation time, for example by an edit list or similar structure.
8.8.12.2 Syntax
aligned(8) class TrackFragmentBaseMediaDecodeTimeBox
   extends FullBox(‘tfdt’, version, 0) {
   if (version==1) {
      unsigned int(64) baseMediaDecodeTime;
   } else { // version==0
      unsigned int(32) baseMediaDecodeTime;
   }
}
8.8.12.3 Semantics
version is an integer that specifies the version of this box (0 or 1 in this specification).
baseMediaDecodeTime is an integer equal to the sum of the decode durations of all earlier samples in
the media, expressed in the media's timescale. It does not include the samples added in the enclosing
track fragment.
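A minimal Python sketch of reading this box follows. The function name `parse_tfdt_payload` is hypothetical; it expects the bytes after the FullBox version and flags fields and returns the value in the media's timescale.

```python
import struct


def parse_tfdt_payload(payload, version):
    """Return baseMediaDecodeTime from a 'tfdt' body (hypothetical helper)."""
    if version not in (0, 1):
        raise ValueError("only versions 0 and 1 are defined")
    # Version 1 uses a 64-bit field, version 0 a 32-bit field.
    fmt = ">Q" if version == 1 else ">I"
    (base_media_decode_time,) = struct.unpack_from(fmt, payload, 0)
    return base_media_decode_time
```

Dividing the returned value by the media timescale gives the decode time in seconds of the first sample of the fragment, without summing any earlier sample durations.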
8.8.13 Level Assignment Box
8.8.13.1 Definition
Box Type: ‘leva’
Container: Movie Extends Box (‘mvex’)
Mandatory: No
Quantity: Zero or one
Levels specify subsets of the file. Samples mapped to level n may depend on any samples of levels m, where
m <= n, and shall not depend on any samples of levels p, where p > n. For example, levels can be specified
according to temporal level (e.g., temporal_id of SVC or MVC).
Levels cannot be specified for the initial movie. When the Level Assignment box is present, it applies to all
movie fragments subsequent to the initial movie.
For the context of the Level Assignment box, a fraction is defined to consist of one or more Movie Fragment
boxes and the associated Media Data boxes, possibly including only an initial part of the last Media Data Box.
Within a fraction, data for each level shall appear contiguously. Data for levels within a fraction shall appear in
increasing order of level value. All data in a fraction shall be assigned to levels.
NOTE In the context of DASH (ISO/IEC 23009-1), each subsegment indexed within a
Subsegment Index box is a fraction.
The Level Assignment box provides a mapping from features, such as scalability layers, to levels. A feature
can be specified through a track, a sub-track within a track, or a sample grouping of a track.
When padding_flag is equal to 1 this indicates that a conforming fraction can be formed by
concatenating any positive integer number of levels within a fraction and padding the last Media Data
box by zero bytes up to the full size that is indicated in the header of the last Media Data box. For
example, padding_flag can be set equal to 1 when the following conditions are true:
- Each fraction contains two or more AVC, SVC, or MVC [ISO/IEC 14496-15] tracks of the
same video bitstream.
- The samples for each track of a fraction are contiguous and in decoding order in a Media
Data box.
- The samples of the first AVC, SVC, or MVC level contain extractor NAL units for including the
video coding NAL units from the other levels of the same fraction.
8.8.13.2 Syntax
aligned(8) class LevelAssignmentBox extends FullBox(‘leva’, 0, 0)
{
   unsigned int(8) level_count;
   for (j=1; j <= level_count; j++) {
      unsigned int(32) track_id;
      unsigned int(1) padding_flag;
      unsigned int(7) assignment_type;
      if (assignment_type == 0)
         unsigned int(32) grouping_type;
      else if (assignment_type == 1) {
         unsigned int(32) grouping_type;
         unsigned int(32) grouping_type_parameter;
      }
      else if (assignment_type == 2) {} // no further syntax elements needed
      else if (assignment_type == 3) {} // no further syntax elements needed
      else if (assignment_type == 4)
         unsigned int(32) sub_track_id;
      // other assignment_type values are reserved
   }
}
8.8.13.3 Semantics
level_count specifies the number of levels each fraction is grouped into. level_count shall be
greater than or equal to 2.
track_id for loop entry j specifies the track identifier of the track assigned to level j.
padding_flag equal to 1 indicates that a conforming fraction can be formed by concatenating any
positive integer number of levels within a fraction and padding the last Media Data box by zero bytes
up to the full size that is indicated in the header of the last Media Data box. The semantics of
padding_flag equal to 0 are that this is not assured.
assignment_type indicates the mechanism used to specify the assignment to a level.
assignment_type values greater than 4 are reserved, while the semantics for the other values are
specified as follows. The sequence of assignment_types is restricted to be a set of zero or more of
type 2 or 3, followed by zero or more of exactly one type.
0: sample groups are used to specify levels, i.e., samples mapped to different sample group
description indexes of a particular sample grouping lie in different levels within the identified track;
other tracks are not affected and must have all their data in precisely one level;
1: as for assignment_type 0 except assignment is by a parameterized sample group;
2, 3: level assignment is by track (see the Subsegment Index Box for the difference in processing
of these levels)
4: the respective level contains the samples for a sub-track. The sub-tracks are specified through
the Sub Track box; other tracks are not affected and must have all their data in precisely one
level;
grouping_type and grouping_type_parameter, if present, specify the sample grouping used to
map sample group description entries in the Sample Group Description box to levels. Level n contains
the samples that are mapped to the sample group description entry having index n in the Sample
Group Description box having the same values of grouping_type and
grouping_type_parameter, if present, as those provided in this box.
sub_track_id specifies that the sub-track identified by sub_track_id within loop entry j is mapped to
level j.
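For illustration, the Level Assignment box body could be decoded with a Python sketch like the one below. The function name `parse_leva_payload` and the dictionary keys are assumptions of this sketch. The padding_flag is taken from the most significant bit of the byte it shares with assignment_type, on the assumption that fields are packed most significant bit first.

```python
import struct


def parse_leva_payload(payload):
    """Parse a 'leva' body following the FullBox version/flags fields.

    Returns one dict per level, in loop order (entry j describes level j).
    Hypothetical helper, not a library API.
    """
    level_count = payload[0]
    pos = 1
    levels = []
    for _ in range(level_count):
        track_id, packed = struct.unpack_from(">IB", payload, pos)
        pos += 5
        assignment_type = packed & 0x7F
        level = {
            "track_id": track_id,
            "padding_flag": packed >> 7,
            "assignment_type": assignment_type,
        }
        if assignment_type == 0:
            (level["grouping_type"],) = struct.unpack_from(">I", payload, pos)
            pos += 4
        elif assignment_type == 1:
            (level["grouping_type"],
             level["grouping_type_parameter"]) = struct.unpack_from(">II", payload, pos)
            pos += 8
        elif assignment_type == 4:
            (level["sub_track_id"],) = struct.unpack_from(">I", payload, pos)
            pos += 4
        elif assignment_type not in (2, 3):
            # Values greater than 4 are reserved.
            raise ValueError("reserved assignment_type %d" % assignment_type)
        levels.append(level)
    return levels
```

The sketch does not enforce that level_count is at least 2 or the restriction on the sequence of assignment types; a validating reader would add those checks.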
8.8.14 Sample Auxiliary Information in Movie Fragments
When sample auxiliary information (8.7.8 and 8.7.9) is present in the Movie Fragment box, the offsets in the
Sample Auxiliary Information Offsets Box are treated the same as the data_offset in the Track Fragment
Run box, that is, they are relative to any base data offset established for that track fragment. If movie fragment
relative addressing is used (no base data offset is provided in the track fragment header) and auxiliary
information is present, then the default_base_is_moof flag must also be set in the flags of that track
fragment header.
If only one offset is provided, then the Sample Auxiliary Information for all the track runs in the fragment is
stored contiguously, otherwise exactly one offset must be provided for each track run.
If the field default_sample_info_size is non-zero in one of these boxes, then the size of the auxiliary
information is constant for the identified samples.
In addition, if:
- this box is present in the movie box,
- and default_sample_info_size is non-zero in the box in the movie box,
- and the sample auxiliary information sizes box is absent in a movie fragment,
then the auxiliary information has this same constant size for every sample in the movie fragment also; it is
then not necessary to repeat the box in the movie fragment.
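The inheritance rule above can be sketched in Python as follows. The function `fragment_aux_info_sizes` and its parameters are hypothetical, and the caller is assumed to have already matched the boxes by aux_info_type and aux_info_type_parameter. A size of 0 stands for "no auxiliary information".

```python
def fragment_aux_info_sizes(sample_count, fragment_sizes=None, movie_default_size=0):
    """Resolve per-sample auxiliary information sizes for one movie fragment.

    sample_count       -- number of samples in the track fragment
    fragment_sizes     -- sizes from the sizes box in the fragment, or None
                          if that box is absent
    movie_default_size -- default_sample_info_size from the box in the movie
                          box (0 if that box is absent or sizes vary)
    """
    if fragment_sizes is not None:
        if len(fragment_sizes) > sample_count:
            raise ValueError("more sizes than samples in the fragment")
        # Only the initial samples carry auxiliary information.
        return list(fragment_sizes) + [0] * (sample_count - len(fragment_sizes))
    if movie_default_size:
        # Box absent in the fragment: the constant movie-level size applies.
        return [movie_default_size] * sample_count
    return [0] * sample_count
```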
In 8.9.3.1, change header as follows:
Box Type: ‘sgpd’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more, with one for each Sample to Group Box.
Add to the end of 8.9.4:
Zero or more SampleGroupDefinition boxes may also be present in a Track Fragment Box. These definitions
are additional to the definitions provided in the Sample Table of the track in the Movie Box. Group definitions
within a movie fragment can also be referenced and used from within that same movie fragment.
Within the SampleToGroup box in that movie fragment, the group description indexes for groups defined
within the same fragment start at 0x10001, i.e. the index value 1, with the value 1 in the top 16 bits. This
means there must be fewer than 65536 group definitions for this track and grouping type in the sample table in
the Movie Box.
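A reader might separate the two index ranges as in the following Python sketch. The function name `resolve_group_description_index` and the scope labels are hypothetical. Treating index 0 as "member of no group" follows the base standard, and rejecting the value 0x10000 is this sketch's reading of the limits stated above.

```python
FRAGMENT_LOCAL_BASE = 0x10000  # value 1 in the top 16 bits


def resolve_group_description_index(index):
    """Return (scope, entry_number) for a group description index.

    scope is "none", "movie" (defined in the sample table of the Movie Box)
    or "fragment" (defined in the same movie fragment).
    """
    if index == 0:
        return ("none", 0)
    if index == FRAGMENT_LOCAL_BASE:
        raise ValueError("0x10000 is neither a movie-level nor a fragment-local index")
    if index > FRAGMENT_LOCAL_BASE:
        return ("fragment", index - FRAGMENT_LOCAL_BASE)
    return ("movie", index)
```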
When changing the size of movie fragments, or removing them, these fragment-local group definitions will
need to be merged into the definitions in the movie box, or into the new movie fragments, and the index
numbers in the SampleToGroup box(es) adjusted accordingly. It is recommended that, in this process,
identical (and hence duplicate) definitions not be made in any SampleGroupDescription box, but that
duplicates be merged and the indexes adjusted accordingly.
In 8.12.1.1, change the box header:
Box Types: ‘sinf’
Container: Protected Sample Entry, or Item Protection Box (‘ipro’)
Mandatory: Yes
Quantity: One or More
And add the following paragraph:
At least one protection scheme information box must occur in a protected sample entry. When more than one
occurs, they are equivalent, alternative, descriptions of the same protection. Readers should choose one to
process.
Following 8.15, add:
8.16 Segments
8.16.1 Introduction
Media presentations may be divided into segments for delivery. For example, it is possible (e.g. in HTTP
streaming) to form files that contain a segment, or concatenated segments, which would not necessarily
form ISO base media file format compliant files (e.g. they do not contain a movie box).
This Subclause defines specific boxes that may be used in such segments.
8.16.2 Segment Type Box
Box Type: ‘styp’
Container: File
Mandatory: No
Quantity: Zero or more
If segments are stored in separate files (e.g. on a standard HTTP server) it is recommended that these
‘segment files’ contain a segment-type box, which must be first if present, to enable identification of those files,
and declaration of the specifications with which they are compliant.
A segment type has the same format as an 'ftyp' box [4.3], except that it takes the box type 'styp'. The
brands within it may include the same brands that were included in the 'ftyp' box that preceded the
‘moov’ box, and may also include additional brands to indicate the compatibility of this segment with various
specification(s).
Valid segment type boxes shall be the first box in a segment. Segment type boxes may be removed if
segments are concatenated (e.g. to form a full file), but this is not required. Segment type boxes that are not
first in their files may be ignored.
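Because the segment type box shares the layout of the 'ftyp' box, its body can be read with the same logic. The Python sketch below assumes the 'ftyp' layout from the base standard (a major brand, a minor version, then compatible brands filling the rest of the box); the function name `parse_styp_payload` is hypothetical.

```python
import struct


def parse_styp_payload(payload):
    """Parse the body of a 'styp' box (everything after the box header).

    Returns (major_brand, minor_version, compatible_brands).
    """
    if len(payload) < 8 or (len(payload) - 8) % 4:
        raise ValueError("body must hold 8 bytes plus whole 4-byte brands")
    major_brand = payload[0:4].decode("ascii")
    (minor_version,) = struct.unpack_from(">I", payload, 4)
    # The remaining bytes are a list of four-character compatible brands.
    compatible_brands = [payload[i:i + 4].decode("ascii")
                         for i in range(8, len(payload), 4)]
    return major_brand, minor_version, compatible_brands
```

A reader would typically apply this only to a segment type box that is first in its file, since later ones may be ignored.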
8.16.3 Segment Index Box
8.16.3.1 Definition
Box Type: ‘sidx’
Container: File
Mandatory: No
Quantity: Zero or more
The Segment Index box ('sidx') provides a compact index of one media stream within the media segment to
which it applies. It is designed so that it can be used not only with media formats based on this specification
(i.e. segments containing sample tables or movie fragments), but also other media formats (for example,
MPEG-2 Transport Streams [ISO/IEC 13818-1]). For this reason, the formal description of the box given here
is deliberately generic, and then at the end of this Subclause the specific definitions for segments using movie
fragments are given.
Each Segment Index box documents how a (sub)segment is divided into one or more subsegments (which
may themselves be further subdivided using Segment Index boxes).
A subsegment is defined as a time interval of the containing (sub)segment, and corresponds to a single range
of bytes of the containing (sub)segment. The durations of all the subsegments sum to the duration of the
containing (sub)segment.
Each entry in the Segment Index box contains a reference type that indicates whether the reference points
directly to the media bytes of a referenced leaf subsegment, or to a Segment Index box that describes how
the referenced subsegment is further subdivided; as a result, the segment may be indexed in a ‘hierarchical’
or ‘daisy-chain’ or other form by documenting time and byte offset information for other Segment Index boxes
applying to portions of the same (sub)segment.
Each Segment Index box provides information about a single media stream of the Segment, referred to as the
reference stream. If provided, the first Segment Index box in a segment, for a given media stream, shall
document the entirety of that media stream in the segment, and shall precede any other Segment Index box in
the segment for the same media stream.
If a segment index is present for at least one media stream but not all media streams in the segment, then
normally a media stream in which not every access unit is independently coded, such as video, is selected to
be indexed. For any media stream for which no segment index is present, referred to as a non-indexed stream,
the media stream associated with the first Segment Index box in the segment serves as a reference stream in
the sense that it also describes the subsegments for any non-indexed media stream.
NOTE 1 Further restrictions may be specified in derived specifications.
Segment Index boxes may be inline in the same file as the indexed media or, in some cases, in a separate file
containing only indexing information.
A Segment Index box contains a sequence of references to subsegments of the (sub)segment documented by
the box. The referenced subsegments are contiguous in presentation time. Similarly, the bytes referred to by a
Segment Index box are always contiguous within both the media file and the separate index segment, or within
the single file if indexes are placed within the media file. The referenced size gives the number of
bytes in the material referenced.
NOTE 2 A media segment may be indexed by more than one “top-level” Segment Index box; these boxes
are independent of each other, and each indexes one media stream within the media segment. In
segments containing multiple media streams the referenced bytes may contain media from multiple
streams, even though the Segment Index box provides timing information for only one media stream.
In the file containing the Segment Index box, the anchor point for a Segment Index box is the first byte after
that box. If there are two files, the anchor point in the media file is the beginning of the top-level segment (i.e.
the beginning of the segment file if each segment is stored in a separate file). The material in the file
containing media (which may also be the file that contains the segment index boxes) starts at the indicated
offset from the anchor point. If there are two files, the material in the index file starts at the anchor point, i.e.
immediately following the Segment Index box.
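For the integrated-file case, where the index and the media share one file, the anchor-point rule can be sketched as follows. The function name and parameters are hypothetical, and the sketch assumes the caller already knows the file offset and total size of the Segment Index box. The inclusive (first byte, last byte) form is a choice made here because it maps directly onto HTTP byte-range requests; it is not mandated by the text.

```python
from typing import List, Tuple


def subsegment_byte_ranges(
    sidx_box_offset: int,
    sidx_box_size: int,
    first_offset: int,
    referenced_sizes: List[int],
) -> List[Tuple[int, int]]:
    """Return inclusive (first_byte, last_byte) file offsets per reference.

    Assumes an integrated file, so every referenced item lives in the same
    file as the Segment Index box and the referenced bytes are contiguous.
    """
    # The anchor point is the first byte after the Segment Index box.
    anchor = sidx_box_offset + sidx_box_size
    start = anchor + first_offset
    ranges = []
    for size in referenced_sizes:
        ranges.append((start, start + size - 1))
        start += size  # the next referenced item follows immediately
    return ranges


# A 44-byte box at offset 100 with first_offset 0 and two references.
example_ranges = subsegment_byte_ranges(100, 44, 0, [1000, 500])
```

When the index is kept in a separate file, the anchor in the media file is instead the beginning of the top-level segment, so `anchor` would be computed differently.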
Within the two constraints (a) that, in time, the subsegments are contiguous, that is, each entry in the loop is
consecutive from the immediately preceding one and (b) within a given file (integrated file, media file, or index
side file) the referenced bytes are contiguous, there are a number of possibilities, including:
1) a reference to a segment index box may include, in its byte count, immediately following Segment
Index boxes that document subsegments;
2) in an integrated file, using the first_offset field, it is possible to separate Segment Index boxes
from the media that they refer to;
3) in an integrated file, it is possible to locate Segment Index boxes for subsegments close to the media
they index;
4) when a separate file containing Segment Indexes is used, it is possible for the loop entries to be of
‘mixed type’, some to Segment Index boxes in the index segment, some to media subsegments in the
media file.
NOTE 3 Profiles may be used to restrict the placement of segment indexes, or the overall
complexity of the indexing.
The Segment Index box documents the presence of Stream Access Points (SAPs), as specified in Annex I, in
the referenced subsegments. The annex specifies characteristics of SAPs, such as I_SAU, I_SAP and T_SAP, as well
as SAP types, which are all used in the semantics below. A subsegment starts with a SAP when the
subsegment contains a SAP, and for the first SAP, I_SAU is the index of the first access unit that follows I_SAP,
and I_SAP is contained in the subsegment.
For segments based on this specification (i.e. based on movie sample tables or movie fragments):
an access unit is a sample;
a subsegment is a self-contained set of one or more consecutive movie fragments; a self-contained
set contains one or more Movie Fragment boxes with the corresponding Media Data box(es), and a
Media Data Box containing data referenced by a Movie Fragment Box must follow that Movie
Fragment box and precede the next Movie Fragment box containing information about the same
track;
Segment Index boxes shall be placed before subsegment material they document, that is, before any
Movie Fragment (‘moof’) box of the documented material of the subsegment;
streams are tracks in the file format, and stream IDs are track IDs;
a subsegment contains a stream access point if a track fragment within the subsegment for the track
with track_ID equal to reference_ID contains a stream access point;
initialisation data for SAPs consists of the movie box;
presentation times are in the movie timeline, that is they are composition times after the application of
any edit list for the track;
the I_SAP is a position pointing exactly to the start of a top-level box, such as a movie fragment box
'moof';
a SAP of type 1 or type 2 is indicated as a sync sample, or by sample_is_not_sync_sample
equal to 0 in the movie fragment;
a SAP of type 3 is marked as a member of a sample group of type 'rap ';
a SAP of type 4 is marked as a member of a sample group of type 'roll' where the value of the
roll_distance field is greater than 0.
NOTE 4 For SAPs of type 5 and 6, no specific signalling in the ISO base media file format is
supported.
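The signalling rules in the list above can be summarised as a small lookup. This is only a sketch: the function name and its three inputs are hypothetical and stand for information a reader would already have extracted from the sample flags and sample groups. Checking the sync-sample indication first is a choice made here; the text does not define a priority between the signals.

```python
from typing import Optional, Tuple


def signalled_sap_types(
    is_sync_sample: bool,
    in_rap_group: bool,
    roll_distance: Optional[int],
) -> Tuple[int, ...]:
    """Return the SAP types a sample may have, given its file-format signalling.

    roll_distance is None when the sample is not in a 'roll' sample group.
    An empty tuple means no SAP is signalled (types 5 and 6 have no
    specific signalling in the file format).
    """
    if is_sync_sample:
        return (1, 2)  # sync samples cover SAP type 1 or type 2
    if in_rap_group:
        return (3,)    # member of a sample group of type 'rap '
    if roll_distance is not None and roll_distance > 0:
        return (4,)    # member of a 'roll' group with roll_distance > 0
    return ()
```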
8.16.3.2 Syntax
aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
   unsigned int(32) reference_ID;
   unsigned int(32) timescale;
   if (version==0)
   {
      unsigned int(32) earliest_presentation_time;
      unsigned int(32) first_offset;
   }
   else
   {
      unsigned int(64) earliest_presentation_time;
      unsigned int(64) first_offset;
   }
   unsigned int(16) reserved = 0;
   unsigned int(16) reference_count;
   for(i=1; i <= reference_count; i++)
   {
      bit(1)           reference_type;
      unsigned int(31) referenced_size;
      unsigned int(32) subsegment_duration;
      bit(1)           starts_with_SAP;
      unsigned int(3)  SAP_type;
      unsigned int(28) SAP_delta_time;
   }
}
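As an illustration of the layout above, the following Python sketch decodes the fields of a Segment Index box. The class and function names are hypothetical, and the input is assumed to be the box payload only, that is, the bytes after the 8-byte box header (32-bit size and the four-character type); boxes with a 64-bit size are not handled. All fields are big-endian.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class SidxReference:
    reference_type: int       # 1: refers to a 'sidx' box, 0: refers to media
    referenced_size: int      # in bytes
    subsegment_duration: int  # in timescale ticks
    starts_with_sap: int
    sap_type: int
    sap_delta_time: int


@dataclass
class SegmentIndex:
    version: int
    reference_id: int
    timescale: int
    earliest_presentation_time: int
    first_offset: int
    references: List[SidxReference]


def parse_sidx_payload(payload: bytes) -> SegmentIndex:
    """Parse a 'sidx' payload starting at the FullBox version byte."""
    version = payload[0]  # FullBox: 1 byte version, then 3 bytes of flags
    pos = 4
    reference_id, timescale = struct.unpack_from(">II", payload, pos)
    pos += 8
    if version == 0:      # 32-bit time and offset fields
        ept, first_offset = struct.unpack_from(">II", payload, pos)
        pos += 8
    else:                 # 64-bit time and offset fields
        ept, first_offset = struct.unpack_from(">QQ", payload, pos)
        pos += 16
    _reserved, reference_count = struct.unpack_from(">HH", payload, pos)
    pos += 4
    references = []
    for _ in range(reference_count):
        # Each loop entry is three 32-bit words; the first and third are
        # bit-packed as 1+31 bits and 1+3+28 bits respectively.
        word1, duration, word3 = struct.unpack_from(">III", payload, pos)
        pos += 12
        references.append(SidxReference(
            reference_type=word1 >> 31,
            referenced_size=word1 & 0x7FFFFFFF,
            subsegment_duration=duration,
            starts_with_sap=word3 >> 31,
            sap_type=(word3 >> 28) & 0x7,
            sap_delta_time=word3 & 0x0FFFFFFF,
        ))
    return SegmentIndex(version, reference_id, timescale, ept,
                        first_offset, references)
```

The sketch does no validation; a production reader would also want to check that the payload is long enough for `reference_count` entries.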
8.16.3.3 Semantics
reference_ID provides the stream ID for the reference stream; if this Segment Index box is referenced
from a “parent” Segment Index box, the value of reference_ID shall be the same as the value of
reference_ID of the “parent” Segment Index box;
timescale provides the timescale, in ticks per second, for the time and duration fields within this box; it
is recommended that this match the timescale of the reference stream or track; for files based on this
specification, that is the timescale field of the Media Header Box of the track;
earliest_presentation_time is the earliest presentation time of any access unit in the reference
stream in the first subsegment, in the timescale indicated in the timescale field;
first_offset is the distance in bytes, in the file containing media, from the anchor point, to the first
byte of the indexed material;
reference_count provides the number of referenced items;
reference_type: when set to 1 indicates that the reference is to a segment index (‘sidx’) box;
otherwise the reference is to media content (e.g., in the case of files based on this specification, to a
movie fragment box); if a separate index segment is used, then entries with reference type 1 are in
the index segment, and entries with reference type 0 are in the media file;
referenced_size: the distance in bytes from the first byte of the referenced item to the first byte of the
next referenced item, or in the case of the last entry, the end of the referenced material;
subsegment_duration: when the reference is to a Segment Index box, this field carries the sum of the
subsegment_duration fields in that box; when the reference is to a subsegment, this field carries
the difference between the earliest presentation time of any access unit of the reference stream in the
next subsegment (or the first subsegment of the next segment, if this is the last subsegment of the
segment, or the end presentation time of the reference stream if this is the last subsegment of the
stream) and the earliest presentation time of any access unit of the reference stream in the referenced
subsegment; the duration is in the same units as earliest_presentation_time;
starts_with_SAP indicates whether the referenced subsegments start with a SAP. For the detailed
semantics of this field in combination with other fields, see the table below.
SAP_type indicates a SAP type as specified in Annex I, or the value 0. Other type values are reserved.
For the detailed semantics of this field in combination with other fields, see the table below.
SAP_delta_time: indicates T_SAP of the first SAP, in decoding order, in the referenced subsegment for
the reference stream. If the referenced subsegments do not contain a SAP, SAP_delta_time is
reserved with the value 0; otherwise SAP_delta_time is the difference between the earliest
presentation time of the subsegment, and the T_SAP (note that this difference may be zero, in the case
that the subsegment starts with a SAP).
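The timing fields combine as in the following sketch, which is an illustration rather than normative arithmetic. The function name and tuple layout are hypothetical. It assumes the loop entries are contiguous in presentation time, so each subsegment starts where the previous one ended, and that the SAP time is the earliest presentation time of the subsegment plus SAP_delta_time (the returned SAP time is only meaningful when the subsegment actually contains a SAP). Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
from typing import List, Tuple


def subsegment_times(
    timescale: int,
    earliest_presentation_time: int,
    entries: List[Tuple[int, int]],
) -> List[Tuple[Fraction, Fraction, Fraction]]:
    """Convert (subsegment_duration, SAP_delta_time) ticks to seconds.

    Returns one (start, end, t_sap) tuple per entry, all in seconds.
    """
    times = []
    start = earliest_presentation_time
    for duration, sap_delta_time in entries:
        times.append((
            Fraction(start, timescale),
            Fraction(start + duration, timescale),
            Fraction(start + sap_delta_time, timescale),
        ))
        start += duration  # subsegments are contiguous in time
    return times


# 90 kHz timescale, index starting at 10 s, two subsegments of 2 s and 1 s.
example_times = subsegment_times(90000, 900000, [(180000, 0), (90000, 4500)])
```

In this example the second subsegment starts at 12 s and its first SAP is 4500 ticks (0.05 s) later.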
Table 1 — Semantics of SAP and reference type combinations
starts_with_SAP | SAP_type          | reference_type | Meaning
0               | 0                 | 0 or 1         | No information on SAPs is provided.
0               | 1 to 6, inclusive | 0 (media)      | The subsegment contains (but may not start with) a SAP of the given SAP_type, and the first SAP of the given SAP_type corresponds to SAP_delta_time.
0               | 1 to 6, inclusive | 1 (index)      | All the referenced subsegments contain a SAP of at most the given SAP_type, and none of these SAPs is of an unknown type.
1               | 0                 | 0 (media)      | The subsegment starts with a SAP of an unknown type.
1               | 0                 | 1 (index)      | All the referenced subsegments start with a SAP, which may be of an unknown type.
1               | 1 to 6, inclusive | 0 (media)      | The referenced subsegment starts with a SAP of the given SAP_type.
1               | 1 to 6, inclusive | 1 (index)      | All the referenced subsegments start with a SAP of at most the given SAP_type, and none of these SAPs is of an unknown type.
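The combinations in Table 1 can be expressed as a small decision function. This is a sketch with a hypothetical function name, and the returned strings paraphrase the table rather than quote it; "type <= N" stands for "of at most the given SAP_type".

```python
def describe_sap(starts_with_sap: int, sap_type: int, reference_type: int) -> str:
    """Return a short reading of one Table 1 combination."""
    if starts_with_sap not in (0, 1) or reference_type not in (0, 1):
        raise ValueError("starts_with_SAP and reference_type are 1-bit fields")
    if not 0 <= sap_type <= 6:
        raise ValueError("SAP_type values other than 0 to 6 are reserved")
    to_index = reference_type == 1
    if sap_type == 0:
        if starts_with_sap == 0:
            return "no SAP information"
        if to_index:
            return ("all referenced subsegments start with a SAP, "
                    "possibly of unknown type")
        return "starts with a SAP of unknown type"
    verb = "start with" if starts_with_sap else "contain"
    if to_index:
        return f"all referenced subsegments {verb} a SAP of type <= {sap_type}"
    if starts_with_sap:
        return f"starts with a SAP of type {sap_type}"
    return f"contains a SAP of type {sap_type}; first one located by SAP_delta_time"
```

A client looking for a random access point would typically prefer entries for which this reading begins with "starts with", since those subsegments can be decoded from their first byte.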
8.16.4 Subsegment Index Box
8.16.4.1 Definition
Box Type: 'ssix'
Container: File
Mandatory: No
Quantity: Zero or more
The Subsegment Index box ('ssix') provides a mapping from levels (as specified by the Level Assignment box)
to byte ranges of the indexed subsegment. In other words, this box provides a compact index for how the data
in a subsegment is ordered according to levels into partial subsegments. It enables a client to easily access
data for partial subsegments by downloading ranges of data in the subsegment.
Each byte in the subsegment shall be assigned to a level. If the range is not associated with any information in
the level assignment, then any level that is not included in the level assignment may be used.
There shall be 0 or 1 Subsegment Index boxes per Segment Index box that indexes only leaf
subsegments, i.e. that only indexes subsegments but no segment indexes. A Subsegment Index box, if any,
shall be the next box after the associated Segment Index box. A Subsegment Index box documents the
subsegment that is indicated in the immediately preceding Segment Index box.
In general, the media data constructed from the byte ranges is incomplete, i.e. it does not conform to the
media format of the entire subsegment.
For leaf subsegments based on this specification (i.e. based on movie sample tables and movie fragments):
Each level shall be assigned to exactly one partial subsegment, i.e. byte ranges for one level shall be
contiguous.
Levels of partial subsegments shall be assigned by increasing numbers within a subsegment, i.e.,
samples of a partial subsegment may depend on any samples of preceding partial subsegments in
the same subsegment, but not the other way around. For example, each partial subsegment contains
samples having an identical temporal level and partial subsegments appear in increasing temporal
level order within the subsegment.
When a partial subsegment is accessed in this way, for any assignment_type other than 3, the
final Media Data box may be incomplete, that is, less data is accessed than the length indication of
the Media Data Box indicates is present. The length of the Media Data box may need adjusting, or
padding used. The padding_flag in the Level Assignment Box indicates whether this missing data
can be replaced by zeros. If not, the sample data for samples assigned to levels that are not accessed
is not present, and care should be taken not to attempt to process such samples.
NOTE assignment_type equal to 0 (specified in the Level Assignment box 'leva') can be used,
for example, together with the temporal level sample grouping ('tele') when frames of a video bitstream
are temporally ordered within subsegments; assignment_type equal to 2 can be used, for example,
when each view of a multiview video bitstream is contained in a separate track and the track fragments
for all the views are contained in a single movie fragment. assignment_type equal to 3 may be used,
...