Information technology — Multimedia content description interface — Part 15: Compact descriptors for video analysis

This document addresses descriptor technology for search and retrieval applications, i.e. for visual content matching in video. Visual content matching includes matching of views of large and small objects and scenes, with robustness to partial occlusions as well as changes in vantage point, camera parameters and lighting conditions. The objects of interest comprise planar or non-planar, rigid or partially rigid, textured or partially textured objects, but exclude the identification of people and faces. The databases can be large, for example broadcast archives or videos available on the internet. Such applications thus require video descriptors that enable matching with smaller descriptor sizes and shorter runtimes as compared to applications enabled by single-frame (still image) descriptors (e.g. CDVS, ISO/IEC 15938-13) in the video domain. Compact descriptors for video analysis for search and retrieval applications: enable design of interoperable object instance search applications; minimize the size of video descriptors; ensure high matching performance of objects (in terms of accuracy and complexity); and enable efficient implementation of those functionalities on professional or embedded systems. This document provides a complementary tool to the suite of existing standards, such as ISO/IEC 15938-13.

Technologies de l'information — Interface de description du contenu multimédia — Partie 15: Descripteurs compacts pour analyse de vidéo

General Information

Status: Published
Publication Date: 14-Jul-2019

ICS: 35.040.40 - Coding of audio, video, multimedia and hypermedia information

Technical Committee: ISO/IEC JTC 1/SC 29 - Coding of audio, picture, multimedia and hypermedia information
Drafting Committee: ISO/IEC JTC 1/SC 29 - Coding of audio, picture, multimedia and hypermedia information

Current Stage: 9060 - Close of review
Completion Date: 04-Mar-2030

Relations

Consolidated By: ISO 19901-4:2025 - Oil and gas industries including lower carbon energy — Specific requirements for offshore structures — Part 4: Geotechnical design considerations
Effective Date: 06-Jun-2022

Overview

ISO/IEC 15938-15:2019 - Compact descriptors for video analysis (CDVA) specifies a standardized descriptor format and extraction/encoding process for visual content matching in video. The standard targets search and retrieval applications that require robust, compact video descriptors for matching object instances and scenes across large video databases (broadcast archives, internet video). It emphasizes small descriptor size, short runtimes, interoperability and efficient implementation on professional and embedded systems. The scope excludes person or face identification.

Key topics and requirements

CDVA descriptor structure
- Defines descriptor components including global descriptors, local descriptors, and deep feature descriptors for video analysis (see Clause 6).
- Specifies a binary bitstream syntax and header/segment structures to enable interoperable exchange (see Clause 5).
Extraction and encoding procedures
- Normative steps for extracting and encoding CDVA descriptors to minimize size while preserving matching performance (Clause 6).
- Annex A provides recommended parameter values for the encoding process.
Deep feature extraction
- Parameters and a neural network model for deep features are described (Annex B), enabling compact learned representations suitable for video.
Robustness and performance goals
- Designed for robustness to partial occlusion, viewpoint and camera changes, and varying lighting conditions.
- Prioritizes accuracy vs. complexity trade-offs and reduced runtime compared with single-frame visual descriptors.
Interoperability and implementation
- Bitstream syntax, descriptor semantics and recommended parameters to support interoperable object instance search systems and efficient implementation on embedded and professional platforms.

Applications and users

Video search & retrieval platforms
- Large-scale content matching across broadcast archives, streaming libraries, and user-generated video repositories.
Media asset management
- Automated cataloging, duplicate detection and content-based retrieval for broadcasters and OTT services.
Embedded and edge devices
- Low-latency visual matching on cameras, mobile devices and edge analytics systems that require small, efficient descriptors.
Research and product development
- Developers of visual search engines, video analytics tools and computer vision systems seeking a standardized descriptor format.

Typical users: software engineers, system architects, multimedia indexing teams, researchers in computer vision and companies building video search, content monitoring or asset-management solutions.

Related standards

ISO/IEC 15938-13:2015 - Compact descriptors for visual search (still-image descriptors); CDVA complements this for video.
Other parts of the ISO/IEC 15938 (MPEG-7) series covering systems, description languages and profiles.

Standard

ISO/IEC 15938-15:2019 - Information technology — Multimedia content description interface — Part 15: Compact descriptors for video analysis
Released: 7/15/2019

English language (32 pages)

Frequently Asked Questions

What is ISO/IEC 15938-15:2019?

ISO/IEC 15938-15:2019 is a standard published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Information technology — Multimedia content description interface — Part 15: Compact descriptors for video analysis". It addresses descriptor technology for search and retrieval applications, i.e. visual content matching in video; the complete scope statement is given in Clause 1 (Scope) of the standard.

What ICS categories does ISO/IEC 15938-15:2019 belong to?

ISO/IEC 15938-15:2019 is classified under the following ICS (International Classification for Standards) categories: 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

What standards are related to ISO/IEC 15938-15:2019?

ISO/IEC 15938-15:2019 has the following relationships with other standards: it has an inter-standard link to ISO 19901-4:2025. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

Standards Content (Sample)

INTERNATIONAL ISO/IEC
STANDARD 15938-15
First edition
2019-07
Information technology — Multimedia
content description interface —
Part 15:
Compact descriptors for video analysis
Technologies de l'information — Interface de description du contenu
multimédia —
Partie 15: Descripteurs compacts pour analyse de vidéo
© ISO/IEC 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms, operators, mnemonics, functions and symbols
4.1 General
4.2 Abbreviated terms
4.3 Arithmetic operators
4.4 Logical operators
4.5 Relational operators
4.6 Bitwise operators
4.7 Interval specification
4.8 Mnemonics
4.9 Functions
4.10 Symbols
5 CDVA bitstream syntax
5.1 CDVA descriptor
5.1.1 Binary representation syntax
5.1.2 Descriptor component semantics
5.2 CDVA header
5.2.1 Binary representation syntax
5.2.2 Descriptor component semantics
5.3 Segment header
5.3.1 General
5.3.2 Binary representation syntax
5.3.3 Descriptor component semantics
5.4 Global descriptor
5.4.1 Binary representation syntax
5.4.2 Descriptor component semantics
5.5 Local descriptor
5.5.1 General
5.5.2 Local feature descriptor
5.5.3 Local descriptor locations
5.6 Deep feature descriptor
5.6.1 Binary representation syntax
5.6.2 Descriptor component semantics
6 CDVA descriptor
6.1 Components
6.1.1 General
6.1.2 Global descriptor
6.1.3 Local descriptor
6.1.4 Deep feature descriptor
6.2 Encoding procedure
6.2.1 General
6.2.2 Normative steps
6.2.3 Informative steps
Annex A (normative) Recommended parameter values
Annex B (normative) Parameters of the deep feature extraction process
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that
are members of ISO or IEC participate in the development of International Standards through
technical committees established by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work.
The procedures used to develop this document and those intended for its further maintenance are
specified in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see:
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 15938 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at http://www.iso.org/members.html.

Introduction
ISO/IEC 15938 (all parts), also known as "Multimedia content description interface", provides a
standardized set of technologies for describing multimedia content. It addresses a broad spectrum of
multimedia applications and requirements by providing a metadata system for describing the features
of multimedia content.
The following are specified in ISO/IEC 15938 (all parts):
Description schemes (DS) describe entities or relationships pertaining to multimedia content.
Description schemes specify the structure and semantics of their components, which may be description
schemes, descriptors or datatypes.
Descriptors (D) describe features, attributes or groups of attributes of multimedia content.
Datatypes are the basic reusable datatypes employed by description schemes and descriptors.
Description definition language (DDL) defines description schemes, descriptors and datatypes by
specifying their syntax, and allows their extension.
Systems tools support delivery of descriptions, multiplexing of descriptions with multimedia content,
synchronization, file format, etc.
The ISO/IEC 15938 series is subdivided into 15 published parts with further parts in development:
— Part 1: Systems: specifies the tools for preparing descriptions for efficient transport and storage,
compressing descriptions, and allowing synchronization between content and descriptions.
— Part 2: Description definition language: specifies the language for defining the series set of
description tools (DSs, Ds and datatypes) and for defining new description tools.
— Part 3: Visual: specifies the description tools pertaining to visual content.
— Part 4: Audio: specifies the description tools pertaining to audio content.
— Part 5: Multimedia description schemes: specifies the generic description tools pertaining to
multimedia including audio and visual content.
— Part 6: Reference software: provides a software implementation of the series.
— Part 7: Conformance testing: specifies the guidelines and procedures for testing conformance of
implementations of the series.
— Part 8: Extraction and use of MPEG-7 descriptions: provides guidelines and examples of the
extraction and use of descriptions.
— Part 9: Profiles and levels: provides guidelines and standard profiles.
— Part 10: Schema definition: specifies the schema using description definition language.
— Part 11: MPEG-7 profile schemas: listing of profile schemas using description definition language.
— Part 12: Query format: contains the tools of the MPEG query format (MPQF).
— Part 13: Compact descriptors for visual search: specifies an image description tool for visual
search applications.
— Part 14: Reference software, conformance and usage guidelines for compact descriptors for
visual search: provides the reference software and guidelines, specifies the conformance testing.
— Part 15: Compact descriptors for video analysis (this document): specifies a video description
tool designed to enable efficient and interoperable video analysis applications, allowing visual
content matching in videos.

The structure of this document is as follows:
— Clause 5 specifies the binary representation syntax and descriptor component semantics for a
CDVA descriptor.
— Clause 6 specifies the extraction and encoding process for a CDVA descriptor.
— Annex A specifies recommended values for the parameters of the encoding process of Clause 6.
— Annex B specifies parameters and a neural network model of the deep feature extraction process.
The International Organization for Standardization (ISO) and International Electrotechnical
Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may
involve the use of a patent.
ISO and IEC take no position concerning the evidence, validity and scope of this patent right. The
holder of this patent right has assured ISO and IEC that he/she is willing to negotiate licences under
reasonable and non-discriminatory terms and conditions with applicants throughout the world. In this
respect, the statement of the holder of this patent right is registered with ISO and IEC. Information may
be obtained from:
Joanneum Research Forschungsgesellschaft mbH
Leonhardstrasse 59
8010 Graz, Austria
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights other than those identified above. ISO and IEC shall not be held responsible for identifying
any or all such patent rights.

INTERNATIONAL STANDARD ISO/IEC 15938-15:2019(E)
Information technology — Multimedia content description
interface —
Part 15:
Compact descriptors for video analysis
1 Scope
This document addresses descriptor technology for search and retrieval applications, i.e. for visual
content matching in video. Visual content matching includes matching of views of large and small
objects and scenes, with robustness to partial occlusions as well as changes in vantage point, camera
parameters and lighting conditions. The objects of interest comprise planar or non-planar, rigid or
partially rigid, textured or partially textured objects, but exclude the identification of people and
faces. The databases can be large, for example broadcast archives or videos available on the internet.
Such applications thus require video descriptors that enable matching with smaller descriptor sizes
and shorter runtimes as compared to applications enabled by single-frame (still image) descriptors
(e.g. CDVS, ISO/IEC 15938-13) in the video domain.
Compact descriptors for video analysis for search and retrieval applications:
— enable design of interoperable object instance search applications;
— minimize the size of video descriptors;
— ensure high matching performances of objects (in terms of accuracy and complexity);
— enable efficient implementation of those functionalities on professional or embedded systems.
This document provides a complementary tool to the suite of existing standards, such as
ISO/IEC 15938-13.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO/IEC 15938-13:2015, Information technology — Multimedia content description interface — Part 13:
Compact descriptors for visual search
Neural Network Exchange Format, The Khronos Group, Version 1.0, Revision 3, 2018-06-13.
RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, Jan. 2005.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/

3.1
image descriptor
descriptor extracted from a single key frame (3.6) sampled from the input video (3.8), which contains
global descriptor (3.2), local feature descriptor (3.3) and deep feature descriptor (3.4)
Note 1 to entry: Image descriptors are encoded as described in Clause 6.
3.2
global descriptor
aggregation of local feature descriptors into a compact representation of the image (3.5)
Note 1 to entry: The aggregation is as described in subclause 6.1.2.
3.3
local feature descriptor
descriptor of a local region, extracted around an interest point (a point in an image (3.5) showing
detection stability under local and global perturbations in the image domain, including perspective
transformations, changes in image scale, and illumination variations)
Note 1 to entry: The extraction is as described in subclause 6.1.3.
3.4
deep feature descriptor
feature descriptor extracted from a layer of a trained convolutional neural network
Note 1 to entry: The extraction is as described in subclause 6.1.4.
3.5
image
input key frame (3.6) to the image descriptor (3.1) encoder
Note 1 to entry: The image is as described in Clause 6.
3.6
key frame
frame extracted from the input video segment (3.7) by the colour histogram frame difference process
Note 1 to entry: The extraction is as described in subclause 6.2.
3.7
input video segment
time range (temporal segment) of a video from which a descriptor is extracted
3.8
input video
image sequence to be processed by the system, containing a number of input video segments (3.7) that
are passed to the CDVA extraction process
Note 1 to entry: Input video is as described in Clause 6.
3.9
segment descriptor
descriptor extracted from the sampled key frames (3.6) of an input video segment (3.7)
Note 1 to entry: Segment descriptors are encoded as described in Clause 6. They are constructed from the image
descriptors (3.1) of the sampled key frames of the input video segment.
3.10
representative frame
frame of an input video segment (3.7) for which an uncompressed descriptor is represented and which is
used as the basis for differential encoding

3.11
pixel
indexable element on an integer grid of the original image or the converted image, comprising spatial
coordinates, a luminance value and (optional) chrominance values
4 Abbreviated terms, operators, mnemonics, functions and symbols
4.1 General
The mathematical symbols used in this document are similar to those used in the C programming
language. However, integer divisions with truncation and rounding are specifically defined. Numbering
and counting loops generally begin with zero.
4.2 Abbreviated terms
ABAC adaptive binary arithmetic coding
CDVA compact descriptors for video analysis as defined by this document
CDVS compact descriptors for visual search as defined by ISO/IEC 15938-13
CNN convolutional neural network
MPEG-7 ISO/IEC 15938 (all parts)
NIP nested invariance pooling
NN neural network
NNEF neural network exchange format as defined by the Khronos specification referenced
in Clause 2
PCA principal component analysis
RGB red-green-blue colour space
ROI region of interest
SCFV scalable compressed fisher vector
URI uniform resource identifier as defined by RFC 3986
XOR binary exclusive OR operation
4.3 Arithmetic operators
+ addition
- subtraction (as a binary operator) or negation (as a unary operator)
++ increment, i.e. x++ is equivalent to x = x+1
-- decrement, i.e. x-- is equivalent to x = x−1
* multiplication
× multiplication
^ power
/ integer division with truncation of the result towards zero
For example, 7/4 and −7/−4 are truncated to 1, and −7/4 and 7/−4 are truncated to −1
// integer division with rounding to the nearest integer; half-integer values are rounded
away from zero unless otherwise specified
For example, 3//2 is rounded to 2, and −3//2 is rounded to −2.
÷ indicates division in mathematical equations where no rounding is intended
% modulus operator, defined only for positive numbers
ceil minimum integer number greater than or equal to the given floating point number
sqrt square root
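
The two integer division operators differ from the native operators of most programming languages, so a small illustration may help. The following Python sketch is not part of the standard; the function names `div_trunc` and `div_round` are hypothetical, and the sketch only reproduces the behaviour described above for `/` (truncation towards zero) and `//` (rounding to nearest, half-integers away from zero).

```python
def div_trunc(a: int, b: int) -> int:
    """The '/' operator of 4.3: integer division truncating towards zero."""
    q = abs(a) // abs(b)                      # magnitude, rounded down
    return q if (a >= 0) == (b >= 0) else -q  # restore the sign


def div_round(a: int, b: int) -> int:
    """The '//' operator of 4.3: round to nearest, halves away from zero."""
    q = (2 * abs(a) + abs(b)) // (2 * abs(b))  # floor(|a|/|b| + 1/2)
    return q if (a >= 0) == (b >= 0) else -q


# The examples given in the operator list
print(div_trunc(7, 4), div_trunc(-7, -4), div_trunc(-7, 4), div_trunc(7, -4))
print(div_round(3, 2), div_round(-3, 2))
```

Note that Python's own `//` operator rounds towards negative infinity, so it matches neither definition for negative operands.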
4.4 Logical operators
|| logical OR
&& logical AND
! logical NOT
⊕ bit-wise difference (XOR) operator
4.5 Relational operators
> greater than
>= greater than or equal to
≥ greater than or equal to
< less than
<= less than or equal to
≤ less than or equal to
== equal to
!= not equal to
4.6 Bitwise operators
| OR
& AND
4.7 Interval specification
[a;b] inclusive range from a to b

4.8 Mnemonics
The following mnemonics are defined to describe the different data types used in the coded bitstream.
bslbf bit string, left bit first, where “left” is the order in which bits are written in this
document
Bit strings are generally written as a string of 1s and 0s within single quote marks,
e.g. ‘1000 0001’. Blanks within a bit string are for ease of reading and have no
significance. For convenience, large strings are occasionally written in hexadecimal, in
which case conversion to a binary in the conventional manner will yield the value of
the bit string. Thus, the left-most hexadecimal digit is first and in each hexadecimal
digit the most significant of the four digits is first.
uimsbf unsigned integer, most significant bit first
vlclbf variable length code, left bit first, where “left” refers to the order in which the VLC
codes are written in this document
The byte order of multibyte words is most significant byte first.
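
To make the mnemonics concrete, the following Python sketch reads `uimsbf` and `bslbf` fields from a byte buffer. It is an informal illustration, not part of the standard: the `BitReader` class and its method names are hypothetical, and the sketch assumes that the first bit of the stream is stored in the most significant bit of the first byte.

```python
class BitReader:
    """Hypothetical helper that reads bit fields from a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits, from the start of data

    def read_uimsbf(self, nbits: int) -> int:
        """Read an unsigned integer, most significant bit first."""
        if self.pos + nbits > 8 * len(self.data):
            raise EOFError("not enough bits left")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1  # assumed: MSB of byte first
            value = (value << 1) | bit
            self.pos += 1
        return value

    def read_bslbf(self, nbits: int) -> str:
        """Read a bit string, left bit first, as a string of '0' and '1'."""
        return format(self.read_uimsbf(nbits), f"0{nbits}b") if nbits else ""


reader = BitReader(bytes([0x81, 0x2A]))
print(reader.read_bslbf(8))    # the bit string '1000 0001' used as an example above
print(reader.read_uimsbf(4))   # upper four bits of 0x2A
print(reader.read_uimsbf(4))   # lower four bits of 0x2A
```

Variable length codes (`vlclbf`) are not covered here, because decoding them depends on the code tables of the individual descriptor components.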
4.9 Functions
$\operatorname{argmax}_i()$  maximum value in argument list
$\operatorname{argmin}_i()$  minimum value in argument list
$\delta_g$  distance function for global descriptors
$\delta_l$  distance function for local descriptors
$\sum_{i=a}^{i<b} f(i)$  summation of f(i) with i taking integer values from a up to, but not including b
L0 norm  $L0(x,y) = \|x-y\|_0 = \sum_i \delta(x_i - y_i)$, where $\delta(a) = 1$ if $a \neq 0$ and $\delta(a) = 0$ if $a = 0$
L1 norm  $L1(x,y) = \|x-y\|_1 = \sum_i |x_i - y_i|$
L2 norm  $L2(x,y) = \|x-y\|_2 = \sqrt{\sum_i (x_i - y_i)^2}$
Euclidean distance  $D(x,y) = \|x-y\| = \sqrt{\sum_i (x_i - y_i)^2}$
hist(I,C) histogram of image I for colour channel C
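
The L0, L1 and L2 norms listed in 4.9 can be sketched in a few lines of Python. This is an informal illustration rather than normative text; the function names `l0`, `l1` and `l2` are hypothetical and operate on plain Python sequences of equal length.

```python
import math
from typing import Sequence


def l0(x: Sequence[float], y: Sequence[float]) -> int:
    """L0 norm of x - y: the number of positions where x and y differ."""
    return sum(1 for a, b in zip(x, y) if a - b != 0)


def l1(x: Sequence[float], y: Sequence[float]) -> float:
    """L1 norm of x - y: the sum of absolute element differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2(x: Sequence[float], y: Sequence[float]) -> float:
    """L2 norm of x - y: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


bits_a = [1, 0, 1, 1]
bits_b = [1, 1, 0, 1]
print(l0(bits_a, bits_b))                         # differing positions
print(sum(a ^ b for a, b in zip(bits_a, bits_b))) # same count via XOR
print(l1([1, 2, 3], [4, 0, 3]), l2([0, 0], [3, 4]))
```

For vectors whose elements are 0 or 1, the L0 norm equals the number of set bits in the element-wise XOR of the two vectors, i.e. the Hamming distance.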
4.10 Symbols
$\beta$  selection priority of local feature
c  number of channels of feature map (dimension of descriptor extracted from CNN)
$\Delta_k$  deep feature descriptor for frame k
$D_k$  binarized deep feature descriptor for frame k
f  feature vector of local descriptor
G  set of global descriptors
$G_k$  global descriptor of frame k
$\gamma_k$  result of pooling operation for feature map for frame k
h  feature map height
$I_k$  RGB image of frame k
k  key frame index
m  feature map index
$n_k$  number of key frames in a video segment
$n_l^k$  number of local descriptors of frame k
$n_r$  number of rotation transformations
$n_s$  number of scale transformations
$\rho$  representative frame of video segment
$p_r$, $p_s$, $p_t$  statistical moment for pooling operation
$P_s$  scale invariance pooling
$P_r$  rotation invariance pooling
$P_t$  translation invariance pooling
q  quantization function
$L^k$  set of local descriptors of frame k
L  list of local feature descriptors of a video segment
$l_i^k$  i-th local descriptor of frame k
$\theta_l$  threshold for local descriptor distance
w  feature map width
x  horizontal image coordinate
y  vertical image coordinate
5 CDVA bitstream syntax
5 CDVA bitstream syntax
5.1 CDVA descriptor
5.1.1 Binary representation syntax
CDVADescriptor {                              Number of bits   Mnemonics
  CDVAHeader                                  ≥80              vlclbf
  for (i=0; i<…) {
    SegmentHeader                             136              bslbf
    GlobalDescriptor                          ≥40              vlclbf
    if (LocalFeatureDescriptorPresent) {
      LocalDescriptor                         ≥1               vlclbf
      LocalDescriptorLocations                ≥1               vlclbf
    }
    if (DeepFeatureDescriptorPresent) {
      DeepFeatureDescriptor                   ≥1               vlclbf
    }
  }
}
5.1.2 Descriptor component semantics
CDVAHeader
A bitstream header as defined in subclause 5.2. It also indicates whether the
corresponding descriptor components are present, i.e. it defines LocalFeatureDescriptorPresent and
DeepFeatureDescriptorPresent.
NrSegments
Number of the segments in the input video.
SegmentHeader
Header for each segment as defined in subclause 5.3. The segment header contains size fields, including
the size of the rest of the segment description.
GlobalDescriptor
As defined in subclause 5.4.
LocalDescriptor
As defined in subclause 5.5.
LocalDescriptorLocations
As defined in subclause 5.5.3.
DeepFeatureDescriptor
As defined in subclause 5.6.
5.2 CDVA header
5.2.1 Binary representation syntax
CDVAHeader {                                  Number of bits   Mnemonics
  VersionID                                   4                bslbf
  ExtractionParams {
    skip                                      4                uimsbf
    kfTh                                      8                uimsbf
    segTh                                     8                uimsbf
    verTh                                     8                uimsbf
    minLocalDiff                              8                uimsbf
    CDVSModeID                                3                uimsbf
    Reserved5Bits                             5
  }
  LocalFeatureDescriptorPresent               1                bslbf
  DeepFeatureDescriptorPresent                1                bslbf
  IsDefaultNN                                 1                bslbf
  IsDeepFeatureDescriptorBinarized            1                bslbf
  Reserved4Bits                               4
  if (DeepFeatureDescriptorPresent AND IsDefaultNN>0) {
    NNUri                                     >=8              vlclbf
    NNFormat                                  8                bslbf
    DeepFvDim                                 16               uimsbf
  }
  OriginalPictureWidth                        16               uimsbf
  OriginalPictureHeight                       16               uimsbf
  if (LocalFeatureDescriptorPresent) {
    HistogramMapSizeX                         16               uimsbf
    HistogramMapSizeY                         16               uimsbf
  }
}
5.2.2 Descriptor component semantics
The header is included once for each file or stream, at the beginning of the CDVA descriptor bitstream.
VersionID
The version number of the CDVA descriptor (1 for the version specified in this document).
ExtractionParams
Extraction parameters include all free parameters for extraction that are not predefined to ensure
interoperability. This set includes the parameters for all descriptors supported by CDVA.
skipNum        Number of frames to be skipped after a sampled frame.
kfTh           Threshold (colour histogram) for selecting key frames. Range [0.00;2.55], represented as
               the unsigned integer kfTh*100.
segTh          Threshold (colour histogram) for segment candidates. Range [0.00;2.55], represented as
               the unsigned integer segTh*100.
verTh          Threshold (SCFV) for verifying segment candidates. Range [0;255].
minLocalDiff   The minimum local difference (in terms of elements) between local descriptors to be
               encoded. Range [0;255].
CDVSModeID     Identifier of the CDVS mode used for extracting the local descriptors. Range [0;6].
Reserved5Bits  To be skipped by the parser.
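
The thresholds kfTh and segTh are real values carried as 8-bit unsigned integers scaled by 100. A minimal Python sketch of that mapping is given below, assuming that values are rounded to the nearest integer code (the text above does not state a rounding rule); the function names are hypothetical and the sketch is not part of the standard.

```python
def encode_threshold(value: float) -> int:
    """Map a threshold in [0.00;2.55] to its 8-bit code (value * 100)."""
    code = round(value * 100)  # assumed rounding rule: nearest integer
    if not 0 <= code <= 255:
        raise ValueError("threshold outside [0.00;2.55]")
    return code


def decode_threshold(code: int) -> float:
    """Map an 8-bit code read from the bitstream back to the threshold."""
    if not 0 <= code <= 255:
        raise ValueError("code does not fit in 8 bits")
    return code / 100


print(encode_threshold(1.25))   # code written to the kfTh or segTh field
print(decode_threshold(125))    # threshold recovered by a parser
```

The step size of 0.01 follows directly from the scale factor, so thresholds are only representable to two decimal places.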

LocalFeatureDescriptorPresent
Indication of whether local feature descriptors are present for each of the segments. If
LocalFeatureDescriptorPresent is set to 1, local feature descriptors are present for each segment; if
LocalFeatureDescriptorPresent is set to 0, no local feature descriptors are present.
DeepFeatureDescriptorPresent
Indication of whether deep feature descriptors are present for each of the segments. If
DeepFeatureDescriptorPresent is set to 1, deep feature descriptors are present for each segment; if
DeepFeatureDescriptorPresent is set to 0, no deep feature descriptors are present.
IsDefaultNN
Indication of whether custom NN identification is present as follows:
IsDefaultNN == 0 The neural network specified in Annex B.
IsDefaultNN == 1 Another neural network, e.g., a variant of Annex B with different quantization of
layers, or alternative network. In this case, the three components NNUri, NNFormat
and DeepFvDim are provided.
IsDeepFeatureDescriptorBinarized
Indication of whether deep feature descriptors are binarized (IsDeepFeatureDescriptorBinarized set to
1) or not (IsDeepFeatureDescriptorBinarized set to 0).
Reserved4Bits
To be skipped by the parser.
NNUri
Identifier (URI) of the trained neural network used for deep feature descriptor extraction. It is to
be used for quantized versions of the default network as well as for any other network. The URI is
provided as a zero-terminated string of 8-bit characters. The URI serves as an identifier for the particular
network instance, and it is recommended to provide the network definition at this location. The format
of the network definition shall conform to the format specified in NNFormat.
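
Reading a zero-terminated string of 8-bit characters can be sketched as follows. The function name and the Latin-1 decoding are assumptions made for illustration; this document states only that the characters are 8 bits wide and that the string ends with a 0.

```python
from typing import Tuple


def read_nn_uri(buf: bytes, offset: int) -> Tuple[str, int]:
    """Return the URI starting at offset and the offset just past its terminator."""
    end = buf.index(0, offset)  # raises ValueError if no terminating 0 is found
    # Latin-1 maps every byte to exactly one character (character set assumed).
    return buf[offset:end].decode("latin-1"), end + 1
```

The second return value is the position of the next syntax element, so a parser can continue reading directly after the terminator.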
NNFormat
The format used for representing the neural network referenced by NNUri.
NNFormat == 1 … NNEF
NNFormat == 0 … custom format
other values reserved for future use
DeepFvDim
Dimension of deep feature vector, i.e. size of the binarized feature vector resulting from the neural
network evaluation (512 for the neural network defined in Annex B).
OriginalPictureWidth
Input frame width in pixels.
OriginalPictureHeight
Input frame height in pixels.

LocalFeatureDescriptorPresent
This descriptor component specifies whether a relevance bit for each compressed local feature
descriptor is present in the bitstream. If LocalFeatureDescriptorPresent is equal to 1 then the relevance
bits are present in the bitstream, and if LocalFeatureDescriptorPresent is equal to 0 then the relevance
bits are not present in the bitstream. More details are provided in subclause 5.5.
HistogramMapSizeX
Spatial resolution of the histogram for coordinate coding (see subclause 5.5.3).
HistogramMapSizeY
Spatial resolution of the histogram for coordinate coding (see subclause 5.5.3).
5.3 Segment header
5.3.1 General
The segment header is included once per segment, preceding the encoded descriptors of the segment.
At least one segment shall be present in the bitstream.
5.3.2 Binary representation syntax
SegmentHeader { Number of bits Mnemonics
IsValidStartTime 1 bslbf
IsValidEndTime 1 bslbf
IsBufferSizeValid 1 bslbf
Reserved5Bits 5
StartTime 24 uimsbf
EndTime 24 uimsbf
GlobalDescSize 24 uimsbf
LocalDescSize 24 uimsbf
DeepFeatureDescSize 24 uimsbf
NrFrames 8 uimsbf
}
5.3.3 Descriptor component semantics
IsValidStartTime
This descriptor component specifies whether the start time in the bitstream is valid. If IsValidStartTime
is equal to 1, the start time is valid. If IsValidStartTime is equal to 0, the start time is not
valid.
IsValidEndTime
This descriptor component specifies whether the end time in the bitstream is valid. If IsValidEndTime
is equal to 1, the end time is valid. If IsValidEndTime is equal to 0, the end time is not
valid.
IsBufferSizeValid
This descriptor component specifies whether the buffer sizes in the bitstream are valid. If IsBufferSizeValid
is equal to 1, the buffer sizes are valid. If IsBufferSizeValid is equal to 0, the buffer sizes are not
valid.

Reserved5Bits
To be skipped by the parser.
StartTime
Segment start time in the input video in milliseconds.
EndTime
Segment end time in the input video in milliseconds.
GlobalDescSize
Encoded size in bytes of the global descriptor buffer, including the descriptor of the representative
frame and the difference descriptors.
LocalDescSize
Encoded size in bytes of the local descriptor buffer, including the descriptor of the representative frame
and the difference descriptors. Set to 0, if the local descriptor is not included.
DeepFeatureDescSize
Encoded size in bytes of the deep feature descriptor buffer. Set to 0, if the deep feature descriptor is not
included.
NrFrames
Number of key frames of the segment retained for encoding [1;255].
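
A minimal parsing sketch for the segment header follows. It assumes that the header starts on a byte boundary and that the first syntax element occupies the most significant bit of the first byte; the class, field and function names are hypothetical. The field widths add up to 8 + 5*24 + 8 = 136 bits, i.e. 17 bytes.

```python
from dataclasses import dataclass

SEGMENT_HEADER_BYTES = 17  # 8 flag/reserved bits + 5 * 24 bits + 8 bits


@dataclass
class SegmentHeader:
    is_valid_start_time: bool
    is_valid_end_time: bool
    is_buffer_size_valid: bool
    start_time_ms: int
    end_time_ms: int
    global_desc_size: int
    local_desc_size: int
    deep_feature_desc_size: int
    nr_frames: int


def parse_segment_header(buf: bytes, offset: int = 0) -> SegmentHeader:
    raw = buf[offset:offset + SEGMENT_HEADER_BYTES]
    if len(raw) < SEGMENT_HEADER_BYTES:
        raise ValueError("truncated segment header")
    flags = raw[0]
    # Five consecutive 24-bit unsigned integers, most significant byte first.
    u24 = [int.from_bytes(raw[1 + 3 * i:4 + 3 * i], "big") for i in range(5)]
    return SegmentHeader(
        is_valid_start_time=bool(flags & 0x80),   # first bit of the header
        is_valid_end_time=bool(flags & 0x40),
        is_buffer_size_valid=bool(flags & 0x20),  # low 5 bits: Reserved5Bits, skipped
        start_time_ms=u24[0],
        end_time_ms=u24[1],
        global_desc_size=u24[2],
        local_desc_size=u24[3],
        deep_feature_desc_size=u24[4],
        nr_frames=raw[16],
    )
```

A caller would typically check the validity flags before using the times or sizes, and treat a LocalDescSize or DeepFeatureDescSize of 0 as "descriptor not included".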
5.4 Global descriptor
5.4.1 Binary representation syntax
GlobalDescriptor { Number of bits Mnemonics
GlobalHasBitSelection 1 bslbf
GlobalHasVariance 1 bslbf
Reserved6Bits 6
GlobalRefDescSize 16 uimsbf
EncodedRefDescriptor 8*GlobalRefDescSize vlclbf
RawGlobalDiffDescSize 24 uimsbf
EncodedDiffDescriptor 8*(GlobalDiffDescSize − GlobalRefDescSize) vlclbf
}
5.4.2 Descriptor component semantics
GlobalHasBitSelection
This descriptor component specifies whether the bit selection is present in the bitstream. If
GlobalHasBitSelection is equal to 1 then the bit selection is present. If GlobalHasBitSelection is equal to 0 then
the bit selection is not present in the bitstream.

GlobalHasVariance
This descriptor component specifies whether the variance is present in the bitstream. If
GlobalHasVariance is equal to 1 then the variance is present. If GlobalHasVariance is equal to 0 then the
variance is not present in the bitstream.
Reserved6Bits
To be skipped by the parser.
RawGlobalRefDescSize
The uncompressed size (in bytes) of the binarized global descriptor of the representative frame of the
segment, required for the termination of ABAC decoding.
EncodedRefDescriptor
The ABAC encoded block formed from the sequence of the binarized global descriptor of the
representative frame of the segment. Bit stuffing is used to fill the block to the next byte boundary with
bits having the value 0.
RawGlobalDiffDescSize
The uncompressed size (in bytes) of the block formed from the sequence of binarized differences of the
global descriptors of the segment for the frames other than the representative frame, required for the
termination of ABAC decoding.
EncodedDiffDescriptor
The ABAC encoded block formed from the sequence of binarized differences of the global descriptors of
the segment for the frames other than the representative frame. Bit stuffing is used to fill the block to
the next byte boundary with bits having the value 0.
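
The bit stuffing applied to the encoded blocks can be sketched as follows. The function name is hypothetical, and packing the first bit into the most significant position of each byte is an assumption; the ABAC coding itself is not shown.

```python
from typing import List


def pack_with_bit_stuffing(bits: List[int]) -> bytes:
    """Pack a bit sequence into bytes, filling up to the next byte boundary with 0 bits."""
    padded = bits + [0] * (-len(bits) % 8)  # appends between 0 and 7 stuffing bits
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | (bit & 1)  # first bit becomes the most significant
        out.append(byte)
    return bytes(out)
```

Because the stuffing bits cannot be distinguished from payload bits after packing, a decoder needs the uncompressed size fields (for example RawGlobalDiffDescSize) to know where decoding terminates.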
5.5 Local descriptor
5.5.1 General
The local descriptor is optional.
5.5.2 Local feature descriptor
5.5.2.1 Binary representation syntax
LocalDescriptor { Number of bits Mnemonics
HasRelevanceBits 1 bslbf
Reserved7Bits 7
for (i=0; i<NrFrames; i++) {
NrLocalDesc[i] 16 uimsbf
}
TotalNrLocalDesc 16 uimsbf
for (i=0; i<…; i++) {
if (i<…) {
Constant1 1 bslbf
AbsFeatureIdx 14 bslbf
}
else {
if (FeatureSkipped[i]) {
Constant1 1 bslbf
AbsFeatureIdx 14 bslbf
}
else {
Constant0 1 bslbf
}
}
}
RawLocalDescSize 24 uimsbf
EncodedDescriptor 8*LocalDescSize vlclbf
}
The ABAC encoded block is formed from the sequence of binarized local descriptors of the segment. In
particular, the following syntax elements defined in ISO/IEC 15938-13:2015 subclause 4.1 are included
for each descriptor:
EncodedDescriptor { Number of bits Mnemonics
for(k=0; k<…; k++) {
for(n=0; n<(4*NumberOfElementGroups); n++) {
LocalDescriptorElements[k][n] 1-2 vlclbf
}
}
if(HasRelevanceBits) {
for(k=0; k<…; k++) {
RelevanceBits[k] 1 bslbf
}
}
}
Bit stuffing is used to fill the block to the next byte boundary with bits having the value 0.
5.5.2.2 Descriptor component semantics
The number and order of frames shall be consistent between the specification of the number of
descriptors, the descriptor index and the location histograms.
HasRelevanceBits
This descriptor component specifies whether relevance bits are present in the bit stream. If
HasRelevanceBits is equal to 1, then relevance bits are present (relevance information of local
descriptors as defined by CDVS).
Reserved7Bits
To be skipped by the parser.
NrFrames
Number of key frames in the input video.

NrLocalDesc[i]
Number of local descriptors in frame i, starting with the representative frame.
TotalNrLocalDesc
Total number of independently encoded local descriptors in the input video, $n_T \le \sum_{i=0}^{n_k} n_l[i]$, where $n_T$
is the value of TotalNrLocalDesc, $n_l[i]$ is the value of NrLocalDesc[i], and $n_k$ is the number of key frames
of the segment.
Constant1
Bit set to 1 to indicate that absolute feature index will follow.
AbsFeatureIdx
Absolute local feature index.
FeatureSkipped[i]
The condition is true if the ith local feature is skipped because it is too similar to an already encoded
one, and false otherwise.
Constant0
Bit set to 0 to indicate that the feature index will be incremented by 1. This is the case if no features
have been skipped at this position, and the feature list is encoded incrementally.
RawLocalDescSize
The uncompressed size (in bytes) of the block formed from the sequence of binarized local descriptors
of the segment, required for the termination of ABAC decoding.
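
The feature index signalling described by Constant1, AbsFeatureIdx and Constant0 can be read back with the following sketch. The class and function names are hypothetical, the number of entries is passed in by the caller, and most-significant-bit-first reading is an assumption. The sketch covers only the index flags, not the ABAC encoded descriptors.

```python
from typing import List


class BitReader:
    """Reads bits from a byte buffer, most significant bit first (assumed order)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def decode_feature_indices(reader: BitReader, count: int) -> List[int]:
    """Decode `count` feature indices from the flag / absolute-index sequence."""
    indices: List[int] = []
    for _ in range(count):
        if reader.read(1):         # Constant1: an absolute index follows
            idx = reader.read(14)  # AbsFeatureIdx
        elif indices:              # Constant0: previous index incremented by 1
            idx = indices[-1] + 1
        else:
            raise ValueError("Constant0 without a preceding absolute index")
        indices.append(idx)
    return indices
```

For example, the bit sequence 1, 5 (14 bits), 0, 0, 1, 20 (14 bits) decodes to the indices 5, 6, 7, 20: two features follow index 5 without a gap, and the jump to 20 indicates that the intermediate features were skipped.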
5.5.3 Local descriptor locations
5.5.3.1 General
The local descriptor block contains the encoded coordinates of the local feature locations in each of the
frames of the segment.
5.5.3.2 Binary representation syntax
LocalDescriptorLocations { Number of bits Mnemonics
HistogramBufferSize 24 uimsbf
for (i=0; i<NrFrames; i++) {
CoordHisto[i] 8*FrameHistogramBufferSize[i] bslbf
}
}
CoordHisto[i] is the encoded coordinate histogram for frame i as defined by CDVS, and
FrameHistogramBufferSize[i] is the size of the encoded histogram buffer for frame i. The following
syntax elements defined in ISO/IEC 15938-13:2015, subclause 4.1 are included for each frame:
HistogramCount (arithmetically coded block) ≥1 vlclbf
HistogramMap (arithmetically coded block) ≥1 vlclbf

5.5.3.3 Descriptor component semantics
HistogramBufferSize
Size of the encoded histogram buffer in bytes.
NrFrames
Number of key frames in the input video.
5.6 Deep feature descriptor
5.6.1 Binary representation syntax
DeepFeatureDescriptor { Number of bits Mnemonics
DeepRefDescSize 16 uimsbf
EncodedRefDescriptor 8*DeepRefDescSize vlclbf
RawDeepDiffDescSize 24 uimsbf
EncodedDiffDescriptor 8*(DeepFeatureDescSize − DeepRefDescSize) vlclbf
}
5.6.2 Descriptor component semantics
RawDeepRefDescSize
The uncompressed size (in bytes) of the binarized deep feature descriptor of the representative frame
of the segment, required for the termination of ABAC decoding.
EncodedRefDescriptor
The ABAC encoded block formed from the sequence of the binarized deep feature descriptor of the
representative frame of the segment. Bit stuffing is used to fill the block to the next byte boundary with
bits having the value 0.
RawDeepDiffDescSize
The uncompressed size (in bytes) of the block formed from the sequence of binarized differences of the
deep feature descriptors of the segment for the frames other than the representative frame, required
for the termination of ABAC decoding.
EncodedDiffDescriptor
The ABAC encoded block formed from the sequence of binarized differences of the deep feature
descriptors of the segment for the frames other than the representative frame. Bit
...

ISO/IEC 15938-15:2019 - Information technology — Multimedia content description interface — Part 15: Compact descriptors for video analysis Released:7/15/2019
