Information technology — Coding of audio-visual objects — Part 7: Optimized reference software for coding of audio-visual objects

ISO/IEC TR 14496-7:2004 specifies the encoding tools that enhance both the execution and quality for the coding of visual objects as defined in ISO/IEC 14496-2. There are five visual tools, including: Fast Motion Estimation; Fast Global Motion Estimation; Fast and Robust Sprite Generation; Optimized Reference Software for Simple Profile with Fast Variable Length Decoder Technique; and Error Resilience Tools with RVLC. The platform specific optimization is not currently addressed. The error resilience tools are separately implemented based on the Momusys reference software.

Technologies de l'information — Codage des objets audiovisuels — Partie 7: Logiciel de référence optimisé pour le codage des objets audiovisuels

General Information

Status: Published
Publication Date: 26-Oct-2004

ICS: 35.040 - Information coding
: 35.040.40 - Coding of audio, video, multimedia and hypermedia information

Technical Committee: ISO/IEC JTC 1/SC 29 - Coding of audio, picture, multimedia and hypermedia information
Drafting Committee: ISO/IEC JTC 1/SC 29 - Coding of audio, picture, multimedia and hypermedia information

Current Stage: 9093 - International Standard confirmed
Start Date: 12-Oct-2019
Completion Date: 12-Feb-2026

Relations

Consolidated By: ISO 11997-1:2017 - Paints and varnishes — Determination of resistance to cyclic corrosion conditions — Part 1: Wet (salt fog)/dry/humid
Effective Date: 06-Jun-2022

Revises: ISO/IEC TR 14496-7:2002 - Information technology — Coding of audio-visual objects — Part 7: Optimized reference software for coding of audio-visual objects
Effective Date: 15-Apr-2008

Overview

ISO/IEC TR 14496-7:2004 - "Optimized reference software for coding of audio‑visual objects" is a technical report (Part 7 of the MPEG‑4 suite) that provides non‑normative, well‑tested encoding tools to improve execution speed and visual quality for the visual coding specified in ISO/IEC 14496‑2. The report supplies reference implementations and optimization guidance so developers can adopt efficient algorithms (without changing the normative bitstream syntax) and accelerate time‑to‑market for MPEG‑4 compliant products.

Key topics

Fast Motion Estimation (MVFAST / PMVFAST)
- Motion‑adaptive, scalable search strategies to balance rate, quality (PSNR), speed and memory.
- Optional early‑elimination for stationary blocks (example threshold T = 512 used in tests).
Fast Global Motion Estimation (FFRGMET)
- Feature‑based robust global motion techniques with outlier exclusion and robust objective functions.
Fast and Robust Sprite Generation
- Algorithms for sprite (panorama/region) generation including segmentation, motion estimation and blending.
Fast Variable Length Decoder
- Hierarchical table lookup techniques to accelerate VLC decoding (Simple Profile optimization).
Error Resilience Tools
- Error resilience techniques (including RVLC) and their integration; implementations referenced from MoMusys software.
Software integration & optimization
- Guidance for removing unused procedures, revising code bases, using fast algorithms, and compile‑time switches to include only needed tools for a profile/level. Platform‑specific (assembly/hardware) optimizations are explicitly not addressed.

Practical applications

Implementers of MPEG‑4 Visual encoders and decoders seeking faster execution and better visual quality while remaining standards‑compliant.
Developers building real‑time or constrained systems: video conferencing, live streaming, surveillance, mobile video, and broadcasting where computational resources and latency matter.
Codec researchers and QA teams using the optimized reference as a benchmark for performance comparisons and experimentation.
Product teams wanting a tested toolkit to reduce development time for Simple Profile implementations.

Who should use this report

Firmware and software engineers implementing MPEG‑4 encoders/decoders
System architects optimizing media pipelines
Academic and industry researchers in video coding and motion estimation
Standards writers and conformance test developers

Related standards

ISO/IEC 14496‑2 (Visual)
ISO/IEC 14496‑5 (Reference software)
Other MPEG‑4 parts (Systems, File formats) for end‑to‑end implementations

Keywords: ISO/IEC TR 14496-7:2004, optimized reference software, MPEG‑4, motion estimation, MVFAST, PMVFAST, global motion estimation, sprite generation, fast VLC decoder, error resilience, RVLC, encoder optimization.

ISO/IEC TR 14496-7:2004 - Information technology -- Coding of audio-visual objects - Page 1 preview

ISO/IEC TR 14496-7:2004 - Information technology -- Coding of audio-visual objects - Page 2 preview

ISO/IEC TR 14496-7:2004 - Information technology -- Coding of audio-visual objects - Page 3 preview

Technical report

ISO/IEC TR 14496-7:2004 - Information technology -- Coding of audio-visual objects

English language

32 pages

sale 15% off

Preview

sale 15% off

Preview

Get Certified

Connect with accredited certification bodies for this standard

BSI Group

BSI (British Standards Institution) is the business standards company that helps organizations make excellence a habit.

UKAS United Kingdom Verified

Visit Website

NYCE

Mexican standards and certification body.

EMA Mexico Verified

Visit Website

Frequently Asked Questions

What is ISO/IEC TR 14496-7:2004?

ISO/IEC TR 14496-7:2004 is a technical report published by the International Organization for Standardization (ISO). Its full title is "Information technology — Coding of audio-visual objects — Part 7: Optimized reference software for coding of audio-visual objects". This standard covers: ISO/IEC TR 14496-7:2004 specifies the encoding tools that enhance both the execution and quality for the coding of visual objects as defined in ISO/IEC 14496-2. There are five visual tools, including: Fast Motion Estimation; Fast Global Motion Estimation; Fast and Robust Sprite Generation; Optimized Reference Software for Simple Profile with Fast Variable Length Decoder Technique; and Error Resilience Tools with RVLC. The platform specific optimization is not currently addressed. The error resilience tools are separately implemented based on the Momusys reference software.

What is the scope of ISO/IEC TR 14496-7:2004?

What ICS categories does ISO/IEC TR 14496-7:2004 belong to?

ISO/IEC TR 14496-7:2004 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

What standards are related to ISO/IEC TR 14496-7:2004?

ISO/IEC TR 14496-7:2004 has the following relationships with other standards: It is inter standard links to ISO 11997-1:2017, ISO/IEC TR 14496-7:2002. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

How can I access ISO/IEC TR 14496-7:2004?

ISO/IEC TR 14496-7:2004 is available in PDF format for immediate download after purchase. The document can be added to your cart and obtained through the secure checkout process. Digital delivery ensures instant access to the complete standard document.

Standards Content (Sample)

ISO/IEC TR 14496-7:2004 - Info...

TECHNICAL ISO/IEC
REPORT TR
14496-7
Second edition
2004-10-15
Information technology — Coding
of audio-visual objects —
Part 7:
Optimized reference software for coding
of audio-visual objects
Technologies de l'information — Codage des objets audiovisuels —
Partie 7: Logiciel de référence optimisé pour le codage des objets
audiovisuels
Reference number
©
ISO/IEC 2004
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

© ISO/IEC 2004
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO/IEC 2004 – All rights reserved

Contents Page
Foreword. iv
Introduction . vi
1 Scope. 1
2 Fast Motion Estimation. 1
2.1 Introduction to Motion Adaptive Fast Motion Estimation. 1
2.2 Technical Description of Core Technology  MVFAST . 2

2.2.1 Detection of stationary blocks. 2
2.2.2 Determination of local motion activity. 2
2.2.3 Search Center. 3
2.2.4 Search Strategy. 4
2.2.5 Perspectives on implementing MVFAST. 4
2.2.6 Special Acknowledgements. 5
2.3 Technical Description of PMVFAST . 5
2.3.1 Introduction. 5
2.3.2 Technical Description of PMVFAST . 6
2.3.3 Special Acknowledgement. 7
2.4 Conclusions. 7
3 Fast Global Motion Estimation . 8
3.1 Introduction to Feature-based Fast and Robust Global Motion Estimation Technique. 8
3.2 Technical Description of FFRGMET. 9
3.2.1 Outlier Exclusion. 9
3.2.2 Robust Object Function . 9
3.2.3 Feature Selection. 10
3.2.4 Algorithm Description. 10
3.2.5 Perspectives on implementing FFRGMET. 11
3.2.6 Special Acknowledgements. 11
3.3 Conclusions. 11
4 Fast and Robust Sprite Generation. 11
4.1 Introduction to Fast and Robust Sprite Generation . 11
4.2 Algorithm Description. 11
4.2.1 Outline of Algorithm. 11
4.2.2 Image Region Division. 12
4.2.3 Fast and Robust Motion Estimation. 13
4.2.4 Image Segmentation. 14
4.2.5 Image Blending. 14
4.3 Conclusions. 15
5 Optimised Reference Software For Simple Profile and Error Resilience Tools. 15
5.1 Scope. 15
5.2 Integration and Optimization of the Reference Software. 15
5.2.1 Introduction. 15
5.2.2 Removal of the unused procedures, parameters, and data structures. 16
5.2.3 Revision of the code bases for saving the execution time and code sizes. 16
5.2.4 Use of the existing fast algorithms for the computational burden modules . 21
5.2.5 Optimised Simple Profile encoder and decoder.25
5.2.6 Experimental Results. 25
5.3 Error Resilience Tools. 29
5.3.1 Abbreviations. 29
5.3.2 New Processing / functionalities. 29
6 Contact Information. 31
Bibliography . 32
© ISO/IEC 2004 – All rights reserved iii

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
In exceptional circumstances, the joint technical committee may propose the publication of a Technical Report
of one of the following types:
 type 1, when the required support cannot be obtained for the publication of an International Standard,
despite repeated efforts;
 type 2, when the subject is still under technical development or where for any other reason there is the
future but not immediate possibility of an agreement on an International Standard;
 type 3, when the joint technical committee has collected data of a different kind from that which is
normally published as an International Standard (“state of the art”, for example).
Technical Reports of types 1 and 2 are subject to review within three years of publication, to decide whether
they can be transformed into International Standards. Technical Reports of type 3 do not necessarily have to
be reviewed until the data they provide are considered to be no longer valid or useful.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC TR 14496-7, which is a Technical Report of type 3, was prepared by Joint Technical Committee
ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and
hypermedia information.
This second edition cancels and replaces the first edition (ISO/IEC 14496-7:2002) which has been technically
revised.
ISO/IEC TR 14496 consists of the following parts, under the general title Information technology — Coding of
audio-visual objects:
 Part 1: Systems
 Part 2: Visual
 Part 3: Audio
 Part 4: Conformance testing
 Part 5: Reference software
iv © ISO/IEC 2004 – All rights reserved

 Part 6: Delivery Multimedia Integration Framework (DMIF)
 Part 7: Optimized reference software for coding of audio-visual objects [Technical Report]
 Part 8: Carriage of ISO/IEC 14496 contents over IP networks
 Part 9: Reference hardware description [Technical Report]
 Part 10: Advanced Video Coding
 Part 11: Scene description and application engine
 Part 12: ISO base media file format
 Part 13: Intellectual Property Management and Protection (IPMP) extensions
 Part 14: MP4 file format
 Part 15: Advanced Video Coding (AVC) file format
 Part 16: Animation Framework eXtension (AFX)
 Part 17: Streaming text format
 Part 18: Font compression and streaming
 Part 19: Synthesized texture stream

© ISO/IEC 2004 – All rights reserved v

Introduction
Purpose
This part of ISO/IEC 14496 was developed in response to the growing need for optimized reference software
that provides both improved visual quality and faster execution while compliance is preserved. The goal is to
provide non-normative tools that are essential for implementations of the normative parts of the
ISO/IEC 14496 specifications. For example, Part 5 of the ISO/IEC 14496 specifications uses a full search
motion estimation which is theoretical optimum in coding efficiency but impractical for commercial
implementation. In the past, the industry needs to create its own encoding tools for its target products. In this
part, we provide a well-tested set of encoding tools that can enhance the performance but should not be
standardized. The following recommended tools would be up to the individual organization to decide if it
wishes to adopt or adapt these tools for its specific needs. This part provides significant reduction in the time-
to-market and provides a reference benchmark for commercial ISO/IEC 14496 compliant products.

vi © ISO/IEC 2004 – All rights reserved

TECHNICAL REPORT ISO/IEC TR 14496-7:2004(E)

Information technology — Coding of audio-visual objects —
Part 7:
Optimized reference software for coding of audio-visual objects
1 Scope
This part of ISO/IEC 14496 specifies the encoding tools that enhance both the execution and quality for the
coding of visual objects as defined in ISO/IEC 14496-2. The tool set is not limited to visual objects but at this
point all the recommended tools are visual encoding tools. There are four tools that have been described in
this technical report.
� Fast Motion Estimation
� Fast Global Motion Estimation
� Fast and Robust Sprite Generation
� Fast Variable Length Decoder Using Hierarchical Table Lookup

These tools have been demonstrated as robust tools with source codes for both MoMusys and Microsoft
implementations. In the current implementations, there is single software that includes all tools existed in the
ISO/IEC 14496-2. This is obviously inefficient in terms of code size and execution speed. To address this
issue, the optimized reference software has compilation switches such that only selected tools as defined by
the profiles and levels are included. Such level of optimization is performed at high level programming
language. The platform specific optimization is currently not addressed by this part.
2 Fast Motion Estimation
2.1 Introduction to Motion Adaptive Fast Motion Estimation
The optimization of fast motion estimation is essentially a multi-dimensional problem. The key dimensions
concerned in this problem are: Rate, Quality (PSNR), Speed-up (or Computational Gain), Algorithmic
Complexity, Memory Size and Memory Bandwidth (see Figure 1). There always exists a trade-off among all
these five key dimensions. Therefore, it is highly desirable to have an adaptive fast motion estimation core
algorithm with scalable structure, which can be adaptively optimized with respect to all or selected aspects for
various coding environment and requirements. Since the rate control is used to fix the bit-rate, the
optimization problem is reduced by one dimension to four dimensions.
Motion Vector Field Adaptive Search Technique (MVFAST) [1] is a generic algorithm of the family of
motion-adaptive fast search techniques, originally proposed by Kai-Kuang Ma and Prabhudev Irappa Hosur
from Nanyang Technological University (NTU), Singapore. The MVFAST offers high performance both in
quality and speed and does not require memory to store the searched points and motion vectors. The
MVFAST has been adopted by MPEG-4 Part 7 in the Noordwijkerhout MPEG meeting (March 2000) as the
core technology for fast motion estimation.
A derivative of MVFAST, called Predictive MVFAST (PMVFAST) [2], is considered as an optional approach
that might benefit in special coding situations. PMVFAST incorporates a set of thresholds into MVFAST to
trade higher speed-up at the cost of memory size, memory bandwidth and additional algorithmic complexity.
In PMVFAST, the threshold values are adjusted based on the 54 test cases specified by MPEG-4. However,
the coding performance and sensitivity of PMVFAST using these thresholds for the video sequences and
encoding conditions outside the MPEG-4 test set has not been studied and verified.
© ISO/IEC 2004 – All rights reserved 1

Quality
Speed
Bit-rate
Memory Algorithmic
(Size and Bandwidth)
complexity
Figure 1 — Five dimensional optimization problem of fast motion estimation
2.2 Technical Description of Core Technology  MVFAST
2.2.1 Detection of stationary blocks
A large number of MBs in the video sequences (e.g., “talking head” video sequences) with low-motion content
tend to have motion vectors equal to (0,0). Such MBs in the regions of no-motion activity can be detected
simply based on the sum of absolute difference (SAD) at the origin. Therefore, we exploit an optional phase,
called early elimination of search, as the first step in MVFAST as follows. The search for a MB will be
terminated immediately, if its SAD value obtained at (0,0) is less than a threshold T, and the motion vector is
assigned as (0,0). Through extensive simulations, we found that among those zero-motion blocks identified,
about 98% of them have their SAD at position (0,0) less than 512. Hence, we choose T = 512 to enable the
mechanism of early elimination of search. Since this early elimination of search phase is optional, it can be
turned off or disabled by imposing T = 0.
2.2.2 Determination of local motion activity
The local motion vector field at a macroblock (MB) position is defined as the set of motion vectors in the
region of support (ROS) of that MB. The ROS of a MB includes the n neighborhood MBs. In MVFAST, the
ROS with n = 3 is shown in Figure 2. Let V={V , V , ….V }, where V = (0,0), and V (and i ≠ 0) is the motion
0 1 n 0 i
vector of MB in the ROS (see Figure 2). The cityblock length of V =(x, y) is defined as l = |x| + |y|. Let L =
i i i i vi i i
MAX{l } for all V . The motion activity at the current MB position is defined as follows.
vi i
Motion Activity = Low, if L≤ L1;
= Medium, if L1 < L ≤ L2;
= High, if L > L2 ;    (1)
where L and L are integer constants. We choose L and L as the cityblock distance from the center point of
1 2 1 2
the pattern to any other point on the small and large search patterns (see Figure 3), respectively. Thus, L =1
and L =2.
2 © ISO/IEC 2004 – All rights reserved

MB MB
2 3
Current
MB
MB
Figure 2 — Region of support (ROS) for the current MB consists of MB1, MB2 and MB3

V
V
V
Figure 3 — Example of distribution of motion vectors belonging to set V. In this case, lv1 = 2, lv2 = 1,
lv3 = 6; thus L = MAX{lv1, lv2, lv3} = 6
2.2.3 Search Center
The choice of the search center depends on the local motion activity at the current MB position. If the motion
activity is low or medium, the search center is the origin. Otherwise, the vector belonging to set V that yields
the minimum sum of absolute difference (SAD) is chosen as the search center.

(a)         (b)
Figure 4 — (a) Large Diamond Search Pattern (LDSP) and (b) Small Diamond Search Pattern (SDSP)
© ISO/IEC 2004 – All rights reserved 3

2.2.4 Search Strategy
A local search is performed around the search center to obtain the motion vector for the current MB. The
search patterns employed for the local search are shown in Fig. 4. Two strategies are proposed for the local
search and their choice depends on the motion activity identified. If the motion activity is low or high, we
employ small diamond search (SDS). Otherwise, we choose large diamond search (LDS).
i) Small Diamond Search (SDS)
Step 1: Small diamond search pattern (SDSP) is centered at the search center, and all the
checking points of SDSP are tested. If the center position yields the minimum SAD (i.e., no
motion), then the center represents the motion vector; otherwise, go to Step 2.

Step 2: The center of SDSP moves to the point where the minimum SAD was obtained in
the previous step, and all the points on SDSP are tested. If the center position yields the
minimum SAD, then the center represents the motion vector; otherwise, recursively repeat
this step.
ii) Large Diamond Search (LDS)

Step 1: Large diamond search pattern (LDSP) is centered at the search center, and all the
checking points of LDSP are tested. If the center position gives the minimum SAD, go to
Step 3; otherwise, go to Step 2.
Step 2: The center of LDSP moves to the point where the minimum SAD was obtained in
the previous step, and all the points on LDSP are tested. If the center position gives the
minimum SAD, go to Step 3; otherwise, recursively repeat this step.
Step 3: Switch the search pattern from LDSP to SDSP. The point that yields the minimum
SAD, is the final solution of the motion vector.

Table 1 summarizes the methodology for selection of search center and search strategy depending on the
motion activity at the current MB position.
Table 1 — The search modes for MVFAST
Motion Activity Search Center Search Strategy
Low Origin SDS
Medium Origin LDS
High The position of the vector in SDS
set V that yields minimum SAD
2.2.5 Perspectives on implementing MVFAST
The MVFAST algorithm can be structured in terms of profiles. The MVFAST itself as described above can be
viewed as the main profile. The low, medium and high motion activity cases in Table 1 can be considered
individually as three other different profiles of MVFAST. Depending on the video coding applications, any one
of these individual profiles can be turned “ON” simply by adjusting the two parameters, L and L , in
1 2
Equation (1). If we set L = L = Search Range, we obtain “low motion activity” profile. The “medium motion
1 2
activity” profile (which is the same as Diamond Search, as described in VM Version 14) can be obtained, if we
set L = −1 and L = Search Range. For “high motion activity” profile, we can set L = L = −1. Note that in
1 2 1 2
this case, Search Range = 2*N, if the search in either coordinate is in the range [−N, N-1].
4 © ISO/IEC 2004 – All rights reserved

Although MVFAST is implemented in an intelligent way such that the overlap of search points is minimized
when the search pattern moves, few search points are visited more than once. This overlap can be avoided
by keeping the record of all the search points visited and testing if the current search point is visited earlier.
Thus further improvement over speed-up can be achieved.
The search point (0,0) is always tested in MVFAST. However, some improvement in computational gain is
obtained by testing (0,0) point only, if any of the motion vectors in the ROS has motion vector = (0,0).
Through extensive experiments using MVFAST, it is found that further improvement in objective quality can be
achieved when interlaced CCIR sequences with high global motion are coded in progressive mode, by
including the motion vector of collocated block on the previously coded non-intra frame in the set V. During
the motion estimation of interlaced pictures, each frame prediction of macroblock motion is performed before
field motion estimation. Therefore, for field motion estimation of current macroblock, its frame motion vector is
included in set V.
From hardware implementation viewpoint, to restrict the total number of search points for a block in the worst
case to be N, an additional stopping criterion  “stop the search when the number of search points visited so
far is equal to N”, can be included in SDS and LDS given in subclause 2.4.
2.2.6 Special Acknowledgements
Kai-Kuang Ma and Prabhudev Irappa Hosur would like to sincerely acknowledge tremendous support from
Professor Meng Hwa Er, Dean, School of Electrical and Electronic Engineering, and Deputy President of
Nanyang Technological University, Singapore, who plays a vital role on promoting and directing all Singapore
MPEG activities. For independent verification efforts, the following individuals are greatly acknowledged: Dr.
Weisi Lin, Mr. Chengyu Xiong, Dr. Ee Ping Ong, all from Institute of Microelectronics (IME), Singapore.
CONTACT PERSON:
Dr. Kai-Kuang Ma, School of Electrical and Electronic Engineering, Nanyang Technological University, Block
S2, Nanyang Avenue, Singapore 639798. Tel: +65-790-6366; Fax: +65-792-0415; Emails: ekkma@ntu.edu.sg
and kaikuang@hotmail.com.
2.3 Technical Description of PMVFAST
2.3.1 Introduction
This section provides the technical description of the Predictive Motion Vector Field Adaptive Search
Technique (PMVFAST) which adds some techniques from the Advance Predictive Diamond Zonal Search
(APDZS) [2] proposed by the Hong Kong University of Science and Technology (HKUST) to the MVFAST
core mentioned above to achieve larger speed up. The PMVFAST was contributed by Prof. Ming L. Liou, Dr.
Oscar C. Au, and Alexis Tourapis of HKUST. PMVFAST is faster than MVFAST at the expense of higher
hardware complexity
Several independent parties, Optivision Inc., Sarnoff Co., Mitsubishi Electric Information Technology Center
America, National Technical University of Athens (NTUA), and Beijing University of Aeronautics and
Astronautics (BUAA), conducted evaluation throughout the entire adoption process. For independent
verification efforts, the following individuals are greatly acknowledged: Dr. Weiping Li (from Optivision), Dr.
Hung-Ju Lee and Dr. Tihao Chiang (from Sarnoff), Mr. Anthony Vetro and Dr. Huifan Sun (from Mitsubishi), Mr.
Gabriel Tsechpenakis, Mr. Yannis Avtithis and Prof. Stefanos Kollias(from NTUA), and Prof. Bo Li, Yaming Tu
(from BUAA).
© ISO/IEC 2004 – All rights reserved 5

2.3.2 Technical Description of PMVFAST
PMVFAST combines the ‘stop when good enough’ spirit, the thresholding stopping criteria and the spatial and
temporal motion vector prediction of APDZS and the efficient large and small diamond search patterns of
MVFAST. Let the refBlock be the block in the reference frame at the same spatial location as the current block.
Without loss of generality, the distortion criterion is assumed to be the Sum-of-Absolute-Difference (SAD),
though it can be other measures. The predicted motion vector in PMVFAST is the median of the motion
vectors of three blocks spatially adjacent to the current block (left, top and top right), as in MPEG motion
vector predictive coding.
Firstly, the PMVFAST computes the SAD of the predicted motion vector (PMV), and stops if any one of two
stopping criteria is satisfied. The first criterion is that the PMV is equal to the motion vector of refBlock and the
SAD of PMV is less than that of refBlock. The second criterion is that the SAD of PMV is less than a threshold.
Secondly, the PMVFAST computes the SAD of some highly-probable motion vectors (MV of left, top and top
right spatially neighboring blocks, MV of (0,0) and MV of refBlock) and stops if any one of two stopping criteria
is satisfied. The first criterion is that the best motion vector so far is equal to the MV of refBlock and the
minimum SAD so far (MinSAD) is less than that of refBlock. The second criterion is that the MinSAD is less
than a threshold.
Thirdly, the PMVFAST selects the MV associated with minSAD and performs a local search using techniques
of MVFAST. If PMV is equal to (0,0) and the motion vectors of the three spatially adjacent blocks are identical
with large associated SAD, the large diamond search of MVFAST is applied. Otherwise, if the motion vectors
of the three spatially adjacent blocks are identical and are the same as the MV of refBlock, small diamond
search is applied with the simplication that only one small diamond pattern is examined. Otherwise, the small
diamond search of MVFAST is applied.
Here is the step-by-step algorithm of PMVFAST: The variables thresa, thresb are integers used as thresholds
in the stopping criteria.
(Initialization)
Step 1: Set thresholding parameters (thresa & thresb). These are set as follows:
If first row and column, thresa = 512, thresb = 1024
Else thresa = minimum value of the sad of left, top and top-right blocks. thresb = thresa + 256;
If thresa<512, thresa = 512. If thresa > 1024, thresa = 1024.
If thresb > 1792, thresb = 1792.
Set Found=0 and PredEq=0
Compute the predicted MV according to the Median rule.
Select previous MV, above, and above-right and calculate median.
If block is an edge block, depending to the position, do the following:
If block is on the first column, assume previous MV to be equal to (0,0).
If block is on the first row, select previous MV as the prediction.
If block is on the last column, assume above right MV to be equal to (0,0).
If left MV = top MV = top-right MV then set PredEq=1;

(Initial prediction calculation)
Step 2: Calculate Distance= |MedianMV | + |MedianMV | where MedianMV is the motion vector of the
X Y
median.
If PredEq=1 and MV = Previous Frame MV, set Found=2
predicted
Step 3: If Distance>0 or thresb<1536 or PredEq=1.
Select small Diamond Search. Otherwise select large Diamond Search.
Step 4: Calculate SAD around the Median prediction. MinSAD=SAD
If Motion Vector equal to Previous frame motion vector and MinSAD If SAD<=256 goto Step 10.
Step 5: Calculate SAD for motion vectors taken from left block, top, top-right, and Previous frame block. Also
calculate (0,0) but do not subtract offset.
Let MinSAD be the smallest SAD up to this point. If MV is (0,0) subtract offset.
6 © ISO/IEC 2004 – All rights reserved

Step 6: If MinSAD <= thresa goto Step 10.
If Motion Vector equal to Previous frame motion vector and MinSAD (Diamond Search)
Step 7: Perform Diamond search, with either the small or large diamond. If Found=2 only examine one
Diamond pattern, and afterwards goto step 10
Step 8: If small diamond, iterate small diamond search pattern until motion vector lies in the center of the
diamond. If center then goto step 10.
Step 9: If large diamond, iterate large diamond search pattern until motion vector lies in the center. Refine by
using small diamond and goto step 10.
(Final step. Use best MV found.)
Step 10: The motion vector is chosen according to the block corresponding to MinSAD.
By performing an optional local half-pixel search, we can refine this result even further.
2.3.3 Special Acknowledgement
M.L. Liou, O.C. Au, A.M. Tourapis would like to sincerely acknowledge the tremendous support from Prof.
Ishfaq Ahmad, the Department of Electrical and Electronic Engineering of the Hong Kong University of
Science and Technology (HKUST), the Hong Kong Telecom Institute of Information Technology (HKTIIT), the
Hong Kong Reseach Grants Council (RGC).
CONTACT PERSON: Dr. Oscar C. Au, Department of Electrical and Electronic Engineering, Hong Kong
University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Tel: +852 2358-7053;
Fax: +852 2358-1485; Emails: eeau@ust.hk.
2.4 Conclusions
The comparison of MVFAST vis-à-vis PMVFAST is given in Table 2.
© ISO/IEC 2004 – All rights reserved 7

Table 2 — Comparison of MVFAST and its derivative algorithm PMVFAST
MVFAST PMVFAST
Threshold No threshold or one optional A set of compulsory thresholds.
comparisons threshold Coding performance and
sensitivity of PMVFAST using
these thresholds for the video
sequences and encoding
conditions outside the MPEG-4
test set has not been studied and
verified.
Algorithmic Less complex Higher complexity
complexity
Memory size No need to store either search Memory is compulsory.
points or motion vectors
Up to 4 Mbytes of memory for a
search window size of +- 1024
to store search points and up to
1.3 kilobytes for storing motion
vectors
Memory bandwidth Not applicable (since no memory is Memory bandwidth is wasted for
needed) accessing the memory when each
search point is visited
Objective quality On average, about 0.2 dB less than On average, about 0.1 dB less
(PSNR) Full Search than Full Search
Speed-up About 50% faster than MVFAST
Scalability Scalable to three different profiles; Not scalable
where each profile is obtained by
simply assigning values to two
parameters in the beginning of the
algorithm.
The MVFAST is recommended as the core technology for fast motion estimation, since it is a generic solution
suitable for all encoding environments. However, if issues such as memory, algorithmic complexity and
threshold sensitivity are not of concern, then PMVFAST algorithm can be used. Therefore, MVFAST is
integrated as the core mode in Part 7.
3 Fast Global Motion Estimation
3.1 Introduction to Feature-based Fast and Robust Global Motion Estimation Technique
Sprite coding is an important technology in MPEG-4 encoder, but sprite coding could be hardly applied in real-
time application because the global motion estimation (GME) used in sprite coding is a time-consuming task.
In order to overcome this problem a feature-based fast and robust GME technique (FFRGMET) is proposed
by Tsinghua University, which improves the original GME method [3]. Comparison experimental results show
that FFRGMET improves the speed substantially. There are three significant improvements of FFRGMET
compared with original GME method.
8 © ISO/IEC 2004 – All rights reserved

(1) More Accurate Outlier Detection on the Base Level
Three-level pyramid is applied in the GME calculation. Local motion will affect GME when there is local
motion. Pixels undergoing local motion are the outliers in GME. Residual block based outlier detection
method is used in the base level of GME in FFRGMET. The original outlier detection method in VM is
residual histogram based, which could not represent the spatial distribution of the residuals. At the base
level of the estimation, the pixels belonging to the foreground object appear to show large residuals and
concentrate together to a compact region, so the residual block based outlier detection could help to locate
the outliers more accurately.
(2) More Robust Object Function for Parameter Estimation
The robust object function is used to replace the quadratic object function in VM. This object function is
adaptive to the variance of all pixels to be estimated, and is more robust to outliers than the quadratic form.
Robust object function is very important in FFRGMET because there are fewer pixels in calculation
compared with GME method in VM. Robust object function restrains the effect of outliers in GME. Robust
object function considers the residual and variance of the pixels set in the estimation.
(3) Fewer Pixels Used in GEM Calculation
On the intermediate level and base level of pyramid, it does not need to use all the pixels in the object
region to participate the GME calculation. There is much redundant information in whole background. So in
FFRGMET, only some feature pixels are selected as representatives in GEM. The feature pixels are
selected based on the spatial edge and the temporal difference, which could contribute more to the motion
estimation than other pixels. The speed of GME is accelerated because fewer pixels are used.

3.2 Technical Description of FFRGMET
3.2.1 Outlier Exclusion
Residual-block based method to exclude the outliers in FFRGMET calculation. The image is divided into
16×16 sized blocks. The block is regarded as a potential block to be rejected if the SAD of this block belongs
to the top 30% ordered by the SAD of a block.
16 16
SAD = γ ; γ is the residual of pixel (i,j).
∑∑
ij ij
ij==0 0
There are two steps to determine whether to reject the blocks in GME calculation or not. (a) Firstly if there is at
least four potential blocks to be rejected in the eight nearest neighbor blocks of current potential block to be
rejected, then the current potential block to be rejected will be rejected, otherwise the current potential block to
be rejected is reserved. (b) Secondly if there is a rejected block in the eight nearest neighbor of remainder
potential block to be rejected, then this potential block to be rejected will be rejected in the calculation of global
motion estimation.
3.2.2 Robust Object Function
The following robust object function is used in the Levenberg-Marquadet calculation of FFRGMET:
r
i
F = , σ = 1.253 E(r)
∑
2 2
(σ + r )
i
i
E(r) is the mean of absolute value of residual r. The weight function for the object function is:
2 2
weight(r) = 1/(σ + r )
© ISO/IEC 2004 – All rights reserved 9

3.2.3 Feature Selection
Feature pixels are selected in GME calculation according to the following conditions (Condition 7-1).
{(x, y) |
(UPTHRESH ⋅ E( I + I )) > ( I + I ) > (DOWNTHRESH ⋅ E( I + I ))
x y x y x y
AND }  (7-1)
( I > THRESHOLD ⋅ E (I ))
t t t
I and I are the spatial gradient components of luminance. I is the temporal gradient of luminance. E(x) is
x y t
the mean value of the set of x. The pixel that belongs to motion edge must meet the following two conditions.
The first is that its spatial gradient value must be larger than a predefined down threshold and less than a
predefined up threshold. Differential operator is sensitive to noise when the gradient value is small. So large
gradient value can reduce the influence of noise. And large gradient value means that there is an edge of
luminance. The second condition is that the absolute value of temporal gradient value must be larger than a
predefined threshold. I of that pixel will be small if the direction of global motion is perpendicular to the
t
direction of luminance gradient of the point. The pixels belonging to motion edge are used in the intermediate
and base levels of pyramid calculation of global motion estimation.
3.2.4 Algorithm Description
Following is the complete description of FFRGMET algorithm (Note that only Y-component data is used in the
following steps). A 6-parameter affine model is used in the FFRGMET.
Step1: Set parameters:
Maximum iterative steps for GN (Gauss-Newton) and LM (Levenberg-Marquadet) calculation
(MAX_TIMES=32)
Resistant factor of LM (RESISTANCE = 0.002) and amplificatory factor of LM (AMPLIFIER = 10.0)
Number of parameters to be estimated (PARNUM = 6)
Threshold of gradient value (DOWNTHRESH =1.25, UPTHRESH = 1.65)
Threshold of temporal gradient value (THRESHOLD =1.0)
t
Step2: Generate the three-level pyramid:
Use Gaussian down-sampling filter [1/4, 1/2, 1/4] on the original image to generate the images of the
three-level pyramid.
Step3: Calculate the motion parameters of top level:
Use three-step search method to calculate the initial two translational parameters of 6-parameter
affine motion model.
Exclude the top 10% of total pixels in the residual histogram.
Estimate the parameters based on the left pixels using GN method.
Step4: Project the parameters of the top level onto the intermediate level.
Step5: Calculate the motion parameters of the intermediate level:
Exclude the top 10% of total pixels using residual histogram.
Select the pixels according to condition 7-1.
Estimate the parameters of second level with the selected pixels using LM optimizing method.
Step6: Project the parameters of the intermediate level onto the base level.
Step7: Calculate the motion parameters of the base level:
If (VOP.Shape = Rectangle) Then
Exclude the residual blocks with top 1/3 SAD value using the residual-block based method.
Else
Exclude the top 10% of total pixels using residual histogram.
10 © ISO/IEC 2004 – All rights reserved

End If
Select the pixels according to condition 7-1.
Estimate the parameters of base level with the selected pixels using LM optimizing method.
Note: GN means Gauss-Newton optimizing method, LM means Levenberg-Marquadet optimizing method.
3.2.5 Perspectives on implementing FFRGMET
FFRGMET is faster than the original GME method in MPEG-4 VM, but it is more complex and the PSNR of
coding will have a little loss. The original GME method can be used if there is no requirement for speed,
otherwise FFRGMET can be used. All those predefined parameters can be changed, which is gotten from the
experimentation.
3.2.6 Special Acknowledgements
Shiqiang Yang, Yuzhuo Zhong, Wei Qi and Yuwen He would like to sincerely acknowledge tremendous
support from Professor Tihao Chiang and Dr. Huifang Sun, who are charge of Encoder Optimization. For
independent verification efforts, the following individuals are greatly acknowledged: Dr. Oscar C. Au, Dr.
Alexis M. Tourap, from the Hong Kong University of Science and Technology, China; Dr. Feng Wu, from
Microsoft Research China.
CONTACT PERSON: Prof. Shiqiang Yang, Department of Computer Science and Technology, Tsinghua
University, Beijing100084, China. Tel: +8610 6278-4141; Fax: +8610 6277-1138; Email:
yangshq@mail.tsinghua.edu.cn.
3.3 Conclusions
The feature-based fast and robust global motion estimation technique (FFRGMET) is about 7 times faster
than the original global motion estimation (GME) algorithm in MPEG-4 verification model (VM). But the PSNR
decreases about 0.06 dB on the average for luminance component. The total GMC coding speed is
accelerated about 3.5 times. FFRGMET for sprite coding is much faster than then original global motion
estimation method in MPEG-4 VM from the comparison at the cost of a little loss in PSNR, which is negligible.
Therefore, FFRGMET is integrated in Part 7.
4 Fast and Robust Sprite Generation
4.1 Introduction to Fast and Robust Sprite Generation
Just as those techniques for fast motion estimation and fast global motion estimation, the technique for sprite generation is
also very important for sprite coding. This section dedicates to describe the algorithm for fast and robust sprite generation.
Firstly, the described algorithm can significantly speed up the sprite generation compared with the method provided in
MPEG-4 video VM [3], meanwhile only the little extra memory is necessary. Secondly, the described algorithm can provide
better subjective visual quality as well. The visual quality is another key point for static sprite coding because the
background object is reconstructed by directly warping the sprite according to the definition of static sprite coding in
MPEG-4 standard. Furthermore, when no auxiliary mask information is available, a rough imag
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.

Loading comments...