Media Content Distribution (MCD); Subtitles distribution, situation and perspectives

DTR/MCD-00012

General Information

Status: Published
Publication Date: 12-May-2011
Current Stage: 12 - Completion
Due Date: 25-May-2011
Completion Date: 13-May-2011
Ref Project
Standard
tr_102989v010101p - Media Content Distribution (MCD); Subtitles distribution, situation and perspectives
English language
27 pages

Standards Content (Sample)


Technical Report
Media Content Distribution (MCD);
Subtitles distribution, situation and perspectives

ETSI TR 102 989 V1.1.1 (2011-05)

Reference
DTR/MCD-00012
Keywords
access, distribution, teletext, transmission
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
Individual copies of the present document can be downloaded from:
http://www.etsi.org
The present document may be made available in more than one electronic version or in print. In any case of existing or
perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF).
In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive
within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
http://portal.etsi.org/tb/status/status.asp
If you find errors in the present document, please send your comment to one of the following services:
http://portal.etsi.org/chaircor/ETSI_support.asp
Copyright Notification
No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2011.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™, TIPHON™, the TIPHON logo and the ETSI logo are Trade Marks of ETSI registered
for the benefit of its Members.
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
LTE™ is a Trade Mark of ETSI currently being registered
for the benefit of its Members and of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.
Contents
Intellectual Property Rights
Foreword
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 Subtitling in the media content distribution context
4.1 Subtitling vs. captioning
4.2 Bitmap and textual subtitles
4.3 Presentation techniques
4.3.1 On-screen techniques
4.3.2 Techniques complementary to subtitling facilitating other functionalities
4.4 3D-specific challenges
4.5 Necessity of subtitles distribution
5 Subtitles flow
5.1 In an SD-only content distribution
5.2 In an HD content distribution with HD-SDI transmission
5.3 In an HD content distribution with encoded transmission
5.4 In a content distribution with late component binding
6 Production
6.1 Off-line vs. live subtitling
6.2 Production formats for off-line contents
6.2.1 Historical formats
6.2.2 WebVTT
6.2.3 TTML
6.2.4 European Broadcasting Union
6.3 Production formats for live subtitling
7 Broadcasting
7.1 SD signal
7.2 HD signal
7.3 SMPTE-TT
8 Transmission
8.1 (HD-)SDI transmission
8.2 Encoded transmission
9 Distribution
9.1 Bitmap and textual distribution formats
9.2 Available formats
10 Synthesis and conclusions
10.1 For a better balance between text and bitmap formats
10.2 For a modern timed text format
10.3 Lack of a production standard
10.4 Lack of a broadcasting standard
10.5 Lack of a distribution standard
10.6 Wide application for a standardized timed text
Annex A: Bibliography
History
Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (http://webapp.etsi.org/IPR/home.asp).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This Technical Report (TR) has been produced by ETSI Technical Committee Media Content Distribution (MCD).
1 Scope
The present document analyses the current situation in the distribution of subtitling information associated with
television services.
2 References
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
reference document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
http://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication ETSI cannot guarantee
their long term validity.
2.1 Normative references
The following referenced documents are necessary for the application of the present document.
Not applicable.
2.2 Informative references
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] SMPTE 259M: "Television - SDTV Digital Signal/Data - Serial Digital Interface".
[i.2] SMPTE 292: "1.5 Gb/s Signal/Data Serial Interface".
[i.3] ETSI EN 300 743: "Digital Video Broadcasting (DVB); Subtitling systems".
[i.4] ETSI EN 300 706: "Enhanced Teletext specification".
[i.5] CEA-608: "Line 21 Data Services".
[i.6] EBU Tech. 3264-E, Specification of the EBU Subtitling Data Exchange Format, European
Broadcasting Union, February 1991.
NOTE: Available at: http://tech.ebu.ch/docs/tech/tech3264.pdf.
[i.7] Association media for all.
NOTE: Available at: http://www.mediaforall.eu/.
[i.8] ETSI TS 102 796: "Hybrid Broadcast Broadband TV".
[i.9] Directive 2007/65/EC of the European Parliament and of the Council of 11 December 2007
amending Council Directive 89/552/EEC on the coordination of certain provisions laid down by
law, regulation or administrative action in Member States concerning the pursuit of television
broadcasting activities.
[i.10] Directive 2002/19/EC of the European Parliament and of the Council of 7 March 2002 on access
to, and interconnection of, electronic communications networks and associated facilities
(Access Directive).
[i.11] Directive 2002/21/EC of the European Parliament and of the Council of 7 March 2002 on a
common regulatory framework for electronic communications networks and services
(Framework Directive, (FwD)).
[i.12] Directive 2002/22/EC of the European Parliament and of the Council of 7 March 2002 on universal
service and users' rights relating to electronic communications networks and services
(Universal Service Directive, (USD)).
[i.13] Directive 1999/5/EC of the European Parliament and of the Council of 9 March 1999 on radio
equipment and telecommunications terminal equipment and the mutual recognition of their
conformity (R&TTE Directive).
[i.14] ETSI TR 102 688-3: "Media Content Distribution (MCD); MCD framework; Part 3: Regulatory
issues, social needs and policy matters".
[i.15] WHATWG WebVTT, extracted from the WHATWG HTML specification.
NOTE: Available at http://www.whatwg.org/specs/web-apps/current-work/webvtt.html.
[i.16] W3C Recommendation: "Timed Text Markup Language (TTML)".
NOTE: Available at: http://www.w3.org/TR/ttaf1-dfxp/.
[i.17] Free TV Australia Operational Practice OP-47: "Storage and distribution of teletext subtitles and
VBI data for high definition television".
[i.18] SMPTE 2031: "Carriage of DVB/SCTE VBI Data in VANC".
[i.19] IETF RFC 3629: "UTF-8, a transformation format of ISO 10646".
[i.20] EBU R 110: "Subtitling on digital TV services".
[i.21] ETSI TS 101 547: "Digital Video Broadcasting (DVB); Frame Compatible Plano-Stereoscopic
3DTV".
[i.22] ETSI ES 202 432: "Human Factors (HF); Access symbols for use with video content and
ICT devices".
[i.23] ITU-T Recommendation Y.1901: "Requirements for the support of IPTV services".
[i.24] Directive 2002/21/EC of the European Parliament and of the Council of 7 March 2002 on a
common regulatory framework for electronic communications networks and services
(Framework Directive, (FwD)).
[i.25] Directive 98/34/EC: "Procedure for the provision of information in the field of technical
regulations and rules on information society services".
3 Definitions and abbreviations
3.1 Definitions
For the purposes of the present document, the following terms and definitions apply:
anti-aliasing: technique of minimizing the distortion artefacts that appear when rendering high-resolution objects
such as fonts on a low-resolution display such as a TV screen
captions: real-time on-screen transcript of the dialogue as well as any sound effects
NOTE 1: This service can be provided by means of either textual or graphical supplementary content. The captions
and the dialogue are usually in the same language. The service is primarily to assist users having
difficulty hearing the sound. Ideally, users may have some control over the position and size of the
presentation. Different speakers are distinguished, usually by different colours.
NOTE 2: This is based on ITU-T Recommendation Y.1901 [i.23], clause 3.2.4.
content provider (CP): actor making available any kind of content for which it has editorial responsibility, or
representing those with editorial responsibility
electronic communications service (ECS): service normally provided for remuneration which consists wholly or
mainly in the conveyance of signals on electronic communications networks, including telecommunications services
and transmission services in networks used for broadcasting, but excluding services providing, or exercising
editorial control over, content transmitted using electronic communications networks and services; it does not include
information society services, as defined in Article 1 of Directive 98/34/EC [i.25], which do not consist wholly or mainly
in the conveyance of signals on electronic communications networks
NOTE: This is based on the FwD [i.24], Article 2, Definitions, (c).
electronic communications service provider (ECSP): actor offering ECS; in the context of Media Content
Distribution (MCD), an actor that offers the content made available by Content Providers but exercises no editorial
control over it
encode: convert content from one format (typically raw) to another (typically lower bit rate)
sign language: language that uses a system of manual, facial, and other body movements as the means of
communication
subtitler: person in charge of creating subtitles for a content
teletext: data delivery system within television transmission
3.2 Abbreviations
For the purposes of the present document, the following abbreviations apply:
3D Three Dimensional
ANSI American National Standards Institute
ASCII American Standard Code for Information Interchange
ATSC Advanced Television Systems Committee
CBR Constant Bit Rate
CC Closed Caption
CP Content Provider
DFXP Distribution Format Exchange Profile
DVB Digital Video Broadcasting
DVD Digital Versatile Disc
ECI Experts Community Integrated production
ECSP Electronic Communications Service Provider
HD High Definition
HDTV High Definition Television
HTML Hyper Text Markup Language
IP Internet Protocol
IPTV Internet Protocol TeleVision
MXF Material eXchange Format
OCR Optical Character Recognition
PID Packet Identifier
SD Standard Definition
SDI Serial Digital Interface
SDTV Standard Definition Television
SMIL Synchronised Multimedia Integration Language
VANC Vertical ANCillary data space
VBI Vertical Blanking Interval
VBR Variable Bit Rate
VOD Video On Demand
W3C World Wide Web Consortium
XML Extensible Markup Language

4 Subtitling in the media content distribution context
4.1 Subtitling vs. captioning
In general, the term "caption" relates more to the original capture of all information (noise, sound, surrounding
conditions) associated with an event, whereas the term "subtitling" is used for the textual expression (in a picture or
as a sequence of words and letters) describing or reproducing the same event. Using this textual expression, different
types of presentation techniques (text overlaid on the image in one or several languages, sign language or even audio
description) may be used.
However, "captioning" is more common in the U.S.A. and is typically used for same-language transcription, for instance
for the hearing impaired; "subtitling" often refers to language translation of foreign programmes.
The terms "open captions" and "closed captions" are also very common:
• Open captions (also called "burnt-in", "in-vision", or "hardsubs") are encoded as a permanent part of the video
image, irreversibly merged into the frames. They cannot be disabled, and no specific "track" is needed to
convey them. Also, no special equipment or software is required for playback, and any character set (including
pure graphics) can be used. Consequently this technique allows for very complex transition effects and
animation (for instance karaoke song lyrics). However it does not allow for multiple user-selectable variants of
subtitling (such as English, English Hard of Hearing, and French).
• Closed captions are viewer-selectable and carried over a separate "track" or "component" (which may be
collocated inside the video track, but not in the video data itself). Examples of closed captions include the Line
21 Closed Captioning system in the USA (carried as text), Teletext subtitles in Europe (text as well) or DVB
subtitles (graphics). The carriage of this separate, discrete data stream may be problematic in some contexts.
Nowadays both the "subtitling" and the "captioning" terms are giving way to the expression "timed text". In the
other clauses of the present document, "subtitling" refers to both "subtitles" and "closed captions"; open captions are
carried inside the video data and do not require any specific handling.
In another categorization, digital video subtitles are sometimes called internal if they are embedded in a single video
file container along with the video and audio streams, and external if they are distributed as a separate file (which is
less convenient, but makes the file easier to edit).
4.2 Bitmap and textual subtitles
Clause 4.1 shows that subtitling data can be carried on a different track than video data, which requires special player
support, but allows the user to disable the subtitles, or choose a specific variant. Two types of presentation formats can
then be used:
• A bitmap, pre-rendered format, such as DVD Sub-Picture Units or DVB subtitles: the subtitles are stored as
images (generally not the same codec as the video, but a simpler algorithm, with minimal bit rate and colour
depth), and the player overlays these images over the video frames.
• A textual format, such as Teletext (for distribution) or an authoring format: the subtitles are stored textually
with instructions, usually a specially marked up text with time stamps and stylistic information (position,
colour, weight, etc.). They are rendered by the player (or converted to another format) and displayed over the
video at the specified time stamps.
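As an illustration of the textual case, a minimal Python sketch follows. The Cue structure, its field names and the
active_cues helper are hypothetical and are not taken from any of the formats cited in the present document; they only
show time-stamped text with stylistic information being selected by a player at a given presentation time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """One textual subtitle: time stamps, text and basic styling (hypothetical fields)."""
    start: float              # presentation start time, in seconds
    end: float                # presentation end time, in seconds
    text: str
    colour: str = "white"     # stylistic information carried with the text
    position: str = "bottom"


def active_cues(cues: List[Cue], t: float) -> List[Cue]:
    """Return the cues a player would render over the video at time t."""
    return [c for c in cues if c.start <= t < c.end]


cues = [
    Cue(1.0, 3.5, "Good evening."),
    Cue(3.5, 6.0, "Here is the news.", colour="yellow"),
]

# The player renders whatever is active at the current presentation time.
for cue in active_cues(cues, 4.0):
    print(f"[{cue.position}/{cue.colour}] {cue.text}")
```

A bitmap format would instead store a pre-rendered image for each time interval, leaving no text or styling for the
player to interpret.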
The choice of bitmap vs. textual formats depends on many factors, which are examined in more detail in clause 9.1.
Bitmap subtitles are currently more popular for the distribution step because they have historically allowed for a
better-looking user experience that is consistent between receivers.
There are many flavours of textual formats, especially in the production domain, but they are in general mutually
convertible. Typically, textual formats are easier to create, change, convert and re-use in other applications; they are
thus frequently used for fansubs (subtitles created by fans). Bitmap subtitles are difficult to convert or re-use,
though special OCR software exists to convert bitmap subtitles to a textual format.
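The ease of converting textual subtitles can be sketched in a few lines of Python. The example below is only an
illustration, assuming that cues are held as hypothetical (start, end, text) tuples with times in seconds; it serializes
them as a WebVTT-style document [i.15] and handles plain cue text only, leaving out styling and positioning.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues) -> str:
    """Serialize (start, end, text) tuples as a WebVTT-style document."""
    blocks = ["WEBVTT"]  # file header, followed by one block per cue
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


print(to_webvtt([(1.0, 3.5, "Good evening."), (3.5, 6.0, "Here is the news.")]))
```

No comparable shortcut exists for bitmap subtitles, where the text first has to be recovered from the images.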
4.3 Presentation techniques
4.3.1 On-screen techniques
When producing subtitling content, the first choice to make is the display mode:
• Pop-up mode: traditional display, scheduled, as in movie theatres.

Figure 1
• Cumulative mode: word by word and sentence by sentence, also called Roll, Push or Snake.

Figure 2
The cumulative mode typically takes a higher bandwidth than the pop-up mode. The presentation technique also greatly
depends on the nature of the programme, and the amount of time allocated to the task of subtitling.
NOTE: Regardless of the mode of display of the subtitles themselves, ETSI has already standardized the way for
an end-user to enable subtitling, via standardized icons ([i.22], clause 5.1).
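The bandwidth difference between the two display modes can be illustrated with a rough Python sketch. It rests on a
simplifying assumption that is not taken from the present document: every on-screen update retransmits the full visible
text, with one update per sentence in pop-up mode and one update per word in cumulative mode. The function names are
hypothetical.

```python
def popup_updates(sentence: str):
    """Pop-up mode: the whole sentence is sent and shown in one update."""
    return [sentence]


def cumulative_updates(sentence: str):
    """Cumulative mode: one update per word, each showing the text so far."""
    words = sentence.split()
    return [" ".join(words[: i + 1]) for i in range(len(words))]


def payload_size(updates) -> int:
    """Total characters sent, assuming each update carries its full visible text."""
    return sum(len(u) for u in updates)


sentence = "Here is the news"
print("pop-up:    ", payload_size(popup_updates(sentence)))
print("cumulative:", payload_size(cumulative_updates(sentence)))
```

Under this model the cumulative payload grows roughly with the square of the number of words, whereas the pop-up
payload grows linearly; real systems that only send the newly added words would narrow the gap.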
4.3.2 Techniques complementary to subtitling facilitating other
functionalities
Subtitling data can also be used as a source for other presentation techniques that are not technically "subtitling".
Among the European deaf population, approximately 12 % of the people are illiterate and cannot read subtitles. These
people often need sign language to understand the transmitted message. This normally requires a human translation
made by someone correctly perceiving the original message and reproducing it in sign language. However, there are
new techniques to synthesize so-called "Avatars", "Talking heads" or "3D puppets", which may be three-dimensional
images of a face that reproduces lip movements. The goal is to allow hearing-impaired people to follow a speaker, even
when the lips are hidden, or the speaker is off-screen. In order to facilitate lip reading, a hand is also represented to
add information on the consonants and vowels, which makes it possible to eliminate ambiguities between sounds
corresponding to the same shapes of the lips, such as "p" and "b". Some avatars also reproduce the simplified
spoken language, the most common sign language, which conveys meaning by simultaneously combining hand shapes,
orientation and movement of the hands, arms or body, and facial expression.
Those systems can take advantage of subtitling data; indeed they combine a phoneme recognition tool and a complex
driving system based on a model that makes it possible to associate a shape of the lips and face, and its computer
graphics translation, with each sound. Examples of these research projects are the ISA and PAROLE project teams of
INRIA Lorraine/LORIA, in partnership with the DATHA association (development of technological aids for hearing-
impaired persons), and Artus/Arte (shown below).

Figure 3: 3D puppet reproducing lip movements (Artus/Arte project)
Another presentation technique, which is not strictly speaking subtitling, is Braille, available on special keyboards
capable of activating a matrix of dots reproducing the message. It is especially useful for original version subtitling for
vision-impaired people. When growing old, a person may have difficulty reading the original version subtitles on
screen, and may use a Braille device for that purpose.
With the progress of automatic translation machines and voice synthesizers, it is easy to imagine that the text associated
with subtitling might in the near future be used as the basis for:
• the presentation of the subtitles in several languages, possibly selectable by the user; or
• audio description facilities used for blind or elderly people with visual impairments.
Textual formats facilitate the use of subtitling data by these presentation techniques; however the current trend is to
encourage bitmap subtitling.
4.4 3D-specific challenges
With 3D, the subtitling process becomes more complex because of the depth of the picture: subtitles have to appear in
front of the objects they overlay. If the subtitles are flat, they are impossible to read because they would appear behind
the action. Also, some people prefer the subtitles to always be 1 pixel in front of the most forward object or the object
of interest, which means that the subtitle depth can be dynamic.
The subtitle operator has to be in charge of this. If the depth is not respected, the picture or the subtitle will be visually
damaged, or uncomfortable for the viewer.
A new DVB specification [i.21] summarizes the normative requirements by DVB to carry and render
3D plano-stereoscopic content. DVB subtitles [i.3] have been extended to define subregions within regions, and assign
them a depth (disparity) indication. The disparity can also be temporally adjusted.
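The idea of a dynamic subtitle depth can be sketched in Python as follows. This is an illustration only and does not
reproduce the DVB signalling: it assumes a per-frame list of the disparity (in pixels) of the most forward object, and
an assumed sign convention in which a more negative disparity means closer to the viewer. Function names and the
window size are hypothetical.

```python
def subtitle_disparities(nearest_object_disparity, margin=1):
    """Per-frame subtitle disparity, `margin` pixels in front of the nearest object.

    Assumed convention: disparity is in pixels and a more negative value
    means closer to the viewer.
    """
    return [d - margin for d in nearest_object_disparity]


def hold_nearest(values, window=3):
    """Temporal adjustment: for each frame, keep the nearest (minimum) value
    within a centred window, so the subtitle moves in depth less abruptly
    and is never placed behind the per-frame target."""
    half = window // 2
    return [min(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


# Disparity of the most forward object in six consecutive frames (made-up values).
nearest = [-2, -2, -8, -3, 0, 0]
per_frame = subtitle_disparities(nearest)
print(per_frame)
print(hold_nearest(per_frame))
```

Taking the minimum over the window is one possible choice; averaging would give smoother motion but could briefly place
the subtitles behind a fast-approaching object.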
4.5 Necessity of subtitles distribution
Subtitling is a very important feature for users with hearing difficulties. These difficulties may have a genetic origin,
result from an accident (e.g. acoustic shock) or from the common biological human ageing process, or be associated with
the fact that the user is listening to a language or a dialect that is not their own. The overall percentage of the
population affected is increasing, since life expectancy, the number of migrants and the circulation of audiovisual
contents are all increasing.
Governments are getting more and more concerned with the fate of impaired users, and many regulatory authorities
have taken measures to enhance the integration of the affected groups of society. Some authorities require the bigger
content providers to implement subtitling features on a significant part of their programmes, and require communications
service providers and network operators to support these services appropriately. These actions normally take place in the
context of, or result from, studies in the area of e.inclusion and more specifically e.accessibility. TR 102 688-3 [i.14]
explains the European regulatory and policy environment and refers to this area of issues in clauses 9.6.2, 9.6.4 and 9.6.7.
The legislated requirements of digital broadcasting in Europe (European directive 2007/65/EC [i.9]) have prompted an
increase in the share of captioned programmes, up to 100 % by the end of 2015 for the latest European country. In
France the law 2005-102 requires broadcasters whose mean audience is over 2.5 % of the population to caption 100 %
of their programmes by 2010. The bases of these measures are cited in TR 102 688-3 [i.14], clause 6.2.2.
At the level of communications services, the e-communications directives, and particularly the framework [i.11], the
universal service [i.12] and the access (and interconnection) [i.10] directives, were revised to underline the importance
of supporting services for people with disabilities. This is explained in TR 102 688-3 [i.14], clause 7.1 and
further subclauses of clause 7.
The EU directive covering telecommunications terminal equipment (the R&TTE directive [i.13]) also considers in
article 3.3.f the need to support "certain features in order to facilitate its use by users with a disability". This is referred
to in TR 102 688-3 [i.14], clause 8.2.1, but has not yet been translated into specific measures in the European
regulatory environment.
Regulatory measures more often address real-time broadcasting, but the general recognition of the importance of
subtitling services is very likely to make it a general market requirement for all types of distribution of media contents,
including on-demand services, over managed and non-managed networks, using broadcast, unicast and multicast
techniques.
Subtitles are also used in many countries to display movie contents with original audio; original audio is a plus for
premium channels and is more and more often required.
Therefore subtitling is nowadays considered an essential compo
...
