ISO 24616:2012 provides a generic platform for modeling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modeling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to, XLIFF (Localization Interchange File Format), TMX (Translation Memory eXchange), smilText (Synchronized Multimedia Integration Language) and ITS (Internationalization Tag Set).
Gestion des ressources langagières — Plateforme d'informations multilingues
SIST ISO 24616:2013
Language resources management —
Multilingual information framework
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
1 Scope
This International Standard provides a generic platform for modelling and managing multilingual information in
various domains: localization, translation, multimedia annotation, document management, digital library
support, and information or business modelling applications. MLIF (multilingual information framework)
provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains.
MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to,
XLIFF, TMX, smilText and ITS.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 8879, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau
(Editors), W3C Recommendation, 26 November 2008
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
data category attached to a component of a metamodel
inline code
inline instructions inserted in a source document
Note to entry: Native code can, for instance, provide presentational instructions (e.g. HTML codes).
textual versions of the dialog in films, television programs, video games, etc., usually displayed at the bottom
of the screen
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
The MLIF specification complies with the modelling principles of UML as defined by the Object Management
Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In line with Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that
is adorned by data categories, as defined in ISO 12620.
4.3 XML serialization
Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML
serialization”, in line with Extensible Markup Language (XML) as defined in the W3C Recommendation listed in
Clause 2, itself a restricted form of SGML (ISO 8879).
5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.
Figure 1 — MLIF metamodel
The MLIF metamodel is defined by the following seven "core components". These components are listed as
follows, according to their XML serialization:
- <MLDC> (Multilingual Data Collection), which represents a collection of data containing global information
and several multilingual units;
- <GI> (Global Information), which represents technical and administrative information applying to the
entire multilingual data collection;
- <GroupC> (Grouping components), which represents a sub-collection of multilingual data that have a
common origin or purpose within a given project;
- <MultiC> (Multilingual Component), which groups together all variants of a given textual content;
- <MonoC> (Monolingual Component), which groups together information related to one language and is
part of a multilingual component (MultiC);
- <HistoC> (History Component), which traces modifications to the component to which it is anchored (i.e.
<GI>, <MultiC> or <MonoC>);
- <SegC> (Segmentation Component), which allows any level of segmentation for textual information,
possibly in a recursive manner.
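EXAMPLE (informative sketch, not part of the metamodel specification) The seven core components can be pictured as nested record types. The Python sketch below uses the component abbreviations found in Clause 7 (GI, GroupC, MultiC, MonoC, SegC, HistoC) plus MLDC for the Multilingual Data Collection. The field names (`variants`, `segments`, `history`, `info`, and so on), the exact containment (for instance, that a GroupC holds MultiC units) and the helpers `languages` and `depth` are assumptions made for illustration only; Figure 1 remains the authoritative description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HistoC:
    """History Component: one traced modification (field names are hypothetical)."""
    action: str
    note: str = ""


@dataclass
class SegC:
    """Segmentation Component: a piece of text that may be segmented recursively."""
    text: str = ""
    children: List["SegC"] = field(default_factory=list)


@dataclass
class MonoC:
    """Monolingual Component: information for exactly one working language."""
    lang: str
    segments: List[SegC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MultiC:
    """Multilingual Component: all language variants of one textual content."""
    variants: List[MonoC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class GroupC:
    """Grouping Component: multilingual units sharing an origin or purpose."""
    units: List[MultiC] = field(default_factory=list)


@dataclass
class GI:
    """Global Information: technical and administrative data for the collection."""
    info: Dict[str, str] = field(default_factory=dict)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MLDC:
    """Multilingual Data Collection: global information plus grouped units."""
    gi: GI
    groups: List[GroupC] = field(default_factory=list)


def languages(unit: MultiC) -> List[str]:
    """List the working languages present in one multilingual unit."""
    return [mono.lang for mono in unit.variants]


def depth(seg: SegC) -> int:
    """Return the nesting depth of a (possibly recursive) segmentation."""
    return 1 + max((depth(child) for child in seg.children), default=0)


# Usage: one sentence pair, with the English sentence segmented into words.
sentence = SegC("The meal is nice.", [SegC("The"), SegC("meal"), SegC("is"), SegC("nice")])
unit = MultiC([MonoC("en", [sentence]), MonoC("fr", [SegC("Le repas est bon.")])])
collection = MLDC(GI({"source": "illustrative data"}), [GroupC([unit])])
```

The recursive `children` list on `SegC` is one way to mirror the statement that segmentation may be applied at any level.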
6 MLIF compliance
Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:
by fully implementing the MLIF metamodel starting at the level of the Multilingual Data Collection;
by specifically embedding MLIF-compliant information within another model, by implementing one of the
lower level MLIF elements, namely , or .
7 Metamodel adornment
7.1 Introduction
The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the
following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI
guidelines, some attributes are specified by means of a class attribute, with the
convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes
are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The
specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:
- the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working
language of any relevant element and, in particular, shall be used systematically for any implementation
of MonoC;
- the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier
to an element of the MLIF metamodel.
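EXAMPLE (informative sketch) The two requirements above lend themselves to a mechanical check. The Python sketch below, which uses only the standard library, reports every MonoC element that lacks xml:lang and every xml:id value that occurs more than once. The function name `check_generic_attributes` and the two sample fragments are hypothetical, and the fragments are deliberately minimal rather than complete MLIF documents.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is bound to this namespace by the XML specifications.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LANG = f"{{{XML_NS}}}lang"
XML_ID = f"{{{XML_NS}}}id"


def check_generic_attributes(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    seen_ids = set()
    for elem in ET.fromstring(xml_text).iter():
        ident = elem.get(XML_ID)
        if ident is not None:
            if ident in seen_ids:
                problems.append(f"duplicate xml:id: {ident}")
            seen_ids.add(ident)
        # xml:lang is checked on MonoC only, where it is used systematically.
        if elem.tag == "MonoC" and not elem.get(XML_LANG):
            problems.append("MonoC without xml:lang")
    return problems


# Hypothetical fragments used only to exercise the checks.
GOOD = '<MultiC xml:id="m1"><MonoC xml:lang="en"/><MonoC xml:lang="fr"/></MultiC>'
BAD = '<MultiC xml:id="m1"><MonoC xml:id="m1"/></MultiC>'
```

A check of this kind covers only the two generic attributes; it says nothing about the remaining adornments.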
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC
7.8 Recommended adornment for HistoC
The HistoC component is a generic component that traces modifications made on the component to which it is
anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be
anchored to the GI, MultiC or MonoC component. This makes it possible for all evolutions of, or
enhancements to, the component to be recorded.
HistoC may be adorned by four elements:
7.9 Recommended online annotation adornment
Multilingual text documents are often only one stage in a complex workflow that involves external document
sources in a wide variety of formats. From these, it is often necessary to keep inline markup indicating the
presentational features that have to be retained in a translated target document. To this end, MLIF-compliant
applications should use the following elements, in relation to the element, that map onto similar
subsets in TMX and XLIFF:
7.10 Recommended adornment for localization
All the following elements should be used to provide localization-related information:
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
The following elements should be used when textual content has to be conveyed (in written or spoken form)
together with some constraints:
8 Relation with other standards
As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a
metamodel that combines with selected data categories as a way of ensuring interoperability between several
multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the
translation relations between them. In each domain where MLIF is applicable, a specific granularity may be
considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF
[ISO 24615] and TMF for morphological description, syntactical annotation and terminological description,
respectively.
MLIF supports the construction and the interoperability of localization and translation memory resources,
and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed
list of description features. Rather, it provides a list of data categories that is much easier to update and
extend. This list represents a point of reference for multilingual information in the context of various application
domains.
However, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactical fragment, word
and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and
section). In addition, MLIF allows for external and internal links (annotations and references).
MLIF is designed to provide a common framework that facilitates the interoperability with formats such as
TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them
deal with multilingual data expressed in the form of segments or text units. Both can be stored, manipulated
and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
Example using MLIF for Computer-Assisted Translation (CAT)
The main reason for including lemma, part-of-speech and morphological features is to allow CAT tools based on
translation memory to produce translations of new words and sentences that are not in the translation
memory.
For example, using a translation memory that contains the English sentence "The meal is nice." and its
translation in French "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench
are not able to provide the predicted translation for the sentence "The meals are nice." even though the word
lemmas of "The meal is nice." and "The meals are nice." match. This weakness is due to the fact that
these tools use limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:
creationtool="TRADOS Translator's Workbench for Windows"
creationtoolversion="Edition 8 Build 863"
o-tmf="TW4Win 2.0 Format"
The meal is nice.
Le repas est bon.
To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following steps:
Step-1 Represent in MLIF and add linguistic properties to all the words within the translation memory.
Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word
Step-3 Translate the lemmas using an English-to-French bilingual lexicon.
Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form using the
lemma and morphological features.
Step-5 Generate the translation of "The meals are nice." by substituting each English word with its French
inflected form as follows:
"The meals are nice." => "Les repas sont bons."
SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is
given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this
product.
The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), with a word
segmentation and tagset defined in MAF:
The meal is nice.
Le repas est bon.
tag="#mP #p1 #nS">is
tag="#gM #nS">Le
tag="#gM #nS">repas
tag="#mP #p1 #nS">est
tag="#gM #nS">bon
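EXAMPLE (informative sketch) Steps 1 to 5 of this annex can be illustrated with a toy pipeline. Everything in the Python sketch below is an assumption made for illustration: the three small lexicons, the feature labels "sg" and "pl", the function name `translate`, and the simplification that the grammatical number of the noun is applied to every word of the sentence. A real tool would take this information from the MLIF and MAF annotations and from full-size lexicons and a proper tagger.

```python
# Steps 1 and 2 (toy stand-in for a tagger): surface form -> (lemma, part of speech, number).
EN_ANALYSES = {
    "the": ("the", "det", None),
    "meal": ("meal", "noun", "sg"),
    "meals": ("meal", "noun", "pl"),
    "is": ("be", "verb", "sg"),
    "are": ("be", "verb", "pl"),
    "nice": ("nice", "adj", None),
}

# Step 3: English-to-French bilingual lexicon, keyed by lemma.
BILINGUAL = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}

# Step 4: French lexicon of inflected forms (masculine forms only, for brevity).
FR_FORMS = {
    ("le", "sg"): "le", ("le", "pl"): "les",
    ("repas", "sg"): "repas", ("repas", "pl"): "repas",
    ("être", "sg"): "est", ("être", "pl"): "sont",
    ("bon", "sg"): "bon", ("bon", "pl"): "bons",
}


def translate(sentence):
    """Translate a one-clause English sentence word by word via lemmas and features."""
    words = sentence.rstrip(".").lower().split()
    analyses = [EN_ANALYSES[word] for word in words]
    # Simplifying assumption: the noun's number governs agreement for all words.
    number = next(num for (_, pos, num) in analyses if pos == "noun")
    # Step 5: substitute each word with the French inflected form of its lemma.
    french = [FR_FORMS[(BILINGUAL[lemma], number)] for (lemma, _, _) in analyses]
    text = " ".join(french)
    return text[0].upper() + text[1:] + "."


print(translate("The meals are nice."))  # prints: Les repas sont bons.
```

The sketch reproduces the result given in Step-5 without the plural sentence being stored anywhere: only lemmas and morphological features are looked up.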
Annex B
Example: representing TMX data
B.1 Introduction
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of
Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The
purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation
vendors with little or no loss of critical data during the process. TMX, which has been on the market since
1998, is a certifiable standard format. It was developed, and is maintained by, OSCAR (Open Standards for
Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF
TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macro-structure map to
MLIF as follows:
maps onto the element;
is a container for the element and maps onto the element;
maps onto the element;
maps onto the element;
maps onto the element;
of type term maps onto the element of type term.
Further TMX elements and attributes map onto MLIF elements as follows:
The "creationtool" attribute maps onto the element;
The "creationdate" attribute maps onto the element;
The "tuid" attribute maps onto the element within MultiC.
The element does not map onto any specific element as it represents a generic placeholder for
application-dependent data. When applicable, a specific element is explicitly mapped onto MLIF
elements or onto a standardized ISO/TC 37 data category as available from ISOCat.
B.3 Example of data
The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and
does not translate all the details of the header.
creationtool="Heartsome TM Server"
Le processus de contrôle de
qualité en dix étapes qu'il a créé il y a plus
de 1300 ans est beaucoup plus complet et précis que ceux
existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300 years
ago is far more thorough and exacting than any existing
today.
El proceso de control de
calidad en diez pasos que inició hace más de
1300 años es mucho más completo y preciso que los que
existen en la actualidad.
Il suo metodo di controllo di qualità in 10 fasi risale a più
di 1300 anni fa ed è molto più accurato e preciso di
qualsiasi metodo attuale.
그가 1300여년 전 시작한 10단계 품질
관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.
The corresponding MLIF default representation is as follows:
Heartsome TM Server
Le processus de contrôle
de qualité en dix étapes qu'il a créé il y a
plus de 1300 ans est beaucoup plus complet et précis que
ceux existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300
years ago is far more thorough and exacting than any
existing today.
B.4 Example of TMX and MLIF interaction
Figure B.1 illustrates the interaction between TMX and MLIF. This process involves successive steps of
extraction, translation and merging. The process begins with a TMX document containing linguistic content in
English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2) containing all TM
formatting information, and an MLIF Document Linguistic Content (3) in which only relevant linguistic
information is stored. As most translators (human beings or automatic software modules) work with
TMX-oriented software tools, an XSL style-sheet makes it possible to transform the MLIF document into a TMX
document. This TMX file does not contain any formatting information. Once the translator has added the
appropriate Japanese (ja) translation, another XSL style-sheet transforms the TMX document into an MLIF
document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the
“Skeleton File” to produce a new TMX formatted document (5).
Figure B.1 — TMX and MLIF interaction
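EXAMPLE (informative sketch) The extraction and merging steps described above can be approximated with the Python standard library. In the sketch below, the "Skeleton File" is modelled as the TMX tree with its translation units removed, and the linguistic content as a plain list of language-to-text mappings. That data model, the function names `extract` and `merge`, the sample document and its `creationtool` value are all assumptions made for illustration. The XSL transformations are not reproduced, the sample header omits attributes that a complete TMX header would carry, and inline markup inside segments is not handled.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, deliberately minimal TMX-like input with English and German content.
SAMPLE_TMX = (
    '<tmx version="1.4"><header creationtool="ExampleTool"/>'
    '<body><tu>'
    '<tuv xml:lang="en"><seg>Hello</seg></tuv>'
    '<tuv xml:lang="de"><seg>Hallo</seg></tuv>'
    '</tu></body></tmx>'
)


def extract(tmx_text):
    """Split a TMX document into a skeleton tree and its linguistic content."""
    skeleton = ET.fromstring(tmx_text)
    body = skeleton.find("body")
    content = []
    for tu in list(body):
        # One dict per translation unit: language code -> segment text.
        content.append({tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")})
        body.remove(tu)
    return skeleton, content


def merge(skeleton, content):
    """Rebuild the translation units inside the skeleton and serialize the result."""
    body = skeleton.find("body")
    for unit in content:
        tu = ET.SubElement(body, "tu")
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(skeleton, encoding="unicode")


# Usage: extract, add a Japanese translation, then merge back.
skeleton, content = extract(SAMPLE_TMX)
content[0]["ja"] = "こんにちは"
new_tmx = merge(skeleton, content)
```

Keeping the header in the skeleton and only the text in `content` reflects the separation between formatting information and linguistic information that the process relies on.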
Annex C
Example of XLIFF data representation
C.1 Introduction
The purpose of XLIFF is to define and promote the adoption of a specification for the interchange of
localizable software- and document-based objects and related metadata.
