Computer applications in terminology — Terminological markup framework

ISO 16642:2017 specifies a framework for representing data recorded in terminological data collections (TDCs). This framework includes a metamodel and methods for describing specific terminological markup languages (TMLs) expressed in XML. The mechanisms for implementing constraints in a TML are defined, but not the specific constraints for individual TMLs. ISO 16642:2017 is designed to support the development and use of computer applications for terminological data and the exchange of such data between different applications. This document also defines the conditions that allow the data expressed in one TML to be mapped onto another TML.

Applications informatiques en terminologie — Plate-forme pour le balisage de terminologies informatisées

Računalniške aplikacije v terminologiji - Ogrodje za označevanje terminologije

General Information

Status: Published
Publication Date: 19-Nov-2017
Current Stage: 9092 - International Standard to be revised
Completion Date: 23-Jun-2023

Buy Standard

Standard
ISO 16642:2017 - Computer applications in terminology -- Terminological markup framework
English language
21 pages
Standard
ISO 16642:2018
English language
27 pages

Standards Content (Sample)

INTERNATIONAL STANDARD ISO 16642
Second edition
2017-11
Computer applications in terminology — Terminological markup framework
Applications informatiques en terminologie — Plate-forme pour le balisage de terminologies informatisées
Reference number ISO 16642:2017(E)
© ISO 2017

COPYRIGHT PROTECTED DOCUMENT
© ISO 2017, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Modular approach
5 Generic model for describing terminological data
5.1 Principles
5.2 Generic representation of components and information units
5.3 The metamodel
5.4 Example
6 Requirements for compliance to TMF
7 Interchange and interoperability
8 Representing languages
9 Defining a TML
9.1 Steps
9.2 Defining interoperability conditions
10 Implementing a TML
10.1 General
10.2 Implementing the metamodel
10.3 Anchoring data categories on the XML outline
10.3.1 General
10.3.2 Styles and vocabulary
10.4 Constraints on datatypes
10.5 Implementing annotations
10.6 Implementing brackets
Annex A (informative) Conformance of terminological data to TMF: example scenario
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following
URL: www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Terminology and other language and
content resources, Subcommittee SC 3, Computer applications for terminology.
This second edition cancels and replaces the first edition (ISO 16642:2003), which has been technically
revised.
The main changes compared to the previous version are as follows:
— The following formats are no longer actively used. Consequently, references to these formats have
been removed (including Annex A, Annex B, and Annex C):
— Martif with specified constraints (MSC);
— Geneter;
— Data category interchange format (DCIF);
— Generic mapping tool (GMT).
— With the removal of Annex B and Annex C, this document no longer includes any comprehensive
code examples of a TML. Examples of TMLs are now available in ISO 30042, TermBase eXchange,
and also at the following Web site: www.tbxinfo.net.
— References to the former ISO/TC 37 Data Category Registry or ISOcat have been changed from
normative to informative. In addition, the name has changed to DatCatInfo, now as an example of
data category repositories.
— References to ISO 12620:1999 and ISO 12620:2009 have been removed. These previous standards
have been withdrawn.
— The TypedValuedElement style has been added.
— Examples have been updated to reflect ISO 30042:2008 (TBX). TBX-Basic is mentioned as a TML.
— Some of the examples and tables have been moved to appropriate sections.
— As a consequence of the aforementioned changes, some historical, didactic, or duplicate information
has been removed to adhere more closely to ISO editorial standards.

Introduction
Terminological data are collected, managed and stored in a wide variety of systems, typically various
kinds of database management systems, ranging from personal computer applications for individual
users to large terminological database systems operated by major companies and governmental
agencies. Terminology databases comprise various types of information, called data categories,
and can adopt different structural models. However, terminological data often need to be shared and
reused in a number of applications, and this sharing is facilitated when the data adhere to a common
model. To facilitate co-operation and to prevent duplicate work, it is important to develop standards
and guidelines for creating and using terminological data collections (TDCs) as well as for sharing and
exchanging data.
This document presents a modular approach for analysing existing TDCs and designing new ones. It also
provides a framework for defining terminological markup languages (TMLs) that are interoperable.
This document makes reference to DatCatInfo, an example of an available data category repository.
DatCatInfo is an online database of information about the types of data that can be included in
terminological data collections and other language resources. It is available at www.datcatinfo.net.
1 Scope
This document specifies a framework for representing data recorded in terminological data collections
(TDCs). This framework includes a metamodel and methods for describing specific terminological
markup languages (TMLs) expressed in XML. The mechanisms for implementing constraints in a TML
are defined, but not the specific constraints for individual TMLs.
This document is designed to support the development and use of computer applications for
terminological data and the exchange of such data between different applications. This document also
defines the conditions that allow the data expressed in one TML to be mapped onto another TML.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 704, Terminology work — Principles and methods
ISO 1087-1, Terminology work — Vocabulary — Part 1: Theory and application
ISO 3166-1, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes
ISO 26162, Systems to manage terminology, knowledge and content — Design, implementation and
maintenance of terminology management systems
ISO 30042:2008, Systems to manage terminology, knowledge and content — TermBase eXchange (TBX)
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
basic information unit
information unit (3.12) attached to a component (3.3) of the metamodel and that can be expressed by
means of a single data category (3.6)
3.2
complementary information
CI
information supplementary to that described in terminological entries (3.22) and shared across the
terminological data collection (3.21)
Note 1 to entry: Domain hierarchies, institution descriptions, bibliographic references and references to text
corpora are typical examples of complementary information.
3.3
component
elementary description unit of a metamodel to which data categories (3.6) can be associated to form a
data model
3.4
compound information unit
information unit (3.12) attached to a component (3.3) of the metamodel that is expressed by means of
several grouped data categories (3.6), that, taken together, express a coherent unit of information
3.5
conceptual domain
set of valid value meanings associated with a data category (3.6)
Note 1 to entry: For example, the data category /part of speech/ could have the following conceptual domain:
/noun/, /verb/, /adjective/, /adverb/, and so forth.
3.6
data category
elementary descriptor used in a linguistic description or annotation scheme
Note 1 to entry: In this document, data categories are indicated in between forward slashes (/), e.g. /definition/.
3.7
data category repository
DCR
electronic repository of data category specifications (3.9) to be used as a reference for the definition of
linguistic annotation schemes or any other representation model for language resources
Note 1 to entry: A DCR for language resources is available at http://www.datcatinfo.net.
3.8
data category selection
DCS
set of data categories (3.6) selected from a DCR (3.7)
3.9
data category specification
set of attributes used to fully describe a given data category (3.6)
Note 1 to entry: The abbreviation “DCS” is associated with data category selection and is not used for data
category specification.
3.10
expansion tree
structured group of XML elements that implement a level of the metamodel in a given TML (3.23)
3.11
global information
GI
technical and administrative information applying to the entire terminological data collection (3.21)
Note 1 to entry: For example, the title of the terminological data collection, revision history, owner or copyright
information.
3.12
information unit
IU
elementary piece of information attached to a structural level of the metamodel
3.13
language section
LS
part of a terminological entry (3.22) containing information related to one language
Note 1 to entry: One terminological entry may contain information on one or more languages.
3.14
object language
language being described
3.15
persistent identifier
PID
unique Uniform Resource Identifier (URI) that assures permanent access for a digital object by
providing access to it independently of its physical location or current ownership
3.16
structural node
instance of component (3.3) within the representation of a terminological data collection (3.21)
3.17
structural skeleton
abstract description of an instance of a terminological data collection (3.21) in conformity with the
metamodel
3.18
style
specification for the implementation of a data category (3.6) in XML
3.19
term component section
TCS
part of a term section (3.20) giving linguistic information about the components of a term
3.20
term section
TS
part of a language section (3.13) giving information about a term
3.21
terminological data collection
TDC
resource consisting of terminological entries (3.22) with associated metadata and documentary
information
3.22
terminological entry
TE
part of a terminological data collection (3.21) which contains the terminological data related to one concept
Note 1 to entry: Every element in the TE can be linked to complementary information, to other terminological
entries and to other elements in the same terminological entry.
3.23
terminological markup language
TML
XML format for representing a terminological data collection (3.21) conforming to the constraints
expressed in this document
3.24
Unified Modeling Language
UML
language for specifying, visualizing, constructing and documenting the artifacts of software systems
3.25
vocabulary
set of strings used to implement a data category (3.6) according to a style (3.18)
3.26
working language
language used to describe objects
3.27
XML outline
part of a terminological data collection (3.21) corresponding to the XML implementation of the
metamodel
4 Modular approach
Terminological Markup Framework (TMF) consists of two levels of abstraction. The first (and most
abstract) level is the metamodel level. The metamodel level supports analysis, design and exchange at
a very general level, i.e. it is independent of any specific implementation or software. The metamodel
shall be shared by all TDCs that are compliant with TMF. The second level is the data model level, which
adds the necessary data categories for representing specific TDCs.
The implementation of a data model in XML is called a terminological markup language (TML). TMLs
can be described on the basis of a limited number of characteristics, namely
— how the TML expresses the structural organization of the metamodel (i.e. the expansion trees of
the TML);
— the specific data categories used by the TML and how they relate to the metamodel;
— the way in which these data categories can be expressed in XML and anchored on the expansion
trees of the TML, i.e. the XML style of any given data category;
— the vocabularies used by the TML to express those various informational objects as XML elements
and attributes according to the corresponding XML styles.
Figure 1 represents the information required to fully specify a TML.
— The metamodel describes the basic hierarchy of components to which any TML shall conform.
— A set of data category specifications from a data category repository forms the basis for
defining a data category selection (DCS) for the TML.
— The dialectal specification (dialect) includes the various elements needed to represent a given TML
in an XML format. These elements comprise expansion trees and data category instantiation styles,
together with their corresponding vocabularies.
Figure 1 — Various knowledge sources involved in the description of a TML
A DCR providing sample data category specifications for language resources is available at
www.datcatinfo.net. Where possible, data categories documented in this DCR should be used for a TML. If
no suitable data category is available in this DCR, the implementers of the TML should propose the
creation of the required data category specification within this DCR.
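The knowledge sources shown in Figure 1 can be pictured as one record per TML. The following Python sketch is illustrative only and is not part of the standard: the class and field names, the `exampleTML` dialect and all of its element names are hypothetical, and the list of seven levels assumes that the metamodel levels are the ones abbreviated in Clause 3 (TDC, GI, CI, TE, LS, TS, TCS).

```python
from dataclasses import dataclass, field

# Assumption: the metamodel levels are the structural terms abbreviated in Clause 3.
METAMODEL_LEVELS = ("TDC", "GI", "CI", "TE", "LS", "TS", "TCS")


@dataclass
class DataCategoryAnchor:
    """One data category of the DCS and how the dialect expresses it (hypothetical shape)."""
    name: str        # data category, e.g. "definition"
    level: str       # metamodel level the category is anchored on
    style: str       # XML style name, e.g. "TypedValuedElement"
    vocabulary: str  # string used in the XML to realise the category


@dataclass
class TmlSpecification:
    """Bundles expansion trees, data categories, styles and vocabularies for one TML."""
    name: str
    expansion_trees: dict = field(default_factory=dict)  # level -> XML element name
    anchors: list = field(default_factory=list)

    def problems(self):
        """Return human-readable gaps in the specification (empty list if none)."""
        found = []
        for level in METAMODEL_LEVELS:
            if level not in self.expansion_trees:
                found.append(f"no expansion tree for level {level}")
        for anchor in self.anchors:
            if anchor.level not in METAMODEL_LEVELS:
                found.append(f"/{anchor.name}/ anchored on unknown level {anchor.level}")
        return found


# Hypothetical dialect: none of these element names come from a real TML.
example = TmlSpecification(
    name="exampleTML",
    expansion_trees={"TDC": "collection", "GI": "header", "CI": "shared",
                     "TE": "entry", "LS": "langSec", "TS": "termSec",
                     "TCS": "termCompSec"},
    anchors=[DataCategoryAnchor("definition", "TE", "TypedValuedElement", "definition")],
)
print(example.problems())
```

Keeping the expansion trees separate from the data category anchors mirrors the split between the metamodel and the data model: two TMLs can then be compared level by level and category by category.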
5 Generic model for describing terminological data
5.1 Principles
This clause describes a class of XML document structures which can be used to represent a wide
range of terminological data formats, and provides a framework for representing these document
structures in XML.
Each type of document structure is described by means of a three-tiered information structure that
describes:
— a metamodel, which comprises a hierarchy of components;
— information units, which can be associated with each component of the metamodel;
— annotations, which can be used to qualify properties associated with a given information unit.
Information units can be basic or compound. A basic information unit encapsulates information that
can be expressed by means of a single data category. A compound information unit encapsulates
information that is expressed by means of several grouped data categories that, taken together, express
a coherent unit of information. For instance, a compound information unit can be used to represent the
fact that a transaction can be a combination of a transaction type (such as modification), the person
who performed it, and the date when it was performed.
Basic information units, whether they are directly attached to a component or are placed within a
compound information unit, can take two non-exclusive types of value:
— an atomic value corresponding either to a simple type (in the sense of XML schemas) such as a
number, string, element of a picklist, etc., or to a mixed content type in the case of annotated text;
— a reference to a component in order to express a relation between it and the current component.
Information units can be abstractly represented as feature-value structures. For instance, the following
markup sample
   UHB

can be modelled as a basic information unit in the following feature-value structure:
[owner = UHB]

Similarly, the following TBX markup sample
   
          modification
     YYY
     1964-04-04
   

can be modelled in a feature-value structure as shown in Figure 2.

[transacGrp = [transac = modification
               responsiblePerson = YYY
               date = 1964-04-04]]
Figure 2 — Feature-value structure
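The step from markup to a feature-value structure can be sketched in Python. The XML below is a stand-in written for this illustration: its element names are simply borrowed from the feature names in Figure 2 and are not claimed to be the TBX markup of the sample above. The function name `to_feature_value` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in markup (assumption): element names copied from the features in Figure 2.
SAMPLE = """
<transacGrp>
  <transac>modification</transac>
  <responsiblePerson>YYY</responsiblePerson>
  <date>1964-04-04</date>
</transacGrp>
"""


def to_feature_value(element):
    """Map an element to its value: a string for a leaf, a dict of features otherwise."""
    children = list(element)
    if not children:
        # Basic information unit: a single feature with an atomic value.
        return (element.text or "").strip()
    # Compound information unit: grouped features, one per child element.
    return {child.tag: to_feature_value(child) for child in children}


root = ET.fromstring(SAMPLE)
structure = {root.tag: to_feature_value(root)}
print(structure)
```

A leaf element becomes a basic information unit, while an element with children becomes a compound one, which is the distinction drawn in this clause.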
There is also a need to associate semantic information with the content of a data category; this is
achieved through annotations. A typical example is a definition in which the genus and/or differentia
are explicitly marked, as in the following definition for lead pencil:
      
     pencil whose casing is fixed around a central graphite medium which is
     used for writing or making marks
      

Such information cannot be represented as a feature-value structure.
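One way to see why a flat feature-value structure is insufficient is to keep the annotated definition as an ordered sequence of text runs and annotations. The Python sketch below assumes hypothetical `<definition>`, `<genus>` and `<differentia>` elements, and the choice of which words carry which annotation is also an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated definition; tag names and annotated spans are assumptions.
DEFINITION = (
    "<definition><genus>pencil</genus> whose <differentia>casing</differentia>"
    " is fixed around a central <differentia>graphite</differentia>"
    " medium which is used for writing or making marks</definition>"
)


def mixed_content(element):
    """Return the ordered mix of plain text runs (str) and annotations (tag, text)."""
    parts = []
    if element.text and element.text.strip():
        parts.append(element.text.strip())
    for child in element:
        parts.append((child.tag, (child.text or "").strip()))
        # Text that follows an annotation belongs to the parent, in document order.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


root = ET.fromstring(DEFINITION)
parts = mixed_content(root)
plain_text = "".join(root.itertext())
for part in parts:
    print(part)
```

The order of the parts carries meaning and the same annotation name can occur more than once, so the content cannot be collapsed into a set of feature-value pairs without losing information.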
5.2 Generic representation of components and information units
Terminological data can be represented using a generic architecture that consists of a graph of
elementary structural nodes to which one or more information units are attached. This architecture is
shown in the UML diagram in Figure 3.
Figure 3 — UML diagram for structural nodes and information units
The diagram expresses the relationship between the following defined classes:
— structural node: a class containing one attribute (LevelName) which identifies objects of this
type in the context of a given language resource (for example, TE/Terminological Entry for the
representation of terminological data);
— information unit: a class containing three attributes that: a) identify objects of this type in relation
to a given data category (IUName, e.g. /definition/, /partOfSpeech/, etc.); b) describe a type for its
content (C_type); and c) provide the actual content value (C_value).
The value of C_type can either belong to the set of simple types as defined in XML Schema Part 2:
Datatypes or be MIXED.
Objects of these two classes can be related in the following ways.
— association: Indicates that a structural node is related to another structural node by a hierarchical
link. There is no constraint on the number of links or the structure of the network that those links
create (tree, directed acyclic graph, etc.) (0..*);
— hasContent: Relates a structural node to information units (for instance, a /definition/ attached to a
TE node (terminological entry)). An instance of an information unit is attached to one and only one
structural node (1..1);
— refinement: Relates information units that provide additional information to another information
unit (for example, a /note/ refining a /definition/). A refining information unit is related to one and
only one refined information unit (1..1). Some TMLs allow more levels of refinement than others,
and this affects the degree of interoperability.
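The two classes and three relations described above can be sketched as Python dataclasses. This is a minimal illustration under stated assumptions, not a normative implementation: the attribute names follow LevelName, IUName, C_type and C_value, while the method names (`associate`, `attach`, `refine_with`) and the bookkeeping fields `node` and `refined` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(eq=False)
class InformationUnit:
    iu_name: str   # IUName: the data category, e.g. "definition"
    c_type: str    # C_type: an XML Schema simple type name or "MIXED"
    c_value: Any   # C_value: the actual content
    node: Optional["StructuralNode"] = field(default=None, repr=False)
    refined: Optional["InformationUnit"] = field(default=None, repr=False)
    refinements: List["InformationUnit"] = field(default_factory=list)

    def refine_with(self, unit):
        """refinement: a refining unit relates to one and only one refined unit."""
        if unit.refined is not None:
            raise ValueError(f"/{unit.iu_name}/ already refines another unit")
        unit.refined = self
        self.refinements.append(unit)
        return unit


@dataclass(eq=False)
class StructuralNode:
    level_name: str  # LevelName, e.g. "TE"
    children: List["StructuralNode"] = field(default_factory=list)
    content: List[InformationUnit] = field(default_factory=list)

    def associate(self, child):
        """association: hierarchical link, with no limit on the number of links."""
        self.children.append(child)
        return child

    def attach(self, unit):
        """hasContent: a unit is attached to one and only one structural node."""
        if unit.node is not None:
            raise ValueError(f"/{unit.iu_name}/ is already attached to a node")
        unit.node = self
        self.content.append(unit)
        return unit


entry = StructuralNode("TE")
lang = entry.associate(StructuralNode("LS"))
definition = entry.attach(InformationUnit("definition", "string", "sample definition text"))
note = definition.refine_with(InformationUnit("note", "string", "sample note"))
```

Raising an error on a second attachment is one possible way to enforce the 1..1 multiplicities; a real system might instead copy the unit or reject the data at validation time.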
The MIXED type is an ordered combination of textual content (strings) and information units,
corresponding to any kind of annotated content. It can be represented in UML by means of the
aggregation operator, as shown in Figure 4.
Figure 4 — MIXED object class
Adherence to this definition permits annotations to be refined by other information units (for instance
to indicate when and by whom the annotation has been made).
5.3 The metamodel
The terminological metamodel is based on guidelines concerning the methods and principles of
terminology management as described in ISO 704. One of the most important characteristics of a
terminological entry, compared to a lexicographical entry, is its concept orientation. A terminological
entry treats one concept in a given language and, in the case of multilingual terminological entries,
one or more totally or partially equivalent concepts in one or more other languages, whereas a
lexicographical entry contains one lemma (the base form of a lexical unit) and one or more definitions
(representing different meanings) in one or more languages.
Note that some concepts are not universal in that they present slight differences in different languages
or cultures. These differences may be significant enough to declare that they form different and distinct
concepts. Depending on the degree of conceptual difference and similarity, it may be decided to describe
these concepts in the same entry or in different entries.
A terminological data collection (TDC) comprises global information about the collection and a number
of entries. Each entry performs three functions.
— It describes a single concept.
— It identifies the terms that designate the concept.
— It describes the terms themselves.
Each terminological entry can have multiple language sections, and each language section can have
multiple term sections (terms and their accompanying information). Each data element in an entry
can be associated with various kinds of descriptive and administrative information. In addition, there
are various other resources that can be referenced by multiple entries. Such shared resources include
bibliographic references, descriptions of ontologies, and binary data such as images that illustrate
concepts.
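The nesting of entries, language sections and term sections can be illustrated with a small Python sketch. The dictionary layout, the function names and the sample terms are invented for this illustration, and the table of allowed child levels is an assumption derived from the part-of wording of the definitions of TE, LS, TS and TCS in Clause 3.

```python
# Assumption: parent level -> child levels, read off the Clause 3 definitions.
ALLOWED_CHILDREN = {"TE": {"LS"}, "LS": {"TS"}, "TS": {"TCS"}, "TCS": set()}

# Invented sample entry: one concept, two language sections, three term sections.
ENTRY = {
    "level": "TE",
    "info": {"subjectField": "writing instruments"},
    "children": [
        {"level": "LS", "info": {"language": "en"}, "children": [
            {"level": "TS", "info": {"term": "lead pencil"}, "children": []},
            {"level": "TS", "info": {"term": "pencil"}, "children": []},
        ]},
        {"level": "LS", "info": {"language": "de"}, "children": [
            {"level": "TS", "info": {"term": "Bleistift"}, "children": []},
        ]},
    ],
}


def check_nesting(node, path=""):
    """Return a list of nesting violations found below the given node."""
    here = f"{path}/{node['level']}"
    errors = []
    for child in node.get("children", []):
        if child["level"] not in ALLOWED_CHILDREN.get(node["level"], set()):
            errors.append(f"{child['level']} not allowed under {here}")
        errors.extend(check_nesting(child, here))
    return errors


def terms_by_language(entry):
    """Collect the terms of each language section of one terminological entry."""
    result = {}
    for section in entry["children"]:
        terms = [ts["info"]["term"] for ts in section["children"] if ts["level"] == "TS"]
        result[section["info"]["language"]] = terms
    return result


print(check_nesting(ENTRY))
print(terms_by_language(ENTRY))
```

Because every term hangs under exactly one entry, all terms returned for the entry designate the same concept, which reflects the concept orientation described above.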
The principles of terminology management as described in ISO 704, ISO 26162 and ISO 30042, shall be
respected. These include:
— term autonomy;
— concept orientation;
— data elementarity;
— data granularity.
The terminological metamodel is described through seven instances from the structural node class, as
shown in Figure 5.
Figure 5 — Terminological metamodel — UML diagram
These seven instances of the structural node class are:
— TDC (terminological data collection): Top level container for all information contained in a
terminological data collection.
— GI (global information): Information about the TDC as a whole. The GI section usually contains, for
example, the title of the TDC, the institution or individual from which the file originated, address
information, copyright information, update information, and so forth.
— TE (terminological entry): Information that pertains to a single concept, or two or more nearly
equivalent concepts. The TE section contains descriptive information pertinent to a concept, such
as a definition and subject field, and administrative information about the entry.
— LS (language section): The LS is a container for all the term sections of a terminological entry for
a given language, as well as information pertaining to the concept in that language. For example, it
may contain
...

SLOVENSKI STANDARD (Slovenian Standard)
SIST ISO 16642:2018
01-Oct-2018
Računalniške aplikacije v terminologiji - Ogrodje za označevanje terminologije
Computer applications in terminology -- Terminological markup framework
Applications informatiques en terminologie -- Plate-forme pour le balisage de terminologies informatisées
This Slovenian standard is identical to: ISO 16642:2017
ICS:
01.020 Terminology (principles and coordination)
35.240.30 IT applications in information, documentation and publishing
SIST ISO 16642:2018 en,fr,de
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.

---------------------- Page: 1 ----------------------

SIST ISO 16642:2018

---------------------- Page: 2 ----------------------

SIST ISO 16642:2018
INTERNATIONAL ISO
STANDARD 16642
Second edition
2017-11
Computer applications in
terminology — Terminological
markup framework
Applications informatiques en terminologie — Plate-forme pour le
balisage de terminologies informatisées
Reference number
ISO 16642:2017(E)
©
ISO 2017

---------------------- Page: 3 ----------------------

SIST ISO 16642:2018
ISO 16642:2017(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO 2017, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO 2017 – All rights reserved

---------------------- Page: 4 ----------------------

SIST ISO 16642:2018
ISO 16642:2017(E)

Contents Page
Foreword .iv
Introduction .vi
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Modular approach . 4
5 Generic model for describing terminological data . 5
5.1 Principles . 5
5.2 Generic representation of components and information units . 6
5.3 The metamodel . 8
5.4 Example .10
6 Requirements for compliance to TMF .11
7 Interchange and interoperability .12
8 Representing languages .12
9 Defining a TML .13
9.1 Steps .13
9.2 Defining interoperability conditions .13
10 Implementing a TML .13
10.1 General .13
10.2 Implementing the metamodel .13
10.3 Anchoring data categories on the XML outline .14
10.3.1 General.14
10.3.2 Styles and vocabulary .14
10.4 Constraints on datatypes .15
10.5 Implementing annotations .15
10.6 Implementing brackets .15
Annex A (informative) Conformance of terminological data to TMF: example scenario .16
Bibliography .21
© ISO 2017 – All rights reserved iii

---------------------- Page: 5 ----------------------

SIST ISO 16642:2018
ISO 16642:2017(E)

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following
URL: www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Terminology and other language and
content resources, Subcommittee SC 3, Computer applications for terminology.
This second edition cancels and replaces the first edition (ISO 16642:2003), which has been technically
revised.
The main changes compared to the previous version are as follows:
— The following formats are no longer actively used. Consequently, references to these formats have
been removed (including Annex A, Annex B, and Annex C):
— Martif with specified constraints (MSC);
— Geneter;
— Data category interchange format (DCIF);
— Generic mapping tool (GMT).
— With the removal of Annex B and Annex C, this document no longer includes any comprehensive
code examples of a TML. Examples of TMLs are now available in ISO 30042, TermBase eXchange,
and also at the following Web site: www.tbxinfo.net.
— References to the former ISO/TC 37 Data Category Registry or ISOcat have been changed from
normative to informative. In addition, the name has changed to DatCatInfo, now as an example of
data category repositories.
— References to ISO 12620:1999 and ISO 12620:2009 have been removed. These previous standards
have been withdrawn.
— The TypedValuedElement style has been added.
— Examples have been updated to reflect ISO 30042:2008 (TBX). TBX-Basic is mentioned as a TML.
— Some of the examples and tables have been moved to appropriate sections.
— As a consequence of the aforementioned changes, some historical, didactic, or duplicate information
has been removed to adhere more closely to ISO editorial standards.
Introduction
Terminological data are collected, managed and stored in a wide variety of systems, typically various
kinds of database management systems, ranging from personal computer applications for individual
users to large terminological database systems operated by major companies and governmental
agencies. Terminology databases comprise various types of information, called data categories,
and can adopt different structural models. However, terminological data often need to be shared and
reused in a number of applications, and this sharing is facilitated when the data adhere to a common
model. To facilitate co-operation and to prevent duplicate work, it is important to develop standards
and guidelines for creating and using terminological data collections (TDCs) as well as for sharing and
exchanging data.
This document presents a modular approach for analysing existing TDCs and designing new ones. It also
provides a framework for defining terminological markup languages (TMLs) that are interoperable.
This document makes reference to DatCatInfo, an example of an available data category repository.
DatCatInfo is an online database of information about the types of data that can be included in
terminological data collections and other language resources. It is available at www.datcatinfo.net.
INTERNATIONAL STANDARD ISO 16642:2017(E)
Computer applications in terminology — Terminological
markup framework
1 Scope
This document specifies a framework for representing data recorded in terminological data collections
(TDCs). This framework includes a metamodel and methods for describing specific terminological
markup languages (TMLs) expressed in XML. The mechanisms for implementing constraints in a TML
are defined, but not the specific constraints for individual TMLs.
This document is designed to support the development and use of computer applications for
terminological data and the exchange of such data between different applications. This document also
defines the conditions that allow the data expressed in one TML to be mapped onto another TML.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 704, Terminology work — Principles and methods
ISO 1087-1, Terminology work — Vocabulary — Part 1: Theory and application
ISO 3166-1, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes
ISO 26162, Systems to manage terminology, knowledge and content — Design, implementation and
maintenance of terminology management systems
ISO 30042:2008, Systems to manage terminology, knowledge and content — TermBase eXchange (TBX)
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
basic information unit
information unit (3.12) attached to a component (3.3) of the metamodel and that can be expressed by
means of a single data category (3.6)
3.2
complementary information
CI
information supplementary to that described in terminological entries (3.22) and shared across the
terminological data collection (3.21)
Note 1 to entry: Domain hierarchies, institution descriptions, bibliographic references and references to text
corpora are typical examples of complementary information.
3.3
component
elementary description unit of a metamodel to which data categories (3.6) can be associated to form a
data model
3.4
compound information unit
information unit (3.12) attached to a component (3.3) of the metamodel that is expressed by means of
several grouped data categories (3.6), that, taken together, express a coherent unit of information
3.5
conceptual domain
set of valid value meanings associated with a data category (3.6)
Note 1 to entry: For example, the data category /part of speech/ could have the following conceptual domain: /
noun/, /verb/, /adjective/, /adverb/, and so forth.
3.6
data category
elementary descriptor used in a linguistic description or annotation scheme
Note 1 to entry: In this document, data categories are indicated in between forward slashes (/), e.g. /definition/.
3.7
data category repository
DCR
electronic repository of data category specifications (3.9) to be used as a reference for the definition of
linguistic annotation schemes or any other representation model for language resources
Note 1 to entry: A DCR for language resources is available at http://www.datcatinfo.net.
3.8
data category selection
DCS
set of data categories (3.6) selected from a DCR (3.7)
3.9
data category specification
set of attributes used to fully describe a given data category (3.6)
Note 1 to entry: The abbreviation “DCS” is associated with data category selection and is not used for data
category specification.
3.10
expansion tree
structured group of XML elements that implement a level of the metamodel in a given TML (3.23)
3.11
global information
GI
technical and administrative information applying to the entire terminological data collection (3.21)
Note 1 to entry: For example, the title of the terminological data collection, revision history, owner or copyright
information.
3.12
information unit
IU
elementary piece of information attached to a structural level of the metamodel
3.13
language section
LS
part of a terminological entry (3.22) containing information related to one language
Note 1 to entry: One terminological entry may contain information on one or more languages.
3.14
object language
language being described
3.15
persistent identifier
PID
unique Uniform Resource Identifier (URI) that assures permanent access for a digital object by
providing access to it independently of its physical location or current ownership
3.16
structural node
instance of component (3.3) within the representation of a terminological data collection (3.21)
3.17
structural skeleton
abstract description of an instance of a terminological data collection (3.21) in conformity with the
metamodel
3.18
style
specification for the implementation of a data category (3.6) in XML
3.19
term component section
TCS
part of a term section (3.20) giving linguistic information about the components of a term
3.20
term section
TS
part of a language section (3.13) giving information about a term
3.21
terminological data collection
TDC
resource consisting of terminological entries (3.22) with associated metadata and documentary
information
3.22
terminological entry
TE
part of a terminological data collection (3.21) which contains the terminological data related to one concept
Note 1 to entry: Every element in the TE can be linked to complementary information, to other terminological
entries and to other elements in the same terminological entry.
3.23
terminological markup language
TML
XML format for representing a terminological data collection (3.21) conforming to the constraints
expressed in this document
3.24
Unified Modeling Language
UML
language for specifying, visualizing, constructing and documenting the artifacts of software systems
3.25
vocabulary
set of strings used to implement a data category (3.6) according to a style (3.18)
3.26
working language
language used to describe objects
3.27
XML outline
part of a terminological data collection (3.21) corresponding to the XML implementation of the
metamodel
4 Modular approach
Terminological Markup Framework (TMF) consists of two levels of abstraction. The first (and most
abstract) level is the metamodel level. The metamodel level supports analysis, design and exchange at
a very general level, i.e. it is independent of any specific implementation or software. The metamodel
shall be shared by all TDCs that are compliant with TMF. The second level is the data model level, which
adds the necessary data categories for representing specific TDCs.
The implementation of a data model in XML is called a terminological markup language (TML). TMLs
can be described on the basis of a limited number of characteristics, namely
— how the TML expresses the structural organization of the metamodel (i.e. the expansion trees of
the TML);
— the specific data categories used by the TML and how they relate to the metamodel;
— the way in which these data categories can be expressed in XML and anchored on the expansion
trees of the TML, i.e. the XML style of any given data category;
— the vocabularies used by the TML to express those various informational objects as XML elements
and attributes according to the corresponding XML styles.
Figure 1 represents the information required to fully specify a TML.
— The metamodel describes the basic hierarchy of components to which any TML shall conform.
— A set of data category specifications from a data category repository, which can form the basis for
defining a data category selection (DCS) for the TML.
— The dialectal specification (dialect) includes the various elements needed to represent a given TML
in an XML format. These elements comprise expansion trees and data category instantiation styles,
together with their corresponding vocabularies.
Figure 1 — Various knowledge sources involved in the description of a TML
A DCR providing sample data category specifications for language resources is available at
www.datcatinfo.net. Where possible, data categories documented in this DCR should be used for a TML. If
no suitable data category is available in this DCR, the implementers of the TML should propose the
creation of the required data category specification within this DCR.
5 Generic model for describing terminological data
5.1 Principles
This clause describes a class of XML document structures which can be used to represent a wide
range of terminological data formats, and provides a framework for representing these document
structures in XML.
Each type of document structure is described by means of a three-tiered information structure that
describes:
— a metamodel, which comprises a hierarchy of components;
— information units, which can be associated with each component of the metamodel;
— annotations, which can be used to qualify properties associated with a given information unit.
Information units can be basic or compound. A basic information unit encapsulates information that
can be expressed by means of a single data category. A compound information unit encapsulates
information that is expressed by means of several grouped data categories that, taken together, express
a coherent unit of information. For instance, a compound information unit can be used to represent the
fact that a transaction can be a combination of a transaction type (such as modification), the person
who performed it, and the date when it was performed.
Basic information units, whether they are directly attached to a component or are placed within a
compound information unit, can take two non-exclusive types of value:
— an atomic value corresponding either to a simple type (in the sense of XML schemas) such as a
number, string, element of a picklist, etc., or to a mixed content type in the case of annotated text;
— a reference to a component in order to express a relation between it and the current component.
Information units can be abstractly represented as feature-value structures. For instance, the following
markup sample
   UHB

can be modelled as a basic information unit in the following feature-value structure:
[owner = UHB]

Similarly, the following TBX markup sample
   
          modification
     YYY
     1964-04-04
   

can be modelled in a feature-value structure as shown in Figure 2.

transacGrp = [ transac = modification
               responsiblePerson = YYY
               date = 1964-04-04 ]
Figure 2 — Feature-value structure
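The mapping from grouped markup to a feature-value structure can be sketched in Python. This is a minimal illustration, not the method defined by this document: the element names in the sample string are hypothetical and were chosen so that they match the feature names of Figure 2, and the helper name to_feature_structure is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names are assumed for illustration only and
# simply mirror the feature names shown in Figure 2.
SAMPLE = """
<transacGrp>
  <transac>modification</transac>
  <responsiblePerson>YYY</responsiblePerson>
  <date>1964-04-04</date>
</transacGrp>
"""


def to_feature_structure(element):
    """Return a (feature, value) pair for an XML element.

    A leaf element is treated as a basic information unit (feature = text).
    An element with children is treated as a compound information unit whose
    value is a dict of the nested features.
    """
    children = list(element)
    if not children:
        return element.tag, (element.text or "").strip()
    return element.tag, dict(to_feature_structure(child) for child in children)


feature, value = to_feature_structure(ET.fromstring(SAMPLE))
structure = {feature: value}
print(structure)
```

Because a dict keeps one value per feature, this sketch would silently drop repeated sibling elements; a fuller treatment would need lists of values.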
There is also a need to associate semantic information with the content of a data category; this is
achieved through annotations. A typical example is a definition in which the genus and/or differentia
are explicitly marked, as in the following definition for lead pencil:
      
     pencil whose
          casing is fixed around a central
          graphite medium which is
          used for writing or making marks
      

Such information cannot be represented as a feature-value structure.
5.2 Generic representation of components and information units
Terminological data can be represented using a generic architecture that consists of a graph of
elementary structural nodes to which one or more information units are attached. This architecture is
shown in the UML diagram in Figure 3.
Figure 3 — UML diagram for structural nodes and information units
The diagram expresses the relationship between the following defined classes:
— structural node: a class containing one attribute (LevelName) which identifies objects of this
type in the context of a given language resource (for example, TE/Terminological Entry for the
representation of terminological data);
— information unit: a class containing three attributes that: a) identify objects of this type in relation
to a given data category (IUName, e.g. /definition/, /partOfSpeech/, etc.); b) describe a type for its
content (C_type); and c) provide the actual content value (C_value).
The value of C_type can either belong to the set of simple types as defined in XML Schema Part 2:
Datatypes or be MIXED.
Objects of these two classes can be related in the following ways.
— association: Indicates that a structural node is related to another structural node by a hierarchical
link. There is no constraint on the number of links or the structure of the network that those links
create (tree, directed acyclic graph, etc.) (0..*);
— hasContent: Relates a structural node to information units (for instance, a /definition/ attached to a
TE node (terminological entry)). An instance of an information unit is attached to one and only one
structural node (1..1);
— refinement: Relates information units that provide additional information to another information
unit (for example, a /note/ refining a /definition/). A refining information unit is related to one and
only one refined information unit (1..1). Some TMLs allow more levels of refinement than others,
and this affects the degree of interoperability.
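The two classes and three relations listed above can be approximated with Python dataclasses. This is only a sketch under stated assumptions: the attribute names follow LevelName, IUName, C_type and C_value from the text, while the method names (associate, attach, refine_with) and the back-reference fields are hypothetical and not part of this document.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(eq=False)
class InformationUnit:
    iu_name: str   # IUName, e.g. "/definition/"
    c_type: str    # C_type: an XML Schema simple type name or "MIXED"
    c_value: Any   # C_value: the actual content
    refinements: List["InformationUnit"] = field(default_factory=list)
    # Back-references (hypothetical helpers) used to enforce the 1..1 rules.
    refined: Optional["InformationUnit"] = field(default=None, repr=False)
    node: Optional["StructuralNode"] = field(default=None, repr=False)

    def refine_with(self, unit: "InformationUnit") -> "InformationUnit":
        """refinement: `unit` refines this unit, and only this unit."""
        if unit.refined is not None:
            raise ValueError("a refining unit refines exactly one unit")
        unit.refined = self
        self.refinements.append(unit)
        return unit


@dataclass(eq=False)
class StructuralNode:
    level_name: str  # LevelName, e.g. "TE"
    children: List["StructuralNode"] = field(default_factory=list)
    content: List[InformationUnit] = field(default_factory=list)

    def associate(self, child: "StructuralNode") -> "StructuralNode":
        """association: hierarchical link, any number allowed (0..*)."""
        self.children.append(child)
        return child

    def attach(self, unit: InformationUnit) -> InformationUnit:
        """hasContent: a unit belongs to exactly one structural node."""
        if unit.node is not None:
            raise ValueError("an information unit belongs to exactly one node")
        unit.node = self
        self.content.append(unit)
        return unit


entry = StructuralNode("TE")
language = entry.associate(StructuralNode("LS"))
definition = entry.attach(
    InformationUnit("/definition/", "string", "pencil whose casing is fixed ...")
)
note = definition.refine_with(InformationUnit("/note/", "string", "Illustrative note."))
```

Attaching the same definition to a second node raises ValueError, which is one possible way to enforce the (1..1) multiplicity at run time.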
The MIXED type is an ordered combination of textual content (strings) and information units,
corresponding to any kind of annotated content. It can be represented in UML by means of the
aggregation operator, as shown in Figure 4.
Figure 4 — MIXED object class
Adherence to this definition permits annotations to be refined by other information units (for instance
to indicate when and by whom the annotation has been made).
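As an informal illustration, MIXED content can be held as an ordered list whose items are either plain strings or annotation units. The sketch below reuses the lead pencil definition from 5.1; the class name Annotation and the labels "genus" and "differentia" are assumptions made for this example, not names taken from this document or from a specific TML.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Annotation:
    name: str  # hypothetical annotation label, e.g. "genus"
    text: str  # the annotated span of text


# MIXED content: an ordered combination of strings and annotation units.
MixedContent = List[Union[str, Annotation]]

definition: MixedContent = [
    Annotation("genus", "pencil"),
    " whose ",
    Annotation("differentia", "casing"),
    " is fixed around a central ",
    Annotation("differentia", "graphite"),
    " medium which is used for writing or making marks",
]


def plain_text(content: MixedContent) -> str:
    """Concatenate the parts in order, discarding the annotation labels."""
    return "".join(p if isinstance(p, str) else p.text for p in content)


def annotated(content: MixedContent, name: str) -> List[str]:
    """Return the annotated spans that carry the given label, in order."""
    return [p.text for p in content if isinstance(p, Annotation) and p.name == name]


print(plain_text(definition))
print(annotated(definition, "differentia"))
```

Keeping the parts in a list preserves their order, which a flat feature-value structure cannot do.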
5.3 The metamodel
The terminological metamodel is based on guidelines concerning the methods and principles of
terminology management as described in ISO 704. One of the most important characteristics of a
terminological entry, compared to a lexicographical entry, is its concept orientation. A terminological
entry treats one concept in a given language and, in the case of multilingual terminological entries,
one or more totally or partially equivalent concepts in one or more other languages, whereas a
lexicographical entry contains one lemma (the base form of a lexical unit) and one or more definitions
(representing different meanings) in one or more languages.
Note that some concepts are not universal in that they present slight differences in different languages
or cultures. These differences may be significant enough to declare that they form different and distinct
concepts. Depending on the degree of conceptual difference and similarity, it may be decided to describe
these concepts in the same entry or in different entries.
A terminological data collection (TDC) comprises global information about the collection and a number
of entries. Each entry performs three functions.
— It describes a single concept.
— It identifies the terms that designate the concept.
— It describes the terms themselves.
Each terminological entry can have multiple language sections, and each language section can have
multiple term sections (terms and their accompanying information). Each data element in an entry
can be associated with various kinds of descriptive and administrative information. In addition, there
are various other resources that can be referenced by multiple entries. Such shared resources include
bibliographic references, descriptions of ontologies, and binary data such as images that illustrate
concepts.
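The nesting described above (a collection holding global information and entries, entries holding language sections, language sections holding term sections) might be sketched as follows. All class and field names are illustrative assumptions, the sample values are invented, and the sketch omits the term component section defined in 3.19.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermSection:
    term: str
    info: Dict[str, str] = field(default_factory=dict)  # term-level data


@dataclass
class LanguageSection:
    language: str
    term_sections: List[TermSection] = field(default_factory=list)


@dataclass
class TerminologicalEntry:
    entry_id: str
    info: Dict[str, str] = field(default_factory=dict)  # concept-level data
    language_sections: List[LanguageSection] = field(default_factory=list)


@dataclass
class TerminologicalDataCollection:
    global_information: Dict[str, str] = field(default_factory=dict)
    complementary_information: Dict[str, str] = field(default_factory=dict)
    entries: List[TerminologicalEntry] = field(default_factory=list)


def terms_by_language(entry: TerminologicalEntry) -> Dict[str, List[str]]:
    """List the terms that designate the entry's concept, per language."""
    return {
        ls.language: [ts.term for ts in ls.term_sections]
        for ls in entry.language_sections
    }


# Invented sample data: one concept, two language sections.
entry = TerminologicalEntry(
    entry_id="c1",
    info={"/definition/": "pencil whose casing is fixed around a central graphite medium"},
    language_sections=[
        LanguageSection("en", [TermSection("lead pencil", {"/partOfSpeech/": "noun"})]),
        LanguageSection("de", [TermSection("Bleistift", {"/partOfSpeech/": "noun"})]),
    ],
)
tdc = TerminologicalDataCollection(
    global_information={"title": "Sample collection"}, entries=[entry]
)
print(terms_by_language(entry))
```

Placing the definition at entry level and the part of speech at term level reflects concept orientation: concept data are stated once, while each term carries its own description.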
The principles of terminology management as described in ISO 704, ISO 26162 and ISO 30042 shall be
respected. These include:
— term autonomy;
— concept orientation;
— data elementarity;
— data granularity.
The terminological metamodel is described through seven in
...
