SIST ISO 24622-2:2021
Language resource management -- Component metadata infrastructure (CMDI) -- Part 2: Component metadata specification language
The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate well together. To enable this level of cooperation, this document provides in-depth descriptions and definitions of what CMDI records, components and their representations in XML look like.
This document describes these XML representations, which enable the flexible construction of interoperable metadata schemas suitable for, but not limited to, describing language resources. The metadata schemas based on these representations can be used to describe resources at different levels of granularity (e.g. descriptions on the collection level or on the level of individual resources).
Gestion des ressources linguistiques -- Composante infrastructure de métadonnées (CMDI) -- Partie 2: Composante linguistique spécifique aux métadonnées
Upravljanje jezikovnih virov - Infrastruktura komponentnih metapodatkov (CMDI) - 2. del: Poseben jezik komponentnih metapodatkov
SLOVENIAN STANDARD
SIST ISO 24622-2:2021
1 March 2021
This Slovenian standard is identical to: ISO 24622-2:2019
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24622-2:2021 (en)
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24622-2
First edition, 2019-07
Language resource management — Component metadata infrastructure (CMDI) — Part 2: Component metadata specification language
Gestion des ressources linguistiques — Composante infrastructure de métadonnées (CMDI) — Partie 2: Composante linguistique spécifique aux métadonnées
Reference number: ISO 24622-2:2019(E)
© ISO 2019
COPYRIGHT PROTECTED DOCUMENT
© ISO 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 General terms
3.2 CMDI
3.3 XML
4 Notational and XML namespace conventions
5 Structure of CMDI instances
5.1 General structure
5.2 The main structure
5.3 The
5.4 The element
5.4.1 General structure of the element
5.4.2 The list of resource proxies
5.4.3 The list of journal files
5.4.4 The list of relations between resource files
5.5 The element
5.6 The CMD components
6 CCSL (CMDI Component Specification Language)
6.1 General structure of the CCSL
6.2 CCSL header
6.3 CMD specification
6.4 Definition of CMD elements
6.5 CMD attribute definition
6.6 Value schemes for CMD elements and CMD attributes
6.7 Cue attributes
7 CMD
7.1 Transformation of CCSL into a CMD profile schema definition
7.2 General properties of the CMD profile schema definition
7.3 Interpretation of CMD specifications in the CCSL
7.3.1 General structure of CMD specifications
7.3.2 Document structure prescribed by the CMD profile schema
7.4 Interpretation of CMD element definitions in the CCSL
7.5 Interpretation of CMD attribute definitions in the CCSL
7.6 Content model for CMD elements and CMD attributes in the schema definition
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24622 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Many researchers, from the humanities and other domains, have a strong need to study resources
in close detail. Nowadays more and more of these resources are available online. To be able to find
these resources, they are described with metadata. These component metadata (CMD) instances are
collected and made available via central catalogues. Often, resource providers want to include specific
properties of a resource in their metadata to provide all relevant descriptions for a specific type of
resource. The purpose of catalogues tends to be more generic and addresses a broader target audience.
It is hard to strike the balance between these two ends of the spectrum with one metadata schema,
and mismatches can negatively impact the quality of metadata provided. The goal of the component
metadata infrastructure (CMDI) is to provide a flexible mechanism to build resource-specific metadata
schemas out of shared components and semantics [14][15].
In CMDI the metadata lifecycle starts with the need of a metadata modeller to create a dedicated
metadata profile for a specific type of resource. Modellers can browse and search a registry for
components and profiles that are suitable or come close to meeting their requirements. A component
groups together metadata elements that belong together and can potentially be reused in a different
context. Components can also group other components. Existing component registries, e.g. the CLARIN
(common language resources and technology infrastructure) Component Registry [16], might already
contain any number of components. These can be reused as they are, or be adapted by modifying, adding
or removing some metadata elements and/or components. Also completely new components can be
created to model the unique aspects of the resources under consideration. All the needed components
are combined into one profile specific for the type of resources. Any component, element and value in
such a profile may be linked to a semantic description (a concept) to make their meaning explicit [21].
These semantic descriptions can be stored in a semantic registry, e.g. the CLARIN Concept Registry [17].
In the end metadata creators can create records for specific resources that comply with the profile
relevant for the resource type, and these records can be provided to local and global catalogues [22].
CMDI was originally developed in the context of the European CLARIN infrastructure initiative
with input from other initiatives and experts. Already in its preparatory phase, which started in 2007,
the infrastructure needed flexibility in the metadata domain, as it was confronted with many types of
resources that had to be accurately described. For Version 1.0 a toolkit [20] was created, consisting of
the XML schemas and XSLT stylesheets to validate and transform components, profiles and records.
Version 1.1 included some small changes and has seen small, incremental, backward-compatible
advances since 2011. While this version has been in use, new developments and the development of this
document resulted in Version 1.2 [18]. CMDI has also seen a growing number of tools and infrastructure
systems that deal with its records and components and rely on its shared syntax and semantics.
In ISO 24622-1, the component metadata model has been standardized. This document is compliant
with ISO 24622-1, and also extends and constrains it at various places (see also the red parts in the UML
class diagram in Figure 1):
— support for attributes on both components and elements is added,
— a profile is limited to one root component, and
— an element always belongs to a specific component.
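The three constraints listed above can be pictured with a small data-structure sketch. It is illustrative only and not taken from ISO 24622-1 or from this document: the class names (Attribute, Element, Component, Profile), the add_element helper and the example names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    """Hypothetical stand-in for a CMD attribute."""
    name: str
    concept_link: Optional[str] = None  # optional URI into a semantic registry


@dataclass
class Element:
    """Hypothetical stand-in for a CMD element; it may carry attributes."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    concept_link: Optional[str] = None
    # Back-reference to the owning component, excluded from repr/eq to avoid cycles.
    parent: Optional[Component] = field(default=None, repr=False, compare=False)


@dataclass
class Component:
    """Hypothetical stand-in for a CMD component; it may carry attributes too."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    elements: List[Element] = field(default_factory=list)
    components: List[Component] = field(default_factory=list)
    concept_link: Optional[str] = None

    def add_element(self, element: Element) -> Element:
        # An element always belongs to one specific component.
        if element.parent is not None:
            raise ValueError(f"element {element.name!r} already belongs to a component")
        element.parent = self
        self.elements.append(element)
        return element


@dataclass
class Profile:
    """A profile holds a single root component, never a list of roots."""
    name: str
    root: Component


# Usage with made-up names:
root = Component("Corpus")
title = root.add_element(Element("Title", attributes=[Attribute("lang")]))
profile = Profile("CorpusProfile", root)
```

Modelling the root as a single field rather than a list makes the one-root rule structural, so no separate validation step is needed for it in this sketch.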
Figure 1 — Component metadata model and its extensions
INTERNATIONAL STANDARD ISO 24622-2:2019(E)
Language resource management — Component metadata infrastructure (CMDI) —
Part 2:
Component metadata specification language
IMPORTANT — The electronic file of this document contains colours which are considered to be
useful for the correct understanding of the document. Users should therefore consider printing
this document using a colour printer.
1 Scope
The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate
well together. To enable this level of cooperation, this document provides in-depth descriptions and
definitions of what CMDI records, components and their representations in XML look like.
This document describes these XML representations, which enable the flexible construction of
interoperable metadata schemas suitable for, but not limited to, describing language resources. The
metadata schemas based on these representations can be used to describe resources at different levels
of granularity (e.g. descriptions on the collection level or on the level of individual resources).
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at http://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1 General terms
3.1.1
concept
unit of knowledge created by a unique combination of characteristics
[SOURCE: ISO 1087:—1), 3.2.3, modified — Note 1 to entry and Note 2 to entry have been deleted.]
3.1.2
concept link
reference from a CMD profile (3.2.11), CMD component (3.2.3), CMD element (3.2.5), CMD attribute (3.2.2)
or a value in a controlled vocabulary (3.1.4) to an entry in a semantic registry (3.1.11) via a Uniform
Resource Identifier (3.1.13)
Note 1 to entry: Typically a concept link is provided as a persistent identifier (3.1.9).
1) Revision of ISO 1087:2000 under preparation. Stage at the time of publication: ISO/FDIS 1087:2019.
3.1.3
concept registry
semantic registry (3.1.11) maintaining concepts (3.1.1)
EXAMPLE The CLARIN Concept Registry [17] as used in the CLARIN infrastructure.
3.1.4
controlled vocabulary
closed/open vocabulary
set of values that can be used either to constrain the set of permissible values or to provide suggestions
for applicable values in a given context
3.1.5
data category
class of data items that are closely related from a formal or semantic point of view
EXAMPLE /part of speech/, /subject field/, /definition/.
Note 1 to entry: A data category can be viewed as a generalization of the notion of a field in a database.
[SOURCE: ISO 30042:2019, 3.8, modified — Note 2 to entry has been deleted.]
3.1.6
language tag
textual code used to assist in identifying languages in every mode of communication
Note 1 to entry: This includes constructed and artificial languages but excludes languages not intended primarily
for human communication, for example in spoken, written, signed, or otherwise signaled, communication (see
IETF BCP 47 [6]).
Note 2 to entry: Language tags may be used to assist in the identification of a language in every mode of
communication, for example in spoken, written, signed, or otherwise signaled, communication.
3.1.7
media type
MIME type
media type specification used originally for textual, non-textual, multi-part message bodies of emails
and which provides technical format information on data
Note 1 to entry: For the purposes of this document, it is as described in IETF RFC 6838 [8].
3.1.8
metadata
resource (3.1.10) that is a description of another resource, usually given as a set of properties in the
form of attribute-value pairs
Note 1 to entry: This description can contain information about the resource, aspects or parts of the resource
and/or artefacts and actors connected to the resource.
3.1.9
persistent identifier
PID
unique identifier that ensures permanent access for a digital object by providing access to it
independently of its physical location or current ownership
Note 1 to entry: Unique in this context means that the PID will not be issued again for other resources (3.1.10).
However, the same PID can reference different representations or incarnations of the resource at the discretion
of the resource provider.
[SOURCE: ISO 24619:2011, 3.2.4]
3.1.10
resource
entity, possibly digitally accessible, that can be described in terms of its content and technical
properties, referenced by a Uniform Resource Identifier (3.1.13)
3.1.11
semantic registry
directory of (authoritative) definitions of term (3.1.12), concept (3.1.1) or data category (3.1.5), or the
system maintaining it
Note 1 to entry: These registries generally also provide persistent identifiers (3.1.9) for their entries.
3.1.12
term
designation that represents a general concept (3.1.1) in a specific domain or subject
EXAMPLE “planet”, “tower”, “pen”, “numeral”, “number”, “square root”, “logarithm”, “unit of measurement”,
“base of a logarithm”, “chemical element”, “chemical compound”, “HP Laserjet 1100”, “Nobel Prize in Physics”.
Note 1 to entry: Terms may be partly or wholly verbal.
Note 2 to entry: Terms can include letters and letter symbols, numerals, mathematical symbols, typographical
signs and syntactic signs (e.g. punctuation marks, such as hyphens, parentheses, square brackets and other
connectors or delimiters), sometimes in character styles (i.e. fonts and bold, italic, bold italic, or other style
conventions) governed by domain-, subject-, or language-specific conventions.
[SOURCE: ISO 1087:—, 3.4.2]
3.1.13
Uniform Resource Identifier
URI
sequence of characters that identifies a resource (3.1.10)
Note 1 to entry: IETF RFC 3986 [7] defines the generic URI syntax and a process for resolving URI references that
might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
3.2 CMDI
3.2.1
CCSL
CMDI component specification language
XML (3.3.4) based language for describing a CMD component (3.2.3) and a CMD profile (3.2.11) according
to the CMD model (3.2.10)
3.2.2
CMD attribute
unit within a CMD element (3.2.5) that describes the level at which properties of a CMD element can be
provided by means of value scheme (3.2.20) constrained atomic values
3.2.3
CMD component
component
reusable, structured template for the description of (an aspect of) a resource (3.1.10), defined by means
of a CMD specification (3.2.14) document with the potential of including other CMD components, either
through reference or inline definition
3.2.4
CMD component registry
component registry
service where a CMD specification (3.2.14) can be registered and accessed
3.2.5
CMD element
element definition
unit within a CMD component (3.2.3) that describes the level of the CMD instance (3.2.6) that can carry
atomic values governed by a value scheme (3.2.20), and does not contain further levels except for that of
the CMD attribute (3.2.2)
3.2.6
CMD instance
metadata instance
CMDI file
CMDI instance
metadata record
CMD record
file that conforms to the general CMD instance structure as described in this document and, at the CMD
instance payload (3.2.9) level, follows the specific structure defined by the CMD profile (3.2.11) it relates to
3.2.7
CMD instance envelope
section of a CMD instance (3.2.6) which is structured uniformly for all instances and contains the CMD
instance header (3.2.8) and the list of resource proxies (3.2.18) which may be referenced from the CMD
instance payload (3.2.9) section
3.2.8
CMD instance header
section of a CMD instance (3.2.6) marked as ‘header’, providing information on that CMD instance as
such, not the resource (3.1.10) that is described by the metadata file
3.2.9
CMD instance payload
section of a CMD instance (3.2.6) that follows the structure defined by the CMD profile (3.2.11) it
references and contains the description of the resource (3.1.10) to which that CMD instance relates
3.2.10
CMD model
component metadata model
metadata model that is based on CMD components (3.2.3)
Note 1 to entry: For the purposes of this document, it is as specified in ISO 24622-1.
3.2.11
CMD profile
profile
structured template for the description of a class of resources (3.1.10) providing the complete structure
for a CMD instance payload (3.2.9) by means of a hierarchy of CMD components (3.2.3)
3.2.12
CMD profile schema
schema definition by which the correctness of a CMD instance (3.2.6) with respect to the CMD profile
(3.2.11) it pertains to can be evaluated
Note 1 to entry: The CMD profile schema may be expressed as XML Schema (3.3.11) but also in other XML schema
languages.
3.2.13
CMD root component
CMD component (3.2.3) that is defined at the highest level within a CMD profile (3.2.11) that may have
one or more child CMD components (3.2.3) but no siblings
Note 1 to entry: In the CMD instance payload (3.2.9), it is instantiated exactly once.
3.2.14
CMD specification
component specification
component definition
profile specification
profile definition
representation of a CMD component (3.2.3) or CMD profile (3.2.11), expressed using the constructs of
the CCSL (3.2.1)
3.2.15
CMD specification header
component header
profile header
section of a CMD specification (3.2.14) marked as ‘header’, providing information on that CMD
specification as such that is not part of the defined structure
3.2.16
CMDI
component metadata infrastructure
metadata description framework consisting of the CMD model (3.2.10) and infrastructure to process
instances of parts of the model
3.2.17
inline CMD component
CMD component (3.2.3) that is created and stored within another CMD component and cannot be
addressed from other CMD components
3.2.18
resource proxy
CMD resource reference
representation of a resource (3.1.10) within a CMD instance (3.2.6) containing a Uniform Resource
Identifier (3.1.13) as a reference to the resource itself and an indication of its nature
3.2.19
resource proxy reference
reference from any point within the CMD instance payload (3.2.9) to any of the resource proxy (3.2.18)
elements
3.2.20
value scheme
set of constraints governing the range of values allowed for a specific CMD element (3.2.5) or CMD
attribute (3.2.2) in a CMD instance (3.2.6), expressed in terms of an XML Schema datatype (3.3.12),
controlled vocabulary (3.1.4), or regular expression (3.3.3)
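As a rough illustration of how the three kinds of value scheme named in 3.2.20 could constrain a value, the following sketch checks a string against a controlled vocabulary, a regular expression or a datatype. It makes several assumptions that are not part of this document: the ValueScheme class and its allows method are hypothetical, the vocabulary is treated as closed, Python's re syntax stands in for XML Schema regular expressions (the two dialects differ), and a Python callable such as int only approximates an XML Schema datatype.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class ValueScheme:
    """Hypothetical value scheme: at most one of the three fields is expected to be set."""
    datatype: Optional[Callable[[str], object]] = None  # approximates an XML Schema datatype
    vocabulary: Optional[FrozenSet[str]] = None          # treated as a closed vocabulary
    pattern: Optional[str] = None                        # Python regex, not XML Schema syntax

    def allows(self, value: str) -> bool:
        if self.vocabulary is not None:
            return value in self.vocabulary
        if self.pattern is not None:
            # fullmatch: the whole value has to be in the set of strings the pattern denotes.
            return re.fullmatch(self.pattern, value) is not None
        if self.datatype is not None:
            try:
                self.datatype(value)
            except ValueError:
                return False
            return True
        return True  # no constraint given


# Usage with made-up schemes:
media = ValueScheme(vocabulary=frozenset({"audio", "video"}))
code = ValueScheme(pattern=r"[a-z]{2,3}")
count = ValueScheme(datatype=int)
```

An open vocabulary, which only suggests values (see 3.1.4), would not reject unknown values; this sketch does not model that case.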
3.3 XML
3.3.1
foreign attribute
XML attribute (3.3.5) defined in a namespace (3.3.2) other than those declared in CMDI (3.2.16), to be
included in a CMD instance (3.2.6) as additional information targeted to specific receivers or applications
3.3.2
XML namespace
namespace
method for qualifying element and attribute names used in XML
Note 1 to entry: For the purposes of this document, it is as described in W3C XML Namespaces [10].
3.3.3
regular expression
sequence of characters that denote a set of strings
Note 1 to entry: When used to constrain a lexical space, a regular expression asserts that only strings in the
defined set of strings are valid literals for values of that type.
Note 2 to entry: See also W3C XSchema Part 2 [12], Appendix F.
3.3.4
XML
markup language for describing hierarchical structures within a text file
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.5
XML attribute
property of an XML element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.6
XML attribute declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
attribute (3.3.5)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD [13], Section 3.2.
3.3.7
XML container element
XML element (3.3.9) that has one or more XML elements as its descendants
3.3.8
XML document
document represented in XML
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.9
XML element
constituent of an XML document (3.3.8)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.10
XML element declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD [13], Section 3.3.
3.3.11
XML Schema
document that complies with the XML Schema recommendation
Note 1 to entry: For the purposes of this document, it refers to the W3C XSchema Part 1 recommendation [11].
3.3.12
XML Schema datatype
predefined set of permissible content within an XML element (3.3.9) or an XML attribute (3.3.5) of an
XML document (3.3.8) used in an XML Schema (3.3.11)
Note 1 to entry: For the purposes of this document, it is as described in W3C XSchema Part 2 [12].
4 Notational and XML namespace conventions
The following notational conv
...
SLOVENSKI STANDARD
SIST ISO 24622-2:2021
01-marec-2021
Upravljanje jezikovnih virov - Infrastruktura komponentnih metapodatkov (CMDI) -
2. del: Poseben jezik komponentnih metapodatkov
Language resource management -- Component metadata infrasctructure (CMDI) -- Part
2: The component metadata specific language
Gestion des ressources linguistiques -- Composante infrastructure de métadonnées
(CMDI) -- Partie 2: Composante linguistique spécifique aux métadonnées
Ta slovenski standard je istoveten z: ISO 24622-2:2019
ICS:
01.140.20 Informacijske vede Information sciences
35.060 Jeziki, ki se uporabljajo v Languages used in
informacijski tehniki in information technology
tehnologiji
35.240.30 Uporabniške rešitve IT v IT applications in information,
informatiki, dokumentiranju in documentation and
založništvu publishing
SIST ISO 24622-2:2021 en
2003-01.Slovenski inštitut za standardizacijo. Razmnoževanje celote ali delov tega standarda ni dovoljeno.
---------------------- Page: 1 ----------------------
SIST ISO 24622-2:2021
---------------------- Page: 2 ----------------------
SIST ISO 24622-2:2021
INTERNATIONAL ISO
STANDARD 24622-2
First edition
2019-07
Language resource management —
Component metadata infrasctructure
(CMDI) —
Part 2:
Component metadata specification
language
Gestion des ressources linguistiques — Composante infrastructure de
métadonnées (CMDI) —
Partie 2: Composante linguistique spécifique aux métadonnées
Reference number
ISO 24622-2:2019(E)
©
ISO 2019
---------------------- Page: 3 ----------------------
SIST ISO 24622-2:2021
ISO 24622-2:2019(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2019 – All rights reserved
---------------------- Page: 4 ----------------------
SIST ISO 24622-2:2021
ISO 24622-2:2019(E)
Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
3.1 General terms . 1
3.2 CMDI . 3
3.3 XML . 5
4 Notational and XML namespace conventions . 7
5 Structure of CMDI instances . 8
5.1 General structure . 8
5.2 The main structure . 9
5.3 The
5.4 The element.11
5.4.1 General structure of the element .11
5.4.2 The list of resource proxies .11
5.4.3 The list of journal files .12
5.4.4 The list of relations between resource files .13
5.5 The element .15
5.6 The CMD components .15
6 CCSL (CMDI Component Specification Language) .17
6.1 General structure of the CCSL .17
6.2 CCSL header .19
6.3 CMD specification .20
6.4 Definition of CMD elements .21
6.5 CMD attribute definition .23
6.6 Value schemes for CMD elements and CMD attributes .24
6.7 Cue attributes .26
7 CMD .27
7.1 Transformation of CCSL into a CMD profile schema definition .27
7.2 General properties of the CMD profile schema definition .27
7.3 Interpretation of CMD specifications in the CCSL .27
7.3.1 General structure of CMD specifications .27
7.3.2 Document structure prescribed by the CMD profile schema .28
7.4 Interpretation of CMD element definitions in the CCSL .28
7.5 Interpretation of CMD attribute definitions in the CCSL .29
7.6 Content model for CMD elements and CMD attributes in the schema definition.30
Bibliography .31
© ISO 2019 – All rights reserved iii
---------------------- Page: 5 ----------------------
SIST ISO 24622-2:2021
ISO 24622-2:2019(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www .iso .org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www .iso
.org/iso/foreword .html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24622 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www .iso .org/members .html.
iv © ISO 2019 – All rights reserved
---------------------- Page: 6 ----------------------
SIST ISO 24622-2:2021
ISO 24622-2:2019(E)
Introduction
Many researchers, from the humanities and other domains, have a strong need to study resources
in close detail. Nowadays more and more of these resources are available online. To be able to find
these resources, they are described with metadata. These component metadata (CMD) instances are
collected and made available via central catalogues. Often, resource providers want to include specific
properties of a resource in their metadata to provide all relevant descriptions for a specific type of
resource. The purpose of catalogues tends to be more generic and addresses a broader target audience.
It is hard to strike the balance between these two ends of the spectrum with one metadata schema,
and mismatches can negatively impact the quality of metadata provided. The goal of the component
metadata infrastructure (CMDI) is to provide a flexible mechanism to build resource-specific metadata
schemas out of shared components and semantics[14][15].
In CMDI the metadata lifecycle starts with the need of a metadata modeller to create a dedicated
metadata profile for a specific type of resource. Modellers can browse and search a registry for
components and profiles that are suitable or come close to meeting their requirements. A component
groups together metadata elements that belong together and can potentially be reused in a different
context. Components can also group other components. Existing component registries, e.g., the CLARIN
(common language resources and technology infrastructure) Component Registry[16], might already
contain any number of components. These can be reused as they are, or be adapted by modifying, adding
or removing some metadata elements and/or components. Also completely new components can be
created to model the unique aspects of the resources under consideration. All the needed components
are combined into one profile specific for the type of resources. Any component, element and value in
such a profile may be linked to a semantic description (a concept) to make their meaning explicit[21].
These semantic descriptions can be stored in a semantic registry, e.g., the CLARIN Concept Registry[17].
In the end metadata creators can create records for specific resources that comply with the profile
relevant for the resource type, and these records can be provided to local and global catalogues[22].
CMDI was originally developed in the context of the European CLARIN infrastructure initiative
with input from other initiatives and experts. Already in its preparatory phase, which started in 2007,
the infrastructure needed flexibility in the metadata domain as it was confronted with many types of
resources that had to be accurately described[20]. For Version 1.0 a toolkit was created, consisting of
the XML schemas and XSLT stylesheets to validate and transform components, profiles and records.
Version 1.1 included some small changes and has seen small incremental backward-compatible
advances since 2011. While this version has been in use, new developments and the development of this
document resulted in Version 1.2[18]. CMDI has also seen a growing number of tools and infrastructure
systems that deal with its records and components and rely on its shared syntax and semantics.
In ISO 24622-1, the component metadata model has been standardized. This document is compliant
with ISO 24622-1, and also extends and constrains it at various places (see also the red parts in the UML
class diagram in Figure 1):
— support for attributes on both components and elements is added,
— a profile is limited to one root component, and
— an element always belongs to a specific component.
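The three constraints listed above can be pictured with a small data model. The following is an illustrative sketch only and is not taken from ISO 24622-1 or from this document: the class names (`Attribute`, `Element`, `Component`, `Profile`), their fields and the sample names are assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """Hypothetical stand-in for a CMD attribute."""
    name: str


@dataclass
class Element:
    """Hypothetical stand-in for a CMD element; it may carry attributes."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Component:
    """Hypothetical stand-in for a CMD component.

    Elements are only reachable through a component's `elements` list,
    mirroring the rule that an element always belongs to a component.
    Components may carry attributes and may group other components.
    """
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    elements: List[Element] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)


@dataclass
class Profile:
    """Hypothetical stand-in for a CMD profile with exactly one root."""
    name: str
    root: Component


def count_elements(component: Component) -> int:
    """Count the elements of a component and of all nested components."""
    return len(component.elements) + sum(
        count_elements(child) for child in component.components
    )


# Made-up example: a session profile that reuses an actor component.
actor = Component("Actor", elements=[Element("Name", [Attribute("lang")])])
session = Component(
    "Session",
    attributes=[Attribute("ref")],
    elements=[Element("Title")],
    components=[actor],
)
profile = Profile("SessionProfile", root=session)
```

Because `Profile` has a single `root` field rather than a list, a profile with two sibling top-level components cannot be expressed in this sketch, which is one simple way to encode the one-root restriction.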
Figure 1 — Component metadata model and its extensions
INTERNATIONAL STANDARD ISO 24622-2:2019(E)
Language resource management — Component metadata
infrastructure (CMDI) —
Part 2:
Component metadata specification language
IMPORTANT — The electronic file of this document contains colours which are considered to be
useful for the correct understanding of the document. Users should therefore consider printing
this document using a colour printer.
1 Scope
The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate
well together. To enable this level of cooperation, this document provides in-depth descriptions and
definitions of what CMDI records, components and their representations in XML look like.
This document describes these XML representations, which enable the flexible construction of
interoperable metadata schemas suitable for, but not limited to, describing language resources. The
metadata schemas based on these representations can be used to describe resources at different levels
of granularity (e.g. descriptions on the collection level or on the level of individual resources).
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at http://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1 General terms
3.1.1
concept
unit of knowledge created by a unique combination of characteristics
[SOURCE: ISO 1087:— (see footnote 1), 3.2.3, modified — Note 1 to entry and Note 2 to entry have been deleted.]
3.1.2
concept link
reference from a CMD profile (3.2.11), CMD component (3.2.3), CMD element (3.2.5), CMD attribute (3.2.2)
or a value in a controlled vocabulary (3.1.4) to an entry in a semantic registry (3.1.11) via a Uniform
Resource Identifier (3.1.13)
Note 1 to entry: Typically a concept link is provided as a persistent identifier (3.1.9).
1) Revision of ISO 1087:2000 under preparation. Stage at the time of publication: ISO/FDIS 1087:2019.
3.1.3
concept registry
semantic registry (3.1.11) maintaining concepts (3.1.1)
EXAMPLE The CLARIN Concept Registry[17] as used in the CLARIN infrastructure.
3.1.4
controlled vocabulary
closed/open vocabulary
set of values that can be used either to constrain the set of permissible values or to provide suggestions
for applicable values in a given context
3.1.5
data category
class of data items that are closely related from a formal or semantic point of view
EXAMPLE /part of speech/, /subject field/, /definition/.
Note 1 to entry: A data category can be viewed as a generalization of the notion of a field in a database.
[SOURCE: ISO 30042:2019, 3.8, modified — Note 2 to entry has been deleted.]
3.1.6
language tag
textual code used to assist in identifying languages in every mode of communication
Note 1 to entry: This includes constructed and artificial languages but excludes languages not intended primarily
for human communication, such as programming languages (see IETF BCP 47[6]).
Note 2 to entry: Language tags may be used to assist in the identification of a language in every mode of
communication, for example in spoken, written, signed, or otherwise signaled, communication.
3.1.7
media type
MIME type
media type specification used originally for textual, non-textual, multi-part message bodies of emails
and which provides technical format information on data
Note 1 to entry: For the purposes of this document, it is as described in IETF RFC 6838[8].
3.1.8
metadata
resource (3.1.10) that is a description of another resource, usually given as a set of properties in the
form of attribute-value pairs
Note 1 to entry: This description can contain information about the resource, aspects or parts of the resource
and/or artefacts and actors connected to the resource.
3.1.9
persistent identifier
PID
unique identifier that ensures permanent access for a digital object by providing access to it
independently of its physical location or current ownership
Note 1 to entry: Unique in this context means that the PID will not be issued again for other resources (3.1.10).
However, the same PID can reference different representations or incarnations of the resource at the discretion
of the resource provider.
[SOURCE: ISO 24619:2011, 3.2.4]
3.1.10
resource
entity, possibly digitally accessible, that can be described in terms of its content and technical
properties, referenced by a Uniform Resource Identifier (3.1.13)
3.1.11
semantic registry
directory of (authoritative) definitions of term (3.1.12), concept (3.1.1) or data category (3.1.5), or the
system maintaining it
Note 1 to entry: These registries generally also provide persistent identifiers (3.1.9) for their entries.
3.1.12
term
designation that represents a general concept (3.1.1) in a specific domain or subject
EXAMPLE “planet”, “tower”, “pen”, “numeral”, “number”, “square root”, “logarithm”, “unit of measurement”,
“base of a logarithm”, “chemical element”, “chemical compound”, “HP Laserjet 1100”, “Nobel Prize in Physics”.
Note 1 to entry: Terms may be partly or wholly verbal.
Note 2 to entry: Terms can include letters and letter symbols, numerals, mathematical symbols, typographical
signs and syntactic signs (e.g. punctuation marks, such as hyphens, parentheses, square brackets and other
connectors or delimiters), sometimes in character styles (i.e. fonts and bold, italic, bold italic, or other style
conventions) governed by domain-, subject-, or language-specific conventions.
[SOURCE: ISO 1087:—, 3.4.2]
3.1.13
Uniform Resource Identifier
URI
sequence of characters that identifies a resource (3.1.10)
Note 1 to entry: IETF RFC 3986[7] defines the generic URI syntax and a process for resolving URI references that
might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
3.2 CMDI
3.2.1
CCSL
CMDI component specification language
XML (3.3.4) based language for describing a CMD component (3.2.3) and a CMD profile (3.2.11) according
to the CMD model (3.2.10)
3.2.2
CMD attribute
unit within a CMD element (3.2.5) that describes the level at which properties of a CMD element can be
provided by means of value scheme (3.2.20) constrained atomic values
3.2.3
CMD component
component
reusable, structured template for the description of (an aspect of) a resource (3.1.10), defined by means
of a CMD specification (3.2.14) document with the potential of including other CMD components, either
through reference or inline definition
3.2.4
CMD component registry
component registry
service where a CMD specification (3.2.14) can be registered and accessed
3.2.5
CMD element
element definition
unit within a CMD component (3.2.3) that describes the level of the CMD instance (3.2.6) that can carry
atomic values governed by a value scheme (3.2.20), and does not contain further levels except for that of
the CMD attribute (3.2.2)
3.2.6
CMD instance
metadata instance
CMDI file
CMDI instance
metadata record
CMD record
file that conforms to the general CMD instance structure as described in this document and, at the CMD
instance payload (3.2.9) level, follows the specific structure defined by the CMD profile (3.2.11) it relates to
3.2.7
CMD instance envelope
section of a CMD instance (3.2.6) which is structured uniformly for all instances and contains the CMD
instance header (3.2.8) and the list of resource proxies (3.2.18) which may be referenced from the CMD
instance payload (3.2.9) section
3.2.8
CMD instance header
section of a CMD instance (3.2.6) marked as ‘header’, providing information on that CMD instance as
such, not the resource (3.1.10) that is described by the metadata file
3.2.9
CMD instance payload
section of a CMD instance (3.2.6) that follows the structure defined by the CMD profile (3.2.11) it
references and contains the description of the resource (3.1.10) to which that CMD instance relates
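The split between CMD instance envelope (3.2.7) and CMD instance payload (3.2.9) can be illustrated on a toy record. This is a sketch under stated assumptions: the tag names `Instance`, `Header`, `ResourceProxies`, `Proxy` and `Payload` are placeholders invented for the example and are not the normative element names, which are defined in Clause 5 and are not reproduced in this sample.

```python
import xml.etree.ElementTree as ET

# Toy record. Every tag and attribute name here is a placeholder chosen
# for this sketch, not a name prescribed by the standard.
RECORD = """
<Instance>
  <Header><Creator>example</Creator></Header>
  <ResourceProxies>
    <Proxy id="p1" uri="https://example.org/audio.wav" kind="Resource"/>
  </ResourceProxies>
  <Payload>
    <Session><Title>Interview 1</Title></Session>
  </Payload>
</Instance>
"""

# Sections assumed to be structured the same way in every instance.
ENVELOPE_TAGS = {"Header", "ResourceProxies"}


def split_instance(xml_text):
    """Return (envelope, payload) as lists of top-level child elements."""
    root = ET.fromstring(xml_text)
    envelope = [child for child in root if child.tag in ENVELOPE_TAGS]
    payload = [child for child in root if child.tag not in ENVELOPE_TAGS]
    return envelope, payload


envelope, payload = split_instance(RECORD)
```

Only the payload part varies with the profile: in this toy record its single child, `Session`, plays the role of the root component of a hypothetical profile.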
3.2.10
CMD model
component metadata model
metadata model that is based on CMD components (3.2.3)
Note 1 to entry: For the purposes of this document, it is as specified in ISO 24622-1.
3.2.11
CMD profile
profile
structured template for the description of a class of resources (3.1.10) providing the complete structure
for a CMD instance payload (3.2.9) by means of a hierarchy of CMD components (3.2.3)
3.2.12
CMD profile schema
schema definition by which the correctness of a CMD instance (3.2.6) with respect to the CMD profile
(3.2.11) it pertains to can be evaluated
Note 1 to entry: The CMD profile schema may be expressed as XML Schema (3.3.11) but also in other XML schema
languages.
3.2.13
CMD root component
CMD component (3.2.3) that is defined at the highest level within a CMD profile (3.2.11) that may have
one or more child CMD components (3.2.3) but no siblings
Note 1 to entry: In the CMD instance payload (3.2.9), it is instantiated exactly once.
3.2.14
CMD specification
component specification
component definition
profile specification
profile definition
representation of a CMD component (3.2.3) or CMD profile (3.2.11), expressed using the constructs of
the CCSL (3.2.1)
3.2.15
CMD specification header
component header
profile header
section of a CMD specification (3.2.14) marked as ‘header’, providing information on that CMD
specification as such that is not part of the defined structure
3.2.16
CMDI
component metadata infrastructure
metadata description framework consisting of the CMD model (3.2.10) and infrastructure to process
instances of parts of the model
3.2.17
inline CMD component
CMD component (3.2.3) that is created and stored within another CMD component and cannot be
addressed from other CMD components
3.2.18
resource proxy
CMD resource reference
representation of a resource (3.1.10) within a CMD instance (3.2.6) containing a Uniform Resource
Identifier (3.1.13) as a reference to the resource itself and an indication of its nature
3.2.19
resource proxy reference
reference from any point within the CMD instance payload (3.2.9) to any of the resource proxy (3.2.18)
elements
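How a resource proxy reference (3.2.19) points back to a resource proxy (3.2.18) can be sketched as a simple lookup. The record layout, the tag names and the `id`, `uri` and `ref` attribute names below are assumptions made for this example only; they are not the names prescribed by the standard.

```python
import xml.etree.ElementTree as ET

# Toy record with placeholder names; "p3" deliberately has no proxy.
RECORD = """
<Instance>
  <ResourceProxies>
    <Proxy id="p1" uri="https://example.org/audio.wav" kind="Resource"/>
    <Proxy id="p2" uri="https://example.org/notes.pdf" kind="Resource"/>
  </ResourceProxies>
  <Payload>
    <Session>
      <Recording ref="p1"/>
      <Transcript ref="p3"/>
    </Session>
  </Payload>
</Instance>
"""


def resolve_references(xml_text):
    """Map each referencing payload tag to a proxy URI, or None if dangling."""
    root = ET.fromstring(xml_text)
    # Index the proxies in the envelope by their identifier.
    proxies = {p.get("id"): p.get("uri") for p in root.iter("Proxy")}
    resolved = {}
    # Walk the payload and look up every reference found there.
    for node in root.find("Payload").iter():
        ref = node.get("ref")
        if ref is not None:
            resolved[node.tag] = proxies.get(ref)
    return resolved


links = resolve_references(RECORD)
```

A `None` value marks a reference that does not match any proxy, which a consuming application would probably want to report as an error.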
3.2.20
value scheme
set of constraints governing the range of values allowed for a specific CMD element (3.2.5) or CMD
attribute (3.2.2) in a CMD instance (3.2.6), expressed in terms of an XML Schema datatype (3.3.12),
controlled vocabulary (3.1.4), or regular expression (3.3.3)
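The three ways of expressing a value scheme can be sketched as one dispatching check. This is a simplified illustration, not the validation procedure of the standard: the function name `is_valid`, the kind labels and the two-entry datatype table are assumptions, and Python's `re` syntax differs in details from the regular expression syntax of XML Schema.

```python
import re

# Simplified lexical checks for two XML Schema datatypes (a tiny subset).
DATATYPES = {
    "boolean": lambda v: v in {"true", "false", "1", "0"},
    "integer": lambda v: re.fullmatch(r"[+-]?[0-9]+", v) is not None,
}


def is_valid(value, kind, spec):
    """Return True if the string `value` satisfies the scheme (kind, spec)."""
    if kind == "datatype":
        return DATATYPES[spec](value)
    if kind == "vocabulary":
        # Closed controlled vocabulary: only listed values are permitted.
        return value in spec
    if kind == "pattern":
        # The whole string has to match, not just a substring.
        return re.fullmatch(spec, value) is not None
    raise ValueError(f"unknown value scheme kind: {kind}")


checks = [
    is_valid("42", "datatype", "integer"),
    is_valid("4.2", "datatype", "integer"),
    is_valid("audio", "vocabulary", {"audio", "video"}),
    is_valid("ab12", "pattern", "[a-z]+"),
]
```

An open vocabulary, which only suggests values, would not reject anything in such a check and is therefore left out of the sketch.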
3.3 XML
3.3.1
foreign attribute
XML attribute (3.3.5) defined in a namespace (3.3.2) other than those declared in CMDI (3.2.16), to be
included in a CMD instance (3.2.6) as additional information targeted to specific receivers or applications
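Detecting foreign attributes amounts to comparing the namespace of each attribute with the namespaces declared by CMDI. The sketch below assumes a placeholder namespace URI (`http://example.org/cmd`) in place of the real CMDI namespaces, and it treats attributes without any namespace as not foreign; both are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# Placeholder URI standing in for the namespaces declared by CMDI.
CMD_NAMESPACES = {"http://example.org/cmd"}

RECORD = """
<cmd:Session xmlns:cmd="http://example.org/cmd"
             xmlns:app="http://example.org/my-app"
             cmd:ref="p1" app:reviewed="yes" plain="x"/>
"""


def foreign_attributes(element):
    """Return the names of attributes whose namespace is outside CMD_NAMESPACES."""
    found = []
    for name in element.attrib:
        # ElementTree writes qualified names as "{namespace-uri}local-name".
        if name.startswith("{"):
            namespace = name[1:].split("}", 1)[0]
            if namespace not in CMD_NAMESPACES:
                found.append(name)
    return found


root = ET.fromstring(RECORD)
result = foreign_attributes(root)
```

In this toy record only `app:reviewed` is reported, since `cmd:ref` lies in the placeholder CMDI namespace and `plain` has no namespace at all.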
3.3.2
XML namespace
namespace
method for qualifying element and attribute names used in XML
Note 1 to entry: For the purposes of this document, it is as described in W3C XML Namespaces[10].
3.3.3
regular expression
sequence of characters that denote a set of strings
Note 1 to entry: When used to constrain a lexical space, a regular expression asserts that only strings in the
defined set of strings are valid literals for values of that type.
Note 2 to entry: See also W3C XSchema Part 2[12], Appendix F.
3.3.4
XML
markup language for describing hierarchical structures within a text file
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML[9].
3.3.5
XML attribute
property of an XML element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML[9].
3.3.6
XML attribute declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
attribute (3.3.5)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD[13], Section 3.2.
3.3.7
XML container element
XML element (3.3.9) that has one or more XML elements as its descendants
3.3.8
XML document
document represented in XML
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML[9].
3.3.9
XML element
constituent of an XML document (3.3.8)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML[9].
3.3.10
XML element declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD[13], Section 3.3.
3.3.11
XML Schema
document that complies with the XML Schema recommendation
Note 1 to entry: For the purposes of this document, it refers to the W3C XSchema Part 1 recommendation[11].
3.3.12
XML Schema datatype
predefined set of permissible content within an XML element (3.3.9) or an XML attribute (3.3.5) of an
XML document (3.3.8) used in an XML Schema (3.3.11)
Note 1 to entry: For the purposes of this document, it is as described in W3C XSchema Part 2[12].
4 Notational and XML namespace conventions
The
...
INTERNATIONAL STANDARD
ISO 24622-2
First edition
2019-07
Language resource management —
Component metadata infrastructure
(CMDI) —
Part 2:
Component metadata specification
language
Gestion des ressources linguistiques — Composante infrastructure de
métadonnées (CMDI) —
Partie 2: Composante linguistique spécifique aux métadonnées
Reference number
ISO 24622-2:2019(E)
© ISO 2019
COPYRIGHT PROTECTED DOCUMENT
© ISO 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 General terms
3.2 CMDI
3.3 XML
4 Notational and XML namespace conventions
5 Structure of CMDI instances
5.1 General structure
5.2 The main structure
5.3 The
5.4 The element
5.4.1 General structure of the element
5.4.2 The list of resource proxies
5.4.3 The list of journal files
5.4.4 The list of relations between resource files
5.5 The element
5.6 The CMD components
6 CCSL (CMDI Component Specification Language)
6.1 General structure of the CCSL
6.2 CCSL header
6.3 CMD specification
6.4 Definition of CMD elements
6.5 CMD attribute definition
6.6 Value schemes for CMD elements and CMD attributes
6.7 Cue attributes
7 CMD
7.1 Transformation of CCSL into a CMD profile schema definition
7.2 General properties of the CMD profile schema definition
7.3 Interpretation of CMD specifications in the CCSL
7.3.1 General structure of CMD specifications
7.3.2 Document structure prescribed by the CMD profile schema
7.4 Interpretation of CMD element definitions in the CCSL
7.5 Interpretation of CMD attribute definitions in the CCSL
7.6 Content model for CMD elements and CMD attributes in the schema definition
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www .iso .org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www .iso
.org/iso/foreword .html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24622 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www .iso .org/members .html.
iv © ISO 2019 – All rights reserved
---------------------- Page: 4 ----------------------
ISO 24622-2:2019(E)
Introduction
Many researchers, from the humanities and other domains, have a strong need to study resources
in close detail. Nowadays more and more of these resources are available online. To be able to find
these resources, they are described with metadata. These component metadata (CMD) instances are
collected and made available via central catalogues. Often, resource providers want to include specific
properties of a resource in their metadata to provide all relevant descriptions for a specific type of
resource. The purpose of catalogues tends to be more generic and addresses a broader target audience.
It is hard to strike the balance between these two ends of the spectrum with one metadata schema,
and mismatches can negatively impact the quality of metadata provided. The goal of the component
metadata infrastructure (CMDI) is to provide a flexible mechanism to build resource specific metadata
[14][15]
schemas out of shared components and semantics .
In CMDI the metadata lifecycle starts with the need of a metadata modeller to create a dedicated
metadata profile for a specific type of resource. Modellers can browse and search a registry for
components and profiles that are suitable or come close to meeting their requirements. A component
groups together metadata elements that belong together and can potentially be reused in a different
context. Components can also group other components. Existing component registries, e.g., the CLARIN
[16]
(common language resources and technology infrastructure) Component Registry , might already
contain any number of components. These can be reused as they are, or be adapted by modifying, adding
or removing some metadata elements and/or components. Also completely new components can be
created to model the unique aspects of the resources under consideration. All the needed components
are combined into one profile specific for the type of resources. Any component, element and value in
[21]
such a profile may be linked to a semantic description — a concept — to make their meaning explicit .
[17]
These semantic descriptions can be stored in a semantic registry, e.g., the CLARIN Concept Registry .
In the end metadata creators can create records for specific resources that comply with the profile
[22]
relevant for the resource type, and these records can be provided to local and global catalogues .
CMDI has originally been developed in the context of the European CLARIN infrastructure initiative
with input from other initiatives and experts. Already in its preparatory phase, which started in 2007,
the infrastructure needed flexibility in the metadata domain as it was confronted with many types of
[20]
resources that had to be accurately described. For Version 1.0 a toolkit was created, consisting of
the XML schemas and XSLT stylesheets to validate and transform components, profiles and records.
Version 1.1 included some small changes and has seen small incremental backward compatible
advances since 2011. This version has been in use, new developments and the development of this
[18]
document resulted in Version 1.2 . Also CMDI has seen a growing number of tools and infrastructure
systems that deal with its records and components and rely on its shared syntax and semantics.
In ISO 24622-1, the component metadata model has been standardized. This document is compliant
with ISO 24622-1, and also extends and constrains it at various places (see also the red parts in the UML
class diagram in Figure 1):
— support for attributes on both components and elements is added,
— a profile is limited to one root component, and
— an element always belongs to a specific component.
© ISO 2019 – All rights reserved v
---------------------- Page: 5 ----------------------
ISO 24622-2:2019(E)
Figure 1 — Component metadata model and its extensions
vi © ISO 2019 – All rights reserved
---------------------- Page: 6 ----------------------
INTERNATIONAL STANDARD ISO 24622-2:2019(E)
Language resource management — Component metadata
infrasctructure (CMDI) —
Part 2:
Component metadata specification language
IMPORTANT — The electronic file of this document contains colours which are considered to be
useful for the correct understanding of the document. Users should therefore consider printing
this document using a colour printer.
1 Scope
The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate
well together. To enable this level of cooperation this document provides in depth descriptions and
definitions of what CMDI records, components and their representations in XML look like.
This document describes these XML representations, which enable the flexible construction of
interoperable metadata schemas suitable for, but not limited to, describing language resources. The
metadata schemas based on these representations can be used to describe resources at different levels
of granularity (e.g. descriptions on the collection level or on the level of individual resources).
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at http: //www .iso .org/obp
— IEC Electropedia: available at http: //www .electropedia .org/
3.1 General terms
3.1.1
concept
unit of knowledge created by a unique combination of characteristics
1)
[SOURCE: ISO 1087:— , 3.2.3, modified — Note 1 to entry and Note 2 to entry have been deleted.]
3.1.2
concept link
reference from a CMD profile (3.2.11), CMD component (3.2.3), CMD element (3.2.5), CMD attribute (3.2.2)
or a value in a controlled vocabulary (3.1.4) to an entry in a semantic registry (3.1.11) via a Uniform
Resource Identifier (3.1.13)
Note 1 to entry: Typically a concept link is provided as a persistent identifier (3.1.9).
1) Revision of ISO 1087:2000 under preparation. Stage at the time of publication: ISO/FDIS 1087:2019.
© ISO 2019 – All rights reserved 1
---------------------- Page: 7 ----------------------
ISO 24622-2:2019(E)
3.1.3
concept registry
semantic registry (3.1.11) maintaining concepts (3.1.1)
[17]
EXAMPLE The CLARIN Concept Registry as used in the CLARIN infrastructure.
3.1.4
controlled vocabulary
closed/open vocabulary
set of values that can be used either to constrain the set of permissible values or to provide suggestions
for applicable values in a given context
3.1.5
data category
class of data items that are closely related from a formal or semantic point of view
EXAMPLE /part of speech/, /subject field/, /definition/.
Note 1 to entry: A data category can be viewed as a generalization of the notion of a field in a database.
[SOURCE: ISO 30042:2019, 3.8, modified — Note 2 to entry has been deleted.]
3.1.6
language tag
textual code used to assist in identifying languages in every mode of communication
Note 1 to entry: This includes constructed and artificial languages but excludes languages not intended primarily
for human communication, for example in spoken, written, signed, or otherwise signaled, communication (see
[6]
IETF BCP 47 ).
Note 2 to entry: Language tags may be used to assist in the identification of a language in every mode of
communication, for example in spoken, written, signed, or otherwise signaled, communication.
3.1.7
media type
MIME type
media type specification used originally for textual, non-textual, multi-part message bodies of emails
and which provides technical format information on data
[8]
Note 1 to entry: For the purposes of this document, it is as described in IETF RFC 6838 .
3.1.8
metadata
resource (3.1.10) that is a description of another resource, usually given as a set of properties in the
form of attribute-value pairs
Note 1 to entry: This description can contain information about the resource, aspects or parts of the resource
and/or artefacts and actors connected to the resource.
3.1.9
persistent identifier
PID
unique identifier that ensures permanent access for a digital object by providing access to it
independently of its physical location or current ownership
Note 1 to entry: Unique in this context means that the PID will not be issued again for other resources (3.1.10).
However, the same PID can reference different representations or incarnations of the resource at the discretion
of the resource provider.
[SOURCE: ISO 24619:2011, 3.2.4]
2 © ISO 2019 – All rights reserved
---------------------- Page: 8 ----------------------
ISO 24622-2:2019(E)
3.1.10
resource
entity, possibly digitally accessible, that can be described in terms of its content and technical
properties, referenced by a Uniform Resource Identifier (3.1.13)
3.1.11
semantic registry
directory of (authoritative) definitions of term (3.1.12), concept (3.1.1) or data category (3.1.5), or the
system maintaining it
Note 1 to entry: These registries generally also provide persistent identifiers (3.1.9) for their entries.
3.1.12
term
designation that represents a general concept (3.1.1) in a specific domain or subject
EXAMPLE “planet”, “tower”, “pen”, “numeral”, “number”, “square root”, “logarithm”, “unit of measurement”,
“base of a logarithm”, “chemical element”, “chemical compound”, “HP Laserjet 1100”, “Nobel Prize in Physics”.
Note 1 to entry: Terms may be partly or wholly verbal.
Note 2 to entry: Terms can include letters and letter symbols, numerals, mathematical symbols, typographical
signs and syntactic signs (e.g. punctuation marks, such as hyphens, parentheses, square brackets and other
connectors or delimiters), sometimes in character styles (i.e. fonts and bold, italic, bold italic, or other style
conventions) governed by domain-, subject-, or language-specific conventions.
[SOURCE: ISO 1087:—, 3.4.2]
3.1.13
Uniform Resource Identifier
URI
sequence of characters that identifies a resource (3.1.10)
Note 1 to entry: IETF RFC 3986 [7] defines the generic URI syntax and a process for resolving URI references that
might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
3.2 CMDI
3.2.1
CCSL
CMDI component specification language
XML (3.3.4) based language for describing a CMD component (3.2.3) and a CMD profile (3.2.11) according
to the CMD model (3.2.10)
3.2.2
CMD attribute
unit within a CMD element (3.2.5) that describes the level at which properties of a CMD element can be
provided by means of value scheme (3.2.20) constrained atomic values
3.2.3
CMD component
component
reusable, structured template for the description of (an aspect of) a resource (3.1.10), defined by means
of a CMD specification (3.2.14) document with the potential of including other CMD components, either
through reference or inline definition
3.2.4
CMD component registry
component registry
service where a CMD specification (3.2.14) can be registered and accessed
3.2.5
CMD element
element definition
unit within a CMD component (3.2.3) that describes the level of the CMD instance (3.2.6) that can carry
atomic values governed by a value scheme (3.2.20), and does not contain further levels except for that of
the CMD attribute (3.2.2)
3.2.6
CMD instance
metadata instance
CMDI file
CMDI instance
metadata record
CMD record
file that conforms to the general CMD instance structure as described in this document and, at the CMD
instance payload (3.2.9) level, follows the specific structure defined by the CMD profile (3.2.11) it relates to
3.2.7
CMD instance envelope
section of a CMD instance (3.2.6) which is structured uniformly for all instances and contains the CMD
instance header (3.2.8) and the list of resource proxies (3.2.18) which may be referenced from the CMD
instance payload (3.2.9) section
3.2.8
CMD instance header
section of a CMD instance (3.2.6) marked as ‘header’, providing information on that CMD instance as
such, not the resource (3.1.10) that is described by the metadata file
3.2.9
CMD instance payload
section of a CMD instance (3.2.6) that follows the structure defined by the CMD profile (3.2.11) it
references and contains the description of the resource (3.1.10) to which that CMD instance relates
3.2.10
CMD model
component metadata model
metadata model that is based on CMD components (3.2.3)
Note 1 to entry: For the purposes of this document, it is as specified in ISO 24622-1.
3.2.11
CMD profile
profile
structured template for the description of a class of resources (3.1.10) providing the complete structure
for a CMD instance payload (3.2.9) by means of a hierarchy of CMD components (3.2.3)
3.2.12
CMD profile schema
schema definition by which the correctness of a CMD instance (3.2.6) with respect to the CMD profile
(3.2.11) it pertains to can be evaluated
Note 1 to entry: The CMD profile schema may be expressed as XML Schema (3.3.11) but also in other XML schema
languages.
3.2.13
CMD root component
CMD component (3.2.3) that is defined at the highest level within a CMD profile (3.2.11) that may have
one or more child CMD components (3.2.3) but no siblings
Note 1 to entry: In the CMD instance payload (3.2.9), it is instantiated exactly once.
3.2.14
CMD specification
component specification
component definition
profile specification
profile definition
representation of a CMD component (3.2.3) or CMD profile (3.2.11), expressed using the constructs of
the CCSL (3.2.1)
3.2.15
CMD specification header
component header
profile header
section of a CMD specification (3.2.14) marked as ‘header’, providing information on that CMD
specification as such that is not part of the defined structure
3.2.16
CMDI
component metadata infrastructure
metadata description framework consisting of the CMD model (3.2.10) and infrastructure to process
instances of parts of the model
3.2.17
inline CMD component
CMD component (3.2.3) that is created and stored within another CMD component and cannot be
addressed from other CMD components
3.2.18
resource proxy
CMD resource reference
representation of a resource (3.1.10) within a CMD instance (3.2.6) containing a Uniform Resource
Identifier (3.1.13) as a reference to the resource itself and an indication of its nature
3.2.19
resource proxy reference
reference from any point within the CMD instance payload (3.2.9) to any of the resource proxy (3.2.18)
elements
3.2.20
value scheme
set of constraints governing the range of values allowed for a specific CMD element (3.2.5) or CMD
attribute (3.2.2) in a CMD instance (3.2.6), expressed in terms of an XML Schema datatype (3.3.12),
controlled vocabulary (3.1.4), or regular expression (3.3.3)
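The three kinds of value scheme named in 3.2.20 can be illustrated with a minimal Python sketch. The class name ValueScheme, its fields and the helper is_xs_boolean are hypothetical and are not defined by this document. The sketch treats the controlled vocabulary as closed, and it uses Python's re module, whose syntax is close to but not identical with XML Schema regular expressions; re.fullmatch is used to approximate their implicit anchoring.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class ValueScheme:
    """Hypothetical model of a value scheme (illustration only)."""
    datatype_check: Optional[Callable[[str], bool]] = None  # stands in for an XML Schema datatype
    vocabulary: Optional[FrozenSet[str]] = None              # closed controlled vocabulary
    pattern: Optional[str] = None                            # regular expression

    def allows(self, value: str) -> bool:
        # Exactly one of the three constraint kinds is expected to be set.
        if self.datatype_check is not None:
            return self.datatype_check(value)
        if self.vocabulary is not None:
            return value in self.vocabulary
        if self.pattern is not None:
            return re.fullmatch(self.pattern, value) is not None
        return True  # no constraint given


def is_xs_boolean(value: str) -> bool:
    # Lexical space of xs:boolean as given in XML Schema Part 2.
    return value in {"true", "false", "1", "0"}


boolean_scheme = ValueScheme(datatype_check=is_xs_boolean)
media_scheme = ValueScheme(vocabulary=frozenset({"audio", "video"}))
code_scheme = ValueScheme(pattern=r"[a-z]{3}")
```

An open vocabulary, which only suggests values, would not reject anything and is therefore left out of the sketch.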
3.3 XML
3.3.1
foreign attribute
XML attribute (3.3.5) defined in a namespace (3.3.2) other than those declared in CMDI (3.2.16), to be
included in a CMD instance (3.2.6) as additional information targeted to specific receivers or applications
3.3.2
XML namespace
namespace
method for qualifying element and attribute names used in XML
Note 1 to entry: For the purposes of this document, it is as described in W3C XML Namespaces [10].
3.3.3
regular expression
sequence of characters that denote a set of strings
Note 1 to entry: When used to constrain a lexical space, a regular expression asserts that only strings in the
defined set of strings are valid literals for values of that type.
Note 2 to entry: See also W3C XSchema Part 2 [12], Appendix F.
3.3.4
XML
markup language for describing hierarchical structures within a text file
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.5
XML attribute
property of an XML element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.6
XML attribute declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
attribute (3.3.5)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD [13], Section 3.2.
3.3.7
XML container element
XML element (3.3.9) that has one or more XML elements as its descendants
3.3.8
XML document
document represented in XML
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.9
XML element
constituent of an XML document (3.3.8)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.10
XML element declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD [13], Section 3.3.
3.3.11
XML Schema
document that complies with the XML Schema recommendation
Note 1 to entry: For the purposes of this document, it refers to the W3C XSchema Part 1 recommendation [11].
3.3.12
XML Schema datatype
predefined set of permissible content within an XML element (3.3.9) or an XML attribute (3.3.5) of an
XML document (3.3.8) used in an XML Schema (3.3.11)
Note 1 to entry: For the purposes of this document, it is as described in W3C XSchema Part 2 [12].
4 Notational and XML namespace conventions
The following notational conventions for XML fragments are used throughout this document:
— <Element>
an XML element with the generic identifier Element that is bound to a default XML namespace;
— <prefix:Element>
an XML element with the generic identifier Element that is bound to an XML namespace denoted by
the prefix prefix;
— <prefix:{Element}>
an XML element with a contextually specified identifier that is bound to an XML namespace denoted
by the prefix prefix;
— <prefix:{Element}>*
any number of XML elements with contextually specified identifiers that are bound to an XML
namespace denoted by the prefix prefix;
— @attr
an XML attribute with the name attr;
— @{attr}
an XML attribute with a contextually specified name;
— @{attr}*
any number of XML attributes with contextually specified names;
— @prefix:attr
an XML attribute with the name attr that is bound to an XML namespace denoted by the prefix prefix;
— string
the literal string shall be used either as element content or attribute value;
— xs:type
the XML schema type with name type.
The XML namespace names and prefixes given in Table 1 are used throughout this document as existing
suitable examples. The column “Recommended Syntax” indicates which syntax variant should be used
by the toolkit and other creators of CMDI related documents.
Table 1 — XML namespaces and prefixes used in this d
...
SLOVENSKI STANDARD
oSIST ISO/DIS 24622-2:2019
01-september-2019
Upravljanje z jezikovnimi viri - Infrastruktura komponentnih metapodatkov (CMDI) -
2. del: Poseben jezik komponentnih metapodatkov
Language resource management -- Component metadata infrastructure (CMDI) -- Part
2: The component metadata specific language
Gestion des ressources linguistiques -- Composante infrastructure de métadonnées
(CMDI) -- Partie 2: Composante linguistique spécifique aux métadonnées
Ta slovenski standard je istoveten z: ISO/DIS 24622-2
ICS:
01.140.20 Informacijske vede Information sciences
35.060 Jeziki, ki se uporabljajo v Languages used in
informacijski tehniki in information technology
tehnologiji
oSIST ISO/DIS 24622-2:2019 en,fr,de
2003-01.Slovenski inštitut za standardizacijo. Razmnoževanje celote ali delov tega standarda ni dovoljeno.
DRAFT INTERNATIONAL STANDARD
ISO/DIS 24622-2
ISO/TC 37/SC 4 Secretariat: KATS
Voting begins on: 2018-08-10
Voting terminates on: 2018-11-02
Language resource management — Component metadata
infrastructure (CMDI) —
Part 2:
The component metadata specific language
Gestion des ressources linguistiques — Composante infrastructure de métadonnées (CMDI) —
Partie 2: Composante linguistique spécifique aux métadonnées
ICS: 01.140.20
This document is circulated as received from the committee secretariat.
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS
THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL
STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL,
TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL
STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR
POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN
NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS,
NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO
PROVIDE SUPPORTING DOCUMENTATION.
Reference number
ISO/DIS 24622-2:2018(E)
© ISO 2018
COPYRIGHT PROTECTED DOCUMENT
© ISO 2018
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
History
1 Scope
2 Normative references
3 Terms and definitions
3.1 General terms
3.2 CMDI
3.3 XML
4 Typographic and XML Namespace conventions
5 Structure of CMDI files
5.1 General structure
5.2 The main structure
5.3 The
5.4 The element
5.4.1 The list of resource proxies
5.4.2 The list of journal files
5.4.3 The list of relations between resource files
5.5 The IsPartOf List
5.6 The components
6 The CMDI Component Specification Language (CCSL)
6.1 CCSL header
6.2 CMD component definition
6.3 CMD element definition
6.4 CMD attribute definition
6.5 Value schemes for elements and attributes
6.6 Cue attributes
7 CMD
7.1 Transformation of CCSL into a CMD profile schema definition
7.2 General properties of the CMD profile schema definition
7.3 Interpretation of CMD component definitions in the CCSL
7.3.1 Document structure prescribed by the schema
7.4 Interpretation of CMD element definitions in the CCSL
7.5 Interpretation of CMD attribute definitions in the CCSL
7.6 Content model for CMD elements and CMD attributes in the schema definition
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national
standards bodies (ISO member bodies). The work of preparing International Standards is normally
carried out through ISO technical committees. Each member body interested in a subject for which a
technical committee has been established has the right to be represented on that committee.
International organizations, governmental and non-governmental, in liaison with ISO, also take part in
the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all
matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following
URL: www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24622 series can be found on the ISO website.
Introduction
Many researchers, from the humanities and other domains, have a strong need to study resources in
close detail. Nowadays more and more of these resources are available online. To be able to find these
resources, they are described with metadata. These metadata records are collected and made available
via central catalogues. Often, resource providers want to include specific properties of a resource in
their metadata to provide all relevant descriptions for a specific type of resource. The purpose of
catalogues tends to be more generic and address a broader target audience. It is hard to strike the
balance between these two ends of the spectrum with one metadata schema, and mismatches can
negatively impact the quality of metadata provided. The goal of the Component Metadata Infrastructure
(CMDI) is to provide a flexible mechanism to build resource specific metadata schemas out of shared
components and semantics (Broeder et al, 2010 and Broeder et al, 2012).
In CMDI the metadata lifecycle starts with the need of a metadata modeller to create a dedicated
metadata profile for a specific type of resources. Modellers can browse and search a registry for
components and profiles that are suitable or come close to meeting their requirements. A component
groups together metadata elements that belong together and can potentially be reused in a different
context. Components can also group other components. A component registry, e.g., the CLARIN
Component Registry, might already contain any number of components. These can be reused as they are,
or be adapted by modifying, adding or removing some metadata elements and/or components. Also
completely new components can be created to model the unique aspects of the resources under
consideration. All the needed components are combined into one profile specific for the type of
resources. Any component, element and value in such a profile may be linked to a semantic description -
a concept - to make their meaning explicit (Durco & Windhouwer, 2013). These semantic descriptions
can be stored in a semantic registry, e.g., the CLARIN Concept Registry. In the end metadata creators can
create records for specific resources that comply with the profile relevant for the resource type, and
these records can be provided to local and global catalogues (Van Uytvanck et al, 2012).
History
CMDI has been developed in the context of the European CLARIN infrastructure with input from other
initiatives and experts. Already in its preparatory phase, which started in 2007, the infrastructure
needed flexibility in the metadata domain as it was confronted with many types of resources that had to
be accurately described. For version 1.0 the CMDI toolkit was created, consisting of the XML schemas
and XSLT stylesheets to validate and transform components, profiles and records. Version 1.1 included
some small changes and has seen small incremental backward compatible advances since 2011. This
version has been in use throughout CLARIN’s construction phase. Also CMDI has seen a growing
number of tools and infrastructure systems that deal with its records and components and rely on its
shared syntax and semantics.
Language resource management — Component metadata
infrastructure (CMDI) — Part 2: The component metadata
specification language
1 Scope
The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate
well together. To enable this level of cooperation, this document provides in-depth descriptions and
definitions of what CMDI records, components and their representations in XML look like.
The scope of this document is to describe these XML representations, which enable the flexible
construction of interoperable metadata schemas suitable for, but not limited to, describing language
resources. The metadata schemas based on these representations can be used to describe resources at
different levels of granularity (e.g. descriptions on the collection level or on the level of individual
resources).
In ISO 24622-1:2015 the component metadata model has been standardized. This document is
compliant with ISO 24622-1:2015, and also extends and constrains it at various places (see also the red
parts in the UML class diagram below):
— support for attributes on both components and elements is added,
— a profile is limited to one root component, and
— an element always belongs to a specific component.
Figure 1 — Component metadata model and its extensions
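The three constraints listed above can be pictured with a small Python data model. This is only a sketch under stated assumptions: the class names, field names and the example profile ("Corpus", "Contact", "Title", "Email") are hypothetical and do not come from this document or from ISO 24622-1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str


@dataclass
class Element:
    # An element exists only inside the component that lists it.
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Component:
    name: str
    attributes: List[Attribute] = field(default_factory=list)   # attributes on components
    elements: List[Element] = field(default_factory=list)       # elements owned by this component
    components: List["Component"] = field(default_factory=list)  # nested components


@dataclass
class Profile:
    root: Component  # a single field models "limited to one root component"


def element_paths(component: Component, prefix: str = "") -> List[str]:
    """List every element by the path of components that own it."""
    path = f"{prefix}/{component.name}"
    paths = [f"{path}/{element.name}" for element in component.elements]
    for child in component.components:
        paths.extend(element_paths(child, path))
    return paths


# Hypothetical example profile, for illustration only.
profile = Profile(
    root=Component(
        "Corpus",
        attributes=[Attribute("ref")],
        elements=[Element("Title", attributes=[Attribute("lang")])],
        components=[Component("Contact", elements=[Element("Email")])],
    )
)
paths = element_paths(profile.root)
```

Because elements are reachable only through a component's elements list, every path starts at the single root and names the owning component of each element.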
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24622-1:2015, Language resource management — Component metadata infrastructure (CMDI) —
Part 1: The component metadata model
IETF BCP 47, Tags for Identifying Languages, September 2009, https://tools.ietf.org/rfc/bcp/bcp47.txt
IETF RFC 2119, Key words for use in RFCs to Indicate Requirement Levels, March 1997,
https://www.ietf.org/rfc/rfc2119.txt
IETF RFC 3023, XML Media Types, January 2001, https://tools.ietf.org/rfc/rfc3023.txt
IETF RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, January 2005,
https://tools.ietf.org/rfc/rfc3986.txt
IETF RFC 6838, Media Type Specifications and Registration Procedures, January 2013,
https://tools.ietf.org/rfc/rfc6838.txt
W3C XML, Extensible Markup Language (XML) 1.0, (Fifth Edition), T. Bray, J. Paoli, C. M. Sperberg-
McQueen, E. Maler and F. Yergeau (eds.), W3C Recommendation 26 November 2008,
http://www.w3.org/TR/2008/REC-xml-20081126/
W3C XML Namespaces, Namespaces in XML 1.0, (Third Edition), T. Bray, D. Hollander, A. Layman, R.
Tobin and H. S. Thompson (eds.), W3C Recommendation 8 December 2009,
http://www.w3.org/TR/2009/REC-xml-names-20091208/
W3C XSD, XML Schema Part 1: Structures, (Second Edition), H. S. Thompson, D. Beech, M. Maloney and N.
Mendelsohn (eds.), W3C Recommendation 28 October 2004,
http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/
W3C XSD Part 2: Datatypes, XML Schema Part 2: Datatypes, (Second Edition), P.V. Biron and A. Malhotra
(eds.), W3C Recommendation 28 October 2004,
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1 General terms
3.1.1
CLARIN infrastructure, CLARIN
infrastructure governed by the CLARIN ERIC
3.1.2
concept
abstract idea conceived in the mind or generalised from particular instances
Note 1 to entry: cf. Merriam-Webster Dictionary and Thesaurus, definition of concept.
3.1.3
concept link
reference from a CMD profile, CMD component, CMD element, CMD attribute or a value in a controlled
vocabulary to an entry in a semantic registry via a URI, typically a persistent identifier
3.1.4
concept registry
semantic registry maintaining concepts, e.g., the CLARIN Concept Registry as used in the CLARIN
infrastructure
3.1.5
controlled vocabulary, closed/open vocabulary
set of values that can be used either to constrain the set of permissible values or to provide suggestions
for applicable values in a given context
3.1.6
data category
result of the specification of a given data field
[SOURCE: ISO 12620:2009, 3.1.3]
3.1.7
language tag
textual code used to assist in identifying languages, whether spoken, written, signed, or otherwise
signaled, for the purpose of communication
Note 1 to entry: This includes constructed and artificial languages but excludes languages not intended
primarily for human communication, such as programming languages (IETF BCP 47).
3.1.8
media type, MIME type
type which specifies the nature of the data as described in IETF RFC 6838
3.1.9
metadata
resource that is a description of another resource, usually given as a set of properties in the form of
attribute-value pairs
Note 1 to entry: This description may contain information about the resource, aspects or parts of the resource
and/or artefacts and actors connected to the resource.
3.1.10
persistent identifier, PID
unique Uniform Resource Identifier that assures permanent access for a resource by providing access
to it independently of its physical location or current ownership
3.1.11
resource
entity, possibly digitally accessible, that can be described in terms of its content and technical
properties, referenced by a Uniform Resource Identifier
3.1.12
semantic registry
directory of (authoritative) definitions of term, concept or data category, or the system maintaining it
Note 1 to entry: These registries should also provide persistent identifiers for their entries.
3.1.13
term
verbal designation of a general concept in a specific subject field
[SOURCE: ISO 1087-1:2000, 3.4.3]
3.1.14
Uniform Resource Identifier, URI
identifier for resource as described in IETF RFC 3986
3.2 CMDI
3.2.1
CCSL
CMDI Component Specification Language
XML based language for describing CMD component and CMD profile according to the CMD model
3.2.2
CMD attribute
unit within a CMD element that describes the level at which properties of a CMD element can be
provided by means of value scheme constrained atomic values
3.2.3
CMD component, component
reusable, structured template for the description of (an aspect of) a resource, defined by means of a
CMD specification document with the potential of including other CMD component, either through
reference or inline definition
3.2.4
CMD component registry
component registry
service where a CMD specification can be registered and accessed
3.2.5
CMD element
element definition
unit within a CMD component that describes the level of the CMD instance that can carry atomic values
governed by a value scheme, and does not contain further levels except for that of the CMD attribute
3.2.6
CMD instance
metadata instance
CMDI file
CMDI instance
metadata record
CMD record
file that conforms to the general CMD instance structure as described in this document, and at the CMD
instance payload level follows the specific structure defined by the CMD profile it relates to
3.2.7
CMD instance envelope
section of a CMD instance which is structured uniformly for all instances, and contains the CMD instance
header and the list of resource proxies which may be referenced from the CMD instance payload section
3.2.8
CMD instance header
section of a CMD instance marked as ‘header’, providing information on that metadata instance as such,
not the resource that is described by the metadata file
3.2.9
CMD instance payload
section of a CMD instance that follows the structure defined by the CMD profile it references and
contains the description of the resource to which that CMD instance relates
3.2.10
CMD model
Component Metadata model
component based metadata model according to ISO 24622-1
3.2.11
CMD profile
profile definition, profile
structured template for the description of a class of resource providing the complete structure for a
CMD instance payload by means of a hierarchy of CMD components
3.2.12
CMD profile schema
schema definition by which the correctness of a CMD instance with respect to the CMD profile it
pertains to can be evaluated
Note 1 to entry: The CMD profile schema may be expressed as XML Schema but also in other XML schema
languages.
3.2.13
CMD root component
CMD component that is defined at the highest level within a CMD profile that may have one or more
child CMD component but no siblings
Note 1 to entry: In the CMD instance payload, it is instantiated exactly once.
3.2.14
CMD specification
component specification/definition
profile specification/definition
representation of a CMD component or CMD profile, expressed using the constructs of the CCSL
3.2.15
CMD specification header
component header
profile header
section of a CMD specification marked as ‘header’, providing information on that specification as such
that is not part of the defined structure
3.2.16
CMDI
Component Metadata Infrastructure
metadata description framework consisting of the CMD model and infrastructure to process instances
of (parts of) the model
3.2.17
inline CMD component
CMD component that is created and stored within another component and cannot be addressed from
other components
3.2.18
resource proxy
CMD resource reference
representation of a resource within a CMD instance containing a Uniform Resource Identifier as a
reference to the resource itself and an indication of its nature
3.2.19
resource proxy reference
reference from any point within the CMD instance payload to any of the resource proxy elements
3.2.20
value scheme
set of constraints governing the range of values allowed for a specific CMD element or CMD attribute in
a CMD instance, expressed in terms of an XML Schema datatype, controlled vocabulary, or regular
expression
3.3 XML
3.3.1
foreign attribute
XML attribute defined in a namespace other than those declared in CMDI, to be included in CMD
instance as additional information targeted to specific receivers or applications
3.3.2
namespace
XML namespace as described in W3C XML Namespaces
3.3.3
regular expression
expression that constrains the set of permissible values, as described in XML Schema Regular
Expressions (W3C XSD Part 2: Datatypes, appendix F Regular expressions)
3.3.4
XML
Extensible Markup Language as described by W3C recommendation (W3C XML)
3.3.5
XML attribute
property of an XML element as defined in W3C XML
3.3.6
XML attribute declaration
constituent of an XML Schema that constrains the structure and content of a specific XML attribute, in
accordance with W3C XSD, clause 3.2 “Attribute Declarations”
3.3.7
XML container element
XML element that has one or more XML elements as its descendants
3.3.8
XML document
well-formed document as defined in the W3C XML recommendation (W3C XML, definition of XML
Document)
3.3.9
XML element
constituent of an XML document as defined in W3C XML
3.3.10
XML element declaration
constituent of an XML Schema that constrains the structure and content of a specific XML element, in
accordance with W3C XSD, clause 3.3 “Element Declarations”
3.3.11
XML Schema
document that complies with the W3C XML Schema recommendation (W3C XSD)
3.3.12
XML Schema datatype
predefined set of permissible content within a section of an XML document as described in “W3C XSD
Part 2: Datatypes”
4 Typographic and XML Namespace conventions
The following typographic conventions for XML fragments will be used throughout this specification:
— <Element>
An XML element with the generic identifier Element that is bound to a default XML namespace;
— <prefix:Element>
An XML element with the generic identifier Element that is bound to an XML namespace denoted by
the prefix prefix;
— <prefix:{Element}>
An XML element with a contextually specified identifier that is bound to an XML namespace
denoted by the prefix prefix;
— <prefix:{Element}>*
Any number of XML elements with contextually specified identifiers that are bound to an XML
namespace denoted by the prefix prefix;
— @attr
An XML attribute with the name attr;
— @{attr}
An XML attribute with a contextually specified name;
— @{attr}*
Any number of XML attributes with contextually specified names;
— @prefix:attr
An XML attribute with the name attr that is bound to an XML namespace denoted by the prefix
prefix;
— string
The literal string shall be used either as element content or attribute value;
— xs:type
The XML Schema type with the name type.
The following XML namespace names and prefixes are used throughout this specification. The column
“Recommended Syntax” indicates which syntax variant SHOULD be used by the toolkit and other
creators of CMDI related documents.
Prefix  Namespace Name                                    Comment                          Recommended Syntax
cmd     http://www.clarin.eu/cmd/1                        CMD instance (general/envelope)  prefixed
cmdp    http://www.clarin.eu/cmd/1/profiles/{profileId}   CMDI payload (profile specific)  prefixed
cue     http://www.clarin.eu/cmd/cues/1                   Cues for tools                   prefixed
xs      http://www.w3.org/2001/XMLSchema                  XML Schema                       prefixed
NOTE The clarin.eu namespaces include the major version number (i.e. 1) but not the minor version
number. This reflects the approach that the namespace is kept constant across minor versions within a
major version of the CMDI specification, for compatibility reasons.
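The following Python sketch illustrates how the namespace names above can be bound to the recommended prefixes when reading a document with the standard library module xml.etree.ElementTree. The profile identifier, the element names Envelope, Payload and Title, and the helper profile_namespace are hypothetical and serve only to show the prefix binding; they are not taken from this specification.

```python
import xml.etree.ElementTree as ET

# Namespace names as listed in the table of clause 4.
CMD_NS = "http://www.clarin.eu/cmd/1"
CUE_NS = "http://www.clarin.eu/cmd/cues/1"


def profile_namespace(profile_id):
    """Fill the {profileId} placeholder of the payload namespace name."""
    return "http://www.clarin.eu/cmd/1/profiles/" + profile_id


# Hypothetical profile identifier, for illustration only.
CMDP_NS = profile_namespace("example-profile-id")

# Hypothetical element names; a real CMD instance follows the structure
# defined by the specification and the CMD profile.
SAMPLE = f"""<cmd:Envelope xmlns:cmd="{CMD_NS}" xmlns:cmdp="{CMDP_NS}">
  <cmdp:Payload>
    <cmdp:Title>Example</cmdp:Title>
  </cmdp:Payload>
</cmd:Envelope>"""

root = ET.fromstring(SAMPLE)

# Map the recommended prefixes to their namespace names for lookups.
PREFIXES = {"cmd": CMD_NS, "cmdp": CMDP_NS, "cue": CUE_NS}
title = root.find("cmdp:Payload/cmdp:Title", PREFIXES)
```

Because only the major version number appears in the namespace name, a prefix mapping of this kind would remain valid across minor versions of the specification.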
5 Structure of CMDI files
5.1 General structure
Figure 2 — The structure of a CMDI file (CMD instance)
Colour scheme: Green boxes represent elements that are potentially present in all CMDI files (the
CMD instance envelope). Blue boxes and associations represent elements defined by the CMD
profile (the CMD instance payload). The diagram is meant for overview and illustration; full details
are given in the tables below.
A CMDI file contains the actual metadata of one specific resource (hereafter referred to as the described
reso
...