Language resource management — Component metadata infrastructure (CMDI) — Part 2: Component metadata specification language



General Information

Status: Published
Publication Date: 11-Jul-2019
Current Stage: 9093 - International Standard confirmed
Start Date: 12-Feb-2025
Completion Date: 13-Dec-2025

Standards Content (Sample)


SLOVENIAN STANDARD
01 March 2021
Language resource management -- Component metadata infrastructure (CMDI) -- Part
2: The component metadata specific language
This Slovenian standard is identical to: ISO 24622-2:2019
ICS:
01.140.20 Information sciences
35.060 Languages used in information technology
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24622-2
First edition
2019-07
Language resource management —
Component metadata infrastructure
(CMDI) —
Part 2:
Component metadata specification
language
© ISO 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 General terms
3.2 CMDI
3.3 XML
4 Notational and XML namespace conventions
5 Structure of CMDI instances
5.1 General structure
5.2 The main structure
5.3 The <Header> element
5.4 The <Resources> element
5.4.1 General structure of the <Resources> element
5.4.2 The list of resource proxies
5.4.3 The list of journal files
5.4.4 The list of relations between resource files
5.5 The <IsPartOfList> element
5.6 The CMD components
6 CCSL (CMDI Component Specification Language)
6.1 General structure of the CCSL
6.2 CCSL header
6.3 CMD specification
6.4 Definition of CMD elements
6.5 CMD attribute definition
6.6 Value schemes for CMD elements and CMD attributes
6.7 Cue attributes
7 CMD
7.1 Transformation of CCSL into a CMD profile schema definition
7.2 General properties of the CMD profile schema definition
7.3 Interpretation of CMD specifications in the CCSL
7.3.1 General structure of CMD specifications
7.3.2 Document structure prescribed by the CMD profile schema
7.4 Interpretation of CMD element definitions in the CCSL
7.5 Interpretation of CMD attribute definitions in the CCSL
7.6 Content model for CMD elements and CMD attributes in the schema definition
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24622 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
Many researchers, from the humanities and other domains, have a strong need to study resources
in close detail. Nowadays more and more of these resources are available online. To make these
resources findable, they are described with metadata. These component metadata (CMD) instances are
collected and made available via central catalogues. Often, resource providers want to include specific
properties of a resource in their metadata to provide all relevant descriptions for a specific type of
resource. The purpose of catalogues tends to be more generic and addresses a broader target audience.
It is hard to strike the balance between these two ends of the spectrum with one metadata schema,
and mismatches can negatively impact the quality of metadata provided. The goal of the component
metadata infrastructure (CMDI) is to provide a flexible mechanism to build resource specific metadata
schemas out of shared components and semantics [14][15].
In CMDI the metadata lifecycle starts with the need of a metadata modeller to create a dedicated
metadata profile for a specific type of resource. Modellers can browse and search a registry for
components and profiles that are suitable or come close to meeting their requirements. A component
groups together metadata elements that belong together and can potentially be reused in a different
context. Components can also group other components. Existing component registries, e.g., the CLARIN
(common language resources and technology infrastructure) Component Registry [16], might already
contain any number of components. These can be reused as they are, or be adapted by modifying, adding
or removing some metadata elements and/or components. Completely new components can also be
created to model the unique aspects of the resources under consideration. All the needed components
are combined into one profile specific for the type of resources. Any component, element and value in
such a profile may be linked to a semantic description (a concept) to make its meaning explicit [21].
These semantic descriptions can be stored in a semantic registry, e.g., the CLARIN Concept Registry [17].
In the end, metadata creators can create records for specific resources that comply with the profile
relevant for the resource type, and these records can be provided to local and global catalogues [22].
CMDI was originally developed in the context of the European CLARIN infrastructure initiative
with input from other initiatives and experts. Already in its preparatory phase, which started in 2007,
the infrastructure needed flexibility in the metadata domain as it was confronted with many types of
resources that had to be accurately described. For Version 1.0 [20] a toolkit was created, consisting of
the XML schemas and XSLT stylesheets to validate and transform components, profiles and records.
Version 1.1 included some small changes and has seen small incremental backward compatible
advances since 2011. While this version has been in use, new developments and the development of this
document resulted in Version 1.2 [18]. CMDI has also seen a growing number of tools and infrastructure
systems that deal with its records and components and rely on its shared syntax and semantics.
In ISO 24622-1, the component metadata model has been standardized. This document is compliant
with ISO 24622-1, and also extends and constrains it at various places (see also the red parts in the UML
class diagram in Figure 1):
— support for attributes on both components and elements is added,
— a profile is limited to one root component, and
— an element always belongs to a specific component.
Figure 1 — Component metadata model and its extensions

INTERNATIONAL STANDARD ISO 24622-2:2019(E)
Language resource management — Component metadata
infrastructure (CMDI) —
Part 2:
Component metadata specification language
IMPORTANT — The electronic file of this document contains colours which are considered to be
useful for the correct understanding of the document. Users should therefore consider printing
this document using a colour printer.
1 Scope
The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate
well together. To enable this level of cooperation, this document provides in-depth descriptions and
definitions of what CMDI records, components and their representations in XML look like.
This document describes these XML representations, which enable the flexible construction of
interoperable metadata schemas suitable for, but not limited to, describing language resources. The
metadata schemas based on these representations can be used to describe resources at different levels
of granularity (e.g. descriptions on the collection level or on the level of individual resources).
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at http://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1 General terms
3.1.1
concept
unit of knowledge created by a unique combination of characteristics
[SOURCE: ISO 1087:—1), 3.2.3, modified — Note 1 to entry and Note 2 to entry have been deleted.]
1) Revision of ISO 1087:2000 under preparation. Stage at the time of publication: ISO/FDIS 1087:2019.
3.1.2
concept link
reference from a CMD profile (3.2.11), CMD component (3.2.3), CMD element (3.2.5), CMD attribute (3.2.2)
or a value in a controlled vocabulary (3.1.4) to an entry in a semantic registry (3.1.11) via a Uniform
Resource Identifier (3.1.13)
Note 1 to entry: Typically a concept link is provided as a persistent identifier (3.1.9).
3.1.3
concept registry
semantic registry (3.1.11) maintaining concepts (3.1.1)
EXAMPLE The CLARIN Concept Registry [17] as used in the CLARIN infrastructure.
3.1.4
controlled vocabulary
closed/open vocabulary
set of values that can be used either to constrain the set of permissible values or to provide suggestions
for applicable values in a given context
3.1.5
data category
class of data items that are closely related from a formal or semantic point of view
EXAMPLE /part of speech/, /subject field/, /definition/.
Note 1 to entry: A data category can be viewed as a generalization of the notion of a field in a database.
[SOURCE: ISO 30042:2019, 3.8, modified — Note 2 to entry has been deleted.]
3.1.6
language tag
textual code used to assist in identifying languages in every mode of communication
Note 1 to entry: This includes constructed and artificial languages but excludes languages not intended primarily
for human communication, for example in spoken, written, signed, or otherwise signaled, communication (see
IETF BCP 47 [6]).
Note 2 to entry: Language tags may be used to assist in the identification of a language in every mode of
communication, for example in spoken, written, signed, or otherwise signaled, communication.
3.1.7
media type
MIME type
media type specification used originally for textual, non-textual, multi-part message bodies of emails
and which provides technical format information on data
Note 1 to entry: For the purposes of this document, it is as described in IETF RFC 6838 [8].
3.1.8
metadata
resource (3.1.10) that is a description of another resource, usually given as a set of properties in the
form of attribute-value pairs
Note 1 to entry: This description can contain information about the resource, aspects or parts of the resource
and/or artefacts and actors connected to the resource.
3.1.9
persistent identifier
PID
unique identifier that ensures permanent access for a digital object by providing access to it
independently of its physical location or current ownership
Note 1 to entry: Unique in this context means that the PID will not be issued again for other resources (3.1.10).
However, the same PID can reference different representations or incarnations of the resource at the discretion
of the resource provider.
[SOURCE: ISO 24619:2011, 3.2.4]

3.1.10
resource
entity, possibly digitally accessible, that can be described in terms of its content and technical
properties, referenced by a Uniform Resource Identifier (3.1.13)
3.1.11
semantic registry
directory of (authoritative) definitions of term (3.1.12), concept (3.1.1) or data category (3.1.5), or the
system maintaining it
Note 1 to entry: These registries generally also provide persistent identifiers (3.1.9) for their entries.
3.1.12
term
designation that represents a general concept (3.1.1) in a specific domain or subject
EXAMPLE “planet”, “tower”, “pen”, “numeral”, “number”, “square root”, “logarithm”, “unit of measurement”,
“base of a logarithm”, “chemical element”, “chemical compound”, “HP Laserjet 1100”, “Nobel Prize in Physics”.
Note 1 to entry: Terms may be partly or wholly verbal.
Note 2 to entry: Terms can include letters and letter symbols, numerals, mathematical symbols, typographical
signs and syntactic signs (e.g. punctuation marks, such as hyphens, parentheses, square brackets and other
connectors or delimiters), sometimes in character styles (i.e. fonts and bold, italic, bold italic, or other style
conventions) governed by domain-, subject-, or language-specific conventions.
[SOURCE: ISO 1087:—, 3.4.2]
3.1.13
Uniform Resource Identifier
URI
sequence of characters that identifies a resource (3.1.10)
Note 1 to entry: IETF RFC 3986 [7] defines the generic URI syntax and a process for resolving URI references that
might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
3.2 CMDI
3.2.1
CCSL
CMDI component specification language
XML (3.3.4) based language for describing a CMD component (3.2.3) and a CMD profile (3.2.11) according
to the CMD model (3.2.10)
3.2.2
CMD attribute
unit within a CMD element (3.2.5) that describes the level at which properties of a CMD element can be
provided by means of value scheme (3.2.20) constrained atomic values
3.2.3
CMD component
component
reusable, structured template for the description of (an aspect of) a resource (3.1.10), defined by means
of a CMD specification (3.2.14) document with the potential of including other CMD components, either
through reference or inline definition
3.2.4
CMD component registry
component registry
service where a CMD specification (3.2.14) can be registered and accessed
3.2.5
CMD element
element definition
unit within a CMD component (3.2.3) that describes the level of the CMD instance (3.2.6) that can carry
atomic values governed by a value scheme (3.2.20), and does not contain further levels except for that of
the CMD attribute (3.2.2)
3.2.6
CMD instance
metadata instance
CMDI file
CMDI instance
metadata record
CMD record
file that conforms to the general CMD instance structure as described in this document and, at the CMD
instance payload (3.2.9) level, follows the specific structure defined by the CMD profile (3.2.11) it relates to
3.2.7
CMD instance envelope
section of a CMD instance (3.2.6) which is structured uniformly for all instances and contains the CMD
instance header (3.2.8) and the list of resource proxies (3.2.18) which may be referenced from the CMD
instance payload (3.2.9) section
3.2.8
CMD instance header
section of a CMD instance (3.2.6) marked as ‘header’, providing information on that CMD instance as
such, not the resource (3.1.10) that is described by the metadata file
3.2.9
CMD instance payload
section of a CMD instance (3.2.6) that follows the structure defined by the CMD profile (3.2.11) it
references and contains the description of the resource (3.1.10) to which that CMD instance relates
3.2.10
CMD model
component metadata model
metadata model that is based on CMD components (3.2.3)
Note 1 to entry: For the purposes of this document, it is as specified in ISO 24622-1.
3.2.11
CMD profile
profile
structured template for the description of a class of resources (3.1.10) providing the complete structure
for a CMD instance payload (3.2.9) by means of a hierarchy of CMD components (3.2.3)
3.2.12
CMD profile schema
schema definition by which the correctness of a CMD instance (3.2.6) with respect to the CMD profile
(3.2.11) it pertains to can be evaluated
Note 1 to entry: The CMD profile schema may be expressed as XML Schema (3.3.11) but also in other XML schema
languages.
3.2.13
CMD root component
CMD component (3.2.3) that is defined at the highest level within a CMD profile (3.2.11) that may have
one or more child CMD components (3.2.3) but no siblings
Note 1 to entry: In the CMD instance payload (3.2.9), it is instantiated exactly once.

3.2.14
CMD specification
component specification
component definition
profile specification
profile definition
representation of a CMD component (3.2.3) or CMD profile (3.2.11), expressed using the constructs of
the CCSL (3.2.1)
3.2.15
CMD specification header
component header
profile header
section of a CMD specification (3.2.14) marked as ‘header’, providing information on that CMD
specification as such that is not part of the defined structure
3.2.16
CMDI
component metadata infrastructure
metadata description framework consisting of the CMD model (3.2.10) and infrastructure to process
instances of parts of the model
3.2.17
inline CMD component
CMD component (3.2.3) that is created and stored within another CMD component and cannot be
addressed from other CMD components
3.2.18
resource proxy
CMD resource reference
representation of a resource (3.1.10) within a CMD instance (3.2.6) containing a Uniform Resource
Identifier (3.1.13) as a reference to the resource itself and an indication of its nature
3.2.19
resource proxy reference
reference from any point within the CMD instance payload (3.2.9) to any of the resource proxy (3.2.18)
elements
3.2.20
value scheme
set of constraints governing the range of values allowed for a specific CMD element (3.2.5) or CMD
attribute (3.2.2) in a CMD instance (3.2.6), expressed in terms of an XML Schema datatype (3.3.12),
controlled vocabulary (3.1.4), or regular expression (3.3.3)
3.3 XML
3.3.1
foreign attribute
XML attribute (3.3.5) defined in a namespace (3.3.2) other than those declared in CMDI (3.2.16), to be
included in a CMD instance (3.2.6) as additional information targeted to specific receivers or applications
3.3.2
XML namespace
namespace
method for qualifying element and attribute names used in XML
Note 1 to entry: For the purposes of this document, it is as described in W3C XML Namespaces [10].
3.3.3
regular expression
sequence of characters that denote a set of strings
Note 1 to entry: When used to constrain a lexical space, a regular expression asserts that only strings in the
defined set of strings are valid literals for values of that type.
Note 2 to entry: See also W3C XSchema Part 2 [12], Appendix F.
3.3.4
XML
markup language for describing hierarchical structures within a text file
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.5
XML attribute
property of an XML element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.6
XML attribute declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
attribute (3.3.5)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD [13], Section 3.2.
3.3.7
XML container element
XML element (3.3.9) that has one or more XML elements as its descendants
3.3.8
XML document
document represented in XML
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.9
XML element
constituent of an XML document (3.3.8)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.10
XML element declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD [13], Section 3.3.
3.3.11
XML Schema
document that complies with the XML Schema recommendation
Note 1 to entry: For the purposes of this document, it refers to the W3C XSchema Part 1 recommendation [11].

3.3.12
XML Schema datatype
predefined set of permissible content within an XML element (3.3.9) or an XML attribute (3.3.5) of an
XML document (3.3.8) used in an XML Schema (3.3.11)
Note 1 to entry: For the purposes of this document, it is as described in W3C XSchema Part 2 [12].
4 Notational and XML namespace conventions
The following notational conventions for XML fragments are used throughout this document:
— <Element>
an XML element with the generic identifier Element that is bound to a default XML namespace;
— <prefix:Element>
an XML element with the generic identifier Element that is bound to an XML namespace denoted by
the prefix prefix;
— <prefix:{Element}>
an XML element with a contextually specified identifier that is bound to an XML namespace denoted
by the prefix prefix;
— <prefix:{Element}>*
any number of XML elements with contextually specified identifiers that are bound to an XML
namespace denoted by the prefix prefix;
— @attr
an XML attribute with the name attr;
— @{attr}
an XML attribute with a contextually specified name;
— @{attr}*
any number of XML attributes with contextually specified names;
— @prefix:attr
an XML attribute with the name attr that is bound to an XML namespace denoted by the prefix prefix;
— string
the literal string shall be used either as element content or attribute value;
— xs:type
the XML schema type with name type.
The XML namespace names and prefixes given in Table 1 are used throughout this document as existing
suitable examples. The column “Recommended Syntax” indicates which syntax variant should be used
by the toolkit and other creators of CMDI related documents.
Table 1 — XML namespaces and prefixes used in this document as existing suitable examples
Prefix | Namespace name | Comment | Recommended syntax
cmd | http://www.clarin.eu/cmd/1 | CMD instance (general/envelope) | prefixed
cmdp | http://www.clarin.eu/cmd/1/profiles/profileId | CMDI payload (CMD profile specific) | prefixed
cue | http://www.clarin.eu/cmd/cues/1 | Cues for tools | prefixed
xs | http://www.w3.org/2001/XMLSchema | XML Schema | prefixed
NOTE The inclusion of the major version number (i.e. 1) in the clarin.eu namespaces, but not the minor
version number reflects the approach that across minor versions within a major version of the CMDI specification,
the namespace is kept constant for compatibility reasons.
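The relationship between the fixed namespaces of Table 1 and the profile specific payload namespace can be sketched in Python as follows. This is a minimal illustration only: the names NS, profile_namespace and namespaces_for are hypothetical helpers introduced here, and the profile identifier is the one used in the examples of this document.

```python
# Fixed namespaces from Table 1; the cmdp namespace depends on the profile.
NS = {
    "cmd": "http://www.clarin.eu/cmd/1",
    "cue": "http://www.clarin.eu/cmd/cues/1",
    "xs": "http://www.w3.org/2001/XMLSchema",
}


def profile_namespace(profile_id: str) -> str:
    """Return the payload (cmdp) namespace for a CMD profile identifier."""
    return f"{NS['cmd']}/profiles/{profile_id}"


def namespaces_for(profile_id: str) -> dict:
    """Return a prefix map that also binds cmdp for the given profile."""
    return {**NS, "cmdp": profile_namespace(profile_id)}


# Hypothetical usage with the profile identifier from this document's examples.
ns = namespaces_for("clarin.eu:cr1:p_1311927752306")
print(ns["cmdp"])
```

Because only the major version number appears in the namespace name, such a prefix map would stay valid across minor versions of the specification.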
5 Structure of CMDI instances
5.1 General structure
Figure 2 — The structure of a CMD instance
See Figure 2, which uses the following colour scheme: Green boxes represent elements that are
potentially present in all CMD instances (the CMD instance envelope). Blue boxes and associations
represent elements defined by the CMD profile (the CMD instance payload). The diagram is meant for
overview and illustration; full details are found in Tables 2 to 15.
A CMD instance contains the actual metadata of one specific resource (hereafter referred to as the
described resource) and might also be referred to as a CMD record or CMDI file. All CMD instances
have the same structure at the top level (the CMD instance envelope). At a lower level, parts of their
structure are defined by the CMD profile upon which they are based (the CMD instance payload).
5.2 The main structure
A CMD instance has the (XML) root element <cmd:CMD> with one attribute and 4 sub-elements that
appear in the mandatory order described in Table 2.
Table 2 — Root element: order of child elements
Name | Value type | Occurrences | Description
<cmd:CMD> | xs:complexType | | The (XML) root element of the CMD instance.
@CMDVersion | xs:string ("1.2") | 1 | Denotes the CMDI version on which this CMDI file is based.
<cmd:Header> | xs:complexType | 1 | Encapsulates core administrative data about the CMDI file.
<cmd:Resources> | xs:complexType | 1 | Includes 3 lists containing information about resource proxies and their interrelations.
<cmd:IsPartOfList> | xs:complexType | 0 or 1 | A list of <cmd:IsPartOf> elements, each referencing a larger external resource of which the described resource (as a whole) forms a part.
<cmd:Components> | xs:complexType | 1 | This element contains the CMD profile specific section of the CMD instance. Here the descriptive metadata of the resource are found.
The first three elements (<cmd:Header>, <cmd:Resources> and <cmd:IsPartOfList>) constitute the
CMD instance envelope and reside in the cmd namespace. The CMD instance payload is contained in
the <cmd:Components> element. The elements of the instance payload (which is CMD profile specific)
exist in the CMD profile specific namespace (prefix cmdp), possibly adorned with attributes in the cmd
namespace.
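The mandatory order of the root element's children lends itself to a mechanical check. The Python sketch below is illustrative only and assumes that the envelope elements are named cmd:CMD, cmd:Header, cmd:Resources, cmd:IsPartOfList and cmd:Components; the function check_envelope and the SAMPLE record are hypothetical. It inspects only the top level and is no substitute for validation against the CMD profile schema.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"
# Child order of the root element (element names are an assumption).
EXPECTED = ["Header", "Resources", "IsPartOfList", "Components"]
OPTIONAL = {"IsPartOfList"}  # the only child that may occur 0 or 1 times


def check_envelope(xml_text: str) -> list:
    """Return a list of problems found in the top-level structure."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != f"{{{CMD_NS}}}CMD":
        problems.append("root element is not cmd:CMD")
    if root.get("CMDVersion") != "1.2":
        problems.append("CMDVersion is not '1.2'")
    found = [child.tag for child in root]
    full = [f"{{{CMD_NS}}}{name}" for name in EXPECTED]
    required = [f"{{{CMD_NS}}}{n}" for n in EXPECTED if n not in OPTIONAL]
    if found != full and found != required:
        problems.append("child elements missing, repeated or out of order")
    return problems


SAMPLE = """<cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1" CMDVersion="1.2">
  <cmd:Header/>
  <cmd:Resources/>
  <cmd:Components/>
</cmd:CMD>"""

# An empty list means that no problems were found.
print(check_envelope(SAMPLE))
```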
In addition to this, foreign attributes (XML attributes of other namespaces than those defined in
Clause 4) may occur anywhere in <cmd:Header>, <cmd:Resources> and <cmd:IsPartOfList> elements
and on the <cmd:Components> element (but not on any of its children). Attributes in these foreign
namespaces should be ignored by tools unrelated to the party associated with the namespace and may
therefore be removed during processing. The foreign namespace shall be representative of the party that
introduces the extension. For example, the namespace should not start with http://www.clarin.eu,
http://clarin.eu, etc. unless the foreign namespace is introduced by the owner of the domain clarin.eu.
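A tool that does not recognise a foreign namespace could drop such attributes along the following lines. This Python sketch is an illustration under stated assumptions: the helper names are hypothetical, the namespace http://example.org/ns stands in for an arbitrary third party, and treating the XML Schema instance namespace (used for xsi:schemaLocation) as non-foreign is a choice made here rather than something this document states.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"
PROFILE_PREFIX = CMD_NS + "/profiles/"  # profile specific (cmdp) namespaces
KNOWN = {
    CMD_NS,
    "http://www.clarin.eu/cmd/cues/1",
    "http://www.w3.org/2001/XMLSchema",
    "http://www.w3.org/2001/XMLSchema-instance",  # assumption: not foreign
}


def is_foreign(attr_name: str) -> bool:
    """True for a namespaced attribute outside the known namespaces."""
    if not attr_name.startswith("{"):
        return False  # unqualified attributes such as @id are not foreign
    namespace = attr_name[1:].split("}", 1)[0]
    return namespace not in KNOWN and not namespace.startswith(PROFILE_PREFIX)


def strip_foreign_attributes(root: ET.Element) -> int:
    """Remove foreign attributes in place and return how many were removed."""
    removed = 0
    for element in root.iter():
        for name in [n for n in element.attrib if is_foreign(n)]:
            del element.attrib[name]
            removed += 1
    return removed


header = ET.fromstring(
    '<cmd:Header xmlns:cmd="http://www.clarin.eu/cmd/1" '
    'xmlns:ex="http://example.org/ns">'
    '<cmd:MdCreator ex:note="checked">John Doe</cmd:MdCreator>'
    '</cmd:Header>'
)
removed = strip_foreign_attributes(header)
print(removed)
```

ElementTree stores attribute names in the form {namespace}local, which is why the namespace can be read directly from the attribute key.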
A detailed specification of the above-mentioned parts of a CMD instance is given in the next four clauses.
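EXAMPLE CMD instance envelope.
This example shows the main structure of a CMD instance.
<cmd:CMD CMDVersion="1.2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:cmd="http://www.clarin.eu/cmd/1"
    xmlns:cmdp="http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1311927752306"
    xsi:schemaLocation="http://www.clarin.eu/cmd/1
        http://www.clarin.eu/cmd/1/xsd/cmd-envelop.xsd
        http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1311927752306
        https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1311927752306/1.2/xsd">
    <cmd:Header>
        <!-- ... -->
    </cmd:Header>
    <cmd:Resources>
        <cmd:ResourceProxyList>
            <!-- ... -->
        </cmd:ResourceProxyList>
        <cmd:JournalFileProxyList>
            <!-- ... -->
        </cmd:JournalFileProxyList>
        <cmd:ResourceRelationList>
            <!-- ... -->
        </cmd:ResourceRelationList>
    </cmd:Resources>
    <cmd:IsPartOfList>
        <!-- ... -->
    </cmd:IsPartOfList>
    <cmd:Components>
        <!-- ... -->
    </cmd:Components>
</cmd:CMD>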
5.3 The <Header> element
The header of a CMD instance mainly contains administrative information about the metadata, that
is metadata about the CMD instance itself. The included elements shall follow the structure and order
described in Table 3.
Table 3 — <Header> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:Header> | xs:complexType | | Encapsulates core administrative data about the CMD instance.
<cmd:MdCreator> | xs:string | 0 to unbounded | Denotes the creator of this metadata file.
<cmd:MdCreationDate> | xs:date | 0 or 1 | The date this metadata file was created.
<cmd:MdSelfLink> | xs:anyURI | 0 or 1 | A reference to this metadata file in its home repository, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
<cmd:MdProfile> | xs:anyURI | 1 | The CMD profile upon which this metadata file is based, given by its identifier in a component registry.
<cmd:MdCollectionDisplayName> | xs:string | 0 or 1 | The collection to which the described resource belongs, given as a human-readable name. Exploitation tools can use this name to present metadata collections.
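Reading the header fields of Table 3 might look like the following Python sketch. It is illustrative only and assumes the child element names cmd:MdCreator, cmd:MdCreationDate, cmd:MdSelfLink, cmd:MdProfile and cmd:MdCollectionDisplayName; the function read_header and the HEADER sample are hypothetical, with sample values borrowed from the example in this clause.

```python
import xml.etree.ElementTree as ET

NS = {"cmd": "http://www.clarin.eu/cmd/1"}


def read_header(header: ET.Element) -> dict:
    """Collect the header fields into a plain dictionary."""
    profile = header.findtext("cmd:MdProfile", namespaces=NS)
    if profile is None:
        # The profile reference is the only child with occurrence 1.
        raise ValueError("cmd:MdProfile is mandatory")
    return {
        "creators": [e.text for e in header.findall("cmd:MdCreator", NS)],
        "created": header.findtext("cmd:MdCreationDate", namespaces=NS),
        "self_link": header.findtext("cmd:MdSelfLink", namespaces=NS),
        "profile": profile,
        "collection": header.findtext(
            "cmd:MdCollectionDisplayName", namespaces=NS
        ),
    }


HEADER = ET.fromstring(
    '<cmd:Header xmlns:cmd="http://www.clarin.eu/cmd/1">'
    "<cmd:MdCreator>John Doe</cmd:MdCreator>"
    "<cmd:MdCreationDate>2012-04-17</cmd:MdCreationDate>"
    "<cmd:MdSelfLink>hdl:1234/567890</cmd:MdSelfLink>"
    "<cmd:MdProfile>clarin.eu:cr1:p_1311927752306</cmd:MdProfile>"
    "</cmd:Header>"
)
info = read_header(HEADER)
print(info["profile"], info["creators"])
```

Optional fields that are absent come back as None, and the creation date is kept as the lexical xs:date string rather than being parsed.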
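EXAMPLE Header with foreign attribute.
This example shows the header of a CMD instance, including the use of a foreign attribute, i.e., containing the
ORCID id of the creator.
<cmd:Header>
    <cmd:MdCreator …
        xmlns:orcid="http://www.orcid.org/ns/orcid">John Doe</cmd:MdCreator>
    <cmd:MdCreationDate>2012-04-17</cmd:MdCreationDate>
    <cmd:MdSelfLink>hdl:1234/567890</cmd:MdSelfLink>
    <cmd:MdProfile>clarin.eu:cr1:p_1311927752306</cmd:MdProfile>
    <cmd:MdCollectionDisplayName>CLARIN-NL web services</cmd:MdCollectionDisplayName>
</cmd:Header>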
5.4 The <Resources> element
5.4.1 General structure of the <Resources> element
This section of the CMDI file provides the sequence of
— files which are parts of or closely related to the described resource (<cmd:ResourceProxyList> and
<cmd:JournalFileProxyList>),
— possible relations between pairs of these files (<cmd:ResourceRelationList>),
and shall follow the structure and order described in Table 4.
Table 4 — <Resources> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:Resources> | xs:complexType | | Includes 3 lists containing information about resource proxies and their interrelations.
<cmd:ResourceProxyList> | xs:complexType | 1 | A list of <cmd:ResourceProxy> elements, each referencing a file contained in or closely related to the described resource.
<cmd:JournalFileProxyList> | xs:complexType | 1 | A list of <cmd:JournalFileProxy> elements, each referencing a file (“journal file”) containing provenance information about the described resource.
<cmd:ResourceRelationList> | xs:complexType | 1 | A list of <cmd:ResourceRelation> elements, each representing a relationship between 2 resource files (as listed in the <cmd:ResourceProxyList>).
5.4.2 The list of resource proxies
<cmd:ResourceProxyList> contains a sequence of zero or more occurrences of <cmd:ResourceProxy>,
each representing a file/part of the described resource, and shall follow the structure and order
described in Table 5.
Table 5 — <ResourceProxyList> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:ResourceProxyList> | xs:complexType | | Contains a list of resource proxies (see next row).
<cmd:ResourceProxy> | xs:complexType | 0 to unbounded | Represents a file which is a part of or closely related to the described resource.
@id | xs:ID | 1 | Local identifier for the parent <cmd:ResourceProxy>, unique within this CMD instance.
<cmd:ResourceType> | xs:string (“Resource”, “Metadata”, “LandingPage”, “SearchService”, “SearchPage”; see the description for each of the possible values) | 1 | The type of the file represented by this <cmd:ResourceProxy>.
@mimetype | xs:string | 0 or 1 | The media type of the file.
<cmd:ResourceRef> | xs:anyURI | 1 | A reference to the file represented by this <cmd:ResourceProxy>, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
Resource types
— Resource
A resource that is described in the present CMD instance, e.g., a text document, media file or tool.
— Metadata
A metadata resource, i.e., another CMD instance, that is subordinate to the present CMD instance.
The media type of this metadata resource should be application/x-cmdi+xml.
— SearchPage
A resource that is a web page that allows the described resource to be queried by an end-user.
— SearchService
A resource that is a web service that allows the described resource to be queried by means of
dedicated software.
— LandingPage
A resource that is a web page that provides the original context of the described resource, e.g., a
“deep link” into a repository system.
The most general value of ResourceType is Resource. The other values are more specific and should
be selected only if they are applicable. This way, the value Resource is consistent with the use of resource
in this document, which usually does not include metadata, landing pages, etc. that are not part of the
resource to be described.
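
As an illustration of the constraints in Table 5 and the list of resource types, the following Python sketch checks that proxy identifiers are unique, that each resource type is one of the five listed values, and that a reference is present. It is a sketch under assumptions: the element names cmd:ResourceProxy, cmd:ResourceType and cmd:ResourceRef are assumed from the CMDI 1.2 envelope, the function name check_proxies is hypothetical, and the example.org URLs are placeholders.

```python
import xml.etree.ElementTree as ET

CMD = "{http://www.clarin.eu/cmd/1}"

# The five values listed under "Resource types".
RESOURCE_TYPES = {"Resource", "Metadata", "LandingPage", "SearchService", "SearchPage"}


def check_proxies(proxy_list):
    """Return a list of problems found in a cmd:ResourceProxyList element."""
    problems = []
    seen_ids = set()
    for proxy in proxy_list.findall(CMD + "ResourceProxy"):
        proxy_id = proxy.get("id")
        if proxy_id is None:
            problems.append("ResourceProxy without id")
        elif proxy_id in seen_ids:
            # xs:ID values must be unique within the CMD instance.
            problems.append("duplicate id: " + proxy_id)
        else:
            seen_ids.add(proxy_id)
        type_text = proxy.findtext(CMD + "ResourceType", default="").strip()
        if type_text not in RESOURCE_TYPES:
            problems.append("proxy %s: ResourceType %r is not an allowed value" % (proxy_id, type_text))
        ref_text = proxy.findtext(CMD + "ResourceRef", default="").strip()
        if not ref_text:
            problems.append("proxy %s: missing ResourceRef" % proxy_id)
    return problems


# Hypothetical sample; identifiers and URLs are placeholders.
SAMPLE = """
<cmd:ResourceProxyList xmlns:cmd="http://www.clarin.eu/cmd/1">
  <cmd:ResourceProxy id="r1">
    <cmd:ResourceType mimetype="text/plain">Resource</cmd:ResourceType>
    <cmd:ResourceRef>https://example.org/corpus/text1.txt</cmd:ResourceRef>
  </cmd:ResourceProxy>
  <cmd:ResourceProxy id="lp">
    <cmd:ResourceType>LandingPage</cmd:ResourceType>
    <cmd:ResourceRef>https://example.org/corpus/</cmd:ResourceRef>
  </cmd:ResourceProxy>
</cmd:ResourceProxyList>
"""

print(check_proxies(ET.fromstring(SAMPLE)))
```

The uniqueness check is done by hand here because a non-validating XML parser does not enforce xs:ID semantics on its own.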
5.4.3 The list of journal files
<cmd:JournalFileProxyList> contains a sequence of zero or more occurrences of <cmd:JournalFileProxy>, each representing a file containing provenance information about the described resource, and shall follow the structure and order described in Table 6.
Table 6 — <cmd:JournalFileProxyList> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:JournalFileProxyList> | xs:complexType | | Contains a list of journal file proxies (see next row).
<cmd:JournalFileProxy> | xs:complexType | 0 to unbounded | Represents a file containing provenance information about the described resource.
<cmd:JournalFileRef> | xs:anyURI | 1 | A reference to the file represented by this <cmd:JournalFileProxy>, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
NOTE The actual content and layout of the journal file is outside the scope of this document.
5.4.4 The list of relations between resource files
<cmd:ResourceRelationList> contains a sequence of zero or more occurrences of <cmd:ResourceRelation>, each representing a relation between any pair of <cmd:ResourceProxy> elements, and shall follow the structure and order described in Table 7.
If these parts are present they shall appear in the order given in Table 7.
Table 7 — <cmd:ResourceRelationList> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:ResourceRelationList> | xs:complexType | | Contains a list of resource relations (see next row).
<cmd:ResourceRelation> | xs:complexType | 0 to unbounded | A representation of a relation between 2 resource proxies listed in <cmd:ResourceProxyList>.
<cmd:RelationType> | xs:string | 1 | The type of the relation represented by its parent <cmd:ResourceRelation>.
@ConceptLink | xs:anyURI | 0 or 1 | A reference to some concept registry (e.g. CLARIN Concept Registry [17]), indicating the semantics of <cmd:RelationType>.
<cmd:Resource> | xs:complexType | 2 | References one of the resource proxies participating in the relationship.
@ref | xs:IDREF | | A reference to the <cmd:ResourceProxy> with id=ref (the <cmd:ResourceProxy> represented by its parent <cmd:Resource> element).
<cmd:Role> | xs:string | 0 or 1 | Indicates the role its parent resource plays in the relationship.
@ConceptLink | xs:anyURI | 0 or 1 | A reference to some concept registry (e.g. CLARIN Concept Registry [17]), indicating the semantics of <cmd:Role>.
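
The referential constraints in Table 7 (exactly 2 participants per relation, each @ref pointing at a resource proxy @id) can be illustrated with a short Python sketch. It is not a definitive implementation: the element names cmd:ResourceRelation, cmd:RelationType, cmd:Resource and cmd:Role are assumed from the CMDI 1.2 envelope, check_relations is a hypothetical name, and the relation type, roles and URLs in the sample are made up.

```python
import xml.etree.ElementTree as ET

CMD = "{http://www.clarin.eu/cmd/1}"


def check_relations(resources):
    """Return a list of problems found in the relations of a cmd:Resources element."""
    proxy_ids = {proxy.get("id") for proxy in resources.iter(CMD + "ResourceProxy")}
    proxy_ids.discard(None)  # proxies without an id cannot be referenced
    problems = []
    for number, relation in enumerate(resources.iter(CMD + "ResourceRelation"), start=1):
        if len(relation.findall(CMD + "RelationType")) != 1:
            problems.append("relation %d: expected exactly one RelationType" % number)
        members = relation.findall(CMD + "Resource")
        if len(members) != 2:
            problems.append("relation %d: expected 2 Resource elements, found %d" % (number, len(members)))
        for member in members:
            ref = member.get("ref")
            if ref not in proxy_ids:
                problems.append("relation %d: ref %r matches no ResourceProxy id" % (number, ref))
    return problems


# Hypothetical sample; relation type, roles and URLs are placeholders.
SAMPLE = """
<cmd:Resources xmlns:cmd="http://www.clarin.eu/cmd/1">
  <cmd:ResourceProxyList>
    <cmd:ResourceProxy id="audio">
      <cmd:ResourceType>Resource</cmd:ResourceType>
      <cmd:ResourceRef>https://example.org/rec.wav</cmd:ResourceRef>
    </cmd:ResourceProxy>
    <cmd:ResourceProxy id="transcript">
      <cmd:ResourceType>Resource</cmd:ResourceType>
      <cmd:ResourceRef>https://example.org/rec.txt</cmd:ResourceRef>
    </cmd:ResourceProxy>
  </cmd:ResourceProxyList>
  <cmd:JournalFileProxyList/>
  <cmd:ResourceRelationList>
    <cmd:ResourceRelation>
      <cmd:RelationType>transcribes</cmd:RelationType>
      <cmd:Resource ref="transcript"><cmd:Role>transcription</cmd:Role></cmd:Resource>
      <cmd:Resource ref="audio"><cmd:Role>source</cmd:Role></cmd:Resource>
    </cmd:ResourceRelation>
  </cmd:ResourceRelationList>
</cmd:Resources>
"""

print(check_relations(ET.fromstring(SAMPLE)))
```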
EXAMPLE 1 Resources.
This example shows a list of resources of various types.



...


INTERNATIONAL STANDARD ISO 24622-2
First edition 2019-07
Language resource management — Component metadata infrastructure (CMDI) — Part 2: Component metadata specification language
Gestion des ressources linguistiques — Composante infrastructure de métadonnées (CMDI) — Partie 2: Composante linguistique spécifique aux métadonnées
© ISO 2019
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 General terms
3.2 CMDI
3.3 XML
4 Notational and XML namespace conventions
5 Structure of CMDI instances
5.1 General structure
5.2 The main structure
5.3 The <cmd:Header> element
5.4 The <cmd:Resources> element
5.4.1 General structure of the <cmd:Resources> element
5.4.2 The list of resource proxies
5.4.3 The list of journal files
5.4.4 The list of relations between resource files
5.5 The <cmd:IsPartOfList> element
5.6 The CMD components
6 CCSL (CMDI Component Specification Language)
6.1 General structure of the CCSL
6.2 CCSL header
6.3 CMD specification
6.4 Definition of CMD elements
6.5 CMD attribute definition
6.6 Value schemes for CMD elements and CMD attributes
6.7 Cue attributes
7 CMD
7.1 Transformation of CCSL into a CMD profile schema definition
7.2 General properties of the CMD profile schema definition
7.3 Interpretation of CMD specifications in the CCSL
7.3.1 General structure of CMD specifications
7.3.2 Document structure prescribed by the CMD profile schema
7.4 Interpretation of CMD element definitions in the CCSL
7.5 Interpretation of CMD attribute definitions in the CCSL
7.6 Content model for CMD elements and CMD attributes in the schema definition
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24622 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
Many researchers, from the humanities and other domains, have a strong need to study resources in close detail. More and more of these resources are available online. To make them findable, they are described with metadata. These component metadata (CMD) instances are collected and made available via central catalogues. Resource providers often want to include specific properties of a resource in their metadata, so that all descriptions relevant to a specific type of resource are provided. Catalogues, by contrast, tend to serve a more generic purpose and address a broader target audience. It is hard to strike the balance between these two ends of the spectrum with one metadata schema, and mismatches can negatively impact the quality of the metadata provided. The goal of the component metadata infrastructure (CMDI) is to provide a flexible mechanism to build resource-specific metadata schemas out of shared components and semantics [14][15].
In CMDI the metadata lifecycle starts with the need of a metadata modeller to create a dedicated metadata profile for a specific type of resource. Modellers can browse and search a registry for components and profiles that are suitable or come close to meeting their requirements. A component groups together metadata elements that belong together and can potentially be reused in a different context. Components can also group other components. Existing component registries, e.g., the CLARIN (common language resources and technology infrastructure) Component Registry [16], might already contain any number of components. These can be reused as they are, or be adapted by modifying, adding or removing some metadata elements and/or components. Completely new components can also be created to model the unique aspects of the resources under consideration. All the needed components are combined into one profile specific to the type of resources. Any component, element and value in such a profile may be linked to a semantic description (a concept) to make its meaning explicit [21]. These semantic descriptions can be stored in a semantic registry, e.g., the CLARIN Concept Registry [17]. In the end, metadata creators can create records for specific resources that comply with the profile relevant for the resource type, and these records can be provided to local and global catalogues [22].
CMDI was originally developed in the context of the European CLARIN infrastructure initiative, with input from other initiatives and experts. Already in its preparatory phase, which started in 2007, the infrastructure needed flexibility in the metadata domain, as it was confronted with many types of resources that had to be accurately described. For Version 1.0 a toolkit [20] was created, consisting of the XML schemas and XSLT stylesheets to validate and transform components, profiles and records. Version 1.1 included some small changes and has seen small, incremental, backward-compatible advances since 2011. While this version has been in use, new developments and the development of this document resulted in Version 1.2 [18]. CMDI has also seen a growing number of tools and infrastructure systems that deal with its records and components and rely on its shared syntax and semantics.
In ISO 24622-1, the component metadata model has been standardized. This document is compliant
with ISO 24622-1, and also extends and constrains it at various places (see also the red parts in the UML
class diagram in Figure 1):
— support for attributes on both components and elements is added,
— a profile is limited to one root component, and
— an element always belongs to a specific component.
Figure 1 — Component metadata model and its extensions

INTERNATIONAL STANDARD ISO 24622-2:2019(E)
Language resource management — Component metadata
infrastructure (CMDI) —
Part 2:
Component metadata specification language
IMPORTANT — The electronic file of this document contains colours which are considered to be
useful for the correct understanding of the document. Users should therefore consider printing
this document using a colour printer.
1 Scope
The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate
well together. To enable this level of cooperation this document provides in depth descriptions and
definitions of what CMDI records, components and their representations in XML look like.
This document describes these XML representations, which enable the flexible construction of
interoperable metadata schemas suitable for, but not limited to, describing language resources. The
metadata schemas based on these representations can be used to describe resources at different levels
of granularity (e.g. descriptions on the collection level or on the level of individual resources).
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at http://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1 General terms
3.1.1
concept
unit of knowledge created by a unique combination of characteristics
[SOURCE: ISO 1087:— 1), 3.2.3, modified — Note 1 to entry and Note 2 to entry have been deleted.]
3.1.2
concept link
reference from a CMD profile (3.2.11), CMD component (3.2.3), CMD element (3.2.5), CMD attribute (3.2.2)
or a value in a controlled vocabulary (3.1.4) to an entry in a semantic registry (3.1.11) via a Uniform
Resource Identifier (3.1.13)
Note 1 to entry: Typically a concept link is provided as a persistent identifier (3.1.9).
1) Revision of ISO 1087:2000 under preparation. Stage at the time of publication: ISO/FDIS 1087:2019.
3.1.3
concept registry
semantic registry (3.1.11) maintaining concepts (3.1.1)
EXAMPLE The CLARIN Concept Registry [17] as used in the CLARIN infrastructure.
3.1.4
controlled vocabulary
closed/open vocabulary
set of values that can be used either to constrain the set of permissible values or to provide suggestions
for applicable values in a given context
3.1.5
data category
class of data items that are closely related from a formal or semantic point of view
EXAMPLE /part of speech/, /subject field/, /definition/.
Note 1 to entry: A data category can be viewed as a generalization of the notion of a field in a database.
[SOURCE: ISO 30042:2019, 3.8, modified — Note 2 to entry has been deleted.]
3.1.6
language tag
textual code used to assist in identifying languages in every mode of communication
Note 1 to entry: This includes constructed and artificial languages but excludes languages not intended primarily
for human communication (see IETF BCP 47 [6]).
Note 2 to entry: Language tags may be used to assist in the identification of a language in every mode of
communication, for example in spoken, written, signed, or otherwise signaled, communication.
3.1.7
media type
MIME type
media type specification used originally for textual, non-textual, multi-part message bodies of emails
and which provides technical format information on data
Note 1 to entry: For the purposes of this document, it is as described in IETF RFC 6838 [8].
3.1.8
metadata
resource (3.1.10) that is a description of another resource, usually given as a set of properties in the
form of attribute-value pairs
Note 1 to entry: This description can contain information about the resource, aspects or parts of the resource
and/or artefacts and actors connected to the resource.
3.1.9
persistent identifier
PID
unique identifier that ensures permanent access for a digital object by providing access to it
independently of its physical location or current ownership
Note 1 to entry: Unique in this context means that the PID will not be issued again for other resources (3.1.10).
However, the same PID can reference different representations or incarnations of the resource at the discretion
of the resource provider.
[SOURCE: ISO 24619:2011, 3.2.4]

3.1.10
resource
entity, possibly digitally accessible, that can be described in terms of its content and technical
properties, referenced by a Uniform Resource Identifier (3.1.13)
3.1.11
semantic registry
directory of (authoritative) definitions of term (3.1.12), concept (3.1.1) or data category (3.1.5), or the
system maintaining it
Note 1 to entry: These registries generally also provide persistent identifiers (3.1.9) for their entries.
3.1.12
term
designation that represents a general concept (3.1.1) in a specific domain or subject
EXAMPLE “planet”, “tower”, “pen”, “numeral”, “number”, “square root”, “logarithm”, “unit of measurement”,
“base of a logarithm”, “chemical element”, “chemical compound”, “HP Laserjet 1100”, “Nobel Prize in Physics”.
Note 1 to entry: Terms may be partly or wholly verbal.
Note 2 to entry: Terms can include letters and letter symbols, numerals, mathematical symbols, typographical
signs and syntactic signs (e.g. punctuation marks, such as hyphens, parentheses, square brackets and other
connectors or delimiters), sometimes in character styles (i.e. fonts and bold, italic, bold italic, or other style
conventions) governed by domain-, subject-, or language-specific conventions.
[SOURCE: ISO 1087:—, 3.4.2]
3.1.13
Uniform Resource Identifier
URI
sequence of characters that identifies a resource (3.1.10)
Note 1 to entry: IETF RFC 3986 [7] defines the generic URI syntax and a process for resolving URI references that
might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
3.2 CMDI
3.2.1
CCSL
CMDI component specification language
XML (3.3.4) based language for describing a CMD component (3.2.3) and a CMD profile (3.2.11) according
to the CMD model (3.2.10)
3.2.2
CMD attribute
unit within a CMD element (3.2.5) that describes the level at which properties of a CMD element can be
provided by means of value scheme (3.2.20) constrained atomic values
3.2.3
CMD component
component
reusable, structured template for the description of (an aspect of) a resource (3.1.10), defined by means
of a CMD specification (3.2.14) document with the potential of including other CMD components, either
through reference or inline definition
3.2.4
CMD component registry
component registry
service where a CMD specification (3.2.14) can be registered and accessed
3.2.5
CMD element
element definition
unit within a CMD component (3.2.3) that describes the level of the CMD instance (3.2.6) that can carry
atomic values governed by a value scheme (3.2.20), and does not contain further levels except for that of
the CMD attribute (3.2.2)
3.2.6
CMD instance
metadata instance
CMDI file
CMDI instance
metadata record
CMD record
file that conforms to the general CMD instance structure as described in this document and, at the CMD
instance payload (3.2.9) level, follows the specific structure defined by the CMD profile (3.2.11) it relates to
3.2.7
CMD instance envelope
section of a CMD instance (3.2.6) which is structured uniformly for all instances and contains the CMD
instance header (3.2.8) and the list of resource proxies (3.2.18) which may be referenced from the CMD
instance payload (3.2.9) section
3.2.8
CMD instance header
section of a CMD instance (3.2.6) marked as ‘header’, providing information on that CMD instance as
such, not the resource (3.1.10) that is described by the metadata file
3.2.9
CMD instance payload
section of a CMD instance (3.2.6) that follows the structure defined by the CMD profile (3.2.11) it
references and contains the description of the resource (3.1.10) to which that CMD instance relates
3.2.10
CMD model
component metadata model
metadata model that is based on CMD components (3.2.3)
Note 1 to entry: For the purposes of this document, it is as specified in ISO 24622-1.
3.2.11
CMD profile
profile
structured template for the description of a class of resources (3.1.10) providing the complete structure
for a CMD instance payload (3.2.9) by means of a hierarchy of CMD components (3.2.3)
3.2.12
CMD profile schema
schema definition by which the correctness of a CMD instance (3.2.6) with respect to the CMD profile
(3.2.11) it pertains to can be evaluated
Note 1 to entry: The CMD profile schema may be expressed as XML Schema (3.3.11) but also in other XML schema
languages.
3.2.13
CMD root component
CMD component (3.2.3) that is defined at the highest level within a CMD profile (3.2.11) that may have
one or more child CMD components (3.2.3) but no siblings
Note 1 to entry: In the CMD instance payload (3.2.9), it is instantiated exactly once.

3.2.14
CMD specification
component specification
component definition
profile specification
profile definition
representation of a CMD component (3.2.3) or CMD profile (3.2.11), expressed using the constructs of
the CCSL (3.2.1)
3.2.15
CMD specification header
component header
profile header
section of a CMD specification (3.2.14) marked as ‘header’, providing information on that CMD
specification as such that is not part of the defined structure
3.2.16
CMDI
component metadata infrastructure
metadata description framework consisting of the CMD model (3.2.10) and infrastructure to process
instances of parts of the model
3.2.17
inline CMD component
CMD component (3.2.3) that is created and stored within another CMD component and cannot be
addressed from other CMD components
3.2.18
resource proxy
CMD resource reference
representation of a resource (3.1.10) within a CMD instance (3.2.6) containing a Uniform Resource
Identifier (3.1.13) as a reference to the resource itself and an indication of its nature
3.2.19
resource proxy reference
reference from any point within the CMD instance payload (3.2.9) to any of the resource proxy (3.2.18)
elements
3.2.20
value scheme
set of constraints governing the range of values allowed for a specific CMD element (3.2.5) or CMD
attribute (3.2.2) in a CMD instance (3.2.6), expressed in terms of an XML Schema datatype (3.3.12),
controlled vocabulary (3.1.4), or regular expression (3.3.3)
3.3 XML
3.3.1
foreign attribute
XML attribute (3.3.5) defined in a namespace (3.3.2) other than those declared in CMDI (3.2.16), to be
included in a CMD instance (3.2.6) as additional information targeted to specific receivers or applications
3.3.2
XML namespace
namespace
method for qualifying element and attribute names used in XML
Note 1 to entry: For the purposes of this document, it is as described in W3C XML Namespaces [10].
3.3.3
regular expression
sequence of characters that denote a set of strings
Note 1 to entry: When used to constrain a lexical space, a regular expression asserts that only strings in the
defined set of strings are valid literals for values of that type.
Note 2 to entry: See also W3C XSchema Part 2 [12], Appendix F.
3.3.4
XML
markup language for describing hierarchical structures within a text file
Note 1 to entry: For the purposes of this document, it is as defined by the W3C recommendation for the Extensible
Markup Language XML [9].
3.3.5
XML attribute
property of an XML element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by the W3C recommendation for the Extensible
Markup Language XML [9].
3.3.6
XML attribute declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
attribute (3.3.5)
Note 1 to entry: For the purposes of this document, it is as defined by the W3C recommendation on XSD [13], Section 3.2.
3.3.7
XML container element
XML element (3.3.9) that has one or more XML elements as its descendants
3.3.8
XML document
document represented in XML
Note 1 to entry: For the purposes of this document, it is as defined by the W3C recommendation for the Extensible
Markup Language XML [9].
3.3.9
XML element
constituent of an XML document (3.3.8)
Note 1 to entry: For the purposes of this document, it is as defined by the W3C recommendation for the Extensible
Markup Language XML [9].
3.3.10
XML element declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by the W3C recommendation on XSD [13], Section 3.3.
3.3.11
XML Schema
document that complies with the XML Schema recommendation
Note 1 to entry: For the purposes of this document, it refers to the W3C XSchema Part 1 recommendation [11].

3.3.12
XML Schema datatype
predefined set of permissible content within an XML element (3.3.9) or an XML attribute (3.3.5) of an
XML document (3.3.8) used in an XML Schema (3.3.11)
Note 1 to entry: For the purposes of this document, it is as described in W3C XSchema Part 2 [12].
4 Notational and XML namespace conventions
The following notational conventions for XML fragments are used throughout this document:
— <Element>
an XML element with the generic identifier Element that is bound to a default XML namespace;
— <prefix:Element>
an XML element with the generic identifier Element that is bound to an XML namespace denoted by
the prefix prefix;
— <prefix:{Element}>
an XML element with a contextually specified identifier that is bound to an XML namespace denoted
by the prefix prefix;
— <prefix:{Element}>*
any number of XML elements with contextually specified identifiers that are bound to an XML
namespace denoted by the prefix prefix;
— @attr
an XML attribute with the name attr;
— @{attr}
an XML attribute with a contextually specified name;
— @{attr}*
any number of XML attributes with contextually specified names;
— @prefix:attr
an XML attribute with the name attr that is bound to an XML namespace denoted by the prefix prefix;
— string
the literal string shall be used either as element content or attribute value;
— xs:type
the XML schema type with name type.
The XML namespace names and prefixes given in Table 1 are used throughout this document as existing suitable examples. The column “Recommended syntax” indicates which syntax variant should be used by the toolkit and other creators of CMDI-related documents.
Table 1 — XML namespaces and prefixes used in this document as existing suitable examples
Prefix | Namespace name | Comment | Recommended syntax
cmd | http://www.clarin.eu/cmd/1 | CMD instance (general/envelope) | prefixed
cmdp | http://www.clarin.eu/cmd/1/profiles/profileId | CMDI payload (CMD profile specific) | prefixed
cue | http://www.clarin.eu/cmd/cues/1 | Cues for tools | prefixed
xs | http://www.w3.org/2001/XMLSchema | XML Schema | prefixed
NOTE The inclusion of the major version number (i.e. 1), but not the minor version number, in the clarin.eu namespaces reflects the approach that the namespace is kept constant across minor versions within a major version of the CMDI specification, for compatibility reasons.
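
The namespace conventions of Table 1 can be sketched in Python as follows. This is an illustration only: the helper names profile_namespace, namespace_map and qualified are hypothetical, and the element name cmd:Components is assumed from the CMDI 1.2 envelope. The profile-specific namespace is formed by replacing profileId in the cmdp namespace name with the identifier of the CMD profile.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"
CUE_NS = "http://www.clarin.eu/cmd/cues/1"
XS_NS = "http://www.w3.org/2001/XMLSchema"


def profile_namespace(profile_id):
    """Build the profile-specific (cmdp) namespace name for a CMD profile identifier."""
    return "%s/profiles/%s" % (CMD_NS, profile_id)


def namespace_map(profile_id):
    """Return the prefix-to-namespace mapping of Table 1 for one CMD profile."""
    return {
        "cmd": CMD_NS,
        "cmdp": profile_namespace(profile_id),
        "cue": CUE_NS,
        "xs": XS_NS,
    }


def qualified(prefix, local_name, namespaces):
    """Return an element name in the {namespace}local form used by ElementTree."""
    return "{%s}%s" % (namespaces[prefix], local_name)


NAMESPACES = namespace_map("clarin.eu:cr1:p_1311927752306")

# Registering the prefixes makes serialized output use the prefixed syntax
# that Table 1 recommends.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

root = ET.Element(qualified("cmd", "CMD", NAMESPACES), {"CMDVersion": "1.2"})
ET.SubElement(root, qualified("cmd", "Components", NAMESPACES))
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

Because only the major version number is part of the namespace names, a mapping like this stays valid across minor versions of the specification.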
5 Structure of CMDI instances
5.1 General structure
Figure 2 — The structure of a CMD instance
See Figure 2, which uses the following colour scheme: green boxes represent elements that are potentially present in all CMD instances (the CMD instance envelope); blue boxes and associations represent elements defined by the CMD profile (the CMD instance payload). The diagram is meant for overview and illustration; full details are found in Tables 2 to 15.
A CMD instance contains the actual metadata of one specific resource (hereafter referred to as the described resource) and might also be referred to as a CMD record or CMDI instance. All CMD instances have the same structure at the top level (the CMD instance envelope). At a lower level, parts of the structure are defined by the CMD profile upon which the instance is based (the CMD instance payload).
5.2 The main structure
A CMD instance has the (XML) root element <cmd:CMD> with one attribute and 4 sub-elements that appear in the mandatory order described in Table 2.
Table 2 — Root element: order of child elements
Name | Value type | Occurrences | Description
<cmd:CMD> | xs:complexType | | The (XML) root element of the CMD instance.
@CMDVersion | xs:string ("1.2") | 1 | Denotes the CMDI version on which this CMDI file is based.
<cmd:Header> | xs:complexType | 1 | Encapsulates core administrative data about the CMDI file.
<cmd:Resources> | xs:complexType | 1 | Includes 3 lists containing information about resource proxies and their interrelations.
<cmd:IsPartOfList> | xs:complexType | 0 or 1 | A list of <cmd:IsPartOf> elements, each referencing a larger external resource of which the described resource (as a whole) forms a part.
<cmd:Components> | xs:complexType | 1 | This element contains the CMD profile specific section of the CMD instance. Here the descriptive metadata of the resource are found.
The first three elements (<cmd:Header>, <cmd:Resources> and <cmd:IsPartOfList>) constitute the CMD instance envelope and reside in the cmd namespace. The CMD instance payload is contained in the <cmd:Components> element. The elements of the instance payload (which is CMD profile specific) exist in the CMD profile specific namespace (prefix cmdp), possibly adorned with attributes in the cmd namespace.
In addition to this, foreign attributes (XML attributes of other namespaces than those defined in Clause 4) may occur anywhere in <cmd:Header>, <cmd:Resources> and <cmd:IsPartOfList> elements and on the <cmd:Components> element (but not on any of its children). Attributes in these foreign namespaces should be ignored by tools unrelated to the party associated with the namespace and may therefore be removed during processing. The foreign namespace shall be representative of the party that introduces the extension. For example, the namespace should not start with http://www.clarin.eu, http://clarin.eu, etc. unless the foreign namespace is introduced by the owner of the domain clarin.eu.
A detailed specification of the above-mentioned parts of a CMD instance is given in the next four clauses.
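
The top-level rules of Table 2 lend themselves to a simple mechanical check. The Python sketch below is an illustration under assumptions, not a replacement for schema validation: it assumes the envelope element names cmd:CMD, cmd:Header, cmd:Resources, cmd:IsPartOfList and cmd:Components, and check_envelope is a hypothetical name. It inspects only the root element and the order of its direct children, so the empty child elements in the sample are acceptable to it.

```python
import xml.etree.ElementTree as ET

CMD = "{http://www.clarin.eu/cmd/1}"

# The two child sequences that Table 2 permits (IsPartOfList is optional).
WITHOUT_IS_PART_OF = ["Header", "Resources", "Components"]
WITH_IS_PART_OF = ["Header", "Resources", "IsPartOfList", "Components"]


def check_envelope(root):
    """Return a list of problems found at the top level of a CMD instance."""
    problems = []
    if root.tag != CMD + "CMD":
        problems.append("root element is not cmd:CMD")
    if root.get("CMDVersion") != "1.2":
        problems.append("CMDVersion is not 1.2")
    # Strip the cmd namespace so the sequence can be compared by local name;
    # children from other namespaces keep their full tag and fail the comparison.
    children = [
        child.tag[len(CMD):] if child.tag.startswith(CMD) else child.tag
        for child in root
    ]
    if children not in (WITHOUT_IS_PART_OF, WITH_IS_PART_OF):
        problems.append("unexpected child sequence: " + ", ".join(children))
    return problems


# Minimal hypothetical instance skeleton; the children are left empty on purpose.
SAMPLE = """
<cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1" CMDVersion="1.2">
  <cmd:Header/>
  <cmd:Resources/>
  <cmd:Components/>
</cmd:CMD>
"""

print(check_envelope(ET.fromstring(SAMPLE)))
```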
EXAMPLE CMD instance envelope.
This example shows the main structure of a CMD instance.
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:cmd="http://www.clarin.eu/cmd/1"
xmlns:cmdp="http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1311927752306"
xsi:schemaLocation="http://www.clarin.eu/cmd/1
http://www.clarin.eu/cmd/1/xsd/cmd-envelop.xsd
http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1311927752306
https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1311927752306/1.2/xsd">





















5.3 The <cmd:Header> element
The header of a CMD instance mainly contains administrative information about the metadata, that is, metadata about the CMD instance itself. The included elements shall follow the structure and order described in Table 3.
Table 3 — <cmd:Header> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:Header> | xs:complexType | | Encapsulates core administrative data about the CMD instance.
<cmd:MdCreator> | xs:string | 0 to unbounded | Denotes the creator of this metadata file.
<cmd:MdCreationDate> | xs:date | 0 or 1 | The date this metadata file was created.
<cmd:MdSelfLink> | xs:anyURI | 0 or 1 | A reference to this metadata file in its home repository, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
<cmd:MdProfile> | xs:anyURI | 1 | The CMD profile upon which this metadata file is based, given by its identifier in a component registry.
<cmd:MdCollectionDisplayName> | xs:string | 0 or 1 | The collection to which the described resource belongs, given as a human-readable name. Exploitation tools can use this name to present metadata collections.
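EXAMPLE Header with foreign attribute.
This example shows the header of a CMD instance, including the use of a foreign attribute, i.e., containing the
ORCID id of the creator.
<cmd:Header>
  <cmd:MdCreator orcid:id="..."
      xmlns:orcid="http://www.orcid.org/ns/orcid">John Doe</cmd:MdCreator>
  <cmd:MdCreationDate>2012-04-17</cmd:MdCreationDate>
  <cmd:MdSelfLink>hdl:1234/567890</cmd:MdSelfLink>
  <cmd:MdProfile>clarin.eu:cr1:p_1311927752306</cmd:MdProfile>
  <cmd:MdCollectionDisplayName>CLARIN-NL web services</cmd:MdCollectionDisplayName>
</cmd:Header>
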
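Reading the header into a typed structure makes the occurrence constraints of Table 3 concrete. The following Python sketch is illustrative only: the Header dataclass, its field names and the read_header function are hypothetical, the element names follow CMDI 1.2 instances, and the date handling is a simplification (xs:date also permits a timezone suffix, which is not handled here).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

NS = {"cmd": "http://www.clarin.eu/cmd/1"}


@dataclass
class Header:
    """Python-side view of <cmd:Header>; field names are illustrative only."""
    profile: str                                        # MdProfile, occurrences: 1
    creators: List[str] = field(default_factory=list)   # MdCreator, 0 to unbounded
    creation_date: Optional[date] = None                # MdCreationDate, 0 or 1
    self_link: Optional[str] = None                     # MdSelfLink, 0 or 1
    collection_display_name: Optional[str] = None       # MdCollectionDisplayName, 0 or 1


def _text(parent, name):
    """Return the stripped text of the first cmd:name child, or None if absent or empty."""
    value = parent.findtext(f"cmd:{name}", default=None, namespaces=NS)
    return value.strip() if value else None


def read_header(header_el):
    """Build a Header from a <cmd:Header> element, enforcing the mandatory profile."""
    profile = _text(header_el, "MdProfile")
    if profile is None:
        raise ValueError("<cmd:MdProfile> is mandatory (occurrences: 1)")
    raw_date = _text(header_el, "MdCreationDate")
    return Header(
        profile=profile,
        creators=[(c.text or "").strip() for c in header_el.findall("cmd:MdCreator", NS)],
        # Simplification: only plain YYYY-MM-DD values are parsed.
        creation_date=date.fromisoformat(raw_date) if raw_date else None,
        self_link=_text(header_el, "MdSelfLink"),
        collection_display_name=_text(header_el, "MdCollectionDisplayName"),
    )


# Values taken from the example above; the foreign attribute is omitted here.
SAMPLE = """
<cmd:Header xmlns:cmd="http://www.clarin.eu/cmd/1">
  <cmd:MdCreator>John Doe</cmd:MdCreator>
  <cmd:MdCreationDate>2012-04-17</cmd:MdCreationDate>
  <cmd:MdSelfLink>hdl:1234/567890</cmd:MdSelfLink>
  <cmd:MdProfile>clarin.eu:cr1:p_1311927752306</cmd:MdProfile>
  <cmd:MdCollectionDisplayName>CLARIN-NL web services</cmd:MdCollectionDisplayName>
</cmd:Header>
"""

header = read_header(ET.fromstring(SAMPLE))
```

The sketch does not check the order of the child elements; an XML Schema validation against the envelope schema would be the usual way to enforce that.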
5.4 The <cmd:Resources> element
5.4.1 General structure of the <cmd:Resources> element
This section of the CMDI file provides the sequence of
— files which are parts of or closely related to the described resource (<cmd:ResourceProxyList> and
<cmd:JournalFileProxyList>),
— possible relations between pairs of these files (<cmd:ResourceRelationList>),
and shall follow the structure and order described in Table 4.
Table 4 — <cmd:Resources> element: order of child elements
Name                        Value type      Occurrences  Description
<cmd:Resources>             xs:complexType  -            Includes 3 lists containing information about resource proxies and their interrelations.
<cmd:ResourceProxyList>     xs:complexType  1            A list of <cmd:ResourceProxy> elements, each referencing a file contained in or closely related to the described resource.
<cmd:JournalFileProxyList>  xs:complexType  1            A list of <cmd:JournalFileProxy> elements, each referencing a file (“journal file”) containing provenance information about the described resource.
<cmd:ResourceRelationList>  xs:complexType  1            A list of <cmd:ResourceRelation> elements, each representing a relationship between 2 resource files (as listed in the <cmd:ResourceProxyList>).
5.4.2 The list of resource proxies
<cmd:ResourceProxyList> contains a sequence of zero or more occurrences of <cmd:ResourceProxy>,
each representing a file/part of the described resource, and shall follow the structure and order
described in Table 5.
Table 5 — <cmd:ResourceProxyList> element: order of child elements
Name                     Value type      Occurrences     Description
<cmd:ResourceProxyList>  xs:complexType  -               Contains a list of resource proxies (see next row).
<cmd:ResourceProxy>      xs:complexType  0 to unbounded  Represents a file which is a part of or closely related to the described resource.
@id                      xs:ID           1               Local identifier for the parent <cmd:ResourceProxy>, unique within this CMD instance.
<cmd:ResourceType>       xs:string       1               The type of the file represented by this <cmd:ResourceProxy>. Permitted values: “Resource”, “Metadata”, “LandingPage”, “SearchService”, “SearchPage” (see the description for each of the possible values).
@mimetype                xs:string       0 or 1          The media type of the file.
<cmd:ResourceRef>        xs:anyURI       1               A reference to the file represented by this <cmd:ResourceProxy>, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
Resource types
— Resource
A resource that is described in the present CMD instance, e.g., a text document, media file or tool.
— Metadata
A metadata resource, i.e., another CMD instance, that is subordinate to the present CMD instance.
The media type of this metadata resource should be application/x-cmdi+xml.
— SearchPage
A resource that is a web page that allows the described resource to be queried by an end-user.
— SearchService
A resource that is a web service that allows the described resource to be queried by means of
dedicated software.
— LandingPage
A resource that is a web page that provides the original context of the described resource, e.g., a
“deep link” into a repository system.
The most general value of <cmd:ResourceType> is Resource. The other values are more specific and should
be selected only if they are applicable. This way, the value Resource is consistent with the use of the term
resource in this document, which usually does not include metadata, landing pages, etc. that are not part
of the resource to be described.
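The constraints on resource proxies (a unique @id, one of five resource types, a mandatory reference) can be checked with a short Python sketch. It is a sketch under stated assumptions rather than a reference implementation: read_proxies is a hypothetical helper, the element names follow CMDI 1.2 instances, and the identifiers and example.org URLs in the sample are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"cmd": "http://www.clarin.eu/cmd/1"}
RESOURCE_TYPES = {"Resource", "Metadata", "LandingPage", "SearchService", "SearchPage"}


def read_proxies(resources_el):
    """Return {id: (resource type, media type or None, reference)} per resource proxy."""
    proxies = {}
    for proxy in resources_el.findall("cmd:ResourceProxyList/cmd:ResourceProxy", NS):
        proxy_id = proxy.get("id")
        type_el = proxy.find("cmd:ResourceType", NS)
        ref = proxy.findtext("cmd:ResourceRef", default="", namespaces=NS).strip()
        if not proxy_id or type_el is None or not ref:
            raise ValueError("@id, <cmd:ResourceType> and <cmd:ResourceRef> are mandatory")
        if proxy_id in proxies:
            raise ValueError(f"duplicate resource proxy id: {proxy_id}")
        resource_type = (type_el.text or "").strip()
        if resource_type not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {resource_type!r}")
        proxies[proxy_id] = (resource_type, type_el.get("mimetype"), ref)
    return proxies


# Hypothetical resource section; ids and URLs are invented for illustration.
SAMPLE = """
<cmd:Resources xmlns:cmd="http://www.clarin.eu/cmd/1">
  <cmd:ResourceProxyList>
    <cmd:ResourceProxy id="lp1">
      <cmd:ResourceType mimetype="text/html">LandingPage</cmd:ResourceType>
      <cmd:ResourceRef>http://example.org/corpus/index.html</cmd:ResourceRef>
    </cmd:ResourceProxy>
    <cmd:ResourceProxy id="r1">
      <cmd:ResourceType mimetype="application/pdf">Resource</cmd:ResourceType>
      <cmd:ResourceRef>http://example.org/corpus/text1.pdf</cmd:ResourceRef>
    </cmd:ResourceProxy>
    <cmd:ResourceProxy id="m1">
      <cmd:ResourceType mimetype="application/x-cmdi+xml">Metadata</cmd:ResourceType>
      <cmd:ResourceRef>http://example.org/corpus/part1.cmdi</cmd:ResourceRef>
    </cmd:ResourceProxy>
  </cmd:ResourceProxyList>
  <cmd:JournalFileProxyList/>
  <cmd:ResourceRelationList/>
</cmd:Resources>
"""

proxies = read_proxies(ET.fromstring(SAMPLE))
```

Keying the result by @id is a design choice made here so that later lookups from relations or from the payload are simple dictionary accesses.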
5.4.3 The list of journal files
<cmd:JournalFileProxyList> contains a sequence of zero or more occurrences of
<cmd:JournalFileProxy>, each representing a file containing provenance information about the
described resource, and shall follow the structure and order described in Table 6.
Table 6 — <cmd:JournalFileProxyList> element: order of child elements
Name                        Value type      Occurrences     Description
<cmd:JournalFileProxyList>  xs:complexType  -               Contains a list of journal file proxies (see next row).
<cmd:JournalFileProxy>      xs:complexType  0 to unbounded  Represents a file containing provenance information about the described resource.
<cmd:JournalFileRef>        xs:anyURI       1               A reference to the file represented by this <cmd:JournalFileProxy>, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
NOTE The actual content and layout of the journal file are outside the scope of this document.
5.4.4 The list of relations between resource files
<cmd:ResourceRelationList> contains a sequence of zero or more occurrences of
<cmd:ResourceRelation>, each representing a relation between any pair of <cmd:ResourceProxy> elements,
and shall follow the structure and order described in Table 7.
If these parts are present they shall appear in the order given in Table 7.
Table 7 — <cmd:ResourceRelationList> element: order of child elements
Name                        Value type      Occurrences     Description
<cmd:ResourceRelationList>  xs:complexType  -               Contains a list of resource relations (see next row).
<cmd:ResourceRelation>      xs:complexType  0 to unbounded  A representation of a relation between 2 resource proxies listed in <cmd:ResourceProxyList>.
<cmd:RelationType>          xs:string       1               The type of the relation represented by its parent <cmd:ResourceRelation>.
@ConceptLink                xs:anyURI       0 or 1          A reference to some concept registry (e.g. CLARIN Concept Registry [17]), indicating the semantics of <cmd:RelationType>.
<cmd:Resource>              xs:complexType  2               References one of the resource proxies participating in the relationship.
@ref                        xs:IDREF        -               A reference to the <cmd:ResourceProxy> with id=ref (the <cmd:ResourceProxy> represented by its parent <cmd:Resource> element).
<cmd:Role>                  xs:string       0 or 1          Indicates the role its parent resource plays in the relationship.
@ConceptLink                xs:anyURI       0 or 1          A reference to some concept registry (e.g. CLARIN Concept Registry [17]), indicating the semantics of <cmd:Role>.
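How a relation ties two resource proxies together through @ref can be sketched in Python as follows. This is an illustrative sketch, not a normative algorithm: read_relations is a hypothetical helper, the element names follow CMDI 1.2 instances, and the relation type, roles, ids and example.org URLs in the sample are invented.

```python
import xml.etree.ElementTree as ET

NS = {"cmd": "http://www.clarin.eu/cmd/1"}


def read_relations(resources_el):
    """Return [(relation type, (ref, role), (ref, role))], checking every @ref."""
    known_ids = {
        proxy.get("id")
        for proxy in resources_el.findall("cmd:ResourceProxyList/cmd:ResourceProxy", NS)
    }
    relations = []
    for rel in resources_el.findall("cmd:ResourceRelationList/cmd:ResourceRelation", NS):
        relation_type = rel.findtext("cmd:RelationType", default="", namespaces=NS).strip()
        members = []
        for res in rel.findall("cmd:Resource", NS):
            ref = res.get("ref")
            if ref not in known_ids:
                raise ValueError(f"@ref {ref!r} does not match any resource proxy @id")
            role = res.findtext("cmd:Role", default=None, namespaces=NS)
            members.append((ref, role.strip() if role else None))
        if len(members) != 2:
            raise ValueError("a resource relation needs exactly 2 <cmd:Resource> elements")
        relations.append((relation_type, members[0], members[1]))
    return relations


# Hypothetical resource section; relation type, roles, ids and URLs are invented.
SAMPLE = """
<cmd:Resources xmlns:cmd="http://www.clarin.eu/cmd/1">
  <cmd:ResourceProxyList>
    <cmd:ResourceProxy id="video1">
      <cmd:ResourceType mimetype="video/mp4">Resource</cmd:ResourceType>
      <cmd:ResourceRef>http://example.org/session1.mp4</cmd:ResourceRef>
    </cmd:ResourceProxy>
    <cmd:ResourceProxy id="text1">
      <cmd:ResourceType mimetype="text/plain">Resource</cmd:ResourceType>
      <cmd:ResourceRef>http://example.org/session1.txt</cmd:ResourceRef>
    </cmd:ResourceProxy>
  </cmd:ResourceProxyList>
  <cmd:JournalFileProxyList/>
  <cmd:ResourceRelationList>
    <cmd:ResourceRelation>
      <cmd:RelationType>transcription</cmd:RelationType>
      <cmd:Resource ref="text1"><cmd:Role>transcript</cmd:Role></cmd:Resource>
      <cmd:Resource ref="video1"><cmd:Role>recording</cmd:Role></cmd:Resource>
    </cmd:ResourceRelation>
  </cmd:ResourceRelationList>
</cmd:Resources>
"""

relations = read_relations(ET.fromstring(SAMPLE))
```

An XML Schema validator already enforces the xs:ID and xs:IDREF pairing; the explicit check is kept here only to make the link between @ref and @id visible.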
EXAMPLE 1 Resources.
This example shows a list of resources of various types.



mimetype="application/x-h
...


INTERNATIONAL STANDARD
ISO 24622-2
First edition
2019-07
Language resource management —
Component metadata infrastructure
(CMDI) —
Part 2:
Component metadata specification
language
Gestion des ressources linguistiques — Composante infrastructure de
métadonnées (CMDI) —
Partie 2: Composante linguistique spécifique aux métadonnées
© ISO 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 General terms
3.2 CMDI
3.3 XML
4 Notational and XML namespace conventions
5 Structure of CMDI instances
5.1 General structure
5.2 The main structure
5.3 The <cmd:Header> element
5.4 The <cmd:Resources> element
5.4.1 General structure of the <cmd:Resources> element
5.4.2 The list of resource proxies
5.4.3 The list of journal files
5.4.4 The list of relations between resource files
5.5 The <cmd:IsPartOfList> element
5.6 The CMD components
6 CCSL (CMDI Component Specification Language)
6.1 General structure of the CCSL
6.2 CCSL header
6.3 CMD specification
6.4 Definition of CMD elements
6.5 CMD attribute definition
6.6 Value schemes for CMD elements and CMD attributes
6.7 Cue attributes
7 CMD
7.1 Transformation of CCSL into a CMD profile schema definition
7.2 General properties of the CMD profile schema definition
7.3 Interpretation of CMD specifications in the CCSL
7.3.1 General structure of CMD specifications
7.3.2 Document structure prescribed by the CMD profile schema
7.4 Interpretation of CMD element definitions in the CCSL
7.5 Interpretation of CMD attribute definitions in the CCSL
7.6 Content model for CMD elements and CMD attributes in the schema definition
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24622 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Many researchers, from the humanities and other domains, have a strong need to study resources
in close detail. Nowadays more and more of these resources are available online. To be able to find
these resources, they are described with metadata. These component metadata (CMD) instances are
collected and made available via central catalogues. Often, resource providers want to include specific
properties of a resource in their metadata to provide all relevant descriptions for a specific type of
resource. The purpose of catalogues tends to be more generic and addresses a broader target audience.
It is hard to strike the balance between these two ends of the spectrum with one metadata schema,
and mismatches can negatively impact the quality of metadata provided. The goal of the component
metadata infrastructure (CMDI) is to provide a flexible mechanism to build resource specific metadata
schemas out of shared components and semantics [14][15].
In CMDI the metadata lifecycle starts with the need of a metadata modeller to create a dedicated
metadata profile for a specific type of resource. Modellers can browse and search a registry for
components and profiles that are suitable or come close to meeting their requirements. A component
groups together metadata elements that belong together and can potentially be reused in a different
context. Components can also group other components. Existing component registries, e.g., the CLARIN
(common language resources and technology infrastructure) Component Registry [16], might already
contain any number of components. These can be reused as they are, or be adapted by modifying, adding
or removing some metadata elements and/or components. Also completely new components can be
created to model the unique aspects of the resources under consideration. All the needed components
are combined into one profile specific for the type of resources. Any component, element and value in
such a profile may be linked to a semantic description, a concept, to make their meaning explicit [21].
These semantic descriptions can be stored in a semantic registry, e.g., the CLARIN Concept Registry [17].
In the end metadata creators can create records for specific resources that comply with the profile
relevant for the resource type, and these records can be provided to local and global catalogues [22].
CMDI has originally been developed in the context of the European CLARIN infrastructure initiative
with input from other initiatives and experts. Already in its preparatory phase, which started in 2007,
the infrastructure needed flexibility in the metadata domain as it was confronted with many types of
resources that had to be accurately described. For Version 1.0 [20] a toolkit was created, consisting of
the XML schemas and XSLT stylesheets to validate and transform components, profiles and records.
Version 1.1 included some small changes and has seen small incremental backward compatible
advances since 2011. While this version has been in use, new developments and the development of this
document resulted in Version 1.2 [18]. Also CMDI has seen a growing number of tools and infrastructure
systems that deal with its records and components and rely on its shared syntax and semantics.
In ISO 24622-1, the component metadata model has been standardized. This document is compliant
with ISO 24622-1, and also extends and constrains it at various places (see also the red parts in the UML
class diagram in Figure 1):
— support for attributes on both components and elements is added,
— a profile is limited to one root component, and
— an element always belongs to a specific component.
Figure 1 — Component metadata model and its extensions
INTERNATIONAL STANDARD ISO 24622-2:2019(E)
Language resource management — Component metadata
infrastructure (CMDI) —
Part 2:
Component metadata specification language
IMPORTANT — The electronic file of this document contains colours which are considered to be
useful for the correct understanding of the document. Users should therefore consider printing
this document using a colour printer.
1 Scope
The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate
well together. To enable this level of cooperation, this document provides in-depth descriptions and
definitions of what CMDI records, components and their representations in XML look like.
This document describes these XML representations, which enable the flexible construction of
interoperable metadata schemas suitable for, but not limited to, describing language resources. The
metadata schemas based on these representations can be used to describe resources at different levels
of granularity (e.g. descriptions on the collection level or on the level of individual resources).
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at http://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1 General terms
3.1.1
concept
unit of knowledge created by a unique combination of characteristics
[SOURCE: ISO 1087:—1), 3.2.3, modified — Note 1 to entry and Note 2 to entry have been deleted.]
3.1.2
concept link
reference from a CMD profile (3.2.11), CMD component (3.2.3), CMD element (3.2.5), CMD attribute (3.2.2)
or a value in a controlled vocabulary (3.1.4) to an entry in a semantic registry (3.1.11) via a Uniform
Resource Identifier (3.1.13)
Note 1 to entry: Typically a concept link is provided as a persistent identifier (3.1.9).
1) Revision of ISO 1087:2000 under preparation. Stage at the time of publication: ISO/FDIS 1087:2019.
3.1.3
concept registry
semantic registry (3.1.11) maintaining concepts (3.1.1)
EXAMPLE The CLARIN Concept Registry [17] as used in the CLARIN infrastructure.
3.1.4
controlled vocabulary
closed/open vocabulary
set of values that can be used either to constrain the set of permissible values or to provide suggestions
for applicable values in a given context
3.1.5
data category
class of data items that are closely related from a formal or semantic point of view
EXAMPLE /part of speech/, /subject field/, /definition/.
Note 1 to entry: A data category can be viewed as a generalization of the notion of a field in a database.
[SOURCE: ISO 30042:2019, 3.8, modified — Note 2 to entry has been deleted.]
3.1.6
language tag
textual code used to assist in identifying languages in every mode of communication
Note 1 to entry: This includes constructed and artificial languages but excludes languages not intended primarily
for human communication, for example in spoken, written, signed, or otherwise signaled, communication (see
IETF BCP 47 [6]).
Note 2 to entry: Language tags may be used to assist in the identification of a language in every mode of
communication, for example in spoken, written, signed, or otherwise signaled, communication.
3.1.7
media type
MIME type
media type specification used originally for textual, non-textual, multi-part message bodies of emails
and which provides technical format information on data
Note 1 to entry: For the purposes of this document, it is as described in IETF RFC 6838 [8].
3.1.8
metadata
resource (3.1.10) that is a description of another resource, usually given as a set of properties in the
form of attribute-value pairs
Note 1 to entry: This description can contain information about the resource, aspects or parts of the resource
and/or artefacts and actors connected to the resource.
3.1.9
persistent identifier
PID
unique identifier that ensures permanent access for a digital object by providing access to it
independently of its physical location or current ownership
Note 1 to entry: Unique in this context means that the PID will not be issued again for other resources (3.1.10).
However, the same PID can reference different representations or incarnations of the resource at the discretion
of the resource provider.
[SOURCE: ISO 24619:2011, 3.2.4]
3.1.10
resource
entity, possibly digitally accessible, that can be described in terms of its content and technical
properties, referenced by a Uniform Resource Identifier (3.1.13)
3.1.11
semantic registry
directory of (authoritative) definitions of term (3.1.12), concept (3.1.1) or data category (3.1.5), or the
system maintaining it
Note 1 to entry: These registries generally also provide persistent identifiers (3.1.9) for their entries.
3.1.12
term
designation that represents a general concept (3.1.1) in a specific domain or subject
EXAMPLE “planet”, “tower”, “pen”, “numeral”, “number”, “square root”, “logarithm”, “unit of measurement”,
“base of a logarithm”, “chemical element”, “chemical compound”, “HP Laserjet 1100”, “Nobel Prize in Physics”.
Note 1 to entry: Terms may be partly or wholly verbal.
Note 2 to entry: Terms can include letters and letter symbols, numerals, mathematical symbols, typographical
signs and syntactic signs (e.g. punctuation marks, such as hyphens, parentheses, square brackets and other
connectors or delimiters), sometimes in character styles (i.e. fonts and bold, italic, bold italic, or other style
conventions) governed by domain-, subject-, or language-specific conventions.
[SOURCE: ISO 1087:—, 3.4.2]
3.1.13
Uniform Resource Identifier
URI
sequence of characters that identifies a resource (3.1.10)
Note 1 to entry: IETF RFC 3986 [7] defines the generic URI syntax and a process for resolving URI references that
might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
3.2 CMDI
3.2.1
CCSL
CMDI component specification language
XML (3.3.4) based language for describing a CMD component (3.2.3) and a CMD profile (3.2.11) according
to the CMD model (3.2.10)
3.2.2
CMD attribute
unit within a CMD element (3.2.5) that describes the level at which properties of a CMD element can be
provided by means of value scheme (3.2.20) constrained atomic values
3.2.3
CMD component
component
reusable, structured template for the description of (an aspect of) a resource (3.1.10), defined by means
of a CMD specification (3.2.14) document with the potential of including other CMD components, either
through reference or inline definition
3.2.4
CMD component registry
component registry
service where a CMD specification (3.2.14) can be registered and accessed
3.2.5
CMD element
element definition
unit within a CMD component (3.2.3) that describes the level of the CMD instance (3.2.6) that can carry
atomic values governed by a value scheme (3.2.20), and does not contain further levels except for that of
the CMD attribute (3.2.2)
3.2.6
CMD instance
metadata instance
CMDI file
CMDI instance
metadata record
CMD record
file that conforms to the general CMD instance structure as described in this document and, at the CMD
instance payload (3.2.9) level, follows the specific structure defined by the CMD profile (3.2.11) it relates to
3.2.7
CMD instance envelope
section of a CMD instance (3.2.6) which is structured uniformly for all instances and contains the CMD
instance header (3.2.8) and the list of resource proxies (3.2.18) which may be referenced from the CMD
instance payload (3.2.9) section
3.2.8
CMD instance header
section of a CMD instance (3.2.6) marked as ‘header’, providing information on that CMD instance as
such, not the resource (3.1.10) that is described by the metadata file
3.2.9
CMD instance payload
section of a CMD instance (3.2.6) that follows the structure defined by the CMD profile (3.2.11) it
references and contains the description of the resource (3.1.10) to which that CMD instance relates
3.2.10
CMD model
component metadata model
metadata model that is based on CMD components (3.2.3)
Note 1 to entry: For the purposes of this document, it is as specified in ISO 24622-1.
3.2.11
CMD profile
profile
structured template for the description of a class of resources (3.1.10) providing the complete structure
for a CMD instance payload (3.2.9) by means of a hierarchy of CMD components (3.2.3)
3.2.12
CMD profile schema
schema definition by which the correctness of a CMD instance (3.2.6) with respect to the CMD profile
(3.2.11) it pertains to can be evaluated
Note 1 to entry: The CMD profile schema may be expressed as XML Schema (3.3.11) but also in other XML schema
languages.
3.2.13
CMD root component
CMD component (3.2.3) that is defined at the highest level within a CMD profile (3.2.11) that may have
one or more child CMD components (3.2.3) but no siblings
Note 1 to entry: In the CMD instance payload (3.2.9), it is instantiated exactly once.
3.2.14
CMD specification
component specification
component definition
profile specification
profile definition
representation of a CMD component (3.2.3) or CMD profile (3.2.11), expressed using the constructs of
the CCSL (3.2.1)
3.2.15
CMD specification header
component header
profile header
section of a CMD specification (3.2.14) marked as ‘header’, providing information on that CMD
specification as such that is not part of the defined structure
3.2.16
CMDI
component metadata infrastructure
metadata description framework consisting of the CMD model (3.2.10) and infrastructure to process
instances of parts of the model
3.2.17
inline CMD component
CMD component (3.2.3) that is created and stored within another CMD component and cannot be
addressed from other CMD components
3.2.18
resource proxy
CMD resource reference
representation of a resource (3.1.10) within a CMD instance (3.2.6) containing a Uniform Resource
Identifier (3.1.13) as a reference to the resource itself and an indication of its nature
3.2.19
resource proxy reference
reference from any point within the CMD instance payload (3.2.9) to any of the resource proxy (3.2.18)
elements
3.2.20
value scheme
set of constraints governing the range of values allowed for a specific CMD element (3.2.5) or CMD
attribute (3.2.2) in a CMD instance (3.2.6), expressed in terms of an XML Schema datatype (3.3.12),
controlled vocabulary (3.1.4), or regular expression (3.3.3)
3.3 XML
3.3.1
foreign attribute
XML attribute (3.3.5) defined in a namespace (3.3.2) other than those declared in CMDI (3.2.16), to be
included in a CMD instance (3.2.6) as additional information targeted to specific receivers or applications
3.3.2
XML namespace
namespace
method for qualifying element and attribute names used in XML
Note 1 to entry: For the purposes of this document, it is as described in W3C XML Namespaces [10].
3.3.3
regular expression
sequence of characters that denote a set of strings
Note 1 to entry: When used to constrain a lexical space, a regular expression asserts that only strings in the
defined set of strings are valid literals for values of that type.
Note 2 to entry: See also W3C XSchema Part 2 [12], Appendix F.
3.3.4
XML
markup language for describing hierarchical structures within a text file
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.5
XML attribute
property of an XML element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.6
XML attribute declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
attribute (3.3.5)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD [13], Section 3.2.
3.3.7
XML container element
XML element (3.3.9) that has one or more XML elements as its descendants
3.3.8
XML document
document represented in XML
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.9
XML element
constituent of an XML document (3.3.8)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation for the Extensible
Markup Language XML [9].
3.3.10
XML element declaration
constituent of an XML Schema (3.3.11) that constrains the structure and content of a specific XML
element (3.3.9)
Note 1 to entry: For the purposes of this document, it is as defined by W3C recommendation on XSD [13], Section 3.3.
3.3.11
XML Schema
document that complies with the XML Schema recommendation
Note 1 to entry: For the purposes of this document, it refers to the W3C XSchema Part 1 recommendation [11].
3.3.12
XML Schema datatype
predefined set of permissible content within an XML element (3.3.9) or an XML attribute (3.3.5) of an
XML document (3.3.8) used in an XML Schema (3.3.11)
Note 1 to entry: For the purposes of this document, it is as described in W3C XSchema Part 2 [12].
4 Notational and XML namespace conventions
The following notational conventions for XML fragments are used throughout this document:
— <Element>
an XML element with the generic identifier Element that is bound to a default XML namespace;
— <prefix:Element>
an XML element with the generic identifier Element that is bound to an XML namespace denoted by
the prefix prefix;
— <prefix:{Element}>
an XML element with a contextually specified identifier that is bound to an XML namespace denoted
by the prefix prefix;
— <prefix:{Element}>*
any number of XML elements with contextually specified identifiers that are bound to an XML
namespace denoted by the prefix prefix;
— @attr
an XML attribute with the name attr;
— @{attr}
an XML attribute with a contextually specified name;
— @{attr}*
any number of XML attributes with contextually specified names;
— @prefix:attr
an XML attribute with the name attr that is bound to an XML namespace denoted by the prefix prefix;
— string
the literal string shall be used either as element content or attribute value;
— xs:type
the XML schema type with name type.
The XML namespace names and prefixes given in Table 1 are used throughout this document as existing
suitable examples. The column “Recommended Syntax” indicates which syntax variant should be used
by the toolkit and other creators of CMDI related documents.
Table 1 — XML namespaces and prefixes used in this document as existing suitable examples
Prefix  Namespace name                                 Comment                              Recommended syntax
cmd     http://www.clarin.eu/cmd/1                     CMD instance (general/envelope)      prefixed
cmdp    http://www.clarin.eu/cmd/1/profiles/profileId  CMDI payload (CMD profile specific)  prefixed
cue     http://www.clarin.eu/cmd/cues/1                Cues for tools                       prefixed
xs      http://www.w3.org/2001/XMLSchema               XML Schema                           prefixed
NOTE The inclusion of the major version number (i.e. 1) in the clarin.eu namespaces, but not the minor
version number reflects the approach that across minor versions within a major version of the CMDI specification,
the namespace is kept constant for compatibility reasons.
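The namespace conventions of Table 1 map directly onto the prefix dictionaries used by common XML libraries. The Python sketch below is illustrative only: the helper names profile_namespace, prefix_map and clark are hypothetical, and the assumption that profileId in the cmdp namespace is replaced by the identifier of the CMD profile is taken from the examples in this document.

```python
import xml.etree.ElementTree as ET

# Namespace names from Table 1; the cmdp namespace is completed per CMD profile.
CMD_NS = "http://www.clarin.eu/cmd/1"
CUE_NS = "http://www.clarin.eu/cmd/cues/1"
XS_NS = "http://www.w3.org/2001/XMLSchema"


def profile_namespace(profile_id):
    """Return the CMD profile specific namespace name for a profile identifier."""
    return f"{CMD_NS}/profiles/{profile_id}"


def prefix_map(profile_id):
    """Build a prefix-to-namespace mapping usable with ElementTree find/findall."""
    return {
        "cmd": CMD_NS,
        "cmdp": profile_namespace(profile_id),
        "cue": CUE_NS,
        "xs": XS_NS,
    }


def clark(namespace, local_name):
    """Return the '{namespace}local' form ElementTree uses for qualified names."""
    return f"{{{namespace}}}{local_name}"


ns = prefix_map("clarin.eu:cr1:p_1311927752306")
doc = ET.fromstring(
    '<cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1" CMDVersion="1.2">'
    "<cmd:Header/></cmd:CMD>"
)
header = doc.find("cmd:Header", ns)  # prefixes are resolved through the mapping
```

Because only the major version appears in the namespace names, the same mapping can be reused across minor versions of the specification.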
5 Structure of CMDI instances
5.1 General structure
Figure 2 — The structure of a CMD instance
See Figure 2, which uses the following colour scheme: Green boxes represent elements that are
potentially present in all CMD instances (the CMD instance envelope). Blue boxes and associations
represent elements defined by the CMD profile (the CMD instance payload). The diagram is meant for
overview and illustration; full details are found in Tables 2 to 15.
A CMD instance contains the actual metadata of one specific resource (hereafter referred to as the
described resource) and might also be referred to as a CMD record or CMDI file. All CMD instances
have the same structure at the top level (the CMD instance envelope). At a lower level, parts of the
structure of an instance are defined by the CMD profile upon which it is based (the CMD instance payload).
5.2 The main structure
A CMD instance has the (XML) root element <cmd:CMD> with one attribute and 4 sub-elements that
appear in the mandatory order described in Table 2.
Table 2 — Root element: order of child elements
Name                Value type        Occurrences  Description
<cmd:CMD>           xs:complexType    -            The (XML) root element of the CMD instance.
@CMDVersion         xs:string("1.2")  1            Denotes the CMDI version on which this CMDI file is based.
<cmd:Header>        xs:complexType    1            Encapsulates core administrative data about the CMDI file.
<cmd:Resources>     xs:complexType    1            Includes 3 lists containing information about resource proxies and their interrelations.
<cmd:IsPartOfList>  xs:complexType    0 or 1       A list of <cmd:IsPartOf> elements, each referencing a larger external resource of which the described resource (as a whole) forms a part.
<cmd:Components>    xs:complexType    1            This element contains the CMD profile specific section of the CMD instance. Here the descriptive metadata of the resource are found.
The first three elements (, and ) constitute the
CMD instance envelope and reside in the cmd namespace. The CMD instance payload is contained in
the element, the elements of the instance payload (which is CMD profile specific)
exists in the CMD profile specific namespace (prefix cmdp), possibly adorned with attributes in the cmd
namespace.
In addition to this, foreign attributes (XML attributes of other namespaces than those defined in
Clause 4) may occur anywhere in the <cmd:Header>, <cmd:Resources> and <cmd:IsPartOfList> elements
and on the <cmd:Components> element (but not on any of its children). Attributes in these foreign
namespaces should be ignored by tools unrelated to the party associated with the namespace and
therefore may be removed during processing. The foreign namespace shall be representative of the
party that introduces the extension. For example, the namespace should not start with
http://www.clarin.eu, http://clarin.eu, etc. unless the foreign namespace is introduced by the owner
of the domain clarin.eu.
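Because foreign attributes may be removed during processing, a tool that does not understand them can drop them before further handling. The sketch below shows one way to do that with Python's standard library. The set KNOWN_NAMESPACES is an assumption standing in for the namespaces defined in Clause 4, the element names are the assumed CMDI 1.2 names, and strip_foreign_attributes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"

# Assumption: the namespaces this tool treats as known. Clause 4 holds the
# authoritative list; this set is only a stand-in for illustration.
KNOWN_NAMESPACES = {
    CMD_NS,
    "http://www.w3.org/2001/XMLSchema-instance",
    "http://www.w3.org/XML/1998/namespace",
}


def _strip(element):
    """Delete namespaced attributes whose namespace is not known."""
    for name in list(element.attrib):
        # ElementTree spells qualified attribute names as "{namespace}local".
        if name.startswith("{"):
            namespace = name[1:].split("}", 1)[0]
            if namespace not in KNOWN_NAMESPACES:
                del element.attrib[name]


def strip_foreign_attributes(root):
    """Remove foreign attributes from the places where they may occur."""
    for local_name in ("Header", "Resources", "IsPartOfList"):
        part = root.find(f"{{{CMD_NS}}}{local_name}")
        if part is not None:
            for element in part.iter():  # the element itself and all descendants
                _strip(element)
    components = root.find(f"{{{CMD_NS}}}Components")
    if components is not None:
        _strip(components)  # only the element itself; its children are left untouched
    return root
```

Attributes without a namespace are never touched, since they are not foreign in the sense of this clause.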
A detailed specification of the above-mentioned parts of a CMD instance is given in the next four clauses.
EXAMPLE CMD instance envelope.
This example shows the main structure of a CMD instance.
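<cmd:CMD CMDVersion="1.2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:cmd="http://www.clarin.eu/cmd/1"
    xmlns:cmdp="http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1311927752306"
    xsi:schemaLocation="http://www.clarin.eu/cmd/1
        http://www.clarin.eu/cmd/1/xsd/cmd-envelop.xsd
        http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1311927752306
        https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1311927752306/1.2/xsd">
    <cmd:Header>
        <!-- ... -->
    </cmd:Header>
    <cmd:Resources>
        <cmd:ResourceProxyList>
            <!-- ... -->
        </cmd:ResourceProxyList>
        <cmd:JournalFileProxyList>
            <!-- ... -->
        </cmd:JournalFileProxyList>
        <cmd:ResourceRelationList>
            <!-- ... -->
        </cmd:ResourceRelationList>
    </cmd:Resources>
    <cmd:IsPartOfList>
        <!-- ... -->
    </cmd:IsPartOfList>
    <cmd:Components>
        <!-- CMD profile specific payload (cmdp namespace) -->
    </cmd:Components>
</cmd:CMD>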
5.3 The <cmd:Header> element
The header of a CMD instance mainly contains administrative information about the metadata, that
is, metadata about the CMD instance itself. The included elements shall follow the structure and order
described in Table 3.
Table 3 — <cmd:Header> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:Header> | xs:complexType | - | Encapsulates core administrative data about the CMD instance.
<cmd:MdCreator> | xs:string | 0 to unbounded | Denotes the creator of this metadata file.
<cmd:MdCreationDate> | xs:date | 0 or 1 | The date this metadata file was created.
<cmd:MdSelfLink> | xs:anyURI | 0 or 1 | A reference to this metadata file in its home repository, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
<cmd:MdProfile> | xs:anyURI | 1 | The CMD profile upon which this metadata file is based, given by its identifier in a component registry.
<cmd:MdCollectionDisplayName> | xs:string | 0 or 1 | The collection to which the described resource belongs, given as a human-readable name. Exploitation tools can use this name to present metadata collections.
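Since the child elements of the header have a fixed order and only the profile is mandatory, a small builder can enforce both when a header is generated. The following Python sketch assumes the CMDI 1.2 element names cmd:Header, cmd:MdCreator, cmd:MdCreationDate, cmd:MdSelfLink, cmd:MdProfile and cmd:MdCollectionDisplayName; build_header and its parameter names are hypothetical.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"


def build_header(profile, creators=(), creation_date=None,
                 self_link=None, collection_name=None):
    """Build a header element whose children follow the order of Table 3."""
    header = ET.Element(f"{{{CMD_NS}}}Header")

    def add(local_name, text):
        child = ET.SubElement(header, f"{{{CMD_NS}}}{local_name}")
        child.text = text

    for creator in creators:            # 0 to unbounded
        add("MdCreator", creator)
    if creation_date is not None:       # 0 or 1, an xs:date such as "2012-04-17"
        add("MdCreationDate", creation_date)
    if self_link is not None:           # 0 or 1, preferably a persistent identifier
        add("MdSelfLink", self_link)
    add("MdProfile", profile)           # exactly 1
    if collection_name is not None:     # 0 or 1
        add("MdCollectionDisplayName", collection_name)
    return header
```

Calling build_header("clarin.eu:cr1:p_1311927752306", creators=["John Doe"]) yields a header with one creator followed by the profile; the sketch does not validate the value types themselves.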
EXAMPLE Header with foreign attribute.
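This example shows the header of a CMD instance, including the use of a foreign attribute that carries the
ORCID iD of the creator.
<cmd:Header>
    <cmd:MdCreator ...
        xmlns:orcid="http://www.orcid.org/ns/orcid">John Doe</cmd:MdCreator>
    <cmd:MdCreationDate>2012-04-17</cmd:MdCreationDate>
    <cmd:MdSelfLink>hdl:1234/567890</cmd:MdSelfLink>
    <cmd:MdProfile>clarin.eu:cr1:p_1311927752306</cmd:MdProfile>
    <cmd:MdCollectionDisplayName>CLARIN-NL web services</cmd:MdCollectionDisplayName>
</cmd:Header>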
5.4 The <cmd:Resources> element
5.4.1 General structure of the <cmd:Resources> element
This section of the CMDI file provides the sequence of
— files which are parts of or closely related to the described resource (<cmd:ResourceProxyList> and
<cmd:JournalFileProxyList>),
— possible relations between pairs of these files (<cmd:ResourceRelationList>),
and shall follow the structure and order described in Table 4.
Table 4 — <cmd:Resources> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:Resources> | xs:complexType | - | Includes 3 lists containing information about resource proxies and their interrelations.
<cmd:ResourceProxyList> | xs:complexType | 1 | A list of <cmd:ResourceProxy> elements, each referencing a file contained in or closely related to the described resource.
<cmd:JournalFileProxyList> | xs:complexType | 1 | A list of <cmd:JournalFileProxy> elements, each referencing a file ("journal file") containing provenance information about the described resource.
<cmd:ResourceRelationList> | xs:complexType | 1 | A list of <cmd:ResourceRelation> elements, each representing a relationship between 2 resource files (as listed in the <cmd:ResourceProxyList>).
5.4.2 The list of resource proxies
<cmd:ResourceProxyList> contains a sequence of zero or more occurrences of <cmd:ResourceProxy>,
each representing a file/part of the described resource, and shall follow the structure and order
described in Table 5.
Table 5 — <cmd:ResourceProxyList> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:ResourceProxyList> | xs:complexType | - | Contains a list of resource proxies (see next row).
<cmd:ResourceProxy> | xs:complexType | 0 to unbounded | Represents a file which is a part of or closely related to the described resource.
@id | xs:ID | 1 | Local identifier for the parent <cmd:ResourceProxy>, unique within this CMD instance.
<cmd:ResourceType> | xs:string ("Resource", "Metadata", "LandingPage", "SearchService", "SearchPage"; see the description for each of the possible values) | 1 | The type of the file represented by this <cmd:ResourceProxy>.
@mimetype | xs:string | 0 or 1 | The media type of the file.
<cmd:ResourceRef> | xs:anyURI | 1 | A reference to the file represented by this <cmd:ResourceProxy>, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
Resource types
— Resource
A resource that is described in the present CMD instance, e.g., a text document, media file or tool.
— Metadata
A metadata resource, i.e., another CMD instance, that is subordinate to the present CMD instance.
The media type of this metadata resource should be application/x-cmdi+xml.
— SearchPage
A resource that is a web page that allows the described resource to be queried by an end-user.
— SearchService
A resource that is a web service that allows the described resource to be queried by means of
dedicated software.
— LandingPage
A resource that is a web page that provides the original context of the described resource, e.g., a
“deep link” into a repository system.
The most general value of <cmd:ResourceType> is "Resource". The other values are more specific and
should be selected only if they are applicable. This way, the value "Resource" is consistent with the use
of the term resource in this document, which usually does not include metadata, landing pages, etc.
that are not part of the resource to be described.
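A consumer of CMD instances will typically group resource proxies by type and flag values outside the five permitted ones. The following is a minimal Python sketch of that step. It assumes the CMDI 1.2 element names cmd:ResourceProxyList, cmd:ResourceProxy and cmd:ResourceType with an unqualified mimetype attribute; summarise_resource_proxies is a hypothetical helper, and treating a deviating media type on a Metadata proxy as a reportable problem is a design choice of the sketch, since the text only says "should".

```python
import xml.etree.ElementTree as ET
from collections import Counter

CMD_NS = "http://www.clarin.eu/cmd/1"
RESOURCE_TYPES = {"Resource", "Metadata", "LandingPage", "SearchService", "SearchPage"}
CMDI_MEDIA_TYPE = "application/x-cmdi+xml"


def summarise_resource_proxies(proxy_list):
    """Count proxies per resource type and collect questionable entries."""
    counts = Counter()
    problems = []
    for proxy in proxy_list.findall(f"{{{CMD_NS}}}ResourceProxy"):
        label = proxy.get("id", "(no id)")
        type_element = proxy.find(f"{{{CMD_NS}}}ResourceType")
        if type_element is None:
            problems.append(f"{label}: ResourceType is missing")
            continue
        value = (type_element.text or "").strip()
        if value not in RESOURCE_TYPES:
            problems.append(f"{label}: unknown resource type {value!r}")
            continue
        counts[value] += 1
        mimetype = type_element.get("mimetype")
        # The media type is optional; for Metadata it should be the CMDI one.
        if value == "Metadata" and mimetype not in (None, CMDI_MEDIA_TYPE):
            problems.append(f"{label}: Metadata proxy has media type {mimetype!r}")
    return counts, problems
```

The function returns both the counts and the problem list so that a caller can decide whether a deviation is fatal or merely logged.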
5.4.3 The list of journal files
<cmd:JournalFileProxyList> contains a sequence of zero or more occurrences of
<cmd:JournalFileProxy>, each representing a file containing provenance information about the
described resource, and shall follow the structure and order described in Table 6.
Table 6 — <cmd:JournalFileProxyList> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:JournalFileProxyList> | xs:complexType | - | Contains a list of journal file proxies (see next row).
<cmd:JournalFileProxy> | xs:complexType | 0 to unbounded | Represents a file containing provenance information about the described resource.
<cmd:JournalFileRef> | xs:anyURI | 1 | A reference to the file represented by this <cmd:JournalFileProxy>, in the form of a persistent identifier (RECOMMENDED) or a Uniform Resource Identifier.
NOTE The actual content and layout of the journal file is outside the scope of this document.
5.4.4 The list of relations between resource files
<cmd:ResourceRelationList> contains a sequence of zero or more occurrences of
<cmd:ResourceRelation>, each representing a relation between any pair of <cmd:ResourceProxy>
elements, and shall follow the structure and order described in Table 7.
Table 7 — <cmd:ResourceRelationList> element: order of child elements
Name | Value type | Occurrences | Description
<cmd:ResourceRelationList> | xs:complexType | - | Contains a list of resource relations (see next row).
<cmd:ResourceRelation> | xs:complexType | 0 to unbounded | A representation of a relation between 2 resource proxies listed in <cmd:ResourceProxyList>.
<cmd:RelationType> | xs:string | 1 | The type of the relation represented by its parent <cmd:ResourceRelation>.
@ConceptLink | xs:anyURI | 0 or 1 | A reference to some concept registry (e.g. CLARIN Concept Registry [17]), indicating the semantics of <cmd:RelationType>.
<cmd:Resource> | xs:complexType | 2 | References one of the resource proxies participating in the relationship.
@ref | xs:IDREF | - | A reference to the <cmd:ResourceProxy> with id=ref (the <cmd:ResourceProxy> represented by its parent element).
<cmd:Role> | xs:string | 0 or 1 | Indicates the role its parent resource plays in the relationship.
@ConceptLink | xs:anyURI | 0 or 1 | A reference to some concept registry (e.g. CLARIN Concept Registry [17]), indicating the semantics of <cmd:Role>.
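The constraints on relations (a relation type, exactly 2 participating resources, and references that resolve to resource proxy identifiers) lend themselves to an automated check. The Python sketch below illustrates one way to do it. It assumes the CMDI 1.2 element names cmd:ResourceProxy, cmd:ResourceRelation, cmd:RelationType and cmd:Resource with unqualified id and ref attributes; check_relations and the helper q are hypothetical.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"


def q(local_name):
    """Qualify a local element name with the cmd namespace (ElementTree notation)."""
    return f"{{{CMD_NS}}}{local_name}"


def check_relations(resources):
    """Check every resource relation below the given element against Table 7."""
    proxy_ids = {
        proxy.get("id")
        for proxy in resources.iter(q("ResourceProxy"))
        if proxy.get("id") is not None
    }
    problems = []
    for number, relation in enumerate(resources.iter(q("ResourceRelation")), start=1):
        if relation.find(q("RelationType")) is None:
            problems.append(f"relation {number}: RelationType is missing")
        members = relation.findall(q("Resource"))
        if len(members) != 2:
            problems.append(
                f"relation {number}: expected 2 Resource elements, found {len(members)}"
            )
        for member in members:
            ref = member.get("ref")
            if ref not in proxy_ids:
                problems.append(f"relation {number}: ref {ref!r} matches no ResourceProxy id")
    return problems
```

A schema validator already enforces xs:ID and xs:IDREF consistency; the sketch is useful mainly where instances are processed without a schema at hand.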
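EXAMPLE 1 Resources.
This example shows a list of resources of various types.
<cmd:ResourceProxyList>
    <cmd:ResourceProxy id="...">
        <cmd:ResourceType mimetype="application/x-httpd-php">LandingPage</cmd:ResourceType>
        <cmd:ResourceRef>http://hdl.handle.net/11858/00-1779-0000-0007-D919-0</cmd:ResourceRef>
    </cmd:ResourceProxy>
    <cmd:ResourceProxy id="...">
        <cmd:ResourceType mimetype="application/sru+xml">SearchService</cmd:ResourceType>
        <cmd:ResourceRef>https://clarin.phonetik.uni-muenchen.de/BASSRU/</cmd:ResourceRef>
    </cmd:ResourceProxy>
    <cmd:ResourceProxy id="...">
        <cmd:ResourceType mimetype="application/x-cmdi+xml">Metadata</cmd:ResourceType>
        <cmd:ResourceRef>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZIPTEL/0001.1.cmdi.xml</cmd:ResourceRef>
    </cmd:ResourceProxy>
    <cmd:ResourceProxy id="...">
        <cmd:ResourceType>Resource</cmd:ResourceType>
        <cmd:ResourceRef>hdl:1839/00-SERV-0000-0000-0009-D</cmd:ResourceRef>
    </cmd:ResourceProxy>
    <cmd:ResourceProxy id="...">
        <cmd:ResourceType mimetype="application/vnd.sun.wadl+xml">Resource</cmd:ResourceType>
        <cmd:ResourceRef>http://catalog.clarin.eu/adelheidws/wadl/main.wadl</cmd:ResourceRef>
    </cmd:ResourceProxy>
    ...
</cmd:ResourceProxyList>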
