ISO/IEC 19757-7:2009
Information technology - Document Schema Definition Languages (DSDL) - Part 7: Character Repertoire Description Language (CREPDL)
ISO/IEC 19757 defines a set of Document Schema Definition Languages (DSDL) that can be used to specify one or more validation processes performed against Extensible Markup Language (XML) documents. ISO/IEC 19757-7:2009 specifies a Character Repertoire Description Language (CREPDL); a CREPDL schema describes a character repertoire. ISO/IEC 19757-7:2009 introduces kernels and hulls of repertoires, then specifies the syntax of CREPDL schemas and the semantics of a correct CREPDL schema; the semantics specify when a character is in a repertoire described by a CREPDL schema. ISO/IEC 19757-7:2009 defines CREPDL processors and their behaviour. Finally, it describes differences of conformant CREPDL processors, and provides examples of CREPDL schemas.
Technologies de l'information — Langages de définition de schéma de documents (DSDL) — Partie 7: Langage de description de répertoire de caractères (CREPDL)
ISO/IEC 19757-7:2009 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - Document Schema Definition Languages (DSDL) - Part 7: Character Repertoire Description Language (CREPDL)".
ISO/IEC 19757-7:2009 is classified under the following ICS (International Classification for Standards) categories: 35.240.30 - IT applications in information, documentation and publishing. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/IEC 19757-7:2009 has the following relationships with other standards: it is linked to ISO/IEC 19757-7:2020. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
INTERNATIONAL STANDARD ISO/IEC 19757-7
First edition
2009-12-15
Information technology — Document
Schema Definition Languages (DSDL) —
Part 7:
Character Repertoire Description
Language (CREPDL)
Technologies de l'information — Langages de définition de schéma de
documents (DSDL) —
Partie 7: Langage de description de répertoire de caractères (CREPDL)
Reference number
© ISO/IEC 2009
© ISO/IEC 2009
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Notation
5 Repertoire, kernel, and hull
6 Syntax
6.1 General
6.2 RELAX NG schema
6.3 NVDL script
6.4 Regular expressions
7 Semantics
7.1 General
7.2 char
7.3 union
7.4 intersection
7.5 difference
7.6 ref
7.7 repertoire
8 Validation
Annex A (informative) Differences of Conformant Processors
Annex B (informative) Example CREPDL schemas
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 19757-7 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 34, Document description and processing languages.
ISO/IEC 19757 consists of the following parts, under the general title Information technology — Document
Schema Definition Languages (DSDL):
⎯ Part 1: Overview
⎯ Part 2: Regular-grammar-based validation — RELAX NG
⎯ Part 3: Rule-based validation — Schematron
⎯ Part 4: Namespace-based Validation Dispatching Language (NVDL)
⎯ Part 5: Extensible datatypes
⎯ Part 7: Character Repertoire Description Language (CREPDL)
⎯ Part 8: Document Semantics Renaming Language (DSRL)
⎯ Part 9: Namespace and datatype declaration in Document Type Definitions (DTDs)
Introduction
ISO/IEC 19757 defines a set of Document Schema Definition Languages (DSDL) that can be used to specify
one or more validation processes performed against Extensible Markup Language (XML) documents. A
number of validation technologies are standardized in DSDL to complement those already available as
standards or from industry.
The main objective of ISO/IEC 19757 is to bring together different validation-related technologies to form a
single extensible framework that allows technologies to work in series or in parallel to produce a single or a
set of validation results. The extensibility of DSDL accommodates validation technologies not yet designed or
specified.
This part of ISO/IEC 19757 provides a language for describing character repertoires. Descriptions in this
language may be referenced from schemas. Furthermore, they may also be referenced from forms and
stylesheets.
NOTE At present, no schema languages provide mechanisms for referencing CREPDL schemas.
Descriptions of repertoires need not be exact. Non-exact descriptions are made possible by kernels and hulls,
which provide the lower and upper limits, respectively.
INTERNATIONAL STANDARD ISO/IEC 19757-7:2009(E)
Information technology — Document Schema Definition
Languages (DSDL) —
Part 7:
Character Repertoire Description Language (CREPDL)
1 Scope
This part of ISO/IEC 19757 specifies a Character Repertoire Description Language (CREPDL); a CREPDL
schema describes a character repertoire. This part of ISO/IEC 19757 introduces kernels and hulls of
repertoires, then specifies the syntax of CREPDL schemas and the semantics of a correct CREPDL schema;
the semantics specify when a character is in a repertoire described by a CREPDL schema. This part of
ISO/IEC 19757 defines CREPDL processors and their behaviour. Finally, it describes differences of
conformant CREPDL processors, and provides examples of CREPDL schemas.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
NOTE Each of the following documents has a unique identifier that is used to cite the document in the text. The
unique identifier consists of the part of the reference up to the first comma.
ISO/IEC 10646, Information technology — Universal Multiple-Octet Coded Character Set (UCS)
ISO/IEC 19757-2, Information technology — Document Schema Definition Language (DSDL) — Part 2:
Regular-grammar-based validation — RELAX NG
ISO/IEC 19757-4, Information technology — Document Schema Definition Languages (DSDL) — Part 4:
Namespace-based Validation Dispatching Language (NVDL)
W3C XML, Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation, 16 August 2006,
available at http://www.w3.org/TR/2006/REC-xml-20060816
W3C XML-Names, Namespaces in XML 1.0 (Second Edition), W3C Recommendation, 16 August 2006,
available at http://www.w3.org/TR/2006/REC-xml-names-20060816
W3C XML Schema Part 2, XML Schema Part 2: Datatypes (Second Edition), W3C Recommendation, 28
October 2004, available at http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
IETF RFC 3987, Internationalized Resource Identifiers (IRIs), Internet Standards Track Specification, January
2005, available at http://www.ietf.org/rfc/rfc3987.txt
IANA Charsets, IANA CHARACTER SETS, Internet Assigned Numbers Authority, available at
http://www.iana.org/assignments/character-sets
Unicode, The Unicode Standard, The Unicode Consortium, available at http://www.unicode.org/
CLDR, Unicode Common Locale Data Repository, The Unicode Consortium, available at
http://www.unicode.org/cldr/
3 Terms and definitions
For the purposes of this document, the terms “character” and “repertoire” as defined in ISO/IEC 10646 and the
following apply.
3.1
kernel
set of characters that are guaranteed to be in the repertoire
3.2
hull
set of characters that may be in the repertoire
4 Notation
in(x, A): character x is in the repertoire described by a CREPDL element A
not-in(x, A): character x is not in the repertoire described by a CREPDL element A
unknown(x, A): it is unknown whether character x is in the repertoire described by a CREPDL element A
5 Repertoire, kernel, and hull
A repertoire shall be described by specifying a kernel and hull. Kernels and hulls shall be sets of characters.
A character shall be in a repertoire when it is in the kernel. A sequence of characters shall be in a repertoire
when every one of its characters is in the kernel.
A character shall not be in a repertoire when it is in neither the hull nor the kernel. A sequence of characters
shall not be in a repertoire when at least one of its characters is in neither the kernel nor the hull.
It shall be unknown whether or not a character is in a repertoire when it is in the hull but is not in the kernel. It
shall be unknown whether or not a sequence of characters is in a repertoire when at least one of its
characters is not in the kernel but every character is in the hull or the kernel.
NOTE 1 Kernel and hull are borrowed from W3C Note-charcol[3]. Some examples in Annex B are also borrowed from it.
NOTE 2 It may be impossible to specify a repertoire exactly, since characters may continue to be added to the
repertoire. However, it is often possible to specify which character is absolutely included, and which character is absolutely
excluded. Kernels and hulls help to describe such open repertoires. A kernel is used to specify those characters which are
guaranteed to be in the repertoire, while a hull is used to specify an outer boundary. An example of such open repertoires
is shown in B.4.
NOTE 3 This part of ISO/IEC 19757 can handle sets of characters, but cannot handle sets of sequences of characters.
In other words, CREPDL schemas cannot indicate that a combining character is allowed only when it directly follows some
base character. Likewise, CREPDL schemas cannot handle named sequences, but can only handle characters occurring
in named sequences. It is believed that this part of ISO/IEC 19757 needs this limitation, since implementations become
significantly easier.
NOTE 4 It is possible but not recommended to specify a hull that disallows some character in the corresponding kernel.
Note that the condition that a character is in a repertoire does not mention the hull.
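The rules of this clause can be read as a three-valued membership test. The following Python sketch is illustrative only and is not part of the standard; the names Membership, classify_char and classify_sequence are hypothetical. It assumes that kernels and hulls are given as plain sets of characters, that a sequence is in the repertoire only when every one of its characters is in the kernel, and that an empty sequence counts as in.

```python
from enum import Enum


class Membership(Enum):
    IN = "in"
    NOT_IN = "not-in"
    UNKNOWN = "unknown"


def classify_char(ch, kernel, hull):
    """Classify one character against a kernel and a hull (sets of characters)."""
    if ch in kernel:
        # The hull is not consulted for kernel characters (cf. NOTE 4).
        return Membership.IN
    if ch in hull:
        return Membership.UNKNOWN
    return Membership.NOT_IN


def classify_sequence(text, kernel, hull):
    """Classify a sequence of characters by combining the per-character results."""
    results = [classify_char(ch, kernel, hull) for ch in text]
    if any(r is Membership.NOT_IN for r in results):
        return Membership.NOT_IN      # one character lies outside kernel and hull
    if any(r is Membership.UNKNOWN for r in results):
        return Membership.UNKNOWN     # all characters allowed, but not all guaranteed
    return Membership.IN              # every character is in the kernel
```

With kernel set("abc") and hull set("abcdef"), the character "d" is classified as unknown, and the sequence "abz" as not-in because "z" is in neither set.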
6 Syntax
6.1 General
A CREPDL schema shall be an XML document (W3C XML) valid against the NVDL (ISO/IEC 19757-4)
script in 6.3, which in turn relies on the RELAX NG (ISO/IEC 19757-2) schema in 6.2. The elements
allowed in the RELAX NG schema are of the namespace (W3C XML-Names)
http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0. Further constraints on the character
content of the char, kernel or hull elements are shown in 6.4.
NOTE 1 W3C XML 1.1[6] shall not be used for representing CREPDL schemas.
NOTE 2 W3C XML specifies that characters in XML documents are either U+0009 (CHARACTER TABULATION),
U+000A (LINE FEED), U+000D (CARRIAGE RETURN), or a character in the ranges from U+0020 to U+D7FF, U+E000 to
U+FFFD, or U+10000 to U+10FFFF. In other words, XML documents cannot contain U+0000, U+0001, U+0002, U+0003,
U+0004, U+0005, U+0006, U+0007, U+0008, U+000B, U+000C, U+000E, U+000F, U+0010, U+0011, U+0012, U+0013,
U+0014, U+0015, U+0016, U+0017, U+0018, U+0019, U+001A, U+001B, U+001C, U+001D, U+001E, or U+001F. Since
CREPDL schemas are represented by XML documents, these characters cannot directly occur in CREPDL schemas.
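The character ranges listed in NOTE 2 can be checked mechanically. The following Python sketch is illustrative only; the function name is_xml10_char is hypothetical. It encodes the ranges exactly as NOTE 2 states them and then derives the excluded code points below U+0020.

```python
def is_xml10_char(code_point: int) -> bool:
    """Return True if the code point may occur in an XML 1.0 document (per NOTE 2)."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# Code points below U+0020 that cannot occur directly in a CREPDL schema.
excluded_c0 = [cp for cp in range(0x20) if not is_xml10_char(cp)]
```

The list excluded_c0 holds the same code points that NOTE 2 enumerates, from U+0000 through U+001F except U+0009, U+000A and U+000D.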
6.2 RELAX NG schema
#$Id: crepdl.rnc 5 2009-05-02 09:48:49Z makoto $
#
# The following permission notice and disclaimer shall be included in all
# copies of this schema ("the Schema"), and derivations of the Schema:
#
# Permission is hereby granted, free of charge in perpetuity, to any
# person obtaining a copy of the Schema, to use, copy, modify, merge and
# distribute free of charge, copies of the Schema for the purposes of
# developing, implementing, installing and using software based on the
# Schema, and to permit persons to whom the Schema is furnished to do so,
# subject to the following conditions:
#
# THE SCHEMA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SCHEMA OR THE USE OR
# OTHER DEALINGS IN THE SCHEMA.
#
# In addition, any modified copy of the Schema shall include the following
# notice:
#
# THIS SCHEMA HAS BEEN MODIFIED FROM THE SCHEMA DEFINED IN ISO/IEC 19757-7,
# AND SHOULD NOT BE INTERPRETED AS COMPLYING WITH THAT STANDARD.
default namespace = "http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"

start = coll

coll =
  union | intersection | difference | ref | repertoire | char

union = element union { commonAtts, coll+ }
intersection = element intersection { commonAtts, coll+ }
difference = element difference { commonAtts, coll+ }

ref =
  element ref {
    commonAtts,
    attribute href { xsd:anyURI }
  }

repertoire =
  element repertoire {
    commonAtts,
    attribute registry { text },
    attribute version { text }?,
    (attribute name { text } | attribute number { xsd:int })
  }

char =
  element char {
    commonAtts,
    (text
     | element kernel { commonAtts, text }
     | element hull { commonAtts, text }
     | (element kernel { commonAtts, text },
        element hull { commonAtts, text }))
  }

commonAtts =
  attribute minUcsVersion { text }?,
  attribute maxUcsVersion { text }?

# Note that xml:id is allowed, since any foreign attribute is
# allowed by the NVDL script.
The value of a minUcsVersion or maxUcsVersion attribute shall be a string indicating a version number of
the Unicode standard, possibly having leading or trailing whitespace.
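The shape that this grammar imposes on a schema can be illustrated with a loose structural check. The following Python sketch is not a RELAX NG or NVDL validator; it only walks the element tree with the standard library and mirrors the element and attribute rules above. The names looks_like_coll and local_name, and the SAMPLE schema, are hypothetical. Elements outside the CREPDL namespace are ignored, on the assumption that the NVDL script permits them, and attribute datatypes and mixed content are not checked.

```python
import xml.etree.ElementTree as ET

CREPDL_NS = "http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"
COLLECTIONS = {"union", "intersection", "difference", "ref", "repertoire", "char"}


def local_name(element):
    """Return the local name if the element is in the CREPDL namespace, else None."""
    prefix = "{" + CREPDL_NS + "}"
    tag = element.tag
    if isinstance(tag, str) and tag.startswith(prefix):
        return tag[len(prefix):]
    return None


def looks_like_coll(element):
    """Loosely check an element against the 'coll' pattern of the schema."""
    name = local_name(element)
    if name not in COLLECTIONS:
        return False
    # Foreign elements are skipped rather than rejected.
    children = [c for c in element if local_name(c) is not None]
    if name in ("union", "intersection", "difference"):
        return len(children) >= 1 and all(looks_like_coll(c) for c in children)
    if name == "ref":
        return "href" in element.attrib and not children
    if name == "repertoire":
        # Exactly one of 'name' and 'number' is required, plus 'registry'.
        has_name = "name" in element.attrib
        has_number = "number" in element.attrib
        return "registry" in element.attrib and has_name != has_number and not children
    # char: text only, kernel only, hull only, or kernel followed by hull.
    child_names = [local_name(c) for c in children]
    return child_names in ([], ["kernel"], ["hull"], ["kernel", "hull"])


# A hypothetical schema: lowercase letters, plus digits with a wider hull.
SAMPLE = """<union xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0">
  <char>[a-z]</char>
  <char><kernel>[0-9]</kernel><hull>[0-9A-F]</hull></char>
</union>"""

sample_ok = looks_like_coll(ET.fromstring(SAMPLE))
```

A production check would instead run a real NVDL and RELAX NG validator against the script in 6.3 and the schema in 6.2.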
6.3 NVDL script
schemaType="application/relax-ng-compact-syntax">
NOTE This NVDL script allows foreign elements and attributes everywhere.
6.4 Regular expressions
The character content of a char, kernel or hull element shall be a regular expression that matches either
Char or charClass as specified in W3C XML Schema Part 2.
NOTE 1 Since this part of ISO/IEC 19757 uses regular expressions for representing sets of characters rather than sets
of strings, regular expressions are restricted to Char and charClass.
NOTE 2 The following rules are duplicated from W3C XML Schema Part 2 for information. The semantics of [29]
through [37] depend on the version of Unicode.
[10] Char ::= [^.\?*+()|#x5B#x5D]
[11] charClass ::= charClassEsc | charClassExpr | WildcardEsc
[12] charClassExpr ::= '[' charGroup ']'
[13] charGroup ::= posCharGroup | negCharGroup | charClassSub
[14] posCharGroup ::= ( charRange | charClassEsc )+
[15] negCharGroup ::= '^' posCharGroup
[16] charClassSub ::= ( posCharGroup | negCharGroup )
'-' charClassExpr
[17] charRange ::= seRange | XmlCharIncDash
[18] seRange ::= charOrEsc '-' charOrEsc
[20] charOrEsc ::= XmlChar | SingleCharEsc
[21] XmlChar ::= [^\#x2D#x5B#x5D]
[22] XmlCharIncDash ::= [^\#x5B#x5D]
[23] charClassEsc ::= ( SingleCharEsc | MultiCharEsc
| catEsc | complEsc )
[24] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
[25] catEsc ::= '\p{' charProp '}'
[26] complEsc ::= '\P{' charProp '}'
[27] charProp ::= IsCategory | IsBlock
[28] IsCategory ::= Letters | Marks | Numbers
| Punctuation | Separators | Symbols | Others
[29] Letters ::= 'L' [ultmo]?
[30] Marks ::= 'M' [nce]?
[31] Numbers ::= 'N' [dlo]?
[32] Punctuation ::= 'P' [cdseifo]?
[33] Separators ::= 'Z' [slp]?
[34] Symbols ::= 'S' [mcko]?
[35] Others ::= 'C' [cfon]?
[36] IsBlock ::= 'Is' [a-zA-Z0-9#x2D]+
[37] MultiCharEsc ::= '\' [sSiIcCdDwW]
[37a] WildcardEsc ::= '.'
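Rules [25] through [35] can be illustrated for the general-category case. The following Python sketch is illustrative only; the name matches_category_escape is hypothetical. It handles only catEsc and complEsc with an IsCategory property and does not cover IsBlock, character groups or the other escapes. It relies on the standard unicodedata module, so the answers depend on the Unicode version bundled with the Python interpreter, in the same way that NOTE 2 says the semantics depend on the version of Unicode.

```python
import re
import unicodedata

# Second letters allowed after each major category, following rules [29] to [35].
SUBCATEGORIES = {
    "L": "ultmo",
    "M": "nce",
    "N": "dlo",
    "P": "cdseifo",
    "Z": "slp",
    "S": "mcko",
    "C": "cfon",
}

# Matches \p{X} or \P{X} where X is one uppercase letter plus an optional lowercase letter.
CAT_ESC = re.compile(r"\\([pP])\{([A-Z])([a-z]?)\}")


def matches_category_escape(expr, ch):
    """Return True if character ch matches a \\p{...} or \\P{...} category escape."""
    m = CAT_ESC.fullmatch(expr)
    if m is None:
        raise ValueError(f"not a category escape: {expr!r}")
    kind, major, minor = m.groups()
    if major not in SUBCATEGORIES or (minor and minor not in SUBCATEGORIES[major]):
        raise ValueError(f"unknown category in: {expr!r}")
    in_category = unicodedata.category(ch).startswith(major + minor)
    # \P{...} is the complement of \p{...}.
    return in_category != (kind == "P")
```

For example, matches_category_escape(r"\p{Lu}", "A") holds because the general category of "A" is Lu, while an expression such as \p{Lx} is rejected because "x" is not listed in rule [29].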
NOTE 3 Since W3C REC-xpath-functions[4] extends the definition of regular expressions in W3C XML Schem
...
INTERNATIONAL ISO/IEC
STANDARD 19757-7
First edition
2009-12-15
Information technology — Document
Schema Definition Languages (DSDL) —
Part 7:
Character Repertoire Description
Language (CREPDL)
Technologies de l'information — Langages de définition de schéma de
documents (DSDL) —
Partie 7: Langage de description de répertoire de caractères (CREPDL)
Reference number
©
ISO/IEC 2009
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.
© ISO/IEC 2009
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO/IEC 2009 – All rights reserved
Contents Page
Foreword .iv
Introduction.v
1 Scope.1
2 Normative references.1
3 Terms and definitions .2
4 Notation .2
5 Repertoire, kernel, and hull .2
6 Syntax.3
6.1 General.3
6.2 RELAX NG schema.3
6.3 NVDL script.4
6.4 Regular expressions .5
7 Semantics.6
7.1 General.6
7.2 char.6
7.3 union.7
7.4 intersection.7
7.5 difference.7
7.6 ref.8
7.7 repertoire.8
8 Validation.9
Annex A (informative) Differences of Conformant Processors.10
Annex B (informative) Example CREPDL schemas.11
Bibliography.15
© ISO/IEC 2009 – All rights reserved iii
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 19757-7 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 34, Document description and processing languages.
ISO/IEC 19757 consists of the following parts, under the general title Information technology — Document
Schema Definition Languages (DSDL):
⎯ Part 1: Overview
⎯ Part 2: Regular-grammar-based validation — RELAX NG
⎯ Part 3: Rule-based validation — Schematron
⎯ Part 4: Namespace-based Validation Dispatching Language (NVDL)
⎯ Part 5: Extensible datatypes
⎯ Part 7: Character Repertoire Description Language (CREPDL)
⎯ Part 8: Document Semantics Renaming Language (DSRL)
⎯ Part 9: Namespace and datatype declaration in Document Type Definitions (DTDs)
iv © ISO/IEC 2009 – All rights reserved
Introduction
ISO/IEC 19757 defines a set of Document Schema Definition Languages (DSDL) that can be used to specify
one or more validation processes performed against Extensible Markup Language (XML) documents. A
number of validation technologies are standardized in DSDL to complement those already available as
standards or from industry.
The main objective of ISO/IEC 19757 is to bring together different validation-related technologies to form a
single extensible framework that allows technologies to work in series or in parallel to produce a single or a
set of validation results. The extensibility of DSDL accommodates validation technologies not yet designed or
specified.
This part of ISO/IEC 19757 provides a language for describing character repertoires. Descriptions in this
language may be referenced from schemas. Furthermore, they may also be referenced from forms and
stylesheets.
NOTE At present, no schema languages provide mechanisms for referencing CREPDL schemas.
Descriptions of repertoires need not be exact. Non-exact descriptions are made possible by kernels and hulls,
which provide the lower and upper limits, respectively.
© ISO/IEC 2009 – All rights reserved v
INTERNATIONAL STANDARD ISO/IEC 19757-7:2009(E)
Information technology — Document Schema Definition
Languages (DSDL) —
Part 7:
Character Repertoire Description Language (CREPDL)
1 Scope
This part of ISO/IEC 19757 specifies a Character Repertoire Description Language (CREPDL); a CREPDL
schema describes a character repertoire. This part of ISO/IEC 19757 introduces kernels and hulls of
repertoires, then specifies the syntax of CREPDL schemas and the semantics of a correct CREPDL schema;
the semantics specify when a character is in a repertoire described by a CREPDL schema. This part of
ISO/IEC 19757 defines CREPDL processors and their behaviour. Finally, it describes differences of
conformant CREPDL processors, and provides examples of CREPDL schemas.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
NOTE Each of the following documents has a unique identifier that is used to cite the document in the text. The
unique identifier consists of the part of the reference up to the first comma.
ISO/IEC 10646, Information technology — Universal Multiple-Octet Coded Character Set (UCS)
ISO/IEC 19757-2, Information technology — Document Schema Definition Language (DSDL) — Part 2:
Regular-grammar-based validation — RELAX NG
ISO/IEC 19757-4, Information technology — Document Schema Definition Languages (DSDL) — Part 4:
Namespace-based Validation Dispatching Language (NVDL)
W3C XML, Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation, 16 August 2006,
available at http://www.w3.org/TR/2006/REC-xml-20060816
W3C XML-Names, Namespaces in XML 1.0 (Second Edition), W3C Recommendation, 16 August 2006,
available at http://www.w3.org/TR/2006/REC-xml-names-20060816
W3C XML Schema Part 2, XML Schema Part 2: Datatypes (Second Edition), W3C Recommendation, 28
October 2004, available at http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
IETF RFC 3987, Internationalized Resource Identifiers (IRIs), Internet Standards Track Specification, January
2005, available at http://www.ietf.org/rfc/rfc3987.txt
IANA Charsets, IANA CHARACTER SETS, Internet Assigned Numbers Authority, available at
http://www.iana.org/assignments/character-sets
© ISO/IEC 2009 – All rights reserved 1
Unicode, The Unicode Standard, The Unicode Consortium, available at http://www.unicode.org/
CLDR, Unicode Common Locale Data Repository, The Unicode Consortium, available at
http://www.unicode.org/cldr/
3 Terms and definitions
For the purposes of this document, the terms “character” and “repertoire” as defined in ISO/IEC 10646 and the
following apply.
3.1
kernel
set of characters that are guaranteed to be in the repertoire
3.2
hull
set of characters that may be in the repertoire
4 Notation
in(x, A): character x is in the repertoire described by a CREPDL element A
not-in(x, A): character x is not in the repertoire described by a CREPDL element A
unknown(x, A): it is unknown whether character x is in the repertoire described by a CREPDL element A
5 Repertoire, kernel, and hull
A repertoire shall be described by specifying a kernel and hull. Kernels and hulls shall be sets of characters.
A character shall be in a repertoire when it is in the kernel. A sequence of characters shall be in a repertoire
when any of the characters is in the kernel.
A character shall not be in a repertoire when it is in neither the hull nor the kernel. A sequence of characters
shall be not in a repertoire when at least one of the characters is in neither the kernel nor the hull.
It shall be unknown whether or not a character is in a repertoire when it is in the hull but is not in the kernel. It
shall be unknown whether or not a sequence of characters is in a reperoire when at least one of the
characters is not in the kernel but any of the characters is in the hull or kernel.
NOTE 1 Kernel and hull are borrowed from W3C Note-charcol[3]. Some examples in Annex B also borrowed.
NOTE 2 It may be impossible to specify a repertoire exactly, since characters may continue to be added to the
repertoire. However, it is often possible to specify which character is absolutely included, and which character is absolutely
excluded. Kernels and hulls help to describe such open repertoires. A kernel is used to specify those characters which are
guaranteed to be in the repertoire, while a hull is used to specify an outer boundary. An example of such open repertoires
is shown in B.4.
NOTE 3 This part of ISO/IEC 19757 can handle sets of characters, but cannot handle sets of sequences of characters.
In other words, CREPDL schemas cannot indicate that a combining character is allowed only when it directly follows some
base character. Likewise, CREPDL schemas cannot handle named sequences, but can only handle characters occurring
in named sequences. It is believed that this part of ISO/IEC 19757 needs this limitation, since implementations become
significantly easier.
NOTE 4 It is possible but not recommended to specify a hull that disallows some character in the corresponding kernel.
Note that the condition that a character is in a repertoire does not mention the hull.
2 © ISO/IEC 2009 – All rights reserved
6 Syntax
6.1 General
A CREPDL schema shall be an XML document (W3C XML) valid against the the NVDL (ISO/IEC 19757-4)
script in 6.3, which in turn relies on the RELAX NG (ISO/IEC 19757-2) schema in 6.2. The elements
allowed in the RELAX NG schema are of the name space (W3C XML-Names)
http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0. Further constraints on the character
content of the char, kernel or hull elements are shown in 6.4
NOTE 1 W3C XML 1.1[6] shall not be used for representing CREPDL schemas.
NOTE 2 W3C XML specifies that characters in XML documents are either U+0009 (CHARACTER TABULATION),
U+000A (LINE FEED), U+000D (CARRIAGE RETURN), or a character in the ranges from U+0020 to U+D7FF, U+E000 to
U+FFFD, or U+10000 to U+10FFFF. In other words, XML documents cannot contain U+0000, U+0001, U+0002, U+0003,
U+0004, U+0005, U+0006, U+0007, U+0008, U+000B, U+000C, U+000E, U+000F, U+0010, U+0011, U+0012, U+0013,
U+0014, U+0015, U+0016, U+0017, U+0018, U+0019, U+001A, U+001B, U+001C, U+001D, U+001E, or U+001F. Since
CREPDL schemas are represented by XML documents, these characters cannot directly occur in CREPDL schemas.
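The ranges listed in NOTE 2 can be checked mechanically. The following Python sketch is illustrative only; the function name `is_xml10_char` is hypothetical and the ranges are taken directly from the note.

```python
def is_xml10_char(cp):
    """Return True if code point cp may occur in an XML 1.0 document."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Code points below U+0020 that cannot occur directly in a CREPDL schema.
EXCLUDED_C0 = [cp for cp in range(0x20) if not is_xml10_char(cp)]

if __name__ == "__main__":
    print(", ".join(f"U+{cp:04X}" for cp in EXCLUDED_C0))
```

Running the script prints the same list of excluded control characters that the note enumerates.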
6.2 RELAX NG schema
#$Id: crepdl.rnc 5 2009-05-02 09:48:49Z makoto $
#
# The following permission notice and disclaimer shall be included in all
# copies of this schema ("the Schema"), and derivations of the Schema:
#
# Permission is hereby granted, free of charge in perpetuity, to any
# person obtaining a copy of the Schema, to use, copy, modify, merge and
# distribute free of charge, copies of the Schema for the purposes of
# developing, implementing, installing and using software based on the
# Schema, and to permit persons to whom the Schema is furnished to do so,
# subject to the following conditions:
#
# THE SCHEMA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SCHEMA OR THE USE OR
# OTHER DEALINGS IN THE SCHEMA.
#
# In addition, any modified copy of the Schema shall include the following
# notice:
#
# THIS SCHEMA HAS BEEN MODIFIED FROM THE SCHEMA DEFINED IN ISO/IEC 19757-7,
# AND SHOULD NOT BE INTERPRETED AS COMPLYING WITH THAT STANDARD.
default namespace = "http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"
start = coll
coll =
  union | intersection | difference | ref | repertoire | char
union = element union { commonAtts, coll+ }
intersection = element intersection { commonAtts, coll+ }
difference = element difference { commonAtts, coll+ }
ref =
  element ref {
    commonAtts,
    attribute href { xsd:anyURI }
  }
repertoire =
  element repertoire {
    commonAtts,
    attribute registry { text },
    attribute version { text }?,
    (attribute name { text } | attribute number { xsd:int })
  }
char =
  element char {
    commonAtts,
    (text
     | element kernel { commonAtts, text }
     | element hull { commonAtts, text }
     | (element kernel { commonAtts, text },
        element hull { commonAtts, text }))
  }
commonAtts =
  attribute minUcsVersion { text }?,
  attribute maxUcsVersion { text }?
# Note that xml:id is allowed, since any foreign attribute is
# allowed by the NVDL script.
The value of a minUcsVersion or maxUcsVersion attribute shall be a string indicating a version number of
the Unicode standard, possibly having leading or trailing whitespace.
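As an illustration of the schema in 6.2, the following Python sketch reads a small char element that carries both a kernel and a hull. The sample document, the helper name `read_char` and the version string "5.0" are assumptions made for this example and are not taken from this part of ISO/IEC 19757. The sketch does not validate against the RELAX NG schema; it only extracts the pieces a processor would need and strips the whitespace permitted around the version attribute.

```python
import xml.etree.ElementTree as ET

CREPDL_NS = "http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"

# Hypothetical sample schema: kernel is a-z, hull is a-z plus A-Z.
SAMPLE = """<char xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"
      minUcsVersion=" 5.0 ">
  <kernel>[a-z]</kernel>
  <hull>[a-zA-Z]</hull>
</char>"""


def read_char(xml_text):
    """Extract kernel, hull and minUcsVersion from a CREPDL char element."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{CREPDL_NS}}}char":
        raise ValueError("root element is not a CREPDL char element")
    version = root.get("minUcsVersion")
    return {
        # findtext returns None when the child element is absent.
        "kernel": root.findtext(f"{{{CREPDL_NS}}}kernel"),
        "hull": root.findtext(f"{{{CREPDL_NS}}}hull"),
        # Leading and trailing whitespace is allowed, so strip it.
        "minUcsVersion": version.strip() if version is not None else None,
    }


if __name__ == "__main__":
    print(read_char(SAMPLE))
```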
6.3 NVDL script
schemaType="application/relax-ng-compact-syntax">
NOTE This NVDL script allows foreign elements and attributes everywhere.
6.4 Regular expressions
The character content of a char, kernel or hull element shall be a regular expression that matches either
Char or charClass as specified in W3C XML Schema Part 2.
NOTE 1 Since this part of ISO/IEC 19757 uses regular expressions for representing sets of characters rather than sets
of strings, regular expressions are restricted to Char and charClass.
NOTE 2 The following rules are duplicated from W3C XML Schema Part 2 for information. The semantics of [29]
through [37] depend on the version of Unicode.
[10] Char ::= [^.\?*+()|#x5B#x5D]
[11] charClass ::= charClassEsc | charClassExpr | WildcardEsc
[12] charClassExpr ::= '[' charGroup ']'
[13] charGroup ::= posCharGroup | negCharGroup | charClassSub
[14] posCharGroup ::= ( charRange | charClassEsc )+
[15] negCharGroup ::= '^' posCharGroup
[16] charClassSub ::= ( posCharGroup | negCharGroup ) '-' charClassExpr
[17] charRange ::= seRange | XmlCharIncDash
[18] seRange ::= charOrEsc '-' charOrEsc
[20] charOrEsc ::= XmlChar | SingleCharEsc
[21] XmlChar ::= [^\#x2D#x5B#x5D]
[22] XmlCharIncDash ::= [^\#x5B#x5D]
[23] charClassEsc ::= ( SingleCharEsc | MultiCharEsc | catEsc | complEsc )
[24] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
[25] catEsc ::= '\p{' charProp '}'
[26] complEsc ::= '\P{' charProp '}'
[27] charProp ::= IsCategory | IsBlock
[28] IsCategory ::= Letters | Marks | Numbers | Punctuation | Separators | Symbols | Others
[29] Letters ::= 'L' [ultmo]?
[30] Marks ::= 'M' [nce]?
[31] Numbers ::= 'N' [dlo]?
[32] Punctuation ::= 'P' [cdseifo]?
[33] Separators ::= 'Z' [slp]?
[34] Symbols ::= 'S' [mcko]?
[35] Others ::= 'C' [cfon]?
[36] IsBlock ::= 'Is' [a-zA-Z0-9#x2D]+
[37] MultiCharEsc ::= '\' [sSiIcCdDwW]
[37a] WildcardEsc ::= '.'
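The dependence of productions [29] through [35] on the Unicode version can be seen in a small Python sketch. It is an approximation under stated assumptions: the function name `in_category` is hypothetical, only the IsCategory form of charProp is handled (not IsBlock), and the result reflects whichever Unicode version the running Python interpreter ships, reported by `unicodedata.unidata_version`.

```python
import re
import unicodedata

# Property names accepted by productions [29] through [35].
_IS_CATEGORY = re.compile(
    r"L[ultmo]?|M[nce]?|N[dlo]?|P[cdseifo]?|Z[slp]?|S[mcko]?|C[cfon]?"
)


def in_category(ch, prop):
    """Approximate whether the single character ch matches \\p{prop}."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if not _IS_CATEGORY.fullmatch(prop):
        raise ValueError(f"not an IsCategory name: {prop!r}")
    # A one-letter name such as 'L' covers every subcategory starting with it.
    # Caveat: Python also reports 'Cs' for lone surrogates, which are not
    # XML characters and are not covered by production [35].
    return unicodedata.category(ch).startswith(prop)


if __name__ == "__main__":
    print("Unicode version used:", unicodedata.unidata_version)
    print(in_category("A", "Lu"), in_category("a", "Lu"))
```

A complement escape such as \P{Lu} would correspond to `not in_category(ch, "Lu")`.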
NOTE 3 Since W3C REC-xpath-functions[4] extends the definition of regular expressions in W3C XML Sche
...









