ISO/IEC 30112:2020
Information technology — Specification methods for cultural conventions
This document specifies description formats and functionality for the specification of cultural conventions, description formats for character sets, and description formats for binding character names to ISO/IEC 10646, as well as a set of default values for some of these items.
Technologies de l'information — Méthodes de spécification des conventions culturelles
INTERNATIONAL STANDARD
ISO/IEC 30112
First edition
2020-09
Information technology —
Specification methods for cultural
conventions
Technologies de l'information — Méthodes de spécification des
conventions culturelles
Reference number
ISO/IEC 30112:2020(E)
© ISO/IEC 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Bytes and characters
3.2 Cultural and other major concepts
3.3 FDCC-related categories
4 Notations
4.1 Notation for defining syntax
4.2 Portable character set
5 FDCC-set
5.1 General
5.2 FDCC-set description
5.2.1 General
5.2.2 Character representation
5.2.3 Continuation of lines
5.2.4 Names for copy keyword
5.2.5 Pre-category statements
5.3 LC_IDENTIFICATION
5.4 LC_CTYPE
5.4.1 General
5.4.2 Character classification keywords
5.4.3 Character string transliteration
5.4.4 "i18n" LC_CTYPE category
5.5 LC_COLLATE
5.5.1 General
5.5.2 Collation statements
5.5.3 "copy" keyword
5.5.4 "coll_weight_max" keyword
5.5.5 "section-symbol" keyword
5.5.6 "collating-element" keyword
5.5.7 "collating-symbol" keyword
5.5.8 "symbol-equivalence" keyword
5.5.9 "order_start" keyword
5.5.10 "order_end" keyword
5.5.11 "reorder-after" keyword
5.5.12 "reorder-end" keyword
5.5.13 "section" keyword
5.5.14 "reorder-section-after" keyword
5.6 LC_MONETARY
5.7 LC_NUMERIC
5.8 LC_TIME
5.8.1 General
5.8.2 Date field descriptors
5.8.3 Modified field descriptors
5.8.4 "i18n" LC_TIME category
5.9 LC_MESSAGES
5.10 LC_XLITERATE
5.10.1 General
5.10.2 Transliteration statements
5.10.3 "include" keyword
5.10.4 Example of use of transliteration
5.11 LC_NAME
5.12 LC_ADDRESS
5.13 LC_TELEPHONE
5.14 LC_PAPER
5.15 LC_MEASUREMENT
5.16 LC_KEYBOARD
6 CHARMAP
6.1 General
6.2 Character Set Description Text
7 Repertoiremap
8 Functionality
8.1 General
8.2 The “strpcoll” function
8.3 The “setmedia” function
8.4 String, encoding, repertoire and locale data types
8.4.1 General
8.4.2 String data type
8.4.3 Encoding data type
8.4.4 Repertoire data type
8.4.5 Locale data type
8.4.6 Character handling
8.4.7 String comparison
8.4.8 Message formatting
8.4.9 Conversion between string and other data types
8.4.10 Utilities
9 Messages format
Annex A (informative) Differences from ISO/IEC/IEEE 9945
Annex B (informative) Rationale
Annex C (informative) BNF grammar
Annex D (informative) Relation to taxonomy
Annex E (informative) Implementation in glibc
Annex F (informative) Relation between categories and keywords, and APIs
Annex G (informative) Bindings guidelines
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the
IEC list of patent declarations received (see http://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT),
see www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 35, User interfaces.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
This document defines general mechanisms to specify cultural conventions. It also defines formats for a
number of specific cultural conventions in the areas of character classification and conversion, sorting,
number formatting, monetary formatting, date formatting, message display, addressing of persons,
postal address formatting, and telephone number handling.
The benefits from this document are:
Rigid specification: Using this document, a user can rigidly specify a number of the cultural conventions that apply to their information technology environment.

Cultural adaptability: If an application has been designed and built in a culturally neutral manner, the application can use the specifications as data to its application programming interfaces (APIs), and thus the same application can accommodate different users in a culturally acceptable way to each of the users, without change of the binary application.

Productivity: This document specifies cultural conventions and how to specify data for them. With that data, an application developer is released from getting the different information to support all the cultural environments for the expected customers of the product. The application developer is assured of culturally correct behaviour as specified by the customer, and more markets can potentially be reached as customers can provide the data themselves for markets that were not targeted.

Uniform behaviour: When a number of applications share one cultural specification, which may be supplied from the user or provided by the application or operating system, their behaviour for cultural adaptation becomes uniform.

The specification formats are independent of platforms and specific encoding and they are designed to
be usable from a wide range of programming languages.
A number of cultural conventions, such as spelling, hyphenation rules and terminology, are not
specifiable with this document, but the document provides mechanisms to define new categories and
also new keywords within existing categories. An internationalized application can take advantage of
information provided with the FDCC-set (such as the language) to provide further internationalized
services to the user.
This document defines a format compatible with the one used in ISO/IEC 14651.
This document is upward compatible with elements of ISO/IEC/IEEE 9945, especially those on POSIX
locales and charmaps – a locale or charmap conformant to POSIX specifications will also be conformant
to specifications in this document, while the reverse condition will not hold. Some of the descriptions
are intended to be coded in text files to be used via APIs developed for a number of systems which
comply with ISO/IEC/IEEE 9945.
This document has enhanced functionality in a number of areas such as ISO/IEC 10646 support, more
classification of characters, transliteration, dual (multi) currency support, enhanced date and time
formatting, personal name writing, postal address formatting, telephone number handling, keyboard
handling, and management of categories. There is enhanced support for character sets including
ISO/IEC 2022 handling and an enhanced method to separate the specification of cultural conventions
from an actual encoding via a description of the character repertoire employed. A standard set of values
for all the categories has been defined covering the repertoire of ISO/IEC 10646.
This document has been developed to align with ISO/IEC/IEEE 9945. The major extensions from
ISO/IEC/IEEE 9945 are listed in Annex A.
A rationale for elements of this document is found in Annex B.
A BNF specification of the syntax for formats in this document is given in Annex C.
The relation to the taxonomy of ISO/IEC TR 24785 is listed in Annex D.
A listing of the implementation of the specifications of this document in the GNU C Library (glibc)
is given in Annex E.
The relation between formats and APIs of this document is listed in Annex F.
A guideline for a method to bind APIs of other programming languages to APIs defined in this document
is specified in Annex G.
Information technology — Specification methods for
cultural conventions
1 Scope
This document specifies description formats and functionality for the specification of cultural
conventions, description formats for character sets, and description formats for binding character
names to ISO/IEC 10646, as well as a set of default values for some of these items.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 639 (all parts), Codes for the representation of names of languages
ISO/IEC 2022, Information technology — Character code structure and extension techniques
ISO 3166 (all parts), Codes for the representation of names of countries and their subdivisions
ISO 4217, Codes for the representation of currencies
ISO 8601, Date and time — Representations for information interchange
ISO/IEC 9899, Information technology — Programming languages — C
ISO/IEC/IEEE 9945, Information technology — Portable Operating System Interface (POSIX) Base
Specifications, Issue 7
ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS)
ISO/IEC 14651, Information technology — International string ordering and comparison — Method for
comparing character strings and description of the common template tailorable ordering
ISO/IEC 15897:2011, Information technology — User interfaces — Procedures for the registration of
cultural elements
ISO 15924, Information and documentation — Codes for the representation of names of scripts
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1 Bytes and characters
3.1.1
byte
individually addressable unit of data storage that is equal to or larger than an octet, used to store a
character or a portion of a character
Note 1 to entry: A byte is composed of a contiguous sequence of bits, the number of which is implementation
defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.
3.1.2
character
member of a set of elements used for the organization, control or representation of data
3.1.3
coded character
sequence of one or more bytes representing a single character
3.1.4
text file
file that contains characters organized into one or more lines
3.2 Cultural and other major concepts
3.2.1
cultural convention
data item for information technology that may vary dependent on language, territory, or other cultural
habits
3.2.2
FDCC
formal definition of a cultural convention
cultural convention put into a formal definition scheme
3.2.3
FDCC-set
set of FDCCs
subset of a user's information technology environment that depends on language and cultural
conventions
Note 1 to entry: The FDCC-set is a superset of the "locale" term in C and POSIX.
3.2.4
charmap
definition of a mapping between symbolic character names and character codes, plus related
information
3.2.5
repertoiremap
definition of a mapping between symbolic character names and characters for the repertoire of
characters used in a FDCC-set
Note 1 to entry: This is further described in Clause 7.
3.3 FDCC-related categories
3.3.1
character class
named set of characters sharing an attribute associated with the name of the class
3.3.2
collation
logical ordering of strings according to defined precedence rules
3.3.3
collating element
smallest entity used to determine logical ordering
Note 1 to entry: See collating sequence. A collating element consists of either a single character, or two or more
characters collating as a single entity. The LC_COLLATE category in the associated FDCC-set determines the set of
collating elements.
3.3.4
multicharacter collating element
sequence of two or more characters that collate as an entity
Note 1 to entry: For example, in some languages two characters are sorted as one letter, as in the case for Danish
and Norwegian "aa".
3.3.5
collating sequence
relative order of collating elements as determined by the setting of the LC_COLLATE category in the
applied FDCC-set
3.3.6
equivalence class
set of collating elements with the same primary collation weight
Note 1 to entry: Elements in an equivalence class are typically elements that naturally group together, such as all
accented letters based on the same letter. The collation order of elements within an equivalence class is
determined by the weights assigned on any subsequent levels after the primary weight.
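As an informal illustration of the terms in 3.3, and not part of the specification itself, the following Python sketch splits a string into collating elements, treats "aa" as a multicharacter collating element, and orders strings by primary weight first and secondary weight second. The weight table, the placement of "aa" after "z", and the function names are all hypothetical and cover only a handful of letters.

```python
# Hypothetical two-level weight table: element -> (primary, secondary).
# "a" and "á" share a primary weight, so they belong to one equivalence class.
# "aa" is a multicharacter collating element; its position after "z" is
# an illustrative assumption, not a value taken from any real FDCC-set.
WEIGHTS = {
    "a": (1, 0),
    "á": (1, 1),
    "b": (2, 0),
    "z": (26, 0),
    "aa": (27, 0),
}


def collating_elements(text):
    """Split text into collating elements, preferring two-character matches."""
    elements = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in WEIGHTS:
            elements.append(pair)
            i += 2
        elif text[i] in WEIGHTS:
            elements.append(text[i])
            i += 1
        else:
            raise ValueError(f"no weight defined for {text[i]!r}")
    return elements


def sort_key(text):
    """Build a key that compares all primary weights before any secondary."""
    elements = collating_elements(text)
    primary = tuple(WEIGHTS[e][0] for e in elements)
    secondary = tuple(WEIGHTS[e][1] for e in elements)
    return (primary, secondary)


words = ["aab", "zb", "áb", "ab", "b"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

In this sketch "ab" and "áb" tie on the primary level and are separated only by the secondary weight, which mirrors the role of an equivalence class described in 3.3.6.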
4 Notations
4.1 Notation for defining syntax
In this document, the description of an individual record in a FDCC-set is done using the syntax notation
given in the following.
The syntax notation:
"<format>",[<arg1>,<arg2>,...,<argn>]
The <format> is given in a format string enclosed in double quotes, followed by a number of
parameters, separated by commas. It is similar to the format specification defined in
ISO/IEC/IEEE 9945 and the format specification used in C language printf() function. The format of
each parameter is given by an escape sequence:
%s specifies a string
%d specifies a decimal integer
%c specifies a character
%o specifies an octal integer
%x specifies a hexadecimal integer
A " " (an empty character position) in the syntax string represents one or more blank characters.
All other characters in the format string represent themselves, except:
%% specifies a single %
\n specifies an end-of-line
The notation "..." is used to specify that repetition of the previous specification is optional, and this is
done in both the format string and in the parameter list.
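To make the notation in 4.1 more concrete, the following Python sketch translates a format string into a regular expression and checks a record against it. It is an informal reading of the rules above rather than a definitive implementation. The function name `notation_to_regex`, the regex fragment chosen for each escape sequence (for example, treating %s as a run of non-blank characters), and the sample records are assumptions made for illustration. The optional-repetition notation is not handled.

```python
import re

# Hypothetical mapping from escape sequences to regex fragments.
CONVERSIONS = {
    "s": r"(\S+)",            # string, assumed here to contain no blanks
    "d": r"([+-]?[0-9]+)",    # decimal integer
    "c": r"(.)",              # single character
    "o": r"([0-7]+)",         # octal integer
    "x": r"([0-9A-Fa-f]+)",   # hexadecimal integer
}


def notation_to_regex(fmt):
    """Translate a format string in the 4.1 notation into a compiled regex."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%":
            if i + 1 >= len(fmt):
                raise ValueError("format string ends with a lone %")
            nxt = fmt[i + 1]
            if nxt == "%":
                parts.append("%")              # %% stands for a single %
            elif nxt in CONVERSIONS:
                parts.append(CONVERSIONS[nxt])
            else:
                raise ValueError(f"unsupported escape sequence %{nxt}")
            i += 2
        elif ch == " ":
            parts.append(r"[ \t]+")            # one or more blank characters
            i += 1
        elif ch == "\n":
            parts.append(r"\n")                # end-of-line
            i += 1
        else:
            parts.append(re.escape(ch))        # everything else is literal
            i += 1
    return re.compile("".join(parts))


# A record made of a string, blanks, a hexadecimal integer and an end-of-line.
match = notation_to_regex("%s %x\n").fullmatch("width   1f\n")
print(match.groups())

# A decimal integer followed by a literal percent sign.
percent = notation_to_regex("%d%%\n").fullmatch("50%\n")
print(percent.group(1))
```

A record that lacks the blank-separated second field, such as "abc" followed by an end-of-line, does not match the pattern built from "%s %d\n", so `fullmatch` returns `None` for it.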
4.2 Portable character set
A set of symbolic names for characters in Table 1, which is called the portable character set, is used in
character description text of this specification. The first eight entries in Table 1 are defined in
ISO/IEC 6429 and the rest are defined in ISO/IEC/IEEE 9945 with some additional definitions from
ISO/IEC 10646.
Table 1 — Portable character set
Symbolic name Glyph UCS Description
NULL (NUL)
BELL (BEL)
...