Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet

This document specifies the sequence of characters to be used in the alphabetical ordering of multilingual terminological and lexicographical data (terms, term elements, or words) represented in the Latin alphabet. Character sets of languages represented in the Latin alphabet are taken into account insofar as terminological or lexicographical data have been recorded. Character sets used in internationally standardized transliteration into Latin script are also taken into account. The sequence of alphabetical characters given is intended for multilingual purposes only and is not intended to affect the alphabetical order of any specific language. The main part of this document specifies letter-by-letter ordering of character strings. Annex A treats word-by-word ordering, which is a widely used alternative to this system. Annex B gives two additional rules that can be useful for lexicographical and terminological ordering. Annex C gives ordering rules for chemical names. Annex D lists the character repertoire of the Latin alphabet. Annex E lists languages using the Latin alphabet. Annex F gives alphabetical sequences derived from the sequence specified in this document for a number of languages that use the Latin alphabet. Annex G gives a formal description of the rules laid down in the main part of this document conforming with ISO/IEC 14651.

Mise en ordre alphabétique des données lexicographiques et terminologiques multilingues représentées dans l'alphabet latin

Abecedno urejanje večjezičnih terminoloških in leksikografskih podatkov, predstavljenih v latinici

General Information

Status
Published
Publication Date
13-Jun-2022
Current Stage
9599 - Withdrawal of International Standard
Start Date
15-Apr-2024
Completion Date
13-Dec-2025

Relations

Effective Date
06-Jun-2022

Overview

ISO 12199:2022 specifies a practical, language-neutral method for alphabetical ordering (collation) of multilingual terminological and lexicographical data represented in the Latin alphabet. It defines the sequence of characters and multi-level ordering rules to ensure consistent sorting of terms, term elements and words across systems, databases and printed lists. The standard is intended for multilingual environments and does not replace language‑specific collation rules.

Key topics and technical requirements

  • Character sequence and basic order: Defines the ordering of digits (0–9) followed by the basic Latin letters (a–z, then þ), with uppercase and lowercase treated as equivalent at the first level. The sequence is designed to minimize conflicts between languages in multilingual resources.
  • Multi-level ordering: Specifies first to fourth ordering levels:
    • First level: primary letter-by-letter order (letters and digits).
    • Second level: diacritical marks and special Latin letters treated relative to base letters.
    • Third level: capitalization differences.
    • Fourth level: special characters and punctuation.
  • Equivalence mappings: Special Latin letters and letters with diacritics are mapped to corresponding basic Latin letters for primary ordering (see Table 1 in the standard).
  • Word‑by‑word alternative: Annex A provides a normative word‑by‑word ordering method commonly used instead of pure letter‑by‑letter sorting.
  • Additional rules and domain-specific handling:
    • Annex B: extra lexicographical and terminological rules.
    • Annex C: ordering rules for chemical names.
    • Annex D: Latin alphabet character repertoire.
    • Annex E/F: lists of languages using Latin script and derived alphabetical sequences for selected languages.
    • Annex G: formal rule description conforming with ISO/IEC 14651.
  • Preparatory procedures: The standard notes pre-sorting steps (e.g., case folding, numeral padding, handling polygraphs) but does not mandate extraction or normalization methods.
  • Language sensitivities: Special handling examples (e.g., Turkish dotless/dotted I) are described to support correct multilingual ordering.

Practical applications and typical users

ISO 12199:2022 is used where consistent, language-agnostic ordering is required:

  • Terminologists and lexicographers compiling multilingual glossaries, dictionaries and terminological databases.
  • Software engineers and database designers implementing collation/sorting for multilingual search, index and UI lists.
  • Localization and internationalization specialists ensuring consistent sort order across locales.
  • Libraries, archives and content managers producing multilingual catalogues and indexes.
  • Chemical database curators applying Annex C for name ordering.

Benefits include improved data interchange, predictable user experience in multilingual indexes, and compatibility with other standards-based sorting (e.g., ISO/IEC 14651).

Related standards

  • ISO/IEC 14651 (formal collation specification) - informatively referenced and used as a formal model in Annex G.
  • ISO 1087 (terminology vocabulary) - normative reference.
  • ISO 10241-1 - complementary for terminological documentation.

Keywords: ISO 12199:2022, alphabetical ordering, multilingual collation, Latin alphabet, terminological data, lexicographical ordering, diacritics, character sequence, localization, data interchange.

Standard

ISO 12199:2022 - Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet Released: 14.06.2022

English language
52 pages

Frequently Asked Questions

ISO 12199:2022 is a standard published by the International Organization for Standardization (ISO). Its full title is "Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet". This standard covers: This document specifies the sequence of characters to be used in the alphabetical ordering of multilingual terminological and lexicographical data (terms, term elements, or words) represented in the Latin alphabet. Character sets of languages represented in the Latin alphabet are taken into account insofar as terminological or lexicographical data have been recorded. Character sets used in internationally standardized transliteration into Latin script are also taken into account. The sequence of alphabetical characters given is intended for multilingual purposes only and is not intended to affect the alphabetical order of any specific language. The main part of this document specifies letter-by-letter ordering of character strings. Annex A treats word-by-word ordering, which is a widely used alternative to this system. Annex B gives two additional rules that can be useful for lexicographical and terminological ordering. Annex C gives ordering rules for chemical names. Annex D lists the character repertoire of the Latin alphabet. Annex E lists languages using the Latin alphabet. Annex F gives alphabetical sequences derived from the sequence specified in this document for a number of languages that use the Latin alphabet. Annex G gives a formal description of the rules laid down in the main part of this document conforming with ISO/IEC 14651.

ISO 12199:2022 is classified under the following ICS (International Classification for Standards) categories: 01.020 - Terminology (principles and coordination). The ICS classification helps identify the subject area and facilitates finding related standards.

ISO 12199:2022 has the following relationships with other standards: It replaces ISO 12199:2000. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

You can purchase ISO 12199:2022 directly from iTeh Standards. The document is available in PDF format and is delivered instantly after payment. Add the standard to your cart and complete the secure checkout process. iTeh Standards is an authorized distributor of ISO standards.

Standards Content (Sample)


INTERNATIONAL STANDARD ISO 12199
Second edition
2022-06
Alphabetical ordering of multilingual
terminological and lexicographical
data represented in the Latin alphabet
Mise en ordre alphabétique des données lexicographiques et
terminologiques multilingues représentées dans l'alphabet latin
Reference number
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Preparatory procedures
5 First ordering level
5.1 First-ordering-level values
5.2 First-ordering-level sequence
5.3 Equivalence between special Latin letters and basic letters
6 Second ordering level
6.1 Second-ordering-level values
6.2 Special Latin letters and letters with diacritical marks
7 Third ordering level
7.1 Third-ordering-level values
7.2 Ordering according to capitalization
8 Fourth ordering level
8.1 Fourth-ordering-level values
8.2 Ordering according to special characters
Annex A (normative) Word-by-word ordering
Annex B (informative) Special rules for lexicographical and terminological ordering
Annex C (informative) Ordering rules for chemical names
Annex D (informative) Character repertoire of the Latin alphabet
Annex E (informative) Languages using the Latin alphabet
Annex F (informative) Alphabetical sequences and character repertoires
Annex G (informative) Formal description of the rules of the main body of this document
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO’s adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 2, Terminology workflow and language coding.
This second edition cancels and replaces the first edition (ISO 12199:2000), of which it constitutes a
minor revision. The changes are as follows:
— the relationship of this document with other International Standards has been updated and
transferred from the Foreword to the Introduction;
— in Clause 2 and in the Bibliography, the references have been updated;
— ISO/IEC 14651 is cited informatively and therefore has been moved from Clause 2 to the Bibliography;
— in Annexes D, E and F, the Serbian language has been added among the languages using the Latin
alphabet, together with a character set and alphabetical ordering information relating to the Serbian
language;
— in Annex E, the references to Serbo-Croatian have been deleted;
— in Annexes E and F, the entries related to Moldovan have been corrected in line with ISO 639-1 and
ISO 639-2;
— Annex G is cited informatively and therefore has been changed to “(informative)”.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
In the development of international terminologies, both in printed form and in databases, it is essential
to have uniform and internationally recognized rules for the alphabetical ordering of terminological
and lexicographical data, to make these terminologies more easily accessible for the users. In addition,
it will facilitate the interchange of terminological and lexicographical data.
This document complements other International Standards, such as ISO 10241-1.
INTERNATIONAL STANDARD ISO 12199:2022(E)
Alphabetical ordering of multilingual terminological and
lexicographical data represented in the Latin alphabet
1 Scope
This document specifies the sequence of characters to be used in the alphabetical ordering of
multilingual terminological and lexicographical data (terms, term elements, or words) represented
in the Latin alphabet. Character sets of languages represented in the Latin alphabet are taken into
account insofar as terminological or lexicographical data have been recorded. Character sets used in
internationally standardized transliteration into Latin script are also taken into account.
The sequence of alphabetical characters given is intended for multilingual purposes only and is not
intended to affect the alphabetical order of any specific language.
The main part of this document specifies letter-by-letter ordering of character strings. Annex A treats
word-by-word ordering, which is a widely used alternative to this system.
Annex B gives two additional rules that can be useful for lexicographical and terminological ordering.
Annex C gives ordering rules for chemical names.
Annex D lists the character repertoire of the Latin alphabet.
Annex E lists languages using the Latin alphabet.
Annex F gives alphabetical sequences derived from the sequence specified in this document for a
number of languages that use the Latin alphabet.
Annex G gives a formal description of the rules laid down in the main part of this document conforming
with ISO/IEC 14651.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 1087, Terminology work and terminology science — Vocabulary
ISO/IEC 10646-1 (see footnote 1), Information technology — Universal Multiple-Octet Coded Character Set (UCS) —
Part 1: Architecture and Basic Multilingual Plane
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
1) In this minor revision of ISO 12199:2000, reference continues to be made to ISO/IEC 10646-1:1993.
ISO/IEC 10646-1 and ISO/IEC 10646-2 have since been merged into ISO/IEC 10646:2020.
3.1
character
member of a set of elements used for the organization, control or representation of data
3.2
letter
character (3.1) used for writing natural language, often representing a sound in the language
3.3
digit
character (3.1) used to represent the numeric value, or part thereof, of a number
3.4
special character
character (3.1) that is not a letter (3.2) nor a digit (3.3)
EXAMPLE The space character is a special character.
3.5
ligature
character (3.1) resulting from the joining of two or more letters (3.2)
Note 1 to entry: The resulting character is, in some cases, considered a separate letter.
3.6
polygraph
two or more consecutive letters (3.2) that are regarded as one letter for some purpose
Note 1 to entry: A polygraph consisting of two or three letters may be referred to as a digraph or a trigraph,
respectively.
3.7
diacritical mark
character (3.1) that is not a letter (3.2) and is placed over, under, or through a letter or a combination of
letters
3.8
ordering
act of bringing strings of characters (3.1) into a well-defined sequence according to a string comparison
specification
4 Preparatory procedures
In the process of alphabetical ordering, character strings are compared according to a set of rules.
This document specifies the set of rules to be used for the ordering, but does not address the means of
selection of relevant character strings, nor any modification of the strings that can be needed for a given
purpose. Consequently, certain preparatory procedures can be needed before applying the ordering
rules. Depending on the needs in each individual case, it is possible that:
— the relevant character strings have to be selected, e.g. relevant terms have to be extracted from a
corpus;
— the character strings have to be modified, e.g. sentence-initial uppercase letters have to be changed
to lowercase letters, plural form of words have to be changed to singular form;
— leading zeroes or spaces can be added, e.g. in lists containing numerals.
Polygraphs are treated as sequences of separate letters.
An application may arrange information into several ordering fields, and determine ranking order with
several separate and independent comparisons. This document only defines a single comparison for
one such field, where the field is a character-string field.
Only the characters that appear in the string and their arrangement are taken into account. Apart from
the ordering rules and passes, no other knowledge about the words in the character string is used. For
example, dictionary information or rules about language syntax, phonetics and semantics are not used.
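The leading-zero step listed among the preparatory procedures can be sketched as follows. This is a minimal illustration, not part of the standard: the function name `pad_numerals` and the fixed width of 4 are assumptions chosen for the example, and the digit strings are the ones used in NOTE 1 and NOTE 2 of 5.2.

```python
import re

def pad_numerals(s: str, width: int = 4) -> str:
    """Left-pad every run of digits with zeroes to a fixed width.

    The width is an illustrative choice; an application would pick it
    from the longest numeral in its own data.
    """
    return re.sub(r"\d+", lambda m: m.group().zfill(width), s)

items = ["1", "10", "100", "11", "110", "111", "12", "19", "190", "2", "21", "3"]

# Without preparation, digit strings are compared left to right as written.
as_written = sorted(items)

# With leading zeroes inserted first, the same comparison yields numeric order.
padded = sorted(pad_numerals(s) for s in items)

print(as_written)
print(padded)
```

The ordering rules themselves stay unchanged in both cases; only the strings handed to them differ.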
5 First ordering level
5.1 First-ordering-level values
When comparing strings to be ordered, the first-ordering-level values of the strings shall be considered
first. The subsequent ordering-level values need to be considered only if two or more strings have
identical first-ordering-level values.
For multilingual ordering, the following rules shall be applied (Annex A shall be applied for word-by-
word ordering).
5.2 First-ordering-level sequence
Digits and letters have the following ordering values:
a) Digits:
0 1 2 3 4 5 6 7 8 9
NOTE 1 Sequences of digits are ordered from left to right as written, thus generating the following order,
for example: 1 10 100 11 110 111 12 19 190 2 21 3.
NOTE 2 Leading zeroes can be inserted as a preparatory procedure, e.g. to generate the following order:
0001 0002 0003 0010 0011 0012 0019 0021 0100 0110 0111 0190.
b) Basic letters of the Latin alphabet:
a A b B c C d D e E f F g G h H i I j J k K l L m M n N
o O p P q Q r R s S t T u U v V w W x X y Y z Z þ Þ
NOTE 3 This order has been established for use in multilingual environments so as to conflict with as
few individual languages as possible. See Annex F for examples of deviations from this sequence in some
languages.
Uppercase and lowercase letters shall be treated as equivalent (see Clause 7). Letters of the Latin
alphabet with diacritical marks shall be treated as equivalent to the corresponding basic Latin
letters (see Clause 6). Special letters of the Latin alphabet shall be treated as equivalent to basic
Latin letters according to Table 1 in 5.3 (see Clause 6).
The Turkish language distinguishes ı/I from i/İ, while other languages have the pair i/I only. To
order multilingual data including Turkish text, the i/I pair shall be expanded as follows:
1: ı/I U0131/U0049 latin letter dotless i (Turkish)
2: i/I U0069/U0049 latin letter i (non-Turkish)
3: i/İ U0069/U0130 latin letter i with dot above (Turkish)
It should also be noted that, for example, í (U00ED latin small letter i with acute) in normal
print is represented as latin small letter dotless i with acute. For the purpose of ordering,
however, it shall be treated as equivalent to i (U0069 latin small letter i) on the first ordering
level.
NOTE 4 Throughout this document, characters are referenced as UXXXX, where X is any hexadecimal
digit and refers to the position of the character in ISO/IEC 10646-1. Character names are given as in
ISO/IEC 10646-1. Most names of Latin letters start with “latin small letter …” and “latin capital letter
…”. When referring to both lowercase and uppercase letter, the name “latin letter …” is used. When there
is no danger of misinterpretation, the words “latin letter” are sometimes omitted.
c) Letters of other alphabets:
Letters of other alphabets follow in the sequences established for each alphabet. The order of non-
Latin alphabets shall be: the Greek alphabet, the Cyrillic alphabet, other alphabets.
NOTE 5 It is outside the scope of this document to establish the sequences for alphabets other than the
Latin alphabet. The Greek alphabet has the following sequence of letters:
α Α β Β γ Γ δ Δ ε Ε ζ Ζ η Η θ Θ ι Ι κ Κ λ Λ μ Μ ν Ν ξ Ξ
ο Ο π Π ρ Ρ σ Σ τ Τ υ Υ φ Φ χ Χ ψ Ψ ω Ω
All other characters, e.g. punctuation marks, shall be ignored. See Clause 8.
5.3 Equivalence between special Latin letters and basic letters
Special Latin letters shall be treated as equivalent to basic letters of the Latin alphabet according to
Table 1. Uppercase and lowercase letters shall be treated as equivalent.
Table 1 — Equivalence between special Latin letters and basic letters
                                               Character position in ISO/IEC 10646-1
Position  Character name in ISO/IEC 10646-1    Lowercase  Uppercase   Equivalent to
01        latin letter ae                      U00E6      U00C6       ae
02        latin letter b with hook             U0253      U0181       b
03        latin letter c with hook             U0188      U0187       c
04        latin letter d with stroke           U0111      U0110       d
05        latin letter d with hook             U0257      U018A       d
06        latin letter eth                     U00F0      U00D0       d
07        latin letter g with hook             U0260      U0193       g
08        latin letter h with stroke           U0127      U0126       h
09        latin letter k with hook             U0199      U0198       k
10        latin small letter kra               U0138      (a)         k
11        latin letter l with stroke           U0142      U0141       l
12        latin letter eng                     U014B      U014A       n
13        latin letter o with stroke           U00F8      U00D8       o
14        latin ligature oe                    U0153      U0152       oe
15        latin small letter sharp s           U00DF      (a)         ss
16        latin letter t with stroke           U0167      U0166       t
(a) No corresponding uppercase letter.
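The first-level rules of Clause 5 (case treated as equivalent, diacritical marks disregarded, special letters mapped per Table 1, other characters ignored) can be sketched as a key function. This is only an approximation under stated assumptions: the names `SPECIAL` and `first_level_key` are hypothetical, Unicode NFD decomposition is used as a stand-in for "letter with diacritical mark", the Turkish i expansion in 5.2 is not handled, and letters of other alphabets simply fall back to code point order.

```python
import unicodedata

# Table 1: special Latin letters (lowercase forms) and their basic-letter equivalents.
SPECIAL = {
    "\u00e6": "ae",  # latin letter ae
    "\u0253": "b",   # b with hook
    "\u0188": "c",   # c with hook
    "\u0111": "d",   # d with stroke
    "\u0257": "d",   # d with hook
    "\u00f0": "d",   # eth
    "\u0260": "g",   # g with hook
    "\u0127": "h",   # h with stroke
    "\u0199": "k",   # k with hook
    "\u0138": "k",   # kra
    "\u0142": "l",   # l with stroke
    "\u014b": "n",   # eng
    "\u00f8": "o",   # o with stroke
    "\u0153": "oe",  # ligature oe
    "\u00df": "ss",  # sharp s
    "\u0167": "t",   # t with stroke
}

def first_level_key(s: str) -> str:
    out = []
    for ch in s.lower():                      # uppercase and lowercase are equivalent
        ch = SPECIAL.get(ch, ch)              # Table 1 equivalences
        for c in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(c):
                continue                      # diacritical marks belong to the second level
            if c.isalnum():
                out.append(c)                 # spaces and punctuation are ignored here
    return "".join(out)

print(sorted(["adieu", "ad hoc", "adhesive"], key=first_level_key))
```

Because þ has a higher code point than z and digits have lower code points than a, plain string comparison of these keys follows the sequence of 5.2 for the basic Latin repertoire.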
6 Second ordering level
6.1 Second-ordering-level values
If the comparison of two strings results in identical first-ordering-level values, second-ordering-level
values shall be applied according to 6.2.
The rule shall be applied from left to right.
6.2 Special Latin letters and letters with diacritical marks
Special Latin letters, that have been treated as equivalent to basic Latin letters according to Table 1,
shall be ordered according to the order in Table 1.
Diacritical marks shall be ordered according to Table 2.
NOTE This order has been established for multilingual environments so as to be in conflict with as few
individual languages as possible. See Annex F for examples of deviations from this sequence in some languages.
Table 2 — Ordering of diacritical marks
Position  Name                          Position for combining diacritical
                                        mark in ISO/IEC 10646-1
0000      none                          -
0100      acute accent                  U0301
0200      grave accent                  U0300
0300      breve                         U0306
0301      breve and acute               -
0302      breve and grave               -
0310      breve and hook above          -
0311      breve and tilde               -
0313      breve and dot below           -
0315      breve and comma below         -
0400      circumflex accent             U0302
0401      circumflex and acute          -
0402      circumflex and grave          -
0410      circumflex and hook above     -
0411      circumflex and tilde          -
0413      circumflex and dot below      -
0500      circumflex accent below       U032D
0600      caron                         U030C
0614      caron and cedilla             -
0700      ring above                    U030A
0701      ring above and acute          -
0800      diaeresis                     U0308
0813      diaeresis and dot below       -
0817      diaeresis and macron          -
0900      double acute accent           U030B
1000      hook above                    U0309
1100      tilde                         U0303
1200      dot above                     U0307
1300      dot below                     U0323
1400      cedilla                       U0327
1500      comma above/below             U0313 and U0326 (a)
1600      ogonek                        U0328
1700      macron                        U0304
1713      macron and dot below          -
1800      macron below                  U0331
1900      preceded by apostrophe        -
2000      followed by apostrophe        -
2100      horn                          U031B
2101      horn and acute                -
2102      horn and grave                -
2110      horn and hook above           -
2111      horn and tilde                -
2113      horn and dot below            -
(a) The position of combining comma above and below the base character.
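How the second level breaks ties can be sketched with a small subset of Table 2. The sketch below is an assumption-laden illustration rather than the standard's formal description: `MARK_POSITION`, `second_level_key` and `sort_key` are hypothetical names, only single combining marks are covered, combined entries such as position 0301 are not handled, and the first level is reduced to lowercasing plus removal of combining marks.

```python
import unicodedata

# A few single combining marks from Table 2, mapped to their table positions.
MARK_POSITION = {
    "\u0301": 100,   # acute accent
    "\u0300": 200,   # grave accent
    "\u0306": 300,   # breve
    "\u0302": 400,   # circumflex accent
    "\u030c": 600,   # caron
    "\u030a": 700,   # ring above
    "\u0308": 800,   # diaeresis
    "\u0303": 1100,  # tilde
    "\u0327": 1400,  # cedilla
}

def second_level_key(s: str) -> tuple:
    """One position per character, compared from left to right (see 6.1)."""
    positions = []
    for ch in unicodedata.normalize("NFC", s):
        marks = [c for c in unicodedata.normalize("NFD", ch) if unicodedata.combining(c)]
        # Unlisted or combined marks get a placeholder rank in this sketch.
        positions.append(MARK_POSITION.get(marks[0], 9999) if marks else 0)
    return tuple(positions)

def sort_key(s: str) -> tuple:
    # Simplified first level: lowercase, diacritical marks removed.
    base = "".join(
        c for c in unicodedata.normalize("NFD", s.lower()) if not unicodedata.combining(c)
    )
    return (base, second_level_key(s))

words = ["côté", "cote", "côte", "coté"]
print(sorted(words, key=sort_key))
```

Since the second-level values are compared from left to right, the unaccented form comes first and a mark on an earlier letter outweighs a mark on a later one.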
7 Third ordering level
7.1 Third-ordering-level values
If the comparison of two strings results in identical first- and second-ordering-level values, third-
ordering-level values shall be applied according to 7.2.
The rule shall be applied from left to right.
7.2 Ordering according to capitalization
A lowercase letter shall be ordered before the corresponding uppercase letter. [See 5.2, item b), first
paragraph after NOTE 3.]
NOTE The terms “lowercase letter” and “uppercase letter” are used for members of the sets “a b c …” and “A
B C …”, respectively. In character names, the naming conventions of ISO/IEC 10646-1 are used. ISO/IEC 10646-1
uses “latin small letter” and “latin capital letter”, respectively.
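The capitalization rule of 7.2 can be sketched as a tie-breaker that is consulted only when the earlier levels are identical. The names `third_level_key` and `sort_key` are hypothetical, and the second level is left out here to keep the example short.

```python
def third_level_key(s: str) -> tuple:
    # 0 for lowercase (or uncased) characters, 1 for uppercase, compared left to right.
    return tuple(1 if ch.isupper() else 0 for ch in s)

def sort_key(s: str) -> tuple:
    # Simplified: first level reduced to lowercasing, second level omitted.
    return (s.lower(), third_level_key(s))

print(sorted(["Reading", "Polish", "polish", "reading"], key=sort_key))
```

Note that a plain code point sort would place the capitalized forms first, which is the opposite of the order required here.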
8 Fourth ordering level
8.1 Fourth-ordering-level values
If the comparison of two strings results in identical first-, second- and third-ordering-level values,
fourth-ordering-level values shall be applied according to 8.2.
The rule shall be applied from left to right.
8.2 Ordering according to special characters
Special characters are ordered according to the sequence of the default template of ISO/IEC 14651. For
most special characters, this is the order in which they are listed in ISO/IEC 10646-1.
NOTE In word-by-word ordering (see Annex A), the space character and possibly other special characters
can have special functions as key separators.
Annex A
(normative)
Word-by-word ordering
A.1 Principles of word-by-word ordering
As noted in the Scope, this document specifies the letter-by-letter ordering of character strings. Word-
by-word ordering is a widely used alternative to this system. Table A.1 illustrates the difference
between letter-by-letter ordering and word-by-word ordering.
Table A.1 — Letter-by-letter and word-by-word ordering
Letter-by-letter ordering    Word-by-word ordering
ad                           ad
adhesive                     ad hoc
ad hoc                       ad infinitum
adieu                        adhesive
ad infinitum                 adieu
adipose                      adipose
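The two columns of Table A.1 can be reproduced with a short sketch. It assumes plain lowercase ASCII input, so the full ordering levels are not needed: letter-by-letter ordering is approximated by ignoring spaces, and word-by-word ordering by comparing the list of words one key at a time.

```python
terms = ["adipose", "ad infinitum", "adieu", "ad hoc", "adhesive", "ad"]

# Letter-by-letter: the space is ignored when comparing.
letter_by_letter = sorted(terms, key=lambda t: t.replace(" ", ""))

# Word-by-word: each word is a separate key; a shorter key list sorts first.
word_by_word = sorted(terms, key=lambda t: t.split(" "))

for left, right in zip(letter_by_letter, word_by_word):
    print(f"{left:<15}{right}")
```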
A.2 Multiple-key ordering
Single-key ordering is described in the main body of this document. In multiple-key ordering, all the
ordering rules are applied to one key before they are applied to the next, until all the keys have been
considered or a unique sequence has been established.
NOTE One typical example of multiple-key ordering is a list of delegates to a meeting, where the first key can
be the country names, the second key can be the delegates’ last names, and the third key can be the delegates’
first names. In this example, if a country has one delegate only, the second key (last names) will not be considered.
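The delegate list in the NOTE can be sketched as a sort on a tuple of keys. The records below are invented purely for illustration, and each key is reduced to a lowercased string instead of the full ordering levels.

```python
# Hypothetical records: (country, last name, first name).
delegates = [
    ("Norway", "Hansen", "Kari"),
    ("Denmark", "Nielsen", "Ole"),
    ("Norway", "Berg", "Anne"),
    ("Norway", "Hansen", "Arne"),
]

# Keys are compared in turn; a later key matters only when all earlier keys are equal.
ordered = sorted(delegates, key=lambda d: tuple(part.lower() for part in d))

for country, last, first in ordered:
    print(country, last, first)
```

For Denmark, which has a single delegate in this sample, the comparison never reaches the second key.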
A.3 Word-by-word ordering as multiple-key ordering
In word-by-word ordering, space characters, and possibly also by definition other characters, are key
separators. The key-separator characters function as key separators only, and they have no position in
the ordering sequence.
When the character string has been divided into a sequence of keys, the ordering rules of the main body
of this document are invoked for one key at a time.
NOTE 1 In addition to the space characters, some or all punctuation marks can be defined as key separators.
It can also be useful to define some space characters as key separators, while other space characters remain
special characters within a key. The choices depend on the language(s) and type of strings to be ordered.
NOTE 2 If space characters and hyphens are defined as key separators, the title of this clause would be split
into the following keys: <Word> <by> <word> <ordering> <as> <multiple> <key> <ordering>, where each key
is contained within < and >, and the spaces are added for increased readability.
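Splitting a string into keys at the chosen separators can be sketched as below. The function name `split_keys` and its default separator set (space and hyphen, as in NOTE 2) are assumptions for illustration; an application would choose its own separators.

```python
import re

def split_keys(s: str, separators: str = " -") -> list:
    """Split a string into keys; separator characters carry no ordering value."""
    pattern = "[" + re.escape(separators) + "]+"
    return [key for key in re.split(pattern, s) if key]

print(split_keys("Word-by-word ordering as multiple-key ordering"))
```

Each resulting key would then be passed to the ordering rules of the main body, one key at a time.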
A.4 Simple word-by-word ordering
If the text to be ordered using word-by-word ordering contains very few special Latin letters and
diacritical marks, the following extension to the rules in the main body of this document will produce
the same or nearly the same output as the rules described in Clause A.3.
On the first ordering level (see 5.2), the space character is added as the first item. Items 1, 2, and 3 in
5.2 then become items 2, 3, and 4. The space character is not treated as a special character on the fourth
ordering level (see Clause 8).
NOTE Depending on the language(s) and type of strings to be ordered, it can be useful to treat even other
special characters (e.g. hyphens) in the same way as the space character.
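The simple variant described in this clause can be sketched by giving the space the lowest first-level rank. The function name `simple_word_by_word_key` is hypothetical, and the sketch assumes input without special Latin letters or diacritical marks, which is the situation this clause targets.

```python
def simple_word_by_word_key(s: str) -> list:
    # Rank 0 is reserved for the space; digits and letters follow in code point order.
    # Other special characters are dropped, as on the first ordering level.
    return [0 if ch == " " else ord(ch) for ch in s.lower() if ch == " " or ch.isalnum()]

terms = ["adipose", "ad infinitum", "adieu", "ad hoc", "adhesive", "ad"]
simple = sorted(terms, key=simple_word_by_word_key)
print(simple)
```

For the terms of Table A.1 this gives the same sequence as splitting into separate keys.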
Annex B
(informative)
Special rules for lexicographical and terminological ordering
B.1 Background
For lexicographical and terminological applications, it can sometimes be desirable to add additional
ordering criteria to the rules that are described in the main body of this document.
The features that are described in this annex cannot easily be described in the formalism given in
ISO/IEC 14651.
B.2 Position relative to baseline
It can be desirable to distinguish, for example, m2, m², m₂ for ordering purposes. If this is deemed
necessary, it is recommended that this be done on the third ordering level (see Clause 7) combined with
capitalization.
The ordering value of any given character based on its position relative to the baseline may be
determined according to Table B.1.
Table B.1 — Position relative to baseline
1 character(s) on baseline
2 character(s) above baseline, superscript character(s)
3 character(s) below baseline, subscript character(s)
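The values of Table B.1 can be assigned mechanically when the position relative to the baseline is encoded through dedicated superscript and subscript characters. The sketch below relies on that assumption and on the compatibility decomposition tags of the Unicode character database; the function names are hypothetical, and the combination with capitalization on the third level is left out.

```python
import unicodedata


def baseline_rank(ch: str) -> int:
    """Return 1 (on baseline), 2 (superscript) or 3 (subscript), as in Table B.1."""
    decomposition = unicodedata.decomposition(ch)
    if decomposition.startswith("<super>"):
        return 2
    if decomposition.startswith("<sub>"):
        return 3
    return 1


def baseline_key(term: str) -> tuple[int, ...]:
    # One rank per character, compared left to right.
    return tuple(baseline_rank(ch) for ch in term)


print(sorted(["m₂", "m²", "m2"], key=baseline_key))
# prints ['m2', 'm²', 'm₂']
```

Text whose superscripts and subscripts exist only as formatting (for example in a word processor) would need the rank to be supplied from that formatting instead.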
B.3 Ordering according to styles
If ordering by the first through fourth ordering level does not produce a unique sequence, typographical
styles may be taken into consideration as a fifth ordering level.
Styles may be ordered according to Table B.2.
Table B.2 — Order of styles
Position Style name Example
1 roman abcdefghij
2 boldface abcdefghij
3 italic abcdefghij
4 boldface-italic abcdefghij
5 others
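A fifth ordering level based on Table B.2 can be sketched as a tie-breaker. The example assumes that each entry carries a style label and, purely for illustration, uses the plain text as a stand-in for the result of the first four ordering levels; `STYLE_RANK` and `style_rank` are hypothetical names.

```python
# Positions taken from Table B.2; any other style falls into position 5.
STYLE_RANK = {"roman": 1, "boldface": 2, "italic": 3, "boldface-italic": 4}
OTHER_STYLE_RANK = 5


def style_rank(style: str) -> int:
    return STYLE_RANK.get(style, OTHER_STYLE_RANK)


# Entries as (text, style); the text is identical, so only the style decides.
entries = [("abc", "italic"), ("abc", "small-caps"), ("abc", "roman"), ("abc", "boldface")]
ordered = sorted(entries, key=lambda entry: (entry[0], style_rank(entry[1])))
print([style for _, style in ordered])
# prints ['roman', 'boldface', 'italic', 'small-caps']
```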
Annex C
(informative)
Ordering rules for chemical names
C.1 Background
There are no universally accepted ordering rules for chemical names. The ordering rules of the main
body of this document may be used, if so desired, with the extension of the word-by-word ordering
rules described in Annex A.
However, some indexes and databases, in particular those of the Chemical Abstracts Service (CAS), use a
specially designed multiple-key ordering system. The main features of this system are outlined in this
annex.2)
C.2 Division into three keys
C.2.1 Parent name
The first key consists of the parent name, which normally is all roman letters and space characters,
whether or not interrupted by italic letters, Greek letters, digits or special characters (e.g. punctuation).
C.2.2 Initial locants
The second key consists of initial locants, being all characters before the first roman letter.
C.2.3 Other locants
The third key consists of all non-initial locants, being all remaining characters.
NOTE The name “2-Butanone-1,1,1-d₃, 3,3-dimethyl” is divided into three keys as follows: <Butanone dimethyl> <2-> <-1,1,1-d₃, 3,3->.
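The division into three keys can be sketched in code if the typographical style of each character is known. The input representation below, a list of (text, is_italic) runs, is a hypothetical choice made for this example, as are the function names. The sketch also assumes that space characters after the first roman letter go to the parent name only, which this annex does not state explicitly.

```python
import unicodedata

Run = tuple[str, bool]  # (text, is_italic): hypothetical input representation


def is_roman_letter(ch: str, italic: bool) -> bool:
    """True for an upright (non-italic) letter of the Latin script."""
    return (not italic) and ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def split_chemical_name(runs: list[Run]) -> tuple[str, str, str]:
    """Return (parent name, initial locants, other locants)."""
    chars = [(ch, italic) for text, italic in runs for ch in text]
    # Everything before the first roman letter is an initial locant.
    first = next(
        (i for i, (ch, italic) in enumerate(chars) if is_roman_letter(ch, italic)),
        len(chars),
    )
    initial = "".join(ch for ch, _ in chars[:first])
    parent, other = [], []
    for ch, italic in chars[first:]:
        if is_roman_letter(ch, italic) or ch == " ":
            parent.append(ch)  # roman letters and spaces form the parent name
        else:
            other.append(ch)   # italic letters, digits, punctuation: other locants
    return "".join(parent), initial, "".join(other)


name = [("2-Butanone-1,1,1-", False), ("d", True), ("₃, 3,3-dimethyl", False)]
parent, initial, other = split_chemical_name(name)
print(parent)   # prints Butanone dimethyl
print(initial)  # prints 2-
```

For the name used in the NOTE, the sketch yields the parent name "Butanone dimethyl" and the initial locant "2-"; the remaining characters form the third key.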
C.3 Ordering rules within each key
The first key is ordered according to the rules of the main body of this document.
In the second and third keys, the following order is used:
— letters of the Latin alphabet (which are in italics), in the order specified in 5.2, item b);
— letters of the Greek alphabet, in the order given in 5.2, item c);
— numerals, in the order of their numeric value.
C.4 Output
Table C.1 shows ordered output from the rules that are described in this annex compared with output
from the rules of the main body of this document.
2) For further details, consult Chemical Abstracts Service (CAS), P.O. Box 3012, Columbus, Ohio 43210, USA.
Table C.1 — Sample output
Ordered according to Annex C                    Ordered according to general rules
Bromine fluoride (BrF₃)                         1-Butanone
Bromine fluoride (BrF₅)                         1-Butanone, 1-phenyl-
2-Butanol                                       2-Butanol
2-Butanol, (R)-                                 2-Butanol, 1-chloro-
2-Butanol, (S)-                                 2-Butanol, 4-(trimethylstannyl)-
2-Butanol, sodium salt, (S)-                    2-Butanol, (R)-
2-Butanol, 1-chloro-                            2-Butanol, (S)-
2-Butanol, 4-(trimethylstannyl)-                2-Butanol, sodium salt, (S)-
1-Butanone                                      2-Butanone
1-Butanone, 1-phenyl-                           2-Butanone, 1-(dimethylamino)-3,3-dimethyl-
2-Butanone                                      2-Butanone-1,1,1-d₃
2-Butanone, O-methyloxime                       2-Butanone-1,1,1-d₃, 3,3-dimethyl-
2-Butanone, oxime                               2-Butanone, 3-(4-acetylphenyl)-
2-Butanone, polymer with formaldehyde           2-Butanone, 3-ethoxy-1,1-dihydroxy-
2-Butanone, 3-(4-acetylphenyl)-                 2-Butanone, O-methyloxime
2-Butanone, 1-(dimethylamino)-3,3-dimethyl-     2-Butanone, oxime
2-Butanone, 3-ethoxy-1,1-dihydroxy-             2-Butanone, polymer with formaldehyde
2-Butanone-1,1,1-d₃                             Bromine fluoride (BrF₃)
2-Butanone-1,1,1-d₃, 3,3-dimethyl-              Bromine fluoride (BrF₅)
Butanoyl chloride                               Butanoyl chloride
Annex D
(informative)
Character repertoire of the Latin alphabet
Table D.1 lists the character repertoire of the Latin alphabet. The languages listed in Annex E have been
taken into account if reliable information is available. Characters that are exclusive to the International
Phonetic Alphabet have not been included.
NOTE The names used in ISO/IEC 10646-1 are used in the “Name” column of Table D.1. The full names of
the letters are “latin small letter …” and “latin capital letter …” for the lowercase and uppercase letters,
respectively. In the “Type” column of Table D.1, b = basic Latin letter; d = Latin letter with diacritical mark;
s = special Latin letter. In the column “languages used”, + indicates that the letter is used in most or all languages
that use the Latin alphabet. For the language symbols used in Table D.1, see Annexes E and F. The language
symbols in square brackets refer to transliteration systems; see Table F.2.
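The positions in Table D.1 are written as the letter U followed by a hexadecimal code position. A minimal sketch for turning such an entry into the character it denotes is given below, assuming that the code positions coincide with Unicode code points; `decode_position` is a hypothetical helper name.

```python
import unicodedata


def decode_position(position: str) -> str:
    """Turn a Table D.1 position such as 'U00E1' into the character it denotes."""
    if not position.startswith("U"):
        raise ValueError(f"not a code position: {position!r}")
    return chr(int(position[1:], 16))


lower, upper = decode_position("U00E1"), decode_position("U00C1")
print(lower, upper, unicodedata.name(lower))
# prints á Á LATIN SMALL LETTER A WITH ACUTE
```

Entries shown as a dash in the table have no single code position and are rejected by this helper.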
Table D.1 — Character repertoire
Name Type Position for lowercase/uppercase in ISO/IEC 10646-1 Languages used
latin letter a b U0061 U0041 +
with acute d U00E1 U00C1 af ca cs cy da es fo fur ga gd gl hu is kl nl pt qal sk smi ss vi [Cyr] [ar]
with grave d U00E0 U00C0 ca cy de fr fur fy gd it nl no pt qal rm vi [Cyr]
with breve d U0103 U0102 mo ro vi [Cyr]
with breve and acute d U1EAF U1EAE vi
with breve and grave d U1EB1 U1EB0 vi
with breve and hook above d U1EB3 U1EB2 vi
with breve and tilde d U1EB5 U1EB4 vi
with breve and dot below d U1EB7 U1EB6 vi
with circumflex d U00E2 U00C2 br cy de fr fur fy kl mo pt qal rm ro smi vi [Cyr] [ar]
with circumflex and acute d U1EA5 U1EA4 vi
with circumflex and grave d U1EA7 U1EA6 vi
with circumflex and hook above d U1EA9 U1EA8 vi
with circumflex and tilde d U1EAB U1EAA vi
with circumflex and dot below d U1EAD U1EAC vi
with caron d U01CE U01CD [Cyr]
with ring above d U00E5 U00C5 da kl no smi sv [Cyr]
with ring above and acute d U01FB U01FA
with diaeresis d U00E4 U00C4 cy de et fi fy lb nl sk smi sv tr [Cyr]
with diaeresis and dot below d — — [Cyr]
with diaeresis and macron d U01DF U01DE
with double acute d — — [Cyr]
with hook above d U1EA3 U1EA2 vi
with tilde d U00E3 U00C3 kl pt vi
with dot below d U1EA1 U1EA0 vi
with ogonek d U0105 U0104 lt pl
with macron d U0101 U0100 lv [Cyr] [ar]
latin letter ae s U00E6 U00C6 da fo fr is kl no smi [Cyr]
with acute s d U01FD U01FC
with macron s d U01E3 U01E2
latin letter b b U0062 U0042 +
with dot above d U1E03 U1E02 [he]
with dot below d U1E05 U1E04
with hook s U0253 U0181 ha
latin letter c b U0063 U0043 +
with acute d U0107 U0106 hr pl sr [Cyr]
with grave d — — [Cyr]
with breve d — — [Cyr]
with breve and comma below d — — [Cyr]
with circumflex d U0109 U0108 eo [Cyr]
with caron d U010D U010C cs hr lt lv sk sl smi sr [Cyr]
with diaeresis d — — [Cyr]
with dot above d U010B U010A mt
with dot below d — — [Cyr]
with cedilla d U00E7 U00C7 ca fr oc pt sq tr [Cyr]
with macron d — — [Cyr]
with hook s U0188 U0187
latin letter d b U0064 U0044 +
with circumflex d — — [Cyr]
with circumflex below d U1E13 U1E12 hz ve
with caron d U010F (cs) (sk)
with caron d U010E cs sk
with dot above d U1E0B U1E0A
with dot below d U1E0D U1E0C [ar]
with line below d U1E0F U1E0E [ar]
followed by apostrophe d — cs sk
with strokeᵃ s U0111 U0110 hr smi sr vi [Cyr]
with hook s U0257 U018A ha
latin letter ethᵃ s U00F0 U00D0 fo is
latin letter e b U0065 U0045 +
with acute d U00E9 U00C9 af ca cs cy da de es fr fy ga gd gl hu is it kl lb nl no pt qal sk sl ss sv vi
with grave d U00E8 U00C8 af ca cy de fr fur gd it nl no pt qal rm vi [Cyr]
with breve d U0115 U0114 [Cyr]
with circumflex d U00EA U00CA af br cy de fr fy nl no nso pt qal rm sl tn vi [Cyr]
with circumflex and acute d U1EBF U1EBE vi
with circumflex and grave d U1EC1 U1EC0 vi
with circumflex and hook above d U1EC3 U1EC2 vi
with circumflex and tilde d U1EC5 U1EC4 vi
with circumflex and dot below d U1EC7 U1EC6 vi
with circumflex below d U1E19 U1E18
with caron d U011B U011A cs [Cyr]
with diaeresis d U00EB U00CB af cy de fr fy lb nl sq [Cyr]
with hook above d U1EBB U1EBA vi
with tilde d U1EBD U1EBC vi
with dot above d U0117 U0116 lt
with dot below d U1EB9 U1EB8 vi
with ogonek d U0119 U0118 lt pl
with macron d U0113 U0112 lv
latin letter f b U0066 U0046 +
with grave d — — [Cyr]
latin letter g b U0067 U0047 +
with acute d U01F5 U01F4 [Cyr]
with grave d — — [Cyr]
with breve d U011F U011E tr [Cyr]
with circumflex d U011D U011C eo
with caron d U01E7 U01E6 [ar]
with dot above d U0121 U0120 mt [Cyr] [ar]
with comma below/aboveᵇ d — — lv
with hook s U0260 U0193
latin letter h b U0068 U0048 +
with circumflex d U0125 U0124 eo
with dot above d U1E23 U1E22 [he]
with dot below d U1E25 U1E24 [Cyr] [ar] [he]
with cedilla d U1E29 U1E28 [Cyr]
with line below d U1E96 — [ar]
with stroke s U0127 U0126 mt
latin letter i b U0069 U0049 +
latin small letter dotless i b U0131 tr
latin capital letter i with dot above b U0130 tr
latin letter i with acute d U00ED U00CD af ca cs cy da es fo ga gd gl hu is it kl nl pt qal sk vi [Cyr] [ar]
with grave d U00EC U00CC cy fur gd it qal vi [Cyr]
with breve d U012D U012C
with circumflex d U00EE U00CE af cy fr it kl mo qal ro [Cyr]
with caron d U01D0 U01CF [Cyr]
with diaeresis d U00EF U00CF af ca cy de fr fy it nl oc [Cyr]
with hook above d U1EC9 U1EC8 vi
with tilde d U0129 U0128 kl vi
with dot below d U1ECB U1ECA vi
with ogonek d U012F U012E lt
with macron d U012B U012A lv [Cyr] [ar]
latin letter j b U006A U004A +
with acute d — — [Cyr]
with circumflex d U0135 U0134 eo
with caron d U01F0 — [Cyr]
latin letter k b U006B U004B +
with acute d U1E31 U1E30 [Cyr]
with grave d — — [Cyr]
with circumflex d — — [Cyr]
with caron d U01E9 U01E8 [Cyr]
with dot above d — — [he]
with dot below d U1E33 U1E32 [Cyr]
with comma belowᶜ d — — lv [Cyr]
with macron d — — [Cyr]
latin letter k with hook s U0199 U0198 ha
latin small letter kra s U0138 kl
latin letter l b U006C U004C +
with acute d U013A U0139 [Cyr]
with circumflex d — — [Cyr]
with circumflex below d U1E3D U1E3C ve
with caron d U013E U013D
with dot below d U1E37 U1E36
with comma belowᵈ d — — lv [Cyr]
with stroke s U0142 U0141 pl
latin letter m b U006D U004D +
with acute d U1E3F U1E3E
with circumflex d — — lb
with dot above d U1E41 U1E40
with dot below d U1E43 U1E42
latin letter n b U006E U004E +
with acute d U0144 U0143 pl smi [Cyr]
with grave d — — [Cyr]
with breve d — — [Cyr]
with circumflex d — — lb [Cyr]
with circumflex below d U1E4B U1E4A hz ve
with caron d U0148 U0147 cs sk
with tilde d U00F1 U00D1 br es eu gl
with dot above d U1E45 U1E44 ve [Cyr]
with dot below d U1E47 U1E46 [Cyr]
with comma belowᵉ d — — lv [Cyr]
with macron d — — [Cyr]
preceded by apostrophe d U0149 —
followed by apostrophe d — — ts
latin letter eng s U014B U014A se
latin letter o b U006F U004F +
with acute d U00F3 U00D3 af ca cs cy da es fo ga gd gl hu is it nl no pl pt qal sk sl ss vi [Cyr]
with grave d U00F2 U00D2 ca cy fur gd it no pt qal rm vi [Cyr]
with breve d U014F U014E
with circumflex d U00F4 U00D4 af cy de fr fy kl no nso pt qal sk sl tn vi [Cyr]
with circumflex and acute d U1ED1 U1ED0 vi
with circumflex and grave d U1ED3 U1ED2 vi
with circumflex and hook above d U1ED5 U1ED4 vi
with circumflex and tilde d U1ED7 U1ED6 vi
with circumflex and dot below d U1ED9 U1ED8 vi
with caron d U01D2 U01D1
with diaeresis d U00F6 U00D6 af cy de et fi fy hu is lb nl rm sv tr [Cyr]
with diaeresis and dot below d — — [Cyr]
with double acute d U0151 U0150 hu [Cyr]
with hook above d U1ECF U1ECE vi
with tilde d U00F5 U00D5 et pt vi
with dot below d U1ECD U1ECC vi
with ogonek d U01EB U01EA
with macron d U014D U014C lv [Cyr]
with horn d U01A1 U01A0 vi
with horn and acute d U1EDB U1EDA vi
with horn and grave d U1EDD U1EDC vi
with horn and hook above d U1EDF U1EDE vi
with horn and tilde d U1EE1 U1EE0 vi
with horn and dot below d U1EE3 U1EE2 vi
with stroke s U00F8 U00D8 da fo is kl no smi
with stroke and acute s d U01FF U01FE
latin ligature oe s U0153 U0152 fr [Cyr]
latin letter p b U0070 U0050 +
with acute d U1E55 U1E54 [Cyr]
with grave d — — [Cyr]
with dot above d U1E57 U1E56 [he]
latin letter q b U0071 U0051 +
latin letter r b U0072 U0052 +
with acute d U0155 U0154 sk
with caron d U0159 U0158 cs sk
with dot above d U1E59 U1E58
with dot below d U1E5B U1E5A
with cedilla d U0157 U0156 lv
latin letter s b U0073 U0053 +
with acute d U015B U015A pl [he]
with grave d — — [Cyr] [he]
with circumflex d U015D U015C eo [Cyr]
with caron d U0161 U0160 cs et hr lt lv nso sk sl smi sr tn [Cyr] [ar] [he]
with dot above d U1E61 U1E60
with dot below d U1E63 U1E62 [ar] [he]
with cedilla d U015F U015E tr
with comma below d — — mo ro [Cyr]
latin small letter sharp s s U00DF de
latin letter t b U0074 U0054 +
with grave d — — [Cyr]
with circumflex below d U1E71 U1E70 hz ve
with caron d U0165 (cs) (sk) [Cyr]
with caron d U0164 cs sk [Cyr]
with diaeresis d U1E97 — [ar]
with dot above d U1E6B U1E6A
with dot below d U1E6D U1E6C [ar] [he]
with comma belowᶠ d — — mo ro [Cyr]
with line below d U1E6F U1E6E [ar]
followed by apostrophe d — cs sk
with stroke s U0167 U0166 se
latin letter u b U0075 U0055 +
with acute d U00FA U00DA af ca cs cy da es fo fy ga gl hu is it kl nl pt qal sk vi [Cyr] [ar]
with grave d U00F9 U00D9 br cy fr fur gd it qal vi [Cyr]
with breve d U016D U016C eo [Cyr]
with circumflex d U00FB U00DB af cy fr fy kl qal tr [Cyr]
with caron d U01D4 U01D3
with ring above d U016F U016E cs [Cyr]
with diaeresis d U00FC U00DC br ca cy de es et fr fy gl hu lb nl pt rm tr [Cyr]
with diaeresis and dot below d — — [Cyr]
with double acute d U0171 U0170 hu [Cyr]
with hook above d U1EE7 U1EE6 vi
with tilde d U0169 U0168 kl vi
with dot above d — — [Cyr]
with dot below d U1EE5 U1EE4 vi
with ogonek d U0173 U0172 lt
with macron d U016B U016A lt lv [Cyr] [ar]
with macron and dot below d — — [Cyr]
with horn d U01B0 U01AF vi
with horn and acute d U1EE9 U1EE8 vi
with horn and grave d U1EEB U1EEA vi
with horn and hook above d U1EED U1EEC vi
with horn and tilde d U1EEF U1EEE vi
with horn and dot below d U1EF1 U1EF0 vi
latin letter v b U0076 U0056 +
with tilde d U1E7D U1E7C
with dot below d U1E7F U1E7E
latin letter w b U0077 U0057 +
with acute d U1E83 U1E82 cy
with grave d U1E81 U1E80 cy
with circumflex d U0175 U0174 cy [he]
with diaeresis d U1E85 U1E84 cy
with dot above d U1E87 U1E86 [he]
with dot below d U1E89 U1E88
...



ISO 12199:2022 is a document that specifies the sequence of characters to be used for the alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet. The document considers the character sets of languages represented in the Latin alphabet, as well as internationally standardized transliteration into the Latin script. The sequence of alphabetical characters provided is intended for multilingual purposes only and does not affect the alphabetical order of any specific language. The document includes different annexes that cover word-by-word ordering, additional rules for ordering, ordering rules for chemical names, the character repertoire of the Latin alphabet, a list of languages using the Latin alphabet, and alphabetical sequences for specific languages. Lastly, the document includes a formal description of the rules mentioned in the main part of the document, conforming to ISO/IEC 14651.
