SIST-TS CEN/TS 1923:2003
European character repertoires and their coding - 8-bit single-byte coding
This Technical Specification specifies the graphic character repertoires and their single-byte coding, which are available for use for information interchange between information processing systems and for use within such systems, in the scripts that are commonly used by the members of CEN/CENELEC and the Institutions of the European Union and the European Free Trade Association.
This Technical Specification does not specify the interchange of information using a telematic service. The character repertoire and the coding used by a telematic service are defined by the specification of that service. The transmission of information based on the specifications of this Technical Specification using a telematic service may necessitate an adaptation of the number of characters of a repertoire (repertoire transformation function) or a change to the coding (code transformation function).
Informationstechnik - Europäische Zeichenvorräte und deren Codierung - 8-Bit-Einzelbyte-Codierung
Nabori evropskih znakov in njihovo kodiranje – kodiranje v 8-bitne besede
Standards Content (Sample)
SLOVENSKI STANDARD
SIST-TS CEN/TS 1923:2003
01-October-2003
Nabori evropskih znakov in njihovo kodiranje – kodiranje v 8-bitne besede
European character repertoires and their coding - 8-bit single-byte coding
Informationstechnik - Europäische Zeichenvorräte und deren Codierung - 8-Bit-Einzelbyte-Codierung
This Slovenian standard is identical to: CEN/TS 1923:2003
ICS:
35.040 Nabori znakov in kodiranje informacij / Character sets and information coding
SIST-TS CEN/TS 1923:2003 en
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.
TECHNICAL SPECIFICATION
CEN/TS 1923
SPÉCIFICATION TECHNIQUE
TECHNISCHE SPEZIFIKATION
May 2003
ICS 35.040
Supersedes EN 1923:1998
English version
European character repertoires and their coding - 8-bit single-byte coding
This Technical Specification (CEN/TS) was approved by CEN on 16 October 2002 for provisional application.
The period of validity of this CEN/TS is limited initially to three years. After two years the members of CEN will be requested to submit their
comments, particularly on the question whether the CEN/TS can be converted into a European Standard.
CEN members are required to announce the existence of this CEN/TS in the same way as for an EN and to make the CEN/TS available. It
is permissible to keep conflicting national standards in force (in parallel to the CEN/TS) until the final decision about the possible
conversion of the CEN/TS into an EN is reached.
CEN members are the national standards bodies of Austria, Belgium, Czech Republic, Denmark, Finland, France, Germany, Greece,
Hungary, Iceland, Ireland, Italy, Luxembourg, Malta, Netherlands, Norway, Portugal, Slovakia, Spain, Sweden, Switzerland and United
Kingdom.
EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG
Management Centre: rue de Stassart, 36 B-1050 Brussels
© 2003 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.
Ref. No. CEN/TS 1923:2003 E
CEN/TS 1923:2003 (E)
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Conformance
4.1 Conformance for information interchange
4.2 Conformance of devices
4.2.1 General
4.2.2 Device description
4.2.3 Originating devices
4.2.4 Receiving devices
5 Scenario description
5.1 Repertoires
5.2 Combinations of repertoires and their coding
6 Repertoire descriptions
6.1 Latin script
6.2 Greek script
6.3 Cyrillic script
6.4 The symbols repertoire
7 Coding methods applicable
7.1 8-bit single-byte coding
7.2 Formation of G-sets
7.2.1 Invariant-Latin repertoire
7.2.2 Initial-Latin repertoire
7.2.3 Basic-Latin-a repertoire
7.2.4 Basic-Latin-b repertoire
7.2.5 Basic-Latin-c repertoire
7.2.6 Large-Latin-8-a repertoire
7.2.7 Large-Latin-8-b repertoire
7.2.8 Celtic repertoire
7.2.9 Romanian repertoire
7.2.10 Basic-Greek repertoire
7.2.11 Basic-Cyrillic repertoire
7.2.12 Symbols repertoire
8 Identification of options
Annex A (informative) Specifications of referenced ISO-IR code tables
Annex B (informative) CEN/TS 1923 options compared to ISO/IEC 7/8-bit standards
Annex C (informative) Code table illustrations
Foreword
This document (CEN/TS 1923:2003) has been prepared by Technical Committee CEN/TC 304, "Information and communications technology - European localization requirements", the secretariat of which is held by SIS.
According to the CEN/CENELEC Internal Regulations, the national standards organizations of the following countries are bound to announce this European Standard: Austria, Belgium, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Luxembourg, Malta, Netherlands, Norway, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom.
This Technical Specification is a revision of the European Standard EN 1923:1998, which it cancels and replaces.
The main purpose of the revision is to include, and thereby to publicize the availability of, 8-bit code tables developed after the publication of EN 1923:1998; in particular the code table of ISO/IEC 8859-15 and the tables of other additions to the ISO/IEC 8859 series. Although CEN/TC 304 decided that a revision of the contents of EN 1923:1998 was necessary, some uncertainty existed whether the standard as such is needed by the data community, given the present-day direction towards multi-octet coding schemes. The committee therefore decided to classify the revised document as a Technical Specification, so that its usefulness can be evaluated.
The contents of this document differ from those of EN 1923:1998 in the following respects:
– Extensive editorial changes have been made to the text for conformance with present CEN/CENELEC drafting rules.
– Additional coding scheme options have been introduced, corresponding to ISO/IEC 8859 parts 14, 15 and 16 (Latin-8, Latin-9 and Latin-10), and also to ISO-IR 204 ("Latin-1 alternative with Euro").
– For consistency, the definitions of all options now refer to registrations according to ISO 2375:1985 in the ISO "International register of coded character sets to be used with escape sequences". Relationships to ISO/IEC 10646-1:2000 specifications are also given, to the extent applicable.
– An informative Annex A has been added, containing ISO/IEC 10646-1:2000 identifications for all characters in the options' character sets.
– An informative Annex B has been added, listing relationships to ISO/IEC 7/8-bit coding standards.
– An informative Annex C has been added, illustrating the code tables for all options.
1 Scope
This Technical Specification specifies the graphic character repertoires and their single-byte coding, which are available for use for information interchange between information processing systems and for use within such systems, in the scripts that are commonly used by the members of CEN/CENELEC and the Institutions of the European Union and the European Free Trade Association.
This Technical Specification does not specify the interchange of information using a telematic service. The character repertoire and the coding used by a telematic service are defined by the specification of that service. The transmission of information based on the specifications of this Technical Specification using a telematic service may necessitate an adaptation of the number of characters of a repertoire (repertoire transformation function) or a change to the coding (code transformation function).
2 Normative references
This Technical Specification incorporates by dated or undated reference, provisions from other publications. These normative references are cited at the appropriate places in the text and the publications are listed hereafter. For dated references, subsequent amendments to or revisions of any of these publications apply to this Technical Specification only when incorporated in it by amendment or revision. For undated references the latest edition of the publication referred to applies.
ISO/IEC 2022:1994, Information technology – Character code structure and extension techniques.
ISO 2375:1985, Data processing – Procedure for registration of escape sequences.
ISO/IEC 4873:1991, Information technology – ISO 8-bit code for information interchange – Structure and rules for implementation.
3 Terms and definitions
For the purposes of this Technical Specification, the following terms and definitions apply:
3.1
bit combination
ordered set of bits used for the representation of characters
3.2
byte
bit string that is operated upon as a unit
3.3
character
member of a set of elements used for the organization, control, or representation of data
3.4
coded-character-data-element
CC-data-element
element of interchanged information that is specified to consist of a sequence of coded representations of characters, in accordance with one or more identified standards for coded character sets
3.5
coded character set
code
set of unambiguous rules that establishes a character set and the one-to-one relationship between the characters of the set and their bit combinations
3.6
code extension
techniques for the encoding of characters that are not included in the character set of a given code
3.7
code table
table showing the characters allocated to each bit combination in a code
3.8
control character
control function the coded representation of which consists of a single bit combination
3.9
control function
action that affects the recording, processing, transmission or interpretation of data, and that has a coded representation consisting of one or more bit combinations
3.10
to designate
to identify a set of characters that are to be represented, in some cases immediately and in others on the occurrence of a further control function, in a prescribed manner
3.11
device
component of information processing equipment which can transmit and/or receive coded information within CC-data-elements; it may be an input/output device in the conventional sense, or a process such as an application program or gateway function
3.12
escape sequence
string of bit combinations that is used for control purposes in code extension procedures; the first of these bit combinations represents the control function ESCAPE
3.13
graphic character
character, other than a control function, that has a visual representation normally handwritten, printed or displayed, and that has a coded representation consisting of one or more bit combinations
NOTE In CEN/TS 1923 a single bit combination is used to represent each character.
3.14
G-set
same as "coded graphic character set" in ISO/IEC 2022:1994
3.15
position
that part of a code table identified by its column and row coordinates
3.16
repertoire
specified set of characters that are each represented by one or more bit combinations of a coded character set
3.17
user
person or other entity that invokes the services provided by a device; this entity may be a process such as an application program if the "device" is a code convertor or a gateway function, for example
4 Conformance
4.1 Conformance for information interchange
A CC-data-element within coded information for interchange is in conformance with this Technical Specification if all the coded representations of graphic characters within that CC-data-element conform to the requirements of clauses 6 and 7.
4.2 Conformance of devices
4.2.1 General
A device is in conformance with this Technical Specification if it conforms to the requirements of clause 4.2.2, and either or both of clauses 4.2.3 and 4.2.4. A claim of conformance shall identify the document which contains the description specified in clause 4.2.2, and shall identify the option adopted.
4.2.2 Device description
A device that conforms to this Technical Specification shall be the subject of a description that identifies the means by which the user may supply characters to the device, or may recognize them when they are made available to him, as specified respectively in clauses 4.2.3 and 4.2.4.
4.2.3 Originating devices
An originating device shall allow its user to supply any sequence of graphic characters from the option adopted, and shall be capable of transmitting their coded representations within a CC-data-element.
4.2.4 Receiving devices
A receiving device shall be capable of receiving and interpreting any coded representations of graphic characters that are within a CC-data-element, and that conform to clause 4.1, and shall make the corresponding characters available to the user in such a way that the user can identify them from among those conforming to the option adopted, and can distinguish them from each other.
5 Scenario description
5.1 Repertoires
There are four collections of graphic characters identified in this Technical Specification, comprising the characters needed for the:
– Latin script
– Greek script
– Cyrillic script
– Symbols repertoire
These collections are further divided into repertoires as described in clause 6.
5.2 Combinations of repertoires and their coding
This Technical Specification identifies combinations of character repertoires and their coding as options. An option identified in this Technical Specification defines only the minimum requirements, in terms of character repertoire and coding, applied to a conforming device. Additional capabilities of the originating or receiving device may be used, during the information interchange, subject to bilateral agreement.
8-bit single byte coding shall be a version of ISO/IEC 4873:1991, clause 9; with the exception of the Invariant-Latin repertoire (see below).
NOTE This Technical Specification is intended to be used with other standards specifying control functions, as needed by the base coding standards.
6 Repertoire descriptions
The following descriptions refer to registrations according to ISO 2375:1985 in the ISO "International register of coded character sets to be used with escape sequences" (ISO-IR).
NOTE Some identical coded character sets or subsets/supersets of them also exist in ISO/IEC 10646-1:2000 and/or in ISO/IEC 7/8-bit standards; see annexes A and B for details.
6.1 Latin script
Nine subsets of this collection of graphic characters are identified, each with a subset/superset relation with the others. The subsets are the following:
The Invariant-Latin repertoire containing 83 characters as defined in ISO-IR 170 (repertoire IVL).
The Initial-Latin repertoire containing 95 characters as defined in ISO-IR 6 (repertoire IL). It is a true superset of the Invariant-Latin repertoire.
The Basic-Latin-a repertoire comprising the Initial-Latin repertoire plus the repertoire of ISO-IR 100 "Latin-1 Supplement" (repertoire BLa). It is a true superset of the Initial-Latin repertoire.
NOTE This set was named Basic-Latin (BL) in EN 1923:1998.
The Basic-Latin-b repertoire comprising the Initial-Latin repertoire plus the repertoire of ISO-IR 204 "Supplementary Set for Latin-1 alternative with Euro sign" (repertoire BLb). It is a true superset of the Initial-Latin repertoire.
NOTE The set has been added in the revision, to provide a coding scheme corresponding to Basic-Latin-a but also containing the Euro sign.
The Basic-Latin-c repertoire comprising the Initial-Latin repertoire plus the repertoire of ISO-IR 203 "European Supplementary Latin Set ('Latin-9')" (repertoire BLc). It is a true superset of the Initial-Latin repertoire.
NOTE The set has been added in the revision, to provide a coding scheme containing the Euro sign, and also to complement the repertoire of European letters.
The Large-Latin-8-a repertoire for the 8-bit environment, comprising the union of the Basic-Latin-a repertoire with the repertoires of ISO-IR 101 and ISO-IR 154 (repertoire LL8a). It is a true superset of the Basic-Latin-a repertoire.
The Large-Latin-8-b repertoire for the 8-bit environment, comprising the union of the Basic-Latin-b repertoire with the repertoires of ISO-IR 101 and ISO-IR 154 (repertoire LL8b). It is a true superset of the Basic-Latin-b repertoire.
The Celtic repertoire containing 96 characters as defined in ISO-IR 199 (repertoire BK).
NOTE The set has been added in the revision. It is intended for use together with the Initial-Latin repertoire.
The Romanian repertoire containing 96 characters as defined in ISO-IR 226 (repertoire BR).
NOTE The set has been added in the revision. It is intended for use together with the Initial-Latin repertoire.
6.2 Greek script
In the 8-bit environment only one Greek repertoire is defined, which is:
The Basic-Greek repertoire comprising the characters defined in ISO-IR 126 (repertoire BG).
6.3 Cyrillic script
In the 8-bit environment only one Cyrillic repertoire is defined, which is:
The Basic-Cyrillic repertoire comprising the characters defined in ISO-IR 144 (repertoire BC).
6.4 The symbols repertoire
This repertoire shall comprise the characters defined in ISO-IR 155 (repertoire BS).
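As a reading aid, the repertoire descriptions of clause 6 can be collected into a small lookup table. The sketch below is illustrative only: the dictionary layout and the helper name `builds_on` are assumptions made for this example, while the repertoire identifiers and ISO-IR registration numbers are taken from the text of clause 6.

```python
# Repertoire identifiers from clause 6, each mapped to the ISO-IR
# registrations that make it up. The data structure is an assumption of
# this sketch; the registration numbers come from the clause itself.
REPERTOIRES = {
    "IVL": frozenset({170}),
    "IL": frozenset({6}),
    "BLa": frozenset({6, 100}),
    "BLb": frozenset({6, 204}),
    "BLc": frozenset({6, 203}),
    "LL8a": frozenset({6, 100, 101, 154}),
    "LL8b": frozenset({6, 204, 101, 154}),
    "BK": frozenset({199}),
    "BR": frozenset({226}),
    "BG": frozenset({126}),
    "BC": frozenset({144}),
    "BS": frozenset({155}),
}


def builds_on(larger: str, smaller: str) -> bool:
    """Return True if every registration of `smaller` is also in `larger`."""
    return REPERTOIRES[smaller] <= REPERTOIRES[larger]


# Registration-level superset relations stated in clause 6.1.
latin_checks = {
    ("BLa", "IL"): builds_on("BLa", "IL"),
    ("LL8a", "BLa"): builds_on("LL8a", "BLa"),
    ("LL8b", "BLb"): builds_on("LL8b", "BLb"),
}
```

This comparison works at the level of whole registrations. The relation between Initial-Latin and Invariant-Latin is a character-level one (ISO-IR 170 is a separate registration), so `builds_on("IL", "IVL")` is false here even though the text states that Initial-Latin is a true superset of Invariant-Latin.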
7 Coding methods applicable
7.1 8-bit single-byte coding
Each character shall be coded by the use of a single byte. No control function shall be used that would cause characters within a repertoire to be combined to represent any other character.
The various repertoires shall form G-sets, according to the relevant provisions of ISO/IEC 2022:1994. When code extension techniques are applied, then the provisions of ISO/IEC 2022:1994 and ISO/IEC 4873:1991 shall be followed. The application should always conform to a certain level of ISO/IEC 4873:1991 (except for the Invariant-Latin repertoire; see below).
When code extension techniques are applied, then all the necessary control functions must exist, coded as specified in ISO/IEC 4873:1991.
7.2 Formation of G-sets
The characters belonging to the repertoires defined in clause 6 shall be arranged to the code table positions and shall form G-sets as specified in the following.
7.2.1 Invariant-Latin repertoire
The IVL repertoire shall always form a G0 code element according to ISO/IEC 2022:1994.
The characters shall be arranged in the code table as specified in ISO-IR 170.
The escape sequence to designate this set will be:
7.2.2 Initial-Latin repertoire
The IL repertoire shall always form a G0 code element in a version of ISO/IEC 4873:1991.
The characters shall be arranged in the code table as specified in ISO-IR 6.
The escape sequence to designate this set will be:
7.2.3 Basic-Latin-a repertoire
The BLa repertoire shall form two G-sets in a version of ISO/IEC 4873:1991.
The G0 element shall contain the Initial-Latin repertoire and shall be coded and designated according to paragraph 7.2.2.
The "Latin-1 Supplement" repertoire shall form either a G1 or a G2 or a G3 set in a version of ISO/IEC 4873:1991. The characters shall be arranged in the code table as specified in ISO-IR 100.
The escape sequences to designate this set will be:
7.2.4 Basic-Latin-b repertoire
The BLb repertoire shall form two G-sets in a version of ISO/IEC 4873:1991.
The G0 element shall contain the Initial-Latin repertoire and shall be coded and designated according to paragraph 7.2.2.
The "Supplementary Set for Latin-1 alternative with Euro sign" repertoire shall form either a G1 or a G2 or a G3 set in a version of ISO/IEC 4873:1991. The characters shall be arranged in the code table as specified in ISO-IR 204.
The escape sequences to designate this set will be:
7.2.5 Basic-Latin-c repertoire
The BLc repertoire shall form two G-sets in a version of ISO/IEC 4873:1991.
The G0 element shall contain the Initial-Latin repertoire and shall be coded and designated according to paragraph 7.2.2.
The "European Supplementary Latin Set" repertoire shall form either a G1 or a G2 or a G3 set in a version of ISO/IEC 4873:1991. The characters shall be arranged in the code table as specified in ISO-IR 203.
The escape sequences to designate this set will be:
7.2.6 Large-Latin-8-a repertoire
The LL8a repertoire shall form four G-sets in a version of ISO/IEC 4873:1991.
Two G-sets will contain the BLa repertoire and shall be coded and designated according to 7.2.3.
The rest of the repertoire shall be arranged in the code table positions as specified in ISO-IR 101 and ISO-IR 154, thus forming two G-sets that can be used as G1 or G2 or G3 sets in a version of ISO/IEC 4873:1991.
The escape sequences to designate the registration 101 set will be:
The escape sequences to designate the registration 154 set will be:
7.2.7 Large-Latin-8-b repertoire
The LL8b repertoire is identical to Large-Latin-8-a except that two G-sets will contain the BLb repertoire, coded and designated according to 7.2.4. For the rest of the repertoire the same coding and designations as specified in 7.2.6 apply.
7.2.8 Celtic repertoire
The BK repertoire shall form one G-set in a version of ISO/IEC 4873:1991.
The repertoire shall be arranged in the code table as specified in ISO-IR 199 as a G1 or G2 or G3 set.
The escape sequences to designate this set will be:
7.2.9 Romanian repertoire
The BR repertoire shall form one G-set in a version of ISO/IEC 4873:1991.
The repertoire shall be arranged in the code table as specified in ISO-IR 226 as a G1 or G2 or G3 set.
The escape sequences to designate this set will be:
7.2.10 Basic-Greek repertoire
The BG repertoire shall form one G-set in a version of ISO/IEC 4873:1991.
The repertoire shall be arranged in the code table as specified in ISO-IR 126 as a G1 or G2 or G3 set.
The escape sequences to designate this set will be:
NOTE A new Greek registration, tentatively designated ISO-IR 227, is at present in processing; for further information see character set ISO-IR 126 in Annex A.
7.2.11 Basic-Cyrillic repertoire
The BC repertoire shall form one G-set in a version of ISO/IEC 4873:1991.
The repertoire shall be arranged in the code table as specified in ISO-IR 144, as a G1 or G2 or G3 set.
The escape sequences to designate this set will be:
7.2.12 Symbols repertoire
The BS repertoire shall form one G-set in a version of ISO/IEC 4873:1991.
The repertoire shall be arranged in the code table as specified in ISO-IR 155, as a G1 or G2 or G3 set.
The escape sequences to designate this set will be:
8 Identification of options
If a reference to this Technical Specification is made in another document, the option adopted shall be clearly identified.
Table 1 summarizes the options that conform to the requirements of this Technical Specification.
Table 1 — Summary of options
[Table body not recoverable: the option identifiers and repertoire columns were lost in extraction.]
The letter "x" in the table above stands for "a", "b" or "c", indicating repertoire Basic-Latin-a, Basic-Latin-b or Basic-Latin-c, respectively. The letter "y" stands for either "a" or "b", indicating repertoire Large-Latin-8-a or Large-Latin-8-b, respectively.
For instance, option Ca specifies repertoire Basic-Latin-a; and option CcE repertoire Basic-Latin-c + Basic-Greek.
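To make the single-byte rule of 7.1 and the two-G-set arrangement of 7.2.3 more concrete, the following sketch encodes a short string and reports, for each byte, its code table position (column/row, as in definition 3.15) and the code element it falls in. It rests on several assumptions that are not stated in this document: that Python's `iso8859-1` codec matches the Basic-Latin-a arrangement (ISO-IR 6 as G0, ISO-IR 100 as G1), that G1 is invoked into the right-hand half of the code table (bytes 0xA0 to 0xFF), and that the 95 characters of Initial-Latin include SPACE at position 2/0. The function names are hypothetical.

```python
def position(byte: int) -> str:
    """Return the column/row position of a byte, e.g. 0x41 -> '4/1'."""
    return f"{byte >> 4}/{byte & 0x0F}"


def code_element(byte: int) -> str:
    """Classify a byte by the half of the 8-bit code table it occupies."""
    if 0x20 <= byte <= 0x7E:
        return "G0"  # assumed: SPACE plus 94 graphic characters (95 in total)
    if 0xA0 <= byte <= 0xFF:
        return "G1"  # assumed: G1 invoked into the right-hand half
    return "other"   # control characters and unused positions


# Precomposed characters only, so that each character maps to one byte.
text = "\u00c5ngstr\u00f6m caf\u00e9"
data = text.encode("iso8859-1")

# One byte per character, as required by 7.1.
one_byte_per_character = len(data) == len(text)

layout = [(ch, position(b), code_element(b)) for ch, b in zip(text, data)]
```

Each entry of `layout` pairs a character with its position and code element; the first entry, for the letter Å (byte 0xC5), is at position 12/5 in G1, while the unaccented letters fall in G0.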
Annex A
(informative)
Specifications of referenced ISO-IR code tables
Several of the ISO-IR registrations referenced in clauses 6 and 7 were developed before ISO/IEC 10646 existed. The character names in those registrations are therefore not harmonized with the names in ISO/IEC 10646-1, and character identifications (of the form U+xxxx) are missing.
In this informative annex the code tables of all the registrations referred to are presented, with character names harmonized and identifiers added. Comments on their relationship to corresponding tables in ISO/IEC 10646-1:2000 are also provided.
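The U+xxxx identifiers mentioned above can be illustrated with a short sketch that derives the identifier for a given byte of an 8-bit code table. It is not taken from the annex: it assumes that Python's `iso8859-1` and `iso8859-15` codecs correspond to the Basic-Latin-a (ISO-IR 6 + ISO-IR 100) and Basic-Latin-c (ISO-IR 6 + ISO-IR 203) arrangements, and the helper name `ucs_id` is hypothetical.

```python
def ucs_id(byte: int, codec: str) -> str:
    """Return the U+xxxx identifier of the character coded by `byte`."""
    ch = bytes([byte]).decode(codec)
    return f"U+{ord(ch):04X}"


# The same position can carry different characters in different options:
# 10/4 is CURRENCY SIGN in Latin-1 and EURO SIGN in Latin-9.
latin1_id = ucs_id(0xA4, "iso8859-1")
latin9_id = ucs_id(0xA4, "iso8859-15")

# Right-hand-half positions where the two code tables disagree.
differing = [
    b for b in range(0xA0, 0x100)
    if ucs_id(b, "iso8859-1") != ucs_id(b, "iso8859-15")
]
```

Listing `differing` shows which positions a code transformation function between the two options would need to handle specially.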
Table A.1 – Character set ISO-IR 6 (Basic Latin), coded represen
...