kSIST ISO 21636-1:2024
Language coding — A framework for language varieties — Part 1: Vocabulary
International Standard
ISO 21636-1
First edition
2024-06
Language coding — A framework for language varieties — Part 1: Vocabulary
Codage des langues — Identification et description des variétés de langues — Partie 1: Vocabulaire
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Terms related to language and languages
3.2 Terms related to linguistic variation and language varieties
3.3 Terms related to dimensions of linguistic variation
3.4 Terms related to types of language varieties
3.5 Terms related to specific language modalities
3.6 Terms related to major language registers
3.7 Terms related to the documentation of language resources
3.8 Terms related to certainty
Bibliography
Index
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 2, Terminology workflow and language coding.
A list of all parts of the ISO 21636 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
An increasing number of digital language resources (LRs) are being created (including via retro-digitization),
archived, processed and analysed. Within this context, the detailed and exact characterization of language
varieties present in a given language use event is quickly gaining importance. Here, language use includes
all modalities such as written, spoken, or signed, and also new forms of language use supported by digital
technology (in social media and similar forms of digital communication). Such modalities demonstrate one
way in which languages vary internally. Others include, for instance, familiar regional (dialectal) and social
variation.
In the past, a primary goal of working with LRs was the archiving and preservation of LRs. However, new
goals have now emerged and are still emerging:
— Institutions and individuals need to exchange metadata (i.e. bibliographic description data and other
secondary information) for making the information on existing LRs widely available in a harmonized form.
— Researchers are identifying primary data (i.e. the LRs themselves) for various research purposes,
including research on linguistic variation.
— Researchers and developers need LRs for the development of more advanced language technologies (LTs)
and for testing purposes, because LTs, in particular those concerning speech recognition and language
analysis, are entering more dimensions of human communication.
In order to achieve the above-mentioned goals and purposes, along with others not outlined in the
ISO 21636 series, a standardized set of metadata for the identification of language varieties is important
for guaranteeing the frictionless exchange of secondary information. Well-organized metadata also help
to indicate the degree of interoperability (i.e. the re-usability and re-purposability of LRs), and the
applicability of LTs to different situations or LRs over time. These metadata are applicable in eBusiness,
eHealth, eGovernment, eInclusion, eLearning, smart environments, ambient assisted living (AAL), and
virtually all other information-rich applications which depend on information about LRs. A clear metadata
approach is also a prerequisite for the durability of LR archiving (in particular in the case of cultural heritage
and scientific research data).
ISO 639 provides a framework for identifying the individual languages used in an LR. The ISO 21636 series
presupposes and complements ISO 639 in that it extends the language coding framework in order to allow for
the identification of different types of language varieties (e.g. geographical, social, modal). The identification
of language varieties can then be included in general metadata, library metadata and archival metadata for
describing LRs (which may also include technical information, time and location of recording, and similar
general information, which are not included in the ISO 21636 series).
The conceptual framework developed in this document for dealing with linguistic variation respects the
major approaches represented in the linguistic literature without simply reproducing them. In general orientation
and in a number of details, such as the role assigned to idiolects, the framework is, however, closest to work of
the type represented by Lieb [5].
This document comprises:
— terms and definitions underlying a general conceptual framework to coherently deal with language-
internal linguistic variation;
— terms and definitions for a set of dimensions for identifying and describing language varieties.
Stakeholders include, but are not limited to:
— information and communication technologies (ICTs) industry (including LTs);
— libraries;
— the media industry (including entertainment);
— internet communities;
— people engaging in language documentation and preservation;
— language archivists;
— researchers (linguists, in particular sociolinguists, ethnologists, sociologists, etc.);
— people and institutions providing language training;
— emerging new user communities.
It is anticipated that these stakeholders will need to refer not only to a certain individual language, but also
to a certain language variety, for instance for oral human-computer interaction, or for tailoring a certain
LR or LT to the needs and specific environment of a target user group. An initial step towards achieving
the needed specificity involves the ability to identify the dimension(s) of linguistic variation internal to
the individual languages involved, and the respective relevant language varieties. A conceptually sound uniform
framework of reference as developed in the ISO 21636 series is superior to the proliferation of different
individual ad-hoc solutions.
International Standard ISO 21636-1:2024(en)
Language coding — A framework for language varieties —
Part 1:
Vocabulary
1 Scope
The ISO 21636 series provides a framework for the identification and description of varieties of all individual
human languages (see ISO 639).
It is applicable to sign languages.
It does not apply to:
— artificial means of communication with or between machines (such as programming languages);
— those means of human communication which are neither fully nor largely equivalent to human language
(such as sets of individual symbols or gestures that each carry isolated meanings but cannot be freely
combined into complex expressions).
This document defines the terms necessary to identify basic dimensions and sub-dimensions of linguistic
variation and the resulting varieties, including major modalities of human communication.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1 Terms related to language and languages
3.1.1
human language
means of communication characterized by a systematic use of sounds, visual-spatial signs, characters or
other written symbols or signs that can be combined to express or communicate meaning or a message
between humans
Note 1 to entry: Human language was originally developed for, and mainly used in, direct communication between
humans. Today its use is increasingly supported by information and communication technologies (ICTs).
Note 2 to entry: As the term “language” can represent different concepts, it is not listed as a synonym to the term
“human language”.
Note 3 to entry: Visual-spatial signs are indicated under signed modality (3.5.4).
3.1.2
idiolect
comprehensive set of all expressions of human language (3.1.1) with their meaning, characterized by a
coherent system of structural features, which is capable of coding complex facts and thoughts, potentially
used by a given individual person, in a given type of situation, at a given time, and in a given medium
Note 1 to entry: Typically, a person has command of several idiolects of an individual language (3.1.3), for instance
written and spoken idiolects (belonging to different language modalities; see 3.4.7 and 3.5), and idiolects for situations
with different degrees of formality (belonging to different language registers; see 3.4.8 and 3.6).
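The following Python sketch is an illustrative reading, not part of the definition. It treats the four parameters named in 3.1.2 (person, type of situation, time, medium) as the identifier of an idiolect and shows one person commanding several idiolects, as described in Note 1 to entry. The class and function names are hypothetical, situation type is simplified to a degree of formality, and medium is simplified to a modality.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Idiolect:
    """Hypothetical identifier for an idiolect (3.1.2).

    The four fields mirror the parameters in the definition: a given person,
    type of situation, time and medium. The set of expressions and the
    structural system that make up the idiolect itself are not modelled.
    """

    person: str
    situation_type: str  # simplified here to a degree of formality (register)
    time: str            # e.g. a period in the person's life
    medium: str          # simplified here to a modality: "written", "spoken", ...


def idiolects_per_person(idiolects: list[Idiolect]) -> dict[str, set[Idiolect]]:
    """Group idiolects by the person who commands them."""
    grouped: dict[str, set[Idiolect]] = defaultdict(set)
    for idiolect in idiolects:
        grouped[idiolect.person].add(idiolect)
    return dict(grouped)


observed = [
    Idiolect("speaker-1", "formal", "2024", "written"),
    Idiolect("speaker-1", "informal", "2024", "spoken"),
    Idiolect("speaker-1", "formal", "2024", "spoken"),
    Idiolect("speaker-2", "informal", "2024", "spoken"),
]
by_person = idiolects_per_person(observed)
print(len(by_person["speaker-1"]))  # 3
print(sorted({i.medium for i in by_person["speaker-1"]}))  # ['spoken', 'written']
```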
3.1.3
individual language
individual human language
largest set of idiolects (3.1.2), used by different speakers (3.1.5), which are all interconnected through
high mutual intelligibility, or through a chain of high mutual intelligibility, or which are sociopolitically
considered as a unit equivalent to such a largest set
Note 1 to entry: Individual languages also encompass constructed langua
...
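The "chain of high mutual intelligibility" criterion in 3.1.3 can be read as grouping idiolects into the connected components of a graph whose edges join pairs of idiolects judged to be highly mutually intelligible. Each maximal component then corresponds to a "largest set". The Python sketch below illustrates that reading under stated assumptions: the intelligibility judgements are supplied as input, the threshold for "high" is left to whoever supplies them, and the alternative sociopolitical criterion in the definition is not modelled. The function name and idiolect labels are hypothetical.

```python
from collections import defaultdict, deque


def chain_connected_sets(idiolects, intelligible_pairs):
    """Group idiolects into maximal sets linked by chains of high mutual intelligibility.

    idiolects: iterable of hashable idiolect identifiers (hypothetical labels).
    intelligible_pairs: iterable of (a, b) pairs judged to have high mutual
    intelligibility; the relation is treated as symmetric.
    Returns a list of frozensets, one per connected component.
    """
    neighbours = defaultdict(set)
    for a, b in intelligible_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in idiolects:
        if start in seen:
            continue
        # Breadth-first search collects every idiolect reachable by a chain.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(frozenset(component))
    return components


# A and C are not directly intelligible, but are linked through B.
sets = chain_connected_sets(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
print(sorted(sorted(s) for s in sets))  # [['A', 'B', 'C'], ['D']]
```

Breadth-first search is used here only because it makes the chain explicit. A union-find structure would give the same grouping.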