Language coding — A framework for language varieties — Part 3: Application of the framework

The ISO 21636 series provides a framework for the identification and description of varieties of all individual human languages (see ISO 639).  
It is applicable to sign languages.  
It does not apply to:
— artificial means of communication with or between machines (such as programming languages);  
— those means of human communication which are neither fully nor largely equivalent to human language (such as sets of individual symbols or gestures that each carry isolated meanings but cannot be freely combined into complex expressions).
This document gives guidance on how to apply the framework to identify basic dimensions and sub-dimensions of linguistic variation and the resulting varieties, including major modalities of human communication. It does not include any code or individual identifiers.
This document is structured strictly analogously to ISO/TR 21636-2. For a general description of the dimensions and varieties dealt with in each clause, the user can refer to the corresponding clause in that document.
This document focuses only on the identification and description of language varieties, not on the general, formal or technical aspects of the description of human language resources (LRs), which are covered by general metadata frameworks.
NOTE 1 For the general description of a language resource, a user can apply, as a minimum, the metadata of the Open Language Archives Community (OLAC) metadata standard, which provides an application of the Dublin Core metadata element set as defined by the Dublin Core Metadata Initiative (DCMI). These descriptors have been recognized in ISO 15836-1:2017.
NOTE 2 The Component Metadata Infrastructure (CMDI) provides a best practice guide for the sake of technical and content interoperability between LRs as well as of their sustainability.

Codage des langues — Identification et description des variétés de langues — Partie 3: Exigences et recommandations pour la mise en œuvre

Jezikovno kodiranje - Ogrodje za jezikovne različice - 3. del: Uporaba ogrodja

General Information

Status: Published
Publication Date: 06-Oct-2024
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 17-Sep-2024
Due Date: 22-Nov-2024
Completion Date: 07-Oct-2024
Standard
SIST ISO 21636-3:2024
English language
16 pages
Standard
ISO 21636-3:2024 - Language coding — A framework for language varieties — Part 3: Application of the framework. Released: 1. 06. 2024
English language
10 pages

Standards Content (Sample)


SLOVENIAN STANDARD
01-November-2024
Jezikovno kodiranje - Ogrodje za jezikovne različice - 3. del: Uporaba ogrodja
Language coding — A framework for language varieties — Part 3: Application of the framework
Codage des langues — Identification et description des variétés de langues — Partie 3: Exigences et recommandations pour la mise en œuvre
This Slovenian standard is identical to: ISO 21636-3:2024
ICS:
01.140.20 Information sciences
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

International Standard
ISO 21636-3
First edition
2024-06
Language coding — A framework for language varieties —
Part 3:
Application of the framework
Codage des langues — Identification et description des variétés de langues —
Partie 3: Exigences et recommandations pour la mise en œuvre
Reference number
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions and abbreviated terms
3.1 Terms and definitions
3.2 Abbreviated terms
4 Indication of language varieties according to the dimensions of linguistic variation
4.1 Overview
4.2 Indication of individual language varieties
4.3 Indication of the (geographical) space dimension of linguistic variation
4.4 Indication of the time dimension of linguistic variation
4.5 Indication of the social group dimension of linguistic variation
4.6 Indication of the medium dimension of linguistic variation
4.7 Indication of the situation dimension of linguistic variation
4.8 Indication of the person dimension of linguistic variation
4.9 Indication of the proficiency dimension of linguistic variation
4.10 Indication of the communicative functioning dimension of linguistic variation
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 2, Terminology workflow and language coding.
A list of all parts of the ISO 21636 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
An increasing number of digital language resources (LRs) are being created (including via retro-digitization),
archived, processed and analysed. Within this context, the detailed and exact characterization of language
varieties present in a given language use event is quickly gaining importance. Here, language use includes
all modalities such as written, spoken, or signed, and also new forms of language use supported by digital
technology (in social media and similar forms of digital communication). Such modalities demonstrate one
way in which languages vary internally. Others include, for instance, familiar regional (dialectal) and social
variation.
In the past, a primary goal of working with LRs was the archiving and preservation of LRs. However, new
goals have now emerged and are still emerging:
— Institutions and individuals need to exchange metadata (i.e. bibliographic description data and other
secondary information) for making the information on existing LRs widely available in a harmonized form.
— Researchers are identifying primary data (i.e. the LRs themselves) for various research purposes,
including research on linguistic variation.
— Researchers and developers need LRs for the development of more advanced language technologies (LTs)
and for testing purposes, because LTs, in particular those concerning speech recognition and language
analysis, are entering more dimensions of human communication.
In order to achieve the above-mentioned goals and purposes, along with others not outlined in the
ISO 21636 series, a standardized set of metadata for the identification of language varieties is important
for guaranteeing the frictionless exchange of secondary information. Well-organized metadata also help
to indicate the degree of interoperability (equalling re-usability and re-purposability of LRs), and the
applicability of LTs to different situations or LRs over time. These metadata are applicable in eBusiness,
eHealth, eGovernment, eInclusion, eLearning, smart environments, ambient assisted living (AAL), and
virtually all other information-rich applications which depend on information about LRs. A clear metadata
approach is also a prerequisite for the durability of LR archiving (in particular in the case of cultural heritage
and scientific research data).
ISO 639 provides a framework for identifying the individual languages used in an LR. The ISO 21636 series
presupposes and complements ISO 639 in that it extends the language coding framework in order to allow for
the identification of different types of language varieties (e.g. geographical, social, modal). The identification
of language varieties can then be included in general metadata, library metadata and archival metadata for
describing LRs (which may also include technical information, time and location of recording, and similar
general information, which are not included in the ISO 21636 series).
The conceptual framework developed in this document for dealing with linguistic variation respects the
major approaches represented in the linguistic literature without simply reproducing them. The framework
is closest though in general orientation and in a number of details, such as the role assigned to idiolects, to
work of a type represented by Lieb [6].
The metadata categories and values addressed in this document can be candidates for a future fine-grained
coding of language varieties based on the comprehensive principles of the ISO 21636 series. Thus, this
document fits within the general framework of the ISO/IEC 11179 series for metadata.
Stakeholders include, but are not limited to:
— information and communication technologies (ICTs) industry (including LTs);
— libraries;
— the media industry (including entertainment);
— internet communities;
— people engaging in language documentation and preservation;
— language archivists;
— researchers (linguists, in particular sociolinguists, ethnologists, sociologists, etc.);
— people and institutions providing language training;
— emerging new user communities.
It is anticipated that these stakeholders will need to refer not only to a certain individual language, but also
to a certain language variety, for instance for oral human-computer interaction, or for tailoring a certain
LR or LT to the needs and specific environment of a target user group. An initial step towards achieving
the needed specificity involves the ability to identify the dimension(s) of linguistic variation internal to
individual languages involved, and the respective relevant language varieties. A conceptually sound uniform
framework of reference as developed in the ISO 21636 series is superior to the proliferation of different
individual ad-hoc solutions.
International Standard ISO 21636-3:2024(en)
Language coding — A framework for language varieties —
Part 3:
Application of the framework
1 Scope
The ISO 21636 series provides a framework for the identification and description of varieties of all individual
human languages (see ISO 639).
It is applicable to sign languages.
It does not apply to:
— artificial means of communication with or between machines (such as programming languages);
— those means of human communication which are neither fully nor largely equivalent to human language
(such as sets of individual symbols or gestures that each carry isolated meanings but cannot be freely
combined into complex expressions).
This document gives guidance on how to apply the framework to identify basic dimensions and sub-
dimensions of linguistic variation and the resulting varieties, including major modalities of human
communication. It does not include any code or individual identifiers.
This document is structured strictly analogously to ISO/TR 21636-2. For a general description of the
dimensions and varieties dealt with in each clause, the user can refer to the corresponding clause in that
document.
This document focuses only on the identification and description of language varieties, not on the general,
formal or technical aspects of the description of human language resources (LRs), which are covered by
general metadata frameworks.
NOTE 1 For the general description of a language resource, a user can apply, as a minimum, the metadata of
the Open Language Archives Community (OLAC) metadata standard [7], which provides an application of the Dublin
Core metadata element set as defined by the Dublin Core Metadata Initiative (DCMI) [8]. These descriptors have been
recognized in ISO 15836-1:2017.
NOTE 2 The Component Metadata Infrastructure (CMDI) [9] provides a best practice guide [10] for the sake of
technical and content interoperability between LRs as well as of their sustainability.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 21636-1, Language coding — A framework for language varieties — Part 1: Vocabulary

3 Terms, definitions and abbreviated terms
3.1 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 21636-1 apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.2 Abbreviated terms
ACTFL American Council on the Teaching of Foreign Languages
AAL ambient assisted living
BCP Best Current Practice
CEFRL Common European Framework of Reference for Languages
CMDI Component Metadata Infrastructure
DCMI Dublin Core Metadata Initiative
ICT information and communication technology
IETF Internet Engineering Task Force
ILR Interagency Language Roundtable
LR language resource
LT language technology
OLAC Open Language Archives Community
4 Indication of language varieties according to the dimensions of linguistic variation
4.1 Overview
All major metadata sets for the formal description of human communication and LRs include the
identification of the individual language(s) of the LR in accordance with ISO 639.
The ISO 21636 series, in turn, specifies which type of descriptors are needed to exhaustively account for
the place of an event of language use or an LR in the multidimensional space of linguistic variation within
an individual language. Therefore, the ISO 21636 series represents the basis for extensions to the major
metadata standards such as the OLAC [7] or CMDI [9] metadata formats.
In particular, it can be employed as the basis for a coherent system of extensions to the widely used IETF Best
Current Practice 47 (BCP 47) [11], which specifies “Language Tags”. BCP 47 already includes the identification
of language varieties (in particular, dialects, scripts and regions), but the BCP 47 recommendations are far
from complete. It is also feasible to define extensions to subtag elements. These extensions may be used
to implement the concepts established in this document, for example by establishing one extension for
each dimension of linguistic variation, or just one extension that combines both the dimension of linguistic
variation and its assigned value.
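As a rough illustration of the second option (a single extension that carries both the dimension and its value), the sketch below appends dimension/value pairs to a language tag using the BCP 47 private-use mechanism (`x-`). The dimension keys (`dialect`, `medium`), the values and the function name are hypothetical and are not registered anywhere; this document does not define any such codes. The only BCP 47 rule relied on is that each private-use subtag consists of 1 to 8 ASCII letters or digits.

```python
import re

# BCP 47 private-use subtags: 1 to 8 ASCII letters or digits each.
_SUBTAG = re.compile(r"^[A-Za-z0-9]{1,8}$")


def add_variety_subtags(base_tag: str, varieties: dict) -> str:
    """Append hypothetical dimension/value pairs as private-use subtags.

    `base_tag` is assumed to be a well-formed tag without a private-use
    part; this sketch does not validate it.
    """
    parts = []
    for dimension, value in varieties.items():
        for subtag in (dimension, value):
            if not _SUBTAG.match(subtag):
                raise ValueError(f"not a valid private-use subtag: {subtag!r}")
        parts.extend([dimension.lower(), value.lower()])
    if not parts:
        return base_tag
    return base_tag + "-x-" + "-".join(parts)


tag = add_variety_subtags("de", {"dialect": "bavarian", "medium": "spoken"})
print(tag)  # de-x-dialect-bavarian-medium-spoken
```

Private-use subtags are only meaningful by agreement between the exchanging parties, so a registered extension would be the more durable choice once concrete values exist.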
With the exception of examples and some very general values (e.g. in the medium dimension), this document
does not propose any concrete values to be used within the framework defined. A mechanism to register
these values will complement this document in future editions, providing an implementation based on this
document.
Certain dimensions of linguistic variation are for all practical purposes open lists, such as dialects (different
for each individual language), language periods (potentially different for each individual language)
and sociolects (some potentially specific to certain individual languages). According to this document,
the language modalities and language registers as well as proficiency levels of learner varieties and
communicative functioning abilities and constraints comprise shorter lists. The individual dimension
of linguistic variation is already covered by identifying the speaker or speakers, which often happens
elsewhere in the metadata and will usually not need to be coded among the language varieties.
It is important for all stakeholders to be aware of the complexity of linguistic variation within individual
languages and that at least the eight different dimensions of linguistic variation identified and described in
the ISO 21636 series are needed to cope with these complexities. This document develops a conception of
individual languages as sets of idiolects, and varieties as subsets that are characterized by external criteria
(containing properties related to space, time, social space, medium, etc.) and structural criteria (properties
related to pronunciation, lexical items, etc.). This conception is offered as a capable model for dealing with
linguistic variation for the purpose of standardization. The need to recognize the establishment of individual
varieties is an empirical question, and also practically depends on the application. As such, individual values
and resulting tags can be subject to scholarly debate, but this does not take away the need for a general
framework of reference.
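The conception of a language as a set of idiolects, with varieties as subsets picked out by criteria, can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `Idiolect` fields, the sample data and the `variety` helper are invented here, and only two external criteria (space and time) are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idiolect:
    speaker: str
    region: str   # external criterion (space), hypothetical field
    period: str   # external criterion (time), hypothetical field


# A language modelled as a set of idiolects (invented sample data).
language = {
    Idiolect("A", "north", "contemporary"),
    Idiolect("B", "south", "contemporary"),
    Idiolect("C", "south", "historical"),
}


def variety(idiolects, **criteria):
    """Return the subset of idiolects that match all given criteria."""
    return {
        i for i in idiolects
        if all(getattr(i, name) == value for name, value in criteria.items())
    }


southern = variety(language, region="south")
assert southern <= language  # a variety is a subset of the language
print(sorted(i.speaker for i in southern))  # ['B', 'C']
```

Combining criteria, for example `variety(language, region="south", period="historical")`, yields the intersection of the corresponding varieties.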
4.2 Indication of individual language varieties
Any given event of language use, represented in an LR, belongs simultaneously to a certain dialect, to a
certain language period, to a certain language modality (it may be written, or oral, etc.), to a certain sociolect,
etc. Therefore, the values for each of these dimensions of linguistic variation can and should be stated side
by side.
For the sake of optimal re-usability, it is generally advisable to identify the language varieties used in a
given LR for as many dimensions of linguistic variation as possible, as far as they are known. For this purpose,
established conventions and labels for identifying specific varieties should be followed whenever possible.
For the retrieval and re-use of LRs, the specification of language varieties can be crucial, even if they were
not the focus of the creators of the LRs at hand. Therefore, the position of a given LR should ideally be made
explicit for each of the dimensions of linguistic variation.
However, it is not possible or relevant in all cases to indicate the respective varieties according to all
dimensions of linguistic variation. In the course of the description and identification of the language varieties
at hand, omission of a dimension of linguistic variation should also be made explicit, for instance by marking
it as “unspecified”, or with a more specific value, such as “unknown”.
EXAMPLE 1 [dialect:] unspecified.
For the purpose of practicality, it may be established that leaving out a certain dimension of linguistic
variation implies that the value (variety) for that dimension of linguistic variation is unspecified, or that
it has a certain value – for instance, leaving out the proficiency dimension specification can imply
that only native speakers are involved. However, such implied values should be stated explicitly at some
prominent place in the general description of the LRs at hand.
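One way to make omissions explicit is to expand every description to the full list of dimensions before it is stored or exchanged. The sketch below assumes dimension keys derived from the titles of 4.3 to 4.10 and a hypothetical `implied` mapping in which a resource description declares the values it treats as implied; neither is defined by this document.

```python
# Dimension names follow the clause titles 4.3 to 4.10; the keys are illustrative.
DIMENSIONS = (
    "space", "time", "social group", "medium",
    "situation", "person", "proficiency", "communicative functioning",
)


def make_explicit(indications, implied=None):
    """Return one value per dimension, never leaving a dimension out.

    `implied` holds values that the resource description declares to be
    implied when a dimension is omitted (hypothetical convention).
    """
    implied = implied or {}
    unknown = set(indications) - set(DIMENSIONS)
    if unknown:
        raise KeyError(f"unknown dimension(s): {sorted(unknown)}")
    return {
        dim: indications.get(dim, implied.get(dim, "unspecified"))
        for dim in DIMENSIONS
    }


record = make_explicit(
    {"space": "Bavarian", "medium": "spoken"},
    implied={"proficiency": "native speaker"},
)
print(record["time"])         # unspecified
print(record["proficiency"])  # native speaker
```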
The specification of any of the dimensions of linguistic variation can vary with respect to its reliability or
certainty status – for example, some values are assumed or inferred, while others are certain or confirmed.
This certainty status of the specification should be made explicit by adding a certainty status label
directly to the language variety indication in question, such as “unconfirmed”, or more specifically
“assumed” or “inferred”, or similar. This label then indicates a property of the respective
statement of the language variety, not of the event of language use or the LR in question. In this sense,
it is meta-meta-information.
EXAMPLE 2 [dialect:] Bavarian (certainty: inferred).
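The notation used in EXAMPLE 1 and EXAMPLE 2 can be produced from a small record type that keeps the certainty label attached to the individual indication rather than to the resource as a whole. The class and field names below are assumptions made for illustration; the document prescribes no data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VarietyIndication:
    label: str                       # e.g. "dialect"
    value: str = "unspecified"       # explicit marker for an omitted value
    certainty: Optional[str] = None  # e.g. "inferred", "assumed", "unconfirmed"

    def render(self) -> str:
        """Format the indication in the style of the examples in 4.2."""
        text = f"[{self.label}:] {self.value}"
        if self.certainty is not None:
            text += f" (certainty: {self.certainty})"
        return text


print(VarietyIndication("dialect").render())
# [dialect:] unspecified
print(VarietyIndication("dialect", "Bavarian", certainty="inferred").render())
# [dialect:] Bavarian (certainty: inferred)
```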

The fact that an LR contains several events of language use that belong to different language varieties
belonging to the same dimension of linguistic variation, such as a dialogue between speakers that use
different dialects, should be clearly stated.
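A description that covers several events of language use can be checked for dimensions in which more than one variety occurs, so that the mix can be stated clearly. The sketch below is one possible approach; the event records, the second dialect name and the function names are invented sample material.

```python
def varieties_by_dimension(events):
    """Collect the distinct values per dimension across all events."""
    collected = {}
    for event in events:
        for dimension, value in event.items():
            values = collected.setdefault(dimension, [])
            if value not in values:
                values.append(value)
    return collected


def mixed_dimensions(events):
    """Return the dimensions in which the events use more than one variety."""
    return sorted(
        dimension
        for dimension, values in varieties_by_dimension(events).items()
        if len(values) > 1
    )


# Two speakers in one dialogue, each using a different dialect (sample data).
events = [
    {"dialect": "Bavarian", "medium": "spoken"},
    {"dialect": "Swabian", "medium": "spoken"},
]
print(mixed_dimensions(events))  # ['dialect']
```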
Sometimes, the idiolect used in an LR is deliberately chosen to imitate that of a different variety than the
one that would naturally apply to the speaker or situation. In such cases, the fact that a language variety is
applied to imitate another language variety should be indicated by a qualification such as “adopted”, “non-
original”, “imitated” or similar. Again, this qualification applies only to the one variety in question and not to
the speech-event as such.
EXAMPLE 3 [time period:] Victorian Engl
...


International
Standard
ISO 21636-3
First edition
Language coding — A framework
2024-06
for language varieties —
Part 3:
Application of the framework
Codage des langues — Identification et description des variétés
de langues —
Partie 3: Exigences et recommandations pour la mise en œuvre
Reference number
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii
Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms, definitions and abbreviated terms . 2
3.1 Terms and definitions .2
3.2 Abbreviated terms .2
4 Indication of language varieties according to the dimensions of linguistic variation . 2
4.1 Overview .2
4.2 Indication of individual language varieties .3
4.3 Indication of the (geographical) space dimension of linguistic variation .4
4.4 Indication of the time dimension of linguistic variation .4
4.5 Indication of the social group dimension of linguistic variation .5
4.6 Indication of the medium dimension of linguistic variation .5
4.7 Indication of the situation dimension of linguistic variation .7
4.8 Indication of the person dimension of linguistic variation .8
4.9 Indication of the proficiency dimension of linguistic variation .8
4.10 Indication of the communicative functioning dimension of linguistic variation .8
Bibliography .10

iii
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 2, Terminology workflow and language coding.
A list of all parts of the ISO 21636 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

iv
Introduction
An increasing amount of digital language resources (LRs) are being created (including via retro-digitization),
archived, processed and analysed. Within this context, the detailed and exact characterization of language
varieties present in a given language use event is quickly gaining importance. Here, language use includes
all modalities such as written, spoken, or signed, and also new forms of language use supported by digital
technology (in social media and similar forms of digital communication). Such modalities demonstrate one
way in which languages vary internally. Others include, for instance, familiar regional (dialectal) and social
variation.
In the past, a primary goal of working with LRs was the archiving and preservation of LRs. However, new
goals have now emerged and are still emerging:
— Institutions and individuals need to exchange metadata (i.e. bibliographic description data and other
secondary information) for making the information on existing LRs widely available in a harmonized form.
— Researchers are identifying primary data (i.e. the LRs themselves) for various research purposes,
including research on linguistic variation.
— Researchers and developers need LRs for the development of more advanced language technologies (LTs)
and for testing purposes, because LTs, in particular those concerning speech recognition and language
analysis, are entering more dimensions of human communication.
In order to achieve the above-mentioned goals and purposes, along with others not outlined in the
ISO 21636 series, a standardized set of metadata for the identification of language varieties is important
for guaranteeing the frictionless exchange of secondary information. Well-organized metadata also help
to indicate the degree of interoperability (equalling re-usability and re-purposability of LRs), and the
applicability of LTs to different situations or LRs over time. These metadata are applicable in eBusiness,
eHealth, eGovernment, eInclusion, eLearning, smart environments, ambient assisted living (AAL), and
virtually all other information-rich applications which depend on information about LRs. A clear metadata
approach is also a prerequisite for the durability of LR archiving (in particular in the case of cultural heritage
and scientific research data).
ISO 639 provides a framework for identifying the individual languages used in an LR. The ISO 21636 series
presupposes and complements ISO 639 in that it extends the language coding framework in order to allow for
the identification of different types of language varieties (e.g. geographical, social, modal). The identification
of language varieties can then be included in general metadata, library metadata and archival metadata for
describing LRs (which may also include technical information, time and location of recording, and similar
general information, which are not included in the ISO 21636 series).
The conceptual framework developed in this document for dealing with linguistic variation respects the
major approaches represented in the linguistic literature without simply reproducing them. The framework
is closest though in general orientation and in a number of details, such as the role assigned to idiolects, to
[6]
work of a type represented by Lieb .
The metadata categories and values addressed in this document can be candidates for a future fine-grained
coding of language varieties based on the comprehensive principles of the ISO 21636 series. Thus, this
document fits within the general framework of the ISO/IEC 11179 series for metadata.
Stakeholders include, but are not limited to:
— information and communication technologies (ICTs) industry (including LTs);
— libraries;
— the media industry (including entertainment);
— internet communities;
— people engaging in language documentation and preservation;
— language archivists;
— researchers (linguists, in particular sociolinguists, ethnologists, sociologists, etc.);
— people and institutions providing language training;
— emerging new user communities.
It is anticipated that these stakeholders will need to refer not only to a certain individual language, but also
to a certain language variety, for instance for oral human-computer interaction, or for tailoring a certain
LR or LT to the needs and specific environment of a target user group. An initial step towards achieving
the needed specificity involves the ability to identify the dimension(s) of linguistic variation internal to
individual languages involved, and the respective relevant language varieties. A conceptually sound uniform
framework of reference as developed in the ISO 21636 series is superior to the proliferation of different
individual ad-hoc solutions.
International Standard ISO 21636-3:2024(en)
Language coding — A framework for language varieties —
Part 3:
Application of the framework
1 Scope
The ISO 21636 series provides a framework for the identification and description of varieties of all individual
human languages (see ISO 639).
It is applicable to sign languages.
It does not apply to:
— artificial means of communication with or between machines (such as programming languages);
— those means of human communication which are neither fully nor largely equivalent to human language
(such as sets of individual symbols or gestures that each carry isolated meanings but cannot be freely
combined into complex expressions).
This document gives guidance on how to apply the framework to identify basic dimensions and sub-
dimensions of linguistic variation and the resulting varieties, including major modalities of human
communication. It does not include any code or individual identifiers.
This document is structured strictly analogously to ISO/TR 21636-2. For a general description of the
dimension and varieties dealt with in each clause, the user can refer to the corresponding clause in that
document.
This document focuses only on the identification and description of language varieties, not on the general,
formal or technical aspects of the description of human language resources (LRs), which are covered by
general metadata frameworks.
NOTE 1 For the general description of a language resource, a user can apply, as a minimum, the metadata of
the Open Language Archives Community (OLAC) metadata standard [7], which provides an application of the Dublin
Core metadata element set as defined by the Dublin Core Metadata Initiative (DCMI) [8]. These descriptors have been
recognized in ISO 15836-1:2017.
NOTE 2 The Component Metadata Infrastructure (CMDI) [9] provides a best practice guide [10] for the sake of
technical and content interoperability between LRs as well as of their sustainability.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 21636-1, Language coding — A framework for language varieties — Part 1: Vocabulary

3 Terms, definitions and abbreviated terms
3.1 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 21636-1 apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.2 Abbreviated terms
ACTFL American Council on the Teaching of Foreign Languages
AAL ambient assisted living
BCP Best Current Practice
CEFRL Common European Framework of Reference for Languages
CMDI Component Metadata Infrastructure
DCMI Dublin Core Metadata Initiative
ICT information and communication technology
IETF Internet Engineering Task Force
ILR Interagency Language Roundtable
LR language resource
LT language technology
OLAC Open Language Archives Community
4 Indication of language varieties according to the dimensions of linguistic variation
4.1 Overview
All major metadata sets for the formal description of human communication and LRs include the
identification of the individual language(s) of the LR in accordance with ISO 639.
The ISO 21636 series, in turn, specifies which types of descriptors are needed to exhaustively account for
the place of an event of language use or an LR in the multidimensional space of linguistic variation within
an individual language. Therefore, the ISO 21636 series represents the basis for extensions to the major
metadata standards such as the OLAC [7] or CMDI [9] metadata formats.
In particular, it can be employed as the basis for a coherent system of extensions to the widely used IETF Best
Current Practice 47 (BCP 47) [11], which specifies “Language Tags”. BCP 47 already includes the identification
of language varieties (in particular, dialects, scripts and regions), but the BCP 47 recommendations are far
from complete. It is also feasible to define extensions to subtag elements. These extensions may be used
to implement the concepts established in this document, for example by establishing one extension for
each dimension of linguistic variation, or just one extension that combines both the dimension of linguistic
variation and its assigned value.
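The second option mentioned above (one extension combining dimension and value) can be illustrated with a minimal Python sketch. It is not part of BCP 47 or of the ISO 21636 series. It uses the private-use subtag "x", which BCP 47 permits without registration, as a stand-in for a possible future registered extension. The function name and the dimension keys and values ("dialect", "bavarian", "modality", "oral") are hypothetical labels chosen only for illustration.

```python
import re

# BCP 47 private-use subtags consist of 1 to 8 ASCII letters or digits.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def private_use_tag(language, varieties):
    """Append dimension/value pairs to a language subtag as private-use subtags.

    `language` is assumed to be a valid ISO 639 code; it is not validated here.
    `varieties` maps hypothetical dimension keys to hypothetical value labels.
    """
    if not varieties:
        raise ValueError("at least one dimension/value pair is required")
    parts = [language, "x"]
    for dimension, value in varieties.items():
        for subtag in (dimension, value):
            if SUBTAG.fullmatch(subtag) is None:
                raise ValueError(f"not a valid private-use subtag: {subtag!r}")
            parts.append(subtag.lower())
    return "-".join(parts)


print(private_use_tag("de", {"dialect": "bavarian", "modality": "oral"}))
# de-x-dialect-bavarian-modality-oral
```

Because subtags are limited to eight characters, longer labels would need abbreviations. This is one reason why a registry of abbreviated labels would be useful in practice.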
With the exception of examples and some very general values (e.g. in the medium dimension), this document
does not propose any concrete values to be used within the framework defined. A mechanism to register
these values will complement this document in future editions, providing an implementation based on this
document.
Certain dimensions of linguistic variation are for all practical purposes open lists, such as dialects (different
for each individual language), language periods (potentially different for each individual language)
and sociolects (some potentially specific to certain individual languages). According to this document,
the language modalities and language registers as well as proficiency levels of learner varieties and
communicative functioning abilities and constraints comprise shorter lists. The individual dimension
of linguistic variation is already covered by identifying the speaker or speakers, which often happens
elsewhere in the metadata and will usually not need to be coded among the language varieties.
It is important for all stakeholders to be aware of the complexity of linguistic variation within individual
languages and that at least the eight different dimensions of linguistic variation identified and described in
the ISO 21636 series are needed to cope with these complexities. This document develops a conception of
individual languages as sets of idiolects, and varieties as subsets that are characterized by external criteria
(containing properties related to space, time, social space, medium, etc.) and structural criteria (properties
related to pronunciation, lexical items, etc.). This conception is offered as a capable model for dealing with
linguistic variation for the purpose of standardization. The need to recognize the establishment of individual
varieties is an empirical question, and also practically depends on the application. As such, individual values
and resulting tags can be subject to scholarly debate, but this does not take away the need for a general
framework of reference.
4.2 Indication of individual language varieties
Any given event of language use, represented in an LR, belongs simultaneously to a certain dialect, to a
certain language period, to a certain language modality (it may be written, or oral, etc.), to a certain sociolect,
etc. Therefore, the values for each of these dimensions of linguistic variation can and should be stated side
by side.
For the sake of optimal re-usability, it is generally advisable to identify the language varieties used in a given
LR for as many dimensions of linguistic variation as possible, as far as they are known. For this purpose,
established conventions and labels for identifying specific varieties should be followed whenever possible.
For the retrieval and re-use of LRs, the specification of language varieties can be crucial, even if they were
not the focus of the creators of the LRs at hand. Therefore, the position of a given LR should ideally be made
explicit for each of the dimensions of linguistic variation.
However, it is not possible or relevant in all cases to indicate the respective varieties according to all
dimensions of linguistic variation. In the course of the description and identification of the language varieties
at hand, omission of a dimension of linguistic variation should also be made explicit, for instance by marking
it as “unspecified”, or with a more specific value, such as “unknown”.
EXAMPLE 1 [dialect:] unspecified.
For the purpose of practicality, it may be established that leaving out a certain dimension of linguistic
variation implies that the value (variety) for that dimension of linguistic variation is unspecified, or that
it belongs to a certain value – for instance, leaving out the proficiency dimension specification can imply
that only native speakers are involved. However, such implied values should be made explicit at some
prominent place in the general description of the LRs at hand.
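The two conventions above (explicit "unspecified" values and declared implied values) could be combined as in the following Python sketch. It only illustrates the idea. The dimension labels, the record layout with the keys "varieties" and "declared_defaults", and the function name are assumptions made for this example and are not defined by the ISO 21636 series.

```python
# Hypothetical labels for the dimensions of linguistic variation. The individual
# dimension is left out because it is usually covered by identifying the speakers.
DIMENSIONS = (
    "dialect",
    "time period",
    "modality",
    "sociolect",
    "register",
    "proficiency",
    "communicative functioning",
)


def resolve(description, dimension):
    """Return the variety stated for a dimension, falling back to a declared
    default and finally to "unspecified"."""
    if dimension not in DIMENSIONS:
        raise KeyError(f"unknown dimension: {dimension!r}")
    explicit = description.get("varieties", {})
    if dimension in explicit:
        return explicit[dimension]
    # The defaults are stored in the description itself, so the implied
    # values are visible to anyone who re-uses the LR.
    return description.get("declared_defaults", {}).get(dimension, "unspecified")


description = {
    "varieties": {"dialect": "unspecified", "modality": "oral"},
    "declared_defaults": {"proficiency": "native speaker"},
}

for name in DIMENSIONS:
    print(f"[{name}:] {resolve(description, name)}")
```

Keeping the declared defaults inside the description is a design choice made for this sketch. It ensures that the implied value travels with the LR metadata.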
The specification of any of the dimensions of linguistic variation can vary with respect to its reliability or
certainty status – for example, some values are assumed or inferred, while others are certain or confirmed.
This certainty status of the specification should be made explicit by adding a certainty status label
immediately to the language variety indication in question, such as “unconfirmed”, or more specifically
“assumed” or “inferred”, or similar. This label then indicates a property of the respective
statement of the language variety, not of the event of language use or the LR in question. In this sense,
it is meta-meta-information.
EXAMPLE 2 [dialect:] Bavarian (certainty: inferred).
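One possible way to keep the certainty status attached to a single variety indication, rather than to the LR as a whole, is sketched below in Python. The class name, the field names and the closed set of labels are assumptions for illustration. The labels themselves are taken from the wording of this subclause, which also allows similar labels.

```python
from dataclasses import dataclass
from typing import Optional

# Labels mentioned in the text above; treating them as a closed list is an
# assumption of this sketch.
CERTAINTY_LABELS = {"certain", "confirmed", "unconfirmed", "assumed", "inferred"}


@dataclass(frozen=True)
class VarietyIndication:
    dimension: str
    value: str
    certainty: Optional[str] = None  # None means no certainty status is stated

    def __post_init__(self):
        if self.certainty is not None and self.certainty not in CERTAINTY_LABELS:
            raise ValueError(f"unknown certainty label: {self.certainty!r}")

    def render(self):
        """Format the indication in the style of the examples in this subclause."""
        text = f"[{self.dimension}:] {self.value}"
        if self.certainty is not None:
            text += f" (certainty: {self.certainty})"
        return text


print(VarietyIndication("dialect", "Bavarian", "inferred").render())
# [dialect:] Bavarian (certainty: inferred)
```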

The fact that an LR contains several events of language use that belong to different language varieties
belonging to the same dimension of linguistic variation, such as a dialogue between speakers that use
different dialects, should be clearly stated.
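As a rough illustration of how such a case might be detected and stated, the following Python sketch groups variety indications by dimension and reports the dimensions that carry more than one value. The function names and the (dimension, value) data layout are hypothetical, and "Swabian" is used only as an illustrative second dialect.

```python
from collections import defaultdict


def group_by_dimension(indications):
    """Collect the distinct values stated for each dimension, in input order."""
    grouped = defaultdict(list)
    for dimension, value in indications:
        if value not in grouped[dimension]:
            grouped[dimension].append(value)
    return dict(grouped)


def mixed_dimensions(indications):
    """Return the dimensions for which the LR contains several varieties."""
    grouped = group_by_dimension(indications)
    return sorted(d for d, values in grouped.items() if len(values) > 1)


# A dialogue between two speakers who use different dialects.
dialogue = [
    ("dialect", "Bavarian"),
    ("dialect", "Swabian"),
    ("modality", "oral"),
]

print(mixed_dimensions(dialogue))
# ['dialect']
```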
Sometimes, the idiolect used in an LR is deliberately chosen to imitate that of a different variety than the
one that would naturally apply to the speaker or situation. In such cases, the fact that a language variety is
applied to imitate another language variety should be indicated by a qualification such as “adopted”, “non-
original”, “imitated” or similar. Again, this qualification applies only to the one variety in question and not to
the speech-event as such.
EXAMPLE 3 [time period:] Victorian English (imitated).
This document uses explicit unabbreviated labels to refer to the respective varieties used in the examples.
Of course, for the purpose of building, for instance, language tags, such detailed identifications can and will
be abbreviated. The abbreviated labels should be unique for the specific individual language and across
dimensions of linguistic variation to avoid confusion, and they shall be clearly explained at some place, for
instance in a registry for such labe
...
