Language resource management — Lexical markup framework (LMF) — Part 5: Lexical base exchange (LBX) serialization

This document describes the serialization of the lexical markup framework (LMF) model defined as an extensible markup language (XML) model derived from the language base exchange (LBX) schema and compliant with the W3C XML schema. This serialization covers the classes, data categories, and mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model), and ISO 24613-3 (etymological extension).

Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 5: Sérialisation de l’échange de bases lexicales (LBX)

Le présent document décrit la sérialisation du modèle de cadre de balisage lexical (LMF) défini en tant que modèle de langage de balisage extensible (XML) issu du schéma d’échange de bases lexicales (LBX) et conforme au schéma W3C XML. Cette sérialisation couvre les classes, les catégories de données et les mécanismes de l’ISO 24613-1 (modèle de base), de l’ISO 24613-2 (modèle de dictionnaire lisible par ordinateur (MRD)) et de l’ISO 24613-3 (extension étymologique).

Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 5. del: Serializacija leksikalne osnovne izmenjave (LBX)

Ta dokument opisuje serializacijo modela ogrodja za označevanje leksikonov (LMF), opredeljenega kot model razširljivega označevalnega jezika (XML), ki izhaja iz sheme jezikovne osnovne izmenjave (LBX) in je skladen s shemo W3C XML. Ta serializacija zajema razrede, podatkovne kategorije in mehanizme standardov ISO 24613-1 (jedrni model), ISO 24613-2 (model strojno berljivega slovarja (MRD)) in ISO 24613-3 (etimološka razširitev).

General Information

Status: Published
Publication Date: 18-Jan-2022

ICS: 01.020 - Terminology (principles and coordination)

Technical Committee: ISO/TC 37/SC 4 - Language resource management
Drafting Committee: ISO/TC 37/SC 4/WG 4 - Lexical resources

Current Stage: 6060 - International Standard published
Start Date: 19-Jan-2022
Due Date: 25-Feb-2022
Completion Date: 19-Jan-2022

Ref Project: SIST ISO 24613-5:2023 - Language resource management -- Lexical markup framework (LMF) - Part 5: Lexical base exchange (LBX) serialization

Relations

Revises: ISO 24613:2008 - Language resource management - Lexical markup framework (LMF)
Effective Date: 06-Aug-2016

Overview

ISO 24613-5:2022 - Lexical base exchange (LBX) serialization specifies an XML serialization for the Lexical Markup Framework (LMF). LBX defines XML constructs and constraints to represent LMF core classes and the MRD and etymological extensions, ensuring LMF model instances can be exchanged and validated using W3C XML Schema 1.1. The document maps LMF classes (for example, LexicalResource, Lexicon, LexicalEntry) to LBX XML elements, prescribes data category selections and value lists, and provides examples and appendix material for implementation.

Key topics and technical requirements

Scope and compliance
- Serializes ISO 24613 core model (Part 1), MRD extension (Part 2) and etymological extension (Part 3).
- Requires compliance with W3C XML Schema 1.1 and LMF cardinalities.
LBX constructs
- Element-level mappings for central classes: <LexicalResource>, <GlobalInformation>, <Lexicon>, <LexicalEntry>, orthographic and form representations, sense and definition elements.
- Support for MRD constructs: word forms, stems, related forms, translations, examples and subject fields.
- Etymology support: classes such as Etymology, Etymon, EtyLink, CognateSet.
Mechanisms and extensions
- Cross-referencing mechanisms (CrossREF) to external media and virtual lexical resources (e.g., @lexicalResourceID).
- Feature structure implementation, label representation (<LBL>), rendering hints via @rend.
- XML namespace usage (example: xmlns="http://www.LexicalBaseExchange.org/2021/schema") and datatype conformity to XML Schema Part 2.
Documentation and examples
- Informative annexes for LBX data category selection, feature-structure implementation and LBX examples.

Practical applications and users

ISO 24613-5:2022 is intended for teams and projects that need interoperable lexical data exchange, including:

Computational linguists and NLP engineers building lexica for parsers, taggers and language models.
Lexicographers and dictionary publishers converting MRDs to validated XML.
Language resource managers and digital humanities projects harmonizing multilingual lexical datasets.
Software developers implementing lexical databases, import/export tools, and interoperability layers.
Terminologists, localization engineers and academic researchers who require standardized, schema-validated lexical representations.

Benefits include improved interoperability, validated XML serialization for lexical resources, clearer mapping of etymological and MRD data, and support for linking to external media and database partitioning.

Related standards

ISO 24613-1:2019 (LMF core model)
ISO 24613-2:2020 (MRD model)
ISO 24613-3:2021 (Etymological extension)
ISO 15924 (script codes), IETF BCP 47 (language tags), W3C XML 1.1

Keywords: ISO 24613-5:2022, LBX serialization, Lexical Markup Framework, LMF, XML schema, MRD, etymology, lexical resource exchange, language resource management.

Frequently Asked Questions

What is ISO 24613-5:2022?

ISO 24613-5:2022 is a standard published by the International Organization for Standardization (ISO). Its full title is "Language resource management — Lexical markup framework (LMF) — Part 5: Lexical base exchange (LBX) serialization". This standard covers: This document describes the serialization of the lexical markup framework (LMF) model defined as an extensible markup language (XML) model derived from the language base exchange (LBX) schema and compliant with the W3C XML schema. This serialization covers the classes, data categories, and mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model), and ISO 24613-3 (etymological extension).

What ICS categories does ISO 24613-5:2022 belong to?

ISO 24613-5:2022 is classified under the following ICS (International Classification for Standards) categories: 01.020 - Terminology (principles and coordination). The ICS classification helps identify the subject area and facilitates finding related standards.

What standards are related to ISO 24613-5:2022?

ISO 24613-5:2022 has the following relationships with other standards: it revises ISO 24613:2008. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

How can I access ISO 24613-5:2022?

ISO 24613-5:2022 is available in PDF format for immediate download after purchase. The document can be added to your cart and obtained through the secure checkout process. Digital delivery ensures instant access to the complete standard document.

Standards Content (Sample)

SLOVENIAN STANDARD
1 January 2023
Replaces:
SIST ISO 24613:2013
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 5. del: Serializacija leksikalne osnovne izmenjave (LBX)
Language resource management -- Lexical markup framework (LMF) - Part 5: Lexical base exchange (LBX) serialization
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) - Partie 5: Sérialisation de l’échange de bases lexicales (LBX)
This Slovenian standard is identical to: ISO 24613-5:2022
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard, in whole or in part, is not permitted.

INTERNATIONAL STANDARD ISO 24613-5
First edition 2022-01
Language resource management — Lexical markup framework (LMF) — Part 5: Lexical base exchange (LBX) serialization
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 5: Sérialisation de l’échange de bases lexicales (LBX)
Reference number
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 General requirements
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
5.2 Implementing the GlobalInformation class
5.3 Implementing the Lexicon class
5.4 Implementing the LexiconInformation class
5.5 Implementing the LexicalEntry class
5.6 Implementing the OrthographicRepresentation class
5.7 Implementing the Form class
5.7.1 Form class
5.7.2 Lemma class
5.8 Implementing the GrammaticalInformation class
5.9 Implementing the Sense class
5.10 Implementing the Definition class
5.11 Implementing the CrossREF class
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing OrthographicRepresentation subclasses
6.2 Implementing the FormRepresentation class
6.3 Implementing the Form subclasses
6.3.1 General principles
6.3.2 Implementing the WordForm class
6.3.3 Implementing the Stem class
6.3.4 Implementing the WordPart class
6.3.5 Implementing the RelatedForm class
6.3.6 Implementing the TextRepresentation class
6.3.7 Implementing the Translation class
6.3.8 Implementing the Example class
6.4 Implementing the SubjectField class
6.5 Implementing the Bibliography class
7 Implementing the CrossREF mechanism to refer to external media files
8 Implementing the classes from the etymological extension (ISO 24613-3)
8.1 Implementing the Etymology class
8.2 Implementing the Etymon class
8.2.1 General
8.2.2 Referencing forms in an etymon
8.2.3 Representing the meaning of an etymon
8.2.4 Representing the language of an etymon
8.2.5 Dating an etymon
8.2.6 Providing sources associated with an etymon
8.3 Implementing the EtyLink class
8.4 Implementing the CognateSet class
8.5 Implementing the Cognate class
9 Additional mechanisms
9.1 Overview
9.2 XML feature structure implementation
9.3 Representing various labels with <LBL>
9.4 Providing rendering information with the @rend attribute
Annex A (informative) LBX data category selection
Annex B (informative) LBX feature structure implementation
Annex C (informative) LBX examples for applying LBX serialization
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-5, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-3:2021
and ISO 24613-4:2021, cancels and replaces ISO 24613:2008, which has been technically revised.
The main change compared to the previous edition is as follows:
— entire revision of the content and its subdivisions into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
INTERNATIONAL STANDARD ISO 24613-5:2022(E)
Language resource management — Lexical markup
framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope
This document describes the serialization of the lexical markup framework (LMF) model defined as
an extensible markup language (XML) model derived from the language base exchange (LBX) schema
and compliant with the W3C XML schema. This serialization covers the classes, data categories, and
mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model),
and ISO 24613-3 (etymological extension).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological
extension
IETF BCP 47. Tags for Identifying Languages. Phillips, A., Davis, M. (eds.), September 2009. Best Current
Practice. Available from: https://tools.ietf.org/html/bcp47
W3C. Extensible Markup Language (XML) 1.1 (Second Edition). W3C Recommendation 16 August
2006, edited in place 29 September 2006. Available from: https://www.w3.org/TR/2006/REC-xml11-20060816/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and ISO 24613-3
apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
4 General requirements
This document aims at providing constructs for each LMF class from ISO 24613-1 (core model),
ISO 24613-2 (MRD extension), and ISO 24613-3 (etymological extension). It requires compliance
with ISO 24613-1, ISO 24613-2, and ISO 24613-3 when implementing data categories referred to in
the respective parts, and compliance with the W3C XML Schema 1.1 for representing structured
information in XML. LBX extends the original models by means of data category selections and precise
value lists, the creation of new subclasses and the definition of new constraints. In addition, this
document complies with the cardinalities expressed in ISO 24613-1, ISO 24613-2, and ISO 24613-3.
The LBX serialization is richer in detail than LMF, in order to meet specific design objectives. Still, this
document does not elaborate on the metadata aspects from LMF, since the LBX schema is by essence
much richer for the representation of all the aspects related to the creation, content, versioning and
database implementation of lexical content at large. Occasionally, slightly equivalent constructs to
explicit requirements from the LMF standard are mentioned.
The XML examples in this document are simplified by omitting namespaces. Except where otherwise
stated, it is assumed that XML elements belong to the LBX namespace and that the examples lie within
the scope of the following XML namespace declaration:
xmlns="http://www.LexicalBaseExchange.org/2021/schema"
Besides, datatypes in this document are defined in compliance with the XML Schema Part 2
recommendation. The “xs:” prefix corresponds to the following namespace:
http://www.w3.org/2001/XMLSchema
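Because the examples omit the namespace, software that reads real LBX documents has to supply it when searching for elements. The following is a minimal sketch using Python's standard library, assuming the namespace URI quoted above; the prefix "lbx", the sample fragment and the function name form_reps are illustrative choices, not part of LBX.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the text; the "lbx" prefix is an arbitrary local choice.
LBX_NS = "http://www.LexicalBaseExchange.org/2021/schema"
NS = {"lbx": LBX_NS}

# Illustrative fragment with the default namespace declared explicitly.
SAMPLE = f"""
<Entry xmlns="{LBX_NS}">
  <Lemma>
    <FormRep xml:lang="en">lawful</FormRep>
  </Lemma>
  <Sense/>
</Entry>
"""


def form_reps(xml_text):
    """Return the text of every FormRep element in the LBX namespace."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.findall(".//lbx:FormRep", NS)]


print(form_reps(SAMPLE))
```

A search for the bare name "FormRep" would find nothing in such a document, since every element name is qualified by the default namespace.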
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element
(see Table 1), which groups together one to many lexicons in a single collection. This level may be
omitted in cases where the lexical resource contains only one lexicon so that the resource starts
directly with the lexicon level. In cases where a lexical resource contains a large number of lexicons or
several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a
@lexicalResourceID in the <Lexicon> element and optionally the <Entry> element (see 5.5).
Table 1 — LexicalResource class
LMF class             LBX construct
/LexicalResource/     <LexicalResource>
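Since the collection level may be omitted, a reader of LBX data has to accept either a <LexicalResource> root or a <Lexicon> root. The sketch below shows one way to handle both cases with Python's standard library. It omits namespaces, as the examples in this document do, and the identifier values ("a", "b", "res1") and the function name iter_lexicons are hypothetical.

```python
import xml.etree.ElementTree as ET


def iter_lexicons(xml_text):
    """Return the Lexicon elements of a document whose root is either
    <LexicalResource> (a collection) or <Lexicon> (collection level omitted)."""
    root = ET.fromstring(xml_text)
    if root.tag == "Lexicon":
        return [root]
    if root.tag == "LexicalResource":
        return root.findall("Lexicon")
    raise ValueError(f"unexpected root element: {root.tag}")


# Hypothetical documents: one collection, one stand-alone lexicon that
# points to a virtual lexical resource through @lexicalResourceID.
collection = (
    "<LexicalResource>"
    "<Lexicon lexiconID='a'/><Lexicon lexiconID='b'/>"
    "</LexicalResource>"
)
single = "<Lexicon lexiconID='a' lexicalResourceID='res1'/>"

print([lex.get("lexiconID") for lex in iter_lexicons(collection)])
print(iter_lexicons(single)[0].get("lexicalResourceID"))
```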
5.2 Implementing the GlobalInformation class
The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element
(see Table 2) either by referencing a GlobalInformation.xsd schema using an <xsd:include> element, or
as a direct child of a <LexicalResource> element. <GlobalInformation> allows the encoding of a variety
of administrative, technical, documentary, and bibliographic information attached to the corresponding
lexical resource.
Table 2 — GlobalInformation class
LMF class               LBX construct
/GlobalInformation/     <GlobalInformation>
Since the LBX serialization is based on the W3C recommendation for XML, it implements the
@xml:lang attribute to indicate language information corresponding to the content of specific
elements. According to the W3C recommendation, @xml:lang content shall be compliant with
BCP 47. There is no need for a specific implementation of the /language coding/ data category or
the /script coding/ data category in order to ensure compliance of this document with ISO 24613-1.
LBX does allow the inclusion of these data categories in the <GlobalInformation> element in order
to support the validation of equivalent metadata found in the <LexiconInformation> elements
of one or more lexicons (see 5.4). When included, the /script coding/ shall use the codes from
ISO 15924. The /character encoding/ data category is implemented in the XML declaration of an
LBX conformant document using the @encoding attribute. For instance, an XML-LBX document
encoded as UTF-8 according to the Unicode standard shall begin with the following declaration:
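One way to produce a UTF-8 XML declaration programmatically is sketched below with Python's standard library. The element names are taken from this document; the function name serialize_utf8 is hypothetical, and the version value in the emitted declaration is whatever ElementTree writes, which is an assumption rather than a value taken from this document.

```python
import io
import xml.etree.ElementTree as ET


def serialize_utf8(root):
    """Serialize an element tree as UTF-8 bytes with an XML declaration."""
    buffer = io.BytesIO()
    ET.ElementTree(root).write(buffer, encoding="UTF-8", xml_declaration=True)
    return buffer.getvalue()


# Build a skeletal entry and inspect the first line of the output.
entry = ET.Element("Entry")
ET.SubElement(entry, "Sense")
data = serialize_utf8(entry)
first_line = data.split(b"\n", 1)[0].decode("UTF-8")
print(first_line)
```

The declaration carries the @encoding value, so a parser reading the bytes back does not need to be told the encoding separately.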

A non-exclusive list of sub-elements, simple types indexed by value, follows:
— “ISO 639-3”, a simple type enumerating the set of language codes used across all lexicons;
— “ISO 15924”, a simple type enumerating the set of scripts used across all lexicons;
— GlobalNotationType, a simple type enumerating the set of notations used across all lexicons;
— GlobalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used across
all lexicons;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used across all
lexicons.
Examples can be found in the LBX reference schema, GlobalInformation document (see Annex B).
5.3 Implementing the Lexicon class
The Lexicon class shall be implemented in LBX by means of the <Lexicon> element (see Table 3),
which is a direct child of the <LexicalResource> element when <LexicalResource> is used. If the
<LexicalResource> element is not used, <Lexicon> becomes the root element. In cases where a lexical
resource contains a large number of lexicons or several very large lexicons, the lexicon (XML document)
can reference a virtual lexical resource using a @lexicalResourceID in the <Lexicon> element (see
5.1). In the case of a virtual lexical resource, where the <LexicalResource> element is not part of the
same XML document as the <Lexicon> element, the lexicon can use an include statement to reference
a relevant <GlobalInformation> element. Other information within the <Lexicon> element should be
qualified through the following child element(s) and attributes as direct children of the <Lexicon>
element or, optimally, as children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID, of datatype xs:ID, as a unique identifier for the lexicon; as a best practice, the ID should be a URI and be unique within a language resource; @xml:id can be used in place of @lexiconID when there is a design intent to make the entry accessible on the web;
— @lexicalResourceID, of datatype xs:ID, as a unique identifier for the lexical resource; as a best practice, the ID should be a URI for global scope; in addition, @xml:id can be used in place of @lexicalResourceID when there is a design intent to make the entry accessible on the web;
— @lexiconType, of datatype xs:string; the type of lexicon, e.g. bilingual dictionary, monolingual dictionary;
— @sourceLanguage, of datatype xs:string; the language of the <Lemma> element or its inflected forms;
— @targetLanguage, of datatype xs:string; the language the lemma is translated to, principally represented in the <Translation> element.
Table 3 — Lexicon class
LMF class     LBX construct
/Lexicon/     <Lexicon>
5.4 Implementing the LexiconInformation class
The LexiconInformation class shall be implemented in LBX by means of the <LexiconInformation> element (see Table 4) either by referencing a LexiconInformation.xsd schema using an <xsd:include> element or as a direct child of the <Entry> element. <LexiconInformation> allows the encoding of a variety of administrative, technical, documentary, and bibliographic information attached to the corresponding lexical entry.
Table 4 — LexiconInformation class
LMF class                LBX construct
/LexiconInformation/     <LexiconInformation>
When not included in the <Lexicon> element, information qualifying the lexicon shall be included as elements and attributes in the <LexiconInformation> element. These include (see 5.3):
— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.
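Because the qualifying information can sit either on <Lexicon> or in <LexiconInformation>, a reader may want to look in both places. The following is a rough sketch in Python, assuming for illustration that <LexiconInformation> appears as a direct child of <Lexicon>; the sample values and the function name lexicon_metadata are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute names listed in 5.3 and 5.4.
LEXICON_ATTRIBUTES = (
    "lexiconID", "lexicalResourceID", "lexiconType",
    "sourceLanguage", "targetLanguage",
)


def lexicon_metadata(lexicon):
    """Collect the title and qualifying attributes of a <Lexicon>, looking
    first at the element itself, then at a child <LexiconInformation>."""
    info = lexicon.find("LexiconInformation")  # assumed location
    found = {}
    for name in LEXICON_ATTRIBUTES:
        value = lexicon.get(name)
        if value is None and info is not None:
            value = info.get(name)
        if value is not None:
            found[name] = value
    title = lexicon.findtext("Title")
    if title is None and info is not None:
        title = info.findtext("Title")
    if title is not None:
        found["Title"] = title
    return found


# Hypothetical lexicon: the identifier is on <Lexicon>, the rest is in
# <LexiconInformation>.
SAMPLE = """
<Lexicon lexiconID="lex-fr-en">
  <LexiconInformation lexiconType="bilingual dictionary"
                      sourceLanguage="fr" targetLanguage="en">
    <Title>Sample French-English lexicon</Title>
  </LexiconInformation>
</Lexicon>
"""

meta = lexicon_metadata(ET.fromstring(SAMPLE.strip()))
print(meta)
```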
The <LexiconInformation> can also include elements and data categories that further qualify information in the lexicon and can be used to support the validation of the XML document (lexicon). These elements and data categories should also be included in the global set of elements and data categories found in the <GlobalInformation> element (see 5.2) and a comparison of the corresponding values in <GlobalInformation> and <LexiconInformation> should be part of the validation process. A non-exclusive list of these sub-elements, simple types indexed by value, follows:
— NotationType, a simple type enumerating the set of notations used in a lexicon;
— PartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a lexicon;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used in a lexicon.
NOTE In addition to the <LexiconInformation> construct, LBX allows the concatenation of lexicon information for a subset of lexicons grouped by language by referencing a named language data schema (e.g. ArabicLanguageData.xsd) (see Clause B.1).
5.5 Implementing the LexicalEntry class
The LexicalEntry class shall be implemented in LBX by means of the <Entry> element (see Table 5). Lexical information inside <Entry> elements should be encoded through the following child elements:
— <GramFeats> for grammatical information related to the whole entry;
— <Form> for containing the text literal and attributes qualifying the text literal (the Form class is serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for referencing internal or external elements.
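As a small illustration of the child elements listed above, the sketch below inventories the direct children of an <Entry> with Python's standard library. The sample entry is hypothetical (including the @entryID value "e1"), it uses <Lemma> as the Form subclass, and the function name child_inventory is an illustrative choice.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def child_inventory(entry):
    """Count the direct children of an <Entry> by element name."""
    return Counter(child.tag for child in entry)


# Hypothetical entry with one lemma and two senses.
SAMPLE = """
<Entry entryID="e1">
  <Lemma>
    <FormRep xml:lang="fr">langouste</FormRep>
  </Lemma>
  <Sense senseNR="1"/>
  <Sense senseNR="2"/>
</Entry>
"""

inventory = child_inventory(ET.fromstring(SAMPLE.strip()))
print(dict(inventory))
```

Only direct children are counted, so the <FormRep> nested inside <Lemma> does not appear in the inventory.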
Attributes used for the <Entry> element can include:
— @entryID, of datatype xs:ID, as a unique identifier for an entry; as a best practice, the ID should be a URI and be unique within a language resource; @xml:id can be used in place of @entryID when there is a design intent to make the entry accessible on the web;
— @lexiconID, of datatype xs:ID, as a unique identifier for the parent lexicon; as a best practice, the ID should be a URI and be unique within a language resource; @xml:id can be used in place of @lexiconID when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated lexicon collection when there is more than one lexicon.
Table 5 — LexicalEntry class
LMF class          LBX construct
/LexicalEntry/     <Entry>
The following example in French illustrates the encoding of a simple dictionary entry with two senses.
EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7); anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lägust</FormRep>
  </Lemma>
  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est très appréciée.</DefRep>
    </Def>
  </Sense>
  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>
NOTE 1 The style in the above example is appropriate for use in a lexical resource that contains a collection of bilingual lexicons in a variety of source languages, e.g. French, Spanish, Russian, Chinese. A simpler style can be used for a collection of monolingual French lexicons.
For example, <Orth> and <Pron> can be used in place of the equivalent <FormRep> elements and the <Def> element can directly contain the text content rather than employing a <DefRep> child element for managing text content (see 5.10). See 6.2 for an example of simplification using the <Orth> and <Pron> elements.
NOTE 2 The @notation value “French” is short for “Canonical French”.
5.6 Implementing the OrthographicRepresentation class
Classes containing an OrthographicRepresentation class include the Form, Lemma, and Definition classes. Orthographic representations shall be implemented in LBX by means of elements corresponding to OrthographicRepresentation subclasses that are introduced in ISO 24613-2 (machine-readable dictionary (MRD) model), or possible new OrthographicRepresentation subclasses derived through the principles for LMF extensions described in ISO 24613-1 (core model). ISO 24613-1:2019, 5.6.1, describes some of the representation types that can serve as a basis for extending the OrthographicRepresentation class. ISO 24613-4:2021 (TEI extension), 6.1, lists a number of representation elements that are valid for use with the Form class. Elements implemented in this part are described in 5.7.2, 5.10, and successive subclauses from 6.3.2 to 6.3.8.
5.7 Implementing the Form class
5.7.1 Form class
The Form class shall be implemented in LBX by elements that instantiate Form subclasses (see Table 6, 6.2 and 6.3).
Table 6 — Form class
LMF class     LBX construct
/Form/        <Form>
5.7.2 Lemma class
The Lemma class, a subclass of the Form class, shall be implemented in LBX by means of the <Lemma> element (see Table 7).
Table 7 — Lemma class
LMF class     LBX construct
/Lemma/       <Lemma>
Orthographic representations in the <Lemma> element shall be implemented in LBX by means of the <FormRep> element, or by elements that instantiate Form subclasses, including <Orth> and <Pron>.
NOTE 1 The <FormRep>, <Orth>, and <Pron> elements are introduced in 6.2.
NOTE 2 <Orth> and <Pron> can be allowed when justified by design goals.
5.8 Implementing the GrammaticalInformation class
The GrammaticalInformation class groups grammatical features associated with the LexicalEntry class, Form class, or other classes (e.g. Translation, Sense) in case of specific grammatical restrictions. The GrammaticalInformation class shall be implemented in LBX by means of the <GramFeats> element (see Table 8) combined with various possible child elements for specific grammatical features.
Table 8 — GrammaticalInformation class
LMF class                    LBX construct
/GrammaticalInformation/     <GramFeats>
LBX provides the following child elements of <GramFeats> for describing specific grammatical features of associated elements (e.g. <Lemma>, <WordForm>):
— <POS> to indicate the grammatical category of the lexical item. This corresponds to the /partOfSpeech/ data category in ISO 24611:2012, Annex A;
— <Person> to indicate the grammatical person (if relevant) of the lexical item or one of its inflected forms. This corresponds to the /person/ data category in ISO 24611:2012, Annex A;
— <Gender> to indicate the grammatical gender (if relevant) of the lexical item or one of its inflected forms. This corresponds to the /grammaticalGender/ data category in ISO 24611:2012, Annex A;
— <Number> to indicate the grammatical number (if relevant) of the lexical item or one of its inflected forms. This corresponds to the /grammaticalNumber/ data category in ISO 24611:2012, Annex A;
— <Tense> to indicate the grammatical tense (if relevant) of the lexical item or one of its inflected forms.
This corresponds to the /grammaticalTense/ data category in ISO 24611:2012, Annex A;
— <Aspect> to indicate the grammatical aspect (if relevant) of the lexical item or one of its inflected forms;
— <Mood> to indicate the grammatical mood (if relevant) of the lexical item or one of its inflected forms;
— <Voice> to indicate the grammatical voice (if relevant) of the lexical item or one of its inflected forms;
— <Animacy> to indicate the grammatical animacy (if relevant) of the lexical item or one of its inflected forms (e.g. in Russian);
— <GrammaticalClass> to indicate the grammatical class (gender) of Bantu languages;
— <GrammaticalClassGroup> to indicate the aggregate grammatical classes (genders) of a specific noun in the singular and plural;
— <iType> to indicate the inflectional class associated with the lexical item or one of its inflected forms;
— <Subcat> to indicate subcategorization information (e.g. transitive/intransitive, countable/non-countable).
The following example shows the grammatical information for a word form in a monolingual French dictionary that is part of a notional language resource containing a collection of monolingual and bilingual dictionaries in multiple source languages. The @notation="French", denoting canonical French, is used in databases that support a large set of possibly idiosyncratic notations (e.g. for canonical, transliterated and transcribed forms).
EXAMPLE
<Entry>
  <Lemma>
    <GramFeats>
      <POS>verb</POS>
      <Subcat>transitive</Subcat>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">pacifier</FormRep>
    <FormRep xml:lang="fr" notation="ipa">pasifje</FormRep>
  </Lemma>
  <Sense/>
</Entry>
For an example of simplifying this schema, see 6.2.
5.9 Implementing the Sense class
The Sense class, as a recursive construct, shall be implemented in LBX by means of the <Sense> element (see Table 9). LBX does not allow character content in the <Sense> element.
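The restriction on character content in the Sense element can be checked mechanically. The following is a minimal sketch in Python, not a substitute for schema validation; it omits namespaces, as the examples in this document do, and the function name senses_with_text and the two sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def senses_with_text(root):
    """Return <Sense> elements that carry non-whitespace character content,
    either before their first child (text) or between children (tail)."""
    offenders = []
    for sense in root.iter("Sense"):  # iter() also visits nested senses
        chunks = [sense.text or ""] + [child.tail or "" for child in sense]
        if any(chunk.strip() for chunk in chunks):
            offenders.append(sense)
    return offenders


# Hypothetical inputs: the first keeps the text inside <Def>, the second
# places it directly inside <Sense>.
GOOD = "<Entry><Sense senseNR='1'><Def>being legal</Def></Sense></Entry>"
BAD = "<Entry><Sense senseNR='1'>being legal<Def/></Sense></Entry>"

print(len(senses_with_text(ET.fromstring(GOOD))))
print(len(senses_with_text(ET.fromstring(BAD))))
```

Whitespace used only for indentation is ignored, so pretty-printed documents are not flagged.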
Table 9 — Sense class
LMF class    LBX construct
/Sense/      <Sense>
5.10 Implementing the Definition class
The Definition class, which contains a narrative description of the word sense, shall be implemented in LBX by means of the <Def> element (see Table 10).
Table 10 — Definition class
LMF class       LBX construct
/Definition/    <Def>
LBX provides the following child element for the description of definition information:
— <DefRep>, an element instantiating a TextRepresentation subclass containing the character content of the definition. See 6.3.6.
The <Def> element allows mixed data enabling the text literal (character content) to be contained within the <Def> element itself. See 6.3.6 for a description of the <DefRep> element, which provides an alternative approach for managing the text literal. Within an LBX <LexicalResource> or <Lexicon> element, the consistent use of <Def> or <DefRep> for character content is a best practice.
NOTE The <DefRep> element supports the inclusion of multiple orthographic representations for a <Def> element (e.g. Simplified Chinese, Traditional Chinese).
5.11 Implementing the CrossREF class
The CrossREF class shall be implemented in LBX by means of the <Xref> element (see Table 11), which points to an internal or external dictionary object, such as an entry, lemma, sense, or translation. LBX allows a range of different data types for the cross reference (URI, IRI, HREF, etc.). In order to make the data accessible through the web, LBX should implement web standards, such as a URL or Resource Description Framework (RDF). The <Xref> element can be qualified through attributes, such as @relType for describing the relationship type (e.g. synonym, antonym, hyponym). The type of identifier and any of its inherent characteristics and constraints should be identified in the <GlobalInformation> element or the <LexiconInformation> element, as appropriate.
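As an illustration of how internal cross references might be followed, the Python sketch below indexes elements by @xml:id and pairs each <Xref> whose value starts with "#" with its target. It is a sketch under stated assumptions, not a normative procedure: the attribute spellings tried (URI, IRI, href), the placement of @xml:id on the target <Entry>, and the function name `resolve_internal` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# xml:id lives in the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Attribute spellings tried on <Xref>; these are assumptions for the sketch.
REF_ATTRS = ("URI", "IRI", "href")

def resolve_internal(root):
    """Pair each internal <Xref> (value starting with '#') with its target."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = []
    for xref in root.iter("Xref"):
        target = next((xref.get(a) for a in REF_ATTRS if xref.get(a)), None)
        if target and target.startswith("#"):
            # by_id.get returns None for a dangling reference.
            resolved.append((target, by_id.get(target[1:])))
    return resolved

# Hypothetical lexicon: the second entry carries the identifier "legal".
doc = ET.fromstring("""
<Lexicon>
  <Entry>
    <Lemma><FormRep xml:lang="en">lawful</FormRep></Lemma>
    <RelForm relType="synonym"><Xref href="#legal">legal</Xref></RelForm>
    <Sense/>
  </Entry>
  <Entry xml:id="legal">
    <Lemma><FormRep xml:lang="en">legal</FormRep></Lemma>
    <Sense/>
  </Entry>
</Lexicon>
""")

pairs = resolve_internal(doc)
print([(ref, el.tag if el is not None else None) for ref, el in pairs])
```

References to external objects (for example full URLs) are deliberately left untouched, since resolving them depends on the identifier type declared for the resource.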
Table 11 — CrossREF class
LMF class     LBX construct
/CrossREF/    <Xref [URI|IRI|HREF|…]="">
The <Xref> element can be used as a child of the <RelForm> element to point to content and metadata in a different entry. In this case, the <Xref> element can replace the content and metadata contained in the <RelForm>.
EXAMPLE 1
<Entry>
  <Lemma>
    <FormRep xml:lang="en">lawful</FormRep>
  </Lemma>
  <RelForm relType="synonym">
    <Xref href="#legal">legal</Xref>
  </RelForm>
  <Sense/>
</Entry>
The <Xref> element can also implement the CrossREF class through other classes. As in the following example, linking related senses provides a better description of semantic relationships.
EXAMPLE 2
<Entry>
  <Lemma>
    <FormRep xml:lang="en">lawful</FormRep>
  </Lemma>
  <Sense xml:id="legal">
    <Def>being legal</Def>
    <Xref IRI="#legal-sense1"/>
  </Sense>
</Entry>
LBX allows multiple strategies for describing multi-word expressions using the CrossREF mechanism. In the following example, the <Xref> element is contained in the Lemma and used to point to other entries, each of which contains a component of the multi-word expression.
EXAMPLE 3
<Entry>
  <Lemma formStructure="MWE">
    <GramFeats>
      <POS>noun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion picture</FormRep>
    <Xref relType="component" order="1" IRI="#motion_form_1">motion</Xref>
    <Xref relType="component" order="2" IRI="#picture_form_1">picture</Xref>
  </Lemma>
  <Sense>
    <Def>sequence of pictures that give the effect of motion when shown in rapid succession</Def>
  </Sense>
</Entry>
In the following example, the use of the <RelForm> to implement the multi-word expression allows a more in-depth grammatical analysis.
EXAMPLE 4
<Entry>
  <Lemma formStructure="MWE">
    <GramFeats>
      <POS>noun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion picture</FormRep>
  </Lemma>
  <RelForm relType="MWEComponent">
    <GramFeats>
      <POS>attributiveNoun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion</FormRep>
    <Xref relType="component" order="1" IRI="#motion_form_1"/>
  </RelForm>
  <RelForm relType="MWEComponent">
    <FormRep xml:lang="en">picture</FormRep>
    <Xref relType="component" order="2" IRI="#picture_form_1"/>
  </RelForm>
  <Sense>
    <Def xml:lang="en">sequence of pictures that give the effect of motion when shown in rapid succession</Def>
  </Sense>
</Entry>
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing OrthographicRepresentation subclasses
The OrthographicRepresentation class in LBX shall be serialized by means of elements derived from the FormRepresentation and TextRepresentation classes or their subclasses described in ISO 24613-2. FormRepresentation subclasses are further described in 6.2 and TextRepresentation subclasses in 6.3.6. In all cases, these corresponding elements can be qualified by attributes, including @xml:lang, @script, and @notation. Other attributes, such as @representationType (e.g. canonicalForm, phoneticForm), are also available. An LBX implementation should use BCP 47 for language description, especially for the support of web-based applications, data interchange, and system interoperability (@script is not used when BCP 47 is implemented).
6.2 Implementing the FormRepresentation class
Depending on design goals, the FormRepresentation class in LBX shall be serialized by means of a general <FormRep> element derived from the FormRepresentation class, or by elements derived from FormRepresentation subclasses that correspond to the lexical environment of the Form subclasses (see Table 12). The goal of this design is to support effective resource management for large-scale, complex lexical databases (e.g. many lexicons encompassing many languages). When justified by design goals, simplification can be achieved by reducing the number of subclasses employed. These subclasses are represented by the elements described in 6.3.1 to 6.3.5, coupled with the appropriate attributes to qualify the content, in particular @xml:lang, @script, and @notation.
Table 12 — FormRepresentation class
LMF class               LBX construct
/FormRepresentation/    <FormRep> <StemRep> <PartRep> <RelFormRep>
<FormRep> is contained by the <Lemma> (see 5.7.2), <WordForm>, and <RelForm> elements; <StemRep>, <PartRep>, and <RelFormRep> are restricted to specific elements (see 6.3, 6.4 and 6.5).
Where the FormRepresentation class by itself is sufficient without further qualification, equivalents of TEI elements, such as <Orth>, <Pron>, <Hyph>, <Stress>, and <Syll> can potentially be used for further simplification. In such cases, descriptions of qualifying attributes (e.g. @xml:lang) should be included in <GlobalInformation> or <LexiconInformation>, as appropriate.
The following example shows the <Orth> and <Pron> elements for a lemma in a monolingual French dictionary, part of a collection of monolingual French lexicons. In reference to ISO 24613-4:2021, 5.2, there is no requirement to include the language and script codes in the <GlobalInformation> element (although LBX allows the inclusion of these codes). <GlobalInformation> can also be used to implement a definition that assigns the "ipa" @notation value to the <Pron> element. The following example shows the implementation of this principle using a revised version of the example in 5.8.
EXAMPLE
<Entry>
  <Lemma>
    <GramFeats>
      <POS>verb</POS>
      <Subcat>transitive</Subcat>
    </GramFeats>
    <Orth>pacifier</Orth>
    <Pron>pasifje</Pron>
  </Lemma>
  <Sense/>
</Entry>
6.3 Implementing the Form subclasses
6.3.1 General principles
Form subclasses described in ISO 24613-2 shall be serialized by means of the elements described in 6.3.2 to 6.3.5. LBX typically treats the Form class described in ISO 24613-1 as an abstract class.
6.3.2 Implementing the WordForm class
The WordForm class, a subclass of the Form class, shall be implemented in LBX by means of the <WordForm> element (see Table 13). The <WordForm> element can further be characterized by a @formType attribute (e.g. inflection, abbreviation, etc.) and other qualifying values (see Annex B).
Table 13 — WordForm class
LMF class     LBX construct
/WordForm/    <WordForm>
Orthographic representation in the <WordForm> element should be encoded through the following child elements:
— <FormRep>;
— <Orth>;
— <Pron>.
NOTE The FormRep derived elements <Orth> and <Pron> can be used in limited contexts when warranted by design goals (see 6.2).
6.3.3 Implementing the Stem class
The Stem class is derived from the Form class for the representation of a stem or root, and shall be implemented by the <Stem> element (see Table 14) further constrained by means of the @stemType attribute (stem, root, arabicRoot, etc.) depending on the linguistic context.
Table 14 — Stem class
LMF class    LBX construct
/Stem/       <Stem>
Orthographic representation in the <Stem> element should be encoded through the following child element:
— <StemRep>.
6.3.4 Implementing the WordPart class
The WordPart class, a subclass of the Form class, shall be implemented in LBX by means of the <WordPart> element (see Table 15). <WordPart> represents a sub-lexeme component of a word form that is not a stem or root (e.g. pre ...

INTERNATIONAL STANDARD ISO 24613-5
First edition 2022-01
Language resource management — Lexical markup framework (LMF) — Part 5: Lexical base exchange (LBX) serialization
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 5: Sérialisation de l’échange de bases lexicales (LBX)
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 General requirements
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
5.2 Implementing the GlobalInformation class
5.3 Implementing the Lexicon class
5.4 Implementing the LexiconInformation class
5.5 Implementing the LexicalEntry class
5.6 Implementing the OrthographicRepresentation class
5.7 Implementing the Form class
5.7.1 Form class
5.7.2 Lemma class
5.8 Implementing the GrammaticalInformation class
5.9 Implementing the Sense class
5.10 Implementing the Definition class
5.11 Implementing the CrossREF class
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing OrthographicRepresentation subclasses
6.2 Implementing the FormRepresentation class
6.3 Implementing the Form subclasses
6.3.1 General principles
6.3.2 Implementing the WordForm class
6.3.3 Implementing the Stem class
6.3.4 Implementing the WordPart class
6.3.5 Implementing the RelatedForm class
6.3.6 Implementing the TextRepresentation class
6.3.7 Implementing the Translation class
6.3.8 Implementing the Example class
6.4 Implementing the SubjectField class
6.5 Implementing the Bibliography class
7 Implementing the CrossREF mechanism to refer to external media files
8 Implementing the classes from the etymological extension (ISO 24613-3)
8.1 Implementing the Etymology class
8.2 Implementing the Etymon class
8.2.1 General
8.2.2 Referencing forms in an etymon
8.2.3 Representing the meaning of an etymon
8.2.4 Representing the language of an etymon
8.2.5 Dating an etymon
8.2.6 Providing sources associated with an etymon
8.3 Implementing the EtyLink class
8.4 Implementing the CognateSet class
8.5 Implementing the Cognate class
9 Additional mechanisms
9.1 Overview
9.2 XML feature structure implementation
9.3 Representing various labels with
9.4 Providing rendering information with the @rend attribute
Annex A (informative) LBX data category selection
Annex B (informative) LBX feature structure implementation
Annex C (informative) LBX examples for applying LBX serialization
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-5, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-3:2021
and ISO 24613-4:2021, cancels and replaces ISO 24613:2008, which has been technically revised.
The main change compared to the previous edition is as follows:
— entire revision of the content and its subdivisions into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
INTERNATIONAL STANDARD ISO 24613-5:2022(E)
Language resource management — Lexical markup
framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope
This document describes the serialization of the lexical markup framework (LMF) model defined as
an extensible markup language (XML) model derived from the language base exchange (LBX) schema
and compliant with the W3C XML schema. This serialization covers the classes, data categories, and
mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model),
and ISO 24613-3 (etymological extension).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological
extension
IETF BCP 47. Tags for Identifying Languages. Phillips, A., Davis, M. (eds.), September 2009. Best Current
Practice. Available from: https://tools.ietf.org/html/bcp47
W3C. Extensible Markup Language (XML) 1.1 (Second Edition). W3C Recommendation 16 August
2006, edited in place 29 September 2006. Available from: https://www.w3.org/TR/2006/REC-xml11-20060816/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and ISO 24613-3
apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
4 General requirements
This document aims at providing constructs for each LMF class from ISO 24613-1 (core model),
ISO 24613-2 (MRD extension), and ISO 24613-3 (etymological extension). It requires compliance
with ISO 24613-1, ISO 24613-2, and ISO 24613-3 when implementing data categories referred to in
the respective parts, and compliance with the W3C XML Schema 1.1 for representing structured
information in XML. LBX extends the original models by means of data category selections and precise
value lists, the creation of new subclasses and the definition of new constraints. In addition, this
document complies with the cardinalities expressed in ISO 24613-1, ISO 24613-2, and ISO 24613-3.
The LBX serialization is richer in detail than LMF, in order to meet specific design objectives. However, this
document does not elaborate on the metadata aspects of LMF, because the LBX schema is inherently
much richer in representing all aspects of the creation, content, versioning and
database implementation of lexical content at large. Occasionally, constructs that are approximately
equivalent to explicit requirements of the LMF standard are mentioned.
The XML examples in this document are simplified by omitting namespaces. Except where otherwise
stated, it is assumed that XML elements belong to the LBX namespace and that the examples lie within
the scope of the following XML namespace declaration:
xmlns="http://www.LexicalBaseExchange.org/2021/schema"
In addition, datatypes in this document are defined in compliance with the XML Schema Part 2
recommendation. The “xs:” prefix corresponds to the following namespace:
http://www.w3.org/2001/XMLSchema
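To illustrate the namespace convention, the following Python sketch parses a small fragment that declares the LBX namespace as its default namespace. It is a minimal sketch rather than a reference implementation: only the namespace URI is taken from this document, while the fragment, the helper name `lbx` and the choice of Python's standard `xml.etree.ElementTree` module are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in this clause; the fragment below is illustrative.
LBX_NS = "http://www.LexicalBaseExchange.org/2021/schema"
# xml:lang lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = f"""
<Entry xmlns="{LBX_NS}">
  <Lemma>
    <FormRep xml:lang="en">lawful</FormRep>
  </Lemma>
  <Sense/>
</Entry>
"""

def lbx(tag: str) -> str:
    """Return the namespace-qualified name ElementTree uses for an LBX tag."""
    return f"{{{LBX_NS}}}{tag}"

root = ET.fromstring(SAMPLE)
form_rep = root.find(f"{lbx('Lemma')}/{lbx('FormRep')}")
print(root.tag == lbx("Entry"), form_rep.text, form_rep.get(XML_LANG))
```

Once the default namespace is declared, every unprefixed element belongs to it, which is why lookups have to use qualified names even though the examples in this document omit the declaration.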
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element
(see Table 1), which groups together one or more lexicons in a single collection. This level may be
omitted in cases where the lexical resource contains only one lexicon so that the resource starts
directly with the lexicon level. In cases where a lexical resource contains a large number of lexicons or
several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a
@lexicalResourceID in the <Lexicon> element and optionally the <Entry> element (see 5.5).
Table 1 — LexicalResource class
LMF class            LBX construct
/LexicalResource/    <LexicalResource>
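Because the resource level can be omitted, a reader of LBX data has to accept either root element. The Python sketch below shows one way to handle both cases. It is illustrative only: the function name `lexicons_of` and the identifier value "res1" are assumptions, and namespaces are omitted as in the examples of this document.

```python
import xml.etree.ElementTree as ET

def lexicons_of(root: ET.Element) -> list:
    """Return the <Lexicon> elements whether or not <LexicalResource> is present."""
    if root.tag == "Lexicon":
        # The resource level was omitted; the lexicon is the root element.
        return [root]
    if root.tag == "LexicalResource":
        return root.findall("Lexicon")
    raise ValueError(f"unexpected root element: {root.tag}")

# Hypothetical documents: one with the resource level, one without it.
with_resource = ET.fromstring(
    "<LexicalResource><Lexicon/><Lexicon/></LexicalResource>"
)
lexicon_only = ET.fromstring('<Lexicon lexicalResourceID="res1"/>')

print(len(lexicons_of(with_resource)))
print(lexicons_of(lexicon_only)[0].get("lexicalResourceID"))
```

In the second case the @lexicalResourceID attribute is the only link back to the (virtual) lexical resource, so a consumer would normally keep it alongside the lexicon.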
5.2 Implementing the GlobalInformation class
The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element
(see Table 2) either by referencing a GlobalInformation.xsd schema using an <xsd:include> element, or
as a direct child of a <LexicalResource> element. <GlobalInformation> allows the encoding of a variety
of administrative, technical, documentary, and bibliographic information attached to the corresponding
lexical resource.
Table 2 — GlobalInformation class
LMF class              LBX construct
/GlobalInformation/    <GlobalInformation>
Since the LBX serialization is based on the W3C recommendation for XML, it implements the
@xml:lang attribute to indicate language information corresponding to the content of specific
elements. According to the W3C recommendation, @xml:lang content shall be compliant with
BCP 47. There is no need for a specific implementation of the /language coding/ data category or
the /script coding/ data category in order to ensure compliance of this document with ISO 24613-1.
LBX does allow the inclusion of these data categories in the <GlobalInformation> element in order
to support the validation of equivalent metadata found in the <LexiconInformation> elements
of one or more lexicons (see 5.4). When included, the /script coding/ shall use the codes from
ISO 15924. The /character encoding/ data category is implemented in the XML declaration of an
LBX conformant document using the @encoding attribute. For instance, an XML-LBX document
encoded as UTF-8 according to the Unicode standard shall begin with the following declaration:

A non-exclusive list of sub-elements, simple types indexed by value, follows:
— “ISO 639-3”, a simple type enumerating the set of language codes used across all lexicons;
— “ISO 15924”, a simple type enumerating the set of scripts used across all lexicons;
— GlobalNotationType, a simple type enumerating the set of notations used across all lexicons;
— GlobalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used across
all lexicons;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used across all
lexicons.
Examples can be found in the LBX reference schema, GlobalInformation document (see Annex B).
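The global value lists can support validation of the values declared for individual lexicons. The Python sketch below shows the idea with plain sets; it is an assumption-laden illustration, not the LBX validation procedure. The category names, the sample values and the function name `undeclared_values` are hypothetical, and real value lists would come from the schema documents.

```python
# Hypothetical global value lists, keyed by category.
GLOBAL_VALUES = {
    "notation": {"French", "ipa"},
    "partOfSpeech": {"noun", "verb", "adjective"},
}

def undeclared_values(lexicon_values: dict) -> dict:
    """Map each category to the lexicon values missing from the global lists."""
    problems = {}
    for category, values in lexicon_values.items():
        missing = set(values) - GLOBAL_VALUES.get(category, set())
        if missing:
            problems[category] = missing
    return problems

# Hypothetical values declared for one lexicon.
lexicon_values = {"notation": {"French"}, "partOfSpeech": {"verb", "adverb"}}
print(undeclared_values(lexicon_values))
```

An empty result means every value used by the lexicon is also declared globally; any other result names the values that need to be added or corrected.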
5.3 Implementing the Lexicon class
The Lexicon class shall be implemented in LBX by means of the <Lexicon> element (see Table 3),
which is a direct child of the <LexicalResource> element when <LexicalResource> is used. If the
<LexicalResource> element is not used, <Lexicon> becomes the root element. In cases where a lexical
resource contains a large number of lexicons or several very large lexicons, the lexicon (XML document)
can reference a virtual lexical resource using a @lexicalResourceID in the <Lexicon> element (see
5.1). In the case of a virtual lexical resource, where the <LexicalResource> element is not part of the
same XML document as the <Lexicon> element, the lexicon can use an include statement to reference
a relevant <GlobalInformation> element. Other information within the <Lexicon> element should be
qualified through the following child element(s) and attributes as direct children of the <Lexicon>
element or, optimally, as children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID, of datatype xs:ID as a unique identifier for the lexicon; as a best practice, the ID should be a URI and be unique within a language resource; @xml:id can be used in place of @lexiconID when there is a design intent to make the entry accessible on the web;
— @lexicalResourceID, of datatype xs:ID as a unique identifier for the lexical resource; as a best practice, the ID should be a URI for global scope; in addition, @xml:id can be used in place of @lexicalResourceID when there is a design intent to make the entry accessible on the web;
— @lexiconType, of datatype "xs:string"; the type of lexicon, e.g. bilingual dictionary, monolingual dictionary;
— @sourceLanguage, of datatype "xs:string"; the language of the <Lemma> element or its inflected forms;
— @targetLanguage, of datatype "xs:string"; the language the lemma is translated to, principally represented in the <Translation> element.
Table 3 — Lexicon class
LMF class    LBX construct
/Lexicon/    <Lexicon>
5.4 Implementing the LexiconInformation class
The LexiconInformation class shall be implemented in LBX by means of the <LexiconInformation> element (see Table 4) either by referencing a LexiconInformation.xsd schema using an <xsd:include> element or as a direct child of the <Entry> element. <LexiconInformation> allows the encoding of a variety of administrative, technical, documentary, and bibliographic information attached to the corresponding lexical entry.
Table 4 — LexiconInformation class
LMF class               LBX construct
/LexiconInformation/    <LexiconInformation>
When not included in the <Lexicon> element, information qualifying the lexicon shall be included as elements and attributes in the <LexiconInformation> element. These include (see 5.3):
— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.
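Since the qualifying attributes can appear either on <Lexicon> or on <LexiconInformation>, a consumer may want a single lookup that checks both places. The Python sketch below is one possible approach under stated assumptions: the function name `lexicon_metadata`, the sample values, and the placement of <LexiconInformation> as a descendant of <Lexicon> are illustrative and not prescribed by this document.

```python
import xml.etree.ElementTree as ET

# Attribute names listed in 5.3 and 5.4.
LEXICON_ATTRS = ("lexiconID", "lexicalResourceID", "lexiconType",
                 "sourceLanguage", "targetLanguage")

def lexicon_metadata(lexicon: ET.Element) -> dict:
    """Collect qualifying attributes from <Lexicon> or <LexiconInformation>."""
    # Assumption: <LexiconInformation> is searched anywhere below <Lexicon>.
    info = lexicon.find(".//LexiconInformation")
    found = {}
    for name in LEXICON_ATTRS:
        value = lexicon.get(name)
        if value is None and info is not None:
            value = info.get(name)
        if value is not None:
            found[name] = value
    return found

# Hypothetical lexicon mixing both placements.
doc = ET.fromstring("""
<Lexicon lexiconID="lex-fr-1">
  <LexiconInformation lexiconType="monolingual dictionary"
                      sourceLanguage="fr"/>
</Lexicon>
""")
print(lexicon_metadata(doc))
```

Values found directly on <Lexicon> take precedence in this sketch; an implementation that prefers <LexiconInformation> would simply reverse the order of the two lookups.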
The <LexiconInformation> can also include elements and data categories that further qualify information in the lexicon and can be used to support the validation of the XML document (lexicon). These elements and data categories should also be included in the global set of elements and data categories found in the <GlobalInformation> element (see 5.2) and a comparison of the corresponding values in <GlobalInformation> and <LexiconInformation> should be part of the validation process. A non-exclusive list of these sub-elements, simple types indexed by value, follows:
— NotationType, a simple type enumerating the set of notations used in a lexicon;
— PartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a lexicon;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used in a lexicon.
NOTE In addition to the <LexiconInformation> construct, LBX allows the concatenation of lexicon information for a subset of lexicons grouped by language by referencing a named language data schema (e.g. ArabicLanguageData.xsd) (see Clause B.1).
5.5 Implementing the LexicalEntry class
The LexicalEntry class shall be implemented in LBX by means of the <Entry> element (see Table 5). Lexical information inside <Entry> elements should be encoded through the following child elements:
— <GramFeats> for grammatical information related to the whole entry;
— <Form> for containing the text literal and attributes qualifying the text literal (the Form class is serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for referencing internal or external elements.
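The child elements listed above can be read with any XML library. The Python sketch below summarizes an entry modelled on the "pacifier" example in 5.8. It is a sketch only: the function name `summarize` and the shape of the returned dictionary are assumptions, and namespaces are omitted as in the examples of this document.

```python
import xml.etree.ElementTree as ET

# Entry modelled on the "pacifier" example in 5.8.
ENTRY_XML = """
<Entry>
  <Lemma>
    <GramFeats>
      <POS>verb</POS>
      <Subcat>transitive</Subcat>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">pacifier</FormRep>
    <FormRep xml:lang="fr" notation="ipa">pasifje</FormRep>
  </Lemma>
  <Sense/>
</Entry>
"""

def summarize(entry: ET.Element) -> dict:
    """Collect part of speech, lemma forms by notation, and child counts."""
    lemma = entry.find("Lemma")
    return {
        "pos": lemma.findtext("GramFeats/POS"),
        "forms": {rep.get("notation"): rep.text
                  for rep in lemma.findall("FormRep")},
        "senses": len(entry.findall("Sense")),
        "xrefs": len(entry.findall("Xref")),
    }

entry = ET.fromstring(ENTRY_XML)
print(summarize(entry))
```

Keying the forms by @notation works here because each notation occurs once; a lexicon that allows several forms per notation would need a list per key instead.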
Attributes used for the <Entry> element can include:
— @entryID of datatype xs:ID as a unique identifier for an entry; as a best practice, the ID should be a URI and be unique within a language resource; @xml:id can be used in place of @entryID when there is a design intent to make the entry accessible on the web;
— @lexiconID of datatype xs:ID as a unique identifier for the parent lexicon; as a best practice, the ID should be a URI and be unique within a language resource; @xml:id can be used in place of @lexiconID when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated lexicon collection when there is more than one lexicon.
Table 5 — LexicalEntry class
LMF class         LBX construct
/LexicalEntry/    <Entry>
The following example in French illustrates the encoding of a simple dictionary entry with two senses.
EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7); anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lägust</FormRep>
  </Lemma>
  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est très appréciée.</DefRep>
    </Def>
  </Sense>
  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>
NOTE 1 The style in the above example is appropriate for use in a lexical resource that contains a collection of bilingual lexicons in a variety of source languages, e.g. French, Spanish, Russian, Chinese. A simpler style can be used for a collection of monolingual French lexicons. For example, <Orth> and <Pron> can be used in place of the equivalent <FormRep> elements and the <Def> element can directly contain the text content rather than employing a <DefRep> child element for managing text content (see 5.10). See 6.2 for an example of simplification using the <Orth> and <Pron> elements.
NOTE 2 The @notation value “French” is short for “Canonical French”.
5.6 Implementing the OrthographicRepresentation class
Classes containing an OrthographicRepresentation class include the Form, Lemma, and Definition classes. Orthographic representations shall be implemented in LBX by means of elements corresponding to OrthographicRepresentation subclasses that are introduced in ISO 24613-2 (machine-readable dictionary (MRD) model), or possible new OrthographicRepresentation subclasses derived through the principles for LMF extensions described in ISO 24613-1 (core model). ISO 24613-1:2019, 5.6.1, describes some of the representation types that can serve as a basis for extending the OrthographicRepresentation class. ISO 24613-4:2021 (TEI extension), 6.1, lists a number of representation elements that are valid for use with the Form class. Elements implemented in this part are described in 5.7.2, 5.10, and successive subclauses from 6.3.2 to 6.3.8.
5.7 Implementing the Form class
5.7.1 Form class
The Form class shall be implemented in LBX by elements that instantiate Form subclasses (see Table 6, 6.2 and 6.3).
Table 6 — Form class
LMF class    LBX construct
/Form/       <Form>
5.7.2 Lemma class
The Lemma class, a subclass of the Form class, shall be implemented in LBX by means of the <Lemma> element (see Table 7).
Table 7 — Lemma class
LMF class    LBX construct
/Lemma/      <Lemma>
Orthographic representations in the <Lemma> element shall be implemented in LBX by means of the <FormRep> element, or by elements that instantiate Form subclasses, including <Orth> and <Pron>.
NOTE 1 The <FormRep>, <Orth>, and <Pron> elements are introduced in 6.2.
NOTE 2 <Orth> and <Pron> can be allowed when justified by design goals. 5.8 Implementing the GrammaticalInformation class The GrammaticalInformation class groups grammatical features associated with the LexicalEntry class, Form class, or other classes (e.g. Translation, Sense) in case of specific grammatical restrictions. The GrammaticalInformation class shall be implemented in LBX by means of the <GramFeats> element (see Table 8) combined with various possible child elements for specific grammatical features. Table 8 — GrammaticalInformation class LMF class LBX construct /GrammaticalInformation/ <GramFeats> LBX provides the following child elements of <GramFeats> for describing specific grammatical features of associated elements (e.g. <Lemma>, <WordForm>): — <POS> to indicate the grammatical category of the lexical item. This corresponds to the /partOfSpeech/ data category in ISO 24611:2012, Annex A; — <Person> to indicate the grammatical person (if relevant) of the lexical item or one of its inflected forms. This corresponds to the /person/ data category in ISO 24611:2012, Annex A; — <Gender> to indicate the grammatical gender (if relevant) of the lexical item or one of its inflected forms. This corresponds to the /grammaticalGender/ data category in ISO 24611:2012, Annex A; — <Number> to indicate the grammatical number (if relevant) of the lexical item or one of its inflected forms. This corresponds to the /grammaticalNumber/ data category in ISO 24611:2012, Annex A; — <Tense> to indicate the grammatical tense (if relevant) of the lexical item or one of its inflected forms. 
This corresponds to the /grammaticalTense/ data category in ISO 24611:2012, Annex A; — <Aspect> to indicate the grammatical aspect (if relevant) of the lexical item or one of its inflected forms; — <Mood> to indicate the grammatical mood (if relevant) of the lexical item or one of its inflected forms; — <Voice> to indicate the grammatical voice (if relevant) of the lexical item or one of its inflected forms; — <Animacy> to indicate the grammatical animacy (if relevant) of the lexical item or one of its inflected forms (e.g. in Russian); — <GrammaticalClass> to indicate the grammatical class (gender) of Bantu languages; — <GrammaticalClassGroup> to indicate the aggregate grammatical classes (genders) of a specific noun in the singular and plural; — <iType> to indicate the inflectional class associated with the lexical item or one of its inflected forms; — <Subcat> to indicate subcategorization information (e.g. transitive/intransitive, countable/non- countable). The following example shows the grammatical information for a word form in a monolingual French dictionary that is part of a notional language resource containing a collection of monolingual and bilingual dictionaries in multiple source languages. The @notation="French", denoting canonical French, is used in databases that support a large set of possibly idiosyncratic notations (e.g. for canonical, transliterated and transcribed forms). EXAMPLE <Entry> <Lemma> <GramFeats> <POS>verb</POS> <Subcat>transitive</Subcat> </GramFeats> <FormRep xml:lang="fr" notation="French">pacifier</FormRep> <FormRep xml:lang="fr" notation="ipa">pasifje</FormRep> </Lemma> <Sense/> </Entry> For an example of simplifying this schema, see 6.2. 5.9 Implementing the Sense class The Sense class, as a recursive construct, shall be implemented in LBX by means of the <Sense> element (see Table 9). LBX does not allow character content in the element. 
Table 9 — Sense class

LMF class     LBX construct
/Sense/       <Sense>

5.10 Implementing the Definition class

The Definition class, which contains a narrative description of the word sense, shall be implemented in LBX by means of the <Def> element (see Table 10).

Table 10 — Definition class

LMF class        LBX construct
/Definition/     <Def>

LBX provides the following child element for the description of definition information:

— <DefRep>, an element instantiating a TextRepresentation subclass containing the character content of the definition. See 6.3.6.

The <Def> element allows mixed data enabling the text literal (character content) to be contained within the <Def> element itself. See 6.3.6 for a description of the <DefRep> element, which provides an alternative approach for managing the text literal. Within an LBX <LexicalResource> or <Lexicon> element, the consistent use of <Def> or <DefRep> for character content is a best practice.

NOTE The <DefRep> element supports the inclusion of multiple orthographic representations for a <Def> element (e.g. Simplified Chinese, Traditional Chinese).

5.11 Implementing the CrossREF class

The CrossREF class shall be implemented in LBX by means of the <Xref> element (see Table 11), which points to an internal or external dictionary object, such as an entry, lemma, sense, or translation. LBX allows a range of different data types for the cross reference (URI, IRI, HREF, etc.). In order to make the data accessible through the web, LBX should implement web standards, such as a URL or Resource Description Framework (RDF). The <Xref> element can be qualified through attributes, such as @relType for describing the relationship type (e.g. synonym, antonym, hyponym). The type of identifier and any of its inherent characteristics and constraints should be identified in the <GlobalInformation> element or the <LexiconInformation> element, as appropriate.
Table 11 — CrossREF class

LMF class       LBX construct
/CrossREF/      <Xref [URI|IRI|HREF|…]="">

The <Xref> element can be used as a child of the <RelForm> element to point to content and metadata in a different entry. In this case, the <Xref> element can replace the content and metadata contained in the <RelForm>.

EXAMPLE 1
<Entry>
  <Lemma>
    <FormRep xml:lang="en">lawful</FormRep>
  </Lemma>
  <RelForm relType="synonym">
    <Xref href="#legal">legal</Xref>
  </RelForm>
  <Sense/>
</Entry>

The <Xref> element can also implement the CrossREF class through other classes. As in the following example, linking related senses provides a better description of semantic relationships.

EXAMPLE 2
<Entry>
  <Lemma>
    <FormRep xml:lang="en">lawful</FormRep>
  </Lemma>
  <Sense xml:id="legal">
    <Def>being legal</Def>
    <Xref IRI="#legal-sense1"/>
  </Sense>
</Entry>

LBX allows multiple strategies for describing multi-word expressions using the CrossREF mechanism. In the following example, the <Xref> element is contained in the Lemma and used to point to other entries, each of which contains a component of the multi-word expression.

EXAMPLE 3
<Entry>
  <Lemma formStructure="MWE">
    <GramFeats>
      <POS>noun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion picture</FormRep>
    <Xref relType="component" order="1" IRI="#motion_form_1">motion</Xref>
    <Xref relType="component" order="2" IRI="#picture_form_1">picture</Xref>
  </Lemma>
  <Sense>
    <Def>sequence of pictures that give the effect of motion when shown in rapid succession</Def>
  </Sense>
</Entry>

In the following example, the use of the <RelForm> to implement the multi-word expression allows a more in-depth grammatical analysis.
EXAMPLE 4
<Entry>
  <Lemma formStructure="MWE">
    <GramFeats>
      <POS>noun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion picture</FormRep>
  </Lemma>
  <RelForm relType="MWEComponent">
    <GramFeats>
      <POS>attributiveNoun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion</FormRep>
    <Xref relType="component" order="1" IRI="#motion_form_1"/>
  </RelForm>
  <RelForm relType="MWEComponent">
    <FormRep xml:lang="en">picture</FormRep>
    <Xref relType="component" order="2" IRI="#picture_form_1"/>
  </RelForm>
  <Sense>
    <Def xml:lang="en">sequence of pictures that give the effect of motion when shown in rapid succession</Def>
  </Sense>
</Entry>

6 Serialization of the MRD extension (ISO 24613-2)

6.1 Implementing OrthographicRepresentation subclasses

The OrthographicRepresentation class in LBX shall be serialized by means of elements derived from the FormRepresentation and TextRepresentation classes or their subclasses described in ISO 24613-2. FormRepresentation subclasses are further described in 6.2 and TextRepresentation subclasses in 6.3.6. In all cases, these corresponding elements can be qualified by attributes, including @xml:lang, @script, and @notation. Other attributes, such as @representationType (e.g. canonicalForm, phoneticForm), are also available. An LBX implementation should use BCP 47 for language description, especially for the support of web-based applications, data interchange, and system interoperability (@script is not used when BCP 47 is implemented).

6.2 Implementing the FormRepresentation class

Depending on design goals, the FormRepresentation class in LBX shall be serialized by means of a general <FormRep> element derived from the FormRepresentation class, or by elements derived from FormRepresentation subclasses that correspond to the lexical environment of the Form subclasses (see Table 12). The goal of this design is to support effective resource management for large-scale, complex lexical databases (e.g. many lexicons encompassing many languages).
When justified by design goals, simplification can be achieved by reducing the number of subclasses employed. These subclasses are represented by the elements described in 6.3.1 to 6.3.5, coupled with the appropriate attributes to qualify the content, in particular @xml:lang, @script, and @notation.

Table 12 — FormRepresentation class

LMF class                  LBX construct
/FormRepresentation/       <FormRep>
                           <StemRep>
                           <PartRep>
                           <RelFormRep>

<FormRep> is contained by the <Lemma> (see 5.7.2), <WordForm>, and <RelForm> elements; <StemRep>, <PartRep>, and <RelFormRep> are restricted to specific elements (see 6.3, 6.4 and 6.5).

Where the FormRepresentation class by itself is sufficient without further qualification, equivalents of TEI elements, such as <Orth>, <Pron>, <Hyph>, <Stress>, and <Syll> can potentially be used for further simplification. In such cases, descriptions of qualifying attributes (e.g. @xml:lang) should be included in <GlobalInformation> or <LexiconInformation>, as appropriate.

The following example shows the <Orth> and <Pron> elements for a lemma in a monolingual French dictionary, part of a collection of monolingual French lexicons. In reference to ISO 24613-4:2021, 5.2, there is no requirement to include the language and script codes in the <GlobalInformation> element (although LBX allows the inclusion of these codes). <GlobalInformation> can also be used to implement a definition that assigns the "ipa" @notation value to the <Pron> element. The following example shows the implementation of this principle using a revised version of the example in 5.8.

EXAMPLE
<Entry>
  <Lemma>
    <GramFeats>
      <POS>verb</POS>
      <Subcat>transitive</Subcat>
    </GramFeats>
    <Orth>pacifier</Orth>
    <Pron>pasifje</Pron>
  </Lemma>
  <Sense/>
</Entry>

6.3 Implementing the Form subclasses

6.3.1 General principles

Form subclasses described in ISO 24613-2 shall be serialized by means of the elements described in 6.3.2 to 6.3.5.
LBX typically treats the Form class described in ISO 24613-1 as an abstract class.

6.3.2 Implementing the WordForm class

The WordForm class, a subclass of the Form class, shall be implemented in LBX by means of the <WordForm> element (see Table 13). The <WordForm> element can further be characterized by a @formType attribute (e.g. inflection, abbreviation, etc.) and other qualifying values (see Annex B).

Table 13 — WordForm class

LMF class       LBX construct
/WordForm/      <WordForm>

Orthographic representation in the <WordForm> element should be encoded through the following child elements:

— <FormRep>;
— <Orth>;
— <Pron>.

NOTE The FormRep derived elements <Orth> and <Pron> can be used in limited contexts when warranted by design goals (see 6.2).

6.3.3 Implementing the Stem class

The Stem class is derived from the Form class for the representation of a stem or root, and shall be implemented by the <Stem> element (see Table 14) further constrained by means of the @stemType attribute (stem, root, arabicRoot, etc.) depending on the linguistic context.

Table 14 — Stem class

LMF class     LBX construct
/Stem/        <Stem>

Orthographic representation in the <Stem> element should be encoded through the following child element:

— <StemRep>.

6.3.4 Implementing the WordPart class

The WordPart class, a subclass of the Form class, shall be implemented in LBX by means of the <WordPart> element (see Table 15). <WordPart> represents a sub-lexeme component of a word form that is not a stem or root (e.g. prefix, suffix).

Table 15 — WordPart class

LMF class       LBX construct
/WordPart/      <WordPart>

Orthographic representation in the <WordPart> element should be encoded through the following child element:

— <PartRep>.

Examples of use cases include filling prefix and suffix slots in agglutinative languages and “indexing” lexical items by means of shared affixes. The following example illustrates the latter use case.
EXAMPLE
<Entry>
  <Lemma>
    <FormRep xml:lang="de">entdecken</FormRep>
  </Lemma>
  <WordPart>
    <PartRep>ent</PartRep>
    <Xref IRI="#ent-prefix"/>
  </WordPart>
  <Sense>
    <Def xml:lang="en">to discover</Def>
  </Sense>
</Entry>
<Entry>
  <Lemma>
    <FormRep xml:lang="de">empfehlen</FormRep>
  </Lemma>
  <WordPart>
    <PartRep>emp</PartRep>
    <Xref IRI="#ent-prefix"/>
  </WordPart>
  <Sense>
    <Def xml:lang="en">to recommend</Def>
  </Sense>
</Entry>
<Entry entryID="ent-prefix">
  <Lemma partType="prefix">
    <FormRep xml:lang="de">ent</FormRep>
  </Lemma>
  <WordForm partType="prefix" formType="variant">
    <FormRep xml:lang="de">emp</FormRep>
  </WordForm>
  <Sense>
    <Def xml:lang="en">An inseparable verbal prefix originally denoting the beginning of an action or separation. This fundamental meaning has been lost in many ...
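
The "indexing" use case of 6.3.4 can be sketched informally in Python: entries that share an affix are grouped by following the <Xref> in each <WordPart> to the entry it points at. This is a minimal sketch under stated assumptions, not an LBX requirement: the entries are abridged from the EXAMPLE above and wrapped in a <Lexicon> root for parsing, the @IRI values are treated as same-document fragment references matched against @entryID, and the function name index_by_shared_affix is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged from the EXAMPLE in 6.3.4; wrapped in <Lexicon> so it parses as one document.
LEXICON_XML = """
<Lexicon>
  <Entry>
    <Lemma><FormRep xml:lang="de">entdecken</FormRep></Lemma>
    <WordPart><PartRep>ent</PartRep><Xref IRI="#ent-prefix"/></WordPart>
    <Sense/>
  </Entry>
  <Entry>
    <Lemma><FormRep xml:lang="de">empfehlen</FormRep></Lemma>
    <WordPart><PartRep>emp</PartRep><Xref IRI="#ent-prefix"/></WordPart>
    <Sense/>
  </Entry>
  <Entry entryID="ent-prefix">
    <Lemma partType="prefix"><FormRep xml:lang="de">ent</FormRep></Lemma>
    <Sense/>
  </Entry>
</Lexicon>
"""


def index_by_shared_affix(lexicon):
    """Map each referenced entryID to the lemmas whose <WordPart> points at it."""
    known_ids = {e.get("entryID") for e in lexicon.iter("Entry") if e.get("entryID")}
    index = {}
    for entry in lexicon.iter("Entry"):
        lemma = entry.findtext("Lemma/FormRep")
        for xref in entry.findall("WordPart/Xref"):
            target = xref.get("IRI", "")
            # Assumption: "#id" is a reference to an @entryID in the same document.
            if target.startswith("#") and target[1:] in known_ids:
                index.setdefault(target[1:], []).append(lemma)
    return index


lexicon = ET.fromstring(LEXICON_XML)
prefix_index = index_by_shared_affix(lexicon)
print(prefix_index)
```

References that do not resolve inside the document are skipped here; a real implementation would decide how to treat external IRIs.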

INTERNATIONAL STANDARD ISO 24613-5
First edition
2022-01
Language resource management — Lexical markup framework
(LMF) —
Part 5: Lexical base exchange (LBX) serialization
COPYRIGHT PROTECTED DOCUMENT
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this
publication may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can
be requested from ISO at the address below or from the ISO member body in the country of the requester.
ISO copyright office
Case postale 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 General requirements
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
5.2 Implementing the GlobalInformation class
5.3 Implementing the Lexicon class
5.4 Implementing the LexiconInformation class
5.5 Implementing the LexicalEntry class
5.6 Implementing the OrthographicRepresentation class
5.7 Implementing the Form class
5.7.1 Form class
5.7.2 Lemma class
5.8 Implementing the GrammaticalInformation class
5.9 Implementing the Sense class
5.10 Implementing the Definition class
5.11 Implementing the CrossREF class
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing OrthographicRepresentation subclasses
6.2 Implementing the FormRepresentation class
6.3 Implementing the Form subclasses
6.3.1 General principles
6.3.2 Implementing the WordForm class
6.3.3 Implementing the Stem class
6.3.4 Implementing the WordPart class
6.3.5 Implementing the RelatedForm class
6.3.6 Implementing the TextRepresentation class
6.3.7 Implementing the Translation class
6.3.8 Implementing the Example class
6.4 Implementing the SubjectField class
6.5 Implementing the Bibliography class
7 Implementing the CrossREF mechanism for referencing external multimedia files
8 Implementing classes from the etymological extension (ISO 24613-3)
8.1 Implementing the Etymology class
8.2 Implementing the Etymon class
8.2.1 General
8.2.2 Referencing forms in an etymon
8.2.3 Representing the meaning of an etymon
8.2.4 Representing the language of an etymon
8.2.5 Dating an etymon
8.2.6 Citing sources associated with an etymon
8.3 Implementing the EtyLink class
8.4 Implementing the CognateSet class
8.5 Implementing the Cognate class
9 Additional mechanisms
9.1 Overview
9.2 Implementing an XML feature structure
9.3 Representing various labels with
9.4 Conveying rendering information with the @rend attribute
Annex A (informative) Selection of LBX data categories
Annex B (informative) Implementing a feature structure in LBX
Annex C (informative) Examples of applying the LBX serialization
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national
standards bodies (ISO member bodies). The work of preparing International Standards is
normally carried out through ISO technical committees. Each member body interested in a subject
has the right to be represented on the technical committee established for it. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on
matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of ISO documents should be noted. This document was
drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see
www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible
for failing to identify any such rights or to give notice of their existence. Details of
any intellectual property rights or similar rights identified during
the development of the document are given in the Introduction and/or in the list of patent
declarations received by ISO (see www.iso.org/brevets).
Any trade name used in this document is given
for information, for the convenience of users, and does not constitute an
endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence
to the World Trade Organization (WTO) principles concerning Technical
Barriers to Trade (TBT), see www.iso.org/avant-propos.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-5, used together with ISO 24613-1:2019,
ISO 24613-2:2020, ISO 24613-3:2021 and ISO 24613-4:2021, cancels and replaces ISO 24613:2008, which
has been technically revised.
The main change compared with the previous edition is as follows:
— complete revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user's national
standards body. A complete listing of these bodies
can be found at www.iso.org/fr/members.html.
INTERNATIONAL STANDARD ISO 24613-5:2022(F)
Language resource management — Lexical markup framework
(LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope
This document describes the serialization of the lexical markup framework (LMF) model defined as
an extensible markup language (XML) model derived from the lexical base exchange (LBX) schema
and compliant with the W3C XML schema. This serialization covers the classes, data categories, and
mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model)
and ISO 24613-3 (etymological extension).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their
content constitutes requirements of this document. For dated references, only the edition cited applies.
For undated references, the latest edition of the referenced document (including any
amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core
model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2:
Machine-readable dictionary (MRD) model
ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological
extension
IETF BCP 47. Tags for Identifying Languages. Phillips, A., Davis, M. (eds.), September 2009. Best Current
Practice. Available at: https://tools.ietf.org/html/bcp47
W3C. Extensible Markup Language (XML) 1.1 (Second Edition). W3C Recommendation, 16 August
2006, edited in place 29 September 2006. Available at:
https://www.w3.org/TR/2006/REC-xml11-20060816/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and ISO 24613-3
apply.
ISO and IEC maintain terminology databases for use in
standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
4 General requirements
This document is intended to provide constructs for each LMF class from
ISO 24613-1 (core model), ISO 24613-2 (MRD extension) and ISO 24613-3 (etymological
extension). It requires conformance to ISO 24613-1, ISO 24613-2 and ISO 24613-3 when
implementing the data categories cited in the respective parts, and to W3C
XML Schema 1.1 for representing structured information in XML. LBX extends the original models through
selections of data categories and specific value lists, the creation of new
subclasses, and the definition of new constraints. In addition, this document respects the cardinalities
expressed in ISO 24613-1, ISO 24613-2 and ISO 24613-3. The LBX serialization offers a richer level
of detail than LMF in order to achieve specific design goals. However,
this document does not elaborate the metadata-related aspects of LMF, because the LBX schema is
inherently much richer for representing all aspects related to the creation, content,
versioning, and database implementation of lexical content as a whole. In
some cases, loosely equivalent constructs are mentioned to clarify the requirements
with respect to standardized LMF.
The XML examples in this document are simplified by omitting namespaces.
Unless otherwise specified, XML elements are assumed to belong to the LBX
namespace, and the examples fall within the scope of the following XML namespace
declaration:
xmlns="http://www.LexicalBaseExchange.org/2021/schema"
In addition, the datatypes specified in this document are defined in accordance with the
XML Schema Part 2 recommendation. The prefix "xs:" corresponds to the following
namespace:
http://www.w3.org/2001/XMLSchema
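Because the examples in this document omit namespaces, it can help to see what changes once the LBX namespace declaration is present. The following Python sketch is an informal illustration only: it assumes the LBX namespace URI reads http://www.LexicalBaseExchange.org/2021/schema as declared above, the sample entry is abridged from the example in 5.8, and the prefix "lbx" is an arbitrary local choice made for the query, not something LBX defines.

```python
import xml.etree.ElementTree as ET

# Assumed LBX namespace URI, taken from the declaration in clause 4.
LBX_NS = "http://www.LexicalBaseExchange.org/2021/schema"
# The "lbx" prefix is local to this script; any prefix would work.
NS = {"lbx": LBX_NS}
# The xml: prefix is predefined by the XML specification.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = f"""
<Entry xmlns="{LBX_NS}">
  <Lemma>
    <FormRep xml:lang="fr">pacifier</FormRep>
  </Lemma>
  <Sense/>
</Entry>
"""

entry = ET.fromstring(DOC)

# With a default namespace declared, every element name is namespace-qualified,
# so an unqualified lookup finds nothing.
unqualified = entry.find("Lemma")
form = entry.find("lbx:Lemma/lbx:FormRep", NS)
lang = form.get(XML_LANG)

print(entry.tag)
print(unqualified is None)
print(form.text, lang)
```

The same queries written without the prefix work on the namespace-free examples shown throughout this document.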
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element
(see Table 1), which groups one or more lexicons into a single collection. This level can be
omitted in cases where the lexical resource contains only one lexicon, so that the resource
begins directly at the lexicon level. In cases where a lexical resource contains a large
number of lexicons or several large lexicons, the lexicon (XML document) can refer to
a virtual lexical resource by using a @lexicalResourceID attribute in the <Lexicon> element,
and the optional <…> element (see 5.5).
Table 1 — LexicalResource class
LMF class             LBX construct
/LexicalResource/     <LexicalResource>
5.2 Implementing the GlobalInformation class
The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element
(see Table 2), either by referencing a GlobalInformation.xsd schema using an
<xsd:include> element, or as a direct child of a <LexicalResource> element. <GlobalInformation>
allows a variety of administrative, technical, documentary and
bibliographic information associated with the corresponding lexical resource to be encoded.
Table 2 — GlobalInformation class
LMF class               LBX construct
/GlobalInformation/     <GlobalInformation>
Since the LBX serialization is based on the W3C recommendation for XML, it implements the
@xml:lang attribute to indicate the language information for the content of specific
elements. According to the W3C recommendation, the content of @xml:lang shall conform to BCP 47.
No specific implementation of the /language coding/
or /script coding/ data category is necessary to ensure conformance of this document to ISO 24613-1. LBX allows
these data categories to be included in the <GlobalInformation> element in order to facilitate validation
of equivalent metadata found in the <LexiconInformation> elements of one or more
lexicons (see 5.4). When included, the /script coding/ category shall use the codes of ISO 15924.
The /character encoding/ data category is implemented in the XML declaration of an
LBX-conformant document, using the @encoding attribute. For example, an XML-LBX document encoded in UTF-8
in accordance with the Unicode standard shall begin with an XML declaration whose @encoding
attribute is set to UTF-8.
The following non-exhaustive list covers <GlobalInformation> sub-elements of
value-indexed simple types:
— "ISO 639-3", a simple type enumerating the set of language codes used across all lexicons;
— "ISO 15924", a simple type enumerating the set of scripts used across all lexicons;
— GlobalNotationType, a simple type enumerating the set of notations used across all
lexicons;
— GlobalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values
used across all lexicons;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used across
all lexicons.
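The role these global value sets can play in validation (see also 5.4) can be sketched informally in Python: each value used at lexicon level is expected to appear in the corresponding global enumeration. This is a minimal sketch under stated assumptions. The function name undeclared_values, the dictionary layout, and the sample values (including "adverb") are hypothetical; a real LBX setup would take these sets from the schema enumerations rather than from Python literals.

```python
def undeclared_values(global_values, lexicon_values):
    """Map each category to the lexicon-level values missing from the global set."""
    problems = {}
    for category, used in lexicon_values.items():
        missing = set(used) - set(global_values.get(category, ()))
        if missing:
            problems[category] = sorted(missing)
    return problems


# Hypothetical value sets standing in for GlobalNotationType / GlobalPartOfSpeechType.
global_info = {
    "notation": {"French", "ipa"},
    "partOfSpeech": {"noun", "verb"},
}
# Hypothetical value sets standing in for one lexicon's NotationType / PartOfSpeechType.
lexicon_info = {
    "notation": {"French", "ipa"},
    "partOfSpeech": {"verb", "adverb"},
}

report = undeclared_values(global_info, lexicon_info)
print(report)
```

An empty report means every lexicon-level value is covered by the global sets.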
Examples can be found in the LBX reference schema, GlobalInformation document
(see Annex B).
5.3 Implementing the Lexicon class
The Lexicon class shall be implemented in LBX by means of the <Lexicon> element (see Table 3),
which is a direct child of the <LexicalResource> element when the latter is used. If the <LexicalResource>
element is not used, <Lexicon> becomes the root element. In cases where a lexical
resource contains a large number of lexicons or several large lexicons, the lexicon (XML
document) can refer to a virtual lexical resource by using a @lexicalResourceID attribute
in the <Lexicon> element (see 5.1). In the case of a virtual lexical resource, where the <…> element
is not part of the same XML document as the <…> element, the lexicon can
use an include statement to refer to a relevant <…> element.
Other information in the <Lexicon> element should be qualified using the following child elements
and attributes as direct children of the <Lexicon> element or, ideally, as
children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID of datatype "xs:ID" as a unique identifier for the lexicon; as a best practice, the ID should be a URI and should be unique within a language resource; @xml:ID can be used instead of @lexiconID when the design aims to make the entry accessible through the web;
— @lexicalResourceID of datatype "xs:ID" as a unique identifier for the lexical resource; as a best practice, the ID should be a URI for the global scope; in addition, @xml:ID can be used instead of @lexicalResourceID when the design aims to make the entry accessible through the web;
— @lexiconType of datatype "xs:string"; the type of lexicon, e.g. bilingual or monolingual dictionary;
— @sourceLanguage of datatype "xs:string"; the language of the <Lemma> element or of its inflected forms;
— @targetLanguage of datatype "xs:string"; the language into which the lemma is translated, mainly represented in the <Translation> element.

Table 3 — Lexicon class

LMF class      LBX construct
/Lexicon/      <Lexicon>

5.4 Implementing the LexiconInformation class

The LexiconInformation class shall be implemented in LBX by means of the <LexiconInformation> element (see Table 4), either by referencing a LexiconInformation.xsd schema using an <xsd:include> element, or as a direct child of the <Entry> element. <LexiconInformation> allows a variety of administrative, technical, documentary and bibliographic information associated with the corresponding lexical entry to be encoded.

Table 4 — LexiconInformation class

LMF class                LBX construct
/LexiconInformation/     <LexiconInformation>

When the information qualifying the lexicon is not included in the <Lexicon> element, it shall be included as elements and attributes in the <LexiconInformation> element.
This information includes (see 5.3):

— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.

The <LexiconInformation> class can also include elements and data categories that can improve the qualification of information in the lexicon and can be used to facilitate validation of the XML document (lexicon). These elements and data categories should also be included in the global set of elements and data categories found in the <GlobalInformation> element (see 5.2), and it is recommended that the validation process include a comparison of the corresponding values in <GlobalInformation> and <LexiconInformation>.

The following non-exhaustive list covers these sub-elements of value-indexed simple types:

— NotationType, a simple type enumerating the set of notations used in a lexicon;
— PartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a lexicon;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used in a lexicon.

NOTE In addition to the <LexiconInformation> construct, LBX allows the information of a subset of lexicons grouped by language to be concatenated by referencing a named language data schema (e.g. ArabicLanguageData.xsd) (see B.1).

5.5 Implementing the LexicalEntry class

The LexicalEntry class shall be implemented in LBX by means of the <Entry> element (see Table 5).
Lexical information inside the <Entry> element should be encoded using the following child elements:

— <GramFeats> for grammatical information associated with the whole entry;
— <Form> to contain the text literal and the attributes qualifying the text literal (the Form class is serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for references to internal or external elements.

The attributes used for the <LexicalEntry> element can include:

— @entryID of datatype "xs:ID" as a unique identifier for an entry; as a best practice, the ID should be a URI and should be unique within a language resource; @xml:ID can be used instead of @entryID when the design aims to make the entry accessible through the web;
— @lexiconID of datatype "xs:ID" as a unique identifier for the parent lexicon; as a best practice, the ID should be a URI and should be unique within a language resource; @xml:ID can be used instead of @entryID when the design aims to make the entry accessible through the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated collection of lexicons when there are several lexicons.

Table 5 — LexicalEntry class

LMF class           LBX construct
/LexicalEntry/      <Entry>

The following French example illustrates the encoding of a simple dictionary entry describing two senses.

EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7); anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lägust</FormRep>
  </Lemma>
  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est très appréciée.</DefRep>
    </Def>
  </Sense>
  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>

NOTE 1 The style of the example above can be used in a lexical resource containing a collection of bilingual lexicons in a variety of source languages (e.g. French, Spanish, Russian and Chinese). A simpler style can be used for a collection of monolingual French lexicons. For example, <Orth> and <Pron> can be used in place of the equivalent <FormRep> elements, and the <Def> element can contain the text directly rather than using a <DefRep> child element to manage the text content (see 5.10). See 6.2 for an example of simplification using the <Orth> and <Pron> elements.

NOTE 2 The @notation value "French" is short for "Canonical French".

5.6 Implementing the OrthographicRepresentation class

The Form, Lemma and Definition classes contain an OrthographicRepresentation class. Orthographic representations shall be implemented in LBX by means of elements corresponding to the OrthographicRepresentation subclasses presented in ISO 24613-2 (machine-readable dictionary (MRD) model), or to possible new OrthographicRepresentation subclasses derived from the principles for LMF extensions described in ISO 24613-1 (core model).
ISO 24613-1:2019, 5.6.1, describes certain types of representation that can serve as a basis for extending the OrthographicRepresentation class. ISO 24613-4:2021 (TEI extension), 6.1, lists several representation elements that can be used with the Form class. The elements implemented in this document are described in 5.7.2 and 5.10 and in the successive subclauses 6.3.2 to 6.3.8.

5.7 Implementation of the Form class

5.7.1 Form class

The Form class shall be implemented in LBX by elements that instantiate Form subclasses (see Table 6, 6.2 and 6.3).

Table 6 — Form class

LMF class     LBX construct
/Form/        <Form>

5.7.2 Lemma class

The Lemma class, a subclass of the Form class, shall be implemented in LBX by means of the <Lemma> element (see Table 7).

Table 7 — Lemma class

LMF class     LBX construct
/Lemma/       <Lemma>

Orthographic representations in the <Lemma> element shall be implemented in LBX by means of the <FormRep> element, or by elements that instantiate Form subclasses, including <Orth> and <Pron>.

NOTE 1 The <FormRep>, <Orth> and <Pron> elements are introduced in 6.2.

NOTE 2 <Orth> and <Pron> can be allowed if the design goals justify it.

5.8 Implementation of the GrammaticalInformation class

The GrammaticalInformation class groups the grammatical features associated with the LexicalEntry or Form classes, or with other classes (for example Translation, Sense) where specific grammatical restrictions apply. The GrammaticalInformation class shall be implemented in LBX by means of the <GramFeats> element (see Table 8), combined with various possible child elements for specific grammatical features.
Table 8 — GrammaticalInformation class

LMF class                    LBX construct
/GrammaticalInformation/     <GramFeats>

LBX provides the following child elements of <GramFeats> to describe specific grammatical features of associated elements (for example <Lemma>, <WordForm>):

— <POS> to indicate the part of speech of the lexical item. This corresponds to the /partOfSpeech/ data category of ISO 24611:2012, Annex A;
— <Person> to indicate the grammatical person (if relevant) of the lexical item or of one of its inflected forms. This corresponds to the /person/ data category of ISO 24611:2012, Annex A;
— <Gender> to indicate the grammatical gender (if relevant) of the lexical item or of one of its inflected forms. This corresponds to the /grammaticalGender/ data category of ISO 24611:2012, Annex A;
— <Number> to indicate the grammatical number (if relevant) of the lexical item or of one of its inflected forms. This corresponds to the /grammaticalNumber/ data category of ISO 24611:2012, Annex A;
— <Tense> to indicate the grammatical tense (if relevant) of the lexical item or of one of its inflected forms. This corresponds to the /grammaticalTense/ data category of ISO 24611:2012, Annex A;
— <Aspect> to indicate the grammatical aspect (if relevant) of the lexical item or of one of its inflected forms;
— <Mood> to indicate the grammatical mood (if relevant) of the lexical item or of one of its inflected forms;
— <Voice> to indicate the grammatical voice (if relevant) of the lexical item or of one of its inflected forms;
— <Animacy> to indicate the grammatical animacy (if relevant) of the lexical item or of one of its inflected forms (in Russian, for example);
— <GrammaticalClass> to indicate the grammatical class (gender) in Bantu languages;
— <GrammaticalClassGroup> to indicate the aggregated grammatical classes (genders) of a specific noun in the singular and the plural;
— <iType> to indicate the inflection class associated with the lexical item or with one of its inflected forms;
— <Subcat> to indicate subcategorization information (for example transitive/intransitive, countable/uncountable).

The following example presents the grammatical information for a word form in a monolingual French dictionary that is part of a phonetic language resource containing a collection of monolingual and bilingual dictionaries in several source languages. The value @notation="French", which designates canonical French, is used in databases that support a large number of possibly idiosyncratic notations (for example for canonical, transliterated and transcribed forms).

EXAMPLE
<Entry>
  <Lemma>
    <GramFeats>
      <POS>verb</POS>
      <Subcat>transitive</Subcat>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">pacifier</FormRep>
    <FormRep xml:lang="fr" notation="ipa">pasifje</FormRep>
  </Lemma>
  <Sense/>
</Entry>

For an example of simplification of this schema, see 6.2.
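As an informal illustration of how a consumer might read the structure described in 5.8, the following Python sketch parses an entry based on the example above and collects the children of <GramFeats> together with the <FormRep> values. It is a minimal sketch only: the helper name gram_feats and the choice of Python's standard xml.etree.ElementTree module are assumptions made here, not part of LBX.

```python
import xml.etree.ElementTree as ET

# Entry based on the 5.8 example; nothing below is prescribed by LBX.
ENTRY_XML = """
<Entry>
  <Lemma>
    <GramFeats>
      <POS>verb</POS>
      <Subcat>transitive</Subcat>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">pacifier</FormRep>
    <FormRep xml:lang="fr" notation="ipa">pasifje</FormRep>
  </Lemma>
  <Sense/>
</Entry>
"""


def gram_feats(parent):
    """Return the <GramFeats> children of `parent` as {element name: text}.

    Hypothetical helper: returns an empty dict when no <GramFeats> is present.
    """
    feats = parent.find("GramFeats")
    if feats is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in feats}


entry = ET.fromstring(ENTRY_XML)
lemma = entry.find("Lemma")

# Grammatical features attached to the lemma.
lemma_feats = gram_feats(lemma)

# Map each @notation value to the text of its <FormRep> element.
form_reps = {rep.get("notation"): rep.text for rep in lemma.findall("FormRep")}

print(lemma_feats)
print(form_reps)
```

Because the features are gathered by element name, the same helper would also pick up <Gender>, <Number> or any other child listed above without modification.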
5.9 Implementation of the Sense class

The Sense class, as a recursive construct, shall be implemented in LBX by means of the <Sense> element (see Table 9). LBX does not allow character content in the <Sense> element.

Table 9 — Sense class

LMF class     LBX construct
/Sense/       <Sense>

5.10 Implementation of the Definition class

The Definition class, which contains a narrative description of the meaning of a word, shall be implemented in LBX by means of the <Def> element (see Table 10).

Table 10 — Definition class

LMF class        LBX construct
/Definition/     <Def>

LBX provides the following child element for describing definition information:

— <DefRep>, an element that instantiates a TextRepresentation subclass and holds the character content of the definition. See 6.3.6.

The <Def> element allows literal text (character content) to be included in the <Def> element itself, using mixed content. See 6.3.6 for a description of the <DefRep> element, which offers an alternative method for handling literal text. Within an LBX <LexicalResource> or <Lexicon> element, best practice is to use either <Def> or <DefRep> consistently for character content.

NOTE The <DefRep> element makes it easier to include multiple orthographic representations for a <Def> element (for example simplified or traditional Chinese).

5.11 Implementation of the CrossREF class

The CrossREF class shall be implemented in LBX by means of the <Xref> element (see Table 11), which points to an internal or external dictionary object such as an entry, a lemma, a sense or a translation. LBX allows a range of different data types to be used for the cross-reference (URI, IRI, HREF, etc.). To make the data accessible via the web, LBX should implement web standards, such as a URL or a Resource Description Framework (RDF) model.
The <Xref> element can be qualified by attributes, such as @relType to describe the type of relation (for example synonym, antonym, hyponym). The type of identifier and all of its inherent characteristics and constraints should be identified in the <GlobalInformation> or <LexiconInformation> element, as appropriate.

Table 11 — CrossREF class

LMF class      LBX construct
/CrossREF/     <Xref [URI|IRI|HREF|…]="">

The <Xref> element can be used as a child of the <RelForm> element in order to point to content and metadata in a different entry. In this case, the <Xref> element can replace the content and metadata contained in the <RelForm> element.

EXAMPLE 1
<Entry>
  <Lemma>
    <FormRep xml:lang="en">lawful</FormRep>
  </Lemma>
  <RelForm relType="synonym">
    <Xref href="#legal">legal</Xref>
  </RelForm>
  <Sense/>
</Entry>

The <Xref> element can also implement the CrossREF class via other classes. In the following example, senses linked by cross-references give a better description of semantic relations.

EXAMPLE 2
<Entry>
  <Lemma>
    <FormRep xml:lang="en">lawful</FormRep>
  </Lemma>
  <Sense xml:id="legal">
    <Def>being legal</Def>
    <Xref IRI="#legal-sense1"/>
  </Sense>
</Entry>

LBX allows multiple strategies to be adopted for describing multiword expressions using the CrossREF mechanism. In the following example, the <Xref> element is contained in the lemma and used to point to other entries, each of which contains one component of the multiword expression.
EXAMPLE 3
<Entry>
  <Lemma formStructure="MWE">
    <GramFeats>
      <POS>noun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion picture</FormRep>
    <Xref relType="component" order="1" IRI="#motion_form_1">motion</Xref>
    <Xref relType="component" order="2" IRI="#picture_form_1">picture</Xref>
  </Lemma>
  <Sense>
    <Def>sequence of pictures that give the effect of motion when shown in rapid succession</Def>
  </Sense>
</Entry>

In the following example, using the <RelForm> element to implement the multiword expression allows a more detailed grammatical analysis.

EXAMPLE 4
<Entry>
  <Lemma formStructure="MWE">
    <GramFeats>
      <POS>noun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion picture</FormRep>
  </Lemma>
  <RelForm relType="MWEComponent">
    <GramFeats>
      <POS>attributiveNoun</POS>
    </GramFeats>
    <FormRep xml:lang="en">motion</FormRep>
    <Xref relType="component" order="1" IRI="#motion_form_1"/>
  </RelForm>
  <RelForm relType="MWEComponent">
    <FormRep xml:lang="en">picture</FormRep>
    <Xref relType="component" order="2" IRI="#picture_form_1"/>
  </RelForm>
  <Sense>
    <Def xml:lang="en">sequence of pictures that give the effect of motion when shown in rapid succession</Def>
  </Sense>
</Entry>

6 Serialization of the MRD extension (ISO 24613-2)

6.1 Implementation of the OrthographicRepresentation subclasses

The OrthographicRepresentation class in LBX shall be serialized by means of elements derived from the FormRepresentation and TextRepresentation classes or from their subclasses described in ISO 24613-2. The FormRepresentation subclasses are detailed in 6.2 and the TextRepresentation subclasses in 6.3.6. In all cases, the corresponding elements can be qualified by attributes, including @xml:lang, @script and @notation. Other attributes, such as @representationType (for example canonicalForm, phoneticForm), are also available.
An LBX implementation should use BCP 47 for language description, in particular to support web applications, data exchange and interoperability between systems (@script is not used if BCP 47 is implemented).

6.2 Implementation of the FormRepresentation class

Depending on the design goals, the FormRepresentation class in LBX shall be serialized by means of a general <FormRep> element derived from the FormRepresentation class, or by elements derived from FormRepresentation subclasses that correspond to the lexical environment of the Form subclasses (see Table 12). The aim of this design is to facilitate efficient resource management for large and complex lexical databases (for example many lexicons covering many languages). If the design goals justify it, a simplification can be achieved by reducing the number of subclasses used. These subclasses are represented by the elements described in 6.3.1 to 6.3.5, coupled with the appropriate attributes to qualify the content, in particular @xml:lang, @script and @notation.

Table 12 — FormRepresentation class

LMF class                LBX construct
/FormRepresentation/     <FormRep>
                         <StemRep>
                         <PartRep>
                         <RelFormRep>

<FormRep> is contained by the <Lemma> (see 5.7.2), <WordForm> and <RelForm> elements; <StemRep>, <PartRep> and <RelFormRep> are restricted to specific elements (see 6.3, 6.4 and 6.5).

When the FormRepresentation class is self-sufficient and needs no further qualification, equivalents of TEI elements, such as <Orth>, <Pron>, <Hyph>, <Stress> and <Syll>, can potentially be used for further simplification. In these cases, the descriptions of the qualifying attributes (for example @xml:lang) should be included in <GlobalInformation> or <LexiconInformation>, as appropriate.
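As a rough illustration of the simplification described above, the following Python sketch rewrites general <FormRep> elements as <Orth> or <Pron> according to their @notation value. The mapping table NOTATION_TO_ELEMENT and the function name simplify_form_reps are assumptions made for this sketch; LBX does not prescribe them, and a real resource would record the dropped qualifiers in <GlobalInformation> or <LexiconInformation>.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from @notation values to simplified element names.
NOTATION_TO_ELEMENT = {"French": "Orth", "ipa": "Pron"}


def simplify_form_reps(entry):
    """Rename each mapped <FormRep> in place and drop its qualifying attributes.

    Elements whose @notation is not in the mapping are left unchanged.
    """
    # Materialize the matches first so renaming cannot disturb the traversal.
    for rep in list(entry.iter("FormRep")):
        target = NOTATION_TO_ELEMENT.get(rep.get("notation"))
        if target is None:
            continue
        rep.tag = target
        rep.attrib.clear()
    return entry


entry = ET.fromstring(
    "<Entry><Lemma>"
    '<FormRep xml:lang="fr" notation="French">pacifier</FormRep>'
    '<FormRep xml:lang="fr" notation="ipa">pasifje</FormRep>'
    "</Lemma><Sense/></Entry>"
)

simplified = ET.tostring(simplify_form_reps(entry), encoding="unicode")
print(simplified)
```

Dropping @xml:lang and @notation is only safe under the condition stated above, namely that every representation in the lexicon shares the same qualifiers and that these are declared once at the resource or lexicon level.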
The following example shows the <Orth> and <Pron> elements for a lemma in a monolingual French dictionary that is part of a collection of monolingual French lexicons. With reference to ISO 24613-4:2021, 5.2, there is no requirement to include the language and script codes in the <GlobalInformation> element (although LBX allows these codes to be included). <GlobalInformation> can also be used to implement a definition that assigns the @notation value "ipa" to the <Pron> element. The following example applies this principle using a revised version of the example in 5.8.

EXAMPLE
<Entry>
  <Lemma>
    <GramFeats>
      <POS>verb</POS>
      <Subcat>transitive</Subcat>
    </GramFeats>
    <Orth>pacifier</Orth>
    <Pron>pasifje</Pron>
  </Lemma>
  <Sense/>
</Entry>

6.3 Implementation of the Form subclasses

6.3.1 General principles

The Form subclasses described in ISO 24613-2 shall be serialized by means of the elements described in ...


Buy Documents

ISO 24613-5:2023

ISO 24613-5:2022 - Language resource management — Lexical markup framework (LMF) — Part 5: Lexical base exchange (LBX) serialization Released:1/19/2022
