SIST ISO 24613-2:2021
Language resource management -- Lexical markup framework (LMF) -- Part 2: Machine Readable Dictionary (MRD) model
This document describes the machine-readable dictionary (MRD) model, a metamodel for representing data stored in a variety of electronic dictionary subtypes, ranging from direct support for human translators to support for machine processing.
Gestion de ressources linguistiques -- Cadre de balisage lexical -- Partie 2: Titre manque
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 2. del: Model za strojno berljiv slovar (MRD)
Overview
SIST ISO 24613-2:2021 - part of the Lexical Markup Framework (LMF) - defines the Machine-Readable Dictionary (MRD) model, an implementation‑independent metamodel for representing electronic dictionary data. Building on the LMF core model (ISO 24613‑1), this part specifies UML classes, associations, data‑category selection and cross‑reference (CrossREF) principles to support a wide range of lexicon designs - from human‑oriented bilingual dictionaries to rigorously constrained lexica for machine processing.
Key topics and technical requirements
- Metamodel and class model: MRD is represented as UML classes and associations. Core classes reused from the LMF core include LexicalResource, Lexicon, LexicalEntry, Lemma, Form, Sense and Definition; new MRD classes include WordForm, Stem, WordPart, RelatedForm, Translation, Example, FormRepresentation, TextRepresentation, SubjectField, Bibliography.
- Class selection and multiplicity: Designers choose a subset of classes and set association multiplicities to match design goals (e.g., monolingual vs. bilingual MRDs). Optional classes have minimum cardinality zero.
- Generalization (typing): The model supports subclassing (e.g., Form → Lemma, WordForm, Stem, WordPart) to allocate specific data categories and constraints to subclasses.
- CrossREF model: Cross‑references link Forms, Senses and Translations, enabling rich interconnections across lexical entries (a capability emphasized in LMF Part 1).
- Data category selection: Annex A provides examples of data categories; developers may reuse or define domain‑specific data categories for features such as morphology, part of speech, usage notes.
- Object realization and serialization: The standard is implementation‑independent but discusses realization choices; commonly used serializations include XML and JSON, with tradeoffs in element/attribute modeling.
- Supporting morphology and MWEs: The MRD model supports extensional morphological descriptions and flexible modeling of multiword expressions (MWE) using CrossREFs and typed classes.
Practical applications and users
Who benefits:
- NLP engineers and computational linguists building lexicons for parsing, tagging, or machine translation.
- Machine translation and localization teams creating bilingual or multilingual MRDs.
- Lexicographers and dictionary publishers designing structured, interoperable electronic dictionaries.
- Ontology and terminology engineers linking lexical forms to semantic resources.
Practical uses:
- Designing interoperable lexicons for NLP pipelines.
- Creating bilingual dictionaries with explicit translation mappings.
- Modeling full morphological paradigms for language resources.
- Serializing lexicons in XML/JSON for dissemination and tool integration.
Related standards
- ISO 24613‑1 (LMF core model) - normative foundation for ISO 24613‑2.
- Other parts of the ISO 24613 series provide complementary LMF extensions and guidance for specific lexical resource types.
Keywords: SIST ISO 24613-2:2021, Lexical Markup Framework, LMF, Machine‑Readable Dictionary, MRD model, lexicon, CrossREF, UML, XML, JSON, morphology, bilingual dictionary, NLP, machine translation.
Frequently Asked Questions
SIST ISO 24613-2:2021 is a standard published by the Slovenian Institute for Standardization (SIST). Its full title is "Language resource management -- Lexical markup framework (LMF) -- Part 2: Machine Readable Dictionary (MRD) model". It describes the machine-readable dictionary (MRD) model, a metamodel for representing data stored in a variety of electronic dictionary subtypes, ranging from direct support for human translators to support for machine processing.
SIST ISO 24613-2:2021 is classified under the following ICS (International Classification for Standards) categories: 01.020 - Terminology (principles and coordination); 01.140.20 - Information sciences; 35.240.30 - IT applications in information, documentation and publishing. The ICS classification helps identify the subject area and facilitates finding related standards.
SIST ISO 24613-2:2021 has the following relationships with other standards: it has an inter-standard link to SIST ISO 24613:2013. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
SLOVENIAN STANDARD
1 March 2021
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 2. del: Model za strojno berljiv slovar (MRD)
Language resource management -- Lexical markup framework (LMF) -- Part 2: Machine Readable Dictionary (MRD) model
Gestion de ressources linguistiques -- Cadre de balisage lexical -- Partie 2: Titre manque
This Slovenian standard is identical to: ISO 24613-2:2020
ICS:
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.
INTERNATIONAL STANDARD ISO 24613-2
First edition
2020-07
Language resource management —
Lexical markup framework (LMF) —
Part 2:
Machine-readable dictionary (MRD)
model
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 2: Modèle de dictionnaire lisible par ordinateur (MRD)
© ISO 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2020 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Key standards used by LMF
5 The machine-readable dictionary (MRD) model
5.1 General
5.2 MRD class model
5.2.1 Set of classes
5.2.2 Class selection and multiplicity
5.2.3 Generalization
5.2.4 Object realization
5.3 Data category selection and class population
5.4 CrossREF allocation
5.5 Form subclasses
5.5.1 WordForm class
5.5.2 Lemma class
5.5.3 Stem class
5.5.4 WordPart class
5.5.5 RelatedForm class
5.6 FormRepresentation class
5.7 TextRepresentation class
5.8 Translation class
5.9 Example class
5.10 SubjectField class
5.11 Bibliography class
5.12 Multiword Expression (MWE) Analysis
Annex A (informative) Data category examples
Annex B (informative) Machine-readable dictionary examples
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-2, together with ISO 24613-1:2019, ISO 24613-3¹⁾, ISO 24613-4¹⁾,
ISO 24613-5¹⁾, ISO 24613-6²⁾ and ISO 24613-7²⁾, cancels and replaces ISO 24613:2008, which has been
divided into several parts and technically revised.
The main changes compared to the previous edition are as follows.
This edition merges two normative annexes from the previous edition, Annex A, Morphology extension,
and Annex C, Machine-readable dictionary extension, providing a more cohesive description of the
key structures (classes and associations) found in that edition. The cross-reference (CrossREF) model
introduced in Part 1, Core model, of this edition, provides a new capability for correlating lexical
features across different form and sense classes. In addition, the CrossREF model has replaced the
ListOfComponents and Component classes, enabling a more extensible and flexible capability for
managing multiword expressions. The metamodel of generalization by typing introduced in Part 1
provides a more rigorous and unambiguous framework for applying LMF modelling mechanisms in
ways that enable greater editorial freedom and support the comparison of different LMF conformant
designs. This edition has kept most of the informative examples found in the previous edition (deleting
only a few redundant examples) and has added new examples to illustrate new modelling features.
There have been some class name changes (e.g. OrthographicRepresentation for Representation and
Translation for Equivalent), but no changes in the underlying concepts of the previously existing classes.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
1) Under preparation.
2) Planned.
Introduction
The ISO 24613 series is based upon the definition of an implementation-independent metamodel
combining a core model and additional models that cover the forms that onomasiological
(concept-oriented) and semasiological (form-oriented) lexical content can take.
It provides guidelines for various implementation use cases, and where appropriate describes LMF
compliant serializations that fit various application contexts.
This document extends ISO 24613-1, the LMF core model, through the use of the processes and
mechanisms described in ISO 24613-1. The objective is to enable flexible design methods to support
the development of machine-readable dictionaries for different purposes while enabling cross-
comparisons of different designs and a basis for developing assessments of standards conformance.
The scope of supported design goals ranges from simple to complex human-oriented MRDs, both
monolingual and bilingual, lexicons that support conceptual-lexical systems through links with
ontological resources, rigorously constrained lexicons for supporting machine processes, and lexicons
that provide an extensional description of the morphology of lexical entries. Since this document is
based on ISO 24613-1, the LMF core model, it is designed to interchange data with other parts of the
ISO 24613 series where applicable.
INTERNATIONAL STANDARD ISO 24613-2:2020(E)
Language resource management — Lexical markup
framework (LMF) —
Part 2:
Machine-readable dictionary (MRD) model
IMPORTANT — The electronic file of this document contains colours which are considered to be
useful for the correct understanding of the document. Users should therefore consider printing
this document using a colour printer.
1 Scope
This document describes the machine-readable dictionary (MRD) model, a metamodel for representing
data stored in a variety of electronic dictionary subtypes, ranging from direct support for human
translators to support for machine processing.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
- ISO Online browsing platform: available at https://www.iso.org/obp
- IEC Electropedia: available at http://www.electropedia.org/
4 Key standards used by LMF
The key standards applicable to this document are described in ISO 24613-1, the LMF core model.
5 The machine-readable dictionary (MRD) model
5.1 General
The MRD model is represented by UML classes, associations among the classes (the structure), sets
of data categories (attribute-value pairs), and links (cross-references). Subclauses 5.2 through 5.12
describe each of these features, their interdependencies, and their implementation.
Figure 1 — MRD class model
5.2 MRD class model
5.2.1 Set of classes
The classes defined in ISO 24613-1, the LMF core model, that are used in the MRD extension include
LexicalResource, GlobalInformation, Lexicon, LexiconInformation, GrammaticalInformation,
LexicalEntry, Lemma, Form, Sense, Definition, OrthographicRepresentation, and principles for
applying the CrossREF class. These classes, together with the associations and constraints described
in ISO 24613-1, are applicable to the design of MRD. New classes introduced in this document
include WordForm, Stem, WordPart, RelatedForm, Translation, Example, FormRepresentation,
TextRepresentation, Bibliography and SubjectField.
5.2.2 Class selection and multiplicity
The sets of classes shown in the model in Figure 1 can support a wide range of design objectives. A
specific design objective can require all or only some of the classes shown in the above model and
can require as well the creation of new subclasses. The recommended first step in the creation of a
model for a specific design objective (e.g. a bilingual dictionary) should be the selection and possible
exclusion of classes contained in the class model and the application of desired multiplicities to the
class associations as required by the model and the design goals (the optional classes in the model
have a minimum cardinality of zero). The developer can create new subclasses, as needed, using the
mechanisms described in ISO 24613-1, the LMF core model. The selected classes and their associations
provide the structure and nodes (classes) appropriate for the intended lexical design. The classes and
subclasses are described in detail below (see 5.5 to 5.11).
EXAMPLE
— Certain classes of MRD, such as monolingual and bilingual dictionaries, generally require a Sense class
instantiation.
— Certain classes of MRD, such as concept hierarchies, do not necessarily require a Form class instantiation.
— Certain classes of MRD, such as orthographic dictionaries and extensional morphologies do not necessarily
require a Sense class instantiation.
— Certain classes of MRD, such as extensional morphologies, can provide constraints on the attributes managed
by the RelatedForm class.
NOTE The purpose of the MRD morphology extension is to provide the mechanisms to support the
development of lexicons that have an extensional description of the morphology of lexical entries in which all
relevant inflections or derivations of a lemma are included.
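The selection step described in 5.2.2 can be sketched in Python as a simple set comparison. This is only an illustration under stated assumptions: the class names come from 5.2.1, but the two profiles, the function name `review_selection`, and the extra subclass `ArabicRoot` are hypothetical and are not defined by the standard.

```python
# Class names as listed in 5.2.1; everything else here is an illustrative assumption.
CORE_CLASSES = {
    "LexicalResource", "GlobalInformation", "Lexicon", "LexiconInformation",
    "GrammaticalInformation", "LexicalEntry", "Lemma", "Form", "Sense",
    "Definition", "OrthographicRepresentation", "CrossREF",
}
MRD_CLASSES = {
    "WordForm", "Stem", "WordPart", "RelatedForm", "Translation", "Example",
    "FormRepresentation", "TextRepresentation", "Bibliography", "SubjectField",
}
MODEL_CLASSES = CORE_CLASSES | MRD_CLASSES


def review_selection(selected):
    """Split a design's class selection into excluded model classes and new subclasses."""
    return {
        "excluded": MODEL_CLASSES - selected,        # model classes the design leaves out
        "new_subclasses": selected - MODEL_CLASSES,  # classes the designer must define
    }


# Hypothetical profile for a bilingual dictionary (needs Sense and Translation).
bilingual = {
    "LexicalResource", "Lexicon", "LexicalEntry", "Lemma",
    "Sense", "Definition", "Translation", "Example",
}
# Hypothetical profile for an extensional morphology (no Sense; one new subclass).
morphology = {
    "LexicalResource", "Lexicon", "LexicalEntry", "Lemma",
    "WordForm", "Stem", "WordPart", "ArabicRoot",
}

for name, profile in (("bilingual", bilingual), ("morphology", morphology)):
    report = review_selection(profile)
    print(name, "excludes", sorted(report["excluded"]))
    print(name, "adds", sorted(report["new_subclasses"]))
```

Any class reported under `new_subclasses` would still have to be defined with the subclassing mechanisms of ISO 24613-1.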
5.2.3 Generalization
Figure 1 illustrates the use of generalization (typing) through the Form class (superclass) and its
subclasses, Lemma, WordForm, Stem, and WordPart, and OrthographicRepresentation (superclass)
and its subclasses, FormRepresentation and TextRepresentation. The typing mechanism describes
how to allocate specific sets of data categories, associations, multiplicities, and cross-references to
subclasses (e.g. Lemma) in order to redefine the superclass. ISO 24613-1 provides a more complete
description of typing.
NOTE The subclasses shown in Figure 1 are available for use in LMF compliant designs, but are not
exhaustive, since LMF allows the creation of additional subclasses. The lexicon designer specifies what sets of
features are available in form features.
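One way to picture generalization by typing is a small Python class hierarchy in which each Form subclass is allocated its own set of data categories. The sketch below is an assumption-laden illustration: the data category names (`writtenForm`, `partOfSpeech`, and so on) and the helper `unallocated` are hypothetical, and the actual allocations are a design decision guided by Annex A.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """Superclass; data categories are held as attribute-value pairs."""
    data_categories: dict = field(default_factory=dict)


@dataclass
class Lemma(Form):
    pass


@dataclass
class WordForm(Form):
    pass


@dataclass
class Stem(Form):
    pass


@dataclass
class WordPart(Form):
    pass


# Hypothetical allocation of data categories per subclass (not taken from the standard).
ALLOWED = {
    Lemma: {"writtenForm", "partOfSpeech"},
    WordForm: {"writtenForm", "grammaticalNumber", "grammaticalTense"},
    Stem: {"writtenForm", "type"},
    WordPart: {"writtenForm", "type"},
}


def unallocated(form):
    """Return the data categories used on a form that its subclass does not allow."""
    return set(form.data_categories) - ALLOWED[type(form)]


wf = WordForm({"writtenForm": "dogs", "grammaticalNumber": "plural"})
bad = Lemma({"writtenForm": "dog", "grammaticalTense": "past"})
print(unallocated(wf))   # empty set: every category is allocated to WordForm
print(unallocated(bad))  # grammaticalTense is not allocated to Lemma in this sketch
```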
5.2.4 Object realization
LMF provides examples of object models (see Annex B), but does not provide an in-depth description
of the overall methodologies for developing the object models, since those processes are heavily
dependent on the choice of model serialization (e.g. XML, JSON). Different serializations can require
different design approaches and impose limitations on how the object can be modelled.
EXAMPLE XML provides a number of structural models for implementing XML schemas. Within the
framework of these models, a lexicon designer could implement UML classes as XML elements or a combination of
an XML element and attributes. For example, a designer could instantiate the Lemma class as a element
or a element-attribute combination. These object modelling choices use selective class
and data category allocations to implement object designs that are strongly dependent on the structures and
methods of the chosen serialization.
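The two realization styles mentioned in the example (a dedicated element versus an element-attribute combination) can be illustrated with Python's standard `xml.etree.ElementTree` module. The element and attribute names used here (`LexicalEntry`, `Lemma`, `Form`, `type="lemma"`) are assumptions chosen for illustration, not names prescribed by the standard, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def lemma_as_element(written_form):
    """Realize the Lemma class as its own XML element (assumed element names)."""
    entry = ET.Element("LexicalEntry")
    lemma = ET.SubElement(entry, "Lemma")
    lemma.text = written_form
    return entry


def lemma_as_typed_form(written_form):
    """Realize the Lemma class as a generic Form element typed by an attribute."""
    entry = ET.Element("LexicalEntry")
    form = ET.SubElement(entry, "Form", {"type": "lemma"})
    form.text = written_form
    return entry


print(ET.tostring(lemma_as_element("dog"), encoding="unicode"))
# <LexicalEntry><Lemma>dog</Lemma></LexicalEntry>
print(ET.tostring(lemma_as_typed_form("dog"), encoding="unicode"))
# <LexicalEntry><Form type="lemma">dog</Form></LexicalEntry>
```

Both documents carry the same information. The first makes the subclass visible in the element name, while the second keeps a single element and moves the typing into a data category.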
5.3 Data category selection and class population
Data category selection can include all or a subset of data categories used by a given domain. Examples
of data categories and their allocations are listed in Annex A. Where needed, the lexicon developer can
create new data categories that are not listed in the annex.
5.4 CrossREF allocation
Figure 1 shows links (cross-references) between the Form and Sense and the Form and Translation
classes. The principles for modelling cross-references are described in ISO 24613-1, the LMF core model.
The CrossREF class is specifically allowed for the LexicalEntry class, the Lemma class, the WordForm
class, the WordPart class, the Sense class, and the Sense class children. The lexicon designer should
consider using cross-references with the RelatedForm class. The use of data categories to provide
information about the CrossREF features (e.g. internal reference, external reference, type of ID, lexical
type, syntactic type, or semantic type) is a best practice.
EXAMPLE A WordPart that contains the suffix component of a Lemma can be cross-referenced with the
LexicalEntry that contains that suffix as the Lemma, or a Sense can be cross-referenced with a broader Sense
contained in a different LexicalEntry, or an authentic Quote can be cross-referenced with a document that
contains the Quote.
NOTE The range of data categories describing CrossREF features is potentially quite broad and could be
used to support references to audio, video, and other types of metadata relevant for lexical resources.
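The suffix case from the example above can be sketched as a minimal Python object model in which a WordPart carries a cross-reference that resolves to another LexicalEntry. All field names, identifiers (`le-ness`, `le-kindness`) and data category names (`referenceType`, `lexicalType`) are hypothetical. ISO 24613-1 defines the actual CrossREF principles.

```python
from dataclasses import dataclass, field


@dataclass
class CrossRef:
    target_id: str                                        # ID of the referenced object
    data_categories: dict = field(default_factory=dict)   # features describing the link


@dataclass
class WordPart:
    written_form: str
    cross_refs: list = field(default_factory=list)


@dataclass
class LexicalEntry:
    entry_id: str
    lemma: str
    word_parts: list = field(default_factory=list)


def resolve(ref, lexicon):
    """Look up the target of an internal cross-reference; None if it is not in this lexicon."""
    return lexicon.get(ref.target_id)


# The entry whose Lemma is the suffix itself.
suffix_entry = LexicalEntry("le-ness", "-ness")
# An entry whose WordPart points at the suffix entry.
kindness = LexicalEntry(
    "le-kindness",
    "kindness",
    [WordPart("ness", [CrossRef("le-ness", {"referenceType": "internal",
                                            "lexicalType": "suffix"})])],
)
lexicon = {entry.entry_id: entry for entry in (suffix_entry, kindness)}

target = resolve(kindness.word_parts[0].cross_refs[0], lexicon)
print(target.lemma)  # -ness
```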
5.5 Form subclasses
5.5.1 WordForm class
WordForm is a Form subclass containing a word form, such as an inflected form, that a lexeme can take
when used in a sentence or a phrase. The WordForm class is in a zero-to-many aggregate association
with the LexicalEntry class (inheriting the Form multiplicity). The WordForm class can manage simple
lexemes, compounds, multi-word expressions, and sub-lexemes such as affixes and roots.
5.5.2 Lemma class
Lemma is a Form subclass representing a lexeme or sub-lexeme used to designate the LexicalEntry
(part of the Form-Sense paradigm). The Lemma class is in a zero-to-one aggregate association with the
LexicalEntry class that overrides the multiplicity inherited from the Form class (see ISO 24613-1 for a
more complete description of the Lemma).
5.5.3 Stem class
Stem is a Form subclass containing a stem or root. The Stem class can be typed as a specific type of
stem or root (e.g. type="arabicRoot"). The Stem class is in a zero-to-one aggregate association with the
LexicalEntry class (overriding the multiplicity inherited from the Form class).
5.5.4 WordPart class
WordPart is a Form subclass representing sub-lexeme parts other than the stem or root (e.g. affix, prefix,
suffix). The WordPart class is in a zero-to-many aggregate association with the LexicalEntry class.
5.5.5 RelatedForm class
RelatedForm is a Form subclass containing a word form or a morph that is typical of run-on entries
in print dictionaries. The RelatedForm has a different Sense than the Lemma and can be considered a
candidate for eventual inclusion in a different LexicalEntry object when realized in a lexical database.
The RelatedForm can be related to the Lemma in a variety of ways (e.g. synonym, cross-reference,
multi-word expression, idiom). The RelatedForm class is in a zero-to-many aggregate association with
the LexicalEntry class and can contain a recursive cross-reference to the LexicalEntry class, which
would be realized as a link to a different LexicalEntry object when instantiated in a lexical database.
The RelatedForm class can be typed (generalization) using data categories.
EXAMPLE A developer might want to use the RelatedForm class for a multi-word expression (e.g. United
States) that contains a component form of a Lemma (e.g. united). The design goal could be to preserve the format
of the original source material, or to provide immediate user support while developing an improved lexicon that
includes /united/ and /United States/ as separate entries.
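The multiplicities stated in 5.5.1 to 5.5.5 (zero-to-one for Lemma and Stem, zero-to-many for WordForm, WordPart and RelatedForm) lend themselves to a mechanical check. The following Python sketch is one possible way to do that. The function name, the rule table layout, and the stricter `strict` profile are assumptions made for illustration.

```python
from collections import Counter

# (minimum, maximum) per Form subclass within one LexicalEntry; None means unbounded.
# Values follow the associations described in 5.5.1 to 5.5.5.
MULTIPLICITY = {
    "Lemma": (0, 1),
    "Stem": (0, 1),
    "WordForm": (0, None),
    "WordPart": (0, None),
    "RelatedForm": (0, None),
}


def multiplicity_violations(form_kinds, rules=MULTIPLICITY):
    """Return messages for every Form subclass whose count breaks the given rules."""
    counts = Counter(form_kinds)
    problems = []
    for kind, (low, high) in rules.items():
        n = counts.get(kind, 0)
        if n < low or (high is not None and n > high):
            upper = "*" if high is None else high
            problems.append(f"{kind}: {n} instances, allowed {low}..{upper}")
    for kind in sorted(set(counts) - set(rules)):
        problems.append(f"{kind}: not covered by the rules")
    return problems


print(multiplicity_violations(["Lemma", "WordForm", "WordForm"]))  # []
print(multiplicity_violations(["Lemma", "Lemma"]))

# A hypothetical design that tightens the model: exactly one Lemma per entry.
strict = {**MULTIPLICITY, "Lemma": (1, 1)}
print(multiplicity_violations(["WordForm"], strict))
```

Passing a different `rules` table mirrors the design step in which a developer applies the desired multiplicities to the selected classes.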
5.6 FormRepresentation class
FormRepresentation is an OrthographicRepresentation subclass that contains the text literals and
metadata (e.g. pronunciation, hyphenation, xml:lang, script) for a Lemma, WordForm, or other subclass
of the Form class. The FormRepresentation class is in a one-to-many aggregate association with a Form
subclass. The FormRepresentation class allows subclasses (typing).
NOTE Data categories, such as xml:lang, script, and notation, are associated with the
OrthographicRepresentation class and inherited by subclasses.
EXAMPLE Because
...
SLOVENSKI STANDARD
01-marec-2021
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 2. del:
Model za strojno berljiv slovar (MRD)
Language resource management -- Lexical markup framework (LMF) -- Part 2: Machine
Readable Dictionary (MRD) model
Gestion de ressources linguistiques -- Cadre de balisage lexical -- Partie 2: Titre manque
Ta slovenski standard je istoveten z: ISO 24613-2:2020
ICS:
01.020 Terminologija (načela in Terminology (principles and
koordinacija) coordination)
01.140.20 Informacijske vede Information sciences
35.240.30 Uporabniške rešitve IT v IT applications in information,
informatiki, dokumentiranju in documentation and
založništvu publishing
2003-01.Slovenski inštitut za standardizacijo. Razmnoževanje celote ali delov tega standarda ni dovoljeno.
INTERNATIONAL ISO
STANDARD 24613-2
First edition
2020-07
Language resource management —
Lexical markup framework (LMF) —
Part 2:
Machine-readable dictionary (MRD)
model
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 2: Modèle de dictionnaire lisible par ordinateur (MRD)
Reference number
©
ISO 2020
© ISO 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2020 – All rights reserved
Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Key standards used by LMF . 1
5 The machine-readable dictionary (MRD) model . 1
5.1 General . 1
5.2 MRD class model . 2
5.2.1 Set of classes . 2
5.2.2 Class selection and multiplicity . 2
5.2.3 Generalization . 3
5.2.4 Object realization . 3
5.3 Data category selection and class population . 3
5.4 CrossREF allocation . 3
5.5 Form subclasses . 4
5.5.1 WordForm class . 4
5.5.2 Lemma class . 4
5.5.3 Stem class . 4
5.5.4 WordPart class . 4
5.5.5 RelatedForm class . 4
5.6 FormRepresentation class . 4
5.7 TextRepresentation class . 5
5.8 Translation class . 5
5.9 Example class . 5
5.10 SubjectField class . 5
5.11 Bibliography class . 5
5.12 Multiword Expression (MWE) Analysis . 6
Annex A (informative) Data category examples . 7
Annex B (informative) Machine-readable dictionary examples . 9
Bibliography .21
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/ directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www .iso .org/ patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www .iso .org/
iso/ foreword .html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
1) 1)
This first edition of ISO 24613-2, together with ISO 24613-1:2019, ISO 24613-3 , ISO 24613-4 ,
1) 2) 2)
ISO 24613-5 , ISO 24613-6 and ISO 24613-7 , cancels and replaces ISO 24613:2008, which has been
divided into several parts and technically revised.
The main changes compared to the previous edition are as follows.
This edition merges two normative annexes from the previous edition, Annex A, Morphology extension,
and Annex C, Machine-readable dictionary extension, providing a more cohesive description of the
key structures (classes and associations) found in that edition. The cross-reference (CrossREF) model
introduced in Part 1, Core model, of this edition, provides a new capability for correlating lexical
features across different form and sense classes. In addition, the CrossREF model has replaced the
ListOfComponents and Component classes, enabling a more extensible and flexible capability for
managing multiword expressions. The metamodel of generalization by typing introduced in Part 1
provides a more rigorous and unambiguous framework for applying LMF modelling mechanisms in
ways that enable greater editorial freedom and support the comparison of different LMF conformant
designs. This edition has kept most of the informative examples found in the previous edition (deleting
only a few redundant examples) and has added new examples to illustrate new modelling features.
There have been some class name changes (e.g. OrthographicRepresentation for Representation and
Translation for Equivalent), but no changes in the underlying concepts of the previously existing classes.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www .iso .org/ members .html.
1) Under preparation.
2) Planned.
iv © ISO 2020 – All rights reserved
Introduction
The ISO 24613 series is based upon the definition of an implementation-independent metamodel
combining a core model and additional models that onomasiological (form-oriented) and semasiological
(concept-oriented) lexical content can take.
It provides guidelines for various implementation use cases, and where appropriate describes LMF
compliant serializations that fit various application contexts.
This document extends ISO 24613-1, the LMF core model, through the use of the processes and
mechanisms described in ISO 24613-1. The objective is to enable flexible design methods to support
the development of machine-readable dictionaries for different purposes while enabling cross-
comparisons of different designs and a basis for developing assessments of standards conformance.
The scope of supported design goals ranges from simple to complex human-oriented MRDs, both
monolingual and bilingual, lexicons that support conceptual-lexical systems through links with
ontological resources, rigorously constrained lexicons for supporting machine processes, and lexicons
that provide an extensional description of the morphology of lexical entries. Since this document is
based on ISO 24613-1, the LMF core model, it is designed to interchange data with other parts of the
ISO 24613 series where applicable.
INTERNATIONAL STANDARD ISO 24613-2:2020(E)
Language resource management — Lexical markup
framework (LMF) —
Part 2:
Machine-readable dictionary (MRD) model
IMPORTANT — The electronic file of this document contains colours which are considered to be
useful for the correct understanding of the document. Users should therefore consider printing
this document using a colour printer.
1 Scope
This document describes the machine-readable dictionary (MRD) model, a metamodel for representing
data stored in a variety of electronic dictionary subtypes, ranging from direct support for human
translators to support for machine processing.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
4 Key standards used by LMF
The key standards applicable to this document are described in ISO 24613-1, the LMF core model.
5 The machine-readable dictionary (MRD) model
5.1 General
The MRD model is represented by UML classes, associations among the classes (the structure), sets
of data categories (attribute-value pairs), and links (cross-references). Subclauses 5.2 through 5.12
describe each of these features, their interdependencies, and their implementation.
Figure 1 — MRD class model
5.2 MRD class model
5.2.1 Set of classes
The classes defined in ISO 24613-1, the LMF core model, that are used in the MRD extension include
LexicalResource, GlobalInformation, Lexicon, LexiconInformation, GrammaticalInformation,
LexicalEntry, Lemma, Form, Sense, Definition, OrthographicRepresentation, and principles for
applying the CrossREF class. These classes, together with the associations and constraints described
in ISO 24613-1, are applicable to the design of MRD. New classes introduced in this document
include WordForm, Stem, WordPart, RelatedForm, Translation, Example, FormRepresentation,
TextRepresentation, Bibliography and SubjectField.
5.2.2 Class selection and multiplicity
The sets of classes shown in the model in Figure 1 can support a wide range of design objectives. A
specific design objective can require all or only some of the classes shown in Figure 1 and
can also require the creation of new subclasses. The recommended first step in the creation of a
model for a specific design objective (e.g. a bilingual dictionary) is the selection and possible
exclusion of classes contained in the class model and the application of desired multiplicities to the
class associations as required by the model and the design goals (the optional classes in the model
have a minimum cardinality of zero). The developer can create new subclasses, as needed, using the
mechanisms described in ISO 24613-1, the LMF core model. The selected classes and their associations
provide the structure and nodes (classes) appropriate for the intended lexical design. The classes and
subclasses are described in detail below (see 5.5 to 5.11).
EXAMPLE
— Certain classes of MRD, such as monolingual and bilingual dictionaries, generally require a Sense class
instantiation.
— Certain classes of MRD, such as concept hierarchies, do not necessarily require a Form class instantiation.
— Certain classes of MRD, such as orthographic dictionaries and extensional morphologies, do not necessarily
require a Sense class instantiation.
— Certain classes of MRD, such as extensional morphologies, can provide constraints on the attributes managed
by the RelatedForm class.
NOTE The purpose of the MRD morphology extension is to provide the mechanisms to support the
development of lexicons that have an extensional description of the morphology of lexical entries in which all
relevant inflections or derivations of a lemma are included.
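The selection step described in 5.2.2 can be pictured as a small design profile: a chosen subset of classes, each with a multiplicity. The following Python sketch is only an illustration under stated assumptions. The class names are taken from the MRD class model, but the Multiplicity type, the make_profile and check_entry helpers, the BILINGUAL profile, and the simplification of counting every class per lexical entry are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Class names from the MRD class model; everything else here is hypothetical.
MRD_CLASSES = frozenset({
    "LexicalEntry", "Lemma", "WordForm", "Stem", "WordPart", "RelatedForm",
    "Sense", "Definition", "Translation", "Example", "SubjectField",
    "Bibliography",
})


@dataclass(frozen=True)
class Multiplicity:
    minimum: int
    maximum: Optional[int]  # None stands for "many"

    def allows(self, count: int) -> bool:
        return count >= self.minimum and (
            self.maximum is None or count <= self.maximum
        )


def make_profile(selected: Dict[str, Multiplicity]) -> Dict[str, Multiplicity]:
    """Select classes for a design; classes left out are excluded from it."""
    unknown = set(selected) - MRD_CLASSES
    if unknown:
        raise ValueError("not in the class model: " + ", ".join(sorted(unknown)))
    return dict(selected)


# Illustrative bilingual design: Sense and Translation are mandatory,
# Example stays optional (minimum 0), Stem and WordPart are not selected.
BILINGUAL = make_profile({
    "Lemma": Multiplicity(1, 1),
    "Sense": Multiplicity(1, None),
    "Translation": Multiplicity(1, None),
    "Example": Multiplicity(0, None),
})


def check_entry(profile: Dict[str, Multiplicity], counts: Dict[str, int]) -> List[str]:
    """Compare per-entry instance counts with the profile's multiplicities."""
    problems = []
    for name in sorted(counts):
        if name not in profile:
            problems.append(name + ": class excluded from this design")
    for name, mult in profile.items():
        count = counts.get(name, 0)
        if not mult.allows(count):
            problems.append(f"{name}: {count} instance(s) violates the multiplicity")
    return problems
```

A design for an orthographic dictionary would use the same mechanism but simply leave Sense out of the selected classes.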
5.2.3 Generalization
Figure 1 illustrates the use of generalization (typing) through the Form class (superclass) and its
subclasses, Lemma, WordForm, Stem, and WordPart, and OrthographicRepresentation (superclass)
and its subclasses, FormRepresentation and TextRepresentation. The typing mechanism describes
how to allocate specific sets of data categories, associations, multiplicities, and cross-references to
subclasses (e.g. Lemma) in order to redefine the superclass. ISO 24613-1 provides a more complete
description of typing.
NOTE The subclasses shown in Figure 1 are available for use in LMF compliant designs, but are not
exhaustive, since LMF allows the creation of additional subclasses. The lexicon designer specifies what sets of
features are available in form features.
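Generalization by typing can be approximated in an object-oriented language by letting each subclass narrow the data categories it accepts. The sketch below is a loose Python analogy, not the mechanism defined in ISO 24613-1. The subclass names come from Figure 1, while the data category names (partOfSpeech, grammaticalNumber, grammaticalTense, type) and the allowed_categories attribute are assumptions chosen for illustration.

```python
class Form:
    """Superclass: accepts the union of the illustrative data categories."""

    allowed_categories = frozenset(
        {"partOfSpeech", "grammaticalNumber", "grammaticalTense", "type"}
    )

    def __init__(self, **categories: str) -> None:
        unknown = set(categories) - self.allowed_categories
        if unknown:
            raise ValueError(
                f"{type(self).__name__} does not allow: {sorted(unknown)}"
            )
        self.categories = dict(categories)


class Lemma(Form):
    # The subclass redefines the superclass by narrowing its categories.
    allowed_categories = frozenset({"partOfSpeech"})


class WordForm(Form):
    allowed_categories = frozenset({"grammaticalNumber", "grammaticalTense"})


class Stem(Form):
    allowed_categories = frozenset({"type"})


class WordPart(Form):
    allowed_categories = frozenset({"type"})
```

With these definitions, Lemma(partOfSpeech="noun") is accepted, whereas Lemma(grammaticalNumber="plural") raises a ValueError because that category was allocated to WordForm only.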
5.2.4 Object realization
LMF provides examples of object models (see Annex B), but does not provide an in-depth description
of the overall methodologies for developing the object models, since those processes are heavily
dependent on the choice of model serialization (e.g. XML, JSON). Different serializations can require
different design approaches and impose limitations on how the object can be modelled.
EXAMPLE XML provides a number of structural models for implementing XML schemas. Within the
framework of these models, a lexicon designer could implement UML classes as XML elements or a combination of
an XML element and attributes. For example, a designer could instantiate the Lemma class as a dedicated element
or as an element-attribute combination. These object modelling choices use selective class
and data category allocations to implement object designs that are strongly dependent on the structures and
methods of the chosen serialization.
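The two XML realizations mentioned in the example can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names used here (Lemma, Form, type, writtenForm) are assumptions made for illustration and are not quoted from this document or from any LMF serialization.

```python
import xml.etree.ElementTree as ET


def lemma_as_element(written_form: str) -> ET.Element:
    """Option A: the UML class name becomes the element name."""
    element = ET.Element("Lemma")
    element.set("writtenForm", written_form)
    return element


def lemma_as_typed_form(written_form: str) -> ET.Element:
    """Option B: a generic element whose attribute carries the class type."""
    element = ET.Element("Form")
    element.set("type", "lemma")
    element.set("writtenForm", written_form)
    return element


if __name__ == "__main__":
    for build in (lemma_as_element, lemma_as_typed_form):
        print(ET.tostring(build("clergyman"), encoding="unicode"))
```

Option A keeps class names visible in queries and schemas, while option B keeps the element inventory small and moves the typing decision into attribute values. Which one fits better depends on the serialization and tooling chosen by the designer.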
5.3 Data category selection and class population
Data category selection can include all or a subset of data categories used by a given domain. Examples
of data categories and their allocations are listed in Annex A. Where needed, the lexicon developer can
create new data categories that are not listed in the annex.
5.4 CrossREF allocation
Figure 1 shows links (cross-references) between the Form and Sense and the Form and Translation
classes. The principles for modelling cross-references are described in ISO 24613-1, the LMF core model.
The CrossREF class is specifically allowed for the LexicalEntry class, the Lemma class, the WordForm
class, the WordPart class, the Sense class, and the Sense class children. The lexicon designer should
consider using cross-references with the RelatedForm class. The use of data categories to provide
information about the CrossREF features (e.g. internal reference, external reference, type of ID, lexical
type, syntactic type, or semantic type) is a best practice.
EXAMPLE A WordPart that contains the suffix component of a Lemma can be cross-referenced with the
LexicalEntry that contains that suffix as the Lemma, or a Sense can be cross-referenced with a broader Sense
contained in a different LexicalEntry, or an authentic Quote can be cross-referenced with a document that
contains the Quote.
NOTE The range of data categories describing CrossREF features is potentially quite broad and could be
used to support references to audio, video, and other types of metadata relevant for lexical resources.
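A cross-reference annotated with data categories could be modelled roughly as follows. This is a minimal Python sketch under stated assumptions: the CrossRef dataclass, the resolve helper, the category names referenceType and lexicalType, the identifier entry-ness, and the example.org URL are all hypothetical and only mirror the kinds of features listed in 5.4.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CrossRef:
    target: str  # identifier of an internal object or of an external document
    categories: Dict[str, str] = field(default_factory=dict)


def resolve(ref: CrossRef, index: Dict[str, dict]) -> Optional[dict]:
    """Return the internal object a reference points to.

    External references and unknown identifiers yield None.
    """
    if ref.categories.get("referenceType") != "internal":
        return None
    return index.get(ref.target)


# A toy lexicon index holding the entry whose Lemma is the suffix "-ness".
index = {"entry-ness": {"class": "LexicalEntry", "lemma": "-ness"}}

# A WordPart holding the suffix of "kindness" points at that entry.
suffix_ref = CrossRef(
    "entry-ness", {"referenceType": "internal", "lexicalType": "suffix"}
)

# A quotation points at a document outside the lexicon.
quote_ref = CrossRef(
    "https://example.org/source-text", {"referenceType": "external"}
)
```

Keeping the reference type as a data category lets an application decide whether to look the target up in its own index or hand it to an external resolver.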
5.5 Form subclasses
5.5.1 WordForm class
WordForm is a Form subclass containing a word form, such as an inflected form, that a lexeme can take
when used in a sentence or a phrase. The WordForm class is in a zero-to-many aggregate association
with the LexicalEntry class (inheriting the Form multiplicity). The WordForm class can manage simple
lexemes, compounds, multi-word expressions, and sub-lexemes such as affixes and roots.
5.5.2 Lemma class
Lemma is a Form subclass representing a lexeme or sub-lexeme used to designate the LexicalEntry
(part of the Form-Sense paradigm). The Lemma class is in a zero-to-one aggregate association with the
LexicalEntry class that overrides the multiplicity inherited from the Form class (see ISO 24613-1 for a
more complete description of the Lemma).
5.5.3 Stem class
Stem is a Form subclass containing a stem or root. The Stem class can be typed as a specific type of
stem or root (e.g. type="arabicRoot"). The Stem class is in a zero-to-one aggregate association with the
LexicalEntry class (overriding the multiplicity inherited from the Form class).
5.5.4 WordPart class
WordPart is a Form subclass representing sub-lexeme parts other than the stem or root (e.g. affix, prefix,
suffix). The WordPart class is in a zero-to-many aggregate association with the LexicalEntry class.
5.5.5 RelatedForm class
RelatedForm is a Form subclass containing a word form or a morph that is typical of run-on entries
in print dictionaries. The RelatedForm has a different Sense than the Lemma and can be considered a
candidate for eventual inclusion in a different LexicalEntry object when realized in a lexical database.
The RelatedForm can be related to the Lemma in a variety of ways (e.g. synonym, cross-reference,
multi-word expression, idiom). The RelatedForm class is in a zero-to-many aggregate association with
the LexicalEntry class and can contain a recursive cross-reference to the LexicalEntry class, which
would be realized as a link to a different LexicalEntry object when instantiated in a lexical database.
The RelatedForm class can be typed (generalization) using data categories.
EXAMPLE A developer might want to use the RelatedForm class for a multi-word expression (e.g. United
States) that contains a component form of a Lemma (e.g. united). The design goal could be to preserve the format
of the original source material, or to provide immediate user support while developing an improved lexicon that
includes /united/ and /United States/ as separate entries.
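The multiplicities stated in 5.5 (at most one Lemma and one Stem per LexicalEntry, any number of WordForm, WordPart, and RelatedForm instances) lend themselves to a simple check. The Python sketch below assumes a hypothetical representation of an entry as a list of (class name, written form) pairs. The helper name and the message format are illustrative only.

```python
from collections import Counter
from typing import List, Optional, Tuple, Dict

# Upper bounds per LexicalEntry as described in 5.5; None stands for "many".
MAX_PER_ENTRY: Dict[str, Optional[int]] = {
    "Lemma": 1,
    "Stem": 1,
    "WordForm": None,
    "WordPart": None,
    "RelatedForm": None,
}


def multiplicity_violations(forms: List[Tuple[str, str]]) -> List[str]:
    """Report Form subclasses that occur too often in one LexicalEntry."""
    counts = Counter(class_name for class_name, _ in forms)
    violations = []
    for class_name, count in counts.items():
        if class_name not in MAX_PER_ENTRY:
            violations.append(f"{class_name}: not a Form subclass of this sketch")
            continue
        limit = MAX_PER_ENTRY[class_name]
        if limit is not None and count > limit:
            violations.append(
                f"{class_name}: {count} instances, at most {limit} allowed"
            )
    return violations


# The entry for "united" carries the run-on form "United States".
entry = [
    ("Lemma", "united"),
    ("RelatedForm", "United States"),
]
```

Adding a second Lemma to the entry would be reported as a violation, whereas further RelatedForm instances would not.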
5.6 FormRepresentation class
FormRepresentation is an OrthographicRepresentation subclass that contains the text literals and
metadata (e.g. pronunciation, hyphenation, xml:lang, script) for a Lemma, WordForm, or other subclass
of the Form class. The FormRepresentation class is in a one-to-many aggregate association with a Form
subclass. The FormRepresentation class allows subclasses (typing).
NOTE Data categories, such as xml:lang, script, and notation, are associated with the
OrthographicRepresentation class and inherited by subclasses.
EXAMPLE Because searching for WordPart data (e.g. suffix components of a form) is generally not a high
user priority, a lexicon developer might want to create a PartRep subclass of the FormRepresentation class
in order to support application designs that use object (class) names as part of their query strategy. Creating
different search criteria for FormRepresentation objects and PartRep objects is one way to increase search and
display efficiency.
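A query strategy based on class names, as in the PartRep example, might look like the following Python sketch. The PartRep name comes from the example. The written_form attribute, the search function, and its class_names parameter are assumptions introduced for illustration.

```python
from typing import Iterable, List, Tuple


class FormRepresentation:
    def __init__(self, written_form: str) -> None:
        self.written_form = written_form


class PartRep(FormRepresentation):
    """Representation of a WordPart (e.g. a suffix), kept under its own name."""


def search(
    reps: Iterable[FormRepresentation],
    text: str,
    class_names: Tuple[str, ...] = ("FormRepresentation",),
) -> List[FormRepresentation]:
    """Match text only against objects whose class name was requested."""
    return [
        rep
        for rep in reps
        if type(rep).__name__ in class_names and text in rep.written_form
    ]


reps = [FormRepresentation("kindness"), PartRep("-ness")]

# Default search skips word parts; an explicit request includes them.
default_hits = [rep.written_form for rep in search(reps, "ness")]
all_hits = [
    rep.written_form
    for rep in search(reps, "ness", ("FormRepresentation", "PartRep"))
]
```

Because the default search never inspects PartRep objects, ordinary look-ups stay narrow, while a morphology-oriented tool can opt in by naming both classes.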
5.7 TextRepresentation class
TextRepresentation is an Orthographic
...
The SIST ISO 24613-2:2021 standard focuses on language resource management, specifying the lexical markup framework (LMF) and, more particularly, the machine-readable dictionary (MRD) model. The standard plays an essential role by providing a metamodel for representing data stored in various subtypes of electronic dictionaries. Its scope is therefore particularly broad, ranging from direct assistance for human translators to support for automatic processing by machines. One of the main strengths of this standard lies in its ability to standardize the representation of lexical information, which facilitates interoperability between different language resources and language processing tools. By offering a clear and structured model, the standard helps ensure that lexicographic data are accessible and usable in a consistent and efficient way, regardless of platform or application. In addition, SIST ISO 24613-2:2021 is particularly relevant in the current context, in which the use of advanced technologies such as artificial intelligence and natural language processing is growing rapidly. By standardizing the machine-readable dictionary model, it meets the growing needs of developers and researchers working on machine translation applications and other systems based on language resources. Finally, the standard helps promote best practices in the design and implementation of lexical databases, thereby strengthening the quality and reliability of the resources used in various fields, including education, computational linguistics and information technology.
In short, SIST ISO 24613-2:2021 represents a significant advance in the field of language resource management, with a direct impact on the efficiency of lexical processing.
The SIST ISO 24613-2:2021 standard offers a comprehensive approach to language resource management through its detailed outline of the Machine Readable Dictionary (MRD) model. This document serves as a crucial metamodel, effectively defining how data can be represented within various electronic dictionary subtypes. By addressing both human translator needs and machine processing requirements, the standard establishes a versatile framework for modern linguistic resources. One of the primary strengths of this standard lies in its clear delineation of the MRD model's architecture, which facilitates the seamless integration of diverse dictionary functionalities. It enhances interoperability between different systems, thereby promoting wider accessibility and usability of linguistic data. Furthermore, the focus on machine readability aligns with current trends in language technology, emphasizing the importance of automating language processing tasks for improved efficiency. Additionally, the standard's relevance is underscored by the increasing demand for effective language resource management in an era characterized by globalization and digital communication. By standardizing the way dictionaries are structured and accessed, SIST ISO 24613-2:2021 supports advancements in natural language processing and artificial intelligence applications. This ensures that the language resources are not only adequately managed but also optimally utilized across various platforms. Overall, the SIST ISO 24613-2:2021 standard is a vital contribution to the field of lexical markup frameworks, providing essential guidance for developers and researchers involved in creating and managing machine-readable dictionaries while fostering an environment of collaboration within the linguistic community.
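The SIST ISO 24613-2:2021 standard provides guidelines for the machine-readable dictionary (MRD) model and focuses on representing data stored in various subtypes of electronic dictionaries. As part of language resource management, the document lays an excellent foundation for improving language processing and making it more efficient. First, the scope of the standard is broad: it clearly defines the role the MRD model plays in supporting machine translation systems and human translators. A notable strength is that it is designed as a metamodel that accommodates diverse dictionary data formats and promotes interoperability between different dictionaries. This allows developers and researchers to use a unified representation of data to implement more advanced language processing technologies with ease. A further strength of SIST ISO 24613-2:2021 is a design that takes ease of machine processing into account. Because the model provides electronic dictionary data in a form that machines can readily interpret, it can play an important role in developing language technologies that use AI and machine learning. This should open up new possibilities in natural language processing. The standard is also written with industry-wide applicability in mind, forming a foundation that is widely supported in both academic research and commercial products. By providing international principles for managing language resources, it also contributes to improved global communication, which is particularly important in today's digital society. In sum, SIST ISO 24613-2:2021 offers an innovative approach to modelling machine-readable dictionaries and is an important document for advancing standardization in the field of language resource management.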
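The SIST ISO 24613-2:2021 standard provides a comprehensive description of the machine-readable dictionary (MRD) model. The document is a metamodel for representing data stored in a variety of electronic dictionary subtypes, with a broad scope of application ranging from direct support for human translators to machine processing. The standard's main strength is that it provides a systematic structure for maximizing machine readability, along with diverse ways of representing data. This guides users in making efficient use of dictionary data and in improving accuracy in translation and language processing tasks. In addition, SIST ISO 24613-2:2021 treats the MRD model as essential to managing and integrating complex language resources and, by enabling step-by-step implementation, promotes the standardization of language resources across many industries. In this respect, the document sets a very important benchmark for language resource management now and in the future. In conclusion, the SIST ISO 24613-2:2021 standard provides the theoretical and practical foundation needed to implement and manage the MRD model smoothly, helping to meet the demands of modern society for language resource management.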
SIST ISO 24613-2:2021 offers a comprehensive description of the model for machine-readable dictionaries (MRD), which serves as a metamodel. This standard is particularly relevant in today's digitalized world, in which the effective management of language resources is becoming increasingly important. The standard spans several areas by addressing the need to represent data from a variety of electronic dictionary subtypes. This covers both direct support for human translators and the requirements of machine processing. Through this versatility, the document allows considerable flexibility and adaptability in managing language data. An outstanding feature of SIST ISO 24613-2:2021 is the clarity with which the MRD model describes the structure of, and relationships within, lexical data. This strict organization not only promotes efficient data processing but also supports interoperability between different systems and platforms. The standardized approach ensures that different applications, whether translation software or language processing technology, can work together smoothly. The standard also contributes substantially to standardization in language resource management. Given the variety of electronic dictionaries and the different data formats in use across the industry, adopting this model minimizes complications and makes language data easier to exchange and reuse. These clear guidelines foster closer collaboration between developers and linguists.
Overall, SIST ISO 24613-2:2021 is highly relevant to the international language and translation industry and offers a solid basis for developing and applying the lexical markup frameworks that are indispensable for modern technologies. With its focused approach and clear guidelines, the standard ensures that the challenges of language resource management can be met successfully.











