ISO 24635-1:2025
Language resource management — Corpus annotation project management — Part 1: Core model
This document establishes a core model of project management for corpus annotation, to specify the work packages of project teams, required processes and deliverables.
This document presents the necessary components for issues such as coordination, human training, reusability, software, quality control, licensing and copyright. However, it does not specify a methodology to solve such issues.
This document gives guidance on what work packages and deliverables are required under the project in which workflows and processes deal with the following:
— Integration and communication among work packages: This includes ensuring that all work packages are well-coordinated, particularly in terms of the adoption of broader annotation standards and integration with ontologies to enhance interoperability. Effective communication across work packages is crucial for the seamless sharing of annotated documents with other projects.
— Human resource management and interrater reliability: This covers the management of human resources, focusing on training and qualification, as well as the implementation of interrater reliability practices. These practices include training, testing and the use of appropriate tools to ensure consistency across annotations.
— Annotation guideline management and software utilization: This involves managing the guidelines for annotation tasks and utilizing annotation software and tools, particularly in environments leveraging artificial intelligence (AI) and machine learning (ML) techniques.
— Quality control, data validation and structured documentation: This encompasses the processes for quality control and validation of annotation results, alongside the need for structured documentation and ongoing curation. This ensures that annotated documents remain accurate, relevant and usable over the long term.
— Licensing, copyrights and metadata management: This focuses on documenting licences and copyrights, providing metadata to manage the sharing of resources. It is particularly important in areas with copyright restrictions or licensing concerns, ensuring that data subsets can be appropriately managed and shared.
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus — Partie 1: Modèle de base
Upravljanje jezikovnih virov - Projektno vodenje anotacije korpusa - 1. del: Jedrni model
International Standard
ISO 24635-1
First edition
2025-07
Language resource management — Corpus annotation project management — Part 1: Core model
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus — Partie 1: Modèle de base
© ISO 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Terms related to corpus annotation
3.2 Terms related to project management
4 Core model
4.1 General
4.2 Project organization and role
4.2.1 General
4.2.2 Project manager
4.2.3 Project technical manager
4.2.4 Work package manager
4.2.5 Process team leader
4.2.6 Team member
4.3 Project management process groups for corpus annotation project
4.4 Corpus annotation project work package and process
4.4.1 General
4.4.2 Integrated management of corpus annotation project
4.4.3 Corpus annotation work management
4.4.4 Corpus annotation project quality control
5 Publication and archiving of the corpus annotation (optional)
Annex A (informative) Process flow organized by process groups and work packages
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
A list of all parts in the ISO 24635 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Corpus annotation is a process of adding information to primary data. The goal of corpus
annotation projects is to achieve high quality deliverables that follow the annotation specification within
limited resource environments.
This series gives recommendations on constructing high quality annotated corpora effectively and
efficiently. The series will consist of three parts (core model, training model and validation model):
— This document presents the basic principles, including considerations of corpus annotation, procedures
of a corpus annotation project, project organization, work packages and tasks, that can be applied to a
corpus annotation project regardless of its scale, complexity and duration.
— ISO 24635-2 (see footnote 1) presents the basic principles for training project participants and
maintaining their ability to execute the annotation project tasks.
— ISO 24635-3 (see footnote 2) presents the basic principles for quality control of deliverables, ensuring
error-free annotation aligned with the annotation specification.
Corpus annotation principles and guidelines on proper and efficient annotation have a long-established
history, as discussed in References [11] and [13]. This document specifically focuses on providing guidance on
managing corpus annotation projects effectively rather than prescribing specific annotation methodologies.
1) Under preparation. Stage at the time of publication: ISO/WD 24635-2:2024.
2) Planned.
International Standard ISO 24635-1:2025(en)
Language resource management — Corpus annotation project management — Part 1: Core model
1 Scope
This document establishes a core model of project management for corpus annotation, to specify the work
packages of project teams, required processes and deliverables.
This document presents the necessary components for issues such as coordination, human training,
reusability, software, quality control, licensing and copyright. However, it does not specify a methodology to
solve such issues.
This document gives guidance on what work packages and deliverables are required under the project in
which workflows and processes deal with the following:
— Integration and communication among work packages: This includes ensuring that all work packages are
well-coordinated, particularly in terms of the adoption of broader annotation standards and integration
with ontologies to enhance interoperability. Effective communication across work packages is crucial for
the seamless sharing of annotated documents with other projects.
— Human resource management and interrater reliability: This covers the management of human resources,
focusing on training and qualification, as well as the implementation of interrater reliability practices.
These practices include training, testing and the use of appropriate tools to ensure consistency across
annotations.
— Annotation guideline management and software utilization: This involves managing the guidelines for
annotation tasks and utilizing annotation software and tools, particularly in environments leveraging
artificial intelligence (AI) and machine learning (ML) techniques.
— Quality control, data validation and structured documentation: This encompasses the processes for
quality control and validation of annotation results, alongside the need for structured documentation
and ongoing curation. This ensures that annotated documents remain accurate, relevant and usable over
the long term.
— Licensing, copyrights and metadata management: This focuses on documenting licences and copyrights,
providing metadata to manage the sharing of resources. It is particularly important in areas with
copyright restrictions or licensing concerns, ensuring that data subsets can be appropriately managed
and shared.
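The scope names interrater reliability practices without prescribing a measure. As a hedged illustration only, the sketch below computes Cohen's kappa, one widely used agreement coefficient for two annotators. The function name `cohen_kappa` and the sample labels are assumptions made for this example and are not defined by this document.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of units on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators for four annotation units.
annotator_a = ["NOUN", "VERB", "NOUN", "ADJ"]
annotator_b = ["NOUN", "VERB", "ADJ", "ADJ"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

A project would choose its own coefficient and acceptance threshold; neither is specified here.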
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1 Terms related to corpus annotation
3.1.1
annotation
information added to primary data (3.1.9), independent of its representation
[SOURCE: ISO 24623-1:2018, 3.1]
3.1.2
annotation layer
layer for corpus annotation (3.1.6)
EXAMPLE Syntactic layer, lexical-semantic layer, entity layer.
3.1.3
annotation scheme
description of the structure of annotations (3.1.1)
3.1.4
annotation unit
specific segment of primary data (3.1.9) that is identified and labelled according to an annotation scheme (3.1.3)
EXAMPLE Word, phrase, clause, sentence, utterance.
3.1.5
corpus
collection of natural language data
[SOURCE: ISO 1087:2019, 3.6.4, modified — The preferred term “text corpus” deleted. Note 1 to entry
deleted.]
3.1.6
corpus annotation
action of adding interpretative linguistic or non-linguistic information to a corpus (3.1.5)
[SOURCE: Leech, G., 2005 [12], modified — “non-linguistic” added.]
3.1.7
corpus annotation project
project (3.2.9) aimed at enhancing a collection of corpora (3.1.5) with metadata or labels that provide
additional linguistic, non-linguistic, semantic, or structural information to facilitate analysis, research and
the development of natural language processing tools
3.1.8
guideline
official recommendation or advice that indicates policies, standards or procedures for how something
should be accomplished
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.1774]
3.1.9
primary data
original, unannotated electronic representation of language data that serves as the foundation for the
annotation process
3.1.10
resource
skilled human resources (specific disciplines either individually or in crews or teams), equipment, services,
supplies, commodities, material, budgets or funds
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3461,2]
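The terms in 3.1 can be read as a small data model. The following is a minimal sketch under stated assumptions: annotation units are addressed by character offsets into the primary data, and the class names, field names and sample sentence are all hypothetical. The definitions above do not mandate any particular representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnotationUnit:
    """A segment of primary data (3.1.4), addressed here by character offsets."""
    start: int
    end: int

@dataclass(frozen=True)
class Annotation:
    """Information added to primary data (3.1.1), stored apart from the data itself."""
    unit: AnnotationUnit
    layer: str  # annotation layer (3.1.2), e.g. "entity" or "syntactic"
    label: str

def text_of(unit, data):
    """Return the stretch of primary data that an annotation unit covers."""
    return data[unit.start:unit.end]

# Hypothetical primary data (3.1.9) with two annotations on different layers.
primary_data = "Ada Lovelace wrote the first program."
annotations = [
    Annotation(AnnotationUnit(0, 12), layer="entity", label="PERSON"),
    Annotation(AnnotationUnit(13, 18), layer="syntactic", label="VERB"),
]
covered = [text_of(a.unit, primary_data) for a in annotations]
```

Keeping annotations separate from the primary data leaves the original text unchanged, which matches the definition of primary data as the unannotated foundation of the annotation process.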
3.2 Terms related to project management
3.2.1
activity
identified piece of work that is required to be undertaken to complete a project (3.2.9), programme, portfolio
or other related work
[SOURCE: ISO 21506:2024, 3.2, modified — Note 1 to entry deleted.]
3.2.2
control
comparison of actual performance with planned performance, analysing variances and taking appropriate
corrective and/or preventive action as needed
[SOURCE: ISO 21506:2024, 3.13, modified — “and/or” replaced “and”.]
3.2.3
data consistency
adherence to uniform and standardized annotation (3.1.1) guidelines (3.1.8) and criteria across the entire
corpus (3.1.5), ensuring that all annotated elements follow the same rules and conventions, which facilitates
reliable and reproducible analysis
3.2.4
data validation
process (3.2.7) of systematically checking and verifying the accuracy, completeness and consistency of
annotations (3.1.1) within the corpus (3.1.5) to ensure that the data meet predefined quality standards and
guidelines (3.1.8)
3.2.5
deliverable
unique and verifiable element that is required to be produced by a project (3.2.9)
[SOURCE: ISO 21502:2020, 3.9]
3.2.6
output
aggregated tangible or intangible deliverables (3.2.5) that form the project (3.2.9) result
[SOURCE: ISO 21502:2020, 3.14]
3.2.7
process
systematic series of activities (3.2.1) directed towards causing an end result such that one or more inputs
will be acted upon to create one or more outputs (3.2.6)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3037,8]
3.2.8
process group
collection of related processes (3.2.7)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3057,1]
3.2.9
project
temporary endeavour to achieve one or more defined objectives
[SOURCE: ISO 21502:2020, 3.20]
3.2.10
project charter
document that states the problem to be solved, the improvement goals, the project scope (3.2.19), the project
(3.2.9) milestones and the project roles and responsibilities
[SOURCE: ISO 13053-2:2011, 2.26]
3.2.11
project communications management
processes (3.2.7) that are required to ensure timely and appropriate planning, collection, creation,
distribution, storage, retrieval, management, control, monitoring and the ultimate disposition of project
(3.2.9) information
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3156]
3.2.12
project cost management
processes (3.2.7) involved in planning, estimating, budgeting, financing, funding, managing and controlling
costs so that the project (3.2.9) can be completed within the approved budget
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3158]
3.2.13
project integration management
processes (3.2.7) and activities (3.2.1) needed to identify, define, combine, unify and coordinate the various
processes and project management (3.2.14) activities within the project management process groups (3.2.8)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3165]
3.2.14
project management
planning, organizing, monitoring, controlling and reporting of all aspects of a project (3.2.9), and the
motivation of all those involved in it to achieve the project (3.2.9) objectives
[SOURCE: ISO 22886:2020, 3.9.7]
3.2.15
project management process group
logical grouping of project management (3.2.14) inputs, tools and techniques, and outputs (3.2.6)
Note 1 to entry: The project management process groups include initiating processes (3.2.7), planning processes,
executing processes, monitoring and controlling processes, and closing processes. Project management process
groups are not project phases (3.2.16).
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3173]
3.2.16
project phase
collection of logically related project (3.2.9) activities (3.2.1) that culminates in the completion of one or
more deliverables (3.2.5)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3181]
3.2.17
project procurement management
processes (3.2.7) necessary to purchase or acquire products, services or results needed from outside the
project (3.2.9) team
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3185]
3.2.18
project quality management
processes (3.2.7) and activities (3.2.1) of the performing organization that determine quality policies,
objectives and responsibilities so that the project (3.2.9) will satisfy the needs for which it was undertaken
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3186]
3.2.19
project scope
authorized work to accomplish agreed objectives
[SOURCE: ISO 21502:2020, 3.25]
3.2.20
project scope management
processes (3.2.7) required to ensure that the project (3.2.9) includes all the work required, and only the work
required, to complete the project successfully
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3194]
3.2.21
schedule management plan
component of the project management (3.2.14) plan that establishes the criteria and the activities (3.2.1) for
developing, monitoring and controlling the schedule
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3619]
3.2.22
stakeholder
person, group or organization that has interests in, or can affect, be affected by, or perceive itself to be
affected by, any aspect of the project (3.2.9), programme or portfolio
[SOURCE: ISO 21502:2020, 3.27]
3.2.23
work breakdown structure
WBS
decomposition of the defined scope of a project (3.2.9) or programme into progressively lower levels
consisting of elements of work
[SOURCE: ISO 21502:2020, 3.29, modified — Abbreviated term “WBS” added.]
3.2.24
work package
group of activities (3.2.1) that have a defined scope, deliverable (3.2.5), timescale and cost
[SOURCE: ISO 21502:2020, 3.30]
3.2.25
work package leader
work package team leader
role within project management (3.2.14) that is responsible for overseeing a specific work package (3.2.24)
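Data validation (3.2.4) and data consistency (3.2.3) are defined above in terms of checking annotations against predefined standards and guidelines. The sketch below shows one possible form of such a check, assuming that the annotation scheme is expressed as a set of allowed labels per layer and that annotations are plain dictionaries. The function name `validate` and all sample values are hypothetical.

```python
def validate(annotations, data_length, scheme):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for i, ann in enumerate(annotations):
        allowed = scheme.get(ann["layer"])
        if allowed is None:
            problems.append(f"#{i}: unknown layer {ann['layer']!r}")
        elif ann["label"] not in allowed:
            problems.append(f"#{i}: label {ann['label']!r} not allowed in layer {ann['layer']!r}")
        # The span must be non-empty and lie inside the primary data.
        if not (0 <= ann["start"] < ann["end"] <= data_length):
            problems.append(f"#{i}: span {ann['start']}-{ann['end']} out of range")
    return problems

# Hypothetical scheme and annotations.
scheme = {"entity": {"PERSON", "ORG"}}
annotations = [
    {"layer": "entity", "label": "PERSON", "start": 0, "end": 12},  # passes
    {"layer": "entity", "label": "CITY", "start": 13, "end": 18},   # label not in scheme
    {"layer": "entity", "label": "ORG", "start": 30, "end": 50},    # span past the end
]
problems = validate(annotations, data_length=37, scheme=scheme)
```

Checks of this kind cover only formal conformity; judging whether a label is the right interpretation still requires a validator.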
4 Core model
4.1 General
The basic configuration of the core model for a corpus annotation project follows the guidance of the
project management standards developed by ISO/TC 258 (see ISO 21500 and ISO 21502), which use concepts
such as work package, process and deliverable.
The work packages in the core model include the integrated management of corpus annotation project,
communication, annotation guideline, annotation work, annotation quality management and annotation
system. Each work package represents a group of activities that shall be carried out for the corpus annotation
project.
Each work package contains work activities, organized as processes, that are performed for the subject
groups to be managed. Each process has inputs that activate it and a list of outputs.
A subject group represents what should be considered in the project management (e.g. stakeholder, scope,
resource, time, cost, risk, quality, procurement, communication, benefit, reporting, issue, change control,
integration). Each work package deals with almost all of the subject groups. The following are the core
components in the model:
— Annotator, validator, project participant: A subject of personnel resource. When the human resources
are crowd-sourced, the procurement subject group will initiate the process for the supplier selection.
Annotators will be trained and qualified by means of the relevant processes: annotator training (P5.2c),
annotator quality test (P5.3a), annotator quality upgrade training (P5.3b) and annotator quality
maintenance (P5.3) control.
— Annotation guide: A subject within the scope of the outputs of a corpus annotation project. The
annotation guide’s scope follows the project charter. The annotation guide is developed in the annotation
guide work package (WP3) and used to develop the training material in the process of annotation guide
training development (P3.3). It is then revised in the process of annotation guide updating and
communication (P2.3a), and distributed efficiently and effectively through the communication work package (WP2).
— Annotation work package (WP4) activates the following processes, according to the annotation scope
and annotation guide: setup by procuring the annotation system and environment, annotator resource
procurement, training and quality control.
— Annotation quality management work package (WP5) manages the quality of annotation output from
the annotation work package.
— Annotation system work package (WP6) maintains the annotation working environment with output
repository, security, issue logging, annotation consistency check, support functions, etc.
— Communication work package (WP2) emphasizes the project communications management among
annotation guide writers, annotators, annotation environment and system developers, quality control
managers and validators. Communication shall be immediate, efficient and effective. Its complete
application and synchronization across all annotation steps shall be ensured and recorded or
documented.
Activities in a work package and a subject group are classified into process groups to maintain the life cycle
for initiating, planning, implementing, controlling and closing a project.
The relationship between subject groups and processes is summarized in Table A.1.
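The statement that each process has inputs that activate it and a list of outputs can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, the assignment of P1.1a and P1.1b to the initiating process group is an assumption, and only inputs and outputs named in this document are used.

```python
from dataclasses import dataclass, field
from enum import Enum

class ProcessGroup(Enum):
    INITIATING = "initiating"
    PLANNING = "planning"
    IMPLEMENTING = "implementing"
    CONTROLLING = "controlling"
    CLOSING = "closing"

@dataclass
class Process:
    pid: str
    name: str
    group: ProcessGroup
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)

    def can_start(self, available):
        """A process is activated once all of its inputs are available."""
        return self.inputs <= available

def run(processes, available):
    """Run every process whose inputs are available until nothing more can start."""
    available = set(available)
    done, pending = [], list(processes)
    progressed = True
    while pending and progressed:
        progressed = False
        for p in list(pending):
            if p.can_start(available):
                available |= p.outputs
                done.append(p.pid)
                pending.remove(p)
                progressed = True
    return done, available

# P1.1b is listed first on purpose: it still waits for the project charter.
processes = [
    # Outputs of P1.1b are left empty because they are not used in this sketch.
    Process("P1.1b", "Project team formation", ProcessGroup.INITIATING,
            inputs={"project charter"}),
    Process("P1.1a", "Project charter development", ProcessGroup.INITIATING,
            inputs={"project contract"}, outputs={"project charter"}),
]
done, available = run(processes, {"project contract"})
```

The same structure could hold any other process in the model, with the process group recorded per process as summarized in Table A.1.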
4.2 Project organization and role
4.2.1 General
A corpus annotation project consists of a set of work packages as shown in Figure 1. The project manager
supervises the work package managers with the help of project technical managers. A work package works
as a collection of processes (or activities or tasks) under the supervision of a work package manager. A
process is accomplished by the team members to be assigned to the process under the supervision of a
process team leader.
Figure 1 — Project organization
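The supervision relationships described in 4.2.1 can be sketched as a small tree. This is an assumption-laden illustration: Figure 1 is not reproduced here, so placing the project technical manager directly under the project manager is a guess, and the `Role` class and `chain_of_supervision` function are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    title: str
    reports: list = field(default_factory=list)

    def add(self, role):
        """Attach a supervised role and return it so calls can be chained."""
        self.reports.append(role)
        return role

def chain_of_supervision(root, title, path=()):
    """Return role titles from the root down to the first role named `title`."""
    path = path + (root.title,)
    if root.title == title:
        return list(path)
    for role in root.reports:
        found = chain_of_supervision(role, title, path)
        if found:
            return found
    return []

project_manager = Role("project manager")
# Assumed placement: the technical manager assists the project manager.
project_manager.add(Role("project technical manager"))
wp_manager = project_manager.add(Role("work package manager"))
team_leader = wp_manager.add(Role("process team leader"))
team_leader.add(Role("team member"))

chain = chain_of_supervision(project_manager, "team member")
```

A real project would have several work package managers and process team leaders; one of each is enough to show the chain.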
4.2.2 Project manager
The project manager is responsible for the project’s integrated management, supervision and
decision-making. In a corpus annotation project in particular, this includes approving the annotation
guide and its distribution, optionally together with the project technical manager(s). The project manager
has the required competence in natural language processing, data collection for machine learning purposes
and corpus linguistics.
4.2.3 Project technical manager
The project technical manager takes care of the deliverables for each stage of the project and manages the
schedule of each work package. This position oversees administrative support and communication with
the work package managers. The project technical manager shall have experience in participating in data
development projects and natural language processing/machine learning.
4.2.4 Work package manager
The work package manager is responsible for managing the processes belonging to the work package,
administrative work, reporting the output of the processes and communicating with the process team
leader under the work package.
4.2.5 Process team leader
The process team leader is responsible for process scheduling, deliverable management and reporting, and
communication among process team members. Additionally, the team leader is responsible for estimating
the resources required to perform the process and requesting support from the work package manager.
The process team leader and the work package manager may participate in the actual processes with project
team members and raise practical issues (e.g. about applying the annotation guidelines), suggesting
revisions if necessary.
4.2.6 Team member
A team member has expertise in the process, participates in training sessions and maintains the
required qualifications.
4.3 Project management process groups for corpus annotation project
Each process belongs to one of the following process groups: initiation, planning, implementation, control
or closing.
As shown in Figure 2, an annotation project does not proceed strictly in the order of initiation, planning,
implementation, control and closing. Instead, processes are repeated and re-executed whenever a change
issue appears.
Figure 2 — Interaction among project management process groups
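The repetition of process groups can be sketched as a simple loop. The sketch rests on two assumptions not stated in the text: change issues are detected in the control group, and work then resumes from planning. Figure 2 is not reproduced here, so the actual interactions may differ. The names `ORDER` and `trace` are hypothetical.

```python
ORDER = ["initiation", "planning", "implementation", "control", "closing"]

def trace(change_issues, restart_from="planning"):
    """List the process groups visited, looping back once per change issue."""
    visited = []
    remaining = change_issues
    i = 0
    while i < len(ORDER):
        group = ORDER[i]
        visited.append(group)
        if group == "control" and remaining > 0:
            # Assumed behaviour: a change issue sends the project back to re-plan.
            remaining -= 1
            i = ORDER.index(restart_from)
        else:
            i += 1
    return visited

no_change = trace(0)
one_change = trace(1)
```

With no change issue the groups are visited once in order; each change issue adds one more pass through planning, implementation and control before closing.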
4.4 Corpus annotation project work package and process
4.4.1 General
Each work package specializes in one of the core topics of the corpus annotation project: annotation
guidelines, annotation work, annotation quality, or the annotation environment and system. The
integrated management work package coordinates the work packages, which are shown in the
following list:
— project integration management (WP1);
— communication (WP2);
— annotation guide authoring, updating and licensing (WP3);
— annotation work (WP4);
— annotation project quality control (WP5);
— annotation environment and system (WP6).
4.4.2 Integrated management of corpus annotation project
4.4.2.1 WP1. Project integration management work package
WP1 is responsible for integrated management and administrative support throughout the project life cycle,
including forming and managing the project team, overseeing licensing activities related to data, annotation
models, annotation guides and annotation software systems, and coordinating various work packages
aligned with project scope management, project cost management, project quality management, project
procurement management and project risk management.
WP1 consists of the following processes:
— P1.1. Initiating a project:
— P1.1a. Project charter development: A project charter is developed based on the project contract,
examples of similar corpus projects and past-phase documents.
— P1.1b. Project team formation: This process begins based on the project charter which is the
output of the previous process. A team s
...
SLOVENIAN STANDARD
oSIST ISO/DIS 24635-1:2025
1 June 2025
Upravljanje jezikovnih virov - Projektno vodenje anotacije korpusa - 1. del: Jedrni model
Language resource management — Corpus Annotation Project Management — Part 1: Core model
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus — Partie 1: Modèle de base
This Slovenian standard is identical to: ISO/PRF 24635-1
ICS:
01.020 Terminology (principles and coordination)
oSIST ISO/DIS 24635-1:2025 en,fr,de
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of all or part of this standard is not permitted.
DRAFT International Standard
ISO/DIS 24635-1
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2024-11-05
Voting terminates on: 2025-01-28
Language resource management — Corpus Annotation Project Management — Part 1: Core model
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus — Partie 1: Modèle de base
ICS: 01.020
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENTS AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
This document is circulated as received from the committee secretariat.
Reference number: ISO/DIS 24635-1:2024(en)
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Terms and definitions for corpus annotation
3.2 Terms and definitions for project management
4 Purpose and justification
5 Core Model
5.1 Project organization and role
5.1.1 Project manager
5.1.2 Project technical manager
5.1.3 Work package manager
5.1.4 Process team leader
5.1.5 Team member
5.2 Process groups for corpus annotation project
5.3 Corpus annotation project work package and process
5.3.1 Integrated management of corpus annotation project
5.3.2 Corpus annotation work management
5.3.3 Corpus annotation project quality control
6 Publication and archiving of the corpus annotation (optional)
Annex A (informative) Process flow in the scope of process groups and work packages
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent
rights identified during the development of the document will be in the Introduction and/or on the ISO list of
patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and Terminology, Subcommittee
SC 4, Language Resource Management.
A list of all parts in the ISO 24635 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Corpus annotation is the process of adding linguistic information to primary data. The goal
of corpus annotation projects is to achieve high-quality deliverables that follow the annotation
specification within limited resource environments.
Language resource management – Corpus Annotation Project Management is a series of proposed standards
that give recommendations for constructing high-quality annotated corpora effectively and efficiently.
The proposal consists of three parts: Core Model, Training Model, and Validation Model.
Part 1: Core Model presents the basic principles, including considerations of corpus annotation, procedures
of a corpus annotation project, project organization, work packages and tasks, that can be applied to a
corpus annotation project regardless of its scale, complexity, and duration.
Part 2: Training Model presents the basic principles for training the project participants and maintaining
their ability to execute the project.
Part 3: Validation Model presents the basic principles for quality control of deliverables, aiming at
error-free annotation that follows the annotation specification.
DRAFT International Standard ISO/DIS 24635-1:2024(en)
Language resource management — Corpus Annotation
Project Management —
Part 1:
Core model
1 Scope
This standard is part of a series of standards for corpus annotation project management. Part 1
describes the core model of project management for corpus annotation, specifying the work packages of
project teams and the required processes and deliverables. Parts 2 and 3 of the series describe the
training model for the human resources involved and the validation model, respectively.
This document does not specify a methodology for solving issues such as quality control, human training,
reusability, licensing and copyright. Instead, it presents the components necessary to handle such issues
and specifies which work packages, subtasks and workflows among them are required to manage a corpus
annotation project.
Thus, this core model of project management for corpus annotation gives recommendations on what
work packages and deliverables are required under the project in which workflows and processes deal with:
— Integration and Communication Among Work Packages: This includes ensuring that all work
packages are well-coordinated, particularly in terms of the adoption of broader annotation standards
and integration with ontologies to enhance interoperability. Effective communication across work
packages is crucial for the seamless sharing of annotated documents with other projects.
— Human Resource Management and Interrater Reliability: This covers the management of human
resources, focusing on training and qualification, as well as the implementation of interrater reliability
practices. These practices include training, testing, and the use of appropriate tools to ensure consistency
across annotations.
— Annotation Guideline Management and Software Utilization: This involves managing the guidelines
for annotation tasks and utilizing annotation software and tools, particularly in environments leveraging
artificial intelligence (AI) and machine learning (ML) techniques. It includes the cautious application of
AI/ML methods, such as weakly supervised learning, to support the annotation process.
— Quality Control, Validation, and Structured Documentation: This encompasses the processes for
quality control and validation of annotation results, alongside the need for structured documentation
and ongoing curation. This ensures that annotated documents remain accurate, relevant, and usable
over the long term.
— Licensing, Copyrights, and Metadata Management: This focuses on documenting licenses and
copyrights, providing metadata to manage the sharing of resources. It is particularly important in
areas with copyright restrictions or licensing concerns, ensuring that data subsets can be appropriately
managed and shared.
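The interrater reliability practices named in the list above are often quantified with a chance-corrected agreement coefficient. The following is a minimal sketch, assuming two annotators have labelled the same annotation units. This document does not prescribe Cohen's kappa or any other coefficient; the function name `cohen_kappa` and the sample labels are illustrative assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of units")
    n = len(labels_a)
    # Observed agreement: share of units on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention (an assumption).
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on six annotation units.
annotator_1 = ["PER", "LOC", "PER", "ORG", "LOC", "PER"]
annotator_2 = ["PER", "LOC", "ORG", "ORG", "LOC", "PER"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A project would typically compare such a value against a threshold agreed in its own quality plan; the choice of threshold is outside the scope of this sketch.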
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies. The applicable references
of this proposal can be extended to the standards of ISO/TC 37/SC 4.
ISO 21500:2021, Project, programme and portfolio management — Context and concepts
ISO 21502:2020, Project, programme and portfolio management — Guidance on project management
ISO 21503:2022, Project, programme and portfolio management — Guidance on programme management
ISO 21504:2022, Project, programme and portfolio management — Guidance on portfolio management
ISO 21505:2017, Project, programme and portfolio management — Guidance on governance
ISO/TR 21506:2018, Project, programme and portfolio management — Vocabulary
3 Terms and definitions
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1 Terms and definitions for corpus annotation
3.1.1
annotation
information added to primary data (3.1.9), independent of its representation
[SOURCE: ISO 24623-1:2018, 3.1]
3.1.2
annotation layer
layer for corpus annotation (3.1.6)
EXAMPLE Syntactic layer, lexical-semantic layer, entity layer.
3.1.3
annotation scheme
description of the structure of the annotations (3.1.1)
3.1.4
annotation unit
specific segment of primary data (3.1.9) that is identified and labelled according to annotation scheme (3.1.3)
EXAMPLE Word, phrase, sentence.
3.1.5
corpus
collection of natural language data
[SOURCE: ISO 1087:2019(en), 3.6.4, modified – “text corpus” as alternative form is not used, and this standard
does not require a restriction to text only]
3.1.6
corpus annotation
action of adding interpretative linguistic or non-linguistic information to a corpus (3.1.5)
Note 1 to entry: “non-linguistic” is added to the definition of Geoffrey Leech (2005).
[SOURCE: Geoffrey Leech, Adding Linguistic Annotation, 2005]
3.1.7
corpus annotation project
project (3.2.9) aimed at enhancing a collection of corpora (3.1.5) with metadata or labels that provide
additional linguistic, semantic, or structural information to facilitate analysis, research, and the development
of natural language processing tools
3.1.8
guideline
official recommendation or advice that indicates policies, standards, or procedures for how something
should be accomplished
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.1774]
3.1.9
primary data
original, unannotated electronic representation of language data that serves as the foundation for the
annotation process
3.1.10
resource
skilled human resources (specific disciplines either individually or in crews or teams), equipment, services,
supplies, commodities, material, budgets, or funds
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3461,2]
3.2 Terms and definitions for project management
3.2.1
activity
identified component of work within a schedule that is required to be undertaken to complete a project (3.2.9)
[SOURCE: ISO 21500:2012(e), 2.1]
3.2.2
control
comparison of actual performance with planned performance, analysing variances and taking appropriate
corrective and preventive action as needed
[SOURCE: ISO/TR 21506:2018, 3.13]
3.2.3
data consistency
adherence to uniform and standardized annotation guidelines and criteria across the entire corpus,
ensuring that all annotated elements follow the same rules and conventions, which facilitates reliable and
reproducible analysis
3.2.4
data validation
process of systematically checking and verifying the accuracy, completeness, and consistency of annotations
within the corpus to ensure that the data meets predefined quality standards and guidelines
3.2.5
deliverable
unique and verifiable element that is required to be produced by a project (3.2.9)
[SOURCE: ISO 21502:2020(en), 3.9]
3.2.6
output
aggregated tangible or intangible deliverables (3.2.5) that form the project (3.2.9) result
[SOURCE: ISO 21502:2020(en), 3.14]
3.2.7
process
systematic series of activities (3.2.1) directed towards causing an end result such that one or more inputs
will be acted upon to create one or more outputs
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3037,8]
3.2.8
process group
collection of related processes (3.2.7)
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3057,1]
3.2.9
project
temporary endeavour to achieve one or more defined objectives
[SOURCE: ISO 21502:2020(en), 3.20]
3.2.10
project charter
document that states the problem to be solved, the improvement goals, the project scope (3.2.19), the project
milestones and the project roles and responsibilities
[SOURCE: ISO 13053-2:2011(en), 2.26]
3.2.11
project communication management
processes that are required to ensure timely and appropriate planning, collection, creation, distribution,
storage, retrieval, management, control, monitoring, and the ultimate disposition of project information
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3156]
3.2.12
project cost management
processes involved in planning, estimating, budgeting, financing, funding, managing, and controlling costs
so that the project can be completed within the approved budget
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3158]
3.2.13
project integration management
processes and activities needed to identify, define, combine, unify, and coordinate the various
processes and project management activities within the project management process groups (3.2.8)
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3165]
3.2.14
project management
planning, organizing, monitoring, controlling and reporting of all aspects of a project, and the motivation of
all those involved in it to achieve the project (3.2.9) objectives
[SOURCE: ISO 22886:2020(en), 3.9.7]
3.2.15
project management process group
logical grouping of project management (3.2.14) inputs, tools and techniques, and outputs. The project
management process groups include initiating processes, planning processes, executing processes,
monitoring and controlling processes, and closing processes. Project management process groups are not
project phases
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3173]
3.2.16
project phase
collection of logically related project activities that culminates in the completion of one or more deliverables
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3181]
3.2.17
project procurement management
processes necessary to purchase or acquire products, services, or results needed from outside the project team
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3185]
3.2.18
project quality management
processes (3.2.7) and activities (3.2.1) of the performing organization that determine quality policies,
objectives, and responsibilities so that the project (3.2.9) will satisfy the needs for which it was undertaken
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3186]
3.2.19
project scope
authorized work to accomplish agreed objectives
[SOURCE: ISO 21502:2020(en), 3.25]
3.2.20
project scope management
processes (3.2.7) required to ensure that the project includes all the work required, and only the work
required, to complete the project successfully
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3194]
3.2.21
schedule management plan
component of the project management (3.2.14) plan that establishes the criteria and the activities (3.2.1) for
developing, monitoring, and controlling the schedule
[SOURCE: ISO/IEC/IEEE 24765:2017(en), 3.3619]
3.2.22
stakeholder
person, group or organization that has interests in, or can affect, be affected by, or perceive itself to be
affected by, any aspect of the project (3.2.9), programme or portfolio
[SOURCE: ISO 21502:2020(en), 3.27]
3.2.23
work breakdown structure
WBS
decomposition of the defined scope of a project (3.2.9) or programme into progressively lower levels
consisting of elements of work
[SOURCE: ISO 21502:2020(en), 3.29, modified – “WBS” is acronym of “work breakdown structure”]
3.2.24
work package
group of activities (3.2.1) that have a defined scope, deliverable (3.2.5), timescale and cost
[SOURCE: ISO 21502:2020(en), 3.30]
3.2.25
work package leader
work package team leader
role within project management (3.2.14) responsible for overseeing a specific work package (3.2.24)
4 Purpose and justification
The purpose of ISO 24635 is to describe the principles and basic model for corpus annotation project
management. For this purpose, there are three parts: part 1 contains the core components, part 2 covers the
training model for annotators, and part 3 covers the annotation validation model.
This part provides the guidelines and principles for annotation project planning to ensure a quality
output, in line with the management practices for projects in ISO 21502 [3].
Annotation principles and guidelines about how to annotate properly and efficiently have a long history,
with references such as [1,4]. This standard does not describe the annotation model; it gives guidelines
for the project management of corpus annotation.
5 Core Model
The basic configuration of the core model for a corpus annotation project is based on the project
management guidance in the ISO 21500 series of standards [2,3], which uses concepts such as work package,
process and deliverable.
The work packages in the core model include the integrated management of corpus annotation project,
communication, annotation guideline, annotation work, annotation quality management, and annotation
system. Each work package represents a group of activities that must be carried out for the corpus annotation
project.
Under each work package, work activities are performed as processes for the subject group to be
managed. Each process has an input that activates it and a list of outputs.
A subject group represents what should be considered in the project management: for example, stakeholder,
scope, resource, time, cost, risk, quality, procurement, communication, benefit, reporting, issue, change control,
integration, and so on. Each work package deals with almost all the subject groups. The following are
the core components in the model:
— annotator, validator, project participant: a subject of personnel resource; when the human resources
are crowd-sourced, the procurement subject group will initiate the process for the supplier selection.
Annotators will be trained and qualified by means of the relevant processes: annotator training (P5.2c),
annotator quality test (P5.3a), annotator quality upgrade training (P5.3b), annotator quality maintenance
(P5.3) control.
— annotation guide: a subject in the scope of output from a corpus annotation project. The scope of the
annotation guide follows the project charter. The annotation guide is developed in the annotation guide
work package (WP3), used to develop the training material in the process of annotation guide training
development (P3.3), then revised in the process of annotation guide updating and communication (P2.3a),
and then distributed efficiently and effectively through the communication work package (WP2).
— annotation work package (WP4): activates the following processes according to the annotation scope
and annotation guide: setup by procuring the annotation system and environment, annotator resource
procurement, training, and quality control.
— annotation quality management work package (WP5) manages the quality of annotation output from the
annotation work package.
— annotation system work package (WP6) maintains the annotation working environment with output
repository, security, issue logging, annotation consistency check, support functions, and so on.
— communication work package (WP2) emphasizes the communication among annotation guide writers,
annotators, annotation environment & system developers, quality control managers, and validators.
The communication must be done immediately, efficiently and effectively, and its complete application and
synchronization over all annotation steps must be ensured and documented.
Activities in a work package and a subject group are classified into process groups to maintain the project
life cycle of initiating, planning, implementing, controlling, and closing.
The relationship between subject groups and processes is summarized in Table 1.
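The relationship between work packages, processes and process groups described above can be sketched as a small data model. This is an illustration only, not a structure defined by this document. The work package and process identifiers are those cited in the list above; the class names, the field names, and the grouping of processes under a work package by the numeric prefix of their identifier are assumptions of the sketch. The text above does not state which process group each process belongs to, so that field is left unset.

```python
from dataclasses import dataclass, field
from typing import Optional

# Life-cycle process groups named in the text above.
PROCESS_GROUPS = ("initiating", "planning", "implementing", "controlling", "closing")


@dataclass
class Process:
    process_id: str                      # e.g. "P5.3a", as cited in the text
    name: str
    process_group: Optional[str] = None  # not stated in the text, so left unset

    def __post_init__(self):
        if self.process_group is not None and self.process_group not in PROCESS_GROUPS:
            raise ValueError(f"unknown process group: {self.process_group}")


@dataclass
class WorkPackage:
    wp_id: str
    name: str
    processes: list = field(default_factory=list)

    def add(self, process):
        """Attach a process to this work package and return the work package."""
        self.processes.append(process)
        return self


# Only identifiers that appear in the text are used here.
wp2 = WorkPackage("WP2", "communication").add(
    Process("P2.3a", "annotation guide updating and communication"))
wp3 = WorkPackage("WP3", "annotation guide").add(
    Process("P3.3", "annotation guide training development"))
wp5 = WorkPackage("WP5", "annotation quality management")
wp5.add(Process("P5.2c", "annotator training"))
wp5.add(Process("P5.3a", "annotator quality test"))
wp5.add(Process("P5.3b", "annotator quality upgrade training"))

project = {wp.wp_id: wp for wp in (wp2, wp3, wp5)}
```

Keeping the process group as a validated field means a project team can later record the life-cycle stage of each process without changing the model.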
5.1 Project organization and role
A corpus annotation project consists of a set of work packages as in Fig. 1. The project manager supervises
...
SLOVENIAN STANDARD
1 June 2025
Language resource management — Corpus Annotation Project Management — Part 1:
Core model
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus —
Partie 1: Modèle de base
This Slovenian standard is identical to: ISO/PRF 24635-1
ICS:
01.020 Terminology (principles and coordination)
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
International
Standard
ISO 24635-1
First edition
Language resource management —
Corpus annotation project
management —
Part 1:
Core model
Gestion des ressources linguistiques — Gestion de projet
d'annotation de corpus —
Partie 1: Modèle de base
PROOF/ÉPREUVE
Reference number
© ISO 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Terms related to corpus annotation
3.2 Terms related to project management
4 Core model
4.1 General
4.2 Project organization and role
4.2.1 General
4.2.2 Project manager
4.2.3 Project technical manager
4.2.4 Work package manager
4.2.5 Process team leader
4.2.6 Team member
4.3 Project management process groups for corpus annotation project
4.4 Corpus annotation project work package and process
4.4.1 General
4.4.2 Integrated management of corpus annotation project
4.4.3 Corpus annotation work management
4.4.4 Corpus annotation project quality control
5 Publication and archiving of the corpus annotation (optional)
Annex A (informative) Process flow organized by process groups and work packages
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
A list of all parts in the ISO 24635 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Corpus annotation is the process of adding information to primary data. The goal of corpus
annotation projects is to achieve high-quality deliverables that follow the annotation specification within
limited resource environments.
This series gives recommendations on constructing high-quality annotated corpora effectively and
efficiently. The series will consist of three parts, covering the core model, the training model and the
validation model:
— This document presents the basic principles including considerations of corpus annotation, procedures
of corpus annotation project, project organization, work packages and tasks that can be applied to corpus
annotation project regardless of the scale, complexity and duration of the corpus annotation projects.
— ISO 24635-2¹⁾ presents the basic principles for training project participants and maintaining their ability
to execute the annotation project tasks.
— ISO 24635-3²⁾ presents the basic principles for quality control of deliverables ensuring error-free
annotation aligned with the annotation specification.
Corpus annotation principles and guidelines on proper and efficient annotation have a long-established
history, as discussed in References [11] and [13]. This document specifically focuses on providing guidance on
managing corpus annotation projects effectively rather than prescribing specific annotation methodologies.
1) Under preparation. Stage at the time of publication: ISO/WD 24635-2:2024.
2) Planned.
International Standard ISO 24635-1:2025(en)
Language resource management — Corpus annotation project
management —
Part 1:
Core model
1 Scope
This document establishes a core model of project management for corpus annotation, to specify the work
packages of project teams, required processes and deliverables.
This document presents the necessary components for issues such as coordination, human training,
reusability, software, quality control, licensing and copyright. However, it does not specify a methodology to
solve such issues.
This document gives guidance on what work packages and deliverables are required under the project in
which workflows and processes deal with the following:
— Integration and communication among work packages: This includes ensuring that all work packages are
well-coordinated, particularly in terms of the adoption of broader annotation standards and integration
with ontologies to enhance interoperability. Effective communication across work packages is crucial for
the seamless sharing of annotated documents with other projects.
— Human resource management and interrater reliability: This covers the management of human resources,
focusing on training and qualification, as well as the implementation of interrater reliability practices.
These practices include training, testing and the use of appropriate tools to ensure consistency across
annotations.
— Annotation guideline management and software utilization: This involves managing the guidelines for
annotation tasks and utilizing annotation software and tools, particularly in environments leveraging
artificial intelligence (AI) and machine learning (ML) techniques.
— Quality control, data validation and structured documentation: This encompasses the processes for
quality control and validation of annotation results, alongside the need for structured documentation
and ongoing curation. This ensures that annotated documents remain accurate, relevant and usable over
the long term.
— Licensing, copyrights and metadata management: This focuses on documenting licences and copyrights,
providing metadata to manage the sharing of resources. It is particularly important in areas with
copyright restrictions or licensing concerns, ensuring that data subsets can be appropriately managed
and shared.
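The last item in the list above, documenting licences and copyrights per data subset, can be illustrated with a small metadata record. This is a hedged sketch only: this document does not define a metadata schema, and the class `SubsetMetadata`, its fields, the subset names and the rights holders are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetMetadata:
    subset_id: str          # hypothetical identifier of a data subset
    licence: str            # licence name or identifier recorded for the subset
    copyright_holder: str   # hypothetical rights holder
    redistributable: bool   # whether the recorded licence permits sharing


def shareable_subsets(records):
    """Return the identifiers of subsets whose metadata permits sharing."""
    return [r.subset_id for r in records if r.redistributable]


# Hypothetical records for a corpus split into three subsets.
records = [
    SubsetMetadata("news-2019", "CC-BY-4.0", "Example Press", True),
    SubsetMetadata("forum-posts", "proprietary", "Example Forum Ltd", False),
    SubsetMetadata("transcripts", "CC0-1.0", "Example Institute", True),
]
shareable = shareable_subsets(records)
```

Recording the sharing decision next to the licence lets a project release the unrestricted subsets while withholding the restricted ones.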
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1 Terms related to corpus annotation
3.1.1
annotation
information added to primary data (3.1.9), independent of its representation
[SOURCE: ISO 24623-1:2018, 3.1]
3.1.2
annotation layer
layer for corpus annotation (3.1.6)
EXAMPLE Syntactic layer, lexical-semantic layer, entity layer.
3.1.3
annotation scheme
description of the structure of annotations (3.1.1)
3.1.4
annotation unit
specific segment of primary data (3.1.9) that is identified and labelled according to an annotation scheme (3.1.3)
EXAMPLE Word, phrase, clause, sentence, utterance.
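The terms defined in 3.1.1 to 3.1.4 can be related to each other in a short sketch. It assumes a stand-off representation in which each annotation unit points into the primary data by character offsets; this document does not mandate that representation, and the class name, field names, sample sentence and labels are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationUnit:
    start: int    # character offset into the primary data (inclusive)
    end: int      # character offset into the primary data (exclusive)
    label: str    # label drawn from the annotation scheme
    layer: str    # annotation layer, e.g. "entity"

    def text(self, primary_data):
        """Return the segment of primary data that this unit covers."""
        return primary_data[self.start:self.end]


# Primary data stays unannotated; annotations are kept apart and point into it.
primary_data = "Ada Lovelace lived in London."
units = [
    AnnotationUnit(0, 12, "PERSON", "entity"),
    AnnotationUnit(22, 28, "LOCATION", "entity"),
]
segments = [u.text(primary_data) for u in units]
```

Because the units only reference offsets, several annotation layers can be added over the same primary data without modifying it.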
3.1.5
corpus
collection of natural language data
[SOURCE: ISO 1087:2019, 3.6.4, modified — The preferred term “text corpus” deleted. Note 1 to entry
deleted.]
3.1.6
corpus annotation
action of adding interpretative linguistic or non-linguistic information to a corpus (3.1.5)
[SOURCE: Leech, G., 2005 [12], modified — “non-linguistic” added.]
3.1.7
corpus annotation project
project (3.2.9) aimed at enhancing a collection of corpora (3.1.5) with metadata or labels that provide
additional linguistic, non-linguistic, semantic, or structural information to facilitate analysis, research and
the development of natural language processing tools
3.1.8
guideline
official recommendation or advice that indicates policies, standards or procedures for how something
should be accomplished
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.1774]
3.1.9
primary data
original, unannotated electronic representation of language data that serves as the foundation for the
annotation process
3.1.10
resource
skilled human resources (specific disciplines either individually or in crews or teams), equipment, services,
supplies, commodities, material, budgets or funds
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3461,2]
3.2 Terms related to project management
3.2.1
activity
identified piece of work that is required to be undertaken to complete a project (3.2.9), programme, portfolio
or other related work
[SOURCE: ISO 21506:2024, 3.2, modified — Note 1 to entry deleted.]
3.2.2
control
comparison of actual performance with planned performance, analysing variances and taking appropriate
corrective and/or preventive action as needed
[SOURCE: ISO 21506:2024, 3.13, modified — “and/or” replaced “and”.]
3.2.3
data consistency
adherence to uniform and standardized annotation (3.1.1) guidelines (3.1.8) and criteria across the entire
corpus (3.1.5), ensuring that all annotated elements follow the same rules and conventions, which facilitates
reliable and reproducible analysis
3.2.4
data validation
process (3.2.7) of systematically checking and verifying the accuracy, completeness and consistency of
annotations (3.1.1) within the corpus (3.1.5) to ensure that the data meet predefined quality standards and
guidelines (3.1.8)
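The definition in 3.2.4 can be illustrated with a few automated checks. The following is a minimal sketch, assuming annotations are stored as `(start, end, label)` tuples over character offsets; the function name, the tuple layout and the three checks are assumptions of this example, not requirements of this document. Checking accuracy against a reference annotation is not covered here.

```python
def validate_annotations(primary_data, annotations, allowed_labels):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for i, (start, end, label) in enumerate(annotations):
        # Well-formedness: the span must select a non-empty segment inside the primary data.
        if not (0 <= start < end <= len(primary_data)):
            problems.append(f"annotation {i}: span {start}-{end} is outside the primary data")
        # Consistency with the scheme: only labels the scheme defines are allowed.
        if label not in allowed_labels:
            problems.append(f"annotation {i}: label {label!r} is not in the annotation scheme")
    # Consistency across the corpus: identical spans must not carry different labels.
    seen = {}
    for i, (start, end, label) in enumerate(annotations):
        previous = seen.setdefault((start, end), label)
        if previous != label:
            problems.append(
                f"annotation {i}: span {start}-{end} labelled both {previous!r} and {label!r}")
    return problems


primary_data = "Ada Lovelace lived in London."
allowed_labels = {"PERSON", "LOCATION"}
annotations = [
    (0, 12, "PERSON"),
    (22, 28, "CITY"),      # label missing from the scheme
    (22, 40, "LOCATION"),  # span runs past the end of the text
    (0, 12, "LOCATION"),   # conflicts with the first annotation
]
problems = validate_annotations(primary_data, annotations, allowed_labels)
```

Returning a list of problems rather than raising on the first one lets a validator report every issue in a batch in a single pass.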
3.2.5
deliverable
unique and verifiable element that is required to be produced by a project (3.2.9)
[SOURCE: ISO 21502:2020, 3.9]
3.2.6
output
aggregated tangible or intangible deliverables (3.2.5) that form the project (3.2.9) result
[SOURCE: ISO 21502:2020, 3.14]
3.2.7
process
systematic series of activities (3.2.1) directed towards causing an end result such that one or more inputs
will be acted upon to create one or more outputs (3.2.6)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3037,8]
3.2.8
process group
collection of related processes (3.2.7)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3057,1]
3.2.9
project
temporary endeavour to achieve one or more defined objectives
[SOURCE: ISO 21502:2020, 3.20]
3.2.10
project charter
document that states the problem to be solved, the improvement goals, the project scope (3.2.19), the project
(3.2.9) milestones and the project roles and responsibilities
[SOURCE: ISO 13053-2:2011, 2.26]
3.2.11
project communications management
processes (3.2.7) that are required to ensure timely and appropriate planning, collection, creation,
distribution, storage, retrieval, management, control, monitoring and the ultimate disposition of project
(3.2.9) information
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3156]
3.2.12
project cost management
processes (3.2.7) involved in planning, estimating, budgeting, financing, funding, managing and controlling
costs so that the project (3.2.9) can be completed within the approved budget
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3158]
3.2.13
project integration management
processes (3.2.7) and activities (3.2.1) needed to identify, define, combine, unify and coordinate the various
processes and project management (3.2.14) activities within the project management process groups (3.2.8)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3165]
3.2.14
project management
planning, organizing, monitoring, controlling and reporting of all aspects of a project (3.2.9), and the
motivation of all those involved in it to achieve the project (3.2.9) objectives
[SOURCE: ISO 22886:2020, 3.9.7]
3.2.15
project management process group
logical grouping of project management (3.2.14) inputs, tools and techniques, and outputs (3.2.6)
Note 1 to entry: The project management process groups include initiating processes (3.2.7), planning processes,
executing processes, monitoring and controlling processes, and closing processes. Project management process
groups are not project phases (3.2.16).
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3173]
3.2.16
project phase
collection of logically related project (3.2.9) activities (3.2.1) that culminates in the completion of one or more
deliverables (3.2.5)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3181]
3.2.17
project procurement management
processes (3.2.7) necessary to purchase or acquire products, services or results needed from outside the
project (3.2.9) team
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3185]
3.2.18
project quality management
processes (3.2.7) and activities (3.2.1) of the performing organization that determine quality policies,
objectives and responsibilities so that the project (3.2.9) will satisfy the needs for which it was undertaken
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3186]
3.2.19
project scope
authorized work to accomplish agreed objectives
[SOURCE: ISO 21502:2020, 3.25]
3.2.20
project scope management
processes (3.2.7) required to ensure that the project (3.2.9) includes all the work required, and only the work
required, to complete the project successfully
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3194]
3.2.21
schedule management plan
component of the project management (3.2.14) plan that establishes the criteria and the activities (3.2.1) for
developing, monitoring and controlling the schedule
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3619]
3.2.22
stakeholder
person, group or organization that has interests in, or can affect, be affected by, or perceive itself to be
affected by, any aspect of the project (3.2.9), programme or portfolio
[SOURCE: ISO 21502:2020, 3.27]
3.2.23
work breakdown structure
WBS
decomposition of the defined scope of a project (3.2.9) or programme into progressively lower levels
consisting of elements of work
[SOURCE: ISO 21502:2020, 3.29, modified — Abbreviated term “WBS” added.]
3.2.24
work package
group of activities (3.2.1) that have a defined scope, deliverable (3.2.5), timescale and cost
[SOURCE: ISO 21502:2020, 3.30]
3.2.25
work package leader
work package team leader
role within project management (3.2.14) that is responsible for overseeing a specific work package (3.2.24)
4 Core model
4.1 General
The basic configuration of the core model for a corpus annotation project is based on the guidance of the
standards on project management developed by ISO/TC 258 (see ISO 21500 and ISO 21502), which use
concepts such as work package, process and deliverable.
The work packages in the core model include the integrated management of corpus annotation project,
communication, annotation guideline, annotation work, annotation quality management and annotation
system. Each work package represents a group of activities that shall be carried out for the corpus annotation
project.
Each work package contains work activities, organized as processes that are performed for the subject group
to be managed. A process has an input that activates it and a list of outputs.
A subject group represents what should be considered in the project management (e.g. stakeholder, scope,
resource, time, cost, risk, quality, procurement, communication, benefit, reporting, issue, change control,
integration). Each work package deals with almost all of the subject groups. The following are the core
components in the model:
— Annotator, validator, project participant: A subject of personnel resource. When the human resources
are crowd-sourced, the procurement subject group will initiate the process for the supplier selection.
Annotators will be trained and qualified by means of the relevant processes: annotator training (P5.2c),
annotator quality test (P5.3a), annotator quality upgrade training (P5.3b) and annotator quality
maintenance (P5.3) control.
— Annotation guide: A subject in the scope of output from a corpus annotation project. The annotation
guide’s scope follows the project charter. The annotation guide is developed in the annotation guide
work package (WP3) and used to develop the training material in the process of annotation guide training
development (P3.3). It is then revised in the process of annotation guide updating and communication
(P2.3a), and distributed efficiently and effectively through the communication work package (WP2).
— Annotation work package (WP4) activates the following processes, according to the annotation scope
and annotation guide: setup by procuring the annotation system and environment, annotator resource
procurement, training and quality control.
— Annotation quality management work package (WP5) manages the quality of annotation output from
the annotation work package.
— Annotation system work package (WP6) maintains the annotation working environment with output
repository, security, issue logging, annotation consistency check, support functions, etc.
— Communication work package (WP2) emphasizes the project communications management among
annotation guide writers, annotators, annotation environment and system developers, quality control
managers and validators. Communication shall be immediate, efficient and effective. Its complete
application and synchronization across all annotation steps shall be ensured and recorded/documented.
Activities in a work package and a subject group are classified into process groups to maintain the life cycle
for initiating, planning, implementing, controlling and closing a project.
The relationship between subject groups and processes is summarized in Table A.1.
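The structure described above (work packages that contain processes, each process having inputs, outputs, a
subject group and a process group) can be pictured as a small data model. The following is a minimal sketch
only, not part of the core model. The class and field names, the inputs and outputs, and the assignment of
the two example processes to process groups and subject groups are assumptions made for illustration. Only
the process identifiers and names (P5.3a, P5.3b) come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProcessGroup(Enum):
    """Life-cycle process groups named in 4.1."""
    INITIATING = "initiating"
    PLANNING = "planning"
    IMPLEMENTING = "implementing"
    CONTROLLING = "controlling"
    CLOSING = "closing"


@dataclass
class Process:
    process_id: str          # identifier as cited in the text, e.g. "P5.3a"
    name: str
    group: ProcessGroup      # assumed assignment, for illustration only
    subject_group: str       # e.g. "quality", "resource"
    inputs: list[str] = field(default_factory=list)   # what activates the process
    outputs: list[str] = field(default_factory=list)  # what the process produces


@dataclass
class WorkPackage:
    wp_id: str
    name: str
    processes: list[Process] = field(default_factory=list)

    def by_group(self, group: ProcessGroup) -> list[Process]:
        """Return the processes of this work package in one process group."""
        return [p for p in self.processes if p.group is group]


# Hypothetical instance: inputs, outputs and group assignments are invented.
wp5 = WorkPackage(
    wp_id="WP5",
    name="annotation quality management",
    processes=[
        Process("P5.3a", "annotator quality test",
                ProcessGroup.CONTROLLING, "quality",
                inputs=["sample annotations"], outputs=["test report"]),
        Process("P5.3b", "annotator quality upgrade training",
                ProcessGroup.IMPLEMENTING, "resource",
                inputs=["test report"], outputs=["retrained annotators"]),
    ],
)

controlling_ids = [p.process_id for p in wp5.by_group(ProcessGroup.CONTROLLING)]
```

In this sketch, `controlling_ids` holds `["P5.3a"]`. A fuller mapping of subject groups to processes would
follow Table A.1 rather than the invented values used here.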
4.2 Project organization and role
4.2.1 General
A corpus annotation project consists of a set of work packages as shown in Figure 1. The project manager
supervises the work package managers with the help of project technical managers. A work package works
as a collection of processes (or activities or tasks) under the supervision of a work package manager. A
process is carried out by the team members assigned to it, under the supervision of a process team leader.
Figure 1 — Project organization
4.2.2 Project manager
The project manager is responsible for the project’s integrated management, supervision and
decision-making. In a corpus annotation project in particular, this includes approving the annotation guide
and its distribution, optionally together with the project technical manager(s). The project manager has the
required competence in natural language processing, data collection for machine learning purposes and
corpus linguistics.
4.2.3 Project technical manager
The project technical manager takes care of the deliverables for each stage of the project and manages the
schedule of each work package. This position oversees administrative support and communication with
the work package managers. The project technical manager shall have experience in participating in data
development projects and natural language processing/machine learning.
4.2.4 Work package manager
The work package manager is responsible for managing the processes belonging to the work package,
administrative work, reporting the output of the processes and communicating with the process team
leader under the work package.
4.2.5 Process team leader
The process team leader is responsible for process scheduling, deliverable management and reporting, and
communication among process team members. Additionally, the team leader is responsible for estimating
the resources required to perform the process and requesting support from the work package manager.
The process team leader and the work package manager may participate in the actual processes with project
team members, and raise practical issues (e.g. about applying the annotation guidelines) and, if necessary,
suggest revisions.
4.2.6 Team member
A team member has expertise in the process, participates in training sessions and maintains the
required qualifications.
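The supervision relationships described in 4.2 can be pictured as a simple chain. The following is a minimal
sketch under one reading of the text: team members report to a process team leader, who reports to a work
package manager, who reports to the project manager. The names `SUPERVISOR` and `escalation_path` are
hypothetical. The project technical manager assists the project manager and is left out of the chain here,
which is also an assumption.

```python
from typing import Optional

# Assumed reporting chain, read from 4.2.1 to 4.2.6 (illustrative only).
SUPERVISOR: dict[str, Optional[str]] = {
    "team member": "process team leader",
    "process team leader": "work package manager",
    "work package manager": "project manager",
    "project manager": None,  # top of the chain
}


def escalation_path(role: str) -> list[str]:
    """Return the chain of roles from `role` up to the project manager."""
    if role not in SUPERVISOR:
        raise ValueError(f"unknown role: {role!r}")
    path = [role]
    while SUPERVISOR[path[-1]] is not None:
        path.append(SUPERVISOR[path[-1]])
    return path
```

For example, `escalation_path("team member")` lists all four roles in order, ending with the project manager.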
4.3 Project management process groups for corpus annotation project
Each process belongs to one of the initiation, planning, implementation, control or closing process groups.
As shown in Figure 2, an annotation project does not proceed strictly in the order of initiation, planning,
implementation, control and closing. Instead, process groups are repeated and re-executed whenever a
change issue appears.
Figure 2 — Interaction among project management process groups
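The iterative behaviour described above can be illustrated with a short simulation. This is a sketch under
stated assumptions, not a rendering of Figure 2. It assumes that change issues are detected in the control
process group and that each one sends the project back to an earlier group (planning by default). The
function name `run_project` and its parameters are hypothetical.

```python
ORDER = ["initiation", "planning", "implementation", "control", "closing"]


def run_project(change_issues: int, reentry: str = "planning") -> list[str]:
    """Return the sequence of process groups visited by a project.

    Assumption: every change issue is raised during "control" and sends
    the project back to the `reentry` group before it can close.
    """
    reentry_index = ORDER.index(reentry)  # raises ValueError if unknown
    trace: list[str] = []
    remaining = change_issues
    i = 0
    while i < len(ORDER):
        group = ORDER[i]
        trace.append(group)
        if group == "control" and remaining > 0:
            remaining -= 1
            i = reentry_index  # repeat from the re-entry group
        else:
            i += 1
    return trace
```

With no change issues, `run_project(0)` visits the five groups once, in order. With one change issue,
`run_project(1)` visits planning, implementation and control a second time before closing.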
4.4 Corpus annotation project work package and process
4.4.1 General
Each work package specializes in one of the core topics of the corpus annotation project: annotation
guidelines, annotation work, annotation quality, or annotation environment and system. The integrated
management work package manages the coordination among work packages. The work packages are as
follows:
— project integration management (WP1);
— communication (WP2);
— annotation guide authoring, updating and licensing (WP3);
— annotation work (WP4);
— annotation project quality control (WP5);
— annotation environment and system (WP6).
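The list above can be kept as a simple lookup table. The following sketch also assumes that the first number
in a process identifier is the number of the owning work package. That assumption is consistent with the
identifiers cited in 4.1 (e.g. P3.3 for WP3 and P2.3a for WP2) but is not stated in this excerpt. The names
`WORK_PACKAGES` and `owning_work_package` are hypothetical.

```python
import re

# Work packages as listed in 4.4.1.
WORK_PACKAGES = {
    "WP1": "project integration management",
    "WP2": "communication",
    "WP3": "annotation guide authoring, updating and licensing",
    "WP4": "annotation work",
    "WP5": "annotation project quality control",
    "WP6": "annotation environment and system",
}

# Assumed identifier shape: "P", work package number, ".", process number,
# and an optional lowercase letter suffix (e.g. "P5.3a").
_PROCESS_ID = re.compile(r"P([1-6])\.\d+[a-z]?")


def owning_work_package(process_id: str) -> str:
    """Return the work package identifier implied by a process identifier."""
    match = _PROCESS_ID.fullmatch(process_id)
    if match is None:
        raise ValueError(f"unrecognized process identifier: {process_id!r}")
    return f"WP{match.group(1)}"
```

Under this assumption, `WORK_PACKAGES[owning_work_package("P5.3a")]` gives the name of WP5.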
4.4.2 Integrated management of corpus annotation project
4.4.2.1 WP1. Project integration management work package
WP1 is responsible for integrated management and administrative support throughout the project life cycle,
including forming and managing the project team, overseeing licensing activities related to data, annotation
models, annotation guides and annotation software systems, and coordinating various work packages
...











