Language resource management — Semantic annotation framework (SemAF) — Part 15: Measurable quantitative information extraction (MQIE)

This document establishes a measurable quantitative information extraction (MQIE) scheme, which is based on the semantic annotation scheme specified in ISO 24617-11. It is applicable to the domains of technology that carry more applicational relevance than some theoretical issues found in the ordinary use of language. NOTE ISO 24617-12 deals with more general and theoretical issues of quantification and quantitative information. This document also treats temporal durations that are discussed in ISO 24617-1, and spatial measures such as distances that are treated in ISO 24617-7, while making them interoperable with other measure types. It also accommodates the treatment of measures or amounts that are introduced in ISO 24617-6:2016, 8.3.

Gestion des ressources linguistiques — Cadre d’annotation sémantique (SemAF) — Partie 15: Extraction d’informations quantitatives mesurables (MQIE)

Le présent document établit un schéma d’extraction des informations quantitatives mesurables (MQIE), qui est basé sur le schéma d’annotation sémantique spécifié dans l’ISO 24617-11. Il s’applique aux domaines technologiques qui présentent plus d’intérêt sur le plan de l’application que certains problèmes théoriques rencontrés dans l’utilisation ordinaire du langage. NOTE L’ISO 24617-12 traite des questions plus générales et théoriques de la quantification et de l’information quantitative. Le présent document traite également des durées temporelles qui sont abordées dans l’ISO 24617-1 et des mesures spatiales telles que les distances qui sont traitées dans l’ISO 24617-7, tout en les rendant interopérables avec d’autres types de mesures. Il intègre également le traitement des mesures ou des quantités qui sont introduits dans l’ISO 24617-6:2016, 8.3.

Upravljanje jezikovnih virov - Ogrodje za semantično označevanje (SemAF) - 15. del: Ekstrakcija merljivih kvantitativnih informacij (MQIE)

General Information

Status
Published
Publication Date
30-Apr-2025
Current Stage
6060 - International Standard published
Start Date
01-May-2025
Due Date
20-Jun-2026
Completion Date
01-May-2025
Standard
ISO 24617-15:2025 - Language resource management — Semantic annotation framework (SemAF) — Part 15: Measurable quantitative information extraction (MQIE) Released:1. 05. 2025
English language
15 pages
Standard
ISO 24617-15:2025 - Gestion des ressources linguistiques — Cadre d’annotation sémantique (SemAF) — Partie 15: Extraction d’informations quantitatives mesurables (MQIE) Released:1. 05. 2025
French language
16 pages
Draft
ISO/DIS 24617-15:2024 - BARVE
English language
20 pages

Standards Content (Sample)


International Standard
ISO 24617-15
First edition
2025-05
Language resource management — Semantic annotation framework (SemAF) —
Part 15:
Measurable quantitative information extraction (MQIE)
Gestion des ressources linguistiques — Cadre d’annotation sémantique (SemAF) —
Partie 15: Extraction d’informations quantitatives mesurables (MQIE)
Reference number
© ISO 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 General framework of MQIE
4.1 Overview
4.2 Primary requirements of MQIE
4.3 Framework
4.4 Preprocessing
4.5 Basic element identification
4.6 Link identification
4.7 Measure normalization
4.8 Verification and filtering
5 Examples
5.1 General
5.2 Sample data
5.3 Procedure of extraction
5.3.1 Overview
5.3.2 Preprocessing
5.3.3 Basic element extraction
5.3.4 Link identification
5.3.5 Measure normalization
5.3.6 Verification and filtering
Annex A (informative) Examples of applications extended based on MQIE
Annex B (informative) Informal statements of MQI during extraction
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
Measurable quantitative information (MQI) describes one of the basic properties associated with the
magnitude aspect of quantity, and is very common in ordinary language. The main characteristic of MQI, as
described in ISO 24617-11, is that quantitative information is presented as measures, each expressed as
a pair of a numerically expressed quantity and a unit. Such information is even more abundant in scientific
publications and technical reports, to the extent that it constitutes an essential part of communicative
segments of language in general. The processing of such information is thus required for any successful
language resource management.
In the era of big data, demands from industry and academic communities for accurate extraction of MQI
have increased.[1] For example, business investment companies frequently need to identify and aggregate
various information covering net sales, gross profit, operating expenses, operating profit, interest expense,
net profit before taxes, net income, etc., of the target companies from their annual reports. The fast-growing
field of medical informatics research also needs to process large amounts of medical text to analyse the dose of
medicine, the eligibility criteria of clinical trials, the phenotypic characteristics of patients, the laboratory tests in
clinical records, etc.[2][3] All these demands, whether in industry or in medical research, require the effective
extraction of MQI for automated identification, aggregation, computation and analysis.[4]
However, in the information retrieval (IR) and natural language processing (NLP) areas, there is currently no
standardized way of extracting measurable quantitative information. Each application system developed
in industrial sectors has hitherto used common NLP models or its own models to identify measurable
quantitative information from unstructured text. There is currently no standard extraction procedure for
ensuring the quality of the extraction. A general, interoperable and standardized measurable quantitative
information extraction scheme that allows IR and NLP tasks to work with many different application systems is
therefore called for.
This document formulates a general extraction scheme while following the basic requirements of semantic
annotation laid down in ISO 24617-11, which facilitates the annotation of MQI in scientific and technical
language and makes it interoperable with other semantic annotation schemes such as those given in the
parts of the ISO 24617 series. The extraction scheme also utilizes various International Standards on lexical
resources and morpho-syntactic annotation frameworks. It aims at being compatible with other existing
relevant standards such as ISO 24617-9.
NOTE ISO 24617-11 provides a standardized schema of annotating measurable quantitative information from
unstructured text.
Focusing on measurements in scientifico-technological language, this document is expected to contribute
to information retrieval (IR), question answering (QA), text summarization (TS) and other natural language
processing (NLP) applications.[5][6][7]

International Standard ISO 24617-15:2025(en)
Language resource management — Semantic annotation
framework (SemAF) —
Part 15:
Measurable quantitative information extraction (MQIE)
1 Scope
This document establishes a measurable quantitative information extraction (MQIE) scheme, which is based
on the semantic annotation scheme specified in ISO 24617-11. It is applicable to the domains of technology
that carry more applicational relevance than some theoretical issues found in the ordinary use of language.
NOTE ISO 24617-12 deals with more general and theoretical issues of quantification and quantitative information.
This document also treats temporal durations that are discussed in ISO 24617-1, and spatial measures such
as distances that are treated in ISO 24617-7, while making them interoperable with other measure types. It
also accommodates the treatment of measures or amounts that are introduced in ISO 24617-6:2016, 8.3.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 24617-6:2016, Language resource management — Semantic annotation framework — Part 6: Principles of
semantic annotation (SemAF Principles)
ISO 24617-11:2021, Language resource management — Semantic annotation framework (SemAF) — Part 11:
Measurable quantitative information (MQI)
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24617-6:2016, ISO 24617-11:2021
and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
information extraction
IE
identifying specific structured information from natural language, semi-structured texts and/or other
electronic text sources
3.2
MQI
measurable quantitative information
quantitative information that can be expressed in unitized numeric terms
[SOURCE: ISO 24617-11:2021, 3.5]
3.3
MQIE
measurable quantitative information extraction
process of identifying measurable quantitative information (3.2) from natural language, semi-structured
texts and/or other electronic text sources
3.4
normalization
process that represents objective information with a formal and/or regular format or converts the
information into a consistent value range
Note 1 to entry: The normalization objectives may contain information of entities, measure units and quantities.
4 General framework of MQIE
4.1 Overview
MQIE usually follows one of four types of strategy:
a) manual extraction strategy;
b) semi-automated extraction strategy;
c) automated extraction strategy;
d) hybrid extraction strategy.
The automated extraction strategy usually involves rule-based methods, other machine-learning-based
methods and deep-learning-based methods. This document thus includes the four types of extraction
strategies and specifies a general framework for automated MQIE.
4.2 Primary requirements of MQIE
MQIE shall adhere to the following requirements:
a) generalable: the extraction process shall be general and adaptive to most kinds of MQI extraction tasks
with necessary but slight modification or tuning;
b) independable: the extraction process shall not depend on special domains, subjects and languages;
c) completable: the extraction process shall be able to identify all elements and links of MQI properly from
text by using information extraction related technologies;
d) normalizable: International System of Units, decimalism and other metric systems shall be adapted to
normalize the extracted quantities, values and units;
e) compatible: the extraction result shall be structured and shall be transferable into MQI-compliant
formats unambiguously;
f) assessable: the extraction result shall be evaluated by widely accepted metrics or criteria;
g) interpretable: the extraction process and the result shall be reasonably explainable upon request.
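
Requirement f) does not name particular metrics. As a minimal sketch, assuming that exact-match precision, recall and F1 over extracted tuples are acceptable "widely accepted metrics", an evaluation can be written as follows. The tuple layout (entity, relator, numeral, unit) and the function name are hypothetical and are not taken from this document.

```python
from typing import Iterable, Tuple

# Hypothetical tuple layout for one extracted MQI item:
# (entity, relator, numeral, unit). Not prescribed by the standard.
MqiTuple = Tuple[str, str, str, str]


def precision_recall_f1(predicted: Iterable[MqiTuple],
                        gold: Iterable[MqiTuple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of extracted tuples."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("BMI", "less than", "18.5", "kg/m^2"), ("age", ">", "30", "years")]
    predicted = [("BMI", "less than", "18.5", "kg/m^2"), ("age", ">", "3", "years")]
    print(precision_recall_f1(predicted, gold))
```

Exact matching is the strictest choice; partial-match or per-element scoring would be an equally defensible assumption.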

4.3 Framework
The overall framework of MQIE is represented by the workflow in Figure 1.
Figure 1 — General framework of MQIE
This framework shall consist of the following five general and mandatory steps, represented as square
boxes labelled “MQIE” in the middle of Figure 1:
a) preprocessing: to make the format consistent and to remove noise from the original natural language
text as sources for extraction;
b) basic element identification: to extract three types of basic elements: entity, measure and relator from
text sources;
c) link identification: to extract two types of links: measure link and comparison link;
d) measure normalization: to convert and normalize identified measures into consistent or target ones;
e) verification and filtering: to verify the correctness of identified MQI and filter out incorrect ones.
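
The five steps above can be read as a fixed-order pipeline. The following is a minimal sketch of that reading only: the step names come from the list, while the function `run_mqie` and the idea of passing each step as a callable are assumptions made for illustration, not part of the framework.

```python
from typing import Any, Callable, List, Tuple

# Step names taken from the list above; the callables are placeholders.
STEP_NAMES = [
    "preprocessing",
    "basic element identification",
    "link identification",
    "measure normalization",
    "verification and filtering",
]


def run_mqie(text: str, steps: List[Callable[[Any], Any]]) -> Tuple[Any, List[str]]:
    """Apply the five steps in order; return the final result and a trace."""
    if len(steps) != len(STEP_NAMES):
        # All five steps are mandatory, so a shorter or longer list is rejected.
        raise ValueError("exactly five steps are required")
    data: Any = text
    trace: List[str] = []
    for name, step in zip(STEP_NAMES, steps):
        data = step(data)       # output of one step feeds the next
        trace.append(name)      # keep a record, which helps interpretability
    return data, trace


if __name__ == "__main__":
    identity = lambda value: value  # stand-in for a real step
    result, trace = run_mqie("BMI less than 18.5 kg/m^2", [identity] * 5)
    print(result, trace)
```

Recording a trace is one simple way to support the "interpretable" requirement; a real system would log intermediate results as well.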
As described in ISO 24617-11, the element “entity” shall be any object which has the property of measurable
quantity, represented by “@quantity”, as one of its properties. The “entity”, as is used in this document, shall
be a very general term that refers to any object, not just individual entities, but also to their properties, such
as “height” of a building or “speed” of a car, and also to any kinds of eventualities such as states, processes or
transitions.
EXAMPLE “White blood cell count > 14.0 × 10 / L”.
“White blood cell count” describes an entity. “14.0 × 10 / L” describes a measure consisting of two attributes @numeral
(“14.0 × 10 ”) and @unit (“L”). “>” describes a relator relation (“larger than”). A measure link and a comparison link
are triggered by the measure and by the relator, respectively. The composition of the example and its related parts for
extraction are illustrated in Figure 2.

Figure 2 — Example of the basic element and link compositions of MQI for extraction
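
The basic elements and links described above can be held in a small data structure. This is a sketch under assumptions: the class and field names are hypothetical, the attributes actually required are defined in ISO 24617-11, and the statement used ("BMI less than 18.5 kg/m^2") is an illustrative input rather than the example shown in Figure 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str


@dataclass(frozen=True)
class Measure:
    numeral: str  # corresponds to the @numeral attribute
    unit: str     # corresponds to the @unit attribute


@dataclass(frozen=True)
class Relator:
    text: str
    meaning: str


@dataclass(frozen=True)
class MeasureLink:
    """Link triggered by the measure (field layout is an assumption)."""
    entity: Entity
    measure: Measure


@dataclass(frozen=True)
class ComparisonLink:
    """Link triggered by the relator (field layout is an assumption)."""
    relator: Relator
    entity: Entity
    measure: Measure


def build_example():
    """Build both links for the illustrative statement."""
    entity = Entity("BMI")
    measure = Measure(numeral="18.5", unit="kg/m^2")
    relator = Relator(text="less than", meaning="smaller than")
    return MeasureLink(entity, measure), ComparisonLink(relator, entity, measure)


if __name__ == "__main__":
    measure_link, comparison_link = build_example()
    print(measure_link)
    print(comparison_link)
```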
4.4 Preprocessing
Original raw text commonly contains noise or inconsistent content and thus needs to be preprocessed. The
step processes raw text by text cleaning, sentence splitting, special symbol conversion, etc., to achieve usable
text for MQI extraction. It removes inconsistent character encodings, normalizes the representation of
special symbols, deletes blank spaces and corrects typos based on commonly used digit grouping delimiters
in MQI. Numerals in word form are checked and converted to normal Arabic format. The text is then split
into sentences and each sentence is checked to determine whether it contains numerals using a common
matching strategy, e.g. using regular expressions. Only the sentences containing numerals are retained for
further extraction.
EXAMPLE 1 Cleaning redundant blank spaces: changing “30 years  of age” to “30 years of age”.
EXAMPLE 2 Removing inconsistent character coding and replacing special symbols with normalized ones:
replacing “cm³” with “cm^3”.
EXAMPLE 3 Rectifying typos in numerical representations: changing the typo “18,.5” in “BMI less than 18,.5 kg/m^2”
to “18,5”.
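
The preprocessing operations described in this subclause can be sketched with regular expressions. The lookup tables, the function name `preprocess` and the sentence-splitting rule below are assumptions chosen for brevity; in particular, the number-word conversion is deliberately crude and would need a proper parser in practice.

```python
import re
from typing import List

# Small illustrative tables (assumptions); real systems need far larger ones.
SYMBOL_MAP = {"\u00b3": "^3", "\u00b2": "^2"}  # superscript three and two
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "ten": "10", "thirty": "30"}
NUMBER_WORD_RE = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def preprocess(text: str) -> List[str]:
    """Clean raw text and return only the sentences that contain numerals."""
    text = text.replace("\u00a0", " ")  # non-breaking space to plain space
    for raw, normalized in SYMBOL_MAP.items():
        text = text.replace(raw, normalized)
    # Repeated digit delimiters between digits: keep only the first one,
    # so that "18,.5" becomes "18,5" as in EXAMPLE 3.
    text = re.sub(r"(?<=\d)([.,])[.,]+(?=\d)", r"\1", text)
    # Numerals in word form to Arabic digits (crude, table-driven).
    text = NUMBER_WORD_RE.sub(lambda m: NUMBER_WORDS[m.group(1).lower()], text)
    # Collapse redundant blank spaces, as in EXAMPLE 1.
    text = re.sub(r"\s+", " ", text).strip()
    # Naive sentence splitting on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # Retain only the sentences that contain at least one digit.
    return [s for s in sentences if re.search(r"\d", s)]


if __name__ == "__main__":
    raw = "Patients  30 years  of age. No numbers here. BMI less than 18,.5 kg/m\u00b2."
    print(preprocess(raw))
```

Symbol normalization runs before the delimiter fix and the whitespace collapse so that later steps see a consistent character set.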
4.5 Basic element identification
The basic element identification step covers at least the following four tasks:
— entity identification;
— numeral identification;
— unit identification;
— relator identification.
Entity identification extracts, from the preprocessed text, the entities that have quantitative properties.
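
For simple statements of the form "entity relator numeral unit", the four identification tasks can be sketched with one rule-based pattern. This is only an illustration of a rule-based method: the relator table, the regular expression and the function name are assumptions, and realistic text would need a far richer approach.

```python
import re
from typing import Dict, Optional

# Hypothetical relator table; the glosses are assumptions for illustration.
RELATORS = {
    ">=": "larger than or equal to",
    "<=": "smaller than or equal to",
    ">": "larger than",
    "<": "smaller than",
    "greater than": "larger than",
    "less than": "smaller than",
}

# Longest relators first so that ">=" is tried before ">".
_RELATOR_ALTERNATIVES = "|".join(
    re.escape(r) for r in sorted(RELATORS, key=len, reverse=True)
)

PATTERN = re.compile(
    rf"(?P<entity>[A-Za-z][A-Za-z ]*?)\s*(?P<relator>{_RELATOR_ALTERNATIVES})\s*"
    rf"(?P<numeral>\d+(?:[.,]\d+)?)\s*(?P<unit>[A-Za-z%][A-Za-z0-9/^%]*)?"
)


def identify_basic_elements(sentence: str) -> Optional[Dict[str, Optional[str]]]:
    """Return entity, relator, numeral and unit, or None if nothing matches."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    relator = match.group("relator")
    return {
        "entity": match.group("entity").strip(),
        "relator": relator,
        "relator_meaning": RELATORS[relator],
        "numeral": match.group("numeral"),
        "unit": match.group("unit"),  # None when no unit follows the numeral
    }


if __name__ == "__main__":
    print(identify_basic_elements("BMI less than 18.5 kg/m^2"))
```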
...


Norme internationale
ISO 24617-15
Première édition
2025-05
Gestion des ressources linguistiques — Cadre d’annotation sémantique (SemAF) —
Partie 15:
Extraction d’informations quantitatives mesurables (MQIE)
Language resource management — Semantic annotation framework (SemAF) —
Part 15: Measurable quantitative information extraction (MQIE)
Numéro de référence
DOCUMENT PROTÉGÉ PAR COPYRIGHT
© ISO 2025
Tous droits réservés. Sauf prescription différente ou nécessité dans le contexte de sa mise en œuvre, aucune partie de cette
publication ne peut être reproduite ni utilisée sous quelque forme que ce soit et par aucun procédé, électronique ou mécanique,
y compris la photocopie, ou la diffusion sur l’internet ou sur un intranet, sans autorisation écrite préalable. Une autorisation peut
être demandée à l’ISO à l’adresse ci-après ou au comité membre de l’ISO dans le pays du demandeur.
ISO copyright office
Case postale 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Genève
Tél.: +41 22 749 01 11
E-mail: copyright@iso.org
Web: www.iso.org
Publié en Suisse
Sommaire
Avant-propos
Introduction
1 Domaine d’application
2 Références normatives
3 Termes et définitions
4 Cadre général de MQIE
4.1 Vue d’ensemble
4.2 Exigences principales de MQIE
4.3 Cadre
4.4 Prétraitement
4.5 Identification des éléments de base
4.6 Identification des liens
4.7 Normalisation des mesures
4.8 Vérification et filtrage
5 Exemples
5.1 Généralités
5.2 Échantillons de données
5.3 Mode opératoire d’extraction
5.3.1 Vue d’ensemble
5.3.2 Prétraitement
5.3.3 Extraction des éléments de base
5.3.4 Identification des liens
5.3.5 Normalisation des mesures
5.3.6 Vérification et filtrage
Annexe A (informative) Exemples d’applications étendues basées sur MQIE
Annexe B (informative) Énoncés informels de MQI pendant l’extraction
Bibliographie

Avant-propos
L’ISO (Organisation internationale de normalisation) est une fédération mondiale d’organismes nationaux
de normalisation (comités membres de l’ISO). L’élaboration des Normes internationales est en général
confiée aux comités techniques de l’ISO. Chaque comité membre intéressé par une étude a le droit de faire
partie du comité technique créé à cet effet. Les organisations internationales, gouvernementales et non
gouvernementales, en liaison avec l’ISO participent également aux travaux. L’ISO collabore étroitement avec
la Commission électrotechnique internationale (IEC) en ce qui concerne la normalisation électrotechnique.
Les procédures utilisées pour élaborer le présent document et celles destinées à sa mise à jour sont
décrites dans les Directives ISO/IEC, Partie 1. Il convient, en particulier, de prendre note des différents
critères d’approbation requis pour les différents types de documents ISO. Le présent document a
été rédigé conformément aux règles de rédaction données dans les Directives ISO/IEC, Partie 2 (voir
www.iso.org/directives).
L’ISO attire l’attention sur le fait que la mise en application du présent document peut entraîner l’utilisation
d’un ou de plusieurs brevets. L’ISO ne prend pas position quant à la preuve, à la validité et à l’applicabilité de
tout droit de brevet revendiqué à cet égard. À la date de publication du présent document, l’ISO n’avait pas
reçu notification qu’un ou plusieurs brevets pouvaient être nécessaires à sa mise en application. Toutefois,
il y a lieu d’avertir les responsables de la mise en application du présent document que des informations
plus récentes sont susceptibles de figurer dans la base de données de brevets, disponible à l’adresse
www.iso.org/brevets. L’ISO ne saurait être tenue pour responsable de ne pas avoir identifié tout ou partie de
tels droits de propriété.
Les appellations commerciales éventuellement mentionnées dans le présent document sont données pour
information, par souci de commodité, à l’intention des utilisateurs et ne sauraient constituer un engagement.
Pour une explication de la nature volontaire des normes, la signification des termes et expressions
spécifiques de l’ISO liés à l’évaluation de la conformité, ou pour toute information au sujet de l’adhésion de
l’ISO aux principes de l’Organisation mondiale du commerce (OMC) concernant les obstacles techniques au
commerce (OTC), voir www.iso.org/avant-propos.
Le présent document a été élaboré par le comité technique ISO/TC 37, Langage et terminologie,
sous-comité SC 4, Gestion des ressources linguistiques.
Une liste de toutes les parties de la série ISO 24617 se trouve sur le site web de l’ISO.
Il convient que l’utilisateur adresse tout retour d’information ou toute question concernant le présent
document à l’organisme national de normalisation de son pays. Une liste exhaustive desdits organismes se
trouve à l’adresse www.iso.org/fr/members.html.

Introduction
Les informations quantitatives mesurables (MQI, Measurable Quantitative Information) décrivent l’une des
propriétés de base qui est associée à l’aspect quantitatif d’une grandeur. Elles sont très courantes dans le
langage ordinaire. Les principales caractéristiques de la norme MQI, telles que décrites dans l’ISO 24617-11,
sont que les informations quantitatives sont présentées sous forme de mesures exprimées sous forme de
paire, consistant en une grandeur exprimée numériquement et une unité. Ces informations sont beaucoup
plus abondantes dans les publications scientifiques ou les rapports techniques au point qu’elles constituent
une part essentielle des segments communicatifs du langage en général. Le traitement de ces informations
est donc nécessaire pour une gestion réussie des ressources linguistiques.
À l’époque du «big data», les demandes de l’industrie et des milieux universitaires pour une extraction précise
des MQI ont augmenté[1]. Par exemple, les sociétés d’investissement dans les entreprises ont fréquemment
besoin d’identifier et d’agréger différentes informations couvrant les ventes nettes, la marge brute, les frais
d’exploitation, le bénéfice d’exploitation, les frais d’intérêt, le bénéfice net avant impôts, le revenu net, etc.
des sociétés cibles à partir de leurs rapports annuels. La recherche en informatique médicale, en plein essor,
a également besoin de traiter une grande quantité de textes médicaux pour analyser la dose de médicament,
les critères d’éligibilité des essais cliniques, les caractères phénotypiques des patients, les essais en
laboratoire dans les dossiers cliniques, etc.[2][3] Toutes ces demandes, qu’elles soient liées à l’industrie ou
à la recherche médicale, exigent l’extraction efficace des MQI afin de permettre une identification, une
agrégation, un calcul et une analyse automatisés[4].
Cependant, dans les domaines de la recherche d’informations et du traitement du langage naturel, il
n’existe actuellement aucun moyen normalisé d’extraire les informations quantitatives mesurables. Chaque
système d’application développé dans les secteurs industriels utilise jusqu’à présent des modèles communs
de traitement automatique des langues (TAL) ou ses propres modèles pour identifier les informations
quantitatives mesurables à partir de textes non structurés. À l’heure actuelle, il n’existe aucun mode
opératoire d’extraction normalisé permettant de garantir la qualité de l’extraction. Un schéma d’extraction
des informations quantitatives mesurables qui soit général, interopérable et normalisé est nécessaire pour
permettre aux tâches de recherche d’informations (IR) et de traitement automatique des langues (TAL)
de fonctionner avec de nombreux systèmes d’application différents.
Le présent document formule un schéma d’extraction général en suivant les exigences de base de l’annotation
sémantique définies dans l’ISO 24617-11, qui facilite l’annotation des MQI dans le langage scientifique et
technique et le rend interopérable avec d’autres schémas d’annotation sémantique tels que ceux décrits
dans les parties de la série ISO 24617. Le schéma d’extraction s’appuie également sur diverses Normes
internationales relatives aux ressources lexicales et aux cadres d’annotation morpho-syntaxique. Il vise à
être compatible avec les autres normes pertinentes existantes telles que l’ISO 24617-9.
NOTE L’ISO 24617-11 fournit un schéma normalisé d’annotation des informations quantitatives mesurables à
partir de textes non structurés.
Axé sur les mesures en langage scientifico-technologique, le présent document est censé contribuer aux
applications de recherche d’informations (IR), de question-réponse (QA), de résumé de texte (TS) et autres
applications de traitement automatique des langues (TAL)[5][6][7].

Norme internationale ISO 24617-15:2025(fr)
Gestion des ressources linguistiques — Cadre d’annotation
sémantique (SemAF) —
Partie 15:
Extraction d’informations quantitatives mesurables (MQIE)
1 Domaine d’application
Le présent document établit un schéma d’extraction des informations quantitatives mesurables (MQIE),
qui est basé sur le schéma d’annotation sémantique spécifié dans l’ISO 24617-11. Il s’applique aux domaines
technologiques qui présentent plus d’intérêt sur le plan de l’application que certains problèmes théoriques
rencontrés dans l’utilisation ordinaire du langage.
NOTE L’ISO 24617-12 traite des questions plus générales et théoriques de la quantification et de l’information
quantitative.
Le présent document traite également des durées temporelles qui sont abordées dans l’ISO 24617-1
et des mesures spatiales telles que les distances qui sont traitées dans l’ISO 24617-7, tout en les rendant
interopérables avec d’autres types de mesures. Il intègre également le traitement des mesures ou des
quantités qui sont introduits dans l’ISO 24617-6:2016, 8.3.
2 Références normatives
Les documents suivants sont cités dans le texte de sorte qu’ils constituent, pour tout ou partie de leur
contenu, des exigences du présent document. Pour les références datées, seule l’édition citée s’applique. Pour
les références non datées, la dernière édition du document de référence s’applique (y compris les éventuels
amendements).
ISO 24617-6:2016, Gestion des ressources linguistiques — Cadre d’annotation sémantique — Partie 6: Principes
d’annotation sémantique (SemAF Principes)
ISO 24617-11:2021, Gestion des ressources linguistiques — Cadre d’annotation sémantique (SemAF) — Partie
11: Informations quantitatives mesurables (MQI)
3 Termes et définitions
Pour les besoins du présent document, les termes et les définitions de l’ISO 24617-6:2016, l’ISO 24617-11:2021
ainsi que les suivants s’appliquent.
L’ISO et l’IEC tiennent à jour des bases de données terminologiques destinées à être utilisées en normalisation,
consultables aux adresses suivantes:
— ISO Online browsing platform: disponible à l’adresse https://www.iso.org/obp
— IEC Electropedia: disponible à l’adresse https://www.electropedia.org/
3.1
extraction d’informations
IE
identification d’informations structurées spécifiques à partir de textes semi-structurés en langage naturel
et/ou d’autres sources textuelles électroniques

3.2
MQI
information quantitative mesurable
information quantitative qui peut être exprimée en termes numériques unitaires
[SOURCE: ISO 24617-11:2021, 3.5]
3.3
MQIE
extraction d’informations quantitatives mesurables
processus d’identification d’informations quantitatives mesurables (3.2) à partir de textes semi-structurés en
langage naturel et/ou d’autres sources textuelles électroniques
3.4
normalisation
processus qui représente des informations objectives dans un format formel et/ou régulier ou qui convertit
les informations en une plage de valeurs cohérente
Note 1 à l'article: Les objectifs de la normalisation peuvent contenir des informations sur des entités, des unités de
mesure et des grandeurs.
4 Cadre général de MQIE
4.1 Vue d’ensemble
MQIE contient généralement quatre types de stratégies, notamment:
a) stratégie d’extraction manuelle;
b) stratégie d’extraction semi-automatisée;
c) stratégie d’extraction automatisée;
d) stratégie d’extraction hybride.
La stratégie d’extraction automatisée implique généralement des méthodes basées sur des règles, d’autres
méthodes basées sur l’apprentissage machine et des méthodes basées sur l’apprentissage profond. Le présent
document comprend ainsi les quatre types de stratégies d’extraction et spécifie un cadre général de MQIE
automatisée.
4.2 Exigences principales de MQIE
Le processus de MQIE doit respecter les exigences suivantes:
a) généralisable: le processus d’extraction doit être général et adaptable à la plupart des types de tâches
d’extraction de MQI, moyennant de légères modifications ou ajustements nécessaires;
b) indépendant: le processus d’extraction ne doit pas dépendre de domaines, de sujets et de langues
particuliers;
c) exhaustif: le processus d’extraction doit être capable d’identifier correctement tous les éléments et liens
relatifs aux MQI à partir d’un texte, en utilisant des technologies liées à l’extraction d’informations;
d) normalisable: le Système international d’unités, le système décimal et tout autre système métrique
doivent être adaptés pour normaliser les grandeurs, valeurs et unités extraites;
e) compatible: le résultat de l’extraction doit être structuré et doit pouvoir être converti sans ambiguïté
dans des formats compatibles avec la norme MQI;
f) évaluable: le résultat de l’extraction doit être évalué selon des mesures ou des critères largement
acceptés;
g) interprétable: le processus d’extraction et le résultat doivent être raisonnablement expliqués sur
demande.
4.3 Framework
The general framework of MQIE is represented by the workflow diagram in Figure 1.
Figure 1 — General framework of MQIE
This framework shall consist of the following five general and mandatory steps, represented by the
rectangular boxes labelled “MQIE” in the middle of Figure 1:
a) Preprocessing: make the format consistent and remove noise from the original natural language text
used as the source for extraction;
b) Basic element identification: extract three types of basic elements (entity, measure and relator)
from text sources;
c) Link identification: extract two types of links: measure link and comparison link;
d) Measure normalization: convert and normalize the identified measures into consistent or target measures;
e) Verification and filtering: verify the correctness of the identified MQI and filter out incorrect ones.
As described in ISO 24617-11, the element “entity” shall be any object that has a measurable quantity,
represented by “@quantity”, as one of its properties. The “entity”, as used in this document, shall be a
very general term that refers to any object: not only individual entities but also their properties, such
as the “height” of a building or the “speed” of a car, as well as all kinds of eventualities such as
states, processes or transitions.
EXAMPLE “White blood cell > 14.0 × 10 / L”.
The term “White blood cell” denotes an entity. “14.0 × 10 / L” describes a measure consisting of two
attributes, @numeral (“14.0 × 10 ”) and @unit (“L”). “>” describes a relator (“larger than”). A measure link
and a comparison link are triggered by the measure and by the relator, respectively. The composition of
the example and its related parts for extraction is illustrated in Figure 2.

Figure 2 — Example of the basic element and link compositions of MQI for extraction
4.4 Preprocessing
The original raw text commonly contains noise or inconsistent content and therefore needs to be
preprocessed. This step processes the raw text by cleaning it, splitting sentences, converting special
symbols, etc., to obtain text that is usable for MQI extraction. It removes inconsistent character
encodings, normalizes the representation of special symbols, removes blank spaces, and corrects typos on
the basis of the digit group delimiters commonly used in MQI. Numerical values written as words are
checked and converted into regular Arabic numerals. The text is then split into sentences, and each
sentence is checked for numerical values using a common matching strategy, for example regular
expressions. Only sentences containing numerical values are retained for further extraction.
EXAMPLE 1 Clean up redundant blank spaces: replace “30  years” with “30 years”.
EXAMPLE 2 Remove inconsistent character encodings and replace special symbols with normalized symbols:
replace cm³ with cm^3.
EXAMPLE 3 Correct typos in numerical representations: replace the typo “18,.5” in “BMI lower than
18,.5 kg/m^2” with “18.5”.
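The cleaning and filtering described in 4.4 can be sketched in Python as follows. This is a minimal illustration and not part of the standard: the function names, the symbol table and the naive sentence splitter are all assumptions, and the conversion of number words into digits is omitted.

```python
import re

# Hypothetical symbol table: superscript characters mapped to caret notation.
SPECIAL_SYMBOLS = {"\u00b3": "^3", "\u00b2": "^2"}


def clean_text(text: str) -> str:
    """Normalize special symbols, repair doubled delimiters, collapse spaces."""
    for symbol, replacement in SPECIAL_SYMBOLS.items():
        text = text.replace(symbol, replacement)
    # A doubled delimiter between digits ("18,.5") becomes one decimal point.
    text = re.sub(r"(?<=\d)[.,]{2}(?=\d)", ".", text)
    # Runs of spaces or tabs collapse to a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def numeric_sentences(text: str) -> list[str]:
    """Keep only the cleaned sentences that contain at least one digit."""
    return [s for s in split_sentences(clean_text(text)) if re.search(r"\d", s)]
```

Because the splitter only breaks where whitespace follows the punctuation mark, a decimal point inside a number such as 18.5 does not end a sentence. A production system would likely need a more robust splitter and a language-specific treatment of digit group delimiters.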
4.5 Basic element identification
The basic element identification step covers at least the following four tasks:
— entity identification;
— numeral identification;
— unit identification;
— relator identification.
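As a rough illustration of the four tasks listed above, the following Python sketch identifies a relator, a numeral and a unit with regular expressions and takes the text before them as the entity. The vocabularies, the function name and the entity heuristic are assumptions made for this example only; the standard does not prescribe them, and a real system would need much richer resources.

```python
import re

# Illustrative vocabularies only (assumed, not taken from the standard).
RELATORS = {">=": "larger than or equal to", "<=": "smaller than or equal to",
            ">": "larger than", "<": "smaller than", "=": "equal to"}
UNITS = ["kg/m^2", "cm^3", "years", "kg", "cm", "L"]


def _alternation(items):
    """Build a regex alternation that tries longer items first."""
    return "|".join(re.escape(i) for i in sorted(items, key=len, reverse=True))


NUMERAL_RE = re.compile(r"\d+(?:\.\d+)?")
RELATOR_RE = re.compile(_alternation(RELATORS))
# A unit must not be glued to neighbouring letters.
UNIT_RE = re.compile(r"(?<![A-Za-z])(?:" + _alternation(UNITS) + r")(?![A-Za-z])")


def identify_basic_elements(sentence: str) -> dict:
    """Return the first entity, relator, numeral and unit found in a sentence."""
    relator = RELATOR_RE.search(sentence)
    numeral = NUMERAL_RE.search(sentence)
    # Look for the unit only after the numeral it belongs to.
    unit = UNIT_RE.search(sentence, numeral.end()) if numeral else None
    # Crude entity guess: everything before the first relator or numeral.
    starts = [m.start() for m in (relator, numeral) if m]
    cut = min(starts) if starts else len(sentence)
    return {
        "entity": sentence[:cut].strip() or None,
        "relator": relator.group() if relator else None,
        "numeral": numeral.group() if numeral else None,
        "unit": unit.group() if unit else None,
    }
```

For the input "BMI < 18.5 kg/m^2" the sketch yields the entity "BMI", the relator "<", the numeral "18.5" and the unit "kg/m^2". It handles a single measure per sentence and is meant only to make the division into four tasks concrete.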
Entity extraction consists of identifying, in the preprocessed text, the entities that have quantitative
properties. The extraction can use a single strategy or a hybrid strategy. For example, a hybrid strategy
can include contextual knowledge, domain knowledge and n-gram co-occurrence information, represented
...


SLOVENIAN STANDARD
oSIST ISO/DIS 24617-15:2024
01 November 2024
Upravljanje jezikovnih virov - Ogrodje za semantično označevanje (SemAF) - 15. del: Ekstrakcija merljivih kvantitativnih informacij (MQIE)
Language resource management — Semantic annotation framework (SemAF) — Part 15: Measurable quantitative information extraction (MQIE)
Gestion des ressources linguistiques — Cadre d’annotation sémantique (SemAF) — Partie 15: Extraction d’informations quantitatives mesurables (MQIE)
This Slovenian standard is identical to: ISO/DIS 24617-15
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/DIS 24617-15:2024 en,fr
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

DRAFT International Standard
ISO/DIS 24617-15
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2024-08-13
Voting terminates on: 2024-11-05
Language resource management — Semantic annotation framework (SemAF) — Part 15: Measurable quantitative information extraction (MQIE)
ICS: 01.020
This document is circulated as received from the committee secretariat.
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENTS AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
Reference number ISO/DIS 24617-15:2024(en)
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 General framework of MQIE
4.1 Overview
4.2 Primary requirements of MQIE
4.3 Framework
4.4 Preprocessing
4.5 Basic element identification
4.6 Link identification
4.7 Measure normalization
4.8 Verification and Filtering
5 Examples
5.1 General
5.2 Sample data
5.3 Procedure of extraction
5.3.1 Overview
5.3.2 Pre-processing
5.3.3 Basic element extraction
5.3.4 Link identification
5.3.5 Measure normalization
5.3.6 Verification and Filtering
Annex A (informative) The examples of applications extended based on MQIE
Annex B (informative) Informal statements of MQI during extraction
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent
rights identified during the development of the document will be in the Introduction and/or on the ISO list of
patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
Measurable quantitative information (MQI) describes one of the basic properties associated with the
magnitude aspect of quantity, and is very common in ordinary language. The main characteristic of MQI, as
described in ISO 24617-11 [1], is that quantitative information is presented as measures, each expressed
as a pair of a numerically expressed quantity and a unit. Such information is much more abundant in
scientific publications or technical reports, to the extent that it constitutes an essential part of
communicative segments of language in general. The processing of such information is thus required for
any successful language resource management.
In the era of big data, demand from industry and academic communities for accurate extraction of MQI
has increased [2]. For example, business investment companies frequently need to identify and aggregate
various information covering net sales, gross profit, operating expenses, operating profit, interest expense,
net profit before taxes, net income, etc., of the target companies from their annual reports. Fast-growing
medical informatics research also needs to process large amounts of medical text to analyze the dose of
medicine, the eligibility criteria of clinical trials, the phenotypic characteristics of patients, the lab
tests in clinical records, etc. [3,4] All these demands, whether in industry or in medical research, require
the effective extraction of MQI for automated identification, aggregation, computation, and analysis [5].
However, in the information retrieval and natural language processing areas, no standardized way of
extracting measurable quantitative information is currently available. Each application system developed
in industrial sectors has hitherto used common NLP models or its own models to identify measurable
quantitative information from unstructured text. There is currently no standard extraction procedure for
ensuring the quality of the extraction. A general, interoperable and standardized measurable quantitative
information extraction scheme that allows IR and NLP tasks to work with many different application
systems is called for.
This document, named ‘SemAF-MQIE’, aims to formulate a general extraction scheme that follows the
basic requirements of semantic annotation laid down in ISO 24617-11, which facilitates the annotation
of MQI in scientific and technical language, and to make it interoperable with other semantic annotation
schemes such as those of the ISO 24617 series. The extraction scheme also utilizes various ISO standards
on lexical resources and morpho-syntactic annotation frameworks. It aims to be compatible with other
existing relevant standards.
NOTE ISO 24617-11 has proposed a standardized schema for annotating measurable quantitative information
from unstructured text.
Focusing on measurements in scientific and technological language, this document is expected to
contribute to information retrieval (IR), question answering (QA), text summarization (TS), and other
natural language processing (NLP) applications [6-8].

Language resource management — Semantic annotation
framework (SemAF) —
Part 15:
Measurable quantitative information extraction (MQIE)
1 Scope
This document covers the extraction of the measurable or magnitudinal aspect of quantity so that it can
focus on the technical or practical use of measurements in IR (information retrieval), QA (question
answering), TS (text summarization), and other NLP (natural language processing) applications. It is
applicable to the domains of technology that carry more applicational relevance than some theoretical
issues found in the ordinary use of language.
NOTE ISO 24617-12 deals with more general and theoretical issues of quantification and quantitative information.
This document also treats temporal durations that are discussed in ISO 24617-1, and spatial measures such
as distances that are treated in ISO 24617-7, while making them interoperable with other measure types. It
also accommodates the treatment of measures or amounts that are introduced in ISO 24617-6:2016, 8.3.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 24617-11:2021, Language resource management — Semantic annotation framework (SemAF) — Part 11:
Measurable quantitative information (MQI)
ISO 80000-1:2009, Quantities and units — Part 1: General
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24617-6:2016, ISO 24617-11:2021
and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
information extraction
IE
extracting specific structured information from natural language and/or semi-structured texts and other
electronically represented text sources

3.2
measurable quantitative information extraction
MQIE
extracting measurable quantitative information from natural language and/or semi-structured texts and
other electronically represented text sources
3.3
normalization
process that represents objective information with a formal and/or regular format or converts the
information into a consistent value range
Note 1 to entry: The objects of normalization may include information about entities, measure units and quantities.
4 General framework of MQIE
4.1 Overview
MQIE usually involves four types of strategies: 1) manual extraction strategy, 2) semi-automated
extraction strategy, 3) automated extraction strategy, and 4) hybrid extraction strategy. The
automated extraction strategy usually involves rule-based methods, machine learning-based methods,
and deep learning-based methods. The SemAF-MQIE document thus includes the four types of extraction
strategies and specifies a general framework for automated MQIE.
4.2 Primary requirements of MQIE
MQIE shall adhere to the following requirements:
a) Generalizable: the extraction process shall be general and adaptable to most kinds of MQI extraction
tasks, with slight but necessary modification or tuning;
b) Independent: the extraction process shall not depend on particular domains, subjects or languages;
c) Complete: the extraction process shall be able to identify all elements and links of MQI properly from
text by using information extraction technologies;
d) Normalizable: the International System of Units, the decimal system and other metric systems shall be
adapted to normalize the extracted quantities, values, and units;
e) Compatible: the extraction result shall be structured and shall be unambiguously convertible into
MQI-compliant formats;
f) Assessable: the extraction result shall be evaluated by widely accepted metrics or criteria;
g) Interpretable: the extraction process and the result shall be reasonably explained upon request.
4.3 Framework
The overall framework of MQIE is represented by the workflow in Figure 1.

Figure 1 — General framework of MQIE
This framework shall consist of five general and mandatory steps, represented by the rectangular boxes
labelled “MQIE” in the middle of Figure 1:
a) preprocessing: to make the format consistent and remove noise from the original natural language
text used as the source for extraction;
b) basic element identification: to extract three types of basic elements (entity, measure, and relator)
from text sources;
c) link identification: to extract two types of links: measure link and comparison link;
d) measure normalization: to convert and normalize identified measures into consistent or target ones;
e) verification and filtering: to verify the correctness of identified MQI and filter out incorrect ones.
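To make step d) more concrete, the following Python sketch converts a measure to a target unit by table lookup. The conversion table, the choice of target units and the function name are assumptions made for illustration; the standard does not prescribe them.

```python
# Illustrative table: source unit -> (assumed target unit, multiplication factor).
TO_TARGET = {
    "mg": ("g", 0.001),
    "g": ("g", 1.0),
    "kg": ("g", 1000.0),
    "mm": ("m", 0.001),
    "cm": ("m", 0.01),
    "m": ("m", 1.0),
}


def normalize_measure(numeral: str, unit: str) -> tuple[float, str]:
    """Convert a (numeral, unit) pair into the assumed target unit.

    Raises KeyError for units that are missing from the table, so that
    unknown units are surfaced instead of being silently passed through.
    """
    target_unit, factor = TO_TARGET[unit]
    return float(numeral) * factor, target_unit
```

With this table, the measure "2.5 kg" becomes 2500.0 g. Compound units and units that need an offset rather than a factor are outside the scope of the sketch.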
Note 1 As described in ISO 24617-11, the element “entity” shall be any object that has a measurable
quantity, represented by “@quantity”, as one of its properties. The “entity”, as used in this document,
shall be a very general term that refers to any object: not only individual entities but also their
properties, such as the “height” of a building or the “speed” of a car, as well as any kinds of
eventualities such as states, processes or transitions.
EXAMPLE 1 “White blood cell > 14.0 × 10 / L”
“White blood cell” describes an entity. “14.0 × 10 / L” describes a measure consisting of two attributes,
@numeral (“14.0 × 10 ”) and @unit (“L”). “>” describes a relator (“larger than”). A measure link and a
comparison link are triggered by the measure and by the relator, respectively. The composition of
Example 1 and its related parts for extraction is illustrated in Figure 2.
Figure 2 — An example of the basic element and link compositions of MQI for extraction
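The composition of basic elements and links can also be pictured as a small data model. The Python sketch below is only one possible representation: the class names, the field names and the choice of which elements each link connects are assumptions, and the statement “Body weight > 50 kg” is a hypothetical example rather than one taken from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str


@dataclass(frozen=True)
class Measure:
    numeral: str  # kept as written in the text
    unit: str


@dataclass(frozen=True)
class Relator:
    text: str
    meaning: str


@dataclass(frozen=True)
class MeasureLink:
    """Link triggered by a measure; assumed here to attach it to an entity."""
    entity: Entity
    measure: Measure


@dataclass(frozen=True)
class ComparisonLink:
    """Link triggered by a relator; assumed here to relate entity and measure."""
    relator: Relator
    entity: Entity
    measure: Measure


# Hypothetical statement: "Body weight > 50 kg"
entity = Entity("Body weight")
measure = Measure(numeral="50", unit="kg")
relator = Relator(text=">", meaning="larger than")
measure_link = MeasureLink(entity, measure)
comparison_link = ComparisonLink(relator, entity, measure)
```

Keeping the numeral as a string preserves the surface form found in the text, so that conversion to a numeric value can be left to the measure normalization step.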

4.4 Preprocessing
Original raw text commonly contains noise or inconsistent content and thus needs to be preprocessed.
The step processes raw text by text cleaning, sentence splitting, special symbol
...
