Language resource management — Controlled human communication (CHC) — Part 5: Lexico-morpho-syntactic principles and methodology for personal data recognition and protection in text

This document establishes basic principles and a methodology to recognize personal data written in free text, in different languages (whether agglutinating, inflectional or isolating) and countries. This document is applicable to protecting human data circulating in national and international industries, and private and public organizations. This document is applicable to processing by human beings and/or automated processing, and to various domains (e.g. law, finance, health). It does not apply to automated image processing. This document uses formal methods only, as statistical methods are very different in nature.
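
As a rough illustration of what a formal (rule-based, non-statistical) approach can look like, the sketch below masks a person's name when it follows a lexical cue such as an honorific. It is not taken from ISO 24620-5: the cue list, the pattern, and the names `TITLE_INDICANTS`, `NAME_PATTERN` and `mask_names` are assumptions made for this example, and the rule only covers simple English text.

```python
import re

# Hypothetical lexical cues (assumption, not from the standard):
# honorifics that often introduce a person's name in English text.
TITLE_INDICANTS = ("Mr", "Mrs", "Ms", "Dr", "Prof")

# An honorific, an optional period, whitespace, then one or more
# capitalized words captured as the candidate name.
NAME_PATTERN = re.compile(
    r"\b(?:%s)\.?\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)" % "|".join(TITLE_INDICANTS)
)


def mask_names(text: str, placeholder: str = "[NAME]") -> str:
    """Replace the name that follows an honorific with a placeholder."""

    def _mask(match: re.Match) -> str:
        # Keep the honorific (everything before group 1), drop the name.
        prefix_len = match.start(1) - match.start(0)
        return match.group(0)[:prefix_len] + placeholder

    return NAME_PATTERN.sub(_mask, text)


if __name__ == "__main__":
    print(mask_names("Please contact Dr. Jane Smith about the invoice."))
```

A deterministic rule like this is easy to audit, but it misses names without a cue and would need separate rules for each language and each kind of personal data.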

General Information

Status: Published
Publication Date: 02-Jun-2024
Current Stage: 6060 - International Standard published
Start Date: 03-Jun-2024
Due Date: 07-Feb-2025
Completion Date: 03-Jun-2024

Buy Standard

Standard
ISO 24620-5:2024 - Language resource management — Controlled human communication (CHC) — Part 5: Lexico-morpho-syntactic principles and methodology for personal data recognition and protection in text
Released: 03-Jun-2024
English language
19 pages
Standard
ISO 24620-5:2024 - Gestion des ressources linguistiques — Communication humaine contrôlée (CHC) — Partie 5: Principes lexico-morpho-syntaxiques et méthodologie pour la reconnaissance et la protection des données à caractère personnel dans du texte
Released: 03-Jun-2024
French language
19 pages
Draft
ISO/FDIS 24620-5 - Language resource management — Controlled human communication (CHC) — Part 5: Lexico-morpho-syntactic principles and methodology for personal data recognition and protection in text
Released: 22-Feb-2024
English language
18 pages
Draft
REDLINE ISO/FDIS 24620-5 - Language resource management — Controlled human communication (CHC) — Part 5: Lexico-morpho-syntactic principles and methodology for personal data recognition and protection in text
Released: 22-Feb-2024
English language
18 pages

Standards Content (Sample)

International Standard
ISO 24620-5
First edition
2024-06
Language resource management — Controlled human communication (CHC) —
Part 5: Lexico-morpho-syntactic principles and methodology for personal data recognition and protection in text
Gestion des ressources linguistiques — Communication humaine contrôlée (CHC) —
Partie 5: Principes lexico-morpho-syntaxiques et méthodologie pour la reconnaissance et la protection des données à caractère personnel dans du texte
Reference number: ISO 24620-5:2024(en)
© ISO 2024

COPYRIGHT PROTECTED DOCUMENT
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation for controlled human communication
5 Basic principles and methodology
5.1 General
5.2 Specific issues
5.3 Principles
5.3.1 Overview
5.3.2 Lexical, morphological and syntactic indicants
6 Applications
6.1 General
6.2 Different language families
6.3 Languages and countries
6.4 Semes in text
...

FINAL DRAFT International Standard
ISO/FDIS 24620-5
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2024-03-07
Voting terminates on: 2024-05-02
Language resource management — Controlled human communication (CHC) —
Part 5: Lexico-morpho-syntactic principles and methodology for personal data recognition and protection in text
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number: ISO/FDIS 24620-5:2024(en)
© ISO 2024

ISO/FDIS 24620-5:2024
Date: 2024-02-21
ISO/TC 37/SC 4/WG 5
Secretariat: KATS
Language resource management — Controlled human
communication (CHC) — Part 5: Lexico-morpho-syntactic
principles and methodology for personal data recognition and
protection in text
Gestion des ressources linguistiques — Communication humaine contrôlée (CHC) — Partie 5:
Principes lexico-morpho-syntaxiques et méthodologie pour la détection et protection des données
personnelles dans les textes (DataPro)

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation for controlled human communication
5 Basic principles and methodology
5.1 General
5.2 Specific issues
5.3 Principles
5.3.1 Overview
5.3.2 Lexical, morphological and syntactic indicants
6 Applications
6.1 General
6.2 Different language families
6.3 Languages and countries
6.4 Semes in text
6.5 Applications for personal data recognition
Annex A (informative) Examples of text in different languages and different semes
Annex B (informative) Examples of hidden text with seme indications
Annex C (informative) Table of semes in context
Bibliography
