ISO/TS 25237:2008
Health informatics - Pseudonymization
ISO/TS 25237:2008 contains principles and requirements for privacy protection using pseudonymization services for the protection of personal health information. ISO/TS 25237:2008 is applicable to organizations that make a claim of trustworthiness for operations engaged in pseudonymization services. ISO/TS 25237:2008: defines one basic concept for pseudonymization; gives an overview of different use cases for pseudonymization that can be both reversible and irreversible; defines one basic methodology for pseudonymization services including organizational as well as technical aspects; gives a guide to risk assessment for re-identification; specifies a policy framework and minimal requirements for trustworthy practices for the operations of a pseudonymization service; specifies a policy framework and minimal requirements for controlled re-identification; specifies interfaces for the interoperability of services.
Informatique de santé — Pseudonymisation
ISO/TS 25237:2008 is a technical specification published by the International Organization for Standardization (ISO). Its full title is "Health informatics - Pseudonymization".
ISO/TS 25237:2008 is classified under the following ICS (International Classification for Standards) categories: 35.240.80 - IT applications in health care technology. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/TS 25237:2008 has the following relationship with other standards: it has an inter-standard link to ISO 25237:2017. Understanding these relationships helps ensure that you are using the most current and applicable version of the standard.
TECHNICAL SPECIFICATION ISO/TS 25237
First edition
2008-12-01
Health informatics — Pseudonymization
Informatique de santé — Pseudonymisation
Reference number ISO/TS 25237:2008(E)
© ISO 2008
© ISO 2008
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilised in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Symbols (and abbreviated terms)
5 Requirements for privacy protection of identities in healthcare
5.1 A conceptual model for pseudonymization of personal data
5.2 Categories of data subject
5.3 Classification of data
5.4 Trusted services
5.5 Need for re-identification of pseudonymized data
5.6 Pseudonymization service characteristics
6 Pseudonymization process (methods and implementation)
6.1 Design criteria
6.2 Entities in the model
6.3 Workflow in the model
6.4 Preparation of data
6.5 Processing steps in the workflow
6.6 Protecting privacy protection through pseudonymization
7 Re-identification process (methods and implementation)
8 Specification of interoperability of interfaces (methods and implementation)
9 Policy framework for operation of pseudonymization services (methods and implementation)
9.1 General
9.2 Privacy policy
9.3 Trustworthy practices for operations
9.4 Implementation of trustworthy practices for re-identification
Annex A (informative) Healthcare pseudonymization scenarios
Annex B (informative) Requirements for privacy risk assessment design
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
In other circumstances, particularly when there is an urgent market requirement for such documents, a
technical committee may decide to publish other types of document:
⎯ an ISO Publicly Available Specification (ISO/PAS) represents an agreement between technical experts in
an ISO working group and is accepted for publication if it is approved by more than 50 % of the members
of the parent committee casting a vote;
⎯ an ISO Technical Specification (ISO/TS) represents an agreement between the members of a technical
committee and is accepted for publication if it is approved by 2/3 of the members of the committee casting
a vote.
An ISO/PAS or ISO/TS is reviewed after three years in order to decide whether it will be confirmed for a
further three years, revised to become an International Standard, or withdrawn. If the ISO/PAS or ISO/TS is
confirmed, it is reviewed again after a further three years, at which time it must either be transformed into an
International Standard or be withdrawn.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO/TS 25237 was prepared by Technical Committee ISO/TC 215, Health informatics.
Introduction
Pseudonymization is recognised as an important method for privacy protection of personal health information.
Such services may be used nationally as well as for trans-border communication.
Application areas include but are not limited to:
⎯ secondary use of clinical data (e.g. research);
⎯ clinical trials and post-marketing surveillance;
⎯ pseudonymous care;
⎯ patient identification systems;
⎯ public health monitoring and assessment;
⎯ confidential patient-safety reporting (e.g. adverse drug effects);
⎯ comparative quality indicator reporting;
⎯ peer review;
⎯ consumer groups;
⎯ equipment maintenance.
This Technical Specification provides a conceptual model of the problem areas, requirements for trustworthy
practices, and specifications to support the planning and implementation of pseudonymization services.
The specification of a general workflow, together with a policy for trustworthy operations, serves both as a
general guide for implementers and for quality assurance purposes, helping users of
pseudonymization services determine how far they can trust the services provided.
This Technical Specification also defines the interfaces to pseudonymization services to ensure
interoperability between pseudonymization service systems, identity management systems, information
providers and recipients of pseudonyms.
TECHNICAL SPECIFICATION ISO/TS 25237:2008(E)
Health informatics — Pseudonymization
1 Scope
This Technical Specification contains principles and requirements for privacy protection using
pseudonymization services for the protection of personal health information. This Technical Specification is
applicable to organizations that make a claim of trustworthiness for operations engaged in pseudonymization
services.
This Technical Specification:
⎯ defines one basic concept for pseudonymization;
⎯ gives an overview of different use cases for pseudonymization that can be both reversible and
irreversible;
⎯ defines one basic methodology for pseudonymization services including organizational as well as
technical aspects;
⎯ gives a guide to risk assessment for re-identification;
⎯ specifies a policy framework and minimal requirements for trustworthy practices for the operations of a
pseudonymization service;
⎯ specifies a policy framework and minimal requirements for controlled re-identification;
⎯ specifies interfaces for the interoperability of services.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 27799, Health informatics — Information security management in health using ISO/IEC 27002
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
access control
means of ensuring that the resources of a data processing system can be accessed only by authorized
entities in authorized ways
[ISO/IEC 2382-8:1998, definition 08.04.01]
3.2
anonymization
process that removes the association between the identifying data set and the data subject
3.3
anonymized data
data from which the patient cannot be identified by the recipient of the information
[General Medical Council Confidentiality Guidance]
3.4
anonymous identifier
identifier of a person which does not allow the unambiguous identification of the natural person
3.5
authentication
assurance of the claimed identity
3.6
ciphertext
data produced through the use of encryption, the semantic content of which is not available without the use of
cryptographic techniques
[ISO/IEC 2382-8:1998, definition 08-03-8]
3.7
confidentiality
property that information is not made available or disclosed to unauthorized individuals, entities or processes
[ISO 7498-2:1989, definition 3.3.16]
3.8
content-encryption key
cryptographic key used to encrypt the content of a communication
3.9
controller
natural or legal person, public authority, agency or any other body which alone or jointly with others
determines the purposes and means of the processing of personal data
3.10
cryptography
discipline which embodies principles, means and methods for the transformation of data in order to hide its
information content, prevent its undetected modification and/or prevent its unauthorized use
[ISO 7498-2:1989, definition 3.3.20]
3.11
cryptographic algorithm
〈cipher〉 method for the transformation of data in order to hide its information content, prevent its undetected
modification and/or prevent its unauthorized use
3.12
key management
cryptographic key management
generation, storage, distribution, deletion, archiving and application of keys in accordance with a security
policy (3.43)
[ISO 7498-2:1989, definition 3.3.33]
3.13
data integrity
property that data have not been altered or destroyed in an unauthorized manner
[ISO 7498-2:1989, definition 3.3.21]
3.14
data linking
matching and combining data from multiple databases
3.15
data protection
technical and social regimen for negotiating, managing and ensuring informational privacy, confidentiality and
security
3.16
data-subjects
persons to whom data refer
3.17
decipherment
decryption
process of obtaining, from a ciphertext, the original corresponding data
[ISO/IEC 2382-8:1998, definition 08-03-04]
NOTE A ciphertext can be enciphered a second time, in which case a single decipherment does not produce the
original plaintext.
3.18
de-identification
general term for any process of removing the association between a set of identifying data and the data
subject
3.19
direct identifying data
data that directly identifies a single individual
NOTE Direct identifiers are those data that can be used to identify a person without additional information or with
cross-linking through other information that is in the public domain.
3.20
disclosure
divulging of, or provision of access to, data
NOTE Whether the recipient actually looks at the data, takes them into knowledge, or retains them, is irrelevant to
whether disclosure has occurred.
3.21
encipherment
encryption
cryptographic transformation of data to produce ciphertext (3.6)
[ISO 7498-2:1989, definition 3.3.27]
NOTE See cryptography (3.10).
3.22
subject of care identifier
healthcare identifier
identifier of a person for exclusive use by a healthcare system
3.23
identifiable person
one who can be identified, directly or indirectly, in particular by reference to an identification number or to one
or more factors specific to his physical, physiological, mental, economic, cultural or social identity
[Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of
individuals with regard to the processing of personal data and on the free movement of such data]
3.24
identification
process of using claimed or observed attributes of an entity to single out the entity among other entities in a
set of identities
NOTE The identification of an entity within a certain context enables another entity to distinguish between the entities
with which it interacts.
3.25
identifier
information used to claim an identity, before a potential corroboration by a corresponding authenticator
[ENV 13608-1]
3.26
indirectly identifying data
data that can identify a single person only when used together with other indirectly identifying data
NOTE Indirect identifiers can reduce the population to which the person belongs, possibly down to one if used in
combination.
EXAMPLE Postcode, sex, age, date of birth.
3.27
information
data set within a context of meaning
3.28
irreversibility
situation when, for any passage from identifiable to pseudonymous, it is computationally infeasible to trace
back to the original identifier from the pseudonym
3.29
key
sequence of symbols which controls the operations of encipherment (3.21) and decipherment (3.17)
[ISO 7498-2:1989, definition 3.3.32]
3.30
linkage of information objects
process allowing a logical association to be established between different information objects
3.31
other names
name(s) by which the patient has been known at some time
[HL7]
3.32
person identification
process for establishing an association between an information object and a physical person
3.33
personal identifier
information with the purpose of uniquely identifying a person within a given context
3.34
personal data
any information relating to an identified or identifiable natural person (“data subject”)
[Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of
individuals with regard to the processing of personal data and on the free movement of such data]
3.35
primary use of personal data
use of personal data for delivering healthcare
3.36
privacy
freedom from intrusion into the private life or affairs of an individual when that intrusion results from undue or
illegal gathering and use of data about that individual
[ISO/IEC 2382-8:1998, definition 08-01-23]
3.37
processing of personal data
any operation or set of operations that is performed upon personal data, whether or not by automatic means,
such as collection, recording, organization, storage, adaptation or alteration, retrieval, consultation, use,
disclosure by transmission, dissemination or otherwise making available, alignment or combination, blocking,
erasure or destruction
[Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of
individuals with regard to the processing of personal data and on the free movement of such data]
3.38
processor
natural or legal person, public authority, agency or any other body that processes personal data on behalf of
the controller
[Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of
individuals with regard to the processing of personal data and on the free movement of such data]
3.39
pseudonymization
particular type of anonymization that both removes the association with a data subject and adds an
association between a particular set of characteristics relating to the data subject and one or more
pseudonyms
3.40
pseudonym
personal identifier that is different from the normally used personal identifier
NOTE 1 This may be either derived from the normally used personal identifier in a reversible or irreversible way, or
alternatively be totally unrelated.
NOTE 2 Pseudonym is usually restricted to mean an identifier that does not allow the derivation of the normal personal
identifier. Such pseudonymous information is thus functionally anonymous.
3.41
recipient
natural or legal person, public authority, agency or any other body to whom data are disclosed
3.42
secondary use of personal data
any use different from primary use
3.43
security policy
plan or course of action adopted for providing computer security
[ISO/IEC 2382-8:1998, definition 08-01-06]
4 Symbols (and abbreviated terms)
HIPAA Health Insurance Portability and Accountability Act
HIS Hospital Information System
HIV Human Immunodeficiency Virus
IP Internet Protocol
VoV Victim of Violence
5 Requirements for privacy protection of identities in healthcare
5.1 A conceptual model for pseudonymization of personal data
5.1.1 General
De-identification is the general term for any process of removing the association between a set of identifying
data and the data subject. Pseudonymization is a subcategory of de-identification. The pseudonym is the
means by which pseudonymized data are linked to the same person across multiple data records or
information systems without revealing the identity of the person. Pseudonymization can be performed with or
without the possibility of re-identifying the subject of the data (reversible or irreversible pseudonymization).
There are several use-case scenarios for pseudonymization in healthcare, which are becoming particularly relevant as
electronic processing of patient data increases together with patient expectations of privacy
protection. Several examples of these are provided in Annex A.
NOTE Anonymization is another subcategory of de-identification. Unlike pseudonymization, it does not provide a
means by which the information may be linked to the same person across multiple data records or information systems.
Hence re-identification of anonymized data is not possible.
5.1.2 Objectives of privacy protection
The objective of privacy protection, e.g. by using pseudonymization, is to prevent the unauthorized or
unwanted disclosure of information about a person which may further influence legal, organizational and
financial risk factors. The protection of personal privacy is a subdomain of generic privacy protection, which by
definition also includes other privacy-sensitive entities such as organizations. As personal privacy is the most
pervasive and best regulated of these, this conceptual model focuses on it. Protective solutions designed for
personal privacy can also be transposed to the privacy protection of other entities. This may be useful in
countries where the privacy of entities or organizations is regulated by law.
There are two strands in the protection of personal data, one that is oriented towards the protection of
personal data in interaction with on-line applications (e.g. web browsing) and another strand that looks at the
protection of collected personal data in databases. This Technical Specification restricts itself to the latter
context.
A prerequisite of this conceptual model is that data can be extracted from, e.g., treatment or diagnostic
databases. This conceptual model ensures that the identities of the data subjects are not disclosed.
Researchers work with “cases”, longitudinal histories of patients collected over time and/or from different sources.
To aggregate various data elements into cases, however, a technique is needed that enables aggregation
without endangering the privacy of the data subjects whose data are being aggregated.
This can be achieved by pseudonymization of the data.
5.1.3 Privacy protection of entities
The conceptual model uses the privacy of personal data as a starting point, but the term "data subject" is not
limited to persons and can denote any other entity such as an organization, a device or an application. It is,
however, useful to focus on natural persons, as their privacy is covered by legislation and is the main focus of
privacy protection. Privacy legislation addresses some of the concepts covered in this model. In the healthcare
context, the privacy protection of persons is much more complicated than the privacy protection of, e.g.,
devices, because phenotype data can potentially help to identify the data subject.
5.1.4 Personal data versus de-identified data
5.1.4.1 Definition of personal data
According to the Data Privacy Protection Directive (Directive 95/46/EC) of the European Parliament and of the
Council of 24 October 1995 (European Data Protection Directive)[7], “personal data” shall mean any
information relating to an identified or identifiable natural person (“data subject”); an identifiable person is one
who can be identified, directly or indirectly, in particular by reference to an identification number or to one or
more factors specific to his physical, physiological, mental, economic, cultural or social identity.
This concept is addressed in other national legislation with consideration for the same principles found in this
definition (e.g. HIPAA).
5.1.4.2 The idealized concept of identification and de-identification
Figure 1 — Identification of data subjects
This subclause describes an idealized concept of identification and de-identification. It is assumed that there
are no data outside the model, e.g. data that could be linked with data inside the model to achieve (indirect)
identification of data subjects.
In 5.1.5, potential information sources outside the data model are taken into account. This is necessary in
order to discuss re-identification risks. When covering functional design aspects, information and communication
technology projects do not depict data that are not used within the model. When the focus is on identifiability,
however, critics bring in information that an attacker could obtain in order to identify data subjects, or to gain
more information about them (e.g. membership of a group).
As depicted in Figure 1, a data subject has a number of characteristics (e.g. name, date of birth, medical data)
that are stored in a medical database and that are personal data of the data subject. A data subject is
identified within a set of data subjects if they can be singled out. That means that a set of characteristics
associated with the data subject can be found that uniquely identifies this data subject. In some cases, a
single characteristic is sufficient to identify the data subject (e.g. a unique national registration number). In
other cases, more than one characteristic is needed to single out a data subject, e.g. when an address is
shared by family members living at the same address. Some associations between characteristics and data
subjects are more persistent over time (e.g. date of birth, place of birth) than others (e.g. an e-mail address).
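The notion of singling out can be illustrated with a small check that reports which records are uniquely identified by a chosen set of characteristics. This is a minimal sketch under stated assumptions, not a method from this Technical Specification: the function name `singled_out`, the record layout and the sample values are all hypothetical.

```python
from collections import Counter


def singled_out(records, attributes):
    """Return the indices of records whose combination of values for
    `attributes` occurs exactly once, i.e. the records that this set of
    characteristics singles out within the set of data subjects."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return [i for i, k in enumerate(keys) if counts[k] == 1]


# Hypothetical sample data: two family members share an address.
records = [
    {"address": "1 High St", "birth_year": 1970, "sex": "F"},
    {"address": "1 High St", "birth_year": 1998, "sex": "M"},
    {"address": "2 Low Rd", "birth_year": 1970, "sex": "F"},
]

print(singled_out(records, ["address"]))                # [2]
print(singled_out(records, ["address", "birth_year"]))  # [0, 1, 2]
```

The address alone singles out only the third data subject; adding the birth year also separates the two family members who share an address, as in the example above.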
Figure 2 — Separation of personal data from payload data
From a conceptual point of view, personal data can be split up into two parts according to identifiability criteria
(see Figure 2):
⎯ payload data: the data part, containing characteristics that do not allow unique identification of the data
subject; conceptually, the payload contains anonymous data;
⎯ identifying data: the identifying part that contains a set of characteristics that allow unique identification of
the data subject (e.g. demographic data).
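The conceptual split into identifying data and payload data can be sketched as a simple partition of a record's fields. This is a hypothetical illustration: the field names and the fixed `IDENTIFYING_FIELDS` set are assumptions, whereas in practice the split is decided for each release through risk analysis.

```python
# Hypothetical choice of identifying fields; in practice this split is
# decided per release through risk analysis, not fixed in code.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "address", "national_id"}


def split_record(record, identifying_fields=IDENTIFYING_FIELDS):
    """Split one record into (identifying_data, payload_data)."""
    identifying = {k: v for k, v in record.items() if k in identifying_fields}
    payload = {k: v for k, v in record.items() if k not in identifying_fields}
    return identifying, payload


record = {
    "name": "A. Patient",
    "date_of_birth": "1970-05-17",
    "diagnosis": "J45",
    "blood_pressure": "130/85",
}
identifying, payload = split_record(record)
print(identifying)  # {'name': 'A. Patient', 'date_of_birth': '1970-05-17'}
print(payload)      # {'diagnosis': 'J45', 'blood_pressure': '130/85'}
```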
Note that the conceptual distinction between “identifying data” and “payload data” can lead to contradictions.
This is the case when directly identifying data are considered “payload data”. Any pseudonymization method
should strive to reduce the level of directly identifying data, e.g. by aggregating these data into groups. In
particular cases (e.g. date of birth of infants) where this is not possible, the risk should be pointed out in the
policy document. A later clause of this document deals with the splitting of the data into the payload part
and the identifying part from a practical rather than a conceptual point of view. From a conceptual point of
view, it is sufficient that this division can be made. It is important to note that the distinction between
identifying characteristics and payload is not absolute: some data that are also identifying might be needed
for the research, e.g. year and month of birth. These distinctions are covered further on.
5.1.4.3 The concept of pseudonymization
The practice and advancement of medicine require that elements of private medical records be released for
teaching, research, quality control and other purposes. For both scientific and privacy reasons these record
elements need to be modified to conceal the identities of the subjects.
There is no one single de-identification procedure that will meet the diverse needs of all the medical uses
while providing identity concealment. Every record release process shall be subject to risk analysis to
evaluate:
a) the purpose for the data release (e.g. analysis);
b) the minimum information that shall be released to meet that purpose;
c) what the disclosure risks will be (including re-identification);
d) what release strategies are available.
From the details of the release process and the risk analysis, a strategy of identity concealment
shall be determined. This determination shall be performed for each new release process, although many
different release processes may select a common release strategy and details. Most teaching files will have
common characteristics of purpose and minimum information content. Many clinical drug trials will have a
common strategy with varying details. De-identification meets more needs than just privacy protection. There
are often issues such as single-blinded and double-blinded experimental procedures that also require de-
identification to provide the blinding. This will affect the decision on release procedures.
This subclause provides the terminology used for describing the concealment of identifying information.
Figure 3 — Anonymization
Anonymization (see Figure 3) is the process that removes the association between the identifying data set
and the data subject. This can be done in two different ways:
⎯ by removing or transforming characteristics in the associated characteristics-data-set so that the
association is not unique anymore and relates to more than one data subject;
⎯ by increasing the population in the data subjects set so that the association between the data set and the
data subject is not unique anymore.
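The first approach, transforming characteristics so that an association relates to more than one data subject, can be sketched with simple generalization rules. This is a hypothetical illustration: the coarsening rules, the sample data and the helper names are assumptions, and the "smallest group size" measure is the k of k-anonymity, a technique not defined in this Technical Specification.

```python
from collections import Counter


def generalize(record):
    """Coarsen indirectly identifying characteristics (hypothetical rules):
    keep only the outward part of the postcode and a 10-year age band."""
    band = (record["age"] // 10) * 10
    return (record["postcode"].split()[0], record["sex"], f"{band}-{band + 9}")


def smallest_group(keys):
    """Size of the smallest group of records sharing the same key."""
    return min(Counter(keys).values())


records = [
    {"postcode": "AB1 2CD", "sex": "F", "age": 34},
    {"postcode": "AB1 9XY", "sex": "F", "age": 37},
    {"postcode": "AB1 4EF", "sex": "M", "age": 52},
    {"postcode": "AB1 7GH", "sex": "M", "age": 58},
]

raw = [(r["postcode"], r["sex"], r["age"]) for r in records]
print(smallest_group(raw))                               # 1
print(smallest_group([generalize(r) for r in records]))  # 2
```

Before generalization every record is unique; afterwards each combination relates to at least two data subjects. This holds only within the data set itself; linking with outside information, as discussed in 5.1.5, can still reduce the group size.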
Figure 4 — Pseudonymization
Pseudonymization (see Figure 4) removes the association with a data subject and adds an association
between a particular set of characteristics relating to the data subject and one or more pseudonyms.
From a functional point of view, pseudonymous data sets can be associated as the pseudonyms allow
associations between sets of characteristics, while disallowing association with the data subject. As a result it
becomes possible, e.g., to carry out longitudinal studies to build cases from real patient data while protecting
their identity.
In irreversible pseudonymization, the conceptual model does not contain a method to derive the association
between the data-subject and the set of characteristics from the pseudonym.
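One common way to build irreversible but linkable pseudonyms is a keyed one-way function. The sketch below uses HMAC-SHA256 from Python's standard library; this technique is a common choice but is not prescribed here, and the key handling, identifiers and record layout are assumptions. Because the same identifier always yields the same pseudonym, events from different records can be aggregated into one longitudinal case without revealing the identifier.

```python
import hashlib
import hmac
import secrets

# Assumption: this secret key is held only by the pseudonymization service.
# HMAC output cannot be inverted, but anyone holding the key could test
# candidate identifiers against a pseudonym, so the key must stay protected.
SERVICE_KEY = secrets.token_bytes(32)


def pseudonym(identifier: str, key: bytes = SERVICE_KEY) -> str:
    """Derive a deterministic pseudonym from an identifier with HMAC-SHA256."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical records; "NHS-123" and "NHS-456" are made-up identifiers.
visits = [("NHS-123", "2008-01-10", "asthma review"),
          ("NHS-456", "2008-02-02", "fracture"),
          ("NHS-123", "2008-06-15", "spirometry")]

# Group events into cases keyed by pseudonym instead of by identifier.
cases = {}
for patient_id, date, event in visits:
    cases.setdefault(pseudonym(patient_id), []).append((date, event))

for p, history in cases.items():
    print(p[:12], history)
```

The irreversibility of this scheme therefore depends on key management (3.12) as much as on the hash function itself.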
Figure 5 — Reversible pseudonymization
In reversible pseudonymization (see Figure 5), the conceptual model includes a way of re-associating the
data-set with the data subject.
There are two methods to achieve this goal:
a) derivation from the payload; this could be achieved by, for instance, encrypting identifiable information
along with the payload;
b) derivation from the pseudonym or via a lookup-table.
Reversible pseudonymization can be established in several ways whereby it is understood that the reversal of
the pseudonymization should only be done by an authorized entity in controlled circumstances. The policy
framework regarding re-identification is described in Clause 9 and should take care of this. Reversible
pseudonymization compared to irreversible pseudonymization typically requires increased protection of the
entity performing the pseudonymization.
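Method b) can be sketched as a lookup table held by the pseudonymization service. In this minimal, hypothetical sketch the pseudonyms are random values from Python's `secrets` module, the class and method names are illustrative only, and the `authorized` flag merely stands in for the controlled re-identification procedures of Clause 9.

```python
import secrets


class LookupTablePseudonymizer:
    """Reversible pseudonymization via a lookup table (hypothetical sketch).

    Pseudonyms are random, so they carry no information about the
    identifier; only the holder of the table can reverse them.
    """

    def __init__(self):
        self._by_identifier = {}
        self._by_pseudonym = {}

    def pseudonymize(self, identifier: str) -> str:
        # Reuse the existing pseudonym so records stay linkable.
        if identifier not in self._by_identifier:
            p = secrets.token_hex(16)
            while p in self._by_pseudonym:  # guard against an unlikely collision
                p = secrets.token_hex(16)
            self._by_identifier[identifier] = p
            self._by_pseudonym[p] = identifier
        return self._by_identifier[identifier]

    def reidentify(self, pseudonym: str, authorized: bool) -> str:
        # Placeholder for a real policy check: a service would authenticate
        # the requester and log the request before revealing the identity.
        if not authorized:
            raise PermissionError("re-identification not authorized")
        return self._by_pseudonym[pseudonym]


svc = LookupTablePseudonymizer()
p = svc.pseudonymize("NHS-123")
assert svc.pseudonymize("NHS-123") == p     # same subject, same pseudonym
print(svc.reidentify(p, authorized=True))   # NHS-123
```

Because the table is the only link back to the data subject, it is the asset that needs the increased protection mentioned above.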
Anonymized data differ from pseudonymized data in that pseudonymized data contain a method to group data
together based on criteria derived from the original personal data.
5.1.5 Real world pseudonymization
5.1.5.1 Rationale
Subclause 5.1.4 describes the conceptual approach to pseudonymization, in which concepts such as “associated”, “identifiable” and “pseudonymous” are considered absolute. In practice, the risk of re-identification of data sets is often difficult to assess. This subclause refines the concepts of pseudonymization and unwanted or unintended identifiability, taking the European data privacy protection directive as a starting point.
Recital 26 of the European Data Privacy Protection Directive states that “to determine whether a person is
identifiable, account should be taken of all the means likely reasonable to be used either by the controller or
by any other person to identify the said person; whereas the principles of protection shall not apply to data
rendered anonymous in such a way that the data subject is no longer identifiable; whereas codes of conduct
within the meaning of Article 27 may be a useful instrument for providing guidance as to the ways in which
data may be rendered anonymous and retained in a form in which identification of the data subject is no
longer possible”.
Like the definition of personal data itself, the recital focuses on “identification”, i.e. the association between data and data subject.
10 © ISO 2008 – All rights reserved
Statements such as “all the means likely reasonable” and “by any other person” are still too vague. Since the definitions of “identifiable” and “pseudonymous” depend upon the undefined behaviour (“all the means likely reasonable”) of undefined actors (“by any other person”), the conceptual model in this document should include “reasonable” assumptions about “all the means” likely to be deployed by “any other person” to associate characteristics with data subjects.
The conceptual model will be refined to reflect differences in identifiability and to take into account “observational databases” and “attackers”.
5.1.5.2 Levels of assurance of privacy protection
Current definitions lack precision in the description of terms such as “pseudonymous” or “identifiable”. It is unrealistic to assume that all imprecision in the terminology can be removed, because pseudonymization is always a matter of statistics. However, the level of risk of unauthorized re-identification can be estimated. The scheme for classifying this risk should take into account the likelihood that the data can be used to identify a data subject, as well as a clear understanding of the entities in the model and their relationships to each other. In some cases the risk model may be limited to minimizing the risk of accidental exposure or to eliminating bias in double-blinded studies; in others it may be extended to the potential for malicious attacks. The objective of this estimation shall be to enable privacy policies, for instance, to shift the “boundaries of imprecision” and to define within a concrete context what is understood by “identifiability”, so that liabilities become easier to assess.
A classification is provided below, but further refinement is required, especially since quantifying re-identification risks requires the establishment of mathematical models. Running a record through a single algorithm, no matter how good that algorithm is, still carries a risk of re-identification. A critical step in the risk assessment process is the analysis of the resulting de-identified data set for any static groups that may be used for re-identification. This is particularly important where some identifiers are needed for the intended use. This document does not specify such mathematical models; however, informative references are provided in the Bibliography.
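The analysis for static groups can be sketched with a k-anonymity check, one widely used measure that this document does not itself prescribe. Records are grouped by their quasi-identifiers (attributes that do not identify a person alone but may do so in combination), and any group smaller than a chosen k is flagged. The field names, records and value of k below are hypothetical.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def small_groups(records, quasi_identifiers, k):
    """Return the quasi-identifier combinations shared by fewer than k records.

    Each such combination is a small static group that an attacker could use
    to single out a data subject; an empty result means the data set is
    k-anonymous with respect to the chosen quasi-identifiers.
    """
    counts = equivalence_classes(records, quasi_identifiers)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical de-identified records; the field names are illustrative only.
records = [
    {"birth_year": 1950, "sex": "F", "postcode3": "123", "diagnosis": "I10"},
    {"birth_year": 1950, "sex": "F", "postcode3": "123", "diagnosis": "E11"},
    {"birth_year": 1950, "sex": "F", "postcode3": "123", "diagnosis": "J45"},
    {"birth_year": 1987, "sex": "M", "postcode3": "456", "diagnosis": "K35"},
]

print(small_groups(records, ["birth_year", "sex", "postcode3"], k=3))
# {(1987, 'M', '456'): 1}
```

The choice of quasi-identifiers is itself an assumption about what external data an attacker may hold, so it belongs in the risk assessment rather than in the code.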
Rather than relying on an idealized conceptual model that ignores data sources (known or unknown) outside the data model, the re-identification risk assessment method shall make assumptions about what data are available outside the model.
A real-life model should take into account both directly and indirectly identifying data. Each use case shall be analysed to determine its information requirements for identifiers, and to determine which identifiers can simply be blanked, which can be blurred, which are needed with full integrity and which will need to be pseudonymized.
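The per-identifier decisions described above can be captured as an explicit policy table. In this sketch the field names, the blurring rules (keeping the birth year and the first three postcode characters) and the use of HMAC for pseudonymization are all illustrative assumptions; a real policy would follow from the analysis of the specific use case.

```python
import hashlib
import hmac
from enum import Enum


class Action(Enum):
    BLANK = "blank"                # remove the value entirely
    BLUR = "blur"                  # keep only a coarsened version of the value
    KEEP = "keep"                  # needed with full integrity
    PSEUDONYMIZE = "pseudonymize"  # replace the value with a pseudonym


# Hypothetical policy for one use case.
POLICY = {
    "name": Action.BLANK,
    "patient_id": Action.PSEUDONYMIZE,
    "birth_date": Action.BLUR,
    "postcode": Action.BLUR,
    "diagnosis": Action.KEEP,
}

# Hypothetical blurring rules: birth year only, first 3 postcode characters.
BLUR_RULES = {
    "birth_date": lambda v: v[:4],
    "postcode": lambda v: v[:3],
}


def apply_policy(record: dict, key: bytes) -> dict:
    out = {}
    for field, value in record.items():
        # Fields the policy does not mention are blanked by default.
        action = POLICY.get(field, Action.BLANK)
        if action is Action.KEEP:
            out[field] = value
        elif action is Action.BLUR:
            out[field] = BLUR_RULES[field](value)
        elif action is Action.PSEUDONYMIZE:
            out[field] = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        # Action.BLANK: the field is simply not copied to the output.
    return out


record = {"name": "Jane Doe", "patient_id": "0001", "birth_date": "1950-04-12",
          "postcode": "12345", "diagnosis": "I10", "phone": "555-0100"}
print(apply_policy(record, key=b"hypothetical-key"))
```

Blanking unlisted fields by default means that a newly captured data element (such as `phone` above) cannot pass into the output unnoticed.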
Three assurance levels of the pseudonymization procedure are specified, each ensuring a certain level of privacy protection. These assurance levels consider risks of re-identification based on both directly and indirectly identifying data. The assurance levels consider:
⎯ level 1: the risks associated with person-identifying data elements;
⎯ level 2: the risks associated with aggregating data variables;
⎯ level 3: the risks associated with outliers in the populated database.
The re-identification risk assessment at all levels shall be established as an iterative process with regular re-assessments (as defined in the privacy policies). As experience is gained and the risk model is better understood, privacy protection and risk assessment levels should be reviewed.
Apart from regular re-assessments, reviews can also be triggered by events, such as a change in the
captured data or introduction of new observational data into the model.
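The combination of regular and event-triggered re-assessment can be expressed as a simple check. The function name, the yearly interval and the event labels are hypothetical; the actual interval and triggering events are set in the privacy policies.

```python
from datetime import date, timedelta


def reassessment_due(last_assessment: date, interval: timedelta, today: date,
                     pending_events: list[str]) -> bool:
    """Decide whether the re-identification risk assessment must be repeated.

    A review is due when the regular interval from the privacy policy has
    elapsed, or when a triggering event (e.g. a change in the captured data,
    or new observational data introduced into the model) has occurred since
    the last assessment.
    """
    periodic = today - last_assessment >= interval
    event_triggered = len(pending_events) > 0
    return periodic or event_triggered


last = date(2024, 1, 15)
yearly = timedelta(days=365)
print(reassessment_due(last, yearly, date(2024, 6, 1), []))  # False
print(reassessment_due(last, yearly, date(2024, 6, 1), ["new observational data source"]))  # True
print(reassessment_due(last, yearly, date(2025, 1, 15), []))  # True
```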
When referring to the assurance levels, the basic denomination of the levels as 1, 2 and 3 could be complemented by the number of revisions (e.g. level 2+ for a level 2 that has been revised). The latest revision date should be mentioned, and a history of incidents and revisions kept up to date. The requested assurance level dictates what kind of technical and organizational safeguards need to be implemented to protect the privacy of the data subject. A low level of pseudonymization will require more organizational measures to protect the privacy of data than will a high level of pseudonymization.
Assurance level 1 privacy protection: removal of clearly identifying data or easily obtainable indirectly
identifying data.
A first, intuitive level of anonymity can be achieved by applying rules of thumb. This method is usually implicitly understood when pseudonymized data are discussed. In many contexts, especially when only attackers with poor capabilities have to be considered, this first level of anonymity may provide a sufficient guarantee. Data are identifiable when the information they contain is, in a given context, sufficient to pinpoint an entity. Names of persons are a typical example. Subclause 6.6.2.2 specifies data elements that should be considered for removal or aggregation in order to obtain an anonymized data set.
Assurance level 2 privacy protection: considering attackers using external data.
The second level of privacy protection can be achieved by taking into account the global data model and the data flows inside the model. When defining the procedures to achieve this level, a static risk analysis should be performed that checks for vulnerabilities to re-identification by different actors. Additionally, the presence of attackers who combine external data with the pseudonymized data to identify specific data sets should be considered. The available external data may depend on the legal situation in different countries and on the specific knowledge of the attacker. For example, the required procedures may include the removal of absolute time references: a reference time marker “T” is defined, e.g. as the admission of a patient for an episode of care, and other events, e.g. discharge, are expressed relative to this time marker.
An attacker is an entity that gathers data (authorized or unauthorized) with the aim of attributing the gathered data to data subjects in an unauthorized way, and thus obtaining information to which it is not entitled. From a risk analysis point of view, data gathered and used by an attacker are called “observational data”.
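The removal of absolute time references can be sketched as converting each timestamp into an offset from the reference marker T, here the admission. The event names, the choice of hours as the unit and the dates are assumptions for illustration only.

```python
from datetime import datetime


def to_relative_times(events: dict[str, datetime],
                      marker: str = "admission") -> dict[str, float]:
    """Express every event as hours relative to the reference marker T.

    Absolute timestamps are dropped and only offsets from T remain, so an
    attacker holding e.g. discharge dates cannot match them directly.
    """
    t = events[marker]
    return {name: (ts - t).total_seconds() / 3600 for name, ts in events.items()}


# Hypothetical episode of care.
episode = {
    "admission": datetime(2008, 3, 3, 9, 30),
    "surgery": datetime(2008, 3, 4, 14, 0),
    "discharge": datetime(2008, 3, 7, 11, 30),
}
print(to_relative_times(episode))
# {'admission': 0.0, 'surgery': 28.5, 'discharge': 98.0}
```

Relative intervals such as length of stay can still be identifying when combined with other data, so this transformation does not replace the risk analysis.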
Note that the disallowed or undesired activity of the attacker is not necessarily the gathering of the data, but rather the attempt to attribute the data to a data subject and consequently gain information about that data subject in an unauthorized way.
A risk analysis model may include assumptions about attacks and attackers. For example, in some countries entities that are not involved in the care or associated administration of patients may be able to obtain discharge data legally. The risk analysis model may take into account the likelihood that specific data sets are available.
From a conceptual point of view, an attacker brings into the model data elements that would not exist in the ideal world.
A policy document should contain an assessment of the possibility of attacks in the given context.
Assurance level 3 privacy protection: considering outliers of data.
The re-identification risk can be strongly influenced by the data themselves, e.g. by the presence of outliers or rare data. Outliers or rare data can indirectly lead to identification of a data subject. Outliers do not necessarily consist of medical data. For instance, if only one patient with a specific pathology visited a clinic on a specific day, then observational data on who visited the clinic that day can indirectly lead to identification.
When assessing a pseudonymization procedure, a static, model-based risk analysis alone cannot quantify the vulnerability arising from the content of the databases. Regular risk analyses on populated models are therefore required to provide a higher level of anonymity.
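A regular analysis of the populated database for outliers can be sketched as a scan for records whose combination of attributes occurs fewer than a threshold number of times, mirroring the clinic example above. The field names, the data and the threshold of 2 are hypothetical; in practice such a scan would be re-run whenever the database content changes.

```python
from collections import Counter


def rare_records(records, fields, threshold=2):
    """Return the indices of records whose combination of `fields` is rare.

    A combination seen fewer than `threshold` times in the populated database
    is treated as an outlier that observational data could link to a person.
    """
    def combo(record):
        return tuple(record[f] for f in fields)

    counts = Counter(combo(r) for r in records)
    return [i for i, r in enumerate(records) if counts[combo(r)] < threshold]


# Hypothetical visit records from a populated database.
visits = [
    {"day": "2008-03-03", "clinic": "A", "pathology": "asthma"},
    {"day": "2008-03-03", "clinic": "A", "pathology": "asthma"},
    {"day": "2008-03-03", "clinic": "A", "pathology": "rare-pathology-X"},
    {"day": "2008-03-04", "clinic": "A", "pathology": "asthma"},
]

print(rare_records(visits, ["day", "clinic", "pathology"]))  # [2, 3]
```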
In practice, proof of level 3 privacy protection will be difficult to achieve.
5.2 Categories of data subject
5.2.1 General
This Technical Specification focuses on the pseudonymization of data pertaining to patients/health consumers.
These principles can also be applied to other categories of data subjects such as health professionals and
organizations.
Subclauses 5.2.2 to 5.2.4 enumerate specific categories of data subjects and list a number of issues related to
these categories.
5.2.2 Patient/healthcare consumer
Decisions to protect the identity of the patient may be associated with:
⎯ legal requirements for privacy protection;
⎯ trust relationships between the health professional and patient associated with medical secrecy
principles;
⎯ responsible handling of sensitive disease registries and other public health information resources;
⎯ provision of minimum necessary disclosures of identifiers in the provision of care (e.g. laboratory testing);
⎯ privacy protection to enable secondary use of clinical data for research purposes (be aware that in some jurisdictions, e.g. Germany, the secondary use of patient data requires informed consent when the data are only pseudonymized and not fully anonymized).
Continuity of care requires uniform identification of patients and the ability to link information across different domains. Where data are pseudonymized in the context of clinical care, there is a risk of misidentification or of missed linkages of the patient across multiple domains. In cases where pseudonymization is applied in a direct care environment, consideration shall be given to patient consent, for those cases where the patient does not want pseudonymization for safety reasons.
5.2.3 Health professionals and organizations
Pseudonymization may also be used to protect the identity of health professionals for a number of purposes
including:
⎯ peer review;
⎯ reporting of medical mishaps o
...







