ISO/PRF 20271-2
Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals
This document defines the fundamentals of text documents for long-term preservation covering the concept, elements and components of text documents.
General Information
- Status: Not Published
- Current Stage: 50.20 - FDIS ballot initiated: 2 months. Proof sent to secretariat
- Start Date: 31-Mar-2026
- Completion Date: 04-Apr-2026
Overview
ISO/PRF 20271-2: Document management - Reference model for long-term preservation of textual documents - Part 2: Fundamentals is an international standard developed by ISO Technical Committee 171/SC 2. It defines the fundamental concepts necessary for the long-term preservation of textual documents, with a focus on the elements, components, and reference model structure that support sustainable document management and digital archiving.
As digital transformation accelerates, maintaining accessibility to textual documents across evolving technologies and platforms becomes increasingly challenging. This standard addresses the risks of obsolescence due to changing formats and technologies, providing a structured approach for organizations to preserve the integrity, structure, and meaning of digital textual documents over time.
Key Topics
Fundamental Concepts
The standard sets out the essential ideas behind long-term digital preservation, covering elements such as the layered reference model, definitions of document elements, and property classifications.
Multi-layered Reference Model
ISO/PRF 20271-2 introduces an abstract, multi-layered framework for understanding textual documents. The model breaks documents down by aspects such as visualization, content, metadata, and logical structure, providing clarity for both analysis and future-proofing.
Document Elements and Properties
The standard describes the variety of elements and property types that textual documents can contain, from simple text to complex layouts, images, tables, domain-specific notations, and more.
Abstraction for Evaluation and Development
By defining layers and property classes, the reference model makes it easier to evaluate current file formats for long-term preservation suitability and guides the design or enhancement of new document formats.
Recommendations for Preservation
Guidance is provided for identifying which document components and properties are most critical for maintaining document accessibility and meaning, regardless of the evolution of technologies or formats.
Applications
Digital Archiving and Records Management
The standard is a cornerstone for organizations tasked with archiving large volumes of digital textual documents, ensuring that records remain accessible and interpretable for decades.
Development of Document Formats
File format designers use the ISO/PRF 20271-2 reference model as a blueprint when creating or updating textual document formats, prioritizing the elements that support long-term reliability and interoperability.
Format Assessment and Migration Planning
IT professionals, archivists, and records managers leverage this standard when assessing legacy formats (DOCX, PDF, ODT, TXT, and more) for migration to more sustainable preservation formats.
Quality Assurance in Preservation Solutions
The standard provides a checklist for evaluating and improving the capability of technical solutions, such as digital repositories or document management systems, to support effective long-term preservation.
Non-technical Stakeholders
ISO/PRF 20271-2 aids users without deep technical format knowledge by outlining clear expectations and terminology for document preservation activities.
Related Standards
ISO 20271-1: Overview
Offers an overview of the ISO 20271 series, detailing roles, interrelationships, and the overall scope of reference models for long-term preservation.
ISO 20271-3: Implementation
Addresses how to practically implement reference model principles and recommendations in document management systems.
ISO 20271-4: Assessment
Focuses on evaluating and certifying the long-term preservation readiness of textual document formats and solutions.
ISO/IEC 26300: Open Document Format (ODF)
Standard for open document file formats widely used in digital archiving.
ISO 32000: PDF
Reference for the widely adopted Portable Document Format, including preservation subsets such as PDF/A.
Key terms:
long-term preservation, document management, textual documents, reference model, digital archiving, ISO 20271, metadata, file format evaluation, information lifecycle management, archiving standards, multi-layered abstraction
By following ISO/PRF 20271-2, organizations and stakeholders can confidently plan for the enduring accessibility and authenticity of their valuable digital textual content, supporting regulatory, legal, and business continuity requirements.
Frequently Asked Questions
ISO/PRF 20271-2 is a draft published by the International Organization for Standardization (ISO). Its full title is "Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals". It defines the fundamentals of text documents for long-term preservation, covering the concept, elements and components of text documents.
ISO/PRF 20271-2 is classified under the following ICS (International Classification for Standards) categories: 35.240.30 - IT applications in information, documentation and publishing; 37.080 - Document imaging applications. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
DRAFT International Standard
ISO/DIS 20271-2
ISO/TC 171/SC 2
Secretariat: ANSI
Voting begins on: 2025-04-22
Voting terminates on: 2025-07-15
Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals
ICS: 37.080; 35.240.30
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENTS AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
This document is circulated as received from the committee secretariat.
Reference number ISO/DIS 20271-2:2025(en)
© ISO 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Textual documents
5 Reference model for textual documents
5.1 Purpose
5.2 Applicability
5.3 Rationale
5.4 Multi-Layered reference model
5.4.1 Definition of the layers of the reference model
5.4.2 Definition of property types of each layer
5.4.3 Recommendations for assessing each layer
6 Target documents for applying reference model
6.1 Type of document for the reference model
6.2 Types of content included in textual document
6.2.1 Types of content
6.2.2 Text
6.2.3 Image
6.2.4 Table
6.2.5 Domain-specific notations
6.2.6 Review and Comment
6.2.7 Other Types
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent
rights identified during the development of the document will be in the Introduction and/or on the ISO list of
patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 171, Document management applications,
Subcommittee SC 2, Document file formats, EDMS systems and authenticity of information.
A list of all parts in the ISO 20271 series can be found on the ISO website.
The ISO 20271 series consists of the following parts, under the general title Document management — Reference
model for long-term preservation of textual documents:
— Part 1: Overview
— Part 2: Fundamentals
— Part 3: Implementation
— Part 4: Assessment
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Over time, various file formats have been created and eventually phased out, leaving files stored in
obsolete formats inaccessible. This occurs when the technologies and standards on which these formats
were based disappear and preservation efforts or updates to current technologies are inadequate.
As a result, digital files produced several decades ago can no longer be read or analysed, because
sufficient information about the file structure is no longer available. This issue has led to significant
discussions among countries and organisations on the long-term preservation of digital files, underlining its
importance as a critical issue in the field of digital archiving.
In the ISO 20271 series, this document defines a reference model for textual documents, which incorporates
multiple abstraction layers for technical analysis and quantitative evaluation. It specifies the definitions
for these layers and the categorisation of properties contained within them. When storing properties at
each layer according to specific file formats, the document establishes technical criteria to ensure that
appropriate measures are in place to address potential obsolescence of the preserved files. It defines
specific review targets within the file to assess the long-term preservation capability of storage formats for
digital documents. Furthermore, it provides guidance on evaluating the long-term preservation of storage
standards, designing new file formats for textual documents, and adding new properties to existing textual
document standards. This document also presents considerations to reference and address when
improving these standards.
This document supports the following activities:
— Format analysis activities for selecting and preparing the evaluation of formats for the long-term
preservation of textual document file formats.
— Technical activities for selecting design targets and performing structural design when developing new
textual document format specifications.
— Activities related to adding specific properties or making structural improvements to existing textual
document format specifications.
— Classification activities concerning textual document formats, including the addition of specific
properties or structural improvements to existing specifications.
For information related to other parts of the ISO 20271 series, see ISO 20271-1.
ISO 20271-1 provides an overview of the roles, interrelationships between parts, and the scope of the entire
ISO 20271 series.
DRAFT International Standard ISO/DIS 20271-2:2025(en)
Document management — Reference model for long-term
preservation of textual documents —
Part 2:
Fundamentals
1 Scope
This document specifies the reference model for textual documents and provides detailed recommendations
necessary to support long-term preservation from various perspectives, based on the reference model.
ISO 20271-2 defines the fundamental concepts of the reference model for textual documents. This includes
the definitions of layers that make up the reference model, elements incorporated within textual documents,
property types, classifications of properties by type, and various properties inherent to textual documents.
Additionally, it defines the concepts and structure of a long-term preservation reference model for digital
documents, which can be applied to other types of documents beyond textual documents.
ISO 20271-2:
— defines textual documents and outlines the major content properties that constitute a textual document.
— provides the concepts of the reference model for textual documents, defines key elements included within
the textual documents, and outlines recommendations for enhancing long-term preservation based on
the reference model.
— provides guidelines for classifying various properties that can be included in textual documents as
outlined in ISO 20271-3 by reference model layers, along with examples of classification and guidelines
for enhancing long-term preservation post-classification.
ISO 20271-2 does not specify or recommend the following:
— specific technical methods for checking whether the properties exist within a specific textual
document or not.
— specific technical methods for analysing a textual document format such as DOC, DOCX, ODT, TXT, PDF, etc.
— specific metadata items for the long-term preservation of textual documents.
— required computer hardware or operating system.
— specific textual document file formats as suitable for long-term preservation.
— any processes, procedures, or management practices associated with long-term
preservation or records management.
This document provides technical recommendations for organizations, individuals, and both public and
private entities involved in designing digital textual documents, assessing existing file formats, or enhancing
file format specifications. Its primary aim is to ensure that these documents remain technically interpretable
and understandable despite potential obsolescence, while accommodating various requirements and levels
of information. These recommendations are particularly valuable for users who are not fully acquainted
with the technical characteristics of file formats or the core content elements of textual documents.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
recommendations of this document. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org
3.1
ActiveX
deprecated software framework created by Microsoft that adapts its earlier Component Object Model (COM)
and Object Linking and Embedding (OLE) technologies for content downloaded from a network, particularly
from the World Wide Web
3.2
ASCII (American Standard Code for Information Interchange)
character encoding standard for electronic communication
3.3
ASMO708
7-bit character encoding standard specifically designed for Arabic text
3.4
Big5
character encoding standard used in Taiwan, Hong Kong, and Macau for traditional Chinese characters
3.5
DOCX
file format for word processing documents in Office Open XML (3.15)
3.6
Elements
components included in a textual document
3.7
EUC (Extended Unix Code)
multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese
characters
3.8
EUC-KR (Extended Unix Code for Korean)
8-bit character encoding, a variant of Extended Unix Code (EUC) (3.7), that uses the Korean Industrial
Standards KS X 1001 and KS X 1003
Note 1 to entry: As it is a representative completed Korean encoding, it is commonly referred to as ‘Wansung’.
3.9
HWPX
file format, especially for word processing documents based on Open Word Processor Markup Language
(OWPML), which is used by most public institutions in Republic of Korea and designated as a permitted
format for the long-term preservation of official documents
3.10
Johap
encoding specification and Korean character set that served as industrial standards in South Korea during
the early 1990s
3.11
Kihon-Hanmen
Japanese typesetting term for the basic page format, that is, the default layout of the text area of a page,
determined by settings such as character size, writing direction, line length and number of lines per page
3.12
LTR (Left to Right)
reading direction that starts from the left and moves to the right, followed by languages such as English,
French, and Spanish
3.13
MathML (Mathematical Markup Language)
markup language, an application of XML, for describing mathematical notations and capturing both
their structure and content
3.14
OLE (Object Linking and Embedding)
proprietary technology developed by Microsoft that allows embedding and linking to documents and
other objects
3.15
OOXML (Office Open XML)
file format developed by Microsoft for representing spreadsheets, charts, presentations and word
processing documents (ISO/IEC 29500)
3.16
ODT (Open Document Text)
file format for word processing documents in the Open Document Format (3.17)
3.17
ODF (Open Document Format)
open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-
compressed XML files (ISO/IEC 26300)
3.18
OWPML (Open Word Processor Markup Language)
file format developed by Hancom Inc. in 2010, which follows the standard KS X 6101
3.19
Plug-In
software component that adds a specific feature to an existing computer program
3.20
PDF (Portable Document Format)
file format developed by Adobe in 1992 to present documents, including text formatting and images,
in a manner independent of application software, hardware, and operating systems (ISO 32000)
3.21
property
attribute, element (3.6), or other component found in textual documents that is subject to long-term
preservation
3.22
raster image
in computer graphics and digital photography, representation of a two-dimensional picture as a rectangular
matrix or grid of square pixels, viewable via a computer display, paper, or other display medium
3.23
rendering engine
software component responsible for converting document content (such as text, images, and formatting
instructions) into a visual or printable output on various devices, like screens or printers
Note 1 to entry: It interprets the document's code or format and displays it in a way that users can view or interact with.
3.24
RTF (Rich Text Format)
proprietary document file format with a published specification, developed by Microsoft Corporation
from 1987 until 2008 for cross-platform document interchange
3.25
RTL (Right to Left)
reading direction that starts from the right and moves to the left, followed by languages such as Arabic,
Hebrew, and Persian
3.26
semantic information
properties of a textual document that encompass all meanings contained within the document, including
information that conveys the structural aspects of the textual document
3.27
Shift-JIS
character encoding standard for the Japanese language
Note 1 to entry: Originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft
and standardized as JIS X 0208 Appendix 1.
3.28
SVG (Scalable Vector Graphics)
XML-based vector image format for defining two-dimensional graphics, having support for interactivity and
animation
3.29
UNICODE
character encoding standard maintained by the Unicode Consortium designed to support the use of text
written in all of the world’s major writing systems
3.30
vector image
form of computer graphics in which visual images are created directly from geometric shapes defined on a
Cartesian plane, such as points, lines, curves, and polygons
3.31
XML (eXtensible Markup Language)
markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a
set of rules for encoding documents in a format that is both human-readable and machine-readable
4 Textual documents
A textual document is a type of document typically created, saved, or printed using various types
of document editing software, usually in file formats such as TXT, DOCX, Open Document
Text (ODT), Portable Document Format (PDF), Hangul Word Processor XML (HWPX), TeX, Hyper-Text Markup
Language (HTML) and more.
Textual documents can encompass a wide range of content, from simple text to multimedia elements like
images, videos, and audio, while supporting rich expressions through various styles, complex layouts, and
integration with external elements such as fonts.
Structurally, the content of a textual document ranges from the simplest type, which includes only text,
to types that combine multimedia elements with properties such as different styles and complex layouts.
This variety allows for diverse properties and enables rich representations.
The reference model defined in this standard aims to provide a layered abstraction of technical information,
which helps break down the structure of textual documents. This breakdown facilitates the establishment
of evaluation criteria for long-term preservation and the categorization of documents. In practical
applications, this reference model may require additional layers beyond the five foundational abstract layers
initially identified for standard textual documents. These foundational layers typically include aspects
such as content, structure, presentation, interaction, and metadata. Additional layers may be necessary for
documents that differ significantly from textual documents, such as spreadsheets for storing or analysing
numerical data and presentation documents that incorporate dynamic features like animations. This
standard primarily focuses on text-centric documents that store, preserve, and deliver information conveyed
through text content, ranging from simple structured formats to those with complex layouts.
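As a rough illustration of the layered breakdown described above, the following Python sketch groups document properties by layer. The layer names follow the aspects listed in this clause; the property names, the mapping and the function `group_by_layer` are hypothetical examples and are not taken from this document.

```python
from enum import Enum


class Layer(Enum):
    """Layer names based on the aspects listed above (illustrative only)."""
    CONTENT = "content"
    STRUCTURE = "structure"
    PRESENTATION = "presentation"
    INTERACTION = "interaction"
    METADATA = "metadata"


# Hypothetical mapping of property names to layers; an assumption for
# illustration, not a classification defined by this document.
PROPERTY_LAYERS = {
    "paragraph text": Layer.CONTENT,
    "heading hierarchy": Layer.STRUCTURE,
    "font size": Layer.PRESENTATION,
    "hyperlink": Layer.INTERACTION,
    "author": Layer.METADATA,
}


def group_by_layer(properties):
    """Group property names by layer; unknown names are collected under None."""
    groups = {}
    for name in properties:
        layer = PROPERTY_LAYERS.get(name)
        groups.setdefault(layer, []).append(name)
    return groups
```

Properties that land under `None` are the ones an assessor would need to classify by hand or assign to an additional layer.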
Figure 1 — Examples of textual documents
Figure 1 illustrates various types of textual documents. These documents can have the following layout
characteristics: a header, footer, and body aligned to fit a specific paper size; a body composed of several
paragraphs, each of which can be represented by one or more sections; paragraphs constructed from
characters encoded in various standards (such as Unicode, American Standard Code for Information
Interchange (ASCII), Shift-JIS, Extended Unix Code for Korean (EUC-KR), Big5 and so on), images, and tables;
and various styles for decorating them.
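The dependence of text on its character encoding can be sketched with Python's built-in codecs. The helper names `survives_round_trip` and `decode_candidates` are illustrative assumptions; the codec names (`euc_kr`, `shift_jis`, `big5`, `utf_8`, `ascii`) are standard Python codec identifiers.

```python
def survives_round_trip(text, encoding):
    """Return True if text can be encoded and decoded back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no representation for some character in text.
        return False


def decode_candidates(data, encodings):
    """Decode the same bytes under several encodings for comparison.

    Undecodable byte sequences are replaced with U+FFFD instead of
    raising an error, so every candidate yields a string.
    """
    return {enc: data.decode(enc, errors="replace") for enc in encodings}
```

Decoding bytes stored as EUC-KR under a different encoding produces different, usually unreadable, text. This suggests why the encoding used is itself information that a preserved document needs to carry.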
Figure 2 — Different types of text flow in a paragraph
These textual documents are not just digitized forms but also reflect the cultural characteristics of
the countries and regions where they are used. For example, in the West, lists are often used in
documentation, while in Asia, tables are often used to lay out documents, and paragraphs are sometimes
written vertically. As shown in Figure 2, in the Arabic script, the character flow of a paragraph can be RTL or LTR.
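One way to estimate the character flow of a paragraph is to look at the Unicode bidirectional class of its first strongly directional character. The sketch below uses Python's standard `unicodedata` module; the function name `base_direction` is an assumption, and the first-strong-character rule is a simplification of the full Unicode Bidirectional Algorithm rather than a method defined by this document.

```python
import unicodedata


def base_direction(text):
    """Return 'RTL' or 'LTR' from the first strongly directional character.

    Returns None if the text contains no strongly directional character
    (for example, digits or punctuation only).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type or Arabic letters
            return "RTL"
        if bidi_class == "L":  # Latin and other left-to-right letters
            return "LTR"
    return None
```

A paragraph that mixes Arabic text with embedded Latin words still has a single base direction under this heuristic, while individual runs inside it can flow the other way.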
Figure 3 — The diverse logical structures and layouts of textual documents.
As illustrated in Figure 3, a textual document can be a digital format that simply contains text content, but it
can also incorporate logical structures such as reading sequence and presentation order.
5 Reference model for textual documents
5.1 Purpose
This reference model is a fundamental framework that outlines the properties and recommendations for
each layer of textual documents that can be relevant to long-term preservation. It serves as a common basis
for understanding, analysing, and establishing criteria to assess the technical considerations necessary for
the long-term preservation of textual documents. The model enables the examination and determination
of long-term preservation viability for various formats with distinct technical foundations, promoting
consistent integration, interoperability, scalability, maintainability, and functionality among technical
tools and programs. However, it does not address specific technical or implementation details regarding the
analysis of individual file formats.
5.2 Applicability
This reference model is a useful tool for document management professionals and institutional
document management personnel to establish criteria for evaluating the long-term preservation of specific
file formats. It is recommended that software developers adopt these long-term preservation standards
to design a common structure and interface for textual documents. This will ensure compatibility and
interoperability between different technologies and systems. Additionally, reference markup is provided as
a valuable resource for developing new formats to represent textual documents or improve the long-term
preservation capabilities of existing formats.
5.3 Rationale
Textual documents can be encoded in a variety of formats, including plain-text documents, PDF, Open
Document Format (ODF), Office Open XML (OOXML), and more. The field of traditional archiving or
document management is currently undergoing technical reviews and international discussions regarding
methods to ensure the long-term preservation of digital content. Various solutions have been suggested and
implemented for the long-term conservation of different types of digital documents. These solutions include:
1) Preserving original formats:
Even if the original document is kept intact for a long time, there is a possibility that it may not be
compatible with the latest technology or that the file format can become outdated, leading to the
inability to access the document. It can also prove difficult to locate software capable of faithfully
viewing the content.
2) Virtualisation to preserve the original file format’s usage environment:
Virtualisation refers to creating a virtual copy of the original computing environment (e.g., hardware,
operating system, software, etc.) needed to access the document. This method allows future users to
access the document as if they were still using the original system, even if the technology has become
obsolete. The usage environment consists of the specific software and configurations required to
render or interact with the document. This option helps preserve access to the document by emulating
the original system, but it presents risks such as copyright infringement, high costs, and complex
management because the required software and operating systems must be maintained or copied for
preservation.
3) Storing textual documents in a standardized long-term preservation format (e.g., PDF/A):
This involves converting documents into formats specifically designed for long-term preservation,
such as PDF/A. While this is a widely accepted method, it does not guarantee that all original document
properties will be fully preserved, particularly in cases where documents contain unique elements (e.g.,
embedded media, dynamic elements) that may not convert well into the new format.
4) Storing textual documents in widely supported formats (which can be proprietary or non-standardized):
In this approach, documents are stored in formats that are currently widely supported, such as
proprietary formats (e.g., DOC). This option carries the risk that these formats can become obsolete in
the future, but it provides the advantage of using formats that are currently accessible and supported by
various tools.
However, when documents are converted into a dedicated visualisation format (e.g., XPS, PDF/A) for
long-term preservation, there is no guarantee that all the information from the original document will be
preserved in the long run. This is because different document formats can have different properties and
conversion software can be limited. Therefore, solutions like virtualisation or migration can face potential
issues related to technical obsolescence, legal problems, or the loss of document fidelity.
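The risk of property loss during conversion can be illustrated with a minimal sketch. The Python fragment below compares the set of properties recorded for a source document with the set found after conversion and reports what is missing. The property names and the `lost_properties` helper are hypothetical and chosen only for illustration; they are not defined by this document.

```python
# Hypothetical property inventories; the names are illustrative only.
source_properties = {"text", "font", "table", "embedded_video", "change_tracking"}
converted_properties = {"text", "font", "table"}


def lost_properties(source: set[str], converted: set[str]) -> list[str]:
    """Return the properties present in the source but absent after conversion."""
    return sorted(source - converted)


missing = lost_properties(source_properties, converted_properties)
print(missing)
```

A real inventory would be derived from analysis of the file formats involved; the set difference only shows the kind of comparison that such an analysis makes possible.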
Data collection is crucial for the development of many technologies related to data analysis, generative AI
and big data. Most of this data is either numerical or text-based, and the text-based data can be drawn from
textual documents. Therefore, it is essential to preserve documents for an extended period to facilitate the
training of AI models. This can be achieved by retaining the characteristics and semantic information of the
original documents (for example, DOCX, ODT, HWPX, HTML) when converting textual documents into a
dedicated format for long-term preservation.
At present, there is no standard definition or reference model that identifies the types of content or
structural information of textual documents that can be required for long-term preservation. This makes it
ISO/DIS 20271-2:2025(en)
difficult to conduct technical analysis based on each document. In the document management field, where
technical analysis and information on textual documents can be limited, it can be challenging to define
long-term preservation strategies or evaluation conditions for such documents.
It is important to establish preservation strategies for all types of digital content, even those that are difficult
to quantitatively assess, such as inclusion of:
— text within the document
— graphics, such as graphs and charts
— audio and video clips
— hyperlinks and metadata information
— semantics of the original document
— digital signature authentication information
— binary data (closed stream format)
This reference model and its recommendations provide a clear set of criteria for evaluating the long-
term preservation of textual documents. These criteria can be used for both quantitative and qualitative
assessments, and even individuals without technical training can conduct technical analyses using them.
This model can serve as a guideline for identifying a more detailed and quantifiable set of criteria that can
be applied to all textual documents commonly defined in the field of archiving or document management.
The model includes technical recommendations for evaluating the documents, such as contextual information
support, complexity, interoperability, viability, and reusability.
This abstract Reference Model for textual documents helps analyse the characteristics of documents stored
in different file formats. Technical evaluation subjects are chosen to improve long-term preservation. Reliable
evaluation methods and tools need standardized recommendations and guidelines for their development.
5.4 Multi-Layered reference model
This standard specifies the reference model for textual documents and the recommendations for their long-
term preservation. Textual documents can vary in structure from being very simple to highly complex. In
order to classify textual documents, an abstract reference model is used that categorizes them based on
their visual, descriptive, logical, physical, and content characteristics. For each of these characteristics,
layers are constructed, and properties are mapped to the corresponding reference model. The reference
model for the textual document is defined as a Multi-Layered Reference Model consisting of five layers. The
complete structure of this reference model is illustrated in Figure 5.
Figure 5 — The reference model of textual documents
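The layered structure can be sketched as a small data model. The following Python fragment is an illustrative sketch only: the `Layer`, `Property` and `by_layer` names are hypothetical, and the example assignments are assumptions made for demonstration, since the actual mapping depends on the file format being analysed.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The five layers of the reference model, in the order they are defined."""
    VISUALISATION = 1
    CONTENT = 2
    METADATA = 3
    SEMANTICS = 4
    PACKAGE = 5


@dataclass(frozen=True)
class Property:
    name: str
    layers: frozenset  # a property can have characteristics from several layers


# Illustrative assignments; the actual mapping depends on the format analysed.
EXAMPLES = [
    Property("font", frozenset({Layer.VISUALISATION})),
    Property("text", frozenset({Layer.CONTENT, Layer.VISUALISATION})),
    Property("document summary", frozenset({Layer.METADATA})),
    Property("footnote", frozenset({Layer.SEMANTICS})),
    Property("compression", frozenset({Layer.PACKAGE})),
]


def by_layer(properties):
    """Group property names under every layer they belong to."""
    grouped = {layer: [] for layer in Layer}
    for prop in properties:
        for layer in prop.layers:
            grouped[layer].append(prop.name)
    return grouped


grouped = by_layer(EXAMPLES)
for layer in Layer:
    print(layer.name, grouped[layer])
```

Holding the layers of a property as a set, rather than a single value, reflects the statement that a property can belong to more than one layer.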
5.4.1 Definition of the layers of the reference model
5.4.1.1 Visualisation layer
The reference model consists of five layers, with the visualisation layer being the first. The visualisation
layer is defined as a layer that includes the properties necessary for visually representing the document.
Textual documents contain text and related information, ranging from simple to complex forms. Within
the reference model, the layer concerned with how this content is displayed, for example on a screen, is
the visualisation layer. Textual documents express the content information they include through a series
of implemented procedures known as rendering. The properties related to rendering are classified into the
visualisation layer.
Textual documents can be visualised through the visualisation layer, targeting static content elements such
as text and images, style information including fonts and layout, as well as dynamic elements such as video.
In the case of plain-text documents with no additional visualisation properties such as font face information
and layout style, the visualisation information can depend on the platform or program used to visualise the
document.
Within the reference model, each unit of information is called a property. A property can have characteristics
from multiple layers of the reference model. The properties belonging to the visualisation layer are called
visualisation properties. Depending on their complexity or implementation method, these visualisation
properties can be displayed differently. In some cases, the characteristics of other layers are maintained,
while in other cases, they are not. Whether the original characteristics are preserved affects not only the
long-term preservation but also the compatibility of the document.
Based on the textual documents reference model, the visualisation layer properties can be used to assess
the accuracy and reliability of visualisation, such as whether the visualisation can vary depending on the
system or application used. For example, using an image format can preserve the visual appearance
accurately, but to the detriment of other layers, while other formats can be less visually precise if they
support dynamic reflow or re-layout.
5.4.1.2 Content layer
The second layer of the reference model, the content layer, is defined as the layer responsible for representing
the intrinsic content elements contained within the document.
Textual documents can include various types of information, ranging from basic text to images, videos,
sounds, charts, and more. The content layer includes properties that represent the crucial information for
all types of textual documents. The basic information included in the content layer may be in a standardized
format for each type or may be in a non-standardized format.
The content layer plays a critical role in representing the core elements of a document. Among these, the
text property, as the most fundamental aspect of textual documents, holds the highest importance. Other
content properties, depending on their relevance, can also be prioritized based on the specific use case or
requirements. Regardless of the format, it is essential that the properties representing the content itself are
preserved without any loss, ensuring the document's integrity and meaning remain intact.
The properties in the content layer may or may not be visually represented.
5.4.1.3 Metadata layer
The third layer in the reference model is the metadata layer, which encompasses a range of metadata within
textual documents. The metadata layer is defined to represent information classified as metadata included
within the document. This layer plays a crucial role in providing context, structure, and additional details
that enhance the understanding and management of the document's content.
This layer includes properties that do not directly influence the visualisation or the content of the document
and are therefore not included in either the visualisation layer or the content layer.
Conversely, visualisation properties or content properties do not belong to the metadata layer. The metadata
layer is associated with additional information contained within the document, which is essential for
providing context and enhancing the usability of the document. Metadata properties may include, but are
not limited to, document summary information, fields within the document, alternative text for accessibility
support, document change tracking information, and notes, depending on the specific needs of the document.
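As an illustration of document summary information held as metadata, the following Python sketch reads the title and creator from a core-properties part of the kind used in Office Open XML packages. The package built here is a hand-made stand-in rather than a complete, valid DOCX file, and the `read_summary` helper and the sample values are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part (Dublin Core elements).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

CORE_XML = (
    '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
    "<dc:title>Annual report</dc:title>"
    "<dc:creator>Records office</dc:creator>"
    "</cp:coreProperties>"
) % (NS["cp"], NS["dc"])

# Build a stand-in package in memory (not a complete, valid DOCX file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", CORE_XML)


def read_summary(data: bytes) -> dict:
    """Extract the title and creator from docProps/core.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "creator": root.findtext("dc:creator", namespaces=NS),
    }


summary = read_summary(buffer.getvalue())
print(summary)
```

Neither value affects how the document is rendered or what its body says, which is why such properties fall into the metadata layer rather than the visualisation or content layers.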
5.4.1.4 Semantics layer
The fourth layer of the Reference Model is the semantics layer, which is defined to contain structured content
that expresses semantic information and conveys meaning within the document.
The semantics layer may not be present in simple textual documents that contain only basic content
information, such as plain text or images. However, when textual documents include properties that
encompass various structural information—such as paragraphs, lists, headers, table titles, and figure
titles—they are included in the semantics layer. Additionally, the properties within the semantics layer can
be applied to the visualisation layer.
If properties within the semantics layer are omitted, it can result in discrepancies between the visualised
part of the document and the original document. This can lead to alterations or omissions in the logical
structure and contextual representation of the original document.
The semantics layer can include properties such as information distinguishing paragraphs, headers, footers,
the flow order of document content, captions for images or tables, automatically assigned paragraph
numbers, table of contents information, footnotes, endnotes, and other related properties.
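The effect of omitting semantic properties can be sketched with a short example. The Python fragment below uses the standard `html.parser` module to record the structural role of each piece of text in a small HTML sample, then flattens the result to plain text. The `StructureCollector` class, the chosen tag set and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser

STRUCTURAL_TAGS = {"h1", "h2", "p", "li", "caption"}


class StructureCollector(HTMLParser):
    """Record (role, text) pairs for a few structural elements."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.items.append((self._current, text))


SAMPLE = "<h1>Scope</h1><p>This is the body.</p><ul><li>First item</li></ul>"

collector = StructureCollector()
collector.feed(SAMPLE)
collector.close()

structured = collector.items
flattened = " ".join(text for _, text in structured)
print(structured)
print(flattened)
```

The flattened string keeps every character of the content but no longer distinguishes the heading, the paragraph and the list item, which is the kind of loss in logical structure described above.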
5.4.1.5 Package layer
The fifth layer of the reference model is the package layer, which is defined to represent the method and all
related properties for converting textual documents into data streams and storing them in physical storage,
along with any associated information. This layer treats document data as data streams and ensures that
the necessary information for storage and retrieval is included.
However, embedded files within textual documents are treated as stream objects, and the specific
storage methods related to their individual formats are not directly managed.
A key challenge related to the package layer is the potential risk of data loss when storing digital documents
on physical storage devices that use non-standard or proprietary formats. This can lead to difficulties in
interpreting the respective data streams, as well as potential issues with the storage medium itself. The
package layer can include additional information to address structural responses to errors in the storage
medium or compatibility with evolving technologies that can make data stream interpretation difficult.
Moreover, compression techniques used to bundle multiple files into a single physical unit during the
construction of textual documents are also managed within the package layer.
The package layer can incorporate properties defined by separate, independent specifications or standards,
potentially with more ease than the visualisation, content, semantics, and metadata layers.
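Package-level properties can be inspected without interpreting the content of the streams. The following Python sketch builds a small ZIP container in memory and lists each stream with its compression method and sizes. The entry names and the `describe_streams` helper are hypothetical; ZIP is used here only as one familiar example of bundling multiple files into a single physical unit.

```python
import io
import zipfile

# Build a small stand-in package in memory; the entry names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("content.xml", "<doc>" + "text " * 200 + "</doc>",
                     compress_type=zipfile.ZIP_DEFLATED)
    package.writestr("media/image1.bin", bytes(range(16)),
                     compress_type=zipfile.ZIP_STORED)


def describe_streams(data: bytes) -> list:
    """List each stream with its compression method and sizes."""
    methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return [
            (info.filename,
             methods.get(info.compress_type, "other"),
             info.file_size,
             info.compress_size)
            for info in package.infolist()
        ]


streams = describe_streams(buffer.getvalue())
for entry in streams:
    print(entry)
```

The embedded binary entry is reported only as a stream with a name and a size, consistent with the statement that the storage methods of embedded formats are not directly managed at this layer.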
5.4.2 Definition of property types of each layer
5.4.2.1 Visualisation property
Visualisation properties are those properties that affect the visualisation of textual documents. These
properties can possess multiple characteristics, as they can affect or apply to multiple layers. They are
related to visual information, and even if visual characteristic information is hidden by the user or other
elements and is not visually observable, it should still be preserved. Such hidden properties can occupy
space within the document and influence the style in which documents or information are represented,
making them part of the visualisation properties.
Visual properties that are used to represent or display content on a document include style properties like
font, margin, colour, and text decoration. In addition, content properties like text, image and table can also be
considered as visualisation properties in some formats as they directly contribute to the visual appearance
of the document on a screen.
5.4.2.2 Content property
Content properties are those properties that represent the informational content of textual documents.
These properties can have inter-related characteristics and can be applied to multiple layers. Content
properties may or may not be displayed when the document is visualised. A type of content that is commonly
used in documents is text. This can be in the form of a publicly standardized code format such as Unicode,
or ASCII, or a code system format used in specific platforms or countries such as CP949, Windows-874,
and Korean-Johap. The image property, another common information type in textual documents, can be
implemented in various formats like BMP, JPG, PNG, and GIF.
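The dependence of the text property on its code system can be shown with a brief sketch. The Python fragment below encodes the same Korean sample text in UTF-8 and in CP949, then shows that the bytes can be recovered only when the code system is identified correctly. The sample text is an arbitrary choice made for illustration.

```python
TEXT = "문서 보존"  # Korean sample text, chosen for illustration

utf8_bytes = TEXT.encode("utf-8")
cp949_bytes = TEXT.encode("cp949")

# The same characters produce different byte sequences in each code system.
print(len(utf8_bytes), len(cp949_bytes))

# Decoding succeeds only when the code system is identified correctly.
recovered = cp949_bytes.decode("cp949")

try:
    cp949_bytes.decode("utf-8")
    misread = False
except UnicodeDecodeError:
    misread = True

print(recovered == TEXT, misread)
```

For long-term preservation, this suggests that the code system used for a text property needs to be recorded alongside the bytes themselves, particularly for platform-specific or national code systems.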
When documents are implemented as a format, it is common to distinguish between style properties
responsible for visualisation and content properties representing the content itself. Hence, preserving
properties related to these two layers can play
...
International
Standard
ISO 20271-2
First edition
Document management —
Reference model for long-
term preservation of textual
documents —
Part 2:
Fundamentals
PROOF/ÉPREUVE
Reference number
ISO 20271-2:2026(en) © ISO 2026
© ISO 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Textual documents
5 Reference model for textual documents
5.1 Purpose
5.2 Approaches and rationale for long-term preservation
5.3 Multi-layered reference model
5.3.1 General
5.3.2 Layers of the reference model
5.3.3 Property types of each layer
5.3.4 Recommendations for assessing each layer
6 Target documents for applying reference model
6.1 Type of document for the reference model
6.2 Content
6.2.1 General
6.2.2 Text
6.2.3 Image
6.2.4 Table
6.2.5 Domain-specific notations
6.2.6 Reviewing and commenting
6.2.7 Other content elements of textual documents
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 171, Document management applications,
Subcommittee SC 2, Document file formats, EDMS systems and authenticity of information.
A list of all parts in the ISO 20271 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Over time, numerous file formats have been created and subsequently become obsolete, resulting in digital
files that are no longer accessible.
This situation typically arises when the technologies, software environments, or underlying specifications
and standards – whether international, industry-based, or proprietary – are no longer maintained, and
when insufficient information is available to interpret the file structure. Consequently, digital documents
created decades ago can become unreadable or unanalyzable, even though the data itself still exists. This
challenge has prompted sustained discussion among governments and organizations regarding the long-
term preservation of digital documents and has established digital preservation as a critical issue in
electronic document management.
The primary objective of this document is to support the long-term preservation of textual documents
by ensuring that they remain technically interpretable and understandable despite potential format
obsolescence. To achieve this, this document defines a reference model that enables systematic technical
analysis and quantitative evaluation of textual document formats, while accommodating different
preservation requirements and levels of available information.
This document defines multiple abstraction layers for textual documents and specifies the categories of
properties associated with each layer. It establishes technical criteria for recording and assessing these
properties within specific file formats, in order to identify risks related to long-term accessibility and
interpretability.
The reference model defined in this document serves as a practical resource for professionals involved in
document management, including institutional archivists and records managers, by providing a common
basis for evaluating the long-term preservation readiness of textual document formats, in order to support
consistent structure, interoperability, and long-term interpretability across different technologies and
systems. In addition, the reference markup presented in this document can be used as a reference when
developing new textual document formats or when enhancing the long-term preservation capabilities of
existing formats.
Accordingly, this document supports the following activities:
— format analysis for selection and evaluation of textual document formats for long-term preservation;
— technical design activities related to the development of new textual document format specifications;
— activities aimed at improving existing textual document standards through the addition of properties or
structural refinements;
— classification and comparative analysis of textual document formats.
The ISO 20271 series currently consists of the following parts:
— Part 1 (ISO 20271-1¹⁾) provides an overview and contextual background for this document;
— Part 2 (this document) defines the fundamental concepts of the reference model;
— Part 3 (ISO 20271-3²⁾) defines a taxonomy and XML-based reference markup for digital preservation.
¹⁾ Under preparation. Stage at the time of publication: ISO/DIS 20271-1:2026.
²⁾ Under preparation. Stage at the time of publication: ISO/WD 20271-3:2026.
International Standard ISO 20271-2:2026(en)
Document management — Reference model for long-term
preservation of textual documents —
Part 2:
Fundamentals
1 Scope
This document specifies fundamental concepts of the reference model for textual documents and provides
guidance to support long-term preservation from the perspectives of its five layers.
It defines:
— the layers that constitute the reference model for textual documents;
— the types of elements incorporated within textual documents;
— property types associated with textual documents;
— classifications of properties by type; and
— properties inherent to textual documents relevant to long-term preservation.
This document does not cover:
— specific technical methods for checking whether the properties exist within a specific textual document;
— specific technical methods for analysing particular textual document formats (e.g. DOC, DOCX, ODT, TXT,
PDF);
— specific metadata items for the long-term preservation of textual documents;
— processes, procedures, or management practices related to long-term preservation or records
management.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
common file formats
document formats including plain-text format (TXT), Office Open XML (DOCX), Open Document Text (ODT),
Portable Document Format (PDF), Open Word Processor Markup Language (OWPML), TeX and Hypertext
Markup Language (HTML)
3.2
element
component included in a textual document (3.6)
3.3
property
attribute, element (3.2), and other components found in textual documents (3.6), which are subject to long-
term preservation
3.4
rendering engine
software component responsible for converting document content (such as text, images, and formatting
instructions) into a visual or printable output on various devices, like screens or printers
Note 1 to entry: It interprets the document's code or format and displays it in a way that users can view or interact
with.
3.5
semantic information
properties of textual documents (3.6) that encompass all semantic content
Note 1 to entry: This includes both the substantive information and the structural aspects that convey meaning within
the document.
3.6
textual document
document that conveys its core message primarily through the use of human language characters, regardless
of the encoding or rendering method used
Note 1 to entry: A textual document may also include structured layouts, stylesheets, images, audio and other
embedded content elements.
Note 2 to entry: Common file formats for textual documents include plain text (TXT), Office Open XML (DOCX),
OpenDocument Text (ODT), Portable Document Format (PDF), Hangul Word Processor XML (HWPX), Hypertext
Markup Language (HTML) and TeX.
3.7
Unicode
character encoding standard maintained by the Unicode Consortium designed to support the use of text
written in all of the world’s major writing systems
3.8
vector image
form of computer graphics in which visual images are created directly from geometric shapes defined on a
Cartesian plane, such as points, lines, curves, and polygons
4 Textual documents
Textual documents can be represented in a variety of file formats.
Common file formats for textual documents include TXT, DOCX, ODT, PDF, HWPX, HTML and TeX.
Textual documents can encompass a wide range of content, from simple text to multimedia elements like
images, videos and audio. They additionally support rich expressions through various styles, complex
layouts and integration with external elements such as fonts.
The structural content types in textual documents range from simple text-only formats to those
incorporating multimedia elements like images and videos. Furthermore, these documents contain various
properties enabling rich expression – such as diverse styles and complex layouts – through integration with
external elements like fonts.
The reference model defined in this document aims to provide a layered abstraction of technical
information, which helps break down the structure of textual documents. This breakdown facilitates
the establishment of evaluation criteria for long-term preservation and the categorization of documents.
In practical applications, this reference model can require additional layers beyond the five foundational
abstract layers initially identified for textual documents. These foundational layers typically include aspects
such as content, structure, presentation, interaction and metadata. Additional layers can be necessary for
documents with non-textual primary content, such as spreadsheets (for numerical data) and presentation
files (with dynamic features like animations). This document primarily focuses on text-centric documents
that store, preserve and deliver information conveyed through text content, ranging from simple structured
formats to those with complex layouts.
Figure 1 — Examples of textual documents
Figure 1 illustrates various types of textual documents. These documents can have the following layout
characteristics:
— a header, footer, and body aligned to fit a specific paper size;
— a body composed of several paragraphs, each of which can be represented by one or more sections;
— paragraphs constructed from characters encoded in various standards [such as Unicode, American
Standard Code for Information Interchange (ASCII), Shift-JIS, Extended Unix Code for Korean (EUC-KR),
Big5];
— images and tables, including various styles to decorate them.
Key
1 vertical writing mode
2 horizontal writing mode
3 character size
4 line gap
5 inter-line space
6 left-to-right base direction
7 right-to-left base direction
NOTE Specific language examples are preserved where they demonstrate different layout structures and writing
directions, as these are essential to illustrate the concept rather than requiring translation.
Figure 2 — Different types of text flow in a paragraph
These textual documents are not just digitized forms but also reflect the cultural characteristics of the
countries and regions where they are used. For example, different regional documentation practices vary
in their approach: some frequently use lists, others commonly employ tables for layout, and certain writing
traditions utilize vertical paragraph orientation. As shown in Figure 2, in the Arabic script, the character
flow of paragraphs combines right-to-left (for Arabic text) and left-to-right (for embedded Chinese, Japanese,
Korean, Latin characters, or numerals), though the overall paragraph direction remains right-to-left.
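The mixed direction described for Arabic paragraphs follows from the directional class that Unicode assigns to each character. As a minimal sketch, the Python fragment below uses the standard `unicodedata` module to look up that class for an Arabic letter, a Latin letter, a digit and a Hangul syllable. The sample characters and the `bidi_classes` helper are chosen only for illustration.

```python
import unicodedata

# Sample characters: an Arabic letter, a Latin letter, a digit and a Hangul syllable.
SAMPLES = ["\u0627", "A", "7", "\ud55c"]


def bidi_classes(chars):
    """Map each character to its Unicode bidirectional class."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


classes = bidi_classes(SAMPLES)
for ch, cls in classes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

The Arabic letter is classified as right-to-left, while the Latin letter and the Hangul syllable are left-to-right and the digit has its own numeric class. A rendering engine combines these classes with the base direction of the paragraph to determine the displayed order.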
Key
1 text progression
2 size of the illustration < left-right size of the column
3 size of the illustration = left-right size of the column
4 size of the illustration = top-bottom size of the two columns
5 size of the illustration = top-bottom size of basic layout
6 left-right size of the column
7 size of the illustration left-right size of basic layout
8 top-bottom size of the column
9 size of the illustration = top-bottom size of the column
Figure 3 — Diverse logical structures and layouts of textual documents
As illustrated in Figure 3, a textual document can be a digital format that simply contains text content, but it
can also incorporate logical structures such as reading sequence and presentation order.
5 Reference model for textual documents
5.1 Purpose
This reference model is a fundamental framework that outlines the properties and recommendations for
each layer of textual documents that can be relevant to long-term preservation. It serves as a common basis
for understanding, analysing and establishing criteria for assessing the technical considerations required
for long-term preservation of textual documents. The model enables the examination and determination
of long-term preservation viability for various formats with distinct technical foundations, promoting
consistent integration, interoperability, scalability, maintainability and functionality among technical tools
and programs. However, this reference model does not address specific technical or implementation details
regarding the analysis of individual file formats.
5.2 Approaches and rationale for long-term preservation
Textual documents can be encoded in a variety of formats, including plain-text documents and those
specified in standards such as the ISO 32000 series (PDF), the ISO/IEC 26300 series (Open Document Format)
and the ISO/IEC 29500 series (Office Open XML). At the time of publication of this document, the fields of
archiving, records management and document management continue to examine methods for the long-term
preservation of digital documents. Various approaches have been suggested and implemented for the
long-term preservation of different types of digital documents. These approaches include the following.
a) Preserving original formats
Even if the original document is kept intact for a long time, there is a possibility that it is not compatible
with the latest technology or that the file format can become outdated, leading to the inability to access
the document. It can also prove difficult to locate software capable of faithfully viewing the content.
b) Virtualisation to preserve the original file format’s usage environment
Virtualisation refers to creating a virtual copy of the original computing environment (e.g. hardware,
operating system, software) needed to access the document. This method allows future users to access
the document as if they were still using the original system, even if the technology has become obsolete.
The usage environment consists of the specific software and configurations required to render or
interact with the document. This method preserves access by emulating the original system. However,
it carries risks, including copyright infringement, high costs and complex maintenance, as the required
software and operating systems must be preserved.
c) Storing textual documents in a standardized long-term preservation format (e.g. PDF/A)
This involves converting documents into formats specifically designed for long-term preservation,
such as PDF/A. While this is a widely accepted method, it does not guarantee that all original document
properties will be fully preserved, particularly in cases where documents contain unique elements (e.g.
embedded media, dynamic elements) that do not necessarily convert well into the new format.
d) Storing textual documents in widely supported formats (which can be proprietary or non-standardized)
In this approach, documents are stored in formats that are widely supported, such as proprietary
formats (e.g. DOC). This option carries the risk that these formats can become obsolete in the future, but
it provides the advantage of using formats that are accessible and supported by various tools.
e) Converting textual documents to standardized or updated formats
Conversion refers to transforming textual documents from their original file formats into newer or
standardized formats to ensure continued accessibility. This process helps prevent obsolescence by
allowing documents to be opened and used in up-to-date environments. However, it can lead to loss
of information, metadata, or structural elements if the conversion does not fully support the source
format. For this reason, conversion procedures should be clearly documented, and the resulting files
should be verified for fidelity and completeness.
f) Encapsulating textual documents with related resources
Encapsulation involves packaging a textual document together with all its related information, such as
metadata, fonts, schemas, and usage context, into a single container file. This approach ensures that all
components necessary for rendering and interpretation are preserved together. Although encapsulation
can improve integrity and portability, it also increases storage requirements and can depend on
proprietary container structures. Standardized encapsulation formats should be used whenever
possible to maintain interoperability.
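As an informal illustration of the encapsulation approach in item f), the following Python sketch packages a textual document together with a metadata record and a font resource into a single container. The use of ZIP, the function names `encapsulate` and `list_components`, and the entry names inside the container are assumptions made for this example only; this document does not define them.

```python
import io
import json
import zipfile


def encapsulate(document: bytes, metadata: dict, fonts: dict[str, bytes]) -> bytes:
    """Package a document with its related resources into one ZIP container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as container:
        container.writestr("document.txt", document)
        container.writestr("metadata.json", json.dumps(metadata, ensure_ascii=False))
        for name, data in fonts.items():
            container.writestr(f"fonts/{name}", data)
    return buffer.getvalue()


def list_components(container_bytes: bytes) -> list[str]:
    """Return the sorted names of all components stored in the container."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as container:
        return sorted(container.namelist())


package = encapsulate(
    document="Sample body text".encode("utf-8"),
    metadata={"title": "Sample", "language": "en"},
    fonts={"placeholder.ttf": b"\x00\x01"},  # placeholder bytes, not a real font
)
print(list_components(package))
```

The sketch keeps everything needed for interpretation in one unit, which reflects the portability benefit described above. The additional storage cost is visible in the same way: every related resource is copied into the container.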
PROOF/ÉPREUVE
ISO 20271-2:2026(en)
However, when converting documents into a dedicated visualization format (e.g. XPS, PDF/A) for long-term
preservation, there is no guarantee that all the information from the original document will be preserved
in the long run. This is because different document formats can have different properties and conversion
software can be limited. Therefore, solutions like virtualization or migration can face potential issues
related to technical obsolescence, legal problems, or the loss of document fidelity.
Data collection is crucial for the development of many technologies related to data analysis, generative
artificial intelligence (AI), big data, etc. Most of this data is either numerical or text-based, and
text-based data can be drawn from textual documents. It is therefore important to preserve documents for an
extended period, for example to facilitate the training of AI models. The characteristics and semantic
information of the original documents (e.g. DOCX, ODT, HWPX, HTML) can be retained when
textual documents are converted into a dedicated format for long-term preservation.
At present, there is no standard definition or reference model that identifies the types of content or
structural information of textual documents that can be required for long-term preservation. This makes it
difficult to conduct technical analysis of individual documents. In the field of document management, where
technical analysis and information on textual documents are limited, it can be difficult to define long-term
preservation strategies or evaluation conditions for such documents.
It is important to establish preservation strategies for various existing types of digital content that can
include:
— text within the document;
— graphics, such as graphs and charts;
— audio and video clips;
— hyperlinks and metadata information;
— semantics of the original document;
— digital signature authentication information;
— binary data (closed stream format).
This reference model and its recommendations (see 5.3.4) provide a clear set of criteria and technical
factors – such as contextual information support, complexity, interoperability, viability, and reusability – for
evaluating the long-term preservation of textual documents.
These criteria can be applied in both quantitative and qualitative assessments, and even individuals without
technical training can conduct technical analyses. The model also serves as guidance for identifying a more
detailed and measurable set of criteria applicable to textual documents commonly defined in archiving or
document management.
5.3 Multi-layered reference model
5.3.1 General
Textual documents can vary in structure from being very simple to highly complex. In order to classify
textual documents, an abstract reference model is used that categorizes them based on their visual,
descriptive, logical, physical characteristics, and on the content itself. For each of these characteristics,
layers are constructed, and properties are mapped to the corresponding reference model. The reference
model for the textual document is defined as a multi-layered reference model consisting of five layers. The
complete structure of this reference model is illustrated in Figure 4.
Figure 4 — Reference model of textual documents
5.3.2 Layers of the reference model
5.3.2.1 Visualization layer
The visualization layer is defined as a layer that includes the properties necessary for visually representing
the document.
Textual documents contain text and related information, ranging from simple to complex forms. Among the
layers that make up the reference model, the visualization layer concerns how this content is displayed,
for example on a screen. Textual documents express a variety of content through rendering processes, and the
properties related to rendering are classified into the visualization layer.
Textual documents can be visualized through the visualization layer, targeting static content elements such
as text and images, style information including fonts and layout, as well as dynamic elements such as video.
In the case of plain-text documents with no additional visualization properties such as font face information
and layout style, the visualization information can depend on the platform or program used to visualize the
document.
This dependence should be recognized as a potential risk for long-term preservation, particularly for
documents requiring specific rendering (e.g. ASCII art, complex scripts, bidirectional text). Possible
approaches to mitigate this risk include providing coordinate-based or image-based representations,
embedding rendering metadata, or using standardized formats such as PDF/A. These approaches are
provided as informative examples, not as requirements.
Within the reference model, each unit of information is called a property. A property can have characteristics
from multiple layers of the reference model. The properties belonging to the visualization layer are called
visualization properties. Depending on their complexity or implementation method, these visualization
properties can be displayed differently. In some cases, the characteristics of other layers are maintained,
while in other cases, they are not. Whether the original characteristics are preserved affects not only the
long-term preservation of the document but also its compatibility.
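The relationship between layers and properties can be pictured with a small data model. The Python sketch below is only an illustration under stated assumptions: the names `Layer`, `Property`, `properties_in` and the three example properties are chosen for this example, and the assignment of "text" to two layers simply mirrors the statement that a property can have characteristics from multiple layers.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The five layers of the multi-layered reference model."""
    VISUALIZATION = "visualization"
    CONTENT = "content"
    METADATA = "metadata"
    SEMANTICS = "semantics"
    PACKAGE = "package"


@dataclass(frozen=True)
class Property:
    """A unit of information that can belong to one or more layers."""
    name: str
    layers: frozenset[Layer]


def properties_in(layer: Layer, properties: list[Property]) -> list[str]:
    """Return the names of the properties that have characteristics of a layer."""
    return [p.name for p in properties if layer in p.layers]


# Illustrative assignments only; a real format analysis would derive these
# from the format specification under review.
EXAMPLE_PROPERTIES = [
    Property("text", frozenset({Layer.CONTENT, Layer.VISUALIZATION})),
    Property("font", frozenset({Layer.VISUALIZATION})),
    Property("caption", frozenset({Layer.SEMANTICS})),
]

print(properties_in(Layer.VISUALIZATION, EXAMPLE_PROPERTIES))
```

Modelling the layers of a property as a set rather than a single value makes it possible to ask, for each layer, which properties would be affected if that layer were dropped during conversion.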
Based on the reference model for textual documents, the visualization layer properties can be used to assess
the accuracy and reliability of visualization, such as whether the visualization varies depending on the
system or application used. For instance, using an image format can preserve the visual appearance accurately,
but to the detriment of other layers, while other formats can be less visually precise if they support dynamic
reflow or re-layout.
5.3.2.2 Content layer
The second layer of the reference model, the content layer, is defined as the layer responsible for representing
the intrinsic content elements contained within the document.
Textual documents can include various types of information, ranging from basic text to images, videos,
sounds, charts, and more. The content layer includes properties that represent the crucial information for
all types of textual documents. The basic information included in the content layer can be in a standardized
format for each type or in a non-standardized format.
The content layer plays a critical role in representing the core elements of a document. Among these, the
text property constitutes the core element of textual documents and therefore represents the most essential
aspect for their preservation.
Other content properties, depending on their relevance, can also be prioritized based on the specific use
case or requirements. Regardless of the format, it is essential that the properties representing the content
itself are preserved without any loss, ensuring the integrity and meaning of the document remain intact.
The properties in the content layer can be visually represented, or not.
5.3.2.3 Metadata layer
The third layer in the reference model is the metadata layer, which encompasses a range of metadata within
textual documents. The metadata layer is defined to represent information classified as metadata included
within the document. This layer plays a crucial role in providing context, structure, and additional details
that enhance the understanding and management of the document's content.
This layer includes properties that do not directly influence the visualization or the content of the document
and therefore are not included in either the visualization layer or the content layer.
Conversely, visualization properties or content properties do not belong to the metadata layer. The metadata
layer is associated with additional information contained within the document, which is essential for
providing context and enhancing the usability of the document. Metadata properties can include, but are
not limited to, document summary information, fields within the document, alternative text for accessibility
support, document change tracking information, and notes, depending on the specific needs of the document.
5.3.2.4 Semantics layer
The fourth layer of the reference model is the semantics layer, which is defined to contain structured content
that expresses semantic information and conveys meaning within the document.
The semantics layer is not necessarily present in simple textual documents that contain only basic
content information, such as plain text or images. However, when textual documents include properties
that carry structural information, such as paragraphs, lists, headers, table titles, and figure
titles, these properties are included in the semantics layer. Additionally, the properties within the
semantics layer can be applied to the visualization layer.
If properties within the semantics layer are omitted, it can result in discrepancies between the visualized
part of the document and the original document. This can lead to alterations or omissions in the logical
structure and contextual representation of the original document.
The semantics layer can include properties such as information distinguishing paragraphs, headers, footers,
the flow order of document content, captions for images or tables, automatically assigned paragraph
numbers, table of contents information, footnotes, endnotes and other related properties.
5.3.2.5 Package layer
The fifth layer of the reference model is the package layer, which is defined to represent the method and all
related properties for converting textual documents into data streams and storing them in physical storage,
along with any associated information. This layer treats document data as data streams and ensures that
the necessary information for storage and retrieval is included.
However, files embedded within textual documents are treated as stream objects, and the storage
methods specific to their individual formats are not directly managed.
A key challenge related to the package layer is the potential risk of data loss when storing digital documents
on physical storage devices that use non-standard or proprietary formats. This can lead to difficulties in
interpreting the respective data streams, as well as potential issues with the storage medium itself.
In this context, the role of the file system is critical, as it governs the logical structure and accessibility of files
within physical storage. File systems that are non-standard, obsolete, or proprietary can pose additional
risks for long-term preservation due to limitations in compatibility or recoverability. It is therefore essential
to consider the characteristics of the file system when assessing the long-term preservation readiness of the
package layer.
The package layer can include additional information to address structural responses to errors in the
storage medium or compatibility with evolving technologies that can make data stream interpretation
difficult. Moreover, compression techniques used to bundle multiple files into a single physical unit during
the construction of textual documents are also managed within the package layer.
The package layer can incorporate properties defined by separate, independent specifications or standards,
potentially with more ease than the visualization, content, semantics and metadata layers.
5.3.3 Property types of each layer
5.3.3.1 Visualization property
Visualization properties affect the visualization of textual documents. These properties can possess multiple
characteristics, as they can affect or apply to multiple layers. They are related to visual information, and
even if such information is not visible to the user, it can still be preserved, as it can be made visible through
appropriate rendering or presentation mechanisms. They can occupy space within the document and
influence the style properties in which documents or information are represented, making them part of the
visualization properties.
Visualization properties that are used to represent or display content in a document include style properties
like font, margin, colour and text decoration. In addition, content properties like text, image and table can
also be considered as visualization properties in some formats, as they directly contribute to the visual
appearance of the document on a screen.
5.3.3.2 Content property
Content properties refer to those that represent the informational content of textual documents. These
properties can have inter-related characteristics and can be applied to multiple layers. Content properties
can be displayed when the document is visualized, or not. A type of content that is commonly used in
documents is text. This can be represented using publicly standardized character encoding schemes such
as Unicode, or ASCII, or platform-specific encoding schemes such as CP949 or Korean-Johap. The image
property, another common information type in textual documents, can be implemented in various formats
like BMP, JPG, PNG and GIF.
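The effect of the character encoding scheme on the stored data stream can be shown with a short Python sketch. It is an illustration only: the sample string and the helper `decodes_losslessly` are assumptions for this example, and the Python codec named `johab` is assumed here to correspond to the Korean combination encoding mentioned above.

```python
TEXT = "한글"  # sample Korean text used only for this illustration

# The same text property produces a different byte stream under each scheme.
encoded = {name: TEXT.encode(name) for name in ("utf-8", "cp949", "johab")}


def decodes_losslessly(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode back to the original text."""
    try:
        return data.decode(encoding) == TEXT
    except UnicodeDecodeError:
        return False


for name, data in encoded.items():
    print(name, len(data), decodes_losslessly(data, name))

# Without knowledge of the encoding scheme, the text cannot be recovered.
print(decodes_losslessly(encoded["cp949"], "ascii"))
```

The sketch suggests why the encoding scheme is worth recording alongside the text property: the bytes alone do not identify how they are to be interpreted.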
When documents are implemented as a format, it is common to distinguish between style properties
responsible for visualization and content properties representing the content itself. Hence, preserving
properties related to these two layers can play a significant role in enhancing long-term preservation,
enabling users to read and utilize digital documents.
5.3.3.3 Metadata property
The metadata property represents supplementary information, which does not affect the visualization layer
of textual documents and is not incorporated into the content layer.
Metadata properties cover the diverse types of metadata embedded within a document. The
varieties of metadata found in textual documents can include the following.
a) Descriptive metadata, which is the descriptive information about a document. It is used for discovery
and identification. It includes elements such as author, title, abstract and keywords.
b) Structural/semantics metadata, which is about containers of data and indicates how compound objects
are put together, for example, how pages are ordered to form chapters. It describes the types, versions,
relationships, and other characteristics of digital materials or a specific part of content.
c) Administrative metadata, which is information to help manage a resource, like resource type,
permissions, and when and how it was created.
d) Reference metadata, which is information about the contents and quality of statistical data.
e) Legal metadata, which provides information about the creator, copyright holder and public licensing, if
provided.
The metadata property should not influence the visualization layer. However, metadata that is involved
in the content layer as content can affect the visualization layer. Even if the properties within the content
layer and metadata layer of a textual document contain similar information, they should be distinguished
and adhere to the recommendations of their respective layers. Moreover, it is often crucial to ensure that
the elimination of all metadata properties from a document does not lead to any distortion of the visual
information or content of the textual document.
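The check described in the last sentence can be sketched as a comparison of rendered output before and after metadata is removed. The following Python example is a toy model under explicit assumptions: the dictionary layout of a document, the `{field}` placeholder syntax, and the functions `render` and `metadata_is_removable` are all hypothetical and serve only to illustrate the idea.

```python
def render(doc: dict) -> str:
    """Toy renderer: substitute {field} placeholders in the text with metadata values."""
    text = doc["text"]
    for key, value in doc.get("metadata", {}).items():
        text = text.replace("{" + key + "}", value)
    return text


def metadata_is_removable(doc: dict) -> bool:
    """Return True if removing all metadata leaves the rendered output unchanged."""
    stripped = {**doc, "metadata": {}}
    return render(stripped) == render(doc)


# Metadata kept separate from the content: removal has no visible effect.
independent = {"text": "Annual report", "metadata": {"author": "A. Writer"}}

# Metadata used as content through a field: removal distorts the output.
dependent = {"text": "Report by {author}", "metadata": {"author": "A. Writer"}}

print(metadata_is_removable(independent))
print(metadata_is_removable(dependent))
```

The second case corresponds to metadata that is involved in the content layer as content, which is the situation in which the visualization layer can be affected.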
5.3.3.4 Semantics property
The semantics property represents structural and contextual information within textual documents. Generally,
the semantics property does not influence the visualization layer. However, depending on the document
creation and editing tools, it can be visualized for user convenience, thereby affecting the visualization layer.
Like the metadata layer, the semantics layer can nevertheless be entirely omitted from the document. In
such instances, the final form of the document’s visualization layer should remain unaffected.
5.3.3.5 Package property
The package property refers to the properties used for physical packaging, encryption, digital signatures,
integrity verification and other related aspects of a document. These properties are specifically implemented
to assemble textual documents into physical units for storage as data streams or to enable associated
functionalities.
The package property typically operates independently and does not necessarily influence other layers. It
includes information related to the method of storing documents in physical storage, details used to verify
document integrity, information required for digital signature verification and information utilized for
encryption and decryption purposes.
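Integrity verification, one of the package properties listed above, is commonly implemented by recording a cryptographic digest of the data stream and recomputing it later. The Python sketch below illustrates this with SHA-256; the choice of algorithm and the function names `digest` and `is_intact` are assumptions for this example and are not prescribed by this document.

```python
import hashlib


def digest(stream: bytes) -> str:
    """Compute the SHA-256 digest of a data stream as a hexadecimal string."""
    return hashlib.sha256(stream).hexdigest()


def is_intact(stream: bytes, recorded_digest: str) -> bool:
    """Return True if the stream still matches the digest recorded at storage time."""
    return digest(stream) == recorded_digest


original = b"Sample data stream of a textual document"
recorded = digest(original)  # stored with the package at ingest

print(is_intact(original, recorded))            # unchanged stream
print(is_intact(original + b"\x00", recorded))  # stream altered after storage
```

Because the digest is computed over the data stream only, this kind of property can be added or checked without interpreting the other layers, which is consistent with the package property operating independently.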
5.3.4 Recommendations for assessing each layer
5.3.4.1 Recommendations for the visualization layer
When setting criteria to ensure that the visual representation of a textual document remains unchanged,
the properties defined in the visualization layer of the reference model should meet the following
recommendations.
a) Even if there is no need to preserve visual information, properties that pertain to both the content layer
and the visualization layer should be retained and not omitted.
b) The data representing the properties included in the visualization layer should be expressed as system-
and application-independent values.
c) Target textual documents that cannot be adequately displayed without additional information should
include visualization properties. This situation can also arise in case of plain-text files. In such cases,
alternative formats should be considered and verified to ensure that they support the required
visualization properties.
d) In the absence of external references, corresponding visualization layer properties may be omitted.
e) Visualization properties can also pertain to other layers depending on the context. Therefore, these
properties should be considered in relation to the recommendations applicable to the associated layers.
As an example of the recommendations mentioned above, in the case of a), the absence of essential properties,
such as text or image, necessary for the composition of the visualization layer, renders it impossible to
maintain not only the visualization properties but also the content properties.
For item b), documents are suitable for long-term preservation only when no information explicitly depends
on values or states within a specific program or software rendering engine used to display documents on
devices such as displays or printers. Instead, representation should rely on coordinate-based vector values,
standardized colour information, and standardized units such as centimetres (cm) and millimetres (mm),
as well as pixel-based coordinates (px), relative values expressed as percentages (%) and typographic units
such as points (pt), where 1 pt equals approximately 0,3528 mm.
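The relationship between these units can be checked with a short calculation. The Python sketch below assumes the point is defined as 1/72 of an inch and that an inch is 25,4 mm; the function names and the resolution parameter `dpi` are assumptions for this example. Pixel values convert to a physical length only when a resolution is stated, so the sketch takes it as an explicit argument.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # assumes the point is defined as 1/72 inch


def pt_to_mm(points: float) -> float:
    """Convert typographic points to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


def px_to_mm(pixels: float, dpi: float) -> float:
    """Convert pixels to millimetres for a stated resolution in dots per inch."""
    return pixels * MM_PER_INCH / dpi


print(round(pt_to_mm(1), 4))   # length of 1 pt in mm, rounded to four decimals
print(round(pt_to_mm(12), 3))  # a 12 pt font size expressed in mm
print(round(px_to_mm(96, dpi=96), 1))
```

Under these assumptions, 1 pt is 25,4 / 72 mm, which rounds to the 0,3528 mm quoted above.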
For c), which is plain text, additional information i
ISO/PRF 20271-2:202x(en)
ISO/TC 171/SC 2/WG 10
Secretariat: ANSI
Date: 2026-03-30
Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals
PROOF
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication
may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying,
or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO
at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
E-mail: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO 2026 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Textual documents
5 Reference model for textual documents
5.1 Purpose
5.2 Approaches and rationale for long-term preservation
5.3 Multi-layered reference model
6 Target documents for applying reference model
6.1 Type of document for the reference model
6.2 Content
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of
ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent rights
in respect thereof. As of the date of publication of this document, ISO had not received notice of (a) patent(s)
which may be required to implement this document. However, implementers are cautioned that this may not
represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such
patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 171, Document management applications,
Subcommittee SC 2, Document file formats, EDMS systems and authenticity of information.
A list of all parts in the ISO 20271 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Over time, numerous file formats have been created and subsequently become obsolete, resulting in digital
files that are no longer accessible.
This situation typically arises when the technologies, software environments, or underlying specifications and
standards – whether international, industry-based, or proprietary – are no longer maintained, and when
insufficient information is available to interpret the file structure. Consequently, digital documents created
decades ago can become unreadable or unanalyzable, even though the data itself still exists. This challenge
has prompted sustained discussion among governments and organizations regarding the long-term
preservation of digital documents and has established digital preservation as a critical issue in electronic
document management.
The primary objective of this document is to support the long-term preservation of textual documents by
ensuring that they remain technically interpretable and understandable despite potential format
obsolescence. To achieve this, this document defines a reference model that enables systematic technical
analysis and quantitative evaluation of textual document formats, while accommodating different
preservation requirements and levels of available information.
This document defines multiple abstraction layers for textual documents and specifies the categories of
properties associated with each layer. It establishes technical criteria for recording and assessing these
properties within specific file formats, in order to identify risks related to long-term accessibility and
interpretability.
The reference model defined in this document serves as a practical resource for professionals involved in
document management, including institutional archivists and records managers, by providing a common basis
for evaluating the long-term preservation readiness of textual document formats, in order to support
consistent structure, interoperability, and long-term interpretability across different technologies and
systems. In addition, the reference markup presented in this document can be used as a reference when
developing new textual document formats or when enhancing the long-term preservation capabilities of
existing formats.
Accordingly, this document supports the following activities:
— format analysis for selection and evaluation of textual document formats for long-term preservation;
— technical design activities related to the development of new textual document format specifications;
— activities aimed at improving existing textual document standards through the addition of properties
or structural refinements;
— classification and comparative analysis of textual document formats.
The ISO 20271 series currently consists of the following parts:
— Part 1 (ISO 20271-1 ¹⁾) provides an overview and contextual background for this document;
— Part 2 (this document) defines the fundamental concepts of the reference model;
— Part 3 (ISO 20271-3 ²⁾) defines a taxonomy and XML-based reference markup for digital preservation
(under development);
— Part 4 is intended to cover an evaluation framework for assessing long-term preservation readiness
(planned).
Up-to-date information on the development status of these parts is available in the ISO/TC 171/SC 2 work
programme and on the ISO website (https://www.iso.org).
ISO 20271-1 provides contextual background for this document.
This document does not require the implementation of ISO 20271-1 or any other International Standard.
¹⁾ Under preparation. Stage at the time of publication: ISO/DIS 20271-1:2026.
²⁾ Under preparation. Stage at the time of publication: ISO/WD 20271-3:2026.
DRAFT International Standard ISO/DIS 20271-2:2025(en)
Document management — Reference model for long-term
preservation of textual documents —
Part 2:
Fundamentals
1 Scope
This document specifies fundamental concepts of the reference model for textual documents and provides
guidance to support long-term preservation from the perspectives of its five layers.
It defines:
— the layers that constitute the reference model for textual documents;
— the types of elements incorporated within textual documents;
— property types associated with textual documents;
— classifications of properties by type; and
— properties inherent to textual documents relevant to long-term preservation.
This document does not cover:
— specific technical methods for checking whether the properties exist within a specific textual
document;
— specific technical methods for analysing particular textual document formats (e.g. DOC, DOCX, ODT,
TXT, PDF);
— specific metadata items for the long-term preservation of textual documents;
— processes, procedures, or management practices related to long-term preservation or records
management.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
common file formats
document formats that include plain-text format (TXT), Office Open XML (DOCX), OpenDocument Text (ODT),
Portable Document Format (PDF), Open Word Processor Markup Language (OWPML), TeX and
Hypertext Markup Language (HTML)
3.2
element
component included in a textual document (3.6)
3.3
property
attribute, element (3.2), and other components found in textual documents (3.6), which are
subject to long-term preservation
3.4
rendering engine
software component responsible for converting document content (such as text, images, and formatting
instructions) into a visual or printable output on various devices, such as screens or printers
Note 1 to entry: It interprets the document's code or format and displays it in a way that users can view or interact
with.
3.5
semantic information
properties of textual documents (3.6) that encompass all semantic content
Note 1 to entry: This includes both the substantive information and the structural aspects that convey meaning within
the document.
3.6
textual document
document that conveys its core message primarily through the use of human language characters, regardless
of the encoding or rendering method used
Note 1 to entry: A textual document can also include structured layouts, stylesheets, images, audio, and other
embedded content elements.
Note 2 to entry: Common file formats for textual documents include plain text (TXT), Office Open XML (DOCX),
OpenDocument Text (ODT), Portable Document Format (PDF), Hangul Word Processor XML (HWPX), Hypertext
Markup Language (HTML) and TeX.
3.7
Unicode
character encoding standard maintained by the Unicode Consortium designed to support the use of text
written in all of the world’s major writing systems
3.8
vector image
form of computer graphics in which visual images are created directly from geometric shapes defined on a
Cartesian plane, such as points, lines, curves, and polygons
4 Textual documents
Textual documents can be represented in a variety of file formats.
Common file formats for textual documents include TXT, DOCX, ODT, PDF, HWPX, HTML and TeX.
Textual documents can encompass a wide range of content, from simple text-only formats to those
incorporating multimedia elements such as images, videos, and audio. Furthermore, these documents contain
various properties that enable rich expression, such as diverse styles and complex layouts, through
integration with external elements such as fonts.
The reference model defined in this document aims to provide a layered abstraction of technical information,
which helps break down the structure of textual documents. This breakdown facilitates the establishment of
evaluation criteria for long-term preservation and the categorization of documents. In practical applications,
this reference model can require additional layers beyond the five foundational abstract layers initially
identified for textual documents. These foundational layers typically include aspects such as content,
structure, presentation, interaction, and metadata. Additional layers can be necessary for documents with
non-textual primary content, such as spreadsheets (for numerical data) and presentation files (with dynamic
features like animations). This document primarily focuses on text-centric documents that store, preserve,
and deliver information conveyed through text content, ranging from simple structured formats to those with
complex layouts.
Figure 1 — Examples of textual documents
Figure 1 illustrates various types of textual documents. These documents can have the following
layout characteristics:
— a header, footer, and body aligned to fit a specific paper size;
— a body composed of several paragraphs, each of which can be represented by one or more sections;
— paragraphs constructed from characters encoded in various standards [such as Unicode, American
Standard Code for Information Interchange (ASCII), Shift-JIS, Extended Unix Code for Korean (EUC-KR),
Big5];
— images and tables, including various styles to decorate them.
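The character encodings listed above matter for interpretability because the same byte sequence can decode without error under more than one of them, yet produce the intended text under only one. The following Python sketch is purely illustrative and is not part of the reference model. The function name `decode_candidates` and the candidate list are assumptions introduced for this example.

```python
# Illustrative only: the candidate list and the function name are
# assumptions for this example, not something defined by this document.
CANDIDATE_ENCODINGS = ["ascii", "utf-8", "shift_jis", "euc_kr", "big5"]


def decode_candidates(data: bytes) -> dict:
    """Return every candidate encoding that decodes `data` without error."""
    results = {}
    for name in CANDIDATE_ENCODINGS:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding, so skip it.
            continue
    return results


# A short Korean word stored as EUC-KR bytes.
korean = "\ubb38\uc11c".encode("euc_kr")
decoded = decode_candidates(korean)

# Several encodings can accept the bytes; only the recorded one is correct.
for name, text in decoded.items():
    print(name, repr(text))
```

A successful decode is therefore not proof of correctness, which is one reason the encoding used by a document is worth recording as a preserved property.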
Key:
1. vertical writing mode
2. horizontal writing mode
3. character size
4. line gap
5. inter-line space
6. left-to-right base direction
7. right-to-left base direction
NOTE Specific language examples are preserved where they demonstrate different layout structures and
writing directions, as these are essential to illustrate the concept rather than requiring translation.
Figure 2 — Different types of text flow in a paragraph
These textual documents are not just digitized forms but also reflect the cultural characteristics of the
countries and regions where they are used. For example, regional documentation practices vary:
some frequently use lists, others commonly employ tables for layout, and certain writing
traditions utilize vertical paragraph orientation. As shown in Figure 2, in the Arabic script, the
character flow of paragraphs combines right-to-left (for Arabic text) and left-to-right (for embedded Chinese,
Japanese, Korean, Latin characters, or numerals), though the overall paragraph direction remains right-to-left.
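The mixed character flow described above can be observed from the Unicode bidirectional class of each character. The following Python sketch groups a string into runs of the same class using the standard `unicodedata` module. It is a simplified illustration only: it does not implement the full Unicode Bidirectional Algorithm, and the function name `directional_runs` and the sample string are assumptions introduced for this example.

```python
import unicodedata


def directional_runs(text: str) -> list:
    """Group characters into runs by their Unicode bidirectional class."""
    runs = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if runs and runs[-1][0] == bidi_class:
            # Same class as the previous character: extend the current run.
            runs[-1][1] += ch
        else:
            runs.append([bidi_class, ch])
    return [(cls, chunk) for cls, chunk in runs]


# An Arabic word followed by a Latin abbreviation and a number
# (written with escapes so the source itself stays left-to-right).
sample = "\u0648\u062b\u064a\u0642\u0629 PDF 2026"
for bidi_class, chunk in directional_runs(sample):
    print(bidi_class, repr(chunk))
```

In this sample the Arabic letters are reported as class AL (right-to-left), the Latin letters as L (left-to-right) and the digits as EN (European number), which corresponds to the combination of directions within a single right-to-left paragraph.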
Key:
1. text progression
2. size of the illustration < left-right size of the column
3. size of the illustration = left-right size of the column
4. size of the illustration = top-bottom size of the two columns
5. size of the illustration = top-bottom size of basic layout
6. left-right size of the column
7. size of the illustration left-right size of basic layout
8. top-bottom size of the column
9. size of the illustration = top-bottom size of the column
Figure 3 — Diverse logical structures and layouts of textual documents
As illustrated in Figure 3, a textual document can be a digital format that simply contains text content,
but it can also incorporate logical structures such as reading sequence and presentation order.
5 Reference model for textual documents
5.1 Purpose
This reference model is a fundamental framework that outlines the properties and recommendations for each
layer of textual documents that can be relevant to long-term preservation. It serves as a common basis for
understanding, analysing, and establishing criteria for assessing the technical considerations required for
long-term preservation of textual documents. The model enables the examination and determination of long-
term preservation viability for various formats with distinct technical foundations, promoting consistent
integration, interoperability, scalability, maintainability and functionality among technical tools and
programs. However, this reference model does not address specific technical or implementation details
regarding the analysis of individual file formats.
5.2 Approaches and rationale for long-term preservation
Textual documents can be encoded in a variety of formats, including plain-text documents and those specified
in standards such as the ISO 32000 series (Portable Document Format (PDF)), the ISO/IEC 26300 series
(Open Document Format (ODF) for Office Applications) and the ISO/IEC 29500 series (Office Open XML
(OOXML)). At the time of publication of this document, the fields of traditional archiving, records
management and document management continue to examine methods for the long-term preservation of
digital documents. Various approaches have been suggested and implemented for the long-term
preservation of different types of digital documents. These approaches include the following.
a) Preserving original formats
Even if the original document is kept intact for a long time, there is a possibility that it is not compatible
with the latest technology or that the file format can become outdated, leading to the inability to access
the document. It can also prove difficult to locate software capable of faithfully viewing the content.
b) Virtualization to preserve the original file format’s usage environment
Virtualization refers to creating a virtual copy of the original computing environment (e.g. hardware,
operating system, software) needed to access the document. This method allows future users to access
the document as if they were still using the original system, even if the technology has become obsolete.
The usage environment consists of the specific software and configurations required to render or interact
with the document. This method preserves access by emulating the original system. However, it
carries risks, including copyright infringement, high costs, and complex maintenance, as the required
software and operating systems must be preserved.
c) Storing textual documents in a standardized long-term preservation format (e.g. PDF/A)
This involves converting documents into formats specifically designed for long-term preservation, such
as PDF/A. While this is a widely accepted method, it does not guarantee that all original document
properties will be fully preserved, particularly in cases where documents contain unique elements (e.g.
embedded media, dynamic elements) that do not necessarily convert well into the new format.
d) Storing textual documents in widely supported formats (which can be proprietary or non-standardized)
In this approach, documents are stored in formats that are widely supported, such as proprietary formats
(e.g. DOC). This option carries the risk that these formats can become obsolete in the future, but it provides
the advantage of using formats that are accessible and supported by various tools.
e) Converting textual documents to standardized or updated formats
Conversion refers to transforming textual documents from their original file formats into newer or
standardized formats to ensure continued accessibility. This process helps prevent
obsolescence by allowing documents to be opened and used in up-to-date environments. However, it can
lead to loss of information, metadata, or structural elements if the conversion does not fully support the
source format. For this reason, conversion procedures should be clearly documented, and the resulting
files should be verified for fidelity and completeness.
f) Encapsulating textual documents with related resources
Encapsulation involves packaging a textual document together with all its related information, such as
metadata, fonts, schemas, and usage context, into a single container file. This approach ensures that all
components necessary for rendering and interpretation are preserved together. Although encapsulation
can improve integrity and portability, it also increases storage requirements and can depend on
proprietary container structures. Standardized encapsulation formats should be used
whenever possible to maintain interoperability.
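As an illustration of the encapsulation approach in item f), the following Python sketch packages a document, a related resource and a metadata manifest with checksums into a single ZIP container, and then verifies the package. It is a minimal sketch under stated assumptions, not a format defined by this document: the use of ZIP, the member paths, the manifest layout and the function names `encapsulate` and `verify` are all hypothetical.

```python
import hashlib
import io
import json
import zipfile


def encapsulate(document: bytes, resources: dict, metadata: dict) -> bytes:
    """Pack a document, its resources and a checksum manifest into one ZIP."""
    # Hypothetical member layout chosen for this example only.
    members = {"document/content.bin": document}
    for name, payload in resources.items():
        members["resources/" + name] = payload

    manifest = {
        "metadata": metadata,
        "sha256": {path: hashlib.sha256(data).hexdigest()
                   for path, data in members.items()},
    }

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, data in members.items():
            archive.writestr(path, data)
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def verify(package: bytes) -> bool:
    """Check every packaged member against the checksum in the manifest."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return all(
            hashlib.sha256(archive.read(path)).hexdigest() == digest
            for path, digest in manifest["sha256"].items()
        )


package = encapsulate(
    document=b"Example body text",
    resources={"font.ttf": b"placeholder font bytes"},
    metadata={"title": "Example", "source_format": "TXT"},
)
print(verify(package))
```

Keeping the checksums inside the container lets a later reader confirm that the document and its resources are still the ones that were packaged together.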
When documents are converted into a dedicated visualization format (e.g. XPS, PDF/A) for long-term
preservation, however, there is no guarantee that all the information from the original document will be
preserved in the long run. This is because different document formats can have different properties and
conversion software can be limited. Therefore, approaches such as virtualization or migration can face
potential issues related to technical obsolescence, legal problems, or the loss of document fidelity.
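One simple way to detect the loss of textual content after a conversion is to compare the text extracted from the original and from the converted file. The following Python sketch assumes that the text has already been extracted by some other means and compares the two strings after Unicode normalization and whitespace collapsing. The function names `normalize` and `text_fidelity` are hypothetical, and the check covers text only, not layout, fonts or metadata.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Apply NFC normalization and collapse whitespace before comparing."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def text_fidelity(original_text: str, converted_text: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for the two texts."""
    matcher = difflib.SequenceMatcher(
        None, normalize(original_text), normalize(converted_text)
    )
    return matcher.ratio()


# The converted text uses a combining accent and different line breaks,
# which normalization treats as equivalent to the original.
original = "Caf\u00e9 menu\n\nPrice: 10"
converted = "Cafe\u0301 menu Price: 10"
print(text_fidelity(original, converted))

# A conversion that dropped the accent and the price line scores lower.
print(text_fidelity(original, "Cafe menu"))
```

A ratio below 1.0 indicates that some textual content changed during conversion and that the converted file warrants closer inspection.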
Data collection is crucial for the development of many technologies, such as data analysis, generative
artificial intelligence (AI) and big data. Most of this data is either numerical or text-based, and text information
can be drawn from textual documents. Therefore, it is essential to preserve documents for an extended
period to facilitate the training of AI models. This preservation can be done while retaining the
characteristics and semantic information of the original documents (e.g. DOCX, ODT, HWPX, HTML) when
converting textual documents into a dedicated format for long-term preservation.
At present, there is no standard definition or reference model that identifies the types of content or structural
information of textual documents that can be required for long-term preservation. This makes it difficult to
conduct technical analysis of individual do
...











