ISO/PRF 20271-2
Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals
This document defines the fundamentals of text documents for long-term preservation covering the concept, elements and components of text documents.
General Information
- Status
- Not Published
- Current Stage
- 5000 - FDIS registered for formal approval
- Start Date
- 22-Jan-2026
- Completion Date
- 04-Nov-2025
Overview
ISO/PRF 20271-2, Document management - Reference model for long-term preservation of textual documents - Part 2: Fundamentals, is a draft International Standard developed by ISO/TC 171/SC 2. It defines the fundamental concepts necessary for the long-term preservation of textual documents, with a focus on the elements, components, and reference model structure that support sustainable document management and digital archiving.
As digital transformation accelerates, maintaining accessibility to textual documents across evolving technologies and platforms becomes increasingly challenging. This standard addresses the risks of obsolescence due to changing formats and technologies, providing a structured approach for organizations to preserve the integrity, structure, and meaning of digital textual documents over time.
Key Topics
Fundamental Concepts
The standard sets out the essential ideas behind long-term digital preservation, covering elements such as the layered reference model, definitions of document elements, and property classifications.
Multi-layered Reference Model
ISO/PRF 20271-2 introduces an abstract, multi-layered framework for understanding textual documents. The model breaks documents down by aspects such as visualization, content, metadata, and logical structure, providing clarity for both analysis and future-proofing.
Document Elements and Properties
The standard describes the variety of elements and property types that textual documents can contain, from simple text to complex layouts, images, tables, domain-specific notations, and more.
Abstraction for Evaluation and Development
By defining layers and property classes, the reference model makes it easier to evaluate current file formats for long-term preservation suitability and guides the design or enhancement of new document formats.
Recommendations for Preservation
Guidance is provided for identifying which document components and properties are most critical for maintaining document accessibility and meaning, regardless of the evolution of technologies or formats.
Applications
Digital Archiving and Records Management
The standard is a cornerstone for organizations tasked with archiving large volumes of digital textual documents, ensuring that records remain accessible and interpretable for decades.
Development of Document Formats
File format designers use the ISO/PRF 20271-2 reference model as a blueprint when creating or updating textual document formats, prioritizing the elements that support long-term reliability and interoperability.
Format Assessment and Migration Planning
IT professionals, archivists, and records managers leverage this standard when assessing legacy formats (DOCX, PDF, ODT, TXT, and more) for migration to more sustainable preservation formats.
Quality Assurance in Preservation Solutions
The standard provides a checklist for evaluating and improving the capability of technical solutions, such as digital repositories or document management systems, to support effective long-term preservation.
Non-technical Stakeholders
ISO/PRF 20271-2 aids users without deep technical format knowledge by outlining clear expectations and terminology for document preservation activities.
Related Standards
ISO 20271-1: Overview
Offers an overview of the ISO 20271 series, detailing roles, interrelationships, and the overall scope of reference models for long-term preservation.
ISO 20271-3: Implementation
Addresses how to practically implement reference model principles and recommendations in document management systems.
ISO 20271-4: Assessment
Focuses on evaluating and certifying the long-term preservation readiness of textual document formats and solutions.
ISO/IEC 26300: Open Document Format (ODF)
Standard for open document file formats widely used in digital archiving.
ISO 32000: PDF
Reference for the widely adopted Portable Document Format, including preservation subsets such as PDF/A.
Key terms:
long-term preservation, document management, textual documents, reference model, digital archiving, ISO 20271, metadata, file format evaluation, information lifecycle management, archiving standards, multi-layered abstraction
By following ISO/PRF 20271-2, organizations and stakeholders can confidently plan for the enduring accessibility and authenticity of their valuable digital textual content, supporting regulatory, legal, and business continuity requirements.
Frequently Asked Questions
ISO/PRF 20271-2 is a draft published by the International Organization for Standardization (ISO). Its full title is "Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals". This standard covers: This document defines the fundamentals of text documents for long-term preservation covering the concept, elements and components of text documents.
ISO/PRF 20271-2 is classified under the following ICS (International Classification for Standards) categories: 35.240.30 - IT applications in information, documentation and publishing; 37.080 - Document imaging applications. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
DRAFT International Standard
ISO/DIS 20271-2
ISO/TC 171/SC 2
Secretariat: ANSI
Voting begins on: 2025-04-22
Voting terminates on: 2025-07-15
Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals
ICS: 37.080; 35.240.30
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENTS AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
This document is circulated as received from the committee secretariat.
Reference number ISO/DIS 20271-2:2025(en)
© ISO 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Textual documents
5 Reference model for textual documents
5.1 Purpose
5.2 Applicability
5.3 Rationale
5.4 Multi-Layered reference model
5.4.1 Definition of the layers of the reference model
5.4.2 Definition of property types of each layer
5.4.3 Recommendations for assessing each layer
6 Target documents for applying reference model
6.1 Type of document for the reference model
6.2 Types of content included in textual document
6.2.1 Types of content
6.2.2 Text
6.2.3 Image
6.2.4 Table
6.2.5 Domain-specific notations
6.2.6 Review and Comment
6.2.7 Other Types
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent
rights identified during the development of the document will be in the Introduction and/or on the ISO list of
patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 171, Document Management Application,
Subcommittee SC 2, EDMS systems and authenticity of information.
A list of all parts in the ISO 20271 series can be found on the ISO website.
The ISO 20271 series consists of the following parts, under the general title Document management — Reference
model for long-term preservation of textual documents:
— Part 1: Overview
— Part 2: Fundamentals
— Part 3: Implementation
— Part 4: Assessment
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Over time, various file formats have been created and eventually phased out, leading to a situation
where files stored in obsolete formats become inaccessible. This occurs due to the disappearance of the
technologies and standards upon which these formats were based, coupled with inadequate preservation
efforts or updates to current technologies. As a result, digital files produced several decades ago are
rendered inaccessible, as there is no longer sufficient information about the file structure, making the data
unreadable or unanalysable. This issue, which affects files created many decades ago, has led to significant
discussions among countries and organisations on the long-term preservation of digital files, underlining its
importance as a critical issue in the field of digital archiving.
In the ISO 20271 series, this document defines a reference model for textual documents, which incorporates
multiple abstraction layers for technical analysis and quantitative evaluation. It specifies the definitions
for these layers and the categorisation of properties contained within them. When storing properties at
each layer according to specific file formats, the document establishes technical criteria to ensure that
appropriate measures are in place to address potential obsolescence of the preserved files. It defines
specific review targets within the file to assess the long-term preservation capability of storage formats for
digital documents. Furthermore, it provides guidance on evaluating the long-term preservation of storage
standards, designing new file formats for textual documents, and adding new properties to existing textual
document standards. This document also presents considerations to reference and address when
improving these standards.
This document supports the following activities:
— Format analysis activities for selecting and preparing the evaluation of formats for the long-term
preservation of textual document file formats.
— Technical activities for selecting design targets and performing structural design when developing new
textual document format specifications.
— Activities related to adding specific properties or making structural improvements to existing textual
document format specifications.
— Classification activities concerning textual document formats, including the addition of specific
properties or structural improvements to existing specifications.
For information related to other parts of the ISO 20271 series, the ISO 20271-1 document can be referred to.
ISO 20271-1 provides an overview of the roles, interrelationships between parts, and the scope of the entire
ISO 20271 series.
Document management — Reference model for long-term
preservation of textual documents —
Part 2:
Fundamentals
1 Scope
This document specifies the reference model for textual documents and provides detailed recommendations
necessary to support long-term preservation from various perspectives, based on the reference model.
ISO 20271-2 defines the fundamental concepts of the reference model for textual documents. This includes
the definitions of layers that make up the reference model, elements incorporated within textual documents,
property types, classifications of properties by type, and various properties inherent to textual documents.
Additionally, it defines the concepts and structure of a long-term preservation reference model for digital
documents, which can be applied to other types of documents beyond textual documents.
ISO 20271-2:
— defines textual documents and outlines the major content properties that constitute a textual document.
— provides the concepts of the reference model for textual documents, defines key elements included within
the textual documents, and outlines recommendations for enhancing long-term preservation based on
the reference model.
— provides guidelines for classifying various properties that can be included in textual documents as
outlined in ISO 20271-3 by reference model layers, along with examples of classification and guidelines
for enhancing long-term preservation post-classification.
ISO 20271-2 does not specify or recommend the following:
— specific technical methods for checking whether the properties exist within a specific textual
document or not.
— specific technical methods for analysing a textual document format such as DOC, DOCX, ODT, TXT, PDF, etc.
— specific metadata items for the long-term preservation of textual documents.
— required computer hardware or operating system.
— specific textual document file formats as suitable for long-term preservation.
— processes, procedures, or management practices associated with long-term preservation or
records management.
This document provides technical recommendations for organizations, individuals, and both public and
private entities involved in designing digital textual documents, assessing existing file formats, or enhancing
file format specifications. Its primary aim is to ensure that these documents remain technically interpretable
and understandable despite potential obsolescence, while accommodating various requirements and levels
of information. These recommendations are particularly valuable for users who are not fully acquainted
with the technical characteristics of file formats or the core content elements of textual documents.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
recommendations of this document. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org
3.1
ActiveX
deprecated software framework created by Microsoft that adapts its earlier Component Object Model (COM)
and Object Linking and Embedding (OLE) technologies for content downloaded from a network, particularly
from the World Wide Web
3.2
ASCII (American Standard Code for Information Interchange)
character encoding standard for electronic communication
3.3
ASMO708
7-bit character encoding standard specifically designed for Arabic text
3.4
Big5
character encoding standard used in Taiwan, Hong Kong, and Macau for traditional Chinese characters
3.5
DOCX
file format for Office Open XML word processing documents
3.6
Elements
components included in a textual document
3.7
EUC (Extended Unix Code)
multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese
characters
3.8
EUC-KR (Extended Unix Code for Korean)
8-bit character encoding, a variant of Extended Unix Code (EUC), that uses the Korean Industrial
Standards KS X 1001 and KS X 1003
Note 1 to entry: As it is a representative completed Korean encoding, it is commonly referred to as ‘Wansung’.
3.9
HWPX
file format for word processing documents based on Open Word Processor Markup Language
(OWPML), which is used by most public institutions in the Republic of Korea and designated as a permitted
format for the long-term preservation of official documents
3.10
Johap
encoding specification and Korean character set that served as industrial standards in South Korea during
the early 1990s
3.11
Kihon-Hanmen
“basic page format”, a term used in Japanese typesetting for the default layout of the type area of a page,
determined by properties such as text direction, character size, characters per line, and lines per page
3.12
LTR (Left to Right)
reading direction that starts from the left and moves to the right, as followed by languages such as English,
French, and Spanish
3.13
MathML (Mathematical Markup Language)
XML application for describing mathematical notation and capturing both its structure and content, one of
a number of mathematical markup languages
3.14
OLE (Object Linking and Embedding)
proprietary technology developed by Microsoft that allows embedding and linking to documents and
other objects
3.15
OOXML (Office Open XML)
file format developed by Microsoft for representing spreadsheets, charts, presentations and word
processing documents (ISO/IEC 29500)
3.16
ODT (Open Document Text)
file format for word processing documents in the Open Document Format
3.17
ODF (Open Document Format)
open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-
compressed XML files (ISO/IEC 26300)
3.18
OWPML (Open Word Processor Markup Language)
file format developed by Hancom Inc. in 2010 that follows the standard KS X 6101
3.19
Plug-In
software component that adds a specific feature to an existing computer program
3.20
PDF (Portable Document Format)
file format developed by Adobe in 1992 to present documents, including text formatting and images, in a
manner independent of application software, hardware, and operating systems (ISO 32000)
3.21
property
attribute, element (3.6), and other component found in textual documents, which is subject to long-term
preservation
3.22
raster image
image, used in computer graphics and digital photography, that represents a two-dimensional picture as a
rectangular matrix or grid of square pixels, viewable via a computer display, paper, or other display medium
3.23
rendering engine
software component responsible for converting document content (such as text, images, and formatting
instructions) into a visual or printable output on various devices, like screens or printers
Note 1 to entry: It interprets the document's code or format and displays it in a way that users can view or interact with.
3.24
RTF (Rich Text Format)
proprietary document file format with a published specification, developed by Microsoft Corporation
from 1987 until 2008 for cross-platform document interchange
3.25
RTL (Right to Left)
reading direction that starts from the right and moves to the left, as followed by languages such as Arabic,
Hebrew, and Persian
3.26
semantic information
properties of a textual document that encompass all meanings contained within the document, including
information that conveys the structural aspects of the textual document
3.27
Shift-JIS
character encoding standard for the Japanese language
Note 1 to entry: Originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft
and standardized as JIS X 0208 Appendix 1.
3.28
SVG (Scalable Vector Graphics)
XML-based vector image format for defining two-dimensional graphics, having support for interactivity and
animation
3.29
UNICODE
character encoding standard maintained by the Unicode Consortium designed to support the use of text
written in all of the world’s major writing systems
3.30
vector image
form of computer graphics in which visual images are created directly from geometric shapes defined on a
Cartesian plane, such as points, lines, curves, and polygons
3.31
XML (eXtensible Markup Language)
markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a
set of rules for encoding documents in a format that is both human-readable and machine-readable
4 Textual documents
A textual document is a document typically created, saved, or printed using various types of document
editing software. Such documents usually carry file extensions such as TXT, DOCX, Open Document
Text (ODT), Portable Document Format (PDF), Hangul Word Process XML (HWPX), TeX, HyperText Markup
Language (HTML) and more.
Textual documents can encompass a wide range of content, from simple text to multimedia elements like
images, videos, and audio, while supporting rich expressions through various styles, complex layouts, and
integration with external elements such as fonts.
Structurally, the content of a textual document ranges from the simplest type, which includes only text,
to types that combine multimedia elements such as images and videos with properties that enable rich
expression, such as different styles and complex layouts that depend on external elements such as fonts.
The reference model defined in this standard aims to provide a layered abstraction of technical information,
which helps break down the structure of textual documents. This breakdown facilitates the establishment
of evaluation criteria for long-term preservation and the categorization of documents. In practical
applications, this reference model may require additional layers beyond the five foundational abstract layers
initially identified for standard textual documents. These foundational layers typically include aspects
such as content, structure, presentation, interaction, and metadata. Additional layers may be necessary for
documents that differ significantly from textual documents, such as spreadsheets for storing or analysing
numerical data and presentation documents that incorporate dynamic features like animations. This
standard primarily focuses on text-centric documents that store, preserve, and deliver information conveyed
through text content, ranging from simple structured formats to those with complex layouts.
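The layered breakdown described above can be pictured as a simple data structure. The following is a minimal sketch, assuming only the five layer names listed in this clause; the `DocumentProfile` type, its methods, and the sample property names are hypothetical illustrations and are not defined by this document.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    """The five foundational layers named in the text above."""
    CONTENT = "content"
    STRUCTURE = "structure"
    PRESENTATION = "presentation"
    INTERACTION = "interaction"
    METADATA = "metadata"


@dataclass
class DocumentProfile:
    """Properties found in one document, grouped by layer (hypothetical type)."""
    name: str
    properties: dict[Layer, set[str]] = field(default_factory=dict)

    def add(self, layer: Layer, prop: str) -> None:
        """Record that the document carries the given property on a layer."""
        self.properties.setdefault(layer, set()).add(prop)

    def empty_layers(self) -> list[Layer]:
        """Return the layers for which no property has been recorded."""
        return [layer for layer in Layer if not self.properties.get(layer)]


# Hypothetical inventory of one document.
profile = DocumentProfile("report.docx")
profile.add(Layer.CONTENT, "paragraph text")
profile.add(Layer.STRUCTURE, "heading hierarchy")
profile.add(Layer.PRESENTATION, "embedded font")
profile.add(Layer.METADATA, "author")

print([layer.value for layer in profile.empty_layers()])  # ['interaction']
```

Grouping properties by layer in this way makes it straightforward to add further layers for document types such as spreadsheets or presentations, as the text anticipates.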
Figure 1 — Examples of textual documents
Figure 1 illustrates various types of textual documents. These documents can have the following layout
characteristics: a header, footer, and body aligned to fit a specific paper size; a body composed of several
paragraphs, each of which can be represented by one or more sections; and paragraphs constructed from
characters encoded in various standards (such as Unicode, American Standard Code for Information
Interchange (ASCII), Shift-JIS, Extended Unix Code for Korean (EUC-KR), and Big5), images, and tables,
together with various styles for decorating them.
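The dependence of paragraph text on its character encoding can be illustrated with a short sketch. It assumes Python's built-in codecs (`utf-8`, `euc_kr`, `ascii`, `shift_jis`); the sample text and the `try_decode` helper are hypothetical and serve only to show that stored bytes are interpretable only when the encoding is known.

```python
from typing import Optional

text = "한글 문서"  # Korean sample text, roughly "Hangul document"

# The same characters map to different byte sequences in each encoding.
for codec in ("utf-8", "euc_kr"):
    print(codec, text.encode(codec).hex(" "))

stored = text.encode("euc_kr")

# Decoding with the recorded encoding restores the original text.
assert stored.decode("euc_kr") == text


def try_decode(data: bytes, codec: str) -> Optional[str]:
    """Return the decoded text, or None when the bytes are invalid for codec."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


# A wrong guess either fails outright or yields unreadable text.
for guess in ("ascii", "utf-8", "shift_jis"):
    result = try_decode(stored, guess)
    print(guess, "->", "decode error" if result is None else repr(result))
```

A wrong guess that decodes without error is the more dangerous case, because the resulting text is silently corrupted rather than rejected.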
Figure 2 — Different types of text flow in a paragraph
These textual documents are not just digitized forms but also reflect the cultural characteristics of
the countries and regions where they are used. For example, in the West, lists are often used in
documentation, while in Asia, tables are often used to lay out documents, and paragraphs are sometimes
written vertically. As shown in Figure 2, in the Arabic script, the character flow of a paragraph can be RTL or LTR.
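The character flow of a paragraph can be estimated from its characters. The sketch below is a simplification, assuming the first-strong-character heuristic of the Unicode Bidirectional Algorithm and Python's standard `unicodedata` module; the `paragraph_direction` function is hypothetical, ignores explicit embeddings and isolates, and does not cover vertical writing.

```python
import unicodedata


def paragraph_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi == "L":  # Latin and other left-to-right letters
            return "ltr"
    return "ltr"  # assumed default when no strong character is present


english = "Long-term preservation"
arabic = "مستند"        # Arabic sample word, "document"
mixed = "2025 مستند"    # digits are not strongly directional

print(paragraph_direction(english))  # ltr
print(paragraph_direction(arabic))   # rtl
print(paragraph_direction(mixed))    # rtl
```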
Figure 3 — The diverse logical structures and layouts of textual documents.
As illustrated in Figure 3, a textual document can be a digital format that simply contains text content, but it
can also incorporate logical structures such as reading sequence and presentation order.
5 Reference model for textual documents
5.1 Purpose
This reference model is a fundamental framework that outlines the properties and recommendations for
each layer of textual documents that can be relevant to long-term preservation. It serves as a common basis
for understanding, analysing, and establishing criteria to assess the technical considerations necessary for
the long-term preservation of textual documents. The model enables the examination and determination
of long-term preservation viability for various formats with distinct technical foundations, promoting
consistent integration, interoperability, scalability, maintainability, and functionality among technical
tools and programs. However, it does not address specific technical or implementation details regarding the
analysis of individual file formats.
5.2 Applicability
This reference model is a useful tool for document management professionals and institutional
document management personnel to establish criteria for evaluating the long-term preservation of specific
file formats. It is recommended that software developers adopt these long-term preservation standards
to design a common structure and interface for textual documents. This will ensure compatibility and
interoperability between different technologies and systems. Additionally, reference markup is provided as
a valuable resource for developing new formats to represent textual documents or improve the long-term
preservation capabilities of existing formats.
5.3 Rationale
Textual documents can be encoded in a variety of formats, including plain-text documents, PDF, Open
Document Format (ODF), Office Open XML (OOXML), and more. The field of traditional archiving or
document management is currently undergoing technical reviews and international discussions regarding
methods to ensure the long-term preservation of digital content. Various solutions have been suggested and
implemented for the long-term conservation of different types of digital documents. These solutions include:
1) Preserving original formats:
Even if the original document is kept intact for a long time, there is a possibility that it may not be
compatible with the latest technology or that the file format can become outdated, leading to the
inability to access the document. It can also prove difficult to locate software capable of faithfully
viewing the content.
2) Virtualisation to preserve the original file format’s usage environment:
Virtualisation refers to creating a virtual copy of the original computing environment (e.g., hardware,
operating system, software, etc.) needed to access the document. This method allows future users to
access the document as if they were still using the original system, even if the technology has become
obsolete. The usage environment consists of the specific software and configurations required to
render or interact with the document. This option helps preserve access to the document by emulating
the original system, but it presents risks such as copyright infringement, high costs, and complex
management because the required software and operating systems must be maintained or copied for
preservation.
3) Storing textual documents in a standardized long-term preservation format (e.g., PDF/A):
This involves converting documents into formats specifically designed for long-term preservation,
such as PDF/A. While this is a widely accepted method, it does not guarantee that all original document
properties will be fully preserved, particularly in cases where documents contain unique elements (e.g.,
embedded media, dynamic elements) that may not convert well into the new format.
4) Storing textual documents in widely supported formats (which can be proprietary or non-standardized):
In this approach, documents are stored in formats that are currently widely supported, such as
proprietary formats (e.g., DOCX). This option carries the risk that these formats can become obsolete in
the future, but it provides the advantage of using formats that are currently accessible and supported by
various tools.
However, when documents are converted into a dedicated visualisation format (e.g., XPS, PDF/A) for
long-term preservation, there is no guarantee that all the information from the original document will be
preserved. This is because different document formats can have different properties and
conversion software can be limited. Therefore, solutions like virtualisation or migration can face potential
issues related to technical obsolescence, legal problems, or the loss of document fidelity.
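The loss of information during conversion can be made concrete by comparing the properties found in a document before and after migration. The following is only a minimal sketch and not part of this document: the property names and the `lost_properties` helper are hypothetical, and it assumes that some external tool has already produced the two inventories.

```python
def lost_properties(original: set[str], converted: set[str]) -> set[str]:
    """Return the properties present in the original but absent after conversion."""
    return original - converted


# Hypothetical inventories; a real tool would extract these from the files.
original_doc = {"text", "image", "hyperlink", "embedded_video", "tracked_changes"}
converted_doc = {"text", "image", "hyperlink"}

missing = lost_properties(original_doc, converted_doc)
print(sorted(missing))
```

An empty result would indicate that every inventoried property survived the conversion. It says nothing about properties the inventory tool itself cannot detect.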
Data collection is crucial for the development of many technologies related to data analysis, generative AI,
and big data. Most of this data is either numerical or text-based, and the text information can originate from
textual documents. Therefore, it is essential to preserve documents for an extended period to facilitate the
training of AI models. This can be done by retaining the characteristics and semantic
information of the original documents (for example, DOCX, ODT, HWPX, HTML) when converting textual
documents into a dedicated format for long-term preservation.
ISO/DIS 20271-2:2025(en)
At present, there is no standard definition or reference model that identifies the types of content or
structural information of textual documents that can be required for long-term preservation. This makes it
difficult to conduct technical analysis based on each document. In the document management field, where
technical analysis and information on textual documents can be limited, it can be challenging to define
long-term preservation strategies or evaluation conditions for such documents.
It is important to establish preservation strategies for all types of digital content, even those that are difficult
to quantitatively assess, such as inclusion of:
— text within the document
— graphics, such as graphs and charts
— audio and video clips
— hyperlinks and metadata information
— semantics of the original document
— digital signature authentication information
— binary data (closed stream format)
This reference model and its recommendations provide a clear set of criteria for evaluating the long-
term preservation of textual documents. These criteria can be used for both quantitative and qualitative
assessments, and even individuals without technical training can conduct technical analyses using them.
This model can serve as a guideline for identifying a more detailed and quantifiable set of criteria that can
be applied to all textual documents commonly defined in the field of archiving or document management.
The model includes technical recommendations for evaluating the documents, such as contextual information
support, complexity, interoperability, viability, and reusability.
This abstract Reference Model for textual documents helps analyse the characteristics of documents stored
in different file formats. Technical evaluation subjects are chosen to improve long-term preservation. Reliable
evaluation methods and tools need standardized recommendations and guidelines for their development.
5.4 Multi-Layered reference model
This standard specifies the reference model for textual documents and the recommendations for their long-
term preservation. Textual documents can vary in structure from being very simple to highly complex. In
order to classify textual documents, an abstract reference model is used that categorizes them based on
their visual, descriptive, logical, physical, and content characteristics. For each of these characteristics,
layers are constructed, and properties are mapped to the corresponding reference model. The reference
model for the textual document is defined as a Multi-Layered Reference Model consisting of five layers. The
complete structure of this reference model is illustrated in Figure 2.
Figure 5 — The reference model of textual documents
5.4.1 Definition of the layers of the reference model
5.4.1.1 Visualisation layer
The reference model consists of five layers, with the visualisation layer being the first. The visualisation
layer is defined as a layer that includes the properties necessary for visually representing the document.
Textual documents contain text and related information, ranging from simple to complex forms. The
display of this content on the screen is what the visualisation layer covers among the various layers
that make up the reference model. The content information included in a textual document is expressed
through a series of implemented procedures known as rendering, and the properties related to
rendering are classified into the visualisation layer.
Textual documents can be visualised through the visualisation layer, targeting static content elements such
as text and images, style information including fonts and layout, as well as dynamic elements such as video.
In the case of plain-text documents with no additional visualisation properties such as font face information
and layout style, the visualisation information can depend on the platform or program used to visualise the
document.
Within the reference model, each unit of information is called a property. A property can have characteristics
from multiple layers of the reference model. The properties belonging to the visualisation layer are called
visualisation properties. Depending on their complexity or implementation method, these visualisation
properties can be displayed differently. In some cases, the characteristics of other layers are maintained,
while in other cases, they are not. Whether the original characteristics are preserved affects not only
long-term preservation but also the compatibility of the document.
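The idea that a property can carry characteristics of more than one layer can be sketched as a small data structure. This is an illustrative assumption rather than a structure defined by this document: the `Layer` and `Property` names, the `by_layer` helper, and the example layer assignments are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The five layers of the reference model."""
    VISUALISATION = "visualisation"
    CONTENT = "content"
    METADATA = "metadata"
    SEMANTICS = "semantics"
    PACKAGE = "package"


@dataclass(frozen=True)
class Property:
    name: str
    layers: frozenset  # a property can belong to more than one layer


def by_layer(properties, layer):
    """Return the names of the properties that carry characteristics of `layer`."""
    return [p.name for p in properties if layer in p.layers]


# Illustrative assignments only; the mapping here is an assumption.
props = [
    Property("font", frozenset({Layer.VISUALISATION})),
    Property("text", frozenset({Layer.CONTENT, Layer.VISUALISATION})),
    Property("caption", frozenset({Layer.SEMANTICS, Layer.VISUALISATION})),
    Property("summary", frozenset({Layer.METADATA})),
]

print(by_layer(props, Layer.VISUALISATION))
```

Modelling the layers as a set on each property, rather than as a single field, lets an evaluation ask which layers a given format keeps for the same property.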
Based on the textual documents reference model, the visualisation layer properties can be used to assess
the accuracy and reliability of visualisation, such as whether the visualisation varies depending on the system or
application used. For example, using an image format can preserve the visual appearance accurately, but
to the detriment of other layers, while other formats can be less visually precise if they support dynamic
reflow or re-layout.
5.4.1.2 Content layer
The second layer of the reference model, the content layer, is defined as the layer responsible for representing
the intrinsic content elements contained within the document.
Textual documents can include various types of information, ranging from basic text to images, videos,
sounds, charts, and more. The content layer includes properties that represent the crucial information for
all types of textual documents. The basic information included in the content layer may be in a standardized
format for each type or may be in a non-standardized format.
The content layer plays a critical role in representing the core elements of a document. Among these, the
text property, as the most fundamental aspect of textual documents, holds the highest importance. Other
content properties, depending on their relevance, can also be prioritized based on the specific use case or
requirements. Regardless of the format, it is essential that the properties representing the content itself are
preserved without any loss, ensuring the document's integrity and meaning remain intact.
The properties in the content layer may or may not be visually represented.
5.4.1.3 Metadata layer
The third layer in the reference model is the metadata layer, which encompasses a range of metadata within
textual documents. The metadata layer is defined to represent information classified as metadata included
within the document. This layer plays a crucial role in providing context, structure, and additional details
that enhance the understanding and management of the document's content.
This layer includes properties that do not directly influence the visualisation or the content of the document
and are therefore not included in either the visualisation layer or the content layer.
Conversely, visualisation properties or content properties do not belong to the metadata layer. The metadata
layer is associated with additional information contained within the document, which is essential for
providing context and enhancing the usability of the document. Metadata properties may include, but are
not limited to, document summary information, fields within the document, alternative text for accessibility
support, document change tracking information, and notes, depending on the specific needs of the document.
5.4.1.4 Semantics layer
The fourth layer of the Reference Model is the semantics layer, which is defined to contain structured content
that expresses semantic information and conveys meaning within the document.
The semantics layer may not be present in simple textual documents that contain only basic content
information, such as plain text or images. However, when textual documents include properties that
carry structural information (such as paragraphs, lists, headers, table titles, and figure
titles), those properties are included in the semantics layer. Additionally, the properties within the
semantics layer can be applied to the visualisation layer.
If properties within the semantics layer are omitted, it can result in discrepancies between the visualised
part of the document and the original document. This can lead to alterations or omissions in the logical
structure and contextual representation of the original document.
The semantics layer can include properties such as information distinguishing paragraphs, headers, footers,
the flow order of document content, captions for images or tables, automatically assigned paragraph
numbers, table of contents information, footnotes, endnotes, and other related properties.
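What is lost when semantics properties are omitted can be illustrated with HTML, one of the formats mentioned earlier. The sketch below is only an assumption-laden example: the `StructureCollector` class and the choice of tags treated as structural are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Tags treated here as carrying structural meaning (an illustrative choice).
STRUCTURAL_TAGS = {"h1", "h2", "p", "li", "caption", "figcaption"}


class StructureCollector(HTMLParser):
    """Collect (tag, text) pairs for structural elements in document order."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            self._current = tag

    def handle_data(self, data):
        if self._current and data.strip():
            self.items.append((self._current, data.strip()))

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


html = "<h1>Report</h1><p>Intro text.</p><ul><li>First</li><li>Second</li></ul>"
collector = StructureCollector()
collector.feed(html)
collector.close()

structured = collector.items  # keeps the role of each piece of text
flattened = " ".join(text for _, text in structured)  # roles are discarded
print(structured)
print(flattened)
```

The flattened string still contains every word, but a reader can no longer tell which part was a header, a paragraph, or a list item.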
5.4.1.5 Package layer
The fifth layer of the reference model is the package layer, which is defined to represent the method and all
related properties for converting textual documents into data streams and storing them in physical storage,
along with any associated information. This layer treats document data as data streams and ensures that
the necessary information for storage and retrieval is included.
However, files embedded within textual documents are treated as stream objects, and the specific
storage methods related to their individual formats are not directly managed.
A key challenge related to the package layer is the potential risk of data loss when storing digital documents
on physical storage devices that use non-standard or proprietary formats. This can lead to difficulties in
interpreting the respective data streams, as well as potential issues with the storage medium itself. The
package layer can include additional information to address structural responses to errors in the storage
medium or compatibility with evolving technologies that can make data stream interpretation difficult.
Moreover, compression techniques used to bundle multiple files into a single physical unit during the
construction of textual documents are also managed within the package layer.
The package layer can incorporate properties defined by separate, independent specifications or standards,
potentially with more ease than the visualisation, content, semantics, and metadata layers.
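As a rough illustration of package-layer concerns, the sketch below bundles several streams into one compressed unit and then checks their integrity. It assumes a ZIP container, which formats such as DOCX and ODT use, although this document does not prescribe any container. The entry names are hypothetical, and only Python's standard `zipfile` module is used.

```python
import io
import zipfile

# Build a small package in memory; the entry names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
    package.writestr("content/document.xml", "<doc><p>Hello</p></doc>")
    package.writestr("meta/summary.xml", "<meta><title>Example</title></meta>")
    package.writestr("media/image1.bin", bytes(range(16)))

# Reopen the package and inspect it as a set of data streams.
with zipfile.ZipFile(buffer) as package:
    first_bad = package.testzip()  # None when every stream passes its CRC check
    entries = [
        (info.filename, info.compress_type == zipfile.ZIP_DEFLATED, info.file_size)
        for info in package.infolist()
    ]

print(first_bad)
for name, deflated, size in entries:
    print(name, deflated, size)
```

The embedded binary entry is handled purely as a stream here: the package records its name, size, and checksum without interpreting its internal format.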
5.4.2 Definition of property types of each layer
5.4.2.1 Visualisation property
Visualisation properties are those properties that affect the visualisation of textual documents. These
properties can possess multiple characteristics, as they can affect or apply to multiple layers. They are
related to visual information, and even if visual characteristic information is hidden by the user or other
elements and is not visually observable, it should still be preserved. Such properties can occupy space within
the document and influence the style properties in which documents or information are represented, making
them part of the visualisation properties.
Visual properties that are used to represent or display content on a document include style properties like
font, margin, colour, and text decoration. In addition, content properties like text, image and table can also be
considered as visualisation properties in some formats as they directly contribute to the visual appearance
of the document on a screen.
5.4.2.2 Content property
Content properties are the properties that represent the informational content of textual documents.
These properties can have inter-related characteristics and can be applied to multiple layers. Content
properties may or may not be displayed when the document is visualised. A type of content that is commonly
used in documents is text. This can be in the form of a publicly standardized code format such as Unicode
or ASCII, or a code system format used in specific platforms or countries such as CP949, Windows-874,
and Korean-Johap. The image property, another common information type in textual documents, can be
implemented in various formats like BMP, JPG, PNG, and GIF.
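The effect of the code system on the text property can be shown with a short sketch. It is only an illustration: the sample string is arbitrary, and it assumes that Python's built-in `cp949` and `johab` codecs correspond to the CP949 and Korean-Johap code systems named above.

```python
text = "한글 문서"

# Encode the same characters in three different code systems.
utf8_bytes = text.encode("utf-8")
cp949_bytes = text.encode("cp949")
johab_bytes = text.encode("johab")

# The stored byte sequences differ even though the text is identical.
distinct_encodings = len({utf8_bytes, cp949_bytes, johab_bytes})

# Decoding with the correct codec recovers the original text.
round_trip_ok = cp949_bytes.decode("cp949") == text

# Decoding with the wrong codec either fails or yields different text.
try:
    wrong = cp949_bytes.decode("utf-8")
except UnicodeDecodeError:
    wrong = None

print(distinct_encodings, round_trip_ok, wrong == text)
```

This is why a preservation workflow needs to record which code system a text stream uses: the bytes alone do not identify it reliably.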
When documents are implemented as a format, it’s common to distinguish between style properties
responsible for visualisation and content properties representing the content itself. Hence, preserving
properties related to these two layers can play
...



