Digital publishing — EPUB3 preservation — Part 1: Principles

The ISO/IEC TS 22424 series supports long-term preservation of EPUB publications via a dual strategy. This document considers EPUB features from a long-term preservation point of view. Some EPUB features are forbidden and others are required, depending on how they affect long-term preservation. EPUB publications constructed according to these guidelines are suitable for preservation. ISO/IEC TS 22424-2 makes EPUB compliant with the Open Archival Information System (OAIS) model and the current practices of OAIS archives.

Publications numériques — EPUB3 preservation — Partie 1: Principes

General Information

Status: Published
Publication Date: 28-Jan-2020
Current Stage: 9093 - International Standard confirmed
Completion Date: 15-Sep-2023

Technical specification
ISO/IEC TS 22424-1:2020 - Digital publishing -- EPUB3 preservation
English language
25 pages

Standards Content (Sample)

TECHNICAL SPECIFICATION
ISO/IEC TS 22424-1
First edition
2020-01
Digital publishing — EPUB3 preservation — Part 1: Principles
Publications numériques — EPUB3 preservation — Partie 1: Principes
Reference number: ISO/IEC TS 22424-1:2020(E)
© ISO/IEC 2020


COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Packaging standards
6 Construction of OAIS information packages
6.1 Overview
6.2 General principles
6.2.1 EPUB publications shall be sent to a repository system as well-formed and complete submission information packages (SIPs)
6.2.2 Regardless of its type or format, it shall be possible to include any data or metadata in SIPs
6.2.3 It should be possible to transfer SIPs by any means, methods, or tools from the submitting organization to the repository system
6.2.4 The archive shall have a way to verify the identity of the submitting organization/person, no matter how the information packages are transferred
6.2.5 There is no 1:1 relation between OAIS information packages
6.2.6 A SIP may contain 0-n EPUB 3 publications, and one EPUB 3 publication may be submitted to the repository system in 1-n SIPs
6.2.7 The information package type (in this case, SIP) shall be indicated
6.2.8 SIP packaging method shall not restrict the application of any preservation method
6.2.9 The packaging method shall not limit the size of the SIP
6.3 Identification of information packages and their content
6.3.1 It shall be possible to identify any SIP uniquely both during and after the ingest process
6.3.2 Information objects (EPUB publications, PREMIS preservation metadata record, etc.) within SIPs shall be identified uniquely and persistently
6.3.3 EPUB Fragment Identifiers should not be used in EPUB publications sent to a repository system, unless the submission agreement explicitly allows their use
6.4 Structure of information packages
6.5 Generic Information package metadata
6.5.1 Metadata in information packages shall be based on standards
6.5.2 Metadata should allow (automatic) validation of the structure and content of SIPs in terms of integrity, fixity, and syntax
6.5.3 It shall be possible to edit metadata in information packages
Annex A (informative) EPUB and digital preservation: issues and recommendations
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that
are members of ISO or IEC participate in the development of International Standards through
technical committees established by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see http://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 34, Document description and processing languages.
A list of all parts in the ISO/IEC TS 22424 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
0.1 General
This document facilitates the long-term preservation of EPUB publications by specifying, at a general level,
the EPUB features which are mandatory for long-term preservation (such as font embedding) and the features
which should be avoided if possible.
This document can be seen as a stepping stone towards a detailed specification which would be related
to EPUB in the same way as PDF/A, specified in ISO 19005-1 to ISO 19005-3, is related to the Portable
Document Format (PDF). If and when the EPUB community develops detailed guidelines for the
production of archivable EPUB publications, this document could be used as one of the starting points.
Long-term preservation in general requires two things:
— making the object, such as an EPUB publication, fit for preservation, including features to be used and
features to avoid;
— packaging the object (and any metadata related to it) together with any additional data such as
other versions of the object and other documentation into an Open Archival Information System
(OAIS) submission information package (SIP).
Packaging is covered in ISO/IEC TS 22424-2.
0.2 EPUB
The EPUB standard
defines a distribution and interchange format for digital publications and documents. The EPUB® format
provides a means of representing, packaging and encoding structured and semantically enhanced Web
content — including HTML, CSS, SVG and other resources — for distribution in a single-file container [17].
The EPUB format was developed by the International Digital Publishing Forum, IDPF, which merged with
the World Wide Web Consortium, W3C, in January 2017. Ongoing technical development of the standard,
related extension specifications and ancillary deliverables is the responsibility of the W3C EPUB 3
Community Group (footnote 1), which published its charter in February 2017. According to the charter,
work on any future major revision of EPUB, e.g. an EPUB 4, is initially out of scope on the presumption that
this will be taken up by a new W3C WG as a W3C Recommendation Track activity. The EPUB 3 CG will
coordinate its work with such new WG, and meanwhile with the existing W3C Digital Publishing Interest
Group (DPUB IG) [23].
The International Digital Publishing Forum, IDPF, ceased operations as a membership organization
in January 2017, and its website (footnote 2) is now an archive. The latest version of the standard and information
about future EPUB developments are available at the Publishing@W3C webpage, https://www.w3.org/publishing/.
The specification at hand covers EPUB 3 versions up to EPUB 3.0.1 (footnote 3). EPUB 3.1 (footnote 4) was the first major
revision of EPUB 3.0.1, but there are no implementations of version 3.1 and therefore it is not covered
in this document. The most widely used version of the standard is still 3.0.1. EPUB 3.2 was published in
May 2019 (footnote 5). Unlike 3.1, it is fully backwards compatible with 3.0.1. It will be covered in the next edition
of this document.
1) https://www.w3.org/publishing/groups/epub3-cg/
2) http://idpf.org/
3) http://idpf.org/epub/301
4) https://www.w3.org/Submission/epub31/
5) https://w3c.github.io/publ-epub-revision/epub32/spec/epub-spec.html

Differences between EPUB specifications 2.0.1 to 3.2 are well documented:
— EPUB 3 Changes from EPUB 2.0.1 (footnote 6)
— EPUB 3.0.1 Changes from EPUB 3.0 (footnote 7)
— EPUB 3.2 Changes from EPUB 3.0.1 (footnote 8)
All EPUB specifications are available on the Web: 2.0.1 at http://idpf.org/epub/201, EPUB 3.0.1 at
http://idpf.org/epub/301 and 3.2 at https://w3c.github.io/publ-epub-revision/epub32/spec/epub-spec.html.
All EPUB publications, including ones using version 3.2, can be validated using EPUBCheck version
4.2.0, which was released in March 2019.
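EPUBCheck performs the full conformance check. As a much narrower illustration of the kind of container-level rule such validation involves, the following Python sketch tests only two Open Container Format basics: that the ZIP archive begins with an uncompressed mimetype entry containing application/epub+zip, and that META-INF/container.xml is present. The function name container_problems is hypothetical and is not part of EPUBCheck; the sketch is not a substitute for running the validator.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def container_problems(epub):
    """Return a list of container-level problems (empty if none were found).

    `epub` is a file path or a binary file-like object.
    Hypothetical helper for illustration; it covers only two OCF rules.
    """
    problems = []
    with zipfile.ZipFile(epub) as zf:
        infos = zf.infolist()
        # The mimetype file has to be the first entry in the archive.
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        else:
            # It has to be stored without compression ...
            if infos[0].compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype entry is compressed")
            # ... and contain exactly the EPUB media type string.
            if zf.read("mimetype") != EPUB_MIMETYPE:
                problems.append("mimetype content is not application/epub+zip")
        # container.xml points to the package document(s).
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems
```

An archive could run a quick check of this kind before handing a file to EPUBCheck, so that files which are not EPUB containers at all are rejected early.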
From a long-term preservation point of view, lack of backward compatibility between successive versions
of a file format would be a problem because it makes migration more challenging. In addition, EPUB
3.1 has at least one feature which would have been problematic. In EPUB 3.1 foreign resources do not
require fallbacks if they are not in the spine and not embedded in EPUB Content Documents. In EPUB
3.0.1, a fallback guarantees that there is a version of the document that can be rendered; in 3.1 such a
guarantee no longer exists.
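The fallback rule can be illustrated with a small Python sketch that reads a package document and lists spine items that are not EPUB Content Documents and carry no fallback attribute. It rests on assumptions: the function name is hypothetical, only XHTML and SVG are treated as Content Documents, and the sketch does not follow a fallback chain to its end, which a real validator such as EPUBCheck would do.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Assumption: media types of EPUB Content Documents (XHTML and SVG).
CONTENT_DOCUMENT_TYPES = {"application/xhtml+xml", "image/svg+xml"}


def spine_items_without_fallback(opf_xml):
    """Return ids of spine items that are neither Content Documents
    nor declare a `fallback` attribute in the manifest.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(opf_xml)
    # Index manifest items by id so spine references can be resolved.
    manifest = {
        item.get("id"): item
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    missing = []
    for itemref in root.findall("opf:spine/opf:itemref", OPF_NS):
        item = manifest.get(itemref.get("idref"))
        if item is None:
            continue  # dangling idref; out of scope for this sketch
        is_content_doc = item.get("media-type") in CONTENT_DOCUMENT_TYPES
        if not is_content_doc and item.get("fallback") is None:
            missing.append(item.get("id"))
    return missing
```

A non-empty result points to resources that a reading system may be unable to render, which is the situation the fallback mechanism is meant to prevent.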
EPUB 3.0.1 was prepared by the IDPF. It consists of six interlinked documents:
— EPUB 3 Overview
— Publications 3.0.1
— Canonical fragment identifiers
— Content documents 3.0.1
— Media overlays 3.0.1
— Open Container Format 3.0.1
There are several extension specifications to these EPUB base standards. The list below is incomplete,
as it contains mainly specifications that are relevant from the long-term preservation point of view.
Some of them are still drafts:
— EPUB Accessibility specification 1.0 (footnote 9) addresses evaluation and certification of accessible EPUB
publications, and discovery of the accessible qualities in such publications.
— EPUB Previews 1.0 (footnote 10) describes how content previews can be included in EPUB publications.
— EPUB Distributable Objects 1.0 (footnote 11) is a draft specification that defines a method for the encapsulation,
transportation, and integration of distributable objects in EPUB publications.
— EPUB Scriptable Components 1.0 (footnote 12) provides an interoperable publish and subscribe (pubsub)
pattern by which interactive content can be created and incorporated into EPUB publications. Like
EPUB Distributable Objects, it is a draft as of 2019-05-13.
— EPUB Scriptable Components Packaging and Integration 1.0 (footnote 13) is a draft that defines a method for
the creation and inclusion of dynamic and interactive components in EPUB publications.
— EPUB Multiple-Rendition Publications 1.0 (footnote 14) defines the creation and rendering of EPUB publications
consisting of more than one rendition of the same publication.
— EPUB Dictionaries and Glossaries 1.0 (footnote 15) provides a means for expressing dictionary and glossary
semantics in EPUB publications.
6) http://www.idpf.org/epub/30/spec/epub30-changes-20111011.html
7) http://www.idpf.org/epub/301/spec/epub-changes-20140626.html
8) https://w3c.github.io/publ-epub-revision/epub32/spec/epub-changes.html
9) http://www.idpf.org/epub/a11y/accessibility.html
10) http://www.idpf.org/epub/previews/epub-previews-20150826.html
11) http://www.idpf.org/epub/do/
12) http://www.idpf.org/epub/sc/api/
These extensions are not widely used and they have not been explicitly taken into account in this
document. As regards accessibility, all EPUB publications are supposed to be accessible. However,
accessibility features as such do not have an impact on long-term preservation of EPUB publications and
therefore this document does not set accessibility-related requirements.
EPUB 3 core media types are listed at https://www.w3.org/publishing/epub3/epub-spec.html#sec-core-media-types.
As of 2019-05-13, the latest change was made on April 1, 2018. Starting
from EPUB 3.2, core media types are part of the standard.
In 2014, EPUB 3.0 specifications were republished as ISO/IEC TS 30135-1 to ISO/IEC TS 30135-6. Each
of these six ISO specifications is identical to its IDPF equivalent, for example ISO/IEC TS 30135-1 has
exactly the same content as the EPUB 3.0 Overview.
ISO/IEC TS 30135-7, entitled "Part 7: EPUB3 Fixed-Layout Documents", is from EPUB 3.0.1 (EPUB 3.0
does not have a fixed-layout specification). ISO/IEC TS 30135 (all parts) is therefore a combination of
EPUB 3.0 and the Fixed-Layout Documents specification from 3.0.1.
ISO/IEC JTC 1/SC 34 is currently updating the ISO standard to fully match version 3.0.1.
EPUB is a rich document format with a lot of features. From the digital preservation point of view this
is a challenge, not least because long-term preservation has not been a priority in the development of
the standard. Preserving all aspects and features of EPUB publications may be difficult, since there are
features which are difficult to preserve. Moreover, EPUB reading systems usually do not support all
features of the specification and finding tools supporting rare features can be difficult.
In spite of these challenges EPUB is generally regarded as a suitable format for digital archiving. For
instance, the Finnish National Digital Library initiative has selected just eight archivable file formats
for text, EPUB being one of them. The selection criteria were openness/transparency, adoption as a
preservation standard, degree of forward/backward compatibility, degree of protection against file
corruption, frequency of version releases, dependencies/interoperability, and standardization. EPUB
received an A, the best grade, on every criterion except the second and third. For those, the
grade was the second best, a B (see Reference [19], p.40). Based on these generic criteria, EPUB seems
to provide a good basis for long-term preservation, although additional guidelines on how to use the
standard are needed to guarantee EPUB files can be preserved efficiently.
The British Library’s Digital Preservation Team has published an assessment of EPUB as a preservation
format [15]. It covers EPUB versions 3.0.1 and 2, and the overall view of EPUB is positive (Reference
[15], p.2):
EPUB 3 is currently the closest thing available to an open standard for e-books. In 2013, Bläsi and Rothlauf
concluded that EPUB 3 had the “highest expressive power” of all formats in the e-book ecosystem, and that
it included the superset of all features used in proprietary formats like KF8, Fixed Layout EPUB, and iBooks.
EPUB long-term preservation issues uncovered in the assessment of the British Library are discussed
in Annex A.
EPUB is enjoying reasonable support in the e-book market. Many suppliers, publishers, and application
developers who have supported EPUB 2 have implemented version 3.0.1. According to the EPUBTest web
site (footnote 16), EPUB 3 support in reading systems is far from exhaustive, but market coverage is good: in January
2018, there were 59 reading systems supporting at least some of the features specified in EPUB 3.0.
13) http://www.idpf.org/epub/sc/pkg/
14) http://www.idpf.org/epub/renditions/multiple/
15) http://www.idpf.org/epub/dict/
E-book suppliers have produced EPUB 3 based formats that incorporate digital rights management
(DRM), and EPUB modifications that may restrict use of the format to the suppliers’ own
platforms. For example, the Kindle Fire eReader, released in 2015, uses a new format called Kindle
Format 8 (KF8), which is partly based on EPUB 3, with Amazon’s DRM. See Reference [15], 3. Publisher/
supplier specific DRM often restricts the use of e-books to that publisher’s/supplier’s rendering devices
and/or applications, and is therefore a major obstacle to digital preservation (see Reference [15], p.7).
The EPUB specification does not enforce a particular digital rights management scheme, but DRM may
be layered on top of the EPUB specifications. A producer can, for instance, use one of the three major
rights management systems in the market (Amazon DRM, Apple FairPlay DRM for books bought from
iBooks, and Adobe DRM), or some other DRM system along with some additional platform-targeting.
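Because DRM is layered on top of the container, an archive can only look for hints of it. The Python sketch below, using a hypothetical function name, reports whether the optional OCF files META-INF/encryption.xml and META-INF/rights.xml are present. This is a heuristic under stated assumptions: encryption.xml is also used for font obfuscation, so its presence does not prove that DRM is applied, and its absence does not prove that a file is unprotected.

```python
import zipfile

# Optional OCF files that protection schemes commonly rely on.
PROTECTION_FILES = ("META-INF/encryption.xml", "META-INF/rights.xml")


def protection_hints(epub):
    """Return the protection-related container files found in the EPUB.

    `epub` is a file path or a binary file-like object.
    Hypothetical helper; a hit is a reason to inspect the file,
    not proof that DRM is present.
    """
    with zipfile.ZipFile(epub) as zf:
        names = set(zf.namelist())
    return [name for name in PROTECTION_FILES if name in names]
```

A pre-ingest workflow might route publications with a non-empty result to manual review before accepting them into a SIP.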
DRM protection should be removed from EPUB publications during pre-ingest by the producer or as a
part of the ingest process by the OAIS archive. In practice, only national libraries may be able to do this,
provided that a legal deposit act and/or copyright act grants them such a privilege. If migration is the
chosen preservation strategy, existing EPUB publications will be converted into more modern EPUB
versions when rendering tools for old versions are no longer available, and (eventually) migrated into
other formats.
If preserved EPUB publications are not directly accessible by the public, removing DRM, digital
watermarking, and other protection mechanisms from the archived documents is not a risk. When
publications are delivered to the customers as dissemination information packages (DIPs), the archive
shall use a combination of administrative and technical means to protect the documents as required in
the submission agreement. These means may include adding a DRM protection mechanism to the DIP
delivered to the user according to the requirements of the submission agreement. The agreement may
also specify the customers the archive is entitled to serve; for instance, it is possible to require that the
preserved documents can only be disseminated to the producer, and the producer will serve the
end-users who do not have direct access to the OAIS archive.
0.3 Digital preservation
The information society is dependent on successful long-term digital preservation. When an increasing
percentage of information is produced and published only in a digital format, it is important to make
sure that this information remains available in the distant future.
Digital preservation is not about preserving just bits, but about preserving access. The “business logic”
is as follows:
— we need software and hardware to render content for human users;
— software changes over time; there are new versions of old applications, and entirely new
applications;
— new or updated applications may not be able to render outdated file formats or format versions
correctly;
— digital preservation makes an effort to have all archived content in stable formats. Publications
should also contain as few features as possible that are not commonly supported
by the software packages used to render content in these formats, and should avoid links to
external resources, since long-term access to the publication then also requires the persistence of
these external resources;
— when necessary, data in old formats may be migrated into more modern formats or updated versions
of the same format. For instance, an e-book in EPUB 3.0.1 format may be migrated to EPUB 3.2
when version 3.0.1 is no longer widely supported by reading systems;
— since the aim is to preserve the content, not the bits, the bits may change as a result of version
updates and format migrations;
— many OAIS archives preserve successive versions of archived publications, because migration may
change the look and feel of the original document, or even its intellectual content.
16) http://epubtest.org/results
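The point about external resources in the list above can be checked mechanically. The following Python sketch scans the content documents in an EPUB container for href and src attributes that point to http or https URLs. The function name is hypothetical, the file-extension filter is an assumption, and a regular expression is used instead of a full XML parser, so references in CSS or in scripts are not detected.

```python
import re
import zipfile

# Matches href="http(s)://..." and src="http(s)://..." (also xlink:href).
REMOTE_REF = re.compile(
    r'''(?:href|src)\s*=\s*["'](https?://[^"']+)["']''', re.IGNORECASE
)

# Assumption: content documents are recognised by file extension.
CONTENT_EXTENSIONS = (".xhtml", ".html", ".svg")


def remote_references(epub):
    """Map each content document to the remote URLs it references.

    `epub` is a file path or a binary file-like object.
    Hypothetical helper; documents without remote URLs are omitted.
    """
    found = {}
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(CONTENT_EXTENSIONS):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            urls = REMOTE_REF.findall(text)
            if urls:
                found[name] = urls
    return found
```

Each URL reported this way is a dependency whose persistence the archive cannot control, so it is a candidate for discussion in the submission agreement.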
In many countries, national libraries are responsible for preserving the published cultural heritage for
future generations, while national archives take care of governmental publications, irrespective of
the format in which they are available. All of these resources have to be preserved for decades, even
centuries. Publishers, in turn, may guarantee continuous access to the subscribers of electronic serials
and other licensed content. If so, either the publisher or a third party should look after the
publications and make sure they remain accessible or at least available.
Ordinary digital asset management systems are not suitable for long-term preservation; therefore it is a
normal practice to separate short-term and long-term information management into different systems.
However, this does not mean that digital archiving is independent of the routine life cycle of documents.
Digital preservation is a long process that begins when publications are created.
Preservation metadata, which allows the publication to be found, rendered and authenticated
correctly, is a prerequisite for digital preservation. Some preservation metadata elements can or
should be provided by the original creator of the publication. It is also important to keep preservation
requirements in mind when preparing a publication, if it is known that it has to be preserved for a long
time. Any feature in a file format can be either essential, useful, neutral, questionable, or even downright
counterproductive from a long-term preservation point of view. However, publishers are likely to use
the features that let them achieve their own goals, and preservation may not be among them.
There are archivable versions of some file formats. PDF/A (ISO 19005-1:2005) is probably the best known
example. It specifies how to use PDF for long-term preservation. An example of a counterproductive
feature for preservation in PDF is font referencing; therefore in PDF/A all fonts shall be embedded in
order to guarantee that the document can be rendered correctly.
PDF/A also forbids the use of encryption, because encryption is generally regarded as a risk for
long-term preservation. But storing unencrypted documents is a risk as well, because if they are stolen,
non-authorized usage is easy. Therefore, according to the Digital preservation handbook [25]:
Information security methods such as encryption add to the complexity of the preservation process and should
be avoided if possible for archival copies. Other security approaches may therefore need to be more rigorously
applied for sensitive unencrypted files; these might include restricting access to locked-down terminals in
controlled locations (secure rooms), or strong user authentication requirements for remote access.
In order to guarantee the correct processing of PDF/A files, there are specific requirements for PDF/A
reading systems, such as support for embedded fonts. There are three versions of the specification:
PDF/A-1 is based on PDF 1.4, PDF/A-2 adds features from PDF 1.5, 1.6 and 1.7, and PDF/A-3 contains all
the features of PDF/A-2 and also allows the embedding of other file formats into PDF/A conforming
documents [21].
The TI/A (Tagged Image for Archival) standard initiative intended to create an ISO recommendation
to optimize the format specification for archival purposes. Unfortunately the project was disbanded
in 2016, and the TI/A draft the initiative completed in September 2016 is only available on the project
intranet. However, the original TIFF/A (later TI/A) draft from February 2015 is a public document
available on a PREFORMA project web site (footnote 17). Although this TIFF/A specification is only a draft, it is
probably a good idea to use in archival TIFF images the features that the specification marks as mandatory,
and to avoid the ones which are forbidden.
17) http://www.preforma-project.eu/dpf-manager.html
The motivation behind the TI/A initiative can be applied to other image formats as well, and there are
also points the EPUB community might agree with (see Reference [22]):

The versatility of the TIFF format has made it very attractive for memory institutions for long-term archival
of their digital images. However, since the TIFF format offers such a great flexibility, it is not guaranteed
that in the future a standard TIFF reader will be able to read some TIFF images.
The limitations of the baseline TIFF are too severe for many applications in digital archiving. It is important
that, besides crucial technical metadata such as ICC color profiles (in case of color images) also important
descriptive metadata is stored within the image file. Having descriptive metadata available (such as content
description, iconography, copyright and ownership information etc.) is crucial for every archive. Having this
information in the same file as the image data guarantees that this information will always be associated
with the image.
TIFF is not an EPUB core media type, but four other image types are listed: GIF, JPEG, PNG,
and SVG. It is significant from a digital preservation point of view how these formats and other core
media types are used in the EPUB context. Image and audio files embedded in an EPUB publication may
require migration before the EPUB publication itself has to be migrated into a more modern file fo
...
