ISO/IEC 23092-4:2020
Information technology - Genomic information representation - Part 4: Reference software
This document specifies genomic information representation reference software, referred to as the genomic model (GM). This decoding software is provided to assess conformance to the requirements of ISO/IEC 23092-1 and ISO/IEC 23092-2.
Technologie de l'information — Représentation des informations génomiques — Partie 4: Logiciel de référence
General Information
- Status: Published
- Publication Date: 28-Oct-2020
- Current Stage: 9020 - International Standard under periodical review
- Start Date: 15-Oct-2025
- Completion Date: 15-Oct-2025
Overview
ISO/IEC 23092-4:2020 is an international standard published by ISO and IEC titled Information technology - Genomic information representation - Part 4: Reference software. It specifies the genomic information representation reference software, known as the genomic model (GM), which serves as decoding software for assessing conformance to ISO/IEC 23092-1 and ISO/IEC 23092-2.
The purpose of this standard is to provide a normative reference implementation that supports interoperability and verification of genomic data coding methods. It addresses the growing need for standardized, efficient, and interoperable genomic information processing in the context of high-throughput sequencing (HTS) technologies and personalized genomic medicine.
Key Topics
Genomic Model (GM) Software: The core of ISO/IEC 23092-4 is the genomic model, which is decoding software compliant with ISO/IEC 23092-1 and ISO/IEC 23092-2. It decodes bitstreams into genomic data representations consistent with the standard.
Conformance Assessment: The GM provides a tool for validating compliance of genomic data encoding and decoding implementations, assuring that encoded bitstreams processed by conformant decoders yield identical results.
Data Structures: The standard introduces fundamental genomic data structures such as genomic records, access units, and data clusters that represent sequencing reads, alignments, and associated metadata efficiently.
Reference Software Modules: ISO/IEC 23092-4 provides modular source code, available under a BSD License, facilitating reuse and integration in genomic data processing tools. The software does not prioritize computational optimization but acts as a definitive reference implementation.
Copyright and Licensing: The software modules come with a copyright disclaimer under BSD License terms, encouraging redistribution and use under specific conditions while disclaiming warranties.
Supporting Documents: This part of the series complements ISO/IEC 23092-1 (transport and storage), ISO/IEC 23092-2 (coding of genomic information), and ISO/IEC 23092-3 (metadata and APIs).
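As a rough illustration of the conformance idea listed under Key Topics (a decoder under test should reproduce the reference decoder's output exactly), the comparison can be sketched as a digest check. This is only a sketch under stated assumptions, not the conformance procedure defined by the standard: the function names and the sample byte strings are hypothetical, and nothing here invokes the GM itself.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one decoded output."""
    return hashlib.sha256(data).hexdigest()


def outputs_match(reference_output: bytes, candidate_output: bytes) -> bool:
    """True when the candidate decoder reproduced the reference output byte for byte."""
    return digest(reference_output) == digest(candidate_output)


# Hypothetical decoded outputs for a single test bitstream (assumed data).
reference = b"read_1\tACGT\tIIII\n"
candidate_ok = b"read_1\tACGT\tIIII\n"
candidate_bad = b"read_1\tACGA\tIIII\n"

print(outputs_match(reference, candidate_ok))
print(outputs_match(reference, candidate_bad))
```

Comparing digests rather than raw buffers is a convenience: in practice the reference digests could be stored alongside each test bitstream so that large decoded outputs need not be kept.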
Applications
Genomic Data Exchange: Enables standardized exchange of sequencing data and aligned DNA sequences across diverse platforms and software systems, bridging compatibility gaps in genomic information workflows.
Bioinformatics Tools Development: Provides a foundation reference implementation supporting developers in building applications and pipelines that encode or decode genomic information compliant with ISO standards.
Personalized Medicine: Facilitates accurate, reliable transmission and storage of genomic data critical to patient-specific diagnostics and treatment decisions in clinical environments.
Genomic Data Compression: Supports optimized, interoperable coding formats for large-scale genomic data compression, improving storage efficiency and transmission speed.
Research and Clinical Genomics: Ensures consistent handling of genomic datasets, including raw sequencing reads, alignment data, and associated metadata, enhancing reproducibility in research and clinical diagnostics.
Related Standards
ISO/IEC 23092-1: Defines transport and storage methods for genomic information, including container formats optimized for genomic data.
ISO/IEC 23092-2: Specifies coding techniques for genomic sequences and alignment data, enabling efficient compression.
ISO/IEC 23092-3: Covers genomic information metadata frameworks and application programming interfaces (APIs) to interact with genomic datasets programmatically.
The ISO/IEC 23092 series as a whole addresses modular and flexible approaches for representing genomic data, focusing on scalability, interoperability, and secure data handling.
For more information and access to the reference software, visit the official ISO page: https://standards.iso.org/iso-iec/23092/-4/ed-1/en/. This resource offers full access to the genomic model implementation referenced by this standard.
Frequently Asked Questions
ISO/IEC 23092-4:2020 is a standard published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Information technology - Genomic information representation - Part 4: Reference software". It specifies the genomic information representation reference software, referred to as the genomic model (GM), which is decoding software provided to assess conformance to the requirements of ISO/IEC 23092-1 and ISO/IEC 23092-2.
ISO/IEC 23092-4:2020 is classified under the following ICS (International Classification for Standards) categories: 35.040.99 - Other standards related to information coding. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
INTERNATIONAL STANDARD ISO/IEC 23092-4
First edition 2020-10
Information technology - Genomic information representation - Part 4: Reference software
Technologie de l'information - Représentation des informations génomiques - Partie 4: Logiciel de référence
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Reference software modules
4.1 General
4.2 Copyright disclaimer for software modules
5 Genomic model (GM)
5.1 GM availability
5.2 Compilation and usage of the GM
5.3 Decoding software
5.3.1 Decoding software modules
5.3.2 Feature availability
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that
are members of ISO or IEC participate in the development of International Standards through
technical committees established by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see http://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see
www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 23092 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
The advent of high-throughput sequencing (HTS) technologies has the potential to boost the adoption
of genomic information in everyday practice, ranging from biological research to personalized genomic
medicine in clinics. As a consequence, the volume of generated data has
increased dramatically during the last few years, and even more pronounced growth is expected in
the near future.
At the moment genomic information is mostly exchanged through a variety of data formats, such as
FASTA/FASTQ for unaligned sequencing reads and SAM/BAM/CRAM for aligned reads. With respect to
such formats, the ISO/IEC 23092 series provides a new solution for the representation and compression
of genome sequencing information by:
— Specifying an abstract representation of the sequencing data rather than a specific format with its
direct implementation.
— Being designed at a time point when technologies and use cases are more mature. This permits
addressing one limitation of the textual SAM format, to which features were added incrementally
and ad hoc over the years, resulting in a redundant, suboptimal and unnecessarily
complicated format.
— Separating free-field user-defined information with no clear semantics from the genomic data
representation. This allows a fully interoperable and automatic exchange of information between
different data producers.
— Allowing multiplexing of relevant metadata information with the data since data and metadata are
partitioned at different conceptual levels.
— Following a strict and supervised development process which has proven successful in the last
30 years in the domain of digital media for the transport format, the file format, the compressed
representation and the application program interfaces.
The ISO/IEC 23092 series provides the enabling technology that will allow the community to create an
ecosystem of novel, interoperable, solutions in the field of genomic information processing. In particular
it offers:
— Consistent, general and properly designed format definitions and data structures to store sequencing
and alignment information. A robust framework which can be used as a foundation to implement
different compression algorithms.
— Speed and flexibility in the selective access to coded data, by means of newly designed data clustering
and optimized storage methodologies.
— Low latency in data transmission and consequent fast availability at remote locations, based on
transmission protocols inspired by real-time application domains.
— Built-in privacy and protection of sensitive information, thanks to a flexible framework which
allows customizable secured access at all layers of the data hierarchy.
— Reliability of the technology and interoperability among tools and systems, owing to the provision
of a procedure to assess conformance to this document on an exhaustive dataset.
— Support to the implementation of a complete ecosystem of compliant devices and applications,
through the availability of a normative reference implementation covering the totality of the
ISO/IEC 23092 series.
The fundamental structure of the ISO/IEC 23092 series data representation is the genomic record. The
genomic record is a data structure consisting of either a single sequence read, or a paired sequence
read, and its associated sequencing and alignment information; it may contain detailed mapping and
alignment data, a single or paired read identifier (read name) and quality values.
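The genomic record described above can be pictured with a small data-structure sketch. This is a minimal illustration, not the layout defined by ISO/IEC 23092 or used by the GM: the class names, field names, the 0-based position and the SAM-style CIGAR string are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alignment:
    """Hypothetical mapping information for one read of a record."""
    reference_name: str
    position: int  # assumed 0-based mapping position
    cigar: str     # SAM-style alignment string, used here only for illustration


@dataclass
class GenomicRecord:
    """Sketch of a record holding a single read or a read pair plus optional extras."""
    read_name: Optional[str]
    sequences: List[str]  # one entry for a single read, two for a pair
    quality_values: List[str] = field(default_factory=list)
    alignments: List[Alignment] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The text allows exactly a single read or a paired read per record.
        if len(self.sequences) not in (1, 2):
            raise ValueError("a record holds a single read or a read pair")

    @property
    def is_paired(self) -> bool:
        return len(self.sequences) == 2


single = GenomicRecord(read_name="r1", sequences=["ACGT"], quality_values=["IIII"])
pair = GenomicRecord(
    read_name="r2",
    sequences=["ACGT", "TTGA"],
    alignments=[Alignment("chr1", 100, "4M"), Alignment("chr1", 250, "4M")],
)
print(single.is_paired, pair.is_paired)
```

Keeping quality values and alignments optional mirrors the wording that a record "may contain" them, so the same structure covers unaligned and aligned data.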
Without breaking traditional approaches, the genomic record introduced in the ISO/IEC 23092 series
provides a more compact, simpler and manageable data structure grouping all the information related
to a single DNA template, from simple sequencing data to sophisticated alignment information.
The genomic record, although it is an appropriate logical data structure for interaction and manipulation of
coded information, is not a suitable atomic data structure for compression. To achieve high compression
ratios, it is necessary to group genomic records into clusters and to transform the information of the
same type into sets of descriptors structured into homogeneous blocks. Furthermore, when dealing
with selective data access, the genomic record is
...
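The grouping step mentioned in the excerpt (records gathered into clusters, then same-type information collected into homogeneous descriptor blocks) can be sketched as a simple transposition. This is an illustrative sketch only: clustering by reference name, the field names and the sample records are assumptions, and the actual clustering criteria and descriptors are defined in the ISO/IEC 23092 series, not here.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical flat records; the field names are illustrative only.
records = [
    {"ref": "chr1", "pos": 100, "seq": "ACGT", "qual": "IIII"},
    {"ref": "chr2", "pos": 5, "seq": "GGCA", "qual": "FFFF"},
    {"ref": "chr1", "pos": 104, "seq": "TTGA", "qual": "IIIH"},
]


def cluster_by_reference(recs: List[dict]) -> Dict[str, List[dict]]:
    """Group records that map to the same reference (assumed clustering rule)."""
    clusters = defaultdict(list)
    for rec in recs:
        clusters[rec["ref"]].append(rec)
    return dict(clusters)


def to_descriptor_blocks(cluster: List[dict]) -> Dict[str, list]:
    """Transpose a cluster so that each block holds values of one type only."""
    blocks = defaultdict(list)
    for rec in sorted(cluster, key=lambda r: r["pos"]):
        for name in ("pos", "seq", "qual"):
            blocks[name].append(rec[name])
    return dict(blocks)


clusters = cluster_by_reference(records)
chr1_blocks = to_descriptor_blocks(clusters["chr1"])
print(chr1_blocks)
# {'pos': [100, 104], 'seq': ['ACGT', 'TTGA'], 'qual': ['IIII', 'IIIH']}
```

Homogeneous blocks of this kind tend to compress better than interleaved records because values of one type share similar statistics, which is the motivation the excerpt gives for not compressing record by record.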









