Information technology - Genomic information representation - Part 3: Metadata and application programming interfaces (APIs)

This document specifies information metadata, auxiliary fields, SAM interoperability, protection metadata and programming interfaces of genomic information. It defines:
- metadata storage and interpretation for the different encapsulation levels as specified in ISO/IEC 23092-1 (in Clause 6);
- protection elements providing confidentiality, integrity and privacy rules at the different encapsulation levels specified in ISO/IEC 23092-1 (in Clause 7);
- how to associate auxiliary fields to encoded reads (in Clause 8);
- mechanisms for backward compatibility with existing SAM content, and exportation to this format (in Annex C);
- interfaces to access genomic information coded in compliance with ISO/IEC 23092-1 and ISO/IEC 23092-2 (in Clause 10).

Technologie de l'information — Représentation des informations génomiques — Partie 3: Métadonnées et interfaces de programmation d'application (API)

General Information

Status: Withdrawn
Publication Date: 16-Mar-2020
Current Stage: 9599 - Withdrawal of International Standard
Start Date: 25-Oct-2022
Completion Date: 30-Oct-2025

Standard
ISO/IEC 23092-3:2020 - Information technology - Genomic information representation - Part 3: Metadata and application programming interfaces (APIs)
Released: 3/17/2020
English language
88 pages

Frequently Asked Questions

ISO/IEC 23092-3:2020 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - Genomic information representation - Part 3: Metadata and application programming interfaces (APIs)". It specifies information metadata, auxiliary fields, SAM interoperability, protection metadata and programming interfaces of genomic information.

ISO/IEC 23092-3:2020 is classified under the following ICS (International Classification for Standards) categories: 35.040.99 - Other standards related to information coding. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 23092-3:2020 has the following relationships with other standards: it is linked to ISO/IEC 23092-3:2022. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

Standards Content (Sample)


INTERNATIONAL ISO/IEC
STANDARD 23092-3
First edition
2020-03
Information technology — Genomic
information representation —
Part 3:
Metadata and application
programming interfaces (APIs)
Technologie de l'information — Représentation des informations
génomiques —
Partie 3: Métadonnées et interfaces de programmation
d'application (API)
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Conventions
5.1 Character encoding
5.2 Syntax functions and data types
5.3 Graphic notations
6 Information metadata
6.1 General
6.2 Dataset group metadata
6.3 Reference metadata
6.4 Dataset metadata
6.5 Metadata protection
6.6 Mechanism for extensions of the metadata set
6.6.1 General
6.6.2 Example for dataset metadata extensions
6.6.3 Example for obfuscating labels
6.6.4 Example for obfuscating sequences
6.7 Metadata profiles
6.7.1 General
6.7.2 Example of metadata profile - Run
6.7.3 Example of metadata profile - Genomic data commons
7 Protection metadata
7.1 General
7.2 Encryption of gen_info elements and blocks
7.2.1 General
7.2.2 EncryptionParameters carried in dataset group protection
7.2.3 EncryptionParameters carried in dataset protection
7.2.4 Key retrieval
7.2.5 Decryption
7.3 Privacy rules for the use of the genomic information
7.3.1 General
7.3.2 Example of use of privacy rules
7.4 Digital signature of gen_info elements and blocks
7.4.1 General
7.4.2 Signatures carried in dataset group protection
7.4.3 Signatures carried in dataset protection
7.4.4 Signatures carried in access unit protection
7.4.5 Signatures carried in descriptor stream protection
8 Access unit information
8.1 General
8.2 genAuxRecord
8.3 genAux
8.4 genTag
9 Decoding process for metadata
9.1 General
9.2 Initialization of parameters
9.2.1 General
9.2.2 Properties
9.2.3 Parameters
9.2.4 Constants
9.2.5 Process
9.3 Macros
9.4 Decoding process
10 Application programming interfaces (APIs)
10.1 General
10.2 Structure of the API
10.3 Detailed specification of the API
10.3.1 Data types
10.3.2 Return codes
10.3.3 Metadata fields
10.3.4 Output structures
10.3.5 Filters
10.3.6 Genomic information
10.3.7 Metadata
10.3.8 Protection
10.3.9 Reference
10.3.10 Statistics
Annex A (normative) XML schemas corresponding to metadata information and protection elements
Annex B (informative) XML schemas and XML-based data
Annex C (informative) SAM interoperability
Annex D (informative) Example of key transport
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that
are members of ISO or IEC participate in the development of International Standards through
technical committees established by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see http://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 23092 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
The advent of high-throughput sequencing (HTS) technologies has the potential to boost the adoption
of genomic information in everyday practice, ranging from biological research to personalized genomic
medicine in the clinic. As a consequence, the volume of generated data has increased dramatically
during the last few years, and an even more pronounced growth is expected in the near future.
At the moment, genomic information is mostly exchanged through a variety of data formats, such as
FASTA/FASTQ for unaligned sequencing reads and SAM/BAM/CRAM for aligned reads. With respect to
such formats, the ISO/IEC 23092 series provides a new solution for the representation and compression
of genome sequencing information by:
— specifying an abstract representation of the sequencing data rather than a specific format with its
direct implementation;
— being designed at a time point when technologies and use cases are more mature. This makes it
possible to address one limitation of the textual SAM format, to which features were added
incrementally and ad hoc over the years, resulting in an overall redundant and suboptimal format
that is at the same time not general and unnecessarily complicated;
— normatively separating free-field user-defined information with no clear semantics from the
normative genomic data representation. This allows a fully interoperable and automatic exchange
of information between different data producers;
— allowing multiplexing of relevant metadata information with the data since data and metadata are
partitioned at different conceptual levels;
— following a strict and supervised development process which has proven successful in the last
30 years in the domain of digital media for the transport format, the file format, the compressed
representation and the application program interfaces.
This document provides the enabling technology that will allow the community to create an ecosystem
of novel, interoperable solutions in the field of genomic information processing. In particular, it offers:
— consistent, general and properly designed format definitions and data structures to store sequencing
and alignment information. A robust framework which can be used as a foundation to implement
different compression algorithms;
— speed and flexibility in the selective access to coded data, by means of newly designed data clustering
and optimized storage methodologies;
— low latency in data transmission and consequent fast availability at remote locations, based on
transmission protocols inspired by real-time application domains;
— built-in privacy and protection of sensitive information, thanks to a flexible framework which allows
customizable secured access at all layers of the data hierarchy;
— reliability of the technology and interoperability among tools and systems, owing to the provision
of a normative procedure to assess conformance to the standard on an exhaustive dataset;
— support for the implementation of a complete ecosystem of compliant devices and applications,
through the availability of a normative reference implementation covering the totality of the
specification.
The fundamental structure of the ISO/IEC 23092 series data representation is the genomic record. The
genomic record is a data structure consisting of either a single sequence read, or a paired sequence
read, and its associated sequencing and alignment information; it may contain detailed mapping and
alignment data, a single or paired read identifier (read name) and quality values.
Without breaking traditional approaches, the genomic record introduced in the ISO/IEC 23092 series
provides a more compact, simpler and manageable data structure grouping all the information related
to a single DNA template, from simple sequencing data to sophisticated alignment information.
The genomic record, although it is an appropriate logical data structure for interaction with and
manipulation of coded information, is not a suitable atomic data structure for compression. To achieve
high compression ratios, it is necessary to group genomic records into clusters and to transform the
information of the same type into sets of descriptors structured into homogeneous blocks. Furthermore,
when dealing with selective data access, the genomic record is too small a unit to allow effective and
fast information retrieval.
For these reasons, this document introduces the concept of access unit, which is the fundamental
structure for coding and access to information in the compressed domain.
The access unit is the smallest data structure that can be decoded by a decoder compliant with
ISO/IEC 23092-2. An access unit is composed of one block for each descriptor used to represent the
information of its genomic records; therefore, a block payload is the coded representation of all the data
of the same type (i.e. a descriptor) in a cluster.
In addition to the clustering of genomic records into access units, reads are further classified into
six data classes: five classes are defined according to the result of their alignment against one or more
reference sequences; the sixth class contains either reads that could not be mapped or raw sequencing
data. Classifying sequence reads into classes enables powerful selective data access.
In fact, access units inherit a specific data characterization (e.g. perfect matches in Class P, substitutions
in Class M, indels in Class I, half-mapped reads in Class HM) from the genomic records composing them,
and thus constitute a data structure capable of providing powerful filtering capability for the efficient
support of many different use cases.
Access units are the fundamental, finest grain data structure in terms of content protection and in
terms of metadata association. In other words, each access unit can be protected individually and
independently. Figure 1 shows how access units, blocks and genomic records relate to each other in the
ISO/IEC 23092 series data structure.
Figure 1 — Access units, blocks and genomic records
Figure 2 — High-level data structure: datasets and dataset group
A dataset is a coded data structure containing headers and one or more access units. Typical datasets
could, for example, contain the complete sequencing of an individual, or a portion of it. Other datasets
could contain, for example, a reference genome or a subset of its chromosomes. Datasets are grouped in
dataset groups, as shown in Figure 2.
A simplified diagram of the dataset decoding process is shown in Figure 3.
Figure 3 — Decoding process
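The containment relationships described above (dataset group, dataset, access unit, block) can be pictured with a minimal Python sketch. All class, field and function names below are illustrative assumptions and are not identifiers defined by the ISO/IEC 23092 series; headers and the coded payload format are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    """Coded data of one descriptor for all records in one cluster."""
    descriptor: str   # hypothetical descriptor label
    payload: bytes


@dataclass
class AccessUnit:
    """Smallest decodable unit: one block per descriptor in use."""
    data_class: str   # e.g. "P", "M", "I" or "HM", as named in the text
    blocks: Dict[str, Block] = field(default_factory=dict)

    def add_block(self, block: Block) -> None:
        # An access unit holds at most one block for a given descriptor.
        if block.descriptor in self.blocks:
            raise ValueError(f"duplicate block for {block.descriptor!r}")
        self.blocks[block.descriptor] = block


@dataclass
class Dataset:
    """Headers (omitted in this sketch) plus one or more access units."""
    access_units: List[AccessUnit] = field(default_factory=list)


@dataclass
class DatasetGroup:
    datasets: List[Dataset] = field(default_factory=list)


def access_units_of_class(group: DatasetGroup, data_class: str) -> List[AccessUnit]:
    """Selective access: keep only the access units of one data class."""
    return [au
            for dataset in group.datasets
            for au in dataset.access_units
            if au.data_class == data_class]
```

Because every access unit carries a single data class, a filter such as access_units_of_class can select data without decoding any block payload, which is the kind of selective access the classification is meant to support.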
The International Organization for Standardization (ISO) and International Electrotechnical
Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may
involve the use of a patent.
ISO and IEC take no position concerning the evidence, validity and scope of this patent right.
The holder of this patent right has assured ISO and IEC that he/she is willing to negotiate licences under
reasonable and non-discriminatory terms and conditions with applicants throughout the world. In this
respect, the statement of the holder of this patent right is registered with ISO and IEC. Information may
be obtained from the patent database available at www.iso.org/patents.
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights other than those in the patent database. ISO and IEC shall not be held responsible for
identifying any or all such patent rights.

INTERNATIONAL STANDARD ISO/IEC 23092-3:2020(E)
Information technology — Genomic information
representation —
Part 3:
Metadata and application programming interfaces (APIs)
1 Scope
This document specifies information metadata, auxiliary fields, SAM interoperability, protection
metadata and programming interfaces of genomic information. It defines:
— metadata storage and interpretation for the different encapsulation levels as specified in
ISO/IEC 23092-1 (in Clause 6);
— protection elements providing confidentiality, integrity and privacy rules at the different
encapsulation levels specified in ISO/IEC 23092-1 (in Clause 7);
— how to associate auxiliary fields to encoded reads (in Clause 8);
— mechanisms for backward compatibility with existing SAM content, and exportation to this format
(in Annex C);
— interfaces to access genomic information coded in compliance with ISO/IEC 23092-1 and
ISO/IEC 23092-2 (in Clause 10).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 23092-1, Information technology — Genomic information representation — Part 1: Transport and
storage of genomic information
ISO/IEC 23092-2, Information technology — Genomic information representation — Part 2: Coding of
genomic information
OASIS, eXtensible Access Control Markup Language (XACML) Version 3.0, 2013, Available:
http://docs.oasis-open.org/xacml/3.0/xacml-3.0-core-spec-cs-01-en.pdf
IETF, PKCS #1: RSA Cryptography Specifications Version 2.2, November 2016, Available:
https://tools.ietf.org/html/rfc8017
IETF, PKCS #5: Password-Based Cryptography Specification Version 2.1, January 2017, Available:
https://tools.ietf.org/html/rfc8018
IETF, Advanced Encryption Standard (AES) Key Wrap Algorithm, September 2002, Available:
https://tools.ietf.org/html/rfc3394
W3C, XML Path Language (XPath), Version 1.0, 16 November 1999, Available:
https://www.w3.org/TR/xpath-10/
IEEE, 754-2008, IEEE Standard for Floating-Point Arithmetic, August 2008, Available:
https://ieeexplore.ieee.org/document/4610935

3 Terms and definitions
For the purposes of this document, the terms and definitions in ISO/IEC 23092-1 and ISO/IEC 23092-2
and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
BAM
compressed binary version of SAM
3.2
dataset group
collection of one or more datasets
Note 1 to entry: Which information is represented varies depending on the genomic information representation.
4 Abbreviated terms
AU access unit
AUC access unit contiguity
DSC descriptor stream contiguity
EBI European Bioinformatics Institute
EGA European Genome-phenome Archive
ENA European Nucleotide Archive
LSB least significant bit
NCBI National Center for Biotechnology Information
SRA sequence read archive
URN uniform resource name
5 Conventions
5.1 Character encoding
The implementation of the specifications described in this document shall use UTF-8 character
encoding.
5.2 Syntax functions and data types
The functions presented here are used in the syntactical description. These functions are expressed
in terms of the value of a bitstream pointer that indicates the position of the next bit to be read by the
decoding process from the bitstream.
byte_aligned( ) is specified as follows:
— If the current position in the bitstream is on a byte boundary, i.e., the next bit in the bitstream is the
first bit in a byte, the return value of byte_aligned( ) is equal to TRUE.
— Otherwise, the return value of byte_aligned( ) is equal to FALSE.
read_bits( n ) reads the next n bits from the bitstream and advances the bitstream pointer by n bit
positions. When n is equal to 0, read_bits( n ) is specified to return a value equal to 0 and to not advance
the bitstream pointer.
Size(array_name[]) returns the number of elements contained in the array named array_name.
The following data types specify the parsing process of each syntax element:
— f(n): fixed-pattern bit string using n bits written (from left to right) with the left bit first. The parsing
process for this data type is specified by the return value of the function read_bits( n ).
— st(v): null-terminated string encoded as universal coded character set (UCS) transmission format-8
(UTF-8) characters as specified in ISO/IEC 10646. The parsing process is specified as follows: st(v)
begins at a byte-aligned position in the bitstream and reads and returns a series of bytes from the
bitstream, beginning at the current position and continuing up to but not including the next byte-
aligned byte that is equal to 0x00, and advances the bitstream pointer by ( stringLength + 1 ) * 8 bit
positions, where stringLength is equal to the number of bytes returned.
NOTE The st(v) syntax data type is only used in this document when the current position in the
bitstream is a byte-aligned position.
— i(n): signed integer using n bits. When n is "v" in the syntax table, the number of bits varies in a
manner dependent on the value of other syntax elements. The parsing process for this data type
is specified by the return value of the function read_bits( n ) interpreted as a two's complement
integer representation with most significant bit written first.
— u(n): unsigned integer using n bits. When n is "v" in the syntax table, the number of bits varies in a
manner dependent on the value of other syntax elements. The parsing process for this data type is
specified by the return value of the function read_bits( n ) interpreted as a binary representation of
an unsigned integer with most significant bit written first.
— f32: 32 bit single precision floating-point as specified by IEEE 754-2008.
— f64: 64 bit double precision floating-point as specified by IEEE 754-2008.
— c(n): sequence of n ASCII characters.
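The syntax functions and data types above can be sketched as a small bit reader. This is a minimal illustration, not a normative implementation: the class name BitReader and its method names are assumptions, and the sketch assumes that bits within each byte are consumed most significant bit first, consistent with the "most significant bit written first" wording for u(n) and i(n).

```python
class BitReader:
    """Illustrative reader for the u(n), i(n) and st(v) data types."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # bitstream pointer, counted in bits

    def byte_aligned(self) -> bool:
        # TRUE when the next bit to be read is the first bit of a byte.
        return self.pos % 8 == 0

    def read_bits(self, n: int) -> int:
        # read_bits(0) returns 0 and does not advance the pointer.
        if n == 0:
            return 0
        if self.pos + n > len(self.data) * 8:
            raise EOFError("not enough bits left in the bitstream")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1  # assumed MSB-first order
            value = (value << 1) | bit
            self.pos += 1
        return value

    def u(self, n: int) -> int:
        # Unsigned integer, most significant bit first.
        return self.read_bits(n)

    def i(self, n: int) -> int:
        # Signed integer in two's complement representation.
        raw = self.read_bits(n)
        if n > 0 and raw >= 1 << (n - 1):
            raw -= 1 << n
        return raw

    def st(self) -> str:
        # Null-terminated UTF-8 string starting at a byte-aligned position.
        if not self.byte_aligned():
            raise ValueError("st(v) requires a byte-aligned position")
        start = self.pos // 8
        end = self.data.index(0, start)  # raises ValueError if no 0x00 byte
        self.pos = (end + 1) * 8         # advance by (stringLength + 1) * 8
        return self.data[start:end].decode("utf-8")
```

For instance, BitReader(bytes([0b10110000])).u(3) reads the bits 101 and yields 5, after which byte_aligned() is false until the remaining five bits of the byte are consumed.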
5.3 Graphic notations
The notation -> (arrow) is used in this document to indicate the access to a member of a data structure.
The notations | and |= are used in this document to indicate the bitwise OR operation and the bitwise
OR assignment, respectively. a |= b is equivalent to a = a | b.
The notations & and &= are used in this document to indicate the bitwise AND operation and the bitwise
AND assignment, respectively. a &= b is equivalent to a = a & b.
The notation return_error() is used in this document to indicate that the decoding process has to stop
due to a decoding error which cannot be handled.
The notation continue is used in this document within for and while statements to signal that the
process shall continue to the next iteration without executing any further statement in the current
iteration.
The notation *(ptr) is used in this document to access the value stored in memory at the address held
by the pointer ptr. The operator * is said to dereference the pointer ptr.
6 Information metadata
6.1 General
This clause defines a minimum core set of metadata elements, which users and applications can then
extend by including extra information elements. Metadata sets are specified for dataset groups,
datasets and references, as specified in ISO/IEC 23092-1. The structure of these metadata sets and their
elements is specified using XML v1.1.
Extensions to (i.e., new elements for) the metadata sets specified in this clause are represented with an
identifier of the extension type in the form of a URI, a value and a pointer to a resource documenting the
semantics of the extension type.
Metadata profiles are specific subsets of metadata sets specified using mechanisms provided in this
document. A metadata profile specified in this document may correspond to well-known metadata
sets specified or used outside the ISO/IEC 23092 series, such as those in the ENA or EGA [1] and NCBI
specifications [2]. This allows easy interoperability with already existing systems. A
metadata profile includes a subset of (or all) core elements described in subclauses 6.2 and 6.4, and a
set of new elements specified with the extension mechanism specified in subclause 6.6.
The remaining subclauses specify dataset group metadata (subclause 6.2), reference metadata
(subclause 6.3), dataset metadata (subclause 6.4), extensions (subclause 6.6) and profiles (subclause 6.7).
6.2 Dataset group metadata
Compressed dataset group metadata are stored within the DG_metadata_value element of the
DG_metadata box (with key dgmd), as specified in ISO/IEC 23092-1. The decoding process of
DG_metadata_value is specified in Clause 9. The output of the decoding process is an XML document,
where the root node is DatasetGroup. Annex A.1 provides the XML schema for decoded dataset group metadata.
As introduced in subclause 6.1, an extension type is the combination of three elements:
the value, the identifier of the extension, and a link to a resource documenting the interpretation of
the extension. In the XML schema, this is expressed as an element with three child elements: the
Type element (of type URI), the Documentation element (of type URI) and the value, which is represented
by the element taking the place of the element any in the schema. Additionally, for extensions belonging
to the dataset group, the Boolean element Inheritable (as specified in Annex A.1) of the extension element
indicates whether the extension is relevant only to the dataset group or is also inherited by its datasets.
The resource documentation can be human readable, and parsing of extensions is not required.
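As an illustration of the extension structure described above, the following Python sketch reads one extension element with the standard library. The element names Extension, Type, Documentation and Inheritable come from the text; the sample URIs, the CentreName value element, the absence of an XML namespace and the helper names are assumptions made for this example, and the normative structure is the schema in Annex A.1.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Hypothetical extension instance; the value element name is invented.
SAMPLE = """
<Extension>
  <Type>urn:example:extension:sequencing-centre</Type>
  <Documentation>https://example.org/docs/sequencing-centre</Documentation>
  <Inheritable>false</Inheritable>
  <CentreName>Example Lab</CentreName>
</Extension>
"""

RESERVED = {"Type", "Documentation", "Inheritable"}


@dataclass
class Extension:
    type_uri: str
    documentation_uri: str
    inheritable: bool
    value: Optional[ET.Element]  # element standing in for "any" in the schema


def local_name(tag: str) -> str:
    # Drop a "{namespace}" prefix if the parser attached one.
    return tag.rsplit("}", 1)[-1]


def parse_extension(elem: ET.Element) -> Extension:
    fields = {}
    value = None
    for child in elem:
        name = local_name(child.tag)
        if name in RESERVED:
            fields[name] = (child.text or "").strip()
        elif value is None:
            value = child  # first non-reserved child is taken as the value
    # Assumption: an absent flag means the extension is inheritable.
    flag = fields.get("Inheritable", "true").lower()
    return Extension(fields["Type"], fields["Documentation"],
                     flag not in ("false", "0"), value)


ext = parse_extension(ET.fromstring(SAMPLE))
```

A tool that does not recognise the Type URI can still keep the extension as an opaque element, which is consistent with the statement that parsing of extensions is not required.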
6.3 Reference metadata
Compressed reference metadata are stored within the reference metadata box, as specified in
ISO/IEC 23092-1, in the reference_metadata_value field. Clause 9 specifies the decoding process of
reference_metadata_value. The output of the decoding process is an XML document, with a root element
Reference. Annex A.4 provides the related XML schema. Table 1 specifies the semantics of the fields.
Table 1 — Semantics of reference sequence's fields
Tag name | Description
length | Length in base pairs of the sequence (a)
alternative_locus_location | The sequence is an alternative locus from an unknown region. A child element chromosome_name identifies on which chromosome the sequence has an alternative locus. If present, a child element position indicates the start and end position of the alternative locus.
alternative_sequence_name | List of alternative names
genome_assembly_identifier | Genome assembly identifier
description | Human readable textual description
species | Name of the species
URI | URI of the sequence
(a) In this document, "base" or "base pair" is used as a synonym for "nucleotide".
6.4 Dataset metadata
Compressed dataset metadata are stored within the DT_metadata_value field of the DT_metadata box
(marked as dtmd), as specified in ISO/IEC 23092-1. Clause 9 specifies the decoding process of
DT_metadata_value. The output of the decoding process is an XML document with an element Dataset as
root. Annex A.2 provides the XML schema for dataset metadata. A dataset metadata element overrides
the corresponding element at the dataset group level when their values differ (i.e.,
the value in the dataset is a specialization of the value at the dataset group level).
Table 2 defines the process to obtain the dataset metadata with inherited elements. In this table, the
following notations are used:
— .has(): the function returns true if the element has a child element with an unqualified name equal
to the parameter given, and false otherwise
— .get(): the function returns the content of the child element with an unqualified name equal to the
parameter given, as an array of characters
— .getElement(): the function returns the content of the child element with an unqualified name equal
to the parameter given
— .getByIndex(): the function returns the content of the i-th child element with an unqualified name
equal to the first parameter given and i equal to the second parameter given, as an array of characters
— .getEncoding(): the function returns the content of the element as an array of characters
— .set(): the function sets the content of the child element with an unqualified name equal to the first
parameter given, to the array of characters given as the second parameter
— .add(): the function creates a new child element with an unqualified name equal to the first
parameter given, and a content equal to the second parameter. The created element is appended to
the content of the current element.
— .getNumber(): the function returns the number of child elements with an unqualified name equal to
the parameter given.
Table 2 — Decoding process of dataset metadata
// Start from the dataset's own metadata; values set in the dataset take precedence.
datasetMetadataWithInheritance = datasetMetadata
// Core elements absent from the dataset are inherited from the dataset group.
if (!datasetMetadata.has("Type")) {
    datasetMetadataWithInheritance.set(
        "Type",
        datasetGroupMetadata.get("Type")
    )
}
if (!datasetMetadata.has("Abstract")) {
    datasetMetadataWithInheritance.set(
        "Abstract",
        datasetGroupMetadata.get("Abstract")
    )
}
if (!datasetMetadata.has("ProjectCentre")) {
    datasetMetadataWithInheritance.set(
        "ProjectCentre",
        datasetGroupMetadata.get("ProjectCentre")
    )
}
if (!datasetMetadata.has("Description")) {
    datasetMetadataWithInheritance.set(
        "Description",
        datasetGroupMetadata.get("Description")
    )
}
if (!datasetMetadata.has("Samples")) {
    datasetMetadataWithInheritance.set(
        "Samples",
        datasetGroupMetadata.get("Samples")
    )
}
// Dataset group extensions are inherited unless the dataset already defines the same Type.
extensions = datasetGroupMetadata.getElement("Extensions")
extensionsDataset = datasetMetadata.getElement("Extensions")
for (i = 0; i < extensions.getNumber("Extension"); i++) {
    extension = extensions.getByIndex("Extension", i)
    typeExtension = extension.get("Type")
    // Skip extensions marked as not inheritable (see 6.6.1: the flag defaults to true).
    if (extension.has("Inheritable") && extension.get("Inheritable") == "false") {
        continue
    }
    found = false
    for (j = 0; j < extensionsDataset.getNumber("Extension"); j++) {
        extensionDataset = extensionsDataset.getByIndex("Extension", j)
        typeExtensionDataset = extensionDataset.get("Type")
        if (typeExtension == typeExtensionDataset) {
            found = true
            break
        }
    }
    if (!found) {
        extensionsDataset.add("Extension", extension.getEncoding())
    }
}
After executing the inheritance process, the extensions are ordered in the alphabetical order of their
Type element.
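The inheritance process of Table 2, including the final ordering step, can be sketched in Python with the standard library. This is a hedged illustration rather than the normative process: the function names are assumptions, XML namespaces are ignored, inherited elements are simply appended (no attempt is made to restore the element order required by the schemas in Annex A), and an absent Inheritable element is treated as true, following 6.6.1.

```python
import copy
import xml.etree.ElementTree as ET

CORE_ELEMENTS = ("Type", "Abstract", "ProjectCentre", "Description", "Samples")


def is_inheritable(extension: ET.Element) -> bool:
    # Assumption: a missing flag means the extension is inherited.
    flag = extension.findtext("Inheritable")
    return flag is None or flag.strip().lower() not in ("false", "0")


def inherit(dataset_group: ET.Element, dataset: ET.Element) -> ET.Element:
    """Return a copy of the dataset metadata completed with inherited values."""
    result = copy.deepcopy(dataset)

    # Core elements: use the dataset group value only when the dataset has none.
    for name in CORE_ELEMENTS:
        if result.find(name) is None:
            inherited = dataset_group.find(name)
            if inherited is not None:
                result.append(copy.deepcopy(inherited))

    # Extensions: inherit by Type unless the dataset already defines that Type.
    group_exts = dataset_group.find("Extensions")
    ds_exts = result.find("Extensions")
    to_add = []
    if group_exts is not None:
        present = set()
        if ds_exts is not None:
            present = {e.findtext("Type") for e in ds_exts.findall("Extension")}
        to_add = [copy.deepcopy(e) for e in group_exts.findall("Extension")
                  if is_inheritable(e) and e.findtext("Type") not in present]
    if to_add and ds_exts is None:
        ds_exts = ET.SubElement(result, "Extensions")
    if ds_exts is not None:
        ds_exts.extend(to_add)
        # Final step: order extensions alphabetically by their Type element.
        ds_exts[:] = sorted(ds_exts, key=lambda e: e.findtext("Type") or "")
    return result
```

Working on a deep copy keeps the decoded dataset metadata unchanged, so the same dataset group metadata can be applied to each of its datasets independently.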
For example, consider datasets for patients A, B and C: the dataset group metadata includes a list of
samples representing A, B and C, while each dataset provides only one sample description (of A, B or C,
respectively). The base set of elements in the dataset metadata is the same as for the dataset group,
but the elements are not mandatory (so there is no need to repeat them), since by default their values
are considered equal to the values indicated in the dataset group. This default always applies to the
values belonging to the core set. It also applies to extensions, except those whose inheritance
parameter (the element named Inheritable in the definition of the extension type in Annex A.1) is set
to false.
As in the case of the dataset group metadata, Annex A.2 provides the resulting schema.
Also as in the case of the dataset group, the extension mechanism is available to include new attributes
where necessary. See subclause 6.6.2 on extensions for an example in the case of dataset metadata.
6.5 Metadata protection
The schemas defined in Annexes A.1 and A.2 include choices to either provide certain values as
plaintext or encrypted content. The encrypted content shall be such that after decryption, a metadata
element is obtained which is valid according to the schemas, but which does not contain any encrypted
information. The mechanism to transmit the knowledge of the keys shall be established through
another channel.
No call to functions specified in 10.3.7 may return encrypted content.
The schemas defined in Annexes A.1 and A.2 allow for the signature of the metadata or parts thereof.
No call to functions specified in 10.3.7 may return content for which the signature could not be verified.
6.6 Mechanism for extensions of the metadata set
6.6.1 General
This subclause provides a mechanism for adding new elements to the different core metadata sets
(dataset group and dataset levels).
An extended element consists of:
— an information type identifier (provided in the form of a URI),
— a value,
— documentation (provided in the form of a URI).
In the case of extensions at the dataset group level, a fourth element, the inheritance flag of type
Boolean, is optionally present. If not present, the default value is True. If the flag is set to True,
the value of the extension is inherited by the datasets belonging to the dataset group. If the flag is
set to False, the value only applies to the dataset group.
Annex A.1 defines the extension schema. Using extensions, the core metadata sets can be adapted
to multiple use cases. This document defines profiles (see subclause 6.7), which rely on well-known
extensions, defined in this document and for which the URI pointer is known. To be compliant with a
profile specified in this document, a tool shall implement the list of extensions included in the profile.
At the end of the decoding
...
