Genomics informatics — Requirements of data analysis for direct-to-consumer testing

This document specifies the requirements for genetic data analysis relating to direct-to-consumer (DTC) testing, including preprocessing, detection site, evaluation models, the use of databases and the elements of assessment reports. This document applies to the analysis of genetic data from DTC testing without the involvement of a health care provider.

Informatique génomique — Exigences d'analyse des données pour les tests en libre accès

General Information

Status: Published
Publication Date: 04-Mar-2026
Current Stage: 6060 - International Standard published
Start Date: 05-Mar-2026
Due Date: 04-Sep-2027
Completion Date: 05-Mar-2026

Overview - ISO/TS 20738:2026 (Genomics informatics for DTC testing)

ISO/TS 20738:2026 specifies data analysis requirements for direct‑to‑consumer (DTC) genetic testing products and services. It covers preprocessing, quality control, SNP detection sites, evaluation models and databases, report elements, and analytical workflows for both genotyping arrays (DNA chips) and whole genome sequencing (WGS). It applies to the analysis of genetic data from DTC testing performed without the involvement of a health care provider, and aims to improve consistency, transparency and consumer confidence in DTC genetic results.

Key topics and technical requirements

  • Data analysis workflows
    • DNA chip: preprocessing, genotype calling, quality evaluation, genotype imputation and cluster analysis.
    • WGS: sequencing QC, alignment and deduplication, variant calling (hcWGS) or imputation (lcWGS), variant QC, annotation, interpretation and manual confirmation.
  • File formats and raw data
    • FASTQ for raw sequencing reads; VCF for variant output (Annex A provides VCF example).
    • Raw genotype files should include RSID, chromosome, position and allele calls, with metadata (platform, reference genome, run date).
  • Quality control thresholds
    • DNA chip: high‑density arrays (>600 000 markers) should have sample call rate ≥ 0.98; single‑site detection rate in a batch ≥ 0.8. (See Annex C for call‑rate thresholds by application.)
    • WGS: paired‑end reads ≥ 100 bp; filtered data QC metrics include Q20 ratio ≥ 90%, Q30 ratio ≥ 80%, and GC content ~40–45%. Sequencing depth guidance: high‑coverage WGS > 20× (clinical often >100×); low‑coverage WGS ~0.5–6×.
  • Genotype imputation and reference data
    • Imputation with appropriate reference haplotype panels is required for low‑coverage data or array gaps; reference sequences (e.g., GRCh38) must be used for alignment and reporting.
  • Evaluation model, databases and reporting
    • Requirements for evaluation models and annotation databases (informative Annex B) and mandatory elements of consumer assessment reports, plus provisions for use and disclosure of consumer data.
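The WGS quality thresholds listed above can be checked with a short script. The following is a minimal sketch, not a reference implementation: the function name `fastq_qc`, the Phred+33 quality encoding and the treatment of the approximate 40–45 % GC range as hard bounds are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def fastq_qc(lines: Iterable[str], phred_offset: int = 33) -> Dict[str, object]:
    """Compute Q20/Q30 ratios, GC content and minimum read length for FASTQ text.

    Assumes well-formed four-line records and Phred+33 quality encoding.
    """
    records = [ln.strip() for ln in lines if ln.strip()]
    total = q20 = q30 = gc = 0
    min_len = None
    for i in range(0, len(records) - 3, 4):
        seq, qual = records[i + 1].upper(), records[i + 3]
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
        scores = [ord(ch) - phred_offset for ch in qual]
        q20 += sum(s >= 20 for s in scores)
        q30 += sum(s >= 30 for s in scores)
        min_len = len(seq) if min_len is None else min(min_len, len(seq))
    if total == 0:
        raise ValueError("no reads found")
    metrics: Dict[str, object] = {
        "min_read_length": min_len,
        "q20_ratio": q20 / total,
        "q30_ratio": q30 / total,
        "gc_content": gc / total,
    }
    # Thresholds taken from the overview: reads >= 100 bp, Q20 >= 90 %,
    # Q30 >= 80 %, GC roughly 40-45 % (treated here as hard bounds).
    metrics["passes"] = (
        min_len >= 100
        and q20 / total >= 0.90
        and q30 / total >= 0.80
        and 0.40 <= gc / total <= 0.45
    )
    return metrics
```

In practice the GC bounds would probably be applied with some tolerance, since the overview gives them only as an approximate range.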

Applications and who should use this standard

  • DTC genetic test providers and marketplaces
  • Clinical and commercial sequencing laboratories offering consumer products
  • Bioinformatics pipelines and software developers for genotyping arrays and WGS
  • Quality assurance, regulatory and compliance teams assessing DTC product claims
  • Test developers preparing consumer‑facing interpretation and reporting workflows

ISO/TS 20738:2026 helps organizations ensure robust QC, transparent reporting and consistent interpretation practices for consumer genomics services.

Related standards

  • ISO/TC 215 (Health informatics), Subcommittee SC 1 - Genomics Informatics
  • Standards referenced in the document: ISO 20397‑2:2021, ISO 16577:2022, ISO/IEC 23092‑2:2024

Keywords: ISO/TS 20738:2026, genomics informatics, direct‑to‑consumer testing, DTC, genotyping arrays, whole genome sequencing, VCF, FASTQ, quality control, genotype imputation, variant calling.

Technical specification

ISO/TS 20738:2026 - Genomics informatics — Requirements of data analysis for direct-to-consumer testing

Release Date: 05-Mar-2026
English language (14 pages)

Frequently Asked Questions

ISO/TS 20738:2026 is a technical specification published by the International Organization for Standardization (ISO). Its full title is "Genomics informatics — Requirements of data analysis for direct-to-consumer testing". It specifies the requirements for genetic data analysis relating to direct-to-consumer (DTC) testing, including preprocessing, detection site, evaluation models, the use of databases and the elements of assessment reports, and it applies to the analysis of genetic data from DTC testing without the involvement of a health care provider.

ISO/TS 20738:2026 is classified under the following ICS (International Classification for Standards) categories: 35.240.80 - IT applications in health care technology. The ICS classification helps identify the subject area and facilitates finding related standards.

Standards Content (Sample)


Technical Specification
ISO/TS 20738
First edition
2026-03
Genomics informatics — Requirements of data analysis for direct-to-consumer testing
Informatique génomique — Exigences d'analyse des données pour les tests en libre accès
Reference number
© ISO 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Data analysis process
5 Quality control of raw data
5.1 DNA chip data preprocessing and quality control requirements
5.1.1 Data preprocessing
5.1.2 Data quality control
5.2 Whole genome sequencing quality requirements
5.2.1 Sequencing type and data quality
5.2.2 Sequencing data comparison and quality control
6 Evaluation model and database
6.1 DNA chip analysis requirements
6.1.1 DNA chip selection
6.1.2 Genotyping analysis
6.1.3 Genotype imputation analysis
6.2 WGS analysis requirements
6.2.1 Variant detection, genotype imputation and quality control
6.2.2 Variant site annotation
6.2.3 Interpretation of variation
7 Evaluation report
7.1 Interpretation
7.2 Use and disclosure of data
Annex A (normative) VCF format file example
Annex B (informative) Annotation databases
Annex C (informative) Call rate thresholds by application
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 215, Health informatics, Subcommittee SC 1,
Genomics Informatics.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
With people's increasing awareness of their right to know about their own bodies and of the need for disease
prevention, prediction, participation and personalized treatment, and with the rapid development of
sequencing technology, genetic testing has expanded from clinical application to general consumer
application. Direct-to-consumer (DTC) testing refers to genetic testing that individuals can order without
needing a clinician or a health care provider. These tests typically analyze DNA from a sample, often saliva,
to provide insights into various genetic traits.
DTC tests cover a wide range of genetic analyses, including ancestry and heritage (understanding ethnic
background and lineage), health and disease risk (identifying genetic predispositions to conditions such as
cancer or heart disease), traits and lifestyle (examining genetic influences on taste preferences, hair loss,
or lactose digestion), and pharmacogenomics (assessing how genetic variations affect drug metabolism). DTC
testing improves awareness of and attention to certain diseases, and it allows existing precautions to be
adjusted under the guidance of professionals. It provides the necessary basis for the formation of personalized
disease prevention programs. As an increasingly prevalent link between clinical care and lifestyle, DTC
testing has grown enormously in both actual and expected use, becoming more and more indispensable
in the genetic testing ecosystem.
This document is based on current DTC industry data, combined with the needs of upstream and downstream
industry users. It puts forward general requirements and suggestions on the data and technical content of
genotype imputation technology, analysis and interpretation of results, as well as specific requirements in
the development of a supporting evaluation model and database. With this document’s specifications as the
basis of data analysis in the development of DTC testing products and services, consumers can have greater
confidence in the conclusions drawn from the data, thereby facilitating greater confidence in DTC testing.

Technical Specification ISO/TS 20738:2026(en)
Genomics informatics — Requirements of data analysis for
direct-to-consumer testing
1 Scope
This document specifies the requirements for genetic data analysis relating to direct-to-consumer (DTC)
testing, including preprocessing, detection site, evaluation models, the use of databases and the elements of
assessment reports.
This document applies to the analysis of genetic data from DTC testing without the involvement of a health
care provider.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
coverage
coverage depth
number of times that a given base position is read in a sequencing run
Note 1 to entry: The number of reads that cover a particular position.
[SOURCE: ISO 20397-2:2021, 3.6]
3.2
DNA chip
DNA microarray
solid substrate where a collection of probe DNA arranged in a specific design is attached in a high-density
fashion, directly or indirectly, that assays large amounts of biological material using high-throughput
screening methods
[SOURCE: ISO 16577:2022, 3.4.13]
3.3
direct-to-customer
DTC
retail business model which eliminates any intermediaries and sells direct to consumer
Note 1 to entry: Also referred to as business to consumer (B2C).
Note 2 to entry: The sample (e.g. blood, saliva, cheek swab (cells from buccal cavity), fecal matter or nail clipping) is
provided by the consumer, assumed to be in accordance with the collection protocol provided by the business.

3.4
FASTQ
genomic information representation that includes FASTA and quality values
[SOURCE: ISO/IEC 23092-2:2024, 3.8]
3.5
GC content
proportion of guanine and cytosine in a DNA molecule
3.6
genotype imputation
computational process to infer unobserved or missing genotypes in sequencing/genotyping data
Note 1 to entry: Using statistical models (e.g. hidden Markov models) and reference haplotype panels (e.g. 1000
Genomes Project, TOPMed), imputation predicts missing variants by leveraging linkage disequilibrium (LD) patterns.
Common tools include IMPUTE2, Minimac, and BGI-lowpass.
Note 2 to entry: The output is typically a completed genomic variant dataset. This step is critical for enhancing data
utility in low-coverage whole genome sequencing (lcWGS) or genome-wide association studies (GWAS).
3.7
haplotype
combination of alleles at multiple sites that are inherited together on the same chromosome
3.8
InDel
insertion or deletion, or both, that occurs at a certain position in the genome
Note 1 to entry: InDel length is less than 50 bp.
3.9
quality score
Q score
Phred score
quality of base calling
measure of the probability of correct base recognition, usually expressed directly by a numerical value
Note 1 to entry: Q score is defined by the following formula:
$Q = -10 \log_{10}(p)$
where p is the estimated probability of the base call being wrong.
Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of 99 %.
Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of
99,9 %.
Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a
significant portion of the reads being unusable. Low quality scores can also indicate false-positive variant calls,
resulting in inaccurate conclusions.
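The relationship between a quality score and its error probability can be illustrated with a short sketch. The function names below are hypothetical and chosen only for this example; the arithmetic follows the formula in Note 1, assuming a base-10 logarithm.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Return the estimated base-call error probability p for quality score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Return the quality score Q for an estimated error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)
```

For instance, a score of 20 maps to an error probability of about 0,01 and a score of 30 to about 0,001, in line with Notes 2 and 3.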
3.10
sequencing depth
average number of times a nucleotide in a genome has been sequenced
Note 1 to entry: It is calculated by dividing the total number of sequenced bases in the aligned genome by the total
number of bases in the genome (excluding N).
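The calculation in Note 1 can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the reference is passed as an in-memory string for simplicity, whereas a real pipeline would normally read the genome size from an index file.

```python
def effective_genome_size(reference: str) -> int:
    """Count the bases in a reference sequence, excluding N (case-insensitive)."""
    ref = reference.upper()
    return len(ref) - ref.count("N")


def sequencing_depth(total_aligned_bases: int, genome_size_excl_n: int) -> float:
    """Average depth = total sequenced bases aligned to the genome / non-N genome size."""
    if genome_size_excl_n <= 0:
        raise ValueError("genome size must be positive")
    return total_aligned_bases / genome_size_excl_n
```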

3.11
whole genome sequencing
WGS
process that determines the complete DNA sequence of a human’s genome, including all 23 chromosome
pairs and mitochondrial DNA
Note 1 to entry: While performed through a coordinated workflow, current next-generation sequencing systems
cannot process the entire genome in a single run. The DNA is fragmented, sequenced in sections, and computationally
reconstructed using bioinformatics tools to assemble the complete genomic sequence.
Note 2 to entry: WGS is divided into high-coverage whole genome sequencing (hcWGS) and low-coverage whole
genome sequencing (lcWGS) according to the amount of sequencing.
Note 3 to entry: High-coverage WGS has sequencing depth >20×, while the coverage of clinical grade WGS is usually
>100×.
Note 4 to entry: For low-coverage WGS: 0,5× ≤ sequencing depth ≤ 6×.
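The depth ranges in Notes 3 and 4 can be expressed as a small classifier. This is a sketch under stated assumptions: the function name and the "unclassified" label are hypothetical, and depths that fall outside both ranges (below 0,5×, or above 6× up to and including 20×) are left unclassified because the notes do not assign them to either category.

```python
def classify_wgs(depth: float) -> str:
    """Classify a WGS run by average sequencing depth.

    hcWGS: depth > 20x; lcWGS: 0.5x <= depth <= 6x; anything else is
    reported as "unclassified" (a label assumed for this example).
    """
    if depth > 20:
        return "hcWGS"
    if 0.5 <= depth <= 6:
        return "lcWGS"
    return "unclassified"
```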
4 Data analysis process
4.1 The integrity of the sample provided should be checked and verified prior to performing the analysis.
4.2 The data analysis process supported by the DNA chip shall include data preprocessing, genotype
calling, quality evaluation (quality assurance or quality control), genotype imputation and cluster analysis.
4.3 The WGS data analysis process shall include sequencing data quality control, alignment and
deduplication, alignment quality control, variant calling (hcWGS) or genotype imputation (lcWGS),
variant quality control, variant annotation, variant interpretation and manual confirmation of variants,
as shown in Figure 1.
Figure 1 — Analysis and interpretation process based on WGS
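The WGS stage order described in 4.3 can be written down as an ordered list that branches on the sequencing type. This is a minimal sketch: the function name is hypothetical and the stage labels are paraphrased from 4.3, not an official vocabulary.

```python
from typing import List


def wgs_stages(wgs_type: str) -> List[str]:
    """Return the ordered WGS analysis stages for "hcWGS" or "lcWGS"."""
    if wgs_type == "hcWGS":
        genotyping = "variant calling"
    elif wgs_type == "lcWGS":
        genotyping = "genotype imputation"
    else:
        raise ValueError(f"unknown WGS type: {wgs_type!r}")
    return [
        "sequencing data quality control",
        "alignment and deduplication",
        "alignment quality control",
        genotyping,  # the only stage that differs between hcWGS and lcWGS
        "variant quality control",
        "variant annotation",
        "variant interpretation",
        "manual confirmation of variants",
    ]
```

Keeping the stages in one ordered list makes it straightforward to verify that a pipeline run executed every required step in sequence.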
5 Quality control of raw data
5.1 DNA chip data preprocessing and quality control requirements
5.1.1 Data preprocessing
5.1.1.1 The original data format of the DNA chip shall be as specified by the chip manufacturer. Individual
data should be converted into VCF files or raw genotype data files for subsequent analysis. The file formats are
specified in 5.1.1.3 and 5.1.1.4.
5.1.1.2 When converting chip data from raw data to variant call format (VCF) files or raw genotype
data files, cluster analysis should be used. The reference data used in the cluster analysis should be the
target population data of the detection service. The source of the reference data should be explained to the
consumer so there is a clear understanding of the relative nature of the results.

5.1.1.3 The raw genotype data file shall consist of four columns: RSID (Reference SNP Cluster
ID), chromosome, position on the chromosome, and a pair of bases. The raw genotype data file shall
state the detection platform, detection time, reference genome sequence and other information in
comments at the beginning of the file.
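A reader for the four-column raw genotype file might look like the sketch below. The text above does not fix a delimiter or a comment marker, so tab-separated columns and `#`-prefixed comment lines are assumptions, as are the names `GenotypeRecord` and `parse_raw_genotypes`.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GenotypeRecord(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str  # pair of bases, e.g. "AG"


def parse_raw_genotypes(lines: Iterable[str]) -> Tuple[List[str], List[GenotypeRecord]]:
    """Split a raw genotype file into header comments and four-column records.

    Assumes tab-separated columns and '#'-prefixed comment lines.
    """
    comments: List[str] = []
    records: List[GenotypeRecord] = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            comments.append(line.lstrip("#").strip())
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 columns, got {len(fields)}")
        rsid, chromosome, position, genotype = fields
        records.append(GenotypeRecord(rsid, chromosome, int(position), genotype))
    return comments, records
```

The returned comment list carries the platform, detection time and reference genome metadata, so downstream steps can confirm the genome build before interpreting positions.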
5.1.1.4 VCF file format requirements shall conform with Annex A.
5.1.2 Data quality control
5.1.2.1 DNA chip marker densities can range from dozens to over 600 000 genome
sites. High-density DNA chips (> 600 000 markers) shall have a sample call rate ≥ 0,98. Targeted chips
(< 600 000 markers) shall meet call rate thresholds appropriate to their designed purpose; see Table C.1 for
recommended minimum call rates by chip type.
5.1.2.2 The detection rate of a single site in the same batch of samples shall not be lower than 0,8.
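The two thresholds in 5.1.2.1 and 5.1.2.2 can be checked on a batch of genotype calls as sketched below. The data layout (a dictionary mapping sample IDs to equal-length lists of calls), the `"--"` no-call marker and the function names are assumptions made for illustration; the default thresholds are the high-density values stated above.

```python
from typing import Dict, List, Tuple


def call_rates(batch: Dict[str, List[str]], no_call: str = "--") -> Tuple[Dict[str, float], List[float]]:
    """Return per-sample call rates and per-site detection rates for one batch."""
    samples = list(batch)
    if not samples:
        raise ValueError("empty batch")
    n_sites = len(batch[samples[0]])
    if n_sites == 0 or any(len(batch[s]) != n_sites for s in samples):
        raise ValueError("every sample needs the same non-zero number of sites")
    sample_rates = {
        s: sum(g != no_call for g in batch[s]) / n_sites for s in samples
    }
    site_rates = [
        sum(batch[s][i] != no_call for s in samples) / len(samples)
        for i in range(n_sites)
    ]
    return sample_rates, site_rates


def qc_failures(
    sample_rates: Dict[str, float],
    site_rates: List[float],
    min_sample_rate: float = 0.98,  # high-density chip threshold
    min_site_rate: float = 0.8,     # per-site detection rate within a batch
) -> Tuple[List[str], List[int]]:
    """List the samples and site indices that fall below the thresholds."""
    failed_samples = sorted(s for s, r in sample_rates.items() if r < min_sample_rate)
    failed_sites = [i for i, r in enumerate(site_rates) if r < min_site_rate]
    return failed_samples, failed_sites
```

For targeted chips, `min_sample_rate` would be replaced by the value appropriate to the chip type.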
5.1.2.3 The international general human nucleic acid database shall be used as
...
