Biotechnology — Predictive computational models in personalized medicine research — Part 1: Constructing, verifying and validating models

This document specifies requirements and recommendations for the design, development and establishment of predictive computational models for research purposes in the field of personalized medicine. It addresses the set-up, formatting, validation, simulation, storing and sharing of computational models used for personalized medicine. Requirements and recommendations for data used to construct or required for validating such models are also addressed. This includes rules for formatting, descriptions, annotations, interoperability, integration, access and provenance of such data. This document does not apply to computational models used for clinical, diagnostic or therapeutic purposes.


General Information

Status: Published
Publication Date: 13-Jun-2023
Technical Committee: ISO/TC 276, Biotechnology
Current Stage: 9092 - International Standard to be revised
Start Date: 20-Jun-2023
Completion Date: 13-Dec-2025
Ref Project: ISO/TS 9491-1:2023

Technical specification ISO/TS 9491-1:2023 - Biotechnology — Predictive computational models in personalized medicine research — Part 1: Constructing, verifying and validating models. Released: 14-Jun-2023. English language, 31 pages.

Standards Content (Sample)


TECHNICAL SPECIFICATION
ISO/TS 9491-1
First edition
2023-06
Biotechnology — Predictive
computational models in personalized
medicine research —
Part 1:
Constructing, verifying and validating
models
Biotechnologie — Modèles informatiques prédictifs dans la recherche
sur la médecine personnalisée —
Partie 1: Construction, vérification et validation des modèles
Reference number: ISO/TS 9491-1:2023(E)
© ISO 2023
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Principles
4.1 General
4.2 Computational models in personalized medicine
4.2.1 General
4.2.2 Cellular systems biology models
4.2.3 Risk prediction for common diseases
4.2.4 Disease course and therapy response prediction
4.2.5 Pharmacokinetic/-dynamic modelling and in silico trial simulations
4.2.6 Artificial intelligence models
4.3 Standardization needs for computational models
4.3.1 General
4.3.2 Challenges
4.3.3 Common standards relevant for personalized medicine
4.4 Data preparation for integration into computer models
4.4.1 General
4.4.2 Sampling data
4.4.3 Data formatting
4.4.4 Data description
4.4.5 Data annotation (semantics)
4.4.6 Data interoperability requirements across subdomains
4.4.7 Data integration
4.4.8 Data provenance information
4.4.9 Data access
4.5 Model formatting
4.6 Model validation
4.6.1 General
4.6.2 Specific recommendations for model validation
4.7 Model simulation
4.7.1 General
4.7.2 Requirements for capturing and sharing simulation set-ups
4.7.3 Requirements for capturing and sharing simulation results
4.8 Requirements for model storing and sharing
4.9 Application of models in clinical trials and research
4.9.1 General
4.9.2 Specific recommendations
4.10 Ethical requirements for modelling in personalized medicine
Annex A (informative) Common standards relevant for personalized medicine and in silico approaches
Annex B (informative) Information on modelling approaches relevant for personalized medicine
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO’s adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 276, Biotechnology.
A list of all parts in the ISO 9491 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
The capacity to generate data in life sciences and health research has greatly increased in the last decade.
In combination with patient/personal-derived data, such as electronic health records, patient registries
and databases, as well as lifestyle information, this big data holds an immense potential for clinical
applications, especially for computer-based models with predictive capacities in personalized medicine.
However, despite the ever-progressing technological advances in producing data, the exploitation of
big data to generate new knowledge for medical benefits, while guaranteeing data privacy and security,
is lagging behind its full potential. A reason for this obstacle is the inherent heterogeneity of big data
and the lack of broadly accepted standards allowing interoperable integration of heterogeneous health
data to perform analysis and interpretation for predictive modelling approaches in health research,
such as personalized medicine.
Common standards lead to a mutual understanding and improve information exchange within and
across research communities and are indispensable for collaborative work. In order to set up computer
models in personalized medicine, data integration from heterogeneous and different sources at different
times plays a key role. Consistent documentation of data, models and simulation results based on basic
guiding principles for data management practices, such as FAIR (findable, accessible, interoperable,
reusable)[7] or ALCOA (attributable, legible, contemporaneous, original, accurate), and standards can
ensure that the data and the corresponding metadata (data describing the data and its context), as well
as the models, methods and visualizations, are of reliable high quality.
Hence, standards for biomedical and clinical data, simulation models and data exchange are a
prerequisite for reliable integration of health-related data. Such standards, together with harmonized
ways to describe their metadata, ensure the interoperability of tools used for data integration and
modelling, as well as the reproducibility of the simulation results. In this sense, modelling standards are
agreed ways of consistently structuring, describing, and associating models and data, their respective
parts and their graphical visualization, as well as the information about applied methods and the
outcome of model simulations. Such standards also assist in describing how constituent parts interact,
or are linked together, and how they are embedded in their physiological context.
Major challenges in the field of personalized medicine are to:
a) harmonize the standardization efforts that refer to different data types, approaches and
technologies;
b) make the standards interoperable, so that the data can be compared and integrated into models.
An overall goal is to FAIRify data and processes in order to improve data integration and reuse. An
additional challenge is to ensure a legal and ethical framework enabling interoperability.
This document presents modelling requirements and recommendations for research in the field of
personalized medicine, especially with focus on collaborative research, such that health-related data
can be optimally used for translational research and personalized medicine worldwide.
TECHNICAL SPECIFICATION ISO/TS 9491-1:2023(E)
Biotechnology — Predictive computational models in
personalized medicine research —
Part 1:
Constructing, verifying and validating models
1 Scope
This document specifies requirements and recommendations for the design, development and
establishment of predictive computational models for research purposes in the field of personalized
medicine. It addresses the set-up, formatting, validation, simulation, storing and sharing of
computational models used for personalized medicine. Requirements and recommendations for data
used to construct or required for validating such models are also addressed. This includes rules for
formatting, descriptions, annotations, interoperability, integration, access and provenance of such data.
This document does not apply to computational models used for clinical, diagnostic or therapeutic
purposes.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 20691:2022, Biotechnology — Requirements for data formatting and description in the life sciences
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
artificial intelligence
AI
capability to acquire, process, create and apply knowledge, held in the form of a model, to
conduct one or more given tasks
[SOURCE: ISO/IEC TR 24030:2021, 3.1]
3.2
molecular biomarker
biomarker
molecular marker
detectable and/or quantifiable molecule or group of molecules used to indicate a biological condition,
state, identity or characteristic of an organism
EXAMPLE Nucleic acid sequences, proteins, small molecules such as metabolites, other molecules such as
lipids and polysaccharides.
[SOURCE: ISO 16577:2022, 3.4.28]
3.3
big data in health
high volume, high diversity biological, clinical, environmental, and lifestyle information collected from
single individuals to large cohorts, in relation to their health and wellness status, at one or several time
points
[SOURCE: Reference [8]]
3.4
community standard
standard that reflects the results of a grass-roots standardization effort from a specific user group, and
that is created by individual organizations or communities
3.5
computational model
in silico model
description of a system in a mathematical expression and/or graphical form highlighting objects and
their interfaces
Note 1 to entry: An object distributed processing (ODP) concept.
Note 2 to entry: The computational model is similar to the OMT and UML notion of a class diagram when using the
graphical form.
[SOURCE: ISO/IEC 16500-8:1999, 3.6, modified — Admitted term added. “mathematical expression
and/or” added, and “as such it is similar to the OMT and UML notion of a class Diagram” deleted from
the definition. “An object distributed processing (ODP) concept” moved to Note 1 to entry. Note 2 to
entry added.]
3.6
data-driven model
model developed through the use of data derived from tests or from the output of the investigated process
[SOURCE: ISO 15746-1:2015, 2.4]
3.7
data harmonization
technical process of bringing together different data types to make them processable in the same
computational framework
3.8
data integration
systematic combining of data from different independent and potentially heterogeneous sources, to
create a more compatible, unified view of these data for research purpose
[SOURCE: ISO 5127:2017, 3.1.11.24]
3.9
genome-wide association studies
GWAS
testing of genetic variants across the genomes of many individuals to identify genotype–phenotype
associations
3.10
in silico clinical trial
use of individualized computer simulation in the development or regulatory evaluation of a medicinal
product, medical device or medical intervention
[SOURCE: Reference [9]]
3.11
in silico approach
computer-executable analyses of mathematical model(s) (3.13) to study and simulate a biological system
3.12
machine learning
ML
computer technology with the ability to automatically learn and improve from experience without
being explicitly programmed
EXAMPLE Speech recognition, predictive text, spam detection, artificial intelligence.
[SOURCE: ISO 20252:2019, 3.52, modified — Abbreviated term “ML” added.]
3.13
mathematical model
set of equations that describes the behaviour of a physical system
[SOURCE: ISO 16730-1:2015, 3.11]
3.14
mechanism-based
approach in computational modelling that aims for a structural representation
3.15
model validation
comparison between the output of the calibrated model and the measured data, independent of the
data set used for calibration
[SOURCE: ISO 14837-1:2005, 3.7]
3.16
model verification
confirmation that the mathematical elements of the model behave as intended
[SOURCE: ISO 14837-1:2005, 3.8]
3.17
personalized medicine
medical model using characterization of individuals’ phenotypes and genotypes for tailoring the right
therapeutic strategy for the right person at the right time, and/or to determine the predisposition to
disease and/or to deliver timely and targeted prevention
Note 1 to entry: Examples for individuals’ phenotypes and genotypes are molecular profiling, medical imaging
and lifestyle data.
Note 2 to entry: Medical decisions, prevention strategies and therapies in personalized medicine are based on
this individuality.
[SOURCE: EU 2015/C 421/03[10]]
3.18
raw data
data in its originally acquired, direct form from its source before subsequent processing
[SOURCE: ISO 5127:2017, 3.1.10.04]
4 Principles
4.1 General
Research in the field of personalized medicine is highly dependent on the exchange of data from
different sources, as well as harmonized integrative analysis of large-scale personalized medicine data
(big data in health). Computational modelling approaches play a key role for understanding, simulating
and predicting the molecular processes and pathways that characterize human biology. Modelling
approaches in biomedical research also lead to a more profound understanding of the mechanisms and
factors that drive disease, and consequently allow for adapting personalized treatment strategies that
are guided by central clinical questions. Patients can greatly benefit from this development in research
that equips personalized medicine with predictive capabilities to simulate in silico clinically relevant
questions, such as the effect of therapies, the response to drug treatments or the progression of disease.
4.2 Computational models in personalized medicine
4.2.1 General
Computational models have the potential to translate in vitro, non-clinical and clinical results (and
their related uncertainty) into descriptive or predictive expressions. The added value of such models
in medicine and pharmacology has increasingly been recognized by the scientific community[11][12],
as well as by regulatory bodies such as the European Medicines Agency[13][14] (e.g. EMA guideline on
PBPK reporting[15]), or the US Food and Drug Administration (FDA)[16][17]. Computational models are
integrated in different fields in medicine and drug development, expanding from disease modelling and
molecular biomarker research to the assessment of drug efficacy and safety. In silico approaches are also
expanding in neighbouring fields, such as pharmacoeconomics[18][19], analytical chemistry[20][21] and
biology[22][23], that are out of the scope of this document.
Model creation starts with a clinical question and the collection of data (see Figure 1). The data employed
require harmonized approaches for data integration before model construction can start. The initial model
usually undergoes several refinement and improvement iterations to enhance its predictive capabilities.
Common standards (see 4.3.3) should be used for the model building and curation process. Accuracy
measurements and validation processes are key and should be transparent, while model output and
function should ideally be interpretable or explainable.
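As a non-normative illustration of this iterative refinement, the following Python sketch calibrates a candidate model, checks its predictive accuracy on data held out from calibration, and moves to a refined model structure if the accuracy criterion is not met. The candidate models, data values and acceptance threshold are hypothetical assumptions, not requirements of this document.

    # Illustrative sketch only: iterative refinement of a predictive model.
    # Candidate models, data and the acceptance threshold are hypothetical.
    import numpy as np
    from scipy.optimize import curve_fit

    def linear(t, a, b):               # initial candidate model
        return a * t + b

    def saturating(t, a, b):           # refined candidate model
        return a * t / (b + t)

    t_cal, y_cal = np.array([1., 2., 4., 8.]), np.array([0.9, 1.6, 2.4, 3.0])
    t_val, y_val = np.array([3., 6.]), np.array([2.1, 2.8])  # held-out validation data

    for model in (linear, saturating):                # refinement iterations
        params, _ = curve_fit(model, t_cal, y_cal)    # calibration step
        rmse = float(np.sqrt(np.mean((model(t_val, *params) - y_val) ** 2)))
        print(model.__name__, "validation RMSE:", round(rmse, 3))
        if rmse < 0.2:                                # hypothetical acceptance criterion
            break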
A number of computational modelling approaches in pre-clinical and clinical research already address
these questions in detail (see 4.2.2 to 4.2.6) and, therefore, play a leading role for the future development
of personalized medicine.
Figure 1 — Modelling approach for personalized medicine
4.2.2 Cellular systems biology models
4.2.2.1 General
For the simulation of complex dynamic biological processes and networks, models can be either data-
driven (“top-down”) or mechanism-based (“bottom-up”).
Mechanism-based concepts aim for a structural representation of the governing physiological
processes, based either on model equations with a limited amount of data, which are required to
establish the base model[24], or, alternatively, on static interacting networks[25][26]. Data-driven
approaches[11][27] require sufficiently rich and quantitative time-course data to train and to validate
the model. Due to the often black-box nature of data-driven approaches, the model validation process
relies on performance tests against known results.
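As a non-normative sketch, a minimal mechanism-based model can be expressed as a small system of ordinary differential equations and simulated; the species names, rate constants and time points below are hypothetical assumptions.

    # Illustrative sketch: a minimal mechanistic ODE model of receptor activation.
    # Species names and rate constants are hypothetical.
    import numpy as np
    from scipy.integrate import solve_ivp

    k_act, k_deact = 0.8, 0.3    # assumed activation/deactivation rates (1/min)

    def rhs(t, y):
        inactive, active = y
        flux = k_act * inactive - k_deact * active
        return [-flux, flux]

    sol = solve_ivp(rhs, (0.0, 20.0), y0=[1.0, 0.0],
                    t_eval=np.linspace(0.0, 20.0, 5))
    for ti, ai in zip(sol.t, sol.y[1]):
        print(f"t = {ti:4.1f} min, active fraction = {ai:.3f}")

A data-driven counterpart would instead fit a statistical or machine learning model to measured time-course data and validate it against independent measurements.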
4.2.2.2 Challenges
The challenges are as follows:
— Creation of models that balance the level of abstraction with comprehensiveness to make modelling
efforts reproducible and reusable (abstraction versus size).
— Development of prediction models that can be adapted easily to individual patient profiles.
— Efficient parameter estimation tools to cope with population and disease heterogeneity.
— Overfitting of the model to the experimental/patient data, and optimization methods for model
predictions under realistic parametric uncertainty.
— Flexibility in models to cope with missing data (e.g. diverse patient profiles).
— Scaling from cellular to organ and to organism levels (e.g. high clinical relevance, high hurdles for
regulatory acceptance).
4.2.3 Risk prediction for common diseases
4.2.3.1 General
Predictive models stratify patients into distinct subgroups at different levels of risk for clinical
outcomes (risk prediction for disease). By training the algorithm on clinical data, phenotypic or
genotypic, subgroups can be identified which have identifiably different patterns of clinical markers. By
then identifying which pattern a patient fits best, the model can place a particular patient within the
most similar trajectory, thereby also stratifying the patient to a particular level of risk. Clinical markers
used in such models can be any health feature, tokenized so as to be analysable by the model, from data
such as disease history, symptoms, treatment and other exposure data, family history and laboratory
data, to genetic data.
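As a non-normative sketch, such stratification can be illustrated with a simple supervised classifier whose predicted probabilities are binned into risk subgroups; the clinical markers, outcome labels and cut-offs below are hypothetical assumptions.

    # Illustrative sketch: stratifying a patient into a risk subgroup from clinical markers.
    # Marker values, outcome labels and risk cut-offs are hypothetical.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[55, 1], [62, 0], [40, 0], [70, 1], [48, 1], [66, 0]])  # e.g. age, family history
    y = np.array([1, 1, 0, 1, 0, 0])                                      # observed clinical outcome

    clf = LogisticRegression().fit(X, y)
    risk = clf.predict_proba([[58, 1]])[0, 1]        # predicted risk for a new patient
    group = "high" if risk > 0.66 else ("medium" if risk > 0.33 else "low")
    print(f"predicted risk {risk:.2f} -> {group}-risk subgroup")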
4.2.3.2 Challenges
The challenges are as follows:
— Understanding the possible implication to patients at an individual level. What can be inferred?
How to test the inference made?
— Limited replication of genetic associations and poor applicability to diverse populations (e.g.
populations too poorly represented to be of interest for specific analyses), specifically those of
mixed or non-European ancestry.
— Varying transparency of methodological choices and reproducibility.
— Limited cellular/tissue context and harmonized functional data availability across populations/
studies.
— Missing environmental information coupled to genetic data.
4.2.4 Disease course and therapy response prediction
4.2.4.1 General
Prediction of the disease behaviour (mild versus severe, stable versus progressive) early in the disease
course based on specific molecular biomarkers can allow an improved timing of therapy introduction,
as well as the choice of therapy scheme (targeted therapy)[28]. Ideally, these models can provide a
prediction of multi-factorial diseases at unprecedented resolution, in a way that clinicians can use the
information in their daily decision-making.
4.2.4.2 Challenges
The challenges are as follows:
— Harmonization and standardization of clinical information for measuring the disease of interest.
— Developing transparent and quality-controlled workflows for molecular data generation and
interpretation in clinical settings.
— Harmonization and application of existing and upcoming pre-examination workflow standards
(including specimen collection, storage and nucleic acid isolation), as well as developing feasible
ring trial formats and external quality assurance (EQA) schemes for given molecular analysis types.
— Transparent reduction of contents and definition of appropriate marker sets and dynamic models to
foster clinical translation.
— Developing intuitive visualization results and insights into molecular analyses, as well as critical
appraisal of limitations of models by physicians.
4.2.5 Pharmacokinetic/-dynamic modelling and in silico trial simulations
4.2.5.1 General
Pharmacokinetic/pharmacodynamic (PK/PD) models[29][30] can usefully translate in vitro, non-clinical
and clinical PK/PD data into meaningful information to support decision-making. At the individual
level, substance PKs can be described either by non-compartmental analysis and compartmental PK
modelling or by physiologically-based PK (PBPK) modelling. At the population level, population PK
models have become the most commonly used top-down models; they derive a pharmaco-statistical
model from observed systemic concentrations. PK/PD modelling involves, on the one hand, a
quantification of drug absorption and disposition (PK) and, on the other hand, a description of the
drug-induced effect (PD).
PK/PD models and quantitative systems pharmacology (QSP) both aim for mechanistic and quantitative
analyses of the interactions between a substance such as a drug and a specific biological system[31].
PK and PBPK modelling are currently used for simulations for virtual patient populations in in silico
clinical trials. The concept is that computer simulations are proposed as an alternative source of
evidence to support drug development to reduce, refine, complement or replace the established data
sources including in vitro experiments, in vivo animal studies and clinical trials in healthy volunteers
and patients.
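As a non-normative sketch, an individual-level one-compartment PK model with first-order absorption and elimination can be simulated directly from its closed-form solution; the dose, volume and rate constants below are hypothetical assumptions.

    # Illustrative sketch: one-compartment PK model with first-order absorption
    # (Bateman equation). Dose, volume and rate constants are hypothetical.
    import numpy as np

    dose, V = 100.0, 40.0     # dose (mg) and volume of distribution (L)
    ka, ke = 1.2, 0.25        # absorption and elimination rate constants (1/h)

    t = np.linspace(0.0, 24.0, 7)
    conc = dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))
    for ti, ci in zip(t, conc):
        print(f"t = {ti:5.1f} h   C = {ci:5.2f} mg/L")

Repeating such a simulation over assumed distributions of physiological parameters is, in essence, how virtual patient populations for in silico trials are generated.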
4.2.5.2 Challenges
The challenges are as follows:
— Reliable data sources for systems-related parameters are currently limited.
— Methods for data generation, collection and integration are not standardized.
— Reporting of results is very heterogeneous and inconsistent[32].
— Tools to be used and criteria for model evaluation are very variable across projects.
— Very limited platforms (systems model) are currently considered reliable and qualified for
regulatory submission.
4.2.6 Artificial intelligence models
4.2.6.1 General
Data-driven approaches utilizing artificial intelligence (AI) and machine learning (ML) treat the
mechanism as unknown and aim to model a function that operates on data input to predict the
outcome, regardless of the unknown physiological processes. The mechanisms operating in the complex
systems being modelled, i.e. which factors together drive outcomes, are considered too complex to be
determined (black-box models). The quality of black-box models is assessed through the accuracy of
their predictions, tested in a variety of ways. These data-driven models can be applied in a hypothesis-
naive way, with no assumptions made as to which factors drive the causal mechanism.
ML approaches learn the theory automatically from the data through a process of inference, model
fitting or learning from examples[33]. ML can be supervised, unsupervised or partially supervised (see
Annex B).
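Because black-box models are judged by the accuracy of their predictions, a minimal non-normative sketch of one such performance test is given below; the synthetic data set and the model choice are hypothetical assumptions.

    # Illustrative sketch: assessing a black-box ML model by cross-validated accuracy.
    # The synthetic data set and the model choice are hypothetical.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                   # synthetic input features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic outcome

    scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
    print("cross-validated accuracy:", round(scores.mean(), 2))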
4.2.6.2 Challenges
The challenges are as follows:
— Imprecise reporting, which makes it difficult to obtain the full benefit of results, navigate biomedical
literature and generate clinically actionable findings.
— Data standardization, since most in silico methods require comparable input data.
— Data based on group associations, or pre-determined understanding of clinical relationships, can
bias and limit AI/ML predictions (inappropriately pre-processed data).
— Different proprietary systems in healthcare information technology (IT) make data extraction,
labelling, interpretation and standardization highly complex procedures (data lockdown).
4.3 Standardization needs for computational models
4.3.1 General
Major challenges in the field of personalized medicine are to harmonize the standardization efforts
that refer to different data types, approaches and technologies, as well as to make the standards
interoperable, so that the data can be compared and integrated into models. Reproducible modelling in
personalized medicine requires a basic understanding of the modelled system, as well as of its biological
and physiological background, and finally of the applied virtual experiments.
Because of the heterogeneous nature of the data in personalized medicine, harmonized strategies for
data integration are required that utilize broadly applicable standards to allow for reproducible data
exploitation to generate new knowledge for medical benefits. The two key components for which broad
standardization efforts make most sense in the model building process are thus data integration and
model validation (see Figure 2).
Figure 2 — Data integration and model validation as key factors for standardization
requirements for computational models
4.3.2 Challenges
Although domain-specific annotation standards and terminologies are available for many different
data types used in personalized medicine (see Annex A), the process of model building poses the
following variety of challenges:
— High degree of variability regarding data types (structured versus unstructured, molecular, clinical,
laboratory, patient-reported, etc.).
— Differences in coding and calculation within data types (between-machine variability, different
measurements, etc.).
— Heterogeneous utilization of existing data.
— High effort of data harmonization in terms of time, resources and cost.
— Models relevant for clinical use need to be fit for purpose.
— Differences in IT systems used in data generation, e.g. enterprise resource planning systems and
laboratory result software or hardware, at national, regional or clinical centre level.
— Adoption of different standards such as NPU (Nomenclature for Properties and Units) or LOINC
(Logical Observation Identifier Names and Codes).
— Long-term variety and dynamics of data and standards.
— Differences in implementation of international terminologies such as the International Classification
of Diseases (ICD).
— Language differences in unstructured text, and other factors.
4.3.3 Common standards relevant for personalized medicine
The use of common standards developed by specific user communities and different stakeholders,
as well as by standard-defining organizations, has been enhanced as these standards have been
coupled to tools that have spread in the respective fields of research. Annex A provides an overview of
some of these standards currently in use by different communities.
4.4 Data preparation for integration into computer models
4.4.1 General
Computational models in the life sciences in general, and in personalized medicine research in particular,
increasingly incorporate rich and varied data sets to capture multiple aspects of the modelled
phenomenon. Data types are encoded in technology- and subdomain-specific formats, and the variety,
incompatibility and lack of interoperability of such data formats have been noted as one of the major
hurdles for data preparation.
To allow for seamless integration of data used for the construction of predictive computational models
in personalized medicine, these data shall (a non-normative sketch illustrating several of these points
follows the list):
— include or be annotated with sampling and specimen data that follow the requirements and
recommendations in accordance with the relevant domain-specific standards;
— be formatted using generally accepted and interoperable standard data formats commonly used for
the corresponding data types (in accordance with ISO 20691);
— include or be annotated with descriptive metadata that consider generally accepted domain-specific
minimum information guidelines and describes the metadata attributes and entities using semantic
standards, such as standard terminologies, controlled vocabularies and ontologies (as specified in
ISO 20691:2022, Annex B);
— follow best practice requirements and recommendations of generally accepted domain-specific
data interoperability frameworks;
— be structured in a way that allows integration of the data into a model, together with other data;
— include or be annotated with data provenance information that allows for tracking of the data and
source material throughout the whole data processing and modelling;
— be made accessible via harmonized data access agreements (hDAAs) for controlled access data, if
open access to the data is not possible.
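A non-normative sketch of a single data record addressing several of the above points (documented format, descriptive metadata, semantic annotation and provenance) is shown below; all identifiers, ontology terms and field names are hypothetical examples.

    # Illustrative sketch: a data record carrying format, descriptive, semantic
    # and provenance metadata. All identifiers and field names are hypothetical.
    import json

    record = {
        "data_file": "expression_matrix.tsv",
        "format": {"name": "TSV", "version": "1.0"},              # documented data format
        "description": {"assay": "RNA-seq", "organism": "Homo sapiens"},
        "annotation": {"tissue": "UBERON:0002107"},               # ontology term (liver)
        "provenance": {"source_study": "study_XYZ",
                       "processed_by": "pipeline v2.1",
                       "derived_from": "raw_fastq_batch_7"},
    }
    print(json.dumps(record, indent=2))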
4.4.2 Sampling data
Dedicated measures shall be taken for collecting, stabilizing, transporting, storing and processing
of biological specimen/samples, to ensure that profiles of analytes of interest (e.g. gene sequence,
transcript, protein, metabolite) for examination are not changed ex vivo. Without these measures,
analyte profiles can change drastically during and after specimen collection, thus making the outcome
from diagnostics or research unreliable or even impossible, because the subsequent examination
cannot determine the situation in the patient, but determines an artificial profile generated during the
pre-examination process.
NOTE Important measures include, for example, times and temperatures of sample transportation not
exceeding the specifications provided in relevant International Standards (e.g. ISO 20916, ISO 20186-1) and
International Technical Specifications (e.g. ISO/TS 20658), giving guidelines on all steps of the pre-examination
workflow.
Conditions applied to a specimen shall be documented in addition to other important metadata,
including but not limited to the content of Table 1.
Table 1 — Important metadata collected during pre-examination workflows

Specimen collection:
— ID of responsible person

Information about specimen donor:
— ID
— Health status (e.g. healthy, disease type, concomitant disease, demographics such as age and gender)
— Routine medical treatment and special treatment prior to specimen collection (e.g. anaesthetics, medications, surgical or diagnostic procedures, fasting status)
— Appropriate consent from the specimen donor/patient

Information about the specimen, collection from the donor or patient and processing:
— Type and purpose of the examination requested
— Specimen collection technique used (e.g. surgery, draw, flush)
— Time and date when the specimen is removed from the body
— Documentation of any additions or modifications to the specimen after removal from the body (e.g. addition of reagents)

Specimen storage and transport:
— Temperatures of the collection device's surroundings

Specimen reception:
— ID or name of the person receiving the specimen
— Arrival date, time and conditions (e.g. labelling, transport conditions including temperature, leaking/breaking of the specimen collection container)
— Nonconformities, including those from collection and transport requirements

Specimen processing and isolation of analyte:
— Procedure and any modification applied from the method referenced
— Storage buffer of analyte

Information on isolated analyte:
— Quantity and quality/integrity of analyte

Storage of isolated analyte:
— Date and time of storage start
— Temperature and method applied for storage
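As a non-normative illustration, pre-examination metadata such as those of Table 1 can be captured as a structured record accompanying the specimen; all field names and values below are hypothetical.

    # Illustrative sketch: recording pre-examination metadata per Table 1.
    # All field names and values are hypothetical.
    specimen = {
        "collection": {"responsible_id": "TECH-042", "technique": "venous draw",
                       "removed_from_body": "2023-06-01T08:30Z"},
        "donor": {"id": "DON-117", "health_status": "type 2 diabetes",
                  "fasting": True, "consent": "research consent on file"},
        "transport": {"temperature_C": 4, "arrival": "2023-06-01T10:05Z",
                      "nonconformities": []},
        "analyte": {"type": "RNA", "quantity_ng": 850, "integrity_RIN": 8.9,
                    "storage": {"start": "2023-06-01T12:00Z", "temperature_C": -80}},
    }
    print(specimen["analyte"]["integrity_RIN"])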
4.4.3 Data formatting
The first step in constructing a predictive computational model in personalized medicine is collating
the data sets that need to be integrated into the model (which typically originate from various sources),
and describing the different aspects of the studied subject and specimen to be modelled and simulated.
The following major resources for data in personalized medicine can be identified:
a) clinical data;
b) laboratory data (including omics data, as well as data from histology and cytology);
c) sample data and trial data[34];
d) data from medical imaging, functional examinations and other clinical tests.
NOTE 1 Sensory data, either from medical or personal devices, are becoming more and more important and
constitute another type of resource.
Each of these groups shall be structured in the corresponding formats in a way that allows conflation
and, thus, ensures interoperability. The aim shall be to render the data interoperable, so that people
(e.g. researchers) and also software tools can identify the key information in the data files/entries
and interrelate corresponding pieces of information. To allow for seamless integration of data used
for the construction or parameterization of such computational models, before their integration into
the corresponding models, data sets shall be formatted using generally accepted, appropriate and
interoperable standard data formats commonly used for the corresponding data types (in accordance
with ISO 20691:2022, Annex A, with extensive examples of recommended standard formats).
The used data format and its version shall be unambiguously documented with the data to allow for
later decoding of the data.
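This documentation requirement can be illustrated, non-normatively, by a sidecar metadata file written next to the data file; the file names and format labels below are hypothetical assumptions.

    # Illustrative sketch: recording the data format and its version with the data.
    # File names and format labels are hypothetical.
    import json, pathlib

    data_path = pathlib.Path("cohort_measurements.csv")
    sidecar = data_path.with_suffix(".meta.json")
    sidecar.write_text(json.dumps({
        "file": data_path.name,
        "format": "CSV",                 # interoperable tabular format
        "format_version": "RFC 4180",    # unambiguous version reference
        "encoding": "UTF-8",
    }, indent=2))
    print(sidecar.read_text())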
NOTE 2 It is often difficult for researchers to identify suitable standards for use within their field,
as availability is not enough: other researchers within the field also need to use the standards, and there has to
be software to facilitate the generation and exchange of standardized data and models. Online resources are
available for bundling information about available standards for formatting data in the domain. One widely
used example of such a curated, informative and educational resource on data and metadata standards,
interrelated with databases and data policies, is the publicly available FAIRsharing portal[35]. Recommended
formats referenced in ISO 20691 can be found online as an actively curated, constantly maintained and updated
list, the ISO 20691 FAIRsharing Collection[36].
4.4.4 Data description
NOTE Research data are distributed across resources and repositories hosted by different institutions,
ranging from individual companies, institutions, research groups and universities, to national and international
infrastructures with their data repositories. Many life science disciplines have developed their own, discipline-
specific resources and catalogues with specific, sometimes implicit practices and standards. However, a
standardized set of metadata has the potential to overcome these data silos.
Descriptive metadata are necessary to integrate biomedical, laboratory and clinical data into computer
models. Metadata provides structured descriptions of the content, context and provenance of data sets.
Descriptive metadata provides searchable information, making the data sets themselves discoverable
and providing mechanisms for data citation. Metadata also enables users to judge whether a particular
data set is suitable for their particular research purpose. Requirements for descriptive metadata can
be found in domain-specific minimum information guidelines (see ISO 20691:2022, Annex B) or, more
generally, in the FAIR principles for data stewardship[7].
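A minimal, non-normative sketch of checking descriptive metadata against such a minimum-information guideline is given below; the required fields are a hypothetical stand-in for a domain-specific guideline.

    # Illustrative sketch: checking descriptive metadata against a minimum-information list.
    # The required fields stand in, hypothetically, for a domain-specific guideline.
    REQUIRED = {"title", "creator", "date", "assay_type", "organism", "licence"}

    metadata = {"title": "Cohort A transcriptomics", "creator": "Lab X",
                "date": "2023-05-30", "assay_type": "RNA-seq",
                "organism": "Homo sapiens"}

    missing = REQUIRED - metadata.keys()
    print("missing descriptive metadata fields:", sorted(missing))   # -> ['licence']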
4.4.5 Data annotation (semantics)
For many data types used in personalized medicine, domain-specific annotation standards and
terminologies are available. For example, UniProt[37] or the Protein Ontology[38] should be used to
uniquely identify proteins in a particular biological context, which can then be linked to specific entities
in the computational model. Similarly, the Gene Ontology[39] should be used to identify specific genes
or cellular components, whereas the Foundational Model of Anatomy (FMA)[40] should be used to localize
an entity in the computational model to a specific spatial location or anatomical structure.
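For illustration only, linking model entities to such identifiers can be as simple as an annotation map; the entity names below are hypothetical, and the accessions are publicly documented examples (e.g. UniProt P04637 for human p53, GO:0005634 for the nucleus).

    # Illustrative sketch: annotating model entities with standard ontology identifiers.
    # Entity names are hypothetical; the accessions are example identifiers.
    annotations = {
        "p53_protein": {"uniprot": "P04637"},         # protein identity
        "nucleus": {"go": "GO:0005634"},              # cellular component
        "liver_compartment": {"fma": "FMA:7197"},     # anatomical structure (liver)
    }
    for entity, ids in annotations.items():
        print(entity, "->", ids)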
Unless completely or partially unstructured, which is often the case, health-related data are most
commonly structured and codified by specific formatting standards for medical data. These can be the
interoperability standard HL7 Fast Healthcare Interoperability Resources (FHIR)[41]
...
