ISO/IEC TS 42112:2026
Information technology — Artificial intelligence — Guidance on machine learning model training efficiency optimization
This document outlines key factors affecting machine learning model training efficiency and presents corresponding optimization approaches. It provides guidance for AI providers and producers through a structured set of characteristics and related optimizations to improve training efficiency. This information can support the evaluation and comparison of various ML training strategies. This document does not specify any training acceleration mechanisms provided and implemented within the machine learning computing devices described in ISO/IEC TR 17903.
Technologies de l'information — Intelligence artificielle — Recommandations relatives à l'optimisation de l'efficacité de l'entraînement du modèle d'apprentissage automatique
General Information
- Status: Published
- Publication Date: 04-Jun-2026
- Technical Committee: ISO/IEC JTC 1/SC 42 - Artificial intelligence
- Current Stage: 6060 - International Standard published
- Start Date: 05-Jun-2026
- Due Date: 18-Jun-2026
- Completion Date: 05-Jun-2026
Overview
ISO/IEC TS 42112:2026 is an internationally recognized technical specification developed by ISO and IEC. It provides guidance for optimizing machine learning (ML) model training efficiency within the field of artificial intelligence (AI). The document targets AI providers, who supply platforms, services, and computational resources, as well as AI producers, who design and train ML models. The standard systematically identifies the key characteristics influencing ML training efficiency and offers structured optimization methods to improve overall resource utilization, training time, and model quality. This standard is vital for organizations aiming to evaluate and compare machine learning training strategies, reduce computational costs, and accelerate AI solution delivery.
Key Topics
Training Efficiency Factors
The standard explores critical aspects impacting machine learning training, including:
- Quality and size of training data
- Management of model parameters
- Communication overhead in distributed computing environments
- Detection, diagnosis, and recovery of training failures
- Overall quality, robustness, and security considerations of ML models
- Effective allocation and management of computing resources
Optimization Approaches
ISO/IEC TS 42112:2026 presents a phased optimization strategy aligned with the ML pipeline, including:
- Data quality improvement and input validation
- Feature engineering, selection, and scaling
- Informed selection of ML algorithms based on task and data characteristics
- Training process optimization such as cross-validation, regularization, hyperparameter tuning, and early stopping
- Ensemble learning techniques to combine multiple models for superior performance
- System-level enhancements like parallelism, communication optimization, checkpointing, and resource management
- Continuous monitoring, failure recovery, and infrastructure assessment
Stakeholder Guidance
Both ML platform providers and model developers benefit from tailored advice, with emphasis on optimizing at data, algorithmic, and system levels for scalable and robust AI deployment.
Applications
Organizations implementing ISO/IEC TS 42112:2026 can realize significant improvements in the following areas:
AI Solution Development
By employing standardized methods for optimizing ML training, AI producers deliver validated models faster, enabling quicker deployment and innovation cycles.
AI Platform Efficiency
Providers of AI infrastructure can increase resource utilization rates, support more users, and reduce operational costs by systematically applying efficiency guidelines to their platforms.
Evaluation and Benchmarking
The standard offers a structured basis for comparing diverse ML training strategies, helping organizations make informed technology and architecture decisions.
Reliable and Scalable Systems
Robust guidance on failure detection, recovery, and resource management ensures more reliable ML training in large-scale or distributed environments.
Compliance and Best Practices
Adhering to international standards like ISO/IEC TS 42112 can demonstrate commitment to best practices and build stakeholder confidence in AI products.
Practical use cases include optimizing deep learning recommender systems for e-commerce, streamlining model updates in dynamic environments, and enhancing fairness and robustness in AI-driven business processes.
Related Standards
Implementation of ISO/IEC TS 42112:2026 may be supported by or integrated with the following standards:
- ISO/IEC 22989:2022 - Artificial intelligence concepts and terminology
- ISO/IEC 23053:2022 - Framework for AI systems using machine learning
- ISO/IEC TR 17903:2024 - Overview of machine learning computing devices
- ISO/IEC TS 4213 - Assessment of machine learning classification performance
- ISO/IEC 5259-2 - Data quality measures for AI training data
- ISO/IEC 25059 & ISO/IEC 25010 - System/software quality models
By following these standards, organizations can ensure consistency and interoperability across AI and ML projects, promoting efficient, reliable, and transparent machine learning workflows.
Frequently Asked Questions
ISO/IEC TS 42112:2026 is a technical specification published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Information technology — Artificial intelligence — Guidance on machine learning model training efficiency optimization". It outlines key factors affecting machine learning model training efficiency and presents corresponding optimization approaches, giving AI providers and producers a structured set of characteristics and related optimizations that can support the evaluation and comparison of ML training strategies.
ISO/IEC TS 42112:2026 is classified under the following ICS (International Classification for Standards) categories: 35.240.01 - Application of information technology in general. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
Technical Specification
ISO/IEC TS 42112
First edition
2026-06
Information technology — Artificial intelligence — Guidance on machine learning model training efficiency optimization
Technologies de l'information — Intelligence artificielle — Recommandations relatives à l'optimisation de l'efficacité de l'entraînement du modèle d'apprentissage automatique
Reference number
© ISO/IEC 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Overview of ML model training
5.1 Model training in ML pipeline
5.2 Stakeholders of model training
6 ML model training efficiency
7 Characteristics impacting ML model training efficiency
7.1 Training data
7.2 Model parameter management
7.3 Communication challenges
7.4 Failure detection in model training
7.5 Failure recovery in model training
7.6 Quality of the ML model
7.7 Management of computing resources
8 Model training efficiency optimization methods
8.1 Overview
8.2 Training data preparation and model optimization
8.2.1 General consideration
8.2.2 Training data quality optimization
8.2.3 Feature engineering
8.2.4 Feature selection
8.2.5 Feature scaling
8.2.6 Training algorithm selection
8.2.7 Training process optimization
8.2.8 Ensemble learning
8.3 Parallelism strategies
8.3.1 Data parallelism
8.3.2 Model parallelism
8.3.3 Hybrid parallelism
8.4 Communication optimization
8.4.1 Collective communication
8.4.2 Data compression
8.4.3 Asynchronous communication
8.4.4 Network topology-aware scheduling
8.5 Model checkpoint optimization
8.5.1 Hierarchical checkpoint saving
8.5.2 Overlapping model copy and computation
8.5.3 Network-aware asynchronous storage
8.6 Resource management for model training
8.7 Failure detection optimization
8.8 Continuous monitoring and anomaly detection
8.9 Environment and infrastructure assessment
Annex A (informative) Use case: Deep learning recommendation system for an e-commerce platform
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 42, Artificial intelligence.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
Machine learning (ML) is a key branch of artificial intelligence (AI). To apply ML across diverse domains, ML
models are trained, validated and deployed in production environments. As model complexity and dataset
size continue to grow, the time, hardware resources, human effort and financial costs associated with ML
model training are escalating.
ML platforms, services and products are now widely available and adopted. Both AI providers (who offer
ML platforms, services or products) and AI producers (who build ML-based solutions) seek to optimize
training efficiency to minimize resource consumption and cost, without compromising model or dataset
scale. For instance, AI providers aim to reduce hardware usage per training task to support more customers
and improve resource utilization. AI producers prioritize faster training cycles to accelerate deployment of
validated models.
This document provides guidance to help AI providers and producers achieve faster training and reduced
resource consumption, given specific models, dataset and infrastructure.
Technical Specification ISO/IEC TS 42112:2026(en)
Information technology — Artificial intelligence — Guidance
on machine learning model training efficiency optimization
1 Scope
This document outlines key factors affecting machine learning model training efficiency and presents
corresponding optimization approaches.
It provides guidance for AI providers and producers through a structured set of characteristics and related
optimizations to improve training efficiency. This information can support the evaluation and comparison
of various ML training strategies.
This document does not specify any training acceleration mechanisms provided and implemented within
the machine learning computing devices described in ISO/IEC TR 17903.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 22989:2022, Information technology — Artificial intelligence — Artificial intelligence concepts and
terminology
ISO/IEC 23053:2022, Framework for Artificial Intelligence (AI) Systems Using Machine Learning (ML)
ISO/IEC TR 17903:2024, Information technology — Artificial intelligence — Overview of machine learning
computing devices
ISO/IEC TS 4213, Information technology — Artificial intelligence — Assessment of machine learning
classification performance
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 22989, ISO/IEC 23053,
ISO/IEC TR 17903, ISO/IEC TS 4213 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
artificial intelligence platform
AI platform
set of services that are provided by AI platform provider and enable other stakeholders to produce artificial
intelligence (AI) services or products
3.2
asynchronous
pertaining to two or more processes that do not depend upon the occurrence of specific events such as
common timing signals
[SOURCE: ISO/IEC 20944-1:2013, 3.6.1.14]
3.3
computation graph
graph used to represent mathematical computation process, where nodes represent operations or variables
and edges represent data flows and dependencies
3.4
failure
event during the model training process in which the training task fails to complete as expected or
experiences significant degradation in performance
3.5
fault
defect, error or abnormal condition in infrastructure, hardware, software, data or configuration that can
lead to a failure (3.4) in the model training process
3.6
grid search
hyperparameter tuning technique that explores all possible combinations of hyperparameters within a
predefined set to identify the combination that can maximize ML model performance
Note 1 to entry: Hyperparameter tuning is specified in ISO/IEC 23053:2022, 6.5.3.1.
3.7
principal-component analysis
PCA
factor analysis involving the extraction of orthogonal factors that successively capture the largest amount of
variance in the dataset
[SOURCE: ISO 18115-1:2023, 22.14, modified — Notes to entry removed.]
3.8
random search
hyperparameter tuning technique that randomly samples a fixed number of hyperparameter combinations
within a predefined set to identify a favourable combination
Note 1 to entry: Hyperparameter tuning is specified in ISO/IEC 23053:2022, 6.5.3.1.
3.9
synchronous
pertaining to two or more processes that depend upon the occurrence of specific events such as common
timing signals
[SOURCE: ISO/IEC 20944-1:2013, 3.6.1.13]
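The difference between grid search (3.6) and random search (3.8) can be illustrated with a minimal sketch. The search space, the parameter names and the toy scoring function below are hypothetical and stand in for a real training-and-validation run; they are not taken from this document.

```python
import itertools
import random

def grid_search(space, score):
    """Evaluate every combination in the predefined set and keep the best."""
    names = list(space)
    best = None
    for values in itertools.product(*(space[n] for n in names)):
        candidate = dict(zip(names, values))
        s = score(candidate)
        if best is None or s > best[1]:
            best = (candidate, s)
    return best

def random_search(space, score, n_trials, seed=0):
    """Evaluate a fixed number of randomly sampled combinations."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        candidate = {n: rng.choice(v) for n, v in space.items()}
        s = score(candidate)
        if best is None or s > best[1]:
            best = (candidate, s)
    return best

# Hypothetical search space and toy score (higher is better).
space = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}

def toy_score(params):
    return -abs(params["learning_rate"] - 0.01) - abs(params["batch_size"] - 64) / 1000

best_grid = grid_search(space, toy_score)                     # 9 evaluations
best_random = random_search(space, toy_score, n_trials=4)     # 4 evaluations
```

Grid search evaluates all nine combinations here, while random search spends a fixed budget of four evaluations and can therefore miss the best combination. This is the usual trade-off between coverage and training cost.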
4 Abbreviated terms
AI artificial intelligence
AdaBoost adaptive boosting
Bagging bootstrap aggregating
BMI body mass index
CPU central processing unit
CNN convolutional neural network
D2H Device-to-Host
FP floating-point
GPU graphics processing unit
KNN k-nearest neighbours
Lasso least absolute shrinkage and selection operator
LDA linear discriminant analysis
LSTM long short-term memory
ML machine learning
Nested CV nested cross validation
NLP natural language processing
PCA principal component analysis
RNN recurrent neural network
SMOTE synthetic minority over-sampling technique
SSD solid-state drive
SVM support vector machine
t-SNE t-distributed stochastic neighbour embedding
XGBoost extreme gradient boosting
5 Overview of ML model training
5.1 Model training in ML pipeline
The ML pipeline is defined in ISO/IEC 23053:2022, Clause 8. Within this pipeline, training data are prepared
and an appropriate ML algorithm is selected prior to model training. During model training, an ML model is
trained using the training data to establish its parameters. After training, model selection is conducted to
tune hyperparameters, followed by model evaluation and verification.
This document focuses exclusively on optimizing efficiency during the ML model training phase.
NOTE This document does not cover the ML optimization methods specified in ISO/IEC 23053:2022, 6.5.4.
5.2 Stakeholders of model training
AI stakeholder roles and their sub-roles, as defined in ISO/IEC 22989:2022, 5.19, are applicable to the context
of ML model training in this document.
In this context, an AI provider supplies tools, infrastructure or services, typically via an AI platform, to
enable organizations to train ML models. The AI platform facilitates various aspects of the ML workflow,
including data preparation, model training, evaluation and deployment. Key features provided by the AI
platform include:
— computing resources allowing access to hardware (CPUs, GPUs or other types of processors) necessary
for model training;
— storage solutions for secure and efficient management of datasets and models;
— training automation to streamline and manage aspects of the training process;
— collaboration tools supporting version control, project management and team coordination;
— framework support ensuring compatibility with various ML frameworks [9]–[11] (optional).
An AI producer is responsible for designing, training, and validating ML models, and may utilize the tools,
infrastructure or services offered by an AI provider.
6 ML model training efficiency
Performance efficiency is a key characteristic in the AI system quality model (see ISO/IEC 25059). It is
defined in terms of time behaviour, resource utilization and capacity (see ISO/IEC 25010). In the context of
ML model training, efficiency optimization refers to:
— time behaviour optimization, which aims to reduce the duration of model training;
— resource utilization optimization, which focuses on minimizing hardware usage or improving utilization
rate.
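The two optimization targets above can be made concrete with a small sketch that compares two training runs. The record fields and the definition of utilization rate (busy device time divided by allocated device time) are assumptions chosen for illustration; this document does not prescribe a formula.

```python
from dataclasses import dataclass

@dataclass
class TrainingRun:
    wall_clock_s: float    # total duration of the training task
    n_devices: int         # devices allocated for the whole run
    busy_device_s: float   # summed time the devices spent on useful work

def utilization_rate(run):
    """Fraction of allocated device time that was actually used (assumed definition)."""
    return run.busy_device_s / (run.wall_clock_s * run.n_devices)

def compare(baseline, candidate):
    """Time behaviour as a speedup factor, resource utilization as a difference."""
    return {
        "speedup": baseline.wall_clock_s / candidate.wall_clock_s,
        "utilization_delta": utilization_rate(candidate) - utilization_rate(baseline),
    }

# Hypothetical measurements for two training strategies on the same 8 devices.
baseline = TrainingRun(wall_clock_s=3600.0, n_devices=8, busy_device_s=14400.0)
candidate = TrainingRun(wall_clock_s=2400.0, n_devices=8, busy_device_s=15360.0)
result = compare(baseline, candidate)
```

With these illustrative numbers the candidate run is 1.5 times faster and raises the utilization rate from 0.5 to 0.8.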
7 Characteristics impacting ML model training efficiency
7.1 Training data
Several characteristics of training data influence ML training efficiency and optimization potential.
The quality of the training data impacts the efficiency of ML model training and the inference performance of
the trained model. ISO/IEC 5259-2 specifies data quality characteristics and methods for their quantitative
assessment. Low-quality training data can lead to longer training time, increased resource consumption
and degraded inference performance. If data quality requirements are not met, retraining can be necessary,
which can significantly increase overall training duration and resource usage.
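A pre-training quality gate is one way to avoid the retraining cost described above. The sketch below computes two simple indicators, completeness and duplicate rate, and compares them with thresholds. The indicator definitions and the threshold values are illustrative assumptions and are not the measures specified in ISO/IEC 5259-2.

```python
def quality_report(rows, required_fields):
    """Return simple completeness and duplicate-rate indicators for a dataset."""
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "duplicate_rate": 0.0}
    # A row is complete when every required field is present and not None.
    complete = sum(all(r.get(f) is not None for f in required_fields) for r in rows)
    # Rows are duplicates when they agree on all required fields.
    unique = len({tuple(r.get(f) for f in required_fields) for r in rows})
    return {
        "completeness": complete / total,
        "duplicate_rate": 1 - unique / total,
    }

def passes(report, min_completeness=0.95, max_duplicate_rate=0.05):
    """Hypothetical thresholds; real values depend on the project requirements."""
    return (report["completeness"] >= min_completeness
            and report["duplicate_rate"] <= max_duplicate_rate)

rows = [
    {"age": 34, "label": 1},
    {"age": None, "label": 0},   # incomplete record
    {"age": 34, "label": 1},     # duplicate of the first record
    {"age": 51, "label": 0},
]
report = quality_report(rows, required_fields=["age", "label"])
ready_for_training = passes(report)
```

Running such a check before a training task is submitted is far cheaper than discovering the same defects after a long training run.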
The size of the training dataset has a complex effect on both training efficiency and model inference
performance. Larger datasets demand more computation, memory and network resources, and typically
require longer training durations. Conversely, datasets that are too small to be representative, complete,
balanced, effective or fair can impair model performance, even if they require fewer resources and shorter
training time.
The method of data processing also impacts training efficiency. Sequential data processing can be
prohibitively time-consuming or infeasible for large datasets.
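As an alternative to sequential processing, the dataset can be split into shards that are processed concurrently. The following is a minimal sketch; the `preprocess` function is a placeholder, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def make_shards(n_samples, n_workers):
    """Split sample indices into n_workers contiguous shards of near-equal size."""
    base, extra = divmod(n_samples, n_workers)
    shards, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards

def preprocess(shard):
    """Placeholder for real per-sample work such as decoding or tokenizing."""
    return [i * 2 for i in shard]

shards = make_shards(10, 4)
# Threads suit I/O-bound work; CPU-bound work in CPython usually needs processes.
with ThreadPoolExecutor(max_workers=4) as pool:
    processed = [x for part in pool.map(preprocess, shards) for x in part]
```

`pool.map` returns results in shard order, so the concatenated output matches what sequential processing would produce.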
7.2 Model parameter management
A large number of model parameters can require distributed storage and frequent updates across multiple
ML computing devices.
7.3 Communication challenges
Centralized, synchronous communication between multiple ML computing devices where training data are
split can be an efficiency bottleneck. Latency between ML computing devices further contributes to reduced
training efficiency.
Concurrent model training tasks running on shared computing infrastructure can lead to network
congestion. When multiple tasks transmit data simultaneously, their combined traffic can exceed the
available network bandwidth. High-volume data transmission from certain training tasks can obstruct
network traffic for others, slowing down communication and negatively impacting overall training efficiency.
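The effect of congestion on concurrent tasks can be estimated with a simple model. The sketch below assumes that a shared link divides its capacity in proportion to each task's demand once total demand exceeds the link bandwidth. Real networks behave differently, so the model and all numbers are illustrative only.

```python
def effective_rates(demands_gbps, link_gbps):
    """Scale every task's rate proportionally when total demand exceeds the link."""
    total = sum(demands_gbps.values())
    if total <= link_gbps:
        return dict(demands_gbps)          # no congestion
    scale = link_gbps / total
    return {task: d * scale for task, d in demands_gbps.items()}

def slowdown(demands_gbps, link_gbps):
    """Factor by which every transfer takes longer under proportional sharing."""
    return max(1.0, sum(demands_gbps.values()) / link_gbps)

# Hypothetical demands of three training tasks on one 100 Gbit/s link.
demands = {"task_a": 60.0, "task_b": 30.0, "task_c": 30.0}
rates = effective_rates(demands, link_gbps=100.0)
factor = slowdown(demands, link_gbps=100.0)
```

Here the combined demand of 120 Gbit/s exceeds the link capacity, so each transfer takes about 1.2 times as long as it would on an uncongested link.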
7.4 Failure detection in model training
ML model training typically runs on computing infrastructure equipped with supporting libraries, software
and tools. When training large models with extensive datasets, the process can be prolonged. Hardware
failures, system faults, power outages or other disruptions can interrupt the training process.
When failures occur, troubleshooting is necessary. If the issue is not clearly linked to the AI producer’s code,
the AI provider’s operations team should intervene to diagnose and isolate the fault. Ultimately, it can be
necessary for the AI producer to resubmit the training task, which can be time-consuming.
In large-scale distributed synchronous deep learning tasks, hardware faults can leave many GPU cards idle,
resulting in substantial resource waste.
Effective failure detection is therefore critical to maintaining ML model training efficiency.
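One common way to detect a stalled worker is a heartbeat with a timeout. The sketch below is an illustrative assumption and not a mechanism specified by this document; the class name, the timeout and the cost estimate are hypothetical.

```python
class HeartbeatMonitor:
    """Flags workers whose last heartbeat is older than a timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def beat(self, worker, now):
        """Record that a worker reported in at time `now` (seconds)."""
        self.last_seen[worker] = now

    def suspected(self, now):
        """Return workers that have been silent for longer than the timeout."""
        return sorted(w for w, t in self.last_seen.items()
                      if now - t > self.timeout_s)

def idle_device_seconds(n_devices, detection_delay_s):
    """In synchronous training, the healthy devices wait until the fault is noticed."""
    return (n_devices - 1) * detection_delay_s

monitor = HeartbeatMonitor(timeout_s=30)
monitor.beat("worker-0", now=0)
monitor.beat("worker-1", now=0)
monitor.beat("worker-0", now=40)          # worker-1 has stopped reporting
stalled = monitor.suspected(now=50)
wasted = idle_device_seconds(n_devices=64, detection_delay_s=600)
```

Under this simple estimate, a single fault that goes unnoticed for ten minutes in a 64-device synchronous job wastes 37 800 device-seconds, which is why shortening detection delay matters.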
7.5 Failure recovery in model training
To prevent restarting the entire training process after a failure, intermediate model states can be saved
during training. For large models trained on extensive datasets, a checkpoint mechanism should be
implemented to periodically save model states—including weights, optimizer configurations and other
relevant parameters. Checkpoints enable AI producers to resume training from a saved state rather than
starting from scratch. However, generating checkpoints introduces overhead, and failure recovery using
this method can affect overall training efficiency.
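The save-and-resume cycle can be sketched as follows. The file format (JSON), the state layout and the placeholder weight update are assumptions made for illustration; production frameworks provide their own checkpoint facilities.

```python
import json
import os
import tempfile

def save_checkpoint(path, step, weights, optimizer_state):
    """Write the state to a temporary file, then rename it over the target."""
    state = {"step": step, "weights": weights, "optimizer": optimizer_state}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # The rename avoids leaving a half-written checkpoint after a crash.
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

def train(path, total_steps, checkpoint_every):
    """Resume from the last checkpoint if present, otherwise start from step 0."""
    state = load_checkpoint(path) or {
        "step": 0, "weights": [0.0], "optimizer": {"lr": 0.1},
    }
    step, weights, opt = state["step"], state["weights"], state["optimizer"]
    while step < total_steps:
        weights = [w + opt["lr"] for w in weights]   # placeholder for a real update
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, weights, opt)
    return step, weights
```

The `checkpoint_every` parameter expresses the trade-off described above: a smaller value reduces the work lost after a failure but increases the time spent writing checkpoints.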
7.6 Quality of the ML model
The quality of an ML model, including correctness, robustness, bias mitigation, information security and
protection from exploitation, can significantly influence training efficiency.
Correctness directly affects training efficiency, as it reflects how well the model performs in terms of metrics
like accuracy and error rates. High correctness indicates effective learning and generalization, reducing the
need for retraining and improving resource efficiency. However, striving for high correctness can lead to
overfitting, where the model becomes overly tailored to the training data and performs poorly on unseen
data. Mitigating overfitting requires techniques such as cross-validation, regularization and additional
tuning, which can extend the training process.
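Early stopping is one technique that limits overfitting while also shortening training. The following is a minimal sketch that operates on a list of per-epoch validation losses; the `patience` and `min_delta` parameters and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
```

In this example training stops at epoch 5 and the model state from epoch 2 would be kept, saving the remaining epochs.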
Robustness reflects the model’s ability to handle fluctuations and interruptions in training data and to
perform reliably with imperfect or faulty data. Training for robustness involves exposing the model to noisy
or incomplete data and evaluating its ability under such conditions. A robust model can adapt to fluctuations
without significant performance decrease, reduce the need for repeated validation and conserve resources.
Achieving robustness can require additional training phases, such as incorporating negative examples or
optimizing hyperparameters to handle perturbations. This additional complexity can lengthen the training
process and require more computing resources. Techniques like data augmentation and regularization can
improve resilience but also increase training time and computational costs.
Bias mitigation requires careful data curation and preprocessing to ensure representativeness and fairness.
Techniques such as re-weighting or data augmentation can be necessary to address under-represented
groups. A model free from unwanted bias generalizes better across diverse data and reduces retraining
due to ethical or legal concerns. However, ensuring fairness can increase training pipeline complexity and
require additional fine-tuning and post-training bias audits, further extending the training cycle.
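Re-weighting can be sketched with inverse-frequency weights, where samples from under-represented groups receive a larger weight in the training loss. The weighting formula below is one common choice and is an assumption, as are the group labels.

```python
from collections import Counter

def inverse_frequency_weights(group_labels):
    """Weight each group by n / (k * count), so every group contributes equally."""
    counts = Counter(group_labels)
    n, k = len(group_labels), len(counts)
    return {group: n / (k * count) for group, count in counts.items()}

# Hypothetical dataset: group "b" is under-represented.
labels = ["a"] * 8 + ["b"] * 2
weights = inverse_frequency_weights(labels)
sample_weights = [weights[g] for g in labels]
```

With this formula the weights sum to the number of samples, so the overall scale of the loss is unchanged while each group carries the same total weight.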
Information security impacts training efficiency by introducing requirements for data privacy and model
confidentiality. Measures such as encryption and access control can complicate data preparation and
training workflows. Maintaining infrastructure availability, data confidentiality and model integrity helps
avoid costly retraining. However, secure handling of sensitive data and model parameters can require
additional steps, such as secure data storage and protected access environments, which introduces overhead
that affects overall training efficiency.
Safeguarding against exploitation also influences training efficiency. Secure models help prevent system
failures and reduce retraining caused by adversarial vulnerabilities. Techniques such as output obfuscation
and review of model predictions help prevent unintended information exposure and adversarial exploitation.
Proactively building adversarial robustness saves time and resources otherwise spent on reactive defences.
However, simulating attack scenarios and validating robustness can significantly increase training time and
resource demands.
7.7 Management of computing resources
Poor management of computing resources can lead to workload imbalance, where some ML computing
devices are overloaded while others remain underutilized. This results in inefficient resource usage and
may cause instability or failure in overloaded devices.
When multiple model training tasks run concurrently on the same underlying infrastructure, resource
congestion can occur. These tasks simultaneously compete for limited computing capacity, and critical
training tasks can be delayed or interrupted due to insufficient resource allocation, ultimately slowing down
the overall training process.
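Workload imbalance can be reduced by assigning each task to the currently least-loaded device. The sketch below uses a greedy longest-task-first heuristic, which is an illustrative choice and not a method prescribed by this document; the task names and cost estimates are hypothetical.

```python
import heapq

def assign_tasks(task_costs, n_devices):
    """Greedy assignment: largest tasks first, each to the least-loaded device."""
    heap = [(0.0, d) for d in range(n_devices)]      # (current load, device id)
    heapq.heapify(heap)
    assignment = {d: [] for d in range(n_devices)}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, device = heapq.heappop(heap)
        assignment[device].append(task)
        heapq.heappush(heap, (load + cost, device))
    loads = {d: sum(task_costs[t] for t in tasks) for d, tasks in assignment.items()}
    return assignment, loads

# Hypothetical cost estimates (for example, expected device-hours per task).
costs = {"t1": 8, "t2": 7, "t3": 6, "t4": 5}
assignment, loads = assign_tasks(costs, n_devices=2)
```

For these inputs both devices end up with a load of 13. The heuristic does not guarantee an optimal balance in general, but it avoids the case where one device is overloaded while another remains idle.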
8 Model training efficiency optimization methods
8.1 Overview
Model training efficiency optimization aims to reduce training time while improving computing resource
utilization. It is a multifaceted process that involves various approaches targeting different aspects of
the training process. From a stakeholder perspective, different roles emphasize distinct optimization
approaches.
— AI producers primarily focus on optimizing training through data preparation and model design,
including training data quality optimization, feature engineering, feature scaling, training algorithm
selection, training process optimization and ensemble learning.
— AI providers, especially AI platform providers, focus on system-level optimizations, including parallel
computing, communication, checkpoint, failure detection, resource management, continuous monitoring
and anomaly detection, and environment and infrastructure assessment.
An example of the application of the optimization techniques described in Clause 8 is illustrated in Annex A.
8.2 Training data preparation and model optimization
8.2.1 General consideration
The training data preparation and model optimization process consists of a comprehensive sequence of
methods aimed at improving the quality of training data, enhancing the informativeness of features and
maximizing model performance. These methods follow the machine learning (ML) pipeline and should be
selected and combined according to the unique characteristics of the dataset and the modelling goals.
To achieve consistent and transparent optimization, a structured machine learning pipeline approach should
be implemented. In this approach, each optimization task is treated as a distinct pipeline step that addresses
a specific aspect of the modelling process. Pipeline steps are applied sequentially and synergistically, leading
to improved generalization, performance metrics and operational robustness. Importantly, some pipeline
steps are model-independent (data-centric), while others are model-dependent (algorithm-centric).
The core sequence of steps in the pipeline is presented step by step in 8.2.2 to 8.2.8.
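The idea of treating each optimization task as a distinct pipeline step can be sketched as follows. The `Step` structure, the two example steps and their implementations are hypothetical and only illustrate sequential application; they do not reproduce the methods of 8.2.2 to 8.2.8.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    name: str
    model_dependent: bool           # False for data-centric steps
    run: Callable[[Any], Any]

def run_pipeline(steps, data):
    """Apply the steps in order and record which ones were executed."""
    log = []
    for step in steps:
        data = step.run(data)
        log.append(step.name)
    return data, log

def drop_missing(values):
    """Placeholder data quality step: remove missing entries."""
    return [v for v in values if v is not None]

def min_max_scale(values):
    """Placeholder feature scaling step: map values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

steps = [
    Step("training data quality optimization", False, drop_missing),
    Step("feature scaling", False, min_max_scale),
]
scaled, executed = run_pipeline(steps, [2.0, None, 4.0, 6.0])
```

Keeping each step behind the same interface makes it straightforward to reorder, replace or skip steps according to the characteristics of the dataset.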
Table 1 provides a consolidated summary of the machine learning pipeline approach. This framework
structures the model development process and promotes optimal use of available tools and strategies for
ML performance.
Table 1 — Machine learning pipeline approach to model training optimization
Step  Description                         Subclause  Model dependency (in general)  Role in pipeline
0     Training data quality optimization  8.2.2      No                             Prerequisites
1     Feature engineering                 8.2.3      No                             Data preprocessing
2     Feature selection                   8.2.4      No                             Data preprocessing
3     Feature scaling                     8.2.5      No                             Data preprocessing
4 Training algorit
...



