Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Guidance for quality evaluation of artificial intelligence (AI) systems

This document provides guidance for evaluation of artificial intelligence (AI) systems using an AI system quality model. The document is applicable to all types of organizations engaged in the development and use of AI.

Ingénierie des systèmes et des logiciels — Exigences et évaluation de la qualité des systèmes et des logiciels (SQuaRE) — Lignes directrices pour l'évaluation de la qualité des systèmes d'intelligence artificielle (IA)

General Information

Status
Published
Publication Date
23-Jan-2024
Current Stage
9092 - International Standard to be revised
Start Date
28-Jul-2025
Completion Date
30-Oct-2025

Overview

ISO/IEC TS 25058:2024 - “Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Guidance for quality evaluation of artificial intelligence (AI) systems” - provides structured guidance for evaluating AI system quality using an AI system quality model. Published as a Technical Specification by ISO/IEC JTC 1/SC 42, it is applicable to all types of organizations involved in the development, deployment, or use of AI. The document aligns AI quality evaluation with the SQuaRE approach to systems and software quality.

Key Topics

The specification defines an AI-focused quality evaluation methodology and addresses the quality characteristics familiar from the SQuaRE models, adapted for AI systems. Major topics include:

  • Quality evaluation methodology for AI systems and use of an AI system quality model.
  • Functional suitability aspects such as completeness, correctness, appropriateness, and adaptability.
  • Performance efficiency covering time behaviour, resource utilization, and capacity.
  • Compatibility including co-existence and interoperability with other systems.
  • Usability dimensions like recognizability, learnability, operability, accessibility, user controllability and transparency.
  • Reliability attributes: maturity, availability, fault tolerance, recoverability and robustness.
  • Security elements: confidentiality, integrity, non-repudiation, accountability, authenticity and intervenability.
  • Maintainability (modularity, reusability, analysability, modifiability, testability) and portability (adaptability, installability, replaceability).
  • User-centered measures: effectiveness, efficiency, satisfaction (trust, usefulness, pleasure, comfort, transparency).
  • Freedom from risk and risk mitigation across economic, health & safety, environmental, societal and ethical dimensions.
  • Context coverage and the extent to which an AI system meets required operational contexts.

Applications

ISO/IEC TS 25058:2024 is useful for:

  • Defining quality requirements and acceptance criteria for AI products and services.
  • Designing evaluation plans, test cases and benchmarks for AI system performance and safety.
  • Supporting procurement specifications, vendor assessment, and contract clauses for AI systems.
  • Performing internal QA audits, independent conformity assessment, and post-deployment monitoring.
  • Guiding risk assessment and mitigation strategies that include ethical, safety and environmental considerations.

Who should use it

  • AI developers, system architects and QA teams
  • Product managers and procurement officers
  • Compliance, audit and risk-management professionals
  • Regulators, certification bodies and researchers focused on AI quality and safety

Related standards

  • SQuaRE family (ISO/IEC 25000 series) and other ISO/IEC JTC 1 and SC 42 AI standards provide complementary guidance for quality models, requirements and conformity assessment.

Keywords: ISO/IEC TS 25058:2024, AI quality evaluation, SQuaRE, AI system quality model, AI system assessment, AI risk mitigation, AI transparency.

Technical specification

ISO/IEC TS 25058:2024 - Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Guidance for quality evaluation of artificial intelligence (AI) systems. Released: 24.01.2024

English language
20 pages

Frequently Asked Questions

ISO/IEC TS 25058:2024 is a technical specification published jointly by ISO and IEC. Its full title is "Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Guidance for quality evaluation of artificial intelligence (AI) systems". It provides guidance for evaluation of artificial intelligence (AI) systems using an AI system quality model and is applicable to all types of organizations engaged in the development and use of AI.

ISO/IEC TS 25058:2024 is classified under the following ICS (International Classification for Standards) categories: 35.080 - Software. The ICS classification helps identify the subject area and facilitates finding related standards.

You can purchase ISO/IEC TS 25058:2024 directly from iTeh Standards. The document is available in PDF format and is delivered instantly after payment. Add the standard to your cart and complete the secure checkout process. iTeh Standards is an authorized distributor of ISO standards.

Standards Content (Sample)


Technical Specification
ISO/IEC TS 25058
First edition
2024-01

Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Guidance for quality evaluation of artificial intelligence (AI) systems

Ingénierie des systèmes et des logiciels — Exigences et évaluation de la qualité des systèmes et des logiciels (SQuaRE) — Lignes directrices pour l'évaluation de la qualité des systèmes d'intelligence artificielle (IA)

© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Overview
5 Quality evaluation methodology
6 Functional suitability
  6.1 Functional completeness
  6.2 Functional correctness
  6.3 Functional appropriateness
  6.4 Functional adaptability
7 Performance efficiency
  7.1 Time behaviour
  7.2 Resource utilization
  7.3 Capacity
8 Compatibility
  8.1 Co-existence
  8.2 Interoperability
9 Usability
  9.1 Appropriateness recognizability
  9.2 Learnability
  9.3 Operability
  9.4 User error protection
  9.5 User interface aesthetics
  9.6 Accessibility
  9.7 User controllability
  9.8 Transparency
10 Reliability
  10.1 Maturity
  10.2 Availability
  10.3 Fault tolerance
  10.4 Recoverability
  10.5 Robustness
11 Security
  11.1 Confidentiality
  11.2 Integrity
  11.3 Non-repudiation
  11.4 Accountability
  11.5 Authenticity
  11.6 Intervenability
12 Maintainability
  12.1 Modularity
  12.2 Reusability
  12.3 Analysability
  12.4 Modifiability
  12.5 Testability
13 Portability
  13.1 Adaptability
  13.2 Installability
  13.3 Replaceability
14 Effectiveness
15 Efficiency
16 Satisfaction
  16.1 General
  16.2 Usefulness
  16.3 Trust
  16.4 Pleasure
  16.5 Comfort
  16.6 Transparency
17 Freedom from risk
  17.1 General
  17.2 Economic risk mitigation
  17.3 Health and safety risk mitigation
  17.4 Environmental risk mitigation
  17.5 Societal and ethical risk mitigation
18 Context coverage
  18.1 General
  18.2 Context completeness
  18.3 Flexibility
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee
SC 42, Artificial intelligence.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
An artificial intelligence (AI) system can be challenging to evaluate. At the same time, the impact of a poor-quality AI system can be considerable, since such systems are often developed to automate critical actions and decisions.
The purpose of this document is to guide AI developers performing a quality evaluation of their AI systems.
This document does not state exact measurements and thresholds, as these vary depending on the nature of
each system. Instead, it specifies comprehensive guidance that covers the relevant facets of an AI system’s
quality for successful quality evaluation.
Testing is within scope insofar as each characteristic and sub-characteristic is verified by testing strategies, but details of testing methods and measurements are covered elsewhere, for example in the ISO/IEC/IEEE 29119 series.
Technical Specification ISO/IEC TS 25058:2024(en)
Systems and software engineering — Systems and software
Quality Requirements and Evaluation (SQuaRE) — Guidance
for quality evaluation of artificial intelligence (AI) systems
1 Scope
This document provides guidance for evaluation of artificial intelligence (AI) systems using an AI system
quality model.
The document is applicable to all types of organizations engaged in the development and use of AI.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC TS 4213, Information technology — Artificial intelligence — Assessment of machine learning
classification performance
ISO/IEC 22989, Information technology — Artificial intelligence — Artificial intelligence concepts and
terminology
ISO/IEC 23053:2022, Framework for Artificial Intelligence (AI) Systems Using Machine Learning (ML)
ISO/IEC 25059:2023, Software engineering — Systems and software Quality Requirements and Evaluation
(SQuaRE) — Quality model for AI systems
ISO/IEC/IEEE 29119-1, Software and systems engineering — Software testing — Part 1: General concepts
ISO/IEC/IEEE 29148, Systems and software engineering — Life cycle processes — Requirements engineering
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC TS 4213, ISO/IEC 22989,
ISO/IEC 23053, ISO/IEC 25059, ISO/IEC/IEEE 29119-1 and ISO/IEC/IEEE 29148 apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
4 Overview
To ensure that relevant facets of an AI system’s quality are covered by the quality evaluation guidance, this
document references Systems and software Quality Requirements and Evaluation (SQuaRE) product quality
and quality in use models’ characteristics for an AI system (see ISO/IEC 25059). The product quality and
quality in use models’ characteristics, as applicable to a general system, apply to an AI system. Several sub-
characteristics have been added, and some have different meanings or contexts.

Figures 1 and 2 illustrate an AI system's product quality and quality in use models’ characteristics and sub-
characteristics. Please note that some sub-characteristics have been added or modified from the SQuaRE
quality models for general systems, as an AI system differs from a general system and software.

[Figure 1 — AI system product quality model. SOURCE: ISO/IEC 25059:2023, Figure 1. Key: a = new sub-characteristic; m = modified sub-characteristic.]

[Figure 2 — AI system quality in use model. SOURCE: ISO/IEC 25059:2023, Figure 2. Key: a = new sub-characteristic.]
5 Quality evaluation methodology
Quality evaluation guidance is defined by relevant quality model sub-characteristics.
All the sub-characteristics from the SQuaRE product quality and quality in use models are covered in this
document.
The guidance in this document should complement the SQuaRE quality evaluation process described in
ISO/IEC 25040 for AI systems.
6 Functional suitability
6.1 Functional completeness
Quality of the functional completeness sub-characteristic should be measured against quality measures
according to ISO/IEC 25023:2016, 8.2.1.
6.2 Functional correctness
Quality of the functional correctness sub-characteristic should be measured against quality measures
according to ISO/IEC 25023:2016, 8.2.2.

Functional correctness should be evaluated with the proper key performance indicators (KPIs) and
measurements.
Measurements and key performance indicators should be established to measure the capability of an AI
system to do a specific task and to evaluate the amount of unpredictability of the system.
The right evaluation measurements should be used to measure functional correctness based on an AI
system’s problem type and the stakeholders’ objectives. For a list of typical evaluation measurements, refer
to ISO/IEC 23053:2022, 6.5.5.
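
To make this concrete, the following is a minimal sketch, in Python, of how task capability and unpredictability can be quantified for a classification system; the ensemble of seed-varied models, the labelled test set and the chosen measurements are illustrative assumptions, not requirements of this document.

    # Sketch: task capability and output unpredictability of a classifier.
    # `models` are hypothetical copies of the same model trained with
    # different random seeds; numeric class labels are assumed.
    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score

    def capability_and_unpredictability(models, X_test, y_test):
        preds = np.stack([m.predict(X_test) for m in models])
        # Capability: average task performance across seeds.
        acc = np.mean([accuracy_score(y_test, p) for p in preds])
        f1 = np.mean([f1_score(y_test, p, average="macro") for p in preds])
        # Unpredictability: share of samples on which the seeds disagree.
        disagreement = np.mean(preds.min(axis=0) != preds.max(axis=0))
        return {"accuracy": acc, "macro_f1": f1, "seed_disagreement": disagreement}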
Functional correctness should also be evaluated using functional testing methods, such as:
— metamorphic testing: technique that establishes relationships between inputs and outputs of the system (a minimal sketch follows this list);
— expert panels: technique used when an AI system is built to replace the judgement of experts, which
consists of establishing a panel to review the test results;
— benchmarking an AI system: technique used when an AI system is replacing existing approaches or when
a similar AI system can be used as a benchmark;
— testing an AI system’s behaviours against various scenarios or test cases defined by stakeholders;
— testing in a simulated environment: technique used when an AI system is characterized by physical
action on the environment;
— field trials: technique used when there is a potential difference or evolution between testing environments
and actual operation conditions;
— risk management: testing AI system behaviour against identified risk scenarios.
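
The following is a minimal sketch of the metamorphic testing technique named above, assuming a numeric-input classifier (`model`) and treating small additive noise as a label-preserving transformation; both the model interface and the chosen relation are assumptions for illustration.

    # Sketch: metamorphic relation "predictions are invariant under small,
    # label-preserving input perturbations". `model` is hypothetical.
    import numpy as np

    def metamorphic_invariance_rate(model, X, noise_scale=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        follow_up = X + rng.normal(0.0, noise_scale, size=X.shape)
        baseline = model.predict(X)
        transformed = model.predict(follow_up)
        # The relation holds for a sample when both predictions agree;
        # a rate of 1.0 means it held everywhere.
        return float(np.mean(baseline == transformed))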
Functional correctness evaluation techniques should be performed on different and representative datasets.
The best machine learning (ML) model should be selected using the appropriate evaluation measurements
against a validation dataset. The simple ML model validation technique uses only one validation dataset.
However, a cross-validation technique is suggested when possible.
In a separate back-testing phase, the selected ML model should be tested once again with new data (the
testing dataset) for consistency.
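
A minimal sketch of this selection-then-back-testing sequence, assuming scikit-learn and two illustrative candidate models; the metric, split sizes and candidates are assumptions, not prescriptions of this document.

    # Sketch: select an ML model by cross-validation, then back-test the
    # winner once on a held-out testing dataset built from different data.
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    def select_and_backtest(X, y):
        # Keep the testing dataset disjoint from training/validation data.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=0)
        candidates = [LogisticRegression(max_iter=1000),
                      RandomForestClassifier(random_state=0)]
        # Cross-validation instead of a single validation dataset.
        best = max(candidates,
                   key=lambda m: cross_val_score(m, X_train, y_train, cv=5).mean())
        best.fit(X_train, y_train)
        # Separate back-testing phase on data never used for selection.
        return best, accuracy_score(y_test, best.predict(X_test))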
Training, validation and testing datasets should all be built with different data.
Validation and testing datasets should all be built with representative subsets of the actual operation
conditions.
The ML model should be tested against datasets with known cohorts to identify positive or negative bias
creep.
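
As an illustration of testing against known cohorts, the sketch below compares one quality measure across cohorts; the `cohorts` mapping and the choice of accuracy as the measure are hypothetical.

    # Sketch: per-cohort comparison to surface positive or negative bias
    # creep. `cohorts` maps cohort names to integer index arrays into the
    # test set; all names are hypothetical.
    import numpy as np
    from sklearn.metrics import accuracy_score

    def per_cohort_accuracy(model, X_test, y_test, cohorts):
        y_pred = model.predict(X_test)
        scores = {name: accuracy_score(y_test[idx], y_pred[idx])
                  for name, idx in cohorts.items()}
        spread = max(scores.values()) - min(scores.values())
        # A large spread between cohorts warrants a bias investigation.
        return scores, spread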
The final settings to tune the ML model (e.g. the cut-off threshold in classification) should be defined together
with the business users.
Functional correctness should also be evaluated on production data for monitoring purposes.
Product deployment should take place after the back-testing phase.
6.3 Functional appropriateness
The quality of the functional appropriateness sub-characteristic should be measured against quality
measures according to ISO/IEC 25023:2016, 8.2.3.

6.4 Functional adaptability
An AI system should have a mechanism to adapt dynamically to changes in the production data, by using one of the following:
— deploying a continuous or reinforcement learning modelling approach;
— implementing an automated retraining workflow.
NOTE Functional adaptability does not necessarily cover changes to the system objectives, as such changes potentially transform the functional state of the system.
The organization should develop an adaptation system to generate a feedback loop. This managed system comprises four essential functions: monitor, analyse, plan and execute (a minimal sketch follows the list below).
— The monitor function tracks the managed system and the environment in which the system operates and
updates the knowledge.
— The analyse function uses the up-to-date knowledge to evaluate the need for adaptation, exploiting
rigorous analysis techniques or simulations of runtime ML models.
— The plan function selects the best option based on the adaptation goals and generates a plan to adapt the
system from its current configuration to the new configuration.
— The execute function implements the adaptation actions of the plan with relevant intervention.
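
The sketch announced above shows one way the four functions can be wired together, using data drift as the adaptation trigger; the drift statistic, the threshold and the retraining call are assumptions for illustration only.

    # Sketch: a monitor-analyse-plan-execute feedback loop with data drift
    # as the trigger. `train_fn` is a hypothetical retraining callable and
    # batches are assumed to be NumPy arrays.
    import numpy as np

    class AdaptationLoop:
        def __init__(self, model, train_fn, drift_threshold=0.1):
            self.model, self.train_fn = model, train_fn
            self.threshold = drift_threshold
            self.knowledge = {"reference_mean": None}

        def monitor(self, batch):
            # Monitor: track incoming data and update the knowledge.
            if self.knowledge["reference_mean"] is None:
                self.knowledge["reference_mean"] = batch.mean(axis=0)
            self.knowledge["current_mean"] = batch.mean(axis=0)

        def analyse(self):
            # Analyse: evaluate the need for adaptation from the knowledge.
            drift = np.abs(self.knowledge["current_mean"]
                           - self.knowledge["reference_mean"]).max()
            return drift > self.threshold

        def plan_and_execute(self, X_new, y_new):
            # Plan: here the only option is retraining on recent data.
            # Execute: apply it and reset the reference statistics.
            self.model = self.train_fn(X_new, y_new)
            self.knowledge["reference_mean"] = X_new.mean(axis=0)

        def step(self, X_new, y_new):
            self.monitor(X_new)
            if self.analyse():
                self.plan_and_execute(X_new, y_new)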
Functional adaptability should be evaluated using measurements, key performance indicators and functional
testing methods, as documented in 6.2, to measure the adaptability of an AI system to a new dataset.
The organization should take into consideration resource trade-offs when selecting the best ML model for
deployment, as the most accurate ML model can be prohibitively expensive to computationally evaluate.
Refer to 7.2 for more details.
7 Performance efficiency
7.1 Time behaviour
Quality of the time behaviour sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.3.1.
The organization should calculate time behaviour during the training, evaluation and inference workflows
under normal conditions as part of normal workflows in production, using the production environment,
infrastructures and computing resources, as time behaviour depends on resource utilization. Refer to 7.2 for
guidance on resource utilization.
The organization should consider an AI system adaptability mechanism while measuring the process
duration. For example, a system that consists of a sequence of retraining, evaluation and inference should
measure the duration of the entire sequence of workflows.
The organization should test and assess potential conflicts between computational resources. For example,
if the training and inference workflows use the same computational resources or if multiple inferences
happen simultaneously, this can negatively affect time behaviour of an AI system.
The organization should test and assess timing between data collection, data transformation and other
data-dependent AI system workflows. For example, AI system inference cannot be processed if the required
input data are not collected and transformed beforehand.
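
A minimal sketch of such a measurement, timing each workflow and the entire sequence; the three workflow callables are hypothetical placeholders for the production retraining, evaluation and inference steps.

    # Sketch: time behaviour measured over the whole retraining, evaluation
    # and inference sequence, not just over individual workflows.
    import time

    def timed(label, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        print(f"{label}: {time.perf_counter() - start:.3f} s")
        return result

    def measure_sequence(retrain, evaluate, infer, data):
        start = time.perf_counter()
        model = timed("retraining", retrain, data)
        timed("evaluation", evaluate, model, data)
        timed("inference", infer, model, data)
        print(f"entire sequence: {time.perf_counter() - start:.3f} s")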
7.2 Resource utilization
Quality of the resource utilization sub-characteristic should be measured against quality measures
according to ISO/IEC 25023:2016, 8.3.2.

The organization should allocate the appropriate resources during the training process of an ML-based AI
system. Factors such as computational resource types, time requirements, data quantity, ML model type and
the quantity of hyperparameters can impact the resources required to complete this process.
An AI system can also get its input from other AI systems, which should be considered as this can create
additional dependencies on utilized resources.
The organization should consider available resources through a system adaptability strategy (e.g.
incremental learning, active learning, online learning or retraining strategy). For example, an adaptability
strategy consisting of retraining an AI system hourly requires few resources during inference but substantial resources during each retraining. As part of this strategy, consider that inference and retraining can happen simultaneously.
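
The following standard-library sketch contrasts the wall time and peak Python-level memory of retraining versus inference; the profiled callables are hypothetical, and tracemalloc only sees allocations made by the Python interpreter, not, for example, GPU memory.

    # Sketch: comparing resource utilization of retraining and inference.
    import time
    import tracemalloc

    def profile(label, fn, *args):
        tracemalloc.start()
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        print(f"{label}: {elapsed:.2f} s, peak {peak / 1e6:.1f} MB")
        return result

    # Usage: profile("retraining", retrain, X, y)
    #        profile("inference", model.predict, X)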
7.3 Capacity
Quality of the capacity sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.3.3.
8 Compatibility
8.1 Co-existence
Quality of the co-existence sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.4.1.
8.2 Interoperability
Quality of the interoperability sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.4.2.
9 Usability
9.1 Appropriateness recognizability
Quality of the appropriateness recognizability sub-characteristic should be measured against quality
measures according to ISO/IEC 25023:2016, 8.5.1.
9.2 Learnability
Quality of the learnability sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.5.2.
Additionally, measurements commonly used to evaluate learnability of an AI system include, but are not limited to, the following:
— Task performance, for example, learning how to input the variables needed in the user interface for an AI
system to return a ML model prediction. The measurements evaluate the ability to perform a task in the
expected time, percentage of users who completed the task, characteristics of the users who completed
the task and others.
— Usage, for example, learning the commands available to operate an AI system. The measurements
evaluate the success rate of the command use, the number of commands used in a time interval and
others.
— Cognitive process, for example, the thinking time required for a user to understand an ML model
inference.
— User feedback: The opinion of the user after learning or not learning to use an AI system. For example,
does the system contain any counterintuitive parameters or functionalities?
Learnability measurements are sensitive to the capabilities and experience of the tester or user. For
example, users with a technical background or general AI understanding can learn how to perform a task
and use commands in shorter times, while users with a non-technical background need a good user interface
to reach the learnability goals. A diverse group of users is important to accurately evaluate learnability.
An evaluation of learnability measures can be made and can include the following:
— User testing: users can rate the degree to which they can learn how to use an AI system. For example,
are the user guidelines clear? Do I understand the variables required to run a specific AI system? Are the
default fields comprehensive and based on real scenarios? Do the outputs of an AI system correspond to
the outputs described in the user guidelines?
— Other learnability evaluation tools: the learning curve, which is the relationship between time and
task repetition, provides information on a) first-use learnability; b) how quickly users improve with
repetition (steepness of the curve); c) how the productivity of the user can improve if they learn how to
use the system appropriately (efficiency of the ultimate plateau).
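
As an illustration of the learning-curve tool described above, the sketch below summarizes a series of per-repetition task times; the power-law fit and the plateau estimate over the last repetitions are common modelling assumptions, not requirements of this document.

    # Sketch: summarizing a learnability learning curve (task time versus
    # task repetition) into the three quantities listed above.
    import numpy as np

    def learning_curve_summary(times_per_repetition):
        t = np.asarray(times_per_repetition, dtype=float)
        reps = np.arange(1, len(t) + 1)
        # Power-law fit log(t) = slope*log(n) + intercept; a more negative
        # slope means a steeper curve, i.e. faster improvement.
        slope, _ = np.polyfit(np.log(reps), np.log(t), 1)
        return {
            "first_use_time": t[0],            # a) first-use learnability
            "steepness": -slope,               # b) improvement with repetition
            "ultimate_plateau": t[-3:].mean(), # c) efficiency after practice
        }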
9.3 Operability
Quality of the operability sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.5.3.
Operability of an AI system can also be measured by the following:
— Peer review: technical peers evaluate an AI system based on individual experiences and lessons learnt.
For example, an ML engineer can identify that the time between obtaining an ML model inference and delivering the result to the user can be improved. The ML engineer can suggest new ML model deployment
options.
— User testing: users are asked to rate the degree to which an AI system meets their expectations with efficiency and efficacy. For example, is the ML model providing accurate classification of the entries?
Is it comparable to human classification?
— Other evaluation methods linked to the measurements of reusability and reliability: an AI system’s
consistency when exposed to different environments and users can help meet the expectations of the
user. For example, measuring inference delivery times of an ML-based AI system when exposed to
different variables – users, processing power and others.
9.4 User error protection
Quality of the user error protection sub-characteristic should be measured against quality measures
according to ISO/IEC 25023:2016, 8.5.4.
9.5 User interface aesthetics
Quality of the user interface aesthetics sub-characteristic should be measured against quality measures
according to ISO/IEC 25023:2016, 8.5.5.
9.6 Accessibility
Quality of the accessibility sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.5.6.
9.7 User controllability
The organization should use the controllability framework according to ISO/IEC TS 8200 for an AI system
designed to be controllable. For an evaluation, the following should be considered and prepared in advance:
— AI system that is runnable and with control functionalities implemented according to requirements;
— test items that are designed specifically for the test of control functionalities;
— toolkits capable of issuing controllability instructions, receiving or observing system appearance or
internal parameter changes, as well as computing concerned measures (e.g. response latencies and
stabilities).
The evaluation of an AI system’s controllability should include the following:
— To check whether an AI system’s control functionalities meet requirements, each of the required control
functionalities, as well as their useful combinations or workflows, should be tested.
— To qualify the level of controllability of an AI system, specified control functionalities in the level where
the system is designed, implemented or required to be in should be tested.
A specific set of test items should be prepared according to the evaluation objectives.
Sub-processes of control, such as control transfer, engagement and disengagement of control and uncertainty
handling of control transfer, as well as the actual control, should be tested. For each control functionality, the
test item can check the following. Stakeholders can apply a subset of these according to specific concerns:
— Correctness of a control functionality: whether an AI system learns or executes as expected when receiving a correct control instruction and remains in its current state when receiving an incorrect control instruction.
— Duration of a control functionality: length of time needed by a sub-process or the entire process to
complete a control functionality. The durations of important sub-processes (e.g. transfer of control and
engagement of control) should be checked for an AI system that operates in risky environments.
— Reliability of a control functionality: degree to which the control functionality can behave consistently,
particularly in those cases when system faults or external unpredicted incidences can happen. To test
this, fault injection mechanisms can be selected and applied.
— Number of operations needed by a control functionality: total number of operations a controller should
carry out, such that the entire control functionality process can complete and return results. All
operations needed by all sub-processes count.
For further details, refer to ISO/IEC TS 8200.
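
A minimal sketch of one such test item, covering the correctness and duration measures above; the `system` interface (send_instruction, state) is a hypothetical stand-in for the toolkits this subclause asks to prepare.

    # Sketch: testing one control functionality for correctness and duration.
    import time

    def test_control_functionality(system, correct_instr, incorrect_instr):
        initial_state = system.state
        start = time.perf_counter()
        system.send_instruction(correct_instr)
        duration = time.perf_counter() - start    # duration of the functionality
        executed = system.state != initial_state  # executed as expected
        # An incorrect instruction should leave the system in its current state.
        state_before = system.state
        system.send_instruction(incorrect_instr)
        rejected = system.state == state_before
        return {"executed": executed,
                "rejected_incorrect": rejected,
                "duration_s": duration}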
9.8 Transparency
The presentation of information to stakeholders should be open, comprehensive and understandable.
The organization should be able to understand, trace and document all privacy-relevant data processing considerations, including the legal, technical and organizational aspects.
An AI system should have clear owners who are accountable for meeting the expected benefits and
communicating about the system’s outcomes to the stakeholders.
The relevant characteristics used in an AI system should be comprehensive, accessible, clear and
understandable to the stakeholders. For example, if only a finite number of variables are used as input to an AI system, this limitation should be communicated clearly to avoid misunderstandings.
The organization should communicate the risks of an AI system’s output (i.e. predictions, decisions or
activities) affecting society, the economy or the environment in a clear, accurate, timely, honest and complete
manner.
The organization should communicate an AI system’s output (i.e. predictions, decisions or activities) to
relevant stakeholders in a comprehensive, accessible and understandable manner.
Appropriate information about an AI system and its level of quality should be made available to relevant
stakeholders.
10 Reliability
10.1 Maturity
Quality of the maturity sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.6.1.
10.2 Availability
Quality of the availability sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.6.2.
10.3 Fault tolerance
Quality of the fault tolerance sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.6.3.
10.4 Recoverability
Quality of the recoverability sub-characteristic should be measured against quality measures according to
ISO/IEC 25023:2016, 8.6.4.
10.5 Robustness
The robustness of an AI system should be measured under normal operational conditions. These conditions
are dependent on the AI system’s domain, as described in ISO/IEC 24029-2:2023, 5.2. The bounded domain
should:
— be determined by a set of attributes that are clearly defined;
— be sufficient for the AI system to conduct one or more given tasks as intended;
— use training data that is representative of data expected to be used for inference.
The following actions should be considered to measure the AI system's robustness against the bounded domain (a minimal sketch follows this list):
— assess training, validation and testing dataset to ensure they cover normal operational conditions;
— develop specific test scenarios to test the system’s performance under a wide range of normal operational
conditions;
— use simulation as a test data generator to address the full range of operation;
— consider regularization techniques, data augmentation or introduction of random noise to maximize
robustness of an AI system under normal operating conditions;
— evaluate functional correctness (see 6.2 for the recommended guidance on functional correctness and
ISO/IEC TR 24029-1:2021, 5.2 for robustness metrics).
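
A minimal sketch of the noise-based evaluation from the list above, re-running functional correctness on perturbed copies of the test data; the noise scales and the accuracy measure are illustrative assumptions, and real scenarios should reflect the attributes of the bounded domain.

    # Sketch: robustness profile under increasing input noise within the
    # normal operating range. `model` is a hypothetical numeric classifier.
    import numpy as np
    from sklearn.metrics import accuracy_score

    def robustness_profile(model, X_test, y_test, noise_scales=(0.0, 0.01, 0.05)):
        rng = np.random.default_rng(0)
        profile = {}
        for scale in noise_scales:
            X_noisy = X_test + rng.normal(0.0, scale, size=X_test.shape)
            profile[scale] = accuracy_score(y_test, model.predict(X_noisy))
        # A sharp drop at small scales signals poor robustness within
        # normal operational conditions.
        return profile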
An AI system should endure long-tail, black swan and abnormal events. The following actions should be
considered:
— perform test scenarios of long-tail and abnormal events, benchmarks and stress-test of an AI system;

— identify breaking points of an AI system;
— decide whether an AI system is robust enough for potential future needs based on the analysis results.
An AI system should handle diverse perceptible and unforeseen attacks. The following actions should be
considered:
— Address failures and errors according to best practice, for example, through hardening, testing and
verification of an AI system’s stability using techniques such as metamorphic testing, data augmentation,
generative adversarial networks, adversarial training, adversarial example generation or adversarial
example detection.
— Apply specific countermeasures in the field of machine learning, such as anomaly detection of an AI
system’s input and output and flagging novel misuses to further mitigate functional risks.
An AI system should generate representative outputs under abnormal environmental conditions (a minimal sketch follows this list) by:
— leveraging confidence score or confidence intervals to decide whether to act on the output generated or
if a backup workflow should be triggered;
— calibrating confidence score or confidence intervals to be representative of the uncertainty relative to
the output being true by using calibration measurements validated on testing data;
— integrating a backup workflow, such as a manual queue, a heuristics model, a statistical model or
a separate AI system to overwrite outcomes from an AI system that are considered abnormal or too
uncertain based on their confidence scores or confidence intervals;
— implementing a robust backup workflow that should be operational when an AI system fails to generate
outputs.
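
The sketch referenced above gates each output on a calibrated confidence score and routes low-confidence cases to a backup workflow; the threshold value, the scikit-learn-style predict_proba interface and the fallback callable are assumptions to be validated on testing data (for example, probabilities can be calibrated with scikit-learn's CalibratedClassifierCV).

    # Sketch: confidence-gated prediction with a backup workflow (e.g. a
    # manual queue). `fallback` is a hypothetical callable for one sample.
    import numpy as np

    def gated_predict(model, X, fallback, threshold=0.8):
        proba = model.predict_proba(X)             # calibrated probabilities
        confidence = proba.max(axis=1)
        labels = model.classes_[proba.argmax(axis=1)]
        outputs = []
        for conf, label, x in zip(confidence, labels, X):
            if conf >= threshold:
                outputs.append(label)              # act on the system output
            else:
                outputs.append(fallback(x))        # trigger the backup workflow
        return np.array(outputs)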
An AI system’s hardware should be robust to common causes of failure. From the point of view of common
cause failures at the hardware lev
...


The standard ISO/IEC TS 25058:2024, entitled "Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Guidance for quality evaluation of artificial intelligence (AI) systems", represents significant progress in the field of AI system evaluation. Its scope is broad, covering all organizations involved in the development and use of these technologies. One of the standard's strengths lies in providing a quality model designed specifically for AI systems. This enables systematic, structured evaluation of AI system quality, ensuring that quality requirements are clearly defined and measurable. The model supports a harmonized approach, which is essential in a field evolving as rapidly as artificial intelligence. Moreover, the relevance of ISO/IEC TS 25058:2024 is beyond question in a context where AI systems are increasingly integrated into many sectors. Expectations for the quality and reliability of these systems are rising, and the standard meets this need by offering guidelines that help organizations evaluate and assure the quality of AI systems before deployment. Finally, the standard's inclusivity, being applicable to all types of organizations, promotes widespread adoption of good practices in AI system quality evaluation. With its focus on requirements specific to artificial intelligence, ISO/IEC TS 25058:2024 is a key reference for ensuring that the systems developed not only meet user needs but also uphold high quality standards, thereby reducing the risks associated with their use.

ISO/IEC TS 25058:2024 is a significant document in the field of systems and software engineering, offering specific guidelines for evaluating the quality of artificial intelligence (AI) systems. It is aimed at all types of organizations involved in the development and use of AI, making the standard broadly applicable. A notable feature of this standard is the integrated quality model, which structures and systematizes the evaluation of AI applications. By providing a dedicated model for AI system quality, it enables organizations to apply clearly defined criteria when assessing quality. The standard also offers a sound basis for examining the various quality characteristics of AI systems, including functionality, reliability, efficiency and usability. Furthermore, ISO/IEC TS 25058:2024 underlines the importance of quality in the development of AI systems, which are increasingly deployed in critical applications. By focusing on proven practices and detailed evaluation methods, the standard improves organizations' ability to build high-quality, reliable AI solutions. This is particularly important in a market with high expectations for the performance and safety of AI technologies. Overall, ISO/IEC TS 25058:2024 makes a decisive contribution to quality assurance in AI technology and supports organizations in offering their products and services at a high level of quality.

ISO/IEC TS 25058:2024 is an important document providing guidance for the quality evaluation of artificial intelligence (AI) systems. Its scope covers how to evaluate AI systems using an AI system quality model, and it can be applied by any organization involved in the development and use of AI, giving diverse organizations a foundation for managing and evaluating AI system quality more systematically. A strength of the standard is that it presents clear, concrete quality evaluation criteria, which matters greatly given the complexity and rapid pace of development of AI. It addresses the various perspectives needed to evaluate AI system quality and shows how the guidance can be integrated with an overall quality management process. By providing guidelines for fair and consistent evaluation of AI system quality, ISO/IEC TS 25058:2024 delivers practical benefits to both AI developers and users. In particular, by identifying in advance the various risk factors that can arise from the nature of AI systems and establishing evaluation criteria for them, organizations can deliver more trustworthy AI solutions. In conclusion, ISO/IEC TS 25058:2024 sets out essential criteria for AI system quality evaluation and stands as an important management tool that all AI-related organizations should actively consult and apply. The standard is expected to contribute to AI quality assurance and sustainable development.

ISO/IEC TS 25058:2024 establishes a comprehensive framework for evaluating the quality of artificial intelligence (AI) systems, emphasizing the integration of quality requirements specific to AI within the larger landscape of systems and software engineering. This standard is particularly relevant in today's rapidly evolving technological environment, where the integration of AI components into various applications necessitates rigorous assessment criteria to ensure reliability, safety, and effectiveness. The scope of ISO/IEC TS 25058:2024 is broad, encompassing any type of organization involved in the development and utilization of AI systems. This inclusivity addresses a wide range of stakeholders, from startups focused on innovative AI solutions to large enterprises implementing AI at scale. By doing so, the standard fosters a common understanding of quality requirements across diverse entities, promoting a standardized approach to quality evaluation. One of the key strengths of this standard is its use of an AI system quality model, which provides structured guidance for the evaluation process. This model allows organizations to assess various quality attributes of AI systems systematically, such as functionality, performance, and security, ensuring that the AI products meet both user expectations and regulatory demands. Furthermore, the standard aligns with existing quality frameworks, facilitating integration with current practices in systems and software engineering. ISO/IEC TS 25058:2024 addresses the unique challenges associated with AI evaluations, including biases in data, interpretability of algorithms, and ethical considerations. By incorporating these elements, the standard enhances the relevance and applicability of quality evaluations in real-world AI deployments. The proactive approach outlined in the document allows organizations to identify and mitigate potential issues before they manifest in operational settings, ultimately contributing to the trustworthiness of AI technologies. In summary, ISO/IEC TS 25058:2024 serves as an essential guideline for organizations aiming to ensure quality in AI systems through structured evaluations based on a well-defined quality model. Its strengths lie in its comprehensive scope, clear evaluation frameworks, and emphasis on addressing the distinctive aspects of AI systems, making it a critical resource for contemporary AI development and implementation.

ISO/IEC TS 25058:2024 is an important standard that provides guidance on the evaluation of artificial intelligence (AI) systems. Its scope is broad, setting out guidelines for carrying out evaluations using an AI system quality model. The standard is applicable to all types of organizations involved in the development and use of AI, and is valued for promoting consistency and transparency across the industry. Its strength lies in the detailed evaluation guidance it provides on AI system quality. By taking the particular characteristics of AI systems into account and presenting concrete evaluation criteria and methods, it gives developers and companies practical help in assuring quality, which in turn improves the reliability and safety of AI systems and contributes to user satisfaction. Moreover, ISO/IEC TS 25058:2024 is especially relevant today, as AI technology is being adopted across a wide range of industries. With AI evolving rapidly, standardization of quality evaluation is sorely needed, and this document answers that need, with broad application across the industry expected. In this way, by providing guidance on the quality evaluation of AI systems, ISO/IEC TS 25058:2024 is a valuable resource for all stakeholders involved.