ISO/TS 21749:2005
Measurement uncertainty for metrological applications — Repeated measurements and nested experiments
ISO/TS 21749:2005 follows the approach taken in the Guide to the expression of the uncertainty of measurement (GUM) and establishes the basic structure for stating and combining components of uncertainty. To this basic structure, it adds a statistical framework using the analysis of variance (ANOVA) for estimating individual components, particularly those classified as Type A evaluations of uncertainty, i.e. based on the use of statistical methods. A short description of Type B evaluations of uncertainty (non-statistical) is included for completeness. ISO/TS 21749:2005 covers experimental situations where the components of uncertainty can be estimated from statistical analysis of repeated measurements, instruments, test items or check standards. It provides methods for obtaining uncertainties from single-, two- and three-level nested designs only. More complicated experimental situations where, for example, there is interaction between operator effects and instrument effects or a cross effect, are not covered. ISO/TS 21749:2005 is not applicable to measurements that cannot be replicated, such as destructive measurements or measurements on dynamically varying systems (such as fluid flow, electronic currents or telecommunications systems). It is not particularly directed to the certification of reference materials (particularly chemical substances) and to calibrations where artefacts are compared using a scheme known as a "weighing design". For certification of reference materials, see ISO Guide 35. When results from interlaboratory studies can be used, techniques are presented in the companion guide ISO/TS 21748. The main difference between ISO/TS 21748 and this Technical Specification is that the ISO/TS 21748 is concerned with reproducibility data (with the inevitable repeatability effects), whereas this Technical Specification concentrates on repeatability data and the use of the analysis of variance for its treatment. 
ISO/TS 21749:2005 is applicable to a wide variety of measurements, for example, lengths, angles, voltages, resistances, masses and densities.
Incertitude de mesure pour les applications en métrologie — Mesures répétées et expériences emboîtées
TECHNICAL SPECIFICATION ISO/TS 21749
First edition
2005-02-15
Corrected version
2005-07-15
© ISO 2005
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Statistical methods of uncertainty evaluation
4.1 Approach of the Guide to the expression of uncertainty of measurement
4.2 Check standards
4.3 Steps in uncertainty evaluation
4.4 Examples in this Technical Specification
5 Type A evaluation of uncertainty
5.1 General
5.2 Role of time in Type A evaluation of uncertainty
5.3 Measurement configuration
5.4 Material inhomogeneity
5.5 Bias due to measurement configurations
6 Type B evaluation of uncertainty
7 Propagation of uncertainty
7.1 General
7.2 Formulae for functions of a single variable
7.3 Formulae for functions of two variables
8 Example — Type A evaluation of uncertainty from a gauge study
8.1 Purpose and background
8.2 Data collection and check standards
8.3 Analysis of repeatability, day-to-day and long-term effects
8.4 Probe bias
8.5 Wiring bias
8.6 Uncertainty calculation
Annex A (normative) Symbols
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
In other circumstances, particularly when there is an urgent market requirement for such documents, a
technical committee may decide to publish other types of normative document:
— an ISO Publicly Available Specification (ISO/PAS) represents an agreement between technical experts in
an ISO working group and is accepted for publication if it is approved by more than 50 % of the members
of the parent committee casting a vote;
— an ISO Technical Specification (ISO/TS) represents an agreement between the members of a technical
committee and is accepted for publication if it is approved by 2/3 of the members of the committee casting
a vote.
An ISO/PAS or ISO/TS is reviewed after three years in order to decide whether it will be confirmed for a
further three years, revised to become an International Standard, or withdrawn. If the ISO/PAS or ISO/TS is
confirmed, it is reviewed again after a further three years, at which time it must either be transformed into an
International Standard or be withdrawn.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO/TS 21749 was prepared by Technical Committee ISO/TC 69, Applications of statistical methods,
Subcommittee SC 6, Measurement methods and results.
This corrected version of ISO/TS 21749:2005 incorporates the correction of the title.
Introduction
Test, calibration and other laboratories are frequently required to report the results of measurements and the
associated uncertainties. Evaluation of uncertainty is an on-going process that can consume time and
resources. In particular, there are many tests and other operations carried out by laboratories where two or
three sources of uncertainty are involved. Following the approach in the Guide to the expression of uncertainty
in measurement (GUM) to combining components of uncertainty, this document focuses on using the analysis
of variance (ANOVA) for estimating individual components, particularly those based on Type A (statistical)
evaluations.
An experiment is designed by the laboratory to enable an adequate number of measurements to be made, the
analysis of which will permit the separation of the uncertainty components. The experiment, in terms of design
and execution, and the subsequent analysis and uncertainty evaluation, require familiarity with data analysis
techniques, particularly statistical analysis. Therefore, it is important for laboratory personnel to be aware of
the resources required and to plan the necessary data collection and analysis.
In this Technical Specification, the uncertainty components based on Type A evaluations can be estimated
from statistical analysis of repeated measurements, from instruments, test items or check standards.
A purpose of this Technical Specification is to provide guidance on the evaluation of the uncertainties
associated with the measurement of test items, for instance as part of ongoing manufacturing inspection.
Such uncertainties contain contributions from the measurement process itself and from the variability of the
manufacturing process. Both types of contribution include those from operators, environmental conditions and
other effects. In order to assist in separating the effects of the measurement process and manufacturing
variability, measurements of check standards are used to provide data on the measurement process itself.
Such measurements are nominally identical to those made on the test items. In particular, measurements on
check standards are used to help identify time-dependent effects, so that such effects can be evaluated and
contrasted with a database of check standard measurements. These standards are also useful in helping to
control the bias and long-term drift of the process once a baseline for these quantities has been established
from historical data.
Clause 4 briefly describes the statistical methods of uncertainty evaluation including the approach
recommended in the GUM, the use of check standards, the steps in uncertainty evaluation and the examples
in this Technical Specification. Clause 5, the main part of this Technical Specification, discusses the Type A
evaluations. Nested designs in ANOVA are used in dealing with time-dependent sources of uncertainty. Other
sources such as those from the measurement configuration, material inhomogeneity, and the bias due to
measurement configurations and related uncertainty analyses are discussed. Type B (non-statistical)
evaluations of uncertainty are discussed for completeness in Clause 6. The law of propagation of uncertainty
described in the GUM has been widely used. Clause 7 provides formulae obtained by applying this law to
certain functions of one and two variables. In Clause 8, as an example, a Type A evaluation of uncertainty for
a gauge study is discussed, where uncertainty components from various sources are obtained. Annex A lists
the statistical symbols used in this Technical Specification.
TECHNICAL SPECIFICATION ISO/TS 21749:2005(E)
Measurement uncertainty for metrological applications —
Repeated measurements and nested experiments
1 Scope
This Technical Specification follows the approach taken in the Guide to the expression of uncertainty in
measurement (GUM) and establishes the basic structure for stating and combining components of
uncertainty. To this basic structure, it adds a statistical framework using the analysis of variance (ANOVA) for
estimating individual components, particularly those classified as Type A evaluations of uncertainty, i.e. based
on the use of statistical methods. A short description of Type B evaluations of uncertainty (non-statistical) is
included for completeness.
This Technical Specification covers experimental situations where the components of uncertainty can be
estimated from statistical analysis of repeated measurements, instruments, test items or check standards.
It provides methods for obtaining uncertainties from single-, two- and three-level nested designs only. More
complicated experimental situations where, for example, there is interaction between operator effects and
instrument effects or a cross effect, are not covered.
This Technical Specification is not applicable to measurements that cannot be replicated, such as destructive
measurements or measurements on dynamically varying systems (such as fluid flow, electronic currents or
telecommunications systems). It is not particularly directed to the certification of reference materials
(particularly chemical substances) and to calibrations where artefacts are compared using a scheme known
as a “weighing design”. For certification of reference materials, see ISO Guide 35 [14].
When results from interlaboratory studies can be used, techniques are presented in the companion guide
ISO/TS 21748 [15]. The main difference between ISO/TS 21748 and this Technical Specification is that
ISO/TS 21748 is concerned with reproducibility data (with the inevitable repeatability effects), whereas this
Technical Specification concentrates on repeatability data and the use of the analysis of variance for its
treatment.
This Technical Specification is applicable to a wide variety of measurements, for example, lengths, angles,
voltages, resistances, masses and densities.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 3534-1:1993, Statistics — Vocabulary and symbols — Part 1: Probability and general statistical terms
ISO 3534-3:1999, Statistics — Vocabulary and symbols — Part 3: Design of experiments
ISO 5725-1, Accuracy (trueness and precision) of measurement methods and results — Part 1: General
principles and definitions
ISO 5725-2, Accuracy (trueness and precision) of measurement methods and results — Part 2: Basic method
for the determination of repeatability and reproducibility of a standard measurement method
ISO 5725-3, Accuracy (trueness and precision) of measurement methods and results — Part 3: Intermediate
measures of the precision of a standard measurement method
ISO 5725-4, Accuracy (trueness and precision) of measurement methods and results — Part 4: Basic
methods for the determination of the trueness of a standard measurement method
ISO 5725-5, Accuracy (trueness and precision) of measurement methods and results — Part 5: Alternative
methods for the determination of the precision of a standard measurement method
ISO 5725-6, Accuracy (trueness and precision) of measurement methods and results — Part 6: Use in
practice of accuracy values
Guide to the expression of uncertainty in measurement (GUM), BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, OIML,
1993, corrected and reprinted in 1995
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 3534-1, ISO 3534-3, ISO 5725 (all
parts) and the following apply.
3.1
measurand
well-defined physical quantity that is to be measured and can be characterized by an essentially unique value
3.2
uncertainty of measurement
parameter or an estimate of the parameter, associated with the result of a measurement, that characterizes
the dispersion of the values that could reasonably be attributed to the quantity being measured
3.3
Type A evaluation
method of evaluation of uncertainty by using statistical methods
3.4
Type B evaluation
method of evaluation of uncertainty by means other than statistical methods
3.5
standard uncertainty
uncertainty expressed as a standard deviation associated with a single component of uncertainty
3.6
combined standard uncertainty
standard deviation associated with the result of a particular measurement or series of measurements that
takes into account one or more components of uncertainty
3.7
expanded uncertainty
combined standard uncertainty multiplied by a coverage factor which usually is an appropriate critical value
from the t-distribution which depends upon the degrees of freedom in the combined standard uncertainty and
the desired level of coverage
3.8
effective degrees of freedom
degrees of freedom associated with a standard deviation composed of two or more components of variance
NOTE The effective degrees of freedom can be computed using the Welch-Satterthwaite approximation (see GUM,
G.4).
3.9
nested design
experimental design in which each level (i.e. each potential setting, value or assignment of a factor) of a given
factor appears in only a single level of any other factor
NOTE 1 Adapted from ISO 3534-3:1999, definition 2.6.
NOTE 2 See ISO 3534-3:1999, 1.6, for the definition of level.
3.10
fixed effects
〈factors〉 effects resulting from the preselection of levels of each factor over the range of values of the factors
3.11
random effects
〈factors〉 effects resulting from the sampling at each level of each factor from the population of levels of each
factor
3.12
balanced nested design
nested design experiment in which the number of levels of the nested factors is constant
[ISO 3534-3:1999, definition 2.6.1]
3.13
mean square for random errors
sum of squared error divided by the corresponding degrees of freedom
NOTE See ISO 3534-1:1993, 2.85 for the definition of the degrees of freedom.
4 Statistical methods of uncertainty evaluation
4.1 Approach of the Guide to the expression of uncertainty of measurement
The Guide to the expression of uncertainty in measurement (GUM) recommends that the result of
measurement be corrected for all recognized significant systematic effects, that the result accordingly be the
best (or at least unbiased) estimate of the measurand and that a complete model of the measurement system
exists. The model provides a functional relationship between a set of input quantities (upon which the
measurand depends) and the measurand (output quantity). The objective of uncertainty evaluation is to
determine an interval that can be expected to encompass a large fraction of the distribution of values that
could reasonably be attributed to the measurand. Since a bias cannot be quantified exactly, when a result of
measurement is corrected for bias, the correction has an associated uncertainty.
The general approach, beginning from the modelling process, is the following.
NOTE The approach here relates to input quantities that are mutually independent. It is capable of a further
generalization to mutually dependent input quantities (see the GUM, 5.2).
a) Develop a mathematical model (functional relationship) of the measurement process or measurement
system that relates the model input quantities (including influence quantities) to the model output quantity
(measurand). In many cases, this model is the formula (or formulae) used to calculate the measurement
result, augmented if necessary by random, environmental and other effects such as bias correction that
may affect the measurement result.
b) Assign best estimates and the associated standard uncertainties (uncertainties expressed as standard
deviations) to the model input quantities.
c) Evaluate the contribution to the standard uncertainty associated with the measurement result that is
attributable to each input quantity. These contributions shall take into account uncertainties associated
with both random and systematic effects relating to the input quantities, and may themselves involve
more detailed uncertainty evaluations.
d) Aggregate these standard uncertainties to obtain the (combined) standard uncertainty associated with the
measurement result. This evaluation of uncertainty is carried out, according to GUM, using the law of
propagation of uncertainty, or by more general analytical or numerical methods when the conditions for
the law of propagation of uncertainty do not apply or it is not known whether they apply.
e) Where appropriate, multiply the standard uncertainty associated with the measurement result by a
coverage factor to obtain an expanded uncertainty and hence a coverage interval for the measurand at a
prescribed level of confidence. The GUM provides an approach that can be used to calculate the
coverage factor. If the degrees of freedom for the standard uncertainties of all the input quantities are
infinite, the coverage factor is determined from the normal distribution. Otherwise, the (effective) degrees
of freedom for the combined standard uncertainty is estimated from the degrees of freedom for the
standard uncertainties associated with the best estimates of the input quantities using the Welch-
Satterthwaite formula.
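Steps d) and e) can be illustrated with a minimal Python sketch. It assumes mutually independent components that each enter the result with a sensitivity coefficient of 1, so that the combined standard uncertainty is a root sum of squares. The function names and the component values are hypothetical and serve only as an illustration. The quantile of the t-distribution for a finite number of effective degrees of freedom is not computed here; it would be taken from tables or from a statistics library.

```python
import math
from statistics import NormalDist


def combined_standard_uncertainty(components):
    """Root sum of squares of the standard uncertainties u_i.

    `components` is a list of (u_i, nu_i) pairs; independence and unit
    sensitivity coefficients are assumed.
    """
    return math.sqrt(sum(u ** 2 for u, _ in components))


def welch_satterthwaite(components):
    """Effective degrees of freedom: u_c^4 / sum(u_i^4 / nu_i).

    Components with infinite degrees of freedom add nothing to the
    denominator; if all are infinite, the result is infinite.
    """
    u_c = combined_standard_uncertainty(components)
    denom = sum(u ** 4 / nu for u, nu in components if math.isfinite(nu))
    return math.inf if denom == 0 else u_c ** 4 / denom


# Hypothetical components: (standard uncertainty, degrees of freedom)
components = [(0.12, 5), (0.05, 20), (0.03, math.inf)]

u_c = combined_standard_uncertainty(components)
nu_eff = welch_satterthwaite(components)

# When all degrees of freedom are infinite, the coverage factor for a
# two-sided 95 % interval is a quantile of the normal distribution.
k_normal = NormalDist().inv_cdf(0.975)
```

In this sketch the component with few degrees of freedom dominates, so the effective degrees of freedom stay small and a coverage factor from the t-distribution would be noticeably larger than `k_normal`.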
The GUM permits the evaluation of standard uncertainties by any appropriate means. It distinguishes the
evaluation by the statistical treatment of repeated observations as a Type A evaluation of uncertainty, and the
evaluation by any other means as a Type B evaluation of uncertainty. In evaluating the combined standard
uncertainty, both types of evaluation are to be characterized by variances (squared standard uncertainties)
and treated in the same way.
Full details of this procedure and the additional assumptions on which it is based are given in the GUM.
The purpose of this Technical Specification is to provide additional detail on the evaluation of uncertainty by
statistical means, concentrating on b) above, whether obtained by repeated measurement of the input
quantities or of the entire measurement.
In this Technical Specification the term “artefact” is often used in the context of measurement. This usage is to
be given a general interpretation in that the measurement may also relate to a bulk or chemical item, etc.
4.2 Check standards
A check standard is a standard required to have the following properties.
a) It shall be capable of being measured periodically.
b) It shall be close in material content and geometry to the production items.
c) It shall be a stable artefact.
d) It shall be available to the measurement process at all times.
Subject to its having these properties, an ideal check standard is an artefact selected at random from the
production items, if appropriate, and reserved for this purpose.
Examples of the use of check standards include
— measurements on a stable artefact, and
— differences between values of two reference standards as estimated from a calibration experiment.
Methods for analysing check standard measurements are treated in 5.2.3.
In this Technical Specification, the term “check standard” is to be given a general interpretation. For instance,
a bulk or chemical item may be used.
4.3 Steps in uncertainty evaluation
4.3.1 The first step in the uncertainty evaluation is the definition of the measurand for which a measurement
result is to be reported for the test item. Special care should be taken to provide an unambiguous definition of
the measurand, because the resulting uncertainty will depend on this definition. Possibilities include
— quantity at an instant in time at a point in space,
— quantity at an instant in time averaged over a specified spatial region,
— quantity at a point in space averaged over a time period.
For instance, the measurands corresponding to the hardness of a specimen of a ceramic material are (very)
different
a) at a specified point in the specimen, or
b) averaged over the specimen.
4.3.2 If the value of the measurand can be measured directly, the evaluation of the standard uncertainty
depends on the number of repeated measurements and the environmental and operational conditions over
which the repetitions are made. It also depends on other sources of uncertainty that cannot be observed
under the conditions selected to repeat the measurements, such as calibration uncertainties for reference
standards. On the other hand, if the value of the measurand cannot be measured directly, but is to be
calculated from measurements of secondary quantities, the model (or functional relationship) for combining
the various quantities must be defined. The standard uncertainties associated with best estimates of the
secondary quantities are then needed to evaluate the standard uncertainty associated with the value of the
measurand.
The steps to be followed in an uncertainty evaluation are outlined as follows.
a) Type A evaluations:
1) If the output quantity is represented by $Y$, and measurements of $Y$ can be replicated, use an ANOVA
model to provide estimates of the variance components, associated with $Y$, for random effects from
— replicated results for the test item,
— measurements on a check standard,
— measurements made according to a designed experiment.
2) If measurements of $Y$ cannot be replicated directly, and the model
$Y = f(X_1, X_2, \ldots, X_n)$
is known, and the input quantities $X_i$ can be replicated, evaluate the uncertainties associated with the
best estimates $x_i$ of $X_i$; then the law of propagation of uncertainty can be used.
3) If measurements of $Y$ or $X_i$ cannot be replicated, refer to Type B evaluations.
b) Type B evaluations: evaluate a standard uncertainty associated with the best estimate of each input
quantity.
c) Aggregate the standard uncertainties from the Type A and Type B evaluations to provide a standard
uncertainty associated with the measurement result.
d) Compute an expanded uncertainty.
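For case a) 2) above, where the output quantity is calculated from replicated input quantities, the law of propagation of uncertainty for mutually independent inputs can be sketched as follows. This is only an illustration under stated assumptions: the sensitivity coefficients are approximated by central finite differences rather than derived analytically, and the model (a resistance computed as voltage divided by current), the function names and the numerical values are hypothetical.

```python
import math


def propagate(f, x, u, rel_step=1e-6):
    """Combined standard uncertainty of y = f(x) for independent inputs.

    x holds the best estimates x_i and u their standard uncertainties.
    Each sensitivity coefficient df/dx_i is approximated numerically.
    """
    total = 0.0
    for i, (xi, ui) in enumerate(zip(x, u)):
        h = rel_step * (abs(xi) if xi != 0 else 1.0)
        upper = list(x)
        lower = list(x)
        upper[i] = xi + h
        lower[i] = xi - h
        c_i = (f(upper) - f(lower)) / (2 * h)  # sensitivity coefficient
        total += (c_i * ui) ** 2
    return math.sqrt(total)


def resistance(q):
    """Hypothetical model: R = V / I."""
    voltage, current = q
    return voltage / current


x = [10.0, 2.0]    # hypothetical best estimates of V and I
u = [0.02, 0.01]   # hypothetical standard uncertainties of V and I

u_r = propagate(resistance, x, u)
```

For this model the analytical sensitivity coefficients are 1/I and -V/I², so the numerical result can be checked by hand against the square root of (0.5 × 0.02)² + (2.5 × 0.01)².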
4.4 Examples in this Technical Specification
The purpose of the examples in various clauses of this Technical Specification and the more detailed case
study in Clause 8 is to demonstrate the evaluation of uncertainty associated with measurement processes
having several sources of uncertainty. The reader should be able to generalize the principles illustrated in
these sections to particular applications. The examples treat the effect of both random effects and systematic
effects in the form of bias on the measurement result. There is an emphasis on quantifying uncertainties
observed over time, such as those for time intervals defined as short-term (repeatability) and for intermediate
measures of precision such as day-to-day or run-to-run, as well as for reproducibility. For the reader's
purpose, the time intervals should be defined in a way that makes sense for the measurement process in
question.
To illustrate strategies for dealing with several sources of uncertainty, data from the Electronics and Electrical
Engineering Laboratory of the National Institute of Standards and Technology (NIST), USA, are featured. The
measurements in question are volume resistivities (Ω⋅cm) of silicon wafers. These data were chosen for
illustrative purposes because of the inherent difficulties in measuring resistivity by probing the surface of the
wafer and because the measurand is defined by an ASTM test method and cannot be defined independently
of the method.
The intent of the experiment is to evaluate the uncertainty associated with the resistivity measurements of
silicon wafers at various levels of resistivity (Ω·cm), which were certified using a four-point probe wired in a
specific configuration. The test method is ASTM Method F84. The reported resistivity for each wafer is the
average of six short-term repetitions made at the centre of the wafer.
5 Type A evaluation of uncertainty
5.1 General
5.1.1 Generally speaking, any observation that can be repeated (see GUM, 3.1.4 to 3.1.6) can provide data
suitable for a Type A evaluation. Type A evaluations can be based on (for example) the following:
— repeated measurements on the item under test, in the course of, or in addition to, the measurement
necessary to provide the result;
— measurements carried out on a suitable test material during the course of method validation, prior to any
measurements being carried out;
— measurements on check standards, that is, test items measured repeatedly over a period of time to
monitor the stability of the measurement process, where appropriate;
— measurements on certified reference materials or standards;
— repeated observations or determination of influence quantities (for example, regular or random monitoring
of environmental conditions in the laboratory, or repeated measurements of a quantity used to calculate
the measurement result).
5.1.2 Type A evaluations can apply both to random and systematic effects (GUM, 3.2). The only
requirement is that the evaluation of the uncertainty component is based on a statistical analysis of series of
observations. The distinction with regard to random and systematic effects is that
— random effects vary between observations and are not to be corrected,
— systematic effects can be regarded as essentially constant over observations in the short term and can,
theoretically at least, be corrected or eliminated from the result.
Sometimes it is difficult to distinguish a systematic effect from random effects and it becomes a question of
interpretation and the use of related statistical models. In general, it is not possible to separate random and
systematic effects.
The GUM recommends that generally all systematic effects are corrected and that consequently the only
uncertainties from such sources are those of the corrections. The role of time in the evaluation of Type A
uncertainty using nested designs is discussed in 5.2. The uncertainties associated with measurement
configuration and material inhomogeneity, respectively, are discussed in 5.3 and 5.4. Guidance on how to
assess and correct for bias due to measurement configurations and to evaluate the associated uncertainty is
given in 5.5. The manner in which the source of uncertainty affects the reported value and the context for the
uncertainty determine whether an analysis of a random or systematic effect is appropriate.
Consider a laboratory with several instruments of a certain type, regarded as representative of the set of all
instruments of that type. Then the differences among the instruments in this set can be considered to be a
random effect if the uncertainty statement is intended to apply to the result of any particular instrument,
selected at random, from the set.
Conversely, if the uncertainty statement is intended to apply to one or more specific instruments, the
systematic effect of each such instrument relative to the set is the component of interest.
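The two interpretations can be contrasted with a small Python sketch. The instrument labels and readings are hypothetical, and the calculation is deliberately simplified: it treats each instrument's mean as known exactly, whereas in practice the spread of the means also contains a contribution from repeatability.

```python
from statistics import mean, stdev

# Hypothetical mean readings of the same artefact, one per instrument
instrument_means = {"A": 10.02, "B": 9.98, "C": 10.05, "D": 9.97, "E": 10.03}
values = list(instrument_means.values())

grand_mean = mean(values)

# Random-effect view: the result may come from any instrument in the set,
# so the between-instrument standard deviation is the component of
# interest (degrees of freedom: number of instruments minus one).
s_instrument = stdev(values)

# Systematic-effect view: the result comes from one specific instrument
# (here "C"), so its offset from the set average is the quantity of interest.
bias_c = instrument_means["C"] - grand_mean
```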
5.2 Role of time in Type A evaluation of uncertainty
5.2.1 Time-dependent sources of uncertainty and choice of time intervals
Many random effects are time-dependent, often due to environmental changes. Three levels of time-
dependent fluctuations are discussed and can be characterized as
a) short-term fluctuations (repeatability or instrument precision),
b) intermediate fluctuations (day-to-day or operator-to-operator or equipment-to-equipment, known as
intermediate precision),
c) long-term fluctuations [run-to-run or stability (which may not be a concern for all processes) or
intermediate precision].
This characterization is only a guideline. It is necessary for the user to define the time increments that are of
importance in the measurement process of concern, whether they are minutes, hours or days.
One reason for this approach is that much modern instrumentation is exceedingly precise (repeatable
measurements) in the short term, but changes over time, often caused by environmental effects, can be the
dominant source of uncertainty in the measurement process. An uncertainty statement may be inappropriate if
it relates to a measurement result that cannot be reproduced over time. A customer is entitled to know the
uncertainties associated with the measurement result, regardless of the day or time of year when the
measurement was made.
Two levels of time-dependent components are sufficient for describing many measurement processes. Three
levels may be needed for new measurement processes or processes whose characteristics are not well
understood. A three-level design is considered, with a two-level design as a special case.
Nested designs having more than three levels are not considered in this Technical Specification, but the
approaches discussed can be extended to them. See ISO 5725-3.
5.2.2 Experiment using a three-level design
5.2.2.1 A three-level nested design is generally recommended for studying the effect of sources of
variability that manifest themselves over time. Data collection and analysis are straightforward, and there is
usually no need to estimate interaction terms when dealing with time-dependent effects. Nested designs can
be operated at several levels. Three levels are recommended for measurement systems where sources of
uncertainty are not well understood and have not previously been studied.
The following levels are based on the characteristics of many measurement systems and should be adapted
to a specific measurement situation as required:
a) Level 1: measurements taken over a short time to capture the repeatability of the measurement;
b) Level 2: measurements taken over days (or other appropriate time increment);
c) Level 3: measurements taken over runs separated by months.
Symbols relating to these levels are defined thus:
Level 1: J (J > 1) repetitions;
Level 2: K (K > 1) days;
Level 3: L (L > 1) runs.
The following balanced three-level nested design is recommended for collecting data on this basis. It
describes the long-term fluctuations in the measurement process:
$$Y_{lkj} = \mu + \gamma_l + \delta_{lk} + \varepsilon_{lkj}$$
Here the measurements are represented by $Y_{lkj}$ ($l = 1, \ldots, L$; $k = 1, \ldots, K$; $j = 1, \ldots, J$) for the
jth repetition on the kth day, which are repeated for the lth run. The subscripted terms in the model represent
random effects in the measurement process that fluctuate with runs, days and short-term time intervals. The
purpose of the experiment is to estimate the variance components that quantify these sources of variability.
Let the variance components of the day and run effects for $\delta$ and $\gamma$ be $\sigma_D^2$ and $\sigma_R^2$,
respectively, and the variance of the measurement error $\varepsilon$ be $\sigma^2$. These variance components form
the basis for providing the standard uncertainties.
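To make the structure of the model concrete, the following minimal sketch simulates data from it. It assumes normally distributed effects, which the model statement itself does not require; the function name, the seed and all numerical values are hypothetical.

```python
import random

def simulate_nested(L, K, J, mu, sd_run, sd_day, sd_err, seed=0):
    """Return y[l][k][j] drawn from Y_lkj = mu + gamma_l + delta_lk + eps_lkj.

    Normal distributions are an assumption made for this illustration.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(L):
        gamma = rng.gauss(0.0, sd_run)      # run effect, shared by the whole run
        run = []
        for _ in range(K):
            delta = rng.gauss(0.0, sd_day)  # day effect, shared within the day
            run.append([mu + gamma + delta + rng.gauss(0.0, sd_err)
                        for _ in range(J)])
        data.append(run)
    return data

# Hypothetical values: 2 runs, 3 days per run, 4 repetitions per day.
y = simulate_nested(L=2, K=3, J=4, mu=100.0, sd_run=0.5, sd_day=0.3, sd_err=0.1)
```

Because the run effect is drawn once per run and the day effect once per day, repetitions within a day share both, which is what makes the design nested.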
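Table 1 — ANOVA table for a three-level nested design
| Source | Degrees of freedom $\nu$ | Sum of squares SS | Mean square MS | Expected mean square |
|---|---|---|---|---|
| Run | $L - 1$ | $SS_R$ | $MS_R$ | $\sigma^2 + J\sigma_D^2 + JK\sigma_R^2$ |
| Day (run) | $L(K - 1)$ | $SS_{D(R)}$ | $MS_{D(R)}$ | $\sigma^2 + J\sigma_D^2$ |
| Error | $LK(J - 1)$ | $SS_E$ | $MS_E$ | $\sigma^2$ |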
The sources of variation, the sum of squares (SS), and the corresponding degrees of freedom (ν), are listed in the first, third and the
second columns, respectively. The mean squares (MS), which are obtained from the sums of squares divided by the corresponding
degrees of freedom, are listed in the fourth column. The last column lists the expected mean squares.
Figure 1 depicts a design with J = 4, K = 3 and L = 2.
Figure 1 — Three-level nested design
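As a quick check on the size of this design, the sketch below enumerates the index triples for J = 4, K = 3 and L = 2 and confirms that the degrees of freedom from Table 1 add up to the number of measurements minus one. The variable names are illustrative.

```python
from itertools import product

L, K, J = 2, 3, 4  # runs, days per run, repetitions per day (as in Figure 1)

# One (run, day, repetition) index triple per measurement, all 1-based.
design = list(product(range(1, L + 1), range(1, K + 1), range(1, J + 1)))

df_run = L - 1              # 1
df_day = L * (K - 1)        # 4
df_error = L * K * (J - 1)  # 18

n_total = len(design)       # 24 measurements
assert df_run + df_day + df_error == n_total - 1
```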
5.2.2.2 The design can be repeated for Q (Q > 1) check standards (for check standards, see 5.2.3) and
for I (I > 1) gauges (measuring instruments) if the intent is to characterize several similar gauges. Such a
design has advantages in ease of use and computation. In particular, the number of repetitions at each level
need not be large because information is being gathered on several check standards.
The measurements should be made with a single operator. The operator is not usually a consideration with
automated systems. However, systems that require decisions regarding line, edge or other feature
delineations may be operator-dependent. If there is reason to think that results might differ significantly
between operators, “operators” can be substituted for “runs” in the design. Choose L (L > 1) operators at
random from the pool of operators who are capable of making measurements at the same level of precision.
(Conduct a small experiment with operators making repeatability measurements, if necessary, to verify
comparability of precision among operators.) Then complete the data collection and analysis as outlined. In
this case, the Level 3 standard deviation estimates operator effect.
Randomize with respect to gauges for each check standard, i.e. choose the first check standard and
randomize the gauges, choose the second check standard and randomize the gauges, and so forth.
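The randomization described above can be sketched as follows. The function name and the identifiers for check standards and gauges are hypothetical; the seed is fixed only to make the example reproducible.

```python
import random

def gauge_order(check_standards, gauges, seed=None):
    """Return {check_standard: gauges in an independently shuffled order}."""
    rng = random.Random(seed)
    plan = {}
    for cs in check_standards:
        order = list(gauges)  # copy so each check standard gets its own shuffle
        rng.shuffle(order)
        plan[cs] = order
    return plan

plan = gauge_order(["CS1", "CS2"], ["G1", "G2", "G3"], seed=1)
```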
Record the average and standard deviation from each group of J repetitions by check standard and gauge.
The results should be recorded together with pertinent environmental readings and identifications for
significant factors. A recommended way to record this information is in one computer file with one line or row
of information in fixed fields for each check standard measurement. A spreadsheet is useful for this purpose.
A list of typical entries follows:
a) month;
b) day;
c) year;
d) operator identification;
e) check standard identification;
f) gauge identification;
g) average of J repetitions;
h) short-term standard deviation from J repetitions;
i) degrees of freedom;
j) environmental readings (if pertinent).
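One possible layout for such a file is sketched below, with one row per check standard measurement. The field names, the choice of CSV, and the single temperature column standing in for the environmental readings are assumptions made for illustration; the sample values are invented.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class CheckStandardRecord:
    month: int
    day: int
    year: int
    operator_id: str
    check_standard_id: str
    gauge_id: str
    average: float           # average of the J repetitions
    short_term_sd: float     # standard deviation of the J repetitions
    degrees_of_freedom: int  # J - 1 for a single group of repetitions
    temperature_c: float     # example environmental reading (hypothetical field)

def to_csv(records):
    """Serialize records to CSV text with one header row."""
    buf = io.StringIO()
    names = [f.name for f in fields(CheckStandardRecord)]
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()

# Invented sample record for J = 4 repetitions.
rec = CheckStandardRecord(3, 14, 2005, "OP1", "CS1", "G1", 10.0012, 0.0004, 3, 20.1)
text = to_csv([rec])
```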
From the model above, the standard deviation of the error with $LK(J - 1)$ degrees of freedom is estimated
using the mean square for random errors, $MS_E$, which is calculated as follows:
$$\hat{\sigma} = S = \sqrt{MS_E} = \sqrt{\frac{\sum_{l=1}^{L}\sum_{k=1}^{K}\sum_{j=1}^{J}\left(Y_{lkj} - \bar{Y}_{lk\bullet}\right)^2}{LK(J - 1)}}$$
where
$$\bar{Y}_{lk\bullet} = \frac{1}{J}\sum_{j=1}^{J} Y_{lkj}$$
is the average from each group of J repetitions.
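Since each recorded line holds the standard deviation of one group of J repetitions, the error mean square can be computed from those records alone: in a balanced design it is the average of the group variances. A minimal sketch, with a hypothetical function name and invented input values:

```python
import math

def pooled_error_sd(group_sds, J):
    """Pool the recorded standard deviations of L*K groups of J repetitions each."""
    n_groups = len(group_sds)                           # equals L*K
    ss_error = sum((J - 1) * s ** 2 for s in group_sds)  # within-group sum of squares
    df_error = n_groups * (J - 1)                       # LK(J - 1)
    ms_error = ss_error / df_error
    return math.sqrt(ms_error), df_error

# Invented standard deviations for L*K = 6 groups of J = 4 repetitions.
s, df = pooled_error_sd([0.1, 0.2, 0.2, 0.1, 0.2, 0.2], J=4)
```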
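The mean square for the day effect, $MS_{D(R)}$, with $L(K - 1)$ degrees of freedom, is calculated as follows:
$$MS_{D(R)} = \frac{J\sum_{l=1}^{L}\sum_{k=1}^{K}\left(\bar{Y}_{lk\bullet} - \bar{Y}_{l\bullet\bullet}\right)^2}{L(K - 1)}$$
where
$$\bar{Y}_{l\bullet\bullet} = \frac{1}{K}\sum_{k=1}^{K}\bar{Y}_{lk\bullet}$$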
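The mean square for the run effect, $MS_R$, with $L - 1$ degrees of freedom is calculated as follows:
$$MS_R = \frac{JK\sum_{l=1}^{L}\left(\bar{Y}_{l\bullet\bullet} - \bar{Y}_{\bullet\bullet\bullet}\right)^2}{L - 1}$$
where
$$\bar{Y}_{\bullet\bullet\bullet} = \frac{1}{L}\sum_{l=1}^{L}\bar{Y}_{l\bullet\bullet}$$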
From the ANOVA Table 1, the estimator of the standard deviation for days is
$$\hat{\sigma}_D = S_D = \sqrt{\frac{MS_{D(R)} - MS_E}{J}}$$
and the estimator of the standard deviation for runs is
$$\hat{\sigma}_R = S_R = \sqrt{\frac{MS_R - MS_{D(R)}}{JK}}$$
if the differences under the square root sign are positive. Otherwise, $\hat{\sigma}_D$ or $\hat{\sigma}_R$ or both
is (are) taken as zero, as appropriate.
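The mean squares and estimators for the three-level design can be sketched in a few lines. This is a minimal illustration for balanced data held as a nested list y[l][k][j]; the function name, the dictionary keys and the example data are hypothetical.

```python
import math
from statistics import mean

def nested_anova_3level(y):
    """y[l][k][j]: balanced data, L runs, K days per run, J repetitions per day."""
    L, K, J = len(y), len(y[0]), len(y[0][0])
    day_means = [[mean(y[l][k]) for k in range(K)] for l in range(L)]
    run_means = [mean(day_means[l]) for l in range(L)]
    grand_mean = mean(run_means)

    ss_e = sum((y[l][k][j] - day_means[l][k]) ** 2
               for l in range(L) for k in range(K) for j in range(J))
    ss_d = J * sum((day_means[l][k] - run_means[l]) ** 2
                   for l in range(L) for k in range(K))
    ss_r = J * K * sum((run_means[l] - grand_mean) ** 2 for l in range(L))

    ms_e = ss_e / (L * K * (J - 1))
    ms_d = ss_d / (L * (K - 1))
    ms_r = ss_r / (L - 1)

    # A negative difference of mean squares gives an estimate of zero.
    s_e = math.sqrt(ms_e)
    s_d = math.sqrt(max(ms_d - ms_e, 0.0) / J)
    s_r = math.sqrt(max(ms_r - ms_d, 0.0) / (J * K))
    return {"MS_E": ms_e, "MS_D(R)": ms_d, "MS_R": ms_r,
            "S": s_e, "S_D": s_d, "S_R": s_r}

# Invented data with L = K = J = 2.
result = nested_anova_3level([[[1.0, 3.0], [5.0, 7.0]],
                              [[9.0, 11.0], [13.0, 15.0]]])
```

For the invented data shown, the mean squares are MS_E = 2, MS_D(R) = 16 and MS_R = 128, so S_D is the square root of (16 − 2)/2 and S_R the square root of (128 − 16)/4.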
Sometimes, a two-level nested design is suggested for collecting data on short-term and day-to-day
fluctuations in the measurement process. The data collected in this experiment are similar to those
collected on the check standard in the next section. If more than one check standard is used, the factor
“check standard” may be treated as a random factor, like the factor “run” in the three-level case; the
model and analysis are then the same.
5.2.3 Check standard for assessing two levels of variability
5.2.3.1 Check standard procedure
Measurements on a single check standard are recommended for studying the effect of sources of variability
that manifest themselves over time. Data collection and analysis are straightforward, and there is usually no
need to estimate interaction terms when dealing with time-dependent errors. The measurements are made at
two levels, which should be sufficient for characterizing many measurement systems. The following levels are
based on the characteristics of many measurement systems and should be adapted to a specific
measurement situation as required:
Level 1 measurements, taken over a short term to estimate gauge precision;
Level 2 measurements, taken over days to estimate longer-term variability.
A schedule for making check standard measurements over time (once a day, twice a week, or whatever is
appropriate for sampling all conditions of measurement) should be established and followed. The check
standard measurements should be structured in the same way as values reported on the test items. For
example, if the reported values are the averages of two repetitions made within 5 min of each other, the check
standard values should be averages of the two measurements made in the same manner. One exception to
this rule is that there should be at least J = 2 repetitions per day, etc. Without this redundancy, there is no way
to check the short-term precision of the measurement system.
5.2.3.2 Model
The statistical model that explains the sources of uncertainty being studied is a balanced two-level nested
design:
$$Y_{kj} = \mu + \delta_k + \varepsilon_{kj}$$
Measurements on the test items are denoted by $Y_{kj}$ ($k = 1, \ldots, K$; $j = 1, \ldots, J$) with the first index
identifying day and the second index the repetition number. The subscripted terms in the model represent
random effects in the
measurement process that fluctuate with days and short-term time intervals. The purpose of the experiment is
to estimate the variance components that quantify these sources of variability.
5.2.3.3 Time intervals
The two levels discussed in this subclause are based on the characteristics of many measurement systems
and can be adapted to a specific measurement situation as required. A typical design is shown in Figure 2,
where there are J = 4 repetitions per day with the following levels:
Level 1 J (J > 1) short-term repetitions to capture gauge precision;
Level 2 K (K > 1) days (or other appropriate time increment).
Figure 2 — Two-level nested design
5.2.3.4 Data collection
It is important that the design be truly nested as shown in Figure 2, so that repetitions are nested within days.
It is sufficient to record the average and standard deviation for each group of J repetitions, with the following
information:
a) month;
b) day;
c) year;
d) operator identification;
e) check standard identification;
f) gauge identification;
g) average of J repetitions;
h) repeatability standard deviation from J repetitions;
i) degrees of freedom;
j) environmental readings (if pertinent).
For this two-level nested design, the ANOVA table, Table 2, can be obtained from the three-level case.
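A hedged sketch of the two-level analysis follows. It assumes that the expected mean squares are those of the three-level table with the run row dropped, i.e. $\sigma^2$ for error and $\sigma^2 + J\sigma_D^2$ for days, and it works directly from the recorded daily averages and standard deviations. The function name and the input values are hypothetical.

```python
import math
from statistics import mean

def two_level_components(day_averages, day_sds, J):
    """Estimate repeatability and day-to-day standard deviations from K daily records."""
    K = len(day_averages)
    ms_e = mean([s ** 2 for s in day_sds])        # error mean square, K(J - 1) d.f.
    grand_mean = mean(day_averages)
    ms_d = J * sum((a - grand_mean) ** 2 for a in day_averages) / (K - 1)
    s_repeat = math.sqrt(ms_e)
    s_day = math.sqrt(max(ms_d - ms_e, 0.0) / J)  # zero if the difference is negative
    return s_repeat, s_day

# Invented records for K = 2 days with J = 2 repetitions each.
s_repeat, s_day = two_level_components([2.0, 6.0], [math.sqrt(2.0), math.sqrt(2.0)], J=2)
```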
...







