ASTM D6300-24
Standard Practice for Determination of Precision and Bias Data for Use in Test Methods for Petroleum Products, Liquid Fuels, and Lubricants
SIGNIFICANCE AND USE
5.1 ASTM test methods are frequently intended for use in the manufacture, selling, and buying of materials in accordance with specifications and therefore should provide such precision that when the test is properly performed by a competent operator, the results will be found satisfactory for judging the compliance of the material with the specification. Statements addressing precision and bias are required in ASTM test methods. These then give the user an idea of the precision of the resulting data and its relationship to an accepted reference material or source (if available). Statements addressing determinability are sometimes required as part of the test method procedure in order to provide early warning of a significant degradation of testing quality while processing any series of samples.
5.2 Repeatability and reproducibility are defined in the precision section of every Committee D02 test method. Determinability is defined above in Section 3. The relationship among the three measures of precision can be tabulated in terms of their different sources of variation (see Table 1).
5.2.1 When used, determinability is a mandatory part of the Procedure section. It will allow operators to check their technique for the sequence of operations specified. It also ensures that a result based on the set of determined values is not subject to excessive variability from that source.
5.3 A bias statement furnishes guidelines on the relationship between a set of test results and a related set of accepted reference values. When the bias of a test method is known, a compensating adjustment can be incorporated in the test method.
5.4 This practice is intended for use by D02 subcommittees in determining precision estimates and bias statements to be used in D02 test methods. Its procedures correspond with ISO 4259 and are the basis for the Committee D02 computer software, Calculation of Precision Data: Petroleum Test Methods. The use of this practice replaces that of Re...
SCOPE
1.1 This practice covers the necessary preparations and planning for the conduct of interlaboratory programs for the development of estimates of precision (determinability, repeatability, and reproducibility) and of bias (absolute and relative), and further presents the standard phraseology for incorporating such information into standard test methods.
1.2 This practice is generally limited to homogeneous petroleum products, liquid fuels, and lubricants with which serious sampling problems (such as heterogeneity or instability) do not normally arise.
1.3 This practice may not be suitable for products with sampling problems as described in 1.2, solid or semisolid products such as petroleum coke, industrial pitches, paraffin waxes, greases, or solid lubricants when the heterogeneous properties of the substances create sampling problems. In such instances, consult a trained statistician.
1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
General Information
- Status: Published
- Publication Date: 29-Feb-2024
- Technical Committee: D02 - Petroleum Products, Liquid Fuels, and Lubricants
- Drafting Committee: D02.94 - Coordinating Subcommittee on Quality Assurance and Statistics
Overview
ASTM D6300-24 is the Standard Practice established by ASTM International for the determination of precision and bias data for use in test methods for petroleum products, liquid fuels, and lubricants. This standard outlines the critical procedures and recommended phraseology to evaluate and report the repeatability, reproducibility, determinability, and bias of analytical test methods. Its primary application is to support standardized testing in contexts such as manufacturing, purchasing, and quality control, ensuring that results are reliable and comparable across laboratories.
ASTM D6300-24 supports alignment with international practice: its procedures correspond with ISO 4259, and it was developed in accordance with the principles on standardization issued by the World Trade Organization (WTO) Technical Barriers to Trade (TBT) Committee.
Key Topics
Precision Assessment
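- Repeatability: The difference between two independent results obtained under repeatability conditions (same method, identical test items, same laboratory, operator, and equipment, within short intervals of time) that would be exceeded about 5 % of the time.
- Reproducibility: The corresponding difference for results obtained on identical test items in different laboratories, by different operators using different equipment.
- Determinability: The difference between two successive determined values, obtained by the same operator in a given laboratory using the same apparatus for a series of operations leading to a single result, that would be exceeded about 5 % of the time.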
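All three measures above are stated as a difference that is exceeded about 5 % of the time. The sketch below shows how such a limit relates to a standard deviation, assuming normally distributed results and a known standard deviation. Under those assumptions the difference of two independent results has standard deviation sqrt(2) times that of one result, so the limit is about 2.77 standard deviations. The function names are hypothetical, and the practice's own calculation may differ, for example by allowing for the degrees of freedom of the estimate.

```python
import math
import random
from statistics import NormalDist


def limit_from_sd(sd, coverage=0.95):
    """Difference between two results exceeded with probability 1 - coverage.

    Assumes normal results with known standard deviation `sd` (an assumption
    of this sketch, not a statement of the practice's procedure).
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # two-sided normal quantile
    return z * math.sqrt(2) * sd


def exceedance_rate(sd, limit, pairs=100_000, seed=1):
    """Simulate pairs of results and count how often they differ by more than `limit`."""
    rng = random.Random(seed)
    hits = sum(
        abs(rng.gauss(0.0, sd) - rng.gauss(0.0, sd)) > limit
        for _ in range(pairs)
    )
    return hits / pairs


if __name__ == "__main__":
    lim = limit_from_sd(1.0)
    print(f"limit for sd = 1: {lim:.3f}")
    print(f"simulated exceedance rate: {exceedance_rate(1.0, lim):.4f}")
```

The simulated rate approaches 5 % only over many pairs, which is the "long run" sense in which "one case in 20" is to be read.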
Bias Determination
- Explains how to evaluate and report bias, the difference between the expectation of the test results and an accepted reference value.
- Gives guidance on incorporating a compensating adjustment in a test method when its bias is known.
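As an illustration of the two points above, the following sketch estimates bias as the mean of a set of results minus an accepted reference value, and applies a compensating adjustment to a later result. It is a simplified illustration, not the practice's procedure. The function names and the data in the usage section are hypothetical.

```python
import math
from statistics import mean, stdev


def estimate_bias(results, reference_value):
    """Return (bias, standard_error) for results against an accepted reference value.

    bias = mean(results) - reference_value; the standard error is that of the mean.
    """
    if len(results) < 2:
        raise ValueError("need at least two results to estimate a standard error")
    bias = mean(results) - reference_value
    std_error = stdev(results) / math.sqrt(len(results))
    return bias, std_error


def apply_adjustment(result, bias):
    """Compensate a single result for a known bias."""
    return result - bias


if __name__ == "__main__":
    # Hypothetical results on a material with accepted reference value 10.0
    bias, se = estimate_bias([10.2, 10.4, 10.3, 10.5, 10.1], 10.0)
    print(f"bias = {bias:.3f}, standard error = {se:.3f}")
    print(f"adjusted result = {apply_adjustment(10.4, bias):.3f}")
```

Comparing the bias with its standard error gives a rough sense of whether the offset is distinguishable from random error. A formal significance test is left out of this sketch.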
Interlaboratory Studies
- Provides requirements for planning, executing, and analyzing interlaboratory test programs.
- Specifies how to select representative samples and ensure sufficient data for statistical reliability.
- Details statistical methods such as analysis of variance (ANOVA), outlier detection, and transformation of the data to remove any dependence of precision on the level of the test result.
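The ANOVA step mentioned above can be illustrated with a minimal one-way sketch. It assumes a single sample and a balanced design in which every laboratory reports the same number of repeat results. The practice itself analyzes several samples together, so this is a simplified stand-in rather than its actual calculation. The function name and the data are hypothetical.

```python
from statistics import mean


def variance_components(lab_results):
    """Split total variance into repeatability and between-laboratory parts.

    lab_results: list of per-laboratory lists, each holding n repeat results
    on the same sample (balanced design assumed).
    Returns (var_repeatability, var_between_lab, var_reproducibility).
    """
    p = len(lab_results)
    n = len(lab_results[0])
    if p < 2 or n < 2 or any(len(r) != n for r in lab_results):
        raise ValueError("need >= 2 labs, each with the same number (>= 2) of results")

    lab_means = [mean(r) for r in lab_results]
    grand_mean = mean(lab_means)

    # Sums of squares: between laboratories and within laboratories
    ss_between = n * sum((m - grand_mean) ** 2 for m in lab_means)
    ss_within = sum(
        (x - m) ** 2 for r, m in zip(lab_results, lab_means) for x in r
    )

    # Mean squares: sum of squares divided by degrees of freedom
    ms_between = ss_between / (p - 1)
    ms_within = ss_within / (p * (n - 1))

    var_repeat = ms_within
    var_lab = max(0.0, (ms_between - ms_within) / n)  # negative estimates set to 0
    return var_repeat, var_lab, var_repeat + var_lab


if __name__ == "__main__":
    data = [[10.0, 10.2], [10.4, 10.6], [9.8, 10.0]]  # hypothetical results
    print(variance_components(data))
```

The square roots of the first and third returned values are the repeatability and reproducibility standard deviations for this simplified design.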
Standardizing Test Statements
- Presents standard language for precision and bias statements used in ASTM Committee D02 test methods.
- Defines how to address determinability for early detection of testing quality issues.
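A determinability check of the kind described above might look like the sketch below. It follows the practice's definition: two determined values that differ by more than the determinability are suspect, and with more than two determinations the most discordant one is checked against the mean of the remainder. Taking "most discordant" to mean the value farthest from the mean of the others is an assumption of this sketch. The function name is hypothetical.

```python
from statistics import mean


def determinations_suspect(values, determinability):
    """Return True if a set of determined values fails the determinability check."""
    if len(values) < 2:
        raise ValueError("need at least two determined values")
    if len(values) == 2:
        return abs(values[0] - values[1]) > determinability

    # Gap between each value and the mean of the remaining values;
    # the largest gap belongs to the most discordant determination.
    gaps = [
        abs(v - mean(values[:i] + values[i + 1:]))
        for i, v in enumerate(values)
    ]
    return max(gaps) > determinability


if __name__ == "__main__":
    print(determinations_suspect([1.00, 1.02], 0.05))        # within the limit
    print(determinations_suspect([1.00, 1.01, 1.20], 0.10))  # third value is discordant
```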
Applications
ASTM D6300-24 is especially relevant for industries and laboratories involved in:
Petroleum Product Testing
- Ensures conformity to quality and specification standards for fuels and lubricants.
- Supports compliance in manufacturing, selling, and buying transactions.
Regulatory Compliance
- Helps organizations meet contractual and legal obligations by referencing accepted precision and bias criteria in quality assurance programs.
Quality Assurance and Method Development
- Aids R&D and QA personnel in developing robust test methods and troubleshooting issues related to test variability or bias.
- Facilitates early warning and corrective action for significant degradation in testing performance.
Statistical Analysis in Laboratory Environments
- Streamlines the use of statistical tools for interlaboratory comparisons, enabling consistent interpretation of results across different settings.
Note: The standard is designed for use primarily with homogeneous petroleum products, liquid fuels, and lubricants. It may not be suitable for products with significant sampling challenges (e.g., petroleum coke, waxes, or solid lubricants), and specialized statistical consultation is recommended in such cases.
Related Standards
- ASTM D3244: Practice for Utilization of Test Data to Determine Conformance with Specifications
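- ASTM D6708: Practice for Statistical Assessment and Improvement of Expected Agreement Between Two Test Methods that Purport to Measure the Same Property of a Material
- ASTM E29: Practice for Using Significant Digits in Test Data to Determine Conformance with Specifications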
- ASTM E177: Practice for Use of the Terms Precision and Bias in ASTM Test Methods
- ASTM E691: Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
- ISO 4259: Petroleum Products - Determination and Application of Precision Data in Relation to Methods of Test
Organizations implementing ASTM D6300-24 can rely on these related standards to build a comprehensive quality and statistical control system for petroleum testing activities.
By following ASTM D6300-24, laboratories and industry stakeholders can establish and report the precision and bias of their test methods for petroleum products, liquid fuels, and lubricants in a consistent way, supporting both compliance and consistent product quality.
Frequently Asked Questions
ASTM D6300-24 is a standard published by ASTM International. Its full title is "Standard Practice for Determination of Precision and Bias Data for Use in Test Methods for Petroleum Products, Liquid Fuels, and Lubricants". Its significance, use, and scope are given in the sections above.
ASTM D6300-24 is classified under the following ICS (International Classification for Standards) categories: 03.120.20 - Product and company certification. Conformity assessment; 75.080 - Petroleum products in general. The ICS classification helps identify the subject area and facilitates finding related standards.
ASTM D6300-24 is linked to the following standards: ASTM D6300-23a, ASTM D3606-22, ASTM E456-13a(2022)e1, ASTM E456-13a(2022), ASTM D6708-21, ASTM D7501-22, ASTM D7157-23, ASTM D8267-19a, ASTM D7778-15(2022)e1, ASTM D7995-19, ASTM D8290-22, ASTM D7039-15a(2020), ASTM D6259-23, ASTM D7622-20, ASTM D7798-20. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the
Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: D6300 − 24 An American National Standard
Standard Practice for
Determination of Precision and Bias Data for Use in Test
Methods for Petroleum Products, Liquid Fuels, and
Lubricants
This standard is issued under the fixed designation D6300; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
INTRODUCTION
Both Research Report RR:D02-1007, Manual on Determining Precision Data for ASTM Methods
on Petroleum Products and Lubricants, and ISO 4259 benefitted greatly from more than 50 years
of collaboration between ASTM and the Institute of Petroleum (IP) in the UK. The more recent work
was documented by the IP and has become ISO 4259.
ISO 4259 encompasses both the determination of precision and the application of such precision
data. In effect, it combines the type of information in RR:D02-1007 regarding the determination of
the precision estimates and the type of information in Practice D3244 for the utilization of test data.
The following practice, intended to replace RR:D02-1007, differs slightly from related portions of the
ISO standard.
1. Scope*
1.1 This practice covers the necessary preparations and planning for the conduct of interlaboratory programs for the development of estimates of precision (determinability, repeatability, and reproducibility) and of bias (absolute and relative), and further presents the standard phraseology for incorporating such information into standard test methods.
1.2 This practice is generally limited to homogeneous petroleum products, liquid fuels, and lubricants with which serious sampling problems (such as heterogeneity or instability) do not normally arise.
1.3 This practice may not be suitable for products with sampling problems as described in 1.2, solid or semisolid products such as petroleum coke, industrial pitches, paraffin waxes, greases, or solid lubricants when the heterogeneous properties of the substances create sampling problems. In such instances, consult a trained statistician.
1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
D3244 Practice for Utilization of Test Data to Determine Conformance with Specifications
D3606 Test Method for Determination of Benzene and Toluene in Spark Ignition Fuels by Gas Chromatography
D6708 Practice for Statistical Assessment and Improvement of Expected Agreement Between Two Test Methods that Purport to Measure the Same Property of a Material
D7915 Practice for Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set
E29 Practice for Using Significant Digits in Test Data to Determine Conformance with Specifications
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E456 Terminology Relating to Quality and Statistics
E691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics. Current edition approved March 1, 2024. Published March 2024. Originally approved in 1998. Last previous edition approved in 2023 as D6300 – 23a. DOI: 10.1520/D6300-24.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Supporting data have been filed at ASTM International Headquarters and may be obtained by requesting Research Report RR:D02-1007. Contact ASTM Customer Service at service@astm.org.
*A Summary of Changes section appears at the end of this standard
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
D6300 − 24
2.2 ISO Standards: this practice, precision statements are framed in terms of
ISO 4259 Petroleum Products-Determination and Applica- repeatability and reproducibility of the test method.
tion of Precision Data in Relation to Methods of Test
3.1.9.1 Discussion—The testing conditions represented by
repeatability and reproducibility should reflect the normal
3. Terminology
extremes of variability under which the test is commonly used.
3.1 Definitions: Repeatability conditions are those showing the least variation;
3.1.1 analysis of variance (ANOVA), n—technique that en-
reproducibility, the usual maximum degree of variability. Refer
ables the total variance of a method to be broken down into its to the definitions of each of these terms for greater detail.
component factors. ISO 4259
RR:D02–1007
3.1.2 bias, n—the difference between the expectation of the
3.1.10 random error, n—the chance variation encountered in
test results and an accepted reference value.
all test work despite the closest control of variables.
3.1.2.1 Discussion—The term “expectation” is used in the
RR:D02–1007
context of statistics terminology, which implies it is a “statis-
3.1.11 repeatability (a.k.a. Repeatability Limit), n—a quan-
tical expectation.” E177
titative expression for the random error associated with the
3.1.3 between-method bias (relative bias), n—a quantitative
difference between two independent results obtained under
expression for the mathematical correction that can statistically
repeatability conditions that would be exceeded about 5 % of
improve the degree of agreement between the expected values
the time (one case in 20 in the long run) in the normal and
of two test methods which purport to measure the same
correct operation of the test method.
property. D6708
3.1.11.1 Discussion—Interpret as the limit value the abso-
lute difference between two single test results obtained under
3.1.4 degrees of freedom, n—the divisor used in the calcu-
lation of variance, one less than the number of independent repeatability conditions is expected to exceed with an approxi-
results. mate probability of 5 %.
3.1.4.1 Discussion—This definition applies strictly only in 3.1.11.2 Discussion—The difference is related to the repeat-
the simplest cases. Complete definitions are beyond the scope
ability standard deviation but it is not the standard deviation or
of this practice. ISO 4259 its estimate.
3.1.11.3 Discussion—In 3.1.11 and 3.1.13, the term “prob-
3.1.5 determinability, n—a quantitative measure of the vari-
ability” quantifies the likelihood of repeatability or reproduc-
ability associated with the same operator in a given laboratory
ibility limit exceedance for the difference between a single pair
obtaining successive determined values using the same appa-
of results obtained under the respective conditions. The "one
ratus for a series of operations leading to a single result; it is
case in 20 in the long run" in the parenthesis is not to be
defined as the difference between two such single determined
interpreted as one case in every 20, but it is over the long run.
values that would be exceeded about 5 % of the time (one case
The long run concept can be illustrated using 10 cases out of
in 20 in the long run) in the normal and correct operation of the
200, or 100 cases out of 2000, or 1000 cases in 20 000. The
test method.
lowest numerical values of one case in 20 is used here.
3.1.5.1 Discussion—This definition implies that two deter-
3.1.11.4 Discussion—The "one case in 20" is a legacy term
mined values, obtained under determinability conditions,
that was carried over from RR:D02-1007 in the original
which differ by more than the determinability value should be
development of Practice D6300. RR:D02–1007
considered suspect. If an operator obtains more than two
determinations, then it would usually be satisfactory to check
3.1.12 repeatability conditions, n—conditions where inde-
the most discordant determination against the mean of the
pendent test results are obtained with the same method on
remainder, using determinability as the critical difference (1).
identical test items in the same laboratory by the same operator
3.1.6 mean square, n—in analysis of variance, sum of using the same equipment within short intervals of time. E177
squares divided by the degrees of freedom. ISO 4259
3.1.13 reproducibility (a.k.a. Reproducibility Limit), n—a
3.1.7 normal distribution, n—the distribution that has the
quantitative expression for the random error associated with
probability function x, such that, if x is any real number, the
the difference between two independent results obtained under
probability density is
reproducibility conditions that would be exceeded about 5 % of
21/2 2 2 the time (one case in 20 in the long run) in the normal and
f~x! 5 ~1/σ!~2π! exp@2~x 2 μ! /2σ # (1)
correct operation of the test method.
NOTE 1—μ is the true value and σ is the standard deviation of the
normal distribution (σ > 0). ISO 4259 3.1.13.1 Discussion—Interpret as the limit value the abso-
lute difference between two single test results obtained under
3.1.8 outlier, n—a result far enough in magnitude from other
reproducibility conditions is expected to exceed with an
results to be considered not a part of the set. RR:D02–1007
approximate a probability of 5 %.
3.1.9 precision, n—the degree of agreement between two or
3.1.13.2 Discussion—The difference is related to the repro-
more results on the same property of identical test material. In
ducibility standard deviation but is not the standard deviation
or its estimate. RR:D02–1007
Available from American National Standards Institute (ANSI), 25 W. 43rd St.,
3.1.13.3 Discussion—In those cases where the normal use
4th Floor, New York, NY 10036, http://www.ansi.org.
of the test method does not involve sending a sample to a
The bold numbers in parentheses refers to the list of references at the end of this
standard. testing laboratory, either because it is an in-line test method or
D6300 − 24
because of serious sample instabilities or similar reasons, the 3.2.4 result, n—the final value obtained by following the
precision test for obtaining reproducibility may allow for the complete set of instructions in the test method.
use of apparatus from the participating laboratories at a 3.2.4.1 Discussion—It may be obtained from a single deter-
common site (several common sites, if feasible). The statistical mination or from several determinations, depending on the
analysis is not affected thereby. However, the interpretation of instructions in the method. When rounding off results, the
the reproducibility value will be affected since the test data is procedures described in Practice E29 shall be used.
collected under intermediate precision conditions as defined in
4. Summary of Practice
Practice E177, and therefore, the precision statement shall, in
this case, state the conditions to which the reproducibility value
4.1 A draft of the test method is prepared and a pilot
applies, and label this precision in a manner consistent with
program can be conducted to verify details of the procedure
how the test data is obtained.
and to estimate roughly the precision of the test method.
4.1.1 If the responsible committee decides that an interla-
NOTE 2—The reproducibility precision outcome from 3.1.13.3 is a form
of Intermediate Precision as defined in Practice E177. boratory study for the test method is to take place at a later
point in time, an interim repeatability is estimated by following
3.1.14 reproducibility conditions, n—conditions where in-
the requirements in 6.2.1.
dependent test results are obtained with the same method on
identical test items in different laboratories with different
4.2 A plan is developed for the interlaboratory study using
operators using different equipment.
the number of participating laboratories to determine the
number of samples needed to provide the necessary degrees of
NOTE 3—Different laboratory by necessity means a different operator,
freedom. Samples are acquired and distributed. The interlabo-
different equipment, and different location and under different supervisory
ratory study is then conducted on an agreed draft of the test
control. E177
method.
3.1.15 standard deviation, n—measure of the dispersion of a series of results around their mean, equal to the square root of the variance and estimated by the positive square root of the mean square. ISO 4259
3.1.16 sum of squares, n—in analysis of variance, sum of squares of the differences between a series of results and their mean. ISO 4259
3.1.17 variance, n—a measure of the dispersion of a series of accepted results about their average. It is equal to the sum of the squares of the deviation of each result from the average, divided by the number of degrees of freedom. RR:D02–1007
3.1.18 variance, between-laboratory, n—that component of the overall variance due to the difference in the mean values obtained by different laboratories. ISO 4259
3.1.18.1 Discussion—When results obtained by more than one laboratory are compared, the scatter is usually wider than when the same number of tests are carried out by a single laboratory, and there is some variation between means obtained by different laboratories. Differences in operator technique, instrumentation, environment, and sample “as received” are among the factors that can affect the between laboratory variance. There is a corresponding definition for between-operator variance.
3.1.18.2 Discussion—The term “between-laboratory” is often shortened to “laboratory” when used to qualify representative parameters of the dispersion of the population of results, for example as “laboratory variance.”
3.2 Definitions of Terms Specific to This Standard:
3.2.1 determination, n—the process of carrying out a series of operations specified in the test method whereby a single value is obtained.
3.2.2 operator, n—a person who carries out a particular test.
3.2.3 probability density function, n—function which yields the probability that the random variable takes on any one of its admissible values; here, we are interested only in the normal probability.
4.3 The data are summarized and analyzed. Any dependence of precision on the level of test result is removed by transformation. The resulting data are inspected for uniformity and for outliers. Any missing and rejected data are estimated. The transformation is confirmed. Finally, an analysis of variance is performed, followed by calculation of repeatability, reproducibility, and bias. When it forms a necessary part of the test procedure, the determinability is also calculated.
5. Significance and Use
5.1 ASTM test methods are frequently intended for use in the manufacture, selling, and buying of materials in accordance with specifications and therefore should provide such precision that when the test is properly performed by a competent operator, the results will be found satisfactory for judging the compliance of the material with the specification. Statements addressing precision and bias are required in ASTM test methods. These then give the user an idea of the precision of the resulting data and its relationship to an accepted reference material or source (if available). Statements addressing determinability are sometimes required as part of the test method procedure in order to provide early warning of a significant degradation of testing quality while processing any series of samples.
5.2 Repeatability and reproducibility are defined in the precision section of every Committee D02 test method. Determinability is defined above in Section 3. The relationship among the three measures of precision can be tabulated in terms of their different sources of variation (see Table 1).
5.2.1 When used, determinability is a mandatory part of the Procedure section. It will allow operators to check their technique for the sequence of operations specified. It also ensures that a result based on the set of determined values is not subject to excessive variability from that source.
5.3 A bias statement furnishes guidelines on the relationship between a set of test results and a related set of accepted
D6300 − 24
TABLE 1 Sources of Variation
                                Method      Apparatus  Operator   Laboratory  Time
Reproducibility (Result)        Complete    Different  Different  Different   Not Specified
Repeatability (Result)          Complete    Same       Same       Same        Almost same
Determinability (Part result)   Incomplete  Same       Same       Same        Almost same
reference values. When the bias of a test method is known, a compensating adjustment can be incorporated in the test method.
5.4 This practice is intended for use by D02 subcommittees in determining precision estimates and bias statements to be used in D02 test methods. Its procedures correspond with ISO 4259 and are the basis for the Committee D02 computer software, Calculation of Precision Data: Petroleum Test Methods. The use of this practice replaces that of Research Report RR:D02-1007.
5.5 Standard practices for the calculation of precision have been written by many committees with emphasis on their particular product area. One developed by Committee E11 on Statistics is Practice E691. Practice E691 and this practice differ as outlined in Table 2.
TABLE 2 Differences in Calculation of Precision in Practices D6300 and E691
Element                       This Practice                            Practice E691
Number of replicates          Two                                      Any number
Precision is written for      Test method                              Each sample
Outlier tests:                Sequential                               Simultaneous
  Within laboratories         Cochran test                             k-value
  Between laboratories        Hawkins test                             h-value
Outliers                      Rejected, subject to subcommittee        Rejected if many laboratories or for
                              approval.                                cause such as blunder or not
                                                                       following method.
                              Retesting not generally permitted.       Laboratory may retest sample having
                                                                       rejected data.
Analysis of variance          Two-way, applied globally to all the     One-way, applied to each sample
                              remaining data at once.                  separately.
Precision multiplier          $t\sqrt{2}$, where t is the two-tailed   $2.8 = 1.96\sqrt{2}$
                              Student’s t for 95 % probability.
                              Increases with decreasing                Constant.
                              laboratories × samples particularly
                              below 12.
Variation of precision        Minimized by data transformation.        User may assess from individual
with level                    Equations for repeatability and          sample precisions.
                              reproducibility are generated in the
                              retransformation process.
6. Stages in Planning of an Interlaboratory Test Program for the Determination of the Precision of a Test Method
6.1 The stages in planning an interlaboratory test program are: preparing a draft method of test (see 6.2), planning and executing a pilot program with at least two laboratories (optional but recommended for new test methods) (see 6.3), planning the interlaboratory program (see 6.4), and executing the interlaboratory program (see 6.5). The four stages are described in turn.
6.2 Preparing a Draft Method of Test—This shall contain all the necessary details for carrying out the test and reporting the results. Any condition which could alter the results shall be specified. The section on precision will be included at this stage only as a heading.
6.2.1 Interim Repeatability Study—If the responsible committee decides that an interlaboratory study for the test method is to take place at a later point in time, using this standard, an interim repeatability standard deviation is estimated by following the steps as outlined below. This interim repeatability standard deviation can be used to meet ASTM Form and Style Requirement A21.5.1. When the committee is ready to proceed with the ILS, continue with this practice from 6.3 onwards.
6.2.1.1 Design—The following minimum requirements shall be met:
(1) Three (3) samples, compositionally representative of the majority of materials within the design envelope of the test method, covering the low, medium, and high regions of the intended test method range.
(2) Twelve (12) replicates per sample, obtained under repeatability conditions in a single laboratory.
6.2.1.2 Analysis—Carry out the following analyses in the order presented:
(1) Perform GESD Outlier Rejection as per Practice D7915 for each sample.
(2) Calculate sample variance (v) and standard deviation (s) for each sample using non-rejected results.
(3) Perform the Hartley test for variance equality as follows: calculate the ratio $F_{max} = v_{max}/v_{min}$, where $v_{max}$ and $v_{min}$ are the largest and smallest variance obtained.
(4) If $F_{max}$ is less than 4.85, estimate the interim repeatability standard deviation of the test method by taking the square root of the average variance calculated using individual variances from all samples as illustrated below using three samples:
Interim repeatability standard deviation = $[(v_1 + v_2 + v_3)/3]^{0.5}$, where $v_1$, $v_2$, $v_3$ are variances for each sample; it
should be noted that if the number of non-outlying results used to calculate the variances are not the same, this equation provides an approximation only, but is suitable for the intended purpose.
(5) If $F_{max}$ exceeds 4.85, list the averages and associated repeatability standard deviations for each sample separately.
(6) If $F_{max}$ exceeds 4.85, and $v_{max}$ is associated with the sample with the lowest average, calculate the following ratio: $[10\, s_{max}]/\mathrm{average}_{sample}$, where $s_{max}$ is $(v_{max})^{0.5}$, and $\mathrm{average}_{sample}$ is the average of the sample. If this ratio is near or exceeds 1, then it is likely that this sample is at or below the limit of quantitation of the test method. If this ratio is far below 1, it is likely this is a sample-specific effect. Method developers should investigate and take appropriate steps to revise the test method scope or improve the test method precision at the low limit prior to the conduct of a full ILS.
(7) If the sample set design meets the requirement in 6.4.2, the methodology in Appendix X2 can be used to estimate an interim repeatability function by treating the repeats per sample as results from ‘pseudo-laboratories’ without repeats.
NOTE 4—It is highly recommended that 6.2.1.2(7) be conducted under the guidance of a statistician familiar with the methodology in Appendix X2.
6.2.1.3 Validation of Interim Repeatability Study by Another Laboratory—It is highly recommended that the findings from the interim repeatability study be validated by conducting a similar study at another laboratory. If the findings from the validation study do not support the functional form (constant or per Appendix X2) of the interim repeatability study obtained by the initial laboratory, or if the ratio
$$\frac{\text{interim repeatability standard deviation from lab A}}{\text{interim repeatability standard deviation from lab B}}$$
exceeds 2.4, it can be concluded that the findings from one laboratory cannot be validated by another laboratory. The larger of the two standard deviation values is placed in the numerator: lab A if its repeatability standard deviation is numerically larger than that of lab B; otherwise lab B in the numerator and lab A in the denominator. The method developer is advised to consult a statistician and subject matter experts to decide on which laboratory findings are to be used.
6.3 Planning and Executing a Pilot Program with at Least Two Laboratories:
6.3.1 A pilot program is recommended to be used with new test methods for the following reasons: (1) to verify the details in the operation of the test; (2) to find out how well operators can follow the instructions of the test method; (3) to check the precautions regarding sample handling and storage; and (4) to estimate roughly the precision of the test.
6.3.2 At least two samples are required, covering the range of results to which the test is intended to apply; however, include at least 12 laboratory-sample combinations. Test each sample twice by each laboratory under repeatability conditions. If any omissions or inaccuracies in the draft method are revealed, they shall now be corrected. Analyze the results for precision, bias, and determinability (if applicable) using this practice. If any are considered to be too large for the technical application, then consider alterations to the test method.
6.4 Planning the Interlaboratory Program:
6.4.1 There shall be at least six (6) participating laboratories, but it is recommended this number be increased to eight (8) or more in order to ensure the final precision is based on at least six (6) laboratories and to make the precision statement more representative of the qualified user population.
6.4.2 The number of samples shall be sufficient to cover the range of the property measured, and to give reliability to the precision estimates. If any variation of precision with level was observed in the results of the pilot program, then at least six samples, spanning the range of the test method in a manner that ensures the leverage (h) of each sample (see Eq 2) does not exceed 4/n (rounded to 1st decimal) shall be used in the interlaboratory program. In any case, it is necessary to obtain at least 30 degrees of freedom in both repeatability and reproducibility. For repeatability, this means obtaining a total of at least 30 pairs of results in the program. In the absence of pilot test program information to permit use of Fig. 1 (see 6.4.3) to determine the number of samples, the number of samples shall be greater than five, and chosen such that the number of laboratories times the number of samples is greater than or equal to 42.
Leverage calculation:
$$h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{k=1}^{n} (x_k - \bar{x})^2} \qquad (2)$$
where:
$h_{ii}$ = leverage of sample i,
n = total number of planned samples,
$p_i$ = planned property level for sample i,
$x_i$ = $\ln(p_i)$, and
$\bar{x}$ = grand average of all $x_i$.
6.4.3 For reproducibility, Fig. 1 gives the minimum number of samples required in terms of L, P, and Q, where L is the number of participating laboratories, and P and Q are the ratios of variance component estimates (see 8.3.1) obtained from the pilot program. Specifically, P is the ratio of the interaction component to the repeats component, and Q is the ratio of the laboratories component to the repeats component.
NOTE 5—Appendix X1 gives the derivation of the equation used. If Q is much larger than P, then 30 degrees of freedom cannot be achieved; the blank entries in Fig. 1 correspond to this situation or the approach of it (that is, when more than 20 samples are required). For these cases, there is likely to be a significant bias between laboratories. The program organizer shall be informed; further standardization of the test method may be necessary.
6.5 Executing the Interlaboratory Program:
6.5.1 One person shall oversee the entire program, from the distribution of the texts and samples to the final appraisal of the results. He or she shall be familiar with the test method, but should not personally take part in the actual running of the tests.
6.5.2 The text of the test method shall be distributed to all the laboratories in time to raise any queries before the tests begin. If any laboratory wants to practice the test method in advance, this shall be done with samples other than those used in the program.
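The interim repeatability analysis of 6.2.1.2, steps (2) to (5), can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the practice: the function name, the dictionary layout, and the replicate values are hypothetical, and outlier screening per Practice D7915 is assumed to have been carried out beforehand.

```python
import math
import statistics

F_MAX_LIMIT = 4.85  # Hartley limit quoted in 6.2.1.2(4)


def interim_repeatability(samples):
    """Apply steps (2) to (5) of 6.2.1.2 to outlier-screened replicates.

    `samples` maps a sample label to its list of retained results.
    """
    # Step (2): sample variance for each sample.
    variances = {name: statistics.variance(vals) for name, vals in samples.items()}
    v_max = max(variances.values())
    v_min = min(variances.values())
    # Step (3): Hartley ratio. A zero smallest variance is treated as unequal variances.
    f_max = v_max / v_min if v_min > 0 else math.inf

    if f_max < F_MAX_LIMIT:
        # Step (4): square root of the average of the sample variances.
        pooled_sd = math.sqrt(sum(variances.values()) / len(variances))
        return {"f_max": f_max, "pooled_sd": pooled_sd, "per_sample": None}

    # Step (5): list each sample's average and standard deviation separately.
    per_sample = {
        name: (statistics.mean(vals), math.sqrt(variances[name]))
        for name, vals in samples.items()
    }
    return {"f_max": f_max, "pooled_sd": None, "per_sample": per_sample}


# Hypothetical data: three samples, twelve replicates each.
example = {
    "low": [1.0, 1.2] * 6,
    "medium": [5.0, 5.2] * 6,
    "high": [9.0, 9.3] * 6,
}
result = interim_repeatability(example)
print(result)
```

The averaging in step (4) is exact only when every sample retains the same number of results; with unequal counts it is the approximation that 6.2.1.2(4) describes.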
FIG. 1 Determination of Number of Samples Required (see 6.4.3)
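Before Fig. 1 is consulted, the sample-planning checks of 6.4.2 can be sketched numerically. The Python below is a hedged illustration under stated assumptions: the function names and the planned property levels are hypothetical, and the limit is read as 4/n rounded to one decimal place, which is one interpretation of the wording in 6.4.2.

```python
import math


def leverages(planned_levels):
    """Leverage h_ii of each planned sample per Eq 2, with x_i = ln(p_i)."""
    x = [math.log(p) for p in planned_levels]
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("planned levels must not all be identical")
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def leverage_design_ok(planned_levels):
    """True when no sample's leverage exceeds 4/n rounded to one decimal."""
    limit = round(4 / len(planned_levels), 1)
    return all(h <= limit for h in leverages(planned_levels))


def default_sample_count_ok(n_labs, n_samples):
    """Fallback rule of 6.4.2 when no pilot data permit the use of Fig. 1."""
    return n_samples > 5 and n_labs * n_samples >= 42


# Hypothetical design: six samples spaced geometrically over the method range.
levels = [1, 2, 4, 8, 16, 32]
h = leverages(levels)
print([round(v, 3) for v in h])
print(leverage_design_ok(levels))
print(default_sample_count_ok(6, 7), default_sample_count_ok(6, 6))
```

Even spacing on the logarithmic scale keeps the end samples from dominating the fit, which is the reason the leverage is computed on ln(p) rather than on p.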
6.5.3 The samples shall be accumulated, subdivided, and distributed by the organizer, who shall also keep a reserve of each sample for emergencies. It is most important that the individual laboratory portions be homogeneous. Instructions to each laboratory shall include the following:
6.5.3.1 Testing Protocol—The protocol to be used for testing of the ILS sample set shall be provided. Factors that may affect test method outcome but are not intended to be controlled in the normal execution of the test method shall not be intentionally removed nor controlled in the testing of the ILS samples, unless explicitly permitted by the sponsoring subcommittee of the ILS for special studies where certain factors are controlled intentionally as part of the testing protocol to meet the intended ILS study objectives. To remove, control, or set limits on factors that are not intended to be controlled in the normal execution of the test method in the conduct of an ILS
that is intended for the precision evaluation of the test method executed under normal operating conditions will result in overly optimistic precision. Precision statements thus generated will likely be unattainable by the majority of users in the normal execution of the test method.
6.5.3.2 The agreed draft method of test;
6.5.3.3 Material Safety Data Sheets, where applicable, and the handling and storage requirements for the samples;
6.5.3.4 The order in which the samples are to be tested (a different random order for each laboratory);
6.5.3.5 The statement that two test results are to be obtained in the shortest practical period of time on each sample by the same operator with the same apparatus. For statistical reasons it is imperative that the two results are obtained independently of each other, that is, that the second result is not biased by knowledge of the first. If this is regarded as impossible to achieve with the operator concerned, then the pairs of results shall be obtained in a blind fashion, but ensuring that they are carried out in a short period of time (preferably the same day). The term blind fashion means that the operator does not know that the sample is a replicate of any previous run.
6.5.3.6 The period of time during which repeated results are to be obtained and the period of time during which all the samples are to be tested;
6.5.3.7 A blank form for reporting the results. For each sample, there shall be space for the date of testing, the two results, and any unusual occurrences. The unit of accuracy for reporting the results shall be specified. This should be, if possible, more digits reported than will be used in the final test method, in order to avoid having rounding unduly affect the estimated precision values.
6.5.3.8 When it is required to estimate the determinability, the report form must include space for each of the determined values as well as the test results.
6.5.3.9 A statement that the test shall be carried out under normal conditions, using operators with good experience but not exceptional knowledge; and that the duration of the test shall be the same as normal.
6.5.4 The pilot program operators may take part in the interlaboratory program. If their extra experience in testing a few more samples produces a noticeable effect, it will serve as a warning that the test method is not satisfactory. They shall be identified in the report of the results so that any such effect may be noted.
6.5.5 It cannot be overemphasized that the statement of precision in the test method is to apply to test results obtained by running the agreed procedure exactly as written. Therefore, the test method must not be significantly altered after its precision statement is written.
7. Inspection of Interlaboratory Results for Uniformity and for Outliers
7.1 Introduction:
7.1.1 This section specifies procedures for examining the results reported in a statistically designed interlaboratory program (see Section 6) to establish:
7.1.1.1 The independence or dependence of precision and the level of results;
7.1.1.2 The uniformity of precision from laboratory to laboratory, and to detect the presence of outliers.
NOTE 6—The procedures are described in mathematical terms based on the notation of Annex A1 and illustrated with reference to the example data (calculation of bromine number) set out in Annex A2. Throughout this section (and Section 8), the procedures to be used are first specified and then illustrated by a worked example using data given in Annex A2.
NOTE 7—It is assumed throughout this section that all the deviations are either from a single normal distribution or capable of being transformed into such a distribution (see 7.2). Other cases (which are rare) would require different treatment that is beyond the scope of this practice. Also, see (2) for a statistical test of normality.
7.2 Transformation of Data:
7.2.1 In many test methods the precision depends on the level of the test result, and thus the variability of the reported results is different from sample to sample. The method of analysis outlined in this practice requires that this shall not be so and the position is rectified, if necessary, by a transformation.
7.2.1.1 Prior to commencement of analysis to determine if transformation is necessary, it is a good practice to examine information gathered from ILS participants to determine compliance with agreed upon ILS protocol and method of test. As part of this examination, the raw data as reported should be inspected for existence of extreme or outlandish values that are visually obvious. Exclusion of extreme or outlandish results from transformation analysis is recommended if assignable causes can be found in order to help ensure test data dependability, transformation reliability, and subsequent computation efficiency. If assignable causes cannot be found, exclusion of extreme or outlandish results from transformation analysis should be confirmed for each sample using a formal statistical test such as the Generalized Extreme Studentized Deviate (GESD) multi-outlier technique (see Practice D7915) or other technically equivalent techniques at the 99 % confidence level on the difference and average (or sum) of the two replicate results as submitted by each ILS participant for each sample as follows:
(1) Compute the difference of the two replicates submitted by each participant for the sample;
(2) Perform GESD on the differences from all participants for the sample;
(3) For each difference that is identified as outlier, reject the result that is farthest from the median of all results for that sample;
(4) Compute the average (or sum) of the two replicates for each participant for the sample; for participants who submitted only a single result, or, if one of the submitted replicates is rejected in (3), use the remaining result as the average (or 2 × the remaining result as sum) for the participant;
(5) Perform GESD on the averages (or sums) from all participants for the sample;
(6) Reject all results identified as outliers in (5); and
(7) Continue execution of the remainder of this practice using the retained results.
It is recommended that such statistical tests be conducted under the guidance of a statistician.
7.2.2 The laboratories’ standard deviations $D_j$, and the repeats standard deviations $d_j$ (see Annex A1) are calculated
and plotted separately against the sample means $m_j$. If the points so plotted may be considered as lying about a pair of lines parallel to the m-axis, then no transformation is necessary. If, however, the plotted points describe non-horizontal straight lines or curves of the form $D = f_1(m)$ and $d = f_2(m)$, then a transformation will be necessary.
7.2.3 The relationships $D = f_1(m)$ and $d = f_2(m)$ will not in general be identical. It is frequently the case, however, that the ratios $u_j = d_j/D_j$ are approximately the same for all $m_j$, in which case $f_1$ is approximately proportional to $f_2$ and a single transformation will be adequate for both repeatability and reproducibility. The statistical procedures of this practice are greatly facilitated when a single transformation can be used. For this reason, unless the $u_j$ clearly vary with property level, the two relationships are combined into a single dependency relationship D = f(m) (where D now includes d) by including a dummy variable T. This will take account of the difference between the relationships, if one exists, and will provide a means of testing for this difference (see A4.1).
7.2.4 In the event that the ratios $u_j$ do vary with level (mean, $m_j$), as confirmed with a regression of $u_j$ on $m_j$, or $\log(u_j)$ on $\log(m_j)$, follow the instructions in Annex A5. Otherwise, continue with 7.2.5.
7.2.5 The single relationship D = f(m) is best estimated by weighted linear regression analysis. Strictly speaking, an iteratively weighted regression should be used, but in most cases even an unweighted regression will give a satisfactory approximation. The derivation of weights is described in A4.2, and the computational procedure for the regression analysis is described in A4.3. Typical forms of dependence D = f(m) are given in A3.1. These are all expressed in terms of at most two (2) transformation parameters, $B$ and $B_0$.
7.2.6 The typical forms of dependence, the transformations they give rise to, and the regressions to be performed in order to estimate the transformation parameters B, are all summarized in A3.2. This includes statistical tests for the significance of the regression (that is, is the relationship D = f(m) parallel to the m-axis), and for the difference between the repeatability and reproducibility relationships, based at the 5 % significance level. If such a difference is found to exist, follow the procedures in Annex A5.
7.2.7 If it has been shown at the 5 % significance level that there is a significant regression of the form D = f(m), then the appropriate transformation y = F(x), where x is the reported result, is given by the equation
$$F(x) = K \int \frac{dx}{f(x)} \qquad (3)$$
where K = a constant. In that event, all results shall be transformed accordingly and the remainder of the analysis carried out in terms of the transformed results. Typical transformations are given in A3.1.
7.2.8 The choice of transformation is difficult to make the subject of formalized rules. Qualified statistical assistance may be required in particular cases. The presence of outliers may affect judgement as to the type of transformation required, if any (see 7.7).
7.2.9 Worked Example:
7.2.9.1 Table 3 lists the values of m, D, and d for the eight samples in the example given in Annex A2, correct to three significant digits. Corresponding degrees of freedom are in parentheses. Inspection of the values in Table 3 shows that both D and d increase with m, the rate of increase diminishing as m increases. A plot of these figures on log-log paper (that is, a graph of log D and log d against log m) shows that the points may reasonably be considered as lying about two straight lines (see Fig. A4.1 in Annex A4). From the example calculations given in A4.4, the gradients of these lines are shown to be the same, with an estimated value of 0.638. Bearing in mind the errors in this estimated value, the gradient may for convenience be taken as 2/3.
7.2.9.2 Hence, the same transformation is appropriate both for repeatability and reproducibility, and is given by the equation
$$\int x^{-2/3}\,dx = 3x^{1/3} \qquad (4)$$
Since the constant multiplier may be ignored, the transformation thus reduces to that of taking the cube roots of the reported bromine numbers. This yields the transformed data shown in Table A1.3, in which the cube roots are quoted correct to three decimal places.
7.3 Tests for Outliers:
7.3.1 The reported data or, if it has been decided that a transformation is necessary, the transformed results shall be inspected for outliers. These are the values which are so different from the remainder that it can only be concluded that they have arisen from some fault in the application of the test method or from testing a wrong sample. Many possible tests may be used and the associated significance levels varied, but those that are specified in the following subsections have been found to be appropriate in this practice. These outlier tests all assume a normal distribution of errors.
7.3.1.1 The total percentage of outliers rejected, as defined by 100 × (no. of rejected results/no. of reported results), shall be reported explicitly to the ILS Program Manager for approval by the sponsoring subcommittee and main committee.
7.3.2 Uniformity of Repeatability—The first outlier test is concerned with detecting a discordant result in a pair of repeat results. This test (3) involves calculating the $e_{ij}^{2}$ over all the laboratory/sample combinations. Cochran’s criterion at the 1 % significance level is then used to test the ratio of the largest of these values over their sum (see A1.5). If its value exceeds the value given in Table A2.2, corresponding to one degree of
TABLE 3 Computed from Bromine Example Showing Dependence of Precision on Level
Sample Number  3            8           1          4           5           6          2          7
m              0.756        1.22        2.15       3.64        10.9        48.2       65.4       114
D              0.0669 (14)  0.159 (9)   0.729 (8)  0.211 (11)  0.291 (9)   1.50 (9)   2.22 (9)   2.93 (9)
d              0.0500 (9)   0.0572 (9)  0.127 (9)  0.116 (9)   0.0943 (9)  0.527 (9)  0.818 (9)  0.935 (9)
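The log-log inspection of Table 3 described in 7.2.9.1 can be approximated with a short Python sketch. This is an assumption-laden illustration: it fits two separate unweighted least-squares lines, not the weighted regression with a dummy variable specified in Annex A4, so its slopes are only a rough check on the quoted gradient. The function name is hypothetical; the data are the Table 3 values.

```python
import math

# Values of m, D, and d from Table 3 (bromine example).
m = [0.756, 1.22, 2.15, 3.64, 10.9, 48.2, 65.4, 114]
D = [0.0669, 0.159, 0.729, 0.211, 0.291, 1.50, 2.22, 2.93]
d = [0.0500, 0.0572, 0.127, 0.116, 0.0943, 0.527, 0.818, 0.935]


def loglog_slope(x, y):
    """Unweighted least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


slope_D = loglog_slope(m, D)
slope_d = loglog_slope(m, d)
print(round(slope_D, 3), round(slope_d, 3))

# With the gradient taken as 2/3, f(x) is proportional to x**(2/3) and the
# transformation reduces to the cube root (constant multiplier ignored).
transformed = [round(value ** (1 / 3), 3) for value in m]
print(transformed)
```

Both unweighted slopes should land in the general neighbourhood of the 2/3 gradient adopted in 7.2.9.1; any difference from the quoted 0.638 reflects the simpler fitting used here.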
freedom, n being the number of pairs available for comparison, then the member of the pair farthest from the sample mean shall be rejected and the process repeated, reducing n by 1, until no more rejections are called for. In certain cases, specifically when the number of digits used in reporting results leads to a large number of repeat ties, this test can lead to a large proportion of rejections. If this is so, consideration should be given to ceasing this rejection test and retaining some or all of the rejected results. A decision based on judgement in consultation with a statistician will be necessary in this case.
7.3.3 Worked Example—In the case of the example given in Annex A2, the absolute differences (ranges) between transformed repeat results, that is, of the pairs of numbers in Table A1.3, in units of the third decimal place, are shown in Table 4. The largest range is 0.078 for Laboratory G on Sample 3. The sum of squares of all the ranges is
$0.042^2 + 0.021^2 + \dots + 0.026^2 + 0^2 = 0.0439$.
Thus, the ratio to be compared with Cochran’s criterion is
$$\frac{0.078^{2}}{0.0439} = 0.138 \qquad (5)$$
where 0.138 is the result obtained by electronic calculation of unrounded factors in the expression. There are 72 ranges and as, from Table A2.2, the criterion for 80 ranges is 0.1709, this ratio is not significant.
7.3.4.4 If a significant value is encountered for individual samples the corresponding extreme values shall be omitted and the process repeated. If any extreme values are found in the laboratory totals, then all the results from that laboratory shall be rejected.
7.3.4.5 If the test leads to a large proportion of rejections, consideration should be given to ceasing this rejection test and retaining some or all of the rejected results. A decision based on judgement in consultation with a statistician will be necessary in this case.
7.3.5 Worked Example:
7.3.5.1 The application of Hawkins’ test to cell means within samples is shown below.
7.3.5.2 The first step is to calculate the deviations of cell means from respective sample means over the whole array. These are shown in Table 5, in units of the third decimal place. The sums of squares of the deviations are then calculated for each sample. These are also shown in Table 5 in units of the third decimal place.
7.3.5.3 The cell to be tested is the one with the most extreme deviation. This was obtained by Laboratory D from Sample 1. The appropriate Hawkins’ test ratio is therefore:
$$B^{*} = \frac{0.314}{\sqrt{0.117 + 0.015 + \dots + 0.017}} = 0.7281 \qquad (6)$$
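The Cochran ratio of the 7.3.3 worked example (Eq 5) can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only: the function name is hypothetical, the inputs are the rounded figures quoted in 7.3.3, and the full set of 72 ranges from Table 4 is not reproduced here.

```python
def cochran_ratio(largest_range, sum_of_squared_ranges):
    """Largest squared range divided by the sum of all squared ranges."""
    return largest_range ** 2 / sum_of_squared_ranges


# Figures quoted in 7.3.3: largest range and sum of squares of all ranges.
ratio = cochran_ratio(0.078, 0.0439)

# Criterion quoted in 7.3.3 from Table A2.2 for 80 ranges.
critical_value = 0.1709
is_significant = ratio > critical_value

print(round(ratio, 4), is_significant)
```

Because the inputs are already rounded, the last digit can differ slightly from the 0.138 that 7.3.3 reports from unrounded factors; the conclusion that the ratio is not significant is unaffected.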
7.3.5.4 The critical value, corresponding to n = 9 cells in
7.3.4 Uniformity of Reproducibility:
sample 1 and v = 56 extra degrees of freedom from the other
7.3.4.1 The following outlier tests are concerned with es-
samples is interpolated from Table A1.5 as 0.3729. The test
tablishing uniformity in the reproducibility estimate, and are
value is greater than the critical value, and so the results from
designed to detect either a discordant pair of results from a
Laboratory D on Sample 1 are rejected.
laboratory on a particular sample or a discordant set of results
7.3.5.5 As there has been a rejection, the mean value,
from a laboratory on all samp
...
This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because
it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version
of the standard as published by ASTM is to be considered the official document.
Designation: D6300 − 24 An American National Standard
Standard Practice for
Determination of Precision and Bias Data for Use in Test
Methods for Petroleum Products, Liquid Fuels, and
Lubricants
This standard is issued under the fixed designation D6300; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (´) indicates an editorial change since the last revision or reapproval.
INTRODUCTION
Both Research Report RR:D02-1007, Manual on Determining Precision Data for ASTM Methods
on Petroleum Products and Lubricants, and ISO 4259 benefitted greatly from more than 50 years
of collaboration between ASTM and the Institute of Petroleum (IP) in the UK. The more recent work
was documented by the IP and has become ISO 4259.
ISO 4259 encompasses both the determination of precision and the application of such precision
data. In effect, it combines the type of information in RR:D02-1007 regarding the determination of
the precision estimates and the type of information in Practice D3244 for the utilization of test data.
The following practice, intended to replace RR:D02-1007, differs slightly from related portions of the
ISO standard.
1. Scope*
1.1 This practice covers the necessary preparations and planning for the conduct of interlaboratory programs for the development
of estimates of precision (determinability, repeatability, and reproducibility) and of bias (absolute and relative), and further presents
the standard phraseology for incorporating such information into standard test methods.
1.2 This practice is generally limited to homogeneous petroleum products, liquid fuels, and lubricants with which serious sampling
problems (such as heterogeneity or instability) do not normally arise.
1.3 This practice may not be suitable for products with sampling problems as described in 1.2, solid or semisolid products such
as petroleum coke, industrial pitches, paraffin waxes, greases, or solid lubricants when the heterogeneous properties of the
substances create sampling problems. In such instances, consult a trained statistician.
1.4 This international standard was developed in accordance with internationally recognized principles on standardization
established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued
by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee
D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics.
Current edition approved March 1, 2024. Published March 2024. Originally approved in 1998. Last previous edition approved in 2023 as
D6300 – 23a. DOI: 10.1520/D6300-24.
Supporting data have been filed at ASTM International Headquarters and may be obtained by requesting Research Report RR:D02-1007. Contact ASTM Customer
Service at service@astm.org.
*A Summary of Changes section appears at the end of this standard
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
2. Referenced Documents
2.1 ASTM Standards:
D3244 Practice for Utilization of Test Data to Determine Conformance with Specifications
D3606 Test Method for Determination of Benzene and Toluene in Spark Ignition Fuels by Gas Chromatography
D6708 Practice for Statistical Assessment and Improvement of Expected Agreement Between Two Test Methods that Purport
to Measure the Same Property of a Material
D7915 Practice for Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify
Multiple Outliers in a Data Set
E29 Practice for Using Significant Digits in Test Data to Determine Conformance with Specifications
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E456 Terminology Relating to Quality and Statistics
E691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
2.2 ISO Standards:
ISO 4259 Petroleum Products-Determination and Application of Precision Data in Relation to Methods of Test
3. Terminology
3.1 Definitions:
3.1.1 analysis of variance (ANOVA), n—technique that enables the total variance of a method to be broken down into its
component factors. ISO 4259
3.1.2 bias, n—the difference between the expectation of the test results and an accepted reference value.
3.1.2.1 Discussion—
The term “expectation” is used in the context of statistics terminology, which implies it is a “statistical expectation.” E177
3.1.3 between-method bias (relative bias), n—a quantitative expression for the mathematical correction that can statistically
improve the degree of agreement between the expected values of two test methods which purport to measure the same property.
D6708
3.1.4 degrees of freedom, n—the divisor used in the calculation of variance, one less than the number of independent results.
3.1.4.1 Discussion—
This definition applies strictly only in the simplest cases. Complete definitions are beyond the scope of this practice. ISO 4259
3.1.5 determinability, n—a quantitative measure of the variability associated with the same operator in a given laboratory obtaining
successive determined values using the same apparatus for a series of operations leading to a single result; it is defined as the
difference between two such single determined values that would be exceeded about 5 % of the time (one case in 20 in the long
run) in the normal and correct operation of the test method.
3.1.5.1 Discussion—
This definition implies that two determined values, obtained under determinability conditions, which differ by more than the
determinability value should be considered suspect. If an operator obtains more than two determinations, then it would usually be
satisfactory to check the most discordant determination against the mean of the remainder, using determinability as the critical
difference (1).
3.1.6 mean square, n—in analysis of variance, sum of squares divided by the degrees of freedom. ISO 4259
3.1.7 normal distribution, n—the distribution that has the probability function x, such that, if x is any real number, the probability
density is
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \qquad (1) $$
NOTE 1—μ is the true value and σ is the standard deviation of the normal distribution (σ > 0). ISO 4259
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards
volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.
The bold numbers in parentheses refer to the list of references at the end of this standard.
3.1.8 outlier, n—a result far enough in magnitude from other results to be considered not a part of the set. RR:D02–1007
3.1.9 precision, n—the degree of agreement between two or more results on the same property of identical test material. In this
practice, precision statements are framed in terms of repeatability and reproducibility of the test method.
3.1.9.1 Discussion—
The testing conditions represented by repeatability and reproducibility should reflect the normal extremes of variability under
which the test is commonly used. Repeatability conditions are those showing the least variation; reproducibility, the usual
maximum degree of variability. Refer to the definitions of each of these terms for greater detail.
RR:D02–1007
3.1.10 random error, n—the chance variation encountered in all test work despite the closest control of variables. RR:D02–1007
3.1.11 repeatability (a.k.a. Repeatability Limit), n—a quantitative expression for the random error associated with the difference
between two independent results obtained under repeatability conditions that would be exceeded about 5 % of the time (one case
in 20 in the long run) in the normal and correct operation of the test method.
3.1.11.1 Discussion—
Interpret as the value that the absolute difference between two single test results obtained under repeatability conditions is
expected to exceed with a probability of approximately 5 %.
3.1.11.2 Discussion—
The difference is related to the repeatability standard deviation but it is not the standard deviation or its estimate.
3.1.11.3 Discussion—
In 3.1.11 and 3.1.13, the term “probability” quantifies the likelihood of repeatability or reproducibility limit exceedance for the
difference between a single pair of results obtained under the respective conditions. The "one case in 20 in the long run" in
parentheses is not to be interpreted as one case in every 20; it applies only over the long run. The long-run concept can be illustrated using
10 cases out of 200, or 100 cases out of 2000, or 1000 cases out of 20 000. The lowest-terms form of this ratio, one case in 20, is used here.
3.1.11.4 Discussion—
The "one case in 20" is a legacy term that was carried over from RR:D02-1007 in the original development of Practice D6300.
RR:D02–1007
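The distinction drawn in 3.1.11.2 between the repeatability limit and the repeatability standard deviation can be illustrated numerically. The sketch below is illustrative only and is not part of the practice. It assumes normally distributed results with a known standard deviation, and it uses the large-sample multiplier 1.96√2 that Table 2 lists for Practice E691, whereas this practice uses t√2 with Student's t. The function names are hypothetical.

```python
import math
import random


def large_sample_limit(sigma):
    """Limit that |x1 - x2| exceeds about 5 % of the time when both
    results are independent draws from N(mu, sigma^2).

    Assumption: sigma is known exactly, so 1.96 replaces Student's t.
    """
    return 1.96 * math.sqrt(2.0) * sigma


def exceedance_rate(sigma, n_pairs=200_000, seed=1):
    """Simulate pairs of results and return the fraction of pairs whose
    absolute difference exceeds the large-sample limit."""
    rng = random.Random(seed)
    limit = large_sample_limit(sigma)
    exceed = sum(
        abs(rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)) > limit
        for _ in range(n_pairs)
    )
    return exceed / n_pairs
```

For a standard deviation of 1, the limit is roughly 2.77, which shows that the limit is a multiple of the standard deviation rather than the standard deviation itself. The simulated exceedance rate approaches 5 % only as the number of pairs grows, which is the "long run" sense described in 3.1.11.3.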
3.1.12 repeatability conditions, n—conditions where independent test results are obtained with the same method on identical test
items in the same laboratory by the same operator using the same equipment within short intervals of time. E177
3.1.13 reproducibility (a.k.a. Reproducibility Limit), n—a quantitative expression for the random error associated with the
difference between two independent results obtained under reproducibility conditions that would be exceeded about 5 % of the time
(one case in 20 in the long run) in the normal and correct operation of the test method.
3.1.13.1 Discussion—
Interpret as the value that the absolute difference between two single test results obtained under reproducibility conditions is
expected to exceed with a probability of approximately 5 %.
3.1.13.2 Discussion—
The difference is related to the reproducibility standard deviation but is not the standard deviation or its estimate. RR:D02–1007
3.1.13.3 Discussion—
In those cases where the normal use of the test method does not involve sending a sample to a testing laboratory, either because
it is an in-line test method or because of serious sample instabilities or similar reasons, the precision test for obtaining
reproducibility may allow for the use of apparatus from the participating laboratories at a common site (several common sites, if
feasible). The statistical analysis is not affected thereby. However, the interpretation of the reproducibility value will be affected
since the test data is collected under intermediate precision conditions as defined in Practice E177, and therefore, the precision
statement shall, in this case, state the conditions to which the reproducibility value applies, and label this precision in a manner
consistent with how the test data is obtained.
NOTE 2—The reproducibility precision outcome from 3.1.13.3 is a form of Intermediate Precision as defined in Practice E177.
3.1.14 reproducibility conditions, n—conditions where independent test results are obtained with the same method on identical test
items in different laboratories with different operators using different equipment.
NOTE 3—A different laboratory by necessity means a different operator, different equipment, a different location, and different supervisory control.
E177
3.1.15 standard deviation, n—measure of the dispersion of a series of results around their mean, equal to the square root of the
variance and estimated by the positive square root of the mean square. ISO 4259
3.1.16 sum of squares, n—in analysis of variance, sum of squares of the differences between a series of results and their mean.
ISO 4259
3.1.17 variance, n—a measure of the dispersion of a series of accepted results about their average. It is equal to the sum of the
squares of the deviation of each result from the average, divided by the number of degrees of freedom. RR:D02–1007
3.1.18 variance, between-laboratory, n—that component of the overall variance due to the difference in the mean values obtained
by different laboratories. ISO 4259
3.1.18.1 Discussion—
When results obtained by more than one laboratory are compared, the scatter is usually wider than when the same number of tests
are carried out by a single laboratory, and there is some variation between means obtained by different laboratories. Differences
in operator technique, instrumentation, environment, and sample “as received” are among the factors that can affect the
between-laboratory variance. There is a corresponding definition for between-operator variance.
3.1.18.2 Discussion—
The term “between-laboratory” is often shortened to “laboratory” when used to qualify representative parameters of the dispersion
of the population of results, for example as “laboratory variance.”
3.2 Definitions of Terms Specific to This Standard:
3.2.1 determination, n—the process of carrying out a series of operations specified in the test method whereby a single value is
obtained.
3.2.2 operator, n—a person who carries out a particular test.
3.2.3 probability density function, n—function which yields the probability that the random variable takes on any one of its
admissible values; here, we are interested only in the normal probability.
3.2.4 result, n—the final value obtained by following the complete set of instructions in the test method.
3.2.4.1 Discussion—
It may be obtained from a single determination or from several determinations, depending on the instructions in the method. When
rounding off results, the procedures described in Practice E29 shall be used.
4. Summary of Practice
4.1 A draft of the test method is prepared and a pilot program can be conducted to verify details of the procedure and to estimate
roughly the precision of the test method.
4.1.1 If the responsible committee decides that an interlaboratory study for the test method is to take place at a later point in time,
an interim repeatability is estimated by following the requirements in 6.2.1.
4.2 A plan is developed for the interlaboratory study using the number of participating laboratories to determine the number of
samples needed to provide the necessary degrees of freedom. Samples are acquired and distributed. The interlaboratory study is
then conducted on an agreed draft of the test method.
4.3 The data are summarized and analyzed. Any dependence of precision on the level of test result is removed by transformation.
The resulting data are inspected for uniformity and for outliers. Any missing and rejected data are estimated. The transformation
is confirmed. Finally, an analysis of variance is performed, followed by calculation of repeatability, reproducibility, and bias. When
it forms a necessary part of the test procedure, the determinability is also calculated.
5. Significance and Use
5.1 ASTM test methods are frequently intended for use in the manufacture, selling, and buying of materials in accordance with
specifications and therefore should provide such precision that when the test is properly performed by a competent operator, the
results will be found satisfactory for judging the compliance of the material with the specification. Statements addressing precision
and bias are required in ASTM test methods. These then give the user an idea of the precision of the resulting data and its
relationship to an accepted reference material or source (if available). Statements addressing determinability are sometimes
required as part of the test method procedure in order to provide early warning of a significant degradation of testing quality while
processing any series of samples.
5.2 Repeatability and reproducibility are defined in the precision section of every Committee D02 test method. Determinability
is defined above in Section 3. The relationship among the three measures of precision can be tabulated in terms of their different
sources of variation (see Table 1).
5.2.1 When used, determinability is a mandatory part of the Procedure section. It will allow operators to check their technique for
the sequence of operations specified. It also ensures that a result based on the set of determined values is not subject to excessive
variability from that source.
5.3 A bias statement furnishes guidelines on the relationship between a set of test results and a related set of accepted reference
values. When the bias of a test method is known, a compensating adjustment can be incorporated in the test method.
5.4 This practice is intended for use by D02 subcommittees in determining precision estimates and bias statements to be used in
D02 test methods. Its procedures correspond with ISO 4259 and are the basis for the Committee D02 computer software,
Calculation of Precision Data: Petroleum Test Methods. The use of this practice replaces that of Research Report RR:D02-1007.
5.5 Standard practices for the calculation of precision have been written by many committees with emphasis on their particular
product area. One developed by Committee E11 on Statistics is Practice E691. Practice E691 and this practice differ as outlined
in Table 2.
6. Stages in Planning of an Interlaboratory Test Program for the Determination of the Precision of a Test Method
6.1 The stages in planning an interlaboratory test program are: preparing a draft method of test (see 6.2), planning and executing
a pilot program with at least two laboratories (optional but recommended for new test methods) (see 6.3), planning the
interlaboratory program (see 6.4), and executing the interlaboratory program (see 6.5). The four stages are described in turn.
6.2 Preparing a Draft Method of Test—This shall contain all the necessary details for carrying out the test and reporting the results.
Any condition which could alter the results shall be specified. The section on precision will be included at this stage only as a
heading.
6.2.1 Interim Repeatability Study—If the responsible committee decides that an interlaboratory study for the test method is to take
place at a later point in time, using this standard, an interim repeatability standard deviation is estimated by following the steps
as outlined below. This interim repeatability standard deviation can be used to meet ASTM Form and Style Requirement A21.5.1.
When the committee is ready to proceed with the ILS, continue with this practice from 6.3 onwards.
6.2.1.1 Design—The following minimum requirements shall be met:
(1) Three (3) samples, compositionally representative of the majority of materials within the design envelope of the test
method, covering the low, medium, and high regions of the intended test method range.
(2) Twelve (12) replicates per sample, obtained under repeatability conditions in a single laboratory.
6.2.1.2 Analysis—Carry out the following analyses in the order presented:
(1) Perform GESD Outlier Rejection as per Practice D7915 for each sample.
TABLE 1 Sources of Variation
                                Method      Apparatus   Operator    Laboratory  Time
Reproducibility (Result)        Complete    Different   Different   Different   Not Specified
Repeatability (Result)          Complete    Same        Same        Same        Almost same
Determinability (Part result)   Incomplete  Same        Same        Same        Almost same
TABLE 2 Differences in Calculation of Precision in Practices D6300 and E691
Element                     This Practice                             Practice E691
Number of replicates        Two                                       Any number
Precision is written for    Test method                               Each sample
Outlier tests:              Sequential                                Simultaneous
  Within laboratories       Cochran test                              k-value
  Between laboratories      Hawkins test                              h-value
Outliers                    Rejected, subject to subcommittee         Rejected if many laboratories or for
                            approval.                                 cause such as blunder or not
                                                                      following method.
                            Retesting not generally permitted.        Laboratory may retest sample having
                                                                      rejected data.
Analysis of variance        Two-way, applied globally to all the      One-way, applied to each sample
                            remaining data at once.                   separately.
Precision multiplier        $t\sqrt{2}$, where $t$ is the two-tailed  $2.8 = 1.96\sqrt{2}$
                            Student’s t for 95 % probability.
                            Increases with decreasing                 Constant.
                            laboratories × samples, particularly
                            below 12.
Variation of precision      Minimized by data transformation.         User may assess from individual
with level                  Equations for repeatability and           sample precisions.
                            reproducibility are generated in the
                            retransformation process.
(2) Calculate sample variance (v) and standard deviation (s) for each sample using non-rejected results.
(3) Perform the Hartley test for variance equality as follows:
calculate the ratio $F_{\max} = v_{\max}/v_{\min}$, where $v_{\max}$ and $v_{\min}$ are the largest and smallest variances obtained.
(4) If $F_{\max}$ is less than 4.85, estimate the interim repeatability standard deviation of the test method by taking the square root
of the average variance calculated using individual variances from all samples, as illustrated below using three samples:
Interim repeatability standard deviation = $[(v_1 + v_2 + v_3)/3]^{0.5}$, where $v_1$, $v_2$, $v_3$ are the variances for each sample. It should be noted
that if the numbers of non-outlying results used to calculate the variances are not the same, this equation provides an approximation
only, but is suitable for the intended purpose.
(5) If $F_{\max}$ exceeds 4.85, list the averages and associated repeatability standard deviations for each sample separately.
(6) If $F_{\max}$ exceeds 4.85 and $v_{\max}$ is associated with the sample with the lowest average, calculate the following ratio:
$10\,s_{\max}/\text{average}_{\text{sample}}$, where $s_{\max}$ is $(v_{\max})^{0.5}$ and $\text{average}_{\text{sample}}$ is the average of that sample.
If this ratio is near or exceeds 1, then it is likely that this sample is at or below the limit
of quantitation of the test method. If this ratio is far below 1, it is likely this is a sample-specific effect. Method developers should
investigate and take appropriate steps to revise the test method scope or improve the test method precision at the low limit prior
to the conduct of a full ILS.
(7) If the sample set design meets the requirement in 6.4.2, the methodology in Appendix X2 can be used to estimate an interim
repeatability function by treating the repeats per sample as results from ‘pseudo-laboratories’ without repeats.
NOTE 4—It is highly recommended that 6.2.1.2(7) be conducted under the guidance of a statistician familiar with the methodology in Appendix X2.
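The calculations in 6.2.1.2 (2) through (6) can be sketched as follows. This is a minimal illustration and not part of the practice. It assumes that GESD outlier rejection per Practice D7915 has already been applied to each sample, that every sample has a nonzero variance, and that a ratio exactly equal to 4.85 is handled like an exceedance (the text does not address equality). The function name and dictionary keys are hypothetical.

```python
import math
from statistics import mean, variance

F_MAX_CRITICAL = 4.85  # threshold quoted in 6.2.1.2 (4) through (6)


def interim_repeatability(samples):
    """samples: one list of non-rejected replicate results per sample."""
    averages = [mean(s) for s in samples]
    variances = [variance(s) for s in samples]  # sample variance v (n - 1 divisor)
    v_max, v_min = max(variances), min(variances)
    f_max = v_max / v_min  # Hartley ratio, step (3)
    report = {"averages": averages, "variances": variances, "f_max": f_max}

    if f_max < F_MAX_CRITICAL:
        # Step (4): square root of the average of the individual variances.
        report["pooled_sd"] = math.sqrt(sum(variances) / len(variances))
    else:
        # Step (5): report each sample's standard deviation separately.
        report["per_sample_sd"] = [math.sqrt(v) for v in variances]
        # Step (6): ratio 10 * s_max / average, computed only when the
        # largest variance belongs to the sample with the lowest average.
        i_max = variances.index(v_max)
        if averages[i_max] == min(averages):
            report["loq_ratio"] = 10.0 * math.sqrt(v_max) / averages[i_max]
    return report
```

The design in 6.2.1.1 calls for three samples with twelve replicates each; the function accepts any number of samples and replicates so that results remaining after outlier rejection can be passed in directly.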
6.2.1.3 Validation of Interim Repeatability Study by Another Laboratory—It is highly recommended that the findings from the
interim repeatability study be validated by conducting a similar study at another laboratory. If the findings from the validation study
do not support the functional form (constant or per Appendix X2) of the interim repeatability study obtained by the initial
laboratory, or, if the ratio:
[interim repeatability standard deviation from lab A] / [interim repeatability standard deviation from lab B]
exceeds 2.4, where the larger of the two standard deviations is in the numerator (that is, the value for lab A is the numerator
if it is numerically larger than the value for lab B; otherwise the value for lab B is the numerator and the
value for lab A is the denominator), it can be concluded that the findings from one laboratory cannot be
validated by another laboratory. The method developer is advised to consult a statistician and subject matter experts to decide on
which laboratory findings are to be used.
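The ratio check in 6.2.1.3 can be sketched as follows. This is an illustration only. It covers the ratio criterion and not the comparison of functional forms, and the function names are hypothetical.

```python
def validation_ratio(sd_lab_a, sd_lab_b):
    """Ratio of the two interim repeatability standard deviations,
    with the larger value always placed in the numerator."""
    larger = max(sd_lab_a, sd_lab_b)
    smaller = min(sd_lab_a, sd_lab_b)
    return larger / smaller


def ratio_criterion_met(sd_lab_a, sd_lab_b, limit=2.4):
    """True when the ratio does not exceed the 2.4 limit of 6.2.1.3."""
    return validation_ratio(sd_lab_a, sd_lab_b) <= limit
```

Because the larger value is always the numerator, the result does not depend on which laboratory is labeled A.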
6.3 Planning and Executing a Pilot Program with at Least Two Laboratories:
6.3.1 A pilot program is recommended to be used with new test methods for the following reasons: (1) to verify the details in the
operation of the test; (2) to find out how well operators can follow the instructions of the test method; (3) to check the precautions
regarding sample handling and storage; and (4) to estimate roughly the precision of the test.
6.3.2 At least two samples are required, covering the range of results to which the test is intended to apply; however, include at
least 12 laboratory-sample combinations. Test each sample twice by each laboratory under repeatability conditions. If any
omissions or inaccuracies in the draft method are revealed, they shall now be corrected. Analyze the results for precision, bias, and
determinability (if applicable) using this practice. If any are considered to be too large for the technical application, then consider
alterations to the test method.
6.4 Planning the Interlaboratory Program:
6.4.1 There shall be at least six (6) participating laboratories, but it is recommended this number be increased to eight (8) or more
in order to ensure the final precision is based on at least six (6) laboratories and to make the precision statement more representative
of the qualified user population.
6.4.2 The number of samples shall be sufficient to cover the range of the property measured, and to give reliability to the precision
estimates. If any variation of precision with level was observed in the results of the pilot program, then at least six samples,
spanning the range of the test method in a manner that ensures the leverage (h) of each sample (see Eq 2) does not exceed 4/n
(rounded to the first decimal place), shall be used in the interlaboratory program. In any case, it is necessary to obtain at least
30 degrees of freedom in both repeatability and reproducibility. For repeatability, this means obtaining a total of at least 30 pairs
of results in the program. In the absence of pilot test program information to permit use of Fig. 1 (see 6.4.3) to determine the
number of samples, the number of samples shall be greater than five, and chosen such that the number of laboratories times the
number of samples is greater than or equal to 42.
Leverage calculation:
$$ h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{k=1}^{n} (x_k - \bar{x})^2} \qquad (2) $$
where:
$h_{ii}$ = leverage of sample i,
$n$ = total number of planned samples,
$p_i$ = planned property level for sample i,
$x_i$ = ln($p_i$), and
$\bar{x}$ = grand average of all $x_i$.
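Eq 2 can be evaluated for a planned sample set as sketched below. This is an illustration only. It assumes the standard simple-regression form of leverage with squared deviations, and it assumes that "rounded to 1st decimal" applies to the 4/n limit. The function names are hypothetical.

```python
import math


def leverages(planned_levels):
    """Leverage h_ii of each planned sample, using x_i = ln(p_i)."""
    x = [math.log(p) for p in planned_levels]
    n = len(x)
    x_bar = sum(x) / n
    ss = sum((xk - x_bar) ** 2 for xk in x)  # sum of squared deviations
    return [1.0 / n + (xi - x_bar) ** 2 / ss for xi in x]


def leverage_limit(n):
    """Limit 4/n rounded to one decimal place (assumed interpretation)."""
    return round(4.0 / n, 1)


# Six planned levels spaced evenly on a logarithmic scale.
levels = [1, 10, 100, 1000, 10000, 100000]
h = leverages(levels)
acceptable = all(hi <= leverage_limit(len(levels)) for hi in h)
```

The extreme samples always carry the largest leverage, so checking the lowest and highest planned levels is usually enough to see whether a design needs more samples near the ends of the range.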
6.4.3 For reproducibility, Fig. 1 gives the minimum number of samples required in terms of L, P, and Q, where L is the number
of participating laboratories, and P and Q are the ratios of variance component estimates (see 8.3.1) obtained from the pilot
program. Specifically, P is the ratio of the interaction component to the repeats component, and Q is the ratio of the laboratories
component to the repeats component.
NOTE 5—Appendix X1 gives the derivation of the equation used. If Q is much larger than P, then 30 degrees of freedom cannot be achieved; the blank
entries in Fig. 1 correspond to this situation or the approach of it (that is, when more than 20 samples are required). For these cases, there is likely to
be a significant bias between laboratories. The program organizer shall be informed; further standardization of the test method may be necessary.
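The minimum requirements of 6.4.1 and 6.4.2 that apply when no pilot program information is available can be checked with a short routine such as the one below. This is a sketch only. It does not replace Fig. 1 when pilot estimates of P and Q exist, and the function name and message wording are hypothetical.

```python
def check_plan_without_pilot(n_labs, n_samples):
    """Return a list of unmet minimum requirements (empty if none)."""
    problems = []
    if n_labs < 6:
        problems.append("fewer than six participating laboratories (6.4.1)")
    if n_samples <= 5:
        problems.append("number of samples is not greater than five (6.4.2)")
    if n_labs * n_samples < 42:
        problems.append("laboratories x samples is less than 42 (6.4.2)")
    return problems
```

For example, six laboratories with six samples fail the last check (36 is less than 42), whereas six laboratories with seven samples pass all three.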
6.5 Executing the Interlaboratory Program:
6.5.1 One person shall oversee the entire program, from the distribution of the texts and samples to the final appraisal of the
results. He or she shall be familiar with the test method, but should not personally take part in the actual running of the tests.
FIG. 1 Determination of Number of Samples Required (see 6.4.3)
6.5.2 The text of the test method shall be distributed to all the laboratories in time to raise any queries before the tests begin. If
any laboratory wants to practice the test method in advance, this shall be done with samples other than those used in the program.
6.5.3 The samples shall be accumulated, subdivided, and distributed by the organizer, who shall also keep a reserve of each sample
for emergencies. It is most important that the individual laboratory portions be homogeneous. Instructions to each laboratory shall
include the following:
6.5.3.1 Testing Protocol—The protocol to be used for testing of the ILS sample set shall be provided. Factors that may affect test
method outcome but are not intended to be controlled in the normal execution of the test method shall not be intentionally removed
nor controlled in the testing of the ILS samples, unless explicitly permitted by the sponsoring subcommittee of the ILS for special
studies where certain factors are controlled intentionally as part of the testing protocol to meet the intended ILS study objectives.
Removing, controlling, or setting limits on factors that are not intended to be controlled in the normal execution of the test method,
in an ILS intended to evaluate the precision of the test method under normal operating conditions, will
result in overly optimistic precision. Precision statements thus generated will likely be unattainable by the majority of users in the
normal execution of the test method.
6.5.3.2 The agreed draft method of test;
6.5.3.3 Material Safety Data Sheets, where applicable, and the handling and storage requirements for the samples;
6.5.3.4 The order in which the samples are to be tested (a different random order for each laboratory);
6.5.3.5 The statement that two test results are to be obtained in the shortest practical period of time on each sample by the same
operator with the same apparatus. For statistical reasons it is imperative that the two results are obtained independently of each
other, that is, that the second result is not biased by knowledge of the first. If this is regarded as impossible to achieve with the
operator concerned, then the pairs of results shall be obtained in a blind fashion, but ensuring that they are carried out in a short
period of time (preferably the same day). The term blind fashion means that the operator does not know that the sample is a
replicate of any previous run.
6.5.3.6 The period of time during which repeated results are to be obtained and the period of time during which all the samples
are to be tested;
6.5.3.7 A blank form for reporting the results. For each sample, there shall be space for the date of testing, the two results, and
any unusual occurrences. The unit of accuracy for reporting the results shall be specified. This should be, if possible, more digits
reported than will be used in the final test method, in order to avoid having rounding unduly affect the estimated precision values.
6.5.3.8 When it is required to estimate the determinability, the report form must include space for each of the determined values
as well as the test results.
6.5.3.9 A statement that the test shall be carried out under normal conditions, using operators with good experience but not
exceptional knowledge; and that the duration of the test shall be the same as normal.
6.5.4 The pilot program operators may take part in the interlaboratory program. If their extra experience in testing a few more
samples produces a noticeable effect, it will serve as a warning that the test method is not satisfactory. They shall be identified in
the report of the results so that any such effect may be noted.
6.5.5 It cannot be overemphasized that the statement of precision in the test method is to apply to test results obtained by running
the agreed procedure exactly as written. Therefore, the test method must not be significantly altered after its precision statement
is written.
7. Inspection of Interlaboratory Results for Uniformity and for Outliers
7.1 Introduction:
7.1.1 This section specifies procedures for examining the results reported in a statistically designed interlaboratory program (see
Section 6) to establish:
7.1.1.1 The independence or dependence of precision and the level of results;
7.1.1.2 The uniformity of precision from laboratory to laboratory, and to detect the presence of outliers.
NOTE 6—The procedures are described in mathematical terms based on the notation of Annex A1 and illustrated with reference to the example data
(calculation of bromine number) set out in Annex A2. Throughout this section (and Section 8), the procedures to be used are first specified and then
illustrated by a worked example using data given in Annex A2.
NOTE 7—It is assumed throughout this section that all the deviations are either from a single normal distribution or capable of being transformed into
such a distribution (see 7.2). Other cases (which are rare) would require different treatment that is beyond the scope of this practice. Also, see (2) for a
statistical test of normality.
7.2 Transformation of Data:
7.2.1 In many test methods the precision depends on the level of the test result, and thus the variability of the reported results is
different from sample to sample. The method of analysis outlined in this practice requires that this shall not be so and the position
is rectified, if necessary, by a transformation.
7.2.1.1 Prior to commencement of analysis to determine if transformation is necessary, it is a good practice to examine information
gathered from ILS participants to determine compliance with agreed upon ILS protocol and method of test. As part of this
examination, the raw data as reported should be inspected for existence of extreme or outlandish values that are visually obvious.
Exclusion of extreme or outlandish results from transformation analysis is recommended if assignable causes can be found in order
to help ensure test data dependability, transformation reliability, and subsequent computation efficiency. If assignable causes cannot
be found, exclusion of extreme or outlandish results from transformation analysis should be confirmed for each sample using a
formal statistical test such as the Generalized Extreme Studentized Deviate (GESD) multi-outlier technique (see Practice D7915) or
other technically equivalent techniques at the 99 % confidence level on the difference and average (or sum) of the two replicate
results as submitted by each ILS participant for each sample as follows:
(1) Compute the difference of the two replicates submitted by each participant for the sample;
(2) Perform GESD on the differences from all participants for the sample;
(3) For each difference that is identified as outlier, reject the result that is farthest from the median of all results for that sample;
(4) Compute the average (or sum) of the two replicates for each participant for the sample; for participants who submitted only
a single result, or, if one of the submitted replicates is rejected in (3), use the remaining result as the average (or 2 × the remaining
result as sum) for the participant;
(5) Perform GESD on the averages (or sums) from all participants for the sample;
(6) Reject all results identified as outliers in (5); and
(7) Continue execution of the remainder of this practice using the retained results.
It is recommended that such statistical tests be conducted under the guidance of a statistician.
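The bookkeeping of steps (1) to (7) can be sketched as follows. This is a minimal illustration only, not an implementation of Practice D7915: the outlier test is passed in as a hypothetical callable `find_outliers`, and the names `screen_sample`, `demo_detector`, and the example data are all assumptions made for this sketch. The demonstration detector is a crude stand-in and is not GESD.

```python
from statistics import median


def screen_sample(replicates, find_outliers):
    """Apply steps (1) to (7) to the results reported for one sample.

    replicates: dict mapping participant -> list of one or two results.
    find_outliers: hypothetical stand-in for GESD at the 99 % confidence
        level; takes a list of floats and returns the indices it flags.
    Returns a dict mapping participant -> list of retained results.
    """
    retained = {lab: list(vals) for lab, vals in replicates.items()}
    # Median of all results reported for the sample, used in step (3).
    sample_median = median(v for vals in replicates.values() for v in vals)

    # Steps (1)-(2): differences of replicate pairs, then the outlier test.
    pair_labs = [lab for lab, vals in retained.items() if len(vals) == 2]
    diffs = [retained[lab][0] - retained[lab][1] for lab in pair_labs]
    for idx in find_outliers(diffs):
        lab = pair_labs[idx]
        # Step (3): reject the result farthest from the sample median.
        worst = max(retained[lab], key=lambda v: abs(v - sample_median))
        retained[lab].remove(worst)

    # Step (4): averages; a single remaining result serves as the average.
    labs = [lab for lab, vals in retained.items() if vals]
    means = [sum(retained[lab]) / len(retained[lab]) for lab in labs]

    # Steps (5)-(6): outlier test on the averages; reject flagged results.
    for idx in find_outliers(means):
        retained[labs[idx]] = []

    # Step (7): the caller continues the analysis with the retained results.
    return retained


def demo_detector(values):
    """Demonstration only (NOT GESD): flag values > 1.0 from the median."""
    centre = median(values)
    return {i for i, v in enumerate(values) if abs(v - centre) > 1.0}


example = {"A": [10.0, 10.1], "B": [10.2, 10.0], "C": [10.1, 14.0],
           "D": [20.0, 20.1], "E": [9.9]}
kept = screen_sample(example, demo_detector)
```

In this made-up example, participant C loses one replicate at step (3) and participant D loses both results at step (6). In practice the detector would be replaced by a GESD implementation chosen under the guidance of a statistician.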
7.2.2 The laboratories' standard deviations D_j and the repeats standard deviations d_j (see Annex A1) are calculated and plotted
separately against the sample means m_j. If the points so plotted may be considered as lying about a pair of lines parallel to the
m-axis, then no transformation is necessary. If, however, the plotted points describe non-horizontal straight lines or curves of the
form D = f_1(m) and d = f_2(m), then a transformation will be necessary.
7.2.3 The relationships D = f_1(m) and d = f_2(m) will not in general be identical. It is frequently the case, however, that the ratios
u_j = d_j / D_j are approximately the same for all m_j, in which case f_1 is approximately proportional to f_2 and a single
transformation will be adequate for both repeatability and reproducibility. The statistical procedures of this practice are greatly
facilitated when a single transformation can be used. For this reason, unless the u_j clearly vary with property level, the two
relationships are combined into a single dependency relationship D = f(m) (where D now includes d) by including a dummy variable T.
This will take account of the difference between the relationships, if one exists, and will provide a means of testing for this
difference (see A4.1).
7.2.4 In the event that the ratios u_j do vary with level (mean, m_j), as confirmed with a regression of u_j on m_j, or log(u_j) on
log(m_j), follow the instructions in Annex A5. Otherwise, continue with 7.2.5.
7.2.5 The single relationship D = f(m) is best estimated by weighted linear regression analysis. Strictly speaking, an iteratively
weighted regression should be used, but in most cases even an unweighted regression will give a satisfactory approximation. The
derivation of weights is described in A4.2, and the computational procedure for the regression analysis is described in A4.3.
Typical forms of dependence D = f(m) are given in A3.1. These are all expressed in terms of at most two (2) transformation
parameters, B and B_0.
7.2.6 The typical forms of dependence, the transformations they give rise to, and the regressions to be performed in order to
estimate the transformation parameters B, are all summarized in A3.2. This includes statistical tests, at the 5 % significance
level, for the significance of the regression (that is, whether the relationship D = f(m) is parallel to the m-axis) and for the
difference between the repeatability and reproducibility relationships. If such a difference is found to exist, follow the
procedures in Annex A5.
7.2.7 If it has been shown at the 5 % significance level that there is a significant regression of the form D = f(m), then the
appropriate transformation y = F(x), where x is the reported result, is given by the equation
$$F(x) = K \int \frac{dx}{f(x)} \tag{3}$$
where K = a constant. In that event, all results shall be transformed accordingly and the remainder of the analysis carried out
in terms of the transformed results. Typical transformations are given in A3.1.
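As an illustration of how Eq. 3 is applied, consider a power-law dependence f(x) = x^B. This form is chosen here only as an example, with any constant of proportionality absorbed into K; the forms actually recommended are those listed in A3.1.

```latex
F(x) = K \int x^{-B}\,dx =
\begin{cases}
  \dfrac{K}{1-B}\, x^{\,1-B}, & B \neq 1, \\[1.5ex]
  K \ln x, & B = 1 .
\end{cases}
```

Because K is arbitrary, the constant factor can be dropped, so the transformation under this assumption is y = x^(1-B) when B differs from 1, and y = ln x when the standard deviation is directly proportional to the level (B = 1).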
7.2.8 The choice of transformation is difficult to make the subject of formalized rules. Qualified statistical assistance may be
required in particular cases. The presence of outliers may affect judgement as to the type of transformation required, if any (see
7.7).
7.2.9 Worked Example:
7.2.9.1 Table 3 lists the values of m, D, and d for the eight samples in the example given in Annex A2, correct to three significant
digits. Corresponding degrees of freedom are in parentheses. Inspection of the values in Table 3 shows that both D and d increase
with m, the rate of increase diminishing as m increases. A plot of these figures on log-log paper (that is, a graph of log D and log d
against log m) shows that the points may reasonably be considered as lying about two straight lines (see Fig. A4.1 in Annex A4).
From the example calculations given in A4.4, the gradients of these lines are shown to be the same, with an estimated value of
0.638. Bearing in mind the errors in this estimated value, the gradient may for convenience be taken as 2/3.
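The log-log gradient can be roughly checked from the Table 3 values with the sketch below. It is an assumption-laden simplification: it fits an unweighted least-squares line with a common slope and separate intercepts for D and d, whereas A4.4 uses a weighted regression, so the result should fall near the 0.638 quoted above without reproducing it. The function name `common_slope` is hypothetical.

```python
import math

# Table 3 values for the eight bromine samples, in order of increasing m.
m = [0.756, 1.22, 2.15, 3.64, 10.9, 48.2, 65.4, 114]
D = [0.0669, 0.159, 0.729, 0.211, 0.291, 1.50, 2.22, 2.93]
d = [0.0500, 0.0572, 0.127, 0.116, 0.0943, 0.527, 0.818, 0.935]


def common_slope(x, *ys):
    """Unweighted least-squares slope shared by several lines.

    Each series in ys gets its own intercept but all share one slope,
    which is the pooled within-series slope when x is common to all.
    """
    x_mean = sum(x) / len(x)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = 0.0
    for y in ys:
        y_mean = sum(y) / len(y)
        sxy += sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / (len(ys) * sxx)


log_m = [math.log10(v) for v in m]
log_D = [math.log10(v) for v in D]
log_d = [math.log10(v) for v in d]

slope_D = common_slope(log_m, log_D)             # reproducibility line alone
slope_d = common_slope(log_m, log_d)             # repeatability line alone
slope_common = common_slope(log_m, log_D, log_d)  # shared gradient

print(f"slope for D: {slope_D:.3f}")
print(f"slope for d: {slope_d:.3f}")
print(f"common slope: {slope_common:.3f}")
```

The two individual slopes bracket the common slope, which is simply their average here because both series share the same m values.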
7.2.9.2 Hence, the same transformation is appropriate both for repeatability and reproducibility, and is given by the equation
$$\int x^{-2/3}\,dx = 3x^{1/3} \tag{4}$$
Since the constant multiplier may be ignored, the transformation thus reduces to that of taking the cube roots of the reported bromine
numbers. This yields the transformed data shown in Table A1.3, in which the cube roots are quoted correct to three decimal places.
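One informal way to see the effect of the cube-root transformation is to approximate the standard deviation on the transformed scale. The sketch below assumes the first-order (delta-method) approximation sd(y) ≈ |F'(m)| × D with F(x) = x^(1/3); that approximation and the helper name `spread` are assumptions of this illustration, not part of the practice.

```python
# Table 3 values: sample means m and laboratories' standard deviations D.
m = [0.756, 1.22, 2.15, 3.64, 10.9, 48.2, 65.4, 114]
D = [0.0669, 0.159, 0.729, 0.211, 0.291, 1.50, 2.22, 2.93]

# Delta-method approximation (an assumption of this sketch):
# F(x) = x**(1/3), so F'(m) = (1/3) * m**(-2/3).
D_transformed = [Dj / (3.0 * mj ** (2.0 / 3.0)) for mj, Dj in zip(m, D)]


def spread(values):
    """Ratio of the largest to the smallest value in the list."""
    return max(values) / min(values)


print(f"spread of D before transformation: {spread(D):.1f}")
print(f"approximate spread after transformation: {spread(D_transformed):.1f}")
```

The ratio of largest to smallest standard deviation is considerably smaller on the transformed scale, which is the behaviour the transformation is intended to produce. The formal analysis still proceeds on the transformed results themselves.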
7.3 Tests for Outliers:
7.3.1 The reported data or, if it has been decided that a transformation is necessary, the transformed results shall be inspected for
outliers. These are the values which are so different from the remainder that it can only be concluded that they have arisen from
some fault in the application of the test method or from testing a wrong sample. Many possible tests may be used and the associated
significance levels varied, but those that are specified in the following subsections have been found to be appropriate in this
practice. These outlier tests all assume a normal distribution of errors.
7.3.1.1 The total percentage of outliers rejected, as defined by 100× (no. of rejected results/no. of reported results), shall be
reported explicitly to the ILS Program Manager for approval by the sponsoring subcommittee and main committee.
7.3.2 Uniformity of Repeatability—The first outlier test is concerned with detecting a discordant result in a pair of repeat results.
This test (3) involves calculating the e_ij^2 over all the laboratory/sample combinations. Cochran’s criterion at the 1 % significance
level is then used to test the ratio of the largest of these values over their sum (see A1.5). If its value exceeds the value given in
Table A2.2, corresponding to one degree of freedom, n being the number of pairs available for comparison, then the member of
the pair farthest from the sample mean shall be rejected and the process repeated, reducing n by 1, until no more rejections are
called for. In certain cases, specifically when the number of digits used in reporting results leads to a large number of repeat ties,
this test can lead to a large proportion of rejections. If this is so, consideration should be given to ceasing this rejection test and
retaining some or all of the rejected results. A decision based on judgement in consultation with a statistician will be necessary in this case.
7.3.3 Worked Example—In the case of the example given in Annex A2, the absolute differences (ranges) between transformed
repeat results, that is, of the pairs of numbers in Table A1.3, in units of the third decimal place, are shown in Table 4. The largest
range is 0.078 for Laboratory G on Sample 3. The sum of squares of all the ranges is
$$0.042^2 + 0.021^2 + \cdots + 0.026^2 + 0^2 = 0.0439$$
Thus, the ratio to be compared with Cochran’s criterion is
$$\frac{0.078^2}{0.0439} = 0.138 \tag{5}$$
where 0.138 is the result obtained by electronic calculation of unrounded factors in the expression. There are 72 ranges and
as, from Table A2.2, the criterion for 80 ranges is 0.1709, this ratio is not significant.

TABLE 3 Computed from Bromine Example Showing Dependence of Precision on Level

Sample Number   3             8             1            4             5             6           2           7
m               0.756         1.22          2.15         3.64          10.9          48.2        65.4        114
D               0.0669 (14)   0.159 (9)     0.729 (8)    0.211 (11)    0.291 (9)     1.50 (9)    2.22 (9)    2.93 (9)
d               0.0500 (9)    0.0572 (9)    0.127 (9)    0.116 (9)     0.0943 (9)    0.527 (9)   0.818 (9)   0.935 (9)

TABLE 4 Absolute Differences Between Transformed Repeat Results: Bromine Example

                Sample
Laboratory       1    2    3    4    5    6    7    8
A               42   21    7   13    7   10    8    0
B               23   12   12    0    7    9    3    0
C                0    6    0    0    7    8    4    0
D               14    6    0   13    0    8    9   32
E               65    4    0    0   14    5    7   28
F               23   20   34   29   20   30   43    0
G               62    4   78    0    0   16   18   56
H               44   20   29   44    0   27    4   32
J                0   59    0   40    0   30   26    0
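The arithmetic of this worked example can be reproduced from the Table 4 entries with a short sketch. The variable names are assumptions of this illustration, and the critical value is simply the figure for 80 ranges quoted in the text rather than a lookup in Table A2.2.

```python
# Table 4: absolute differences between transformed repeat results,
# in units of the third decimal place (rows = laboratories, columns = samples 1-8).
ranges_thousandths = {
    "A": [42, 21, 7, 13, 7, 10, 8, 0],
    "B": [23, 12, 12, 0, 7, 9, 3, 0],
    "C": [0, 6, 0, 0, 7, 8, 4, 0],
    "D": [14, 6, 0, 13, 0, 8, 9, 32],
    "E": [65, 4, 0, 0, 14, 5, 7, 28],
    "F": [23, 20, 34, 29, 20, 30, 43, 0],
    "G": [62, 4, 78, 0, 0, 16, 18, 56],
    "H": [44, 20, 29, 44, 0, 27, 4, 32],
    "J": [0, 59, 0, 40, 0, 30, 26, 0],
}

# Squared ranges on the transformed scale (divide by 1000 to undo the units).
squares = [(r / 1000.0) ** 2 for row in ranges_thousandths.values() for r in row]

sum_sq = sum(squares)            # sum of squares of all 72 ranges
ratio = max(squares) / sum_sq    # largest squared range over the sum

critical_80 = 0.1709             # value for 80 ranges quoted in the text
significant = ratio > critical_80

print(f"number of ranges: {len(squares)}")
print(f"sum of squares:   {sum_sq:.4f}")
print(f"Cochran ratio:    {ratio:.4f}")
print(f"significant:      {significant}")
```

The computed sum of squares rounds to 0.0439 and the ratio is close to the 0.138 of Eq. 5, well below the quoted criterion, so no result is rejected at this step.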
7.3.4 Uniformity of Reproducibility:
7.3.4.1 The following outlier tests are concerned with establishing uniformity in the reproducibility estimate, and are designed to
detect either a discordant pair of results from a laboratory on a particular sample or a discordant set of results from a laboratory
on all samples. For both purposes, the Hawkins’ test (4) is appropriate.
7.3.4.2 This involves forming for each sample, and finally for the overall laboratory averages (see 7.6), the ratio of the largest
absolute deviation of laboratory mean from sample (or overall) mean to the square root of certain sums of squares (A1.6).
7.3.4.3 The ratio corresponding to the largest absolute deviation shall be compared with the critical 1 % values given in Table
A1.5, where n is the number of laboratory/sample cells in the sample (or the number of overall laboratory means) concerned and
where v is the degrees of freedom for the sum of squares which is additional to that corresponding to the sample in question. In
the test for laboratory/sample cells v will refer to other samples, but will be zero in the test for overall laboratory averages.
7.3.4.4 If a significant value is encou
...







