Standard Practice for Calculating and Using Basic Statistics

ABSTRACT
This practice covers methods and equations for computing and presenting basic statistics. This practice includes simple descriptive statistics for variable and attribute data, elementary methods of statistical inference, and tabular and graphical methods for variable data. Some interpretation and guidance for use is also included.
This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.
SIGNIFICANCE AND USE
4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.  
4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.  
4.1.2 The sample is selected from a larger set called the population. The population can be a finite set of items, a very large or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic, continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample originates. It is the population that is of primary interest in any particular study.  
4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval ...
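As an informal illustration of these two attribute-data models (the data values below are invented, not taken from the practice): the sample proportion of 1s estimates the binomial parameter p, and the mean count per interval estimates the Poisson rate.

```python
from statistics import mean

# Binary trials: 1 = item exhibits the attribute, 0 = it does not.
# Under a binomial model, the sample proportion estimates p.
trials = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
p_hat = mean(trials)          # estimate of the binomial proportion p
print(p_hat)                  # 0.3

# Counts of a defined event per inspection interval.
# Under a Poisson model, the mean count estimates the rate lambda.
counts = [2, 0, 1, 3, 1, 2, 0, 1]
lam_hat = mean(counts)        # estimate of the Poisson rate per interval
print(lam_hat)                # 1.25
```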
SCOPE
1.1 This practice covers methods and equations for computing and presenting basic statistics. This practice includes simple descriptive statistics for variable and attribute data, elementary methods of statistical inference, and tabular and graphical methods for variable data. Some interpretation and guidance for use is also included.  
1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.  
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.  
1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

General Information

Status
Published
Publication Date
31-Mar-2019
Technical Committee
E11 - Quality and Statistics


Overview

ASTM E2586-19e1: Standard Practice for Calculating and Using Basic Statistics is a widely recognized standard issued by ASTM International. It provides comprehensive methods and equations for computing and presenting basic statistics. The scope encompasses simple descriptive statistics for variable and attribute data, elementary statistical inference, and both tabular and graphical methods tailored to variable data. The standard also offers interpretation and guidance for practical data analysis.

Organizations and institutions across research, business, government, and industry generate and maintain extensive empirical data sets. Consistent and correct statistical characterization of such data is crucial for quality control, decision-making, and process improvement. ASTM E2586-19e1 helps ensure that statistical methods are appropriate, transparent, and standardized across various application areas.

Key Topics

  • Descriptive Statistics: Covers key measures such as mean, median, range, variance, standard deviation, percentiles, quartiles, interquartile range, skewness, kurtosis, minimum, and maximum.
  • Attribute Data: Guidance for handling binary data and count data, including appropriate use of binomial and Poisson models.
  • Statistical Inference: Explains statistical hypothesis testing, confidence intervals, prediction intervals, tolerance intervals, and the use of p-values.
  • Population and Sample Concepts: Definitions and procedures for distinguishing between populations and samples, including the use of sample statistics to estimate population parameters.
  • Tabular and Graphical Methods: Instruction on the use of tables (frequency distributions, relative frequencies, cumulative frequencies) and graphical representations (histograms, boxplots, ogives, dotplots, normal probability plots, q-q plots).
  • Data Quality and Homogeneity: Emphasis on ensuring data samples are homogeneous or originate from statistically controlled processes so that statistical conclusions are valid.
  • Terminology and Interpretation: Use of clearly defined statistical terminology, standardized in conjunction with ISO standards and other referenced ASTM documents.
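Several of the quantities listed above can be sketched in a few lines of code. This is an informal illustration only, not content from the standard; the dataset and the 95 % t-value (about 2.365 for 7 degrees of freedom) are illustrative assumptions.

```python
import math
import statistics as st

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
n = len(data)

xbar = st.mean(data)           # sample mean
med = st.median(data)          # sample median
s = st.stdev(data)             # sample standard deviation (n - 1 divisor)
r = max(data) - min(data)      # range
cv = s / xbar                  # coefficient of variation

# Two-sided 95 % confidence interval for the population mean,
# assuming approximate normality of the data.
t_95 = 2.365                   # t critical value for df = n - 1 = 7
half_width = t_95 * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
```

The interval `ci` is interpreted as in 3.1.6 of the standard: in repeated sampling, about 95 % of intervals constructed this way would cover the true population mean.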

Applications

ASTM E2586-19e1 serves as foundational guidance for:

  • Quality Control and Assurance: Organizations apply the standard to assess product consistency and monitor process variation in manufacturing.
  • Data Analysis in Research and Development: Science and engineering teams use this standard for summarizing experimental results, supporting hypothesis testing, and making evidence-based conclusions.
  • Regulatory Reporting: Agencies and regulated industries use consistent statistical reporting as specified by this standard to meet compliance requirements.
  • Business Analytics: Businesses leverage standardized statistical techniques to extract insights from sales, production, or customer data.
  • Process Improvement Initiatives: Methodologies such as Lean Six Sigma incorporate statistical analysis per ASTM E2586-19e1 for root-cause analysis and performance measurement.
  • Education and Training: Universities and training programs use this standard as a reference to teach foundational statistical methods.

Using ASTM E2586-19e1 ensures statistical calculations are performed and presented in ways that are widely recognized, reproducible, and facilitate informed decision-making.

Related Standards

For comprehensive application and alignment with international best practices, the following standards are referenced or commonly used alongside ASTM E2586-19e1:

  • ASTM E178 – Practice for Dealing With Outlying Observations
  • ASTM E456 – Terminology Relating to Quality and Statistics
  • ASTM E2234 – Practice for Sampling a Stream of Product by Attributes Indexed by AQL
  • ASTM E2282 – Guide for Defining the Test Result of a Test Method
  • ASTM E3080 – Practice for Regression Analysis With a Single Predictor Variable
  • ISO 3534-1 – Statistics – Vocabulary and Symbols – Part 1: Probability and General Statistical Terms
  • ISO 3534-2 – Statistics – Vocabulary and Symbols – Part 2: Applied Statistics

By adhering to ASTM E2586-19e1 and these related standards, organizations ensure data integrity, comparability, and regulatory alignment in their statistical practices.

Buy Documents

Standard

ASTM E2586-19e1 - Standard Practice for Calculating and Using Basic Statistics

English language (22 pages)

Frequently Asked Questions

ASTM E2586-19e1 is a standard published by ASTM International. Its full title is "Standard Practice for Calculating and Using Basic Statistics". Its coverage is summarized in the Abstract, Significance and Use, and Scope sections above.

ASTM E2586-19e1 is classified under the following ICS (International Classification for Standards) categories: 03.120.30 - Application of statistical methods. The ICS classification helps identify the subject area and facilitates finding related standards.

ASTM E2586-19e1 has the following relationships with other standards: it is linked, via inter-standard references, to ASTM E2586-19, ASTM D4375-96(2011), ASTM E2282-23, ASTM E3080-23, ASTM E456-13a(2022)e1, ASTM E3080-19, ASTM E3080-17, ASTM E456-13A(2017)e3, ASTM E456-13A(2017)e1, ASTM E3080-16, ASTM E178-16, ASTM E2282-14, ASTM E456-13ae1, ASTM E456-13ae3, and ASTM E456-13a. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

ASTM E2586-19e1 is available in PDF format for immediate download after purchase. The document can be added to your cart and obtained through the secure checkout process. Digital delivery ensures instant access to the complete standard document.

Standards Content (Sample)


This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

Designation: E2586 − 19ε1    An American National Standard

Standard Practice for Calculating and Using Basic Statistics

This standard is issued under the fixed designation E2586; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

NOTE—Section 3.1.29 was corrected editorially in April 2020.

1. Scope

1.1 This practice covers methods and equations for computing and presenting basic statistics. This practice includes simple descriptive statistics for variable and attribute data, elementary methods of statistical inference, and tabular and graphical methods for variable data. Some interpretation and guidance for use is also included.

1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.

1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.

1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Referenced Documents

2.1 ASTM Standards:
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E2234 Practice for Sampling a Stream of Product by Attributes Indexed by AQL
E2282 Guide for Defining the Test Result of a Test Method
E3080 Practice for Regression Analysis with a Single Predictor Variable

2.2 ISO Standards:
ISO 3534-1 Statistics—Vocabulary and Symbols, Part 1: Probability and General Statistical Terms
ISO 3534-2 Statistics—Vocabulary and Symbols, Part 2: Applied Statistics

3. Terminology

3.1 Definitions—Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456.

3.1.1 alternative hypothesis, Hₐ, n—a probability distribution or type of probability distribution distinguished from the null hypothesis.
3.1.1.1 Discussion—The alternative hypothesis is typically a research hypothesis or a statement that we hope to show is more plausible than the null hypothesis using real data.

3.1.2 characteristic, n—a property of items in a sample or population which, when measured, counted, or otherwise observed, helps to distinguish among the items. E2282

3.1.3 coefficient of variation, CV, n—for a nonnegative characteristic, the ratio of the standard deviation to the mean for a population or sample.
3.1.3.1 Discussion—The coefficient of variation is often expressed as a percentage.
3.1.3.2 Discussion—This statistic is also known as the relative standard deviation, RSD.

3.1.4 confidence bound, n—see confidence limit.

3.1.5 confidence coefficient, n—see confidence level.

3.1.6 confidence interval, n—an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with confidence level 1 − α, where Pr(L ≤ θ ≤ U) ≥ 1 − α.
3.1.6.1 Discussion—The confidence level, 1 − α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting confidence interval either does or does not contain it. In this sense "confidence" applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.

3.1.7 confidence level, n—the value, 1 − α, of the probability associated with a confidence interval, often expressed as a percentage.
3.1.7.1 Discussion—α is generally a small number. Confidence level is often 95 % or 99 %.

3.1.8 confidence limit, n—each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval.

3.1.9 critical value, n—in hypothesis testing, the boundary (number) of the rejection region for a test statistic in a hypothesis test.

3.1.10 degrees of freedom, df, n—the number of independent data points minus the number of parameters that have to be estimated before calculating the variance.

3.1.11 estimate, n—sample statistic used to approximate a population parameter.

3.1.12 histogram, n—graphical representation of the frequency distribution of a characteristic consisting of a set of rectangles with area proportional to the frequency. ISO 3534-1
3.1.12.1 Discussion—While not required, equal bar or class widths are recommended for histograms.

3.1.13 interquartile range, IQR, n—the 75th percentile (0.75 quantile) minus the 25th percentile (0.25 quantile), for a data set.

3.1.14 kurtosis, γ₂, g₂, n—for a population or a sample, a measure of the weight of the tails of a distribution relative to the center, calculated as the ratio of the fourth central moment (empirical if a sample, theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the fourth power, minus 3 (also referred to as excess kurtosis).

3.1.15 mean, n—of a population, µ, average or expected value of a characteristic in a population; of a sample, X̄, sum of the observed values in the sample divided by the sample size.

3.1.16 median, X̃, n—the 50th percentile in a population or sample.
3.1.16.1 Discussion—The sample median is the [(n + 1)/2] order statistic if the sample size n is odd and is the average of the [n/2] and [n/2 + 1] order statistics if n is even.

3.1.17 midrange, n—average of the minimum and maximum values in a sample.

3.1.18 null hypothesis, H₀, n—a statement about a parameter of a probability distribution or about the type of probability distribution, tentatively regarded as true until rejected using a statistical hypothesis test.

3.1.19 order statistic, x(k), n—value of the kth observed value in a sample after sorting by order of magnitude.
3.1.19.1 Discussion—For a sample of size n, the first order statistic x(1) is the minimum value, x(n) is the maximum value.

3.1.20 parameter, n—see population parameter.

3.1.21 percentile, n—quantile of a sample or a population, for which the fraction less than or equal to the value is expressed as a percentage.

3.1.22 population, n—the totality of items or units of material under consideration.

3.1.23 population parameter, n—summary measure of the values of some characteristic of a population. ISO 3534-2

3.1.24 power, n—in hypothesis testing, the probability that a statistical hypothesis test rejects a null hypothesis, calculated using an alternative hypothesis.

3.1.25 prediction interval, n—an interval for a future value or set of values, constructed from a current set of data, in a way that has a specified probability for the inclusion of the future value.

3.1.26 p-value, n—in hypothesis testing, the probability of observing a test statistic at least as extreme as what was actually obtained, under the assumption of the null hypothesis.
3.1.26.1 Discussion—p-value must not be thought of as the probability the null hypothesis is true.

3.1.27 quantile, n—value such that a fraction f of the sample or population is less than or equal to that value.

3.1.28 range, R, n—maximum value minus the minimum value in a sample.

3.1.29 residual, n—the observed value minus fitted value, when a regression model is used. E3080

3.1.30 sample, n—a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection.

3.1.31 sample size, n, n—number of observed values in the sample.

3.1.32 sample statistic, n—summary measure of the observed values of a sample.

3.1.33 significance level, α, n—the probability a hypothesis test would reject the null hypothesis, based on the distribution of the test statistic and assuming the null hypothesis to be true.
3.1.33.1 Discussion—For a composite hypothesis, the maximum probability of rejecting.

3.1.34 skewness, γ₁, g₁, n—for population or sample, a measure of symmetry of a distribution, calculated as the ratio of the third central moment (empirical if a sample, and theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the third power.

3.1.35 standard error, n—standard deviation of the population of values of a sample statistic in repeated sampling, or an estimate of it.
3.1.35.1 Discussion—If the standard error of a statistic is estimated, it will itself be a statistic with some variance that depends on the sample size.

3.1.36 standard deviation, n—of a population, σ, the square root of the average or expected value of the squared deviation of a variable from its mean; of a sample, s, the square root of the sum of the squared deviations of the observed values in the sample from their mean divided by the sample size minus 1.

3.1.37 statistic, n—see sample statistic.

3.1.38 statistical hypothesis test, n—a procedure and decision criteria used to decide whether or not to reject a null hypothesis.
3.1.38.1 Discussion—Synonyms include statistical test, hypothesis test, and significance test.

3.1.39 test statistic, n—a statistic, calculable from the sample observations of the variable of interest, whose probability distribution is known under the assumption of a null hypothesis.

3.1.40 tolerance interval, n—an interval to contain at least a given proportion, p, of a process output or population, constructed using some confidence level, C.
3.1.40.1 Discussion—Parameters p and C are the coverage proportion and confidence probability, respectively. A 99/95 tolerance interval means at least 99 % of the population is contained or covered with 95 % confidence.

3.1.41 type I error, n—the error of rejecting a null hypothesis when it is actually true.

3.1.42 type II error, n—the error of not rejecting a null hypothesis when it is actually false.

3.1.43 variance, σ², s², n—square of the standard deviation of the population or sample.
3.1.43.1 Discussion—For a finite population, σ² is calculated as the sum of squared deviations of values from the mean, divided by n. For a continuous population, σ² is calculated by integrating (x − µ)² with respect to the density function. For a sample, s² is calculated as the sum of the squared deviations of observed values from their average divided by one less than the sample size.

3.1.44 Z-score, n—observed value minus the sample mean divided by the sample standard deviation.

4. Significance and Use

4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.
4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.

4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a "1" indicates that the inspected item exhibited the attribute being studied and a "0" indicates the item did not exhibit the attribute. Each inspection item is assigned either a "0" or a "1." Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval is recorded for each of n inspection intervals. The Poisson distribution often governs counting events over an interval.

4.3 For sample data to be used to draw conclusions about the population, the process of sampling and data collection must be considered, at least potentially, repeatable. Descriptive statistics are calculated using real sample data that will vary in repeating the sampling process. As such, a statistic is a random variable subject to variation in its own right. The sample statistic usually has a corresponding parameter in the population that is unknown (see Section 5). The point of using a statistic is to summarize the data set and estimate a corresponding population characteristic or parameter, or to test a hypothesis.

4.4 Descriptive statistics consider numerical, tabular, and graphical methods for summarizing a set of data. The methods considered in this practice are used for summarizing the observations from a single variable. The descriptive statistics described in this practice are: mean, median, min, max, range, midrange, order statistic, quartile, empirical percentile, quantile, interquartile range, variance, standard deviation, Z-score, coefficient of variation, and skewness and kurtosis.

4.5 Statistical inference is drawing conclusions about the population or its parameters. Methods for statistical inference described in this practice are: degrees of freedom, standard error, confidence intervals, prediction intervals, tolerance intervals, and statistical hypothesis tests.

4.6 Tabular methods described in this practice are: frequency distribution, relative frequency distribution, cumulative frequency distribution, and cumulative relative frequency distribution.

4.7 Graphical methods described in this practice are: histogram, ogive, boxplot, dotplot, normal probability plot, and q-q plot.

4.8 While the methods described in this practice may be used to summarize any set of observations, the results obtained ...

Footnotes:
This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling / Statistics. Current edition approved April 1, 2019. Published May 2019. Originally approved in 2007. Last previous edition approved in 2018 as E2586 − 18. DOI: 10.1520/E2586-19E01.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
ISO standards are available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
4.1.2 The sample is selected from a larger set called the
by using them may be of little value from the standpoint of
population. The population can be a finite set of items, a very
interpretation unless the data quality is acceptable and satisfies
large or essentially unlimited set of items, or a process. In a
certain requirements. To be useful for inductive generalization,
process, the items originate over time and the population is
any sample of observations that is treated as a single group for
dynamic, continuing to emerge and possibly change over time.
presentationpurposesmustrepresentaseriesofmeasurements,
Sample data serve as representatives of the population from
all made under essentially the same test conditions, on a
which the sample originates. It is the population that is of
material or product, all of which have been produced under
primary interest in any particular study.
essentially the same conditions. When these criteria are met,
we are minimizing the danger of mixing two or more distinctly
4.2 The data (measurements and observations) may be of
the variable type or the simple attribute type. In the case of different sets of data.
E2586 − 19ϵ1

FIG. 1 Probability Density Function—Four Examples of Distribution Shape

FIG. 2 Cumulative Distribution Function, F(x), and Density Function, f(x) Relationship
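The f(x)–F(x) relationship pictured in Fig. 2, and the area-equals-probability idea of Eq 2 (see 5.1.1), can be sketched numerically. The following is a minimal illustration, not part of the standard; the normal density with µ = 30 and σ = 5 and the step count are assumptions chosen to match the figure's F(30) = 0.5:

```python
from math import exp, pi, sqrt

MU, SIGMA = 30.0, 5.0  # hypothetical normal parameters, chosen so F(30) = 0.5 as in Fig. 2

def f(x):
    # Normal probability density function with mean MU and standard deviation SIGMA
    return exp(-((x - MU) ** 2) / (2 * SIGMA ** 2)) / (SIGMA * sqrt(2 * pi))

def prob_between(s, t, steps=100_000):
    # Eq 2: P(s < X <= t) is the area under f between s and t (trapezoidal rule)
    h = (t - s) / steps
    interior = sum(f(s + i * h) for i in range(1, steps))
    return h * (0.5 * (f(s) + f(t)) + interior)

print(round(prob_between(25, 35), 4))  # mu +/- 1 sigma: ~0.6827, first row of Table 1
print(round(prob_between(0, 30), 4))   # F(30): ~0.5, so x = 30 is the median (5.1.4)
```

Any density meeting the requirements of 5.1 (nonnegative ordinates, total area 1) could be substituted for `f` here.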
4.8.1 If a given collection of data consists of two or more samples collected under different test conditions or representing material produced under different conditions (that is, different populations), it should be considered as two or more separate subgroups of observations, each to be treated independently in a data analysis program. Merging of such subgroups, representing significantly different conditions, may lead to a presentation that will be of little practical value. Briefly, any sample of observations to which these methods are applied should be homogeneous or, in the case of a process, have originated from a process in a state of statistical control.

4.9 The methods developed in Sections 6, 7, 8, and 9 apply to the sample data. There will be no misunderstanding when, for example, the term "mean" is indicated, that the meaning is sample mean, not population mean, unless indicated otherwise. It is understood that there is a data set containing n observations. The data set may be denoted as:

x1, x2, x3, …, xn   (1)

4.9.1 There is no order of magnitude implied by the subscript notation unless subscripts are contained in parentheses (see 6.7).

5. Characteristics of Populations

5.1 A population is the totality of a set of items under consideration. Populations may be finite or unlimited in size and may be existing or continuing to emerge as, for example, in a process. For continuous variables, X, representing an essentially unlimited population or a process, the population is mathematically characterized by a probability density function, f(x). The density function visually describes the shape of the distribution, as for example in Fig. 1. Mathematically, the only requirements of a density function are that its ordinates be all positive and that the total area under the curve be equal to 1.

5.1.1 Area under the density function curve is equivalent to probability for the variable X. The probability that X shall occur between any two values, say s and t, is given by the area under the curve bounded by the two given values of s and t. This is expressed mathematically as a definite integral over the density function between s and t:

P(s < X ≤ t) = ∫[s to t] f(x) dx   (2)

5.1.2 A great variety of distribution shapes are theoretically possible. When the curve is symmetric, we say that the distribution is symmetric; otherwise, it is asymmetric. A distribution having a longer tail on the right side is called right skewed; a distribution having a longer tail on the left is called left skewed.

5.1.3 For a given density function, f(x), the relationship to cumulative area under the curve may be graphically shown in the form of a cumulative distribution function, F(x). The function F(x) plots the cumulative area under f(x) as x moves to the right. Fig. 2 shows a symmetric distribution with its density function, f(x), plotted on the left-hand axis and distribution function, F(x), plotted on the right-hand axis.

5.1.4 Referring to the F(x) axis in Fig. 2, observe that F(30) = 0.5. The point x = 30 divides the distribution into two equal halves with respect to probability (50 % on each side of x). In general, where F(x) = 0.5, we call the point x the median or 50th percentile of the distribution. In like manner, we may define any percentile, for example, the 25th or the 90th percentiles. In general, for 0 < p < 1, a 100p % percentile is a location point, Qp, that divides the distribution into two parts, with 100p % lying to the left and (1 − p)100 % lying to the right.

5.2 A density function is often given as an equation with one or more parameters, which, when given values, allow the curve to be drawn. For many distributions, two parameters are sufficient (some have one parameter and others have more than two). The parameters may also have meaning with respect to the shape of the curve, the scale used, or some other property of the curve. (In the same way, a straight line, y = mx + b, has "parameters" referred to as the slope, m, and y-intercept, b. Once these parameters are known, the line is completely known and may be drawn precisely.)

5.2.1 The mean or "expected value" of a distribution, denoted by the symbol µ, is a parameter that defines the central location of a distribution. The mean can be thought of as a "center of gravity" for the distribution. When the distribution is symmetric, the mean will coincide with the 50th percentile and occur exactly in the center, splitting the area under the curve into two equal halves of 0.5 each. For right-skewed distributions, the mean will occur to the right of the median; for left-skewed distributions, the mean will occur to the left of the median.

5.2.2 The standard deviation, denoted by the symbol σ, is another important parameter in many distributions. It carries the same units as the variable X, and is also called a scale parameter. Generally, it is a standard measure of variability. The larger the value of σ, the greater will be the variation in the variable X. One of the most important theoretical distributions in statistics is the normal, or Gaussian, distribution. It arises in complex phenomena when many uncontrolled factor effects cause variability and no single effect is of dominating magnitude. The normal distribution is a symmetrical, bell-shaped curve and is completely determined by its mean, µ, and its standard deviation, σ. The parameter µ locates the center, or peak, of the distribution, and the parameter σ determines its spread. The distance from the mean to the inflection point of the curve (maximum slope point) is σ. This is illustrated in Fig. 3.

FIG. 3 Normal Distribution and Relationship to Parameters µ and σ

5.2.3 The probability of obtaining a value in a given interval on the measurement scale is the area under the curve over the interval. This gives some numerical meaning to the parameter σ. Table 1 gives the normal probability for several selected intervals in terms of parameters µ and σ. The first two columns in Table 1 are known as the empirical rule for symmetric and mound-shaped distributions.

TABLE 1 Areas Under the Curve for the Normal Distribution

  Interval    Area       Interval       Area
  µ ± 1σ      0.68270    µ ± 0.674σ     0.50
  µ ± 2σ      0.95450    µ ± 1.645σ     0.90
  µ ± 3σ      0.99730    µ ± 1.960σ     0.95
  µ ± 4σ      0.99994    µ ± 2.576σ     0.99

5.2.4 The variance of a distribution, σ², is the square of the standard deviation. It is the average value of the quantity (X − µ)² in the population. It is the variance that is computed first, and then the standard deviation is the positive square root of the variance. For a population specified by a density function, f(x), the theoretical mean and variance are defined mathematically as:

µ = ∫[−∞ to ∞] x f(x) dx   (3)

σ² = ∫[−∞ to ∞] (x − µ)² f(x) dx   (4)

5.2.5 Here the variable X is assumed to take on all values in the interval (−∞, +∞), but this need not be the case.

5.3 In addition to the mean and standard deviation, measures may be theoretically defined that attempt to describe the general shape of a distribution. Two such quantities are skewness and kurtosis. For a continuous variable, X, skewness is defined as the average value of the quantity (X − µ)³/σ³, and kurtosis as the average value of the quantity (X − µ)⁴/σ⁴, minus 3. Each of these calculations is taken over the population. The symbols used for the theoretical skewness and kurtosis are γ1 and γ2, respectively. For a population specified by a density function, f(x), the theoretical skewness and kurtosis are defined mathematically as:

γ1 = ∫[−∞ to ∞] (x − µ)³ f(x) dx / σ³   (5)

γ2 = ∫[−∞ to ∞] (x − µ)⁴ f(x) dx / σ⁴ − 3   (6)

5.3.1 Here again, the variable X is assumed to take on all values in the interval (−∞, +∞).

5.3.2 When a distribution is perfectly symmetric, γ1 = 0. This is the case for the normal distribution in Fig. 3. If the distribution has a longer tail on the right, we say that it is right skewed and γ1 > 0, as in Fig. 4. If the distribution has a longer tail on the left, we say that it is left skewed and γ1 < 0, as in Fig. 5.

5.3.3 For the normal distribution (Fig. 3), γ2 = 0. The large base of applications for the normal distribution is the reason for subtracting 3 in the definition of kurtosis. Subtracting 3 in (6) makes γ2 = 0 for the normal distribution. For any distribution, the quantity γ2 cannot be less than –2 (1). Several examples of skewness and kurtosis as related to specific distributions are given in Table 2. (The boldface numbers in parentheses refer to a list of references at the end of this standard.)

5.3.4 Table 2 shows that there is great variation in both skewness and kurtosis for several commonly occurring distributions. Also, for some distributions such as the normal, exponential, and uniform, skewness and kurtosis are constant and not dependent on the value of any other parameter; for others, however, skewness and kurtosis are a function of some other parameter. Here we see that for the Poisson distribution, both γ1 and γ2 are functions of the mean, λ. For the Weibull distribution, both γ1 and γ2 are functions of the Weibull shape parameter β.
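Eq 5 and Eq 6 can be checked numerically against the closed-form values tabulated for the exponential distribution (skewness 2, kurtosis 6) in Table 2. This is an illustrative sketch, not part of the standard; the unit-rate density, integration range, and step count are assumptions:

```python
from math import exp

def central_moment_exponential(r, upper=50.0, steps=200_000):
    # Integrate (x - mu)^r f(x) dx for the unit-rate exponential f(x) = e^(-x),
    # x >= 0, which has mu = 1 and sigma = 1 (trapezoidal rule).
    mu = 1.0
    h = upper / steps
    total = 0.5 * ((0.0 - mu) ** r + (upper - mu) ** r * exp(-upper))
    total += sum((i * h - mu) ** r * exp(-i * h) for i in range(1, steps))
    return total * h

sigma = 1.0
gamma1 = central_moment_exponential(3) / sigma ** 3      # Eq 5: skewness
gamma2 = central_moment_exponential(4) / sigma ** 4 - 3  # Eq 6: kurtosis
print(round(gamma1, 3), round(gamma2, 3))  # Table 2 lists 2 and 6 for the exponential
```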
FIG. 4 Curve with Positive Skewness, γ1 > 0

FIG. 5 Curve with Negative Skewness, γ1 < 0

TABLE 2 Skewness and Kurtosis for Selected Distribution Forms

  Distribution Form     Skewness    Kurtosis
  Normal                0           0
  Exponential           2           6
  Uniform               0           –1.2
  Poisson^A             1/√λ        1/λ
  Student's t^B         0           6/(v – 4)
  Weibull^C, β = 3.6    0           –0.28
  Weibull, β = 0.5      6.62        84.72
  Weibull, β = 50.0     –1          1.9

  A For the Poisson distribution, λ is the mean.
  B For the Student's t distribution, v is the degrees of freedom. When v ≤ 4, kurtosis is infinite.
  C For the Weibull distribution, β is the shape parameter.

5.4 Statistics is the study of the properties, behavior, and treatment of numerical data. A statistic may be defined as any function of the data values that originate from a sample. In many applications in which one has a specific model in mind, the initial goal is to try to estimate the population (model) parameters using the sample data. These estimates are called descriptive statistics. For example, the sample mean and standard deviation are attempting to estimate the parameters µ and σ, sample skewness and kurtosis are attempting to estimate γ1 and γ2, and sample percentiles may be calculated that are attempting to estimate population percentiles. In some cases, there may be more than one statistic that may be used for the same purpose.

5.4.1 In addition to estimation, descriptive statistics serve to organize and give meaning to the raw sample data. By itself, a set of numbers in columnar format may yield little useful information. The methods of descriptive statistics include numerical, tabular, and graphical methods that will lead to great insight for the underlying phenomena being studied.

6. Descriptive Statistics

6.1 Mean or Arithmetic Average—The mean is a measure of centrality or central tendency of a distribution of observations. It is most appropriate for symmetric distributions and is affected by distribution nonsymmetry (shape) and extreme values. The calculation of the mean is the sum of the n sample values divided by the number of values, n. This equation is:

x̄ = (Σ[i=1 to n] xi) / n   (7)

6.2 Median or 50th Percentile—The median is a measure of centrality or central tendency that is generally not affected by the extremes of the distribution. It is a value that divides the distribution into two equal parts. For continuous distributions, 50 % will lie to the left and 50 % to the right of the median. To obtain the 50th percentile of a sample, arrange the n values of a sample in increasing order of magnitude. The median is the [(n + 1)⁄2]th value when n is odd. When n is even, the median lies between the (n/2)th and the [(n/2) + 1]th values and is not defined uniquely among the data values. It is then taken to be the arithmetic average of these two values.

6.2.1 As a measure of central tendency, the median is often preferred over the average, particularly for quantities that tend to be skewed in a natural way. Examples include life length of a product, salary, and other monetary quantities or any quantity that has a natural lower or upper bound.

6.3 Midrange—Midrange is a measure of central tendency. It is the average of the largest (max) and smallest (min) observed values in a sample of n items. It is greatly affected by any outliers in the data set.

6.4 Max—The largest observed value in a sample of n items.

6.5 Min—The smallest observed value in a sample of n items.

6.6 Range—The difference, R, between the largest and smallest observed value in a sample of n items is called the sample range and is used as a measure of variation. Its equation is:

R = max(x) − min(x)   (8)

6.6.1 The sample range is useful for assessing variation for two basic reasons: (1) it is easy to calculate, and (2) it is readily understood. But caution is advised when the sample size is modest to large, as the min and max then come from the tails of the distribution and can be extremely variable. The sample range is therefore directly affected by extreme values. In general, the standard deviation of a sample is the preferred measure of variation (see 6.12).

6.6.2 The range is particularly useful for small samples, say when n = 2 to 12, and there is possibly the burden of calculation, as the standard deviation is more calculation intensive and abstract. An important application occurs when the range is used in quality control applications. For a given sample size, the sample range can be converted into an estimate of the standard deviation. This is done by dividing the range, or average range in a group of ranges, by a constant (2), d2, which is the ratio of expected range in a sample of size n to standard deviation for a normal distribution. Table 3 contains values of d2 for sample sizes of 2 through 16.

6.6.3 An important application of this type of estimate for the standard deviation is in quality control charts. When there are available several sample ranges, all with the same sample size, n, we take the average range and divide by the appropriate constant, d2, from Table 3.
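The point statistics of 6.1 through 6.6 are straightforward to compute. The following is a minimal sketch; the five-value data set and the subset of Table 3 d2 constants are illustrative assumptions, not data from the standard:

```python
def mean(xs):
    # Eq 7: sum of the n sample values divided by n
    return sum(xs) / len(xs)

def median(xs):
    # 6.2: middle order statistic for odd n; average of the two middle values for even n
    s, n = sorted(xs), len(xs)
    m = n // 2
    return s[m] if n % 2 else (s[m - 1] + s[m]) / 2

def midrange(xs):
    # 6.3: average of the largest (max) and smallest (min) observed values
    return (min(xs) + max(xs)) / 2

def sample_range(xs):
    # Eq 8: R = max(x) - min(x)
    return max(xs) - min(xs)

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}  # excerpt from Table 3

def sigma_from_range(xs):
    # 6.6.2: divide the sample range by d2 to estimate the standard deviation
    return sample_range(xs) / D2[len(xs)]

data = [3, 1, 4, 1, 5]  # illustrative data
print(mean(data), median(data), midrange(data), sample_range(data))  # 2.8 3 3.0 4
print(round(sigma_from_range(data), 3))  # 4 / 2.326
```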
TABLE 3 Values of the Constant, d2, for Converting the Sample Range into an Estimate of Standard Deviation^A

  n    d2       n    d2       n    d2
  2    1.128    7    2.704    12   3.258
  3    1.693    8    2.847    13   3.336
  4    2.059    9    2.970    14   3.407
  5    2.326    10   3.078    15   3.472
  6    2.534    11   3.173    16   3.532

  A Source: ASTM Manual on Presentation of Data and Control Chart Analysis (2).

6.7 Order Statistics—When the observations in a sample are arranged in order of increasing magnitude, the order statistics are:

x(1) ≤ x(2) ≤ x(3) ≤ … ≤ x(n−1) ≤ x(n)   (9)

6.7.1 The bracketed subscript notation indicates that the value is an ordered value. Thus, x(k) is the kth smallest value among the n observations, called the kth order statistic of the sample. This value is said to have a rank of k among the sample values. In a sample of size n, the smallest observation is x(1) and the largest observation is x(n). The sample range may then be defined in terms of the 1st and nth order statistics:

R = x(n) − x(1)   (10)

6.8 Empirical Quantiles and Percentiles—A quantile is a value that divides a distribution to leave a given fraction, p, of the observations less than or equal to that value (0 < p < 1). A percentile is the same value in which the fraction, p, is expressed as a percent, 100p %. For example, the 0.5 quantile or 50th percentile (also called the median) is a value such that half of the observations exceed it and half are below it; the 0.75 quantile or 75th percentile is a value such that 25 % of the observations exceed it and 75 % are below it; the 0.9 quantile or 90th percentile is a value such that 10 % of the observations exceed it and 90 % are below it.

6.8.1 The sample estimate of a quantile or percentile is an order statistic or the weighted average of two adjacent order statistics. The ith order statistic in a sample of size n is the i/(n + 1) quantile or 100i/(n + 1)th percentile estimate. The quantity i/(n + 1) is referred to as the mean rank for the ith order statistic. In repeated sampling, the expected fraction of the population lying below the ith order statistic in the sample is equal to i/(n + 1) for any continuous population. (Several alternatives to the mean rank equation i/(n + 1) are available (3), including the median rank and Kaplan-Meier methods. An equation for the exact median rank is available but is computationally intensive. The Benard approximation to the median rank, (i – 0.3)⁄(n + 0.4), is widely used. The modified Kaplan-Meier equation is (i – 0.5)⁄n.)

6.8.2 To estimate the 100pth percentile, compute an approximate rank value using the following equation: i = (n + 1)p. If i is an integer between 1 and n inclusive, then the 100pth percentile is estimated as x(i). If i is not an integer, then drop the fractional portion and keep the integer portion of i. Let k be the retained integer portion and r be the dropped fractional portion (note that 0 < r < 1). The estimated 100pth percentile is computed from the equation:

x(k) + r(x(k+1) − x(k))   (11)

6.8.2.1 Example—For a sample of size 20, to estimate the 15th percentile, calculate (n + 1)p = 21(0.15) = 3.15, so k = 3 and r = 0.15. The 15th percentile is estimated as x(3) + 0.15(x(4) – x(3)).

6.9 Quartile—The 0.25 quantile or 25th percentile, Q1, is the 1st quartile. The 0.75 quantile or 75th percentile, Q3, is the 3rd quartile. The 50th percentile, or Q2, is the 2nd quartile. Note that the 50th percentile is also referred to as the median.

6.10 Interquartile Range—The difference between the 3rd and 1st quartiles is denoted as IQR:

IQR = Q3 − Q1   (12)

6.10.1 The IQR is sometimes used as an alternative estimator of the standard deviation by dividing by an appropriate constant. This is particularly true when several outlying observations are present and may be inflating the ordinary calculation of the standard deviation. The dividing constant will depend on the type of distribution being used. For example, in a normal distribution, the IQR will span 1.35 standard deviations; then dividing the sample IQR by 1.35 will give an estimate of the standard deviation when a normal distribution is used.

6.11 Variance—A measure of variation among a sample of n items, which is the sum of the squared deviations of the observations from their average value, divided by one less than the number of observations. It is calculated using one of the two following equations:

s² = Σ[i=1 to n] (xi − x̄)² / (n − 1) = [n Σ xi² − (Σ xi)²] / [n(n − 1)]   (13)

(These equations are algebraic equivalents, but the second form may be subject to round-off error.)

6.12 Standard Deviation—The standard deviation is the positive square root of the variance. The symbol is s. It is used to characterize the probable spread of the data set, but this use is dependent on distribution shape. For mound-shaped distributions that are symmetric, such as the normal form, and modest to large sample size, we may use the standard deviation in conjunction with the empirical rule (see Table 1). This rule states that approximately 68 % of the data will fall within one standard deviation of the mean; 95 % within two standard deviations; and nearly all (99.7 %) within three standard deviations. The approximations improve when the sample size is very large or unlimited and the underlying distribution is of the normal form. The rule is applied to other symmetric mound-shaped distributions based on their resemblance to the normal distribution. (When the denominator of the sample variance is taken as n instead of n – 1, the square root of this quantity is called the root mean squared deviation (RMS).)
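The percentile rule of 6.8.2 and the variance of Eq 13 can be implemented directly. The sketch below is illustrative, not part of the standard; it assumes 1-based order statistics on a sorted copy of the data and reproduces the arithmetic of Example 6.8.2.1 with a hypothetical ordered sample x(i) = i:

```python
def percentile_estimate(xs, p):
    # 6.8.2: compute i = (n + 1)p; if i is an integer, the estimate is x(i).
    # Otherwise split i into integer part k and fraction r and apply Eq 11:
    # x(k) + r * (x(k+1) - x(k)), with 1-based order statistics x(1) <= ... <= x(n).
    s = sorted(xs)
    i = (len(s) + 1) * p
    k = int(i)
    r = i - k
    if r == 0:
        return s[k - 1]
    return s[k - 1] + r * (s[k] - s[k - 1])

def sample_variance(xs):
    # Eq 13, first form: sum of squared deviations from the mean over n - 1
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Example 6.8.2.1: n = 20 and p = 0.15 give i = 3.15, so k = 3 and r = 0.15,
# and with x(i) = i the estimate is x(3) + 0.15*(x(4) - x(3)) = 3.15.
xs = list(range(1, 21))
print(round(percentile_estimate(xs, 0.15), 2))  # 3.15
```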
6.13 Z-Score—In a sample of n distinct observations, every sample value has an associated Z-score. For sample value xi, the associated Z-score is computed as the number of standard deviations that the value xi lies from the sample mean. Positive Z-scores mean that the observation is to the right of the average; negative values mean that the observation is to the left of the average. Z-scores are calculated as:

Zi = (xi − x̄) / s   (14)

TABLE 4 Maximum Z-Scores Attainable for a Selected Sample Size, n

  n       3       5       10      11      15      18
  Z(n)    1.155   1.789   2.846   3.015   3.615   4.007

6.13.1 Sample Z-scores are often useful for comparing the relative rank or merit of individual items in the sample. Z-scores are also used to help identify possible outliers in a set of data. There is a much-used rule of thumb that a Z-score outside the bounds of ±3 is a possible outlier to be examined for a special cause. Care should be exercised when using this rule, particularly for very small as well as very large sample sizes. For small sample sizes, it is not possible to obtain a Z-score outside the bounds of ±3 unless n is at least 11. Eq 15 and Table 4 illustrate this theory:

|Zi| ≤ (n − 1)/√n   (15)

6.13.2 Table 4 was constructed using the equation for the maximum (contained in Ref (4)).
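Eq 14 and the bound of Eq 15 can be verified in a few lines. The data set below is an illustrative assumption, and the loop reproduces the first four entries of Table 4:

```python
from math import sqrt

def z_scores(xs):
    # Eq 14: Z_i = (x_i - xbar) / s, with s the n-1 sample standard deviation
    n = len(xs)
    xbar = sum(xs) / n
    s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return [(x - xbar) / s for x in xs]

def max_attainable_z(n):
    # Eq 15: |Z_i| <= (n - 1) / sqrt(n), the maximum tabulated in Table 4
    return (n - 1) / sqrt(n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
assert all(abs(z) <= max_attainable_z(len(data)) for z in z_scores(data))

for n in (3, 5, 10, 11):
    print(n, round(max_attainable_z(n), 3))  # 1.155, 1.789, 2.846, 3.015
```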
6.13.3 On the other hand, for very large sample sizes, such as n = 250 or more, it is a common occurrence in practice to find at least one Z-score outside the range of ±3. Where we can claim a normal distribution is the underlying model, the probability of at least one Z-score beyond ±3 is approximately 50 % when the sample size is around 250. At n = 300, it is approximately 55 %. A thorough treatment of the use of the sample Z-score for detecting possible outlying observations may be found in Practice E178.

6.14 Coefficient of Variation—For a non-negative characteristic, the coefficient of variation is the ratio of the standard deviation to the average.

6.15 Skewness, g1—Skewness is a measure of the shape of a distribution. It characterizes asymmetry or skew in a distribution. It may be positive or negative. If the distribution has a longer tail on the right side, the skewness will be positive; if the distribution has a longer tail on the left side, the skewness will be negative. For a distribution that is perfectly symmetrical, the skewness will be equal to 0; however, if the skewness is equal to 0, this does not imply that the distribution is symmetric. (For example, an F distribution having four degrees of freedom in the denominator always has a theoretical skewness of 0, yet this distribution is not symmetric. Also, see Ref (5), Chapter 27, for further discussion.)

6.16 Kurtosis, g2—Kurtosis is a measure of the combined weight of the tails of a distribution relative to the rest of the distribution.

6.16.1 Sample skewness and kurtosis are given by the equations:

g1 = Σ[i=1 to n] (xi − x̄)³ / (n s³),  g2 = Σ[i=1 to n] (xi − x̄)⁴ / (n s⁴) − 3   (16)

6.16.2 Alternative estimates of skewness and kurtosis are defined in terms of k-statistics. The k-statistic equations have the advantage of being less biased than the corresponding moment estimators. These statistics are defined by:

k1 = x̄,  k2 = s²,  k3 = n Σ[i=1 to n] (xi − x̄)³ / [(n − 1)(n − 2)]   (17)

k4 = n(n + 1) Σ[i=1 to n] (xi − x̄)⁴ / [(n − 1)(n − 2)(n − 3)] − 3 [Σ[i=1 to n] (xi − x̄)²]² / [(n − 2)(n − 3)]   (18)

6.16.3 From the k-statistics, sample skewness and kurtosis are calculated from Eq 19. Notice that when n is large, g1 and g2 reduce to approximately:

g1 ≅ k3 / k2^(3/2),  g2 ≅ k4 / k2²   (19)

6.16.4 One cannot definitely infer anything about the shape of a distribution from knowledge of g2 unless we are willing to assume some theoretical distribution such as the Pearson or other distribution family provides.

7. Statistical Methods and Inference

7.1 Degrees of Freedom:

7.1.1 The term "degrees of freedom" is used in several ways in statistics. First, it is used to denote the number of items in a sample that are free to vary and not constrained in any way when estimating a parameter. For example, the deviations of n observations from their sample average must of necessity sum to zero. This property, that Σ(yi − ȳ) = 0, constitutes a linear constraint on the sum of the n deviations or residuals y1 − ȳ, y2 − ȳ, …, yn − ȳ used in calculating the sample variance, s² = Σ(yi − ȳ)² ⁄ (n − 1). When any n − 1 of the deviations are known, the nth is determined by this constraint; thus only n − 1 of the n sample values are free to vary. This implies that knowledge of any n − 1 of the residuals completely determines the last one. The n residuals, yi − ȳ, and hence their sum of squares Σ(yi − ȳ)² and the sample variance Σ(yi − ȳ)² ⁄ (n − 1), are said to have n − 1 degrees of freedom. The loss of one degree of freedom is associated with the need to replace the unknown population mean µ by the sample average ȳ. Note that there is no requirement that Σ(yi − µ) = 0. In estimating a parameter, such as a variance as described above, we have to estimate the mean µ using the sample average ȳ. In doing so, we lose 1 degree of freedom.

7.1.1.1 More generally, when we have to estimate k parameters, we lose k degrees of freedom. In simple linear regression, where there are n pairs of data (xi, yi) and the problem is to fit a linear model of the form y = mx + b through the data, there are two parameters (m and b) that must be estimated, and we effectively lose 2 degrees of freedom when calculating the residual variance. The concept is further extended to multiple regression, where there are k parameters that must be estimated, and to other types of statistical methods where parameters must be estimated.
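The linear constraint discussed in 7.1.1, Σ(yi − ȳ) = 0, can be demonstrated numerically: given any n − 1 residuals, the last one is forced, so only n − 1 are free to vary. The data values below are an illustrative assumption:

```python
ys = [2.0, 3.5, 1.5, 4.0, 5.0]  # illustrative sample
ybar = sum(ys) / len(ys)
residuals = [y - ybar for y in ys]

# The n residuals always sum to zero (up to floating-point rounding) ...
print(abs(sum(residuals)) < 1e-12)  # True

# ... so knowing any n - 1 of them determines the last one.
last_from_constraint = -sum(residuals[:-1])
print(abs(last_from_constraint - residuals[-1]) < 1e-12)  # True
```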
7.1.2 Degrees of freedom are also used as an indexing variable for certain types of probability distributions associated with the normal form. There are three important distributions that use this concept: the Student's t and chi-square distributions both use one parameter in their definition. The parameter in each case is referred to as its "degrees of freedom." The F distribution requires two parameters, both of which are referred to as "degrees of freedom." In what follows we assume that there is a process in statistical control that follows a normal distribution with mean µ and standard deviation σ.

7.1.2.1 Student's t Distribution—For a random sample of size n where x̄ and s are the sample mean and standard deviation, respectively, the following has a Student's t distribution with n − 1 degrees of freedom:

t = (x̄ − µ) / (s ⁄ √n)   (20)

The t distribution is used to construct confidence intervals for means when σ is unknown and to test a statistical hypothesis concerning means, among other uses.

7.1.2.2 The Chi-Square Distribution—For a random sample of size n where s is the sample standard deviation, the following has a chi-square distribution with n − 1 degrees of freedom:

q = (n − 1)s² / σ²   (21)

The chi-square distribution is used to construct a confidence interval for an unknown variance; in testing a hypothesis concerning a variance; in determining the goodness of fit between a set of sample data and a hypothetical distribution …

… be a nonconforming unit. Often, the population being sampled is conceptual—that is, a process with some unknown nonconforming fraction, p.

7.2.1.1 If an indicator variable, X, is defined as Xi = 1 when the unit is nonconforming and 0 if not, then the statistic of interest may be defined as:

p̂ = (Σ[i=1 to n] Xi) / n   (23)

7.2.1.2 In some applications, such as in quality control, there are k samples each of size n. Each sample gives rise to a separate estimate of p. Then the statistic of interest may be defined as:

p̄ = (Σ[i=1 to k] p̂i) / k   (24)

7.2.1.3 The bar over the "p" indicates that this is an average of the sample fractions, which estimates the unknown probability p. The binomial distribution is the basis of the p and np charts found in classical quality control applications.

7.2.2 Case 2—Poisson Simple Count Data—If an inspection process counts the number of nonconformities or "events" over some fixed inspection area (either a fixed volume, area, time, or spatial interval), the estimate of the mean is identical to the equation in 6.1. We refer to this as the estimate of the mean number of events expected to occur within the interval,
...
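The statistics of Eq 20, Eq 21, and Eq 23 above can be sketched as plain functions. The example inputs are illustrative assumptions, not data from the standard:

```python
from math import sqrt

def t_statistic(xs, mu):
    # Eq 20: t = (xbar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom
    n = len(xs)
    xbar = sum(xs) / n
    s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return (xbar - mu) / (s / sqrt(n))

def chi_square_statistic(xs, sigma):
    # Eq 21: q = (n - 1) s^2 / sigma^2, with n - 1 degrees of freedom
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return (n - 1) * s2 / sigma ** 2

def p_hat(indicators):
    # Eq 23: fraction nonconforming from 0/1 indicator data
    return sum(indicators) / len(indicators)

sample = [9.8, 10.2, 10.1, 9.9, 10.0]        # illustrative measurements
print(round(t_statistic(sample, 10.0), 3))   # 0.0: sample mean equals the null mean
print(round(chi_square_statistic(sample, 0.2), 2))  # 2.5
print(p_hat([1, 0, 0, 1, 0, 0, 0, 0]))       # 0.25
```

Comparing these computed values against t or chi-square critical values (not shown here) is the hypothesis-testing use described in 7.1.2.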
