ASTM E2310-04
Standard Guide for Use of Spectral Searching by Curve Matching Algorithms with Data Recorded Using Mid-infrared Spectroscopy
ABSTRACT
This guide presents the use of spectral searching by curve matching search algorithms for data recorded using mid-infrared spectroscopy. The methods described herein may be applicable to the use of these algorithms for other types of spectroscopic data, but each type of data search should be assessed separately. The purpose of a spectral search is the classification and, where possible, identification of the unknown. Spectral searching is intended as a screening method to assist the analyst and is not an absolute identification technique; it is not intended to replace an expert in infrared spectroscopy and should not be used without suitable training. The Euclidean distance algorithm and the first derivative Euclidean distance algorithm are described and their use discussed. The theory and common assumptions made when using search algorithms are also discussed, along with guidelines for the use and interpretation of the search results.
SCOPE
1.1 Spectral searching is the process whereby a spectrum of an unknown material is evaluated against a library (database) of digitally recorded reference spectra. The purpose of this evaluation is classification of the unknown and, where possible, identification of the unknown. Spectral searching is intended as a screening method to assist the analyst and is not an absolute identification technique. Spectral searching is not intended to replace an expert in infrared spectroscopy. Spectral searching should not be used without suitable training.
1.2 The user of this document should be aware that the results of a spectral search can be affected by the following factors described in Section 5: (1) baselines, (2) sample purity, (3) absorbance linearity (Beer's Law), (4) sample thickness, (5) sample technique and preparation, (6) physical state of the sample, (7) wavenumber range, (8) spectral resolution, and (9) choice of algorithm.
1.2.1 Many other factors can affect spectral searching results.
1.3 The scope of this document is to provide a guide for the use of search algorithms for mid-infrared spectroscopy. The methods described herein may be applicable to the use of these algorithms for other types of spectroscopic data, but each type of data search should be assessed separately.
1.4 The Euclidean distance algorithm and the first derivative Euclidean distance algorithm are described and their use discussed. The theory and common assumptions made when using search algorithms are also discussed, along with guidelines for the use and interpretation of the search results.
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E2310–04
Standard Guide for
Use of Spectral Searching by Curve Matching Algorithms
with Data Recorded Using Mid-infrared Spectroscopy
This standard is issued under the fixed designation E 2310; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
2. Referenced Documents
2.1 ASTM Standards:
E 131 Terminology Relating to Molecular Spectroscopy
E 334 Practice for General Techniques of Infrared Microanalysis
E 573 Practices for Internal Reflectance Spectroscopy
E 1252 Practice for General Techniques of Qualitative Infrared Analysis
E 1642 Practice for General Techniques of Gas Chromatography Infrared (GC/IR) Analysis
E 2105 Practice for General Techniques of Thermogravimetric Analysis (TGA) Coupled with Infrared Analysis (TGA/IR)
E 2106 Practice for General Techniques of Liquid Chromatography—Infrared (LC/IR) and Size Exclusion Chromatography—Infrared (SEC/IR)
3. Terminology
3.1 Definitions—For general definitions of terms and symbols, refer to Terminology E 131.
3.1.1 reference spectrum—an established spectrum of a known compound or chemical sample.
3.1.1.1 Discussion—This spectrum is typically stored in retrievable format so that it may be compared against the sample spectrum of an analyte.
3.1.1.2 Discussion—This term has sometimes been used to refer to a background spectrum; such usage is not recommended.
3.1.2 spectral searching—the process whereby a spectrum of an unknown material is evaluated against a library of digital reference spectra. Each reference spectrum in the library is individually compared to the spectrum of the unknown, and assigned a numerical value as to the goodness of fit. To perform this comparison, each data point in the unknown spectrum is compared to each corresponding point in the reference spectrum.
3.1.3 peak searching—the process whereby the peak table of the spectrum of an unknown material is evaluated against a library of peak tables. Each reference spectrum in the library contains a peak table and the peak table is individually compared to the peak table of the unknown, and assigned a numerical value as to the goodness of fit.
3.1.4 spectral library—a collection of reference spectra stored in a computer readable form, also called a library, database, or spectral database.
This guide is under the jurisdiction of ASTM Committee E13 on Molecular Spectroscopy and is the direct responsibility of Subcommittee E13.03 on Infrared Spectroscopy. Current edition approved Feb. 1, 2004. Published Feb. 2004.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
3.1.5 search algorithm—the mathematical formula used to make a point-by-point comparison of two spectra.
3.1.6 hit quality value—the spectral search software compares each spectrum in the database to that of the unknown, and assigns a numeric value for each library entry demonstrating how similar the two spectra are.
3.1.6.1 Discussion—There are several methods for assigning Hit Quality values and either a high or low value can be assigned as the best match. Refer to the software manufacturer's documentation.
3.1.7 hit quality index (HQI)—a table which ranks the library spectra in the database according to their Hit Quality values (see 7.5).
3.1.8 Euclidean Distance algorithm—the Euclidean Distance algorithm measures the Euclidean distance between each library spectrum and the unknown spectrum by treating the spectra as normalized vectors. The closeness of the match, or HQI, is calculated from the square root of the sum of the squares of the difference between the vectors for the unknown spectrum and each library spectrum.
3.1.9 First Derivative Euclidean Distance algorithm—in the First Derivative Euclidean Distance algorithm the Euclidean distance is also computed, except the derivative of each spectrum is calculated prior to the Euclidean distance calculation.
3.1.10 normalization—the mathematical technique used to compensate for an intensity difference between two spectra (see 5.1).
4. Theory
4.1 Beer's Law—One of the basic principles that make spectral searching possible is Beer's Law (see Terminology E 131), which states that A = abc, where A is the absorbance, a is the absorptivity, b is the sample pathlength, and c is the concentration of the analyte of interest. As long as Beer's Law applies, two spectra of the same material recorded under similar conditions can be made to appear the same by normalization of the data.
NOTE 1—In an ideal case, this is true for transmittance spectra, but there are differences in the spectral peak intensities when reflectance spectra are compared to transmittance spectra.
5. Spectral Data Pre-Treatment
5.1 Normalization:
5.1.1 Normalization of spectra compensates for the differences in sample quantity (concentration or pathlength, or both) used to generate the reference spectra in the library and that of the unknown. The spectra are normalized over the complete spectral range of the library. When searching less than the full spectral range of the library, the spectra must be re-normalized over the new range before an accurate comparison can be made. Normalization of a spectrum for library searching is a two step process. First, the minimum absorbance value in the selected spectral range is subtracted from all the absorbances in the same range. The resulting values are then scaled by dividing by the maximum result value in the range. The end result is a spectrum (or a sub-range portion of a spectrum) where the minimum value is zero (0) and the maximum is one (1) absorbance. If the range chosen for normalization has only one or two strong bands and a few medium intensity bands, the range of the spectrum must be reselected or the spectrum will be dominated by the strong bands in the spectrum and the HQI will be insensitive to weaker fingerprint bands necessary for identification of a specific compound. Successful compound identification may require the spectral match exclude the strongest bands, then the normalization will be based on a medium intensity band, and weak fingerprint bands will be emphasized in the HQI.
5.2 Data Point Matching:
5.2.1 The algorithms used for searching a spectrum against a library use a calculation that mathematically compares the data points of the spectrum being searched to the data points of the spectra in the library. This requires that the data points in both the sample and library spectra occur at the same frequency. If the data points in the sample and library spectra are not aligned in this manner, then one of the spectra must be mathematically altered (interpolated) to make the data points match. Typically the unknown spectrum being searched is altered to match the data point spacing of the spectra in the library.
5.2.2 Data point matching is commonly accomplished using a linear data point interpolation method. In this method, the slope and offset of a line segment is calculated between the absorbances of every pair of data points in the spectrum. A new set of absorbances is calculated by locating the values that occur on the line segments at positions corresponding to the data point frequency of the library spectrum.
6. Conditions or Issues Affecting Results
6.1 Spectral quality is one of the primary conditions or issues that can affect search results. There is no substitute for a carefully recorded spectrum. There are several conditions or issues that affect spectral quality as pertains to spectral searching. These conditions or issues apply to both the spectra used to create the reference database and to the unknown spectrum.
6.2 Baselines:
6.2.1 A flat baseline is preferred for the Euclidean distance algorithm as the Euclidean distance algorithm compares each data point in the unknown spectrum to the corresponding data point in the reference spectrum. The effect of an offset or slope in the baseline is interpreted as a difference between the two spectra. Therefore, when a spectrum with a sloping baseline or offset is evaluated using the Euclidean distance algorithm, a simple baseline correction should be used.
NOTE 2—Negative bands can also produce an offset in the baseline as a result of the data normalization process.
6.2.2 The first derivative Euclidean distance algorithm minimizes the effect of an offset or sloping baseline. In this algorithm, the comparison is made between the difference of a pair of adjacent points in the unknown spectrum to the difference between the corresponding pair of adjacent points in the reference spectrum. In effect, this causes the first derivative Euclidean distance algorithm to look only at the differences in the slope of adjacent data points between the two spectra. Fig. 1 shows how the two algorithms view the same two spectra.
NOTE 3—The first derivative algorithm converts a sloping baseline into an offset that is then eliminated by the normalization procedure.
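The pre-treatment steps of Section 5 and the Euclidean distance comparison of 3.1.8 can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the implementation used by any search software. The function names (normalize, match_points, hit_quality, search) are hypothetical, the normalization is the two step minimum/maximum scaling of 5.1.1, and the convention that a lower value means a better match is an assumption (see 3.1.6.1).

```python
import numpy as np

def normalize(absorbance):
    """Two step normalization (5.1.1): subtract the minimum, divide by the maximum."""
    a = np.asarray(absorbance, dtype=float)
    shifted = a - a.min()
    peak = shifted.max()
    if peak == 0.0:
        raise ValueError("a completely flat spectrum cannot be normalized")
    return shifted / peak

def match_points(wavenumbers, absorbance, library_wavenumbers):
    """Linear interpolation of the unknown onto the library data points (5.2.2)."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(absorbance, dtype=float)
    order = np.argsort(x)  # np.interp requires increasing x values
    return np.interp(library_wavenumbers, x[order], y[order])

def hit_quality(unknown, reference):
    """Euclidean distance between two normalized spectra (assumed: lower is better)."""
    diff = normalize(unknown) - normalize(reference)
    return float(np.sqrt(np.sum(diff ** 2)))

def search(unknown_wn, unknown_abs, library_wn, library):
    """Return (name, hit quality value) pairs ranked from best to worst match.

    `library` is a hypothetical dict mapping a name to absorbances on library_wn.
    """
    resampled = match_points(unknown_wn, unknown_abs, library_wn)
    scores = {name: hit_quality(resampled, ref) for name, ref in library.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Because both spectra are normalized before the comparison, an unknown that differs from a reference only in pathlength or concentration gives a distance near zero, which is the behavior Beer's Law leads one to expect in 4.1. When only part of the range is searched, the slices of both spectra would be passed to hit_quality so that they are re-normalized over the new range.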
FIG. 1 The bottom two spectra demonstrate the results of the 1st derivative of a spectrum with a sloping baseline as compared to a spectrum with a flat baseline. The two spectra in the bottom trace are almost completely overlapped.
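The baseline behavior described in 6.2 can be reproduced numerically. The following Python sketch (NumPy assumed) compares a synthetic band on a flat baseline with the same band on a sloping, offset baseline. The function names, the synthetic Gaussian band, and the use of adjacent-point differences (np.diff) as the first derivative are assumptions made for illustration only.

```python
import numpy as np

def normalize(values):
    """Minimum/maximum normalization: subtract the minimum, divide by the maximum."""
    v = np.asarray(values, dtype=float)
    shifted = v - v.min()
    return shifted / shifted.max()

def euclidean_distance(unknown, reference):
    """Point-by-point Euclidean distance between two normalized spectra."""
    return float(np.sqrt(np.sum((normalize(unknown) - normalize(reference)) ** 2)))

def first_derivative_distance(unknown, reference):
    """Same distance, computed on differences of adjacent data points."""
    return euclidean_distance(np.diff(unknown), np.diff(reference))

# Synthetic example: one band, recorded once with a flat baseline and once
# with an offset plus a linear slope added to the baseline.
wavenumbers = np.linspace(4000.0, 400.0, 901)
flat = np.exp(-((wavenumbers - 1700.0) / 30.0) ** 2)
sloped = flat + 0.1 + 0.0002 * (4000.0 - wavenumbers)

print("Euclidean distance:      ", euclidean_distance(sloped, flat))
print("First derivative distance:", first_derivative_distance(sloped, flat))
```

In this sketch the linear slope becomes a constant offset after differencing, and the normalization removes that offset, so the first derivative distance is close to zero while the plain Euclidean distance is large. This mirrors NOTE 3 for an idealized, noise-free case; differencing real spectra also amplifies noise, which this example does not model.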
6.3 Sample Purity:
6.3.1 The physical state of the sample should be as close as possible to the physical state of the reference materials used to obtain the library. For example, a pure liquid sample would ideally be searched against a library of spectra of only liquid reference materials. A sample which is probably a mixture, such as a commercial formulation, should be compared to a library of commercial formulations.
6.3.2 In some cases the nature of the sample may not be well understood. An unknown sample may be a pure material or a mixture. It may have additional contaminants that will affect its spectrum by adding spurious bands. In addition there are several other sources of spurious spectral features that may appear as either positive or negative bands. Several of these are listed below:
6.3.2.1 Features due to variations in the carbon dioxide or water vapor levels in the optical path,
6.3.2.2 Bands from a mulling agent,
6.3.2.3 Halide salts used as window material and as the diluent for both pellets and diffuse reflection analysis often contain contaminants such as adsorbed water, hydrocarbon and nitrates. Always use dry halide salts and keep unused halide salts in a desiccator,
6.3.2.4 Water can alter the spectrum of the sample from its dry state. Spectra of inorganic samples with waters of hydration are particularly sensitive to adsorbed water,
6.3.2.5 Solvent bands from samples run in solution, and
6.3.2.6 Bands from solvents left over from an extraction or from casting a film from a solution.
NOTE 4—Retain spectra of any solvents used, so that bands due to the solvent can be identified in the spectrum of the unknown.
NOTE 5—If the solvent bands in a region of the spectrum cannot be removed from the spectrum (by either re-recording the spectrum, using an uncontaminated sample, or by spectral subtraction using the solvent reference spectrum), then that region of the spectrum should be excluded during a search. It is not sufficient to remove the offending bands digitally by drawing a straight line through the region before the search. The search algorithm will calculate a poor match in this region for any reference spectrum containing features in the region. It should be realized that the removal of the solvent bands may also remove underlying features in the sample spectrum.
6.4 Absorbance Linearity (Beer's Law):
6.4.1 A spectrum recorded using good practices (see Practices E 334, E 1252, E 1642, E 2105, and E 2106) should follow Beer's Law, and so maintain the relative absorbance intensities of its bands, independently of sample thickness. As long as this ratio between the bands is maintained, the spectra can be normalized and a good comparison between spectra can be made. For a spectrum to meet this requirement, each ray of light of a given frequency must pass through the same amount of sample. There are at least two general cases where this may not happen.
6.4.1.1 One case occurs when there is an uneven thickness of sample in the beam. For example, if the sample is wedge shaped in thickness, or irregular in shape, some rays of light pass through the thin part and some rays pass through the thicker part of the wedge. A similar concern arises when making KBr pellets for analysis. Unless the powder is carefully ...
... addition, the relative band intensities become highly distorted when normali
...
Coleman, Patricia B., Practical Sample Techniques for Infrared Analysis, CRC Press, ISBN 0849342031: 8/26/93.