ASTM E2077-00(2005)
Standard Specification for Analytical Data Interchange Protocol for Mass Spectrometric Data
ABSTRACT
This specification covers an analytical data interchange protocol for mass spectrometric data representation and a software vehicle to effect the transfer of mass spectrometric data between instrument data systems. This specification does not provide for the storage of data that are acquired simultaneously with, and integrated with, the mass spectrometric data but on other detectors. The protocol, which is designed to benefit users of analytical instruments and increase laboratory productivity and efficiency, provides a standardized format for the creation of raw data files, library spectrum files, or results files. The resulting file, which has a ".cdf" extension, contains typical header information such as instrument, sample, and acquisition method descriptions, followed by raw, library, or processed data. Once data have been written or converted to this protocol, they can be read and processed by software packages that support the protocol. This protocol is intended to perform the following functions, singly or in combination: (1) transfer data between various vendors' instrument systems; (2) provide Laboratory Information Management Systems (LIMS) communications; (3) link data to document processing applications; (4) link data to spreadsheet applications; and (5) archive analytical data.
SCOPE
1.1 This specification covers a standardized format for mass spectrometric data representation and a software vehicle to effect the transfer of mass spectrometric data between instrument data systems. This specification provides a protocol designed to benefit users of analytical instruments and increase laboratory productivity and efficiency.
1.2 The protocol in this specification provides a standardized format for the creation of raw data files, library spectrum files or results files. This standard format has the extension ".cdf" (derived from NetCDF). The contents of the file include typical header information like instrument, sample, and acquisition method description, followed by raw, library or processed data. Once data have been written or converted to this protocol, they can be read and processed by software packages that support the protocol.
1.3 This specification does not provide for the storage of data that are acquired simultaneously with, and integrated with, the mass spectrometric data but on other detectors, for example, detectors attached to the mass spectrometer's liquid or gas chromatographic system. Related Specification E 1947 and Guide E 1948 describe the storage of 2-dimensional chromatographic data.
1.4 The software transfer vehicle used for the protocol in this specification is NetCDF, which was developed by the Unidata Program and is funded by the Division of Atmospheric Sciences of the National Science Foundation.
1.5 The protocol in this specification is intended to (1) transfer data between various vendors' instrument systems, (2) provide Laboratory Information Management Systems (LIMS) communications, (3) link data to document processing applications, (4) link data to spreadsheet applications, and (5) archive analytical data, or a combination thereof. The protocol is a consistent, vendor independent data format that facilitates the analytical data interchange for these activities.
1.6 The protocol consists of:
1.6.1 This specification on mass spectrometric data, which gives the full definitions for each one of the generic mass spectrometric data elements used in implementation of the protocol. It defines the analytical information categories, which are a convenient way for sorting analytical data elements to make them easier to standardize.
1.6.2 Guide E 2078 on mass spectrometric data, which gives the full details on how to implement the content of the protocol using the public-domain NetCDF data interchange system. It includes a brief introduction to using NetCDF and describes an API (Application Programming Interface) that is intended to be incorporated into application programs to read or write NetCDF files. It is intended...
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E2077 – 00 (Reapproved 2005)
Standard Specification for
Analytical Data Interchange Protocol for Mass
Spectrometric Data
This standard is issued under the fixed designation E2077; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This specification covers a standardized format for mass spectrometric data representation and a software vehicle to effect the transfer of mass spectrometric data between instrument data systems. This specification provides a protocol designed to benefit users of analytical instruments and increase laboratory productivity and efficiency.
1.2 The protocol in this specification provides a standardized format for the creation of raw data files, library spectrum files or results files. This standard format has the extension “.cdf” (derived from NetCDF). The contents of the file include typical header information like instrument, sample, and acquisition method description, followed by raw, library or processed data. Once data have been written or converted to this protocol, they can be read and processed by software packages that support the protocol.
1.3 This specification does not provide for the storage of data acquired simultaneous to and integrated with the mass spectrometric data, but on other detectors; for example attached to the mass spectrometer’s liquid or gas chromatographic system. Related Specification E1947 and Guide E1948 describe the storage of 2-dimensional chromatographic data.
1.4 The software transfer vehicle used for the protocol in this specification is NetCDF, which was developed by the Unidata Program and is funded by the Division of Atmospheric Sciences of the National Science Foundation.
1.5 The protocol in this specification is intended to (1) transfer data between various vendors’ instrument systems, (2) provide Laboratory Information Management Systems (LIMS) communications, (3) link data to document processing applications, (4) link data to spreadsheet applications, and (5) archive analytical data, or a combination thereof. The protocol is a consistent, vendor independent data format that facilitates the analytical data interchange for these activities.
1.6 The protocol consists of:
1.6.1 This specification on mass spectrometric data, which gives the full definitions for each one of the generic mass spectrometric data elements used in implementation of the protocol. It defines the analytical information categories, which are a convenient way for sorting analytical data elements to make them easier to standardize.
1.6.2 Guide E2078 on mass spectrometric data, which gives the full details on how to implement the content of the protocol using the public-domain NetCDF data interchange system. It includes a brief introduction to using NetCDF and describes an API (Application Programming Interface) that is intended to be incorporated into application programs to read or write NetCDF files. It is intended for software implementors, not those wanting to understand the definitions of data in a mass spectrometric dataset.
1.6.3 NetCDF User’s Guide.
2. Referenced Documents
2.1 ASTM Standards:
E2078 Guide for Analytical Data Interchange Protocol for Mass Spectrometric Data
E1947 Specification for Analytical Data Interchange Protocol for Chromatographic Data
E1948 Guide for Analytical Data Interchange Protocol for Chromatographic Data
2.2 Other Standards:
NetCDF User’s Guide
Occupational Safety and Health Administration (OSHA) Standards-29 CFR part 1910
IEEE 488
IEEE 802
EIA 232
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
This specification is under the jurisdiction of ASTM Committee E13 on Molecular Spectroscopy and Separation Science and is the direct responsibility of Subcommittee E13.15 on Analytical Data. Current edition approved Sept. 1, 2005. Published November 2005. Originally approved in 2000. Last previous edition approved in 2000 as E2077–00. DOI: 10.1520/E2077-00R05.
For more information on the NetCDF standard, contact Unidata at www.unidata.ucar.edu.
Available from Russell K. Rew, Unidata Program Center, University Corporation for Atmospheric Research, P.O. Box 3000, Boulder, CO 80307-3000.
Available from Occupational Safety and Health Administration (OSHA), 200 Constitution Ave., NW, Washington, DC 20210, http://www.osha.gov.
Available from Institute of Electrical and Electronics Engineers, Inc. (IEEE), 445 Hoes Ln., P.O. Box 1331, Piscataway, NJ 08854-1331, http://www.ieee.org.
Available from Electronic Industries Alliance (EIA), 2500 Wilson Blvd., Arlington, VA 22201, http://www.eia.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
2.3 ISO Standards:
8601:1988 Data elements and interchange formats (First edition published 1988-06-15; with Technical Corrigendum 1 published 1991-05-01)
639:1988 Code for the representation of names of languages
9000 Quality Management Systems
ISO/IEC 8802
Available from International Organization for Standardization (ISO), 1, ch. de la Voie-Creuse, Case postale 56, CH-1211, Geneva 20, Switzerland, http://www.iso.ch.
3. Terminology
3.1 Analytical Information Classes—The Mass Spectrometry Information Model categorizes mass spectrometric information into a number of information “classes.” There is not a direct mapping of these classes into the implementation categories described further below. The implementation categories describe the information hierarchy; the classes describe the contents within the hierarchy. The model presented here only partially addresses these classes. In particular, the last two (Processed Results and Component Quantitation Results) are not described at all. Only Implementation Category 1 is required for compliance within this specification. Information about the other implementation categories is provided for historical interest. The classes defined here are:
3.1.1 Administrative—information for administrative tracking of experiments.
3.1.2 Instrument-ID—information about the instrument that generally does not change from experiment to experiment.
3.1.3 Sample Description—information describing the sample and its history, handling and processing.
3.1.4 Test Method—all information used to generate the raw data and processed results. This includes instrument control, detection, calibration, data processing and quantitation methods.
3.1.5 Raw Data—the data as stored in the data file, along with any parameters needed to describe it.
3.1.6 Processed Results—processing information and values derived from the raw data.
3.1.7 Component Quantitation Results—individual quantitation results for components in a complex mixture.
3.2 Definitions for Administrative Information Class—These definitions are for those data elements that are implemented in the protocol. See Table 1.
TABLE 1 Administrative Information Class
NOTE 1—Particular analytical information categories (C1, C2, C3, C4, or C5) are assigned to each data element under the Category column. The meaning of this category assignment is explained in Section 5.
NOTE 2—The Required column indicates whether a data element is required, and if required, for which categories. For example, M1234 indicates that that particular data element is required for any dataset that includes information from Category 1, 2, 3, or 4. M4 indicates that a data element is only required for Category 4 datasets.
NOTE 3—Unless otherwise specified, data elements are generally recorded to be their actual test values, instead of the nominal values that were used at the initiation of a test.
NOTE 4—A table is not to be interpreted as a table of keywords. The software implementation is independent of the data element names used here, and is in fact quite different. Likewise, the datatypes given are not an implementation representation, but a description of the form of the data element name. That is, a data element labeled as floating point may, for example, be implemented as a double precision floating point number; in this document, it is sufficient to note it as floating point without reference to precision.
Data Element Name               Datatype          Category    Required
dataset-completeness            string            C1          M12345
protocol-template-revision      string            C1          M12345
netcdf-revision                 string            C1          M12345
languages                       string            C1 or C5    . . .
administrative-comments         string            C1 or C2    . . .
dataset-origin                  string            C1          M4
dataset-owner                   string            C1          . . .
dataset-date-time-stamp         string            C1          M1234
injection-date-time-stamp       string            C1          M1234
experiment-title                string            C1          . . .
experiment-cross-references     string array[n]   C3 or C4
operator-name                   string            C1          M4
experiment-type                 string            C1 or C4    . . .
pre-experiment-program-name     string            C2 or C5    . . .
post-experiment-program-name    string            C2 or C5    . . .
number-of-times-processed       integer           C5
number-of-times-calibrated      integer           C5
calibration-history             string array[n]   C5
source-file-reference           string            C5          M4
source-file-format              string            C5
source-file-date-time-stamp     string            C5          M4
external-file-references        string array[n]   C5
error-log                       string            C5
3.2.1 administrative-comments—comments about the dataset identification of the experiment. This free text field is for anything in this information class that is not covered by the other data elements in this class.
3.2.2 calibration-history—an audit trail of file names and data sets which records the calibration history; used for Good Laboratory Practice (GLP) compliance.
3.2.3 dataset-completeness—indicates which analytical information categories are contained in the dataset. The string should exactly list the category values, as appropriate, as one or more of the following “C1+C2+C3+C4+C5,” in a string
separated by plus (+) signs. This data element is used to check for completeness of the analytical dataset being transferred.
3.2.4 dataset-date-time-stamp—indicates the absolute time of dataset creation relative to Greenwich Mean Time. Expressed as the synthetic datetime given in the form: YYYYMMDDhhmmss±ffff.
3.2.4.1 Discussion—This is a synthesis of ISO 8601:1988, which compensates for local time variations.
3.2.4.2 Discussion—The YYYYMMDDhhmmss expresses the local time, and time differential factor (ffff) expresses the hours and minutes between local time and the Coordinated Universal Time (UTC or Greenwich Mean Time, as disseminated by time signals), as defined in ISO 8601:1988. The time differential factor (ffff) is represented by a four-digit number preceded by a plus (+) or a minus (−) sign, indicating the number of hours and minutes that local time differs from the UTC. Local times vary throughout the world from UTC by as much as −1200 h (west of the Greenwich Meridian) and by as much as +1300 h (east of the Greenwich Meridian). When the time differential factor equals zero, this indicates a zero hour, zero minute, and zero second difference from Greenwich Mean Time.
3.2.4.3 Discussion—An example of a value for a datetime would be: 1991,08,01,12:30:23-0500 or 19910801123023-0500. In human terms this is 23 s past 12:30 PM on August 1, 1991 in New York City. Note that the −0500 h is 5 full hours time behind Greenwich Mean Time. The ISO standard permits the use of separators as shown, if they are required to facilitate human understanding. However, separators are not required and consequently shall not be used to separate date and time for interchange among data processing systems.
3.2.4.4 Discussion—The numerical value for the month of the year is used, because this eliminates problems with the different month abbreviations used in different human languages.
3.2.5 dataset-origin—name of the organization, address, telephone number, electronic mail nodes, and names of individual contributors, including operator(s), and any other information as appropriate. This is where the dataset originated.
3.2.6 dataset-owner—name of the owner of a proprietary dataset. The person or organization named here is responsible for this field’s accuracy. Copyrighted data should be indicated here.
3.2.7 error-log—information that serves as a log for failures
...
are represented as mass-intensity pairs, whether incrementally spaced or not.
library mass spectrum—a data set consisting of one or more spectra derived from a spectral library. This is distinguished from an experimental mass spectral data set in that each spectrum in the library set has associated chemical identification and other information.
3.2.10.2 Discussion—A required Raw Data Information parameter, the number of scans, is used to define the shape of the data in the file, that is, to differentiate between single and multiple spectrum files. Another parameter, the scan number, is used to determine whether multiple scan files have an order or relatedness between scans.
3.2.10.3 Discussion—Some instruments are capable of mixed mode data acquisition, for example, alternating positive/negative EI (Electron Ionisation) or CI (Chemical Ionisation) scans. In order to keep this interchange standard as simple as possible, each scan mode must be treated as a separate data set regardless of how the data are actually stored in the source data file. Alternating positive/negative EI data, for example, will generate two interchange files (possibly simultaneously, depending on the implementation); one for the positive EI scans and one for the negative EI scans. These files may be made mutually cross-referential using their “external-file-references” fields.
3.2.11 external-file-references—an array of strings listing file names referred to from within the raw data file. These could include, for example, tune parameter, method, calibration, reference, sequence, or other files. NetCDF User’s Guide files produced in parallel (such as paired files containing alternating EI/CI scans) should be cross-referenced here.
3.2.12 injection-date-time-stamp—indicates the absolute time of sample injection relative to Greenwich Mean Time. Expressed as the synthetic datetime given in the form: YYYYMMDDhhmmss±ffff. See dataset-date-time-stamp for details of the ISO standard definition of a date-time-stamp.
3.2.13 languages—optional list of natural (human) languages and programming languages delineated for processing by language tools.
3.2.13.1 ISO-639-language—indicates a language symbol and country code from Annex B and D of the ISO 639:1988 Standard.
3.2.13.2 other-language—indicates the languages and dialect using a user-readable name; applies only for those languages and dialects not covered by ISO 639:1988 (such as
...
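As an illustration only, the date-time stamp form described in 3.2.4 (YYYYMMDDhhmmss followed by a signed four-digit time differential) and the dataset-completeness string described in 3.2.3 could be handled as in the following Python sketch. The function names parse_stamp, format_stamp, and parse_completeness are hypothetical and are not part of the protocol or of the API described in Guide E2078. The sketch assumes an ASCII plus or minus sign and rejects separators, since 3.2.4.3 states that separators shall not be used for interchange.

```python
import re
from datetime import datetime, timedelta, timezone

# Interchange form: 14 digits of local time, a sign, then hhmm of offset.
_STAMP = re.compile(r"(\d{14})([+-])(\d{2})(\d{2})")

# Analytical information categories named in the specification.
_CATEGORIES = ("C1", "C2", "C3", "C4", "C5")


def parse_stamp(text):
    """Parse a stamp such as '19910801123023-0500' (hypothetical helper)."""
    match = _STAMP.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a YYYYMMDDhhmmss+/-ffff stamp: {text!r}")
    digits, sign, hours, minutes = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    # The 14 digits are local time; the offset says how far that is from UTC.
    local = datetime.strptime(digits, "%Y%m%d%H%M%S")
    return local.replace(tzinfo=timezone(offset))


def format_stamp(moment):
    """Format a timezone-aware datetime in the same separator-free form."""
    if moment.tzinfo is None:
        raise ValueError("a time differential is required; datetime is naive")
    return moment.strftime("%Y%m%d%H%M%S%z")


def parse_completeness(text):
    """Split a string such as 'C1+C2' into a set of category names."""
    parts = text.split("+")
    unknown = [part for part in parts if part not in _CATEGORIES]
    if unknown or len(set(parts)) != len(parts):
        raise ValueError(f"invalid dataset-completeness string: {text!r}")
    return set(parts)


if __name__ == "__main__":
    stamp = parse_stamp("19910801123023-0500")
    print(stamp.isoformat())          # 1991-08-01T12:30:23-05:00
    print(format_stamp(stamp))        # 19910801123023-0500
    print(sorted(parse_completeness("C1+C2")))  # ['C1', 'C2']
```

Keeping the offset attached to the parsed value, rather than converting to UTC on input, means the original stamp can be written back unchanged; conversion to UTC remains available through the standard `astimezone` method.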