ASTM E2077-00
Standard Specification for Analytical Data Interchange Protocol for Mass Spectrometric Data
SCOPE
1.1 This specification covers a standardized format for mass spectrometric data representation and a software vehicle to effect the transfer of mass spectrometric data between instrument data systems. This specification provides a protocol designed to benefit users of analytical instruments and increase laboratory productivity and efficiency.
1.2 The protocol in this specification provides a standardized format for the creation of raw data files, library spectrum files or results files. This standard format has the extension ".cdf" (derived from NetCDF). The contents of the file include typical header information like instrument, sample, and acquisition method description, followed by raw, library or processed data. Once data have been written or converted to this protocol, they can be read and processed by software packages that support the protocol.
1.3 This specification does not provide for the storage of data that are acquired simultaneously with, and integrated with, the mass spectrometric data but on other detectors, for example, detectors attached to the mass spectrometer's liquid or gas chromatographic system. Related Specification E 1947 and Guide E 1948 describe the storage of 2-dimensional chromatographic data.
1.4 The software transfer vehicle used for the protocol in this specification is NetCDF, which was developed by the Unidata Program and is funded by the Division of Atmospheric Sciences of the National Science Foundation.
1.5 The protocol in this specification is intended to (1) transfer data between various vendors' instrument systems, (2) provide Laboratory Information Management Systems (LIMS) communications, (3) link data to document processing applications, (4) link data to spreadsheet applications, and (5) archive analytical data, or a combination thereof. The protocol is a consistent, vendor independent data format that facilitates the analytical data interchange for these activities.
1.6 The protocol consists of:
1.6.1 This specification on mass spectrometric data, which gives the full definitions for each one of the generic mass spectrometric data elements used in implementation of the protocol. It defines the analytical information categories, which are a convenient way for sorting analytical data elements to make them easier to standardize.
1.6.2 Guide E 2078 on mass spectrometric data, which gives the full details on how to implement the content of the protocol using the public-domain NetCDF data interchange system. It includes a brief introduction to using NetCDF and describes an API (Application Programming Interface) that is intended to be incorporated into application programs to read or write NetCDF files. It is intended for software implementors, not those wanting to understand the definitions of data in a mass spectrometric dataset.
1.6.3 NetCDF Users Guide.
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E 2077 – 00
This standard is issued under the fixed designation E 2077; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
Footnotes:
1 This specification is under the jurisdiction of ASTM Committee E13 on Molecular Spectroscopy and Chromatography and is the direct responsibility of Subcommittee E13.15 on Analytical Data. Current edition approved March 10, 2000. Published July 2000.
2 For more information on the NetCDF standard, contact Unidata at www.unidata.ucar.edu.
3 Annual Book of ASTM Standards, Vol 03.06.
4 Annual Book of ASTM Standards, Vol 14.01.
5 Available from Russell K. Rew, Unidata Program Center, University Corporation for Atmospheric Research, P.O. Box 3000, Boulder, CO 80307-3000.
6 Occupational Safety and Health Administration, U.S. Department of Labor.
7 Institute of Electrical and Electronics Engineers, Inc., 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
8 Electronics Industries Alliance, 2500 Wilson Blvd., Arlington, VA 22201.
9 Available from ISO, 1 Rue de Varembe, Case Postale 56, CH 1211, Geneve, Switzerland.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
2. Referenced Documents
2.1 ASTM Standards:
E 2078 Guide for Analytical Data Interchange Protocol for Mass Spectrometric Data
E 1443 Terminology Relating to Building and Accessing Materials and Chemical Databases
E 1947 Specification for Analytical Data Interchange Protocol for Chromatographic Data
E 1948 Guide for Analytical Data Interchange Protocol for Chromatographic Data
2.2 Other Standards:
NetCDF User’s Guide
Occupational Safety and Health Administration (OSHA) Standards, 29 CFR part 1910
IEEE 488
IEEE 802
EIA 232
2.3 ISO Standards:
ISO 8601:1988 Data elements and interchange formats (First edition published 1988-06-15; with Technical Corrigendum 1 published 1991-05-01)
ISO 639:1988 Code for the representation of names of languages
ISO 9000 Quality Management Systems
ISO/IEC 8802
3. Terminology
3.1 Analytical Information Classes: The Mass Spectrometry Information Model categorizes mass spectrometric information into a number of information “classes.” There is not a direct mapping of these classes into the implementation categories described further below. The implementation categories describe the information hierarchy; the classes describe the contents within the hierarchy. The model presented here only partially addresses these classes. In particular, the last two (Processed Results and Component Quantitation Results) are not described at all. Only Implementation Category 1 is required for compliance within this specification. Information about the other implementation categories is provided for historical interest. The classes defined here are:
3.1.1 Administrative: information for administrative tracking of experiments.
3.1.2 Instrument-ID: information about the instrument that generally does not change from experiment to experiment.
3.1.3 Sample Description: information describing the sample and its history, handling and processing.
3.1.4 Test Method: all information used to generate the raw data and processed results. This includes instrument control, detection, calibration, data processing and quantitation methods.
3.1.5 Raw Data: the data as stored in the data file, along with any parameters needed to describe it.
3.1.6 Processed Results: processing information and values derived from the raw data.
3.1.7 Component Quantitation Results: individual quantitation results for components in a complex mixture.
3.2 Definitions for Administrative Information Class: These definitions are for those data elements that are implemented in the protocol. See Table 1.
TABLE 1 Administrative Information Class
NOTE 1: Particular analytical information categories (C1, C2, C3, C4, or C5) are assigned to each data element under the Category column. The meaning of this category assignment is explained in Section 5.
NOTE 2: The Required column indicates whether a data element is required, and if required, for which categories. For example, M1234 indicates that the particular data element is required for any dataset that includes information from Category 1, 2, 3, or 4. M4 indicates that a data element is only required for Category 4 datasets.
NOTE 3: Unless otherwise specified, data elements are generally recorded as their actual test values, instead of the nominal values that were used at the initiation of a test.
NOTE 4: A table is not to be interpreted as a table of keywords. The software implementation is independent of the data element names used here, and is in fact quite different. Likewise, the datatypes given are not an implementation representation, but a description of the form of the data element name. That is, a data element labeled as floating point may, for example, be implemented as a double precision floating point number; in this document, it is sufficient to note it as floating point without reference to precision.
Data Element Name             Datatype         Category  Required
dataset-completeness          string           C1        M12345
protocol-template-revision    string           C1        M12345
netcdf-revision               string           C1        M12345
languages                     string           C1 or C5  . . .
administrative-comments       string           C1 or C2  . . .
dataset-origin                string           C1        M4
dataset-owner                 string           C1        . . .
dataset-date-time-stamp       string           C1        M1234
injection-date-time-stamp     string           C1        M1234
experiment-title              string           C1        . . .
experiment-cross-references   string array[n]  C3 or C4
operator-name                 string           C1        M4
experiment-type               string           C1 or C4  . . .
pre-experiment-program-name   string           C2 or C5  . . .
post-experiment-program-name  string           C2 or C5  . . .
number-of-times-processed     integer          C5
number-of-times-calibrated    integer          C5
calibration-history           string array[n]  C5
source-file-reference         string           C5        M4
source-file-format            string           C5
source-file-date-time-stamp   string           C5        M4
external-file-references      string array[n]  C5
error-log                     string           C5
3.2.1 administrative-comments: comments about the dataset identification of the experiment. This free text field is for anything in this information class that is not covered by the other data elements in this class.
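The Required column of Table 1 can be turned into a simple completeness check. The sketch below is illustrative only and is not part of the protocol: the function names are hypothetical, the element names are the descriptive names from Table 1 (Note 4 points out that the actual software implementation uses different names), and it assumes that a code such as M1234 means "required when the dataset includes any of Categories 1 to 4", which is one reading of Note 2. The category list is taken from a dataset-completeness string of the form “C1+C2”.

```python
import re

# Required codes copied from Table 1. Elements whose Required column is
# blank or ". . ." are left out because they are never mandatory.
REQUIRED = {
    "dataset-completeness": "M12345",
    "protocol-template-revision": "M12345",
    "netcdf-revision": "M12345",
    "dataset-origin": "M4",
    "dataset-date-time-stamp": "M1234",
    "injection-date-time-stamp": "M1234",
    "operator-name": "M4",
    "source-file-reference": "M4",
    "source-file-date-time-stamp": "M4",
}


def parse_completeness(value: str) -> set[int]:
    """Turn a dataset-completeness string such as "C1+C2" into {1, 2}."""
    parts = value.split("+")
    if not all(re.fullmatch(r"C[1-5]", part) for part in parts):
        raise ValueError(f"bad dataset-completeness string: {value!r}")
    return {int(part[1]) for part in parts}


def required_elements(completeness: str) -> list[str]:
    """List the Table 1 elements that are mandatory for these categories."""
    categories = parse_completeness(completeness)
    return [
        name
        for name, code in REQUIRED.items()
        # Assumption: "M1234" applies if the dataset has ANY of C1..C4.
        if categories & {int(digit) for digit in code[1:]}
    ]


def missing_elements(dataset: dict[str, str]) -> list[str]:
    """Return mandatory elements that are absent from a dataset mapping."""
    needed = required_elements(dataset["dataset-completeness"])
    return [name for name in needed if name not in dataset]
```

Under these assumptions, a Category 1 dataset needs only the five elements marked M12345 or M1234, while a dataset that also declares C4 additionally needs the four M4 elements.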
3.2.2 calibration-history: an audit trail of file names and data sets which records the calibration history; used for Good Laboratory Practice (GLP) compliance.
3.2.3 dataset-completeness: indicates which analytical information categories are contained in the dataset. The string should exactly list the category values, as appropriate, as one or more of the following “C1+C2+C3+C4+C5,” in a string separated by plus (+) signs. This data element is used to check for completeness of the analytical dataset being transferred.
3.2.4 dataset-date-time-stamp: indicates the absolute time of dataset creation relative to Greenwich Mean Time. Expressed as the synthetic datetime given in the form: YYYYMMDDhhmmss±ffff.
3.2.4.1 Discussion: This is a synthesis of ISO 8601, which compensates for local time variations.
3.2.4.2 Discussion: The YYYYMMDDhhmmss expresses the local time, and time differential factor (ffff) expresses the hours and minutes between local time and the Coordinated Universal Time (UTC or Greenwich Mean Time, as disseminated by time signals), as defined in ISO 8601. The time differential factor (ffff) is represented by a four-digit number preceded by a plus (+) or a minus (−) sign, indicating the number of hours and minutes that local time differs from the UTC. Local times vary throughout the world from UTC by as much as −1200 h (west of the Greenwich Meridian) and by as much as +1300 h (east of the Greenwich Meridian). When the time differential factor equals zero, this indicates a zero hour, zero minute, and zero second difference from Greenwich Mean Time.
3.2.4.3 Discussion: An example of a value for a datetime would be: 1991,08,01,12:30:23-0500 or 19910801123023-0500. In human terms this is 23 s past 12:30 PM on August 1, 1991 in New York City. Note that the −0500 h is 5 full hours behind Greenwich Mean Time. The ISO standard permits the use of separators as shown, if they are required to facilitate human understanding. However, separators are not required and consequently shall not be used to separate date and time for interchange among data processing systems.
3.2.4.4 Discussion: The numerical value for the month of the year is used, because this eliminates problems with the different month abbreviations used in different human languages.
3.2.5 dataset-origin: name of the organization, address, telephone number, electronic mail nodes, and names of individual contributors, including operator(s), and any other information as appropriate. This is where the dataset originated.
3.2.6 dataset-owner: name of the owner of a proprietary dataset. The person or organization named here is responsible for this field’s accuracy. Copyrighted data should be indicated here.
3.2.7 error-log: information that serves as a log for failures of any type, such
...
3.2.10 experiment-type: name of the type of data stored in this file. Select one of the types in the following list.
3.2.10.1 Discussion: The valid types are:
centroided mass spectrum: a data set containing centroided single or multiple scan mass spectra. This includes selected ion monitoring/recording (SIM/SIR) data, represented as mass-intensity pairs. This is the default.
continuum mass spectrum: a data set containing single or multiple scan mass spectra in continuum (non-centroided or profile) form. Scans are represented as mass-intensity pairs, whether incrementally spaced or not.
library mass spectrum: a data set consisting of one or more spectra derived from a spectral library. This is distinguished from an experimental mass spectral data set in that each spectrum in the library set has associated chemical identification and other information.
3.2.10.2 Discussion: A required Raw Data Information parameter, the number of scans, is used to define the shape of the data in the file, that is, to differentiate between single and multiple spectrum files. Another parameter, the scan number, is used to determine whether multiple scan files have an order or relatedness between scans.
3.2.10.3 Discussion: Some instruments are capable of mixed mode data acquisition, for example, alternating positive/negative EI (Electron Ionisation) or CI (Chemical Ionisation) scans. In order to keep this interchange standard as simple as possible, each scan mode must be treated as a separate data set regardless of how the data are actually stored in the source data file. Alternating positive/negative EI data, for example, will generate two interchange files (possibly simultaneously, depending on the implementation); one for the positive EI scans and one for the negative EI scans. These files may be made mutually cross-referential using their “external-file-references” fields.
3.2.11 external-file-references: an array of strings listing filenames referred to from within the raw data file. These could include, for example, tune parameter, method, calibration, reference, sequence, or other files. NetCDF files produced in parallel (such as paired files containing alternating EI/CI scans) should be cross-referenced here.
3.2.12 injection-date-time-stamp: indicates the absolute time of sample injection relative to Greenwich Mean Time. Expressed as the synthetic datetime given in the form: YYYYMMDDhhmmss±ffff. See dataset-date-time-stamp for details of the ISO standard definition of a date-time-stamp.
3.2.13 languages: optional list of natural (human) languages and programming languages delineated for processing by language tools.
3.2.13.1 ISO-639-language: indicates a language symbol and country code from Annex B and D of the ISO-639 Standard.
3.2.13.2 other-language: indicates the languages and dia-
...
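The date-time-stamp form used by dataset-date-time-stamp and injection-date-time-stamp (3.2.4 and 3.2.12) lends itself to a short worked example. The following is a minimal sketch, not part of the specification: the function name is hypothetical, it accepts only the separator-free interchange form, and it assumes the sign is written as an ASCII "+" or "-" as in the example value given in 3.2.4.3.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYYMMDDhhmmss followed by a signed four-digit time differential (hhmm).
_STAMP = re.compile(r"^([0-9]{14})([+-])([0-9]{2})([0-9]{2})$")


def parse_date_time_stamp(stamp: str) -> datetime:
    """Parse a YYYYMMDDhhmmss±ffff stamp into a timezone-aware datetime."""
    match = _STAMP.match(stamp)
    if match is None:
        raise ValueError(f"not a YYYYMMDDhhmmss±ffff stamp: {stamp!r}")
    digits, sign, hours, minutes = match.groups()
    # ffff is the difference between local time and UTC in hours and minutes.
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    local = datetime.strptime(digits, "%Y%m%d%H%M%S")
    return local.replace(tzinfo=timezone(offset))


# The example value from 3.2.4.3: 12:30:23 PM local time, 5 h behind UTC.
stamp = parse_date_time_stamp("19910801123023-0500")
print(stamp.isoformat())                           # 1991-08-01T12:30:23-05:00
print(stamp.astimezone(timezone.utc).isoformat())  # 1991-08-01T17:30:23+00:00
```

The human-readable variant with separators (1991,08,01,12:30:23-0500) is deliberately rejected here, since 3.2.4.3 states that separators shall not be used for interchange among data processing systems.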