ISO/IEC 23415:2024
(Main)Information technology — Data Format Description Language (DFDL) v1.0 Specification
Information technology — Data Format Description Language (DFDL) v1.0 Specification
Titre manque
General Information
Standards Content (Sample)
International
Standard
ISO/IEC 23415
First edition
Information technology — Data
Format Description Language
2024-04
(DFDL) v1.0 Specification
Reference number
© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO/IEC 2024 – All rights reserved
ii
Contents
1 Introduction . 9
1.1 Why is DFDL Needed? . 10
1.2 What is DFDL? . 10
Simple Example . 10
1.3 What DFDL is not . 13
1.4 Scope of version 1.0 . 13
2 Overview of the Specification . 15
3 Notational and Definitional Conventions . 16
3.1 Glossary and Terminology . 16
3.2 Failure Types . 16
4 The DFDL Information Set (Infoset) . 17
4.1 "No Value'' . 18
4.2 Information Items . 18
Document Information Item . 18
Element Information Items . 18
4.3 DFDL Information Item Order . 19
4.4 DFDL Augmented Infoset . 19
5 DFDL Schema Component Model . 20
5.1 DFDL Simple Types . 20
5.2 DFDL Subset of XML Schema . 21
5.3 XSD Facets, min/maxOccurs, default, and fixed . 22
MinOccurs, MaxOccurs . 23
MinLength, MaxLength . 23
MaxInclusive, MaxExclusive, MinExclusive, MinInclusive, TotalDigits, FractionDigits . 23
Pattern . 23
Enumeration Values . 23
Default . 23
Fixed . 24
5.4 Compatibility with Other Annotation Language Schemas . 24
6 DFDL Syntax Basics . 25
6.1 Namespaces . 25
6.2 The DFDL Annotation Elements . 25
6.3 DFDL Properties . 26
DFDL String Literals . 28
DFDL Expressions . 32
DFDL Regular Expressions . 32
Enumerations in DFDL . 32
7 Syntax of DFDL Annotation Elements . 33
7.1 Component Format Annotations . 33
Property Binding Syntax . 34
Empty String as a Representation Property Value . 35
7.2 dfdl:defineFormat - Reusable Data Format Definitions . 36
Using/Referencing a Named Format Definition: The dfdl:ref Property . 36
Inheritance for dfdl:defineFormat . 36
7.3 The dfdl:defineEscapeScheme Defining Annotation Element . 37
Using/Referencing a Named escapeScheme Definition . 37
7.4 The dfdl:escapeScheme Annotation Element . 37
© ISO/IEC 2024 – All rights reserved
7.5 The dfdl:assert Statement Annotation Element . 38
Properties for dfdl:assert . 38
7.6 The dfdl:discriminator Statement Annotation Element . 40
Properties for dfdl:discriminator . 40
7.7 DFDL Variable Annotations . 43
dfdl:defineVariable Annotation Element . 43
The dfdl:newVariableInstance Statement Annotation Element . 44
The dfdl:setVariable Statement Annotation Element . 45
8 Property Scoping and DFDL Schema Checking . 47
8.1 Property Scoping . 47
Property Scoping Rules . 47
Providing Defaults for DFDL properties . 47
Combining DFDL Representation Properties from a dfdl:defineFormat . 48
Combining DFDL Properties from References . 48
8.2 DFDL Schema Checking . 51
Schema Component Constraint: Unique Particle Attribution . 51
Optional Checks and Warnings . 51
9 DFDL Processing Introduction . 53
9.1 Parser Overview . 53
Points of Uncertainty . 53
Processing Error . 54
Recoverable Error . 54
9.2 DFDL Data Syntax Grammar . 54
Nil Representation . 56
Empty Representation . 56
Normal Representation . 57
Absent Representation . 57
Zero-length Representation . 57
Missing . 57
Examples of Missing and Empty Representation . 58
Round Trip Ambiguities . 58
9.3 Parsing Algorithm . 58
Known-to-exist and Known-not-to-exist . 59
Establishing Representation . 60
Resolving Points of Uncertainty . 61
9.4 Element Defaults . 62
Definitions. 62
Element Defaults When Parsing . 62
Element Defaults When Unparsing . 64
9.5 Evaluation Order for Statement Annotations . 65
Asserts and Discriminators with testKind 'expression' . 66
Discriminators with testKind 'expression' . 66
Elements and setVariable . 66
Controlling the Order of Statement Evaluation . 66
9.6 Validation . 66
9.7 Unparser Infoset Augmentation Algorithm . 67
10 Overview: Representation Properties and their Format Semantics . 68
© ISO/IEC 2024 – All rights reserved
11 Properties Common to both Content and Framing . 69
11.1 Unicode Byte Order Mark (BOM) . 71
11.2 Character Encoding and Decoding Errors . 71
Property dfdl:encodingErrorPolicy . 72
Unicode UTF-16 Decoding/Encoding Non-Errors . 73
Preserving Data Containing Decoding Errors . 73
11.3 Byte Order and Bit Order . 73
11.4 dfdl:bitOrder Example . 73
Example Using Right-to-Left Display for 'leastSignificantBitFirst' . 74
dfdl:bitOrder and Grammar Regions . 74
12 Framing . 75
12.1 Aligned Data . 75
Implicit Alignment . 76
Mandatory Alignment for Textual Data . 76
Mandatory Alignment for Packed Decimal Data . 77
Example: AlignmentFill . 77
12.2 Properties for Specifying Delimiters . 78
12.3 Properties for Specifying Lengths . 81
dfdl:lengthKind 'explicit' . 82
dfdl:lengthKind 'delimited' . 82
dfdl:lengthKind 'implicit' . 83
dfdl:lengthKind 'prefixed' . 84
dfdl:lengthKind 'pattern' . 87
dfdl:lengthKind 'endOfParent' . 87
Elements of Specified Length . 89
13 Simple Types . 92
13.1 Properties Common to All Simple Types . 92
13.2 Properties Common to All Simple Types with Text representation . 93
The dfdl:escapeScheme Properties . 94
13.3 Properties for Bidirectional support for All Simple Types with Text representation . 97
13.4 Properties Specific to String . 97
13.5 Properties Specific to Number with Text or Binary Representation . 99
13.6 Properties Specific to Number with Text Representation . 99
The dfdl:textNumberPattern Property . 106
Converting logical numbers to/from text representation . 110
13.7 Properties Specific to Number with Binary Representation . 112
Converting Logical Numbers to/from Binary Representation . 113
13.8 Properties Specific to Float/Double with Binary Representation . 118
13.9 Properties Specific to Boolean with Text Representation . 118
13.10 Properties Specific to Boolean with Binary Representation . 119
13.11 Properties Specific to Calendar with Text or Binary Representation . 120
The dfdl:calendarPattern property . 122
The dfdl:calendarCheckPolicy Property . 125
13.12 Properties Specific to Calendar with Text Representation . 125
13.13 Properties Specific to Calendar with Binary Representation . 126
13.14 Properties Specific to Opaque Types (xs:hexBinary) . 127
13.15 Nil Value Processing . 127
13.16 Properties for Nillable Elements . 127
© ISO/IEC 2024 – All rights reserved
14 Sequence Groups . 131
14.1 Empty Sequences . 131
14.2 Sequence Groups with Separators . 132
Separators and Suppression. 133
Parsing Sequence Groups with Separators . 134
Unparsing Sequence Groups with Separators . 136
14.3 Unordered Sequence Groups . 138
Restrictions for Unordered Sequences . 138
Parsing an Unordered Sequence . 138
Unparsing an Unordered Sequence . 140
14.4 Floating Elements . 140
14.5 Hidden Groups . 141
15 Choice Groups . 143
15.1 Resolving Choices . 144
Resolving Choices via Speculation . 144
Resolving Choices via Direct Dispatch . 145
Unparsing Choices . 145
16 Properties for Array Elements and Optional Elements . 146
16.1 The dfdl:occursCountKind property . 146
dfdl:occursCountKind 'fixed' . 146
dfdl:occursCountKind 'implicit' . 147
dfdl:occursCountKind 'parsed'. 147
dfdl:occursCountKind 'expression' . 147
dfdl:occursCountKind 'stopValue' . 147
16.2 Default Values for Arrays . 148
16.3 Arrays with DFDL Expressions . 148
16.4 Points of Uncertainty . 148
16.5 Arrays and Sequences . 148
16.6 Forward Progress Requirement . 148
16.7 Parsing Occurrences with Non-Normal Representation . 149
16.8 Sparse Arrays . 149
17 Calculated Value Properties . 150
17.1 Example: 2d Nested Array . 151
17.2 Example: Three-Byte Date . 151
18 DFDL Expression Language . 154
18.1 Expression Language Data Model . 154
18.2 Variables . 155
Rewinding of Variable Memory State . 155
Variable Memory State Transitions . 155
18.3 General Syntax . 156
18.4 DFDL Expression Syntax . 157
18.5 Constructors, Functions and Operators . 158
Constructor Functions for XML Schema Built-in Types . 158
Standard XPath Functions . 159
DFDL Functions . 162
DFDL Constructor Functions . 164
Miscellaneous Functions . 165
18.6 Unparsing and Circular Expression Deadlock Errors . 166
© ISO/IEC 2024 – All rights reserved
19 DFDL Regular Expressions . 167
20 External Control of the DFDL Processor . 168
21 Built-in Specifications . 169
22 Conformance . 170
23 Optional DFDL Features . 171
24 Security Considerations . 173
25 Authors and Contributors . 174
26 Intellectual Property Statement . 175
27 Disclaimer . 176
28 Full Copyright Notice . 177
29 References . 178
30 Appendix A: Escape Scheme Use Cases . 181
30.1 Escape Character Same as dfdl:escapeEscapeCharacter . 181
30.2 Escape Character Different from dfdl:escapeEscapeCharacter . 181
Example 1 - Separator ';' . 181
Example 2 - Separator 'sep' . 182
30.3 Escape Block with Different Start and End Characters . 182
30.4 Escape Block with Same Start and End Characters . 183
31 Appendix B: Rationale for Single-Assignment Variables . 185
32 Appendix C: Processing of DFDL String literals . 186
32.1 Interpreting a DFDL String Literal . 186
32.2 Recognizing a DFDL String Literal . 186
32.3 Recognizing DFDL String Literal Part . 186
33 Appendix D: DFDL Standard Encodings . 188
33.1 Purpose . 188
33.2 Conventions . 188
33.3 Specification Template . 188
33.4 Encoding X-DFDL-US-ASCII-7-BIT-PACKED . 188
Name . 188
Translation table . 188
Width . 188
Alignment . 189
Byte Order . 189
Example 1 . 189
Example 2 . 189
33.5 Encoding X-DFDL-US-ASCII-6-BIT-PACKED . 191
Name . 191
Translation Table . 191
Width . 192
Alignment . 192
ByteOrder . 192
Example 1 . 192
33.6 References for Appendix D . 193
34 Appendix E: Glossary of Terms . 194
35 Appendix F: Specific Errors Classified . 200
36 Appendix G: Property Precedence . 202
36.1 Parsing . 202
dfdl:element (simple) and dfdl:simpleType . 202
© ISO/IEC 2024 – All rights reserved
dfdl:element (complex) . 205
dfdl:sequence and dfdl:group (when reference is to a sequence) . 206
dfdl:choice and dfdl:group (when reference is to a choice) . 207
36.2 Unparsing . 208
dfdl:element (simple) and dfdl:simpleType . 208
dfdl:element (complex) . 212
dfdl:sequence and dfdl:group (when reference is a sequence) . 213
dfdl:choice and dfdl:group (when reference is a choice) . 213
© ISO/IEC 2024 – All rights reserved
1 Introduction
Data interchange is critically important for most computing. Grid computing, Cloud computing, and all forms of
distributed computing require distributed software and hardware resources to work together. Inevitably, these
resources read and write data in a variety of formats. General tools for data interchange are essential to
solving such problems. Scalable and High-Performance Computing (HPC) applications require high-
performance data handling, so data interchange standards must enable efficient representation of data. Data
Format Description Language (DFDL) enables powerful data interchange and very high-performance data
handling.
One can envisage three dominant kinds of data in the future, as follows:
1. Textual data defined by a format specific schema such as XML[XML] or JSON[JSON].
2. Binary data in standard formats.
3. Data with DFDL descriptors.
Textual XML and JSON data are the most successful data interchange standards to date. All such data are by
definition new, meaning created in the Internet era. Because of the large overhead that textual tagging
imposes, there is often a need to compress and decompress XML and JSON data. However, there is a high
cost for compression and decompression that is unacceptable to some applications. Standardized binary data
formats are also relatively new and are suitable for larger data because of the reduced costs of encoding and
more compact size. Examples of standard binary formats are data described by modern versions of ASN.1
[ASN1], XDR [XDR], Thrift [Thrift], Avro [AVRO], and Google Protocol Buffers [GPB]. These techniques lack
the self-describing nature of XML or JSON data. Scientific formats, such as NetCDF[NetCDF] and HDF[HDF]
are used by some communities to provide self-describing binary data. There are also standardized binary-
encoded XML data formats such as EXI [EXI].
It is an important observation that both XML format and standardized binary formats are prescriptive in that
they specify or prescribe a representation of the data. To use them applications must be written to conform to
their encodings and mechanisms of expression.
DFDL suggests an entirely different scheme. The approach is descriptive in that one chooses an appropriate
data representation for an application based on its needs and one then describes the format using DFDL so
that multiple programs can directly interchange the described data. DFDL descriptions can be provided by the
creator of the format or developed as needed by third parties intending to use the format. That is, DFDL is not
a format for data; it is a way of describing any data format . DFDL is intended for data commonly found in
scientific and numeric computations, as well as record-oriented representations found in commercial data
processing.
DFDL can be used to describe legacy data files, to simplify transfer of data across domains without requiring
global standard formats, or to allow third-party tools to easily access multiple formats. DFDL can also be a
powerful tool for supporting backward compatibility as formats evolve.
DFDL is designed to provide flexibility and permit implementations that achieve very high levels of
performance. DFDL descriptions are separable and native applications do not need to use DFDL libraries to
parse their data formats. DFDL parsers can also be highly efficient. The DFDL language is designed to permit
implementations that use lazy evaluation of formats and to support seekable, random access to data. The
following goals can be achieved by DFDL implementations:
• Density. Fewest bytes to represent information (without resorting to compression). Fastest possible
I/O.
• Optimized I/O. Applications can write data aligned to byte, word, or even page boundaries and to use
memory mapped I/O to ensure access to data with the smallest number of machine cycles for
common use cases without sacrificing general access.
DFDL can describe the same types of abstract data that other binary or textual data formats can describe and,
furthermore, it can describe almost any possible representation scheme for those data. It is the intent of DFDL
to support canonical data descriptions that correspond closely to the original in-memory representation of the
data, and to provide sufficient information to write as well as to read the given format.
ASN.1 with any of the prescribed encoding rules: Basic Encoding Rules (BER), Distinguished Encoding Rules (DER),
Canonical Encoding Rules (CER) [ASN1CER] or Packed Encoding Rules (PER) [ASN1PER]
Additional examples of descriptive approaches: ASN1 Encoding Control Notation (also known as ITU-T X.692)
[ASN1ECN], BFD: Binary Format Description (BFD) Language [BFD]. The largest set of examples of descriptive
approaches are all the various proprietary ad-hoc format description languages found almost universally in every
commercial database, analytical, or enterprise software system that must take in data.
© ISO/IEC 2024 – All rights reserved
1.1 Why is DFDL Needed?
In an era when there are so many standard data formats available the question arises of why DFDL is
needed. Ultimately, it is because data formats are rarely a primary consideration when programs are initially
created.
Programs are very often written speculatively, that is, without any advance understanding of how important
they will become. Given this situation, little effort is expended on data formats since it remains easier to
program the I/O in the most straightforward way possible with the programming tools in use. Even something
as simple as using an XML-based data format is often harder than just using the native I/O libraries of a
programming language.
In time, however, if a software program becomes important either because many people are using it, or it has
become important for business or organizational needs, it is often too late to go back and change the data
formats. For example, there may be real or perceived business costs to delaying the deployment of a program
for a rewrite just to change the data formats, particularly if such rewriting will reduce the performance of the
program and increase the costs of deployment.
Indeed, the need for data format standardization for interchange with other software may not be clear at the
point where a program first becomes important. Eventually, however, the need for data interchange with the
program becomes apparent.
There are, of course, efforts to smoothly integrate standardized data-format handling into programming
languages. However, the above phenomena are not going away any time soon and there is a critical role for
DFDL since it allows after-the-fact description of evolving data formats.
1.2 What is DFDL?
DFDL is a language for describing data formats. A DFDL description enables parsing, that is, it allows data to
be read from its native format and presented as a data structure called the DFDL Information Set or DFDL
Infoset. This information set describes the common characteristics of parsed data that are required of all
DFDL implementations and it is fully defined in Section 4. DFDL implementations MAY provide API access to
the Infoset as well as conversion of the Infoset into concrete representations such as XML text, binary XML
[EXI] , or JSON [JSON]. DFDL also enables unparsing , that is, allows data to be taken from an instance of a
DFDL information set and written out to its native format.
DFDL achieves this by leveraging W3C XML Schema Definition Language (XSD) 1.0. [XSD]
An XML schema is written for the logical model of the data. The schema is augmented with special DFDL
annotations and the annotated schema is called a DFDL Schema. The annotations are used to describe the
native representation of the data.
This approach of extending XSD with format annotations has been extensively used in commercial systems
that predate DFDL. The contribution of DFDL for data parsing is creation of a standard for these annotations
that is open, comprehensive, and vendor neutral. For unparsing DFDL does more to advance the state of the
...








Questions, Comments and Discussion
Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.
Loading comments...