ISO 24615:2010
Language resource management - Syntactic annotation framework (SynAF)
ISO 24615:2010 describes the syntactic annotation framework (SynAF), a high level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615:2010 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.
ISO 24615:2010 is a standard published by the International Organization for Standardization (ISO). Its full title is "Language resource management - Syntactic annotation framework (SynAF)".
ISO 24615:2010 is classified under the following ICS (International Classification for Standards) categories: 01.020 - Terminology (principles and coordination). The ICS classification helps identify the subject area and facilitates finding related standards.
ISO 24615:2010 has the following relationships with other standards: it has an inter-standard link to ISO 24615-1:2014. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
SLOVENIAN STANDARD
1 July 2013
Language resource management - Syntactic annotation framework (SynAF)
Gestion de ressources langagières - Cadre d'annotation syntaxique (SynAF)
This Slovenian standard is identical to: ISO 24615:2010
ICS:
01.020 Terminology (principles and coordination)
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24615
First edition 2010-10-15
Language resource management - Syntactic annotation framework (SynAF)
Gestion de ressources langagières - Cadre d'annotation syntaxique (SynAF)
© ISO 2010
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO 2010 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 SynAF metamodel
4.1 Introduction
4.2 SynAF metamodel
4.2.1 Overview
4.2.2 SyntacticNode class
4.2.3 T_Node class
4.2.4 NT_Node class
4.2.5 SyntacticEdge class
4.2.6 Annotation class
Annex A (normative) Data categories for SynAF
Annex B (informative) Relation to the Linguistic Annotation Framework
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24615 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management, in collaboration with the European
eContent Project “LIRICS” (Linguistic Infrastructure for Interoperable Resources and Systems), under the
contract e-Content-22236-LIRICS.
ISO 24615 is designed to coordinate closely with ISO 24612, Language resource management — Linguistic
annotation framework (LAF), ISO 24613:2008, Language resource management — Lexical markup framework
(LMF), and ISO 24611, Language resource management — Morpho-syntactic annotation framework.
Introduction
This International Standard is based on numerous projects and pre-standardisation activities that have taken
place in the last few years (see Abeillé, 2001 [9]), to provide reference models and formats for the
representation of syntactic information, whether as the output of a syntactic parser, or as annotations of
language resources (treebanks). For several years, the Penn Treebank initiative has served as a de facto
standard for treebanking, but more recent works, e.g. the Negra/Tiger initiative (see:
http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/) in Germany or the ISST initiative in Italy
[see Montemagni (2003) [18]], demonstrate the viability of a more coherent framework that can account for both
(hierarchical) constituency and dependency phenomena in syntactic annotation.
The eContent project “LIRICS” has been seminal in gathering a group of experts, who initiated the ISO 24615
(SynAF) project. While preparing SynAF, this group confirmed that existing initiatives indeed share a common
data model that offers a good basis for the SynAF metamodel (see the study made in Deliverable D.3.1
“Evaluation of initiatives for morpho-syntactic and syntactic annotation” of the EU project LIRICS, available at
http://lirics.loria.fr/doc_pub/Del3_1_V2.pdf).
This International Standard proposes a metamodel for syntactic annotation together with a list of relevant data
categories for syntactic annotation. The data categories are available on the ISOCat server
(http://www.isocat.org/) in the syntax profile (as defined in ISO 12620:2009).
INTERNATIONAL STANDARD ISO 24615:2010(E)
Language resource management — Syntactic annotation
framework (SynAF)
1 Scope
This International Standard describes the syntactic annotation framework (SynAF), a high level model for
representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across
language resources or language processing components. This International Standard is complementary and
closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for
syntactic representations as well as reference data categories for representing both constituency and
dependency information in sentences or other comparable utterances and segments.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 1087-1:2000, Terminology work — Vocabulary — Part 1: Theory and application
ISO 1087-2:2000, Terminology work — Vocabulary — Part 2: Computer applications
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 24611, Language resource management — Morpho-syntactic annotation framework
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1, ISO 1087-2,
ISO 12620:2009, ISO 24611 and the following apply.
3.1
adjunct
non-essential element associated with a verb as opposed to syntactic arguments (3.19)
NOTE Adverbs are possible adjuncts for a sentence.
3.2
chunk
non-recursive constituent (3.4)
3.3
clause
group of phrases (3.14), usually containing a predicate
NOTE A clause can be either a main clause (3.10) or a subordinate clause (3.17). In languages distinguishing
finiteness, clauses whose predicate is a verb can be either finite or non-finite, depending on the form of the verb. A main
clause alone can build a complete sentence (3.15). In the SynAF model, a clause is a special case of a constituent (3.4).
3.4
constituent
syntactic grouping of words [into phrases (3.14)], phrases [into clauses (3.3) or other phrases] or clauses
[into a sentence (3.15)] on the basis of structural (or hierarchical) properties
3.5
dependency
dependency relation
syntactic relation between word forms (3.24) or constituents (3.4) on the basis of the grammatical
functions (3.7) that constituents play in relation to each other
3.6
syntactic edge
edge
triplet with a source node (3.12), a target node, and optional annotations (3.9)
NOTE Non-terminal nodes (3.13) have an outgoing constituency syntactic edge.
3.7
grammatical function
grammatical role of a word form (3.24) or constituent (3.4) within its embedding syntactic environment
NOTE For example, a noun phrase (NP) can act as a subject within a sentence (3.15), or a noun may act as a
subject dependent of a verb in a dependency graph. There is a grammatical relation between the subject – NP and the
main verb in a sentence. All grammatical relations (subject – predicate, head – modifier, etc.) are subsumed under the
concept of dependency relations (3.5), whether between terminal or non-terminal nodes.
3.8
syntactic head
head
part of a constituent (3.4) which determines its distribution (the syntactic environments in which the
constituent may appear) and its grammatical properties (e.g. if the grammatical gender of the head is feminine,
then the gender of the entire constituent will be feminine)
NOTE The head of a constituent usually cannot be left out.
3.9
linguistic annotation
annotation
feature-value pair denoting a linguistic property of a linguistic segment
3.10
main clause
clause (3.3), which can act on its own as a complete sentence (3.15)
NOTE In languages distinguishing finiteness, the main clause is usually finite. Example: The train is late.
3.11
modifier
part of a constituent (3.4) which ascribes a property to the head (3.8) of the constituent
NOTE A modifier can be placed before or after the head of the phrase (3.14) (pre-modifier or post-modifier).
Modifiers are optional in a constituent.
3.12
node
syntactic node
word form (3.24) or constituent (3.4) seen as an elementary syntactic component of a syntactic analysis
3.13
non-terminal node
syntactic node (3.12) which is not a word form (3.24)
NOTE A non-terminal node has an outgoing constituency edge (3.6).
3.14
phrase
group of word forms (3.24) (usually containing one or more words) which can fulfill a grammatical function
(3.7), e.g. in a clause (3.3)
NOTE Empty phrases are permitted (being non-realised pronouns, sometimes marked as “pro”, and having the role
of subjects in clauses). A phrase is typically named after its head (3.8), for example noun phrases, verb phrases, adjective
phrases, adverbial phrases and prepositional phrases. Phrases have been informally described as “bloated words”, in that
the parts of the phrase added to the head elaborate and specify the reference of the head. In our model, a phrase is a
special case of a constituent (3.4).
3.15
sentence
related group of word forms (3.24) containing a predication, usually expressing a complete thought and
forming the basic unit of discourse structure
NOTE A sentence consists of one or more clauses (3.3). When describing speech, it is common to talk about
“utterances” rather than sentences.
3.16
span
pair of points (p1, p2), where p1 ≤ p2, identifying the segment of the document to which an annotation (3.9)
is applied
NOTE A multiple span is a sequence of spans where the ending point of each span is less than or equal to the
starting point of the subsequent span.
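The span and multiple-span conditions can be checked mechanically. The following is a minimal sketch, not part of the standard: it assumes that points are integer offsets into the document, and it takes the ordering constraint on the two points to be p1 <= p2, which is consistent with the NOTE above. The function names are hypothetical.

```python
from typing import List, Tuple

# A span is a pair of points (p1, p2). Integer offsets are an assumption;
# the definition does not fix what a "point" is.
Span = Tuple[int, int]


def is_valid_span(span: Span) -> bool:
    """A span is well formed when its start does not exceed its end."""
    p1, p2 = span
    return p1 <= p2


def is_valid_multiple_span(spans: List[Span]) -> bool:
    """A multiple span is a sequence of well-formed spans in which each
    span ends at or before the start of the subsequent span."""
    if not all(is_valid_span(s) for s in spans):
        return False
    return all(prev[1] <= nxt[0] for prev, nxt in zip(spans, spans[1:]))
```

Under these assumptions, a discontinuous segment such as [(0, 3), (10, 12)] is accepted, while overlapping spans such as [(0, 5), (4, 8)] are rejected.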
3.17
subordinate clause
clause which fulfils a grammatical function (3.7) in a phrase (3.14) [for example a relative clause (3.3)
modifying the head (3.8) noun of a nominal phrase] or in another clause
NOTE A subordinate clause usually does not act on its own as a sentence, but is part of a larger sentence.
3.18
subcategorization frame
set of restrictions indicating the properties of the syntactic arguments (3.19) that can or must occur with a
verb
EXAMPLE Alfred (/syntacticArgument/) reads a book (/syntacticArgument/) today (/adjunct/).
NOTE The subject, indirect object and direct object are subcategorized grammatical functions (3.7) within a
sentence; they are dependents of the verb (i.e. they can appear in subcategorization frames).
3.19
syntactic argument
functionally essential element that is required and given its interpretation by the head of its phrase (3.14) or
the node (3.12) of which it is a dependent (e.g. the nominal argument of a prepositional phrase or verb)
NOTE For verbs and verbal phrases, arguments identify the participants in the process referred to by the verb. In
some frameworks, syntactic arguments are called complements.
3.20
syntactic graph
graph
connected set of syntactic nodes (3.12) and edges (3.6)
3.21
syntactic tree
syntactic graph (3.20) in which each node has a single parent
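The distinction between a syntactic graph and a syntactic tree can be illustrated with a small check. This is only a sketch under stated assumptions: edges are read as (parent, child) pairs, a single root node is allowed to have no parent, and connectedness is tested by reachability from that root. None of these choices is prescribed by the definitions above, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child); this edge direction is an assumption


def is_syntactic_tree(nodes: Iterable[str], edges: Iterable[Edge]) -> bool:
    """Return True when exactly one node has no parent, every other node
    has exactly one parent, and all nodes are reachable from the root."""
    node_set: Set[str] = set(nodes)
    edge_list: List[Edge] = list(edges)
    parent_count = Counter(child for _, child in edge_list)

    roots = [n for n in node_set if parent_count[n] == 0]
    if len(roots) != 1:
        return False
    if any(parent_count[n] > 1 for n in node_set):
        return False

    # Connectedness: walk down from the root and compare with the node set.
    children: Dict[str, List[str]] = {}
    for parent, child in edge_list:
        children.setdefault(parent, []).append(child)
    seen: Set[str] = set()
    stack = [roots[0]]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return seen == node_set
```

A graph that adds a second incoming edge to some node, for instance to record a secondary relation, remains a syntactic graph but no longer passes this tree check.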
3.22
syntax
way in which word forms (3.24) are interrelated and/or grouped together into phrases, thus capturing the
relations that exist between those units
3.23
terminal node
syntactic node (3.12) which is a single word form (3.24) or an empty element involved in a syntactic relation
3.24
word form
contiguous or non-contiguous entity from a speech or text sequence identified as an autonomous lexical item
4 SynAF metamodel
4.1 Introduction
Syntactic annotations have at least two functions in language processing:
a) to represent linguistic constituency, as in noun phrases (NP), describing a structured sequence of
morpho-syntactically annotated items (including empty elements or traces generated by movements at
the constituency level), as well as constituents built from non-contiguous elements, and
b) to represent dependency relations, such as head-modifier relations, and also including relations between
categories of the same kind (such as the head-head relations between nouns in appositions, or nominal
coordinations in some formalisms). The dependency information can exist between morpho-syntactically
annotated items within a phrase (an adjective is the modifier of the head noun within an NP) or describe a
specific relation between syntactic constituents at the clausal and sentential level (e.g. an NP being the
“subject” of the main verb of a clause or sentence). The dependency relation can also be stated for empty
elements (e.g. the pro element in Romance languages, which serves a grammatical function).
As a consequence, syntactic annotations shall comply with a multi-layered annotation strategy interrelating
syntactic annotation for both constituency and dependency as stated in the SynAF metamodel.
4.2 SynAF metamodel
4.2.1 Overview
The SynAF metamodel is represented as a set of UML classes complemented by UML attribute-value pairs,
which represent the associated syntactic data categories. The SynAF textual descriptions specify more
complete information about the SynAF classes, relations and extensions than can be included in the UML
diagram. Developers shall define a data category selection (DCS) as specified for SynAF data category
selection procedures (see Figure 1). The data categories given in Annex A shall be used for the
representation of syntactic annotations.
Figure 1 — SynAF metamodel (articulated with MAF)
4.2.2 SyntacticNode class
The SyntacticNode class is a generic class subsuming both the class of terminal nodes and the class of non-
terminal nodes. Syntactic nodes can be involved in as many syntactic relations as necessary (see 3.6,
syntactic edges).
4.2.3 T_Node class
The T_Node class represents the terminal nodes of a syntactic tree, consisting of morpho-syntactically
annotated word forms, as well as empty elements when appropriate. The T_Nodes are defined over one or
more spans (multiple spans can account for discontinuous constituents). T_Nodes are annotated with
syntactic categories valid for the word level.
4.2.4 NT_Node class
The NT_Node class represents the non-terminal nodes of a syntactic tree. Syntactic trees mainly consist of
T_Nodes and NT_Nodes, including empty elements when appropriate. T_Nodes make reference to a span.
Thus by virtue of the syntactic tree representation, spans can also be inferred for NT_Nodes. The NT_Nodes
are annotated with syntactic categories valid at the phrasal level and higher (clausal, sentential).
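The relationship between T_Nodes, NT_Nodes and spans might be sketched as follows. This is an illustration only, not the normative UML model: the Python class names TNode and NTNode, the field names, the use of integer character offsets and the feature names "pos" and "cat" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Span = Tuple[int, int]  # (start, end) offsets; integer offsets are an assumption


@dataclass
class TNode:
    """Terminal node: defined over one or more spans, annotated at word level."""
    spans: List[Span]
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class NTNode:
    """Non-terminal node: holds child nodes, annotated at phrasal level or higher."""
    children: List[Union["TNode", "NTNode"]]
    annotations: Dict[str, str] = field(default_factory=dict)


def inferred_spans(node: Union[TNode, NTNode]) -> List[Span]:
    """Collect the spans covered by a node. A non-terminal stores no span
    of its own; its spans are inferred from the terminals below it."""
    if isinstance(node, TNode):
        return sorted(node.spans)
    collected: List[Span] = []
    for child in node.children:
        collected.extend(inferred_spans(child))
    return sorted(collected)


# "The train is late." with hypothetical character offsets and feature names.
the = TNode([(0, 3)], {"pos": "determiner"})
train = TNode([(4, 9)], {"pos": "noun"})
noun_phrase = NTNode([the, train], {"cat": "NP"})
print(inferred_spans(noun_phrase))
```

Keeping spans only on terminals means a discontinuous constituent needs no special handling: its inferred span list simply contains a gap.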
4.2.5 SyntacticEdge class
The SyntacticEdge class represents a relation between syntactic nodes (both terminal and non-terminal
nodes). For example, the dependency relation is binary, consisting of a pair of source and target nodes, with
one or more annotations. In particular, a syntactic edge can be annotated by a /syntacticEdgeType/ (see
Annex A), whose conceptual domain can be one of, but is not limited to, /primarySyntacticEdge/,
/secondarySyntacticEdge/.
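A syntactic edge, read as a triplet of source node, target node and annotations, might be sketched as below. The sketch is illustrative: node identifiers are plain strings, the edge direction (head as source, dependent as target) is an assumption, and the feature name "grammaticalFunction" and the helper edges_of_type are hypothetical. Only /syntacticEdgeType/, /primarySyntacticEdge/ and /secondarySyntacticEdge/ come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyntacticEdge:
    """Relation between two syntactic nodes, carrying feature-value annotations."""
    source: str
    target: str
    annotations: Dict[str, str] = field(default_factory=dict)


def edges_of_type(edges: List[SyntacticEdge], edge_type: str) -> List[SyntacticEdge]:
    """Select the edges whose syntacticEdgeType annotation equals edge_type."""
    return [e for e in edges if e.annotations.get("syntacticEdgeType") == edge_type]


# Dependency edges for "Alfred reads a book today", head to dependent.
edges = [
    SyntacticEdge("reads", "Alfred", {
        "syntacticEdgeType": "primarySyntacticEdge",
        "grammaticalFunction": "subject",  # hypothetical feature name
    }),
    SyntacticEdge("reads", "book", {
        "syntacticEdgeType": "primarySyntacticEdge",
        "grammaticalFunction": "directObject",  # hypothetical feature name
    }),
    SyntacticEdge("reads", "today", {
        "syntacticEdgeType": "primarySyntacticEdge",
        "grammaticalFunction": "adjunct",  # hypothetical feature name
    }),
]
primary = edges_of_type(edges, "primarySyntacticEdge")
```

Because the conceptual domain of /syntacticEdgeType/ is not limited to the two listed values, the sketch keeps the type as a free string rather than a closed enumeration.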
4.2.6 Annotation class
The Annotation class represents the application of syntactic information to SynAF annotated data, as well as
(see Figure 1) the application of morphosyntactic information to MAF annotated data.
Annex A
(normative)
Data categories for SynAF
A.1 General
The following data categories shall be used for the representation of syntactic annotations in combination with
the SynAF metamodel. When necessary, specific applications may define additional data categories, which
shall be described in compliance with ISO 12620 and provided in the ISOCat data category registry.
A.2 Basic syntactic data categories
/annotation/
Definition [en] information added to a word, phrase, clause, sentence, a text or to a relation among
them
/annotationDepth/
Conceptual Domain /deepParsing/, /shallowParsing/, /tagging/
Definition [en] level of information richness the annotation describes
/annotationStyle/
Conceptual Domain /embeddedNotation/, /mixedNotation/, /standoffNotation/
Definition [en] style of annotation
/annotationType/
Conceptual Domain /constituency/, /constituencyAndDependency/, /dependency/
Definition [en] type of annotation
/clitic/
Definition [en] unstressed word which cannot stand on its own as a normal utterance and is
phonologically dependent upon a neighboring word for pronunciation
Note [en] There is great variation concerning clitics. In English, the cliticized forms are sometimes
restricted to the contracted forms of auxiliaries, as in I'm, she'll, etc. However, in
some instances, articles are also referred to as clitics.
...
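For the data categories listed above that have a closed conceptual domain, a data category selection could be checked along the following lines. This is a sketch only: the dictionary layout and the function name invalid_values are assumptions, and only the three categories and values shown in this sample are included.

```python
from typing import Dict, Set

# Conceptual domains copied from the data categories listed in this sample.
CONCEPTUAL_DOMAINS: Dict[str, Set[str]] = {
    "annotationDepth": {"deepParsing", "shallowParsing", "tagging"},
    "annotationStyle": {"embeddedNotation", "mixedNotation", "standoffNotation"},
    "annotationType": {"constituency", "constituencyAndDependency", "dependency"},
}


def invalid_values(metadata: Dict[str, str]) -> Dict[str, str]:
    """Return the entries whose value lies outside the conceptual domain of
    their data category. Categories without a listed domain are not checked."""
    return {
        category: value
        for category, value in metadata.items()
        if category in CONCEPTUAL_DOMAINS
        and value not in CONCEPTUAL_DOMAINS[category]
    }


# Hypothetical description of a treebank's annotation.
treebank = {
    "annotationDepth": "deepParsing",
    "annotationStyle": "standoffNotation",
    "annotationType": "constituencyAndDependency",
}
problems = invalid_values(treebank)
```

An empty result means every checked category uses a value from its conceptual domain.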
INTERNATIONAL ISO
STANDARD 24615
First edition
2010-10-15
Language resource management —
Syntactic annotation framework (SynAF)
Gestion de ressources langagières — Cadre d'annotation syntaxique
(SynAF)
Reference number
©
ISO 2010
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.
© ISO 2010
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO 2010 – All rights reserved
Contents Page
Foreword .iv
Introduction.v
1 Scope.1
2 Normative references.1
3 Terms and definitions .1
4 SynAF metamodel .4
4.1 Introduction.4
4.2 SynAF metamodel .5
4.2.1 Overview.5
4.2.2 SyntacticNode class.6
4.2.3 T_Node class .6
4.2.4 NT_Node class.6
4.2.5 SyntacticEdge class.6
4.2.6 Annotation class.6
Annex A (normative) Data categories for SynAF.7
Annex B (informative) Relation to the Linguistic Annotation Framework .15
Bibliography.17
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24615 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management, in collaboration with the European
eContent Project “LIRICS” (Linguistic Infrastructure for Interoperable Resources and Systems), under the
contract e-Content-22236-LIRICS.
ISO 24615 is designed to coordinate closely with ISO 24612, Language resource management — Linguistic
annotation framework (LAF), ISO 24613:2008, Language resource management — Lexical markup framework
(LMF), and ISO 24611, Language resource management — Morpho-syntactic annotation framework.
iv © ISO 2010 – All rights reserved
Introduction
This International Standard is based on numerous projects and pre-standardisation activities that have taken
[9]
place in the last few years (see Abeillé, 2001 ), to provide reference models and formats for the
representation of syntactic information, whether as the output of a syntactic parser, or as annotations of
language resources (treebanks). For several years, the Penn Treebank initiative has served as a de facto
standard for treebanking, but more recent works e.g. the Negra/Tiger initiative (see: http://www.ims.uni-
stuttgart.de/projekte/TIGER/TIGERCorpus/) in Germany or the ISST initiative in Italy [see Montemagni
[18]
(2003) ] demonstrate the viability of a more coherent framework that can account for both (hierarchical)
constituency and dependency phenomena in syntactic annotation.
The eContent project “LIRICS”, has been seminal in gathering a group of experts, who initiated the ISO 24615
(SynAF) project. While preparing SynAF, this group confirmed that existing initiatives indeed share a common
data model that offers a good basis for the SynAF metamodel (see the study made in Deliverable D.3.1
“Evaluation of initiatives for morpho-syntactic and syntactic annotation” of the EU project LIRICS, available at
http://lirics.loria.fr/doc_pub/Del3_1_V2.pdf).
This International Standard proposes a metamodel for syntactic annotation together with a list of relevant data
categories for syntactic annotation. The data categories are available on the ISOCat server
(http://www.isocat.org/) in the syntax profile (as defined in ISO 12620:2009).
INTERNATIONAL STANDARD ISO 24615:2010(E)
Language resource management — Syntactic annotation
framework (SynAF)
1 Scope
This International Standard describes the syntactic annotation framework (SynAF), a high level model for
representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across
language resources or language processing components. This International Standard is complementary and
closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for
syntactic representations as well as reference data categories for representing both constituency and
dependency information in sentences or other comparable utterances and segments.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 1087-1:2000, Terminology work — Vocabulary — Part 1: Theory and application
ISO 1087-2:2000, Terminology work — Vocabulary — Part 2: Computer applications
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 24611, Language resource management — Morpho-syntactic annotation framework
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1, ISO 1087-2,
ISO 12620:2009, ISO 24611 and the following apply.
3.1
adjunct
non-essential element associated with a verb as opposed to syntactic arguments (3.19)
NOTE Adverbs are possible adjuncts for a sentence.
3.2
chunk
non-recursive constituent (3.4)
3.3
clause
group of phrases (3.14), usually containing a predicate
NOTE A clause can be either a main clause (3.10) or a subordinate clause (3.17). In languages distinguishing
finiteness, clauses whose predicate is a verb can be either finite or non-finite, depending on the form of the verb. A main
clause alone can build a complete sentence (3.15). In the SynAF model, a clause is a special case of a constituent (3.4).
3.4
constituent
syntactic grouping of words [into phrases (3.14)], phrases [into clauses (3.3) or other phrases] or clauses
[into a sentence (3.15)] on the base of structural (or hierarchical) properties
3.5
dependency
dependency relation
syntactic relation between word forms (3.24) or constituents (3.4) on the basis of the grammatical
functions (3.7) that constituents play in relation to each other
3.6
syntactic edge
edge
triplet with a source node (3.12), a target node, and optional annotations (3.9)
NOTE Non-terminal nodes (3.13) have an outgoing constituency syntactic edge.
3.7
grammatical function
grammatical role of a word form (3.24) or constituent (3.4) within its embedding syntactic environment
NOTE For example, a noun phrase (NP) can act as a subject within a sentence (3.15), or a noun may act as a
subject dependent of a verb in a dependency graph. There is a grammatical relation between the subject – NP and the
main verb in a sentence. All grammatical relations (subject – predicate, head – modifier, etc.) are subsumed under the
concept of dependency relations (3.5), whether between terminal or non-terminal nodes.
3.8
syntactic head
head
part of a constituent (3.4) which determines its distribution (the syntactic environments in which the
constituent may appear) and its grammatical properties (e.g. if the grammatical gender of the head is feminine,
then the gender of the entire constituent will be feminine)
NOTE The head of a constituent usually cannot be left out.
3.9
linguistic annotation
annotation
feature-value pair denoting a linguistic property of a linguistic segment
3.10
main clause
clause (3.3), which can act on its own as a complete sentence (3.15)
NOTE In languages distinguishing finiteness, the main clause is usually finite. Example: The train is late.
3.11
modifier
part of a constituent (3.4) which ascribes a property to the head (3.8) of the constituent
NOTE A modifier can be placed before or after the head of the phrase (3.14) (pre-modifier or post-modifier).
Modifiers are optional in a constituent.
3.12
node
syntactic node
word form (3.24) or constituent (3.4) seen as an elementary syntactic component of a syntactic analysis
3.13
non-terminal node
syntactic node (3.12) which is not a word form (3.24)
NOTE A non-terminal node has an outgoing constituency edge (3.6).
3.14
phrase
group of word forms (3.24) (usually containing one or more words) which can fulfil a grammatical function
(3.7), e.g. in a clause (3.3)
NOTE Empty phrases are permitted (being non-realised pronouns, sometimes marked as “pro”, and having the role
of subjects in clauses). A phrase is typically named after its head (3.8), for example noun phrases, verb phrases, adjective
phrases, adverbial phrases and prepositional phrases. Phrases have been informally described as “bloated words”, in that
the parts of the phrase added to the head elaborate and specify the reference of the head. In our model, a phrase is a
special case of a constituent (3.4).
3.15
sentence
related group of word forms (3.24) containing a predication, usually expressing a complete thought and
forming the basic unit of discourse structure
NOTE A sentence consists of one or more clauses (3.3). When describing speech, it is common to talk about
“utterances” rather than sentences.
3.16
span
pair of points (p1, p2), where p1 ≤ p2, identifying the segment of the document to which an annotation (3.9)
is applied
NOTE A multiple span is a sequence of spans where the ending point of each span is less than or equal to the
starting point of the subsequent span.
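The span and multiple-span conditions can be illustrated with a minimal Python sketch. It is not part of the standard: the names Span and is_multiple_span are hypothetical, and points are assumed to be integer offsets into the document.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Span:
    """A pair of points (p1, p2) with p1 <= p2, following definition 3.16."""
    p1: int  # assumption: points are integer offsets
    p2: int

    def __post_init__(self) -> None:
        if self.p1 > self.p2:
            raise ValueError(f"invalid span: p1={self.p1} > p2={self.p2}")


def is_multiple_span(spans: Sequence[Span]) -> bool:
    """True if each span ends at or before the start of the next span."""
    return all(a.p2 <= b.p1 for a, b in zip(spans, spans[1:]))
```

Under these assumptions, [Span(0, 3), Span(3, 7), Span(10, 12)] is a multiple span, while [Span(0, 5), Span(4, 7)] is not, because the first span ends after the second one starts.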
3.17
subordinate clause
clause which fulfils a grammatical function (3.7) in a phrase (3.14) [for example a relative clause (3.3)
modifying the head (3.8) noun of a nominal phrase] or in another clause
NOTE A subordinate clause usually does not act on its own as a sentence, but is part of a larger sentence.
3.18
subcategorization frame
set of restrictions indicating the properties of the syntactic arguments (3.19) that can or must occur with a
verb
EXAMPLE Alfred (/syntacticArgument/) reads a book (/syntacticArgument/) today (/adjunct/).
NOTE The subject, indirect object and direct object are subcategorized grammatical functions (3.7) within a
sentence; they are dependents of the verb (i.e. they can appear in subcategorization frames).
3.19
syntactic argument
functionally essential element that is required and given its interpretation by the head of its phrase (3.14) or
the node (3.12) of which it is a dependent (e.g. the nominal argument of a prepositional phrase or verb)
NOTE For verbs and verbal phrases, arguments identify the participants in the process referred to by the verb. In
some frameworks, syntactic arguments are called complements.
3.20
syntactic graph
graph
connected set of syntactic nodes (3.12) and edges (3.6)
3.21
syntactic tree
syntactic graph (3.20) in which each node has a single parent
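As an illustration only, the single-parent condition can be checked as in the Python sketch below. The function name is hypothetical, and the sketch makes three assumptions that the definition does not spell out: an edge is a (source, target) pair whose source is the parent, exactly one root node has no parent, and every node is reachable from that root.

```python
from collections import defaultdict


def is_syntactic_tree(nodes, edges):
    """Check that a graph has one root and a single parent for every other node.

    `edges` holds (source, target) pairs; the source is assumed to be the parent.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    for source, target in edges:
        parents[target].add(source)
        children[source].add(target)

    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False
    if any(len(parents[n]) > 1 for n in nodes):
        return False

    # Walk down from the root; a connected tree reaches every node exactly once.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return seen == set(nodes)
```

A graph in which one word form is attached to two constituents fails this check. Such a structure remains a syntactic graph (3.20) but is not a syntactic tree.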
3.22
syntax
way in which word forms (3.24) are interrelated and/or grouped together into phrases, thus capturing the
relations that exist between those units
3.23
terminal node
syntactic node (3.12) which is a single word form (3.24) or an empty element involved in a syntactic relation
3.24
word form
contiguous or non-contiguous entity from a speech or text sequence identified as an autonomous lexical item
4 SynAF metamodel
4.1 Introduction
Syntactic annotations have at least two functions in language processing:
a) to represent linguistic constituency, as in noun phrases (NP), describing a structured sequence of
morpho-syntactically annotated items (including empty elements or traces generated by movements at
the constituency level), as well as constituents built from non-contiguous elements, and
b) to represent dependency relations, such as head-modifier relations, and also including relations between
categories of the same kind (such as the head-head relations between nouns in appositions, or nominal
coordinations in some formalisms). The dependency information can exist between morpho-syntactically
annotated items within a phrase (an adjective is the modifier of the head noun within an NP) or describe a
specific relation between syntactic constituents at the clausal and sentential level (e.g. an NP being the
“subject” of the main verb of a clause or sentence). The dependency relation can also be stated for empty
elements (e.g. the pro element in Romance languages, which serves a grammatical function).
As a consequence, syntactic annotations shall comply with a multi-layered annotation strategy interrelating
syntactic annotation for both constituency and dependency as stated in the SynAF metamodel.
4.2 SynAF metamodel
4.2.1 Overview
The SynAF metamodel is represented as a set of UML classes complemented by UML attribute-value pairs,
which represent the associated syntactic data categories. The SynAF textual descriptions specify more
complete information about the SynAF classes, relations and extensions than can be included in the UML
diagram. Developers shall define a data category selection (DCS) as specified for SynAF data category
selection procedures (see Figure 1). The data categories given in Annex A shall be used for the
representation of syntactic annotations.
Figure 1 — SynAF metamodel (articulated with MAF)
4.2.2 SyntacticNode class
The SyntacticNode class is a generic class subsuming both the class of terminal nodes and the class of non-
terminal nodes. Syntactic nodes can be involved in as many syntactic relations as necessary (see 3.6,
syntactic edges).
4.2.3 T_Node class
The T_Node class represents the terminal nodes of a syntactic tree, consisting of morpho-syntactically
annotated word forms, as well as empty elements when appropriate. The T_Nodes are defined over one or
more spans (multiple spans can account for discontinuous constituents). T_Nodes are annotated with
syntactic categories valid for the word level.
4.2.4 NT_Node class
The NT_Node class represents the non-terminal nodes of a syntax tree. Syntax trees mainly consist of
T_Nodes and NT_Nodes, including empty elements when appropriate. T_Nodes make reference to a span.
Thus by virtue of the syntactic tree representation, spans can also be inferred for NT_Nodes. The NT_Nodes
are annotated with syntactic categories valid at the phrasal level and higher (clausal, sentential).
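The following Python sketch shows one possible way to infer the spans of an NT_Node from the T_Nodes below it. It is a simplified illustration rather than the normative model: the class names TNode and NTNode, the helper functions, the integer offsets and the category labels (DET, N, V, ADJ, NP, VP, S) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Span = Tuple[int, int]  # (p1, p2) with p1 <= p2; integer offsets are assumed


@dataclass
class TNode:
    """Terminal node: a word form anchored to one or more spans."""
    form: str
    category: str
    spans: List[Span]


@dataclass
class NTNode:
    """Non-terminal node: carries no span of its own, only children."""
    category: str
    children: List[Union["TNode", "NTNode"]] = field(default_factory=list)


def inferred_spans(node) -> List[Span]:
    """Collect the spans of all terminals below `node`, sorted by start point."""
    if isinstance(node, TNode):
        return sorted(node.spans)
    collected: List[Span] = []
    for child in node.children:
        collected.extend(inferred_spans(child))
    return sorted(collected)


def extent(node) -> Span:
    """Smallest single span covering every terminal below `node`."""
    spans = inferred_spans(node)
    return (min(s[0] for s in spans), max(s[1] for s in spans))


# "The train is late", with hypothetical character offsets.
the = TNode("The", "DET", [(0, 3)])
train = TNode("train", "N", [(4, 9)])
is_ = TNode("is", "V", [(10, 12)])
late = TNode("late", "ADJ", [(13, 17)])
np = NTNode("NP", [the, train])
vp = NTNode("VP", [is_, late])
s = NTNode("S", [np, vp])
```

In this sketch, extent(np) covers offsets 0 to 9 and extent(s) covers 0 to 17. For a discontinuous constituent, inferred_spans returns several non-adjacent spans, while extent returns only the outer boundaries.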
4.2.5 SyntacticEdge class
The SyntacticEdge class represents a relation between syntactic nodes (both terminal and non-terminal
nodes). For example, the dependency relation is binary, consisting of a pair of source and target nodes, with
one or more annotations. In particular, a syntactic edge can be annotated by a /syntacticEdgeType/ (see
Annex A), whose conceptual domain can be one of, but is not limited to, /primarySyntacticEdge/,
/secondarySyntacticEdge/.
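A syntactic edge with its annotations might be sketched in Python as follows. The field names, the use of string node identifiers, the direction of the example edges and the "grammaticalFunction" key are assumptions made for illustration; only /syntacticEdgeType/, /primarySyntacticEdge/ and /secondarySyntacticEdge/ are taken from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SyntacticEdge:
    """A source node, a target node, and feature-value annotations."""
    source: str  # assumption: nodes are referenced by identifier
    target: str
    annotations: Dict[str, str] = field(default_factory=dict)

    @property
    def edge_type(self) -> Optional[str]:
        """Value of /syntacticEdgeType/, or None if the edge is not typed."""
        return self.annotations.get("syntacticEdgeType")


def primary_edges(edges: List[SyntacticEdge]) -> List[SyntacticEdge]:
    """Keep only the edges annotated as /primarySyntacticEdge/."""
    return [e for e in edges if e.edge_type == "primarySyntacticEdge"]


# Hypothetical edges for "The train is late": verb "is" (t3), noun "train" (t2).
edges = [
    SyntacticEdge("t3", "t2", {"syntacticEdgeType": "primarySyntacticEdge",
                               "grammaticalFunction": "subject"}),
    SyntacticEdge("t4", "t2", {"syntacticEdgeType": "secondarySyntacticEdge"}),
]
```

Keeping the annotations in an open dictionary reflects the statement that the conceptual domain of /syntacticEdgeType/ is not limited to the two values named in the text.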
4.2.6 Annotation class
The Annotation class represents the application of syntactic information to SynAF annotated data, as well as
(see Figure 1) the application of morphosyntactic information to MAF annotated data.
Annex A
(normative)
Data categories for SynAF
A.1 General
The following data categories shall be used for the representation of syntactic annotations in combination with
the SynAF metamodel. When necessary, specific applications may define additional data categories, which
shall be described in compliance with ISO 12620 and provided in the ISOCat data category registry.
A.2 Basic syntactic data categories
/annotation/
Definition [en] information added to a word, phrase, clause, sentence, a text or to a relation among
them
/annotationDepth/
Conceptual Domain /deepParsing/, /shallowParsing/, /tagging/
Definition [en] level of information richness the annotation describes
/annotationStyle/
Conceptual Domain /embeddedNotation/, /mixedNotation/, /standoffNotation/
Definition [en] style of annotation
/annotationType/
Conceptual Domain /constituency/, /constituencyAndDependency/, /dependency/
Definition [en] type of annotation
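As a sketch of how an application might encode the three conceptual domains listed above, each one can be modelled as a closed enumeration in Python. The class and member names are hypothetical; the string values are the data category identifiers given in this annex.

```python
from enum import Enum


class AnnotationDepth(Enum):
    DEEP_PARSING = "deepParsing"
    SHALLOW_PARSING = "shallowParsing"
    TAGGING = "tagging"


class AnnotationStyle(Enum):
    EMBEDDED_NOTATION = "embeddedNotation"
    MIXED_NOTATION = "mixedNotation"
    STANDOFF_NOTATION = "standoffNotation"


class AnnotationType(Enum):
    CONSTITUENCY = "constituency"
    CONSTITUENCY_AND_DEPENDENCY = "constituencyAndDependency"
    DEPENDENCY = "dependency"


def parse_annotation_type(value: str) -> AnnotationType:
    """Map a data category identifier to its enum member.

    Raises ValueError for a value outside the conceptual domain.
    """
    return AnnotationType(value)
```

With this encoding, a value outside the listed domain, such as "semantic", is rejected when a resource header is read, which gives a simple conformance check.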
/clitic/
Definition [en] unstressed word which cannot stand on its own as a normal utterance and is
phonologically dependent upon a neighboring word for pronunciation
— Note [en] There is great variation concerning clitics. Sometimes, in English, the cliticized forms
are restricted to the contracted forms of auxiliaries, as in I’m, she’ll, etc. However, in
some instances, articles are also referred to as clitics.
/constituency/
Definition [en] mechanism allowing the construction of words into phrases, phrases into higher
phrases or clauses, and clauses into sentences
— Note [en] The construction of sentences into text is not usually called constituency.
/constituencyAndDependency/
Definition [en] union of constituency and dependency
/contiguous/
Definition [en] property of a grammatical unit sharing a boundary with another
/deepParsing/
Definition [en] process of fully
...









