ISO 24623-1:2018 describes an abstract metamodel that is designed to accommodate any corpus query language (QL) and provides a basis for coarse-grained classification of QLs. The metamodel consists of several components referred to as CQLF classes, levels, and modules, and is illustrated with examples from the Single-stream class (where a single data stream is used to organize the relevant data structures). Within this class, this document discusses three CQLF levels (Linear, Complex and Concurrent), as well as their subdivisions into modules, dictated by functional and modelling criteria. ISO 24623-1:2018 does not provide a way to specify further details beyond the above-mentioned divisions, nor does its scope include QLs designed to query more than one concurrent data stream, as in multimodal or parallel corpora (such QLs can still be classified according to the criteria suggested here for less expressive QLs).

ISO 24615-2:2018 describes an XML-conformant serialization of the ISO 24615-1 metamodel, with the objective of supporting interoperability across language resources or language processing components in the domain of syntactic annotations. As an extension of ISO 24615-1, this document is also coordinated with ISO 24612.

ISO 24624:2016 specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts. Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.
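
To give a rough idea of how time-aligned transcription data of this kind can be processed, the following minimal sketch reads utterances from a heavily simplified fragment that uses the TEI elements `timeline`, `when` and `u`. The fragment is not a conformant ISO 24624 document (it has no header and no annotation layers), the function name `utterances` is hypothetical, and the code assumes that every `when` offset is given in seconds relative to the first timeline point.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, illustrative fragment; not a complete or conformant document.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <timeline unit="s">
      <when xml:id="T0"/>
      <when xml:id="T1" interval="1.5" since="#T0"/>
      <when xml:id="T2" interval="3.2" since="#T0"/>
    </timeline>
    <body>
      <u who="#SPK0" start="#T0" end="#T1">good morning</u>
      <u who="#SPK1" start="#T1" end="#T2">morning</u>
    </body>
  </text>
</TEI>"""


def utterances(xml_text):
    """Return (speaker, start, end, text) tuples for each <u> element.

    Assumption: all intervals are measured from the first timeline point,
    so a missing interval is treated as offset 0.0.
    """
    root = ET.fromstring(xml_text)
    offsets = {
        when.get(XML_ID): float(when.get("interval", "0"))
        for when in root.iter(TEI + "when")
    }
    result = []
    for u in root.iter(TEI + "u"):
        start = offsets[u.get("start").lstrip("#")]
        end = offsets[u.get("end").lstrip("#")]
        result.append((u.get("who").lstrip("#"), start, end, u.text.strip()))
    return result


for speaker, start, end, text in utterances(SAMPLE):
    print(f"{speaker} [{start:.1f}-{end:.1f}] {text}")
```

Resolving the `start` and `end` pointers against the timeline is what makes it possible to align the transcription with the recording or with other annotation layers.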

ISO 24615-1:2014 describes the syntactic annotation framework (SynAF), a high-level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615-1:2014 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.
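
The idea of holding constituency and dependency information in one representation can be pictured as a graph of nodes and labelled edges. The sketch below is only an illustration under that assumption: the class names `Node` and `Edge`, the `kind` values and the relation labels are hypothetical and are not taken from the standard or its data categories.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    ident: str
    label: str      # a word form for terminals, a category for non-terminals
    terminal: bool


@dataclass(frozen=True)
class Edge:
    kind: str        # "constituency" or "dependency" (illustrative values)
    source: str      # ident of the parent (constituency) or head (dependency)
    target: str      # ident of the child or dependent
    label: str = ""  # e.g. a grammatical relation on dependency edges


def targets(edges: List[Edge], source: str, kind: str) -> List[str]:
    """Idents reached from `source` over edges of the given kind."""
    return [e.target for e in edges if e.source == source and e.kind == kind]


# "the dog barks" annotated with both kinds of information.
nodes = [
    Node("w1", "the", True), Node("w2", "dog", True), Node("w3", "barks", True),
    Node("np", "NP", False), Node("vp", "VP", False), Node("s", "S", False),
]
edges = [
    Edge("constituency", "s", "np"),
    Edge("constituency", "s", "vp"),
    Edge("constituency", "np", "w1"),
    Edge("constituency", "np", "w2"),
    Edge("constituency", "vp", "w3"),
    Edge("dependency", "w3", "w2", "subject"),
    Edge("dependency", "w2", "w1", "determiner"),
]

print(targets(edges, "s", "constituency"))   # children of the sentence node
print(targets(edges, "w3", "dependency"))    # dependents of the verb
```

Keeping both edge kinds over the same set of nodes lets a tool query either view without converting between annotation schemes.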

ISO 24611:2012 provides a framework for the representation of annotations of word-forms in texts; such annotations concern tokens, their relationship with lexical units, and their morpho-syntactic properties. It describes a metamodel for morpho-syntactic annotation that refers to the data categories contained in the ISOCat data category registry (DCR, as defined in ISO 12620). It also describes an XML serialization for morpho-syntactic annotations, with equivalences to the guidelines of the TEI (Text Encoding Initiative).
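
The relationship between tokens, word-forms and lexical units can be pictured with a small data model. The following is a minimal sketch, not the MAF serialization itself: the class names `Token` and `WordForm`, their fields and the feature values are illustrative assumptions, chosen only to show that one token may carry several word-forms and that one word-form may span several tokens.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    """A segment of the source text (hypothetical structure)."""
    ident: str
    text: str


@dataclass
class WordForm:
    """Links one or more tokens to a lexical unit and its properties."""
    tokens: List[Token]
    lemma: str
    features: Dict[str, str] = field(default_factory=dict)

    def surface(self) -> str:
        # Join the covered tokens to recover the written form.
        return " ".join(t.text for t in self.tokens)


def word_forms_for(token: Token, forms: List[WordForm]) -> List[WordForm]:
    """All word-forms that cover the given token."""
    return [f for f in forms if token in f.tokens]


# French "du" is a single token that contracts "de" + "le";
# "pomme de terre" is one lexical unit written as three tokens.
t_du = Token("t1", "du")
t_pomme, t_de, t_terre = Token("t2", "pomme"), Token("t3", "de"), Token("t4", "terre")

forms = [
    WordForm([t_du], "de", {"pos": "preposition"}),
    WordForm([t_du], "le", {"pos": "determiner"}),
    WordForm([t_pomme, t_de, t_terre], "pomme de terre", {"pos": "noun"}),
]

print([f.lemma for f in word_forms_for(t_du, forms)])
print(forms[2].surface())
```

Separating the two layers is what allows contractions and multi-token units to be annotated without altering the tokenization of the text.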

The basic concepts and general principles of word segmentation as defined in ISO 24614-1 apply to Chinese, Japanese and Korean. Text needs to be segmented into tokens, words, phrases or some other types of smaller textual units in order to perform certain computational applications on language resources, such as natural language processing, information retrieval and machine translation. ISO 24614-2:2011 is restricted to the segmentation of a text into words or other word segmentation units (WSUs). This task is distinct from morphological or syntactic analysis per se, although it greatly depends on morphosyntactic analysis. It is also different from the task of laying out a framework for constructing a lexicon and identifying its lexical entries, namely lemmas and lexemes. The frameworks for the latter tasks are provided by ISO 24611, ISO 24613 and ISO 24615. ISO 24614-2:2011 specifies rules for delineating WSUs for Chinese, Japanese and Korean. Some rules are common to all three languages, though each language also has its own distinct rules for identifying WSUs. The common features are discussed, then the distinct rules are laid out for Chinese, for Japanese and for Korean.
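
For readers unfamiliar with the task, the sketch below shows what segmenting an unspaced Chinese string into units looks like. It uses forward maximum matching, a classic dictionary-based heuristic, and not a procedure prescribed by ISO 24614-2; the function name, the toy lexicon and the `max_len` limit are all assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation against a word list.

    At each position the longest lexicon entry (up to max_len characters)
    is taken; a single character is used as a fallback when nothing matches.
    """
    units = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                units.append(candidate)
                i += size
                break
    return units


# Toy lexicon (illustrative only).
LEXICON = {"喜欢", "自然", "语言", "自然语言", "处理"}

print(forward_max_match("我喜欢自然语言处理", LEXICON))
# ['我', '喜欢', '自然语言', '处理']
```

A purely dictionary-driven heuristic like this one makes no use of morphosyntactic analysis, which is why it cannot by itself decide whether a string such as 自然语言 should count as one unit or two; rules for delineating WSUs address exactly that kind of decision.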

ISO 24614-1:2010 presents the basic concepts and general principles of word segmentation, and provides language-independent guidelines to enable written texts to be segmented, in a reliable and reproducible manner, into word segmentation units (WSUs). The many applications and fields that need to segment texts into words, and to which ISO 24614-1:2010 can therefore be applied, include translation, content management, speech technologies, computational linguistics and lexicography.

ISO 24615:2010 describes the syntactic annotation framework (SynAF), a high-level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615:2010 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.
