November 2025: New Core Standard for Morphosyntactic Annotation in Language Resources

In November 2025, a significant milestone was reached in the field of terminology, documentation, and standardization with the publication of ISO 24611-1:2025. This new international standard establishes a foundational framework for the morphosyntactic annotation of linguistic resources. For language technologists, researchers, compliance officers, and documentation managers, the adoption of this standard signals greater interoperability, consistency, and efficiency in managing linguistic data.

This article explores the new ISO 24611-1:2025 standard: what it delivers, its key requirements, and what it means for organizations that manage language resources.


Overview / Introduction

The management and annotation of language resources are fundamental to advancements in computational linguistics, language technology, data-driven language analysis, and digital humanities. Well-defined standards are essential for:

  • Interoperability between language technology tools and data sources
  • Consistency in linguistic annotation and terminology
  • Enabling robust information retrieval, machine learning, and AI-driven text analysis

With the increasing importance of linguistic data in AI, machine translation, corpus linguistics, and semantic search, having a unified morphosyntactic annotation framework is critical. ISO 24611-1:2025 provides this foundation by defining a highly structured metamodel, precise data category referencing, and an XML serialization approach that ensures both human- and machine-readability.

This article guides you through:

  • The scope and features of the new standard
  • Its technical requirements and compliance considerations
  • How it transforms practical work in language resource management
  • Strategic steps for implementation and compliance

Detailed Standards Coverage

ISO 24611-1:2025 – Morphosyntactic Annotation Framework (MAF) – Part 1: Core Model

Language resource management — Morphosyntactic annotation framework (MAF) — Part 1: Core model

ISO 24611-1:2025 establishes a core, standardized model for representing morphosyntactic annotations—‘morphosyntactic’ referring to the morphological and syntactic properties of word-sized units in texts. This standard is pivotal for those working with annotated corpora, natural language processing (NLP) datasets, lexicons, and digital text archives.

What the Standard Covers

At its center, ISO 24611-1:2025 provides:

  • A metamodel that separates language data into two distinct levels: tokens (surface-level word segments) and word-forms (linguistic abstractions associated with tokens).
  • Mechanisms to describe the relationships between tokens, word-forms, and morphosyntactic features such as part of speech, number, gender, and other grammatical attributes.
  • Strong recommendations for referencing data categories—using unique and stable identifiers aligned with ISO 12620-2, which enhances semantic clarity and interoperability.
  • A default XML serialization, conforming to the Text Encoding Initiative (TEI) Guidelines, allowing seamless data exchange, integration, and archiving.
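
To make the two-level distinction concrete, here is a minimal Python sketch of tokens and word-forms. It is an illustration under stated assumptions, not the data model as the standard defines it: the class names Token and WordForm, their fields, the example sentence, and the feature labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """Surface segment: a contiguous character span in the source text."""
    id: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass
class WordForm:
    """Linguistic unit linked to one or more tokens (hypothetical fields)."""
    lemma: str
    token_ids: list[str]
    features: dict[str, str] = field(default_factory=dict)


TEXT = "New York cannot wait"
TOKENS = {
    "t1": Token("t1", 0, 3),    # New
    "t2": Token("t2", 4, 8),    # York
    "t3": Token("t3", 9, 15),   # cannot
    "t4": Token("t4", 16, 20),  # wait
}
WORD_FORMS = [
    # Two tokens realise one word-form.
    WordForm("New York", ["t1", "t2"], {"pos": "properNoun"}),
    # One token realises two word-forms.
    WordForm("can", ["t3"], {"pos": "verb"}),
    WordForm("not", ["t3"], {"pos": "adverb"}),
    WordForm("wait", ["t4"], {"pos": "verb"}),
]


def surface(word_form: WordForm) -> str:
    """Return the text of the tokens a word-form is linked to."""
    return " ".join(TEXT[TOKENS[i].start:TOKENS[i].end] for i in word_form.token_ids)


for wf in WORD_FORMS:
    print(wf.lemma, "<-", surface(wf))
```

Keeping offsets on tokens and linguistic information on word-forms means the surface text is never altered by annotation, and the same token can take part in more than one word-form.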

Key Requirements and Specifications

  • Tokenization: Defines tokens as contiguous sequences of characters; provides for complex segmentation strategies for multiple scripts and languages.
  • Word-Form Abstraction: Supports n-to-n relationships between tokens and word-forms, accommodating complex language phenomena like compounds or idiomatic expressions.
  • Feature Structures: Morphosyntactic features are expressed using feature structures (aligned with ISO 24610-1) and tied to a central data category repository.
  • XML Representation: All morphosyntactic annotation must be serialized in XML according to TEI guidelines, ensuring compatibility and long-term preservation.
  • Ambiguity Handling: While structural ambiguities are out of scope for this part, the model provides for explicit representation and referencing of ambiguous cases to be covered in future parts (notably MAF Lattice in Part 2).
  • Conformance Conditions: Newly added conditions clarify what constitutes compliance and support the validation of annotated datasets.
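
As a rough illustration of the feature-structure requirement, the sketch below builds a feature structure with the TEI elements fs, f, and symbol, using Python's standard library. The feature names and values are hypothetical, and the TEI namespace and any data category references are omitted for brevity, so treat this as a sketch rather than a conformant MAF fragment.

```python
import xml.etree.ElementTree as ET


def feature_structure(features: dict[str, str]) -> ET.Element:
    """Build an <fs> element with one <f>/<symbol> pair per feature."""
    fs = ET.Element("fs")
    for name, value in features.items():
        f = ET.SubElement(fs, "f", {"name": name})
        ET.SubElement(f, "symbol", {"value": value})
    return fs


# Hypothetical feature names and values, chosen only for illustration.
fs = feature_structure({"pos": "noun", "number": "plural"})
print(ET.tostring(fs, encoding="unicode"))
```

Building the structure through an XML library rather than string concatenation guarantees that the output is well-formed and that attribute values are escaped correctly.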

Target Industries/Organizations

  • Language technology providers
  • Digital humanities projects and research centers
  • Lexical database curators and lexicon developers
  • Software vendors in NLP, machine translation, and information retrieval
  • Academic, government, and policy settings handling electronic records or multilingual documentation

Practical Implications for Implementation

  • Organizations must ensure that all morphosyntactic annotations in their language resources conform to the MAF core model and are serialized as TEI XML.
  • When building or curating annotated corpora, data managers should utilize referenced data categories compliant with ISO 12620-2 to ensure uniformity and reuse.
  • Software and tools that process linguistic annotations should be updated or validated against this model.

Notable Changes from Previous Versions

  • Full serialization of the data model in TEI XML—providing more clarity and a direct path for data integration.
  • Revised definitions and text—to align with the most recent understanding in the field and improve cross-linguistic applicability.
  • Addition of conformance conditions—making validation and compliance easier and more transparent.
  • Delegation of word lattice (structural ambiguity) modeling to future parts—focusing this part strictly on the core model.
  • Transition of sample data categories to an online repository—improving extensibility and maintainability.

Key highlights:

  • Establishes a flexible, extensible metamodel for morphosyntactic annotation
  • Provides robust XML serialization in line with internationally accepted TEI Guidelines
  • Anchors feature values and annotation types to globally unique data category references for maximum interoperability

Access the full standard: View ISO 24611-1:2025 on iTeh Standards


Industry Impact & Compliance

The publication of ISO 24611-1:2025 marks a leap forward for sectors relying on accurate, shareable language data. For language technology providers, research institutions, and documentation managers, compliance with the MAF core model brings:

  • Enhanced Data Interoperability: Standardized annotation allows seamless data exchange between tools, organizations, and systems worldwide.
  • Reduced Data Silos: Unified descriptions mean language resources can be aggregated or segmented as needed without manual adjustment.
  • Improved Quality/Validation: Clear conformance criteria enable quality managers to assess annotated data and software outputs systematically.
  • Regulatory Alignment: For organizations bound by international or national regulations on digital data, implementing this standard helps fulfill requirements for linguistic data processing and archiving.

Compliance Considerations and Timelines

  • Upgrading existing annotated resources to the new MAF XML serialization may require data transformation and validation.
  • New projects should adopt ISO 24611-1:2025 as the baseline for all morphosyntactic annotation work.
  • Quality and compliance teams should update internal checklists and validation workflows to reflect the new conformance conditions.
  • Cross-project data sharing initiatives can set compliance with MAF as a prerequisite for participation.

Benefits of Adopting This Standard

  • Improved data reuse and reduction in redundancy
  • Greater precision and transparency in linguistic annotation
  • Facilitated adoption by software developers and tool vendors
  • Better multilingual support due to consistent referencing and modular structure
  • Stronger support for future-proof digital assets owing to open, standards-based XML serialization

Risks of Non-Compliance

  • Data incompatibility with new tools or collaborative partners
  • Increased cost in re-annotation or manual data conversion
  • Potential data loss or misinterpretation across formats and languages
  • Difficulty integrating with AI, NLP, or automated text mining workflows

Technical Insights

Common Technical Requirements

The core technical principles of the new standard include:

  • TEI XML Serialization: All morphosyntactic annotation must be encoded according to TEI (Text Encoding Initiative) Guidelines, version 4.7.0 or later.
  • Data Category Referencing: Mandatory referencing of data categories in ISO 12620-2, enabling global semantic clarity and preventing ambiguities.
  • Metamodel Distinction: Enforces the essential distinction between tokens and word-forms, allowing both simple and highly complex linguistic structures.
  • Feature Structure Representation: Expresses morphosyntactic features as feature structures compatible with ISO 24610-1.

Implementation Best Practices

  • Start with Model Review:

    1. Analyze your current annotation streams—identify where tokens, word-forms, and morphosyntactic tags are marked.
    2. Review whether your XML structure aligns with TEI elements, especially <seg> for tokens and <span> for word-forms.
  • Data Migration:

    1. Transform legacy annotations into the MAF-conformant TEI XML.
    2. Validate using XML schema definitions provided by the standard and cross-check feature structure compliance with ISO 24610-1.
  • Data Category Integration:

    1. Create a mapping between your current tagsets and ISO 12620-2 data category identifiers.
    2. Integrate references in your XML to these external semantic resources.
  • Validation & Testing:

    1. Develop or adopt validation scripts to ensure all new data meets the standard's conformance requirements.
    2. Perform interoperability tests with external tools and data from collaborating partners.
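
The tagset-mapping step above can be sketched as a simple lookup that also reports tags with no mapping yet. The tags and the example.org identifiers below are placeholders invented for illustration. They are not real data category identifiers, and the function name map_tags is hypothetical.

```python
# Placeholder identifiers: replace them with the persistent identifiers
# published by the data category repository you actually use.
TAG_TO_DATCAT = {
    "NN": "https://example.org/datcat/commonNoun",
    "NNS": "https://example.org/datcat/commonNoun",
    "VB": "https://example.org/datcat/verb",
}


def map_tags(tags: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split tags into those with a known identifier and those without."""
    mapped = {t: TAG_TO_DATCAT[t] for t in tags if t in TAG_TO_DATCAT}
    unmapped = sorted({t for t in tags if t not in TAG_TO_DATCAT})
    return mapped, unmapped


mapped, unmapped = map_tags(["NN", "VB", "XYZ", "NN"])
print(unmapped)  # ['XYZ']
```

Reporting unmapped tags separately, instead of silently dropping them, gives the migration team a worklist before any legacy data is converted.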

Testing and Certification Considerations

  • Ensure that your XML files are well-formed and validate against the latest TEI schemas.
  • Use tools or libraries that support feature structure processing as specified in ISO 24610-1.
  • Conduct internal certification or peer review to verify full compliance prior to publication or exchange.
  • Document any language-specific constraints or extensions in alignment with the standard’s guidance (e.g., in the TEI header metadata).
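
A first validation pass can be sketched with Python's standard library alone: check that a file is well-formed and that every word-form reference points to an existing token. The element names seg and span follow the mapping mentioned earlier in this article. The use of a target attribute with "#id" pointers, the unnamespaced sample, and the function name dangling_references are assumptions made for illustration. Full validation against the TEI schemas needs a schema-aware tool and is not shown here.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: the second <span> points to a token that does not exist.
SAMPLE = """<body>
  <seg xml:id="t1">New</seg>
  <seg xml:id="t2">York</seg>
  <span target="#t1 #t2"/>
  <span target="#t9"/>
</body>"""


def dangling_references(xml_text: str) -> list[str]:
    """Return span pointers that do not resolve to any seg identifier."""
    # fromstring raises ET.ParseError if the input is not well-formed.
    root = ET.fromstring(xml_text)
    token_ids = {seg.get(XML_ID) for seg in root.iter("seg")}
    dangling = []
    for span in root.iter("span"):
        for ref in span.get("target", "").split():
            if ref.lstrip("#") not in token_ids:
                dangling.append(ref)
    return dangling


print(dangling_references(SAMPLE))  # ['#t9']
```

A check like this is cheap to run on every commit and catches broken links between word-forms and tokens before schema validation or partner exchange.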

Conclusion / Next Steps

The November 2025 release of ISO 24611-1:2025 sets a new bar for linguistic annotation, interoperability, and resource management. Key takeaways for professionals in language resource management, NLP, and documentation:

  • Adopting the MAF core model enables robust, extensible, and transferable linguistic data assets.
  • Transition your annotation pipelines and data archives to the new TEI XML serialization if not already compliant.
  • Train your teams on the new model’s principles, especially the strict separation between tokens and word-forms, and the requirement to reference ISO 12620-2 data categories.
  • Review and update internal guidelines to include the new conformance requirements, ensuring smooth future audits and evaluations.

Recommendations:

  • Start by reviewing your current annotation, migration, and validation processes.
  • Engage with industry groups and collaborative partners to ensure alignment in data formats and annotation models.
  • Monitor iTeh Standards and other authoritative resources for updates, case studies, and future parts in this standards series.

Explore and access the new standard now: View ISO 24611-1:2025 on iTeh Standards

Stay at the forefront of linguistic resource management—implement ISO 24611-1:2025 now to ensure compliance, interoperability, and innovation in your language technology workflows.