FAIR Language Resources in NLP: Stewardship, Reuse and Long-Term Sustainability

Workshop at CLIB 2026 (Satellite Event organised by FOCUS ERA Chair)

📍 Sofia, Bulgaria

🗓 7 September 2026

Language resources are the foundation of linguistic research and NLP. Corpora, lexicons, annotated datasets, benchmarks, and models are produced at an unprecedented pace. Yet their long-term stewardship, interoperability, and reuse remain inconsistent and often fragile. Rapid creation has outpaced sustainable design.

This workshop brings together researchers, infrastructure providers, data stewards, and policy actors who are committed to building durable language resource ecosystems. It addresses the pressing challenges of sustaining datasets used in linguistic research and in the development of NLP systems, from documentation and versioning to governance, licensing, and infrastructure support.

The workshop will explore how FAIR principles (Findable, Accessible, Interoperable, Reusable) can be meaningfully operationalised for language resources in NLP and computational linguistics.

Call for Papers (CFP)

We invite short papers (4–6 pages) presenting original work, position papers, case studies, tools, infrastructure approaches, and critical reflections related to FAIR and sustainable language resources.

Submissions should use the CLIB template, in its final-submission version. Submissions to the workshop are not anonymous.

Topics of Interest

1. Technical Foundations

Designing language resources so they are interoperable, transparent, and structurally reusable.

  • Domain-specific FAIR implementation strategies for corpora, lexicons, datasets, and models
  • Metadata, paradata, and annotation transparency frameworks
  • Repository architectures and infrastructure design for linguistic data

2. Lifecycle & Reuse

Ensuring language resources remain usable, traceable, and measurable across research cycles.

  • From raw data to FAIR-ready assets: preprocessing, cleaning, and quality assurance workflows
  • Documentation, versioning, and provenance tracking for evolving resources
  • Persistent identifiers and citation mechanisms for language datasets
  • Methods for tracking, measuring, and evidencing reuse
  • Critical reflections and lessons learned from implementation challenges
  • Replicability of experiments conducted on language resources


3. Policy & Sustainability

Creating the institutional and legal conditions that allow language resources to endure.

  • Legal, ethical, and licensing considerations in sharing and reusing language data
  • Governance structures and sustainability models beyond project funding

Submission Guidelines

Length: 4–6 pages (excluding references)

All submissions will undergo peer review by the Programme Committee. Each paper will be reviewed by at least two reviewers.

Accepted papers will be presented at the workshop and included in the workshop proceedings (details to follow).

Important Dates

  • Submission deadline: 22 April 2026
  • Notification of acceptance: 22 May 2026
  • Camera-ready deadline: To be confirmed
  • Workshop date: 7 September 2026

Workshop Format

The workshop will include invited keynote talks, peer-reviewed short paper presentations, an interactive FAIR & Stewardship Assessment Exercise, and a moderated panel discussion on sustainability and next steps.

We anticipate 25–40 participants and aim for a focused, engaged, and discussion-rich event.

Expected Outcomes

  • A community-developed checklist for FAIR and sustainable language resources
  • A summary report with practical recommendations
  • Exploration of a follow-up working group on FAIR language infrastructures

Chairs

Dr Milena Dobreva (University of Strathclyde, IMI-BAS)

Dr Ivan Lambov (IMI-BAS)

