𐰋 COLT

Roadmap

Current status

The data model and interface are implemented in a working prototype built around the first line of the south face of the Kül Tigin inscription (KT-S 1). All four strata and the full token-level apparatus described in the methodology are in place for this initial sample.

KT-S 1 · 27 tokens · 33 lemmas · 19 morphemes · 36 morphological analyses · 64 segments. The prototype is live and includes search and lemmatised query on this dataset.

Five-year framework

Year 1
Encoding of the full Kül Tigin inscription. Core digital infrastructure.
Years 2–3
Remaining Old Turkic inscriptions of Mongolia (Bilgä Qaγan, Tonyukuk, smaller monuments of the Second Türk Khaganate).
Year 4
Yenisey inscriptions.
Year 5
Extension to Old Uyghur texts.

The pace is deliberately measured: the value of a corpus of this kind lies in the depth and reliability of each entry, not in volume. Each phase is designed to produce a citable, versioned release of the dataset alongside the live web interface.

Infrastructure

The production corpus is planned to be hosted on Huma-Num, the French national infrastructure for digital humanities, with PostgreSQL as the data store and Django as the application framework. The interface will be built with Django templates first, with React introduced later only for components that need richer interactivity. This site, by contrast, is a static project site whose only role is to introduce the work; the corpus itself will live at its own address.