𐰋 COLT

About

The project

COLT (Annotated Corpus of Old Turkic) is an open-access, lemmatised, multi-layered digital corpus of Old Turkic runiform inscriptions. It is the work of Onur Bülbül, a turcologist at the University of Strasbourg, and is in active development.

COLT is modelled on the long-established tradition of the great philological corpora of classical studies (the Thesaurus Linguae Graecae, the Perseus Digital Library, the Thesaurus Linguae Latinae). It adapts the same ambition to Turkic philology: a single, exhaustive, citable reference corpus of the Old Turkic written tradition, beginning with the runiform inscriptions of the Second Türk Khaganate.

Where COLT fits

The most extensive digital corpus for Old Turkic to date is VATEC (Vorislamische Alttürkische Texte: Elektronisches Corpus) at Goethe-Universität Frankfurt. VATEC gathers the pre-Islamic Old Uyghur texts and provides their transliteration and transcription, synchronic morphological analysis of each word, and English translation. It does not, however, include the texts of the runiform-inscription period.

The inscription period is covered by Turkic Runiform Inscriptions, hosted by Uppsala University with support from the Johannes Gutenberg University of Mainz. The site, currently in beta, contains twenty-three short inscriptions, each with transliteration, transcription, English translation, and physical metadata. It does not provide synchronic or diachronic morphological analysis of the words.

A further project worth noting is The cataloguing and documentation of the runic inscriptions of the Altai Republic, a joint undertaking of Gorno-Altaisk State University and the Johann Wolfgang Goethe University in Frankfurt, funded by the German Research Foundation (DFG) and the Russian Foundation for Basic Research (RFBR). The corpus is regionally bounded — limited to the Altai Mountains — and gathers ninety inscriptions together with their physical and geographical metadata, without offering a morphological apparatus at the token level.

The most comprehensive runiform-inscription project to date is Türık Bitig, launched under the Kazakh government's Cultural Heritage programme. Its database holds some three hundred entries — inscriptions and manuscripts from Orkhon, Yenisei, Talas, Turfan, the Altai, Kazakhstan, and Fergana — each accompanied by photographs, transcription, and translation, though again without lemmatisation or parallel editorial readings.

COLT adds a different model alongside these four: full lemmatisation, token-level morphological analysis, parallel storage of alternative readings (for example, those of Tekin and Ölmez where their proposals diverge), and an open digital infrastructure that covers the inscription period now and will extend to Old Uyghur texts in due course. The aim is not to replace existing tools but to complement them.

Author

Onur Bülbül

PhD candidate in Turkish Studies at the University of Strasbourg (defence: September 2026); associate researcher at the GEO research group (Groupe d’études orientales, slaves et néo-helléniques). BA in Turkish Language and Literature (Istanbul University); MA in Turkish Studies, sociolinguistics (Strasbourg). Six years of teaching at undergraduate and graduate level in the Department of Turkish Studies at Strasbourg.