Methodology

The four strata

Every line of an inscription is held in four parallel strata, each preserved as a distinct field rather than collapsed into a single transcription:

Runic. The original runiform text, character by character, in Unicode. Characters are not silently normalised; allographs are preserved.
Transliteration. A one-to-one rendering of the runiform into Latin script, retaining script-level information (vowel-harmony class, allographic distinctions) that a phonological transcription would lose.
Transcription. A phonological reading reflecting the reconstructed pronunciation. Where editors disagree on the vocalism, both readings are recorded as alternative transcriptions, attributed to their analyst.
Translation. A modern-language rendering (currently English and Turkish, with French planned). The translation is conservative: it does not paper over editorial uncertainty in the source.
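
As a rough illustration of what "four parallel fields" can mean in practice, the record below keeps each stratum separate and allows several attributed transcriptions. It is a minimal sketch, not the project's schema: the class and field names are assumptions, a single word from KT-S 1 stands in for a full line, and the analyst of the first reading is a placeholder because it is not named here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Transcription:
    text: str
    analyst: str  # every alternative reading carries its analyst


@dataclass
class Line:
    runic: str                           # Unicode runiform, not normalised
    transliteration: str                 # one-to-one Latin rendering
    transcriptions: List[Transcription]  # one or more phonological readings
    translations: Dict[str, str] = field(default_factory=dict)  # language code -> text


# A one-word fragment of KT-S 1, used here in place of a whole line.
fragment = Line(
    runic="𐰆𐰞𐰺𐱃𐰢",
    transliteration="WL¹R¹T¹M",
    transcriptions=[
        Transcription("olurtum", analyst="editor A (placeholder)"),
        Transcription("olortum", analyst="Ölmez"),
    ],
    translations={"en": "I have ascended the throne"},
)

print([t.text for t in fragment.transcriptions])
```

Keeping the transcriptions in a list rather than a single string is what lets a disagreement over vocalism be stored without choosing a winner.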

Transliteration conventions

The transliteration follows an extended version of the conventions in Talat Tekin’s A Grammar of Orkhon Turkic (1968). Each consonantal grapheme is annotated with a superscript marking its vowel-harmony class (¹ for back, ² for front), and a small set of additional symbols disambiguates graphemes that would otherwise collapse in a Latin rendering. The full key will accompany the first public release.
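
The superscript convention can be sketched as a simple sign table. The table below is deliberately partial: its entries are read off two words that appear in the sample line on this page (𐰆𐰞𐰺𐱃𐰢 = WL¹R¹T¹M and 𐰋𐰃𐰠𐰏𐰀 = B²IL²G²A), and the function and variable names are illustrative assumptions rather than part of the published key.

```python
# Partial sign table: (Latin base, harmony class or None).
# Entries are taken only from two words of the KT-S 1 sample on this page.
SIGNS = {
    "𐰆": ("W", None),      # vowel sign, no harmony class
    "𐰞": ("L", "back"),
    "𐰺": ("R", "back"),
    "𐱃": ("T", "back"),
    "𐰢": ("M", None),      # written without a superscript in the sample
    "𐰋": ("B", "front"),
    "𐰃": ("I", None),      # vowel sign
    "𐰠": ("L", "front"),
    "𐰏": ("G", "front"),
    "𐰀": ("A", None),      # vowel sign
}

SUPERSCRIPT = {"back": "¹", "front": "²"}


def transliterate(runic: str) -> str:
    """Render runiform text sign by sign, without normalising anything."""
    out = []
    for sign in runic:
        if sign not in SIGNS:
            raise KeyError(f"sign not in the partial table: {sign!r}")
        base, harmony = SIGNS[sign]
        out.append(base + SUPERSCRIPT.get(harmony, ""))
    return "".join(out)


print(transliterate("𐰆𐰞𐰺𐱃𐰢"))  # WL¹R¹T¹M
print(transliterate("𐰋𐰃𐰠𐰏𐰀"))  # B²IL²G²A
```

Raising on an unknown sign, rather than passing it through, matches the stated aim that nothing is silently normalised.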

Where the conventions of Mehmet Ölmez or Annemarie von Gabain diverge from Tekin’s, the divergence is recorded at the analysis level rather than imposed on the transliteration. The transliteration stratum is meant to be theory-light; the analytical apparatus is where editorial positions live.

Tokens, lemmas, morphemes

Each line is segmented into tokens, with their position in the line preserved. Each token is linked to one or more analyses; an analysis decomposes the token into segments, each pointing either to a lemma (a dictionary head) or to a morpheme (a grammatical element). Morphemes carry standard glossing labels (PST, 1SG, LOC, CAUS, …) so that the corpus can be queried at the level of grammatical category, not only of surface form.

Where a token admits more than one reading (a synchronic interpretation as a frozen lexeme alongside a diachronic decomposition into root and suffixes, or two competing vocalisations from different editors), both analyses are recorded in parallel, each attributed to its analyst with a confidence rating and editorial note.
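
The token, analysis, and segment relationship described above might be modelled along the following lines. This is a sketch under stated assumptions: the class names, the `kind` values, and the optional `confidence` field are invented for illustration; the position 10 assumes olurtum is the tenth word of the transcription of KT-S 1; and the first analyst is a placeholder. Ölmez's alternative is recorded as a vocalisation only, since no segmentation of his reading is given here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Segment:
    form: str  # shape of the segment within the token
    kind: str  # "lemma" or "morpheme"
    ref: str   # dictionary head, or a glossing label such as "PST"


@dataclass(frozen=True)
class Analysis:
    analyst: str
    reading: str
    segments: Tuple[Segment, ...] = ()
    confidence: Optional[str] = None  # rating scale is not specified here
    note: str = ""


@dataclass
class Token:
    position: int  # position within the line
    surface: str
    analyses: List[Analysis] = field(default_factory=list)

    def gloss_labels(self) -> Set[str]:
        """Collect the grammatical labels used by any analysis of this token."""
        return {
            seg.ref
            for analysis in self.analyses
            for seg in analysis.segments
            if seg.kind == "morpheme"
        }


olurtum = Token(
    position=10,
    surface="𐰆𐰞𐰺𐱃𐰢",
    analyses=[
        Analysis(
            analyst="editor A (placeholder)",
            reading="olurtum",
            segments=(
                Segment("olur", "lemma", "olur-"),
                Segment("t", "morpheme", "PST"),
                Segment("um", "morpheme", "1SG"),
            ),
        ),
        Analysis(
            analyst="Ölmez",
            reading="olortum",
            note="/o/ in the second syllable (Ölmez, 2021)",
        ),
    ],
)

print(sorted(olurtum.gloss_labels()))  # ['1SG', 'PST']
```

Because analyses are a list on the token, competing readings sit side by side and each keeps its own analyst and note.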

A line in four strata

The first line of the south face of the Kül Tigin inscription (KT-S 1), shown here in all four strata, with the translation given in both English and Turkish:

Runic

𐱅𐰭𐰼𐰃𐱅𐰏:𐱅𐰭𐰼𐰃𐰓𐰀:𐰉𐰆𐰞𐰢𐱁:𐱅𐰇𐰼𐰜:𐰋𐰃𐰠𐰏𐰀:𐰴𐰍𐰣:𐰉𐰆𐰇𐰓𐰛𐰀:𐰆𐰞𐰺𐱃𐰢:𐰽𐰉𐰢𐰣:𐱅𐰇𐰛𐱅𐰃:𐰾𐰃𐰓𐰏𐰠:𐰆𐰞𐰖𐰆:𐰃𐰤𐰘𐰏𐰇𐰤𐰢:𐰆𐰍𐰞𐰣𐰢:𐰋𐰃𐰼𐰛𐰃:𐰆𐰍𐱁𐰢:𐰉𐰆𐰑𐰣𐰢:𐰋𐰃𐰼𐰘𐰀:𐱁𐰑𐰯𐰃𐱃𐰋𐰏𐰠𐰼:𐰘𐰃𐰺𐰖𐰀:𐱃𐰺𐰴𐱃:𐰉𐰆𐰖𐰺𐰸:𐰋𐰏𐰠𐰼:𐰆𐱃𐰕 ......

Transliteration

T²NGR²IT²G² : T²NGR²ID²A : B¹WL¹MŞ : T²ẄR²ẅK : B²IL²G²A : K¹G¹N¹ : B¹WẄD²K²A : WL¹R¹T¹M : S¹B¹MN¹ : T²ẄK²T²I : S¹ID²G²L² : WL¹Y¹W : IN²Y²G²ẄN²M : WG¹L¹N¹M : B²IR²K²I : WG¹ŞM : B¹WD¹N¹M : B²IR²Y²A : ŞD¹PIT¹B²G²L²R² : Y¹R¹Y¹A : T¹R¹K¹T¹ : B¹WY¹R¹wK : B²G²L²R² : WT¹Z ......

Transcription

teŋri teg teŋride bolmış türük bilge qaγan bu ödke olurtum sabımın tüketi eşidgil ulayu iniygünüm oγlanım birki uγuşum bodunum biriye şadapıt begler yırıya tarkat buyruk begler otuz……

Translation · EN

I, the God-like, Heaven-bred Turkish Wise [Bilge] Kagan, I have ascended the throne in this era. Hear from the beginning to the end of my words. First and foremost, O my younger brothers (and) my children, my united tribes and my people, the Shads in the south, the Tarkans in the north, the commander lords, the Thirty……

Translation · TR

Tanrı gibi (ve) Tanrı'dan olmuş Türk Bilge Hakan bu devirde (tahta) oturdum. Sözlerimi baştan sona işitin. Önce (siz) ey küçük erkek kardeşlerim (ve) çocuklarım, birleşik boyum (ve) halkım, güneyde Şadlar, kuzeyde tarkanlar, kumandan beyler, Otuz……

Token analysis

When a token is selected in the corpus (here olurtum from KT-S 1), a full entry is shown: the three formal layers (runic, transliteration, transcription), the morphemic segmentation, alternative editorial readings, and an etymological note.

𐰆𐰞𐰺𐱃𐰢

WL¹R¹T¹M

olurtum "I have ascended the throne"

< olur-t-(u)m

olur-: "to sit; to ascend the throne", verb stem
-D: past tense suffix
-(X)m: 1sg personal ending, possessive-derived

Note. Mehmet Ölmez reads the word as olortum, with /o/ in the second syllable (Ölmez, 2021).

Etymology. The phonetics of this word are complex. The causative form olγurt- suggests that the word’s original form was *olγur-, but no trace of such a word is attested. In the runic inscriptions it occurs very frequently in three distinct senses: (1) to sit down (for a rest), (2) to take (one’s) seat on the throne, (3) to settle down, take up residence. Its specialised use for rulers appears to be peculiar to the early period. (Clauson, 1972)

Lemmatised search

Because every token is linked to its lemma(s) and morpheme(s), the corpus supports search at the lemma level as well as the surface level. A click on the lemma olur- or on the morpheme -(X)m returns the list of every place that lemma or morpheme appears across the corpus, with a direct link from each occurrence to the sentence in which it stands.
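
One common way to support this kind of lookup is an inverted index from lemma or morpheme to occurrences. The sketch below is not a description of how the corpus is implemented; `Occurrence` and `build_index` are hypothetical names, only the first record reflects this page (assuming token 10 of KT-S 1 is olurtum), and the two `XX-N` records are invented placeholders that exist only to make the result lists non-trivial.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Occurrence(NamedTuple):
    citation: str               # stable identifier of the token
    surface: str                # transcribed surface form
    lemmas: Tuple[str, ...]     # lemma heads linked to the token
    morphemes: Tuple[str, ...]  # morpheme entries linked to the token


OCCURRENCES = [
    # From this page: olurtum = olur- + -D + -(X)m.
    Occurrence("COLT/KT-S 1, token 10", "olurtum", ("olur-",), ("-D", "-(X)m")),
    # Invented placeholders, not real corpus data.
    Occurrence("COLT/XX-N 2, token 3", "placeholder-1", ("olur-",), ("-D",)),
    Occurrence("COLT/XX-N 5, token 7", "placeholder-2", ("other-",), ("-(X)m",)),
]


def build_index(occurrences: Iterable[Occurrence]) -> Dict[str, List[str]]:
    """Map every lemma and morpheme key to the citations where it occurs."""
    index: Dict[str, List[str]] = defaultdict(list)
    for occ in occurrences:
        for key in occ.lemmas + occ.morphemes:
            index[key].append(occ.citation)
    return dict(index)


INDEX = build_index(OCCURRENCES)
print(INDEX["olur-"])   # both tokens built on the lemma olur-
print(INDEX["-(X)m"])   # both tokens carrying the morpheme -(X)m
```

Each returned citation is the stable identifier of a token, so a result list can link straight back to the sentence in which the form stands.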

Editorial attribution

No analysis is anonymous. Every reading is attributed to the editor who proposed it (Tekin, Ölmez, von Gabain, and others as the corpus grows), and editorial notes are preserved alongside the segmentation. Where COLT departs from a published reading, the departure is recorded as an explicit decision, not silently absorbed.

Citation

Each inscription, face, line, sentence, token, and analysis carries a stable identifier. Citation conventions follow the pattern COLT/KT-S 1, token 10, with persistent links from each entity in the interface. A versioned, citable release of the underlying data will be deposited on Zenodo with a DOI once the encoding of the Kül Tigin inscription is complete.
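
For readers who want to handle such citations programmatically, the pattern can be parsed and regenerated as sketched below. The grammar is inferred from the single example COLT/KT-S 1, token 10 (inscription siglum, face, line number, optional token number); the regular expression, the `Citation` class, and `parse_citation` are assumptions for illustration, and identifiers for sentences or analyses are not covered because their form is not shown here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Inferred from the one example on this page; the real grammar may be richer.
CITATION_RE = re.compile(
    r"^COLT/(?P<inscription>[A-Z]+)-(?P<face>[A-Z]+) (?P<line>\d+)"
    r"(?:, token (?P<token>\d+))?$"
)


@dataclass(frozen=True)
class Citation:
    inscription: str  # e.g. "KT" for Kül Tigin
    face: str         # e.g. "S" for the south face
    line: int
    token: Optional[int] = None

    def __str__(self) -> str:
        base = f"COLT/{self.inscription}-{self.face} {self.line}"
        return base if self.token is None else f"{base}, token {self.token}"


def parse_citation(text: str) -> Citation:
    match = CITATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognised citation: {text!r}")
    token = match.group("token")
    return Citation(
        inscription=match.group("inscription"),
        face=match.group("face"),
        line=int(match.group("line")),
        token=int(token) if token is not None else None,
    )


cite = parse_citation("COLT/KT-S 1, token 10")
print(cite)                           # COLT/KT-S 1, token 10
print(parse_citation("COLT/KT-S 1"))  # COLT/KT-S 1
```

Round-tripping a citation through the parser and back to a string is a cheap check that an identifier has not been altered in transit.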