
Almaty Corpus of Kazakh

September 14, 2016

Алматы қазақ тілі корпусы


This is the first version of the Kazakh National Corpus (KNC). The KNC is a tool built on a large collection of annotated texts in literary Kazakh, the official language of the Republic of Kazakhstan. The corpus will be updated regularly to improve both its quality and its size.

The KNC is designed to be:

  • a linguistically representative corpus;
  • a powerful search engine which allows for complex lexical and morphological queries;
  • a convenient tool for studying the Kazakh language, in which most words are accompanied by a morphological analysis and English and Russian translation equivalents;
  • a diachronically oriented corpus that covers different periods in the history of modern Kazakh;
  • a diversified corpus which includes written and oral texts of various genres;
  • an annotated corpus with grammatical and metatext markup;
  • an open-access corpus;
  • an online library with access to more than 100 works of classical Kazakh literature.