Russian National Corpus

Национальный Корпус Русского Языка

This website contains a corpus of the modern Russian language incorporating over 300 million words. The Corpus is a reference system based on a collection of Russian texts in electronic form.

The Corpus is intended for all who are interested in the Russian language and various associated fields: professional linguists, language teachers, school and university students, and foreigners learning the language.

The Russian National Corpus includes the following subcorpora:

The Deeply Annotated corpus, containing sentences with full morphological and syntactic markup.
The Parallel Corpora (English, German, Ukrainian, Belorussian, and multilingual).
The Dialectal corpus, which includes recordings of dialectal speech from various regions of Russia.
The Poetry corpus, which facilitates searches not only by lexical and grammatical features but also by specifically poetic features, such as meter and rhyme type.
The Educational corpus, a corpus of texts with disambiguated grammatical homonyms, adapted for the Russian school curriculum.
The Corpus of Spoken Russian, which includes recordings of public and spontaneous spoken Russian and transcripts of Russian movies (1930-2007).