Russian Learner Corpus

What is RLC?

The Russian Learner Corpus (RLC) is a collection of texts produced by two categories of non-standard speakers of Russian: learners of Russian as a foreign language and speakers of Heritage Russian with different dominant languages. The corpus contains both oral and written production and supports search by morphological properties and by a variety of deviations from Standard Russian, ranging from errors in orthography and grammar to non-standard use of lexical items and syntactic constructions.
The preliminary linguistic analysis and tagging are carried out by the members of the Learner Russian Research Group led by Ekaterina Rakhilina (Higher School of Economics).

Most texts come from teachers of Russian as a second and/or heritage language in different countries. RLC comprises both academic and non-academic texts, such as movie and picture descriptions, book summaries, expository essays and others (see HELP).
RLC includes RULEC, a longitudinal subcorpus of academic writing produced by Heritage and L2 speakers of Russian, collected by Olessya Kisselev and Anna Alsufieva of Portland State University over a period of four years.
Data on Heritage Russian oral production include the results of experimental studies: frog stories (based on the methodology described in Berman & Slobin 1994; Slobin 2004) and narratives based on a short cartoon (“Nu pogodi!”) (see Isurin & Ivanova-Sullivan 2008 and Polinsky 2008 for more details). See also our “Partners” section.


Each text in the corpus is accompanied by background information (metadata).

Mandatory fields

  • Oral / Written
  • Author’s language background (Heritage / L2)
  • Author’s dominant language
  • Author’s proficiency in Russian

Optional fields

  • Author’s gender
  • Date
  • Genre

A more elaborate system of text annotation is used in RULEC.
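To make the metadata scheme above concrete, here is a minimal sketch of how one text's background information could be modeled and filtered. The class names, field names and proficiency values are hypothetical illustrations, not RLC's internal format; only the split into mandatory and optional fields comes from the list above.

```python
import datetime
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    ORAL = "oral"
    WRITTEN = "written"


class Background(Enum):
    HERITAGE = "heritage"
    L2 = "L2"


@dataclass(frozen=True)
class TextMetadata:
    # Mandatory fields
    mode: Mode
    background: Background
    dominant_language: str
    proficiency: str  # hypothetical free-text level; the actual scale is not specified here
    # Optional fields default to None when the contributor did not supply them
    gender: Optional[str] = None
    date: Optional[datetime.date] = None
    genre: Optional[str] = None


def select(texts: List[TextMetadata], **criteria) -> List[TextMetadata]:
    """Return the texts whose metadata equals every given field value."""
    return [t for t in texts
            if all(getattr(t, name) == value for name, value in criteria.items())]


texts = [
    TextMetadata(Mode.WRITTEN, Background.HERITAGE, "American English", "intermediate",
                 genre="essay"),
    TextMetadata(Mode.ORAL, Background.L2, "German", "advanced"),
]
heritage_written = select(texts, mode=Mode.WRITTEN, background=Background.HERITAGE)
print(len(heritage_written))  # 1
```

Keeping the optional fields as `None` rather than empty strings makes it easy to tell "not supplied" apart from a real value when building subcorpora.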


Partners

Maria Polinsky (Harvard University)
Olessya Kisselev (Penn State University)
Anna Alsufieva
Evgeny Dengub (Middlebury Language Schools)
Irina Dubinina (Brandeis University)
Anna Mikhaylova (University of Oregon)
Alla Smyslova (Columbia University)
Anna Pavlova (University of Mainz)
Anna Möhl (Johannes Gutenberg University of Zurich)
Anka Bergmann (Humboldt University of Berlin)
Irina Kor Chahine (Aix-Marseille University)
Suhyoun Lee (Seoul National University)
Svetlana Slavkova (Bologna University)
Francesca Biagini (Bologna University)
Monica Perotto (Bologna University)
Svetlana Sokolova (University of Tromsø)
Natalia Ringblom (Stockholm University)
Hayashida Rie (Osaka University)
Tsuneto Shogo (Osaka University)
Margarita Kazakevich (Osaka University)
Nazija Zhanpeisova (Aktubinsk University)
Ekaterina Protasova (University of Helsinki)
Alexander Krasovitsky (University of Oxford)
Rashida Kasymova (Al-Farabi Kazakh National University)
Aimgyl Kazkenova (Abai Kazakh National Pedagogical University)


Currently, RLC contains production by L2 and Heritage speakers with the following dominant languages:

  • American English
  • British English
  • German (including Swiss German)


RLC supports both lexico-grammatical search and exact search. A user can specify the morphological and grammatical features of a word, and can also search by type of deviation from Standard Russian (error type). See HELP for detailed information.
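The difference between exact and lexico-grammatical search can be illustrated with a small sketch over tagged tokens. Everything here is hypothetical: the `Token` and `search` names, and the tag labels (`"S"`, `"nom"`, `"Case"`), are placeholders and not RLC's actual tagset or query interface. The learner sentence is constructed for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Token:
    form: str                              # word exactly as the author wrote it
    lemma: str                             # dictionary form
    grammar: FrozenSet[str]                # hypothetical morphological tags
    errors: FrozenSet[str] = frozenset()   # hypothetical error tags


def search(tokens: List[Token], form: Optional[str] = None, lemma: Optional[str] = None,
           grammar: FrozenSet[str] = frozenset(),
           errors: FrozenSet[str] = frozenset()) -> List[int]:
    """Return positions of tokens matching every supplied criterion.

    `form` gives exact search; `lemma`, `grammar` and `errors` give
    lexico-grammatical search (a token must carry all requested tags).
    """
    hits = []
    for i, tok in enumerate(tokens):
        if form is not None and tok.form != form:
            continue
        if lemma is not None and tok.lemma != lemma:
            continue
        if not (grammar <= tok.grammar and errors <= tok.errors):
            continue
        hits.append(i)
    return hits


# Constructed learner sentence "Я читал книга" (Standard Russian: "Я читал книгу")
sentence = [
    Token("Я", "я", frozenset({"SPRO", "nom"})),
    Token("читал", "читать", frozenset({"V", "past"})),
    Token("книга", "книга", frozenset({"S", "nom"}), frozenset({"Case"})),
]
print(search(sentence, grammar=frozenset({"S"}), errors=frozenset({"Case"})))  # [2]
print(search(sentence, lemma="читать"))  # [1]
print(search(sentence, form="книгу"))    # []
```

Treating requested tags as a subset test means a query only has to name the features it cares about, which mirrors how a user can leave most search fields empty.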

Search results

Apart from the original sentence, the user is presented with a two-level correction: the first level shows formal corrections (orthography, case forms, gender/number agreement, tense and aspect), and the second level shows corrections of lexical and constructional violations.
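A minimal sketch of how a search hit with its two correction levels could be represented. The `CorrectedSentence` class is hypothetical, and it assumes that the second level builds on the first (i.e. the lexical correction already includes the formal fixes); the description above does not state this explicitly. Both example sentences are constructed: in the first, the case form is fixed at level 1 and the calqued collocation "сделал решение" is replaced by "принял решение" at level 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CorrectedSentence:
    original: str   # sentence as produced by the author
    formal: str     # level 1: orthography, case forms, agreement, tense and aspect
    lexical: str    # level 2: lexical and constructional corrections (assumed to build on level 1)

    def changed_levels(self) -> List[str]:
        """Name each correction level that altered the text of the level before it."""
        changed = []
        if self.formal != self.original:
            changed.append("formal")
        if self.lexical != self.formal:
            changed.append("lexical")
        return changed


s1 = CorrectedSentence("Он сделал решении.", "Он сделал решение.", "Он принял решение.")
s2 = CorrectedSentence("Я читал книга.", "Я читал книгу.", "Я читал книгу.")
print(s1.changed_levels())  # ['formal', 'lexical']
print(s2.changed_levels())  # ['formal']
```

Storing each level as a full sentence keeps display simple; comparing adjacent levels then shows whether a sentence contains only formal errors or also lexical and constructional ones.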

Using RLC

Comprising texts from two different groups of non-standard speakers of Russian, RLC is a valuable source for studies in Second Language Acquisition, Second Language Teaching, language interference and theoretical linguistics.

The corpus data and its flexible search system provide a sound basis for comparative research on Heritage and L2 production and enable deeper insight into complex phenomena, such as non-standard use of Russian aspect, cases and prepositional phrases, as well as lexical and semantic misuse in multi-word constructions.

Beyond what it reveals about non-standard Russian, RLC is a powerful tool for uncovering new facets of Standard Russian grammar: deviations in language use help expose subtle rules that have previously received no attention.

Our team

The corpus was created by the Linguistic Laboratory of Corpus Technologies of the National Research University Higher School of Economics:

Chief: Ekaterina Rakhilina

Tagging and Research: Anastasia Vyrenkova
Alina Ladygina
Olga Eremina
Ekaterina Shnittke
Ekaterina Vlasova
Olga Kultepina
Ivan Smirnov
Evgenia Smolovskaya
Evgenia Mescheryakova
Svetlana Puzhaeva
Zosya Ivanova
Kirill Aksenov
Maria Grabovskaya
Daria Loshkaryova
Students of the School of Linguistics (HSE)

Elmira Mustakimova
Ekaterina Uetova

If you have any questions about the error classification, the state of the project or partnership, or if you encounter any problems with the corpus functionality, please contact the corpus leads and the developer at small.corpora@gmail.com.


