The Russian Learner Corpus (RLC) is a collection of texts produced by two categories of non-standard speakers of Russian: learners of Russian as a Foreign language and speakers of Heritage Russian with different dominant languages. The corpus contains both oral and written production and enables search by morphological properties and a variety of deviations from Standard Russian ranging from mistakes in orthography and grammar to non-standard use of lexical and syntactic constructions.
Preliminary linguistic analysis and tagging are carried out by members of the Learner Russian Research Group under Ekaterina Rakhilina (Higher School of Economics).
Most of the texts come from teachers of Russian as a second and/or heritage language in different countries. RLC comprises both academic and non-academic texts, such as movie and picture descriptions, book summaries, expository essays and others (see HELP).
Part of RLC is RULEC – a longitudinal subcorpus of Academic Writing produced by Heritage and L2 speakers of Russian collected by Olessya Kisselev and Anna Alsufieva of Portland State University over a period of 4 years.
Data on Heritage Russian oral production include the results of experimental studies: frog stories (based on the methodology described in Berman & Slobin 1994; Slobin 2004) and narratives based on a short cartoon (“Nu pogodi!”) (see Isurin & Ivanova-Sullivan 2008 and Polinsky 2008 for more details).
Also, see our "Partners"
Each text in the Corpus is assigned background information.
A more elaborate system of text marking is used in RULEC.
Currently, RLC contains production by L2 and Heritage speakers who have as their dominant language:
Abkhaz
RLC enables both lexico-grammatical search and exact search. A user can specify morphological and grammatical features of a word, as well as search by types of deviations from Standard Russian (errors). See HELP for detailed information.
Along with the original sentence, the user is shown its two-level correction: the first level shows formal corrections (orthography, case forms, gender/number agreement, tense and aspect); the second level shows corrections of lexical and constructional violations.
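The two-level correction can be pictured as a small record that holds three versions of a sentence. The following is a minimal sketch only: the class name, field names and helper method are hypothetical and do not reflect RLC's actual data format, and the example sentence is invented for illustration rather than taken from the corpus.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectedSentence:
    """Hypothetical record; names are illustrative, not RLC's schema."""

    original: str  # sentence as produced by the speaker
    formal: str    # level 1: orthography, case, agreement, tense/aspect corrected
    lexical: str   # level 2: lexical and constructional violations also corrected

    def changed_levels(self) -> list[str]:
        """Return the names of the correction levels that altered the text."""
        levels = []
        if self.formal != self.original:
            levels.append("formal")
        if self.lexical != self.formal:
            levels.append("lexical")
        return levels


# Invented example sentence (not corpus data).
example = CorrectedSentence(
    original="Я имею большой собака.",
    formal="Я имею большую собаку.",
    lexical="У меня есть большая собака.",
)
print(example.changed_levels())
```

Keeping the two levels as separate fields makes it possible to tell apart sentences that contain only formal errors from those that also involve a non-standard construction.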
Comprising texts from two different groups of non-standard speakers of Russian, RLC is a valuable source for studies in Second Language Acquisition, second language teaching, language interference and theoretical linguistics.
The corpus data and the flexible search system provide a sound basis for comparative research on Heritage and L2 production and enable deeper insight into complex phenomena, such as non-standard use of Russian aspect, cases and prepositional phrases, as well as lexical and semantic misuse in multi-word constructions.
Apart from revealing a great deal about non-standard Russian, RLC is a powerful tool for exploring new facets of Standard Russian grammar: deviations in language use help uncover subtle rules that have previously gone unnoticed.
The corpus was created by the Linguistic Laboratory of Corpus Technologies of National Research University Higher School of Economics:
Chief: Ekaterina Rakhilina
Tagging and Research: Anastasia Vyrenkova
Anastasia Ivanenko
Alina Ladygina
Olga Eremina
Daniil Fedorov
Ekaterina Shnittke
Ekaterina Vlasova
Olga Kultepina
Olga Vedenina
Ivan Smirnov
Kirill Semenov
Kirill Aksenov
Maria Grabovskaya
Sofia Goldina
Students of School of Linguistics (HSE)
Development:
Elena Sokur
Ekaterina Uetova
Elmira Mustakimova
Timofey Arkhangelskiy
If you have any questions concerning the error classification, the state of the project or partnership, or if you encounter any problems with the corpus's functionality, please contact the chiefs of the corpus and the developer: small.corpora@gmail.com.
2017
Vyrenkova A. S., Rakhilina E. V. Learner corpora supporting lexical typology, in: XVII April International Academic Conference on Economic and Social Development: in 4 vols. / Ed.: E.G. Yasin, Vol. 4. Moscow: HSE Publishing House, 2017. P. 450-460.
2016
Polinsky M., Ekaterina Rakhilina, Anastasia Vyrenkova. Linguistic creativity in heritage speakers // Glossa. 2016. Vol. 43. P. 1-29.
Ekaterina Rakhilina, Anastasia Vyrenkova, Elmira Mustakimova, Alina Ladygina, Ivan Smirnov. Building a learner corpus for Russian // Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016. http://aclweb.org/anthology/W16-65
Zarifyan M., Melnik A. A., Vyrenkova A. S. A case of using a Multilingual Database of synonyms for designing lexical drills / NRU HSE. Series WP BRP "Linguistics". 2016.
Рахилина Е. В. О новых инструментах описания русской грамматики: корпус ошибок // Русский язык за рубежом. 2016. № 3. С. 20-25
Рахилина Е.В., Ладыгина А.А. Русские конструкции со значением чередования ситуаций. Язык: поиски, факты, гипотезы. Лексрус Москва, 2016. С. 320-335
2015
Рахилина Е. В., Выренкова А. С. Корпусные исследования особенностей речи нестандартных говорящих ("херитажный русский") // Acta Linguistica Petropolitana. Труды института лингвистических исследований. 2015. Т. XI. № 1. С. 621-639.
Ладыгина А.А. Изменения в предложных конструкциях в эритажном русском (Russian Heritage language)//IV конференция «Русский язык:конструкционные и лексико-семантические подходы», Санкт-Петербург, 16-18 апреля 2015 г.
K. Rakhilina, O. Kisselev, E. Smolovskaya, E. Mescheryakova. Talk: Russian in the English mirror: (non)grammatical constructions in learner Russian. Corpus Linguistics 2015 (Lancaster).
Е. Смоловская. Доклад: Ошибки нестандартных говорящих: некоторые особенности русской речи иностранцев с доминирующим английским. XIII КОНГРЕСС МАПРЯЛ «Русский язык и литература в пространстве мировой культуры» (Гранада)
2014
Полинская М., Рахилина Е. В., Выренкова А. С. Грамматика ошибок и грамматика конструкций: «эритажный» («унаследованный») русский язык // Вопросы языкознания. 2014. № 3. С. 3-19.
Rakhilina E. V., Vyrenkova A. S. Language Interference in Heritage Russian: Constructional Violations / Working papers by NRU HSE. Series WP BRP "Linguistics". 2014. No. 11.
Ладыгина А.А. Русские эритажные конструкции: корпусное исследование. Дипломная работа. Москва, МГУ
Ладыгина А.А. Семантика конструкций «Х обладает Y», «X владеет Y» (корпусное исследование)//Постерный доклад на I Международной научно-практической конференции «Корпусные технологии и компьютерные методы в современной гуманитарной науке», НИУ-ВШЭ, Нижний Новгород, 11-12 апреля 2014
2013
Ладыгина А.А. Корпус Russisch in Deutschland: состав и особенности разметки// Материалы международной научно-практической конференции "Корпусные технологии. Digital Humanities и современное знание", Нижний Новгород 18-19 октября 2013 г.
Рахилина Е. В., Выренкова А. С. Ошибки в речи херитажных говорящих (на материале текстов русских эмигрантов в США) // В кн.: Проблемы онтолингвистики - 2013 / Рук.: Т. Круглякова; сост.: Т. Круглякова; отв. ред.: Т. Круглякова; под общ. ред.: Т. Круглякова; науч. ред.: Т. Круглякова. СПб. : Российский государственный педагогический университет им. А.И. Герцена, 2013. С. 435-439.
Рахилина Е.В., Ладыгина А.А. То взлёт, то посадка// Тезисы докладов, третья конференция "Русский язык: конструкционные и лексико-семантические подходы", ИЛИ РАН, СПб, 12-14 сентября 2013