RLC

Russian Learner Corpus


What is RLC?

The Russian Learner Corpus (RLC) is a collection of texts produced by two categories of non-standard speakers of Russian: learners of Russian as a foreign language and speakers of Heritage Russian with different dominant languages. The corpus contains both oral and written production. It supports search by morphological properties and by a variety of deviations from Standard Russian, ranging from errors in orthography and grammar to non-standard use of lexical and syntactic constructions.
The preliminary linguistic analysis and tagging are carried out by members of the Learner Russian Research Group under Ekaterina Rakhilina (Higher School of Economics).

Most of the texts were contributed by teachers of Russian as a second language and/or Heritage language in different countries. RLC comprises both academic and non-academic texts, such as movie and picture descriptions, book summaries, expository essays and others (see HELP).
Part of RLC is RULEC, a longitudinal subcorpus of academic writing produced by Heritage and L2 speakers of Russian, collected by Olessya Kisselev and Anna Alsufieva of Portland State University over a period of four years.
Data on Heritage Russian oral production include the results of experimental studies: frog stories (based on the methodology described in Berman & Slobin 1994; Slobin 2004) and narratives based on a short cartoon (“Nu pogodi!”) (see Isurin & Ivanova-Sullivan 2008 and Polinsky 2008 for more details). See also "Partners".


Metadata

Each text in the Corpus is assigned background information.

Mandatory fields

  • Oral / Written
  • Author’s language background (Heritage / L2)
  • Author’s dominant language
  • Author’s proficiency in Russian

Optional Fields

  • Author’s gender
  • Date
  • Genre

A more elaborate system of text marking is used in RULEC.
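
The mandatory and optional fields listed above could be modelled as a simple record. The following is only a minimal sketch: the class name `TextMetadata`, the field names, and the allowed values are assumptions made for illustration and do not reflect the actual RLC storage format or labels.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values are assumptions chosen for this sketch, not the corpus's own labels.
MODES = {"oral", "written"}
BACKGROUNDS = {"heritage", "L2"}


@dataclass
class TextMetadata:
    """Hypothetical record for the background information attached to one text."""

    # Mandatory fields
    mode: str                # oral / written
    background: str          # Heritage / L2
    dominant_language: str   # e.g. "German"
    proficiency: str         # free-form label; the corpus's own scale is not stated here

    # Optional fields
    gender: Optional[str] = None
    date: Optional[str] = None
    genre: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the two-way distinctions named in the field list.
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {sorted(MODES)}")
        if self.background not in BACKGROUNDS:
            raise ValueError(f"background must be one of {sorted(BACKGROUNDS)}")


# Invented example values, for illustration only.
record = TextMetadata(
    mode="written",
    background="heritage",
    dominant_language="German",
    proficiency="advanced",
    genre="expository essay",
)
print(record)
```

Optional fields default to `None`, so a text can be described with the four mandatory fields alone.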


Partners

Maria Polinsky (Harvard University)
Olessya Kisselev (Penn State University)
Anna Alsufieva
Evgeny Dengub (Middlebury Language Schools)
Irina Dubinina (Brandeis University)
Anna Mikhaylova (University of Oregon)
Alla Smyslova (Columbia University)
Ekaterina Protassova (University of Helsinki)
Anna Pavlova (University of Mainz)
Anna Möhl (Johannes Gutenberg University of Zurich)
Anka Bergmann (Humboldt University of Berlin)
Irina Kor Chahine (Aix-Marseille University)
Suhyoun Lee (Seoul National University)
Svetlana Slavkova (Bologna University)
Francesca Biagini (Bologna University)
Monica Perotto (Bologna University)
Svetlana Sokolova (University of Tromsø)
Natalia Ringblom (University of Stockholm)
Hayashida Rie (Osaka University)
Tsuneto Shogo (Osaka University)
Margarita Kazakevich (Osaka University)
Nazija Zhanpeisova (Aktubinsk University)
Alexander Krasovitsky (University of Oxford)
Rashida Kasymova (Al-Farabi Kazakh National University)
Aimgyl Kazkenova (Abai Kazakh National Pedagogical University)

Languages

Currently, RLC contains production by L2 and Heritage speakers who have as their dominant language:

American English
German (including Swiss German)
French
Italian
Serbian
Japanese
Korean
Kazakh
Finnish

Search

RLC enables both lexico-grammatical search and exact search. A user can specify morphological and grammatical features of a word, as well as search by types of deviations from Standard Russian (errors). See HELP for detailed information.


Search results

Alongside the original sentence, the user is shown its two-level correction: the first level shows formal corrections (orthography, case forms, gender / number agreement, tense and aspect), and the second level shows corrections of lexical and constructional violations.
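
The two-level presentation described above can be illustrated with a small sketch. The class `SearchHit`, its field names, and the function `levels_applied` are hypothetical, and the example sentence is invented for illustration; it is not taken from the corpus and does not reproduce the actual RLC output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchHit:
    """Hypothetical container for one search result and its two correction levels."""

    original: str   # sentence as produced by the speaker
    formal: str     # level 1: orthography, case forms, agreement, tense and aspect
    lexical: str    # level 2: lexical and constructional corrections on top of level 1


def levels_applied(hit: SearchHit) -> List[str]:
    """Return the names of the correction levels that changed the sentence."""
    levels = []
    if hit.formal != hit.original:
        levels.append("formal")
    if hit.lexical != hit.formal:
        levels.append("lexical")
    return levels


# Invented example: a case error ("в школа") plus a calque of "took a shower".
hit = SearchHit(
    original="Он взял душ и пошёл в школа.",
    formal="Он взял душ и пошёл в школу.",
    lexical="Он принял душ и пошёл в школу.",
)

print(hit.original)
print(hit.formal)
print(hit.lexical)
print(levels_applied(hit))
```

Keeping the two levels as separate strings makes it possible to see which kind of correction a sentence needed, formal, lexical, or both.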


Using RLC

Comprising texts from two different groups of non-standard speakers of Russian, RLC is a valuable source for studies in Second Language Acquisition, second language teaching, language interference and theoretical linguistics.

The corpus data and the flexible search system provide a sound basis for comparative research on Heritage and L2 production and enable deeper insight into complicated phenomena, such as non-standard use of Russian aspect, cases and prepositional phrases, as well as lexical and semantic misuse in multi-word constructions.

Besides revealing a great deal about non-standard Russian, RLC is a powerful tool for uncovering new facets of Standard Russian grammar: deviations in language use help expose subtle rules that have previously gone unnoticed.


Our team

The corpus was created by the Linguistic Laboratory of Corpus Technologies of the National Research University Higher School of Economics:

Chief: Ekaterina Rakhilina

Tagging and Research: Anastasia Vyrenkova
Alina Ladygina
Olga Eremina
Ekaterina Shnittke
Ekaterina Vlasova
Olga Kultepina
Ivan Smirnov
Evgenia Smolovskaya
Evgenia Mescheryakova
Svetlana Puzhaeva
Zosya Ivanova
Kirill Aksenov
Maria Grabovskaya
Daria Loshkaryova
Students of the School of Linguistics (HSE)

Development:
Elmira Mustakimova
Ekaterina Uetova

If you have any questions concerning the error classification, the state of the project or partnership, or if you encounter any problems with the corpus's functionality, please contact the chiefs of the corpus and the developer at small.corpora@gmail.com.


Publications

2016
Polinsky M., Rakhilina E., Vyrenkova A. Linguistic creativity in heritage speakers // Glossa. 2016. Vol. 43. P. 1-29.
Рахилина Е.В., Ладыгина А.А. Русские конструкции со значением чередования ситуаций (in press)

2015
Рахилина Е. В., Выренкова А. С. Корпусные исследования особенностей речи нестандартных говорящих ("херитажный русский") // Acta Linguistica Petropolitana. Труды института лингвистических исследований. 2015. Т. XI. № 1. С. 621-639.
Ладыгина А.А. Изменения в предложных конструкциях в эритажном русском (Russian Heritage language) // IV конференция «Русский язык: конструкционные и лексико-семантические подходы», Санкт-Петербург, 16-18 апреля 2015 г.
K. Rakhilina, O. Kisselev, E. Smolovskaya, E. Mescheryakova. Talk: Russian in the English mirror: (non)grammatical constructions in learner Russian. Corpus Linguistics 2015 (Lancaster).
Е. Смоловская. Доклад: Ошибки нестандартных говорящих: некоторые особенности русской речи иностранцев с доминирующим английским. XIII КОНГРЕСС МАПРЯЛ «Русский язык и литература в пространстве мировой культуры» (Гранада)

2014
Полинская М., Рахилина Е. В., Выренкова А. С. Грамматика ошибок и грамматика конструкций: «эритажный» («унаследованный») русский язык // Вопросы языкознания. 2014. № 3. С. 3-19.
Rakhilina E. V., Vyrenkova A. S. Language Interference in Heritage Russian: Constructional Violations / Working papers by NRU HSE. Series WP BRP "Linguistics". 2014. No. 11.
Ладыгина А.А. Русские эритажные конструкции: корпусное исследование. Дипломная работа. Москва, МГУ
Ладыгина А.А. Семантика конструкций «Х обладает Y», «X владеет Y» (корпусное исследование) // Постерный доклад на I Международной научно-практической конференции «Корпусные технологии и компьютерные методы в современной гуманитарной науке», НИУ-ВШЭ, Нижний Новгород, 11-12 апреля 2014

2013
Ладыгина А.А. Корпус Russisch in Deutschland: состав и особенности разметки // Материалы международной научно-практической конференции "Корпусные технологии. Digital Humanities и современное знание", Нижний Новгород 18-19 октября 2013 г.
Рахилина Е. В., Выренкова А. С. Ошибки в речи херитажных говорящих (на материале текстов русских эмигрантов в США) // В кн.: Проблемы онтолингвистики - 2013 / Рук.: Т. Круглякова; сост.: Т. Круглякова; отв. ред.: Т. Круглякова; под общ. ред.: Т. Круглякова; науч. ред.: Т. Круглякова. СПб. : Российский государственный педагогический университет им. А.И. Герцена, 2013. С. 435-439.
Рахилина Е.В., Ладыгина А.А. То взлёт, то посадка // Тезисы докладов, третья конференция "Русский язык: конструкционные и лексико-семантические подходы", ИЛИ РАН, СПб, 12-14 сентября 2013


Links