Bashkir Poetic Corpus

This website hosts a Poetic Corpus of the Bashkir language containing over 1.8 million tokens (about 500,000 lines of verse, more than 15,000 poems, 101 poets). It is the first corpus of the Bashkir language and the second poetic corpus in the world (after the Russian one). A distinctive feature of this corpus is its text collection, which comprises verse by Bashkir poets of the 20th and early 21st centuries.

Texts in the corpus are annotated with morphological tags, each token carrying its own set of tags, and with special metrical and prosodic tags that make it possible to search for lines in a specific metre, within rhyming parts, and so on. Texts are displayed with word-by-word translation into Russian, which makes the system useful not only to speakers of Bashkir but also to researchers in the humanities, specialists in metrics and prosody, and linguistic typologists.
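As an illustration only, the annotation scheme described above can be pictured as lines of verse that carry a metre label and consist of tokens, each with a set of morphological tags. The sketch below is a hypothetical model, not the Corpus's actual data format or search engine: the class names, the `find_lines` function, the tag names and the metre labels are all assumptions made for the example, and the word forms are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    form: str                 # word form as it appears in the verse line
    tags: frozenset[str]      # set of morphological tags (names are hypothetical)
    gloss_ru: str = ""        # Russian translation of the word, if available


@dataclass
class Line:
    tokens: list[Token]
    metre: str                # metrical tag of the line (label is hypothetical)


def find_lines(lines, metre=None, required_tags=frozenset()):
    """Return lines that match the metre and contain a token with all required tags."""
    hits = []
    for line in lines:
        if metre is not None and line.metre != metre:
            continue
        if required_tags and not any(required_tags <= t.tags for t in line.tokens):
            continue
        hits.append(line)
    return hits


# Placeholder data: "w1" and "w2" stand in for real word forms.
sample = [
    Line([Token("w1", frozenset({"N", "PL"}))], metre="metre-A"),
    Line([Token("w2", frozenset({"V"}))], metre="metre-B"),
]

nouns_in_metre_a = find_lines(sample, metre="metre-A", required_tags=frozenset({"N"}))
```

In this sketch a query combines a metrical condition on the line with a morphological condition on at least one of its tokens, which mirrors the kind of combined search the annotation is meant to support.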

This Corpus uses an adapted version of the search engine of the Eastern Armenian National Corpus (EANC).

The Corpus is developed by the Laboratory of Computer Philology, Bashkir State University, with information support and encouragement from the World Poetry Linguistic Research Center of the Institute of Linguistics, Russian Academy of Sciences. Extensive support and consultation for the Project is also provided by V. A. Plungian, Associate Member of the Russian Academy of Sciences. Invaluable technical help was provided by T. A. Arkhangelsky.

Tokens are parsed automatically. The automatic parser, Bashmorph, was developed by B. V. Orekhov and A. A. Gallyamov.

Here you can find the User Reference Guide for the Corpus.