Two new corpora of varieties of Russian

With support from the Linguistic Convergence Laboratory, two new resources have been created: a corpus of the Rogovatka dialect (Belgorod Oblast) and a corpus of Russian as spoken in Daghestan.

The texts in the Rogovatka corpus were recorded and annotated by staff of the V.V. Vinogradov Russian Language Institute, with technical support from staff of NRU HSE. At present, the available texts contain about 100 000 tokens. Audio recordings are also accessible.

The Russian texts, recorded in a number of mountain villages in Daghestan, were annotated by staff and students of NRU HSE. Staff of NRU HSE were also responsible for the technical solutions. Currently, a collection of texts amounting to 80 000 tokens is available, along with the audio recordings. To work with the corpus, it is necessary to obtain a password from the site administrator.

Date

27 March 2018

Keywords

corpus, dialect, regional language varieties

About

Linguistic Convergence Laboratory