
Research Projects

Daghestanian Multilingualism

The research aims to capture the social and geographical specifics of multilingualism in Daghestan with statistical methods. Excel spreadsheets contain data on language repertoires of several thousand people from highland Daghestan. The database is accessible online (multidagestan.com). We studied the distribution of multilingualism among men and women, and we wrote a paper on the hypothesis that the introduction of compulsory school education was instrumental in the spread of Russian as an L2. We also carried out a statistical validation of data we acquired through indirect interviews, where people described the language repertoire of their deceased relatives.

Participants: Nina Dobrushina, Michael Daniel, George Moroz, Ilya Schurov


Dobrushina N., Kultepina O. The rise of a lingua franca: The case of Russian in Dagestan // International Journal of Bilingualism. 2020. doi: 10.1177/1367006920959717

Dobrushina N., Daniel M., Koryakov Y. Atlas of multilingualism in Daghestan: A case study in diachronic sociolinguistics // Languages of the Caucasus. 2020. Vol. 4. P. 1-37.

Dobrushina N., Kozhukhar A. A., Moroz G. Gendered multilingualism in highland Daghestan: story of a loss // Journal of Multilingual and Multicultural Development. 2019. Vol. 40. No. 2. P. 115-132.

Dobrushina N. R., Zakirova A. N. Avar as a lingua franca: A study in the Karata area // Tomsk Journal of Linguistics and Anthropology. 2019. Vol. 1. No. 23. P. 44-55.

Dobrushina N., Daniel M. Field linguistics in Daghestan: A very personal account, in: Word hunters. Vol. 194. John Benjamins Publishing Company, 2018. P. 79-94.

Daniel, Michael, Alexey Koshevoy, Ilya Schurov, Nina Dobrushina. Can recall data be trusted? Field Methods. Accepted for publication pending appropriate revision.

Typological atlas of Daghestan

This project was designed to develop a tool for the visualization of information about linguistic structures characteristic of Daghestan, based on the available descriptive literature. A number of datasets and chapters on specific features are now available and can be accessed through the website. New datasets are collected by students of the School of Linguistics every year as part of a workshop, as well as by members of the Linguistic Convergence Laboratory.

The map visualizations in the atlas show the languages as groups of villages where the languages are spoken. This allows for a more accurate picture of the size and spread of the respective speech communities, which is important to formulate accurate hypotheses about contact phenomena. The atlas can be used for bibliographical research on a particular topic or idiom, and to formulate hypotheses about the distribution of linguistic features. It also allows a wider audience to become familiar with the linguistic diversity of the region.

Participants: Chiara Naccarato, Samira Verhees, Michael Daniel, George Moroz, Timofey Mukhin, Konstantin Filatov

Daghestanian Loans

The Daghestanian Loans project studies the lexical influence of different languages in Daghestan on a microlevel, i.e. on a level of granularity that is sensitive to the difference between village varieties. Data from the project on multilingualism in Daghestan show that the conditions and the degree of language contact for each village are unique. Our aim is to discover the lexical correlates of these differences. For this purpose, we compiled a shortlist of 160 concepts for cross-linguistic comparison, and developed a method for quick data collection in the field. Using a fixed list of concepts for comparison allows us to find the quantitative correlates of qualitative differences between areas, such as the spread of a certain lingua franca, the presence and degree of contact with particular languages, as well as migratory processes.

Collecting data in neighboring villages allows us to show variation between villages on the map, and it reveals the contours of various zones of influence for specific L2s. For example, lexical influence of local Turkic languages (Azerbaijani, Kumyk and Nogai) is found throughout Daghestan. In the south, however, where Azerbaijani served as a lingua franca for a long time, this influence is much stronger. In the north of Daghestan, bilingualism with Turkic languages was not common, and almost all Turkic borrowings in minor local languages are shared with Avar, a major native language. Turkic influence in the north was thus most likely mediated by Avar. Our first paper (to appear in Language) details how we can detect different zones by comparing lexical samples from villages and major neighboring languages.
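The idea of comparing village samples against potential donor languages can be illustrated with a minimal sketch. It assumes that each elicited word has already been coded for its source language (a hypothetical coding scheme) and computes the share of shortlist concepts attributed to each donor. The function name, the labels and the toy samples are invented placeholders, not project data, and this is not necessarily the procedure used in the paper.

```python
from collections import Counter

def donor_shares(sample):
    """Return the share of concepts attributed to each source language.

    `sample` maps a concept label to the (hypothetical) source language
    assigned to the word elicited for that concept; None means the word
    is treated as native.
    """
    borrowed = Counter(src for src in sample.values() if src is not None)
    total = len(sample)
    return {src: count / total for src, count in borrowed.items()}

# Invented placeholder samples for two imaginary villages (not real data).
village_a = {"c1": "Avar", "c2": None, "c3": "Avar", "c4": "Turkic"}
village_b = {"c1": None, "c2": "Turkic", "c3": "Turkic", "c4": "Turkic"}

shares_a = donor_shares(village_a)
shares_b = donor_shares(village_b)
print(shares_a)
print(shares_b)
```

Because every village is elicited with the same fixed list, shares computed this way are directly comparable across villages and can be plotted on a map.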

At the moment our database contains translations of the shortlist in 14 different languages as spoken in 30 different villages in Daghestan and five villages in the Qax region of Azerbaijan. These 35 villages are distributed over five distinct geographical and linguistic areas. The data are available in an online database.


Daniel, Michael, Ilya Chechuro, Samira Verhees, and Nina Dobrushina. To appear in 2021. Lingua francas as lexical donors. Language.

Chechuro, Ilya, Michael Daniel, Samira Verhees. To appear in 2021. Small scale multilingualism through the prism of lexical borrowing. International Journal of Bilingualism.

Participants: Michael Daniel, Ilya Chechuro, Samira Verhees, Nina Dobrushina

Daghestanian Stops

The aim of the project is to describe the variation in the acoustic features of stops in East Caucasian languages. It is probable that the acoustic features of the sounds that fill ‘identical’ slots in the phonetic inventories of East Caucasian languages (such as ejectives in Archi vs. ejectives in Lak) are slightly but consistently different. The immediate goal is to prove the presence of such differences in a statistically significant way. The ultimate goal, ideally, is to show that the differences are areally distributed (in a macro perspective, e.g. South Daghestan vs. North Daghestan, and in a local perspective, e.g. showing influence of neighbouring languages on different lects of the same language). Accounting for acoustic differences and similarities in areal terms is, as far as we know, a truly innovative research challenge. This project includes annotation of the recorded data, acoustic analysis and collecting more data during future fieldwork.


Shiryaev, Alexander, Michael Daniel and George Moroz, submitted. Sonorant lateral in Rikvani Andi. 

Grawunder, Sven, Michael Daniel, and George Moroz. (In prep.) Conflicting behavior of VOT in ejectives of two East Caucasian languages

Participants: S. Grawunder, G. A. Moroz, M. A. Daniel, V. R. Zhigulskaya, A. V. Shiryaev

Relativization in Nakh-Daghestanian in Intragenetic and Areal Perspective

In Nakh-Daghestanian languages relative clauses are predominantly formed with a participle construction. Even though they can express different aspectual meanings, participles lack any syntactic orientation. There are no syntactic limitations on the target of relativization. The gap in the relative clause can correspond to a core argument, a peripheral participant or even a participant that is not part of the verb’s argument structure. The relativization of facts, places and time is also frequent. A pilot study on relativization targets in several Daghestanian languages revealed that preferences for the relativization of certain arguments differ. It is not a priori clear whether this is due to the counting method used, the particularities of certain corpora, or the grammar of specific languages. Within the project, relativization will be studied on the basis of more substantial corpus data, using a unified markup for relative clauses. Several Nakh-Daghestanian languages will be researched (Agul, Archi, Ingush, Udi and others), as well as other Caucasian languages that are typologically and/or genetically far removed from Nakh-Daghestanian languages (e.g. Adyghe). The resulting generalizations will allow us to verify claims on the hierarchy of arguments in relativization as they are proposed in current syntactic theories.

In 2021, a survey of a sample of ergative languages will be completed. We are currently seeking a transliterated and annotated Japanese corpus or published corpus-based work that distinguishes S and A.

Participants: Michael Daniel, Yury Lander, Timur Maisak, Johanna Nichols

Creation of lexical database of Andic languages

Dictionaries are available for all Andic languages. The goal of this project is to create a database of all Andic dictionaries, which will contain transcription, morphological data and separated meanings for each lemma. The creation of such a database will make it possible to analyze segmental and suprasegmental phonology as well as colexification (cf. other databases: https://clics.clld.org, concepticon.clld.org). This project shares many features with other dictionary database projects such as the Intercontinental Dictionary Series (Key, Comrie (eds.) 2015, https://ids.clld.org/) and LexCauc (Forker, Belyaev 2020, https://lexcauc.github.io/), but focuses on the broader lexicon, rather than limiting itself to a predetermined or fixed list of concepts.
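To make the link between separated meanings and colexification concrete, here is a minimal sketch. It assumes a hypothetical record layout (language, lemma, transcription, morphological data, list of meanings); the field names, the `Entry` class and the placeholder entries are illustrative assumptions, not the project's actual schema or data. Two meanings count as colexified when a single lemma carries both.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Entry:
    """One dictionary lemma with its meanings split into separate items."""
    language: str
    lemma: str
    transcription: str
    morphology: str
    meanings: list = field(default_factory=list)

def colexifications(entries):
    """Map each unordered pair of meanings to the lemmas expressing both."""
    pairs = defaultdict(list)
    for entry in entries:
        # Sort so that each pair has one canonical order.
        for a, b in combinations(sorted(set(entry.meanings)), 2):
            pairs[(a, b)].append((entry.language, entry.lemma))
    return dict(pairs)

# Invented placeholder entries (not taken from any real dictionary).
entries = [
    Entry("LangX", "lemma1", "transcr1", "noun", ["hand", "arm"]),
    Entry("LangY", "lemma2", "transcr2", "noun", ["arm", "hand", "branch"]),
    Entry("LangX", "lemma3", "transcr3", "noun", ["tree"]),
]
result = colexifications(entries)
for pair, lemmas in sorted(result.items()):
    print(pair, lemmas)
```

Storing meanings as separate items, rather than as one free-text gloss, is what makes this kind of pairwise query straightforward.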

Participants: G. A. Moroz

Moroz George & Samira Verhees. Variability in noun classes assignment in Zilo Andi: experimental data // Iran and the Caucasus. 2019. Vol. 23. No. 3. P. 268-282.

Typology of small-scale multilingualism

The project aims at studying small-scale multilingualism, a type of language ecology typical of, but not exclusive to, indigenous communities with small numbers of speakers. The increased interest in small-scale multilingualism has been boosted by the realization of its significance for reconstructing the social conditions that favoured linguistic diversity in the pre-colonial world. We identify the similarities and differences among situations of such multilingualism, which lay the foundations for a future typology of this kind of language ecology. We study the sources of multilingualism in small-scale societies, with a special focus on the impact of marriage patterns and language ideologies. The multilingual ecologies of the pre- and postcolonial world are extremely diverse, with many factors playing a role in their constitution. They are also highly endangered, and thus their study is of the utmost urgency.

Participants: Nina Dobrushina, Brigitte Pakendorf, Olesya Khanina


Pakendorf, Brigitte, Nina Dobrushina, Olesya Khanina (to appear). A Typology of Small-Scale Multilingualism. International Journal of Bilingualism (special issue on “Typology of Small-Scale Multilingualism”, edited by Nina Dobrushina, Olesya Khanina, & Brigitte Pakendorf).

Dobrushina, Nina & Moroz, George (to appear). The speakers of minority languages are more multilingual. International Journal of Bilingualism (special issue on “Typology of Small-Scale Multilingualism”, edited by Nina Dobrushina, Olesya Khanina, & Brigitte Pakendorf).

Vydrina, Alexandra. Fouta-Djallon linguistic ecology: between polyglossia and small-scale multilingualism (to appear). International Journal of Bilingualism (special issue on “Typology of Small-Scale Multilingualism”, edited by Nina Dobrushina, Olesya Khanina, & Brigitte Pakendorf).

Spoken corpora of nonstandard varieties of Russian and other languages

The laboratory creates spoken corpora of dialects and regional varieties of Russian and other languages. The corpora contain audio files and a transcription in standardized orthography. Using the search function, you can listen to fragments of texts that contain a word or collocation of interest. All corpora are publicly accessible and for many of them, full texts are available.

Non-standard word order in spoken Russian: bilinguals’ and monolinguals’ varieties

The project was started with the aim of investigating non-standard word-order realizations in the variety of Russian spoken in Daghestan, with a special focus on noun phrases with a genitive modifier. Whereas in Standard Russian the neutral word order is “head noun + genitive” (N+Gen), in Daghestanian Russian the opposite word order (Gen+N) is often employed. The first hypothesis is that non-standard word order in such constructions is the result of contact with the speakers’ first languages (East Caucasian and Turkic), which normally display Gen+N order in such phrases. The alternative hypothesis is that Gen+N order is rather a general feature of spoken Russian discourse, in which constructions of this type are also admissible. To verify our hypotheses, we conduct a quantitative analysis of noun phrases with genitive modifiers based on the Corpus of Russian spoken in Daghestan. The Daghestanian data are compared to data from: a) other contact varieties of spoken Russian, i.e. Bashkir Russian (based on the Corpus of Russian spoken in Bashkortostan) and Chuvash Russian (based on the Corpus of Russian spoken in Chuvashia); b) dialectal varieties of spoken Russian (based on the following corpora: Ustja River Basin Corpus, Corpus of Rogovatka dialect, Corpus of Spiridonova Buda dialect, Corpus of Malinino dialect, Corpus of Opochetsky dialects); and c) standard varieties of spoken Russian (based on the Spoken Subcorpus of the Russian National Corpus and the Corpus of Russian spoken in Zvenigorod).

Participants: Chiara Naccarato, Natalia Stoynova, Anastasia Panova


Naccarato C., Panova A. B., Pleshak P. S., Stoynova N. M., Khomchenkova I. A. Possessive constructions with a preposed genitive in Russian // In: Analysis of Spoken Russian (AR3-2019): Proceedings of the Eighth Interdisciplinary Seminar. St. Petersburg: Politekhnika-Print, 2019. P. 78-83.

Naccarato C., Panova A. B., Stoynova N. M. Non-standard word order in Daghestanian Russian: Noun phrases with a genitive // Proceedings of the V. V. Vinogradov Russian Language Institute. 2020. No. 4. P. 146-167.

Naccarato, Chiara, Anastasia Panova & Natalia Stoynova. (under review for Language Variation and Change). Word-order variation in a contact setting: A corpus-based investigation of Russian spoken in Daghestan.

Preposition drop in Russian spoken in Daghestan

The project is aimed at understanding the conditions on preposition drop in the Russian speech of people from Daghestanian villages. Previous research, based on data collected from three villages, cites it as a prominent characteristic of this variety of Russian and explains it by interference with the Nakh-Daghestanian morphological system. Based on a statistical analysis of extensive corpus data, we show that the probability of preposition drop depends on preposition type, phonetic context and the speaker’s fluency in Russian. We propose that the prominence of preposition drop in the speech of Daghestanian highlanders results from an interplay of two factors: a typological tendency for certain spatial and temporal locations to be formally unmarked, and incomplete acquisition of the Russian prepositional system.

Participants: Anastasia Panova, Tatiana Philippova

Panova, Anastasia and Tatiana Philippova. (in press). When a cross-linguistic tendency marries incomplete acquisition: Preposition drop in Russian spoken in Daghestan. International Journal of Bilingualism.

Circassian Isoglosses

The two Circassian languages of the Northwest Caucasian language family (West Circassian, also known as Adyghe, and Kabardian, also known as East Circassian) are considered to be one language by their speakers. However, this assumed linguistic continuum shows a lot of variation. The aim of the Circassian Isoglosses project is to survey various features and their distribution among regional varieties of Circassian, based on existing literature and fieldwork. The prospective result of the project will be a database of isoglosses that will allow us to compare Circassian idioms. At the present stage, the project focuses on varieties of West Circassian as spoken in the Republic of Adygea and the Krasnodar Kray. In addition, we carried out fieldwork with Israeli Circassians in the fall of 2017.

Participants: Yury Lander, George Moroz, Paul Phellan, Aleksei Fedorenko

Documentation of the Abaza language

In this project we continue the work of the research group “Aspects of Abaza grammar”, and focus on the analysis of Abaza grammar and texts. Within the project we have already created a Spoken corpus of Abaza and are working on various aspects of the grammar, mostly connected to the polysynthetic nature of Abaza.

Participants: Y. A. Lander, A. B. Panova, G. A. Moroz

Nominal inflection typology

Verb inflection is one of the most useful parameters in the Autotyp database from a typological and geographical perspective. The goal of our project is to create a database similar to Autotyp for nouns. In 2017-2018 we created a database, and carried out a pilot investigation in Eurasia. This year we will add languages from other parts of the world to our database, and measure correlations between the complexity of verbal and nominal inflection systems. A journal submission is planned for 2021.

Participants: Elena Sokur, Johanna Nichols

Causative alternation database

The Causative alternation database was created as part of a larger project by J. Nichols and others about transitivity in the languages of the world. The database consists of pairs of verbs, one of which is semantically non-causative, while the other is its semantically causative counterpart. The database also contains information on the morphological relations between these verbs (Is one verb from the pair derived from the other? Are their roots the same? Are they morphologically complex? etc.).

Participants: Polina Nasledskova, Johanna Nichols

Typology of adnominal inalienability

The project is devoted to the typology of inalienability in the adnominal domain. Adnominal inalienability is expressed via various constructions in the languages of the world. We create a cross-linguistic database and focus on various parameters by which these constructions differ: e.g. semantics of inalienable and alienable classes, word order, head or dependent marking.

Participants: Elena Sokur, Yury Lander

Ustja Corpus

The Ustja River Basin Corpus is a growing corpus of a northern Russian dialect (south of Arkhangelskaja oblastj) where the normalized orthographic annotation is aligned with the audio of the interviews. The research based on this corpus is aimed at establishing the dynamics of dialect loss: correlation between dialect variables, consistency of speakers, age outliers (people who are ahead of or behind their age peers), etc. It involves a vast amount of perceptual and sometimes instrumental acoustic data annotation. After the first publication (see below) we plan to study how the use of dialect correlates with gender within the same age group.

Participants: Ruprecht von Waldenfels, Nina Dobrushina, Michael Daniel


Daniel M., von Waldenfels R., Ter-Avanesova A., Kazakova P., Schurov I., Gerasimenko E., Ignatenko D. I., Makhlina E. N., Tsfasman M., Verhees S., Vinyar A., Zhigulskaya V., Ovsjannikova M., Say S., Dobrushina N. Dialect loss in the Russian North: modeling change across variables // Language Variation and Change. 2019. Vol. 31. No. 3. P. 353-376.

Dialectal Differentiation of Even

Even is a Northern Tungusic language spoken in a number of small communities scattered across northeast Siberia. This dispersed mode of settlement has led to considerable dialectal fragmentation with diversification at the lexical, phonological, morphological, and syntactic level. This diversification can be assumed to be the result of multiple factors: differential retention of ancestral variation, independent innovation, as well as contact with typologically different languages. We want to elucidate the relative impact of these different factors during the differentiation of the dialects, and especially, to what extent language contact played a role. That there was some contact in the history of the dialects is indicated by molecular genetic data showing intermarriage of different Even groups with their neighbors. This study focuses on two of the geographically most disparate Even dialects: the westernmost still viable Even dialect, Lamunkhin, spoken in the village of Sebjan-Küöl in Yakutia, and one of the easternmost dialects, namely the Bystraja dialect spoken in Central Kamchatka. Oral corpora for both dialects have already been glossed, with the Lamunkhin corpus comprising around 52,000 words and the Bystraja corpus comprising around 34,000 words. An important prerequisite for answering the question of how these dialects diverged is to establish in what way they differ.

The study of dialectal differences usually focuses on categorical differences, i.e. the presence of a feature in one dialect which is absent in another. Often, however, features differ in frequency, having become less prominent in one dialect in the course of its diachronic evolution, or because a form has developed new functions. When working with smaller corpora, the study of dialectal differences through variation in frequency is problematic. An observed difference in frequency might be the result of a speaker’s preference, rather than a feature of the dialect as a whole. In our first publication we proposed a statistical method that allows us to trace differences in frequency while taking into account the idiolectal heterogeneity of the corpora. Further, we plan to elaborate on the statistical models that we use as well as to continue with the linguistic interpretation of the differences we find from the point of view of functional divergence, contact situations and the typology of grammaticalization processes.
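The problem of speaker preference can be illustrated with a generic speaker-level permutation test. This is a minimal sketch of one common way to respect between-speaker variation, not a reconstruction of the method proposed in the publication. The function names and the per-speaker counts are invented for illustration. Each speaker is represented by a pair (feature tokens, total tokens), and whole speakers rather than individual tokens are shuffled between the two dialects.

```python
import random
from statistics import mean

def speaker_rates(speakers):
    """Per-speaker relative frequency: feature tokens / total tokens."""
    return [hits / total for hits, total in speakers]

def permutation_p_value(dialect_a, dialect_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of mean speaker rates.

    Speakers are the unit of resampling, so each speaker contributes one
    data point regardless of how much text they produced.
    """
    rates_a = speaker_rates(dialect_a)
    rates_b = speaker_rates(dialect_b)
    observed = abs(mean(rates_a) - mean(rates_b))
    pooled = rates_a + rates_b
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(rates_a)]) - mean(pooled[len(rates_a):]))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Invented (feature tokens, total tokens) counts per speaker, not real data.
dialect_a = [(30, 1000), (28, 900), (35, 1100), (31, 950), (29, 1000)]
dialect_b = [(10, 1000), (12, 900), (9, 1100), (11, 950), (8, 1000)]

p_diff = permutation_p_value(dialect_a, dialect_b, n_perm=2000)
p_same = permutation_p_value(dialect_a, dialect_a, n_perm=2000)
print(p_diff, p_same)
```

A token-level test on pooled counts would treat every token as independent and could report a dialect difference that is really due to one or two prolific speakers; resampling at the speaker level avoids that at the cost of statistical power.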

Participants: Brigitte Pakendorf, Ekaterina Biryukova, Michael Daniel, Ilya Schurov.


Andriyanets V., Daniel M., Pakendorf B. Discovering dialectal differences based on oral corpora, in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, 30 May - 2 June 2018) / Ed. by V. Selegey, I. M. Kobozeva, T. E. Yanko, I. Boguslavsky, L. L. Iomdin, M. A. Kronghaus, A. Ch. Piperski. Issue 17(24). Moscow: Russian State University for the Humanities Publishing Center, 2018. P. 28-38.

Meadow Mari Corpus

The goal of this project is to create a spoken corpus of Meadow Mari (Mari < Finno-Ugric < Uralic, circa 375,000 speakers). So far, the corpus represents the variety of the Staryj Torjal village (Novyj Torjal district, Mari El Republic), which is close to the standard Meadow Mari variety. The audio data for the corpus were recorded in 2000-2001 by a fieldwork research team of Moscow State University and in 2018 by Anna Volkova.

So far, about 3 hours of recordings have been transcribed by a native speaker of Meadow Mari, checked by researchers, segmented and aligned; they are now being morphologically analyzed.

Every utterance will be provided with morphological analysis, segmented for intonation units and annotated for language, which will make the corpus a good source of information on Mari-Russian code-switching. For morphological analysis we use the Meadow Mari parser created by Timofey Arkhangelsky. 

Participants: Anna Volkova, Aigul Zakirova, Zinaida Klyucheva, Ilya Makarchuk, Maria Dolgodvorova, Svetlana Kokoreva, Mikhail Voronov, Irina Khomchenkova

