Research Projects

Daghestanian Stops

The aim of the project is to describe variation in the acoustic properties of stops in East Caucasian languages. The acoustic properties of sounds that fill ‘identical’ slots in the sound inventories of different East Caucasian languages (such as ejectives in Archi vs. ejectives in Lak) are likely to differ slightly but consistently. The immediate goal is to demonstrate that such differences exist and are statistically significant. The ultimate goal, ideally, is to show that the differences are areally distributed, both in a macro perspective (e.g. South Daghestan vs. North Daghestan) and in a local perspective (e.g. the influence of neighbouring languages on different lects of the same language). As far as we know, accounting for acoustic differences and similarities in areal terms is a truly innovative research challenge. The project includes annotation of the recorded data, acoustic analysis, and the collection of further data during future fieldwork.
Participants: Sven Grawunder, George Moroz, Michael Daniel, Vasilisa Zhigulskaya

Daghestanian Loans

The goal of this project is to investigate lexical borrowing in selected pairs of East Caucasian languages. The research focuses on ‘horizontal’ borrowing between languages in contact, as opposed to borrowing from external major languages such as Russian, Arabic and Persian. Based on the Leipzig-Jakarta list, we compiled a substantial set of wordlists for the major literary East Caucasian and Turkic languages of Daghestan and neighbouring regions, including Avar, Lak, standard Dargwa, Lezgian, Chechen, Kumyk and Azerbaijani. These lists will allow us to identify the lexemes most likely to have been borrowed horizontally, and will also reveal patterns of influence among the major languages. Based on the results of this preliminary research, we will set up a field “probe”: a shortlist of likely candidates for horizontal borrowing. This shortlist can then be used to collect data during fieldwork in order to expose patterns of influence in particular lects (e.g. Lezgian and Azerbaijani influence on Rutul, or Avar and Lak influence on Mehweb Dargwa). The results of these probes will allow us to quantify lexical infusion and to trace differences in the degree of exposure to contact between varieties of the same language, ideally down to the level of minor single-village lects. We will also look at subtler effects of language contact, such as calques and loan meaning extensions (i.e. polysemy patterns influenced by language contact). An additional goal is to compare degrees of lexical borrowing with the levels of bilingualism established by the DagMulti project.
Participants: Michael Daniel, Ilya Chechuro, Samira Verhees, Nina Dobrushina

Ustja Corpus

The Ustja River Basin Corpus is a growing corpus of a northern Russian dialect (spoken in the south of Arkhangelskaja oblastj) in which the normalized orthographic annotation is aligned with the audio recordings of the interviews. Research based on this corpus aims to establish the dynamics of dialect loss: correlations between dialect variables, the consistency of individual speakers, age outliers (people who are ahead of or behind their age peers), etc. It involves extensive annotation of the data, based on auditory perception and, in some cases, on instrumental acoustic analysis.
Participants: Ruprecht von Waldenfels, Nina Dobrushina, Michael Daniel, Polina Kazakova

Daghestanian Russian

The aim of the project is to build a corpus of the variety of Russian spoken in Daghestan in order to study its morphosyntactic properties systematically. The corpus will use the same platform as the Ustja River Basin Corpus. The project involves corpus maintenance, transcription, annotation, and systematic analysis of regional phenomena in this local variety of Russian.
Participants: Timur Maisak, Nina Dobrushina, Michael Daniel, Ruprecht von Waldenfels, Anastasia Panova

Daghestanian Multilingualism

The aim of the project is to process the data collected so far for the Atlas of Multilingualism of Daghestan. These data consist of Excel spreadsheets containing information about the multilingual repertoires of several thousand people from 38 Daghestanian villages. The database will be published online in the spring of 2017. The research as a whole aims to reveal social and geographical patterns of multilingualism. This project in particular includes compiling online and offline databases, processing data from new field trips, and performing statistical analysis of the data.
Participants: Nina Dobrushina, Alexandra Kozhukhar

Dialectal Differentiation of Even

Even is a Northern Tungusic language spoken in a number of small communities scattered across northeast Siberia. This dispersed settlement pattern has led to considerable dialectal fragmentation, with diversification at the lexical, phonological, morphological, and syntactic levels. This diversification is presumably the result of multiple factors: differential retention of ancestral variation, independent innovation, and contact with typologically different languages. We want to elucidate the relative impact of these factors on the differentiation of the dialects and, in particular, the extent to which language contact played a role. Molecular genetic data showing intermarriage between different Even groups and their neighbours indicate that the dialects did undergo some contact in their history. This study focuses on two of the geographically most distant Even dialects: the westernmost still viable Even dialect, Lamunkhin, spoken in the village of Sebjan-Küöl in Yakutia, and one of the easternmost dialects, the Bystraja dialect spoken in Central Kamchatka. Glossed oral corpora already exist for both dialects: the Lamunkhin corpus comprises around 52,000 words and the Bystraja corpus around 34,000 words. An important prerequisite for answering the question of how these dialects diverged is to establish how they differ. To this end, we need to compile an overview of morphological and, if possible, syntactic differences based on the corpora. The final step is to investigate each difference that emerges from the corpus comparison and evaluate whether it arose as an independent innovation, as the retention of ancestral variation, as the result of contact influence, or as a combination of these factors.
Participants: Brigitte Pakendorf, Vasilisa Andriyanets

Non-standard Word Order in Daghestanian Russian

The aim of this project is to investigate non-standard word order in Daghestanian Russian. At the current stage, the research focuses on noun phrases with a genitive modifier. Whereas in Standard Russian the neutral word order in such phrases is noun + genitive, Daghestanian Russian often uses the opposite order. Our hypothesis is that the non-standard word order in such constructions results from contact with the speakers’ first languages (East Caucasian and Turkic), which use the order genitive + noun in such phrases. The alternative hypothesis is that genitive + noun order is a general feature of spoken Russian, in which constructions of this type are also admissible. To test these hypotheses, we analyse noun phrases with genitive modifiers in the corpus of spoken Daghestanian Russian and in the spoken subcorpus of the Russian National Corpus. The results of this study will contribute to a description of the syntactic properties of the variety of Russian spoken in Daghestan.
Participants: Chiara Naccarato, Natalia Stoynova, Anastasia Panova

Syntactic Annotation of the Corpora in the Universal Dependencies Format

The unified representation of dependency trees makes it possible to examine syntactic parallelism and word-order effects across languages. In the long run, it would be particularly interesting to apply quantitative methods to study the effects of language convergence. At the moment, the existing UD treebanks cover ca. 40-50 languages, including Russian, Belarusian and Buryat (under Creative Commons licenses). For the UD project, our main contribution will be the development of UD guidelines for ergative and polysynthetic languages, based on manual annotation of the corpora available in the Lab. The UD community provides tools for data annotation, validation, and visualization, as well as a number of online search engines. In this project, we plan to work with the following language varieties: Mehweb, Adyghe, Even, Mari, spoken Russian, and the spoken Russian of Daghestan.
Participants: Olga Lyashevskaya, Alexandra Kozhukhar

Spoken Corpora of Non-standard Variants

The aim of this project is to build a number of spoken corpora. So far, we have annotated recordings of everyday Russian speech collected by students of the Higher School of Economics (with manually disambiguated morphology and stress annotation). Other potential targets include regional varieties of Russian and field data from several minority languages of Russia. The study focuses on intonation and its variation across regional varieties of Russian, as well as on the spread of certain pitch patterns into languages in contact with Russian.
Participants: Olga Lyashevskaya, Ilya Chechuro

Circassian Isoglosses

The two Circassian languages (West Circassian, also known as Adyghe, and Kabardian), which constitute a branch of the Northwest Caucasian family, are often considered by their speakers to be a single language. This assumed linguistic continuum shows considerable variation. The aim of this project is to survey the isoglosses found among regional varieties of Circassian, on the basis of both the existing literature and fieldwork. We aim to create a database comparing the Circassian villages of Adygea, Karachaevo-Cherkessia, Kabardino-Balkaria and Krasnodarskiy Krai according to a number of parameters.
Participants: Yury Lander, George Moroz, Aleksei Fedorenko

Meadow Mari Corpus

Meadow Mari is a Uralic language spoken by about 375,000 people. The aim of the project is to create a corpus of spoken Meadow Mari. The corpus will be based on audio and video recordings made in 2000-2001 by a fieldwork party from Moscow State University. Participants’ tasks include technical support of the corpus (including glossing, annotation, and alignment of the orthographic annotation with the audio) as well as data analysis. The project focuses on the influence of Russian on Meadow Mari.
Participants: Anna Volkova, Mikhail Voronov

Relativization in Nakh-Daghestanian in Intragenetic and Areal Perspective

In Nakh-Daghestanian languages, relative clauses are predominantly formed with participial constructions. Although participles can express different aspectual meanings, they lack any syntactic orientation, and there are no syntactic restrictions on the target of relativization. The gap in the relative clause can correspond to a core argument, a peripheral participant, or even a participant that is not part of the verb’s argument structure. The relativization of facts, places and time is also frequent. A pilot study of relativization targets in several Daghestanian languages revealed that the languages differ in which arguments they prefer to relativize. It is not clear a priori whether this is due to the counting method used, the particularities of individual corpora, or the grammar of specific languages. Within the project, relativization will be studied on the basis of more substantial corpus data, using a unified markup for relative clauses. Several Nakh-Daghestanian languages will be investigated (Agul, Archi, Ingush, Udi and others), as well as other Caucasian languages that are typologically and/or genetically distant from Nakh-Daghestanian (e.g. Adyghe). The resulting generalizations will allow us to test claims about the hierarchy of arguments in relativization proposed in current syntactic theories.
Participants: Anna Volkova, Michael Daniel, Yury Lander, Timur Maisak, Johanna Nichols


