If you are interested in participating in the laboratory seminars, please register here.
Seminar schedule 2022
Morphological marking of “meditative” questions in Nakh-Daghestanian languages
In the talk I will present the results of a pilot study of “meditative” questions, a special semantic type of non-canonical questions, which normally do not require an answer and can even be asked in the absence of an addressee (cf. ‘I wonder’-questions in English or ‘интересно’-questions in Russian). In a number of Nakh-Daghestanian languages, questions of this type have dedicated morphological marking (suffixes or enclitics), although there seems to be no systematic study of their marking types. I will look at the marking of meditative questions in comparison with the marking of ordinary (polar and content) and indirect questions in several languages of the family. I will also briefly discuss the typical contexts where meditative questions are found in texts.
Jews of Daghestan: history, the present-day state of the ethnic group, and monuments of material heritage
Svetlana Amosova (Jewish Museum and Tolerance Center; Institute of Slavic Studies, RAS)
Mikhail Vasilyev ('Sefer' Center; Institute of Linguistics, RAS)
The first part of the talk will deal with an ethnic group of Daghestan that has been known, and is still known, by various names: Mountain Jews, Tats, Jews of Daghestan, Juhuri, and Kavkazi. We will explain what all these terms mean and where they came from, consider different points of view on the ethnogenesis of the group, and talk about the territory where its members lived, the dialects of their language, and how their relations with other ethnic groups of the North Caucasus developed at different times. In addition, drawing on materials from expeditions of the last several years, we will show the features of the group's present-day identity and how it changed over the course of the 20th century.
In the second part, we will look at the monuments of the material heritage of the Mountain Jews in Southern Daghestan, which consist mainly of surviving synagogue buildings and of Mountain Jewish cemeteries of the 17th–20th centuries. We will show how, given the scarcity of other written evidence, gravestone epigraphy becomes one of the most important sources of information on the settlement geography and local history of the small Jewish communities that lived in remote areas of Southern Daghestan and ceased to exist in the early 20th century.
In conclusion, using the example of the 'Sefer' Center expeditions conducted in 2018–2020, we will briefly describe the specifics and established practice of research on the traditional and contemporary culture and the heritage of the Mountain Jews, both in the regions where they traditionally lived and in the diaspora.
Borrowings and contacts in basic vocabulary and classification
Evgeniya Korovina (Institute of Linguistics, RAS)
Although, by definition, basic vocabulary consists of the words that are least often borrowed, borrowing in this part of the lexicon happens regularly. Cases range from highly visible loanwords from languages of other families, such as Spanish borrowings in the indigenous languages of Latin America, to hard-to-detect loanwords and structural parallelism (homoplasy) between languages of the same subgroup. Cases of the second kind are especially typical of so-called dialect chains, where it is sometimes difficult to draw a line between idioms, and of situations in which languages are highly conservative phonetically. Drawing primarily on examples from the history of the languages of Central America and Polynesia, I will consider ways of identifying such situations formally, using mathematical methods.
A thousand-year history of writing in the languages of the peoples of Daghestan: a view through the prism of centuries
Ramazan Abdulmazhidov (Institute of History, Archaeology and Ethnography, Daghestan Scientific Center of RAS), Shakhban Khapizov (Institute of History, Archaeology and Ethnography, Daghestan Scientific Center of RAS)
Daghestan is a region of remarkable ethnic and cultural diversity, and the only one in the North Caucasus with a centuries-long history of writing. As early as the Soviet period, it was established that Caucasian Albania had created a unique script, genetically related to the Armenian and Georgian scripts. This state, as is well known, extended over most of present-day Daghestan. Real progress in the study of the Albanian script came with the identification in the 1990s, in a monastery on the Sinai Peninsula, of two palimpsests, presumably from the 7th century, written in the "Aghwan" language. Only after these were studied was it possible to establish definitively the place of this language among the East Caucasian languages.
The second attempt, chronologically, to put an East Caucasian language into writing (in this case Avar) is connected with the spread of Orthodox Christianity and Georgian writing in Daghestan. The missionaries' activity there was accompanied by the training of clergy from among the local population and by the composition of texts in Georgian. Professional study of the Georgian-script epigraphy of Daghestan began in the first half of the 20th century.
The next stage in the development of writing in Daghestan was connected with its Islamization and the subsequent expansion of Muslim culture. Arabic script began to be used to write texts in East Caucasian languages as early as the medieval period, although this practice was hardly systematic. Of the recorded monuments that survive today, the earliest is a 14th-century Avar inscription on a stone set into the wall of the mosque in the village of Koroda, Gunib District, Republic of Daghestan.
All these stages and processes in the development of writing in Daghestan will be covered in detail in the talk.
Days after tomorrow in the languages of Daghestan
Timofey Dedov (HSE University)
In the languages of Daghestan, days after tomorrow can be encoded in different ways. Three types of strategies can be distinguished: 1) semi-compositional terms with similar suffixes; 2) transparently compositional constructions (as in most languages); and 3) non-derived terms, which seem to be rare cross-linguistically. Some East Caucasian languages also differ from most other languages in the number of unique terms used to refer to days after tomorrow (with the third strategy, there can be as many as six unique terms for consecutive days after tomorrow). In my talk I will discuss all three strategies in more detail and present their geographical distribution, which was investigated for the “Typological Atlas of the Languages of Daghestan”.
Systems of grammatical cases in the languages of Daghestan
Katerina Dagkou (University of Groningen)
The languages of Daghestan vary in terms of the system of core (grammatical) cases they feature. A first distinction concerns ergative vs. accusative languages. For ergative languages, the typical system of core cases includes absolutive, ergative, genitive, and dative; for accusative languages, it includes nominative, genitive, accusative, and dative. Some languages also have grammatical cases beyond these basic ones, e.g., affective, comitative, instrumental, comparative, and ablative. More exotic cases such as contentive and benefactive are reported for a couple of languages. In this talk, I discuss the classification and distribution of systems of grammatical cases in the languages of Daghestan, which is the result of my research within the TALD (Typological Atlas of the Languages of Daghestan) project. Apart from classifying the languages according to the type and number of cases they have, I will also present the morphology of the grammatical cases per language group, their syntactic functions, and instances of case syncretism.
Zok, the Armenian dialect of Agulis
Katherine Hodgson (University of Cambridge)
Zok (otherwise known as the Agulis dialect of Armenian) is a form of Armenian that is so divergent that it has been described by some as a separate language. It was spoken in and around the town of Agulis in the southern part of Nakhijevan. The first written record of Zok is from 1711, but there were Armenians living in this area from at least the 5th century AD. The Armenian presence in Agulis itself ended with the massacre of 1919, but dialect-speaking populations remained in some of the surrounding villages, notably Tsghna, Tanakert, Ramis, and Paraka, until the 1970s and 80s. Closely related dialects are still spoken today in a few villages just across the border in the area of Meghri in Armenia. However, with the exception of Karchevan (population 292), these villages are now virtually abandoned, and the language is not being passed on to the younger generation. It is the subject of a documentation project funded by the Endangered Languages Documentation Programme. Speakers from the villages Tsghna, Tanakert, Ramis, and Paraka in Nakhijevan, and Karchevan and Kuris in the area of Meghri have produced 27 hours of video, of which 4 hours have so far been annotated using ELAN and FLEx.
Zok is not intelligible to speakers of other forms of Armenian, and various claims have been made about the origin of the speakers. However, a closer linguistic examination reveals that many of its distinctive features are shared wholly or partly with neighbouring Armenian dialects, especially those of Karabagh and northern Iran, implying that the Zoks are a local Armenian population with a long-term, stable presence in the area. The existence of geographically-correlated dialect variation between the villages where the language is spoken (the closer together they are, the more features they have in common) also suggests a stable pattern of settlement. Apart from phonological features (vowel shift, vowel harmony), the most striking distinctive characteristic is the development of a verb system that is unique within Armenian. This involves the loss of all monolectic verb forms in the indicative mood, and their replacement with participles + auxiliary, a tendency that exists in Eastern Armenian in general, but has nowhere else reached this extent. This is accompanied by the shift of tense marking from the auxiliary, which has become essentially a person marker, to a particle added to the ‘present’ (unmarked) form, something which is also found in Khoy/Urmia dialect. The past subjunctive is also formed in this way. Both these processes, as well as the mobility of the auxiliary/person marker, which attaches to the element with the main sentential stress, are characteristic of languages of the Iran-Araxes area.
The role of proto-wordlists in modern historical-comparative studies: from phonetic and semantic to "onomasiological" reconstruction
George Starostin (Centre for Comparative Studies and Phylogenetics of the Institute for Oriental and Classical Studies, HSE / External Fellow, Santa Fe Institute)
Although lexicostatistical methods of estimating linguistic distance between related or potentially related language units have become an essential staple of modern-day phylogenetic linguistics, their reliability often depends more on the accurate collection and curation of data than on the specific mathematical / computational methods applied to said data. In my talk, I shall try to delineate the theoretical and pragmatic importance of a relatively new methodology, dubbed "onomasiological reconstruction", which purports to introduce a new level of accuracy to the projection of lexical items onto proto-levels of varying time depth. This methodology, which requires paying equal attention to phonological, semantic, and distributional features of compared items, can then be combined with lexicostatistical methods and applied with equal efficiency to varying datasets and linguistic taxa of widely varying time depth. In addition to having already yielded efficient results across genetic lineages ranging from Indo-European to North Caucasian to African language families, onomasiological reconstruction seems to hold plenty of potential for successfully differentiating between "patently false" and genuinely promising hypotheses of distant linguistic relationship.
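The abstract presupposes the basic lexicostatistical measure. As a minimal sketch of that measure only (not of onomasiological reconstruction itself), the distance between two wordlists can be taken as the share of compared meanings whose items fall into different cognate classes. The function name and the cognate-class labels in the toy data are hypothetical; in real work the labels come from exactly the kind of careful etymological judgement the talk is concerned with.

```python
from typing import Dict


def lexicostat_distance(list_a: Dict[str, str], list_b: Dict[str, str]) -> float:
    """Share of meanings attested in both lists whose cognate classes differ.

    Each list maps a meaning (e.g. a Swadesh-list slot) to a cognate-class label.
    """
    shared = [m for m in list_a if m in list_b]
    if not shared:
        raise ValueError("the two wordlists have no meanings in common")
    same = sum(1 for m in shared if list_a[m] == list_b[m])
    return 1 - same / len(shared)


# Hypothetical toy data: labels stand for cognate sets, not real forms.
lang_a = {"eye": "E1", "water": "W1", "stone": "S1", "hand": "H1"}
lang_b = {"eye": "E1", "water": "W2", "stone": "S1", "hand": "H1", "fire": "F1"}
print(lexicostat_distance(lang_a, lang_b))  # 0.25
```

Because the result depends entirely on the cognate labels, any error in assigning them propagates directly into the distance, which is why the accuracy of the underlying data matters more than the formula.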
Typological patterns and the language dynamics of the ancient Central Andes and South America
Matthias Urban (University of Tübingen)
In this presentation, I will sketch different aspects of the language dynamics of the ancient Central Andes of Peru and Bolivia, one of the few “cradles of civilization” of humanity, and of South America more generally. I will highlight in particular the role of linguistic interaction and contact, and of the resulting typological distributions, in understanding these dynamics.
I will start out from the present-day linguistic landscape of the Central Andes, which is strongly dominated by the Quechuan and Aymaran families, whose common contact-induced typological profile has long shaped ideas of what Andean languages are like. I will then broaden the scope and explore how new analyses of the available materials for the now extinct languages of the Central Andes bring to light a now submerged interaction sphere in Northern Peru, and how this north-south structure is congruent with archaeological and molecular anthropological evidence, allowing for new ways of interdisciplinary dialogue beyond language expansions. Finally, I will broaden the scope again and show how recent work in the areal typology of broader parts of the Andes and South America articulates with these new findings. This work suggests a finely spatially structured gradient of typological variation in the Andes into which the new evidence from the Central Andes fits seamlessly. The proper interpretation of this gradient is not yet clear, though one possibility is that it reflects an ancient layer of affinities between the languages of the region.
Reported Speech Constructions without Regular Quotation Meaning: A Crosslinguistic Analysis
Daniela Casartelli (University of Helsinki)
Contact-induced innovations at the interface: the case of subject pronouns in heritage languages
Alberto Frasson (Utrecht University)
The Kurdish Imperfective: Diachronic, Typological, and pan-Iranian Perspectives
Shuan Karim (The Ohio State University)
Differential Object Marking in Ossetic: A corpus-based analysis
Emine Şahingöz (Goethe University Frankfurt)
Chepang’s direct-inverse system in need of historical, epistemic, and pragmatic explanations
Marie-Caroline Pons (University of Oregon)
Looking for areal patterns in the domain of discourse formulae: The case of blessings and curses in Daghestan
Pavel Astafiev, Nikita Beklemishev, Nina Dobrushina & Alina Russkikh (in alphabetical order)
It is a well-known fact that certain discourse markers, such as interjections, greeting and leave-taking formulae, vocatives, or politeness markers, are often borrowed (see Andersen 2014 for some references). The claim is primarily based on data on material borrowing, such as English OK or Russian davaj, but there is also some, albeit scarce, evidence of pattern borrowing in this domain. Studies mention the similarity of greetings in some areas (Matisoff 2011 for South-East Asia, Lüpke & Watson 2020 for West Africa) or of good-night expressions in languages in contact (e.g. ‘May God wake us up’ in Ewe and Likpe; Ameka 2006).
There are few systematic studies of the spread of discourse patterns across certain areas, such as word iteration in the Mediterranean (Stolz 2004) or morning greetings in Daghestan (Naccarato & Verhees 2021). Areal comparison of some types of formulae can also be investigated in anthropology; for example, some formulae are included in the world-wide database of folklore and mythological motifs (Berezkin & Duvakin, http://www.ruthenia.ru/folklore/berezkin/). Many questions remain unanswered: are discourse formulae more diffusible than grammar? Do the areal distributions of formulae correspond to the areal distributions of linguistic features of other levels? How strong is the genealogical signal in the distribution of discourse formulae? What is their role in the transfer of various grammatical phenomena?
In this talk, we will approach this issue from the perspective of wish expressions, i.e. blessings and curses, in Daghestan. We will present the database of wish expressions in nine languages of Daghestan. One of the problems with the cross-linguistic comparison of blessings and curses is that it is not fully clear what the grounds for such a comparison are, i.e. which wishes in one language should be mapped onto which wishes in other languages. We will discuss this problem and our first attempts to process the sets of formulae in these nine languages in order to detect an areal signal.
Ameka, Felix K. 2006. Grammars in contact in the Volta Basin (West Africa): On contact induced grammatical change in Likpe. In Alexandra Y. Aikhenvald & R. M. W. Dixon (eds.), Grammars in contact: A crosslinguistic typology, 114–142. Oxford: Oxford University Press.
Andersen, G. 2014. Pragmatic borrowing. Journal of Pragmatics 67, 17–33.
Lüpke, Friederike & Rachel Watson. 2020. Language contact in West Africa. In Evangelia Adamou & Yaron Matras (eds.), The Routledge handbook of language contact.
Matisoff, James A. 2011. Areal semantics: Is there such a thing? In A. Saxena (ed.), Himalayan languages: Past and present (Vol. 149). Walter de Gruyter.
Naccarato, Chiara & Samira Verhees. 2021. Dobroe utro, prosnulis'? Utrennie privetstvija v jazykah Dagestana. In Timur A. Majsak, Nina R. Sumbatova & Jakov G. Testelec (eds.), Durħasi hazna. Sbornik statej k 60-letiju R. O. Mutalova. Moskva: Buki Vedi.
Detecting regional areal patterning across multiple linguistic features. A discussion
Chiara Naccarato, Ezequiel Koile, Michael Daniel, Nina Dobrushina, Samira Verhees (Linguistic Convergence Laboratory)
Aleksey Vinyar, Alexandra Nogina, Daria Ignatenko, Tatiana Kazakova, Alexey Baklanov, Ksenia Lapshina (Arctic Lab)
Systematic study of areal convergence relies on comparable data on the typological profiles of the languages of the area under consideration. But when planning an analysis of areal patterning within a certain area, an expert on this area runs the risk of skewing the results by (unconsciously) selecting features that she knows well or that are more readily available to her. Such a selection may shape the area in a specific way, while a different set of features may result in a different partitioning of the same area. In this seminar, we are going to discuss this problem in connection with ongoing projects on the areal typology of the languages of Russia.
In the first part, we will briefly recap the Typological Atlas of the Languages of Daghestan, a project that systematizes data on the typological diversity of the Daghestanian languages and their relatives and neighbors, outlining its general approach to data collection and giving examples of features. In the second part, our colleagues from the Arctic Lab will present similar data from their research on the linguistic diversity of Northeast Siberia (from the Samoyedic branch of Uralic in the west, through Turkic, Mongolic and Tungusic, to Yukaghir, Nivkh, Chukotko-Kamchatkan and Aleut-Yupik-Inuit in the east). Finally, in the third part, we will discuss methodological issues and current approaches to defining linguistic areas from a global perspective.
The seminar will have a somewhat unusual format: it will probably be more interactive than usual, is primarily intended for sharing experience and ideas, and includes contributions from different research groups. We also welcome comments from other participants who are embarking on comparable enterprises and encountering similar challenges.
Language contact between Mano and Kpelle: a holistic research program
Maria Khachaturyan (University of Helsinki)
This talk presents an ongoing project on multilingualism and language contact between Mano and Kpelle, two Mande languages spoken in the South-East of Guinea. In the first part of the talk, I provide an overview of the project and its different strands, including 1) an investigation of the sociolinguistic situation of the region with a particular focus on strategies of language choice studied with ethnographic observations and with a sociolinguistic questionnaire (Khachaturyan and Konoshenko 2021); 2) a comparative study of the grammars of these two languages and their close linguistic relatives and identification of convergent and divergent features, including those potentially related to pattern (Konoshenko 2015: 176‑177) and matter borrowing (Khachaturyan 2019); 3) a study of translation, and especially religious translation, translatory artefacts and practice of translation, as a locus of contact and source of convergence (Khachaturyan 2020, Khachaturyan and Konoshenko in prep.). In the second part of the talk, I present an experimental study focusing on the acquisition of a particular morphosyntactic parameter of Mano, namely, reflexive marking, and the impact of the speakers' exposure to Kpelle on the acquisition process. I present the experimental design, preliminary results and theoretical questions which the study aims to address.
Adjectival agreement in the East Caucasian languages: an overview
Few sources deal with the origin of number agreement in the languages of the world. Apart from theoretical work (Lehmann 1982), only a few case studies have been published, among them Frajzyngier (1997), Cruz (2015), and Di Garbo (2020).
The East Caucasian (EC) languages are numerous, and adjectival number agreement in them seems to be morphologically and diachronically heterogeneous, which suggests that it is an innovation. This makes the EC languages well suited for investigating the origin of number agreement. However, no study of this kind has been undertaken yet.
The goal of this study is to survey the patterns of number agreement in EC, to assess the weight of genealogical and areal factors in their distribution, and then to describe the paths by which adjectival plural agreement may have originated in the EC languages.
I use the methodology adopted in the Typological Atlas of the Languages of Daghestan. So far, I have searched grammars of the EC languages and neighbouring languages (65 idioms overall) to find out how adjectival plural agreement is expressed in each of them. I divided the languages into three types: obligatory, optional, and absent plural agreement. For the optional type, I established the factors that influence the presence of number agreement, and I plotted these types on maps.
Seminar schedule 2021
Postpositions in East-Caucasian languages: a description and comparative perspective
Polina Nasledskova & Tatiana Philippova
Our study summarizes and analyzes the information about postpositions provided in the various grammatical descriptions of East-Caucasian languages. In our talk, we are going to briefly report on our findings and discuss the prospects. First, we are going to propose an overview chapter on postpositions and several features for maps in the Typological atlas of the languages of Daghestan (TALD) that we are currently working on. Second, we are going to present our conception of a paper about East-Caucasian postpositions from a typological perspective. In particular, we are going to show that despite the fact that a number of properties of East-Caucasian postpositions differ significantly from the typical properties of adpositions in general and Indo-European prepositions in particular, the difference is not due to their special status, but due to the fact that they usually do not serve the functions of primary adpositions in other languages. Rather, East-Caucasian postpositions are more similar to the secondary adpositions in other languages, while typical primary adpositions more often correspond to the East-Caucasian spatial suffixes rather than to postpositions. Finally, we are going to suggest that the notions of localization and directionality, widely used for the description of spatial forms in East-Caucasian languages, can help better describe the meanings and functions of primary adpositions in other languages (e.g. Russian), if applied as comparative concepts.
Detecting linguistic variation with geographic sampling
Ezequiel Koile & George Moroz
Geolectal variation is often present where one language is spoken across a vast geographic area, and it can be found in phonological, morphosyntactic, and lexical features. For practical reasons, it is not always possible to collect fieldwork data from every single location in order to capture the full pattern of variation, so we must select a subset of locations to survey that reflects the underlying distribution of linguistic features. We propose and test a method for sampling the locations where a language is spoken, finding the optimal places to include in a sample so as to obtain a distribution of typological features representative of the whole area. To this end, we cluster locations by their geographic distribution using different algorithms, such as k-means and hierarchical clustering, and define our sample of locations on the basis of this clustering. We test our methods against simulated data with different distributions of linguistic features on various spatial configurations, and also against real data from Circassian dialects (Northwest Caucasian). Our results show higher efficiency than random sampling, both for detecting variation and for estimating its magnitude, which makes our method useful to fieldworkers when designing their research.
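The abstract names k-means but gives no implementation details. As a minimal sketch of the general idea (cluster locations by geography, then survey one representative per cluster), the code below runs plain k-means on coordinates and picks, in each cluster, the location closest to the cluster centre. Several choices here are assumptions, not the authors' method: coordinates are treated as planar (only reasonable for a small area), initialization is deterministic farthest-point, and the function names and village coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def sq_dist(p: Point, q: Point) -> float:
    # Squared Euclidean distance on raw coordinates: a rough approximation
    # that is only reasonable for a geographically small area.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    # Deterministic farthest-point initialization: start from the first point,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each location joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its member locations.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


def sample_locations(points: List[Point], k: int) -> List[int]:
    """Indices of the locations to survey: the member nearest to each cluster centre."""
    labels, centroids = kmeans(points, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members, key=lambda i: sq_dist(points[i], centroids[c])))
    return sorted(chosen)


# Hypothetical villages forming two geographic groups.
villages = [(44.0, 40.0), (44.1, 40.1), (44.05, 40.05),
            (45.0, 42.0), (45.1, 42.1), (45.05, 42.05)]
print(sample_locations(villages, 2))  # [2, 5]
```

Picking an actual location rather than the centroid itself matters for fieldwork, since a centroid need not coincide with any place where the language is spoken.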
Field NLP and where to find it in the School of linguistics
A. Bonch-Osmolovskaya, E. Klyachko, S. Kosyak, L. Nesterenko, G. Moroz, O. Serikov, S. Toldova
In our talk we are going to present a new research area, which we call Field NLP. It combines several areas:
* application of natural language processing methods to low-resource languages;
* creation of tools for field linguists;
* obtaining low-resource language data from non-field sources: digitisation, social media parsing, etc.;
* digital preservation of low-resource language data;
* popularisation of low-resource language data among speakers.
Some of these domains are more developed and have more prominent results than others. We are going to highlight existing lacunae and make an overview of known tools for Field NLP and present our research in this field. We will cover automatic transliteration, segmentation, speech recognition, morphological glossing and others.
We believe that the only way to advance Field NLP is to work as a community. Hence we aim to find common ground with all scholars engaged in the study and documentation of minor languages. With this in mind, we will specifically address the problem of digital preservation of linguistic data.
Studying language contact using neighbour graphs: From consonant-inventory prediction to analysis of segment borrowability
Dmitry Nikolaev (University of Stuttgart)
The aim of this talk is to demonstrate the advantages of using geographical nearest-neighbour graphs for large-scale study of language contact. After discussing the motivation for using nearest-neighbour graphs in typological linguistics and briefly surveying the ways of constructing them, I will present two case studies. In the first one, I will show that a nearest-neighbour graph gives a flexible and efficient way of showing the importance of language contact for modelling the composition of segmental inventories of Eurasian languages; I will argue that at a certain time depth language contact becomes a better predictor of consonant-inventory structures than phylogenetics. In the second case study, I will show how SegBo, a recently presented dataset of borrowed phonemes, can be used to construct a world-wide graph of language contact and then will use this graph to model comparative borrowability of different phonemes.
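The abstract surveys several ways of constructing nearest-neighbour graphs without fixing one. As a minimal sketch of one common construction (an assumption, not necessarily the one used in the talk), each language is linked to its k geographically nearest languages by great-circle distance, and a feature can then be compared with its prevalence among a language's neighbours. The function names, language labels, and coordinates are hypothetical, and nothing here reproduces the SegBo dataset.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def knn_graph(coords: Dict[str, Coord], k: int) -> Dict[str, List[str]]:
    """Directed graph linking each language to its k geographically nearest languages."""
    graph = {}
    for name, loc in coords.items():
        others = sorted((n for n in coords if n != name),
                        key=lambda n: haversine_km(loc, coords[n]))
        graph[name] = others[:k]
    return graph


def neighbour_share(graph: Dict[str, List[str]], has_feature: Dict[str, bool], lang: str) -> float:
    """Proportion of a language's neighbours that have a given feature."""
    neighbours = graph[lang]
    return sum(has_feature[n] for n in neighbours) / len(neighbours)


# Hypothetical languages and coordinates.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 3.0), "D": (10.0, 10.0)}
print(knn_graph(coords, 1))  # {'A': ['B'], 'B': ['A'], 'C': ['B'], 'D': ['C']}
```

Using k neighbours rather than a fixed distance threshold keeps every language connected even in sparsely documented regions, which is one reason such graphs are convenient for large-scale samples.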
On the reconstruction of ancient language contacts in Northern Siberia
Valentin Gusev (Institute of Linguistics, RAS)
The languages of Northern Siberia share a number of interesting areal features, some of them typologically non-trivial. The talk will consider some of these features and show how the languages can be grouped into clusters on their basis (there may be several such groupings, depending on which features we consider), and which of these clusters reveal unexpected geographical parallels. At least some of these parallels very likely testify to ancient contacts.
Chaplinsky and other Yupik languages of Chukotka: sociolinguistic situation and a case study in grammar
In this talk I am going to present the results of my field trip to Chukotka in October 2021. First, I will show a small corpus of narratives and songs in Yupik languages which I collected during my fieldwork. Second, I will present sociolinguistic data: following Dobrushina (2013), I used the method of retrospective family interviews and gathered first-hand data on the language repertoires of Yupik people (namely, their knowledge of other Yupik languages, Chukchi, Russian and English), the history of Yupik-Chukchi relations, and the history of relations between speakers of Chaplinsky (Central Siberian) Yupik in Novoje Chaplino and on St. Lawrence Island (USA) (cf. Morgounova 2007). Third, I will describe constructions with wordforms containing the suffixes -st (CAUS), -sq (ASK), -nəχsiʁ (EXPECT) and -niq (SAY) in Chaplinsky Yupik. As has been previously noted for the reportative suffix -niq (SAY) (Vakhtin 2007: 109–115), wordforms with -niq can be analyzed as consisting of two predicates: the matrix predicate ‘say’ and the dependent predicate. I develop this analysis and argue that constructions with all four suffixes are instances of morphologically bound complementation (Maisak 2016, Panova 2020).
Dobrushina, N. (2013). How to study multilingualism of the past: Investigating traditional contact situations in Daghestan. Journal of Sociolinguistics, 17(3), 376–393.
Maisak, T. A. (2016). Morphological fusion without syntactic fusion: The case of the “verificative” in Agul. Linguistics, 54(4), 815–870.
Morgounova, D. (2007). Language, identities and ideologies of the past and present Chukotka. Études/Inuit/Studies.
Panova, A. B. (2020). Morfologicheski svyazannaya komplementatsiya v abazinskom yazyke. Voprosy Jazykoznanija, 4, 87–114.
Vakhtin, N. B. (2007). Morfologiya glagol'nogo slovoizmeneniya v yupikskikh (eskimosskikh) yazykakh. St. Petersburg: Nestor.
Comparing cross-language phonological profiles
This talk considers different strategies for comparing the phonological profiles of languages. This can be useful for comparing different related lects (dialectology), unrelated lects (phonological typology), different diachronic states of the same lects (historical linguistics), models for language acquisition/loss, some NLP tasks, etc. I discuss two different strategies for comparing phonological profiles: the complexity-based approach and the distance-based approach. In the first approach, researchers propose different ways of calculating phonological complexity (Nichols 2009; Maddieson 2009; Coupé et al. 2009), which can be used in cross-language comparison (see criticism of this approach in (Simpson 1999; Deutscher 2009; Ohala 2009)). In the second approach, scholars apply different measures for calculating the distance between languages based on phonology (Heeringa 2004; Eden 2018; Anderson et al. 2021). There are two methods used in the distance measurement literature:
* parametric approach: different feature sets (segment inventory, feature inventory, typological phonological features like stress and syllable structure) are used for distance calculation;
* cross-entropy approach: cross-entropy is computed over samples of language data (a corpus or a dictionary) and used as a measure of distance.
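The two approaches are named above without committing to specific formulas. As a rough sketch (the concrete measures are assumptions, not those of the cited works), the parametric approach can be illustrated with a Jaccard distance over segment inventories, and the cross-entropy approach with the cross-entropy of one segment sample under an add-one smoothed unigram model of another. The function names are hypothetical; segments are passed as lists so that multi-character segments such as "tʃ" stay intact.

```python
import math
from collections import Counter
from typing import Sequence, Set


def jaccard_distance(inv_a: Set[str], inv_b: Set[str]) -> float:
    """Parametric-style distance: 1 minus the share of segments two inventories have in common."""
    union = inv_a | inv_b
    if not union:
        return 0.0
    return 1 - len(inv_a & inv_b) / len(union)


def cross_entropy(sample: Sequence[str], model_sample: Sequence[str], alpha: float = 1.0) -> float:
    """Bits per segment of `sample` under an add-alpha smoothed unigram model
    estimated from `model_sample`. Lower values mean the sample is better predicted."""
    counts = Counter(model_sample)
    vocab = set(counts) | set(sample)
    total = sum(counts.values()) + alpha * len(vocab)
    bits = sum(-math.log2((counts[s] + alpha) / total) for s in sample)
    return bits / len(sample)


print(jaccard_distance({"p", "t", "k", "q"}, {"p", "t", "k"}))  # 0.25
print(cross_entropy(["a", "b"], ["a", "b"]))  # 1.0
```

Cross-entropy is asymmetric (sample A under a model of B differs from B under a model of A); if a symmetric distance is needed, one option is to average both directions.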
Anderson, C., Tresoldi, T., Greenhill, S. J., Forkel, R., Gray, R. D., and List, J.-M. (2021). Measuring variation in phoneme inventories (preprint v1). Research Square .
Coupé, C., Marsico, E., and Pellegrino, F. (2009). Structural complexity of phonological systems. In Approaches to phonological complexity, pages 141–170. De Gruyter Mouton.
Deutscher, G. (2009). "Overall complexity": a wild goose chase? In Language complexity as an evolving variable, pages 243–252. Oxford University Press.
Eden, S. E. (2018). Measuring phonological distance between languages. PhD thesis, University College London.
Heeringa, W. J. (2004). Measuring dialect pronunciation differences using Levenshtein distance. PhD thesis, University Library Groningen.
Maddieson, I. (2009). Calculating phonological complexity. In Approaches to phonological complexity, pages 83–110. De Gruyter Mouton.
Nichols, J. (2009). Linguistic complexity: a comprehensive definition and survey. In Language complexity as an evolving variable, pages 110–125. Oxford University Press.
Ohala, J. J. (2009). Languages’ sound inventories: the devil in the details. In Approaches to phonological complexity, pages 47–58. De Gruyter Mouton.
Simpson, A. P. (1999). Fundamental problems in comparative phonetics and phonology: does UPSID help to solve them? In Proceedings of the 14th international congress of phonetic sciences, volume 1, pages 349–352.
Avoiding bias in comparative creole studies: Stratification by lexifier and substrate
Susanne Michaelis (Max Planck Institute for Evolutionary Anthropology)
One major research question in creole studies has been whether the social/diachronic circumstances of creolization are unique and, if so, whether this uniqueness also leads to unique structural changes, reflected in a unique structural profile. Some creolists have claimed that the answer to both questions is yes, e.g. Bickerton (1981), McWhorter (2001), and more recently Peter Bakker and Ayméric Daval-Markussen. But these authors have generally overlooked the fact that cross-creole generalizations require representative sampling, especially in quantitative work. Sampling for genealogical and areal control has been much discussed in world-wide typology, but not yet in comparative creolistics. In all available comparative creole studies, European-based Atlantic creoles are strongly overrepresented, so that typical features of these languages, e.g. serial verbs, double-object constructions, or obligatory overt pronominal subjects, are taken as “pan-creole” features. But many of these Atlantic creoles share the same genealogical/areal profile, i.e. European (lexifier) + Macro-Sudan (substrate). I therefore propose a new sampling method that controls for the genealogical/areal relatedness of both the substrate and the lexifier, which I call “bi-clan” control (where “clan” is a cover term for linguistic families and convergence areas).
Assessing inter-speaker variation in contact-influenced Russian
In this talk, I will deal with the Russian speech of older speakers of Nanai and Ulcha (Southern Tungusic, Amur region). Inter-speaker variation is considerable: some bilingual Nanais and Ulchas speak a “near-pidgin” variety of Russian, while the speech of others does not differ greatly from the monolingual benchmark. The data used in the study come from the Corpus of contact-influenced Russian of Northern Siberia and the Russian Far East (http://web-corpora.net/
On the one hand, I will show:
* which contact-induced features appear to be more stable, i.e. equally represented in texts produced by different speakers, and which ones contribute most to inter-speaker variation;
* which features behave similarly, i.e. are equally frequent or infrequent in texts produced by the same speakers.
On the other hand, I will discuss:
* how speakers group together according to the contact-induced features typical of them;
* whether these clusters of speakers correlate with any sociolinguistic parameters;
* whether they agree with the researcher’s intuition or look surprising.
An additional motivation for this study is methodological: I will test how precisely the existing corpus annotation captures the degree of deviation from the monolingual benchmark and inter-speaker variation.
Computational processing of Bagvalal morphology: problems and future tasks
Daniil Ignatiev, Nick Howell, George Moroz
Bagvalal is a minority language of the Nakh-Daghestanian language family. Like many indigenous languages, Bagvalal lacks tools for computational processing of language data. While field researchers have accumulated a relatively large amount of linguistic data in documentation projects, it is still insufficient for statistical approaches to text processing to be applied. The talk discusses a rule-based technology for text processing that was successfully used to design a prototype morphological glosser for the Kwanada dialect of Bagvalal. Lack or insufficiency of certain types of lexical and grammatical data, to be discussed in the talk, complicates further tuning of the instrument as well as its application to other Bagvalal dialects. However, further work on the analyzer could facilitate fieldwork and make it possible to design a machine translation system for Bagvalal.
Several aspects of numeral morphology in the languages of Dagestan
Maxim Melenchenko & Aigul Zakirova
In this talk we will demonstrate new maps for the Typological Atlas of the languages of Dagestan, covering several topics of numeral morphology in the East Caucasian languages. We examine numeral markers appearing in different series (cardinals, ordinals, distributives, etc.) and elaborate on their diachronic sources. We also address differences in the structure of complex numerals, e.g. the inventories of linking suffixes and the repetition of cardinal markers inside complex numerals. Finally, we will discuss several instances of borrowing in numeral systems, including lexical and morphological borrowings.
Exploring socio-spatial networks and individual-based variation in the study of small-scale multilingualism
Pierpaolo di Carlo, Jeff Good (University at Buffalo)
This talk presents the initial results of research by a number of members of the KPAAM-CAM multidisciplinary team (including linguists, sociolinguists, anthropologists, and geographers) aiming to explore multiple methods and datasets in the study of small-scale multilingualism. The testbed is Lower Fungom, a rural area of western Cameroon where small-scale multilingualism has been widely documented.
In the talk, we will present (i) epistemological issues posed by contexts of small-scale multilingualism and the methodological responses we have put in place to address them, mainly concerning the need to explore individual-based variation; (ii) initial findings from the study of individual-based wordlists by applying tools originally designed for cognate detection for historical linguistic purposes to questions of synchronic variation, and (iii) the correlations that such lexicostatistical data have with geographic distance vs. travel difficulty between locales associated with distinct languages.
Geography and language divergence: the case of Andic languages
Ezequiel Koile, Ilya Chechuro, George Moroz, Michael Daniel
We study the correlation between phylogenetic and geographic distances for the languages of the Andic branch of the East Caucasian (Nakh-Daghestanian) language family. For several alternative phylogenies, we find that geographic distances correlate with linguistic divergence. Notably, qualitative classifications fit the geography better than cognacy-based phylogenies. We suggest that this better fit may be due to an implicit geographic bias in qualitative classifications, and conclude that approaches to classification other than cognacy-based ones run the risk of implicitly including geography and geography-related factors as a basis for genealogical classification.
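The abstract does not state which correlation statistic was used. One standard way to test whether two distance matrices (for instance, geographic and phylogenetic distances between lects) are correlated is a Mantel test: the observed Pearson correlation between the matrices is compared with correlations obtained after randomly relabelling the lects of one matrix. The sketch below is a swapped-in illustration under stated assumptions (invented coordinates, a contrived linguistic distance matrix, hypothetical function names), not the authors' procedure.

```python
import math
import random
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mantel(dist_a, dist_b, n_perm: int = 999, seed: int = 0) -> tuple[float, float]:
    """Mantel test on two symmetric distance matrices (lists of lists).

    Returns the observed correlation of the upper triangles and a one-sided
    permutation p-value for a positive association.
    """
    n = len(dist_a)
    pairs = list(combinations(range(n), 2))
    a = [dist_a[i][j] for i, j in pairs]
    r_obs = pearson(a, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    perm = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        # Relabel the lects of matrix B: rows and columns are permuted together.
        b = [dist_b[perm[i]][perm[j]] for i, j in pairs]
        if pearson(a, b) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Invented toy data: map coordinates of six hypothetical lects.
coords = [(0, 0), (1, 0), (0, 2), (3, 1), (4, 4), (1, 5)]
geo = [[math.dist(p, q) for q in coords] for p in coords]
# Contrived linguistic distances: exactly half the geographic ones, so r = 1.
ling = [[0.5 * d for d in row] for row in geo]

r, p = mantel(geo, ling)
print(f"r = {r:.3f}, p = {p:.3f}")
```

The entries of a distance matrix are not independent observations, which is why the p-value here comes from permutations rather than from a standard significance test of Pearson's r.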
Update: Typological Atlas of the Languages of Daghestan
George Moroz, Timofey Mukhin, Chiara Naccarato and Samira Verhees
In this talk we introduce the recent updates to the Typological Atlas of the Languages of Daghestan, which include new topics and new visualizations. We would also like to use this opportunity to discuss how to turn the atlas into a resource whose chapters and data are easy to use, cite and find on the one hand, and easy to edit and update on the other.
During the talk we will also discuss a new phonological database of East Caucasian languages and the patterns it reveals. We will discuss the distribution of the following phonological features: inventory size, gemination, labialisation, laterals, nasal vowels and long vowels, and briefly discuss the correlation between elevation and inventory size (apologies to those of you who have already seen this at the SLE conference).
Postpositions in Nakh-Daghestanian
Polina Nasledskova, Tatiana Philippova
In this brief talk we will report on our ongoing project devoted to a general description of postpositional systems in the Nakh-Daghestanian languages. In particular, we shall talk about their case government properties and the ability to function as adverbs. At the end we will present ideas concerning our prospective contribution to the Typological Atlas of the languages of Daghestan.
Comparative Andic dictionary database: history of creation
Over the last two years, we have worked together with Arseniy Averin, Anastasia Davidenko, Ilya Sadakov, Zlata Shkutko, Grigory Kuznetsov, Anna Tsysova and Wanshu Zhang on the digitisation of the Andic dictionaries. While compiling the database, we also worked on several subprojects on comparative phonology, colexification and the morphology of plural noun forms. During the talk I would like to present the database and briefly discuss some preliminary results of this research.
Towards a typology of continuative expressions
This study investigates how continuative semantics is encoded cross-linguistically. The work is based on two independent language samples: a sample with global coverage and an intragenealogical sample of four Northwest Caucasian (Abkhaz-Adyge) languages. The cross-linguistic sample is genealogically and geographically balanced and includes 120 languages. Means that convey continuative semantics (continuative expressions) are analyzed according to the following parameters: morphosyntactic type (affix, auxiliary, adverbial phrase), degree of grammaticalization, tense-aspect-actionality restrictions on the predicate, non-continuative uses of the continuative expressions, and semantic effects when combined with negation. The data come mainly from secondary sources (grammatical descriptions and dictionaries) and parallel texts. The second part of the study focuses on the intragenealogical typology of continuative expressions in the following Northwest Caucasian languages: Abaza, Abkhaz, Kabardian and West Circassian (Adyghe). The main sources for the study of continuative expressions in Northwest Caucasian are elicited data and parallel texts. Based on the results of the macro-typological and intragenealogical studies and their comparison, I suggest that two typological clusters, or profiles, of continuative expressions can be distinguished (predicative and adverbial), and that continuative expressions belonging to different classes show different degrees of diachronic stability.
Solving the puzzle of the Ob-Ugric passive
Nikita Muravyev & Daria Zhornik
In this talk, we look at the active/passive voice alternation in two Ob-Ugric languages of Western Siberia, Northern Khanty and Northern Mansi. This alternation has been described in the literature as primarily motivated by information structure: a sentence is in the active voice whenever the Agent is the primary topic of the sentence; otherwise, the passive is used (Kulonen 1989, Nikolaeva 2001). However, recent text and elicitation data suggest that a purely information-structural approach has a number of shortcomings. First, the passive can be used if the Agent is topical yet low in animacy and/or definiteness. Second, focused Agents are allowed in special kinds of active sentences, e.g. in interrogative contexts. Moreover, passivization is possible with a great variety of intransitive verbs with no Agent role whatsoever, including stative verbs and verbs denoting spontaneous change of state. Intransitive verbs can also be passivized in adversative contexts, in which some discourse participant external to the event is affected in some way. These facts pose a problem both for the information-structural approach mentioned above and for existing typological accounts of the active/passive alternation. We will discuss these facts in detail, compare the situation in Khanty and Mansi, and present a model that helps to solve the Ob-Ugric puzzle, at least partially.
Kulonen, U.-M. (1989). The Passive in Ob-Ugrian. Helsinki: Finno-Ugrian Society.
Nikolaeva, I. (2001). Secondary topic as a relation in information structure. Linguistics 39(1), 1–50.
Looking for areal convergence in nominal gender assignment in East Caucasian
Ilya Chechuro & Michael Daniel
In this talk, we investigate whether the data on nominal gender assignment in East Caucasian (more specifically, Lezgic) languages show any evidence of areal convergence. To do so, we consider those Lezgic languages and their immediate neighbours that have four-gender systems. We compare Lezgic languages including Budukh, Kryz, Rutul, Tsakhur and Archi with Lak, Archi’s immediate neighbour, and with Khinalug, the immediate neighbour of Kryz and Budukh. In all these languages, Genders 3 and 4 are semantically heterogeneous, so shared assignment may be due to (a) common inheritance, (b) areal convergence, or (c) pure chance. A quantitative analysis of gender assignment across the lexicon documented in Kibrik and Kodzasov (1990) suggests that Archi is more similar to its neighbour Lak than to any of its Lezgic relatives. No such result was obtained when comparing Khinalug with its Lezgic neighbours Budukh and Kryz. We will discuss various methodological refinements we attempted in order to disentangle the genealogical and areal signals and to separate both of them from the effect of crude semantics. These attempts deliberately relied on data external to East Caucasian (the World Loanword Database; WordNet) but have so far been unsuccessful, so we will ask for your ideas on how to improve our methodology.
On tense, aspect, and evidentiality in Karata (East Caucasian, Karata village variety)
Jérémy Pasquereau (University of Poitiers)
Like other East Caucasian languages, Karata has elaborate verbal paradigms, in particular because of the high number of analytic constructions it uses. On the basis of a corpus of more than 40 texts and Dahl’s (1985) TAM questionnaire, and building on previous work (Magomedbekova 1971, 1998, Magomedova & Xalidova 2001, Xalidova 2019), I present ongoing work that aims to describe the morphosyntax and meanings of verbal forms in this language.
Karabagly: an Armenian village in Dagestan
In this talk I will report on my two-day visit to Karabagly, a village in northern Dagestan (Tarumovsky district) that was originally mono-ethnic Armenian and presently still has a majority Armenian population. I will discuss some preliminary observations on the preservation of Armenian language and culture in the village, and the relationship of the Armenians with other local people as well as their historical homeland Armenia.
In this talk, we will briefly present the results of a sociolinguistic field study carried out in five adjacent Lak and Tsudakhar villages. We will focus on Tsudakhar–Lak bilingualism and ethnic contacts, whose main site is the Tsudakhar Monday market. We will discuss our attempt to observe communication at the Tsudakhar market, with a brief reference to other markets of highland Daghestan. We will also mention Tsudakhar–Avar contact in the village of Karekadani.
Grammatical co-expression patterns in creoles and their parent languages: comitative and related functions
Susanne Maria Michaelis (MPI-EVA, Leipzig)
In this talk, I will report on an ongoing project on grammatical coexpression patterns (or polysemy patterns) in creole languages and their parent languages, as illustrated in examples (1)–(4). The Seychelles Creole polysemous marker (av)ek ‘with, and, by’ (< French avec ‘with’) expresses four different grammatical functions: comitative (1), instrumental (2), passive agent (3), and noun phrase conjunction (4).
(1) comitative
Mon 'n travay avek Sye Raim.
1SG PRF work COM Mr Rahim
‘I have worked with Mr Rahim.’ (Bollée & Rosalie 1994:14f.)
(2) instrumental
Nou fer servolan nou file ek difil.
1PL make kite 1PL let.glide with thread
‘We made a kite and let it glide with a thread.’ (Michaelis 1994:66)
(3) passive agent
Mon’n ganny morde ek lisjen
1SG.PRF PASS bite PASS.AGENT dog
‘I have been bitten by a dog.’ (Michaelis & Rosalie 2000:82)
(4) noun phrase conjunction
Mari ek Pyer
‘Mary and Peter’
When the specific coexpression pattern of Seychelles Creole (av)ek is compared with the patterns in its parent languages, it becomes clear that in French, the lexifier language, the marker avec ‘with’ covers only a subset of the meanings that Seychelles Creole (av)ek covers, namely comitative and instrumental. By contrast, the passive agent is expressed by par ‘by’ in French, and noun phrase conjunction by the coordinator et ‘and’. However, Makhuwa and other neighboring Bantu languages of East Africa (the most important cluster of substrate languages for Seychelles Creole) show the same coexpression pattern as Seychelles Creole. Here, the marker ni (van der Wal 2009:113) covers all four grammatical meanings found in Seychelles Creole: comitative, instrumental, passive agent, and noun phrase conjunction.
The hypothesis of the paper goes beyond Seychelles Creole: it extends potentially to all creole languages. I suggest that grammatical coexpression patterns in creoles are not randomly distributed but systematically reflect the grammatical coexpression patterns of their substrate languages, and much less so those of their lexifier languages. Here I investigate 10 creole languages from around the world (genealogically maximally distinct) and their parent languages, focusing on the grammatical markers expressing comitative, instrumental, and noun phrase conjunction (and related meanings).
Recent literature (e.g. Baptista 2020) suggests that “convergence” of functions (and possibly forms) of the parent languages is a major driving force in shaping creole grammars. Indeed, at first glance the coexpression pattern of a grammatical marker ‘with’ in a creole language seems to mirror overlapping, convergent grammatical meanings of its lexifier and its substrate language(s). But a closer look at the coexpression patterns of similar ‘with’-markers in genealogically different creoles and their parent languages reveals that it is the coexpression patterns of the substrates that tend to be imposed on the nascent creoles, irrespective of the degree to which the lexifier patterns converge with those of the substrates and/or the creole. Thus, comitative, instrumental, passive agent and NP conjunction are shared by Makhuwa and Seychelles Creole, whereas French converges with both Makhuwa and Seychelles Creole only in comitative and instrumental.
Baptista, Marlyse. 2020. Competition, selection, and the role of congruence in creole genesis and development. Language 96(1). 160–199.
Bollée, Annegret and Rosalie, Marcel. 1994. Parol ek memwar. Récits de vie des Seychelles. Hamburg: Buske.
Michaelis, Susanne. 1994. Komplexe Syntax im Seychellen-Kreol: Verknüpfung von Sachverhaltsdarstellungen zwischen Mündlichkeit und Schriftlichkeit. Tübingen: Narr.
Michaelis, Susanne and Rosalie, Marcel. 2000. Polysémie et cartes sémantiques: Le relateur (av)ek en créole seychellois. Études Créoles 23. 79–100.
van der Wal, Jenneke. 2009. Word order and information structure in Makhuwa-Enahara. Utrecht: Netherlands Graduate School of Linguistics.
From noun plural to plural agreement: evidence from Andi dialects (and beyond)
Noun plural markers sometimes grammaticalize into markers of plural agreement on various targets, e.g. Turkic -lar (Erdal 2004: 231 for Old Turkic; Matasović 2018 for Karaim); similar developments can be postulated for Adyghe -xe (Lander et al. forthc.) and Nivkh -ɣun (Gruzdeva forthc.).
The grammaticalization of noun plural markers into plural agreement markers on other types of targets has not, to my knowledge, been dealt with in the typological literature. One way to fill this gap is to describe scenarios of such evolution in particular languages and language groupings.
Andi (Avar-Andic < Avar-Ando-Tsez < East Caucasian) presents an interesting case of grammaticalization of a plural marker -(V)l into a number agreement marker. I will address the question of how this mechanism of number agreement might have evolved. -(V)l is most probably the reflex of *li, one of the reconstructed Proto-Andic plural markers (Alexeyev 1988: 92–93). Whereas related Andic languages have an extensive list of plural markers that have little in common, in most Andi dialects -(V)l was generalized as the nominal plural marker. The next step was the extension of -(V)l onto other word forms, i.e. targets of agreement, both inside the NP and on verbal forms and adverbs. The behavior of -(V)l on different types of targets will be considered in order to arrive at a plausible scenario.
In a more descriptive vein, I will compare the -(V)l-agreement to the more "canonical" gender agreement, also present in Andi.
Finally, I will briefly consider examples of similar developments in the related East Caucasian languages.
Squaring the circle in the Caucasus: Perspectives on Sprachbünde and Language Contact
Thomas Wier (Free University of Tbilisi)
Linguists have long noted both the exceptional internal diversity of the Caucasus and the fact that many features of its languages are not found in immediately adjacent regions of Eurasia. Over the last two centuries, the question has thus arisen more than once: to what extent do these unusual features arise from language contact, and to what extent can they be explained by other (phylogenetic, typological, or indeed statistically random) factors? In this lecture I will review three different sets of answers that have been proposed: Klimov (1965, 1973), Tuite (1998) and Chirikba (2008). After reviewing these arguments, I will suggest that while autochthonous Caucasian languages do share a large number of phonological and morphosyntactic traits, qualitative similarities are more probative in answering the question of whether the region constitutes a true Sprachbund, and that a better approach might be to distinguish micro- and macro-Sprachbünde.
Contact influences on Ossetic: A general overview
In many ways, Ossetic has a unique status among the languages of the Caucasus. Belonging to the Iranian branch of the Indo-European language family, Ossetic is the last living representative of the Sarmatian varieties once widely spoken in the northern Black Sea region. Having long developed in isolation from other Iranian languages, Ossetic has, on the one hand, preserved a number of archaic features; on the other hand, it has developed unique innovations, some of which may be explained by language contact. Although the Ossetic lexicon is mainly of Iranian origin, it has a comparatively large share of loanwords from neighbouring languages, many of them in the basic lexicon. In phonology, a key contact-induced feature is the presence of ejective consonants, mainly in Caucasian loanwords. Some grammatical features of Ossetic (word order, case system, structure of complex clauses) may also be contact-induced. The Ossetic data are therefore valuable both for the typology of language contact and for the study of early contacts between Ossetians/Alans and other ethnolinguistic groups. In the talk, I will provide a general overview and discussion of the lexical and grammatical features of Ossetic that may be contact-induced, and a preliminary analysis of which contact situations could have led to them.
Linguistic complexity across East Caucasian: from the eye of the beholder to corpus-based measures
Anastasia Panova & Michael Daniel
Measuring complexity in typology is deemed relevant to sociolinguistic approaches to language diversity, which connect the complexity of language structures to such diverse but correlated factors as the size of a language community, its relative isolation, L2 acquisition of the language, and the multilingualism of its speakers.
Yet on the empirical side, measuring complexity is difficult, not only because the measures are sometimes calibrated in what may seem an arbitrary way, but also, and no less importantly, because they depend on the analysis adopted in a given grammar. For example, Kibrik (1977) counts over a million synthetic verbal forms in Archi, but excluding the verificative and especially the quotative 'series' from inflectional morphology dramatically reduces this number. Similarly, measuring phonetic complexity based on the size of inventories may yield different results depending on the approach: the status of [x] in Archi (found only in Russian loans) is very different from its status in Rutul (native lexicon), and rare allophones (Archi [ɮ]) and variants (Mehweb [ɣ]) sometimes do and sometimes do not make their way into descriptive inventories. In theory, such decisions could influence the outcome of a quantitative comparison, and the size of their impact is not obvious.
One way to avoid this would be (i) to use shallow counts that minimize the impact of analytical decisions in language descriptions and (ii) to make counts in corpora rather than deriving them from descriptive grammars. In this talk, after a brief survey of existing corpus-based approaches to measuring language complexity, we discuss several experiments we carried out to measure morphological and phonetic complexity across unannotated corpora of the languages of Daghestan.
We (dual, exclusive) very much look forward to your feedback and suggestions on how to develop this approach further.
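The abstract does not specify which shallow counts were used. As an assumption for illustration only, the sketch below computes two measures that can be taken directly from raw, unannotated text: a type-token ratio averaged over fixed-size windows (a rough proxy for morphological richness) and the Shannon entropy of the character distribution (a rough proxy for phonetic complexity, assuming the orthography is close to phonemic). The tokenizer, function names and toy text are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"()"


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer that strips punctuation and lowercases.

    Real corpora would need orthography-specific rules (e.g. for apostrophes
    or special letters); this is only a placeholder.
    """
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def mean_segmental_ttr(tokens: list[str], window: int = 100) -> float:
    """Average type/token ratio over consecutive non-overlapping windows.

    Plain TTR falls as a text grows, so corpora of different sizes are only
    comparable when the ratio is computed over windows of a fixed size.
    Tokens left over after the last full window are ignored.
    """
    chunks = [tokens[i:i + window] for i in range(0, len(tokens) - window + 1, window)]
    if not chunks:
        raise ValueError("corpus shorter than one window")
    return sum(len(set(c)) / len(c) for c in chunks) / len(chunks)


def unigram_entropy(symbols: list[str]) -> float:
    """Shannon entropy (bits) of the symbol frequency distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


tokens = tokenize("Ab ba, ab ba! Ba ab.")
print(round(mean_segmental_ttr(tokens, window=3), 3))  # 0.667
print(unigram_entropy(list("".join(tokens))))  # 1.0 (six a's, six b's)
```

In a real comparison the window size would have to be the same for all corpora, and character entropy inherits every idiosyncrasy of the writing system, which is exactly the kind of analytical dependence the abstract warns about.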
Multilingualism as a genre-structuring strategy: the case of Kakabe traditional narratives
Various West African language communities use a specific type of code-switching that is limited to the genre of traditional narratives: songs that appear in such narratives regularly include passages in a language different from the principal language of the narration. This type of conventionalized multilingualism recurs across the languages of West Africa. However, so far it has never been the object of systematic investigation. In my presentation, I will analyze this type of multilingual practice on the basis of 70 Kakabe traditional narratives, investigating the specific mechanism of switching from one language to the other and its relation to the wider type of multilingualism found in this speech community.
Revisiting motion events in Basque
Manuel Padilla-Moyano (University of the Basque Country & Linguistic Convergence Laboratory, HSE)
Asymmetries in spatial relations have been described cross-linguistically [Stefanowitsch & Rohde 2004; Luraghi, Nikitina & Zanchi 2017; Kopecka & Vuillermet 2021]. Basque has a set of spatial cases, in which the ablative encodes Source and Path, and the allative conveys Goal. Additionally, there are both directional and terminative case-markers. In some dialects, this general tableau becomes more complicated, and historical records also provide additional complexity, such as an ancient dedicated perlative marker [Lafon 1948]. As Basque spatial cases can mark animacy, asymmetries in the encoding of motion events must also consider this parameter [Creissels & Mounole 2011; Krajewska 2021].
I will present an incipient study of the Source-Goal asymmetry, which will be part of a comprehensive study of the evolution of the Basque case system. Building on Zaika’s (2016) study, I will analyze the behavior of verbs of motion, putting and posture, as well as the case markers and non-grammaticalized postpositions they occur with. This work will take into account dialectal variation, diachronic factors, and the role of language contact. To this end, I will exploit existing corpora and other materials and collect new data through fieldwork with speakers of several dialects.
Creissels, Denis & Mounole, Céline (2011). Animacy and spatial cases: Typological tendencies, and the case of Basque. In Seppo Kittilä, Katja Västi & Jussi Ylikoski (Eds.), Case, Animacy and Semantic Roles (Typological Studies in Language 99), pp. 157–182. Amsterdam/Philadelphia: John Benjamins.
Kopecka, Anetta & Vuillermet, Marine (2021). Source-Goal (a)symmetries across languages. Studies in Language 45(1).
Krajewska, Dorota (2021). The marking of spatial relations on animate nouns in Basque: a diachronic quantitative corpus study [submitted to Journal of Historical Linguistics].
Lafon, René (1948). Sur les suffixes casuels -ti et -tik. Eusko Jakintza 2, 141–150.
Luraghi, Silvia; Nikitina, Tatiana & Zanchi, Chiara (Eds.) (2017). Space in Diachrony. Amsterdam/Philadelphia: John Benjamins.
Stefanowitsch, Anatol & Rohde, Ada (2004). The goal bias in the encoding of motion events.
Zaika, Natalia (2016). Вариативность падежных форм при глаголах движения в баскском языке в диахроническом и диалектном аспектах [Variation of case forms with verbs of motion in Basque: diachronic and dialectal aspects]. Acta Linguistica Petropolitana 12(1), 428–441.
Rare features in phonological typology
The talk will address theoretical aspects of existing and emerging accounts of rare features in phonological typology in general and in word-prosodic typology in particular. Rarities can be ignored by linguistic theory, reanalysed as regular, or incorporated by changing the theory. Phonological rara and rarissima used to be largely ignored or reanalysed, but the trend seems to be changing, with ever more data coming in from lesser-studied languages on the one hand, and a growing interest within linguistic typology in the geographic and evolutionary aspects of the cross-linguistic distribution of linguistic features on the other.
Modern South Arabian: archaism, innovation and contact in the Arabian peninsula
It has long been known that the Modern South Arabian subgroup of the Semitic language family, made up of six endangered languages spoken in Oman and Yemen, exhibits a set of characteristics regarded by Semitic scholars as archaic, such as large sound systems including lateral fricatives and affricates, and glottalised stops and affricates; productive subjunctive and conditional moods; and other characteristics that may be reminiscent of classical Semitic languages, such as reverse gender agreement between numerals and nouns, and the presence of second and third person feminine and dual pronouns.
However, certain other features of these languages have not been analysed in detail in the mainstream Semitic literature, and some have not been discussed at all: for example, the presence of a first person dual pronoun, and the apparently non-Semitic character of a sizeable part of the Modern South Arabian lexicon. Moreover, the unexplained relationship between the Modern South Arabian languages and a large number of undeciphered inscriptions, found mostly in caves and on rocks and boulders, calls for further study. These inscriptions employ a modified version of the South Semitic script and are found not only in the present-day range of Modern South Arabian, but also further north-east, in Oman proper.
This presentation aims at providing a general introduction to the Modern South Arabian languages, and highlighting the above-mentioned issues, as well as advancing some working hypotheses.
Form and function in morphological typology
The goal of linguistic typology is to understand the interactions of form and function in the languages of the world. Typically, investigations in this research paradigm take a certain functional domain (e.g., ‘causative/applicative’) as the starting point and subsequently analyze the formal means by which it is expressed. In this talk, though, I will argue that typology can also benefit from the opposite approach, that is, from focusing on a specific type of linguistic form (e.g., infixation) and analyzing which functions it encodes. The major advantage of the latter strategy is that linguistic forms are ultimately less variegated than linguistic functions, which facilitates comparison. This strategy also allows typologists to develop a more nuanced theory of morphology and to account for areal patterns that manifest themselves in the distribution of linguistic forms. To support these claims, I will draw on novel research on the suffixing preference.
A Functional Discourse Grammar typology of reflexives, with some notes on reciprocals
This chapter presents the first-ever Functional Discourse Grammar typology of reflexives and opens the way to a comparable typology of reciprocals. Its main finding is that the striking morphosyntactic diversity of reflexive markers can be reduced to only three basic classes, which differ with regard to the structure of the predication frame on which the construction is built. In Type I reflexives, the lexical predicate takes two coindexed arguments; Type II reflexives are based on a one-place frame in which the predicate bears a reflexive (or reflexive/reciprocal) operator; finally, Type III reflexives are characterized by the presence of a configurational predicate which takes both an external and an internal argument. All further differences are explained with reference to different ways of aligning the underlying pragmatic and semantic structures of each construction type, more specifically the number and information-structural status of referents at the Interpersonal Level and the number and structural position of verb arguments at the Representational Level. A further advantage of the proposed typology is that it accounts for possible differences in the lexical distribution of reflexive markers on the basis of the notion of partially instantiated predication frames, i.e. partially lexicalized constructional templates of the Representational Level.
The prefixal template of Umóⁿhoⁿ: case study of the “dative” prefix
Umóⁿhoⁿ (Siouan), a highly endangered Native American language spoken in the United States, has a very complex verbal morphology, in particular a series of arbitrarily ordered derivational and inflectional prefixes. After a brief introduction to the language, I will present the verb’s prefixal template and then focus on the prefix gí-, usually called “dative”. The case study of gí- covers several key issues of Umóⁿhoⁿ morphology: (1) shifts in the slot of person marking triggered by the presence of other prefixes; (2) multiple exponence of the dative and of person marking; (3) semantic demotivation and lexicalization of the prefixes. On this basis, I will show that the dative prefix exhibits both inflectional and derivational characteristics.
Towards the Nakh-Daghestanian Lexicon of Grammaticalization
What will be discussed in the talk is not a completed or even an ongoing project, but rather a general idea: creating a lexicon of grammaticalization for Nakh-Daghestanian languages. I will start with an overview of existing lexicons of grammaticalization (of which there are very few) and of how they can serve as a source of inspiration for the Nakh-Daghestanian Lexicon. I will then present sample entries of the future Lexicon and mention the choices and problems one faces when creating such a Lexicon. Comments and suggestions from the audience will be most welcome.
Adpositions and case: Categorial issues
This talk will address the issue of the categorial status of case markers and adpositions from a cross-linguistic perspective. I will present some major research questions arising in this respect, including the following:
Towards the corpus of Bagwalal dialects
Bagwalal is a small and underdescribed language of the Avar-Andic branch of the Nakh-Daghestanian family. After a general introduction to the language and the history of its study (Timur Maisak), we shall present the ongoing project on glossing Bagwalal dialectal texts (Aleksandra Trepalenko). The texts, first published in Gudava's (1971) grammar written in Georgian, represent all six villages where Bagwalal is spoken. We are going to present the results of our analysis of the texts (glossing, translation), mention the main dialectal differences, and describe some interesting features of Bagwalal and the problems we faced during our work.
On the typology of caritive constructions
In the talk I will present the project “Grammatical periphery in the languages of the world: a typological study of caritives”. Caritive (aka abessive) expresses the non-involvement of a participant in a situation, with the non-involvement predication semantically modifying the situation or a participant of a different situation, as in English Mary came without John / money. The project aims to study the means of expressing caritive meanings in the languages of the world. We developed a questionnaire and collected data from a representative sample of 100 languages. I am going to discuss the methodology of the project: the definition of caritive, the questionnaire, and the procedure for collecting data. The project is still in progress, but I will present some preliminary results.
Postposed -to in North Russian dialects through the lens of Finnic languages in contact
The use of demonstrative-derived morphemes in the head-following position is characteristic of North Russian dialects (-to and its variants -ta, -tu, -ti, -te, …) and eastern Finnic languages (-se [singular] and -ne [plural]), such as Olonets Karelian, Lude, and Veps. In terms of function, some previous studies regard these grammatical elements as definite articles, while other recent studies identify additional functions related to information structure and discourse.
We will give a short update on the Typological Atlas of Daghestan project, covering the progress we have made so far. We will discuss our plans to publish the resource and a (partially) new approach to data visualization.
Borrowed postpositions in East Caucasian
Grammar descriptions of East Caucasian languages include information about borrowed postpositions. I attempt to summarize the data on borrowed postpositions, both between branches of the East Caucasian family and into East Caucasian from languages of other families. I suggest contact origins for some postpositions whose diachrony is unclear from the sources. I will also provide an overview of the typology of borrowed postpositions. My ultimate aim is to correlate the borrowing of postpositions in East Caucasian with other contact-induced changes in the languages of the family. This presentation is a preview of a study, not a final analysis of the data.
Seminar schedule 2020
Using BivalTyp (www.bivaltyp.info) for measuring (dis)similarities between valency class systems
The goal of my presentation is two-fold. In the first part, I am going to introduce BivalTyp (www.bivaltyp.info), a typological database of bivalent verbs and their encoding frames. This database contains information on the ways in which 130 bivalent contextualized predicates (such as ‘be afraid’, ‘listen’, ‘touch’) are assigned to valency classes in 85 languages (mostly spoken in Eurasia). This part of the presentation will be user-oriented, i.e., I will focus on the ways data are processed, stored and visualized in the database.
Multifunctional non-finites in Northern Eurasia
In this talk, I am going to discuss patterns of multifunctionality that are characteristic of non-finite forms in 50 languages of Northern Eurasia. Specifically, non-finite forms are investigated in terms of the inventory of functions each of them can perform when heading a subordinate clause: reference function (complement clauses), adnominal modification (relative clauses), and adverbial modification (adverbial clauses). The primary questions I will address are the following: (a) What patterns of multifunctionality in non-finites are most common and how are they distributed geographically across Northern Eurasia? (b) Do patterns of multifunctionality differ depending on how prominent non-finite subordination is in a language? (c) Are there any recurrent patterns involving specific constructions, and if so, can we propose an explanation for their occurrence?
Good Practices for Linguistic Data
This talk is devoted to the practices that make linguistic data findable, accessible, interoperable, and reusable (FAIR). First, I will introduce some general guidelines for data structures, file formats, and data description. Then I will touch upon issues related to orthographic systems and discuss the problem of orthographic ambiguity. The first part will conclude with a discussion of Cross-Linguistic Data Formats (CLDF) and meta-databases such as Glottolog, CLLD and Concepticon. Together these tools form a framework that aims to facilitate data standardization and sustainable storage. The second part of the talk will deal with data sharing. I will propose several tools for increasing the reproducibility of code. I will also discuss version control with Git and academic licences. Finally, I will briefly introduce tools that are useful when submitting a paper: the Open Science Framework (OSF.io) and Zenodo.
Lexical systems with systematic gaps: verbs of falling
The paper presents the results of a project on the cross-linguistic analysis of FALLING verbs in more than 40 languages. The main possible oppositions and patterns of colexification in lexical systems are described within the framework of the Moscow Lexical Typology Group (Rakhilina, Reznikova 2016). Though in most languages this semantic field appears to be rich, our research did identify language systems without dedicated verbs of falling. We argue that these cases are neither accidental nor culture-specific, but can be seen as following from some fundamental semantic principles.
Data transformations for processing of interlinear glossed texts in SIL FLEx
How Daghestani translocal communities function in the context of internal migration within Russia
The talk analyses the present-day organisation and functioning of Daghestani rural communities whose members take part in internal migration within Russia (migration to the cities of Western Siberia is taken as an example). The theoretical lens adopted is provided by the concepts of transnationalism and translocality, which make it possible to view migrants and their social world without detaching them from their sending community, the jamaat. The orientation towards keeping rural locality as a priority when resettling outside the village and the Republic of Daghestan, together with the maintenance of translocal ties, gives rise to a new social organism, the multilocal community, and the talk will deal with the specific ways in which such communities function. The work is based on the author's field data collected in the cities of the Khanty-Mansi Autonomous Okrug and in the Republic of Daghestan in 2011–2019.
An ancient history bottleneck for linguistic diversity and its consequences for linguistic typology
In this presentation I will discuss the relation between linguistic diversity and basic units of human organization in pre-agricultural, nomadic and forager societies. On the basis of these patterns, I will discuss existing hypotheses on earlier stages of linguistic diversity (from the early Holocene until today), and I will provide evidence for a relatively brief period of massive linguistic diversity between 4 and 1 kybp. I will conclude by spelling out the practical consequences of this finding for typological and historical-linguistic generalizations.
Clitics li and chi in Rogovatoye and Spiridonova Buda dialects: functions and positional properties
In the dialects I will be talking about, the repertoire of function words such as conjunctions and particles differs considerably from that of standard Russian. For example, they have the clitic chi, which is considered to have the same functions as Russian li. My first research question is therefore how the functions are distributed between these quasi-synonymous clitics in the dialects that have both. Second, I want to talk about their positional properties depending on their function. There have been many studies of Slavic clitics in the standard languages, but none (as far as I know) has considered dialect data, and that is what I have tried to do.
Optatives in Nakh-Daghestanian and beyond
Inflectional optatives, i.e. dedicated forms expressing the speaker's wish, are typical across the Caucasus. In this talk, I give an overview of optatives in Nakh-Daghestanian languages and discuss their possible diachronic sources and grammaticalization paths. I also argue that contact is one reason for the areal spread of optatives and suggest their prominent role in everyday discourse as a possible reason for this spread.
The standard of comparison in the languages of Daghestan
In this talk I will present the results of my research on the standard of comparison in the languages of Daghestan, which was started as part of the DagAtlas project (the “Typological Atlas of the Languages of Daghestan”). In the languages of Daghestan, the standard of comparison is usually expressed by a spatial form, i.e. an inflected form of a nominal normally expressing a spatial relation. In this study, I classify the languages of Daghestan according to the type of spatial form used to mark the standard of comparison. Following the methodological approach of the DagAtlas project, I collected the data from the available literature and built maps to visualize the results. The results are discussed both in terms of frequency and distribution within the linguistic area under investigation and in comparison with broader typological investigations of comparative constructions (Stassen 1985, 2013), which include almost no reference to data from Daghestan. The latter comparison yields no surprises: the Daghestanian data fit the cross-linguistic picture quite well (with a general preference for elative markers). Within Daghestan, the overall picture seems somewhat fuzzy, and the distribution of values on the maps does not reveal any noteworthy areal or genealogical clustering. An exception is the Andic languages, which form a cluster based on the localization marker employed (forms in -č’-, indicating contact with some entity).
Inter-speaker variation in code-switching in the situation of language shift. The case of Nanai and Ulch
In this talk I will present some quantitative data on different structural types of code-switching attested in oral texts in Nanai and Ulch (Southern Tungusic). These texts represent a specific mode of code-switching between Nanai/Ulch and Russian observed in the situation of language shift. The speakers were asked by the linguist to talk about something in their native language, which was an unusual and artificial mode of communication, since both languages are endangered and the dominant language of the speech community is Russian. All the texts contain many Russian fragments of different sizes and morphosyntactic types.
Correlations between linguistic distances and geography in Daghestan
We continue our project of looking for correlations between linguistic distances and geography in Daghestan, an area of high language density and mountainous terrain. We are trying to detect the impact of landscape on linguistic divergence by comparing the correlations of linguistic distances with Great Circle ("crow flight") distances vs. distances calculated taking the terrain into account. This time we expanded our dataset to include Tsezic. We are trying to find ways to solve two problems: the geographic data are much richer in datapoints than the documented village lects, and slightly different data (such as Swadesh lists vs. Jena lists) have to be combined into a single count. We will tell you about our progress over the last few months in terms of data cleaning, playing with models and kicking each other. This is still very much a work in progress.
Phonetic fieldwork and experiments with the phonfieldwork package for R: rOpenSci review
There are many different tasks that typically have to be solved during phonetic research. They include creating slides that contain the stimuli, renaming and concatenating multiple sound files recorded during a session, automatic annotation in ‘Praat’ TextGrids (one of the sound annotation standards provided by the ‘Praat’ software, see Boersma & Weenink 2018), creating an HTML table with annotations and spectrograms, and converting between multiple formats (‘Praat’ TextGrid, ‘EXMARaLDA’, ‘ELAN’, .srt subtitles, and .txt from Audacity). All of these tasks can be solved by combining different tools (relabeling is straightforward, Praat contains scripts for concatenating files, etc.). The R package phonfieldwork makes these tasks easy to solve without additional tools, and easier than with other packages such as rPraat and textgRid. During the talk, I will show how the package works and what it can do, explain some changes proposed by the rOpenSci reviewers, and welcome your ideas for improvement. The tutorial is available online.
A corpus of Tsnal Lezgian
Lezgian is a language of the Lezgic branch of the Nakh-Daghestanian language family. Lezgian dialects are subdivided into the Küre, Axceh and Quba dialect groups. The variety I investigate is spoken in the village of Tsnal in the Khivsky district of the Republic of Dagestan and belongs to the Jark'i dialect of the Küre dialect group (Mejlanova 1964). In this talk I am going to present the Tsnal Spoken Corpus I am working on and discuss some of my early findings on the Tsnal variety of Lezgian.
From spatial deixis to anaphora: data from Lezgic and Tsezic
Crosslinguistically, demonstratives, in addition to their primary deictic function, often acquire anaphoric uses. East Caucasian languages have rich inventories of demonstrative pronouns involving at least three different stems, but have no dedicated third-person pronouns; instead, they use demonstratives. The main goal of my study is to examine which demonstratives are recruited in anaphoric function by obtaining corpus counts, separately for adnominal and independent uses. The study is based on narrative corpora of several languages from the Lezgic and Tsezic branches. I conclude that even closely related languages may show divergent behavior.
The reflexive and the generic use of the second-person pronoun in Kakabe
Kakabe (Mande) has a reflexive pronoun with an unusual restriction on its antecedent. It cannot appear with referential nouns and pronouns as antecedents; regular personal pronouns are used instead. It does appear, however, with generic and quantified subjects, as well as in infinitival clauses. It also appears in correlative clauses with a relativized subject. I provide an account of this unusual distribution, situate the Kakabe data in the broader typological context, and discuss a possible diachronic path from the second-person pronoun to this unusual reflexive pronoun.
Verbal agreement and voice in the Uralic languages of Western Siberia
The Uralic (Ob-Ugric, Samoyedic) languages of Western Siberia mostly exhibit a pragmatically driven verbal agreement system whereby argument indexing on the verb depends on the topicality of the core arguments. As shown by Nikolaeva (2001) and Dalrymple & Nikolaeva (2011) for Obdorsk Khanty (Northern) and Tundra Nenets, these languages tend to use a Subject agreement paradigm for the Topical A > Focal O setting and a special Subject-Object paradigm for Topical A > Topical O, and sometimes also for Focal O > Topical O. Additionally, some languages use inflectional Passive (Inverse) forms for Focal O > Topical A. However, a closer look at least at some languages of this area reveals that the use of such agreement and voice forms can depend not only on information structure but also on a number of other factors, such as referentiality, animacy, number, and assertiveness, partly resembling the hierarchical indexing systems found in the Americas, South Asia and Australia (see e.g. Zúñiga 2006). In the first part of the talk I will present my own field data from Kazym Khanty (Northern), which has a more intricate verbal agreement system, based on topicality and definiteness, than the Obdorsk dialect. In the second part I will discuss the initial stage of comparative areal research carried out by our project team in an attempt to shed some light on this phenomenon and its underpinnings across Khanty dialects, as well as in Mansi and Tundra Nenets, based on the available text data. Despite the very limited size and time depth of the existing corpora and text collections, and the still rather small amount of text material annotated and discussed by our team, the data show several tendencies that allow us to assess the overall situation in the region and to speculate about the possible diachronic evolution of agreement and voice systems in the languages under investigation.
Quantitative Linguistic Geography of Daghestan
We study how geographical factors shape the distribution of languages spoken in Daghestan. An interdisciplinary approach is developed, involving linguistic data, methods based on geographic information systems, and statistics. Using wordlists with the best available granularity and geolocation data from the Atlas of Multilingualism in Dagestan, we build a geospatial inference model to explain the linguistic diversity of the area. Our project has two stages: (i) a synchronic mapping of the correlation between geographic and linguistic distances, and (ii) a diachronic reconstruction of the speakers’ dynamics, driven by phylogeny and contact events. In this talk, very much a work in progress and an opportunity to discuss the aim and the data, we will only cover the first stage, focusing on the area of North Daghestan where Andic languages are spoken.
Anchiq Karata indicative morphology. Allomorphy, inflectional classes and possible diachronic puzzles
In this talk I am going to present some results of my fieldwork in 2019: a fragment of the description of Anchiq Karata (Andic, Avar-Andic-Tsezic, Nakh-Daghestanian) verbal morphology, namely the system of indicative verb forms. In the first part of my talk I am going to discuss Time-Aspect markers in three subparts of the paradigm, namely the Perfective, Imperfective and Infinitive subsystems. The second part is dedicated to the procedure for establishing and explicating inflectional classes. I am going to account for three major verbal inflectional classes (Conjugations) and a few smaller inflectional subclasses of a morphophonological nature. I will also survey some morphological irregularities in verbal inflection. The third part addresses several diachronic questions that Anchiq Karata data on indicative morphology could elucidate in the context of the divergence of the Central Andic languages.
Borrowings, frequency and lexical change
In this talk, I am going to explore the relations between borrowability, frequency of use and the dynamics of lexical change, based mostly on corpus data. The talk will have two parts.
Alternative recipient marking in Ossetic: A once-in-a-lifetime clearest case of contact-induced change
Ossetic regularly allows recipients in ditransitive constructions to be marked with either the dative or the allative case, a kind of variation that closely corresponds to the distribution of dative vs. lative recipients in East Caucasian languages. We show that the semantic motivation for the choice of marking can be described in terms of transfer of ownership vs. spatial transfer; key evidence is provided by the distribution of the two strategies with forms of the verb 'give' with and without spatial prefixes. As the phenomenon is not attested elsewhere in Iranian and seems to be extremely rare cross-linguistically, it is very likely that this feature of Ossetic developed as a result of language contact with Nakh.
The emphatic particle =gu in Andi dialects
Andi =gu is an emphatic / intensifying enclitic. Besides contexts where it indicates some kind of contrast / emphasis, =gu is found in combination with many types of hosts where its contribution is less clear. With some hosts (e.g. cardinal numbers) =gu is obligatory; with others it is optional. Similar enclitics have been observed in other Avaro-Andic and Tsezic languages, cf. Forker 2015 for Avar and Kibrik et al. 2001: 713 for Bagvalal.
Native speakers simplify their language when writing to non-natives on an internet forum
It is often claimed that a large proportion of non-native speakers in a population facilitates morphological simplification (Trudgill 2011). There is evidence in favour of this claim, but much is still unclear about the actual mechanism of simplification. Atkinson, Smith & Kirby (2018), relying on evidence from artificial language-learning experiments, hypothesize that an important role is played by interaction between speakers, primarily accommodation by more proficient speakers to less proficient ones. It is reasonable to expect that the most prominent case of accommodation would be foreigner-directed language (that is, accommodation by L1 to L2 speakers).
Elevation as a grammatical and semantic category of demonstratives
In this talk, I study the semantic and pragmatic properties of elevational demonstratives by means of a typological investigation of 50 languages with elevational demonstratives from all across the globe. The four basic verticality values expressed by elevational demonstratives are up, down, level, and across. They can be ordered along the elevational hierarchy (up > down > level/across), which reflects cross-linguistic tendencies in the expression of these values by demonstratives and is grounded in our cognitive representation of the vertical axis and the special position of the ‘vertical positive region’. Elevational values are frequently co-expressed with distance-based meanings of demonstratives, and it is almost always distal demonstratives that express elevation, whereas medial or proximal demonstratives can lack elevational distinctions. This means that elevational demonstratives largely refer to areas outside the peripersonal sphere, in the same way as simple distal demonstratives. In the proximal domain, fine-grained semantic distinctions such as those encoded by elevational demonstratives are superfluous, since this domain is accessible to the interlocutors, who in the default case of a normal conversation are located in close proximity to each other. I then discuss metaphorical extensions of elevational demonstratives to non-spatial uses such as temporal and social deixis. In a few languages, elevational demonstratives meaning ‘up’ express future time, whereas ‘down’ demonstratives encode past time. This finding is particularly interesting in view of the widely debated use of the Mandarin Chinese spatial terms ‘up’ for past events and ‘down’ for future events, which show the opposite metaphorical extension.
I finally examine areal tendencies and potential correlations between elevational demonstratives and the geographical location of speech communities in mountainous areas such as the Himalayas, the Papuan Highlands and the Caucasus. I conclude that the data from elevational demonstratives do not support the Topographic Correspondence Hypothesis because languages spoken in similar topographic environments do not tend to have similar systems of elevational demonstratives if they belong to different language families.
Verbal morphological complexity in Lezgic languages
The Lezgic branch of East Caucasian, which comprises about twelve distinct languages, is very diverse typologically, in particular in its verbal morphology. Lezgic verbal systems grammaticalise variable sets of categories and show different types and levels of complexity. Based on an overview of the attested morphological realisations of these verbal categories in all Lezgic languages, our presentation will endeavour to link probable cases of increased and decreased complexity in verbal systems (judging from what we can hypothesize about the proto-Lezgic verbal system) with two main sociolinguistic factors: the size of linguistic communities in diachrony and the influence of contact with non-Lezgic languages.
Initial report on documentation of Sagada, Tsez
I am a research fellow at the University of the Free State in Bloemfontein, South Africa, and am currently working with the Department of Caucasian Languages at the Institute of Linguistics, Russian Academy of Sciences. I also work for XRI, a research institute designed to bridge the gap between academia and humanitarian development initiatives. This talk has two parts. First, I will present an update on the status of my research on the Sagada dialect of Tsez, which I began last summer. The Tsezic languages are a sub-branch of the Nakh-Daghestanian (or East Caucasian) language family and are divided into two groups: East Tsezic (Bezhta and Hunzib) and West Tsezic (Tsez, Hinuq, Khwarshi, and Inkhowari). There is a consensus in previous research on Tsez that it should be divided into two main dialects, Tsez and Sagada, with further dialectal variation within Tsez (Imnaishvili 1963, Radjabov 1999, Abdulaev 2011, Comrie 2007, Polinsky 2015, etc.). The main division between Tsez and Sagada has been based primarily on the variation noted by Imnaishvili in the middle of the 20th century, and the data collected by Imnaishvili have provided most of the present knowledge about Sagada. It has even been suggested that Sagada may rightly be considered a distinct language (Maria Polinsky, Bernard Comrie p.c.). In this talk I will include sociolinguistic details from native speakers of Sagada, specifically how they view their language and its mutual intelligibility with the larger dialects of Tsez. I will also draw attention to phonological, morphological, and lexical similarities and differences between Sagada and Tsez.
Sentence Focus in Kakabe
This talk draws attention to the diversity of pragmatic functions of Sentence Focus (SF) utterances in natural speech, taking as an example Kakabe, a Western Mande language. It is often ignored in the literature that SF can play multiple roles in discourse. Presentational ‘out-of-the-blue’ utterances answering the questions ‘What happened?’ or ‘What’s new?’ are often considered their main or even only type of use. Yet the analysis of natural texts shows that SF utterances are at least as frequently used with the so-called explicative function (Sasse 1987; 1996; Matras and Sasse 1995) and the even less well-known inferential function, studied by Declerck (1992), Delahunty (1995; 2001) and Bearth (1992; 1997; 1999b). In particular, I will highlight the intersubjective aspect of speech production, which is crucial to understanding how inferential SF utterances are used. Using Kakabe data, I will show that when natural speech is considered, SF utterances turn out to be associated not only with introducing all-new events but also with a rich array of discourse strategies, such as explicative, elaborative, and disruptive functions. Accordingly, the discourse properties of the referents inside SF are subject to variation, and, crucially, they affect the implementation of focus marking.
Preposition drop and language contact: The case of Daghestanian Russian
This paper studies the phenomenon of preposition drop, i.e. cases where a preposition is absent where one would be expected, in particular in locative, directional and temporal adverbial phrases. We review and classify the existing analyses of the phenomenon proposed for different languages, predominantly non-standard and contact varieties. We then proceed to our quantitative study of preposition drop in Russian spoken in Daghestan, based on data from the sociolinguistic interviews of the DagRus corpus. Using statistical methods, we show how preposition drop depends on various linguistic and sociolinguistic factors. The speaker's level of Russian, preposition type and phonetic context turn out to be good predictors of preposition drop. We propose a functional explanation for the observed pattern.
Greek diglossia: a case study of spatial marking in Katharevousa
According to Ferguson (1959), in a diglossic situation two distinct varieties of a language (a ‘high’ variety, learned through formal education, and a ‘low’ colloquial one) are spoken in the same community. He claims that the High variety always exists in a stable, codified form, whereas the Low variety shows wide variation in grammar and vocabulary. Although this holds for some diglossic societies (such as Tamil), for others the situation is the opposite. My corpus analysis of Katharevousa (the official language of Greece until 1976) demonstrates the instability of this register in the domain of spatial relations.
What is a quotative evidential, and does it exist?
Reported speech markers constitute a substantial part of the semantic domain of evidentiality. At the same time, the internal division of this subdomain into specific values remains disputed. Aikhenvald (2004) proposed an important distinction between reportative and quotative markers: reportatives refer to information based on hearsay, while quotatives refer to information based on the verbal report of a particular source. Several authors, among them Boye (2010) and Holvoet (2018), have since argued that quotatives, in contrast with reportatives, are not proper evidentials, because they designate a proposition rather than specify the speaker’s information source. In this talk I will discuss the typology of reported speech evidentials and compare the properties of "quotative" markers from a variety of languages to determine whether they can be viewed as evidentials.
Denominal postpositions in East Caucasian languages
This is a study of the grammaticalization sources of postpositions across East Caucasian languages. The focus is on postpositions grammaticalized from nouns denoting body parts. While such nouns cross-linguistically often grammaticalize into spatial markers, in particular adpositions, this path does not seem to be typical of East Caucasian (or at least not of all its languages). Postpositions from body parts are not evenly spread across the family: some languages have many, some have few, and some have none at all. The goal of this study is to account for their distribution in genealogical and areal terms.
Anaphora and spatial deixis in East Caucasian: an overview of the data
Demonstratives, in addition to their main deictic uses, cross-linguistically often acquire a number of other functions, including anaphora. Most East Caucasian languages have rich inventories of demonstrative pronouns that employ three or more different stems covering various dimensions of deixis (distance, altitude and others). Most languages of Daghestan do not have a dedicated 3rd person pronoun and use attributive demonstratives instead.
Non-canonical inverse in Circassian and Abaza: borrowing of morphological complexity
In this paper I discuss a typologically peculiar inverse-like construction found in the polysynthetic ergative Circassian languages of the Northwest Caucasian family and argue that this construction has been borrowed into Abaza, which belongs to a different branch of the same family. These languages possess a cislocative verbal prefix which, in addition to marking the spatial meaning of speaker orientation, systematically occurs in polyvalent verbs when the object outranks the subject on the person hierarchy. The inverse-like use of the cislocative in Circassian differs from the “canonical” direct-inverse system in two respects: first, it is fully redundant, since person-role linking is achieved by the person markers themselves; second, it does not occur in the basic transitive construction, featuring instead in configurations involving an indirect object, in both ditransitive and bivalent intransitive verbs. I argue that the similar use of the cislocative prefix observed in Abaza results from pattern borrowing from Kabardian, with which Abaza has been in intense contact, and that this borrowing has increased both the paradigmatic and the syntagmatic complexity of Abaza verbal morphology.
Non-pro-drop in the Baltic Area: for and against contact-induced origin
Five geographically close languages to the east of the Baltic Sea, namely Russian (East Slavic), Latvian (Baltic), Ingrian, Votic and Ingrian Finnish (Finnic), use a similar pattern for marking subject reference. In this pattern both personal pronouns and subject agreement on the verb are employed (in ⅔ to ¾ of all occurrences). This happens with all types of personhood:
Typological atlas of Daghestan: state of affairs and future plans
The typological atlas of Daghestan will be a WALS-style resource containing information about linguistic features in the languages of Daghestan. Data for this resource are retrieved from grammars and organized into databases, which are then used to generate maps. The final product will be a tool for visualizing information about linguistic structures characteristic of Daghestan, as well as a useful resource for bibliographical research on parameters of interest. In this talk we will discuss the state of affairs of the project and our future plans. We will briefly present preliminary results for the features currently being developed, and we will discuss some technical issues concerning the design of introductory texts and the generation of maps.
A dialectometric study of Albanian varieties: linguistic complexity and language contact history
The goal of our study is to examine the Albanian dialect continuum using the quantitative methods of dialectometry and to interpret the results in terms of the history of the Albanian dialect landscape, in particular its contact history. Our data come from the Dialectological Atlas of the Albanian Language, which maps phonological, morphological and lexical features of 131 Albanian varieties of the main dialectal area. Using distance calculation, multidimensional scaling (MDS) and hierarchical clustering, we estimate and visualize the closeness of these varieties and analyse it against their geographical distribution and the traditional classification of Albanian dialects.
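Two of the steps mentioned above, distance calculation and hierarchical clustering, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the distance is a simple normalized Hamming distance (the share of features recorded for both varieties on which they differ), the clustering is naive average linkage (UPGMA), MDS is left out, and the four varieties and their feature values are hypothetical rather than taken from the atlas.

```python
from itertools import combinations

def feature_distance(a, b):
    """Normalized Hamming distance: the share of features recorded for both
    varieties on which they differ. None marks an unrecorded feature."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("the two varieties share no recorded features")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(data):
    """Pairwise distances between all varieties, keyed by an unordered pair of names."""
    return {frozenset((p, q)): feature_distance(data[p], data[q])
            for p, q in combinations(data, 2)}

def average_linkage(data):
    """Naive average-linkage (UPGMA) clustering. Repeatedly merges the two
    clusters with the smallest mean cross-cluster distance and returns the
    merge history as (cluster, cluster, distance) tuples."""
    dist = distance_matrix(data)
    clusters = [frozenset([name]) for name in data]
    history = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
            d = total / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        history.append((sorted(c1), sorted(c2), d))
    return history

# Hypothetical varieties with four categorical features each (None = not recorded).
varieties = {
    "A1": ["a", "x", "p", None],
    "A2": ["a", "x", "q", "m"],
    "B1": ["b", "y", "q", "n"],
    "B2": ["b", "y", "q", "m"],
}

for left, right, d in average_linkage(varieties):
    print(left, "+", right, f"at distance {d:.3f}")
```

With these toy data, B1 and B2 merge first, then A1 and A2, and the two groups join last. For real data with 131 varieties, a library implementation such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the more practical choice; the loop above only shows what average linkage computes.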
Contact-influenced Russian of Northern Siberia and the Russian Far East
We will present a new small corpus of contact-influenced Russian speech, namely the Corpus of Russian spoken in Northern Siberia and the Russian Far East, and several case studies on contact phenomena in grammar, based on its data.
Bayesian phylogenetic analysis and wordlist handling
In this talk, I present an introduction to modern Bayesian phylogenetic analysis in historical linguistics. The algorithms will be discussed with a special focus on their conceptual motivations, as well as their scope and limitations. Wordlist building and handling will be approached from a practical perspective, including recommendations and examples of implementation. Cross-linguistic online resources and editing tools for this task will be presented.
Correlates of Language Shift in Population Groups vs. Epigraphic Cultures
Scholars focusing on sociolinguistic situations in ancient societies have no direct access to information about the boundaries of language communities at given points in time and have to study them through the prism of the available written sources. Since the notion of language shift is equally applicable to population groups and to epigraphic cultures, and since it can be accompanied by contact-induced changes in both cases, it is appropriate to ask how the difference between the two types of communities correlates with different manifestations of language contact.
The speakers of minority languages are more multilingual
Population size is often discussed as a factor which might have influenced patterns of language and cultural evolution (Bowern 2010; Donohue & Nichols 2011; Nettle 2012; Bromham et al. 2015; Greenhill et al. 2018; Koplenig 2019; see Greenhill 2014 for an overview). In this paper, we advance the hypothesis that the larger the population of speakers of a language, the fewer second languages (L2s) these speakers master.
The particle =OK in the Volga-Kama languages: contexts of use and frequencies of lexicalization
In this talk I will focus on the emphatic identity particle in the Volga-Kama languages (Chuvash =aχ/ =eχ, Tatar and Bashkir =uk/ =ük, Meadow Mari =ak, Hill Mari =ok and Udmurt =ik). The particle was borrowed from the Turkic (Bulgar) lects into the Finno-Ugric lects of the Sprachbund. Its contexts of use overlap to a fair extent in different languages but do not fully coincide. One of its core functions is to mark that an argument of a proposition is identical to an argument of a different proposition (e.g. that house=OK 'the same house', referring to a previously mentioned house). In this talk I will take a closer look at this and other contexts of use of =OK in the languages of the area. In addition, =OK tends to attach to and become lexicalized with some items, mostly adverbial expressions. I will present data on the frequencies of such collocations with =OK in the Volga-Kama Sprachbund.
Seminar schedule 2019
Ethnicity, speaking indigenous languages and fertility in the North Caucasus
The regions of the North Caucasus have undergone considerable social change over the last 4–5 decades, including intensive urbanization, a loosening of traditional family norms, a weakening of gender asymmetries and the declining authority of the older generation in communities and families. These processes started at different times and proceeded at different speeds in the republics of the North Caucasus, but in all the republics they were accompanied by a significant decrease in fertility, known in population studies as the First Demographic Transition (FDT). This decrease was not at all unexpected, as ‘detraditionalization’ changes of the kind recently observed in the North Caucasus and the First Demographic Transition took place nearly simultaneously in many parts of the world.
The periphrastic causative in West Circassian
West Circassian, like the other languages of the Northwest Caucasian family, is a highly polysynthetic language with complex verbal morphology. One marker in particular, the causative marker ʁe-, is highly productive. The morphological causative is commonly used to derive transitive predicates from nontransitive verbs and nominals. It is also clearly an important instrument for expressing the semantics of causation and the associated valency-increasing operation. In some cases ʁe- has even become fossilized on predicates, so that the predicate has no meaning without the causative prefix. It is therefore rather remarkable that a periphrastic causative strategy has arisen in West Circassian. This construction is based on the matrix verb ṣ̂ə- 'to do, make', combined with a lexical verb bearing the purposive suffix. The behavior of the person indexes on the matrix verb shows that this structure is noncompositional and has grammaticalized with causative semantics identical to those of the morphological causative.
A detailed corpus study of preposition drop in DagRus: preliminary results
In this talk we will discuss the phenomenon of preposition drop (omission) that is observed in the speech of L2 speakers of Russian with a Nakh-Daghestanian or Turkic (Kumyk, Azerbaijani) first language. We shall first review insights on the issue from the existing literature on the topic (Daniel & Dobrushina 2009, Daniel, Dobrushina & Knyazev 2010, Daniel & Dobrushina 2013 for Russian spoken in Daghestan; Stoynova and Shluinsky 2010 for Russian spoken by the Enets people; Khomchenkova, Pleshak and Stoynova 2017 for Russian of Northern Siberia and Russian Far East; Shagal 2016 for Russian spoken by the Erzya people), then present the data on preposition drop in the Russian speech of Kumyk and Azerbaijani native speakers, and suggest working hypotheses about the theoretical interpretation of the phenomenon that would guide our further work.
Causative alternations database
There are several kinds of correspondence between causative and non-causative verbs: one can be derived from the other, they can be suppletive, etc. These correspondences differ not only from language to language, but also from one causative pair to another within a single language. The database was created to make it possible to count these alternations exactly across many languages. I am going to talk about the data, the structure of the database and the challenges it poses.
The Database of Cross-Linguistic Colexifications CLICS³: Data-driven semantic research from a cross-linguistic perspective
The term colexification (François 2008) refers to instances where the same word expresses two or more comparable concepts, covering instances of polysemy, vagueness, and homonymy. The comparative study of colexifications across languages allows for the construction of semantic maps, a useful tool for lexical typology and beyond, with applications ranging from studies of semantic change to patterns of conceptualization and linguistic paleontology. In this talk, we describe the recently released third version of the Database of Cross-Linguistic Colexifications, CLICS³ (https://clics.clld.org/; Rzymski, Tresoldi, et al. 2019), a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns, containing data for 2811 concepts across 2955 languages.
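The basic counting step behind a colexification network can be sketched as follows: within each language, any two concepts expressed by the same form count as colexified, and the number of languages showing each pair can serve as the weight of an edge in a simple semantic network. This is a simplified illustration, not the CLICS³ pipeline itself; the function names are hypothetical and the wordlists below, including all forms, are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations

def colexified_pairs(wordlist):
    """Return the set of concept pairs that share a form in one language.
    `wordlist` is an iterable of (concept, form) pairs, so a concept may
    have several forms (synonyms)."""
    concepts_by_form = defaultdict(set)
    for concept, form in wordlist:
        concepts_by_form[form].add(concept)
    pairs = set()
    for concepts in concepts_by_form.values():
        # Sort so that each unordered pair gets one canonical spelling.
        pairs.update(combinations(sorted(concepts), 2))
    return pairs

def count_colexifications(wordlists):
    """Count, for every concept pair, the number of languages colexifying it."""
    counts = Counter()
    for wordlist in wordlists.values():
        counts.update(colexified_pairs(wordlist))
    return counts

# Hypothetical wordlists with invented forms.
wordlists = {
    "Lang1": [("TREE", "kala"), ("WOOD", "kala"), ("FIRE", "tu")],
    "Lang2": [("TREE", "miro"), ("WOOD", "miro"), ("FIRE", "so")],
    "Lang3": [("TREE", "ra"), ("WOOD", "pe"), ("FIRE", "pe")],
}

for (a, b), n in count_colexifications(wordlists).most_common():
    print(f"{a} ~ {b}: {n} language(s)")
```

Here TREE and WOOD are colexified in two of the three toy languages and FIRE and WOOD in one; the resulting counts are the raw material from which a weighted semantic map could be drawn.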
Challenges of variation
Variation is inherent in all languages. It seems, however, that the degree of variation differs from language to language. It is sometimes claimed that languages with writing systems show more variation than unwritten languages. It has also been argued that small languages have less variation than large languages with many L2 speakers. However, none of these conjectures seems ever to have been tested empirically. In fact, to date we have no methods that would allow us to measure and compare the amount of variation across languages. In this talk I want to raise this problem rather than suggest a solution.
Catching variation during fieldwork on Nakh-Daghestanian languages
During fieldwork, researchers have to deal with all kinds of variation in the answers given by speakers: free variation, idiolectal variation or sociolinguistic variation. In the present investigation we studied the degree of variation among 44 speakers of Zilo Andi for 16 different morpho(no)logical features known to be variable in this dialect. Additionally, we conducted a survey among a number of researchers of Nakh-Daghestanian languages, asking them about their fieldwork habits, including how many speakers they usually consult. We used these data to evaluate the probability that an average researcher of Nakh-Daghestanian languages catches the observed variation during fieldwork.
Enets in space and time: a study in linguistic geography and history
This paper summarises a joint study by Yuri Koryakov, Andrey Shluinsky, and myself, see (Khanina et al. 2018a, Khanina et al. 2018b). Through a series of linguistic maps based on published ethnographic data and our fieldwork accounts, we reconstruct the territories in which Forest Enets and Tundra Enets (Samoyedic, Uralic; Central Siberia) have been spoken from the 17th century until today. We analyze in detail the migrations of the two ethnic groups and the changing language contact scenarios. One of the most intriguing findings of this study is an explanation of the Forest Enets–Tundra Enets puzzle.
Western Africa occupies a central place in research on multilingualism thanks to studies of the sociolinguistic situation in Cameroon (di Carlo 2018) and the Casamance area in Senegal (Lüpke and Storch 2013; Lüpke 2016). This study focuses on yet another case of multilingualism in Western Africa: the multilingualism patterns in the area of the Fouta-Djallon plateau in Guinea. The situation will be analyzed from the perspective of communities speaking Kakabe, a minority language spoken in about fifty villages. The languages involved are Kakabe, Maninka, Pular and, to a lesser extent, Sussu; Pular belongs to the Atlantic family and the other three languages to the Mande family. In my talk, I will analyze the attested multilingualism patterns in different types of language practices. The study is based on a multimedia oral corpus representative of a variety of genres, containing data that I have been collecting in the region since 2009.
Morphosyntax of complement clauses in East Caucasian languages: long-distance agreement
The East Caucasian (Nakh-Daghestanian) languages show a number of puzzling structures that are challenging from a theoretical point of view: non-finite clauses where all the arguments are encoded in the same way as in independent sentences, backward control, long-distance reflexive pronouns and long-distance agreement in complement clauses. This talk focuses on long-distance agreement in East Caucasian languages. First, I discuss the phenomenon in Qunqi Dargwa. Infinitives and converbs are the only complementation strategies that allow long-distance agreement. In Qunqi, there is a fuzzy boundary between infinitives and indirect mood forms. Converbs are used both with control verbs and with emotive and perception complement-taking verbs. The long-distance agreement pattern is only observed with control verbs. I show that these structures exhibit properties of clause union. Then I consider data from 19 East Caucasian languages (mostly based on Kibrik 2005 "Materials to the typology of ergativity") and discuss the long-distance agreement patterns in those languages. In most of these languages this phenomenon is limited to control constructions, while Tsez and Tsakhur deviate from this generalization.
Subgroups, linkages and beyond: Working on shared innovations in Eastern Polynesian languages
Polynesia covers a vast territory of the planet. It includes a large number of speech communities which descend from a common ancestor; some of them are isolated by thousands of miles of open ocean. Unsurprisingly, Polynesia has always been a favorite region for both linguists and anthropologists working on phylogenetics. The standard account of Eastern Polynesian subgrouping is that the language of Easter Island (Rapanui) forms a branch of its own, coordinate with Central Eastern (CE) Polynesian; CE in turn branches into Tahitic and Marquesic (Roger 1985). It has become increasingly evident that Tahitic and Marquesic are not valid subgroups (Vladimir Belikov 2009, Mary Walworth 2014). In my talk, I am going to show that Rapanui, Mangarevan, North Marquesan and South Marquesan constitute a subgroup within Eastern Polynesian. Interestingly enough, this proposal implies that some phonological and lexical innovations spread across the Pacific. The main objective of the talk is to discuss these innovations and their implications for the theory of language.
Glottalized /lˀ/ in Rikvani Andi
The opposition between the geminate and singleton ejective lateral stop /L'/, reconstructed for Proto-Andic, has been lost in various Andic languages due to the phonetic evolution of the singleton into a different sound type. The speakers of Rikvani Andi, a dialect of Andi (Andic, East Caucasian) spoken by 800 people in the village of Rikvani in Dagestan, developed a glottalized lateral consonant as its reflex. Glottalized sonorants are typologically rare. While they have been sparsely attested in various areas where glottalic initiation also occurs with stops (ejectives), Rikvani Andi is the only variety of a Caucasian language in which they have so far been reported.
Fieldtrip to Botlikh (Daghestan)
In August of 2019 we visited the village of Botlikh to study the Botlikh language. Our aim was to collect some data for a small investigation on agreement patterns of ordinal numerals. In addition, we translated several texts recorded by Togo Gudava in the 1950s-1960s, met a number of potential language consultants and learned some new things about the sociolinguistic situation in Botlikh. We will talk about the trip and our future plans for working on this language.
The Third School on Statistical Methods for Linguistics and Psychology (University of Potsdam, Germany)
Phylogeography of the Bantu Expansion
The Bantu expansion is among the most important and least understood human migrations. Bantu-speaking populations (240 million people, 500 languages, spanning 9 million km²) are the result of a huge migration originating in a homeland near the border of Nigeria and Cameroon between 5,000 and 4,000 BP [2,3,4,5,6,7,8]. Although the homeland and the time depth are well established, the migration route is still unclear.
Recent phylogenetic studies [1,9,6,7,8] support the late-split hypothesis [10,11,12,13,14], which claims that the common ancestor of the East Bantu and West Bantu languages crossed the African rainforest and split only afterwards. This crossing is thought to have taken place through the Sangha River Interval (SRI), a north–south savanna opening into the rainforest. However, in dated phylogenies the dates do not match: the ancestral population should have crossed this corridor around 4,000 BP, whereas the corridor only became fully open around 2,500 BP.
We propose two alternative hypotheses to compete with the traditional SRI late split: first, a coastal savanna corridor; second, an earlier path through the rainforest. We compare the hypotheses using a Bayesian phylogeographic approach based on linguistic trees. We use lexical and geographical data for 400+ Bantu and Bantoid languages, inferring the linguistic and geographic history in parallel by implementing the break-away model in BEAST2. We conclude that the crossing through the rainforest took place around 4,000 BP.
Contact-influenced word order in genitive noun phrases: A corpus-based investigation of Russian spoken in Daghestan
The paper deals with non-standard word order in the variety of Russian spoken by bilinguals from Daghestan. Specifically, we focus on the occurrence of prepositive genitive modifiers in bilinguals’ speech. Whereas in monolinguals’ Russian the neutral and most frequent word order in noun phrases with a genitive modifier is the order N+GEN, in Daghestanian Russian the opposite order GEN+N often occurs. This phenomenon was mentioned as one of the striking morphosyntactic features of Daghestanian Russian, and its frequent occurrence can be partly explained in terms of syntactic calquing from speakers’ L1s, all featuring an unmarked GEN+N order in noun phrases. However, the picture is far less trivial than it could look at first sight. On the one hand, the word-order pattern GEN+N does not seem to affect equally all types of genitive noun phrases in Daghestanian Russian. On the other hand, similar examples of non-standard word order are sometimes found in monolinguals’ speech too. In the course of the paper, we present the results of our corpus-based investigation of genitive noun phrases in Daghestanian Russian as compared to monolinguals’ spoken Russian, including dialectal varieties. Prepositive genitives appear to be favored by several lexico-semantic and processing features of both the head and the genitive dependent. The strongest factor is kinship semantics: noun phrases that express a kinship relation tend to be prepositive. In monolinguals’ spoken Russian, although prepositive genitives are very infrequent, they sometimes show similar lexico-semantic and processing features. Therefore, we are not dealing with a simple calquing process. Rather, L1 influence is manifested in the strengthening of some tendencies existing in monolinguals’ Russian too.
Lingua francas as lexical donors: a quantitative field study
The paper investigates the role that the rate of bilingualism plays in lexical borrowing. Our data come from Daghestan, an area of high language density. Based on loanword counts, we isolate two zones of lexical influence: the south, heavily influenced by Azerbaijani, and the north, dominated by Avar. This salience of Avar and Azerbaijani as donor languages is likely to reflect the historical role of these languages as lingua francas in their respective geographical zones. The study supports the idea of Brown (1996, 2011) that contact influence from a lingua franca is higher than from a language used only to communicate with its L1 speakers. In line with the widespread argument that the amount of contact-induced change from a language is proportional to the intensity of bilingualism (Thomason & Kaufman 1988), Brown stipulates that the importance of lingua francas as lexical donors must be linked to the high rate of bilingualism in these languages. Bilingualism in Azerbaijani and Avar was indeed high, as evidence from field research on the traditional language repertoires of Daghestanian highlanders shows. On the other hand, knowledge of two other locally important languages, Chechen and Georgian, which was only slightly lower at some locations, did not lead to the same level of lexical transfer; in fact, the number of Georgian and Chechen borrowings seems disproportionately low. High bilingualism rates are thus not sufficient for a language to become a major lexical donor. At the level of methodology, the paper explores the prospects of using short wordlists as ‘contact probes’, i.e. tools for measuring lexical contact. We follow the approach of Haspelmath & Tadmor (2009) and Bowern et al. (2011) in applying a fixed list of concepts to quantify lexical contact between languages.
Based on field elicitations conducted in a number of villages in the Republic of Daghestan, a list of 160 concepts is shown to be efficient enough to differentiate degrees of lexical impact from the locally important L2s on minority languages. The method not only ensures comparability across contact situations but also provides a level of resolution that is sensitive to differences between villages speaking the same language. By fine-tuning the wordlist to a different linguistic setting, the methodology suggested here may be extended to other geographical areas of intense language contact and become a tool for reconstructing multilingual patterns of the past.
Contact-influenced word order in genitive noun phrases: A corpus-based investigation of Russian spoken in Daghestan
In a recent paper (Naccarato, Panova & Stoynova Forth.), we have examined cases of non-standard word order in the variety of Russian spoken by bilinguals from Daghestan. Specifically, we have restricted our analysis to the noun phrase, and have looked at the occurrence of prepositive genitive modifiers in bilinguals’ speech. As we have shown, whereas in Standard Russian the neutral and most frequent word order in noun phrases with a genitive modifier is the order N+GEN (muž sestry), in Daghestanian Russian the opposite order GEN+N (sestry muž) often occurs. This phenomenon has been partly explained in terms of syntactic calquing from speakers’ L1s, all featuring a neutral GEN+N order in noun phrases. However, such inversion in word order does not seem to equally affect all types of genitive noun phrases in Daghestanian Russian, but appears to correlate significantly with noun phrases featuring kinship semantics. Moreover, similar examples of non-standard word order are sometimes found in monolinguals’ speech too, which makes the picture far less trivial than it could look at first sight. In this talk, we present the latest results of our corpus-based investigation of genitive noun phrases in Daghestanian Russian as compared to monolinguals’ spoken varieties of Russian, with the aim of explaining the factors boosting non-standard word-order realizations.
Animacy agreement in Botlikh: ordinal numerals
Botlikh (Avar-Andic, East Caucasian) features a two-fold animacy agreement system including, on the one hand, a set of noun class (i.e. gender) markers representative of many EC languages and, on the other hand, an additional set of dedicated animacy markers which are unique to Botlikh. The dedicated animacy markers can appear on various targets (i.e. negative copulas, interrogative particles, question word formants, attributive clitics, present/future participles, ordinal numerals), and agreement is controlled by either the nominal head or the absolutive argument of the verb. By focusing on ordinal numerals, which appear to mark animacy most consistently, we set the following goals: a) to better understand the agreement patterns of these forms; b) to clarify which referents qualify as animates and which do not. For these purposes, we have created the first draft of a survey which we will discuss during the talk.
A database of census data for Daghestan
Research on a variable phenomenon in the Russian of Daghestan and of native speakers: preliminary results (based on experimental data)
Numeral classifiers in Udi as a contact-induced development
Rutul dictionary: new resource
A database for structural borrowings from Azerbaijani into Lezgic languages
Vowel quantity as the distinction of spatial forms in Kina Rutul: an experimental study
IngRel, DagRel, and others: Relativization and the accessibility hierarchy in ergative languages, with implications for corpus databases
Language change in Northwest Amazonia: grammatical categories
When genetic and linguistic transmission do not coincide: emerging patterns from multiple studies
The scope of refactive markers in Abaza
Most descriptions of Abaza mention two affixes which express the meaning of refactive (‘again’, ‘once more’, etc.): the suffix -χ and the prefix ata-, which almost always appears only in combination with -χ. I argue that the main difference between these two refactive markers is that the marker -χ “sees” the internal structure of an event and can have scope over any part of it (just the resultant state, or just the process, with or without arguments), while the marker ata-+-χ is “blind” to the internal structure of the situation and can only “copy” the whole event with its arguments.
Noun vs. verb inflectional synthesis: A complexity trade-off?
A database for loanwords in Daghestan
In this talk we introduce our first pilot database for the DagLoans project. The database contains translations of 160 concepts collected in the field in Daghestan (and Northern Azerbaijan). At present, it includes a total of 24,785 entries from 23 different languages. The database can be used to find the translation of a concept in one or more languages. Its most important feature is “Set”: all entries are grouped into sets with similar words, which allows us to plot the spread of lexical items on a map. The database can be used for conducting quantitative research on lexical convergence as well as for creating geographical maps showing the areas and the intensity of foreign influence.
Bayes factor: the Bayesian way without diving into the Bayesian maze
The most common statistical task is hypothesis testing. When a pair of competing models is fully defined, their definition immediately yields a measure of how well each model predicts the observed data (its marginal likelihood). The ratio of these two quantities is the Bayes factor; when both models are simple hypotheses with no free parameters, it reduces to the ordinary likelihood ratio. During the talk I will show how to define different models and compare them using the Bayes factor.
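As a worked illustration (a hypothetical example, not taken from the talk), suppose k out of n speakers choose one of two variants. H0 fixes the probability of that choice at p0 = 0.5; H1 leaves it free with a uniform prior. Under H1 the marginal likelihood has a closed form, because the integral of C(n, k) p^k (1 - p)^(n - k) over p from 0 to 1 equals 1/(n + 1), so the Bayes factor can be computed without any sampling. The function names are illustrative only.

```python
from math import comb

def likelihood_point_null(k, n, p0=0.5):
    """P(data | H0): probability of k successes in n trials when p is fixed at p0."""
    return comb(n, k) * p0**k * (1 - p0)**(n - k)

def marginal_likelihood_uniform(k, n):
    """P(data | H1) with a uniform Beta(1, 1) prior on p.
    Integrating comb(n, k) * p**k * (1 - p)**(n - k) over p in [0, 1]
    gives 1 / (n + 1), whatever the value of k."""
    return 1 / (n + 1)

def bayes_factor_10(k, n, p0=0.5):
    """BF10 > 1 means the data are better predicted by H1 than by H0."""
    return marginal_likelihood_uniform(k, n) / likelihood_point_null(k, n, p0)

for k in (5, 8, 10):
    print(f"k={k}, n=10: BF10 = {bayes_factor_10(k, 10):.2f}")
```

A balanced split (k = 5 out of 10) gives a Bayes factor below 1, i.e. mild support for H0, while 10 out of 10 gives 1024/11, about 93, in favour of H1. Richer models usually lack such a closed form, and the marginal likelihoods then have to be approximated numerically.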
Typological atlas of multilingualism in Daghestan: problems and perspectives
The u+gen construction in Modern Standard Russian
In Modern Standard Russian (MSR), the prefix/preposition pair u-/u is peculiar with respect to other similar pairs, due to the meaning mismatch between the two. While the prefix u- has an ablative meaning, as shown when it is prefixed to motion verbs, the prepositional phrase u+gen occurs in locative constructions and in related constructions such as predicative possession, which is expressed via the cross-linguistically common Locative Schema. Etymological considerations show that the meaning preserved by the prefix is older. According to the literature, the only type of occurrence in which the u+gen construction preserves the ablative meaning is found with verbs of requesting, removing, and buying. Notably, however, in other Slavic languages putative ablative contexts are limited to verbs of requesting. Data from MSR, OCS, Polish and Czech lead to the conclusion that the extension of the u+gen construction to verbs of removing in MSR is based on its use for encoding predicative possession. Its extension to verbs of buying is better explained through the locative meaning of the construction. As a result of these different developments, the u+gen construction has become part of the argument structure of a group of verbs including verbs of asking and requesting, verbs of removing and verbs of buying, which share the feature of taking human non-recipient third arguments.
Kunbarlang is a critically endangered polysynthetic language spoken in central Arnhem Land, Northern Territory, by approximately 40 people. It belongs to the non-Pama-Nyungan Gunwinyguan family.
This talk reports on the first comprehensive description of Kunbarlang (although it builds on and extends important unpublished work by Carolyn Coleman and Joy Kinslow Harris). Kunbarlang has very rich verbal morphology that includes complex agreement paradigms, a composite TMA system that differs from those of other Gunwinyguan languages, an array of argument derivation tools, and coverb constructions. The nominal domain, by contrast, has little morphology and relies heavily on syntactic constructions: for instance, case marking of nouns is analytical.
The talk will give a general overview of the grammar, and then focus on a few selected topics across different areas.
Sequence of tenses in Russian? Tense choice in complement clauses in Standard and Learner Russian
It is generally believed that Russian has no sequence of tenses (SoT) in complement clauses, and the choice of absolute over relative tense is considered a typical error in the interlanguage of learners of Russian as a foreign language whose native language features SoT, e.g. English. However, not all uses of absolute tense in Learner Russian can qualify as errors, since Standard Russian shows a great deal of variation in tense assignment in complement clauses. One of the factors said to govern tense choice is the semantics of the matrix verb (Barentsen, 1996; Гиро-Вебер, 1975; Schlenker, 2003, inter alia). Specifically, speech and mental verbs are said to strictly require relative tense, whereas sensory, emotion, and existential matrix verbs allow both absolute and relative tense patterns. Despite the acknowledged variation, the precise distributional patterns of tenses in complement clauses remain understudied. This paper is a systematic corpus-based study of the variation in tense choice across semantic classes of matrix verbs in two language varieties: (i) Standard Russian as represented in the Russian National Corpus and (ii) Learner Russian of anglophone speakers as represented in the Russian Learner Corpus. I examine clausal complexes in which the matrix verb is in the past tense and the two verbs denote simultaneous actions.
The analysis identified a likelihood hierarchy of verbal semantic classes ranging from the least likely to tolerate past tense in the complement clauses to the most likely ones: speech<mental<sensory≈emotion
Seminar schedule 2018
Limitative kə̄n in Ulch: a morpheme with very weird positional properties
Validity of indirectly collected data: a belated proof of concept
Within the framework of the Multidagestan project, a vast amount of sociolinguistic data on traditional small-scale multilingualism was collected in Daghestan. The aim of the project is to trace changes in multilingual patterns over the 20th century. However, 71 percent of the data were collected indirectly, by asking people about their relatives. We will discuss the statistical methods we used to check the robustness of the indirectly collected sociolinguistic data.
Relative clauses in Andi
The emphatic identity particle =OK in the Volga-Kama Sprachbund
The particle =OK, originally Turkic, is attested in all the core members of the Volga-Kama Sprachbund: Chuvash =aχ/ =eχ, Tatar and Bashkir =uk/ =ük, Meadow Mari =ak, Hill Mari =ok and Udmurt =ik. Meadow Mari, Hill Mari and Udmurt have arguably borrowed the particle from Turkic (Bulgar).
=OK is used in contexts many of which may be characterized as emphatic identity contexts: the argument marked by =OK is the same as an argument of a different proposition (≈ Russian že: Masha rabotajet v pole, Masha že sidit s det'mi 'Mary works in the field and it is also she who sits with the children').
However, in different languages =OK exhibits different morphosyntactic restrictions on the constituent to which it may attach: e.g. in Tatar it attaches to demonstratives (šul uk keše 'the same person') but not to proper names (*Märijäm ük 'it is also Mary who...'). In Chuvash, Meadow and Hill Mari and Udmurt, the particle can attach to proper names. =OK can also attach to the verb, with different interpretations and, again, with different morphosyntactic restrictions.
There are similar constructions and lexicalizations with =OK in some of the languages (e.g. a reduplication construction of the type V-converb=OK V).
I would like to discuss whether, and how, we can approach these similar patterns in terms of contact. From the literature we know which were the strongest bonds in the area (Chuvash-Mari, Tatar-Meadow Mari, Tatar-Bashkir). The question is whether stronger and weaker similarity between languages in the morphosyntax, constructions and lexicalization patterns of =OK corresponds to areal affinity, and how to demonstrate it.
Evaluating DP as a measure of corpus heterogeneity: the Even dialect comparison project at a crossroads
In this SMALL discussion, we will present the path we have taken so far in developing methods of inter-dialectal comparison and the point at which we currently stand (or are stuck), and we will gratefully take advice on how to proceed.
We will first recall the starting point of the project, i.e. a method of isolating inter-dialectal divergence that takes inter-speaker variation into account. Then we will briefly review the steps taken so far (log-likelihood, the Wilcoxon-Mann-Whitney test, Gries' DP). We will then focus on our most recent result, evaluating the observed DP value against a simulation and a permutation test for the distribution of DP in a random sample, and discuss whether we can use it for our purposes.
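Gries' DP compares each corpus part's share of the whole corpus with that part's share of an item's occurrences: DP = 0.5 · Σ|v_i − s_i|, which is 0 for a perfectly even distribution and grows towards 1 as the item becomes concentrated in few parts. The Python sketch below computes DP and runs one possible permutation scheme: pool all tokens, shuffle them, re-split them into parts of the original sizes, and count how often the random corpus yields a DP at least as large as the observed one. The token-list data layout, the function names and this particular null model are our assumptions for illustration; the project's actual simulation may be set up differently.

```python
import random


def gries_dp(counts, sizes):
    """Gries' DP = 0.5 * sum_i |v_i - s_i|.

    counts[i]: frequency of the item in corpus part i;
    sizes[i]:  size of part i (e.g. number of tokens per speaker).
    v_i and s_i are the item's share and the part's share of the totals.
    """
    total_f = sum(counts)
    total_s = sum(sizes)
    if total_f == 0:
        raise ValueError("the item does not occur in the corpus")
    return 0.5 * sum(abs(c / total_f - s / total_s) for c, s in zip(counts, sizes))


def dp_permutation_test(parts, item, n_perm=999, seed=0):
    """Compare the observed DP of `item` with DP in randomly reshuffled corpora.

    parts: list of token lists, one per corpus part (hypothetical layout).
    Each permutation pools all tokens, shuffles them and re-splits them into
    parts of the original sizes, so part sizes stay fixed while the item's
    placement is randomized.
    Returns (observed DP, one-sided p-value for DP >= observed).
    """
    rng = random.Random(seed)
    sizes = [len(p) for p in parts]
    observed = gries_dp([p.count(item) for p in parts], sizes)
    pooled = [tok for p in parts for tok in p]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        counts, start = [], 0
        for size in sizes:
            counts.append(pooled[start:start + size].count(item))
            start += size
        # small tolerance so exact ties are not lost to floating-point rounding
        if gries_dp(counts, sizes) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    even = [["x"] * 5 + ["y"] * 95 for _ in range(3)]
    skewed = [["x"] * 15 + ["y"] * 85] + [["y"] * 100 for _ in range(2)]
    print(dp_permutation_test(even, "x"))
    print(dp_permutation_test(skewed, "x"))
```

In the toy run, the evenly spread item gets DP = 0 and a p-value of 1, while the item concentrated in one part gets DP close to 2/3 and a very small p-value. Whether shuffling individual tokens is an appropriate null model for speaker-based dialect data is a separate question, since it ignores any clustering of tokens within a speaker's recordings.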
Conditions and questions: several cases of combined marking in Nakh-Dagestanian languages
In this paper, I consider several suffixes in Lezgic languages (possibly, but not definitely, related) that cover a rather wide range of contexts. Some contexts of their use may be characterized as denoting unrealized states of affairs (such as conditional clauses, polar questions and, to a certain extent, indefinite pronouns). Others fall short of this definition, including indirect questions and other subordinate clauses with WH-words. The set of contexts covered by these markers is the same in at least three Lezgic languages (Lezgian, Aghul, Tabassaran) and also in Azerbaijani, which raises the question of a possible contact origin of this pattern. Some other Lezgic languages employ several different markers in these contexts. The Kina dialect of Rutul presents an especially interesting case, combining in one morpheme (-jden) meanings that are unlikely to be associated. In this talk, I will present the case of Kina Rutul in detail, discuss possible interpretations and origins of the marker -jden, and compare Kina Rutul with other Lezgic languages.
Corpus research on relativization targets in several languages of the Caucasus
In this talk, we will discuss modifying participial constructions, which are the predominant type of relative clause in East Caucasian languages. One of the key properties of participles in East Caucasian languages is their lack of syntactic orientation. There are few or no syntactic restrictions on what can be relativized: the gap in the relative clause can correspond to a core argument, a peripheral participant or even a participant that is not part of the verb’s argument structure. The languages also share some common patterns of constructionalization of specific relative constructions (such as name constructions). On the other hand, there is variation across languages, e.g. a more or less strongly articulated preference for S relativization, or more or less widespread use of resumptives, as well as language-particular features, e.g. a very high ratio of addressee relativization (in name constructions) in Agul. After a general overview of the problems related to the study of relativization targets, we concentrate on language-particular case studies and discuss the counts of relativization targets in the corpora of two East Caucasian languages (Agul and Archi). As a comparative background, West Circassian corpus data will be presented. In this language, relativization is syntactically oriented, the strategy cannot be classified as participial, and special reflexivizers may be interpreted as obligatorified resumptive pronouns. Finally, we discuss the comparison of corpus counts of the relativized syntactic role in the three languages and the problems connected with such a comparison.
The internalization of inflection? The restrictive kə̄n in Ulch
Seminar schedule 2017
The DagLoans project aims to investigate lexical convergence between East Caucasian languages and their neighbours in quantitative terms. We focus on horizontal interaction, looking at borrowings between languages that are in direct contact and disregarding the influence of dominant cultures and distant languages, e.g. Arabic, Persian or Russian. The project consists of two parts: one dealing with lexical matter copy, the other with lexical pattern copy. In this talk we discuss only data from the former.
We study lexical matter borrowing in an attempt to compare and quantify horizontal borrowing between languages at different locations. Instead of comparing standard languages, we aim to compare local varieties and, ideally, village lects. Based on the Leipzig-Jakarta list and the volumes of "Отраслевая лексика" ("Sectoral Vocabulary"), we attempt to compile a list of lexical items with a high borrowability rate. The list should be concise enough to be elicited from several speakers during a one-day visit to a village but, on the other hand, long enough to discriminate between local varieties and, ideally, village lects.
At present, we work on data from the Rutul, Tsunta and Botlikh districts. In the talk on September 19, we plan to discuss the composition of the list and the elicitation techniques we use, in order to decide whether it is an adequate tool for studying the rate of lexical contact at the local level and how it reflects local geography and data on bilingualism. The talk is based on data from 6 villages (4 languages) of the Rutul district that are located in the same valley: Khlut (Lezgian), Kiche (Rutul), Rutul (Rutul), Kina (Rutul), Gelmets (Tsakhur), and Kusur (Avar).
The project "Atlas of Multilingualism of Daghestan" is based on sociolinguistic interviews recorded in Daghestan by the project team over the course of seven years. The aim of the project is to determine the level of bilingualism in Daghestanian mountain villages and to describe the sociolinguistic patterns of convergence between local languages. In addition, the project allows us to establish the type of language contact characteristic of neighboring villages, which languages from outside a particular area were spoken by its inhabitants, how the command of Russian changed, what role the geographic distance between languages played, and how the command of particular languages was distributed among the inhabitants of a village. This talk will focus on two topics. First, we will present the results of a study of how the command of languages was distributed between men and women and how this distribution has evolved from the beginning of the 20th century until now. Second, we will discuss some problems and shortcomings of the method used and suggest some verification methods.
Rich consonant inventories are a salient feature of the languages of the Caucasus as a whole and of the languages of Daghestan in particular. Their composition varies somewhat from language to language but is overall similar: all languages have ejectives, most have labialization, and geminates are not uncommon. On the other hand, the acoustic properties of phonologically identical elements may differ substantially. To document these differences, over the last few years we have been making systematic field recordings of data from different languages. In this talk we will introduce the aims and methods of the project and present preliminary results of our analysis of the acoustic properties of ejectives as compared to the corresponding voiceless stops. We will evaluate the impact of such parameters as closure duration and voice onset time, using data from three languages: Rutul (Kina dialect), Andi (Zilo dialect) and Mehweb. In the long run, we hope that our project will be able to address the following theoretical question: is the observed intragenetic variation more sensitive to areal or to systemic factors?
Spoken Meadow Mari corpus: data, design, and aims
The talk presents the Spoken Meadow Mari corpus project. Meadow Mari is a Uralic language spoken in the Volga region by some 375 thousand people. The core of the corpus consists of recordings made in 2000-2001 by a group of researchers from Lomonosov Moscow State University. In our talk we will discuss the data we have, possible applications of the project, the target audiences of the corpus, and its structure. Making the corpus data presentable involves transcribing, glossing and annotating the data, as well as aligning audio and text, which should facilitate data analysis.
Universal Dependencies for Mehweb Dargwa
Universal Dependencies (UD) is a project dealing with consistent cross-linguistic morphological and syntactic annotation. UD is currently in version 2 and covers 52 languages, with 10 more languages yet to be included.
With its own annotation principles and abstract inventory of parts of speech, morphosyntactic features and dependency relations, UD aims to facilitate multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. While UD covers 11 language families, it does not include any languages of the Caucasus (including the East Caucasian family). In our talk we will describe how Mehweb Dargwa (East Caucasian) fits the UD scheme.
Dialectal Variation in Even based on Corpora of Field Recordings
The talk presents the "Dialectal variation in Even" project. The project works with two dialects of Even: the westernmost one, spoken in the village of Sebjan-Küöl in Yakutia, and the easternmost one, spoken in Kamchatka. The first dialect has long been in contact with Yakut, while the second has possibly been in contact with Koryak and Itelmen. The aim of the project is to discover the differences between the two dialects and whether they stem from independent innovations or from contact. For this, we use corpora of field recordings collected by Brigitta Pakendorf between 2007 and 2012. In this talk we will describe the data, discuss what differences can be found in them and which statistical methods can be used to find them, and present some differences in morphology and syntax that have been found so far.
West Circassian and Kabardian form a dialect continuum spread across Krasnodar Krai and the republics of Adygea, Karachai-Circassia and Kabardino-Balkaria. During the presentation, we are going to talk about phonetic and sociolinguistic features of different Circassian idioms, present a project for an atlas of Circassian isoglosses and show the first maps to be included in the atlas. In addition, we will describe the process of creating a phonetic and grammatical questionnaire and the difficulties associated with it.
Prosodic Analysis of Non-standard Russian Spontaneous Speech
The study deals with intonation patterns in three non-standard varieties of Russian. In the first part of the talk we discuss methodological issues, such as the analysis of pitch range, data filtering and data representation. In the second part, we consider a number of case studies, namely Daghestanian Russian, Southern Russian and Jewish Russian. The general goal of the project is to compile a corpus of non-standard Russian with pitch annotation and to explore the intonation patterns of non-standard varieties of Russian based on corpus data.
Corpus of Russian spoken in Daghestan
The corpus of regional variants of Russian spoken in Daghestan is based on transcribed sociolinguistic interviews in Russian with speakers of various Daghestanian languages who live in rural areas. Technically, the corpus is built using the platform and annotation principles developed for the dialectal corpus of the Ustja River Basin. The aims of the project include maintaining the corpus and adding new texts, as well as using it for the systematic study of morphosyntactic characteristics of Daghestanian Russian. In the talk, we plan to discuss the current state of the corpus, the possibilities for corpus-based research, the problems we have encountered, and the prospects of the project.