Anna Golovina and Ksenia Dunaeva (HSE University): Using transducers to create morphological parsers and other NLP tools for Nakh-Daghestanian languages

Our talk is dedicated to creating morphological parsers for low-resource Nakh-Daghestanian languages.
Morphological parsers can be built either on a set of grammatical rules of the language or on probabilistic models such as neural networks. The latter need large collections of annotated texts and are therefore poorly suited to languages that have few of them. A finite-state transducer, the basis of a rule-based parser, can be defined as a type of finite-state automaton with two tapes: an input tape and an output tape. Whereas an ordinary finite-state automaton can only determine whether a given string belongs to the regular language it describes, a transducer maps strings of input symbols to strings of output symbols. In a morphological parser, the transducer pairs a surface word form with a string containing that form's morphological analysis. Building a two-level rule-based parser requires combining at least two finite-state transducers: one that stores the lexicon and models morphotactics, and another that implements morphophonological rules.
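To make the two-transducer architecture more concrete, here is a minimal Python sketch of a hand-built nondeterministic transducer and a lexicon + rule pipeline. Everything in it is hypothetical and invented for illustration, not taken from the talk, and it does not model Avar or Bezhta: the stems `kap` and `pel`, the tags `+N`, `+SG`, `+PL`, the archiphoneme `D`, the rule "`D` surfaces as `t` after a voiceless consonant and as `d` elsewhere", and the helper names `FST`, `build_lexicon`, `build_rule`, `generate` and `analyze`. The lexicon transducer maps an analysis to an intermediate form (`kap+N+PL` to `kapDa`), and the rule transducer maps that to the surface form (`kapta`). Note that the sketch applies the two transducers in sequence, which is equivalent to composing them; it does not implement Koskenniemi-style parallel two-level rules. Parsing simply inverts both transducers and runs them in the reverse order. Practical parsers are normally compiled with dedicated FST toolkits (HFST and foma are two widely used open-source options) rather than hand-built like this.

```python
import re
from collections import defaultdict

EPS = ""                # epsilon: the empty symbol
VOICELESS = set("kpt")  # hypothetical toy phonology


class FST:
    """Nondeterministic FST: each arc reads one input symbol (or EPS) and
    writes one output symbol (or EPS). Assumes no cycles of EPS-input arcs."""

    def __init__(self, start=0):
        self.start, self.finals = start, set()
        self.arcs = defaultdict(list)  # state -> [(in_sym, out_sym, dst)]

    def add_arc(self, src, in_sym, out_sym, dst):
        self.arcs[src].append((in_sym, out_sym, dst))

    def invert(self):
        """Swap the input and output tapes (generation <-> analysis)."""
        inv = FST(self.start)
        inv.finals = set(self.finals)
        for src, arcs in self.arcs.items():
            for in_sym, out_sym, dst in arcs:
                inv.add_arc(src, out_sym, in_sym, dst)
        return inv

    def apply(self, symbols):
        """Return every output string the transducer pairs with `symbols`."""
        results, stack = set(), [(self.start, 0, ())]  # (state, pos, output)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(symbols) and state in self.finals:
                results.add("".join(out))
            for in_sym, out_sym, dst in self.arcs.get(state, []):
                new_out = out + (out_sym,) if out_sym else out
                if in_sym == EPS:
                    stack.append((dst, pos, new_out))
                elif pos < len(symbols) and symbols[pos] == in_sym:
                    stack.append((dst, pos + 1, new_out))
        return results


def build_lexicon(stems):
    """Lexicon + morphotactics (analysis -> intermediate form):
    stem+N+SG -> stem, stem+N+PL -> stem + "Da" (D is an archiphoneme).
    All stems end in one shared state, much like a continuation class."""
    fst = FST(start=0)
    after_stem, after_n, after_pl, end = 1, 2, 3, 4
    next_state = 5
    for stem in stems:
        state = 0
        for k, ch in enumerate(stem):
            if k == len(stem) - 1:
                dst = after_stem
            else:
                dst, next_state = next_state, next_state + 1
            fst.add_arc(state, ch, ch, dst)
            state = dst
    fst.add_arc(after_stem, "+N", EPS, after_n)
    fst.add_arc(after_n, "+SG", EPS, end)
    fst.add_arc(after_n, "+PL", "D", after_pl)
    fst.add_arc(after_pl, EPS, "a", end)
    fst.finals.add(end)
    return fst


def build_rule(letters):
    """Morphophonology (intermediate -> surface form): D surfaces as t after
    a voiceless consonant and as d elsewhere; other letters are unchanged.
    State 1 = previous surface segment was voiceless, state 0 = it was not."""
    fst = FST(start=0)
    fst.finals.update({0, 1})
    for state in (0, 1):
        for ch in letters:
            fst.add_arc(state, ch, ch, 1 if ch in VOICELESS else 0)
        fst.add_arc(state, "D", "t" if state == 1 else "d", state)
    return fst


def generate(analysis, lexicon, rule):
    """Analysis -> surface: lexicon FST first, then the rule FST."""
    symbols = re.findall(r"\+[A-Z]+|.", analysis)  # "+PL" etc. = one symbol
    return {s for mid in lexicon.apply(symbols) for s in rule.apply(list(mid))}


def analyze(surface, lexicon, rule):
    """Surface -> analysis: the inverted FSTs, applied in reverse order."""
    inv_lexicon, inv_rule = lexicon.invert(), rule.invert()
    return {a for mid in inv_rule.apply(list(surface))
            for a in inv_lexicon.apply(list(mid))}


lexicon = build_lexicon(["kap", "pel"])  # hypothetical toy stems
rule = build_rule("adeklpt")             # hypothetical toy alphabet
print(generate("kap+N+PL", lexicon, rule))  # {'kapta'}
print(generate("pel+N+PL", lexicon, rule))  # {'pelda'}
print(analyze("kapta", lexicon, rule))      # {'kap+N+PL'}
```

Because the toy rule is obligatory, the form `kapda` gets no analysis: `analyze("kapda", lexicon, rule)` returns an empty set. Applying the inverted rule transducer on its own is ambiguous (surface `t` may come from `t` or from `D`). The inverted lexicon then filters out the intermediate candidates that no entry can produce.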
In recent years, transducer-based morphological parsers have been developed for a wide range of East Caucasian languages, including Tsez (Wilson & Howell 2022), Andi Proper (Buntiakova 2023), Zilo Andi (Moroz 2022), and Bagvalal (Ignatiev 2022), among others. Our talk will focus on building parsers for Avar and Bezhta Proper. We will discuss in detail the tools that can be used to create a morphological transducer, the difficulties one may encounter when computationally modeling the morphology and morphophonology of Nakh-Daghestanian languages, the projects already under way at the Higher School of Economics, and the future prospects for rule-based morphological parsers.

Date

23 April 16:00

Address

Б-421

About

Linguistic Convergence Laboratory