
Masha Krivolap, Maksim Melenchenko (HSE University): Predicting Shughni gender with machine learning

Event ended

Our study uses machine learning to investigate how various factors influence gender assignment in Shughni (an Eastern Iranian language). We trained several models to predict grammatical gender (feminine or masculine) on a dataset of 2,390 nouns from the Shughni-Russian dictionary. The models were trained on both semantic features (semantic classes and vectorized Russian definitions) and formal features (word endings and the last vowel of the stem). Our results show that semantics plays the primary role in gender assignment in Shughni: the proposed semantic features correctly predict the gender of ≈80% of nouns in our sample. Formal features seem less significant, correctly predicting the gender of only ≈70% of nouns in the dataset. The two types of predictors are highly correlated (especially for feminine gender), so combining them does not yield significantly better results.
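To make the idea of formal features concrete, here is a minimal Python sketch of how a word ending and the last vowel could be extracted and fed to a very simple baseline. It is not the authors' pipeline: the function and class names, the vowel set, the ending length, and the word list are all hypothetical, and the words are invented strings rather than Shughni nouns. The abstract does not say which models were trained, so a per-feature majority vote stands in for them.

```python
from collections import Counter, defaultdict

# Assumption: a placeholder vowel inventory, not the Shughni vowel system.
VOWELS = set("aeiou")


def formal_features(word, ending_len=2):
    """Return (ending, last_vowel) for a word; last_vowel is '' if none is found."""
    ending = word[-ending_len:]
    last_vowel = next((ch for ch in reversed(word) if ch in VOWELS), "")
    return ending, last_vowel


class MajorityByFeature:
    """Baseline: predict the most frequent label seen for each feature value."""

    def fit(self, features, labels):
        counts = defaultdict(Counter)
        for feature, label in zip(features, labels):
            counts[feature][label] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
        # Fallback for feature values never seen in training.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.table.get(f, self.default) for f in features]


def accuracy(predicted, gold):
    """Share of items whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Invented toy data (NOT real Shughni nouns); "f" = feminine, "m" = masculine.
train = [("kaba", "f"), ("tura", "f"), ("milok", "m"),
         ("sarud", "m"), ("nela", "f"), ("dorun", "m")]
test = [("pola", "f"), ("girud", "m")]


def last_vowel_of(word):
    return formal_features(word)[1]


model = MajorityByFeature().fit(
    [last_vowel_of(w) for w, _ in train],
    [g for _, g in train],
)
predictions = model.predict([last_vowel_of(w) for w, _ in test])
score = accuracy(predictions, [g for _, g in test])
```

On real data, the same accuracy measure computed separately for semantic features, for formal features, and for both together would give the kind of comparison the abstract reports.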