Oxana Goncharova (Pyatigorsk State University)
Emotion Recognition in Bilingual Speech: A Comprehensive Deep Learning-Based Method
This study explores emotion recognition in bilingual speech through a comparative analysis of machine learning (ML) and deep learning (DL) techniques. We first implemented a hybrid framework that combines Mel-frequency cepstral coefficients (MFCCs) with prosodic features (e.g., pitch, intensity, speech rate) and conventional ML algorithms. While preliminary results were encouraging, the approach suffered from overfitting and limited robustness to minor data variations. To overcome these limitations, we propose a deep learning architecture that integrates a CNN-based autoencoder with an embedding network. Experimental evaluations demonstrate a significant improvement in performance metrics over the traditional methods, highlighting the potential of multimodal frameworks for emotion analysis in bilingual speech.
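To illustrate the kind of hybrid feature vector described above, the following is a minimal sketch, not the pipeline used in the study. It assumes that frame-level MFCCs have already been computed by some external tool and are passed in as `mfcc_frames`. The function names, the 50 ms frame length, the 50-400 Hz pitch search range, and the autocorrelation pitch estimator are all illustrative assumptions; speech rate is omitted because the abstract does not say how it was measured.

```python
import math
from statistics import mean, pstdev


def rms_intensity(frame):
    """Root-mean-square energy of one frame, used as a simple intensity proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate F0 in Hz as the lag with the largest autocorrelation.

    Returns 0.0 when no lag in the search range has positive autocorrelation.
    The search range (fmin, fmax) is an assumption, not a value from the study.
    """
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lo, hi + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def split_frames(signal, size, hop):
    """Cut the signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def hybrid_features(signal, sample_rate, mfcc_frames, frame_ms=50, hop_ms=25):
    """Concatenate per-coefficient MFCC means with three prosodic statistics.

    Output layout: [mean MFCC_0 .. mean MFCC_k, mean F0, F0 std, mean RMS].
    `mfcc_frames` is a list of per-frame MFCC vectors computed elsewhere.
    """
    size = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = split_frames(signal, size, hop)
    if not frames or not mfcc_frames:
        raise ValueError("signal shorter than one frame or no MFCC frames given")

    pitches = [autocorr_pitch(f, sample_rate) for f in frames]
    voiced = [p for p in pitches if p > 0.0]  # drop frames with no pitch found
    prosody = [
        mean(voiced) if voiced else 0.0,
        pstdev(voiced) if voiced else 0.0,
        mean(rms_intensity(f) for f in frames),
    ]

    n_coef = len(mfcc_frames[0])
    mfcc_mean = [mean(row[k] for row in mfcc_frames) for k in range(n_coef)]
    return mfcc_mean + prosody
```

A vector of this form could then be passed to any conventional classifier. Summarising each utterance by a handful of statistics discards temporal detail, which may be one reason such pipelines are sensitive to small variations in the data.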