Beyond Monolingual Assumptions: A Survey of Code-Switched NLP in the Era of Large Language Models across Modalities
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Despite rapid advances in large language models (LLMs), most LLMs still struggle with mixed-language inputs, limited code-switching (CSW) datasets, and evaluation biases, all of which hinder their deployment in multilingual societies. This survey provides the first comprehensive analysis of CSW-aware LLM research, reviewing 327 studies spanning five research areas, 15+ NLP tasks, 30+ datasets, and 80+ languages. We categorize recent advances by architecture, training strategy, and evaluation methodology, outlining how LLMs have reshaped CSW modeling and identifying the challenges that persist. The paper concludes with a roadmap that emphasizes the need for inclusive datasets, fair evaluation, and linguistically grounded models to achieve truly multilingual capabilities. The accompanying repository is at https://github.com/lingo-iitgn/awesome-code-mixing/.
fields
cs.CL 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
- Code Mixologist: A Practitioner's Guide to Building Code-Mixed LLMs
A survey that unifies prior code-switching research for LLMs into a taxonomy of data, modeling, and evaluation and distills it into actionable recommendations for practitioners.