Beyond Monolingual Assumptions: A Survey of Code-Switched NLP in the Era of Large Language Models across Modalities

2025 · cs.CL · arXiv 2510.07037

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Despite rapid advances, most large language models (LLMs) still struggle with mixed-language inputs, limited code-switching (CSW) datasets, and evaluation biases, which hinder their deployment in multilingual societies. This survey provides the first comprehensive analysis of CSW-aware LLM research, reviewing 327 studies spanning five research areas, 15+ NLP tasks, 30+ datasets, and 80+ languages. We categorize recent advances by architecture, training strategy, and evaluation methodology, outlining how LLMs have reshaped CSW modeling and identifying the challenges that persist. The paper concludes with a roadmap that emphasizes the need for inclusive datasets, fair evaluation, and linguistically grounded models to achieve truly multilingual capabilities. Repository: https://github.com/lingo-iitgn/awesome-code-mixing/

representative citing papers

Code Mixologist: A Practitioner's Guide to Building Code-Mixed LLMs

cs.CL · 2026-01-21 · unverdicted · novelty 5.0

A survey that unifies prior code-switching research for LLMs into a taxonomy of data, modeling, and evaluation and distills it into actionable recommendations for practitioners.

citing papers explorer

Showing 1 of 1 citing paper.

Code Mixologist: A Practitioner's Guide to Building Code-Mixed LLMs
cs.CL · 2026-01-21 · unverdicted · none · ref 25 · internal anchor
