MuRIL: Multilingual Representations for Indian Languages
6 Pith papers cite this work.
fields
cs.CL 6
citing papers explorer
- BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources
  A unified survey that consolidates Indian NLP resources by task, language, domain, and modality while identifying gaps in coverage and generalization.
- Human-Centered Supervision for Sentiment Analysis in Telugu: A Systematic Inquiry Beyond Accuracy
  Human rationales in supervision for Telugu sentiment analysis improve model alignment with human reasoning and often produce gains in predictive performance.
- KS-PRET-5M: A 5 Million Word, 12 Million Token Kashmiri Pretraining Dataset
  KS-PRET-5M is a newly released 5.09 million word Kashmiri pretraining dataset containing 12.13 million subword tokens after MuRIL tokenization, made available as a continuous text stream under CC BY 4.0.
- Mitigating Extrinsic Gender Bias for Bangla Classification Tasks
  Constructs gender-perturbed Bangla classification benchmarks and proposes RandSymKL debiasing that reduces extrinsic gender bias in pretrained models.
- MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization
  A language-adaptive combination of generalist, specialist, and ensemble transformer models achieves 0.796 macro F1 and 0.826 accuracy on multilingual polarization detection across 22 languages.
- Scripts Through Time: A Survey of the Evolving Role of Transliteration in NLP
  A survey that taxonomizes motivations for transliteration in cross-lingual NLP, reviews incorporation approaches and their evolution, analyzes trade-offs in settings like code-mixing and language families, and offers implementation recommendations.