A Pāṇinian Foundation for Indic Language Processing

Lav R. Varshney; Ritwik Banerjee

arXiv: 2606.24172 · v1 · pith:PKHYXDD5new · submitted 2026-06-23 · 💻 cs.CL · cs.AI · cs.LG


Pith reviewed 2026-06-26 00:37 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG

keywords Indic languages · Pāṇini · Aṣṭādhyāyī · natural language processing · morphosyntax · benchmarks · computational linguistics · Sanskrit


The pith

Indic languages share a Pāṇinian morphosyntactic architecture that can unify their NLP systems across genealogical lines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that more than two millennia of convergence around Sanskrit produced a shared morphosyntactic architecture in Indic languages, formalized in Pāṇini's Aṣṭādhyāyī. Current NLP practice ignores this regularity by building separate tools and benchmarks for individual languages or small families. Grounding future work in the Pāṇinian framework would therefore produce more accurate, data-efficient, and transferable systems while merging sparse resources into one high-resource bedrock. A four-part benchmark suite is proposed to make the shared architecture explicit and measurable. The work also flags an open question for interpretability research about whether neural models trained on these languages acquire Pāṇini's categories without explicit supervision.

Core claim

Through more than two millennia of convergence around Sanskrit, Indic languages came to share a morphosyntactic architecture formalized in Pāṇini's grammar, the Aṣṭādhyāyī. This cuts across genealogical lines, uniting languages through a common framework. We argue that this Pāṇinian framework supplies a unifying computational architecture the field has lacked, and that benchmarks grounded explicitly in it would make Indic language systems more accurate, more data-efficient, and more transferable, effectively merging many apparently disparate and sparse Indic language resources into a single high-resource metalanguage bedrock.

What carries the argument

The morphosyntactic architecture formalized in Pāṇini's Aṣṭādhyāyī, which supplies the shared computational structure across Indic languages.

If this is right

Indic NLP systems would become more accurate by exploiting the shared structure.
Systems would require less training data because resources from many languages could be pooled.
Transfer across languages would improve because the architecture is common rather than language-specific.
Disparate and sparse datasets would merge into a single high-resource metalanguage bedrock.
A four-part benchmark suite would render the architecture explicit and ready for practical use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Interpretability techniques could be used to test whether trained models have independently discovered Pāṇini's categories.
The same logic might apply to other language groups that share an ancient grammatical tradition even if the paper does not explore them.
If the benchmarks succeed, they could serve as a template for unifying resources in other fragmented NLP domains.
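As a hedged illustration of the first extension above, the sketch below shows what a minimal probing experiment could look like. It is not from the paper: the vectors are synthetic stand-ins for model hidden states, the three role labels are a hypothetical subset of the kāraka inventory, and a nearest-centroid classifier is swapped in for the linear probe a real study would more likely use. All function names are illustrative assumptions.

```python
import random
from collections import Counter

# Hypothetical label set: three kāraka roles, used only for illustration.
ROLES = ["karta", "karma", "karana"]


def synthetic_states(n_per_role, dim=4, noise=0.5, seed=0):
    """Stand-in for model hidden states: one Gaussian cluster per role.

    A real experiment would replace this with activations extracted from a
    trained model on kāraka-annotated text.
    """
    rng = random.Random(seed)
    data = []
    for k, role in enumerate(ROLES):
        for _ in range(n_per_role):
            vec = [rng.gauss(3.0 if d == k else 0.0, noise) for d in range(dim)]
            data.append((vec, role))
    rng.shuffle(data)
    return data


def fit_centroids(train):
    """Average the vectors of each role to get one centroid per role."""
    sums, counts = {}, Counter()
    for vec, role in train:
        total = sums.setdefault(role, [0.0] * len(vec))
        for d, x in enumerate(vec):
            total[d] += x
        counts[role] += 1
    return {r: [s / counts[r] for s in sums[r]] for r in sums}


def predict(centroids, vec):
    """Return the role whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda r: sq_dist(centroids[r]))


def probe_accuracy(train, test):
    """Fit centroids on train, then score role predictions on test."""
    centroids = fit_centroids(train)
    hits = sum(predict(centroids, vec) == role for vec, role in test)
    return hits / len(test)


def majority_baseline(train, test):
    """Accuracy of always guessing the most frequent training role."""
    top_role, _ = Counter(role for _, role in train).most_common(1)[0]
    return sum(role == top_role for _, role in test) / len(test)


data = synthetic_states(n_per_role=60)
train, test = data[:120], data[120:]
acc = probe_accuracy(train, test)
base = majority_baseline(train, test)
print(f"probe accuracy={acc:.2f}  majority baseline={base:.2f}")
```

On real activations, a probe accuracy no better than the majority baseline would suggest the category is not encoded in an easily decodable form. A fuller study would also add a control task with shuffled labels to separate what the representation encodes from what the probe itself can memorize.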

Load-bearing premise

The morphosyntactic architecture formalized in Pāṇini's Aṣṭādhyāyī actually cuts across genealogical lines in a manner that is directly actionable and beneficial for contemporary neural NLP models and benchmarks.

What would settle it

Empirical results showing that benchmarks and models built on Pāṇinian categories produce no gains in accuracy, data efficiency, or cross-language transfer compared with existing language-specific or family-specific approaches would falsify the central claim.
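The comparison described above can be sketched as a paired, per-language check. This is a minimal sketch under stated assumptions, not a protocol from the paper: the scores are invented placeholders in percentage points, the language codes and the one-point margin are hypothetical, and the names `paired_gains` and `summarize` are illustrative.

```python
from statistics import mean

# Placeholder scores in percentage points, invented purely to exercise the
# code. They are NOT results from the paper or from any real system.
baseline = {"hi": 70, "bn": 65, "ta": 60, "te": 62}
paninian = {"hi": 70, "bn": 64, "ta": 60, "te": 63}


def paired_gains(baseline_scores, candidate_scores):
    """Per-language score difference (candidate minus baseline)."""
    shared = sorted(set(baseline_scores) & set(candidate_scores))
    return {lang: candidate_scores[lang] - baseline_scores[lang] for lang in shared}


def summarize(gains, min_gain=1):
    """Report the mean gain and which languages clear a minimum margin."""
    mean_gain = mean(gains.values())
    return {
        "mean_gain": mean_gain,
        "improved": [lang for lang, g in gains.items() if g >= min_gain],
        "no_gain": mean_gain < min_gain,
    }


gains = paired_gains(baseline, paninian)
report = summarize(gains)
print(report)
```

The same check would be run once per axis (accuracy, data efficiency, cross-language transfer). A real evaluation would add a significance test over many more languages before treating `no_gain` as falsifying evidence.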

Original abstract

More than a billion people communicate in Indic languages, yet the natural language processing infrastructure serving them remains fragmented and underdeveloped. The cause is structural: the field organizes its tools and benchmarks around individual languages or small subsets of genealogical language families, building separate analyzers, parsers, and datasets for each language and starting over for the next. This overlooks a deep regularity. Through more than two millennia of convergence around Sanskrit, Indic languages came to share a morphosyntactic architecture formalized in Pāṇini's grammar, the Aṣṭādhyāyī. This cuts across genealogical lines, uniting languages through a common framework. We argue that this Pāṇinian framework supplies a unifying computational architecture the field has lacked, and that benchmarks grounded explicitly in it would make Indic language systems more accurate, more data-efficient, and more transferable, effectively merging many apparently disparate and sparse Indic language resources into a single high-resource metalanguage bedrock. We propose a four-part benchmark suite to render this shared architecture explicit, measurable, and ready to be leveraged for practical applications. Moreover, we underscore the question it raises for interpretability research: whether neural models trained on these languages come to represent Pāṇini's categories on their own.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Position paper proposes Pāṇinian benchmarks for Indic NLP but supplies no mappings, definitions, or evidence for the claimed gains.


The paper proposes a Pāṇinian framework as a unifying computational architecture for Indic language processing and suggests a four-part benchmark suite, but it provides no data, mappings, or experiments to demonstrate any benefit.

What the paper does well is identify the structural cause of fragmentation in Indic NLP tools and point to the historical role of Sanskrit grammar in creating shared morphosyntactic features across languages. This observation about convergence cutting across genealogical lines is accurate and could be a useful starting point for thinking about cross-lingual methods.

The soft spots are clear and central. The manuscript does not include any formal mapping of Pāṇini's rules, such as kāraka or samāsa, to differentiable components or probing tasks in neural models. The benchmark suite is proposed but left undefined, with no schematic details on what the four parts would measure. As a result, the assertions about improved accuracy, data efficiency, and transferability are not backed by evidence and remain speculative.

The paper raises an interesting question for interpretability research regarding whether models learn these categories implicitly, but again offers no analysis.

This work is aimed at researchers in NLP who focus on Indic languages or who are interested in incorporating linguistic theory into benchmark design. A reader seeking new empirical findings or implemented systems will not find them, but someone looking for conceptual reorganization might see value in the discussion.

It deserves a serious referee because the core idea has enough potential to warrant feedback on how to make it actionable.

My recommendation is to send it to peer review, expecting the authors to add concrete examples or definitions in response to comments.

Referee Report

2 major / 1 minor

Summary. The paper argues that Pāṇini's Aṣṭādhyāyī formalizes a morphosyntactic architecture shared across Indic languages through historical convergence around Sanskrit, supplying a unifying computational framework that the field has lacked. It claims that benchmarks explicitly grounded in this framework would yield more accurate, data-efficient, and transferable Indic NLP systems by merging disparate resources into a single high-resource metalanguage bedrock, and proposes a four-part benchmark suite to render the architecture explicit and measurable while raising questions about neural model interpretability of Pāṇinian categories.

Significance. If the proposed framework can be operationalized with concrete mappings and benchmarks that deliver measurable gains, the work could address fragmentation in Indic NLP by providing a cross-lingual unifying architecture, potentially improving transfer and efficiency for over a billion speakers. The emphasis on historical grammatical convergence as a computational resource is a novel angle for low-resource language processing.

major comments (2)

[Abstract] The claim that 'benchmarks grounded explicitly in it would make Indic language systems more accurate, more data-efficient, and more transferable' is load-bearing but unsupported, as the manuscript provides neither a formal mapping of any sūtra (e.g., a kāraka or samāsa rule) to a differentiable loss term, embedding objective, or probing task, nor even a schematic definition of the four-part benchmark suite.
[Abstract] The assertion that the Pāṇinian architecture 'cuts across genealogical lines' in a manner 'directly actionable' for neural models is stated as a deep regularity but lacks any concrete examples from distinct language families (e.g., Indo-Aryan vs. Dravidian) showing how specific grammatical categories translate into shared neural objectives or evaluation metrics.

minor comments (1)

[Abstract] The manuscript refers to 'a four-part benchmark suite' without naming or outlining the parts, which reduces clarity of the proposal even at a high level.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. The manuscript is a position paper that identifies a unifying Pāṇinian architecture and proposes a benchmark direction rather than delivering implemented mappings or results. We respond to each major comment below.


Referee: [Abstract] The claim that 'benchmarks grounded explicitly in it would make Indic language systems more accurate, more data-efficient, and more transferable' is load-bearing but unsupported, as the manuscript provides neither a formal mapping of any sūtra (e.g., a kāraka or samāsa rule) to a differentiable loss term, embedding objective, or probing task, nor even a schematic definition of the four-part benchmark suite.

Authors: We agree the abstract states a forward-looking claim without implementation details. This paper introduces the conceptual framework and the existence of the four-part benchmark suite as a proposal for the community; it does not claim to have derived differentiable objectives or executed the mappings. To make the proposal more actionable, we will add a schematic outline of the four-part benchmark suite in the revised manuscript.

Revision: partial

Referee: [Abstract] The assertion that the Pāṇinian architecture 'cuts across genealogical lines' in a manner 'directly actionable' for neural models is stated as a deep regularity but lacks any concrete examples from distinct language families (e.g., Indo-Aryan vs. Dravidian) showing how specific grammatical categories translate into shared neural objectives or evaluation metrics.

Authors: The manuscript grounds the cross-family claim in the historical convergence around Sanskrit that produced shared morphosyntactic categories. While the abstract is concise, the body discusses these shared features. We will strengthen the abstract and add explicit cross-family examples (e.g., kāraka alignment in Indo-Aryan and Dravidian languages) to illustrate actionability for shared objectives.

Revision: yes
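To make the kind of cross-family example under discussion concrete, here is a small sketch of a shared kāraka label space applied to one Indo-Aryan and one Dravidian sentence. It is an editorial illustration, not taken from the paper: the class names, the English glosses, and the two romanized sentences (both meaning roughly "Ram cut the fruit with a knife") are assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Karaka(Enum):
    """The six kāraka relations of the Pāṇinian tradition (glosses approximate)."""
    KARTA = "agent"
    KARMA = "patient / object"
    KARANA = "instrument"
    SAMPRADANA = "recipient"
    APADANA = "source"
    ADHIKARANA = "locus"


@dataclass(frozen=True)
class Argument:
    form: str       # surface form, romanized, including its case marking
    karaka: Karaka  # language-independent role label


@dataclass(frozen=True)
class Example:
    lang: str
    family: str
    verb: str
    args: tuple


# Hypothetical annotations written for this sketch.
EXAMPLES = [
    Example("Hindi", "Indo-Aryan", "kāṭā", (
        Argument("Rām ne", Karaka.KARTA),
        Argument("cākū se", Karaka.KARANA),
        Argument("phal", Karaka.KARMA),
    )),
    Example("Tamil", "Dravidian", "veṭṭiṉāṉ", (
        Argument("Rāmaṉ", Karaka.KARTA),
        Argument("kattiyāl", Karaka.KARANA),
        Argument("paḻattai", Karaka.KARMA),
    )),
]


def role_frame(example):
    """The set of kāraka roles a sentence realizes, ignoring surface marking."""
    return frozenset(arg.karaka for arg in example.args)


frames = {ex.lang: role_frame(ex) for ex in EXAMPLES}
same_frame = frames["Hindi"] == frames["Tamil"]
print(same_frame)
```

The design point is that the surface markers differ (a postposition in Hindi, a case suffix in Tamil) while the role labels coincide, which is what would let annotations from both languages feed a single evaluation metric or training objective.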

Circularity Check

0 steps flagged

No significant circularity; proposal is conceptual argument without self-referential derivations


The manuscript advances a conceptual proposal that Pāṇinian morphosyntax supplies a shared architecture across Indic languages and that benchmarks grounded in it would yield accuracy and transfer gains. No equations, fitted parameters, predictions, or derivations appear in the provided text. The central claim is an argument from historical convergence rather than a reduction of any output quantity to an input defined by the paper itself. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is presented as a derivation. The proposal therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that Indic languages share a computationally useful morphosyntactic architecture from the Aṣṭādhyāyī that has been overlooked by current per-language tooling.

axioms (1)

Domain assumption: Indic languages share a morphosyntactic architecture formalized in Pāṇini's Aṣṭādhyāyī that cuts across genealogical lines.
Presented in the abstract as the deep regularity that current fragmented approaches overlook.

invented entities (1)

Pāṇinian framework as unifying computational architecture and metalanguage bedrock (no independent evidence)
Purpose: to serve as the basis for new benchmarks that merge disparate Indic language resources.
Introduced in the abstract as the missing unifying structure without prior computational validation or independent evidence cited.


Reference graph

Works this paper leans on

61 extracted references · 26 canonical work pages

[1]

Amitav Acharya. 2013. Civilizations in Embrace: The Spread of Ideas and the Transformation of Power : India and Southeast Asia in the Classical Age . Institute of Southeast Asian Studies, Singapore

2013
[2]

Annamalai

E. Annamalai. 2024. The Sanskrit Paradigm of Tamil Grammar: Embrace and Resistance. Bhasha 3, 1 (April 2024), 1–16. doi:10.30687/bhasha/2785-5953/2024/01/002 A Pāṇinian Foundation for Indic Language Processing 13

work page doi:10.30687/bhasha/2785-5953/2024/01/002 2024
[3]

Niyati Bafna, Cristina España-Bonet, Josef Van Genabith, Benoît Sagot, and Rachel Bawden. 2023. Cross-lingual Strategies for Low-resource Language Modeling: A Study on Five Indic Dialects. In Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux – article...

2023
[4]

Varshney

Razan Baltaji, Saurabh Pujar, Martin Hirzel, Louis Mandel, Luca Buratti, and Lav R. Varshney. 2025. Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study. Transactions on Machine Learning Research 2025, June (2025), 26 pages. https://openreview.net/forum?id=1PRBHKgQVM

2025
[5]

Tamali Banerjee and Pushpak Bhattacharyya. 2018. Meaningless yet meaningful: Morphology grounded subword- level NMT. In Proceedings of the Second Workshop on Subword/Character LEvel Models , Manaal Faruqui, Hinrich Schütze, Isabel Trancoso, Yulia Tsvetkov, and Yadollah Yaghoobzadeh (Eds.). Association for Computational Linguis- tics, New Orleans, 55–60. d...

work page doi:10.18653/v1/w18-1207 2018
[6]

Ajitesh Bankula and Praney Bankula. 2025. Cross-Linguistic Transfer in Multilingual NLP: The Role of Language Families and Morphology. arXiv: 2505.13908

arXiv 2025
[7]

Ramakrishna- macharyulu

Akshar Bharati, Rajeev Sangal, Vineet Chaitanya, Amba Kulkarni, Dipti Misra Sharma, and K.V. Ramakrishna- macharyulu. 2002. AnnCorra: Building Tree-banks in Indian Languages. In COLING-02: The 3rd Workshop on Asian Language Resources and International Standardization . Association for Computational Linguistics, Taipei, Taiwan, 8 pages. https://aclantholog...

2002
[8]

Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Sharma, and Fei Xia. 2009. A Multi- Representational and Multi-Layered Treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Work- shop (LA W III), Manfred Stede, Chu-Ren Huang, Nancy Ide, and Adam Meyers (Eds.). Association for Computational Linguistics, Suntec, Sing...

2009
[9]

Soham Bhattacharjee, Mukund K. Roy, Yathish Poojary, Bhargav Dave, Mihir Raj, Vandan Mujadia, Baban Gain, Pruthwik Mishra, Arafat Ahsan, Parameswari Krishnamurthy, Ashwath Rao, Gurpreet Singh Josan, Preeti Dubey, Aadil Amin Kak, Anna Rao Kulkarni, Narendra V. G., Sunita Arora, Rakesh Balbantray, Prasenjit Majumdar, Karunesh K. Arora, Asif Ekbal, and Dipti...

arXiv 2025
[10]

Norman Blake. 1996. A History of the English Language . New York University Press, New York

1996
[11]

Maharaj Brahma, N J Karthika, Atul Singh, Devaraj Adiga, Smruti Bhate, Ganesh Ramakrishnan, Rohit Saluja, and Maunendra Sankar Desarkar. 2025. MorphTok: Morphologically Grounded Tokenization for Indian Languages . arXiv:2504.10335

arXiv 2025
[12]

Jannik Brinkmann, Chris Wendler, Christian Bartelt, and Aaron Mueller. 2025. Large Language Models Share Rep- resentations of Latent Grammatical Concepts Across Typologically Diverse Languages. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume ...

work page doi:10.18653/v1/2025.naacl-long.312 2025
[13]

Chang, Catherine Arnett, Zhuowen Tu, and Ben Bergen

Tyler A. Chang, Catherine Arnett, Zhuowen Tu, and Benjamin K. Bergen. 2024. When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguisti...

work page doi:10.18653/v1/2024.emnlp-main.236 2024
[14]

Suniti Kumar Chatterji. 1926. The Origin and Development of the Bengali Language. Calcutta University Press, Calcutta, India

1926
[15]

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , Dan Jurafsky, Joyce Chai, N...

work page doi:10.18653/v1/2020.acl-main.747 2020
[16]

Alexis Conneau, Shijie Wu, Haoran Li, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Emerging Cross-lingual Struc- ture in Pretrained Language Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Lin- guistics,...

work page doi:10.18653/v1/2020.acl-main.536 2020
[17]

and Nivre, Joakim and Zeman, Daniel , year = 2021, month = may, journal =

Marie-Catherine de Marneffe, Christopher D. Manning, Joakim Nivre, and Daniel Zeman. 2021. Universal Dependen- cies. Computational Linguistics 47, 2 (July 2021), 255–308. doi:10.1162/coli_a_00402

work page doi:10.1162/coli_a_00402 2021
[18]

Murray B. Emeneau. 1956. India as a Linguistic Area. Language 32, 1 (1956), 3–16. doi:10.2307/410649

work page doi:10.2307/410649 1956
[19]

Pawan Goyal and Gerard Huet. 2016. Design and analysis of a lean interface for Sanskrit corpus annotation. Journal of Language Modelling 4, 2 (2016), 145–182. doi:10.15398/jlm.v4i2.108 14 Ritwik Banerjee and Lav R. Varshney

work page doi:10.15398/jlm.v4i2.108 2016
[20]

Oliver Hellwig. 2016. Improving the Morphological Analysis of Classical Sanskrit. In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016) , Dekai Wu and Pushpak Bhattacharyya (Eds.). The COLING 2016 Organizing Committee, Osaka, Japan, 142–151. https://aclanthology.org/W16-3715/

2016
[21]

Oliver Hellwig and Erica Biagetti. 2025. The Sanskrit Sembank. Language Resources and Evaluation 59 (2025), 3635–

2025
[22]

doi:10.1007/s10579-025-09852-1

work page doi:10.1007/s10579-025-09852-1
[23]

Gérard Huet. 2005. A Functional Toolkit for Morphological and Phonological Processing, Application to a Sanskrit Tagger. Journal of Functional Programming 15, 4 (2005), 573–614. doi:10.1017/S0956796804005416

work page doi:10.1017/s0956796804005416 2005
[24]

Minyoung Huh, Brian Cheung, Tongzhou Wang, and Phillip Isola. 2024. Position: The Platonic Representation Hy- pothesis. In Proceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Re- search, Vol. 235), Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Feli...

2024
[25]

Pānini-Backus Form

Peter Zilahy Ingerman. 1967. “Pānini-Backus Form” suggested. Commun. ACM 10, 3 (March 1967), 137. doi:10.1145/ 363162.363165

arXiv 1967
[26]

Girish Nath Jha. 2010. The TDIL Program and the Indian Langauge Corpora Intitiative (ILCI). In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias (Eds.). European Lan- guage Reso...

2010
[27]

Braj B. Kachru. 1992. The other tongue: English across cultures (2 ed.). University of Illinois Press, Urbana, Illinois

1992
[28]

Braj B. Kachru. 1992. World Englishes: approaches, issues and resources. Language Teaching 25, 1 (1992), 1–14. doi:10.1017/S0261444800006583

work page doi:10.1017/s0261444800006583 1992
[29]

Khapra, and Pratyush Kumar

Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, Gokul N.C., Avik Bhattacharyya, Mitesh M. Khapra, and Pratyush Kumar. 2020. IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020 , Trevor Cohn, Yulan He, and Y...

2020
[30]

N. J. Karthika, Maharaj Brahma, Rohit Saluja, Ganesh Ramakrishnan, and Maunendra Sankar Desarkar. 2025. Multi- lingual Tokenization through the Lens of Indian Languages: Challenges and Insights . arXiv: 2506.17789 [cs.CL]

Pith/arXiv arXiv 2025
[31]

Robert D. King. 2006. The Poisonous Potency of Script: Hindi and Urdu. International Journal of the Sociology of Language 2001, 150 (2006), 43–59. doi:10.1515/ijsl.2001.035

work page doi:10.1515/ijsl.2001.035 2006
[32]

Amrith Krishna, Pavan Kumar Satuluri, and Pawan Goyal. 2017. A Dataset for Sanskrit Word Segmentation. In Pro- ceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, and Stan Sz- pakowicz (Ed...

work page doi:10.18653/v1/w17-2214 2017
[33]

Sriram Krishnan and Amba Kulkarni. 2019. Sanskrit Segmentation revisited. In Proceedings of the 16th International Conference on Natural Language Processing, Dipti Misra Sharma and Pushpak Bhattacharya (Eds.). NLP Association of India, International Institute of Information Technology, Hyderabad, India, 105–114. https://aclanthology.org/2019. icon-1.12/

2019
[34]

Sriram Krishnan, Amba Kulkarni, and Gérard Huet. 2023. Validation and Normalization of DCS corpus and Develop- ment of the Sanskrit Heritage Engine’s Segmenter. In Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference , Amba Kulkarni and Oliver Hellwig (Eds.). Association for Comput...

2023
[35]

Anoop Kunchukuttan and Pushpak Bhattacharyya. 2022. Machine Translation and Transliteration involving Related and Low-Resource Languages. CRC Press, Boca Raton, USA and Abingdon, UK

2022
[36]

David B. Lurie. 2023. The Vernacular in the World of Wen: Sheldon Pollock’s Model in East Asia? In Cosmopolitan and Vernacular in the World of Wen 文, Ross King (Ed.). Language, Writing and Literary Culture in the Sinographic Cosmopolis, Vol. 5. Brill, Leiden, The Netherlands, 49–68. doi:10.1163/9789004529441_003

work page doi:10.1163/9789004529441_003 2023
[37]

Anand Mishra. 2009. Simulating the Pāṇinian System of Sanskrit Grammar. In Sanskrit Computational Linguistics , Gérard Huet, Amba Kulkarni, and Peter Scharf (Eds.). Lecture Notes in Computer Science, Vol. 5402. Springer, Berlin, 127–138. doi:10.1007/978-3-642-00155-0_4

work page doi:10.1007/978-3-642-00155-0_4 2009
[38]

2025.BhashaVerse : Translation Ecosystem for Indian Subcontinent Languages

Vandan Mujadia and Dipti Misra Sharma. 2025.BhashaVerse : Translation Ecosystem for Indian Subcontinent Languages. arXiv:2412.04351

arXiv 2025
[39]

Karthika N J, Krishnakant Bhatt, Ganesh Ramakrishnan, and Preethi Jyothi. 2025. LEVOS: Leveraging Vocabulary Overlap with Sanskrit to Generate Technical Lexicons in Indian Languages. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025) , Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein,...

work page doi:10.18653/v1/2025.bea-1.20 2025
[40]

Arijit Nag, Bidisha Samanta, Animesh Mukherjee, Niloy Ganguly, and Soumen Chakrabarti. 2023. Transfer Learning for Low-Resource Multilingual Relation Classification. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 22, 2, Article 50 (March 2023), 24 pages. doi:10.1145/3554734

work page doi:10.1145/3554734 2023
[41]

Sebastian Nehrdich, Oliver Hellwig, and Kurt Keutzer. 2024. One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit NLP Tasks. In Findings of the Association for Computational Linguistics: EMNLP 2024 , Yaser Al- Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, Miami, Florida, USA, 13742–13751. doi:...

work page doi:10.18653/v1/2024.findings-emnlp.805 2024
[42]

NLLB Team. 2024. Scaling neural machine translation to 200 languages. Nature 630 (2024), 841–846. doi:10.1038/ s41586-024-07335-x

2024
[43]

Riya Pal and Dipti Sharma. 2019. Towards Automated Semantic Role Labelling of Hindi-English Code-Mixed Tweets. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), Wei Xu, Alan Ritter, Tim Baldwin, and Afshin Rahimi (Eds.). Association for Computational Linguistics, Hong Kong, China, 291–296. doi:10.18653/v1/D19- 5538

work page doi:10.18653/v1/d19- 2019
[44]

Martha Palmer, Rajesh Bhatt, Bhuvana Narasimhan, Owen Rambow, Dipti Misra Sharma, and Fei Xia. 2009. Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure. In Proceedings of the 7th International Conference on Natural Language Processing (ICON) . Macmillan Publishers, Hyderabad, India, 259– 268

2009
[45]

Martha Palmer, Paul Kingsbury, and Daniel Gildea. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics 31, 1 (2005), 71–106. doi:10.1162/0891201053630264

work page doi:10.1162/0891201053630264 2005
[46]

Priyaranjan Pattnayak, Hitesh Patel, and Amit Agarwal. 2025. Tokenization Matters: Improving Zero-Shot NER for Indic Languages. In 2025 IEEE International Conference on Electro Information Technology (eIT) . IEEE, Valparaiso, Indiana, USA, 456–462. doi:10.1109/eIT64391.2025.11103625

work page doi:10.1109/eit64391.2025.11103625 2025
[47]

Siddhesh Pawar, Pushpak Bhattacharyya, and Partha Talukdar. 2023. Evaluating Cross Lingual Transfer for Mor- phological Analysis: a Case Study of Indian Languages. In Proceedings of the 20th SIGMORPHON workshop on Com- putational Research in Phonetics, Phonology, and Morphology , Garrett Nicolai, Eleanor Chodroff, Frederic Mailhot, and Çağrı Çöltekin (Eds...

work page doi:10.18653/v1/2023 2023
[48]

Gerald Penn and Paul Kiparsky. 2012. On Pāṇini and the Generative Capacity of Contextualized Replacement Sys- tems. In Proceedings of COLING 2012: Posters , Martin Kay and Christian Boitet (Eds.). The COLING 2012 Organizing Committee, Mumbai, India, 943–950. https://aclanthology.org/C12-2092

2012
[49]

Sheldon I. Pollock. 2000. Cosmopolitan and Vernacular in History. Public Culture 12, 3 (2000), 591–625. Project MUSE. https://muse.jhu.edu/article/26221

2000
[50]

Pooja Rai, Ayan Das, and Sanjay Chatterji. 2025. Mapping of the Nepali Dependency Treebank to Universal Depen- dencies. ACM Trans. Asian Low-Resour. Lang. Inf. Process.24, 11, Article 132 (Nov. 2025), 22 pages. doi:10.1145/3749643

work page doi:10.1145/3749643 2025
[51]

Vinit Ravishankar. 2017. A Universal Dependencies Treebank for Marathi. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories , Jan Hajič (Ed.). Association for Computational Linguistics, Prague, Czech Republic, 190–200. https://aclanthology.org/W17-7623

2017
[52]

Rajendran, K

Baskaran Sankaran, Kalika Bali, Monojit Choudhury, Tanmoy Bhattacharya, Pushpak Bhattacharyya, Girish Nath Jha, S. Rajendran, K. Saravanan, L. Sobha, and K.V. Subbarao. 2008. A Common Parts-of-Speech Tagset Framework for Indian Languages. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Nicoletta Calzola...

2008
[53]

P., Brejesh Lall, and Shresth Mehta

Arun Kumar Singh, Sushant Dave, Prathosh A. P., Brejesh Lall, and Shresth Mehta. 2020. A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns Analysis . arXiv: 2010.12937

arXiv 2020
[54]

Abhishek Kumar Singh, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen, Ashish Mittal, and Ganesh Ramakrishnan
[55]

In Findings of the Association for Computational Linguistics: NAACL 2025 , Luis Chiruzzo, Alan Ritter, and Lu Wang (Eds.)

INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages. In Findings of the Association for Computational Linguistics: NAACL 2025 , Luis Chiruzzo, Alan Ritter, and Lu Wang (Eds.). Association for Computational Linguistics, Albuquerque, New Mexico, 2607–2626. doi:10.18653/ v1/2025.findings-naacl.141

2025
[56]

Juhi Tandon, Himani Chaudhary, Riyaz Ahmad Bhat, and Dipti Misra Sharma. 2016. Conversion from Pāṇinian Karakas to Universal Dependencies for Hindi Dependency Treebank. In Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LA W-X 2016), Annemarie Friedrich and Katrin Tomanek (Eds.). Associ- ation for Computational Li...

work page doi:10.18653/v1/w16-1716 2016
[57]

Sarah Grey Thomason. 2000. Linguistic Areas and Language History. In Languages in Contact , D. G. Gilbers, J. Ner- bonne, and J. Schaeken (Eds.). Studies in Slavic and General Linguistics, Vol. 28. Brill, Leiden, The Netherlands, 311–

2000
[58]

Varshney

doi:10.1163/9789004488472_030 16 Ritwik Banerjee and Lav R. Varshney

work page doi:10.1163/9789004488472_030
[59]

Srisa Chandra Vasu. 1897. The Ashtādhyāyī of Pāṇini. Sindhu Charan Bose, Benares
[60]

Joshi, Aiman A

Devika Verma, Ramprasad S. Joshi, Aiman A. Shivani, and Rohan D. Gupta. 2023. Kāraka-Based Answer Retrieval for Question Answering in Indic Languages. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing , Ruslan Mitkov and Galia Angelova (Eds.). INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 1216–1224. h...

2023
[61]

Daniel Zeman, Joakim Nivre, Rimsha Abid, Mitchell Abrams, et al. 2026. Universal Dependencies 2.18 . Institute of Formal and Applied Linguistics (ÚFAL), LINDAT/CLARIAH-CZ Digital Library. http://hdl.handle.net/11234/1-6149

2026

[1] [1]

Amitav Acharya. 2013. Civilizations in Embrace: The Spread of Ideas and the Transformation of Power : India and Southeast Asia in the Classical Age . Institute of Southeast Asian Studies, Singapore

2013

[2] E. Annamalai. 2024. The Sanskrit Paradigm of Tamil Grammar: Embrace and Resistance. Bhasha 3, 1 (April 2024), 1–16. doi:10.30687/bhasha/2785-5953/2024/01/002

[3] Niyati Bafna, Cristina España-Bonet, Josef Van Genabith, Benoît Sagot, and Rachel Bawden. 2023. Cross-lingual Strategies for Low-resource Language Modeling: A Study on Five Indic Dialects. In Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux – article...

[4] Razan Baltaji, Saurabh Pujar, Martin Hirzel, Louis Mandel, Luca Buratti, and Lav R. Varshney. 2025. Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study. Transactions on Machine Learning Research 2025, June (2025), 26 pages. https://openreview.net/forum?id=1PRBHKgQVM

[5] Tamali Banerjee and Pushpak Bhattacharyya. 2018. Meaningless yet meaningful: Morphology grounded subword-level NMT. In Proceedings of the Second Workshop on Subword/Character LEvel Models, Manaal Faruqui, Hinrich Schütze, Isabel Trancoso, Yulia Tsvetkov, and Yadollah Yaghoobzadeh (Eds.). Association for Computational Linguistics, New Orleans, 55–60. d...

[6] Ajitesh Bankula and Praney Bankula. 2025. Cross-Linguistic Transfer in Multilingual NLP: The Role of Language Families and Morphology. arXiv:2505.13908

[7] Akshar Bharati, Rajeev Sangal, Vineet Chaitanya, Amba Kulkarni, Dipti Misra Sharma, and K.V. Ramakrishnamacharyulu. 2002. AnnCorra: Building Tree-banks in Indian Languages. In COLING-02: The 3rd Workshop on Asian Language Resources and International Standardization. Association for Computational Linguistics, Taipei, Taiwan, 8 pages. https://aclantholog...

[8] Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Sharma, and Fei Xia. 2009. A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Workshop (LAW III), Manfred Stede, Chu-Ren Huang, Nancy Ide, and Adam Meyers (Eds.). Association for Computational Linguistics, Suntec, Sing...

[9] Soham Bhattacharjee, Mukund K. Roy, Yathish Poojary, Bhargav Dave, Mihir Raj, Vandan Mujadia, Baban Gain, Pruthwik Mishra, Arafat Ahsan, Parameswari Krishnamurthy, Ashwath Rao, Gurpreet Singh Josan, Preeti Dubey, Aadil Amin Kak, Anna Rao Kulkarni, Narendra V. G., Sunita Arora, Rakesh Balbantray, Prasenjit Majumdar, Karunesh K. Arora, Asif Ekbal, and Dipti...

[10] Norman Blake. 1996. A History of the English Language. New York University Press, New York

[11] Maharaj Brahma, N J Karthika, Atul Singh, Devaraj Adiga, Smruti Bhate, Ganesh Ramakrishnan, Rohit Saluja, and Maunendra Sankar Desarkar. 2025. MorphTok: Morphologically Grounded Tokenization for Indian Languages. arXiv:2504.10335

[12] Jannik Brinkmann, Chris Wendler, Christian Bartelt, and Aaron Mueller. 2025. Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume ...

[13] Tyler A. Chang, Catherine Arnett, Zhuowen Tu, and Benjamin K. Bergen. 2024. When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguisti...

[14] Suniti Kumar Chatterji. 1926. The Origin and Development of the Bengali Language. Calcutta University Press, Calcutta, India

[15] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, N...

[16] Alexis Conneau, Shijie Wu, Haoran Li, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Emerging Cross-lingual Structure in Pretrained Language Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics,...

[17] Marie-Catherine de Marneffe, Christopher D. Manning, Joakim Nivre, and Daniel Zeman. 2021. Universal Dependencies. Computational Linguistics 47, 2 (July 2021), 255–308. doi:10.1162/coli_a_00402

[18] Murray B. Emeneau. 1956. India as a Linguistic Area. Language 32, 1 (1956), 3–16. doi:10.2307/410649

[19] Pawan Goyal and Gérard Huet. 2016. Design and analysis of a lean interface for Sanskrit corpus annotation. Journal of Language Modelling 4, 2 (2016), 145–182. doi:10.15398/jlm.v4i2.108

[20] Oliver Hellwig. 2016. Improving the Morphological Analysis of Classical Sanskrit. In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016), Dekai Wu and Pushpak Bhattacharyya (Eds.). The COLING 2016 Organizing Committee, Osaka, Japan, 142–151. https://aclanthology.org/W16-3715/

[21] Oliver Hellwig and Erica Biagetti. 2025. The Sanskrit Sembank. Language Resources and Evaluation 59 (2025), 3635–. doi:10.1007/s10579-025-09852-1

[23] Gérard Huet. 2005. A Functional Toolkit for Morphological and Phonological Processing, Application to a Sanskrit Tagger. Journal of Functional Programming 15, 4 (2005), 573–614. doi:10.1017/S0956796804005416

[24] Minyoung Huh, Brian Cheung, Tongzhou Wang, and Phillip Isola. 2024. Position: The Platonic Representation Hypothesis. In Proceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 235), Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Feli...

[25] Peter Zilahy Ingerman. 1967. “Pānini-Backus Form” suggested. Commun. ACM 10, 3 (March 1967), 137. doi:10.1145/363162.363165

[26] Girish Nath Jha. 2010. The TDIL Program and the Indian Langauge Corpora Intitiative (ILCI). In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias (Eds.). European Language Reso...

[27] Braj B. Kachru. 1992. The other tongue: English across cultures (2 ed.). University of Illinois Press, Urbana, Illinois

[28] Braj B. Kachru. 1992. World Englishes: approaches, issues and resources. Language Teaching 25, 1 (1992), 1–14. doi:10.1017/S0261444800006583

[29] Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, Gokul N.C., Avik Bhattacharyya, Mitesh M. Khapra, and Pratyush Kumar. 2020. IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Y...

[30] N. J. Karthika, Maharaj Brahma, Rohit Saluja, Ganesh Ramakrishnan, and Maunendra Sankar Desarkar. 2025. Multilingual Tokenization through the Lens of Indian Languages: Challenges and Insights. arXiv:2506.17789 [cs.CL]

[31] Robert D. King. 2006. The Poisonous Potency of Script: Hindi and Urdu. International Journal of the Sociology of Language 2001, 150 (2006), 43–59. doi:10.1515/ijsl.2001.035

[32] Amrith Krishna, Pavan Kumar Satuluri, and Pawan Goyal. 2017. A Dataset for Sanskrit Word Segmentation. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, and Stan Szpakowicz (Ed...

[33] Sriram Krishnan and Amba Kulkarni. 2019. Sanskrit Segmentation revisited. In Proceedings of the 16th International Conference on Natural Language Processing, Dipti Misra Sharma and Pushpak Bhattacharya (Eds.). NLP Association of India, International Institute of Information Technology, Hyderabad, India, 105–114. https://aclanthology.org/2019.icon-1.12/

[34] Sriram Krishnan, Amba Kulkarni, and Gérard Huet. 2023. Validation and Normalization of DCS corpus and Development of the Sanskrit Heritage Engine’s Segmenter. In Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference, Amba Kulkarni and Oliver Hellwig (Eds.). Association for Comput...

[35] Anoop Kunchukuttan and Pushpak Bhattacharyya. 2022. Machine Translation and Transliteration involving Related and Low-Resource Languages. CRC Press, Boca Raton, USA and Abingdon, UK

[36] David B. Lurie. 2023. The Vernacular in the World of Wen: Sheldon Pollock’s Model in East Asia? In Cosmopolitan and Vernacular in the World of Wen 文, Ross King (Ed.). Language, Writing and Literary Culture in the Sinographic Cosmopolis, Vol. 5. Brill, Leiden, The Netherlands, 49–68. doi:10.1163/9789004529441_003

[37] Anand Mishra. 2009. Simulating the Pāṇinian System of Sanskrit Grammar. In Sanskrit Computational Linguistics, Gérard Huet, Amba Kulkarni, and Peter Scharf (Eds.). Lecture Notes in Computer Science, Vol. 5402. Springer, Berlin, 127–138. doi:10.1007/978-3-642-00155-0_4

[38] Vandan Mujadia and Dipti Misra Sharma. 2025. BhashaVerse: Translation Ecosystem for Indian Subcontinent Languages. arXiv:2412.04351

[39] Karthika N J, Krishnakant Bhatt, Ganesh Ramakrishnan, and Preethi Jyothi. 2025. LEVOS: Leveraging Vocabulary Overlap with Sanskrit to Generate Technical Lexicons in Indian Languages. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein,...

[40] Arijit Nag, Bidisha Samanta, Animesh Mukherjee, Niloy Ganguly, and Soumen Chakrabarti. 2023. Transfer Learning for Low-Resource Multilingual Relation Classification. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 22, 2, Article 50 (March 2023), 24 pages. doi:10.1145/3554734

[41] Sebastian Nehrdich, Oliver Hellwig, and Kurt Keutzer. 2024. One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit NLP Tasks. In Findings of the Association for Computational Linguistics: EMNLP 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, Miami, Florida, USA, 13742–13751. doi:...

[42] NLLB Team. 2024. Scaling neural machine translation to 200 languages. Nature 630 (2024), 841–846. doi:10.1038/s41586-024-07335-x

[43] Riya Pal and Dipti Sharma. 2019. Towards Automated Semantic Role Labelling of Hindi-English Code-Mixed Tweets. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), Wei Xu, Alan Ritter, Tim Baldwin, and Afshin Rahimi (Eds.). Association for Computational Linguistics, Hong Kong, China, 291–296. doi:10.18653/v1/D19-5538

[44] Martha Palmer, Rajesh Bhatt, Bhuvana Narasimhan, Owen Rambow, Dipti Misra Sharma, and Fei Xia. 2009. Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure. In Proceedings of the 7th International Conference on Natural Language Processing (ICON). Macmillan Publishers, Hyderabad, India, 259–268

[45] Martha Palmer, Paul Kingsbury, and Daniel Gildea. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics 31, 1 (2005), 71–106. doi:10.1162/0891201053630264

[46] Priyaranjan Pattnayak, Hitesh Patel, and Amit Agarwal. 2025. Tokenization Matters: Improving Zero-Shot NER for Indic Languages. In 2025 IEEE International Conference on Electro Information Technology (eIT). IEEE, Valparaiso, Indiana, USA, 456–462. doi:10.1109/eIT64391.2025.11103625

[47] Siddhesh Pawar, Pushpak Bhattacharyya, and Partha Talukdar. 2023. Evaluating Cross Lingual Transfer for Morphological Analysis: a Case Study of Indian Languages. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, Garrett Nicolai, Eleanor Chodroff, Frederic Mailhot, and Çağrı Çöltekin (Eds...

[48] Gerald Penn and Paul Kiparsky. 2012. On Pāṇini and the Generative Capacity of Contextualized Replacement Systems. In Proceedings of COLING 2012: Posters, Martin Kay and Christian Boitet (Eds.). The COLING 2012 Organizing Committee, Mumbai, India, 943–950. https://aclanthology.org/C12-2092

[49] Sheldon I. Pollock. 2000. Cosmopolitan and Vernacular in History. Public Culture 12, 3 (2000), 591–625. Project MUSE. https://muse.jhu.edu/article/26221

[50] Pooja Rai, Ayan Das, and Sanjay Chatterji. 2025. Mapping of the Nepali Dependency Treebank to Universal Dependencies. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 24, 11, Article 132 (Nov. 2025), 22 pages. doi:10.1145/3749643

[51] Vinit Ravishankar. 2017. A Universal Dependencies Treebank for Marathi. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, Jan Hajič (Ed.). Association for Computational Linguistics, Prague, Czech Republic, 190–200. https://aclanthology.org/W17-7623

[52] Baskaran Sankaran, Kalika Bali, Monojit Choudhury, Tanmoy Bhattacharya, Pushpak Bhattacharyya, Girish Nath Jha, S. Rajendran, K. Saravanan, L. Sobha, and K.V. Subbarao. 2008. A Common Parts-of-Speech Tagset Framework for Indian Languages. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Nicoletta Calzola...

[53] Arun Kumar Singh, Sushant Dave, Prathosh A. P., Brejesh Lall, and Shresth Mehta. 2020. A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns Analysis. arXiv:2010.12937

[54] Abhishek Kumar Singh, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen, Ashish Mittal, and Ganesh Ramakrishnan. 2025. INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages. In Findings of the Association for Computational Linguistics: NAACL 2025, Luis Chiruzzo, Alan Ritter, and Lu Wang (Eds.). Association for Computational Linguistics, Albuquerque, New Mexico, 2607–2626. doi:10.18653/v1/2025.findings-naacl.141

[56] Juhi Tandon, Himani Chaudhary, Riyaz Ahmad Bhat, and Dipti Misra Sharma. 2016. Conversion from Pāṇinian Karakas to Universal Dependencies for Hindi Dependency Treebank. In Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016), Annemarie Friedrich and Katrin Tomanek (Eds.). Association for Computational Li...

[57] Sarah Grey Thomason. 2000. Linguistic Areas and Language History. In Languages in Contact, D. G. Gilbers, J. Nerbonne, and J. Schaeken (Eds.). Studies in Slavic and General Linguistics, Vol. 28. Brill, Leiden, The Netherlands, 311–. doi:10.1163/9789004488472_030

[59] Srisa Chandra Vasu. 1897. The Ashtādhyāyī of Pāṇini. Sindhu Charan Bose, Benares

[60] Devika Verma, Ramprasad S. Joshi, Aiman A. Shivani, and Rohan D. Gupta. 2023. Kāraka-Based Answer Retrieval for Question Answering in Indic Languages. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, Ruslan Mitkov and Galia Angelova (Eds.). INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 1216–1224. h...

[61] Daniel Zeman, Joakim Nivre, Rimsha Abid, Mitchell Abrams, et al. 2026. Universal Dependencies 2.18. Institute of Formal and Applied Linguistics (ÚFAL), LINDAT/CLARIAH-CZ Digital Library. http://hdl.handle.net/11234/1-6149