Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
LangMAP: A Language-Adaptive Approach to Tokenization
LangMAP adapts UnigramLM for multilingual use to deliver language-specific tokenization from a shared vocabulary, boosting boundary alignment metrics across natural and programming languages with mixed downstream fine-tuning gains.