TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
An empirical comparison of vocabulary expansion and initialization approaches for language models
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 2representative citing papers
SSU mitigates catastrophic forgetting in low-resource LLM target-language adaptation by scoring and column-wise freezing source-critical parameters, reducing source degradation to ~3% versus ~20% for full fine-tuning while matching target performance.
citing papers explorer
-
TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment
TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
-
Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates
SSU mitigates catastrophic forgetting in low-resource LLM target-language adaptation by scoring and column-wise freezing source-critical parameters, reducing source degradation to ~3% versus ~20% for full fine-tuning while matching target performance.