Multilingual pretraining develops translation in two phases: early copying driven by surface similarities, followed by generalizing mechanisms while copying is refined.
The Twelfth International Conference on Learning Representations , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
BPE tokenization creates gibberish bias in CLLMs, causing secrets with high character entropy but low token entropy to be preferentially memorized due to training data distribution shifts.
citing papers explorer
-
Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining
Multilingual pretraining develops translation in two phases: early copying driven by surface similarities, followed by generalizing mechanisms while copying is refined.
-
Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective
BPE tokenization creates gibberish bias in CLLMs, causing secrets with high character entropy but low token entropy to be preferentially memorized due to training data distribution shifts.