SA-BPE regularizes standard BPE training for code by incorporating source diversity to skip problematic merges, substantially cutting unused tokens without altering inference.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1882–1892, Online
1 Pith paper cites this work.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
From Where Words Come: Efficient Regularization of Code Tokenizers Through Source Attribution