Frequency aggregation of supermerge candidates and a two-phase formulation make BoundlessBPE and SuperBPE training over 600x faster on 1GB data while preserving identical results, with open-source Python and Rust code.
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
representative citing papers
Performance gaps in multilingual LMs frequently arise from modeling choices such as tokenization and data exposure rather than intrinsic linguistic complexity.
citing papers explorer
- Faster Superword Tokenization
Frequency aggregation of supermerge candidates and a two-phase formulation make BoundlessBPE and SuperBPE training over 600x faster on 1GB data while preserving identical results, with open-source Python and Rust code.
- The Roots of Performance Disparity in Multilingual Language Models: Intrinsic Modeling Difficulty or Design Choices?
Performance gaps in multilingual LMs frequently arise from modeling choices such as tokenization and data exposure rather than intrinsic linguistic complexity.
- Tokenization with Split Trees