ToaST uses split trees and integer programming to cut token counts by over 11% versus BPE on English text at 40k+ vocab sizes, yielding higher CORE scores in 1.5B-parameter language model training.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Tokenization with Split Trees