Title resolution pending

Sander Land, Catherine Arnett · 2025 · arXiv 2505.24689

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Faster Superword Tokenization

cs.CL · 2026-04-06 · accept · novelty 7.0

Frequency aggregation of supermerge candidates and a two-phase formulation make BoundlessBPE and SuperBPE training over 600x faster on 1GB data while preserving identical results, with open-source Python and Rust code.

The Roots of Performance Disparity in Multilingual Language Models: Intrinsic Modeling Difficulty or Design Choices?

cs.CL · 2026-01-12 · accept · novelty 4.0

Performance gaps in multilingual LMs frequently arise from modeling choices such as tokenization and data exposure rather than intrinsic linguistic complexity.

Tokenization with Split Trees

cs.CL · 2026-05-21

citing papers explorer

Showing 3 of 3 citing papers.

Faster Superword Tokenization · cs.CL · 2026-04-06 · accept · none · ref 8
The Roots of Performance Disparity in Multilingual Language Models: Intrinsic Modeling Difficulty or Design Choices? · cs.CL · 2026-01-12 · accept · none · ref 6
Tokenization with Split Trees · cs.CL · 2026-05-21 · unreviewed · ref 55
