Byte-level simulations show subword tokenization improves LLM training mainly via increased throughput and boundary priors.
arXiv preprint arXiv:2601.21204 (2026)
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 1polarities
unclear 1representative citing papers
Q-PIPE is a quantum phase encoding for images that achieves O(qN) gate complexity, supports native finite-difference operations, and shows low error in edge-detection tests on benchmark data.
Key-Gram uses a memory module with key-grams and hashed lookup to inject static linguistic priors into vision-language-action backbones, yielding reported gains on manipulation benchmarks.
NGM is a plug-and-play n-gram memory module that encodes n-grams from pretrained embeddings and gates their injection to improve LLM performance by 0.5-1.2 points on average across eight benchmarks.
SnapMLA achieves up to 1.91x higher throughput in long-output MLA decoding using FP8 quantization and specialized kernels while keeping benchmark quality near the BF16 baseline.
citing papers explorer
-
Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation
Byte-level simulations show subword tokenization improves LLM training mainly via increased throughput and boundary priors.
-
Q-PIPE A Practical Quantum Phase Encoding Method
Q-PIPE is a quantum phase encoding for images that achieves O(qN) gate complexity, supports native finite-difference operations, and shows low error in edge-detection tests on benchmark data.
-
Key-Gram: Extensible World Knowledge for Embodied Manipulation
Key-Gram uses a memory module with key-grams and hashed lookup to inject static linguistic priors into vision-language-action backbones, yielding reported gains on manipulation benchmarks.
-
NGM: A Plug-and-Play Training-Free Memory Module for LLMs
NGM is a plug-and-play n-gram memory module that encodes n-grams from pretrained embeddings and gates their injection to improve LLM performance by 0.5-1.2 points on average across eight benchmarks.
-
SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining
SnapMLA achieves up to 1.91x higher throughput in long-output MLA decoding using FP8 quantization and specialized kernels while keeping benchmark quality near the BF16 baseline.