MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: CONDITIONAL (2)
Representative citing papers:
Proxy compression trains language models on both raw bytes and compressed sequences to enable efficient training with raw-byte inference at test time.
Citing papers explorer
- Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
  Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
- Proxy Compression for Language Modeling
  Proxy compression trains language models on both raw bytes and compressed sequences to enable efficient training with raw-byte inference at test time.