Magnet: Improving the multilingual fairness of language models with adaptive gradient-based tokenization

URL https://aclanthology · 2023 · arXiv 2407.08818

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models

cs.CL · 2026-05-10 · conditional · novelty 7.0

Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.

Proxy Compression for Language Modeling

cs.CL · 2026-02-04 · conditional · novelty 6.0

Proxy compression trains language models on both raw bytes and compressed sequences to enable efficient training with raw-byte inference at test time.

