pith. sign in

Multi-Token Prediction via Self-Distillation

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

Existing techniques for accelerating language model inference, such as speculative decoding, require training auxiliary speculator models and building and deploying complex inference pipelines. We consider a new approach for converting a pretrained autoregressive language model from a slow single next token prediction model into a fast standalone multi-token prediction model using a simple online distillation objective. The final model retains the exact same implementation as the pretrained initial checkpoint and is deployable without the addition of any auxiliary verifier or other specialized inference code. Our method produces models that decode more than $3\times$ faster at $<5\%$ drop in accuracy on GSM8K relative to the single token decoding performance of the same checkpoint.

fields

cs.HC 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper.