pith:7FGC55WL
Fast Inference from Transformers via Speculative Decoding
Speculative decoding accelerates large autoregressive models by verifying multiple draft tokens in one parallel run of the target model while preserving the exact output distribution.
arxiv:2211.17192 v2 · 2022-11-30 · cs.LG · cs.CL
Claims
Our method can accelerate existing off-the-shelf models without retraining or architecture changes. We demonstrate it on T5-XXL and show a 2X-3X acceleration compared to the standard T5X implementation, with identical outputs.
The method assumes that sufficiently accurate, faster approximation models exist for the subtasks inside typical language-modeling workloads, so that the draft model produces enough accepted tokens to offset the overhead of the verification step.
Speculative decoding accelerates exact sampling from large autoregressive models, by 2-3x on T5-XXL. A smaller approximation model proposes a short sequence of tokens, and the large model verifies them together in one parallel pass while preserving the original output distribution.
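The verification step described in the claims can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the accept/reject rule from the paper (accept a draft token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized positive part of p - q), and it takes the model distributions as plain lists of probabilities rather than calling real models. The names `speculative_step` and `expected_tokens_per_step` are hypothetical. The second helper assumes a constant, independent per-token acceptance rate `alpha`, which is a simplification.

```python
import random


def speculative_step(p_dists, q_dists, draft_tokens, rng):
    """Verify draft tokens against the target model's distributions.

    p_dists: target distributions, one per draft position plus one extra
             (len(draft_tokens) + 1 lists of probabilities).
    q_dists: draft distributions, one per draft position.
    draft_tokens: token ids that were sampled from q_dists.
    Returns the tokens emitted by this step (between 1 and len(draft) + 1).
    """
    out = []
    for i, x in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        # Accept the draft token with probability min(1, p(x) / q(x)).
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
            continue
        # Rejected: resample from the normalized positive part of (p - q),
        # then stop, because later draft tokens depended on the rejected one.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out
    # Every draft token was accepted: sample one extra token from the target.
    last = p_dists[len(draft_tokens)]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out


def expected_tokens_per_step(alpha, gamma):
    """Expected tokens emitted per target-model run, assuming an i.i.d.
    acceptance rate alpha and gamma draft tokens per step."""
    if alpha == 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)
```

When the draft and target distributions coincide, every draft token is accepted and the step emits one more token than was drafted. When they disagree, the step still emits at least one token drawn so that the overall output follows the target distribution, which is why the draft model must be accurate enough for the extra verification work to pay off.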
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:12.749938Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
f94c2ef6cb16546a985607b74cdb77bdecf4b252b7eda6367d020b0fe46c71c8
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7FGC55WLCZKGVGCWA63UZW3XXX \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f94c2ef6cb16546a985607b74cdb77bdecf4b252b7eda6367d020b0fe46c71c8
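The same check can be run offline against a saved copy of the canonical record. The sketch below mirrors the serialization used in the one-liner above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding, SHA-256). The function names `canonical_hash` and `matches` are hypothetical helpers introduced for illustration, not part of any Pith API.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the compact, key-sorted JSON form of the record."""
    payload = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare the computed hash to a published one, ignoring case."""
    return canonical_hash(record) == expected_hex.strip().lower()
```

To use it, load the record with `json.loads` and pass the resulting dict together with the published canonical hash to `matches`. Because keys are sorted before hashing, the order in which the fields were saved does not change the result.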
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "722a11c4fbeb93e2a0d8e8109b2373e3064887a6df7d2970ce294692c809004e",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2022-11-30T17:33:28Z",
    "title_canon_sha256": "09f8a8477b22a0ae2be2f682fdccd88bb0a06063c9f966e27e3edaf14d91ee52"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2211.17192",
    "kind": "arxiv",
    "version": 2
  }
}