pith. machine review for the scientific record.
Pith Number

pith:7FGC55WL

pith:2022:7FGC55WLCZKGVGCWA63UZW3XXX
not attested · not anchored · not stored · refs resolved

Fast Inference from Transformers via Speculative Decoding

Matan Kalman, Yaniv Leviathan, Yossi Matias

Speculative decoding accelerates large autoregressive models by verifying multiple draft tokens in one parallel run of the target model while preserving the exact output distribution.
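The accept-or-resample rule behind that summary can be sketched on a toy vocabulary. This is a minimal illustration, not the authors' implementation: the names `speculative_step`, `p_target`, `q_draft`, and `gamma` are hypothetical, both "models" are plain functions that return a probability list, and the target model is called in a loop here, where a real system would score all drafted prefixes in one parallel pass.

```python
import random


def speculative_step(p_target, q_draft, prefix, gamma, rng):
    """Run one speculative step; return the 1 to gamma+1 newly emitted tokens.

    p_target(ctx) and q_draft(ctx) are hypothetical stand-ins for the target
    and draft models: each maps a token list to a list of probabilities.
    """
    # 1. Draft gamma tokens autoregressively with the cheap model.
    drafts, q_dists = [], []
    ctx = list(prefix)
    for _ in range(gamma):
        q = q_draft(ctx)
        tok = rng.choices(range(len(q)), weights=q)[0]
        drafts.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. Score every drafted prefix with the target model
    #    (a loop here; one parallel pass in a real system).
    p_dists = [p_target(list(prefix) + drafts[:i]) for i in range(gamma + 1)]

    # 3. Accept each drafted token with probability min(1, p/q).
    out = []
    for i, tok in enumerate(drafts):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # First rejection: resample from the residual max(0, p - q)
        # (random.choices normalizes the weights) and stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out

    # 4. Every draft was accepted: emit one extra token from the target.
    last = p_dists[gamma]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out


if __name__ == "__main__":
    # Toy context-independent "models" over a 3-token vocabulary.
    target = lambda ctx: [0.5, 0.3, 0.2]
    draft = lambda ctx: [0.2, 0.3, 0.5]
    print(speculative_step(target, draft, prefix=[], gamma=4, rng=random.Random(0)))
```

Each emitted token is either an accepted draft or a sample from the residual, so it follows the target distribution. That is what lets the method keep the output distribution unchanged.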

arxiv:2211.17192 v2 · 2022-11-30 · cs.LG · cs.CL

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1 · strongest claim

Our method can accelerate existing off-the-shelf models without retraining or architecture changes. We demonstrate it on T5-XXL and show a 2X-3X acceleration compared to the standard T5X implementation, with identical outputs.

C2 · weakest assumption

That sufficiently accurate and faster approximation models exist for the subtasks inside typical language-modeling workloads, so that the draft model produces enough accepted tokens to offset the overhead of the verification step.

C3 · one line summary

Speculative decoding speeds up exact sampling from large autoregressive models by 2-3x on T5-XXL: a smaller approximation model proposes several tokens, and the large model verifies them in a single parallel run while preserving the original output distribution.
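The trade-off named in C2 (accepted tokens versus verification overhead) can be made concrete with the closed-form expressions from the paper. The symbols are not defined on this record page and are assumptions here: `alpha` is the per-token acceptance rate (treated as i.i.d.), `gamma` is the number of drafted tokens per step, and `c` is the cost of one draft-model run relative to one target-model run. A rough sketch:

```python
def expected_tokens(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model run: (1 - alpha^(gamma+1)) / (1 - alpha)."""
    if alpha == 1.0:
        # Limit of the expression as alpha -> 1: every draft is accepted.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def walltime_speedup(alpha: float, gamma: int, c: float) -> float:
    """Expected wall-time improvement factor.

    One step costs gamma draft runs plus one target run, i.e. (gamma * c + 1)
    in units of a target run, and yields expected_tokens(alpha, gamma) tokens.
    """
    return expected_tokens(alpha, gamma) / (gamma * c + 1.0)


if __name__ == "__main__":
    # Hypothetical settings, chosen only for illustration.
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(walltime_speedup(alpha, gamma=5, c=0.05), 2))
```

A factor below 1 means the draft model is too inaccurate or too slow to pay for the verification step, which is the failure case the assumption rules out.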

References

67 extracted · 67 resolved · 9 Pith anchors

[1] Brown, Tom B. and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarw 2020
[2] LaMDA: Language Models for Dialog Applications. ArXiv.
[3] Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. ArXiv.
[4] PaLM: Scaling Language Modeling with Pathways. ArXiv.
[5] Lossless Speedup of Autoregressive Translation with Generalized Aggressive Decoding. ArXiv.

Formal links

2 machine-checked theorem links

Cited by

16 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:12.749938Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

f94c2ef6cb16546a985607b74cdb77bdecf4b252b7eda6367d020b0fe46c71c8

Aliases

arxiv: 2211.17192 · arxiv_version: 2211.17192v2 · doi: 10.48550/arxiv.2211.17192
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7FGC55WLCZKGVGCWA63UZW3XXX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f94c2ef6cb16546a985607b74cdb77bdecf4b252b7eda6367d020b0fe46c71c8
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "722a11c4fbeb93e2a0d8e8109b2373e3064887a6df7d2970ce294692c809004e",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2022-11-30T17:33:28Z",
    "title_canon_sha256": "09f8a8477b22a0ae2be2f682fdccd88bb0a06063c9f966e27e3edaf14d91ee52"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2211.17192",
    "kind": "arxiv",
    "version": 2
  }
}
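The same check can be sketched offline in Python. It applies the serialization used by the shell one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8) to the record as printed on this page. The helper name `canonical_sha256` is hypothetical. Whether the digest matches the canonical hash depends on the printed JSON being identical to the served `canonical_record`, which is not confirmed here.

```python
import hashlib
import json


def canonical_sha256(record) -> str:
    """Hash a JSON value with the serialization used by the shell check."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "722a11c4fbeb93e2a0d8e8109b2373e3064887a6df7d2970ce294692c809004e",
    "cross_cats_sorted": ["cs.CL"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2022-11-30T17:33:28Z",
    "title_canon_sha256": "09f8a8477b22a0ae2be2f682fdccd88bb0a06063c9f966e27e3edaf14d91ee52"
  },
  "schema_version": "1.0",
  "source": {"id": "2211.17192", "kind": "arxiv", "version": 2}
}
"""

# Canonical hash as listed on this page.
EXPECTED = "f94c2ef6cb16546a985607b74cdb77bdecf4b252b7eda6367d020b0fe46c71c8"

record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print("computed:", digest)
print("matches listed canonical hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, key order and whitespace in the input do not affect the digest. Any change to a value does.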