pith:LK3AAFPB
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
SARATHI splits each prefill into equal chunks and fills the rest of every batch with decode requests so the chunks saturate GPU compute while decodes piggyback at far lower cost.
arxiv:2308.16369 v1 · 2023-08-31 · cs.LG · cs.DC
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LK3AAFPBRF6FDZR2UNQ677ROMZ}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
For the LLaMA-13B model on an A6000 GPU, SARATHI improves decode throughput by up to 10x and accelerates end-to-end throughput by up to 1.33x. When used with pipeline parallelism on GPT-3, SARATHI reduces pipeline bubbles by 6.29x, resulting in an end-to-end throughput improvement of 1.91x.
The approach rests on the premise that chunked prefills can be performed without accuracy loss or extra memory overhead, and that decode requests can be freely mixed into the same batch as a prefill chunk while preserving correct autoregressive generation.
SARATHI uses chunked prefills and decode-maximal batching to let decode steps ride along with prefill compute, delivering up to 10x higher decode throughput and 1.91x end-to-end throughput on models including LLaMA-13B and GPT-3.
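The batching idea in the claims above can be illustrated with a minimal sketch. It is not the authors' implementation. The names `Batch`, `chunk_prefill`, and `decode_maximal_batches` are hypothetical, requests are reduced to integer ids and token offsets, and the last chunk is allowed to be shorter when the prompt length is not a multiple of the chunk size.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Batch:
    # (start, end) token offsets of the prefill chunk within the prompt.
    prefill_chunk: Tuple[int, int]
    # Hypothetical ids of decode requests that ride along with the chunk.
    decode_ids: List[int]


def chunk_prefill(prompt_len: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split a prompt into consecutive chunks of at most chunk_size tokens."""
    if prompt_len < 0 or chunk_size <= 0:
        raise ValueError("prompt_len must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, prompt_len))
        for start in range(0, prompt_len, chunk_size)
    ]


def decode_maximal_batches(
    prompt_len: int,
    chunk_size: int,
    pending_decodes: List[int],
    batch_size: int,
) -> List[Batch]:
    """Build one batch per prefill chunk and fill the other slots with decodes.

    Assumption: each batch holds exactly one prefill chunk, leaving
    batch_size - 1 slots for decode requests.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    decode_slots = batch_size - 1
    return [
        Batch(prefill_chunk=chunk, decode_ids=pending_decodes[:decode_slots])
        for chunk in chunk_prefill(prompt_len, chunk_size)
    ]


if __name__ == "__main__":
    for batch in decode_maximal_batches(1024, 256, [1, 2, 3, 4, 5], batch_size=4):
        print(batch)
```

In this sketch the same decode requests appear in every batch because each one produces a single token per iteration. A real scheduler would also retire finished requests and admit new ones between iterations.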
Receipt and verification
| First computed | 2026-05-17T23:38:48.830035Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
5ab60015e1897c51e63aa361effe2e666a536eca4b29df28eb889ec3d70dd7a7
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LK3AAFPBRF6FDZR2UNQ677ROMZ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5ab60015e1897c51e63aa361effe2e666a536eca4b29df28eb889ec3d70dd7a7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "466bf7c6ea41511e785a758ea569c17066f4dacb24704504de9573d6d0ed8b1e",
    "cross_cats_sorted": [
      "cs.DC"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-08-31T00:03:02Z",
    "title_canon_sha256": "52013df046821f6fa51ac4289806767fcc03790d923841a9e0b1f85213776b67"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2308.16369",
    "kind": "arxiv",
    "version": 1
  }
}
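The hashing step of the verification one-liner can be unpacked into a small function for local use. This sketch only mirrors the serialization options already shown in that command (sorted keys, compact separators, non-ASCII left unescaped). The function name `canonical_sha256` is hypothetical, and no claim is made here about what digest any particular record yields.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON serialization as the one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: pipe the canonical record JSON into this script on stdin.
    print(canonical_sha256(json.load(sys.stdin)))
```

Because keys are sorted before hashing, the indentation and key order of the JSON text fed to the function do not change the result.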