Pith Number

pith:LK3AAFPB

pith:2023:LK3AAFPBRF6FDZR2UNQ677ROMZ
not attested · not anchored · not stored · refs resolved

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Amey Agrawal, Ashish Panwar, Bhargav S. Gulavani, Jayashree Mohan, Nipun Kwatra, Ramachandran Ramjee

SARATHI splits each prefill into equal chunks and fills the rest of every batch with decode requests so the chunks saturate GPU compute while decodes piggyback at far lower cost.
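The batching idea in the summary above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: `batch_slots`, `Batch`, and both function names are hypothetical. It assumes each batch carries exactly one prefill chunk and gives every remaining slot to a pending decode request.

```python
from dataclasses import dataclass
from typing import List, Tuple


def split_prefill(prompt_len: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split a prompt into [start, end) token ranges of chunk_size tokens.

    The last range is shorter when prompt_len is not a multiple of chunk_size.
    """
    return [(s, min(s + chunk_size, prompt_len))
            for s in range(0, prompt_len, chunk_size)]


@dataclass
class Batch:
    prefill_range: Tuple[int, int]  # tokens of the prompt processed in this batch
    decode_ids: List[int]           # decode requests piggybacking on the chunk


def decode_maximal_batches(prompt_len: int, chunk_size: int,
                           batch_slots: int, decode_ids: List[int]) -> List[Batch]:
    """Assumed policy: one prefill chunk per batch, decodes fill the other slots."""
    free_slots = batch_slots - 1  # one slot is taken by the prefill chunk
    return [Batch(r, decode_ids[:free_slots])
            for r in split_prefill(prompt_len, chunk_size)]


# Hypothetical workload: a 1024-token prompt, 256-token chunks, 4 slots per batch.
batches = decode_maximal_batches(1024, 256, 4, [7, 8, 9, 10, 11])
for b in batches:
    print(b.prefill_range, b.decode_ids)
```

With these made-up numbers the prompt is covered by four batches, and each batch carries the same three decode requests alongside its chunk; decode requests 10 and 11 would have to wait for a later schedule.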

arxiv:2308.16369 v1 · 2023-08-31 · cs.LG · cs.DC

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LK3AAFPBRF6FDZR2UNQ677ROMZ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

For the LLaMA-13B model on A6000 GPU, SARATHI improves decode throughput by up to 10x, and accelerates end-to-end throughput by up to 1.33x. When used with pipeline parallelism on GPT-3, SARATHI reduces bubbles by 6.29x, resulting in an end-to-end throughput improvement of 1.91x.

C2 weakest assumption

That chunked prefills can be performed without accuracy loss or extra memory overhead and that decode requests can be freely mixed into the same batch as a prefill chunk while preserving correct autoregressive generation.
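The accuracy half of this assumption can be checked on a toy case. The sketch below is not taken from the paper: it is a single-head, pure-Python causal attention with made-up dimensions and hypothetical function names. It compares a one-pass prefill with a chunked prefill that appends keys and values to a cache as it goes. It says nothing about memory overhead or GPU kernel behavior.

```python
import math
import random
from typing import List

Vec = List[float]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Softmax attention of one query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def full_prefill(Q: List[Vec], K: List[Vec], V: List[Vec]) -> List[Vec]:
    """Causal attention over the whole prompt in one pass."""
    return [attend(Q[i], K[:i + 1], V[:i + 1]) for i in range(len(Q))]


def chunked_prefill(Q: List[Vec], K: List[Vec], V: List[Vec],
                    chunk_size: int) -> List[Vec]:
    """Process the prompt chunk by chunk, growing a KV cache between chunks."""
    k_cache: List[Vec] = []
    v_cache: List[Vec] = []
    out: List[Vec] = []
    for start in range(0, len(Q), chunk_size):
        end = min(start + chunk_size, len(Q))
        k_cache.extend(K[start:end])
        v_cache.extend(V[start:end])
        for i in range(start, end):
            # Token i sees cached tokens 0..i, exactly as in the full pass.
            out.append(attend(Q[i], k_cache[:i + 1], v_cache[:i + 1]))
    return out


rng = random.Random(0)
tokens, dim = 10, 4  # toy sizes, chosen only for illustration
Q, K, V = ([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(tokens)]
           for _ in range(3))

same = all(math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)
           for row_f, row_c in zip(full_prefill(Q, K, V), chunked_prefill(Q, K, V, 3))
           for a, b in zip(row_f, row_c))
print("chunked output matches full prefill:", same)
```

In this sketch each query attends to the same keys and values in both paths, so the outputs agree; whether a production kernel preserves that equivalence at no extra cost is exactly what the assumption leaves open.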

C3 one line summary

SARATHI uses chunked prefills and decode-maximal batching to let decode steps ride along with prefill compute, delivering up to 10x higher decode throughput for LLaMA-13B on an A6000 GPU and up to 1.91x higher end-to-end throughput for GPT-3 with pipeline parallelism.

References

48 extracted · 48 resolved · 4 Pith anchors

[1] https://aws.amazon.com/codewhisperer/
[2] https://claude.ai
[3] https://www.bing.com/chat
[4] https://character.ai
[5] https://chat.openai.com

Formal links

2 machine-checked theorem links

Cited by

32 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:48.830035Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

5ab60015e1897c51e63aa361effe2e666a536eca4b29df28eb889ec3d70dd7a7

Aliases

arxiv: 2308.16369 · arxiv_version: 2308.16369v1 · doi: 10.48550/arxiv.2308.16369 · pith_short_12: LK3AAFPBRF6F · pith_short_16: LK3AAFPBRF6FDZR2 · pith_short_8: LK3AAFPB
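The identifiers on this page appear to be related in a simple way: the `pith_short_*` aliases are prefixes of the full Pith Number, and the full number matches the first 26 characters of the Base32 encoding of the canonical hash. The sketch below checks both relations for the values shown here. It is an observation about this one record, not a documented rule of the `pith-number/v1.0` schema, and `base32_prefix` is a hypothetical helper name.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "5ab60015e1897c51e63aa361effe2e666a536eca4b29df28eb889ec3d70dd7a7"
PITH_NUMBER = "LK3AAFPBRF6FDZR2UNQ677ROMZ"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Assumed relation: Pith Number == leading Base32 characters of the canonical hash.
derived = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER))

# Short aliases taken as plain prefixes of the full number.
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```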
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LK3AAFPBRF6FDZR2UNQ677ROMZ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5ab60015e1897c51e63aa361effe2e666a536eca4b29df28eb889ec3d70dd7a7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "466bf7c6ea41511e785a758ea569c17066f4dacb24704504de9573d6d0ed8b1e",
    "cross_cats_sorted": [
      "cs.DC"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-08-31T00:03:02Z",
    "title_canon_sha256": "52013df046821f6fa51ac4289806767fcc03790d923841a9e0b1f85213776b67"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2308.16369",
    "kind": "arxiv",
    "version": 1
  }
}
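For an offline check, the canonicalization step of the shell one-liner above can be written as a small Python function. This is a sketch under one assumption: that the record displayed on this page is identical to the `canonical_record` the API returns. The function name `canonical_sha256` is hypothetical. The script prints the digest and the comparison result rather than assuming the outcome.

```python
import hashlib
import json

EXPECTED = "5ab60015e1897c51e63aa361effe2e666a536eca4b29df28eb889ec3d70dd7a7"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode()
    return hashlib.sha256(payload).hexdigest()


# Record copied from this page; it may differ from what the API actually serves.
record = {
    "metadata": {
        "abstract_canon_sha256": "466bf7c6ea41511e785a758ea569c17066f4dacb24704504de9573d6d0ed8b1e",
        "cross_cats_sorted": ["cs.DC"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2023-08-31T00:03:02Z",
        "title_canon_sha256": "52013df046821f6fa51ac4289806767fcc03790d923841a9e0b1f85213776b67",
    },
    "schema_version": "1.0",
    "source": {"id": "2308.16369", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the fields, which is the property a recomputable record needs.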