Pith Number

pith:R5I3O3BK

pith:2023:R5I3O3BKJRI5W7L5DDVIWJLSCQ
not attested · not anchored · not stored · refs resolved

Fast Distributed Inference Serving for Large Language Models

Bingyang Wu, Fangyue Liu, Gang Huang, Shengyu Liu, Xin Jin, Xuanzhe Liu, Yinmin Zhong, Yuanhang Sun, Zili Zhang

FastServe enables token-level preemption and skip-join scheduling for LLM inference to raise throughput while holding latency fixed.

arxiv:2305.05920 v3 · 2023-05-10 · cs.LG · cs.DC

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{R5I3O3BKJRI5W7L5DDVIWJLSCQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Experimental results show that compared to the state-of-the-art solution vLLM, FastServe improves the throughput by up to 31.4x and 17.9x under the same average and tail latency requirements, respectively.

C2 weakest assumption

That token-level preemption and the skip-join MLFQ assignment based on input length incur low enough overhead to deliver the reported gains without hidden costs in real workloads.

C3 one line summary

FastServe adds token-level preemption and a skip-join MLFQ scheduler to LLM serving, delivering up to 31.4x and 17.9x higher throughput than vLLM under the same average and tail latency requirements, respectively.
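
The claims above name a skip-join MLFQ that assigns jobs by input length. As a rough illustration only, the Python sketch below shows one way such an assignment could work: a job enters the first queue whose time quantum covers its estimated first-token (prefill) time, skipping the higher-priority queues, and is demoted one level each time it exhausts its quantum. The quanta, the linear cost model, and all function names are hypothetical and are not taken from this record or from FastServe's code.

```python
from bisect import bisect_left

# Hypothetical time quanta in milliseconds; index 0 is the highest priority.
QUANTA_MS = [10, 20, 40, 80, 160]


def estimate_first_token_ms(input_len: int, ms_per_input_token: float = 0.5) -> float:
    """Toy linear prefill-time model (an assumption, not a measured profile)."""
    return input_len * ms_per_input_token


def initial_level(input_len: int) -> int:
    """Skip-join: pick the first queue whose quantum covers the prefill time."""
    need = estimate_first_token_ms(input_len)
    # bisect_left finds the first quantum >= need; clamp to the lowest queue.
    return min(bisect_left(QUANTA_MS, need), len(QUANTA_MS) - 1)


def demote(level: int) -> int:
    """A job that uses up its quantum without finishing drops one level."""
    return min(level + 1, len(QUANTA_MS) - 1)


for n in (10, 60, 10_000):
    print(n, "input tokens -> queue", initial_level(n))
```

In this sketch a short prompt starts at the top queue while a very long prompt starts near the bottom, which is the intuition behind skipping queues the job could never finish its first iteration in.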

References

59 extracted · 59 resolved · 0 Pith anchors

[1] Introducing ChatGPT 2022
[2] ChatGPT sets record for fastest-growing user base 2023
[3] Reinventing search with a new ai-powered bing and edge, your copilot for the web 2023
[4] Our next-generation model: Gemini 1.5 2024
[5] Introducing the next generation of Claude 2024

Cited by

28 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:14.249636Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

8f51b76c2a4c51db7d7d18ea8b25721415869cc95e7906d90d5fba833ac4d882

Aliases

arxiv: 2305.05920 · arxiv_version: 2305.05920v3 · doi: 10.48550/arxiv.2305.05920 · pith_short_12: R5I3O3BKJRI5 · pith_short_16: R5I3O3BKJRI5W7L5 · pith_short_8: R5I3O3BK
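
The identifier and the canonical hash on this page appear to be related: Base32-encoding the first 16 bytes of the SHA-256 digest (standard RFC 4648 alphabet, padding stripped) reproduces the 26-character Pith Number, and the short aliases are its 8-, 12-, and 16-character prefixes. The Python sketch below checks that relationship for this record only. Whether this is the official derivation rule is an assumption, and the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "8f51b76c2a4c51db7d7d18ea8b25721415869cc95e7906d90d5fba833ac4d882"
PITH_NUMBER = "R5I3O3BKJRI5W7L5DDVIWJLSCQ"


def pith_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the leading bytes of a hex digest and drop '=' padding."""
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith: str) -> dict:
    """Build the short aliases as plain prefixes of the full identifier."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


derived = pith_from_hash(CANONICAL_HASH)
print("derived identifier matches:", derived == PITH_NUMBER)
print(short_aliases(derived))
```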
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/R5I3O3BKJRI5W7L5DDVIWJLSCQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8f51b76c2a4c51db7d7d18ea8b25721415869cc95e7906d90d5fba833ac4d882
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "78cdf56d7fa54d739f556b8432c47f660915962d0d52e492c2ac1e70b807618a",
    "cross_cats_sorted": [
      "cs.DC"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-05-10T06:17:50Z",
    "title_canon_sha256": "11d47b641c181c272ea0ee2eff1e59d151b2e50a675352094028c329f8712803"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2305.05920",
    "kind": "arxiv",
    "version": 3
  }
}
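
The verification one-liner under "Agent API" can also be run offline against the record printed above. The Python sketch below applies the same canonicalization (sorted keys, compact separators, non-ASCII preserved, UTF-8 encoding) and compares the digest with the canonical hash listed on this page. It assumes the JSON shown here is byte-for-byte what the API returns as `canonical_record`; the comparison result has not been pre-computed here, and the function name is hypothetical.

```python
import hashlib
import json

EXPECTED = "8f51b76c2a4c51db7d7d18ea8b25721415869cc95e7906d90d5fba833ac4d882"

RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "78cdf56d7fa54d739f556b8432c47f660915962d0d52e492c2ac1e70b807618a",
    "cross_cats_sorted": ["cs.DC"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-05-10T06:17:50Z",
    "title_canon_sha256": "11d47b641c181c272ea0ee2eff1e59d151b2e50a675352094028c329f8712803"
  },
  "schema_version": "1.0",
  "source": {"id": "2305.05920", "kind": "arxiv", "version": 3}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches page hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input JSON, which is what lets a mirror recompute it independently.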