pith. sign in

SPLIT: SymPathy for Large jobs Improves Tail latency

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

We study the asymptotic response time tail in the M/G/n multi-server queue with heavy-tailed (regularly varying) job sizes, a setting representative of modern computing workloads. For single-server systems, tail optimization is well understood: under heavy-tailed job sizes, policies such as SRPT that strictly prioritize short jobs are strongly tail optimal, and giving any priority to large jobs is harmful. For multi-server systems, the question has been almost entirely open. This paper gives the first strongly tail-optimal scheduling policies for the M/G/n queue with heavy-tailed job sizes. Our central finding is that the multi-server case is intrinsically different from the single-server case: giving a small amount of ``sympathy'' to large jobs is essential for strong tail optimality. We establish strong (or arbitrarily close to strong) tail optimality across the full stability region, both with and without knowledge of job sizes.

fields

cs.LG 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

cs.LG · 2026-06-16 · unverdicted · novelty 7.0

Presents a distribution-aware scheduling framework for LLM inference that reduces P99 TTLT by 35-50% and TTFT by 34-47% versus SRPT with perfect length knowledge using statistical signals instead of predictions.

citing papers explorer

Showing 1 of 1 citing paper.

  • Beyond Prediction: Tail-Aware Scheduling for LLM Inference cs.LG · 2026-06-16 · unverdicted · none · ref 4 · internal anchor

    Presents a distribution-aware scheduling framework for LLM inference that reduces P99 TTLT by 35-50% and TTFT by 34-47% versus SRPT with perfect length knowledge using statistical signals instead of predictions.