SuperInfer improves TTFT SLO attainment by up to 74.7% on GH200 Superchips via SLO-aware rotary scheduling (RotaSched) and full-duplex KV cache rotation (DuplexKV) over NVLink-C2C while preserving TBT and throughput.
We compare theFirst- Come-First-Serve(FCFS) andShortest-Job-Firstwith ora- cle generation length information (SJF-Oracle) policy
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.DC 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
SuperInfer improves TTFT SLO attainment by up to 74.7% on GH200 Superchips via SLO-aware rotary scheduling (RotaSched) and full-duplex KV cache rotation (DuplexKV) over NVLink-C2C while preserving TBT and throughput.