MORI improves throughput 20-71% and TTFT 18-43% over baselines by ranking programs on a continuous idleness spectrum and shifting the GPU-CPU boundary to match capacity in agentic LLM serving.
Jackson, Zhifei Li, Jiarong Xing, Scott Shenker, and Ion Stoica
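The summary names two mechanisms: ranking programs on a continuous idleness spectrum, and shifting a GPU-CPU boundary to match capacity. The sketch below shows one way those two ideas can fit together. It is a minimal illustration under stated assumptions, not MORI's implementation. The per-program `idleness` score (higher means a longer expected tool-call idle window), the `kv_bytes` memory footprint, and the greedy single-cutoff placement rule are all hypothetical choices made here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Program:
    name: str
    idleness: float  # hypothetical score: higher = longer expected tool-call idle window
    kv_bytes: int    # hypothetical GPU memory footprint (e.g. KV cache size)


def place_programs(programs: List[Program],
                   gpu_capacity_bytes: int) -> Tuple[List[Program], List[Program]]:
    """Hypothetical placement rule: rank programs from least to most idle, keep the
    longest prefix of that ranking that fits in GPU capacity, and offload everything
    past that boundary to CPU. A smaller capacity moves the boundary toward the
    least-idle end; a larger capacity moves it toward the most-idle end."""
    ranked = sorted(programs, key=lambda p: p.idleness)  # least idle first
    gpu: List[Program] = []
    cpu: List[Program] = []
    used = 0
    boundary_reached = False
    for p in ranked:
        if not boundary_reached and used + p.kv_bytes <= gpu_capacity_bytes:
            gpu.append(p)
            used += p.kv_bytes
        else:
            # Once one program does not fit, it and every more-idle program go to CPU,
            # so the split is a single cutoff along the idleness ranking.
            boundary_reached = True
            cpu.append(p)
    return gpu, cpu


if __name__ == "__main__":
    programs = [
        Program("a", idleness=0.9, kv_bytes=4),
        Program("b", idleness=0.1, kv_bytes=3),
        Program("c", idleness=0.5, kv_bytes=2),
        Program("d", idleness=0.3, kv_bytes=5),
    ]
    gpu, cpu = place_programs(programs, gpu_capacity_bytes=8)
    print([p.name for p in gpu], [p.name for p in cpu])  # ['b', 'd'] ['c', 'a']
```

Treating the split as one cutoff along a ranking, rather than picking programs independently, is what lets capacity changes be expressed as moving a single boundary: with a capacity of 10 in the example, program "c" moves back onto the GPU and only "a" stays offloaded.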
1 Pith paper cites this work.
Fields: cs.OS (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Idleness is Relative: Exploiting Tool-Call Idle Windows for Offloading in Agentic Systems with MORI