MARS coordinates heterogeneous GPU-CPU resources for agentic LLM workloads via decoupled admission control and agent-centric KV cache management, delivering up to 5.94x lower latency and 1.87x faster task completion.
Gonzalez, Clark Barrett, and Ying Sheng
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.OS 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems
MARS coordinates heterogeneous GPU-CPU resources for agentic LLM workloads via decoupled admission control and agent-centric KV cache management, delivering up to 5.94x lower latency and 1.87x faster task completion.