arXiv preprint arXiv:2503.16525 (2025)
4 Pith papers cite this work, all published in 2026. Polarity classification is still indexing, so all 4 citing papers are currently unverdicted.
Citing papers:
- Leyline: KV Cache Directives for Agentic Inference
  Leyline adds a policy-directed KV cache edit primitive with closed-form RoPE correction for agentic inference, reporting a +11.2 pp cache-hit lift and a +14.3 pp solve-rate gain.
- Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
  Dual-pool token-budget routing for LLM serving reduces GPU-hours by 31-42% and cuts preemption rates by 5.4x, using online-learned request classification that does not require a tokenizer.
- OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism
  OxyGen unifies KV cache management in MoT VLAs to enable cross-task KV sharing and cross-frame continuous batching, delivering up to a 3.7x speedup with 200+ tokens/s language generation and 70 Hz action generation on on-device platforms.
- SparseX: Efficient Segment-Level KV Cache Sharing for Interleaved LLM Serving
  SparseX adds segment-level KV cache reuse, combining Sparse-Q guided recomputation with layer-wise hybrid attention, to handle interleaved serving patterns that standard prefix caching cannot.