Tutti is a GPU-direct SSD-backed KV cache that removes CPU bottlenecks via object abstraction, GPU io_uring, and slack scheduling, delivering near-DRAM performance at 2x higher request rate and 27% lower cost than prior GDS-based systems.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
HybridGen achieves 1.41x-3.2x average speedups over six prior KV cache methods for LLM inference by using attention logit parallelism, a feedback-driven scheduler, and semantic-aware KV cache mapping.
citing papers explorer
-
Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving
Tutti is a GPU-direct SSD-backed KV cache that removes CPU bottlenecks via object abstraction, GPU io_uring, and slack scheduling, delivering near-DRAM performance at 2x higher request rate and 27% lower cost than prior GDS-based systems.
-
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
HybridGen achieves 1.41x-3.2x average speedups over six prior KV cache methods for LLM inference by using attention logit parallelism, a feedback-driven scheduler, and semantic-aware KV cache mapping.