Leave no document behind: Benchmarking long-context LLMs with extended multi-doc QA
1 Pith paper cites this work.
Fields: cs.CL (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
TTKV reduces cross-tier KV cache traffic by 5.94x on 128K-context tasks and cuts latency by up to 76% using temporal tiers, HBM/DRAM separation, and block-wise streaming attention.