ProcessBench: Identifying process errors in mathematical reasoning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
TTKV reduces cross-tier KV cache traffic by 5.94x on 128K-context tasks and cuts latency by up to 76% by using temporal tiers, HBM/DRAM separation, and block-wise streaming attention.