Preble: Efficient distributed prompt scheduling for LLM serving
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches
SAECache uses a multi-queue semantic-aware eviction policy with fully adaptive online learning to improve TTFT by 1.4x-2.7x over LRU-style baselines in LLM prefix caching.