Preble: Efficient distributed prompt scheduling for LLM serving

Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, Yiying Zhang · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

SAECache uses a multi-queue semantic-aware eviction policy with fully adaptive online learning to improve TTFT by 1.4x-2.7x over LRU-style baselines in LLM prefix caching.

citing papers explorer

Showing 1 of 1 citing paper.

Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches
cs.LG · 2026-05-12 · unverdicted · none · ref 21
