WISP suppresses wasted drafting time and verification interference in edge-cloud speculative LLM serving through dynamic drafting and SLO-aware batching, delivering up to 2.1x capacity and 1.94x goodput gains over centralized and prior baselines.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.DC (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching