FASER delivers up to 53% higher throughput and 1.92x lower latency in dynamic LLM serving by adjusting speculative lengths per request, early pruning of rejects, and overlapping draft/verification phases via frontiers.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.DC 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
WISP suppresses wasted drafting time and verification interference in edge-cloud speculative LLM serving through dynamic drafting and SLO-aware batching, delivering up to 2.1x capacity and 1.94x goodput gains over centralized and prior baselines.
citing papers explorer
-
FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving
FASER delivers up to 53% higher throughput and 1.92x lower latency in dynamic LLM serving by adjusting speculative lengths per request, early pruning of rejects, and overlapping draft/verification phases via frontiers.
-
WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching
WISP suppresses wasted drafting time and verification interference in edge-cloud speculative LLM serving through dynamic drafting and SLO-aware batching, delivering up to 2.1x capacity and 1.94x goodput gains over centralized and prior baselines.