DistillSpec: Improving Speculative Decoding via Knowledge Distillation
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.IT (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DiP-SD: Distributed Pipelined Speculative Decoding for Efficient LLM Inference at the Edge
DiP-SD jointly optimizes batch count, user-to-batch assignment, and per-user draft lengths to deliver up to 17.89x throughput over autoregressive decoding and 1.93x over greedy batching in a device-edge Qwen deployment.