Speculative decoding: Exploiting speculative execution for accelerating seq2seq generation
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DC · Years: 2026 · Verdicts: UNVERDICTED
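For readers landing on this page, here is a minimal sketch of the draft-then-verify loop that speculative decoding builds on. The `draft.generate`, `target.verify`, and `target.next_token` helpers are hypothetical stand-ins for draft sampling and single-pass target verification, not this paper's exact algorithm.

```python
def speculative_decode(target, draft, prompt, k=4, max_new_tokens=64):
    """Generate with a cheap draft model, verify with the target model.

    `target` and `draft` are hypothetical model objects; the accept rule
    is simplified for illustration.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft model cheaply proposes k candidate tokens.
        proposal = draft.generate(tokens, num_tokens=k)
        # 2. Target model scores all k candidates in one forward pass
        #    and returns the longest accepted prefix.
        accepted = target.verify(tokens, proposal)
        tokens.extend(accepted)
        # 3. On a rejection, the target supplies the correcting token,
        #    so every iteration is guaranteed to make progress.
        if len(accepted) < len(proposal):
            tokens.append(target.next_token(tokens))
    return tokens
```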
Representative citing papers:
- PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding
  PipeSD achieves a 1.16x-2.16x speedup and 14.3%-25.3% lower energy use in cloud-edge LLM inference via token-batch pipeline scheduling optimized by dynamic programming and a Bayesian-optimized dual-threshold NAV trigger.
- SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference
  SPECTRE achieves up to a 2.28x speedup for large-model LLM serving by running speculative draft generation and target verification in parallel using idle tail-model services (see the sketch after this list).
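The SPECTRE summary above describes overlapping draft generation with target verification. Below is a rough sketch of that overlap, reusing the hypothetical `draft`/`target` interface from the earlier sketch; the thread-pool overlap and the stale-block rule are illustrative assumptions, not SPECTRE's actual serving design.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_speculative_step(target, draft, tokens, k=4):
    """Verify the current draft block while optimistically drafting the next.

    All model calls are hypothetical; this only illustrates the
    draft/verify overlap, not SPECTRE's scheduler.
    """
    block = draft.generate(tokens, num_tokens=k)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Target verifies the current block on otherwise idle capacity...
        verify_future = pool.submit(target.verify, tokens, block)
        # ...while the draft model optimistically proposes the next block.
        draft_future = pool.submit(draft.generate, tokens + block, num_tokens=k)
        accepted = verify_future.result()
        next_block = draft_future.result()
    tokens = tokens + accepted
    # If anything was rejected, the optimistic next block was drafted
    # from a wrong prefix and must be discarded.
    if len(accepted) < len(block):
        next_block = None
    return tokens, next_block
```

The overlap pays off when verification latency roughly matches drafting latency, so the next block is usually ready the moment the current one is accepted; a full serving system would keep a persistent worker pool rather than rebuilding one per step.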