Speed: Speculative pipelined execution for efficient decoding, 2024

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao · 2024 · arXiv 2310.12072

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

representative citing papers

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

cs.LG · 2024-01-26 · unverdicted · novelty 6.0

EAGLE resolves feature-level uncertainty in speculative sampling via one-step token advancement, delivering 2.7x-3.5x speedup on LLaMA2-Chat 70B and doubled throughput across multiple model families and tasks.

LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

cs.CL · 2025-07-02 · unverdicted · novelty 5.0

LogitSpec accelerates retrieval-based speculative decoding by speculating the next-next token from the last logit and retrieving relevant references for both next and next-next tokens, reporting up to 2.61x speedup and 3.28 mean accepted tokens.

citing papers explorer

Showing 2 of 2 citing papers.

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty cs.LG · 2024-01-26 · unverdicted · none · ref 56
EAGLE resolves feature-level uncertainty in speculative sampling via one-step token advancement, delivering 2.7x-3.5x speedup on LLaMA2-Chat 70B and doubled throughput across multiple model families and tasks.
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation cs.CL · 2025-07-02 · unverdicted · none · ref 40
LogitSpec accelerates retrieval-based speculative decoding by speculating the next-next token from the last logit and retrieving relevant references for both next and next-next tokens, reporting up to 2.61x speedup and 3.28 mean accepted tokens.

Speed: Speculative pipelined execution for efficient decoding, 2024

fields

years

verdicts

representative citing papers

citing papers explorer