Judge decoding: Faster speculative sampling requires going beyond model alignment

Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali Thabet, Jonas Kohler · 2025 · arXiv 2501.19309

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference

cs.IT · 2026-04-20 · unverdicted · novelty 7.0

WISV uses a channel-aware semantic acceptance policy on hidden representations to boost accepted sequence length by up to 60.8% and cut interaction rounds by 37.3% in distributed speculative decoding, with under 1% accuracy loss.

Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match

cs.CL · 2025-11-28 · unverdicted · novelty 6.0

FLy is a training-free method that speeds up LLM generation by accepting semantically correct but non-exact draft tokens via an entropy gate and deferred verification window.

Speculative Coupled Decoding for Training-Free Lossless Acceleration of Autoregressive Visual Generation

cs.CV · 2025-10-28 · unverdicted · novelty 6.0

Speculative Coupled Decoding stabilizes draft sampling in Speculative Jacobi Decoding via an information-theoretic coupling step, delivering up to 4.2x image and 13.6x video speedups with no quality loss or training.

SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission

eess.SP · 2026-04-28 · unverdicted · novelty 5.0

SpecFed accelerates federated LLM inference via speculative decoding for parallel processing and top-K compression with server-side reconstruction, achieving high fidelity with reduced communication overhead.

LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

cs.CL · 2025-07-02 · unverdicted · novelty 5.0

LogitSpec accelerates retrieval-based speculative decoding by speculating the next-next token from the last logit and retrieving relevant references for both next and next-next tokens, reporting up to 2.61x speedup and 3.28 mean accepted tokens.

citing papers explorer

Showing 5 of 5 citing papers.

WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference · cs.IT · 2026-04-20 · unverdicted · none · ref 20
Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match · cs.CL · 2025-11-28 · unverdicted · none · ref 4
Speculative Coupled Decoding for Training-Free Lossless Acceleration of Autoregressive Visual Generation · cs.CV · 2025-10-28 · unverdicted · none · ref 3
SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission · eess.SP · 2026-04-28 · unverdicted · none · ref 21
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation · cs.CL · 2025-07-02 · unverdicted · none · ref 35
