ISBN 9798400703867
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?
KV cache reuse improves long-range draft acceptance in speculative decoding but delivers only marginal end-to-end speedups due to drafter limitations.
- Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
Speculative decoding integrated into NeMo-RL accelerates synchronous RL rollouts by 1.8x at 8B scale and projects up to 2.5x end-to-end training speedup at 235B scale when combined with asynchronous pipelines.