WhiFlash introduces token-level cross-paradigm routing between autoregressive and diffusion drafting models, with cache optimizations, to raise acceptance lengths and deliver up to 69.6% throughput gains over EAGLE-3.
RASD : Retrieval-Augmented Speculative Decoding
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
AdaPLD adaptively mixes lexical and semantic retrieval with branched reuse to improve model-free speculative decoding and reports up to 3.10x speedup across benchmarks.
citing papers explorer
-
AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding
AdaPLD adaptively mixes lexical and semantic retrieval with branched reuse to improve model-free speculative decoding and reports up to 3.10x speedup across benchmarks.