QuantSpec: Self-speculative decoding with hierarchical quantized KV cache

2025 · arXiv 2502.10424

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding

cs.AR · 2026-05-26 · unverdicted · novelty 5.0

Cassandra is a self-speculative decoding system that builds a draft model via fine-grained data selection and optimized pruning/mantissa truncation, achieving up to 2.41x speedup over BF16 and 1.81x more tokens than Eagle-3 on Llama 3 8B without training.

citing papers explorer

Showing 1 of 1 citing paper.

Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding

cs.AR · 2026-05-26 · unverdicted · none · ref 58
