QSpec: Speculative decoding with complementary quantization schemes

Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu · 2025 · DOI 10.18653/v1/2025.emnlp-main.240

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

cs.AI · 2026-06-01 · conditional · novelty 7.0

2-bit quantized reasoning models exhibit process failures like loops and delayed commitment that degrade end-to-end performance, but FP16 planning and loop rescue recover accuracy on MATH-500 from 17.2% to 74.2% for Qwen3-8B while retaining speed gains.

citing papers explorer

Showing 1 of 1 citing paper.

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

cs.AI · 2026-06-01 · conditional · none · ref 41
