One Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
SpenseGPT: Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference
SpenseGPT introduces a hybrid sparse-dense weight format and one-shot pruning that delivers 1.2x end-to-end LLM decoding speedup on B200 GPUs with FP8 while preserving accuracy on Qwen3-32B and Seed-OSS-36B.