Neural network quantization with AI model efficiency toolkit (AIMET)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
When Good Enough Is Optimal: Multiplication-Only Matrix Inversion Approximation for Quantized Gated DeltaNet
A multiplication-only truncated Neumann approximation for matrix inversion in quantized Gated DeltaNet linear attention delivers up to 5x kernel speedup and 20% decode overhead reduction while preserving accuracy on Qwen3.5 models.
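
To make the idea of a truncated Neumann approximation concrete, here is a minimal NumPy sketch. It is not the paper's kernel: the function name `neumann_inverse`, the choice of inverting `I + A` with a strictly lower-triangular `A`, and the matrix size and scale are all assumptions made for illustration. The sketch only shows that the series sum of `(-A)^k` approximates the inverse using matrix multiplications and additions alone.

```python
import numpy as np

def neumann_inverse(A, num_terms):
    """Approximate inv(I + A) by the first num_terms terms of sum_k (-A)^k.

    Hypothetical helper for illustration; uses only matmuls and additions.
    """
    n = A.shape[0]
    approx = np.eye(n)   # k = 0 term
    power = np.eye(n)    # running value of (-A)^k
    for _ in range(1, num_terms):
        power = power @ (-A)
        approx = approx + power
    return approx

rng = np.random.default_rng(0)
n = 8
# Assumed setting: a small strictly lower-triangular matrix, so A is nilpotent (A^n = 0).
A = np.tril(rng.normal(scale=0.1, size=(n, n)), k=-1)

exact = np.linalg.inv(np.eye(n) + A)
err_truncated = np.abs(neumann_inverse(A, 3) - exact).max()  # 3 terms: approximate
err_full = np.abs(neumann_inverse(A, n) - exact).max()       # n terms: exact up to rounding
print(err_truncated, err_full)
```

Because a strictly lower-triangular matrix is nilpotent, the series terminates after `n` terms, so truncating earlier trades a small, bounded error for fewer multiplications.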