Low-bit model quantization for deep neural networks: A survey. arXiv preprint arXiv:2505.05530

Low-bit model quantization for deep neural networks: A survey · 2025 · arXiv 2505.05530

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Zero-Shot Quantization via Weight-Space Arithmetic

cs.CV · 2026-04-03 · unverdicted · novelty 8.0

A quantization vector derived from a donor model via weight-space arithmetic can be added to a receiver model to improve post-PTQ Top-1 accuracy by up to 60 points in 3-bit settings without receiver-side QAT or data.

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

DECO is a sparse MoE architecture with ReLU-based routing, learnable expert scaling, and NormSiLU activation that matches dense Transformer performance at 20% expert activation and delivers 2.93x speedup on Jetson AGX Orin.

MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

MARR uses per-module adaptive residual scaling updated by PID feedback to balance error correction against Hessian-approximation bias in low-bit PTQ.

TinyNeRV: Compact Neural Video Representations via Capacity Scaling, Distillation, and Low-Precision Inference

cs.CV · 2026-04-10 · unverdicted · novelty 4.0

Tiny NeRV models using capacity scaling, frequency-aware distillation, and low-precision quantization achieve favorable quality-efficiency trade-offs with far fewer parameters and lower computational costs than standard NeRV.

citing papers explorer

Zero-Shot Quantization via Weight-Space Arithmetic · cs.CV · 2026-04-03 · unverdicted · none · ref 4
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices · cs.LG · 2026-05-11 · unverdicted · none · ref 179 · 3 links
MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization · cs.LG · 2026-05-18 · unverdicted · none · ref 22
TinyNeRV: Compact Neural Video Representations via Capacity Scaling, Distillation, and Low-Precision Inference · cs.CV · 2026-04-10 · unverdicted · none · ref 60
