AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Attention sinks induce gradient sinks under causal masking, with massive activations serving as adaptive RMSNorm regulators that attenuate localized gradient pressure in Transformer training.
citing papers explorer
- Saliency-Aware Regularized Quantization Calibration for Large Language Models
  SARQC augments standard PTQ calibration with a saliency-aware regularizer to keep quantized weights closer to original floating-point values, yielding improved perplexity and zero-shot accuracy on dense and MoE LLMs.
- Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers
  Attention sinks induce gradient sinks under causal masking, with massive activations serving as adaptive RMSNorm regulators that attenuate localized gradient pressure in Transformer training.