pith. sign in

hub

Omniquant: Omnidirectionally calibrated quantization for large lan- guage models.arXiv preprint arXiv:2308.13137

21 Pith papers cite this work. Polarity classification is still indexing.

21 Pith papers citing it

hub tools

citation-role summary

background 2 method 1

citation-polarity summary

representative citing papers

SpinQuant: LLM quantization with learned rotations

cs.LG · 2024-05-26 · conditional · novelty 7.0

SpinQuant learns optimal rotations to enable accurate 4-bit quantization of LLM weights, activations, and KV cache, reducing the zero-shot gap to full precision to 2.9 points on LLaMA-2 7B.

Search Your Block Floating Point Scales!

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.

Theory-optimal Quantization Based on Flatness

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

The paper introduces the Flatness metric, derives a theory-optimal quantization solution, and presents BDQ that uses bidirectional diagonal transformations to reduce outlier impact, achieving under 1% drop at W4A4 on LLaMA-3-8B.

OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization

cs.LG · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

OSAQ suppresses weight outliers in LLMs via a closed-form additive transformation from the Hessian's stable null space, improving 2-bit quantization perplexity by over 40% versus vanilla GPTQ with no inference overhead.

Spike-driven Large Language Model

cs.NE · 2026-04-11 · unverdicted · novelty 6.0

SDLLM is a spike-driven LLM that uses gamma-SQP two-step encoding, bidirectional symmetric quantization, and membrane potential clipping to achieve 7x lower energy consumption and 4.2% higher accuracy than prior spike-based language models.

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations

cs.CL · 2025-11-09 · conditional · novelty 6.0

TAQ estimates per-layer importance from hidden representations and output sensitivity on task calibration data to allocate mixed precision in a training-free PTQ setting, outperforming task-agnostic baselines on accuracy-memory ratio across benchmarks.

citing papers explorer

Showing 21 of 21 citing papers.