hub

Rptq: Reorder-based post- training quantization for large language models.arXiv preprint arXiv:2304.01089

Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu · 2023 · arXiv 2304.01089

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models

cs.LG · 2026-02-23 · unverdicted · novelty 7.0

QuantVLA is the first post-training quantization framework for VLA models that quantizes the diffusion transformer action head and reports higher task success rates than full-precision baselines with roughly 70% memory savings on the quantized components.

OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization

cs.LG · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

OSAQ suppresses weight outliers in LLMs via a closed-form additive transformation from the Hessian's stable null space, improving 2-bit quantization perplexity by over 40% versus vanilla GPTQ with no inference overhead.

LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

LBLLM achieves better accuracy than prior binarization methods for LLMs by decoupling weight and activation quantization through initialization, layer-wise distillation, and learnable activation scaling.

Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

DASH-Q uses a stable diagonal curvature estimate and weighted least squares to achieve robust ultra-low-bit post-training quantization of LLMs, improving zero-shot accuracy by 7% on average over baselines.

Rethinking Residual Errors in Compensation-based LLM Quantization

cs.LG · 2026-04-09 · conditional · novelty 6.0

Redefining residual errors to include compensation-aware discrepancies and realigning calibration to full-precision outputs improves GPTQ and GPTAQ performance on LLMs.

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations

cs.CL · 2025-11-09 · conditional · novelty 6.0

TAQ estimates per-layer importance from hidden representations and output sensitivity on task calibration data to allocate mixed precision in a training-free PTQ setting, outperforming task-agnostic baselines on accuracy-memory ratio across benchmarks.

AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization

cs.CV · 2025-03-05 · unverdicted · novelty 6.0

AHCQ-SAM introduces ACNR, HLUQ, CAG, and LNQ quantization techniques that deliver 15.2% mAP gain on 4-bit SAM-B and 14.01% J&F gain on 4-bit SAM2-Tiny versus prior PTQ methods.

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

cs.CL · 2023-12-10 · unverdicted · novelty 6.0

ASVD compresses LLMs by 10-30% and KV caches by 50% via activation-aware SVD that absorbs outliers into transformed weights and calibrates per-layer sensitivity.

SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models

cs.CL · 2026-04-11 · unverdicted · novelty 5.0

SEPTQ simplifies LLM post-training quantization to two steps via static global importance scoring and mask-guided column-wise weight updates, claiming superior results over baselines in low-bit settings.

MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design

cs.LG · 2024-12-19 · unverdicted · novelty 5.0

MixLLM uses global output-feature importance to set mixed bit-widths for LLM quantization and adds two-step dequantization plus software pipelining for system efficiency.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

citing papers explorer

Showing 11 of 11 citing papers.

QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models cs.LG · 2026-02-23 · unverdicted · none · ref 46
QuantVLA is the first post-training quantization framework for VLA models that quantizes the diffusion transformer action head and reports higher task success rates than full-precision baselines with roughly 70% memory savings on the quantized components.
OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization cs.LG · 2026-05-06 · unverdicted · none · ref 17 · 2 links
OSAQ suppresses weight outliers in LLMs via a closed-form additive transformation from the Hessian's stable null space, improving 2-bit quantization perplexity by over 40% versus vanilla GPTQ with no inference overhead.
LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation cs.LG · 2026-04-21 · unverdicted · none · ref 27
LBLLM achieves better accuracy than prior binarization methods for LLMs by decoupling weight and activation quantization through initialization, layer-wise distillation, and learnable activation scaling.
Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate cs.LG · 2026-04-15 · unverdicted · none · ref 46
DASH-Q uses a stable diagonal curvature estimate and weighted least squares to achieve robust ultra-low-bit post-training quantization of LLMs, improving zero-shot accuracy by 7% on average over baselines.
Rethinking Residual Errors in Compensation-based LLM Quantization cs.LG · 2026-04-09 · conditional · none · ref 19
Redefining residual errors to include compensation-aware discrepancies and realigning calibration to full-precision outputs improves GPTQ and GPTAQ performance on LLMs.
You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations cs.CL · 2025-11-09 · conditional · none · ref 66
TAQ estimates per-layer importance from hidden representations and output sensitivity on task calibration data to allocate mixed precision in a training-free PTQ setting, outperforming task-agnostic baselines on accuracy-memory ratio across benchmarks.
AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization cs.CV · 2025-03-05 · unverdicted · none · ref 42
AHCQ-SAM introduces ACNR, HLUQ, CAG, and LNQ quantization techniques that deliver 15.2% mAP gain on 4-bit SAM-B and 14.01% J&F gain on 4-bit SAM2-Tiny versus prior PTQ methods.
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models cs.CL · 2023-12-10 · unverdicted · none · ref 24
ASVD compresses LLMs by 10-30% and KV caches by 50% via activation-aware SVD that absorbs outliers into transformed weights and calibrates per-layer sensitivity.
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models cs.CL · 2026-04-11 · unverdicted · none · ref 43
SEPTQ simplifies LLM post-training quantization to two steps via static global importance scoring and mask-guided column-wise weight updates, claiming superior results over baselines in low-bit settings.
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design cs.LG · 2024-12-19 · unverdicted · none · ref 48
MixLLM uses global output-feature importance to set mixed bit-widths for LLM quantization and adds two-step dequantization plus software pipelining for system efficiency.
A Survey on Efficient Inference for Large Language Models cs.CL · 2024-04-22 · accept · none · ref 207
The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

Rptq: Reorder-based post- training quantization for large language models.arXiv preprint arXiv:2304.01089

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer