Pith Number

pith:F3SQWA3G

pith:2024:F3SQWA3GM44IXO5Y3EQ3SSF4O5
not attested · not anchored · not stored · refs resolved

SpinQuant: LLM quantization with learned rotations

Bilge Soran, Changsheng Zhao, Dhruv Choudhary, Igor Fedorov, Raghuraman Krishnamoorthi, Tijmen Blankevoort, Vikas Chandra, Yuandong Tian, Zechun Liu

SpinQuant learns rotation matrices that leave the full-precision network's outputs unchanged, and uses them to quantize LLM weights, activations, and KV cache to 4 bits.

arxiv:2405.16406 v4 · 2024-05-26 · cs.LG · cs.AI · cs.CL · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{F3SQWA3GM44IXO5Y3EQ3SSF4O5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

With 4-bit quantization of weight, activation, and KV-cache, SpinQuant narrows the accuracy gap on zero-shot reasoning tasks with full precision to merely 2.9 points on the LLaMA-2 7B model, surpassing LLM-QAT by 19.1 points and SmoothQuant by 25.0 points.

C2 weakest assumption

That learned rotation matrices found on calibration data will generalize to preserve full-precision outputs and improve quantization accuracy across diverse downstream tasks without introducing new errors.

C3 one-line summary

SpinQuant learns rotation matrices that enable accurate 4-bit quantization of LLM weights, activations, and KV cache, reducing the zero-shot accuracy gap to full precision to 2.9 points on LLaMA-2 7B.
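
The claims above rest on the fact that an orthogonal rotation can be inserted between activations and weights without changing the full-precision result, since (xR)(RᵀW) = xW whenever RRᵀ = I. The toy sketch below illustrates only that invariance. It uses a fixed 2×2 rotation and made-up values for `x` and `W`; it is not the paper's learned-rotation procedure and performs no quantization.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) times (k x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def rotation(theta):
    # 2x2 orthogonal rotation matrix.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Hypothetical values: one activation row with an outlier channel.
x = [[8.0, 0.1]]
W = [[0.5, -1.0], [2.0, 0.25]]
R = rotation(math.pi / 4)

x_rot = matmul(x, R)              # rotated activations
W_rot = matmul(transpose(R), W)   # counter-rotated weights

y_plain = matmul(x, W)
y_rotated = matmul(x_rot, W_rot)

print(y_plain, y_rotated)         # equal up to floating-point rounding
print(max(abs(v) for v in x[0]), max(abs(v) for v in x_rot[0]))
```

In this example the rotation also spreads the outlier across both channels, so the largest activation magnitude shrinks; that is the kind of effect that makes a rotated tensor easier to quantize.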

References

33 extracted · 33 resolved · 14 Pith anchors

[1] GPT-4 Technical Report · arXiv:2303.08774
[2] BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions · arXiv:1905.10044
[3] Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge · arXiv:1803.05457
[4] Extreme compression of large language models via additive quantization
[5] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers · arXiv:2210.17323

Formal links

3 machine-checked theorem links

Cited by

37 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:51.013172Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

2ee50b036667388bbbb8d921b948bc7750779dea0b22b98eabe92be28d7cfed6

Aliases

arxiv: 2405.16406 · arxiv_version: 2405.16406v4 · doi: 10.48550/arxiv.2405.16406 · pith_short_12: F3SQWA3GM44I · pith_short_16: F3SQWA3GM44IXO5Y · pith_short_8: F3SQWA3G
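
The identifiers on this page are consistent with the Pith Number being the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, with the short aliases being prefixes of it. That relationship is inferred from the values shown here, not stated by the page, so the helper `pith_from_hash` below is a hypothetical sketch rather than the registry's documented derivation.

```python
import base64

CANONICAL_HASH = "2ee50b036667388bbbb8d921b948bc7750779dea0b22b98eabe92be28d7cfed6"
PITH_NUMBER = "F3SQWA3GM44IXO5Y3EQ3SSF4O5"

def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    # Assumption: identifier = truncated Base32 (RFC 4648) of the raw digest bytes.
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

derived = pith_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)

# The short aliases listed above are prefixes of the same string.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {derived[:n]}")
```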
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/F3SQWA3GM44IXO5Y3EQ3SSF4O5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2ee50b036667388bbbb8d921b948bc7750779dea0b22b98eabe92be28d7cfed6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1538f48e6afe2d097304cf1c4b0d8a7c258b50ad60ddaf75d8de3fc142c5dd7a",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-05-26T02:15:49Z",
    "title_canon_sha256": "ba0aab9cac8a079304b5cac58e1b79301b39a03caf2c9306481eb82595e638bd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2405.16406",
    "kind": "arxiv",
    "version": 4
  }
}
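
The same check can be run offline against the JSON shown above. The sketch below applies the serialization used in the shell one-liner (sorted keys, compact separators, non-ASCII preserved, UTF-8) and compares the digest with the canonical hash. It assumes the record printed on this page is identical to the `canonical_record` field returned by the API; if the two differ in any value, a mismatch is expected. The function name `canonical_sha256` is illustrative.

```python
import hashlib
import json

EXPECTED = "2ee50b036667388bbbb8d921b948bc7750779dea0b22b98eabe92be28d7cfed6"

CANONICAL_RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "1538f48e6afe2d097304cf1c4b0d8a7c258b50ad60ddaf75d8de3fc142c5dd7a",
    "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.CV"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-05-26T02:15:49Z",
    "title_canon_sha256": "ba0aab9cac8a079304b5cac58e1b79301b39a03caf2c9306481eb82595e638bd"
  },
  "schema_version": "1.0",
  "source": {"id": "2405.16406", "kind": "arxiv", "version": 4}
}
"""

def canonical_sha256(record: dict) -> str:
    # Deterministic serialization: key order and whitespace cannot affect the hash.
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

record = json.loads(CANONICAL_RECORD_TEXT)
digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, reordering the fields of the record does not change the digest; only a change to a key or value does.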