Pith Number

pith:B3J5DKC6

pith:2026:B3J5DKC6BASQEIV4OTLB3DOYRW
not attested · not anchored · not stored · refs resolved

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling

Alireza Dadgarnia, Dan Alistarh, Eldar Kurtic, Mahdi Nikdan, Maximilian Kleinegger, Michael Helcig, Soroush Tabesh

Gumbel-Softmax relaxation of discrete grid choices lets scalar quantization recover most accuracy of vector methods at 2-3 bits while staying kernel-compatible.

arxiv:2604.18556 v2 · 2026-04-20 · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{B3J5DKC6BASQEIV4OTLB3DOYRW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

GSQ closes most of the gap between scalar quantization and the QTIP frontier at 2 and 3 bits while using a symmetric scalar grid with group-wise quantization, and it therefore remains compatible with existing scalar inference kernels.
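To make the terms in this claim concrete, here is a minimal NumPy sketch of plain round-to-nearest quantization onto a symmetric scalar grid with one scale per group. It is only an illustrative baseline, not the GSQ method: the half-integer grid, the max-abs scale rule, and the names `symmetric_grid` and `quantize_groupwise` are all assumptions introduced for this example.

```python
import numpy as np

def symmetric_grid(bits: int) -> np.ndarray:
    """Assumed symmetric grid with 2**bits levels, e.g. bits=2 -> [-1.5, -0.5, 0.5, 1.5]."""
    levels = 2 ** bits
    return np.arange(levels, dtype=np.float64) - (levels - 1) / 2.0

def quantize_groupwise(w: np.ndarray, bits: int = 2, group_size: int = 4):
    """Round-to-nearest onto the grid, with one scale per group of weights."""
    grid = symmetric_grid(bits)
    groups = w.reshape(-1, group_size)
    # Assumed scale rule: map the largest magnitude in each group
    # onto the outermost grid point.
    scale = np.abs(groups).max(axis=1, keepdims=True) / grid.max()
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid dividing by zero
    # Index of the nearest grid point for every weight.
    idx = np.abs(groups[..., None] / scale[..., None] - grid).argmin(axis=-1)
    dequant = grid[idx] * scale
    return idx.reshape(w.shape), scale.squeeze(1), dequant.reshape(w.shape)

w = np.array([0.3, -1.5, 0.5, 1.5, 0.2, -0.1, 0.6, -0.3])
idx, scale, dequant = quantize_groupwise(w, bits=2, group_size=4)
print(idx)      # grid index stored per weight
print(scale)    # one scale per group of 4 weights
print(dequant)  # values an inference kernel would reconstruct
```

Because each weight is stored as a small integer index plus a shared per-group scale, this layout is the kind that existing scalar inference kernels already consume; what GSQ changes is how the indices are chosen.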

C2 · weakest assumption

The Gumbel-Softmax relaxation of the discrete grid assignment problem converges to high-quality discrete solutions without introducing optimization bias or instability that would degrade final quantized model accuracy on held-out tasks.
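The relaxation named in this assumption can be sketched as follows. This is a generic Gumbel-Softmax sample over grid points for a single weight, written in NumPy under stated assumptions: the grid values, the logits, and the temperature schedule are hypothetical, and the paper's actual parameterization and training loop are not shown in this record.

```python
import numpy as np

def gumbel_softmax(logits: np.ndarray, tau: float, rng: np.random.Generator) -> np.ndarray:
    """Draw a relaxed one-hot sample over the last axis of `logits`."""
    # Gumbel(0, 1) noise via the inverse CDF; u stays strictly inside (0, 1).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
grid = np.array([-1.5, -0.5, 0.5, 1.5])      # hypothetical 2-bit symmetric grid
logits = np.array([[0.1, 2.0, 0.3, -1.0]])   # hypothetical learnable logits, one weight

# Lower temperatures push the sample toward a one-hot grid choice.
for tau in (5.0, 1.0, 0.1):
    probs = gumbel_softmax(logits, tau, rng)
    soft_weight = probs @ grid               # differentiable surrogate weight
    print(tau, probs.round(3), soft_weight.round(3))

# After optimization, the discrete assignment is the arg-max grid point.
hard_weight = grid[logits.argmax(axis=-1)]
print(hard_weight)
```

The assumption under scrutiny is that the gap between the relaxed `soft_weight` used during optimization and the final `hard_weight` does not bias or destabilize the result.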

C3 · one line summary

GSQ uses Gumbel-Softmax to optimize scalar quantization grids for LLMs, closing most of the accuracy gap to vector methods like QTIP at 2-3 bits per parameter while using symmetric scalar grids compatible with existing kernels.

References

38 extracted · 38 resolved · 12 Pith anchors

[1] arXiv preprint arXiv:2402.11960
[2] Symbolic discovery of optimization algorithms
[3] Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge · arXiv:1803.05457
[4] Differentiable model compression via pseudo quantization noise
[5] 8-bit optimizers via block-wise quantization

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-20T00:00:39.153514Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

0ed3d1a85e08250222bc74d61d8dd88d88eabd544cffd0738734d2fb3e214eea

Aliases

arxiv: 2604.18556 · arxiv_version: 2604.18556v2 · doi: 10.48550/arxiv.2604.18556 · pith_short_12: B3J5DKC6BASQ · pith_short_16: B3J5DKC6BASQEIV4 · pith_short_8: B3J5DKC6
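For the values shown on this page, the full Pith Number coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the short aliases are prefixes of it. The sketch below reproduces that relationship; it is an observation about this one record, not a documented derivation rule, and the truncation length of 26 is an assumption read off the identifier above.

```python
import base64

canonical_hash = "0ed3d1a85e08250222bc74d61d8dd88d88eabd544cffd0738734d2fb3e214eea"

# Base32-encode the raw 32 hash bytes (standard A-Z, 2-7 alphabet).
b32 = base64.b32encode(bytes.fromhex(canonical_hash)).decode("ascii")

# Assumed rule: keep the first 26 characters as the Pith Number.
pith_number = b32[:26]

# The short aliases listed above are prefixes of that identifier.
aliases = {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(pith_number)  # B3J5DKC6BASQEIV4OTLB3DOYRW
print(aliases)
```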
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/B3J5DKC6BASQEIV4OTLB3DOYRW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0ed3d1a85e08250222bc74d61d8dd88d88eabd544cffd0738734d2fb3e214eea
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "b341284d823cd8b432ec6a414ceaede7be8264348bbdf5e5e536f9e76c1d361e",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-04-20T17:45:47Z",
    "title_canon_sha256": "fbcb9f07fa494b509eb1ecd05968813911ed0e6af294e3e02f99be79a85dab3e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.18556",
    "kind": "arxiv",
    "version": 2
  }
}
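The same check can be run offline against the record printed above. This sketch mirrors the canonicalization used in the curl pipeline (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes); the helper name `canonical_sha256` is introduced here for illustration. Compare the printed digest with the expected canonical hash rather than taking a match for granted.

```python
import hashlib
import json

EXPECTED = "0ed3d1a85e08250222bc74d61d8dd88d88eabd544cffd0738734d2fb3e214eea"

def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON canonicalization as the curl pipeline."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# The canonical record exactly as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "b341284d823cd8b432ec6a414ceaede7be8264348bbdf5e5e536f9e76c1d361e",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-04-20T17:45:47Z",
        "title_canon_sha256": "fbcb9f07fa494b509eb1ecd05968813911ed0e6af294e3e02f99be79a85dab3e",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.18556", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace, so a mirror that stores the record in a different layout can still recompute the same value.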