pith:FCZGIG6Y
FlashSampling: Fast and Memory-Efficient Exact Sampling
FlashSampling fuses exact categorical sampling into the LM-head matrix multiply so the full logits tensor is never written to HBM.
arxiv:2603.15854 v2 · 2026-03-16 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FCZGIG6YM6MT5QELDJIKM7ZAA6}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
FlashSampling is an exact sampling primitive that fuses sampling into the LM-head matmul and never materializes the logits tensor in HBM; in tensor-parallel decoding it replaces all-gather with streaming peer-to-peer writes, achieving kernel-level speedups and up to 10% reduction in time per output token.
Argmax decomposes exactly over vocabulary partitions, and the hierarchical factorization used for grouped variants preserves exact categorical sampling; both hold without numerical error or edge-case failure when tiles are processed independently on-chip.
FlashSampling performs exact Gumbel-max sampling inside the LM-head matmul via on-chip tiling and hierarchical argmax reduction, delivering up to 10% faster token generation in vLLM on datacenter GPUs without approximation.
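The decomposition these claims rely on can be illustrated with a small NumPy sketch. This is not the paper's kernel: it runs on the CPU, and the function names, the tile size, and the choice to pass pre-drawn Gumbel noise as an argument are all assumptions made for illustration. It only shows that a running argmax over vocabulary tiles picks the same token as an argmax over the fully materialized logits when both see the same noise.

```python
import numpy as np


def gumbel_max_full(hidden, weight, gumbel):
    """Reference path: materialize all logits, then take one argmax."""
    logits = weight @ hidden  # shape (vocab,)
    return int(np.argmax(logits + gumbel))


def gumbel_max_tiled(hidden, weight, gumbel, tile=128):
    """Tiled path: only one tile of logits exists at a time.

    Hypothetical helper for illustration; a running (value, index) pair
    stands in for the hierarchical argmax reduction.
    """
    best_val, best_idx = -np.inf, -1
    for start in range(0, weight.shape[0], tile):
        stop = start + tile
        # Perturbed logits for this vocabulary tile only.
        scores = weight[start:stop] @ hidden + gumbel[start:stop]
        j = int(np.argmax(scores))
        # Strict '>' keeps the earliest index on ties, matching np.argmax.
        if scores[j] > best_val:
            best_val, best_idx = scores[j], start + j
    return best_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab, dim = 1000, 16  # toy sizes, chosen arbitrarily
    hidden = rng.standard_normal(dim)
    weight = rng.standard_normal((vocab, dim))
    gumbel = rng.gumbel(size=vocab)  # standard Gumbel(0, 1) noise
    print(gumbel_max_full(hidden, weight, gumbel),
          gumbel_max_tiled(hidden, weight, gumbel))
```

The same reduction would apply if each tile lived on a different tensor-parallel rank: each rank reports its local best (value, index) pair and the pairs are reduced with a max, so no rank needs the full logits vector.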
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:10:03.435468Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
28b2641bd867993ec08b1a50a67f20079be979139708f2d2798dec5064f538c1
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FCZGIG6YM6MT5QELDJIKM7ZAA6 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 28b2641bd867993ec08b1a50a67f20079be979139708f2d2798dec5064f538c1
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c0d3d84724b127fce2caceeb33a5adc2fc84b887648672cb76874fab169bb4a0",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-16T19:37:08Z",
    "title_canon_sha256": "e435c8655d84f9c3aa0afc24a8b8288f6ba2cdf4d13ca277893c96bab91d73a8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.15854",
    "kind": "arxiv",
    "version": 2
  }
}
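For checking a saved copy of the record offline, the hashing step of the shell pipeline can be written as a small Python function. This is a sketch that mirrors the serialization options in the one-liner (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 encoding); the function name `canonical_sha256` is hypothetical, and whether the record as displayed on this page reproduces the published hash is assumed rather than confirmed here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell one-liner does.

    Hypothetical helper: serializes with sorted keys and compact
    separators, encodes as UTF-8, and returns the SHA-256 hex digest.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Key order in the input does not matter, because keys are sorted
    # before hashing.
    a = {"schema_version": "1.0", "source": {"id": "2603.15854", "version": 2}}
    b = {"source": {"version": 2, "id": "2603.15854"}, "schema_version": "1.0"}
    print(canonical_sha256(a) == canonical_sha256(b))
```

To use it, load the `canonical_record` object from the API response with `json.loads` and compare the returned digest against the canonical hash listed above.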