Pith Number

pith:FCZGIG6Y

pith:2026:FCZGIG6YM6MT5QELDJIKM7ZAA6
not attested · not anchored · not stored · refs pending

FlashSampling: Fast and Memory-Efficient Exact Sampling

Mengdi Wang, Tomas Ruiz, Xuyang Shen, Yifan Zhang, Yiran Zhong, Zhen Qin

FlashSampling fuses exact categorical sampling into the LM-head matrix multiply so the full logits tensor is never written to HBM.

arxiv:2603.15854 v2 · 2026-03-16 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FCZGIG6YM6MT5QELDJIKM7ZAA6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

FlashSampling is an exact sampling primitive that fuses sampling into the LM-head matmul and never materializes the logits tensor in HBM; in tensor-parallel decoding it replaces all-gather with streaming peer-to-peer writes, achieving kernel-level speedups and up to 10% reduction in time per output token.

C2 · weakest assumption

The claim that argmax decomposes exactly over vocabulary partitions (and that the hierarchical factorization for grouped variants preserves exact categorical sampling) holds without numerical error or edge-case failure when tiles are processed independently on-chip.
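The decomposition this assumption relies on can be written out explicitly. The block below is a sketch of the standard identities, not the paper's own derivation. The notation is an assumption introduced here: logits \ell_i over vocabulary V, i.i.d. Gumbel(0,1) noise g_i, and a partition of V into disjoint tiles V_1, ..., V_K. It also uses the Gumbel-max trick named in the one-line summary. Both identities hold in exact arithmetic; whether they survive finite-precision, per-tile on-chip evaluation is the open question the assumption flags.

```latex
% Gumbel-max: perturbing logits with i.i.d. Gumbel(0,1) noise and taking
% the argmax yields an exact sample from the softmax distribution.
\[
  \Pr\!\Big[\arg\max_{i \in V} (\ell_i + g_i) = j\Big]
  = \frac{e^{\ell_j}}{\sum_{i \in V} e^{\ell_i}},
  \qquad g_i \overset{\text{i.i.d.}}{\sim} \mathrm{Gumbel}(0,1).
\]

% Argmax over a partition V = V_1 \sqcup \dots \sqcup V_K reduces to an
% argmax over the K per-tile winners.
\[
  i_k^{*} = \arg\max_{i \in V_k} (\ell_i + g_i),
  \qquad
  \arg\max_{i \in V} (\ell_i + g_i)
  = \arg\max_{i \in \{ i_1^{*}, \dots, i_K^{*} \}} (\ell_i + g_i).
\]
```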

C3 · one-line summary

FlashSampling performs exact Gumbel-max sampling inside the LM-head matmul via on-chip tiling and hierarchical argmax reduction, delivering up to 10% faster token generation in vLLM on datacenter GPUs without approximation.
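To make the tiling idea concrete, here is a minimal pure-Python sketch. It illustrates tile-wise Gumbel-max reduction in general and is not the FlashSampling kernel, which runs fused inside the matmul on GPU. The function names (`gumbel_noise`, `full_argmax`, `tiled_argmax`) and the `tile_size` parameter are hypothetical. It checks that a running argmax over tiles picks the same index as an argmax over the fully materialized score vector when both see the same noise.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one Gumbel(0, 1) sample via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def full_argmax(logits, noise):
    """Reference path: materialize every perturbed score, then argmax."""
    scores = [l + g for l, g in zip(logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


def tiled_argmax(logits, noise, tile_size):
    """Tiled path: keep only a running (index, score) pair across tiles."""
    best_idx, best_score = -1, -math.inf
    for start in range(0, len(logits), tile_size):
        stop = min(start + tile_size, len(logits))
        # Local reduction inside one tile.
        local_idx = max(range(start, stop), key=lambda i: logits[i] + noise[i])
        local_score = logits[local_idx] + noise[local_idx]
        # Strict '>' keeps the earliest index on ties, matching full_argmax.
        if local_score > best_score:
            best_idx, best_score = local_idx, local_score
    return best_idx


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    noise = [gumbel_noise(rng) for _ in logits]
    for tile_size in (1, 32, 128, 1000):
        assert tiled_argmax(logits, noise, tile_size) == full_argmax(logits, noise)
    print("tiled and full argmax agree")
```

The tiled path stores only one candidate at a time. That is the property that lets a fused kernel avoid writing the whole score vector out, though the real memory hierarchy and reduction order are not modeled here.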

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-18T03:10:03.435468Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

28b2641bd867993ec08b1a50a67f20079be979139708f2d2798dec5064f538c1

Aliases

arxiv: 2603.15854 · arxiv_version: 2603.15854v2 · doi: 10.48550/arxiv.2603.15854 · pith_short_12: FCZGIG6YM6MT · pith_short_16: FCZGIG6YM6MT5QEL · pith_short_8: FCZGIG6Y
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FCZGIG6YM6MT5QELDJIKM7ZAA6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 28b2641bd867993ec08b1a50a67f20079be979139708f2d2798dec5064f538c1
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c0d3d84724b127fce2caceeb33a5adc2fc84b887648672cb76874fab169bb4a0",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-16T19:37:08Z",
    "title_canon_sha256": "e435c8655d84f9c3aa0afc24a8b8288f6ba2cdf4d13ca277893c96bab91d73a8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.15854",
    "kind": "arxiv",
    "version": 2
  }
}
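The same check can be run offline against the canonical record printed above. This is a sketch that mirrors the canonicalization in the curl one-liner under Agent API (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes). The names `RECORD`, `EXPECTED`, `canonical_bytes`, and `canonical_sha256` are hypothetical helpers, not part of any Pith API. Whether the digest matches the listed canonical hash has not been verified here; the script reports the comparison rather than assuming it.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "c0d3d84724b127fce2caceeb33a5adc2fc84b887648672cb76874fab169bb4a0",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-03-16T19:37:08Z",
        "title_canon_sha256": "e435c8655d84f9c3aa0afc24a8b8288f6ba2cdf4d13ca277893c96bab91d73a8",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.15854", "kind": "arxiv", "version": 2},
}

# Canonical hash listed on this page.
EXPECTED = "28b2641bd867993ec08b1a50a67f20079be979139708f2d2798dec5064f538c1"


def canonical_bytes(record):
    """Serialize with sorted keys and no whitespace, as in the one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode()  # UTF-8 by default


def canonical_sha256(record):
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches listed hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the key order of the input, which is what allows a mirror to recompute it from any equivalent copy of the record.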