Pith Number

pith:O4RUHQ6F

pith:2026:O4RUHQ6FO7NAIIIKB4HIKDISHV
not attested · not anchored · not stored · refs resolved

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

Fuyu Lv, Jialiang Cheng, Juntong Wu, Li Yuan, Ou Dan, Qishen Yin, Yue Dai, Yuliang Yan

Trainable binary masks let MoE models pick experts token-by-token, cutting expert-layer FLOPs by up to 85 percent while keeping more than 98 percent of the original model's accuracy.

arxiv:2605.14438 v1 · 2026-05-14 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{O4RUHQ6FO7NAIIIKB4HIKDISHV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

BEAM retains over 98% of the original model's performance while reducing MoE-layer FLOPs by up to 85%, achieving up to 2.5× faster decoding and 1.4× higher throughput; the authors present it as a practical plug-and-play solution.
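An 85% cut in MoE-layer FLOPs does not translate one-to-one into end-to-end speedup, because attention, embeddings and other non-expert work are untouched. The sketch below applies Amdahl's law to the headline numbers. The MoE share of decode time (`moe_share`) is a hypothetical parameter, and the model assumes MoE time scales linearly with MoE FLOPs, which ignores memory-bandwidth and kernel-launch effects. It is an illustration only, not how the paper's measured 2.5× figure was obtained.

```python
def amdahl_speedup(moe_share: float, moe_flop_reduction: float) -> float:
    """Idealized end-to-end speedup when only the MoE part of the runtime shrinks.

    moe_share: hypothetical fraction of decode time spent in MoE expert layers (0..1).
    moe_flop_reduction: fractional FLOP cut in those layers (0.85 for "85%").
    Assumes MoE time is proportional to MoE FLOPs, which memory-bound decoding may violate.
    """
    new_time = (1.0 - moe_share) + moe_share * (1.0 - moe_flop_reduction)
    return 1.0 / new_time


def required_moe_share(target_speedup: float, moe_flop_reduction: float) -> float:
    """Invert amdahl_speedup: MoE share of runtime needed to reach target_speedup."""
    # Solve 1 / ((1 - p) + p * (1 - r)) = s for p, giving p = (1 - 1/s) / r.
    return (1.0 - 1.0 / target_speedup) / moe_flop_reduction


if __name__ == "__main__":
    print(f"ceiling with an 85% MoE cut: {amdahl_speedup(1.0, 0.85):.2f}x")
    print(f"MoE share of decode time needed for 2.5x: {required_moe_share(2.5, 0.85):.2f}")
```

In this toy model, even a model spending all of its decode time in expert layers could gain at most about 6.7× from an 85% reduction, and a 2.5× speedup would require expert layers to account for roughly 71% of decode time.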

C2 · weakest assumption

That the binary masks learned during training generalize to inference without a significant mismatch between the two, and that the straight-through estimator, combined with the auxiliary loss, induces effective sparsity without degrading model capability.
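C2 hinges on the straight-through estimator (STE), the technique studied in reference [4]. The pure-Python sketch below shows one common STE variant for a binary mask: a hard threshold in the forward pass, and a backward pass that treats the threshold as the identity so gradients still reach the mask logits. The sigmoid parameterization, the 0.5 threshold, and the mean-activation auxiliary loss are assumptions for illustration; the paper's actual mask parameterization and auxiliary loss are not described on this page.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mask_forward(logits, threshold=0.5):
    """Forward pass: hard 0/1 expert mask; soft probabilities are kept for the backward pass."""
    probs = [sigmoid(z) for z in logits]
    hard = [1.0 if p >= threshold else 0.0 for p in probs]
    return hard, probs


def mask_backward(grad_hard, probs):
    """Straight-through backward pass.

    The step function has zero gradient almost everywhere, so the STE pretends
    d(hard)/d(prob) = 1 and only differentiates the sigmoid: d(prob)/d(logit) = p * (1 - p).
    """
    return [g * p * (1.0 - p) for g, p in zip(grad_hard, probs)]


def sparsity_loss_and_grad(logits, weight=0.01):
    """Hypothetical auxiliary loss: weight * mean(sigmoid(logits)), pushing masks toward 0."""
    probs = [sigmoid(z) for z in logits]
    n = len(probs)
    loss = weight * sum(probs) / n
    grad = [weight / n * p * (1.0 - p) for p in probs]
    return loss, grad


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.0]  # one token, three experts (toy values)
    hard, probs = mask_forward(logits)
    task_grad = mask_backward([0.3, -0.2, 0.1], probs)  # toy upstream gradients w.r.t. the mask
    _, aux_grad = sparsity_loss_and_grad(logits)
    lr = 0.5
    logits = [z - lr * (g + a) for z, g, a in zip(logits, task_grad, aux_grad)]
    print("mask:", hard, "updated logits:", [round(z, 3) for z in logits])
```

The design choice the STE makes is visible in `mask_backward`: the hard mask used at inference is the same one used in the training forward pass, while learning signal flows through the soft probabilities. Whether that surrogate gradient leads to masks that behave well at inference is exactly the assumption C2 flags.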

C3 · one-line summary

BEAM trains binary expert activation masks end-to-end to achieve dynamic sparsity in MoE models, cutting MoE-layer FLOPs by up to 85% while retaining over 98% of the original model's performance.
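To make "dynamic sparsity" concrete, the sketch below assumes that a per-token binary mask is applied on top of a standard top-k router: experts the mask switches off are skipped, and the surviving gating weights are renormalized. How BEAM actually combines its masks with the router, and what happens when a mask removes every candidate (here a hypothetical keep-the-best fallback), are assumptions rather than details given on this page.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, mask, top_k):
    """Top-k routing filtered by a per-token binary mask (hypothetical combination rule).

    Returns (expert_index, gate_weight) pairs; weights are renormalized over kept experts.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    candidates = ranked[:top_k]
    kept = [i for i in candidates if mask[i] == 1]
    if not kept:  # hypothetical fallback so every token still reaches one expert
        kept = candidates[:1]
    total = sum(probs[i] for i in kept)
    return [(i, probs[i] / total) for i in kept]


def expert_cost_fraction(routes, top_k):
    """Executed expert calls relative to dense top-k routing (equal-size experts assumed)."""
    executed = sum(len(r) for r in routes)
    return executed / (len(routes) * top_k)


if __name__ == "__main__":
    top_k = 2
    tokens = [
        ([2.0, 1.0, 0.5, -1.0], [1, 0, 1, 1]),  # mask drops expert 1 for this token
        ([0.1, 0.2, 3.0, 0.0], [1, 1, 1, 0]),   # both top-2 experts survive
    ]
    routes = [route_token(logits, mask, top_k) for logits, mask in tokens]
    for r in routes:
        print(r)
    print("expert cost vs dense top-k:", expert_cost_fraction(routes, top_k))
```

With `top_k = 2` and the two toy tokens above, three of the four possible expert calls run, so expert-layer cost is 75% of dense top-2 routing. Counting executed experts per token in this way is what links learned masks to MoE-layer FLOP savings, under the assumption that all experts have the same size.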

References

25 extracted · 25 resolved · 9 Pith anchors

[1] DA-MoE: Towards dynamic expert allocation for mixture-of-experts models
[2] ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding · arXiv:2604.14612
[3] Qwen Technical Report · arXiv:2309.16609
[4] Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation · arXiv:1308.3432
[5] BoolQ: Exploring the surprising difficulty of natural yes/no questions · 2019
Receipt and verification
First computed 2026-05-17T23:39:07.053456Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

772343c3c577da04210a0f0e850d123d5ab13c3d5508fac0d34dbf7b580dff8e

Aliases

arxiv: 2605.14438 · arxiv_version: 2605.14438v1 · doi: 10.48550/arxiv.2605.14438 · pith_short_12: O4RUHQ6FO7NA · pith_short_16: O4RUHQ6FO7NAIIIK · pith_short_8: O4RUHQ6F
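The identifiers above are mutually consistent in a way that can be checked offline. The relationship encoded below was observed from this record rather than taken from a published specification: the Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the canonical hash's raw bytes (26 characters carry 130 of the hash's 256 bits), and each `pith_short_N` alias is the first N characters of the Pith Number. Both functions are hypothetical helpers, not the builder's API.

```python
import base64

CANONICAL_HASH = "772343c3c577da04210a0f0e850d123d5ab13c3d5508fac0d34dbf7b580dff8e"
PITH_NUMBER = "O4RUHQ6FO7NAIIIKB4HIKDISHV"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Observed relationship: base32 (A-Z, 2-7) of the SHA-256 bytes, truncated to 26 chars."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """pith_short_N aliases are plain prefixes of the full Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    print(pith_number_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    print(short_aliases(PITH_NUMBER))
```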
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 772343c3c577da04210a0f0e850d123d5ab13c3d5508fac0d34dbf7b580dff8e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f55f67de57dd4f29b5d41dca7d04ce921475bdd72c6d0553c4a886129a031cd6",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-14T06:33:41Z",
    "title_canon_sha256": "2619a11dffbed1535207813ea1019ce681a99d86d6f9dca78f18cba0e2efe99a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14438",
    "kind": "arxiv",
    "version": 1
  }
}
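The verification one-liner canonicalizes the record before hashing. The stdlib-only sketch below applies the same `json.dumps` settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to the record exactly as printed above, so the check can run without network access. Whether the printed record is byte-for-byte identical to the served `canonical_record` is an assumption; compare the printed digest against the canonical hash rather than taking the match for granted.

```python
import hashlib
import json

EXPECTED = "772343c3c577da04210a0f0e850d123d5ab13c3d5508fac0d34dbf7b580dff8e"

# Canonical record copied from the JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "f55f67de57dd4f29b5d41dca7d04ce921475bdd72c6d0553c4a886129a031cd6",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-14T06:33:41Z",
        "title_canon_sha256": "2619a11dffbed1535207813ea1019ce681a99d86d6f9dca78f18cba0e2efe99a",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14438", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record: dict) -> bytes:
    """Same settings as the python3 -c step: sorted keys, no whitespace, raw UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches canonical hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, two mirrors that store the same record with different key order still compute the same digest.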