{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:O4RUHQ6FO7NAIIIKB4HIKDISHV","short_pith_number":"pith:O4RUHQ6F","schema_version":"1.0","canonical_sha256":"772343c3c577da04210a0f0e850d123d5ab13c3d5508fac0d34dbf7b580dff8e","source":{"kind":"arxiv","id":"2605.14438","version":1},"attestation_state":"computed","paper":{"title":"BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"Trainable binary masks let MoE models pick experts token-by-token, cutting expert-layer FLOPs by up to 85 percent while keeping more than 98 percent of original accuracy.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Fuyu Lv, Jialiang Cheng, Juntong Wu, Li Yuan, Ou Dan, Qishen Yin, Yue Dai, Yuliang Yan","submitted_at":"2026-05-14T06:33:41Z","abstract_excerpt":"Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and suboptimal inference latency. Existing acceleration methods either require costly retraining with architectural changes or suffer from severe performance drop at high sparsity due to train-inference mismatch. To address these limitations, we propose BEAM (Binary Expert Activation Masking), a novel method that learns token-adaptive expert selection via trainable "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.14438","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","primary_cat":"cs.AI","submitted_at":"2026-05-14T06:33:41Z","cross_cats_sorted":[],"title_canon_sha256":"2619a11dffbed1535207813ea1019ce681a99d86d6f9dca78f18cba0e2efe99a","abstract_canon_sha256":"f55f67de57dd4f29b5d41dca7d04ce921475bdd72c6d0553c4a886129a031cd6"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:07.054058Z","signature_b64":"IPjC3GYDblAX+LHkYEaKrUAguzIwQ7T6+voWOmTalPS9sBgYuU8VoIuIOjeLKVXc5Sfae9+J1pBfAoEtzE4BDw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"772343c3c577da04210a0f0e850d123d5ab13c3d5508fac0d34dbf7b580dff8e","last_reissued_at":"2026-05-17T23:39:07.053456Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:07.053456Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"Trainable binary masks let MoE models pick experts token-by-token, cutting expert-layer FLOPs by up to 85 percent while keeping more than 98 percent of original accuracy.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Fuyu Lv, Jialiang Cheng, Juntong Wu, Li Yuan, Ou Dan, Qishen Yin, Yue Dai, Yuliang Yan","submitted_at":"2026-05-14T06:33:41Z","abstract_excerpt":"Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and suboptimal inference latency. Existing acceleration methods either require costly retraining with architectural changes or suffer from severe performance drop at high sparsity due to train-inference mismatch. To address these limitations, we propose BEAM (Binary Expert Activation Masking), a novel method that learns token-adaptive expert selection via trainable "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"BEAM retains over 98% of the original model's performance while reducing MoE layer FLOPs by up to 85%, achieving up to 2.5× faster decoding and 1.4× higher throughput, as a practical plug-and-play solution.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the binary masks learned during training will generalize well to inference without significant mismatch, and that the straight-through estimator combined with the auxiliary loss can induce effective sparsity without degrading model capability.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Trainable binary masks let MoE models pick experts token-by-token, cutting expert-layer FLOPs by up to 85 percent while keeping more than 98 percent of original accuracy.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"025546b93eb4c03000c0af869dcf3ecdfd4721fdccbe18ba8d103656f1a5cacd"},"source":{"id":"2605.14438","kind":"arxiv","version":1},"verdict":{"id":"8f6cfee4-f40e-4595-8e61-c6f61280a7d1","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T01:55:47.778168Z","strongest_claim":"BEAM retains over 98% of the original model's performance while reducing MoE layer FLOPs by up to 85%, achieving up to 2.5× faster decoding and 1.4× higher throughput, as a practical plug-and-play solution.","one_line_summary":"BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the binary masks learned during training will generalize well to inference without significant mismatch, and that the straight-through estimator combined with the auxiliary loss can induce effective sparsity without degrading model capability.","pith_extraction_headline":"Trainable binary masks let MoE models pick experts token-by-token, cutting expert-layer FLOPs by up to 85 percent while keeping more than 98 percent of original accuracy."},"references":{"count":25,"sample":[{"doi":"","year":null,"title":"Da-moe: Towards dy- namic expert allocation for mixture-of-experts models","work_id":"5fd4b7b5-86f8-48ca-b804-d3a3ae6abd11","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding","work_id":"a1255ddd-2b2e-4f7b-bbb5-ea72ee18bfcf","ref_index":2,"cited_arxiv_id":"2604.14612","is_internal_anchor":true},{"doi":"","year":null,"title":"Qwen Technical Report","work_id":"bb1fd52f-6b2f-437c-9516-37bdf6eb9be8","ref_index":3,"cited_arxiv_id":"2309.16609","is_internal_anchor":true},{"doi":"","year":null,"title":"Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation","work_id":"1fe8c7c8-aff7-4b94-9096-e549d7e60789","ref_index":4,"cited_arxiv_id":"1308.3432","is_internal_anchor":true},{"doi":"","year":2019,"title":"BoolQ: Exploring the surprising difficulty of natural yes/no questions","work_id":"8d3d9bd8-a118-422a-a695-404ed9e21211","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":25,"snapshot_sha256":"63db9dbba5e37299dd7bf8bfe014982d21c651820c9be4abc6a0239f9c9d37ad","internal_anchors":9},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.14438","created_at":"2026-05-17T23:39:07.053568+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.14438v1","created_at":"2026-05-17T23:39:07.053568+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.14438","created_at":"2026-05-17T23:39:07.053568+00:00"},{"alias_kind":"pith_short_12","alias_value":"O4RUHQ6FO7NA","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"O4RUHQ6FO7NAIIIK","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"O4RUHQ6F","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV","json":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV.json","graph_json":"https://pith.science/api/pith-number/O4RUHQ6FO7NAIIIKB4HIKDISHV/graph.json","events_json":"https://pith.science/api/pith-number/O4RUHQ6FO7NAIIIKB4HIKDISHV/events.json","paper":"https://pith.science/paper/O4RUHQ6F"},"agent_actions":{"view_html":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV","download_json":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV.json","view_paper":"https://pith.science/paper/O4RUHQ6F","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.14438&json=true","fetch_graph":"https://pith.science/api/pith-number/O4RUHQ6FO7NAIIIKB4HIKDISHV/graph.json","fetch_events":"https://pith.science/api/pith-number/O4RUHQ6FO7NAIIIKB4HIKDISHV/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV/action/timestamp_anchor","attest_storage":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV/action/storage_attestation","attest_author":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV/action/author_attestation","sign_citation":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV/action/citation_signature","submit_replication":"https://pith.science/pith/O4RUHQ6FO7NAIIIKB4HIKDISHV/action/replication_record"}},"created_at":"2026-05-17T23:39:07.053568+00:00","updated_at":"2026-05-17T23:39:07.053568+00:00"}