{"paper":{"title":"L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Projecting mixture-of-experts routing to a low-rank latent space with saturated inner-product scoring yields smoother geometry and stronger expert specialization.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Guang Li, Miki Haseyama, Minghao Yang, Ren Togo, Takahiro Ogawa","submitted_at":"2026-01-29T07:18:33Z","abstract_excerpt":"Mixture-of-Experts (MoE) models scale neural networks by conditionally activating a small subset of experts, where the router plays a central role in determining expert specialization and overall model performance. However, many modern MoE systems still adopt linear routers in raw high-dimensional representation spaces, where representation mismatch, angular concentration, and scale-sensitive scoring can jointly undermine routing discriminability and stable expert specialization. In this work, we propose Low-rank & Lipschitz-controlled Routing (L2R), a unified routing framework that reshapes b"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"L2R consistently improves routing geometry, expert discrimination, and overall model performance on OLMoE-based language models and ImageNet vision MoE settings.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That projecting to a low-rank latent space plus saturated inner-product scoring preserves sufficient expressiveness while only improving stability, without hidden costs in specialization or generalization.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"L2R improves MoE performance by routing in a low-rank space with Lipschitz-controlled saturated inner-product scoring and multi-anchor mechanisms.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Projecting mixture-of-experts routing to a low-rank latent space with saturated inner-product scoring yields smoother geometry and stronger expert specialization.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"dfbca722c023f096f5c40604e7888af8da156f7471bd404aff2ce5a4649e8226"},"source":{"id":"2601.21349","kind":"arxiv","version":2},"verdict":{"id":"1319a854-684a-495e-bd0a-4cc1ffca5d3a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T09:53:56.842045Z","strongest_claim":"L2R consistently improves routing geometry, expert discrimination, and overall model performance on OLMoE-based language models and ImageNet vision MoE settings.","one_line_summary":"L2R improves MoE performance by routing in a low-rank space with Lipschitz-controlled saturated inner-product scoring and multi-anchor mechanisms.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That projecting to a low-rank latent space plus saturated inner-product scoring preserves sufficient expressiveness while only improving stability, without hidden costs in specialization or generalization.","pith_extraction_headline":"Projecting mixture-of-experts routing to a low-rank latent space with saturated inner-product scoring yields smoother geometry and stronger expert specialization."},"references":{"count":19,"sample":[{"doi":"","year":1905,"title":"BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions","work_id":"511eeb84-4b95-46d5-b14f-50da43f4f19f","ref_index":1,"cited_arxiv_id":"1905.10044","is_internal_anchor":true},{"doi":"","year":null,"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","ref_index":2,"cited_arxiv_id":"1803.05457","is_internal_anchor":true},{"doi":"","year":null,"title":"DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models","work_id":"a9888d6d-bf47-4324-9834-7cc12ac3a78c","ref_index":3,"cited_arxiv_id":"2401.06066","is_internal_anchor":true},{"doi":"","year":null,"title":"Toy Models of Superposition","work_id":"43875dbe-bc2d-4ab5-af63-744411533ff7","ref_index":4,"cited_arxiv_id":"2209.10652","is_internal_anchor":true},{"doi":"","year":null,"title":"Mixtral of Experts","work_id":"0de8c352-9daa-4e1e-8c7b-3d0dec69f369","ref_index":5,"cited_arxiv_id":"2401.04088","is_internal_anchor":true}],"resolved_work":19,"snapshot_sha256":"bb7ec696bafdc431a09b60052fffd05a9eaa8473f6cb9dbef9535d979314ecc7","internal_anchors":10},"formal_canon":{"evidence_count":2,"snapshot_sha256":"a30533bc8974317261516c655c679397f50fec86035a63bb5d5fc902dd26b687"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}