{"work":{"id":"267500ca-1512-478f-8a1b-6ecbdb09771d","openalex_id":null,"doi":null,"arxiv_id":"2408.15664","raw_key":null,"title":"Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts","authors":null,"authors_text":"Lean Wang, Huazuo Gao, Chenggang Zhao, Xu Sun, Damai Dai","year":2024,"venue":"cs.LG","abstract":"For Mixture-of-Experts (MoE) models, an unbalanced expert load will lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss will introduce non-negligible interference gradients into training and thus impair the model performance. In order to control load balance while not producing undesired gradients during training, we propose Loss-Free Balancing, featured by an auxiliary-loss-free load balancing strategy. To be specific, before the top-K routing decision, Loss-Free Balancing will first apply an expert-wise bias to the routing scores of each expert. By dynamically updating the bias of each expert according to its recent load, Loss-Free Balancing can consistently maintain a balanced distribution of expert load. In addition, since Loss-Free Balancing does not produce any interference gradients, it also elevates the upper bound of model performance gained from MoE training. We validate the performance of Loss-Free Balancing on MoE models with up to 3B parameters trained on up to 200B tokens. Experimental results show that Loss-Free Balancing achieves both better performance and better load balance compared with traditional auxiliary-loss-controlled load balancing strategies.","external_url":"https://arxiv.org/abs/2408.15664","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-21T15:40:18.978147+00:00","pith_arxiv_id":"2408.15664","created_at":"2026-05-08T18:28:57.535678+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts","render_title":"Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts"},"hub":{"state":{"work_id":"267500ca-1512-478f-8a1b-6ecbdb09771d","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":34,"external_cited_by_count":null,"distinct_field_count":7,"first_pith_cited_at":"2024-12-13T17:37:48+00:00","last_pith_cited_at":"2026-05-19T15:01:39+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-09T14:34:56.964850+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":6},{"context_role":"method","n":2},{"context_role":"baseline","n":1},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":6},{"context_polarity":"use_method","n":2},{"context_polarity":"baseline","n":1},{"context_polarity":"unclear","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}