
arXiv: 2605.29121 · v1 · pith:TIB4MVMRnew · submitted 2026-05-27 · math.DS · cs.AI · cs.LG

A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router

keywords: model, bifurcation, load, small, adaptive, cusp, expert, feedback

We propose a minimal dynamical model of adaptive softmax routing for a two-expert Mixture-of-Experts (MoE) layer. The model is obtained as a mean-field limit of a discrete reinforcement rule: the selected expert receives a small score increment, while all scores undergo regularizing decay. In the symmetric case the limiting system has a supercritical pitchfork bifurcation: for weak feedback there is a unique stable balanced state, whereas above a critical feedback strength two stable asymmetric states appear. When an external asymmetry is added, the pitchfork unfolds into a pair of fold bifurcations forming a cusp in the control-parameter plane. We derive exact parametric equations for the bifurcation set and the local normal form of the cusp catastrophe. Numerical experiments connect this picture to empirical expert load, a small trainable MoE model, hard top-1 PyTorch routing, and a small classification experiment on digits. The results provide a controlled low-dimensional mechanism for abrupt transitions to load imbalance in adaptive MoE routers.
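The mechanism described in the abstract can be illustrated with a small numerical sketch. The abstract does not state the paper's equations, so everything below is an assumed form rather than the authors' model: each score is taken to follow ds_i/dt = beta * p_i - lam * s_i with p = softmax(s), which for two experts reduces to a single equation for the score difference x = s1 - s2, using p1 - p2 = tanh(x/2). The names `beta` (feedback strength), `lam` (decay rate), `h` (external asymmetry added to the drift), and all function names are hypothetical.

```python
import math


def drift(x, beta, lam, h=0.0):
    """Assumed mean-field drift of the score difference x = s1 - s2.

    For a two-way softmax, p1 - p2 = tanh(x / 2), so reinforcement
    contributes beta * tanh(x / 2) and decay contributes -lam * x.
    h is a hypothetical constant bias favouring expert 1.
    """
    return beta * math.tanh(x / 2.0) - lam * x + h


def integrate(x0, beta, lam, h=0.0, dt=0.01, steps=20000):
    """Forward-Euler integration of dx/dt = drift(x); returns the final x."""
    x = x0
    for _ in range(steps):
        x += dt * drift(x, beta, lam, h)
    return x


def load_fraction(x):
    """Softmax probability of routing to expert 1 given the difference x."""
    return 1.0 / (1.0 + math.exp(-x))


def fold_curve(x, lam):
    """Point (beta, h) on the fold set of the assumed model.

    Obtained by solving drift = 0 and d(drift)/dx = 0 simultaneously:
    beta = 2 * lam * cosh(x / 2)**2 and h = lam * x - beta * tanh(x / 2).
    """
    beta = 2.0 * lam * math.cosh(x / 2.0) ** 2
    h = lam * x - beta * math.tanh(x / 2.0)
    return beta, h


if __name__ == "__main__":
    lam = 1.0
    for beta in (1.0, 3.0):
        x_plus = integrate(0.1, beta, lam)
        x_minus = integrate(-0.1, beta, lam)
        print(beta, load_fraction(x_plus), load_fraction(x_minus))
```

Under this assumed form, the balanced state x = 0 loses stability when beta exceeds 2 * lam, since the linearised drift is (beta / 2 - lam) * x. With `lam = 1.0`, a run at `beta = 1.0` returns to equal load from either initial perturbation, while `beta = 3.0` settles on one of two mirror-image imbalanced states depending on the sign of the initial difference. `fold_curve(0.0, lam)` gives the cusp point (beta, h) = (2 * lam, 0) of this sketch; the paper's own parametric equations may differ.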
