Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. JMLR, 2022
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Sparse Mixture-of-Experts Routing in Visual Diffusion Transformers: Diagnosis, Boundary Calibration and Evolutionary Roadmap from Routing Collapse to Selective Deadlock
Diagnoses five failure modes in Token-Choice MoE routing for visual diffusion transformers and proposes the Functional Redundancy Hypothesis to explain selective deadlock.