MoE experts in pretrained Transformers exhibit functional decorrelation with near-zero Jacobian alignment yet occupy partially overlapping representation subspaces, with routing sparsity modulating the geometry.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Geometric Asymmetry in MoE Specialization: Functional Decorrelation and Representational Overlap
MoE experts in pretrained Transformers exhibit functional decorrelation with near-zero Jacobian alignment yet occupy partially overlapping representation subspaces, with routing sparsity modulating the geometry.