Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry
While Mixture-of-Experts (MoE) architectures define the state-of-the-art, their theoretical success is often attributed to heuristic efficiency rather than geometric expressivity. In this work, we present the first analysis of MoE through the lens of tropical geometry, establishing that the Top-$k$ routing mechanism is algebraically isomorphic to the $k$-th elementary symmetric tropical polynomial. This isomorphism partitions the input space into the Normal Fan of a Hypersimplex, revealing that \textbf{sparsity is combinatorial depth} which scales geometric capacity by the binomial coefficient $\binom{N}{k}$. Moving beyond ambient bounds, we introduce the concept of \textit{Effective Capacity} under the Manifold Hypothesis. We prove that while dense networks suffer from capacity collapse on low-dimensional data, MoE architectures exhibit \textit{Combinatorial Resilience}, maintaining high expressivity via the transversality of routing cones. Translating these theoretical bounds into architectural principles, we derive asymptotic capacity limits for optimal expert granularity and prove that shared experts are geometrically necessary to prevent routing collapse.
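The abstract's central identity — that Top-$k$ routing computes the $k$-th elementary symmetric tropical polynomial, i.e. the max-plus maximum over all $\binom{N}{k}$ expert subsets — can be checked numerically. The brute-force sketch below (an illustration, not the paper's implementation) verifies that enumerating all $k$-subsets of gate logits and taking the best subset-sum agrees with simply summing the $k$ largest logits, which is what Top-$k$ routing does:

```python
from itertools import combinations

def tropical_elem_sym(logits, k):
    """k-th elementary symmetric tropical polynomial (max-plus semiring):
    max over all k-subsets S of sum_{i in S} logits[i].
    Enumerates all binom(N, k) subsets -- one per routing cone."""
    return max(sum(subset) for subset in combinations(logits, k))

def top_k_routing_score(logits, k):
    """Sum of the k largest gate logits, as selected by Top-k routing."""
    return sum(sorted(logits, reverse=True)[:k])

# Hypothetical gate logits for N = 4 experts, k = 2 active.
logits = [2.0, -1.0, 0.5, 3.0]
print(tropical_elem_sym(logits, 2))   # 5.0 (experts with logits 3.0 and 2.0)
print(top_k_routing_score(logits, 2)) # 5.0 -- the two expressions coincide
```

The equality holds for every input because the maximizing $k$-subset is always the set of the $k$ largest coordinates; the $\binom{N}{k}$ candidate subsets correspond to the maximal cones of the hypersimplex normal fan described in the abstract.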
Forward citations
Cited by 1 Pith paper
- Expressivity of Transformers: A Tropical Geometry Perspective — Self-attention in transformers corresponds exactly to power Voronoi diagrams under tropical geometry, yielding tight bounds of $\Theta(N^{d_{\text{model}} \cdot L})$ linear regions.