CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition

Quang Pham, Giang Do, Huy Nguyen, TrungTin Nguyen, Chenghao Liu, Mina Sartipi, Binh T. Nguyen, Savitha Ramasamy, Xiaoli Li, Steven Hoi, et al. · 2024 · arXiv 2402.02526

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Convergence Rates for Latent Mixing Measures in Infinite Homoscedastic Location-Scale Mixture Models

math.ST · 2026-05-07 · unverdicted · novelty 8.0

The paper provides novel lower bounds connecting L1 distances between mixture densities to discrepancies between mixing measures, leading to the first contraction rates for Dirichlet process mixtures with unknown scale.

Tight Clusters Make Specialized Experts

cs.LG · 2025-02-21 · unverdicted · novelty 6.0

Introduces an Adaptive Clustering router for MoE models that scales features to identify tight expert clusters, yielding faster convergence, robustness to corruption, and performance gains.

citing papers explorer

Showing 2 of 2 citing papers.

Convergence Rates for Latent Mixing Measures in Infinite Homoscedastic Location-Scale Mixture Models
math.ST · 2026-05-07 · unverdicted · none · ref 95
Tight Clusters Make Specialized Experts
cs.LG · 2025-02-21 · unverdicted · none · ref 35
