MoETuner: Optimized mixture of expert serving with balanced expert placement and token routing

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

fields

cs.DC: 8, cs.LG: 3

years

2026: 8, 2025: 3

roles

dataset 1

polarities

use dataset 1

representative citing papers

Hierarchical Mixture-of-Experts with Two-Stage Optimization

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Hi-MoE uses two-level hierarchical routing objectives to enforce group-level balance while promoting within-group specialization, yielding better perplexity and expert utilization than prior MoE baselines in NLP and vision tasks.
