MoETuner: Optimized mixture of expert serving with balanced expert placement and token routing

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: dataset (1)

citation-polarity summary

years: 2026 (12), 2025 (3)

roles: dataset (1)

polarities: use dataset (1)

representative citing papers

Hierarchical Mixture-of-Experts with Two-Stage Optimization

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Hi-MoE uses two-level hierarchical routing objectives to enforce group-level balance while promoting within-group specialization, yielding better perplexity and expert utilization than prior MoE baselines in NLP and vision tasks.
