Comprehensive profiling of expert selection in frontier MoE models reveals temporal and spatial patterns that enable 6.6x speedup on wafer-scale GPUs and 1.25x on existing systems via targeted optimizations.
Tutel: Adaptive mixture-of-experts at scale
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
representative citing papers
A three-stage ViT with sparsity-aware MoE and adaptive inference depth delivers improved accuracy-efficiency trade-off for event-stream visual tracking on FE240hz, COESOT, and EventVOT benchmarks.
WILD-SAM is a fine-tuned SAM variant using phase-aware MoE adapters and wavelet subband enhancement that achieves state-of-the-art landslide detection on wrapped InSAR data.
citing papers explorer
-
Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference
Comprehensive profiling of expert selection in frontier MoE models reveals temporal and spatial patterns that enable 6.6x speedup on wafer-scale GPUs and 1.25x on existing systems via targeted optimizations.