Krishna Teja Chitty-Venkata, Sandeep Madireddy, Murali Emani, and Venkatram Vishwanath

2025 · arXiv 2509.02753

Two Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

Alloc-MoE allocates a fixed expert activation budget using layer-level dynamic programming based on sensitivity and token-level score-based redistribution, delivering 1.15x prefill and 1.34x decode speedups on DeepSeek-V2-Lite at half the original budget while preserving performance.

Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

cs.DC · 2025-10-07 · conditional · novelty 6.0

Comprehensive profiling of expert selection in frontier MoE models reveals temporal and spatial patterns that enable 6.6x speedup on wafer-scale GPUs and 1.25x on existing systems via targeted optimizations.

citing papers explorer

Showing 2 of 2 citing papers.

Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference · cs.LG · 2026-04-09 · unverdicted · none · ref 1
Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference · cs.DC · 2025-10-07 · conditional · none · ref 7
