MoE lens – an expert is all you need

Marmik Chaudhari, Idhant Gulati, Nishkal Hundia, Pranav Karra, Shivam Raval · 2026 · arXiv 2603.05806

4 Pith papers cite this work.

Citing papers

Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · ref 18

Expert specialization in vision MoE models is dominated by a stable animate-inanimate distinction, visible from gating through readout, with broader tuning to continuous visual and semantic dimensions rather than narrow categorical preferences.

Equifinality in Mixture of Experts: Routing Topology Does Not Determine Language Modeling Quality

cs.AI · 2026-04-15 · conditional · novelty 7.0 · ref 23

Routing topology in sparse Mixture-of-Experts models does not determine asymptotic language modeling perplexity; multiple routing variants, including cosine-similarity routing, achieve statistically equivalent performance.

Routing Sensitivity Without Controllability: A Diagnostic Study of Fairness in MoE Language Models

cs.CL · 2026-03-28 · unverdicted · novelty 7.0 · ref 2

Routing sensitivity in MoE models is necessary but insufficient for stereotype control, because bias and knowledge remain entangled within expert groups and preference shifts do not transfer to generated text.

Post-Trained MoE Can Skip Half Experts via Self-Distillation

cs.LG · 2026-05-18 · unverdicted · novelty 6.0 · ref 7

ZEDA injects zero-output experts and uses two-stage self-distillation to adapt post-trained MoE models into dynamic ones that skip over half the experts, yielding a 1.2x inference speedup with small accuracy drops.