Unveiling super experts in mixture-of-experts large language models

Zunhai Su, Qingyuan Li, Hao Zhang, Weihao Ye, Qibo Xue, YuLei Qian, Yuchen Xie, Ngai Wong, Kehong Yuan · 2025 · arXiv 2507.23279

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

representative citing papers

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

OScaR mitigates token norm imbalance via canalized rotation and omni-token scaling to enable near-lossless INT2 KV cache quantization with up to 3x decoding speedup and 5.3x memory reduction.

Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs

cs.AR · 2026-05-07 · unverdicted · novelty 6.0

DySHARP accelerates MoE expert parallelism via dynamic multimem addressing and token-centric kernel fusion to cut redundant traffic and deliver up to 1.79x speedup over prior in-switch solutions.

SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining

cs.LG · 2026-02-11 · conditional · novelty 5.0

SnapMLA achieves up to 1.91x higher throughput in long-output MLA decoding using FP8 quantization and specialized kernels while keeping benchmark quality near the BF16 baseline.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

Showing 5 of 5 citing papers.

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning cs.LG · 2026-04-24 · unverdicted · none · ref 27
A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond cs.LG · 2026-05-19 · unverdicted · none · ref 46
OScaR mitigates token norm imbalance via canalized rotation and omni-token scaling to enable near-lossless INT2 KV cache quantization with up to 3x decoding speedup and 5.3x memory reduction.
Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs cs.AR · 2026-05-07 · unverdicted · none · ref 47
DySHARP accelerates MoE expert parallelism via dynamic multimem addressing and token-centric kernel fusion to cut redundant traffic and deliver up to 1.79x speedup over prior in-switch solutions.
SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining cs.LG · 2026-02-11 · conditional · none · ref 33
SnapMLA achieves up to 1.91x higher throughput in long-output MLA decoding using FP8 quantization and specialized kernels while keeping benchmark quality near the BF16 baseline.
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models cs.CL · 2026-01-20 · unverdicted · none · ref 288
The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

Unveiling super experts in mixture-of-experts large language models

fields

years

verdicts

representative citing papers

citing papers explorer