Cluster-driven expert pruning for mixture-of-experts large language models.arXiv preprint arXiv:2504.07807, 2025

Hongcheng Guo, Juntao Yao, Boyang Wang, Junjia Du, Shaosheng Cao, Donglin Di, Shun Zhang, Zhoujun Li · 2025 · arXiv 2504.07807

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

read on arXiv browse 1 citing papers

representative citing papers

dMoE: dLLMs with Learnable Block Experts

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

dMoE aggregates token expert distributions to block level in dLLMs, cutting unique experts from 69.5 to 14.6, memory by 76-80%, and latency by 1.14-1.66x while retaining 99.11% performance.

citing papers explorer

Showing 1 of 1 citing paper after filters.

dMoE: dLLMs with Learnable Block Experts cs.CL · 2026-05-29 · unverdicted · none · ref 20
dMoE aggregates token expert distributions to block level in dLLMs, cutting unique experts from 69.5 to 14.6, memory by 76-80%, and latency by 1.14-1.66x while retaining 99.11% performance.

Cluster-driven expert pruning for mixture-of-experts large language models.arXiv preprint arXiv:2504.07807, 2025

fields

years

verdicts

representative citing papers

citing papers explorer