Dynamic expert sharing: Decoupling memory from parallelism in mixture-of-experts diffusion LLMs. arXiv preprint arXiv:2602.00879, 2026

Hao Mark Chen, Zhiwen Mo, Royson Lee, Qianzhou Wang, Da Li, Shell Xu Hu, Wayne Luk, Timothy Hospedales, Hongxiang Fan · 2026 · arXiv 2602.00879

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

dMoE: dLLMs with Learnable Block Experts

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

dMoE aggregates token expert distributions to block level in dLLMs, cutting unique experts from 69.5 to 14.6, memory by 76-80%, and latency by 1.14-1.66x while retaining 99.11% performance.

citing papers explorer

Showing 1 of 1 citing paper.

dMoE: dLLMs with Learnable Block Experts · cs.CL · 2026-05-29 · unverdicted · none · ref 33
