
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

7 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background: 1, method: 1


fields

cs.LG: 6, cs.RO: 1

years

2026: 6, 2024: 1

polarities

background: 2

representative citing papers

Path-Constrained Mixture-of-Experts

cs.LG · 2026-03-18 · unverdicted · novelty 7.0

PathMoE constrains expert paths in MoE models by sharing router parameters across blocks of layers, yielding more concentrated expert paths, better perplexity and downstream-task performance, and no need for auxiliary losses.

CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

CP-MoE uses a transient expert, a consistency-preserving routing bias, and guided regularization to reduce catastrophic forgetting in MoE-based LLMs and VLMs while preserving cross-task transfer, reporting state-of-the-art results on SuperNI and gains on VQA v2.
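The PathMoE summary describes routers whose parameters are shared across blocks of layers. The sketch below illustrates only that sharing idea, under stated assumptions: blocks of `block_size` consecutive layers, a plain linear router per block, and top-1 expert selection. The names `make_router` and `BlockSharedRouters` are hypothetical, and this is not PathMoE's actual implementation.

```python
import random

def make_router(d_model, n_experts, rng):
    # Hypothetical linear router: a d_model x n_experts weight matrix stored as rows.
    return [[rng.gauss(0.0, 0.02) for _ in range(n_experts)] for _ in range(d_model)]

class BlockSharedRouters:
    """Assumed scheme: all layers in the same block of `block_size`
    consecutive layers reuse one router (shared parameters)."""

    def __init__(self, n_layers, block_size, d_model, n_experts, seed=0):
        rng = random.Random(seed)
        n_blocks = (n_layers + block_size - 1) // block_size  # ceiling division
        self.block_size = block_size
        self.routers = [make_router(d_model, n_experts, rng) for _ in range(n_blocks)]

    def router_for(self, layer):
        # Map a layer index to the router of the block it belongs to.
        return self.routers[layer // self.block_size]

    def route(self, layer, x):
        # Compute router logits x @ W and pick the top-1 expert index.
        w = self.router_for(layer)
        n_experts = len(w[0])
        logits = [sum(x[i] * w[i][e] for i in range(len(x))) for e in range(n_experts)]
        return max(range(n_experts), key=logits.__getitem__)

routers = BlockSharedRouters(n_layers=8, block_size=4, d_model=3, n_experts=4)
x = [0.5, -1.0, 2.0]
print(routers.route(1, x) == routers.route(2, x))  # layers 1 and 2 share a router
```

Because layers 1 and 2 fall in the same block, an identical input vector is routed to the same expert in both. In a real model the hidden state changes from layer to layer, so sharing a router biases paths toward consistency rather than forcing identical choices; how PathMoE defines blocks and routing is not specified here.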
