Our conclusion regarding a consistent optimal activation rate contradicts the findings of Abnar et al

find that the optimal sparsity increases with model size · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource

cs.CL · 2025-06-13 · conditional · novelty 6.0

MoE models with activation rates in an optimal region outperform dense LLMs of identical total parameter count, training compute, and data budget, with the optimal region consistent across scales.

citing papers explorer

Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource

cs.CL · 2025-06-13 · conditional · none · ref 51