Case-based or rule-based: How do transformers do the math? In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27,

Yi Hu, Xiaojuan Tang, Haotong Yang, Muhan Zhang · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource

cs.CL · 2025-06-13 · conditional · novelty 6.0

MoE models with activation rates in an optimal region outperform dense LLMs of identical total parameter count, training compute, and data budget, with the optimal region consistent across scales.

citing papers explorer

Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource

cs.CL · 2025-06-13 · conditional · none · ref 16

MoE models with activation rates in an optimal region outperform dense LLMs of identical total parameter count, training compute, and data budget, with the optimal region consistent across scales.
