Dai and Simon Tong and Dmitry Lepikhin and Yuanzhong Xu and Maxim Krikun and Yanqi Zhou and Adams Wei Yu and Orhan Firat and Barret Zoph and Liam Fedus and Maarten P

Nan Du, Yanping Huang, Andrew M · 2022

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

cs.CL · 2024-01-11 · unverdicted · novelty 5.0

DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.

citing papers explorer

Showing 1 of 1 citing paper.

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models cs.CL · 2024-01-11 · unverdicted · none · ref 92
DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.

Dai and Simon Tong and Dmitry Lepikhin and Yuanzhong Xu and Maxim Krikun and Yanqi Zhou and Adams Wei Yu and Orhan Firat and Barret Zoph and Liam Fedus and Maarten P

fields

years

verdicts

representative citing papers

citing papers explorer