Pith Number
pith:4F4CSDS3
pith:2024:4F4CSDS3JUYT4CXTN6WB3MYQAQ
not attested
not anchored
not stored
refs resolved
OLMoE: Open Mixture-of-Experts Language Models
OLMoE shows that a sparse MoE model with 7B total parameters and 1B active parameters per token can outperform larger dense models such as Llama2-13B.
arxiv:2409.02060 v2 · 2024-09-03 · cs.CL · cs.AI · cs.LG
Record completeness
1. Bitcoin timestamp
2. Internet Archive
3. Author claim
4. Citations
5. Replications
Claims
C1: strongest claim
Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B.
C2: weakest assumption
That benchmark comparisons are fair across models trained under different data regimes, token counts, and optimization details, with no post-hoc selection affecting the reported gains.
C3: one-line summary
OLMoE-1B-7B is an open MoE language model activating 1B parameters per token that outperforms models with similar active parameters after pretraining on 5T tokens.
Receipt and verification
| First computed | 2026-05-17T23:38:47.556662Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
e178290e5b4d313e0af36fac1db310041b0ab0879963349fe3d04da5142c5cfd
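For this record, the identifier at the top of the page appears to be derived from the canonical hash: Base32-encoding the first 16 bytes of the digest (standard RFC 4648 alphabet, padding stripped) reproduces `4F4CSDS3JUYT4CXTN6WB3MYQAQ`, and its first eight characters give the short form `4F4CSDS3`. This is an observation about this one record, not a documented rule of the pith-number/v1.0 schema, and the helper name `pith_id_from_hash` is hypothetical. A minimal sketch:

```python
import base64

# Canonical hash copied from this record.
CANONICAL_HASH = "e178290e5b4d313e0af36fac1db310041b0ab0879963349fe3d04da5142c5cfd"


def pith_id_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Hypothetical helper: Base32-encode the first n_bytes of a hex digest.

    Assumption: the identifier is the unpadded Base32 form of a digest
    prefix. This matches the record shown here but is not a documented rule.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


full_id = pith_id_from_hash(CANONICAL_HASH)
short_id = full_id[:8]

print(full_id)   # 4F4CSDS3JUYT4CXTN6WB3MYQAQ
print(short_id)  # 4F4CSDS3
```

If the relationship holds in general, recomputing the canonical hash would also be enough to check that the identifier in the URL belongs to the record.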
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4F4CSDS3JUYT4CXTN6WB3MYQAQ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e178290e5b4d313e0af36fac1db310041b0ab0879963349fe3d04da5142c5cfd
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9c16d9c7d6967a5232dece92619a6ab9873c90da68f13440851db7ecfbd86e8e",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-09-03T17:08:20Z",
    "title_canon_sha256": "ea4f149eedbdfb33fbd75aabf8e4f4c5741ff88e5d58565436ddf2c697541b16"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.02060",
    "kind": "arxiv",
    "version": 2
  }
}
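The shell one-liner in the verification section can also be run offline against the record printed above. The sketch below applies the same serialization as that one-liner (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 bytes) and compares the SHA-256 digest with the published canonical hash. It assumes the JSON shown on this page is identical to the `canonical_record` field served by the API; the function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Published canonical hash for this record.
EXPECTED = "e178290e5b4d313e0af36fac1db310041b0ab0879963349fe3d04da5142c5cfd"

# Canonical record as printed on this page (assumed to equal the API's
# `canonical_record` field).
record = {
    "metadata": {
        "abstract_canon_sha256": "9c16d9c7d6967a5232dece92619a6ab9873c90da68f13440851db7ecfbd86e8e",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-09-03T17:08:20Z",
        "title_canon_sha256": "ea4f149eedbdfb33fbd75aabf8e4f4c5741ff88e5d58565436ddf2c697541b16",
    },
    "schema_version": "1.0",
    "source": {"id": "2409.02060", "kind": "arxiv", "version": 2},
}


def canonical_sha256(obj: dict) -> str:
    """Hypothetical helper: hash a record using the one-liner's serialization."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the source JSON, only on their names and values.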