pith. machine review for the scientific record.
Pith Number

pith:4F4CSDS3

pith:2024:4F4CSDS3JUYT4CXTN6WB3MYQAQ
not attested · not anchored · not stored · refs resolved

OLMoE: Open Mixture-of-Experts Language Models

Akshita Bhagia, Alexander Wettig, Ali Farhadi, Amanpreet Singh, Binyuan Hui, David Wadden, Dirk Groeneveld, Douwe Kiela, Dustin Schwenk, Hannaneh Hajishirzi, Jacob Morrison, Kyle Lo, Luca Soldaini, Nathan Lambert, Niklas Muennighoff, Noah A. Smith, Oyvind Tafjord, Pang Wei Koh, Pete Walsh, Sewon Min, Shane Arora, Tim Dettmers, Weijia Shi, Yuling Gu

OLMoE shows that a 7B-parameter sparse MoE model with 1B active parameters per token can outperform larger dense models such as Llama2-13B.

arxiv:2409.02060 v2 · 2024-09-03 · cs.CL · cs.AI · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1 · strongest claim

Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B.

C2 · weakest assumption

That benchmark comparisons are fair across models trained under different data regimes, token counts, and optimization details, with no post-hoc selection affecting the reported gains.

C3 · one-line summary

OLMoE-1B-7B is an open MoE language model activating 1B parameters per token that outperforms models with similar active parameters after pretraining on 5T tokens.

References

236 extracted · 236 resolved · 0 Pith anchors


Formal links

3 machine-checked theorem links

Cited by

20 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:47.556662Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

e178290e5b4d313e0af36fac1db310041b0ab0879963349fe3d04da5142c5cfd
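
An observation from this record, not a documented rule: the 26-character identifier in the Pith Number matches the unpadded base32 encoding of the first 16 bytes of the canonical hash, and the short form is its first 8 characters. The Python sketch below checks that relationship for this record only. The function name `pith_id_from_hash` and the 16-byte prefix length are assumptions made for illustration; the Pith schema may define the derivation differently.

```python
import base64

# Canonical hash as shown on this record.
CANONICAL_HASH = "e178290e5b4d313e0af36fac1db310041b0ab0879963349fe3d04da5142c5cfd"


def pith_id_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Hypothetical helper: base32-encode the first n_bytes of a hex digest.

    The 16-byte prefix is an assumption inferred from this one record.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    # RFC 4648 base32, with trailing '=' padding removed.
    return base64.b32encode(raw).decode("ascii").rstrip("=")


full_id = pith_id_from_hash(CANONICAL_HASH)
short_id = full_id[:8]

print(full_id)   # 4F4CSDS3JUYT4CXTN6WB3MYQAQ
print(short_id)  # 4F4CSDS3
```

If the relationship holds in general, the identifier can be checked against the hash without a network request; treat that as a convenience check, not a substitute for the signature.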

Aliases

arxiv: 2409.02060 · arxiv_version: 2409.02060v2 · doi: 10.48550/arxiv.2409.02060
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4F4CSDS3JUYT4CXTN6WB3MYQAQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e178290e5b4d313e0af36fac1db310041b0ab0879963349fe3d04da5142c5cfd
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9c16d9c7d6967a5232dece92619a6ab9873c90da68f13440851db7ecfbd86e8e",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-09-03T17:08:20Z",
    "title_canon_sha256": "ea4f149eedbdfb33fbd75aabf8e4f4c5741ff88e5d58565436ddf2c697541b16"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.02060",
    "kind": "arxiv",
    "version": 2
  }
}
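
The canonicalization step in the verification command above can also be run offline. The following is a minimal Python sketch that applies the same `json.dumps` settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record as displayed on this page. The helper names `canonical_bytes` and `canonical_sha256` are hypothetical. Whether the digest equals the published canonical hash depends on the displayed JSON matching what the API serves, so no expected value is asserted here.

```python
import hashlib
import json

# Canonical record copied from the listing on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "9c16d9c7d6967a5232dece92619a6ab9873c90da68f13440851db7ecfbd86e8e",
    "cross_cats_sorted": ["cs.AI", "cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-09-03T17:08:20Z",
    "title_canon_sha256": "ea4f149eedbdfb33fbd75aabf8e4f4c5741ff88e5d58565436ddf2c697541b16"
  },
  "schema_version": "1.0",
  "source": {"id": "2409.02060", "kind": "arxiv", "version": 2}
}
"""


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash shown on this page
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input JSON, which is what makes the hash reproducible across clients.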