Pith Number

pith:WSQP4RCK

pith:2025:WSQP4RCK2AH3TFATZQ23MUUQQK
not attested · not anchored · not stored · refs resolved

MiniMax-01: Scaling Foundation Models with Lightning Attention

Aonian Li, Bangwei Gong, Boji Shan, Bo Yang, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jingtao Han, Jingyang Li, Jin Zhu, Junbin Xie, Junhao Xu, Junjie Yan, Kaishun Zhang, Kecheng Xiao, Kexi Kang, Le Han, Leyang Wang, Lianfei Yu, Liheng Feng, Linbo Chai, Lin Zheng, Long Xing, Meizhi Ju, Mingyuan Chi, MiniMax, Mozhi Zhang, Peikai Huang, Pengcheng Niu, Pengfei Li, Pengyu Zhao, Qidi Xu, Qiexiang Wang, Qin Wang, Qiuhui Li, Qi Yang, Ruitao Leng, Shengmin Shi, Shuqi Yu, Sichen Li, Songquan Zhu, Tao Huang, Tianrun Liang, Weigao Sun, Weixuan Sun, Weiyu Cheng, Wenkai Li, Xiangjun Song, Xiaodong Han, Xiao Su, Xinjie Zhang, Xinzhu Hou, Xu Min, Xun Zou, Xuyang Shen, Yan Gong, Yingjie Zhu, Yipeng Zhou, Yiran Zhong, Yongyi Hu, Yuanxiang Fan, Yue Yu, Yufeng Yang, Yuhao Li, Yunan Huang, Yunji Li, Yunpeng Huang, Yunzhi Xu, Yuxin Mao, Zehan Li, Zekang Li, Zewei Tao, Zewen Ying, Zhaoyang Cong, Zhenhua Fan, Zhen Qin, Zhihang Yu, Zhuo Jiang, Zijia Wu

MiniMax-01 matches GPT-4o and Claude-3.5-Sonnet performance while supporting 20-32 times longer contexts.

arxiv:2501.08313 v1 · 2025-01-14 · cs.CL · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WSQP4RCK2AH3TFATZQ23MUUQQK}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering 20-32 times longer context window.

C2 · weakest assumption

That lightning attention combined with the described MoE parallel and overlap techniques preserves model quality and training stability at the claimed parameter and context scales without unstated performance trade-offs or instabilities.

C3 · one-line summary

MiniMax-01 models match GPT-4o and Claude-3.5-Sonnet performance while providing 20-32 times longer context windows through lightning attention and MoE scaling.

References

68 extracted · 68 resolved · 9 Pith anchors


Formal links

3 machine-checked theorem links

Cited by

33 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:48.842422Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

b4a0fe444ad00fb99413cc35b65290828da1ea0d33ae3182cd84e98914202ccd

Aliases

arxiv: 2501.08313 · arxiv_version: 2501.08313v1 · doi: 10.48550/arxiv.2501.08313 · pith_short_12: WSQP4RCK2AH3 · pith_short_16: WSQP4RCK2AH3TFAT · pith_short_8: WSQP4RCK
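
The short aliases above are plain prefixes of the 26-character Pith Number. The page does not document how the number itself is derived. As an observation that holds for this record (an assumption, not a stated specification), it equals the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 digest. A minimal Python sketch of both checks follows; `pith_number_from_hash` and `short_aliases` are hypothetical helper names.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b4a0fe444ad00fb99413cc35b65290828da1ea0d33ae3182cd84e98914202ccd"
PITH_NUMBER = "WSQP4RCK2AH3TFATZQ23MUUQQK"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this rule is inferred from this one record,
    not taken from Pith documentation.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Build the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(PITH_NUMBER)
print(derived == PITH_NUMBER)
print(aliases)
```

If the comparison ever fails for another record, treat the Base32 rule as disproved rather than the record as invalid. The hash check under "Verify this Pith Number yourself" remains the authoritative test.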
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WSQP4RCK2AH3TFATZQ23MUUQQK \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b4a0fe444ad00fb99413cc35b65290828da1ea0d33ae3182cd84e98914202ccd
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "42614d59c7a60130d5893b48965ded06dc631cde6d350ccbaa9321f080ac4f85",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-01-14T18:50:05Z",
    "title_canon_sha256": "5982cf85dc8f4ced5ffa68d3232fd5356221326bda3570f2e0e5d3a133bcec9a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.08313",
    "kind": "arxiv",
    "version": 1
  }
}
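
The shell pipeline under "Verify this Pith Number yourself" can also be reproduced offline against the record printed above. The sketch below mirrors that one-liner's serialization (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). `canonical_sha256` is a hypothetical helper name. Whether the JSON shown on this page is exactly what the server returns is an assumption, so compare the printed digest with the canonical hash rather than taking a match for granted.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "42614d59c7a60130d5893b48965ded06dc631cde6d350ccbaa9321f080ac4f85",
    "cross_cats_sorted": ["cs.CV"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-01-14T18:50:05Z",
    "title_canon_sha256": "5982cf85dc8f4ced5ffa68d3232fd5356221326bda3570f2e0e5d3a133bcec9a"
  },
  "schema_version": "1.0",
  "source": {"id": "2501.08313", "kind": "arxiv", "version": 1}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash with SHA-256."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, whitespace and key order in the input JSON do not affect the digest. Only the values do.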