Pith Number

pith:RMPTP6PH

pith:2024:RMPTP6PHGEWGNBOCG2Q3CHTWLE

not attested · not anchored · not stored · refs resolved

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Aixin Liu, Bingxuan Wang, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, DeepSeek-AI, Dejian Yang, Deli Chen, Fuli Luo, Hanwei Xu, Huazuo Gao, Jiashi Li, Junxiao Song, Kai Dong, Kang Guan, Liyue Zhang, Peiyi Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, Runxin Xu, Shirong Ma, Wangding Zeng, Wenfeng Liang, Wenjun Gao, Xiao Bi, Xin Xie, Xuan Lu, Yaohui Wang, Yishi Piao, Yukun Li, Yuxiang You, Y. Wu, Zhenda Xie, Zhewen Hao, Zhibin Gou, Zhihong Shao, Zihui Gu

An open-source code model matches or exceeds closed-source leaders on coding and math benchmarks after continued pre-training on an additional 6 trillion tokens.

arxiv:2406.11931 v1 · 2024-06-17 · cs.SE · cs.AI · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{RMPTP6PHGEWGNBOCG2Q3CHTWLE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
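The page does not spell out the merge algorithm, so the following is only a minimal Python sketch of what a deterministic merge over an event log can look like. The event fields (`id`, `ts`, `kind`, `payload`) and the function name `merge_events` are hypothetical and are not taken from the Pith schema; signature checks are left out. The idea it illustrates is that deduplicating by event id and folding in a total order makes the result independent of the order in which a mirror received the events.

```python
from typing import Any

# Hypothetical event shape (an assumption, not the Pith schema):
#   {"id": str, "ts": str (ISO 8601, UTC, fixed format), "kind": str, "payload": dict}


def merge_events(canonical: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events onto the canonical record in a total order."""
    # Drop duplicates by id. This assumes two events with the same id
    # are byte-for-byte identical (for example, content-derived ids).
    unique = {e["id"]: e for e in events}
    # Sort by (timestamp, id). The id breaks timestamp ties, so the order is total.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {"record": canonical, "fields": {}, "applied": []}
    for event in ordered:
        # The latest event of a given kind wins.
        state["fields"][event["kind"]] = event["payload"]
        state["applied"].append(event["id"])
    return state


# Illustrative data only; these events are made up for the example.
record = {"source": {"kind": "arxiv", "id": "2406.11931", "version": 1}}
events = [
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "kind": "note", "payload": {"text": "second"}},
    {"id": "e1", "ts": "2026-05-17T23:38:49Z", "kind": "note", "payload": {"text": "first"}},
]

state_a = merge_events(record, events)
# Same events in reverse order, plus a duplicate delivery of one of them.
state_b = merge_events(record, list(reversed(events)) + events[:1])
print(state_a == state_b)
```

Under these assumptions, two mirrors that hold the same set of events should arrive at the same state regardless of delivery order or duplicates.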

Claims

C1 · strongest claim

DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.

C2 · weakest assumption

The reported benchmark scores reflect genuine generalization rather than overfitting to, or contamination from, training data that overlaps with the test sets.

C3 · one-line summary

An open-source MoE code model matches GPT-4 Turbo on coding and math benchmarks while expanding support to 338 programming languages and a 128K context length.

References

27 extracted · 27 resolved · 19 Pith anchors

[1] SantaCoder: don’t reach for the stars! · arXiv:2301.03988

[2] Program Synthesis with Large Language Models · arXiv:2108.07732

[3] Evaluating Large Language Models Trained on Code · arXiv:2107.03374

[5] Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge · arXiv:1803.05457

[6] Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators · arXiv:2404.04475

Formal links

2 machine-checked theorem links

Cited by

35 papers in Pith

MultiFileTest: A Multi-File-Level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms

A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning

Retrieve Only Relevant Tables Whether Few or Many: Adaptive Table Retrieval Method

Contextualized Code Pretraining for Code Generation

Receipt and verification

First computed	2026-05-17T23:38:49.520181Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

8b1f37f9e7312c6685c236a1b11e765920ee25fd4e7b03b7370dde88e2aaab7b

Aliases

arxiv: 2406.11931 · arxiv_version: 2406.11931v1 · doi: 10.48550/arxiv.2406.11931 · pith_short_12: RMPTP6PHGEWG · pith_short_16: RMPTP6PHGEWGNBOC · pith_short_8: RMPTP6PH
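The aliases and the canonical hash above appear to be related in a simple way. For this record, the full identifier coincides with the unpadded RFC 4648 Base32 encoding of the first 16 bytes of the canonical hash, and the short forms are plain prefixes of it. That is an observation about the values on this page, not a rule the page states, so treat the Python sketch below as an assumption; the function names are illustrative.

```python
import base64

# Canonical hash as listed in this record.
CANONICAL_HASH = "8b1f37f9e7312c6685c236a1b11e765920ee25fd4e7b03b7370dde88e2aaab7b"


def pith_id_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of the digest and drop '=' padding.

    Assumption: this matches the identifier on this page; the page itself
    does not document the derivation.
    """
    head = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(head).decode("ascii").rstrip("=")


def short_aliases(pith_id: str) -> dict[str, str]:
    """Short forms are prefixes of the full identifier (8, 12, 16 characters)."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


pith_id = pith_id_from_hash(CANONICAL_HASH)
print(pith_id)
print(short_aliases(pith_id))
```

If the observation holds for other records too, a short alias can be checked against a canonical hash without contacting the resolver.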

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

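# Fetch the record, extract the canonical record, re-serialize it with sorted
# keys and compact separators, and print the SHA-256 of the UTF-8 bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/RMPTP6PHGEWGNBOCG2Q3CHTWLE \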
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8b1f37f9e7312c6685c236a1b11e765920ee25fd4e7b03b7370dde88e2aaab7b

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "b5874c8bedd8029dece000b6d162d5db3239f80b23b0a4f337f298e1ef467364",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2024-06-17T13:51:35Z",
    "title_canon_sha256": "5ade7b81b2bfee6bda41be857c84238f2338b4abe648694e6ddb4ae106aa7a59"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2406.11931",
    "kind": "arxiv",
    "version": 1
  }
}
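The same canonicalization can be run offline on the record printed above. The Python sketch below mirrors the serialization settings of the verification one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8); the function name `canonical_sha256` is illustrative. It shows that key order does not affect the digest. Whether the digest of the JSON as displayed here equals the published canonical hash depends on the page showing the record exactly as stored, which is an assumption.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "b5874c8bedd8029dece000b6d162d5db3239f80b23b0a4f337f298e1ef467364",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2024-06-17T13:51:35Z",
        "title_canon_sha256": "5ade7b81b2bfee6bda41be857c84238f2338b4abe648694e6ddb4ae106aa7a59",
    },
    "schema_version": "1.0",
    "source": {"id": "2406.11931", "kind": "arxiv", "version": 1},
}

# Same content with the top-level keys in reverse order.
reordered = dict(reversed(list(record.items())))

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed in this record
print(digest == canonical_sha256(reordered))
```

Sorting keys and fixing the separators is what makes the byte string, and therefore the hash, reproducible across JSON libraries that emit keys in different orders.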