Pith Number

pith:7RRY4RV6

pith:2026:7RRY4RV6SZSYNNXPVF7OME6JLA
not attested · not anchored · not stored · refs pending

CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

Bingxiang He, Jiarui Yuan, Jinyi Hu, Maosong Sun, Ran Li, Weize Chen, Yinghao Chen, Zeyuan Liu, Zhiyuan Liu, Zixuan Fu

A Coach generates tasks and rewards a Player for solving them, improving LLM math reasoning without any external data.

arxiv:2602.02979 v2 · 2026-02-03 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{7RRY4RV6SZSYNNXPVF7OME6JLA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

CPMobius achieves substantial improvement without relying on any external training data, outperforming existing unsupervised approaches. For example, on Qwen2.5-Math-7B-Instruct, our method improves accuracy by an overall average of +4.9 and an out-of-distribution average of +5.4.

C2 weakest assumption

The cooperative optimization loop between Coach and Player directly enhances the Player's mathematical reasoning ability without external data or labels.

C3 one line summary

CPMobius uses iterative coach-player reinforcement learning to improve mathematical reasoning in LLMs without external training data, yielding +4.9 average accuracy gains on Qwen2.5-Math-7B-Instruct.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-18T02:45:05.536378Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

fc638e46be966586b6efa97ee613c958354bcdc76130b7ca3b67fd1b586cf3a0

Aliases

arxiv: 2602.02979 · arxiv_version: 2602.02979v2 · doi: 10.48550/arxiv.2602.02979 · pith_short_12: 7RRY4RV6SZSY · pith_short_16: 7RRY4RV6SZSYNNXP · pith_short_8: 7RRY4RV6
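The identifiers on this page appear to be related to the canonical hash: Base32-encoding the first 16 bytes of the SHA-256 digest reproduces the 26-character Pith Number, and the pith_short_* aliases are prefixes of it. This is an observation drawn from the values shown here, not a documented rule of the schema, and the helper name pith_number_from_hash is hypothetical. A minimal sketch:

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "fc638e46be966586b6efa97ee613c958354bcdc76130b7ca3b67fd1b586cf3a0"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest and drop '=' padding.

    Assumption: this derivation is inferred from the values on this page,
    not taken from a published specification.
    """
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)       # 7RRY4RV6SZSYNNXPVF7OME6JLA
print(pith[:8])   # 7RRY4RV6          (pith_short_8)
print(pith[:12])  # 7RRY4RV6SZSY      (pith_short_12)
print(pith[:16])  # 7RRY4RV6SZSYNNXP  (pith_short_16)
```

If the relationship holds in general, a short alias can be checked against a full Pith Number with a plain prefix comparison.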
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7RRY4RV6SZSYNNXPVF7OME6JLA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fc638e46be966586b6efa97ee613c958354bcdc76130b7ca3b67fd1b586cf3a0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "7f9bb2ed9390126133d2ea896d1f09dbc4f8d2226b8cb7e2deae47b864e9d2fb",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-02-03T01:38:53Z",
    "title_canon_sha256": "5600fecc22fcedcfa4335d03307271e53eceb287693b0c06d90137e6baf6dc9c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.02979",
    "kind": "arxiv",
    "version": 2
  }
}
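The hashing step of the verification command can also be run offline against the record printed above. The sketch below is a Python rendering of the same recipe (sorted keys, compact separators, non-ASCII kept as-is, SHA-256 over the UTF-8 bytes). The function name canonical_sha256 is hypothetical, and it is an assumption that the JSON displayed on this page is identical to what the API returns in .canonical_record; if the digest differs from the listed hash, rerun the check against the live API response before drawing conclusions.

```python
import hashlib
import json

# Canonical hash as listed on this page.
EXPECTED = "fc638e46be966586b6efa97ee613c958354bcdc76130b7ca3b67fd1b586cf3a0"

# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "7f9bb2ed9390126133d2ea896d1f09dbc4f8d2226b8cb7e2deae47b864e9d2fb",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-02-03T01:38:53Z",
        "title_canon_sha256": "5600fecc22fcedcfa4335d03307271e53eceb287693b0c06d90137e6baf6dc9c",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.02979", "kind": "arxiv", "version": 2},
}


def canonical_sha256(obj) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the bytes."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches listed hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written.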