Pith Number

pith:XZVRNAW7

pith:2026:XZVRNAW7VTNCWLVB5IXFOTN5DP

not attested · not anchored · not stored · refs resolved

D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

Chongyang Tao, Renda Li, Ru Zhang, Weijie Qiu, Xiangxiang Chu, Yong Wang, Ziyu Ma

D²Evo achieves data-efficient RL for LLM reasoning by mining medium-difficulty anchors and jointly evolving a question generator with the solver.

arxiv:2605.17037 v1 · 2026-05-16 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{XZVRNAW7VTNCWLVB5IXFOTN5DP}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
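The page does not spell out the merge algorithm, so the following is only a minimal sketch of the general idea: if events are applied in a canonical order, any mirror that holds the same set of events arrives at the same state. The event fields (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the sample events are all hypothetical, and signature checking is left out entirely.

```python
import hashlib
import json


def merge_events(events):
    """Fold events into a state dict, independent of arrival order.

    Hypothetical event shape: {"id", "ts", "field", "value"}.
    Events are applied in (ts, id) order, and later events win.
    """
    ordered = sorted(events, key=lambda e: (e["ts"], e["id"]))
    state = {}
    for event in ordered:
        state[event["field"]] = event["value"]
    return state


def state_digest(state):
    """Hash the merged state so two mirrors can compare results cheaply."""
    blob = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Toy events, invented for illustration only.
events = [
    {"id": "e2", "ts": "2026-05-20T00:05:00Z", "field": "citations", "value": "open"},
    {"id": "e1", "ts": "2026-05-20T00:03:37Z", "field": "author_claim", "value": "open"},
    {"id": "e3", "ts": "2026-05-21T00:00:00Z", "field": "author_claim", "value": "claimed"},
]

state = merge_events(events)
shuffled_state = merge_events(list(reversed(events)))
print(state_digest(state) == state_digest(shuffled_state))
```

Sorting on a total order before folding is what makes the result reproducible; the real bundle format may use a different ordering key or conflict rule.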

Claims

C1 strongest claim

D²Evo outperforms existing methods on mathematical reasoning benchmarks with fewer than 2K real mathematical samples, and exhibits strong generalization on general reasoning benchmarks.

C2 weakest assumption

The framework assumes that mining medium-difficulty anchors based on the current Solver's capability and jointly training the Questioner to generate diverse questions at matching levels will produce stable progressive gains without persistent difficulty mismatch or instability in the co-evolution loop.

C3 one line summary

D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.
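As a rough illustration of what "mining medium-difficulty anchors from the current model" could look like, the sketch below filters questions by the Solver's empirical pass rate over several rollouts. This is not taken from the paper: the function names, the rollout data, and the 0.2 to 0.8 band are assumptions chosen only to make the idea concrete.

```python
def pass_rate(outcomes):
    """Fraction of rollouts in which the solver answered correctly."""
    return sum(outcomes) / len(outcomes)


def select_medium_anchors(rollouts, low=0.2, high=0.8):
    """Keep questions whose pass rate lies in [low, high].

    `rollouts` maps a question id to a list of booleans (one per attempt).
    The thresholds are illustrative assumptions, not values from the paper.
    """
    return [
        question
        for question, outcomes in rollouts.items()
        if outcomes and low <= pass_rate(outcomes) <= high
    ]


# Toy rollout results, invented for illustration.
rollouts = {
    "q_easy": [True, True, True, True],
    "q_medium": [True, False, True, False],
    "q_hard": [False, False, False, False],
}

anchors = select_medium_anchors(rollouts)
print(anchors)
```

Questions the Solver always or never solves give little learning signal under this heuristic, which is why only the middle band is kept; re-running the filter as the Solver improves shifts which questions count as anchors.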

References

64 extracted · 64 resolved · 21 Pith anchors

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:03:37.166728Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae

Aliases

arxiv: 2605.17037 · arxiv_version: 2605.17037v1 · doi: 10.48550/arxiv.2605.17037 · pith_short_12: XZVRNAW7VTNC · pith_short_16: XZVRNAW7VTNCWLVB · pith_short_8: XZVRNAW7
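The short aliases listed here are prefixes of the 26-character Pith Number. For this particular record, the Pith Number also coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The page does not state that this is the derivation rule, so the sketch below treats it as an observation about this record only; the helper names are illustrative.

```python
import base64

CANONICAL_HASH = "be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae"
PITH_NUMBER = "XZVRNAW7VTNCWLVB5IXFOTN5DP"


def short_alias(pith_number, length):
    """Return the prefix alias of the given length (8, 12, or 16 on this page)."""
    return pith_number[:length]


def base32_prefix(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the leading characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = {n: short_alias(PITH_NUMBER, n) for n in (8, 12, 16)}
print(aliases)
print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```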

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae
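The Python one-liner in the command above can be unpacked into small functions, which makes the canonicalization easier to reuse offline. This is a sketch that mirrors the same `json.dumps` settings (sorted keys, compact separators, no ASCII escaping); the function names are illustrative, and the toy record below is not the real canonical record.

```python
import hashlib
import json


def canonical_bytes(record):
    """Serialize a JSON-compatible object with sorted keys and no extra whitespace."""
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_sha256(record):
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def verify(record, expected_hex):
    """True when the recomputed hash matches the published one."""
    return canonical_sha256(record) == expected_hex.lower()


# Toy record: key order in the input does not affect the canonical form.
toy_a = {"b": 1, "a": {"d": 1, "c": 2}}
toy_b = {"a": {"c": 2, "d": 1}, "b": 1}
print(canonical_bytes(toy_a))
print(canonical_sha256(toy_a) == canonical_sha256(toy_b))
```

To check this record, load the object under "Canonical record JSON" with `json.loads` and pass it to `verify` together with the published canonical hash.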

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "66c0e698b50ada383f3cce171a09fd79e9ffb09f316fbb99be26a4822989eda2",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-16T15:16:00Z",
    "title_canon_sha256": "2fb27cd9b437b79c0e25a4610d2596ca526cb3bb5cb439778484e2ce8e07610b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17037",
    "kind": "arxiv",
    "version": 1
  }
}