Pith Number

pith:XZVRNAW7

pith:2026:XZVRNAW7VTNCWLVB5IXFOTN5DP
not attested · not anchored · not stored · refs resolved

D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

Chongyang Tao, Renda Li, Ru Zhang, Weijie Qiu, Xiangxiang Chu, Yong Wang, Ziyu Ma

D²Evo achieves data-efficient RL for LLM reasoning by mining medium-difficulty anchors and jointly evolving a question generator with the solver.

arxiv:2605.17037 v1 · 2026-05-16 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XZVRNAW7VTNCWLVB5IXFOTN5DP}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

D²Evo outperforms existing methods on mathematical reasoning benchmarks with fewer than 2K real mathematical samples, and exhibits strong generalization on general reasoning benchmarks.

C2 weakest assumption

The framework assumes that mining medium-difficulty anchors based on the current Solver's capability and jointly training the Questioner to generate diverse questions at matching levels will produce stable progressive gains without persistent difficulty mismatch or instability in the co-evolution loop.

C3 one-line summary

D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.
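
To make the anchor-mining step in the summary concrete, here is a minimal sketch. It assumes difficulty is measured by the Solver's empirical pass rate over repeated attempts and that "medium" means a pass rate inside a fixed band. Both the pass-rate criterion and the `low`/`high` thresholds are illustrative assumptions, not details taken from this record, and `mine_medium_anchors` is a hypothetical helper name.

```python
from typing import Dict, List


def mine_medium_anchors(
    outcomes: Dict[str, List[bool]],
    low: float = 0.25,   # assumed lower bound of the "medium" band
    high: float = 0.75,  # assumed upper bound of the "medium" band
) -> List[str]:
    """Return questions whose empirical pass rate lies in [low, high].

    `outcomes` maps each question to the correctness of several
    sampled Solver attempts (True = correct).
    """
    anchors = []
    for question, results in outcomes.items():
        if not results:
            continue  # no attempts recorded, difficulty unknown
        rate = sum(results) / len(results)
        if low <= rate <= high:
            anchors.append(question)
    return anchors


# Toy data: one question the Solver always solves, one it solves
# half the time, and one it never solves.
outcomes = {
    "q_easy": [True, True, True, True],
    "q_medium": [True, False, True, False],
    "q_hard": [False, False, False, False],
}
print(mine_medium_anchors(outcomes))  # ['q_medium']
```

Because the band is defined relative to the current Solver's outcomes, the same question can enter or leave the anchor set as the Solver improves, which is the sense in which the selection tracks the model's capability.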

References

64 extracted · 64 resolved · 21 Pith anchors

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:03:37.166728Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae

Aliases

arxiv: 2605.17037 · arxiv_version: 2605.17037v1 · doi: 10.48550/arxiv.2605.17037 · pith_short_12: XZVRNAW7VTNC · pith_short_16: XZVRNAW7VTNCWLVB · pith_short_8: XZVRNAW7
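
In this record, each `pith_short_*` alias is a leading prefix of the full Pith Number. The sketch below reproduces the three aliases listed above. Treating the aliases as plain prefixes is an observation about this record rather than a documented rule, and `short_aliases` is a hypothetical helper.

```python
PITH_NUMBER = "XZVRNAW7VTNCWLVB5IXFOTN5DP"


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build short aliases by taking the first n characters (assumed rule)."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


for name, value in short_aliases(PITH_NUMBER).items():
    print(f"{name}: {value}")
```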
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "66c0e698b50ada383f3cce171a09fd79e9ffb09f316fbb99be26a4822989eda2",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-16T15:16:00Z",
    "title_canon_sha256": "2fb27cd9b437b79c0e25a4610d2596ca526cb3bb5cb439778484e2ce8e07610b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17037",
    "kind": "arxiv",
    "version": 1
  }
}
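
The canonicalization performed by the verification one-liner can also be run offline against the record above. The sketch below applies the same `json.dumps` settings (sorted keys, compact separators, `ensure_ascii=False`) and hashes the UTF-8 bytes with SHA-256. The function name `canonical_sha256` is an illustrative choice. The resulting digest is meant to be compared with the canonical hash listed on this page; the match is not asserted here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "66c0e698b50ada383f3cce171a09fd79e9ffb09f316fbb99be26a4822989eda2",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-16T15:16:00Z",
        "title_canon_sha256": "2fb27cd9b437b79c0e25a4610d2596ca526cb3bb5cb439778484e2ce8e07610b",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17037", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash shown on this page
```

Sorting keys and fixing the separators make the serialization independent of key order and whitespace in the fetched JSON, so any mirror that holds the same record computes the same digest.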