pith:XZVRNAW7
D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning
D²Evo achieves data-efficient RL for LLM reasoning by mining medium-difficulty anchors and jointly evolving a question generator with the solver.
arxiv:2605.17037 v1 · 2026-05-16 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XZVRNAW7VTNCWLVB5IXFOTN5DP}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
D²Evo outperforms existing methods on mathematical reasoning benchmarks with fewer than 2K real mathematical samples, and exhibits strong generalization on general reasoning benchmarks.
The framework assumes that mining medium-difficulty anchors based on the current Solver's capability and jointly training the Questioner to generate diverse questions at matching levels will produce stable progressive gains without persistent difficulty mismatch or instability in the co-evolution loop.
D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.
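A minimal sketch of what mining medium-difficulty anchors could look like, written only to illustrate the idea in the claims above. The record does not say how difficulty is measured, so everything here is an assumption: difficulty is approximated by the Solver's empirical solve rate over a few sampled attempts, and the `low`/`high` thresholds, the function names, and the toy rollouts are hypothetical rather than taken from the paper.

```python
def empirical_pass_rate(outcomes):
    """Fraction of sampled Solver attempts that were judged correct."""
    return sum(outcomes) / len(outcomes)


def mine_medium_anchors(pass_rates, low=0.25, high=0.75):
    """Keep question ids whose solve rate lies in [low, high].

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return [qid for qid, rate in pass_rates.items() if low <= rate <= high]


# Hypothetical rollouts: True means the current Solver answered correctly.
rollouts = {
    "q1": [True, True, True, True],      # solved every time: too easy
    "q2": [True, False, True, False],    # solved half the time: medium
    "q3": [False, False, False, False],  # never solved: too hard
    "q4": [False, True, False, False],   # solved once in four: medium
}

pass_rates = {qid: empirical_pass_rate(o) for qid, o in rollouts.items()}
anchors = mine_medium_anchors(pass_rates)
print(anchors)  # ['q2', 'q4']
```

In a co-evolution loop of the kind the claims describe, the selection would be recomputed as the Solver improves, so the set of anchors shifts with the model's current capability.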
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:03:37.166728Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae
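The same canonicalization step can be written as a small Python module, which may be easier to read than the one-liner. This is a sketch that mirrors the `json.dumps` arguments used in the command above (sorted keys, compact separators, non-ASCII kept as-is) and hashes the UTF-8 bytes with SHA-256. The function names and the toy record are hypothetical, and fetching the record over the network is left out.

```python
import hashlib
import json


def canonical_bytes(record):
    """Serialize with sorted keys, no extra whitespace, and raw non-ASCII."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode()  # UTF-8 by default


def canonical_hash(record):
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Toy record (not the real one): key order in the input does not matter.
a = {"source": {"kind": "arxiv", "version": 1}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 1, "kind": "arxiv"}}

print(canonical_bytes(a).decode())
print(canonical_hash(a) == canonical_hash(b))  # True
```

To check the real record, parse the `canonical_record` JSON into a dict, pass it to `canonical_hash`, and compare the result with the canonical hash listed on this page.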
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "66c0e698b50ada383f3cce171a09fd79e9ffb09f316fbb99be26a4822989eda2",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-16T15:16:00Z",
    "title_canon_sha256": "2fb27cd9b437b79c0e25a4610d2596ca526cb3bb5cb439778484e2ce8e07610b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17037",
    "kind": "arxiv",
    "version": 1
  }
}