{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:XZVRNAW7VTNCWLVB5IXFOTN5DP","short_pith_number":"pith:XZVRNAW7","schema_version":"1.0","canonical_sha256":"be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae","source":{"kind":"arxiv","id":"2605.17037","version":1},"attestation_state":"computed","paper":{"title":"D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"D²Evo achieves data-efficient RL for LLM reasoning by mining medium-difficulty anchors and jointly evolving a question generator with the solver.","cross_cats":["cs.AI","cs.CL"],"primary_cat":"cs.LG","authors_text":"Chongyang Tao, Renda Li, Ru Zhang, Weijie Qiu, Xiangxiang Chu, Yong Wang, Ziyu Ma","submitted_at":"2026-05-16T15:16:00Z","abstract_excerpt":"Reinforcement learning (RL) has demonstrated potential for enhancing reasoning in large language models (LLMs). However, effective RL training, which requires medium-difficulty training samples, faces two fundamental challenges: Effective Data Scarcity and Dynamic Difficulty Shifts, where medium-difficulty samples are scarce and become trivial as models improve. Existing methods mitigate this scarcity to some extent by generating training samples. However, these approaches suffer from anchor-free generation, ignoring co-evolution, and difficulty mismatch. To address these issues, we propose D$"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2605.17037","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2026-05-16T15:16:00Z","cross_cats_sorted":["cs.AI","cs.CL"],"title_canon_sha256":"2fb27cd9b437b79c0e25a4610d2596ca526cb3bb5cb439778484e2ce8e07610b","abstract_canon_sha256":"66c0e698b50ada383f3cce171a09fd79e9ffb09f316fbb99be26a4822989eda2"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-20T00:03:37.167503Z","signature_b64":"4j+JkQmIHspJfjhtFkXpC1HoFqwFmhj/nVzv0qaVTjSKb8PwHKsBRvdkaSqL1mJaAicbJX71LY86qGMeu/XlCg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"be6b1682dfacda2b2ea1ea2e574dbd1bdfc97606a8eaedc91c798e28d97994ae","last_reissued_at":"2026-05-20T00:03:37.166728Z","signature_status":"signed_v1","first_computed_at":"2026-05-20T00:03:37.166728Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"D²Evo achieves data-efficient RL for LLM reasoning by mining medium-difficulty anchors and jointly evolving a question generator with the solver.","cross_cats":["cs.AI","cs.CL"],"primary_cat":"cs.LG","authors_text":"Chongyang Tao, Renda Li, Ru Zhang, Weijie Qiu, Xiangxiang Chu, Yong Wang, Ziyu Ma","submitted_at":"2026-05-16T15:16:00Z","abstract_excerpt":"Reinforcement learning (RL) has demonstrated potential for enhancing reasoning in large language models (LLMs). However, effective RL training, which requires medium-difficulty training samples, faces two fundamental challenges: Effective Data Scarcity and Dynamic Difficulty Shifts, where medium-difficulty samples are scarce and become trivial as models improve. Existing methods mitigate this scarcity to some extent by generating training samples. However, these approaches suffer from anchor-free generation, ignoring co-evolution, and difficulty mismatch. To address these issues, we propose D$"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"D²Evo outperforms existing methods on mathematical reasoning benchmarks with fewer than 2K real mathematical samples, and exhibits strong generalization on general reasoning benchmarks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The framework assumes that mining medium-difficulty anchors based on the current Solver's capability and jointly training the Questioner to generate diverse questions at matching levels will produce stable progressive gains without persistent difficulty mismatch or instability in the co-evolution loop.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"D²Evo achieves data-efficient RL for LLM reasoning by mining medium-difficulty anchors and jointly evolving a question generator with the solver.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"37e4e2373496ef0a6a7e2db7d029cddb36b531e97403e3e93b055a084724d977"},"source":{"id":"2605.17037","kind":"arxiv","version":1},"verdict":{"id":"213852fb-3fd8-445c-9ff1-60a6b57eee41","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T20:17:53.748928Z","strongest_claim":"D²Evo outperforms existing methods on mathematical reasoning benchmarks with fewer than 2K real mathematical samples, and exhibits strong generalization on general reasoning benchmarks.","one_line_summary":"D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The framework assumes that mining medium-difficulty anchors based on the current Solver's capability and jointly training the Questioner to generate diverse questions at matching levels will produce stable progressive gains without persistent difficulty mismatch or instability in the co-evolution loop.","pith_extraction_headline":"D²Evo achieves data-efficient RL for LLM reasoning by mining medium-difficulty anchors and jointly evolving a question generator with the solver."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.17037/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T20:31:18.941157Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T20:30:42.417002Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"citation_quote_validity","ran_at":"2026-05-19T19:49:42.510126Z","status":"skipped","version":"0.1.0","findings_count":0},{"name":"cited_work_retraction","ran_at":"2026-05-19T18:51:56.516085Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T18:41:56.166074Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T18:33:23.001928Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"57e64d4233335854b1a81444a08c45724597de199dc97c4ad0cfb01c44bb5281"},"references":{"count":64,"sample":[{"doi":"","year":2000,"title":"Langley , title =","work_id":"6cd283dc-0548-45e9-af07-6bc1005593ad","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1980,"title":"T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980","work_id":"6b09bca6-ef5d-4a83-8c8e-219f23cbd761","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"M. J. Kearns , title =","work_id":"8efd8073-6f5d-45c5-94d5-62d366b52518","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1983,"title":"Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983","work_id":"51835800-f16e-4534-8339-d3ea09147556","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2000,"title":"R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000","work_id":"a24cb892-7ac9-4509-8f5b-abbdfb15998b","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":64,"snapshot_sha256":"98b04eaa4ebf9ebf6288a0687487d7574b8a526d567003912b65507f42b8b59f","internal_anchors":21},"formal_canon":{"evidence_count":2,"snapshot_sha256":"9a2db39ca9e16faaeb0cb5c225e9b060d9c05fcbfee4058429a09c0ae2721852"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.17037","created_at":"2026-05-20T00:03:37.166851+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.17037v1","created_at":"2026-05-20T00:03:37.166851+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.17037","created_at":"2026-05-20T00:03:37.166851+00:00"},{"alias_kind":"pith_short_12","alias_value":"XZVRNAW7VTNC","created_at":"2026-05-20T00:03:37.166851+00:00"},{"alias_kind":"pith_short_16","alias_value":"XZVRNAW7VTNCWLVB","created_at":"2026-05-20T00:03:37.166851+00:00"},{"alias_kind":"pith_short_8","alias_value":"XZVRNAW7","created_at":"2026-05-20T00:03:37.166851+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP","json":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP.json","graph_json":"https://pith.science/api/pith-number/XZVRNAW7VTNCWLVB5IXFOTN5DP/graph.json","events_json":"https://pith.science/api/pith-number/XZVRNAW7VTNCWLVB5IXFOTN5DP/events.json","paper":"https://pith.science/paper/XZVRNAW7"},"agent_actions":{"view_html":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP","download_json":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP.json","view_paper":"https://pith.science/paper/XZVRNAW7","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.17037&json=true","fetch_graph":"https://pith.science/api/pith-number/XZVRNAW7VTNCWLVB5IXFOTN5DP/graph.json","fetch_events":"https://pith.science/api/pith-number/XZVRNAW7VTNCWLVB5IXFOTN5DP/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP/action/timestamp_anchor","attest_storage":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP/action/storage_attestation","attest_author":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP/action/author_attestation","sign_citation":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP/action/citation_signature","submit_replication":"https://pith.science/pith/XZVRNAW7VTNCWLVB5IXFOTN5DP/action/replication_record"}},"created_at":"2026-05-20T00:03:37.166851+00:00","updated_at":"2026-05-20T00:03:37.166851+00:00"}