{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:FTYVOO5ZNSHTNTQW2ZQWK3PXFZ","short_pith_number":"pith:FTYVOO5Z","schema_version":"1.0","canonical_sha256":"2cf1573bb96c8f36ce16d661656df72e422dfbafe44ac75866e929a641f1909d","source":{"kind":"arxiv","id":"2508.06165","version":5},"attestation_state":"computed","paper":{"title":"UR$^2$: Unify RAG and Reasoning through Reinforcement Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A reinforcement learning framework unifies RAG and reasoning by learning when to retrieve and how to combine knowledge sources.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Boran Xiang, Weitao Li, Weizhi Ma, Xiaolong Wang, Yang Liu, Zhinan Gou","submitted_at":"2025-08-08T09:33:20Z","abstract_excerpt":"Large Language Models (LLMs) have shown strong capabilities through two complementary paradigms: Retrieval-Augmented Generation (RAG) for knowledge grounding and Reinforcement Learning from Verifiable Rewards (RLVR) for complex reasoning. However, existing attempts to unify these paradigms remain narrow in scope, typically limited to open-domain QA with fixed retrieval settings, which constrains generalization to broader domains. To address this limitation, we propose UR$^2$ (Unified RAG and Reasoning)), a general reinforcement learning framework that dynamically coordinates retrieval and reas"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2508.06165","kind":"arxiv","version":5},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2025-08-08T09:33:20Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"0bfd828ac4e8c9d7774cb48123585f70ecbeb3992eccded2963e2a531b34a1f7","abstract_canon_sha256":"73e36473814734f3c5918d47fc2e2f1921671421024901fcb94582f40854694a"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-06-03T01:05:05.454399Z","signature_b64":"CQEGeXNqtEJciK1HbpvBeg9y6N59O114vWqSsJziGdp6SyiTLEE2jrH21GQV2g1HDcJtgQC81ALs/grxyVTQBw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"2cf1573bb96c8f36ce16d661656df72e422dfbafe44ac75866e929a641f1909d","last_reissued_at":"2026-06-03T01:05:05.453833Z","signature_status":"signed_v1","first_computed_at":"2026-06-03T01:05:05.453833Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"UR$^2$: Unify RAG and Reasoning through Reinforcement Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A reinforcement learning framework unifies RAG and reasoning by learning when to retrieve and how to combine knowledge sources.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Boran Xiang, Weitao Li, Weizhi Ma, Xiaolong Wang, Yang Liu, Zhinan Gou","submitted_at":"2025-08-08T09:33:20Z","abstract_excerpt":"Large Language Models (LLMs) have shown strong capabilities through two complementary paradigms: Retrieval-Augmented Generation (RAG) for knowledge grounding and Reinforcement Learning from Verifiable Rewards (RLVR) for complex reasoning. However, existing attempts to unify these paradigms remain narrow in scope, typically limited to open-domain QA with fixed retrieval settings, which constrains generalization to broader domains. To address this limitation, we propose UR$^2$ (Unified RAG and Reasoning)), a general reinforcement learning framework that dynamically coordinates retrieval and reas"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"UR², built on Qwen-2.5-3/7B and LLaMA-3.1-8B, consistently outperforms existing RAG and RL baselines, and achieves performance comparable to GPT-4o-mini and GPT-4.1-mini on several benchmarks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The difficulty-aware curriculum and hybrid knowledge access strategy can be reliably learned through RL without introducing new instabilities or requiring extensive hyperparameter tuning that is not reported in the abstract.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"UR² is a general RL framework that dynamically coordinates RAG and reasoning via difficulty-aware curriculum and hybrid knowledge access, outperforming baselines on QA, MMLU-Pro, medical, and math tasks with models up to 8B parameters.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A reinforcement learning framework unifies RAG and reasoning by learning when to retrieve and how to combine knowledge sources.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e70a5b2d88fc825cbb81e554923c6024e74008b853a22bc208b12d14e1cde76e"},"source":{"id":"2508.06165","kind":"arxiv","version":5},"verdict":{"id":"2f75d0fc-d43a-4f60-8d67-041fbf3d37d3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T00:45:09.358117Z","strongest_claim":"UR², built on Qwen-2.5-3/7B and LLaMA-3.1-8B, consistently outperforms existing RAG and RL baselines, and achieves performance comparable to GPT-4o-mini and GPT-4.1-mini on several benchmarks.","one_line_summary":"UR² is a general RL framework that dynamically coordinates RAG and reasoning via difficulty-aware curriculum and hybrid knowledge access, outperforming baselines on QA, MMLU-Pro, medical, and math tasks with models up to 8B parameters.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The difficulty-aware curriculum and hybrid knowledge access strategy can be reliably learned through RL without introducing new instabilities or requiring extensive hyperparameter tuning that is not reported in the abstract.","pith_extraction_headline":"A reinforcement learning framework unifies RAG and reasoning by learning when to retrieve and how to combine knowledge sources."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2508.06165/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"7d0213d241330b653c302549e84b3ad24236277cd12223c80ce3601a8856609a"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2508.06165","created_at":"2026-06-03T01:05:05.453892+00:00"},{"alias_kind":"arxiv_version","alias_value":"2508.06165v5","created_at":"2026-06-03T01:05:05.453892+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2508.06165","created_at":"2026-06-03T01:05:05.453892+00:00"},{"alias_kind":"pith_short_12","alias_value":"FTYVOO5ZNSHT","created_at":"2026-06-03T01:05:05.453892+00:00"},{"alias_kind":"pith_short_16","alias_value":"FTYVOO5ZNSHTNTQW","created_at":"2026-06-03T01:05:05.453892+00:00"},{"alias_kind":"pith_short_8","alias_value":"FTYVOO5Z","created_at":"2026-06-03T01:05:05.453892+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ","json":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ.json","graph_json":"https://pith.science/api/pith-number/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/graph.json","events_json":"https://pith.science/api/pith-number/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/events.json","paper":"https://pith.science/paper/FTYVOO5Z"},"agent_actions":{"view_html":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ","download_json":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ.json","view_paper":"https://pith.science/paper/FTYVOO5Z","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2508.06165&json=true","fetch_graph":"https://pith.science/api/pith-number/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/graph.json","fetch_events":"https://pith.science/api/pith-number/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/action/timestamp_anchor","attest_storage":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/action/storage_attestation","attest_author":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/action/author_attestation","sign_citation":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/action/citation_signature","submit_replication":"https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ/action/replication_record"}},"created_at":"2026-06-03T01:05:05.453892+00:00","updated_at":"2026-06-03T01:05:05.453892+00:00"}