Pith Number

pith:FTYVOO5Z

pith:2025:FTYVOO5ZNSHTNTQW2ZQWK3PXFZ
not attested · not anchored · not stored · refs pending

UR$^2$: Unify RAG and Reasoning through Reinforcement Learning

Boran Xiang, Weitao Li, Weizhi Ma, Xiaolong Wang, Yang Liu, Zhinan Gou

A reinforcement learning framework unifies RAG and reasoning by learning when to retrieve and how to combine knowledge sources.

arxiv:2508.06165 v5 · 2025-08-08 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FTYVOO5ZNSHTNTQW2ZQWK3PXFZ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
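The idea of a deterministic merge can be illustrated with a short sketch. The actual Pith merge algorithm and event format are not described on this page, so everything below is an assumption: the event fields (`at`, `field`, `value`), the last-writer-wins rule, and the helper names are hypothetical. The sketch only shows one generic way to make the result independent of the order and duplication of incoming events: give every event a content hash, deduplicate on it, sort by a total order, then fold.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content hash of an event (hypothetical event shape)."""
    data = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def merge(events: list[dict]) -> dict:
    """Fold events into a state that does not depend on arrival order."""
    # Deduplicate by content hash so replayed events have no effect.
    unique = {event_id(e): e for e in events}
    # Total order: timestamp first, content hash as the tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state: dict = {}
    for _, event in ordered:
        # Assumed rule: the latest event for a field wins.
        state[event["field"]] = event["value"]
    return state


# Purely illustrative events, not taken from any real bundle.
events = [
    {"at": "2000-01-01T00:00:00Z", "field": "example_a", "value": "open"},
    {"at": "2000-01-02T00:00:00Z", "field": "example_a", "value": "closed"},
    {"at": "2000-01-01T12:00:00Z", "field": "example_b", "value": "pending"},
]

print(merge(events))
print(merge(list(reversed(events))) == merge(events))
```

Because the fold runs over a sorted, deduplicated sequence, two mirrors holding the same set of events in different orders arrive at the same state under these assumptions.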

Claims

C1 strongest claim

UR², built on Qwen-2.5-3/7B and LLaMA-3.1-8B, consistently outperforms existing RAG and RL baselines, and achieves performance comparable to GPT-4o-mini and GPT-4.1-mini on several benchmarks.

C2 weakest assumption

The difficulty-aware curriculum and hybrid knowledge access strategy can be reliably learned through RL without introducing new instabilities or requiring extensive hyperparameter tuning that is not reported in the abstract.

C3 one-line summary

UR² is a general RL framework that dynamically coordinates RAG and reasoning via difficulty-aware curriculum and hybrid knowledge access, outperforming baselines on QA, MMLU-Pro, medical, and math tasks with models up to 8B parameters.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-06-03T01:05:05.453833Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

2cf1573bb96c8f36ce16d661656df72e422dfbafe44ac75866e929a641f1909d

Aliases

arxiv: 2508.06165 · arxiv_version: 2508.06165v5 · doi: 10.48550/arxiv.2508.06165 · pith_short_12: FTYVOO5ZNSHT · pith_short_16: FTYVOO5ZNSHTNTQW · pith_short_8: FTYVOO5Z
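For this record, the Pith Number and its short aliases appear to be prefixes of the Base32 encoding of the canonical hash. This is an observation about the values shown here, not a documented rule of the `pith-number/v1.0` schema. The sketch below checks it with Python's standard library; the function name `pith_number_from_hash` and the default length of 26 characters are assumptions made for illustration.

```python
import base64

# Canonical hash as listed in this record.
CANONICAL_HASH = "2cf1573bb96c8f36ce16d661656df72e422dfbafe44ac75866e929a641f1909d"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the identifier is a plain RFC 4648 Base32 prefix of the digest.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
print("pith:", full)

# The short aliases are then simple prefixes of the full identifier.
for n in (8, 12, 16):
    print(f"pith_short_{n}:", full[:n])
```

If the assumption holds, the printed values agree with the identifier at the top of the record and with the `pith_short_8`, `pith_short_12`, and `pith_short_16` aliases.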
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FTYVOO5ZNSHTNTQW2ZQWK3PXFZ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2cf1573bb96c8f36ce16d661656df72e422dfbafe44ac75866e929a641f1909d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "73e36473814734f3c5918d47fc2e2f1921671421024901fcb94582f40854694a",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-08-08T09:33:20Z",
    "title_canon_sha256": "0bfd828ac4e8c9d7774cb48123585f70ecbeb3992eccded2963e2a531b34a1f7"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2508.06165",
    "kind": "arxiv",
    "version": 5
  }
}
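The one-liner under "Verify this Pith Number yourself" can be unpacked into a small offline script. It applies the same canonicalization as the shell command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) to the record printed above and compares the digest with the canonical hash. Whether the JSON as rendered on this page is byte-for-byte what the API returns is an assumption, so the script reports the comparison rather than asserting it. The helper name `canonical_sha256` is illustrative.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "73e36473814734f3c5918d47fc2e2f1921671421024901fcb94582f40854694a",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-08-08T09:33:20Z",
        "title_canon_sha256": "0bfd828ac4e8c9d7774cb48123585f70ecbeb3992eccded2963e2a531b34a1f7",
    },
    "schema_version": "1.0",
    "source": {"id": "2508.06165", "kind": "arxiv", "version": 5},
}

EXPECTED = "2cf1573bb96c8f36ce16d661656df72e422dfbafe44ac75866e929a641f1909d"


def canonical_sha256(obj) -> str:
    """SHA-256 over the sorted-key, compact JSON form used by the shell one-liner."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the digest independent of how the JSON was originally formatted, which is what allows a third party to reproduce it.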