pith:MUNRGAUK
Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration
Uncertainty-guided tree search decouples exploration from policy optimization to reach SOTA on hard RL benchmarks.
arxiv:2603.22273 v4 · 2026-03-23 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MUNRGAUKXDN3AS6PILEHIIXO5Y}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
By removing the overhead of policy optimization, our approach explores an order of magnitude more efficiently than standard intrinsic motivation baselines on hard exploration benchmarks. ... achieving state-of-the-art performance by a wide margin on Montezuma's Revenge, Pitfall!, and Venture without relying on domain-specific knowledge. ... solving the MuJoCo Adroit dexterous manipulation and AntMaze tasks in a sparse-reward setting, directly from image observations and without expert demonstrations or offline datasets.
That an uncertainty measure paired with Go-With-The-Winner-style tree search will systematically expand state coverage in hard exploration domains without the policy optimization step, and that the resulting trajectories can be reliably distilled into high-performing policies using existing supervised backward learning algorithms.
Uncertainty-guided tree search decouples exploration from RL policy optimization, achieving order-of-magnitude better efficiency and SOTA performance on sparse-reward tasks such as Montezuma's Revenge, Pitfall!, and Venture via trajectory distillation.
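To make the idea in the claims above concrete, here is a minimal toy sketch of population-based exploration in the Go-With-The-Winners style. It is not the paper's algorithm: the 1-D chain environment, the count-based score used as a stand-in for an uncertainty measure, and every name (`step`, `explore`, `replay`) are assumptions made for illustration. Each round, every particle takes a few random actions, then the half of the population ending in the most-visited states is replaced by clones of the half ending in the least-visited states. No policy is trained during exploration; the best action sequence is simply recorded so that it could later be used for distillation.

```python
import random
from collections import Counter


def step(state, action, length):
    """Toy deterministic chain: move left (-1) or right (+1), clipped to [0, length]."""
    return max(0, min(length, state + action))


def explore(num_particles=16, rounds=200, segment=5, length=60, seed=0):
    """Hypothetical sketch; assumes an even num_particles >= 2."""
    rng = random.Random(seed)
    counts = Counter({0: 1})               # visit counts; the start state counts once
    particles = [(0, [])] * num_particles  # each particle is (state, action history)
    best_state, best_actions = 0, []
    for _ in range(rounds):
        rolled = []
        for state, actions in particles:
            actions = list(actions)        # copy so clones never share a history
            for _ in range(segment):
                action = rng.choice((-1, 1))   # random actions, no learned policy
                state = step(state, action, length)
                actions.append(action)
                counts[state] += 1
                if state > best_state:
                    best_state, best_actions = state, list(actions)
            rolled.append((state, actions))
        # Assumed uncertainty proxy: a lower visit count means a more novel state.
        rolled.sort(key=lambda p: counts[p[0]])
        winners = rolled[: num_particles // 2]
        particles = winners + winners      # clone the winners, drop the rest
    return best_state, best_actions, counts


def replay(actions, length=60):
    """Re-run a recorded action sequence from the start state."""
    state = 0
    for action in actions:
        state = step(state, action, length)
    return state


best_state, best_actions, counts = explore()
print("furthest state reached:", best_state)
print("replay reproduces it:", replay(best_actions) == best_state)
```

Because the toy environment is deterministic, replaying the stored actions returns to the same state, which is the property that makes recorded trajectories usable as targets for a later distillation step.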
Receipt and verification
| First computed | 2026-05-18T02:44:30.761130Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
651b13028ab8dbb04bcf42c87422eeee30bace98d09b6ca7a3b140b601a51de6
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MUNRGAUKXDN3AS6PILEHIIXO5Y \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 651b13028ab8dbb04bcf42c87422eeee30bace98d09b6ca7a3b140b601a51de6
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "75c8a6e40ae4c65700b5c11715334b9e7e52624b7f1857693e421012fc5ce241",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-03-23T17:56:52Z",
"title_canon_sha256": "0bbc81c192dee09c25af7997247b6ad8a825e9fbb458ac5d2b94559859087437"
},
"schema_version": "1.0",
"source": {
"id": "2603.22273",
"kind": "arxiv",
"version": 4
}
}
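The same check can be sketched offline, without `curl` or `jq`, by hashing the canonical record shown above. The snippet below mirrors the serialization used in the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding); the helper name `canonical_sha256` is an assumption introduced here, and the record is copied by hand from this page. If the recipe matches what the service uses, the printed digest should equal the canonical hash listed above, but this sketch has not been checked against the live endpoint.

```python
import hashlib
import json

# Canonical record copied from the page above.
record = {
    "metadata": {
        "abstract_canon_sha256": "75c8a6e40ae4c65700b5c11715334b9e7e52624b7f1857693e421012fc5ce241",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-03-23T17:56:52Z",
        "title_canon_sha256": "0bbc81c192dee09c25af7997247b6ad8a825e9fbb458ac5d2b94559859087437",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.22273", "kind": "arxiv", "version": 4},
}


def canonical_sha256(obj):
    """SHA-256 over JSON with sorted keys and no insignificant whitespace."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(record)
print(digest)  # compare by eye with the "Canonical hash" value on this page
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the source JSON, which is why the record can be retyped or pretty-printed without changing the result.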