Pith Number

pith:MUNRGAUK

pith:2026:MUNRGAUKXDN3AS6PILEHIIXO5Y

not attested · not anchored · not stored · refs pending

Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration

James Cohan, Zakaria Mhammedi

Uncertainty-guided tree search decouples exploration from policy optimization to reach SOTA on hard RL benchmarks.

arxiv:2603.22273 v4 · 2026-03-23 · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{MUNRGAUKXDN3AS6PILEHIIXO5Y}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
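The page does not spell out the merge algorithm, so the following is only a minimal sketch of what "deterministic" means here: two mirrors that receive the same signed events in different orders should still compute the same state. The event fields (`id`, `ts`, `field`, `value`), the sample values, and the last-writer-wins rule are all hypothetical, chosen for illustration rather than taken from the Pith schema.

```python
from typing import Any

# Hypothetical event shape (not from the Pith schema):
#   {"id": str, "ts": ISO-8601 str, "field": str, "value": Any}
def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state that does not depend on arrival order."""
    # Drop repeated deliveries; assumes one id always carries the same content.
    unique = {e["id"]: e for e in events}
    # Impose a total order: timestamp first, event id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for e in ordered:
        state[e["field"]] = e["value"]  # last writer in the total order wins
    return state

# Illustrative events only.
e1 = {"id": "e1", "ts": "2026-05-18T02:44:30Z", "field": "author_claim", "value": "open"}
e2 = {"id": "e2", "ts": "2026-05-19T00:00:00Z", "field": "author_claim", "value": "claimed"}
e3 = {"id": "e3", "ts": "2026-05-19T00:00:00Z", "field": "citations", "value": 1}

mirror_a = merge_events([e1, e2, e3])
mirror_b = merge_events([e3, e2, e1, e1])  # different order, one duplicate
print(mirror_a == mirror_b)
```

Sorting on a key that is unique per event is what makes the result independent of how each mirror happened to receive the bundle.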

Claims

C1 strongest claim

By removing the overhead of policy optimization, our approach explores an order of magnitude more efficiently than standard intrinsic motivation baselines on hard exploration benchmarks. ... achieving state-of-the-art performance by a wide margin on Montezuma's Revenge, Pitfall!, and Venture without relying on domain-specific knowledge. ... solving the MuJoCo Adroit dexterous manipulation and AntMaze tasks in a sparse-reward setting, directly from image observations and without expert demonstrations or offline datasets.

C2 weakest assumption

That an uncertainty measure paired with Go-With-The-Winner-style tree search will systematically expand state coverage in hard exploration domains without the policy optimization step, and that the resulting trajectories can be reliably distilled into high-performing policies using existing supervised backward learning algorithms.

C3 one line summary

Uncertainty-guided tree search decouples exploration from RL policy optimization, achieving order-of-magnitude better efficiency and SOTA performance on sparse-reward tasks like Montezuma's Revenge, Pitfall!, and Venture via trajectory distillation.
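To make the idea in the claims above concrete, here is a toy Go-With-The-Winners-style loop. It is not the authors' algorithm: the environment is a hypothetical 1-D chain, visit counts stand in for the uncertainty measure, actions are uniformly random, and the distillation step is omitted. It only illustrates exploring by cloning particles in poorly covered states, with no policy being trained.

```python
import random
from collections import Counter

def explore_chain(n_particles: int = 16, rounds: int = 30,
                  steps_per_round: int = 5, seed: int = 0) -> tuple[int, int]:
    """Toy exploration on states 0, 1, 2, ... of a 1-D chain.

    Returns (farthest state visited, number of distinct states visited).
    """
    rng = random.Random(seed)
    visits: Counter[int] = Counter()   # visit counts: a stand-in for uncertainty
    particles = [0] * n_particles      # every particle starts at state 0

    for _ in range(rounds):
        # Exploration phase: random +/-1 steps, no learned policy involved.
        for i in range(n_particles):
            s = particles[i]
            for _ in range(steps_per_round):
                s = max(0, s + rng.choice((-1, 1)))
                visits[s] += 1
            particles[i] = s

        # Selection phase: fewer visits means higher uncertainty, so those
        # particles are the "winners" and get cloned over the rest.
        ranked = sorted(particles, key=lambda s: visits[s])
        winners = ranked[: max(1, n_particles // 2)]
        particles = [winners[i % len(winners)] for i in range(n_particles)]

    return max(visits), len(visits)

farthest, coverage = explore_chain()
print(farthest, coverage)
```

Setting the selection phase aside (for example by keeping `particles` unchanged) turns this into plain random walks, which gives a simple baseline to compare coverage against.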

Receipt and verification

First computed	2026-05-18T02:44:30.761130Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

651b13028ab8dbb04bcf42c87422eeee30bace98d09b6ca7a3b140b601a51de6

Aliases

arxiv: 2603.22273 · arxiv_version: 2603.22273v4 · doi: 10.48550/arxiv.2603.22273 · pith_short_12: MUNRGAUKXDN3 · pith_short_16: MUNRGAUKXDN3AS6P · pith_short_8: MUNRGAUK
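The short aliases above are prefixes of the full identifier, and the identifier itself coincides with the unpadded Base32 encoding of the first 16 bytes of the canonical hash. That derivation is inferred from the values on this page, not from the Pith schema, so treat the sketch below as a consistency check under that assumption.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "651b13028ab8dbb04bcf42c87422eeee30bace98d09b6ca7a3b140b601a51de6"
PITH_NUMBER = "MUNRGAUKXDN3AS6PILEHIIXO5Y"

def base32_prefix(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest, without '=' padding.

    Assumption: this mirrors how the identifier is derived; the page does
    not document the rule.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")

derived = base32_prefix(CANONICAL_HASH)
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```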

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/MUNRGAUKXDN3AS6PILEHIIXO5Y \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 651b13028ab8dbb04bcf42c87422eeee30bace98d09b6ca7a3b140b601a51de6

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "75c8a6e40ae4c65700b5c11715334b9e7e52624b7f1857693e421012fc5ce241",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-23T17:56:52Z",
    "title_canon_sha256": "0bbc81c192dee09c25af7997247b6ad8a825e9fbb458ac5d2b94559859087437"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.22273",
    "kind": "arxiv",
    "version": 4
  }
}
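The same check can be run offline against the record printed above. The sketch below repeats the serialization used in the shell one-liner (sorted keys, compact separators, UTF-8) and hashes the result. It should print the value given after `# expect:` only if the record shown here is exactly what the resolver returns, which is an assumption.

```python
import hashlib
import json

# Canonical record as printed on this page.
RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "75c8a6e40ae4c65700b5c11715334b9e7e52624b7f1857693e421012fc5ce241",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-23T17:56:52Z",
    "title_canon_sha256": "0bbc81c192dee09c25af7997247b6ad8a825e9fbb458ac5d2b94559859087437"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.22273",
    "kind": "arxiv",
    "version": 4
  }
}
"""

def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no extra whitespace, then UTF-8 encode."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def canonical_sha256(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()

record = json.loads(RECORD_TEXT)
digest = canonical_sha256(record)
print(digest)
```

Because the keys are sorted before hashing, the key order and whitespace of the JSON as displayed do not affect the digest; only the values do.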