Pith Number

pith:5IMNTGCP

pith:2026:5IMNTGCPCOPPMZ36753LOFQIYH
not attested · not anchored · not stored · refs resolved

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

Alexandre Proutiere, Stefan Stojanovic

Switching successor measures arise naturally from classical ones and let a single forward-backward representation produce both high-level subgoals and low-level actions in zero-shot hierarchical RL.

arxiv:2605.13207 v1 · 2026-05-13 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{5IMNTGCPCOPPMZ36753LOFQIYH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Switching successor measures arise naturally from classical successor measures while preserving their underlying structure, allowing FB π-Switch to extract both a high-level subgoal-selection policy and a low-level control policy directly from forward-backward representations for hierarchical zero-shot RL without additional supervision, fixed horizons, or manually designed subgoals.

C2 · weakest assumption

That switching successor measures can be derived from classical ones in a way that preserves structure sufficiently to support emergent hierarchical behavior from a single FB representation across both goal-conditioned and general reward tasks.

C3 · one-line summary

Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.
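
For context, the claims above build on the classical forward-backward (FB) factorization of successor measures. The block below is a sketch of that standard formulation as it usually appears in the FB literature, up to indexing conventions. It is not the paper's switching variant, whose definition this record does not give. The symbols $\rho$ (a state distribution), $\gamma$ (a discount factor), $z$ (a task embedding), and $g$ (a goal state) are notation assumed here for illustration.

```latex
% Classical successor measure of a policy \pi (assumed standard definition)
M^{\pi}(s_0, a_0, X) = \sum_{t \ge 0} \gamma^{t}\,
  \Pr\left(s_{t+1} \in X \mid s_0, a_0, \pi\right)

% Forward-backward factorization with respect to a state distribution \rho
M^{\pi_z}(s_0, a_0, \mathrm{d}s') \approx
  F(s_0, a_0, z)^{\top} B(s')\, \rho(\mathrm{d}s')

% Policy induced by the embedding z
\pi_z(s) = \arg\max_{a} F(s, a, z)^{\top} z

% Zero-shot task inference: general reward r, or a single goal state g
z_r = \mathbb{E}_{s \sim \rho}\left[ r(s)\, B(s) \right],
\qquad z_g = B(g)
```

Under this factorization, choosing $z = z_r$ gives $Q_r^{\pi_{z_r}}(s, a) \approx F(s, a, z_r)^{\top} z_r$, which is why one learned representation can serve both goal-conditioned and general reward tasks.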

References

64 extracted · 64 resolved · 3 Pith anchors

[1] Deep reinforcement learning at the edge of the statistical precipice 2021
[2] A unified framework for unsupervised reinforcement learning algorithms 2025
[3] Proto successor measure: Representing the behavior space of an RL agent. arXiv preprint arXiv:2411.19418, 2024
[4] Option-aware temporally abstracted value for offline goal-conditioned reinforcement learning 2025
[5] OPAL: Offline primitive discovery for accelerating offline reinforcement learning 2021

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-18T03:08:48.587295Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

ea18d9984f139ef6677eff76b71608c1ef188d712c6ea0e74f1e87623ddfb6c4

Aliases

arxiv: 2605.13207 · arxiv_version: 2605.13207v1 · doi: 10.48550/arxiv.2605.13207 · pith_short_12: 5IMNTGCPCOPP · pith_short_16: 5IMNTGCPCOPPMZ36 · pith_short_8: 5IMNTGCP
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/5IMNTGCPCOPPMZ36753LOFQIYH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ea18d9984f139ef6677eff76b71608c1ef188d712c6ea0e74f1e87623ddfb6c4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "eb236112774df187358ca124604368097ee047bca5ed54dd1318152a31c2821a",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T08:58:33Z",
    "title_canon_sha256": "09f2085250a3244c76dfc64a56592c9b2f11523a3a97982b0ab1af7245936e21"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13207",
    "kind": "arxiv",
    "version": 1
  }
}
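
The curl pipeline above canonicalizes the record (sorted keys, compact separators, UTF-8) before hashing it with SHA-256. A minimal Python sketch of the same step follows, assuming the canonical record is already available as a dict. The function name `canonical_sha256` is hypothetical and not part of any Pith API. Whether the digest matches the published canonical hash depends on the served record being identical to the JSON shown here, so the sketch prints the digest for comparison rather than asserting a value.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record as the curl pipeline does: sorted keys, compact separators, UTF-8.

    Hypothetical helper for illustration; not part of any Pith API.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from the JSON shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "eb236112774df187358ca124604368097ee047bca5ed54dd1318152a31c2821a",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T08:58:33Z",
        "title_canon_sha256": "09f2085250a3244c76dfc64a56592c9b2f11523a3a97982b0ab1af7245936e21",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13207", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because `sort_keys=True` sorts keys at every nesting level, the digest does not depend on the order in which the keys arrive, which is what lets a mirror recompute the same hash from a re-serialized copy.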