Pith Number

pith:5IMNTGCP

pith:2026:5IMNTGCPCOPPMZ36753LOFQIYH

not attested · not anchored · not stored · refs resolved

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

Alexandre Proutiere, Stefan Stojanovic

Switching successor measures arise naturally from classical ones and let a single forward-backward representation produce both high-level subgoals and low-level actions in zero-shot hierarchical RL.

arxiv:2605.13207 v1 · 2026-05-13 · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{5IMNTGCPCOPPMZ36753LOFQIYH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Switching successor measures arise naturally from classical successor measures while preserving their underlying structure, allowing FB π-Switch to extract both a high-level subgoal-selection policy and a low-level control policy directly from forward-backward representations for hierarchical zero-shot RL without additional supervision, fixed horizons, or manually designed subgoals.

C2 · weakest assumption

The paper assumes that switching successor measures can be derived from classical ones while preserving enough structure for hierarchical behavior to emerge from a single FB representation, on both goal-conditioned and general-reward tasks.

C3 · one-line summary

Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.
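For readers unfamiliar with the terms in the claims, the block below recalls the classical successor measure and its forward-backward (FB) factorization as they are usually stated in the FB literature. It is background only: the paper's switching construction and the FB π-Switch policies are not reproduced here, and the symbols (F, B, ρ, z, γ) follow the standard convention rather than this record.

```latex
% Classical successor measure of a policy \pi with discount \gamma
\[
M^{\pi}(s_0, a_0, X) = \sum_{t \ge 0} \gamma^{t}
  \Pr\left(s_{t+1} \in X \mid s_0, a_0, \pi\right)
\]

% FB factorization with respect to a reference state distribution \rho,
% and the policy induced by a task vector z
\[
M^{\pi_z}(s_0, a_0, \mathrm{d}s') \approx
  F(s_0, a_0, z)^{\top} B(s')\, \rho(\mathrm{d}s'),
\qquad
\pi_z(s) = \arg\max_{a} F(s, a, z)^{\top} z
\]

% Zero-shot task inference: a reward function r or a goal state g
\[
z_r = \mathbb{E}_{s \sim \rho}\left[ r(s)\, B(s) \right],
\qquad
z_g = B(g)
\]
```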

References

64 extracted · 64 resolved · 3 Pith anchors

[1] Deep reinforcement learning at the edge of the statistical precipice 2021

[2] A unified framework for unsupervised reinforcement learning algorithms 2025

[3] Proto successor measure: Representing the behavior space of an RL agent. arXiv preprint arXiv:2411.19418, 2024

[4] Option-aware temporally abstracted value for offline goal-conditioned reinforcement learning 2025

[5] OPAL: Offline primitive discovery for accelerating offline reinforcement learning 2021

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-18T03:08:48.587295Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

ea18d9984f139ef6677eff76b71608c1ef188d712c6ea0e74f1e87623ddfb6c4

Aliases

arxiv: 2605.13207 · arxiv_version: 2605.13207v1 · doi: 10.48550/arxiv.2605.13207 · pith_short_12: 5IMNTGCPCOPP · pith_short_16: 5IMNTGCPCOPPMZ36 · pith_short_8: 5IMNTGCP
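In this record, the short aliases are prefixes of the full Pith Number, and the Pith Number itself coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The Python sketch below reproduces that relationship. It is an observation drawn from the values on this page, not a documented derivation rule, and the function names are hypothetical.

```python
import base64

# Canonical hash as listed in this record.
CANONICAL_HASH = "ea18d9984f139ef6677eff76b71608c1ef188d712c6ea0e74f1e87623ddfb6c4"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the 26-character truncation is inferred from this record only.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith)
print(pith)
print(aliases)
```

If the relationship holds for other records, a client could resolve a short alias by prefix match and confirm the full identifier against the hash without a network call.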

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/5IMNTGCPCOPPMZ36753LOFQIYH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ea18d9984f139ef6677eff76b71608c1ef188d712c6ea0e74f1e87623ddfb6c4

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "eb236112774df187358ca124604368097ee047bca5ed54dd1318152a31c2821a",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T08:58:33Z",
    "title_canon_sha256": "09f2085250a3244c76dfc64a56592c9b2f11523a3a97982b0ab1af7245936e21"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13207",
    "kind": "arxiv",
    "version": 1
  }
}
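
The shell one-liner under "Verify this Pith Number yourself" can also be reproduced offline. The Python sketch below applies the same serialization (sorted keys, compact separators, non-ASCII preserved, UTF-8) to the record shown above and prints its SHA-256. Whether the digest equals the canonical hash depends on the displayed JSON matching the served `canonical_record` field exactly, which is assumed here rather than verified. The helper names are illustrative.

```python
import hashlib
import json

# Canonical record copied from this page (assumed identical to the served one).
record = {
    "metadata": {
        "abstract_canon_sha256": "eb236112774df187358ca124604368097ee047bca5ed54dd1318152a31c2821a",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T08:58:33Z",
        "title_canon_sha256": "09f2085250a3244c76dfc64a56592c9b2f11523a3a97982b0ab1af7245936e21",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13207", "kind": "arxiv", "version": 1},
}


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the shell one-liner."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(obj) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed in this record
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.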