Pith Number

pith:SZRBQK25

pith:2025:SZRBQK25IGU622WI7VGQCFN7RA
Status: not attested · not anchored · not stored · refs resolved

Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control

Ali Mesbah, Nathan P. Lawrence

Goal-conditioned reinforcement learning succeeds because its reward represents the probability of reaching target states, yielding a smaller optimality gap than classical quadratic objectives and suiting it to dual control.

arxiv:2512.06471 v2 · 2025-12-06 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{SZRBQK25IGU622WI7VGQCFN7RA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

we derive an optimality gap between more classical, often quadratic, objectives and the goal-conditioned reward, elucidating the success of goal-conditioned RL and why classical "dense" rewards can falter. We then consider the partially observed Markov decision setting and connect state estimation to our probabilistic reward, making the goal-conditioned reward well suited to dual control problems.

C2: weakest assumption

The analysis assumes that the goal-conditioned reward can be interpreted directly as a probability of reaching target states and that this interpretation transfers without additional unstated restrictions on the system dynamics or observation model when moving to the POMDP and dual-control setting.

C3: one-line summary

Goal-conditioned RL succeeds over dense rewards because its probabilistic goal-reaching objective aligns naturally with dual control requirements in uncertain, partially observed systems.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-17T23:39:00.583124Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

9662182b5d41a9ed6ac8fd4d0115bf880df3d8e6348c8d1eac64231669893571

Aliases

arxiv: 2512.06471 · arxiv_version: 2512.06471v2 · doi: 10.48550/arxiv.2512.06471 · pith_short_12: SZRBQK25IGU6 · pith_short_16: SZRBQK25IGU622WI · pith_short_8: SZRBQK25
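
The short aliases above are prefixes of the full identifier, and the identifier itself matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash listed on this page. That relationship is inferred from the values shown here, not from any documented Pith rule, so the following is only a sketch under that assumption. The function names `pith_id_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Canonical hash copied from this record.
CANONICAL_HASH = "9662182b5d41a9ed6ac8fd4d0115bf880df3d8e6348c8d1eac64231669893571"


def pith_id_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and drop '=' padding.

    Assumption: the 16-byte truncation is inferred from this one record.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_id: str, lengths=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in lengths}


pith_id = pith_id_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith_id)
print(pith_id)   # SZRBQK25IGU622WI7VGQCFN7RA
print(aliases)
```

For this record the derived value agrees with the identifier and the three short aliases listed above. Whether the same rule holds for other records is not established here.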
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SZRBQK25IGU622WI7VGQCFN7RA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9662182b5d41a9ed6ac8fd4d0115bf880df3d8e6348c8d1eac64231669893571
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1f3dee88a1f7937337015abb3a085ed697aca02282d04d6ee24436c3c0418f41",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-12-06T15:28:35Z",
    "title_canon_sha256": "72b7a0aead423de5ff3232f497621bb6590a182c6b94f97156784ac26df32b67"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.06471",
    "kind": "arxiv",
    "version": 2
  }
}
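
The shell one-liner under "Verify this Pith Number yourself" can also be written as a small offline Python function. This sketch mirrors the serialization settings used in that command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and applies them to the record printed above. The name `canonical_sha256` is hypothetical, and no expected digest is hard-coded because the authoritative value comes from the live endpoint.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "1f3dee88a1f7937337015abb3a085ed697aca02282d04d6ee24436c3c0418f41",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-12-06T15:28:35Z",
        "title_canon_sha256": "72b7a0aead423de5ff3232f497621bb6590a182c6b94f97156784ac26df32b67",
    },
    "schema_version": "1.0",
    "source": {"id": "2512.06471", "kind": "arxiv", "version": 2},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
```

Compare the printed digest with the value under "Canonical hash". Because keys are sorted before hashing, reordering the fields of the record does not change the result.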