{"paper":{"title":"Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"Goal-conditioned reinforcement learning succeeds because its reward represents the probability of reaching target states, yielding a smaller optimality gap than classical quadratic objectives and suiting it to dual control.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Ali Mesbah, Nathan P. Lawrence","submitted_at":"2025-12-06T15:28:35Z","abstract_excerpt":"Goal-conditioned reinforcement learning (RL) concerns the problem of training an agent to maximize the probability of reaching target goal states. This paper presents an analysis of the goal-conditioned setting based on optimal control. In particular, we derive an optimality gap between more classical, often quadratic, objectives and the goal-conditioned reward, elucidating the success of goal-conditioned RL and why classical ``dense'' rewards can falter. We then consider the partially observed Markov decision setting and connect state estimation to our probabilistic reward, making the goal-co"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"we derive an optimality gap between more classical, often quadratic, objectives and the goal-conditioned reward, elucidating the success of goal-conditioned RL and why classical ``dense'' rewards can falter. 
We then consider the partially observed Markov decision setting and connect state estimation to our probabilistic reward, making the goal-conditioned reward well suited to dual control problems.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The analysis assumes that the goal-conditioned reward can be interpreted directly as a probability of reaching target states and that this interpretation transfers without additional unstated restrictions on the system dynamics or observation model when moving to the POMDP and dual-control setting.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Goal-conditioned RL succeeds over dense rewards because its probabilistic goal-reaching objective aligns naturally with dual control requirements in uncertain, partially observed systems.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Goal-conditioned reinforcement learning succeeds because its reward represents the probability of reaching target states, yielding a smaller optimality gap than classical quadratic objectives and suiting it to dual control.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"70b40642b31f1f375b2c1ec6a3afe54633b812ce331e44fdf92e6ca0bc55fda0"},"source":{"id":"2512.06471","kind":"arxiv","version":2},"verdict":{"id":"9a7c26e3-749b-46b8-9daf-378c9ec8c9e1","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T00:34:21.113237Z","strongest_claim":"we derive an optimality gap between more classical, often quadratic, objectives and the goal-conditioned reward, elucidating the success of goal-conditioned RL and why classical ``dense'' rewards can falter. 
We then consider the partially observed Markov decision setting and connect state estimation to our probabilistic reward, making the goal-conditioned reward well suited to dual control problems.","one_line_summary":"Goal-conditioned RL succeeds over dense rewards because its probabilistic goal-reaching objective aligns naturally with dual control requirements in uncertain, partially observed systems.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The analysis assumes that the goal-conditioned reward can be interpreted directly as a probability of reaching target states and that this interpretation transfers without additional unstated restrictions on the system dynamics or observation model when moving to the POMDP and dual-control setting.","pith_extraction_headline":"Goal-conditioned reinforcement learning succeeds because its reward represents the probability of reaching target states, yielding a smaller optimality gap than classical quadratic objectives and suiting it to dual control."},"references":{"count":3,"sample":[{"doi":"","year":1974,"title":"Bar-Shalom, Y. and Tse, E. (1974). Dual effect, certainty equivalence, and separation in stochastic control. IEEE Transactions on Automatic Control, 19(5), 494–500. Bayard, D.S. and Schumitzky, A. (201","work_id":"529407f9-f5d8-4324-9e4f-d017932e30cb","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2003,"title":"Athena scientific. Chen, Z. (2003). Bayesian filtering: From Kalman filters to particle filters, and beyond. Statistics, 182(1), 1–69. Drgoňa, J., Kiš, K., Tuor, A., Vrabie, D., and Klaučo, M. 
(2","work_id":"1f850bfa-57e8-427f-9deb-f092399d598c","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Vyas, N., Morwani, D., Zhao, R., Kwun, M., Shapira, I., Brandfonbrener, D., Janson, L., and Kakade, S","work_id":"cfa4927a-bb15-4530-8f2f-c3a94293a3fe","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":3,"snapshot_sha256":"29292371b13746bf47fd245a783774c7bc96dd3a1df84070cfe5a67c3f85d8f4","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"3f541339ad2e74fb54896cde805d2176d702883a507df23eae94608fe236db54"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}