Functional Graphs for Predicting and Explaining Goal Failure in Sparse Goal-Conditioned RL

Shalley Dash

arxiv: 2605.09335 · v1 · submitted 2026-05-10 · 💻 cs.LG

Functional Graphs for Predicting and Explaining Goal Failure in Sparse Goal-Conditioned RL

Shalley Dash This is my paper

Pith reviewed 2026-05-12 04:28 UTC · model grok-4.3

classification 💻 cs.LG

keywords goal-conditioned reinforcement learningsparse rewardsfunctional graphsgoal failure diagnosticslocal goal supportattractors and basinsgridworld environments

0 comments

The pith

A one-step local support measure on greedy policy graphs predicts most goal failures in sparse goal-conditioned RL.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that goal-conditioned policies trained with sparse rewards can be turned into deterministic functional graphs by always taking the greedy action from each state. Inside these graphs, local goal support counts the fraction of a state's valid neighbors whose successor is the goal itself. This cheap statistic, when below 0.5, flags low-success goals with precision 0.921, recall 0.929, and F1 0.925 in the main 8x8 setting. When local support is present yet the goal still fails, the paper classifies the full graph into four types that reveal whether distant states are pulled into competing attractors or trapped in fragmented basins. The result is a post-training diagnostic that explains why aggregate success rates conceal per-goal problems.

Core claim

Trained goal-conditioned value policies induce deterministic functional graphs under greedy evaluation that decompose behavior into attractors and basins. Local goal support, defined as the one-step fraction of valid neighbors whose greedy successor is the goal, is exactly zero for unreachable goals and, when at or below 0.5, identifies low-success goals with precision 0.921, recall 0.929, and F1 0.925 in 8x8 TD learning. A compact taxonomy of goal-dominant, competitor-dominated, partial/contested, and fragmented graphs accounts for the residual failures where local support exists but global success does not.

What carries the argument

The deterministic functional graph induced by greedy policy evaluation, which maps every state to a single successor, together with the local goal support statistic that counts how many neighboring states point directly to the goal.

If this is right

Low-LGS goals can be identified after training without running many evaluation episodes per goal.
Residual failures can be attributed to specific global structures such as competitor attractors or basin fragmentation.
The same threshold and taxonomy apply across TD and other update rules, different curricula, larger grids, and bottleneck geometries.
Aggregate success rates can be decomposed into per-goal contributions using the local and global graph properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Training procedures could be modified to increase local goal support directly, for example by adding an auxiliary loss that encourages neighbors to map to the intended goal.
The graph taxonomy might be used to design automatic relabeling or goal-filtering steps inside curricula that drop or repair competitor-dominated or fragmented goals.
Similar successor-graph analysis could be applied to hierarchical or skill-based RL to detect when low-level skills fail to compose into higher-level goals.

Load-bearing premise

The environments are deterministic and sparse so that greedy evaluation produces clean functional graphs without stochastic transitions or continuous-state noise.

What would settle it

Applying the identical LGS <= 0.5 rule in a version of the gridworld where each action has a nonzero probability of moving in a random direction and checking whether the precision, recall, and F1 scores remain above 0.9.

Figures

Figures reproduced from arXiv: 2605.09335 by Shalley Dash.

**Figure 2.** Figure 2: Policy maps in the 8×8 bottleneck environment under TD learning. The red star marks the goal, orange markers indicate non-goal attractors, blue arrows show greedy policy flow, and brown arrows indicate wall-hit transitions. Text beneath each panel reports S: finite-horizon success, Sup: local goal support count, C: largest competing-basin fraction, and F: fragmentation index [PITH_FULL_IMAGE:figures/full_… view at source ↗

**Figure 3.** Figure 3: Success rates as a function of goal distance (mean [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗

**Figure 4.** Figure 4: Additional policy maps under TD learning illustrating variability within the same learning [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗

**Figure 5.** Figure 5: Policy maps under Monte Carlo (MC) learning for three representative goals (high, medium, [PITH_FULL_IMAGE:figures/full_fig_p021_5.png] view at source ↗

**Figure 6.** Figure 6: Policy maps under TD learning with edge-biased curriculum sampling. Despite preferential [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 7.** Figure 7: Policy maps for neighboring goals in a 12 × 12 environment under TD learning. The left goal exhibits a coherent basin of attraction with full local support and achieves perfect success. The middle goal shows partial success due to a competing attractor that captures a large portion of the state space. The right goal exhibits fragmented dynamics with multiple competing attractors and weak local support, lea… view at source ↗

**Figure 8.** Figure 8: Policy maps for neighboring goals under Monte Carlo (MC) learning in a [PITH_FULL_IMAGE:figures/full_fig_p023_8.png] view at source ↗

**Figure 9.** Figure 9: Additional policy maps in the 8×8 bottleneck environment under TD learning. The red star marks the goal, orange markers indicate non-goal attractors, blue arrows show greedy policy flow, and brown arrows indicate wall-hit transitions. Text below each panel reports S: finite-horizon success, Sup: local goal support count, C: largest competing-basin fraction, and F: fragmentation index. The examples show tha… view at source ↗

**Figure 10.** Figure 10: LGS stratification curves across experimental settings. Each panel plots mean finite [PITH_FULL_IMAGE:figures/full_fig_p027_10.png] view at source ↗

read the original abstract

Sparse goal-conditioned reinforcement learning can produce policies whose failures are hidden by aggregate success rates. We analyze trained goal-conditioned value policies through the deterministic functional graphs induced by greedy evaluation: for each goal, every state maps to a single successor, decomposing behavior into attractors and basins. This reveals a local-to-global structure in learned policies. We define local goal support (LGS), a one-step statistic measuring the fraction of valid neighboring states whose greedy successor is the goal. In deterministic sparse GridWorlds, zero LGS exactly precludes goal entry from non-goal starts. Empirically, weak LGS is a strong diagnostic of goal-level failure across update rules, curricula, larger grids, and bottleneck geometries: the fixed rule LGS <= 0.5 identifies low-success goals with precision 0.921, recall 0.929, and F1 0.925 in the main 8x8 TD setting, with similar performance across variants. However, local support is not sufficient for global success: some supported goals still fail because distant states are captured by competing attractors or fragmented basin structure. We therefore introduce a compact post-hoc taxonomy of policy-induced graphs -- goal-dominant, competitor-dominated, partial/contested, and fragmented -- to characterize residual failure modes beyond local support. These results show that sparse GCRL failures can be understood as structured policy-induced dynamics, and that local one-step policy structure provides a cheap post-training diagnostic for goal-level failure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LGS gives a cheap, high-accuracy post-training flag for failing goals in deterministic sparse GCRL by reading the greedy policy graph, but the method is tied to that exact setting.

read the letter

The paper shows that you can build the deterministic functional graph of a trained goal-conditioned policy and read off a simple one-step statistic, local goal support, that predicts which goals the policy will actually reach. In their 8x8 TD experiments the rule LGS <= 0.5 catches low-success goals at 0.921 precision and 0.929 recall. They also supply a four-category taxonomy of the resulting graphs to separate cases where local support is present but global structure still blocks success. Both the metric and the taxonomy are new in this form and not just rephrasings of earlier graph or value-function analyses. The empirical numbers hold across several update rules, curricula, and grid sizes, which is the strongest part of the work. The construction is cheap once you have the policy, and the zero-LGS impossibility result follows directly from the deterministic assumption. That gives practitioners a concrete debugging handle they did not have before. The central limitation is exactly what the stress test flags: everything is defined for deterministic discrete dynamics where every state has a single greedy successor. Add action noise, transition noise, or continuous states and both the graph and the LGS definition stop being well-posed. The paper states this scope clearly and does not claim broader validity, but it also supplies no experiments outside deterministic grids, so the practical reach is narrower than the title might suggest. The taxonomy is still qualitative and requires inspecting the full graph, which limits how automatic the diagnosis can be. This is useful reading for anyone training sparse goal-conditioned agents on discrete deterministic domains who wants to understand per-goal failures after training. It is not a broad theoretical result and will not change how most people train policies, but the diagnostic is reproducible and the experiments are straightforward. The paper deserves a serious referee because the core claim is falsifiable, the numbers are reported cleanly, and the limitation is stated rather than hidden.

Referee Report

2 major / 2 minor

Summary. The paper analyzes trained goal-conditioned policies in sparse RL by constructing deterministic functional graphs from greedy action selection. It introduces Local Goal Support (LGS) as a one-step local statistic and shows that a threshold on LGS can predict goal-level success or failure with high precision and recall in deterministic GridWorld environments. Additionally, it proposes a taxonomy of policy graph structures to explain cases where local support does not guarantee global success.

Significance. If the empirical findings hold, the approach provides an efficient, post-training diagnostic tool for identifying problematic goals in goal-conditioned RL without requiring further environment interactions. This could be valuable for practitioners working with sparse rewards, as it turns policy inspection into a graph analysis problem and offers interpretable categories for failure modes. The cross-variant consistency of the LGS diagnostic is a notable strength.

major comments (2)

[Abstract] Abstract: The fixed rule LGS <= 0.5 is reported to achieve precision 0.921, recall 0.929, F1 0.925 in the 8x8 TD setting. However, this performance is tied to the deterministic setting where each state has exactly one greedy successor; the manuscript should include a discussion on how the diagnostic would need to be adapted for stochastic environments, as the current definition of LGS and the graph taxonomy would not apply directly.
[Abstract] Abstract: The statement that zero LGS exactly precludes goal entry from non-goal starts is presented as a key property supporting the diagnostic. A formal proof or derivation of this impossibility result should be provided in the main text (with explicit assumptions) to allow verification of its scope and to confirm it is not circular with the graph construction.

minor comments (2)

The abstract mentions similar performance across variants but does not specify the exact variants or provide quantitative comparison; a table summarizing F1 scores for all settings would improve clarity.
Ensure consistent use of terminology for the graph taxonomy (e.g., goal-dominant, competitor-dominated, partial/contested, fragmented) and provide at least one illustrative example per category in the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and the recommendation for minor revision. The comments highlight important points on scope and rigor that we will address directly in the revised manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The fixed rule LGS <= 0.5 is reported to achieve precision 0.921, recall 0.929, F1 0.925 in the 8x8 TD setting. However, this performance is tied to the deterministic setting where each state has exactly one greedy successor; the manuscript should include a discussion on how the diagnostic would need to be adapted for stochastic environments, as the current definition of LGS and the graph taxonomy would not apply directly.

Authors: We agree that the LGS diagnostic and the functional-graph taxonomy are defined for deterministic greedy policies. We will add a new paragraph in the Discussion section that explicitly states this scope limitation and sketches two directions for stochastic extension: (i) replacing the deterministic successor with an expected-support measure over the policy distribution, and (ii) constructing sampled functional graphs via Monte-Carlo rollouts. We will also note that the exact zero-LGS impossibility result does not carry over without additional assumptions on the support of the stochastic policy. revision: yes
Referee: [Abstract] Abstract: The statement that zero LGS exactly precludes goal entry from non-goal starts is presented as a key property supporting the diagnostic. A formal proof or derivation of this impossibility result should be provided in the main text (with explicit assumptions) to allow verification of its scope and to confirm it is not circular with the graph construction.

Authors: We accept the request for a formal derivation. The claim follows immediately from the definition of LGS and the construction of the deterministic functional graph: if LGS(g) = 0 then no non-goal state s has g as its unique greedy successor, so the goal set has in-degree zero from outside itself. We will insert a short, self-contained lemma (with the explicit assumptions of determinism and greedy action selection) immediately after the definition of LGS in Section 3, making the argument non-circular. revision: yes

Circularity Check

0 steps flagged

No circularity: LGS is an independent one-step statistic tested against rollout success

full rationale

The paper defines local goal support (LGS) as the fraction of valid neighboring states whose greedy successor equals the goal, directly from the deterministic functional graph. It proves that LGS=0 precludes entry in deterministic sparse GridWorlds and reports empirical F1=0.925 for the fixed threshold LGS<=0.5 against independently measured goal success rates obtained via policy rollouts. Success rates are not used to fit any parameter or define LGS; the taxonomy of graph structures is a post-hoc descriptive classification. No self-citations, fitted inputs, or self-definitional reductions appear in the derivation. The central claim is an empirical correlation between a local graph property and global outcome, which does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claims rest on deterministic dynamics and greedy evaluation in sparse grids; LGS and taxonomy are introduced without external validation.

axioms (1)

domain assumption Environments are deterministic sparse GridWorlds allowing exact functional graph construction from greedy policy.
Stated as the setting for all empirical results and the zero-LGS preclusion claim.

invented entities (2)

Local Goal Support (LGS) no independent evidence
purpose: One-step statistic measuring direct goal entry from neighbors
Newly defined in the paper as the core diagnostic.
Policy-induced graph taxonomy (goal-dominant, competitor-dominated, partial/contested, fragmented) no independent evidence
purpose: Classify residual failure modes beyond local support
Post-hoc categories introduced to explain cases where LGS is high but success is low.

pith-pipeline@v0.9.0 · 5557 in / 1355 out tokens · 37065 ms · 2026-05-12T04:28:42.765349+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We analyze trained goal-conditioned value policies through the deterministic functional graphs induced by greedy evaluation: for each goal, every state maps to a single successor, decomposing behavior into attractors and basins. ... We define local goal support (LGS), a one-step statistic measuring the fraction of valid neighboring states whose greedy successor is the goal.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Proposition 2 (Zero local support precludes goal entry). ... If fg(s) ≠ g ∀ s ∈ N(g), then no trajectory starting from any s0 ≠ g can reach g.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1]

Hindsight experience replay

Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

work page 2017
[2]

TLDR: Unsupervised goal-conditioned RL via temporal distance-aware representations

Junik Bae, Kwanyoung Park, and Youngwoon Lee. TLDR: Unsupervised goal-conditioned RL via temporal distance-aware representations. InConference on Robot Learning, 2024

work page 2024
[3]

Zico Kolter, and Vladlen Koltun

Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. Deep equilibrium models. InAdvances in Neural Information Processing Systems, 2019

work page 2019
[4]

Hunt, Tom Schaul, Hado van Hasselt, and David Silver

Andre Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, and David Silver. Successor features for transfer in reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

work page 2017
[5]

Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. InAdvances in Neural Information Processing Systems, 2018

work page 2018
[6]

Improving generalization for temporal difference learning: The successor repre- sentation.Neural Computation, 5(4):613–624, 1993

Peter Dayan. Improving generalization for temporal difference learning: The successor repre- sentation.Neural Computation, 5(4):613–624, 1993

work page 1993
[7]

Elaydi.An Introduction to Difference Equations

Saber N. Elaydi.An Introduction to Difference Equations. Springer, 3 edition, 2005

work page 2005
[8]

Contrastive learning as goal-conditioned reinforcement learning

Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, and Sergey Levine. Contrastive learning as goal-conditioned reinforcement learning. InAdvances in Neural Information Processing Systems, volume 35, 2022

work page 2022
[9]

Graph neural networks and reinforcement learning: A survey

Fatemeh Fathinezhad, Peyman Adibi, Babak Shoushtarian, and Jocelyn Chanussot. Graph neural networks and reinforcement learning: A survey. InGraph Neural Networks. IntechOpen, 2023

work page 2023
[10]

Cambridge University Press, Cambridge, 2009.doi:10.1017/CBO9780511801655

Philippe Flajolet and Robert Sedgewick.Analytic Combinatorics. Cambridge University Press, Cambridge, 2009. doi: 10.1017/CBO9780511801655

work page doi:10.1017/cbo9780511801655 2009
[11]

Automatic goal generation for reinforcement learning agents

Carlos Florensa, David Held, Xinyang Geng, and Pieter Abbeel. Automatic goal generation for reinforcement learning agents. InProceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 1515–1528. PMLR, 2018

work page 2018
[12]

Learning to reach goals via iterated supervised learning

Dibya Ghosh, Abhishek Gupta, Ashwin Reddy, Justin Fu, Coline Devin, Benjamin Eysenbach, and Sergey Levine. Learning to reach goals via iterated supervised learning. InInternational Conference on Learning Representations, 2021

work page 2021
[13]

Vittorio Giammarino, Ruiqi Ni, and Ahmed H. Qureshi. Physics-informed value learner for offline goal-conditioned reinforcement learning. InAdvances in Neural Information Processing Systems, 2025

work page 2025
[14]

Addison-Wesley, 1969

Frank Harary.Graph Theory. Addison-Wesley, 1969

work page 1969
[15]

Offline goal- conditioned reinforcement learning with quasimetric representations

Vivek Myers, Bill Chunyuan Zheng, Benjamin Eysenbach, and Sergey Levine. Offline goal- conditioned reinforcement learning with quasimetric representations. InAdvances in Neural Information Processing Systems, 2025

work page 2025
[16]

Au- tomatic curriculum learning for deep rl: A short survey

Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, and Pierre-Yves Oudeyer. Au- tomatic curriculum learning for deep rl: A short survey. InProceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), pages 4819–4825, 2020. doi: 10.24963/ijcai.2020/671

work page doi:10.24963/ijcai.2020/671 2020
[17]

Universal value function approxima- tors

Tom Schaul, Dan Horgan, Karol Gregor, and David Silver. Universal value function approxima- tors. InInternational Conference on Machine Learning (ICML), 2015

work page 2015
[18]

NerveNet: Learning structured policy with graph neural networks

Tingwu Wang, Renjie Liao, Jimmy Ba, and Sanja Fidler. NerveNet: Learning structured policy with graph neural networks. InInternational Conference on Learning Representations, 2018. 10

work page 2018
[19]

Louis-Pascal A. C. Xhonneux, Meng Qu, and Jian Tang. Continuous graph neural networks. InProceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 10432–10441, 2020

work page 2020
[20]

Rethinking goal-conditioned supervised learning and its connection to offline RL

Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, and Chongjie Zhang. Rethinking goal-conditioned supervised learning and its connection to offline RL. In International Conference on Learning Representations, 2022

work page 2022
[21]

Reichert, Timothy P

Vinícius Flores Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David P. Reichert, Timothy P. Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, and Pe- ter W. Battaglia. Deep reinforcement learning with relational inductive biases. InInternational Co...

work page 2019
[22]

IfLGS(g) = 0, thenSucc H (g) = 0

work page
[23]

IfLGS(g)>0, thenSucc H (g)>0forH≥1, but this does not imply high success

work page
[24]

16 Proof.The first statement follows from Proposition 2: any trajectory reaching g from a non-goal start must enter g from some valid neighbor s∈N(g)

EvenLGS(g) = 1does not guaranteeSucc H (g) = 1. 16 Proof.The first statement follows from Proposition 2: any trajectory reaching g from a non-goal start must enter g from some valid neighbor s∈N(g) . If no neighbor maps to g, such an entry transition is impossible. For the second statement, if LGS(g)>0 , then there exists at least one neighbor s∈N(g) such...

work page

[1] [1]

Hindsight experience replay

Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

work page 2017

[2] [2]

TLDR: Unsupervised goal-conditioned RL via temporal distance-aware representations

Junik Bae, Kwanyoung Park, and Youngwoon Lee. TLDR: Unsupervised goal-conditioned RL via temporal distance-aware representations. InConference on Robot Learning, 2024

work page 2024

[3] [3]

Zico Kolter, and Vladlen Koltun

Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. Deep equilibrium models. InAdvances in Neural Information Processing Systems, 2019

work page 2019

[4] [4]

Hunt, Tom Schaul, Hado van Hasselt, and David Silver

Andre Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, and David Silver. Successor features for transfer in reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

work page 2017

[5] [5]

Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. InAdvances in Neural Information Processing Systems, 2018

work page 2018

[6] [6]

Improving generalization for temporal difference learning: The successor repre- sentation.Neural Computation, 5(4):613–624, 1993

Peter Dayan. Improving generalization for temporal difference learning: The successor repre- sentation.Neural Computation, 5(4):613–624, 1993

work page 1993

[7] [7]

Elaydi.An Introduction to Difference Equations

Saber N. Elaydi.An Introduction to Difference Equations. Springer, 3 edition, 2005

work page 2005

[8] [8]

Contrastive learning as goal-conditioned reinforcement learning

Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, and Sergey Levine. Contrastive learning as goal-conditioned reinforcement learning. InAdvances in Neural Information Processing Systems, volume 35, 2022

work page 2022

[9] [9]

Graph neural networks and reinforcement learning: A survey

Fatemeh Fathinezhad, Peyman Adibi, Babak Shoushtarian, and Jocelyn Chanussot. Graph neural networks and reinforcement learning: A survey. InGraph Neural Networks. IntechOpen, 2023

work page 2023

[10] [10]

Cambridge University Press, Cambridge, 2009.doi:10.1017/CBO9780511801655

Philippe Flajolet and Robert Sedgewick.Analytic Combinatorics. Cambridge University Press, Cambridge, 2009. doi: 10.1017/CBO9780511801655

work page doi:10.1017/cbo9780511801655 2009

[11] [11]

Automatic goal generation for reinforcement learning agents

Carlos Florensa, David Held, Xinyang Geng, and Pieter Abbeel. Automatic goal generation for reinforcement learning agents. InProceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 1515–1528. PMLR, 2018

work page 2018

[12] [12]

Learning to reach goals via iterated supervised learning

Dibya Ghosh, Abhishek Gupta, Ashwin Reddy, Justin Fu, Coline Devin, Benjamin Eysenbach, and Sergey Levine. Learning to reach goals via iterated supervised learning. InInternational Conference on Learning Representations, 2021

work page 2021

[13] [13]

Vittorio Giammarino, Ruiqi Ni, and Ahmed H. Qureshi. Physics-informed value learner for offline goal-conditioned reinforcement learning. InAdvances in Neural Information Processing Systems, 2025

work page 2025

[14] [14]

Addison-Wesley, 1969

Frank Harary.Graph Theory. Addison-Wesley, 1969

work page 1969

[15] [15]

Offline goal- conditioned reinforcement learning with quasimetric representations

Vivek Myers, Bill Chunyuan Zheng, Benjamin Eysenbach, and Sergey Levine. Offline goal- conditioned reinforcement learning with quasimetric representations. InAdvances in Neural Information Processing Systems, 2025

work page 2025

[16] [16]

Au- tomatic curriculum learning for deep rl: A short survey

Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, and Pierre-Yves Oudeyer. Au- tomatic curriculum learning for deep rl: A short survey. InProceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), pages 4819–4825, 2020. doi: 10.24963/ijcai.2020/671

work page doi:10.24963/ijcai.2020/671 2020

[17] [17]

Universal value function approxima- tors

Tom Schaul, Dan Horgan, Karol Gregor, and David Silver. Universal value function approxima- tors. InInternational Conference on Machine Learning (ICML), 2015

work page 2015

[18] [18]

NerveNet: Learning structured policy with graph neural networks

Tingwu Wang, Renjie Liao, Jimmy Ba, and Sanja Fidler. NerveNet: Learning structured policy with graph neural networks. InInternational Conference on Learning Representations, 2018. 10

work page 2018

[19] [19]

Louis-Pascal A. C. Xhonneux, Meng Qu, and Jian Tang. Continuous graph neural networks. InProceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 10432–10441, 2020

work page 2020

[20] [20]

Rethinking goal-conditioned supervised learning and its connection to offline RL

Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, and Chongjie Zhang. Rethinking goal-conditioned supervised learning and its connection to offline RL. In International Conference on Learning Representations, 2022

work page 2022

[21] [21]

Reichert, Timothy P

Vinícius Flores Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David P. Reichert, Timothy P. Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, and Pe- ter W. Battaglia. Deep reinforcement learning with relational inductive biases. InInternational Co...

work page 2019

[22] [22]

IfLGS(g) = 0, thenSucc H (g) = 0

work page

[23] [23]

IfLGS(g)>0, thenSucc H (g)>0forH≥1, but this does not imply high success

work page

[24] [24]

16 Proof.The first statement follows from Proposition 2: any trajectory reaching g from a non-goal start must enter g from some valid neighbor s∈N(g)

EvenLGS(g) = 1does not guaranteeSucc H (g) = 1. 16 Proof.The first statement follows from Proposition 2: any trajectory reaching g from a non-goal start must enter g from some valid neighbor s∈N(g) . If no neighbor maps to g, such an entry transition is impossible. For the second statement, if LGS(g)>0 , then there exists at least one neighbor s∈N(g) such...

work page