arXiv preprint arXiv:2101.07123

Learning successor states, goal-dependent values: A mathematical viewpoint · 2021 · arXiv 2101.07123

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.

Understanding Human Actions through the Lens of Executable Models

cs.AI · 2026-04-20 · unverdicted · novelty 7.0

EXACT is a new DSL for human motions as executable reward-generating programs, enabling compositional neuro-symbolic models that improve data efficiency and capture intuitive action relationships over monolithic approaches.

SVL: Goal-Conditioned Reinforcement Learning as Survival Learning

cs.LG · 2026-04-19 · unverdicted · novelty 7.0

Survival value learning expresses the goal-conditioned value function as a discounted sum of survival probabilities and estimates it with maximum-likelihood hazard models on censored data, matching or exceeding TD baselines on long-horizon offline GCRL tasks.

Offline Reinforcement Learning with Universal Horizon Models

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Universal horizon models extend geometric horizon models to arbitrary horizons and apply winsorized distributions for stable offline RL value learning, outperforming baselines on 100 OGBench tasks.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

cs.LG · 2025-06-11 · unverdicted · novelty 5.0

BYOL-γ uses self-predictive representations to approximate successor representations, improving zero-shot combinatorial generalization in goal-conditioned behavioral cloning.

Intention-Conditioned Flow Occupancy Models

cs.LG · 2025-06-10 · unverdicted · novelty 5.0

InFOM applies flow matching to model intention-conditioned occupancy measures for RL pre-training, reporting 1.8x median return gains and 36% higher success rates on benchmarks.

Spectral Alignment in Forward-Backward Representations via Temporal Abstraction

cs.LG · 2026-03-20 · unverdicted · novelty 4.0

Temporal abstraction functions as a low-pass filter on transition dynamics to lower the effective rank of successor representations while bounding value function error in forward-backward learning.

citing papers explorer

Showing 9 of 9 citing papers.

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning · cs.LG · 2026-05-13 · unverdicted · none · ref 10
Understanding Human Actions through the Lens of Executable Models · cs.AI · 2026-04-20 · unverdicted · none · ref 2
SVL: Goal-Conditioned Reinforcement Learning as Survival Learning · cs.LG · 2026-04-19 · unverdicted · none · ref 1
Offline Reinforcement Learning with Universal Horizon Models · cs.LG · 2026-05-15 · unverdicted · none · ref 13
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL · cs.LG · 2026-05-03 · unverdicted · none · ref 10
When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited · cs.LG · 2026-05-16 · unverdicted · none · ref 11
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning · cs.LG · 2025-06-11 · unverdicted · none · ref 2
Intention-Conditioned Flow Occupancy Models · cs.LG · 2025-06-10 · unverdicted · none · ref 11
Spectral Alignment in Forward-Backward Representations via Temporal Abstraction · cs.LG · 2026-03-20 · unverdicted · none · ref 3
