Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.
arXiv preprint arXiv:2101.07123 , year=
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 9roles
background 1polarities
background 1representative citing papers
EXACT is a new DSL for human motions as executable reward-generating programs, enabling compositional neuro-symbolic models that improve data efficiency and capture intuitive action relationships over monolithic approaches.
Survival value learning expresses the goal-conditioned value function as a discounted sum of survival probabilities and estimates it with maximum-likelihood hazard models on censored data, matching or exceeding TD baselines on long-horizon offline GCRL tasks.
Universal horizon models extend geometric horizon models to arbitrary horizons and apply winsorized distributions for stable offline RL value learning, outperforming baselines on 100 OGBench tasks.
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.
BYOL-γ uses self-predictive representations to approximate successor representations, improving zero-shot combinatorial generalization in goal-conditioned behavioral cloning.
InFOM applies flow matching to model intention-conditioned occupancy measures for RL pre-training, reporting 1.8x median return gains and 36% higher success rates on benchmarks.
Temporal abstraction functions as a low-pass filter on transition dynamics to lower the effective rank of successor representations while bounding value function error in forward-backward learning.
citing papers explorer
-
Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning
Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.
-
Understanding Human Actions through the Lens of Executable Models
EXACT is a new DSL for human motions as executable reward-generating programs, enabling compositional neuro-symbolic models that improve data efficiency and capture intuitive action relationships over monolithic approaches.
-
SVL: Goal-Conditioned Reinforcement Learning as Survival Learning
Survival value learning expresses the goal-conditioned value function as a discounted sum of survival probabilities and estimates it with maximum-likelihood hazard models on censored data, matching or exceeding TD baselines on long-horizon offline GCRL tasks.
-
Offline Reinforcement Learning with Universal Horizon Models
Universal horizon models extend geometric horizon models to arbitrary horizons and apply winsorized distributions for stable offline RL value learning, outperforming baselines on 100 OGBench tasks.
-
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
-
When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited
Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.
-
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
BYOL-γ uses self-predictive representations to approximate successor representations, improving zero-shot combinatorial generalization in goal-conditioned behavioral cloning.
-
Intention-Conditioned Flow Occupancy Models
InFOM applies flow matching to model intention-conditioned occupancy measures for RL pre-training, reporting 1.8x median return gains and 36% higher success rates on benchmarks.
-
Spectral Alignment in Forward-Backward Representations via Temporal Abstraction
Temporal abstraction functions as a low-pass filter on transition dynamics to lower the effective rank of successor representations while bounding value function error in forward-backward learning.