Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL through the FB π-Switch algorithm, which extracts subgoal-selection and control policies from forward-backward representations.
arXiv preprint arXiv:2101.07123
11 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Representative citing papers
EXACT is a new DSL that represents human motions as executable reward-generating programs, enabling compositional neuro-symbolic models that are more data-efficient and capture intuitive action relationships better than monolithic approaches.
Frames online zero-shot transfer with BFMs as a bandit problem and derives an eigenvalue-minimization exploration strategy under linear reward approximation.
Text2BFM aligns language with a frozen BFM via a text-aligned variational behavioral bottleneck to generate long motions by decoding latents into policy actions.
Universal horizon models extend geometric horizon models to arbitrary horizons and apply winsorized distributions for stable offline RL value learning, outperforming baselines on 100 OGBench tasks.
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.
BYOL-γ uses self-predictive representations to approximate successor representations, improving zero-shot combinatorial generalization in goal-conditioned behavioral cloning.
InFOM applies flow matching to model intention-conditioned occupancy measures for RL pre-training, reporting 1.8x median return gains and 36% higher success rates on benchmarks.
Temporal abstraction functions as a low-pass filter on transition dynamics to lower the effective rank of successor representations while bounding value function error in forward-backward learning.