DIBS decouples task policy learning via RL from evolution function learning via behavioral cloning to achieve more stable training and better generalization than prior RL and meta-RL methods for inductive generalization from specifications.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Iterative refinement of unknown MDP parameters allows repeated satisfaction of PAC conditions, yielding asymptotic optimality for reachability specifications in RL.
citing papers explorer
- Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications
  DIBS decouples task policy learning via RL from evolution function learning via behavioral cloning to achieve more stable training and better generalization than prior RL and meta-RL methods for inductive generalization from specifications.
- Reinforcement Learning for Reachability: Guaranteeing Asymptotic Optimality
  Iterative refinement of unknown MDP parameters allows repeated satisfaction of PAC conditions, yielding asymptotic optimality for reachability specifications in RL.