Extends SPIBB with soft uncertainty-constrained policy search for less conservative safe policy improvement in batch RL, with optimal and approximate solvers shown empirically on finite and neural MDPs.
Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes
C Helicopter experiment details
C.1 Details about the helicopter environment
See Laroche et al. (2019, Appendix D.1).
Safe Policy Improvement with Soft Baseline Bootstrapping