Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes. C Helicopter experiment details. C.1 Details about the helicopter environment: see (Laroche et al., 2019, Appendix D.1).

Random MDPs: hyper-parameter mean, 1%-CVaR performance heatmaps for RaMDP methods under a weak (η = 0 · 2019

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Safe Policy Improvement with Soft Baseline Bootstrapping

cs.LG · 2019-07-11 · unverdicted · novelty 6.0

Extends SPIBB with soft uncertainty-constrained policy search for less conservative safe policy improvement in batch RL, with optimal and approximate solvers shown empirically on finite and neural MDPs.
