Dataset generation: The MDP is modified to include another goal, a terminal state that yields a reward of 1 when reached.
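As a rough illustration of the modification described above, the sketch below adds a terminal goal state with reward 1 to a small tabular MDP. The dictionary representation, the function names `add_terminal_goal` and `is_terminal`, the toy two-state MDP, and the choice of which state-action pair leads to the new goal are all assumptions made for this example. The excerpt does not say how the goal is connected to the rest of the MDP.

```python
def add_terminal_goal(transitions, entry_state, entry_action,
                      goal="goal", goal_reward=1.0):
    """Return a copy of `transitions` with an extra terminal goal state.

    transitions: dict mapping (state, action) to a list of
    (probability, next_state, reward) tuples.
    A state with no outgoing entries is treated as terminal.
    """
    new = {key: list(outcomes) for key, outcomes in transitions.items()}
    # Assumption: one chosen state-action pair is rerouted so that it
    # reaches the new goal with probability 1 and collects the reward.
    new[(entry_state, entry_action)] = [(1.0, goal, goal_reward)]
    return new


def is_terminal(transitions, state):
    """A state is terminal if no (state, action) key starts from it."""
    return all(s != state for (s, _action) in transitions)


# Hypothetical two-state MDP used only to exercise the helper.
mdp = {
    (0, "left"): [(1.0, 0, 0.0)],
    (0, "right"): [(0.9, 1, 0.0), (0.1, 0, 0.0)],
    (1, "left"): [(1.0, 0, 0.0)],
    (1, "right"): [(1.0, 1, 0.0)],
}
modified = add_terminal_goal(mdp, entry_state=1, entry_action="right")
```

Here the new state has no outgoing transitions, so an episode ends as soon as it is reached. The original dictionary is left unchanged so that both versions of the MDP remain available.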

Baseline generation: See Laroche et al. (2019).


representative citing papers

Safe Policy Improvement with Soft Baseline Bootstrapping

cs.LG · 2019-07-11 · unverdicted · novelty 6.0

Extends SPIBB with soft uncertainty-constrained policy search for less conservative safe policy improvement in batch RL, with optimal and approximate solvers shown empirically on finite and neural MDPs.
