Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Bernard Boigelot; Damien Ernst; Quentin Louveaux; Raphael Fonteneau

arxiv: 1202.5298 · v2 · pith:UYJCRZCMnew · submitted 2012-02-23 · 💻 cs.SY · cs.LG· cs.SY

Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Raphael Fonteneau , Damien Ernst , Bernard Boigelot , Quentin Louveaux This is my paper

classification 💻 cs.SY cs.LGcs.SY

keywords relaxationproblemschemesbatchconstraintsdeterministicfirstlearning

0 comments

read the original abstract

We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].

This paper has not been read by Pith yet.

Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

discussion (0)