We start by showing that with these parameters we have Q*_{M_1}(z_i^0) − Q*_{M_0}(z_i^0) > 2ε, which we do by casing on the sign of β. Case 1: β < 0

Fix any (ε, δ)-correct Q-algorithm U.

representative citing papers

Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model

cs.LG · 2025-05-30 · unverdicted · novelty 7.0

Derives PAC-type upper bounds and matching lower bounds on sample complexity for value and policy learning under recursive entropic risk measures, with exponential dependence on |β|/(1-γ).
