Derives PAC-type upper bounds and matching lower bounds on sample complexity for value and policy learning under recursive entropic risk measures, with exponential dependence on |β|/(1-γ).
We start by showing that, with these parameters, $Q^*_{M_1}(z^0_i) - Q^*_{M_0}(z^0_i) > 2\varepsilon$, which we do by casing on the sign of $\beta$. Case 1: $\beta < 0$.
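To make "recursive entropic risk" concrete, here is a minimal sketch of value iteration with a known tabular model. It assumes the common convention $\mathrm{ERM}_\beta(X) = \frac{1}{\beta}\log \mathbb{E}[e^{\beta X}]$ (with $\beta < 0$ risk-averse and $\beta \to 0$ recovering the expectation), applied to the next-state value at every step: $Q(s,a) = r(s,a) + \gamma\,\mathrm{ERM}_\beta[V(s')]$ with $s' \sim P(\cdot \mid s,a)$. The paper's exact conventions (for example, where $\gamma$ enters or the sign convention for $\beta$) may differ. The function names are hypothetical, and this is not the paper's generative-model algorithm. Because $\mathrm{ERM}_\beta$ is monotone and translation-invariant, the update is a $\gamma$-contraction in the sup norm, so the iteration converges.

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete random variable.

    Assumed convention: beta < 0 is risk-averse, beta > 0 risk-seeking,
    and beta == 0 is treated as the plain expectation.
    """
    if beta == 0:
        return sum(p * v for v, p in zip(values, probs))
    # Log-sum-exp shift over the support, so large |beta * v| does not overflow exp().
    m = max(beta * v for v, p in zip(values, probs) if p > 0)
    s = sum(p * math.exp(beta * v - m) for v, p in zip(values, probs) if p > 0)
    return (m + math.log(s)) / beta

def entropic_value_iteration(P, R, gamma, beta, iters=10_000, tol=1e-10):
    """Hypothetical helper: iterate Q(s,a) = R[s][a] + gamma * ERM_beta(V(s')).

    P[s][a] is a list of next-state probabilities; R[s][a] is a scalar reward.
    Returns (Q, V) with V(s) = max_a Q(s, a).
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        Q = [[R[s][a] + gamma * entropic_risk(V, P[s][a], beta)
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(row) for row in Q]
        gap = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if gap < tol:  # sup-norm change small enough: stop
            break
    return Q, V

# Usage: state 0 moves to a rewarding absorbing state 1 or a zero-reward
# absorbing state 2 with equal probability.
P = [[[0.0, 0.5, 0.5]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]
R = [[0.0], [1.0], [0.0]]
for beta in (-1.0, 0.0, 1.0):
    _, V = entropic_value_iteration(P, R, gamma=0.9, beta=beta)
    print(beta, round(V[0], 4))
```

With rewards in $[0, 1]$, values lie in $[0, 1/(1-\gamma)]$, so the exponents $\beta V$ span an interval of width up to $|\beta|/(1-\gamma)$. That is the same quantity that appears in the bounds above. In the usage example, the risk-neutral value of state 0 is $0.9 \times 5 = 4.5$. The risk-averse value falls below it and the risk-seeking value rises above it.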
Cited by 1 Pith paper (field: cs.LG; year: 2025; verdict: unverdicted):
Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model