First-Order Policy Optimization for Robust Markov Decision Process

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

years

2026: 4, 2024: 1

verdicts

UNVERDICTED 5

polarities

background 1

representative citing papers

Value Mirror Descent for Reinforcement Learning

math.OC · 2026-04-07 · unverdicted · novelty 5.0

Value mirror descent integrates mirror descent into value iteration for discounted MDPs, delivering near-optimal sample complexity of order |S||A|(1-γ)^{-3}ε^{-2} for general convex regularizers and bounded Bregman divergence between generated and optimal policies.
