Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
Value mirror descent integrates mirror descent into value iteration for discounted MDPs, delivering near-optimal sample complexity of order |S||A|(1-γ)^{-3}ε^{-2} for general convex regularizers and bounded Bregman divergence between generated and optimal policies.
citing papers explorer
- Split the Differences, Pool the Rest: Provably Efficient Multi-Objective Imitation
  MA-BC partitions divergent expert data and pools non-conflicting pairs to achieve faster convergence to Pareto-optimal policies in MOMDPs, with a matching minimax lower bound.
- Value Mirror Descent for Reinforcement Learning
  Value mirror descent integrates mirror descent into value iteration for discounted MDPs, delivering near-optimal sample complexity of order |S||A|(1-γ)^{-3}ε^{-2} for general convex regularizers and bounded Bregman divergence between generated and optimal policies.