Breaking the bias barrier in concave multi-objective reinforcement learning. arXiv preprint arXiv:2603.08518, 2026.
2 Pith papers cite this work.
Pith papers citing it: 2
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Bias-Controlled Primal-Dual Natural Actor-Critic: Optimal Rates for Constrained Multi-Objective Average-Reward RL
  An MLMC-enhanced (multilevel Monte Carlo) primal-dual natural actor-critic algorithm achieves the optimal $\tilde{O}(1/\sqrt{T})$ rates for both global convergence and constraint violation in constrained multi-objective average-reward RL, without requiring knowledge of the mixing time.
- Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning
  Replaces the scalar reward with a distribution over reward functions and applies a non-linear objective over action sets to induce controllable diversity in contextual-bandit RL, generalizing policy gradient methods.
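The first entry credits MLMC (multilevel Monte Carlo) estimation with removing the need to know the mixing time. As a rough illustration only, here is a minimal sketch of the randomized-truncation MLMC estimator often used with Markovian samples in average-reward RL: draw a level J with P(J = j) = 2^{-j}, collect 2^J consecutive samples, and return g^0 + 2^J (g^J - g^{J-1}) if 2^J ≤ T_max, else g^0, where g^j averages the first 2^j samples. This is an assumption about the general technique, not the cited paper's implementation; the names `mlmc_estimate`, `make_ar1_stream`, and `t_max`, and the toy AR(1) chain, are hypothetical.

```python
import numpy as np


def mlmc_estimate(sample_stream, t_max, rng):
    # Hypothetical helper: one MLMC estimate of a long-run average from
    # correlated Markov-chain samples. sample_stream(n) must return an array
    # of shape (n, d) holding n consecutive samples (e.g. per-step gradients).
    j = int(rng.geometric(0.5))  # P(J = j) = 2**-j for j = 1, 2, ...
    n = 2 ** j
    if n > t_max:
        # Truncation: level too deep, fall back to the one-sample estimate g^0.
        return sample_stream(1)[0]
    samples = sample_stream(n)
    g0 = samples[0]                         # g^0: first sample only
    g_j = samples.mean(axis=0)              # g^J: average of all 2^J samples
    g_jm1 = samples[: n // 2].mean(axis=0)  # g^(J-1): average of first 2^(J-1)
    # Telescoping correction: in expectation this matches the average of
    # 2^floor(log2 t_max) samples, while drawing only ~log2(t_max) on average.
    return g0 + n * (g_j - g_jm1)


def make_ar1_stream(rho, mu, rng):
    # Hypothetical toy Markov chain: x_{t+1} = mu + rho * (x_t - mu) + N(0, 1).
    # Its stationary mean is mu; consecutive samples are correlated.
    x = 0.0

    def stream(n):
        nonlocal x
        out = np.empty((n, 1))
        for t in range(n):
            x = mu + rho * (x - mu) + rng.normal()
            out[t, 0] = x
        return out

    return stream


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stream = make_ar1_stream(rho=0.5, mu=2.0, rng=rng)
    estimates = [mlmc_estimate(stream, t_max=64, rng=rng) for _ in range(5000)]
    print("mean of MLMC estimates:", np.mean(estimates))
```

The design point, under these assumptions, is that the cap `t_max` can be set from the sampling budget rather than from the (unknown) mixing time: the estimator's bias roughly matches that of a long 2^{floor(log2 T_max)}-sample average, while the expected number of samples per call grows only logarithmically in `t_max`.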