Strong duality holds for weakly communicating average-reward CMDPs, enabling a primal-dual clipped value iteration algorithm with improved regret and constraint violation bounds of order T^{2/3}.
Then we have $\lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} \mu_t(s) = d^{\pi}(s)$ for all $s \in \mathcal{S}$.
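The limit above is a Cesàro (time) average of the state distributions $\mu_t$. The sketch below is a hypothetical illustration, not the paper's construction. It assumes $\mu_t$ is the state distribution at step $t$ of a Markov chain with transition matrix `P` (for example, the chain induced by a fixed stationary policy $\pi$), and that this chain has a single recurrent class, so $d^{\pi}$ is its unique stationary distribution. The names `step`, `cesaro_average`, and `distribution_at` are illustrative.

```python
# Hypothetical illustration: the time average of a Markov chain's state
# distributions converges to its stationary distribution d^pi, even when
# the distributions mu_t themselves oscillate (e.g. a periodic chain).
# Assumes a single recurrent class, so the stationary distribution is unique.

def step(mu, P):
    """Advance the chain one step: return the row vector mu P."""
    n = len(mu)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_at(mu1, P, t):
    """Return mu_t (t >= 1), where mu_{t+1} = mu_t P."""
    mu = list(mu1)
    for _ in range(t - 1):
        mu = step(mu, P)
    return mu

def cesaro_average(mu1, P, n):
    """Return (1/n) * sum_{t=1}^{n} mu_t, where mu_{t+1} = mu_t P."""
    total = [0.0] * len(mu1)
    mu = list(mu1)
    for _ in range(n):
        total = [a + b for a, b in zip(total, mu)]  # accumulate mu_t
        mu = step(mu, P)
    return [x / n for x in total]

# Periodic chain (period 2): mu_t alternates between [1, 0] and [0, 1].
P_periodic = [[0.0, 1.0], [1.0, 0.0]]
mu1 = [1.0, 0.0]
print(distribution_at(mu1, P_periodic, 1000))  # [0.0, 1.0]
print(distribution_at(mu1, P_periodic, 1001))  # [1.0, 0.0]
print(cesaro_average(mu1, P_periodic, 1000))   # [0.5, 0.5]

# Aperiodic chain whose stationary distribution is [5/6, 1/6].
P_aperiodic = [[0.9, 0.1], [0.5, 0.5]]
print(cesaro_average(mu1, P_aperiodic, 10_000))  # close to [5/6, 1/6]
```

In the periodic example $\mu_t$ never converges, yet its running average equals $d^{\pi} = (1/2, 1/2)$ exactly for even $n$. This is why a statement of this kind is phrased for the time average rather than for $\mu_t$ itself.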
One Pith paper cites this work (field: cs.LG; year: 2026):
Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret