A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes , publisher =

Honghao Wei, Xin Liu, Lei Ying · 2021 · arXiv 2106.01577

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

A robust variant of binary search achieves regret O(C + log T) for dynamic pricing with known corruption C and O(C + log² T) when unknown.

Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees

cs.LG · 2026-04-15 · unverdicted · novelty 8.0

RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.

Online Resource Allocation With General Constraints

cs.GT · 2026-05-11 · unverdicted · novelty 7.0

An algorithm for online resource allocation with budget and general constraints achieves O(sqrt(T)) regret in stochastic and alpha-regret in adversarial regimes with bounded constraint violations.

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

cs.LG · 2024-08-29 · unverdicted · novelty 7.0

Presents the first algorithm to identify an ε-optimal policy in robust constrained MDPs via epigraph form and bisection search with Õ(ε^{-4}) robust policy evaluations.

citing papers explorer

Showing 4 of 4 citing papers.

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time cs.LG · 2026-05-08 · unverdicted · none · ref 17
A robust variant of binary search achieves regret O(C + log T) for dynamic pricing with known corruption C and O(C + log² T) when unknown.
Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees cs.LG · 2026-04-15 · unverdicted · none · ref 29
RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.
Online Resource Allocation With General Constraints cs.GT · 2026-05-11 · unverdicted · none · ref 16
An algorithm for online resource allocation with budget and general constraints achieves O(sqrt(T)) regret in stochastic and alpha-regret in adversarial regimes with bounded constraint violations.
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form cs.LG · 2024-08-29 · unverdicted · none · ref 75
Presents the first algorithm to identify an ε-optimal policy in robust constrained MDPs via epigraph form and bisection search with Õ(ε^{-4}) robust policy evaluations.

A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes , publisher =

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer