Online robust reinforcement learning with model uncertainty,

· 2021

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees

cs.LG · 2026-04-15 · unverdicted · novelty 8.0

RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.

citing papers explorer

Showing 1 of 1 citing paper.

Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees cs.LG · 2026-04-15 · unverdicted · none · ref 15
RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.

Online robust reinforcement learning with model uncertainty,

fields

years

verdicts

representative citing papers

citing papers explorer