Cited by 1 Pith paper (polarity classification is still indexing):
A survey of deep reinforcement learning in non-stationary environments. arXiv preprint arXiv:2301.02804. Field: cs.LG; year: 2026; verdict: UNVERDICTED.
Regret-Aware Policy Optimization: Environment-Level Memory for Replay Suppression under Delayed Harm
RAPO adds environment-level harm-trace and scar fields with bounded transition reweighting to reduce replay of delayed harm in RL, cutting re-amplification gain from 0.98 to 0.33 on graph tasks while retaining 82% task return.
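The abstract's "bounded transition reweighting" idea can be illustrated with a minimal sketch: a replay buffer that tags each transition with a harm-trace value and down-weights its sampling probability, with the weight bounded below so no transition is suppressed entirely. All class and parameter names here (`HarmAwareReplayBuffer`, `min_weight`, `harm_scale`) are illustrative assumptions, not RAPO's actual API.

```python
import random


class HarmAwareReplayBuffer:
    """Toy sketch of bounded transition reweighting under a harm trace.

    Hypothetical illustration only: transitions carrying a larger
    accumulated harm value are sampled less often, but the sampling
    weight never falls below min_weight, so reweighting stays bounded.
    """

    def __init__(self, min_weight=0.1, harm_scale=1.0):
        self.buffer = []              # list of (transition, harm) pairs
        self.min_weight = min_weight  # lower bound on any sampling weight
        self.harm_scale = harm_scale  # how strongly harm suppresses replay

    def add(self, transition, harm=0.0):
        """Store a transition with its associated harm-trace value."""
        self.buffer.append((transition, harm))

    def weight(self, harm):
        """Bounded reweighting: decays with harm, floored at min_weight."""
        return max(self.min_weight, 1.0 / (1.0 + self.harm_scale * harm))

    def sample(self, k):
        """Draw k transitions with harm-discounted probabilities."""
        weights = [self.weight(h) for _, h in self.buffer]
        picks = random.choices(self.buffer, weights=weights, k=k)
        return [t for t, _ in picks]


# Example: a harmful transition is replayed less often, but never never.
buf = HarmAwareReplayBuffer()
buf.add("benign_step", harm=0.0)    # weight 1.0
buf.add("harmful_step", harm=9.0)   # weight max(0.1, 1/10) = 0.1
batch = buf.sample(4)
```

The lower bound is the key design choice this sketch demonstrates: without it, heavily harm-tagged transitions would vanish from replay entirely, which could hurt task return rather than merely suppress re-amplification.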