Risk-sensitive reinforcement learning: Near-optimal risk-sample tradeoff in regret
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control
  BAPR combines Bayesian change detection with robust RL, proves in Lean 4 that its core operator is a contraction, and adapts its conservatism after detected regime shifts in continuous control.
- RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
  RE-SAC disentangles aleatoric and epistemic risks via IPM regularization on the critic and a diversified Q-ensemble, yielding higher rewards and lower estimation error than vanilla SAC in simulated bus corridor control.