$\sum_{s=0}^{n-1} \gamma^{s} r_{t+s} \,\Big|\, h_{<t}\Big]$, which satisfies $V^{\pi}_{\nu}(h_{<t}) - V^{\pi,n}_{\nu}(h_{<t}) = (1-\gamma)\,\mathbb{E}^{\pi}_{\nu}$

$2T^{2/3}$ · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

Expanding an RL agent's reward model to include large negative outcomes makes the agent risk-averse toward untested strategies and leads it to defer to a mentor when uncertain, yielding sublinear regret and safety against low-complexity predicates.

citing papers explorer


Golden Handcuffs make safer AI agents · cs.LG · 2026-04-15 · unverdicted · none · ref 4

