pith.

… $\sum_{s=0}^{n-1} \gamma^{s} r_{t+s} \,\Big|\, h_{<t} \Big]$, which satisfies $V^{\pi}_{\nu}(h_{<t}) - V^{\pi,n}_{\nu}(h_{<t}) = (1-\gamma)\,\mathbb{E}^{\pi}_{\nu}$ …
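The snippet is truncated, so the following is only an illustration under stated assumptions. Suppose, as the $(1-\gamma)$ factor suggests, that $V^{\pi}_{\nu}(h_{<t})$ is the normalized discounted value $(1-\gamma)\,\mathbb{E}^{\pi}_{\nu}\big[\sum_{s\ge 0}\gamma^{s} r_{t+s} \mid h_{<t}\big]$ and that $V^{\pi,n}_{\nu}$ keeps only the first $n$ terms. Then the gap between them is $(1-\gamma)$ times the discounted reward collected from step $n$ onward, which is at most $\gamma^{n}$ when rewards lie in $[0,1]$. The sketch below checks this on a single fixed reward sequence, so the expectation reduces to a plain sum; the function names `normalized_value` and `tail_value` are hypothetical.

```python
# Sketch only. Assumes (hypothetically) one deterministic reward sequence,
# so E^pi_nu collapses to a sum, and a finite list stands in for the
# infinite horizon.

def normalized_value(rewards, gamma, n=None):
    """(1 - gamma) * sum_{s=0}^{n-1} gamma^s * rewards[s]; n=None sums every reward."""
    horizon = len(rewards) if n is None else min(n, len(rewards))
    return (1 - gamma) * sum(gamma**s * rewards[s] for s in range(horizon))


def tail_value(rewards, gamma, n):
    """(1 - gamma) * sum_{s>=n} gamma^s * rewards[s]: the part the n-step value drops."""
    return (1 - gamma) * sum(gamma**s * rewards[s] for s in range(n, len(rewards)))


rewards = [1.0, 0.5, 0.0, 1.0, 0.25, 1.0]  # illustrative rewards in [0, 1]
gamma, n = 0.9, 3
gap = normalized_value(rewards, gamma) - normalized_value(rewards, gamma, n)
print(abs(gap - tail_value(rewards, gamma, n)) < 1e-12)  # True
```

Because $(1-\gamma)\sum_{s\ge n}\gamma^{s} = \gamma^{n}$, the truncation error shrinks geometrically in $n$ under these assumptions.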

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

fields: cs.LG (1)

years: 2026 (1)

verdicts: UNVERDICTED (1)

representative citing papers

Golden Handcuffs make safer AI agents

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

Expanding an RL agent's reward model to include large negative outcomes makes it risk-averse to untested strategies and defers to a mentor when uncertain, yielding sublinear regret and safety against low-complexity predicates.

citing papers explorer


  • Golden Handcuffs make safer AI agents · cs.LG · 2026-04-15 · unverdicted · none · ref 4
