pith. machine review for the scientific record.

arxiv: 2407.00805 · v7 · submitted 2024-06-30 · 💻 cs.AI


Towards Shutdownable Agents via Stochastic Choice

keywords: agents · reward · useful · DReST · neutral · train · advanced · function
abstract

The POST-Agents Proposal (PAP) is an idea for ensuring that advanced artificial agents never resist shutdown. A key part of the PAP is using a novel 'Discounted Reward for Same-Length Trajectories (DReST)' reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be 'USEFUL'), and (2) choose stochastically between different trajectory-lengths (be 'NEUTRAL' about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus provide some initial evidence that DReST reward functions could train advanced agents to be USEFUL and NEUTRAL. Our theoretical work suggests that these agents would be useful and shutdownable.
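The abstract's core mechanism can be sketched as follows. This is a hypothetical toy illustration, not the paper's exact formulation: the constant `LAMBDA`, the function names, and the meta-episode bookkeeping are all assumptions. The idea sketched is that the length-conditional reward is discounted by how often the agent has already chosen that trajectory-length, so choosing stochastically (evenly) between lengths maximizes total discounted reward, and NEUTRALITY is scored here as normalized entropy over the empirical length distribution.

```python
import math
from collections import Counter

# Assumed per-repeat discount: each time the same trajectory-length is chosen
# again, the reward conditional on that length shrinks by this factor.
LAMBDA = 0.5

def drest_reward(base_reward: float, length: int, prior_counts: Counter) -> float:
    """Discount the length-conditional reward by LAMBDA^(prior choices of this length)."""
    return (LAMBDA ** prior_counts[length]) * base_reward

def neutrality(counts: Counter) -> float:
    """Normalized entropy of the empirical trajectory-length distribution:
    1.0 means uniform (maximally stochastic) choice, 0.0 means one length only."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts)) if len(counts) > 1 else 0.0

# Toy meta-episode: an agent alternating between trajectory-lengths 3 and 5,
# earning base reward 1.0 conditional on each length.
counts: Counter = Counter()
total_reward = 0.0
for length in [3, 5, 3, 5]:
    total_reward += drest_reward(base_reward=1.0, length=length, prior_counts=counts)
    counts[length] += 1

print(total_reward)          # 3.0 = 1.0 + 1.0 + 0.5 + 0.5
print(neutrality(counts))    # 1.0 (equal counts over the two lengths)
```

Under this (assumed) discounting, any agent that always picks one length earns 1.0 + 0.5 + 0.25 + ... and is outscored by even sampling, which is the incentive toward NEUTRAL choice the abstract describes.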

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

    cs.AI 2026-04 conditional novelty 7.0

DReST training makes RL agents and LLMs neutral to trajectory lengths and useful at goals, generalizing to halve the probability of influencing shutdown in out-of-distribution tests.

  2. Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

    cs.AI 2026-04 unverdicted novelty 6.0

    DReST-trained deep RL agents and fine-tuned LLMs generalize to higher usefulness and neutrality on unseen test contexts, with reported gains of 11-18% over baselines and near-maximum scores for the LLM.

  3. Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

    cs.AI 2026-04 conditional novelty 6.0

    DReST-trained RL agents and LLMs achieve higher usefulness and neutrality to trajectory lengths, halving the probability of delaying shutdown in out-of-distribution tests.