A Game-Theoretic Analysis of the Off-Switch Game

Elliot Catt; Marcus Hutter; Mikael Böörs; Tobias Wängberg; Tom Everitt

arxiv: 1708.03871 · v1 · pith:XJS7BGHYnew · submitted 2017-08-13 · 💻 cs.GT

Tobias Wängberg, Mikael Böörs, Elliot Catt, Tom Everitt, Marcus Hutter

classification 💻 cs.GT

keywords game, analysis, human, robot, action, assumptions, best, fully

The off-switch game is a game-theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016), the analysis is not fully game-theoretic, as the human is modelled as an irrational player, and the robot's best action is only calculated under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game-theoretic by modelling the human as a rational player with a random utility function. As a consequence, we are able to easily calculate the robot's best action for arbitrary belief and irrationality assumptions.
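
The idea of a rational human with a random utility function can be illustrated with a minimal sketch. It is an illustrative reading of the abstract, not the paper's own formulation. It assumes the robot chooses between acting directly, switching itself off (utility 0), and deferring to the human, and that its belief is a finite list of (probability, true utility, human-perceived utility) triples. The names `option_values` and `best_option`, the three option labels, and the example numbers are all hypothetical. Irrationality is represented here as a mismatch between the true utility and the utility the human perceives.

```python
from typing import Dict, List, Tuple

# One belief outcome: (probability, true utility of the action,
# utility of the action as perceived by the human).
# This encoding is an assumption made for illustration.
Outcome = Tuple[float, float, float]


def option_values(belief: List[Outcome]) -> Dict[str, float]:
    """Expected true utility of each robot option under the belief."""
    total = sum(p for p, _, _ in belief)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("belief probabilities must sum to 1")
    # Acting directly yields the true utility in every outcome.
    act = sum(p * u for p, u, _ in belief)
    # Deferring: a rational human permits the action only when the
    # utility they perceive is positive; otherwise the robot is
    # switched off and the payoff is 0.
    defer = sum(p * u for p, u, v in belief if v > 0)
    return {"act": act, "off": 0.0, "defer": defer}


def best_option(belief: List[Outcome]) -> str:
    """Return the option with the highest expected true utility."""
    values = option_values(belief)
    return max(values, key=values.get)


# Human perceives the true utility exactly: deferring is best.
accurate_human = [(0.5, 1.0, 1.0), (0.5, -2.0, -2.0)]
# Human perceives the sign of the utility wrongly: switching off is best.
mistaken_human = [(0.5, 1.0, -1.0), (0.5, -2.0, 1.0)]

print(best_option(accurate_human), option_values(accurate_human))
print(best_option(mistaken_human), option_values(mistaken_human))
```

Because the belief is an arbitrary finite distribution, the comparison needs no normality or soft-max assumption. Changing how the perceived utility relates to the true one changes the assumed degree of irrationality.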

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
cs.AI 2026-04 conditional novelty 7.0

DReST training makes RL agents and LLMs neutral to trajectory lengths and useful at goals, generalizing to halve shutdown influence probability in out-of-distribution tests.

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
cs.AI 2026-04 unverdicted novelty 6.0

DReST-trained deep RL agents and fine-tuned LLMs generalize to higher usefulness and neutrality on unseen test contexts, with reported gains of 11-18% over baselines and near-maximum scores for the LLM.

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
cs.AI 2026-04 conditional novelty 6.0

DReST-trained RL agents and LLMs achieve higher usefulness and neutrality to trajectory lengths, halving the probability of delaying shutdown in out-of-distribution tests.

Towards Shutdownable Agents via Stochastic Choice
cs.AI 2024-06 conditional novelty 6.0

Gridworld agents trained with DReST reward functions learn to be USEFUL at tasks conditional on trajectory length and NEUTRAL across lengths, supplying initial evidence that the method could produce shutdownable agents.