Low Impact Artificial Intelligences
There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of 'low impact'. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. The end of the paper addresses known issues with this approach and avenues for future research.
Forward citations
Cited by 2 Pith papers
- Learning the Arrow of Time
  Introduces a learned arrow of time in MDPs that aligns with the Jordan-Kinderlehrer-Otto notion for stochastic processes and enables practical RL utilities like reachability and side-effect detection.
- Towards Empathic Deep Q-Learning
  Empathic DQN augments DQN value estimates with an empathy term computed by swapping the learning agent into other agents' situations, reducing collateral harms in two gridworld proof-of-concept environments.