Low Impact Artificial Intelligences

Benjamin Levinstein; Stuart Armstrong

arxiv: 1705.10720 · v1 · pith:NWRSHL5Wnew · submitted 2017-05-30 · 💻 cs.AI


classification 💻 cs.AI

keywords impact, approach, dangerous, defining, even, goals, powerful, addresses


There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of 'low impact'. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. The end of the paper addresses known issues with this approach and avenues for future research.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning the Arrow of Time
cs.LG 2019-07 unverdicted novelty 7.0

Introduces a learned arrow of time in MDPs that aligns with the Jordan-Kinderlehrer-Otto notion for stochastic processes and enables practical RL utilities like reachability and side-effect detection.

Towards Empathic Deep Q-Learning
cs.LG 2019-06 unverdicted novelty 6.0

Empathic DQN augments DQN value estimates with an empathy term computed by swapping the learning agent into other agents' situations, reducing collateral harms in two gridworld proof-of-concept environments.