Low Impact Artificial Intelligences

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of 'low impact'. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. The end of the paper addresses known issues with this approach and avenues for future research.

fields

cs.LG 2

years

2019 2

verdicts

UNVERDICTED 2

representative citing papers

Learning the Arrow of Time

cs.LG · 2019-07-02 · unverdicted · novelty 7.0

Introduces a learned arrow of time in MDPs that aligns with the Jordan-Kinderlehrer-Otto notion for stochastic processes and enables practical RL utilities like reachability and side-effect detection.

Towards Empathic Deep Q-Learning

cs.LG · 2019-06-26 · unverdicted · novelty 6.0

Empathic DQN augments DQN value estimates with an empathy term computed by swapping the learning agent into other agents' situations, reducing collateral harms in two gridworld proof-of-concept environments.

citing papers explorer


  • Learning the Arrow of Time cs.LG · 2019-07-02 · unverdicted · none · ref 17 · internal anchor


  • Towards Empathic Deep Q-Learning cs.LG · 2019-06-26 · unverdicted · none · ref 2 · internal anchor
