Guidelines for Artificial Intelligence Containment

· 2017 · cs.AI · arXiv 1707.08476

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

With almost daily improvements in capabilities of artificial intelligence it is more important than ever to develop safety software for use by the AI research community. Building on our previous work on AI Containment Problem we propose a number of guidelines which should help AI safety researchers to develop reliable sandboxing software for intelligent programs of all levels. Such safety container software will make it possible to study and analyze intelligent artificial agent while maintaining certain level of safety against information leakage, social engineering attacks and cyberattacks from within the container.

representative citing papers

Safety from Honesty in a Disinterested AI Predictor

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

A disinterested Bayesian Predictor trained on contextualized statements has low probability of producing harmful agency because dangerous behaviors require rare coordinated underestimation of harm with no training signal favoring them.

citing papers explorer

Showing 1 of 1 citing paper.

Safety from Honesty in a Disinterested AI Predictor cs.AI · 2026-06-28 · unverdicted · none · ref 6 · internal anchor
A disinterested Bayesian Predictor trained on contextualized statements has low probability of producing harmful agency because dangerous behaviors require rare coordinated underestimation of harm with no training signal favoring them.

Guidelines for Artificial Intelligence Containment

fields

years

verdicts

representative citing papers

citing papers explorer