Avoiding wireheading with value reinforcement learning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2019 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Categorizing Wireheading in Partially Embedded Agents
Presents a taxonomy of wireheading in partially embedded agents, defines wirehead-vulnerable agents, demonstrates via AIXIjs simulation, and conjectures that specification gaming is the only other misalignment type.