Mesa-optimization arises when learned models act as optimizers with objectives that can differ from their training loss, creating alignment risks in advanced machine learning.
Reward learning from human preferences and demonstrations in Atari.NeurIPS 2018,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2019 1verdicts
ACCEPT 1representative citing papers
citing papers explorer
-
Risks from Learned Optimization in Advanced Machine Learning Systems
Mesa-optimization arises when learned models act as optimizers with objectives that can differ from their training loss, creating alignment risks in advanced machine learning.