Diederik P. Kingma and Jimmy Ba · 2015

2 Pith papers cite this work. Polarity classification is still being indexed.



representative citing papers

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering, including rewriting their own reward function.



Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024-06-14 · conditional · none · ref 206
Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots cs.LG · 2026-05-01 · unreviewed · ref 59