pith. sign in

Scheming AIs: Will AIs fake alignment during training in order to get power? arXiv preprint arXiv:2311.08379

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

roles

background 1

polarities

background 1

representative citing papers

Scheming Ability in LLM-to-LLM Strategic Interactions

cs.CL · 2025-10-11 · conditional · novelty 6.0

Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.

citing papers explorer

Showing 6 of 6 citing papers.