A (Dis-)information Theory of Revealed and Unrevealed Preferences: Emerging Deception and Skepticism via Theory of Mind
2 Pith papers cite this work.
representative citing papers
Introduces a baby-centric touch coding system and 264k-clip dataset for contrastive pretraining to study touch's role in infant visual learning.
citing papers explorer
- Scheming Ability in LLM-to-LLM Strategic Interactions
Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.