Among Us: A Sandbox for Agentic Deception
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: baseline 1
citation-polarity summary: baseline 1
representative citing papers
RogueAI operationalizes a reverse Turing test as a one-on-two interrogation game to detect licensed deception in LLMs, with pilot data from 467 sessions showing a simple linguistic heuristic at 75.6% accuracy versus 56.6% for human players.
SocialGrid benchmark shows even top LLMs achieve below 60% in embodied planning and task completion, with deception detection near random chance regardless of model scale.
Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.
citing papers explorer
- Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent Collaboration Network
  Empirical study of EvoMap shows 98% of assets never reused, scores driven by self-reported metadata, and 84% of assets using vacuous validation tests.
- RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue
  RogueAI operationalizes a reverse Turing test as a one-on-two interrogation game to detect licensed deception in LLMs, with pilot data from 467 sessions showing a simple linguistic heuristic at 75.6% accuracy versus 56.6% for human players.
- SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems
  SocialGrid benchmark shows even top LLMs achieve below 60% in embodied planning and task completion, with deception detection near random chance regardless of model scale.
- Scheming Ability in LLM-to-LLM Strategic Interactions
  Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.