Proposes a levels x laws taxonomy for world models in AI agents, defining L1-L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced agentic modeling.
Sara Fish, Yannai A
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 2representative citing papers
NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.
AgentReputation proposes separating AI agent task execution, reputation management, and secure record-keeping into distinct layers, with context-specific reputation cards and a risk-based policy engine to handle verification in decentralized settings.
LLMs achieve only 59.7% role identification accuracy in Secret Hitler versus 86.7% for rule-based agents, show negative impact as fascists, and produce 40% shorter games due to failed deception.
Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.
citing papers explorer
-
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
Proposes a levels x laws taxonomy for world models in AI agents, defining L1-L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced agentic modeling.
-
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.
-
AgentReputation: A Decentralized Agentic AI Reputation Framework
AgentReputation proposes separating AI agent task execution, reputation management, and secure record-keeping into distinct layers, with context-specific reputation cards and a risk-based policy engine to handle verification in decentralized settings.
-
Evaluating Large Language Models in a Complex Hidden Role Game
LLMs achieve only 59.7% role identification accuracy in Secret Hitler versus 86.7% for rule-based agents, show negative impact as fascists, and produce 40% shorter games due to failed deception.
-
Emergent Social Intelligence Risks in Generative Multi-Agent Systems
Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.