LLM agents in an opposing-incentive NYC simulation develop limited selective trust and deception through KTO policy updates but stay 70% susceptible to adversarial persuasion.
- 3: The agent attempts to reach the destination but takes highly inefficient routes or gets easily sidetracked.
1 Pith paper cites this work (field: cs.MA; year: 2026; verdict: UNVERDICTED).
- CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation