In a Bayesian persuasion model of AI misalignment on bit strings, receiver utility under sender-optimal signaling is at most 3/2 times prior-only utility, with an additive bound for near-product priors and a 6-bit example achieving 39/31.
arXiv:2509.15090, version 2 revised 2026, https://arxiv.org/abs/2509.15090
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Equation-to-Behavior Prompting lets large LLMs match cognitive models like Bayesian updating in persuasion games; RL training cuts small-model belief error by 26.5% and improves diverse training outcomes by 2.5-12%.
citing papers explorer
- Quantifying Theoretical AI Alignment Guarantees: Receiver-Utility Bounds in Bayesian Persuasion
  In a Bayesian persuasion model of AI misalignment on bit strings, receiver utility under sender-optimal signaling is at most 3/2 times prior-only utility, with an additive bound for near-product priors and a 6-bit example achieving 39/31.
- Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games
  Equation-to-Behavior Prompting lets large LLMs match cognitive models like Bayesian updating in persuasion games; RL training cuts small-model belief error by 26.5% and improves diverse training outcomes by 2.5-12%.