arXiv:2509.15090, version 2 revised 2026, https://arxiv.org/abs/2509.15090

Emergent Alignment via Competition · 2026 · arXiv 2509.15090

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Quantifying Theoretical AI Alignment Guarantees: Receiver-Utility Bounds in Bayesian Persuasion

cs.GT · 2026-06-20 · unverdicted · novelty 7.0

In a Bayesian persuasion model of AI misalignment on bit strings, receiver utility under sender-optimal signaling is at most 3/2 times prior-only utility, with an additive bound for near-product priors and a 6-bit example achieving 39/31.

Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games

cs.AI · 2026-06-16 · unverdicted · novelty 6.0

Equation-to-Behavior Prompting lets large LLMs match cognitive models like Bayesian updating in persuasion games; RL training cuts small-model belief error by 26.5% and improves diverse training outcomes by 2.5-12%.

citing papers explorer

Quantifying Theoretical AI Alignment Guarantees: Receiver-Utility Bounds in Bayesian Persuasion
cs.GT · 2026-06-20 · unverdicted · none · ref 1
Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games
cs.AI · 2026-06-16 · unverdicted · none · ref 31
