A unified approach to reinforcement learning, quantal response equilibria, and two-player zero-sum games
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
citing papers explorer
- Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
Solly is the first AI to achieve elite human-level play in reduced-format Liar's Poker via self-play actor-critic reinforcement learning, outperforming both world-class humans and large language models on win rate and equity while developing non-exploitable strategies.
- GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning
GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.
- Multiplayer Nash Preference Optimization
MNPO extends NLHF to multiplayer Nash games, inheriting equilibrium guarantees while showing empirical gains on instruction-following benchmarks under diverse preferences.