Bayes' Bluff: Opponent Modelling in Poker

Finnegan Southey , Michael P. Bowling , Bryce Larson , Carmelo Piccione , Neil Burch , Darse Billings , Chris Rayner

Authors on Pith no claims yet

classification 💻 cs.GT cs.AI

keywords opponentpokerdemonstratedistributiondynamicsgamemodellingplaying

read the original abstract

Poker is a challenging problem for artificial intelligence, with non-deterministic dynamics, partial observability, and the added difficulty of unknown adversaries. Modelling all of the uncertainties in this domain is not an easy task. In this paper we present a Bayesian probabilistic model for a broad class of poker games, separating the uncertainty in the game dynamics from the uncertainty of the opponent's strategy. We then describe approaches to two key subproblems: (i) inferring a posterior over opponent strategies given a prior distribution and observations of their play, and (ii) playing an appropriate response to that distribution. We demonstrate the overall approach on a reduced version of poker using Dirichlet priors and then on the full game of Texas hold'em using a more informed prior. We demonstrate methods for playing effective responses to the opponent, based on the posterior.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
cs.AI 2026-05 unverdicted novelty 7.0

Bandit algorithms can be adapted to Tree MDPs by treating policies as arms with shared-data confidence bounds, achieving polynomial memory and instance-dependent bounds on sample complexity and regret that depend on t...
AlphaExploitem: Going Beyond the Nash Equilibrium in Poker by Learning to Exploit Suboptimal Play
cs.LG 2026-05 unverdicted novelty 5.0

AlphaExploitem adds a hierarchical transformer encoder and a diverse pool of exploitable opponents to AlphaHoldem, enabling exploitation of suboptimal poker play while preserving performance against Nash-equilibrium o...