arxiv: 2408.02058 · v2 · submitted 2024-08-04 · 🪐 quant-ph

Bayesian rational agents in iterated quantum games

Pith reviewed 2026-05-23 22:17 UTC · model grok-4.3

classification 🪐 quant-ph
keywords quantum games · Bayesian agents · CHSH game · prisoners' dilemma · entanglement · iterated games · quantum advantage · rationality

The pith

Bayesian rational agents learn shared entanglement in iterated quantum games and achieve advantage only when they believe the other player will act optimally to exploit it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a Bayesian agent framework to repeated plays of the CHSH game and the quantum prisoners' dilemma, where each player holds and updates beliefs about the amount of shared entanglement and the other player's likely actions. Between rounds, players apply the classical Bayes rule to revise those beliefs and then select the action that maximizes their expected utility given current beliefs. Simulations of the CHSH game show that agents converge on the true entanglement value and reach quantum advantage only when they also hold the belief that the partner will exploit the entanglement correctly. In the prisoners' dilemma under the assumption of one-fold rationality, the quantum game reduces to two strategies whose dominance switches with the entanglement parameter: defect dominates at low entanglement while the quantum strategy Q dominates at high entanglement. Iterated play allows agents to learn the entanglement level in both games, and strong belief in entanglement produces optimal play even when none is actually present.

Core claim

In iterated play of the CHSH game, Bayesian agents learn the true amount of shared entanglement and achieve quantum advantage provided they hold the belief that the other player will also exploit the entanglement. In the quantum prisoners' dilemma with one-fold rational players, the game reduces to two strategies whose dominance depends on the entanglement parameter, with the defect strategy dominant for low entanglement and the quantum strategy Q dominant for high entanglement. Players can learn the entanglement parameter through repeated play with Bayesian updating, and strong belief in entanglement produces optimal play even in its absence.
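
The CHSH half of this claim can be illustrated with a short calculation. The sketch below is not the paper's simulation code. It assumes a shared state cos(theta)|00> + sin(theta)|11>, measurements in the X-Z plane, and fixed angles tuned for a maximally entangled state; the function name and default angles are hypothetical choices made for illustration.

```python
import math


def chsh_win_probability(theta, alice_angles=(0.0, math.pi / 2),
                         bob_angles=(math.pi / 4, -math.pi / 4)):
    """Win probability of the CHSH game on cos(theta)|00> + sin(theta)|11>.

    Each player measures cos(a)*Z + sin(a)*X at the angle selected by their
    input bit. For this state the correlation of the two +/-1 outcomes is
    E(a, b) = cos(a)cos(b) + sin(2*theta) sin(a) sin(b).
    """
    s = math.sin(2 * theta)
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            a, b = alice_angles[x], bob_angles[y]
            corr = math.cos(a) * math.cos(b) + s * math.sin(a) * math.sin(b)
            p_agree = (1 + corr) / 2
            # Outputs must agree to win, except for x = y = 1 where they must differ.
            total += (1 - p_agree) if (x and y) else p_agree
    return total / 4  # the four input pairs are equally likely


CLASSICAL_BOUND = 0.75  # best win probability without entanglement

if __name__ == "__main__":
    for theta in (0.0, math.pi / 16, math.pi / 8, math.pi / 4):
        p = chsh_win_probability(theta)
        print(f"theta={theta:.3f}  win={p:.4f}  advantage={p > CLASSICAL_BOUND}")
```

Under these assumptions the fixed settings beat 3/4 only when sin(2*theta) exceeds sqrt(2) - 1, so players who act as if entanglement were maximal do worse than classical play when it is in fact low. The paper's agents choose from their own discretized action set, so its thresholds need not match this sketch.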

What carries the argument

Bayesian updating of beliefs about shared entanglement and the other player's actions using the classical Bayes rule between rounds, together with expected-utility maximization to choose actions.
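
A minimal sketch of that loop, under loudly stated assumptions: the hypothesis grid, the likelihood model `p_win_quantum`, and the two-action choice are all hypothetical stand-ins, and the agent here holds beliefs about entanglement only, whereas the paper's agents also hold beliefs about the other player. The update itself is the classical Bayes rule and the action choice is expected-utility maximization.

```python
import math
import random

# Hypothetical hypothesis grid: 11 values of s = sin(2*theta) in [0, 1].
# It stands in for the paper's entanglement grid; the parametrization is an assumption.
GRID = [i / 10 for i in range(11)]
P_WIN_CLASSICAL = 0.75  # best classical CHSH win probability


def p_win_quantum(s):
    """Assumed likelihood model: win probability of fixed Bell-type measurements."""
    return 0.5 + math.sqrt(2) * (1 + s) / 8


def bayes_update(prior, won):
    """Classical Bayes rule on the grid: posterior is proportional to likelihood * prior."""
    likelihood = [p_win_quantum(s) if won else 1 - p_win_quantum(s) for s in GRID]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def best_action(belief):
    """Choose the action with the larger expected win probability under the belief."""
    eu_quantum = sum(b * p_win_quantum(s) for b, s in zip(belief, GRID))
    return "quantum" if eu_quantum > P_WIN_CLASSICAL else "classical"


def mean_entanglement(belief):
    """Expected value of the grid parameter under the belief."""
    return sum(b * s for b, s in zip(belief, GRID))


def simulate(true_s, rounds, seed=0):
    """Update a uniform prior on outcomes of quantum play with true parameter true_s."""
    rng = random.Random(seed)
    belief = [1 / len(GRID)] * len(GRID)
    for _ in range(rounds):
        won = rng.random() < p_win_quantum(true_s)
        belief = bayes_update(belief, won)
    return belief


if __name__ == "__main__":
    final = simulate(true_s=1.0, rounds=2000)
    print(round(mean_entanglement(final), 3), best_action(final))
```

For simplicity the simulated agent always observes outcomes of quantum play, whatever action it would currently prefer. A fuller model would have to deal with the fact that classical play reveals nothing about the shared state.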

If this is right

  • In the CHSH game, quantum advantage is unreachable at low or zero entanglement even when players overestimate the entanglement.
  • Without the belief that the partner will act to exploit entanglement, agents cannot achieve quantum advantage in CHSH even when entanglement is present.
  • For intermediate entanglement in the prisoners' dilemma, neither the defect strategy nor the quantum strategy Q is dominant.
  • Strong belief in entanglement substitutes for actual trust and produces optimal play in the prisoners' dilemma even when no entanglement exists.
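
The prisoners' dilemma bullets can be checked numerically once the game is restricted to the strategies D and Q. The sketch below takes that restriction as given rather than deriving it from 1-fold rationality. It also assumes the textbook payoffs 3, 0, 5, 1 and an entangling parameter gamma in [0, pi/2] following the Eisert-Wilkens-Lewenstein construction; the paper measures entanglement in ebits, so the parametrization and all helper names here are illustrative assumptions.

```python
import math

# Assumed payoffs for Alice, indexed by (Alice outcome, Bob outcome) with
# 0 = cooperate and 1 = defect. Textbook values, not necessarily the paper's.
PAYOFF_A = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

D = [[0, 1], [-1, 0]]      # defect
Q = [[1j, 0], [0, -1j]]    # the quantum strategy Q


def kron(a, b):
    """Kronecker product of two 2x2 matrices as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def matvec(m, v):
    """Multiply a 4x4 matrix by a length-4 vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


def entangler(gamma, dagger=False):
    """J = exp(i*gamma*(D x D)/2) = cos(gamma/2)*I + i*sin(gamma/2)*(D x D).

    The closed form holds because (D x D) squares to the identity; D x D is
    real and symmetric, so the adjoint only flips the sign of the i term.
    """
    dd = kron(D, D)
    c, s = math.cos(gamma / 2), math.sin(gamma / 2)
    sign = -1 if dagger else 1
    return [[c * (i == j) + sign * 1j * s * dd[i][j] for j in range(4)]
            for i in range(4)]


def alice_payoff(u_alice, u_bob, gamma):
    """Alice's expected payoff when the players apply u_alice and u_bob."""
    state = [1, 0, 0, 0]  # |00>: both qubits start at "cooperate"
    state = matvec(entangler(gamma), state)
    state = matvec(kron(u_alice, u_bob), state)
    state = matvec(entangler(gamma, dagger=True), state)
    probs = [abs(amp) ** 2 for amp in state]
    return sum(probs[2 * a + b] * PAYOFF_A[(a, b)] for a in (0, 1) for b in (0, 1))


def dominant_strategy(gamma):
    """Return "D", "Q", or None if neither strictly dominates in the {D, Q} game."""
    dd, dq = alice_payoff(D, D, gamma), alice_payoff(D, Q, gamma)
    qd, qq = alice_payoff(Q, D, gamma), alice_payoff(Q, Q, gamma)
    if dd > qd and dq > qq:
        return "D"
    if qd > dd and qq > dq:
        return "Q"
    return None
```

With these assumed payoffs, D dominates while sin^2(gamma) is below 1/5, Q dominates once it exceeds 2/5, and neither dominates in between. Different payoff values would move those switch points, which is the dependence the referee asks the authors to make explicit.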

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Bayesian mechanism could be applied to detect other quantum resources through strategic interaction rather than direct measurement.
  • The proxy role of entanglement belief for trust suggests the framework may apply to coordination problems in quantum networks where agents must act without verified shared states.
  • Iterated Bayesian play may allow agents to learn parameters of quantum algorithms in addition to game payoffs.

Load-bearing premise

Players revise their beliefs about entanglement and the other player's actions using the classical Bayes rule between rounds, and their initial beliefs allow convergence to the true entanglement value when it is present.

What would settle it

The learning claim would be falsified by a simulation in which agents who correctly believe their partner will act optimally still fail, after sufficiently many rounds, to converge to the true entanglement value or to reach the predicted quantum advantage.

Figures

Figures reproduced from arXiv: 2408.02058 by John B. DeBrota, Peter J. Love.

Figure 1: Winning probability (black) and players’ expected winning probabilities for simulations of rational agents...
Figure 2: Entanglement expectations for simulations of rational agents Alice and Bob playing 10 simulations of 500...
Figure 3: When 1-fold rational agents play the quantum...
Figure 4: Cumulative average payoff for simulations of 1-fold rational agents Alice (orange) and Bob (blue) playing 10...
Figure 5: Expected entanglement plots for simulations of 1-fold rational agents Alice (orange) and Bob (blue) playing...
Original abstract

We apply a Bayesian agent-based framework inspired by QBism to iterations of two quantum games, the CHSH game and the quantum prisoners' dilemma. In each two-player game, players hold beliefs about an amount of shared entanglement and about the actions or beliefs of the other player. Each takes actions which maximize their expected utility and revises their beliefs with the classical Bayes rule between rounds. We simulate iterated play to see if and how players can learn about the presence of shared entanglement and to explore how their performance, their beliefs, and the game's structure interrelate. In the CHSH game, we find that players can learn that entanglement is present and use this to achieve quantum advantage. We find that they can only do so if they also believe the other player will act correctly to exploit the entanglement. In the case of low or zero entanglement in the CHSH game, the players cannot achieve quantum advantage, even in the case where they believe the entanglement is higher than it is. For the prisoners dilemma, we show that assuming 1-fold rational players (rational players who believe the other player is also rational) reduces the quantum extension [Eisert, Wilkens, and Lewenstein, Phys. Rev. Lett. 83, 3077 (1999)] of the prisoners dilemma to a game with only two strategies, one of which (defect) is dominant for low entanglement, and the other (the quantum strategy Q) is dominant for high entanglement. For intermediate entanglement, neither strategy is dominant. We again show that players can learn entanglement in iterated play. We also show that strong belief in entanglement causes optimal play even in the absence of entanglement -- showing that belief in entanglement is acting as a proxy for the players trusting each other. Our work points to possible future applications in resource detection and quantum algorithm design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a Bayesian agent-based model inspired by QBism for iterated play of the CHSH game and the quantum prisoners' dilemma. Agents hold and update (via the classical Bayes rule) beliefs about the shared entanglement parameter and the other player's actions or rationality level, then select actions that maximize expected utility. Simulations are used to examine whether agents learn the true entanglement value and how this affects performance. Key results: in CHSH, agents learn entanglement and achieve quantum advantage only when they also believe the opponent will exploit it; in the prisoners' dilemma, 1-fold rationality collapses the Eisert-Wilkens-Lewenstein strategy space to two strategies, defect and Q, whose dominance switches with entanglement strength; agents can learn entanglement, and strong belief in entanglement induces optimal play even when none is present (acting as a trust proxy).

Significance. If the simulation results hold under the stated modeling choices, the work provides a concrete demonstration of how rational Bayesian agents can discover and exploit quantum resources in repeated games, and it isolates the role of mutual belief in enabling quantum advantage. The reduction of the quantum PD to a two-strategy game under 1-fold rationality and the observation that entanglement belief proxies for trust are potentially useful for resource detection and quantum algorithm design.

major comments (2)
  1. [CHSH game results (simulation description)] The central CHSH claim (agents learn entanglement and achieve quantum advantage only if they also believe the opponent will act correctly) is load-bearing for the paper's narrative on belief interdependence. The abstract states this follows from the simulations, but the update rules, prior distributions, and convergence criteria for the joint belief over entanglement and opponent action are not specified in sufficient detail to verify that the reported learning occurs independently of those modeling choices.
  2. [Prisoners' dilemma analysis] For the prisoners' dilemma, the reduction to two dominant strategies (defect for low entanglement, Q for high) under 1-fold rationality is a key structural result. The abstract asserts this follows directly from the 1-fold assumption, but without an explicit derivation or table showing the payoff matrix after the reduction (or the critical entanglement value at which dominance switches), it is difficult to confirm the claim is parameter-free or independent of the specific utility functions chosen.
minor comments (2)
  1. The abstract cites Eisert et al. (1999) but does not list it in a references section; ensure the full bibliography is complete and consistently formatted.
  2. Simulation details (number of iterations, exact prior forms, how 'strong belief' is quantified) should be moved or expanded in the methods to allow reproducibility, even if they are not load-bearing for the qualitative claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive comments. We address each major point below and will revise the manuscript to improve the verifiability of the results.

  1. Referee: [CHSH game results (simulation description)] The central CHSH claim (agents learn entanglement and achieve quantum advantage only if they also believe the opponent will act correctly) is load-bearing for the paper's narrative on belief interdependence. The abstract states this follows from the simulations, but the update rules, prior distributions, and convergence criteria for the joint belief over entanglement and opponent action are not specified in sufficient detail to verify that the reported learning occurs independently of those modeling choices.

    Authors: We agree that additional detail on the simulation setup is needed for full verifiability. In the revised manuscript we will expand the relevant section to explicitly state the Bayesian update rules applied to the joint belief, the prior distributions chosen for entanglement and opponent action, and the convergence criteria used to determine when learning has occurred. These additions will allow independent confirmation that the reported interdependence of beliefs holds under the modeling choices.

    Revision: yes

  2. Referee: [Prisoners' dilemma analysis] For the prisoners' dilemma, the reduction to two dominant strategies (defect for low entanglement, Q for high) under 1-fold rationality is a key structural result. The abstract asserts this follows directly from the 1-fold assumption, but without an explicit derivation or table showing the payoff matrix after the reduction (or the critical entanglement value at which dominance switches), it is difficult to confirm the claim is parameter-free or independent of the specific utility functions chosen.

    Authors: We concur that an explicit derivation and supporting table would strengthen the presentation. In the revision we will include a step-by-step derivation showing how the 1-fold rationality assumption collapses the EWL strategy space to the two-strategy game, together with a table of the resulting payoff matrix as a function of the entanglement parameter and the critical value at which dominance switches. This will demonstrate that the structural result follows from the rationality assumption and the standard EWL quantization rather than from particular numerical choices of the utility parameters.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper's central results follow from explicit forward simulation of Bayesian belief updates (classical Bayes rule on entanglement parameter and opponent actions) under stated modeling assumptions such as 1-fold rationality. These assumptions are declared upfront and reduce the Eisert et al. strategy space by direct enumeration rather than by fitting or self-referential definition; the reported dominance thresholds and learning behavior are direct consequences of the chosen priors and update rule, with no load-bearing step that equates a claimed prediction to its own input by construction. No self-citation chains, ansatz smuggling, or renaming of known results appear in the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to assumptions explicitly stated there; no free parameters, invented entities, or additional axioms are identifiable.

axioms (2)
  • domain assumption: Players revise beliefs using the classical Bayes rule between rounds.
    Explicitly stated in the abstract as the update mechanism.
  • domain assumption: Each player maximizes expected utility given beliefs about entanglement and the other player's actions or beliefs.
    Core modeling choice for the Bayesian agent framework described in the abstract.

pith-pipeline@v0.9.0 · 5859 in / 1333 out tokens · 25156 ms · 2026-05-23T22:17:56.601350+00:00 · methodology

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 2 internal anchors

  1. Discretizations and probability floor. To simulate rounds of the CHSH game we fix two further details. First, to make simulations computationally tractable, we impose discretizations of the spaces of possible shared quantum states (2) and actions (3). We consider 11 γ values associated to entanglement ranging from 0 to 1 ebits in steps of 0.1. The ra...

  2. An example round. Let’s walk through one round of this game. Suppose the game state is maximally entangled and that both Alice and Bob have a completely uniform initial prior over the discretized action and entanglement sample space for both bit values of the other. Recall that we regard a single round to be three iterations of the game, which means th...

  3. Simulation scenarios. We consider four scenarios of this game which we call “Finding Advantage”, “Making Do”, “Overcoming Bias”, and “Good Enough?”. For each, we ran 10 simulations of 500 3-iteration rounds. We summarize our results in Figures 1 and 2. Figure 1 tracks winning probability and players’ expected winning probability by round and Figure 2 tra...

  4. Discretizations and prior structure. To simulate 1-fold rational agents playing the prisoners’ dilemma, we again impose discretizations and a quantum probability floor as in our simulations of the CHSH game described in §II C 1. The game state entanglement is discretized into the same 0.1 ebit values as before and each agent again has a prior over the po...

  5. Simulation scenarios. As for the CHSH game, we consider four scenarios of the quantum prisoners’ dilemma, varying players’ priors and the game state entanglement, which we call “Bohr’s Horseshoe”, “Faith Alone”, “Double Down?”, and “Fool’s Gold”. For each scenario we ran 10 simulations of 1000 rounds. We summarize our results in Figures 4 and 5. Figure...

  6. J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, Proposed Experiment to Test Local Hidden-Variable Theories, Physical Review Letters 23, 880 (1969)

  7. J. Eisert, M. Wilkens, and M. Lewenstein, Quantum Games and Quantum Strategies, Physical Review Letters 83, 3077 (1999)

  8. D. A. Meyer, Quantum Strategies, Physical Review Letters 82, 1052 (1999), quant-ph/9804010

  9. S. Popescu and D. Rohrlich, Quantum nonlocality as an axiom, Foundations of Physics 24, 379 (1994)

  10. S. Popescu, Nonlocality beyond quantum mechanics, Nature Physics 10, 264 (2014)

  11. P. W. Shor, Why haven’t more quantum algorithms been found?, Journal of the ACM 50, 87 (2003)

  12. C. A. Fuchs and R. Schack, Quantum-Bayesian coherence, Reviews of Modern Physics 85, 1693 (2013), 0906.2187

  13. C. A. Fuchs and B. C. Stacey, QBism: Quantum Theory as a Hero’s Handbook, in Proceedings of the International School of Physics “Enrico Fermi”: Course 197, Foundations of Quantum Theory (2019) pp. 133–202, 1612.07308

  14. J. Von Neumann and O. Morgenstern, Theory of games and economic behavior (Princeton University Press, Princeton, New Jersey, 1944)

  15. E. Pacuit and O. Roy, Epistemic Foundations of Game Theory, in The Stanford Encyclopedia of Philosophy, edited by E. N. Zalta (Metaphysics Research Lab, Stanford University, 2017) Summer 2017 ed

  16. E. Dekel and M. Siniscalchi, Epistemic Game Theory, in Handbook of Game Theory with Economic Applications, Vol. 4 (Elsevier, 2015) pp. 619–702

  17. A. Perea, Epistemic game theory: reasoning and choice (Cambridge University Press, New York, 2012)

  18. C. Liu, The Dark Forest (J. Martinsen, Trans.), Three-Body Trilogy No. II (Tor, A Tom Doherty Associates Book, New York, 2015 (Original work published 2008))

  19. J. M. Bernardo and A. F. M. Smith, Bayesian theory, Wiley series in probability and mathematical statistics (Wiley, Chichester, 2000)

  20. J. B. DeBrota, C. A. Fuchs, J. L. Pienaar, and B. C. Stacey, Born’s rule as a quantum extension of Bayesian coherence, Phys. Rev. A 104, 022207 (2021), 2012.14397

  21. J. B. DeBrota and P. J. Love, Quantum and Classical Bayesian agents, Quantum 6, 713 (2022)

  22. M. Risse, What is Rational About Nash Equilibria?, Synthese 124, 361 (2000)

  23. E. Kalai and E. Lehrer, Rational Learning Leads to Nash Equilibrium, Econometrica 61, 1019 (1993)

  24. T. W. Norman, The possibility of Bayesian learning in repeated games, Games and Economic Behavior 136, 142 (2022)

  25. M. Wilde, Quantum information theory (Cambridge University Press, Cambridge; New York, 2013)

  26. C. H. Bennett, H. J. Bernstein, S. Popescu, and B. Schumacher, Concentrating partial entanglement by local operations, Physical Review A 53, 2046 (1996)

  27. D. V. Lindley, Making decisions, 2nd ed. (Wiley, 1985)

  28. C. A. Fuchs and R. Schack, Bayesian Conditioning, the Reflection Principle, and Quantum Decoherence, in Probability in Physics, edited by Y. Ben-Menahem and M. Hemmo (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012) pp. 233–247

  29. R. M. Axelrod, The evolution of cooperation, revised ed. (Basic Books, New York, 2006)

  30. W. Poundstone, Prisoner’s dilemma, 1st ed. (Anchor Books, New York, 1993)

  31. B. C. Stacey, Bohr’s Horseshoe, https://www.sunclipse.org/?p=984 (2011)

  32. A. Khrennikov, Quantum version of Aumann’s approach to common knowledge: Sufficient conditions of impossibility to agree on disagree, Journal of Mathematical Economics 60, 89 (2015)

  33. P. Contreras-Tejada, G. Scarpa, A. M. Kubicki, A. Brandenburger, and P. La Mura, Observers of quantum systems cannot agree to disagree, Nature Communications 12, 7021 (2021)

  34. M. Leifer and C. Duarte, Generalising Aumann’s Agreement Theorem (2022), arXiv:2202.02156 [quant-ph]

  35. A. Brandenburger, The Language of Game Theory: Putting Epistemics into the Mathematics of Games, World Scientific Series in Economic Theory, Vol. 5 (World Scientific, 2014)

  36. J. Y. Halpern, Lexicographic probability, conditional probability, and nonstandard probability, Games and Economic Behavior 68, 155 (2010)