Projected Variational Quantum Extragradient for Zero-Sum Games

Duong The Do; Duong Tung Nguyen; Matthew Aldridge

arxiv: 2604.16466 · v1 · submitted 2026-04-08 · 📡 eess.SY · cs.GT· cs.SY

Projected Variational Quantum Extragradient for Zero-Sum Games

Duong The Do , Matthew Aldridge , Duong Tung Nguyen This is my paper

Pith reviewed 2026-05-10 17:08 UTC · model grok-4.3

classification 📡 eess.SY cs.GTcs.SY

keywords variational quantum algorithmsextragradient methodsNash equilibriazero-sum gamesquantum optimizationstochastic gradientsmatrix gamesparameter shift rule

0 comments

The pith

A projected variational quantum extragradient method computes approximate Nash equilibria in zero-sum matrix games by optimizing parameters of quantum circuits with finite-shot gradients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a variational quantum framework that represents mixed strategies in two-player zero-sum games as Born distributions from parameterized quantum circuits. This converts the bilinear saddle-point problem into a smooth minmax optimization over circuit parameters, with payoffs evaluated as expectations of diagonal observables. A dominated embedding maps arbitrary game dimensions to power-of-two qubit sizes while preserving equilibrium structure. The authors then introduce a projected extragradient algorithm that uses stochastic gradients obtained via the parameter-shift rule and finite measurement shots, proving that gradient variance scales as O(1/S) and that the iterates converge to approximate first-order stationarity under standard assumptions.

Core claim

The central claim is that the projected variational quantum extragradient (VQEG) algorithm, driven by stochastic gradient estimates from finite shots, converges to approximate first-order stationary points in circuit-parameter space, and that these points correspond numerically to small Nash gaps when evaluated in the original game space.

What carries the argument

The projected extragradient iteration that applies stochastic gradients, obtained via the parameter-shift rule on finite-shot measurements of a diagonal observable, to the parameters of a quantum circuit whose Born distribution encodes a mixed strategy.

If this is right

Gradient variance scales linearly with the inverse number of shots, so precision can be increased by allocating more measurements without changing the circuit depth.
The dominated embedding allows the same quantum circuit ansatz to be used for games of any size by padding to the next power of two while leaving equilibria unchanged.
Convergence holds to approximate stationarity under the standard Lipschitz and smoothness assumptions on the payoff function in parameter space.
Numerical tests achieve high-precision solutions on structured 32-by-32 instances, measured by the game-space Nash gap.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding and shot-based gradient machinery could be tested on non-zero-sum or multiplayer games if the payoff can still be expressed as an expectation of a diagonal observable.
Unstructured games expose the limits of the chosen circuit ansatz, suggesting that problem-specific initialization or adaptive circuit growth may be needed for broader applicability.
Because stationarity is only a necessary condition for equilibrium, post-processing steps that explicitly minimize the Nash gap after convergence could tighten the final solution quality.

Load-bearing premise

That reaching first-order stationarity in the circuit-parameter space produces mixed strategies whose Nash gap in the original game is small.

What would settle it

A sequence of unstructured games where the algorithm reaches stationarity in parameter space yet the Nash gap in game space remains larger than a fixed positive threshold, independent of the number of shots.

Figures

Figures reproduced from arXiv: 2604.16466 by Duong The Do, Duong Tung Nguyen, Matthew Aldridge.

**Figure 1.** Figure 1: Illustration of payoff matrix for 8 × 8 zero-sum game. in the original game. A solution is considered successful if G(x, y) ≤ ε, with tolerance ε = 5×10−3 . Our circuit ansatz is constructed by which, within each layer, we apply single-qubit rotations RY and RZ on every qubit, followed by a nearestneighbor CZ entangling ring. With L layers, each player has dplayer = 2qL trainable parameters. For example, … view at source ↗

**Figure 2.** Figure 2: VQEG convergence and leakage for 8 × 8 matrix game with dominant row startegy [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

read the original abstract

We propose a projected variational quantum extragradient (VQEG) framework for computing approximate Nash equilibria in two-player zero-sum matrix games. Mixed strategies are parameterized as Born distributions of parameterized quantum circuits (PQCs), transforming the classical bilinear saddle point problem into a smooth but generally minmax optimization in circuit-parameter space. The expected payoff is expressed as the expectation of a diagonal observable, enabling gradient evaluation via the parameter shift rule and compatibility with shot based quantum hardware. To support arbitrary game sizes, we introduce a dominated embedding that maps (m,n) games to qubit-compatible power-of-two dimensions while preserving equilibrium structure. We then develop a projected extragradient method using stochastic gradient estimates derived from finite measurement shots, and establish variance bounds scaling as O(1/S) with respect to the number of measurement shots S, along with convergence to approximate first-order stationarity under standard assumptions. Since stationarity does not guarantee equilibrium optimality, we evaluate performance using the game-space Nash gap. Numerical results demonstrate high-precision solutions on structured instances up to 32x32, while highlighting challenges in unstructured settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a workable way to run extragradient on quantum-circuit parameters for small zero-sum games, with a useful embedding trick, but the link from parameter stationarity to actual Nash gaps stays numerical.

read the letter

The main contribution is a projected variational quantum extragradient method that represents mixed strategies as Born distributions from parameterized quantum circuits, turns the bilinear game into a smooth minmax problem over circuit parameters, and applies stochastic extragradient with shot-based gradients. They add a dominated embedding that pads (m,n) games to power-of-two qubit sizes while keeping the equilibrium structure intact. This lets them use the parameter-shift rule on a diagonal observable and derive variance bounds that scale as O(1/S) for S shots, plus convergence to approximate first-order stationarity under standard assumptions. Numerically they reach high precision on structured instances up to 32x32 and report the game-space Nash gap directly.

Referee Report

3 major / 2 minor

Summary. The paper proposes a projected variational quantum extragradient (VQEG) algorithm for approximating Nash equilibria in two-player zero-sum matrix games. Mixed strategies are encoded as Born-rule distributions over parameterized quantum circuits (PQCs), converting the classical bilinear saddle-point problem into a smooth but non-convex-non-concave optimization over circuit parameters. Gradients are obtained via the parameter-shift rule with finite-shot stochastic estimates; variance bounds of O(1/S) are derived, a dominated embedding maps arbitrary (m,n) games to power-of-two qubit dimensions, and projected extragradient is shown to converge to approximate first-order stationarity in parameter space under standard assumptions. Because stationarity does not imply game-space optimality, performance is assessed via the Nash gap on numerical instances up to 32×32.

Significance. If the numerical Nash-gap results hold under broader conditions, the framework supplies a concrete variational-quantum route to game solving that is compatible with near-term hardware and supplies explicit shot-noise analysis. The embedding technique and stochastic convergence bounds are technically useful contributions even if the parameter-to-strategy mapping remains many-to-one.

major comments (3)

[§4] §4 (Convergence Analysis) and the statement following Eq. (variance bound): convergence is established only to first-order stationarity of the parameter-space loss L(θ). No quantitative relation is supplied between ||∇_θ L|| and the Nash gap in the original mixed-strategy simplex; the manuscript correctly notes the distinction but then relies solely on numerical evaluation without a supporting lemma or worst-case example relating the two quantities.
[§3.2] §3.2 (Dominated Embedding): the construction maps an (m,n) game to a power-of-two dimension while claiming to preserve equilibrium structure. The proof that the projected equilibrium of the embedded game corresponds to an equilibrium of the original game (or an explicit bound on the induced gap) is not provided; a small counter-example or formal invariance argument is needed.
[Numerical Experiments] Numerical section (results up to 32×32): experiments are reported only on structured instances. The manuscript states that unstructured games are more challenging, yet supplies neither failure rates, scaling plots with problem size, nor a clear definition of “structured” versus “unstructured,” making it difficult to assess the practical scope of the method.

minor comments (2)

[§3] Notation for the projected update and the definition of the feasible set after embedding should be stated explicitly once, rather than introduced piecemeal across sections.
[Figures] Figure captions for the Nash-gap plots should include the number of shots S and the circuit depth used, to allow direct comparison with the O(1/S) variance claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. We address each major comment point by point below and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses

Referee: [§4] §4 (Convergence Analysis) and the statement following Eq. (variance bound): convergence is established only to first-order stationarity of the parameter-space loss L(θ). No quantitative relation is supplied between ||∇_θ L|| and the Nash gap in the original mixed-strategy simplex; the manuscript correctly notes the distinction but then relies solely on numerical evaluation without a supporting lemma or worst-case example relating the two quantities.

Authors: We agree that the convergence result is limited to first-order stationarity in parameter space and that no general quantitative bound relating ||∇_θ L|| to the Nash gap is derived, owing to the non-convex parameterization and many-to-one mapping. The manuscript already notes this distinction. In the revision we will add a short remark in §4 together with a simple illustrative example showing correlation between small parameter gradients and small Nash gaps, while acknowledging that a worst-case lemma remains difficult under the present assumptions. This will make the reliance on numerical evaluation more transparent. revision: partial
Referee: [§3.2] §3.2 (Dominated Embedding): the construction maps an (m,n) game to a power-of-two dimension while claiming to preserve equilibrium structure. The proof that the projected equilibrium of the embedded game corresponds to an equilibrium of the original game (or an explicit bound on the induced gap) is not provided; a small counter-example or formal invariance argument is needed.

Authors: The dominated embedding is constructed so that the extra dimensions receive zero probability under any equilibrium strategy, thereby preserving the original equilibria. We will insert a formal invariance argument and short proof into §3.2 (with details in the appendix) showing that every equilibrium of the embedded game projects to an equilibrium of the original game with zero induced gap. This will substantiate the preservation claim. revision: yes
Referee: [Numerical Experiments] Numerical section (results up to 32×32): experiments are reported only on structured instances. The manuscript states that unstructured games are more challenging, yet supplies neither failure rates, scaling plots with problem size, nor a clear definition of “structured” versus “unstructured,” making it difficult to assess the practical scope of the method.

Authors: We will add an explicit definition of structured instances (low-rank or symmetric payoff matrices) versus unstructured ones in the numerical section. We will also include scaling plots versus game size and report failure rates (instances failing to reach a prescribed Nash-gap threshold) for both classes, thereby clarifying the practical scope and limitations of the method. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained on standard primitives

full rationale

The paper parameterizes mixed strategies as Born distributions of PQCs, expresses the payoff as the expectation of a diagonal observable, and applies the parameter-shift rule for stochastic gradients before feeding them into a projected extragradient iteration. These are standard, externally validated components of variational quantum algorithms and classical saddle-point methods; none is defined in terms of the target Nash gap or fitted to it. The dominated embedding is an explicit, structure-preserving construction rather than a learned or predicted mapping. Convergence to approximate first-order stationarity is proven from variance bounds O(1/S) and standard assumptions on the stochastic oracle, without reducing to self-referential definitions or load-bearing self-citations. Numerical Nash-gap evaluation is presented separately as a practical diagnostic, explicitly acknowledging that stationarity in parameter space does not imply game-space optimality. The chain therefore contains no self-definitional, fitted-input, or self-citation reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that parameterized quantum circuits can effectively represent the required mixed strategies and that the embedding preserves the equilibrium structure, along with standard optimization assumptions. No new free parameters or invented entities are introduced.

axioms (1)

domain assumption Standard assumptions for the convergence of the projected extragradient method in the parameter space
Invoked to establish convergence to approximate first-order stationarity.

pith-pipeline@v0.9.0 · 5495 in / 1347 out tokens · 60176 ms · 2026-05-10T17:08:32.414507+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1]

A game theoretic perspective on adversarial machine learning and related cybersecurity applications,

Y . Zhou, M. Kantarcioglu, and B. Xi, “A game theoretic perspective on adversarial machine learning and related cybersecurity applications,” in Game Theory and Machine Learning for Cyber Security. Wiley, 2021, ch. 13, pp. 231–269

work page 2021
[2]

Games of GANs: Game-theoretical models for generative adversarial networks,

B. Boroomand, M. Jalali, A. Zareian, and M. H. Manshaei, “Games of GANs: Game-theoretical models for generative adversarial networks,” Artif. Intell. Rev., vol. 56, no. 9, pp. 10 263–10 310, 2023

work page 2023
[3]

Data-based discrete-time two-player zero-sum delayed game via policy iteration Q-learning method,

Z. Jiang, H. Zhang, and Y . Xiao, “Data-based discrete-time two-player zero-sum delayed game via policy iteration Q-learning method,”Neuro- computing, vol. 631, p. 129709, 2025

work page 2025
[4]

Two-player zero-sum games with bandit feedback,

E. Yılmaz and C. Dimitrakakis, “Two-player zero-sum games with bandit feedback,” inProc. 18th European Workshop on Reinforcement Learning (EWRL), 2025

work page 2025
[5]

Online minimax Q network learning for two-player zero-sum markov games,

Y . Zhu and D. Zhao, “Online minimax Q network learning for two-player zero-sum markov games,”IEEE Trans. Neural Netw. Learn. Syst., vol. 33, pp. 1228–1241, 2020

work page 2020
[6]

Binmore,Playing for Real: A Text on Game Theory

K. Binmore,Playing for Real: A Text on Game Theory. Oxford, U.K.: Oxford University Press, 2007

work page 2007
[7]

M. J. Osborne,Introduction to Game Theory. New York, NY , USA: Oxford University Press, 2004

work page 2004
[8]

Prox-method with rate of convergenceO(1/t)for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems,

A. Nemirovski, “Prox-method with rate of convergenceO(1/t)for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems,”SIAM J. Optim., vol. 15, no. 1, pp. 229–251, 2004

work page 2004
[9]

Last-iterate convergence: Zero-sum games and constrained min-max optimization,

C. Daskalakis and I. Panageas, “Last-iterate convergence: Zero-sum games and constrained min-max optimization,” 2025

work page 2025
[10]

A unified analysis of extra- gradient and optimistic gradient methods for saddle point problems: Proximal point approach,

A. Mokhtari, A. Ozdaglar, and S. Pattathil, “A unified analysis of extra- gradient and optimistic gradient methods for saddle point problems: Proximal point approach,” inProc. 23rd Int. Conf. Artificial Intelligence and Statistics (AISTATS), vol. 108, 2020, pp. 1497–1507

work page 2020
[11]

Fast policy extragradient methods for competitive games with entropy regularization,

S. Cen, Y . Wei, and Y . Chi, “Fast policy extragradient methods for competitive games with entropy regularization,”J. Mach. Learn. Res., vol. 25, no. 4, pp. 1–48, 2024

work page 2024
[12]

Last-iterate convergence separation between extra-gradient and optimism in constrained periodic games,

Y . Feng, P. Li, I. Panageas, and X. Wang, “Last-iterate convergence separation between extra-gradient and optimism in constrained periodic games,” inProc. 40th Conf. Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, 2024

work page 2024
[13]

Last- iterate convergence with full and noisy feedback in two-player zero- sum games,

K. Abe, K. Ariu, M. Sakamoto, K. Toyoshima, and A. Iwasaki, “Last- iterate convergence with full and noisy feedback in two-player zero- sum games,” inProc. 26th Int. Conf. Artificial Intelligence and Statistics (AISTATS), vol. 206, 2023, pp. 7999–8028

work page 2023
[14]

Last-iterate convergence properties of regret-matching algorithms in games,

Y . Cai, G. Farina, J. Grand-Cl ´ement, C. Kroer, C.-W. Lee, H. Luo, and W. Zheng, “Last-iterate convergence properties of regret-matching algorithms in games,” inProc. Int. Conf. Learning Representations (ICLR), 2025

work page 2025
[15]

Last-iterate convergence for generalized frank-wolfe in monotone variational inequalities,

Z. Chen and E. Mazumdar, “Last-iterate convergence for generalized frank-wolfe in monotone variational inequalities,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2024

work page 2024
[16]

A descent-based method on the duality gap for solving zero-sum games,

M. Fasoulakis, E. Markakis, G. Roussakis, and C. Santorinaios, “A descent-based method on the duality gap for solving zero-sum games,” in Proc. Int. Joint Conf. Artificial Intelligence (IJCAI), Montreal, Canada, 2025

work page 2025
[17]

General parameter- shift rules for quantum gradients,

D. Wierichs, J. Izaac, C. Wang, and C. Y .-Y . Lin, “General parameter- shift rules for quantum gradients,”Quantum, vol. 6, p. 677, 2022

work page 2022
[18]

Variational quantum extragradient for zero-sum matrix games,

D. T. Do and D. T. Nguyen, “Variational quantum extragradient for zero-sum matrix games,” Arizona State University, Tech. Rep., 2026. [Online]. Available: https://tinyurl.com/ycb8kne5

work page 2026

[1] [1]

A game theoretic perspective on adversarial machine learning and related cybersecurity applications,

Y . Zhou, M. Kantarcioglu, and B. Xi, “A game theoretic perspective on adversarial machine learning and related cybersecurity applications,” in Game Theory and Machine Learning for Cyber Security. Wiley, 2021, ch. 13, pp. 231–269

work page 2021

[2] [2]

Games of GANs: Game-theoretical models for generative adversarial networks,

B. Boroomand, M. Jalali, A. Zareian, and M. H. Manshaei, “Games of GANs: Game-theoretical models for generative adversarial networks,” Artif. Intell. Rev., vol. 56, no. 9, pp. 10 263–10 310, 2023

work page 2023

[3] [3]

Data-based discrete-time two-player zero-sum delayed game via policy iteration Q-learning method,

Z. Jiang, H. Zhang, and Y . Xiao, “Data-based discrete-time two-player zero-sum delayed game via policy iteration Q-learning method,”Neuro- computing, vol. 631, p. 129709, 2025

work page 2025

[4] [4]

Two-player zero-sum games with bandit feedback,

E. Yılmaz and C. Dimitrakakis, “Two-player zero-sum games with bandit feedback,” inProc. 18th European Workshop on Reinforcement Learning (EWRL), 2025

work page 2025

[5] [5]

Online minimax Q network learning for two-player zero-sum markov games,

Y . Zhu and D. Zhao, “Online minimax Q network learning for two-player zero-sum markov games,”IEEE Trans. Neural Netw. Learn. Syst., vol. 33, pp. 1228–1241, 2020

work page 2020

[6] [6]

Binmore,Playing for Real: A Text on Game Theory

K. Binmore,Playing for Real: A Text on Game Theory. Oxford, U.K.: Oxford University Press, 2007

work page 2007

[7] [7]

M. J. Osborne,Introduction to Game Theory. New York, NY , USA: Oxford University Press, 2004

work page 2004

[8] [8]

Prox-method with rate of convergenceO(1/t)for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems,

A. Nemirovski, “Prox-method with rate of convergenceO(1/t)for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems,”SIAM J. Optim., vol. 15, no. 1, pp. 229–251, 2004

work page 2004

[9] [9]

Last-iterate convergence: Zero-sum games and constrained min-max optimization,

C. Daskalakis and I. Panageas, “Last-iterate convergence: Zero-sum games and constrained min-max optimization,” 2025

work page 2025

[10] [10]

A unified analysis of extra- gradient and optimistic gradient methods for saddle point problems: Proximal point approach,

A. Mokhtari, A. Ozdaglar, and S. Pattathil, “A unified analysis of extra- gradient and optimistic gradient methods for saddle point problems: Proximal point approach,” inProc. 23rd Int. Conf. Artificial Intelligence and Statistics (AISTATS), vol. 108, 2020, pp. 1497–1507

work page 2020

[11] [11]

Fast policy extragradient methods for competitive games with entropy regularization,

S. Cen, Y . Wei, and Y . Chi, “Fast policy extragradient methods for competitive games with entropy regularization,”J. Mach. Learn. Res., vol. 25, no. 4, pp. 1–48, 2024

work page 2024

[12] [12]

Last-iterate convergence separation between extra-gradient and optimism in constrained periodic games,

Y . Feng, P. Li, I. Panageas, and X. Wang, “Last-iterate convergence separation between extra-gradient and optimism in constrained periodic games,” inProc. 40th Conf. Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, 2024

work page 2024

[13] [13]

Last- iterate convergence with full and noisy feedback in two-player zero- sum games,

K. Abe, K. Ariu, M. Sakamoto, K. Toyoshima, and A. Iwasaki, “Last- iterate convergence with full and noisy feedback in two-player zero- sum games,” inProc. 26th Int. Conf. Artificial Intelligence and Statistics (AISTATS), vol. 206, 2023, pp. 7999–8028

work page 2023

[14] [14]

Last-iterate convergence properties of regret-matching algorithms in games,

Y . Cai, G. Farina, J. Grand-Cl ´ement, C. Kroer, C.-W. Lee, H. Luo, and W. Zheng, “Last-iterate convergence properties of regret-matching algorithms in games,” inProc. Int. Conf. Learning Representations (ICLR), 2025

work page 2025

[15] [15]

Last-iterate convergence for generalized frank-wolfe in monotone variational inequalities,

Z. Chen and E. Mazumdar, “Last-iterate convergence for generalized frank-wolfe in monotone variational inequalities,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2024

work page 2024

[16] [16]

A descent-based method on the duality gap for solving zero-sum games,

M. Fasoulakis, E. Markakis, G. Roussakis, and C. Santorinaios, “A descent-based method on the duality gap for solving zero-sum games,” in Proc. Int. Joint Conf. Artificial Intelligence (IJCAI), Montreal, Canada, 2025

work page 2025

[17] [17]

General parameter- shift rules for quantum gradients,

D. Wierichs, J. Izaac, C. Wang, and C. Y .-Y . Lin, “General parameter- shift rules for quantum gradients,”Quantum, vol. 6, p. 677, 2022

work page 2022

[18] [18]

Variational quantum extragradient for zero-sum matrix games,

D. T. Do and D. T. Nguyen, “Variational quantum extragradient for zero-sum matrix games,” Arizona State University, Tech. Rep., 2026. [Online]. Available: https://tinyurl.com/ycb8kne5

work page 2026