pith. sign in

arxiv: 2606.02655 · v1 · pith:E4PWNVEDnew · submitted 2026-06-01 · 🪐 quant-ph · cs.GT· cs.LG· math.OC

Coherent Swap Regret and Channel-Proof Learning

Pith reviewed 2026-06-28 14:25 UTC · model grok-4.3

classification 🪐 quant-ph cs.GTcs.LGmath.OC
keywords coherent swap regretCPTP mapsquantum gamesregret minimizationquantum correlated equilibriumchannel deviationsmirror ascent
0
0 comments X

The pith

Coherent swap regret against local CPTP maps in quantum games is set by non-unital measurement-and-preparation channels at rate Ω(√(dT log d)).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that external regret fails to capture a natural physical deviation in quantum games: a player can apply any local completely positive trace-preserving map to the state it actually received. Coherent swap regret is introduced as the appropriate benchmark against all such maps. An algorithm based on entropic mirror ascent over the CPTP Choi slice with a fixed-point play rule achieves O(√(dT log d)) regret. The analysis partitions deviations into three classes whose minimax rates differ sharply: replacement channels recover ordinary external regret, unital channels including all unitaries admit zero regret, and deterministic measurement-and-preparation channels already force the full Ω(√(dT log d)) lower bound that also upper-bounds every CPTP map. The result implies that hardness stems from non-unital use of the recommendation register rather than from coherence.

Core claim

Coherent swap regret measures performance against all local CPTP maps applied to the received or prepared state. Entropic mirror ascent on the CPTP Choi slice together with a fixed-point play rule attains O(√(dT log d)) coherent swap regret. Replacement channels recover the classical Θ(√(T log d)) external-regret rate, every unital channel has minimax regret zero, and deterministic measurement-and-preparation channels already require Ω(√(dT log d)) regret in the moderate-horizon regime; this same rate is tight for the entire class of CPTP maps.

What carries the argument

entropic mirror ascent on the CPTP Choi slice with a fixed-point play rule, which produces the O(√(dT log d)) upper bound on coherent swap regret against arbitrary local CPTP maps.

If this is right

  • Replacement channels recover ordinary external regret at rate Θ(√(T log d)).
  • Unital channels, including unitary deviations and mixtures of unitaries, have zero minimax regret.
  • Deterministic measurement-and-preparation channels force Ω(√(dT log d)) regret, and this rate suffices for all CPTP deviations.
  • Decentralized full-information learning reaches an ε-approximate separable quantum correlated equilibrium after T = O(max_i d_i log d_i / ε²) rounds.
  • An SDP audit certifies local CPTP exploitability of mediated quantum recommendation protocols for arbitrary finite-dimensional states.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The clean separation between unital and non-unital maps indicates that preserving the maximally mixed state is what makes a deviation easy to defend against.
  • The probing-bandit extension with pseudo-regret O(d^{4/3} T^{2/3} (log d)^{1/3}) under Haar-random probes may extend to other quantum online-learning settings that use random pure-state measurements.
  • Identifying equilibria with channel-proofness of mediated protocols suggests that experimental verification of approximate quantum correlated equilibria could be performed by checking SDP conditions on observed statistics alone.

Load-bearing premise

The fixed-point play rule combined with entropic mirror ascent on the CPTP Choi slice produces the claimed O(√(dT log d)) bound against arbitrary local CPTP maps.

What would settle it

A calculation or finite-horizon simulation showing that regret against some deterministic measurement-and-preparation channel exceeds the Ω(√(dT log d)) lower bound, or that the stated algorithm fails to keep regret O(√(dT log d)) against a general local CPTP map.

read the original abstract

External regret certifies stability only against replacing one's behavior by a fixed alternative. In a quantum game, this misses a natural physical move: a player can apply a local completely positive trace-preserving (CPTP) map to the state it actually received or prepared. We introduce coherent swap regret as the regret benchmark against all such local CPTP deviations, and give an algorithm achieving $O(\sqrt{dT\log d})$ coherent swap regret via entropic mirror ascent on the CPTP Choi slice with a fixed-point play rule. The main result is a three-level deviation-class landscape. Replacement channels recover ordinary external regret at rate $\Theta(\sqrt{T\log d})$. Unital channels, including unitary deviations and mixtures of unitaries, have zero minimax regret. Deterministic measurement-and-preparation channels already force $\Omega(\sqrt{dT\log d})$ regret in the moderate-horizon regime, and this rate is also sufficient for all CPTP deviations. Thus the hardness comes from non-unital use of the recommendation register, not from quantum coherence alone. As an application, decentralized full-information learning in finite quantum games reaches an $\varepsilon$-approximate separable quantum correlated equilibrium after $T=O(\max_i d_i\log d_i/\varepsilon^2)$ rounds. We identify these equilibria with channel-proofness of mediated quantum recommendation protocols, give an SDP audit for local CPTP exploitability applicable to arbitrary finite-dimensional states, and include a probing-bandit extension with pseudo-regret $O(d^{4/3}T^{2/3}(\log d)^{1/3})$ under Haar-random pure-state probes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces coherent swap regret as the benchmark against local CPTP deviations in quantum games (beyond ordinary external regret against fixed replacements). It claims a three-level deviation landscape: replacement channels recover Θ(√(T log d)) external regret; unital channels (including unitaries) admit zero minimax regret; deterministic measurement-and-preparation channels force Ω(√(dT log d)) regret in the moderate-horizon regime, and this rate is also sufficient against arbitrary local CPTP maps. The upper bound is achieved by entropic mirror ascent on the CPTP Choi slice together with a fixed-point play rule. Applications include decentralized convergence to ε-approximate separable quantum correlated equilibrium in T = O(max_i d_i log d_i / ε²) rounds, an SDP audit for local CPTP exploitability, and a probing-bandit extension with pseudo-regret O(d^{4/3} T^{2/3} (log d)^{1/3}).

Significance. If the claimed rates and algorithm are correct, the work would clarify that hardness in quantum online learning arises from non-unital use of the recommendation register rather than from coherence itself, supplying a structured taxonomy of deviation classes and a concrete link between regret minimization and channel-proof mediated protocols. The SDP audit and equilibrium application could be practically useful for verifying exploitability in finite-dimensional quantum games.

major comments (1)
  1. [Abstract] Abstract (and main result statement): the sufficiency claim that entropic mirror ascent on the CPTP Choi slice plus the fixed-point play rule yields O(√(dT log d)) coherent swap regret against arbitrary local CPTP maps is asserted without exhibiting the potential function, the regret decomposition, or the handling of the non-unital recommendation-register constraint. Consequently the analysis cannot be verified to close or to be free of hidden horizon or unitality dependence.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for identifying an opportunity to improve the clarity and verifiability of the main result statement. We address the comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract (and main result statement): the sufficiency claim that entropic mirror ascent on the CPTP Choi slice plus the fixed-point play rule yields O(√(dT log d)) coherent swap regret against arbitrary local CPTP maps is asserted without exhibiting the potential function, the regret decomposition, or the handling of the non-unital recommendation-register constraint. Consequently the analysis cannot be verified to close or to be free of hidden horizon or unitality dependence.

    Authors: The abstract is a concise summary; the full analysis appears in the body. The potential function is the negative von Neumann entropy on the Choi operator of the CPTP map (Section 3). The regret decomposition proceeds by applying the standard entropic mirror-descent inequality on the CPTP slice and using the fixed-point play rule to absorb the non-unital action on the recommendation register without introducing extra T-factors or unitality assumptions (proof of Theorem 3.1). The resulting bound is O(√(dT log d)) with no hidden horizon dependence. We agree the abstract should better signpost these elements and will add a one-sentence outline of the key steps together with a pointer to the relevant theorem and section. revision: yes

Circularity Check

0 steps flagged

No circularity; bound asserted without exhibited derivation steps or self-referential reductions

full rationale

The abstract states that an algorithm achieves O(√(dT log d)) coherent swap regret via entropic mirror ascent on the CPTP Choi slice with a fixed-point play rule, and that this rate is sufficient for all CPTP deviations while deterministic measurement-and-preparation channels force Ω(√(dT log d)). No equations, potential functions, regret decompositions, or self-citations appear in the provided text that would reduce these claims to inputs by construction (e.g., no fitted parameters renamed as predictions, no uniqueness theorems imported from prior author work, and no ansatz smuggled via citation). The three-level landscape is presented as derived from channel classes without any load-bearing step that collapses to a self-definition or fitted input. This matches the default expectation of no significant circularity when no reduction can be quoted.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only abstract available; ledger populated from stated concepts. No explicit free parameters or invented entities listed. Relies on standard quantum information axioms (CPTP maps, Choi representation) and game-theoretic regret definitions.

axioms (2)
  • domain assumption Local CPTP maps are the natural physical deviations a player can apply to its received or prepared state.
    Abstract opens with this as the motivation for coherent swap regret.
  • standard math Finite-dimensional quantum systems with dimension d.
    Bounds and rates are stated in terms of d.

pith-pipeline@v0.9.1-grok · 5822 in / 1386 out tokens · 23084 ms · 2026-06-28T14:25:27.942122+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 5 canonical work pages

  1. [1]

    Online learning of quantum states

    Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak. Online learning of quantum states. Advances in Neural Information Processing Systems, 31, 2018.https://arxiv.org/abs/1802.09025

  2. [2]

    Robert J. Aumann. Subjectivity and correlation in randomized strategies.Journal of Mathematical Economics 1(1):67–96, 1974.https://doi.org/10.1016/0304-4068(74)90037-8

  3. [3]

    Online learning of a panoply of quantum objects.Quantum Machine Intelligence7, article 78, 2025.https://arxiv.org/abs/2406.04245

    Akshay Bansal, Ian George, Soumik Ghosh, Jamie Sikora, and Alice Zheng. Online learning of a panoply of quantum objects.Quantum Machine Intelligence7, article 78, 2025.https://arxiv.org/abs/2406.04245

  4. [4]

    From external to internal regret.Journal of Machine Learning Research 8:1307–1324, 2007.https://www.jmlr.org/papers/v8/blum07a.html

    Avrim Blum and Yishay Mansour. From external to internal regret.Journal of Machine Learning Research 8:1307–1324, 2007.https://www.jmlr.org/papers/v8/blum07a.html

  5. [5]

    Quantum game theory and the complexity of approximating quantum Nash equilibria.Quantum6:882, 2022.https://doi.org/10.22331/q-2022-12-22-882

    John Bostanci and John Watrous. Quantum game theory and the complexity of approximating quantum Nash equilibria.Quantum6:882, 2022.https://doi.org/10.22331/q-2022-12-22-882

  6. [6]

    Eric A. Carlen. Trace inequalities and quantum entropy: an introductory course. InEntropy and the Quantum, Contemporary Mathematics 529, pages 73–140. American Mathematical Society, 2010.https://doi.org/10. 1090/conm/529/10428

  7. [7]

    Quantum correlated equilibria in classical complete information games

    Alan Deckelbaum. Quantum correlated equilibria in classical complete information games. arXiv:1101.3380, 2011.https://arxiv.org/abs/1101.3380

  8. [8]

    From external to swap regret 2.0: An efficient reduction for large action spaces

    Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. From external to swap regret 2.0: An efficient reduction for large action spaces. InProceedings of the 56th Annual ACM Symposium on Theory of Computing, 2024.https://arxiv.org/abs/2310.19786

  9. [9]

    Instance-optimal matrix multiplicative weight update and its quantum applications

    Weiyuan Gong, Tongyang Li, Xinzhao Wang, and Zhiyu Zhang. Instance-optimal matrix multiplicative weight update and its quantum applications. arXiv:2509.08911, 2025.https://arxiv.org/abs/2509.08911

  10. [10]

    Gordon, Amy Greenwald, and Casey Marks

    Geoffrey J. Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games. InProceedings of the 25th International Conference on Machine Learning, pages 360–367, 2008.https://www.cs.cmu.edu/ ~ggordon/gordon-greenwald-marks-icml-phi-regret.pdf

  11. [11]

    Toward a general theory of quantum games

    Gus Gutoski and John Watrous. Toward a general theory of quantum games. InProceedings of the 39th ACM Symposium on Theory of Computing, pages 565–574, 2007.https://arxiv.org/abs/quant-ph/0611234

  12. [12]

    A simple adaptive procedure leading to correlated equilibrium.Econo- metrica68(5):1127–1150, 2000.https://doi.org/10.1111/1468-0262.00153

    Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium.Econo- metrica68(5):1127–1150, 2000.https://doi.org/10.1111/1468-0262.00153

  13. [13]

    A tight lower bound and efficient reduction for swap regret

    Shinji Ito. A tight lower bound and efficient reduction for swap regret. InAdvances in Neural Infor- mation Processing Systems, 33:18550–18559, 2020.https://proceedings.neurips.cc/paper/2020/hash/ d79c8788088c2193f0244d8f1f36d2db-Abstract.html

  14. [14]

    Parallel approximation of non-interactive zero-sum quantum games

    Rahul Jain and John Watrous. Parallel approximation of non-interactive zero-sum quantum games. In Proceedings of the 24th IEEE Conference on Computational Complexity, pages 243–253, 2009.https: //arxiv.org/abs/0808.2775

  15. [15]

    Payoff-based learning with matrix multiplicative weights in quantum games.Advances in Neural Information Processing Systems36, 2023.https://arxiv.org/abs/2311.02423

    Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos, and Jose Blanchet. Payoff-based learning with matrix multiplicative weights in quantum games.Advances in Neural Information Processing Systems36, 2023.https://arxiv.org/abs/2311.02423

  16. [16]

    No-regret learning and equilibrium computation in quantum games.Quantum8:1569, 2024.https://doi.org/10.22331/q-2024-12-17-1569

    Wayne Lin, Georgios Piliouras, Ryann Sim, and Antonios Varvitsiotis. No-regret learning and equilibrium computation in quantum games.Quantum8:1569, 2024.https://doi.org/10.22331/q-2024-12-17-1569

  17. [17]

    Caro, Jens Eisert, and Sumeet Khatri

    Asad Raza, Matthias C. Caro, Jens Eisert, and Sumeet Khatri. Online learning of quantum processes. arXiv:2406.04250, 2024.https://arxiv.org/abs/2406.04250

  18. [18]

    Full characterization of quantum correlated equilibria.Quantum Informa- tion and Computation13(9–10):846–860, 2013.https://arxiv.org/abs/1105.5353

    Zhaohui Wei and Shengyu Zhang. Full characterization of quantum correlated equilibria.Quantum Informa- tion and Computation13(9–10):846–860, 2013.https://arxiv.org/abs/1105.5353

  19. [19]

    Quantum game players can have advantage without discord.Information and Computation256:174–181, 2017.https://arxiv.org/abs/1502.00207

    Zhaohui Wei and Shengyu Zhang. Quantum game players can have advantage without discord.Information and Computation256:174–181, 2017.https://arxiv.org/abs/1502.00207

  20. [20]

    Quantum strategic game theory

    Shengyu Zhang. Quantum strategic game theory. InProceedings of the 3rd Innovations in Theoretical Com- puter Science Conference, pages 39–59, 2012.https://arxiv.org/abs/1012.5141

  21. [21]

    Francisca Vasconcelos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Pil- iouras, and Michael I. Jordan. A quadratic speedup in finding Nash equilibria of quantum zero-sum games. Quantum9:1737, 2025.https://doi.org/10.22331/q-2025-05-06-1737. COHERENT SWAP REGRET AND CHANNEL-PROOF LEARNING 23

  22. [22]

    Cambridge University Press, 2018

    John Watrous.The Theory of Quantum Information. Cambridge University Press, 2018. Author’s draft: https://cs.uwaterloo.ca/~watrous/TQI/TQI.pdf