Coherent Swap Regret and Channel-Proof Learning
Pith reviewed 2026-06-28 14:25 UTC · model grok-4.3
The pith
Coherent swap regret against local CPTP maps in quantum games is set by non-unital measurement-and-preparation channels at rate Ω(√(dT log d)).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Coherent swap regret measures performance against all local CPTP maps applied to the received or prepared state. Entropic mirror ascent on the CPTP Choi slice together with a fixed-point play rule attains O(√(dT log d)) coherent swap regret. Replacement channels recover the classical Θ(√(T log d)) external-regret rate, every unital channel has minimax regret zero, and deterministic measurement-and-preparation channels already require Ω(√(dT log d)) regret in the moderate-horizon regime; this same rate is tight for the entire class of CPTP maps.
What carries the argument
entropic mirror ascent on the CPTP Choi slice with a fixed-point play rule, which produces the O(√(dT log d)) upper bound on coherent swap regret against arbitrary local CPTP maps.
If this is right
- Replacement channels recover ordinary external regret at rate Θ(√(T log d)).
- Unital channels, including unitary deviations and mixtures of unitaries, have zero minimax regret.
- Deterministic measurement-and-preparation channels force Ω(√(dT log d)) regret, and this rate suffices for all CPTP deviations.
- Decentralized full-information learning reaches an ε-approximate separable quantum correlated equilibrium after T = O(max_i d_i log d_i / ε²) rounds.
- An SDP audit certifies local CPTP exploitability of mediated quantum recommendation protocols for arbitrary finite-dimensional states.
Where Pith is reading between the lines
- The clean separation between unital and non-unital maps indicates that preserving the maximally mixed state is what makes a deviation easy to defend against.
- The probing-bandit extension with pseudo-regret O(d^{4/3} T^{2/3} (log d)^{1/3}) under Haar-random probes may extend to other quantum online-learning settings that use random pure-state measurements.
- Identifying equilibria with channel-proofness of mediated protocols suggests that experimental verification of approximate quantum correlated equilibria could be performed by checking SDP conditions on observed statistics alone.
Load-bearing premise
The fixed-point play rule combined with entropic mirror ascent on the CPTP Choi slice produces the claimed O(√(dT log d)) bound against arbitrary local CPTP maps.
What would settle it
A calculation or finite-horizon simulation showing that regret against some deterministic measurement-and-preparation channel exceeds the Ω(√(dT log d)) lower bound, or that the stated algorithm fails to keep regret O(√(dT log d)) against a general local CPTP map.
read the original abstract
External regret certifies stability only against replacing one's behavior by a fixed alternative. In a quantum game, this misses a natural physical move: a player can apply a local completely positive trace-preserving (CPTP) map to the state it actually received or prepared. We introduce coherent swap regret as the regret benchmark against all such local CPTP deviations, and give an algorithm achieving $O(\sqrt{dT\log d})$ coherent swap regret via entropic mirror ascent on the CPTP Choi slice with a fixed-point play rule. The main result is a three-level deviation-class landscape. Replacement channels recover ordinary external regret at rate $\Theta(\sqrt{T\log d})$. Unital channels, including unitary deviations and mixtures of unitaries, have zero minimax regret. Deterministic measurement-and-preparation channels already force $\Omega(\sqrt{dT\log d})$ regret in the moderate-horizon regime, and this rate is also sufficient for all CPTP deviations. Thus the hardness comes from non-unital use of the recommendation register, not from quantum coherence alone. As an application, decentralized full-information learning in finite quantum games reaches an $\varepsilon$-approximate separable quantum correlated equilibrium after $T=O(\max_i d_i\log d_i/\varepsilon^2)$ rounds. We identify these equilibria with channel-proofness of mediated quantum recommendation protocols, give an SDP audit for local CPTP exploitability applicable to arbitrary finite-dimensional states, and include a probing-bandit extension with pseudo-regret $O(d^{4/3}T^{2/3}(\log d)^{1/3})$ under Haar-random pure-state probes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces coherent swap regret as the benchmark against local CPTP deviations in quantum games (beyond ordinary external regret against fixed replacements). It claims a three-level deviation landscape: replacement channels recover Θ(√(T log d)) external regret; unital channels (including unitaries) admit zero minimax regret; deterministic measurement-and-preparation channels force Ω(√(dT log d)) regret in the moderate-horizon regime, and this rate is also sufficient against arbitrary local CPTP maps. The upper bound is achieved by entropic mirror ascent on the CPTP Choi slice together with a fixed-point play rule. Applications include decentralized convergence to ε-approximate separable quantum correlated equilibrium in T = O(max_i d_i log d_i / ε²) rounds, an SDP audit for local CPTP exploitability, and a probing-bandit extension with pseudo-regret O(d^{4/3} T^{2/3} (log d)^{1/3}).
Significance. If the claimed rates and algorithm are correct, the work would clarify that hardness in quantum online learning arises from non-unital use of the recommendation register rather than from coherence itself, supplying a structured taxonomy of deviation classes and a concrete link between regret minimization and channel-proof mediated protocols. The SDP audit and equilibrium application could be practically useful for verifying exploitability in finite-dimensional quantum games.
major comments (1)
- [Abstract] Abstract (and main result statement): the sufficiency claim that entropic mirror ascent on the CPTP Choi slice plus the fixed-point play rule yields O(√(dT log d)) coherent swap regret against arbitrary local CPTP maps is asserted without exhibiting the potential function, the regret decomposition, or the handling of the non-unital recommendation-register constraint. Consequently the analysis cannot be verified to close or to be free of hidden horizon or unitality dependence.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for identifying an opportunity to improve the clarity and verifiability of the main result statement. We address the comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Abstract] Abstract (and main result statement): the sufficiency claim that entropic mirror ascent on the CPTP Choi slice plus the fixed-point play rule yields O(√(dT log d)) coherent swap regret against arbitrary local CPTP maps is asserted without exhibiting the potential function, the regret decomposition, or the handling of the non-unital recommendation-register constraint. Consequently the analysis cannot be verified to close or to be free of hidden horizon or unitality dependence.
Authors: The abstract is a concise summary; the full analysis appears in the body. The potential function is the negative von Neumann entropy on the Choi operator of the CPTP map (Section 3). The regret decomposition proceeds by applying the standard entropic mirror-descent inequality on the CPTP slice and using the fixed-point play rule to absorb the non-unital action on the recommendation register without introducing extra T-factors or unitality assumptions (proof of Theorem 3.1). The resulting bound is O(√(dT log d)) with no hidden horizon dependence. We agree the abstract should better signpost these elements and will add a one-sentence outline of the key steps together with a pointer to the relevant theorem and section. revision: yes
Circularity Check
No circularity; bound asserted without exhibited derivation steps or self-referential reductions
full rationale
The abstract states that an algorithm achieves O(√(dT log d)) coherent swap regret via entropic mirror ascent on the CPTP Choi slice with a fixed-point play rule, and that this rate is sufficient for all CPTP deviations while deterministic measurement-and-preparation channels force Ω(√(dT log d)). No equations, potential functions, regret decompositions, or self-citations appear in the provided text that would reduce these claims to inputs by construction (e.g., no fitted parameters renamed as predictions, no uniqueness theorems imported from prior author work, and no ansatz smuggled via citation). The three-level landscape is presented as derived from channel classes without any load-bearing step that collapses to a self-definition or fitted input. This matches the default expectation of no significant circularity when no reduction can be quoted.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Local CPTP maps are the natural physical deviations a player can apply to its received or prepared state.
- standard math Finite-dimensional quantum systems with dimension d.
Reference graph
Works this paper leans on
-
[1]
Online learning of quantum states
Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak. Online learning of quantum states. Advances in Neural Information Processing Systems, 31, 2018.https://arxiv.org/abs/1802.09025
arXiv 2018
-
[2]
Robert J. Aumann. Subjectivity and correlation in randomized strategies.Journal of Mathematical Economics 1(1):67–96, 1974.https://doi.org/10.1016/0304-4068(74)90037-8
-
[3]
Akshay Bansal, Ian George, Soumik Ghosh, Jamie Sikora, and Alice Zheng. Online learning of a panoply of quantum objects.Quantum Machine Intelligence7, article 78, 2025.https://arxiv.org/abs/2406.04245
arXiv 2025
-
[4]
From external to internal regret.Journal of Machine Learning Research 8:1307–1324, 2007.https://www.jmlr.org/papers/v8/blum07a.html
Avrim Blum and Yishay Mansour. From external to internal regret.Journal of Machine Learning Research 8:1307–1324, 2007.https://www.jmlr.org/papers/v8/blum07a.html
2007
-
[5]
John Bostanci and John Watrous. Quantum game theory and the complexity of approximating quantum Nash equilibria.Quantum6:882, 2022.https://doi.org/10.22331/q-2022-12-22-882
-
[6]
Eric A. Carlen. Trace inequalities and quantum entropy: an introductory course. InEntropy and the Quantum, Contemporary Mathematics 529, pages 73–140. American Mathematical Society, 2010.https://doi.org/10. 1090/conm/529/10428
2010
-
[7]
Quantum correlated equilibria in classical complete information games
Alan Deckelbaum. Quantum correlated equilibria in classical complete information games. arXiv:1101.3380, 2011.https://arxiv.org/abs/1101.3380
Pith/arXiv arXiv 2011
-
[8]
From external to swap regret 2.0: An efficient reduction for large action spaces
Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. From external to swap regret 2.0: An efficient reduction for large action spaces. InProceedings of the 56th Annual ACM Symposium on Theory of Computing, 2024.https://arxiv.org/abs/2310.19786
arXiv 2024
-
[9]
Instance-optimal matrix multiplicative weight update and its quantum applications
Weiyuan Gong, Tongyang Li, Xinzhao Wang, and Zhiyu Zhang. Instance-optimal matrix multiplicative weight update and its quantum applications. arXiv:2509.08911, 2025.https://arxiv.org/abs/2509.08911
arXiv 2025
-
[10]
Gordon, Amy Greenwald, and Casey Marks
Geoffrey J. Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games. InProceedings of the 25th International Conference on Machine Learning, pages 360–367, 2008.https://www.cs.cmu.edu/ ~ggordon/gordon-greenwald-marks-icml-phi-regret.pdf
2008
-
[11]
Toward a general theory of quantum games
Gus Gutoski and John Watrous. Toward a general theory of quantum games. InProceedings of the 39th ACM Symposium on Theory of Computing, pages 565–574, 2007.https://arxiv.org/abs/quant-ph/0611234
Pith/arXiv arXiv 2007
-
[12]
Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium.Econo- metrica68(5):1127–1150, 2000.https://doi.org/10.1111/1468-0262.00153
-
[13]
A tight lower bound and efficient reduction for swap regret
Shinji Ito. A tight lower bound and efficient reduction for swap regret. InAdvances in Neural Infor- mation Processing Systems, 33:18550–18559, 2020.https://proceedings.neurips.cc/paper/2020/hash/ d79c8788088c2193f0244d8f1f36d2db-Abstract.html
2020
-
[14]
Parallel approximation of non-interactive zero-sum quantum games
Rahul Jain and John Watrous. Parallel approximation of non-interactive zero-sum quantum games. In Proceedings of the 24th IEEE Conference on Computational Complexity, pages 243–253, 2009.https: //arxiv.org/abs/0808.2775
Pith/arXiv arXiv 2009
-
[15]
Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos, and Jose Blanchet. Payoff-based learning with matrix multiplicative weights in quantum games.Advances in Neural Information Processing Systems36, 2023.https://arxiv.org/abs/2311.02423
arXiv 2023
-
[16]
Wayne Lin, Georgios Piliouras, Ryann Sim, and Antonios Varvitsiotis. No-regret learning and equilibrium computation in quantum games.Quantum8:1569, 2024.https://doi.org/10.22331/q-2024-12-17-1569
-
[17]
Caro, Jens Eisert, and Sumeet Khatri
Asad Raza, Matthias C. Caro, Jens Eisert, and Sumeet Khatri. Online learning of quantum processes. arXiv:2406.04250, 2024.https://arxiv.org/abs/2406.04250
arXiv 2024
-
[18]
Zhaohui Wei and Shengyu Zhang. Full characterization of quantum correlated equilibria.Quantum Informa- tion and Computation13(9–10):846–860, 2013.https://arxiv.org/abs/1105.5353
Pith/arXiv arXiv 2013
-
[19]
Zhaohui Wei and Shengyu Zhang. Quantum game players can have advantage without discord.Information and Computation256:174–181, 2017.https://arxiv.org/abs/1502.00207
Pith/arXiv arXiv 2017
-
[20]
Shengyu Zhang. Quantum strategic game theory. InProceedings of the 3rd Innovations in Theoretical Com- puter Science Conference, pages 39–59, 2012.https://arxiv.org/abs/1012.5141
Pith/arXiv arXiv 2012
-
[21]
Francisca Vasconcelos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Pil- iouras, and Michael I. Jordan. A quadratic speedup in finding Nash equilibria of quantum zero-sum games. Quantum9:1737, 2025.https://doi.org/10.22331/q-2025-05-06-1737. COHERENT SWAP REGRET AND CHANNEL-PROOF LEARNING 23
-
[22]
Cambridge University Press, 2018
John Watrous.The Theory of Quantum Information. Cambridge University Press, 2018. Author’s draft: https://cs.uwaterloo.ca/~watrous/TQI/TQI.pdf
2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.