Swap Regret Minimization Through Response-Based Approachability
Pith reviewed 2026-05-16 06:29 UTC · model grok-4.3
The pith
A simpler algorithm achieves O(d^{3/2} √T) linear swap regret over general convex sets via response-based approachability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a simpler algorithm for linear swap regret minimization that guarantees O(d^{3/2} √T) regret for general convex sets and O(d √T) regret when the set is centrally symmetric. The method relies on the response-based approachability framework combined with John ellipsoid preconditioning. It also minimizes profile swap regret and extends to the set of swap deviations with polynomial dimension. We prove a matching Ω(d √T) lower bound showing that the classic Gordon-Greenwald-Marks algorithm is existentially optimal for linear swap regret, albeit computationally inefficient.
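The central quantity, linear swap regret, compares the learner's cumulative loss to the best fixed linear self-map M of the decision set K applied to its own plays: max over M with M(K) ⊆ K of Σ_t ⟨ℓ_t, x_t − M x_t⟩. A minimal sketch of that definition, assuming the standard formulation; the toy trajectory and the hand-picked candidate maps are illustrative (they only lower-bound the true maximum over all valid M, which is exactly the hard optimization the paper's algorithm sidesteps), not anything taken from the paper.

```python
import numpy as np

def linear_swap_regret(plays, losses, candidate_maps):
    """Empirical linear swap regret lower bound:
    max over the supplied maps M of sum_t <loss_t, x_t - M x_t>."""
    base = sum(loss @ x for x, loss in zip(plays, losses))
    best = -np.inf
    for M in candidate_maps:
        deviated = sum(loss @ (M @ x) for x, loss in zip(plays, losses))
        best = max(best, base - deviated)
    return best

# Toy trajectory on the centrally symmetric cube K = [-1, 1]^2.
plays  = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
losses = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
maps = [np.eye(2),         # identity: contributes zero regret
        -np.eye(2),        # negation: maps K into K since K is symmetric
        np.zeros((2, 2))]  # collapse every play to the origin

print(linear_swap_regret(plays, losses, maps))  # negation is best here: 4.0
```

Here the negation map wins: the learner's plays were perfectly aligned with the losses, so deviating to −x at every round flips a total loss of 2 into −2, for regret 4.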
What carries the argument
Response-based approachability framework applied to linear swap regret after John ellipsoid preconditioning of the convex decision set.
Load-bearing premise
The response-based approachability framework of Bernstein and Shimkin can be directly applied to linear swap regret minimization over convex sets after John ellipsoid preconditioning without hidden computational costs or extra assumptions on the losses.
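The geometric fact this premise leans on is John's theorem, a standard result; connecting the distortion factor to the d^{3/2}-versus-d gap in the claimed bounds is our reading of the abstract, not a verified step of the paper's proof.

```latex
% John's theorem: every convex body $K \subset \mathbb{R}^d$ admits an
% ellipsoid $E$ (the John ellipsoid) with, after translating to $E$'s center,
%   $E \subseteq K \subseteq d\,E$,
% and, when $K$ is centrally symmetric,
%   $E \subseteq K \subseteq \sqrt{d}\,E$.
% Under the affine map $T$ sending $E$ to the unit ball $B$:
\[
  B \;\subseteq\; T(K) \;\subseteq\; d\,B \quad\text{(general)},
  \qquad
  B \;\subseteq\; T(K) \;\subseteq\; \sqrt{d}\,B \quad\text{(symmetric)}.
\]
```

The √d-versus-d distortion after this one-time preconditioning mirrors the O(d √T)-versus-O(d^{3/2} √T) split between the symmetric and general bounds.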
What would settle it
A concrete convex set in dimension d where the algorithm's realized linear swap regret exceeds O(d^{3/2} √T) for large T, or an explicit strategy for a learner that achieves o(d √T) regret on a centrally symmetric set.
Original abstract
We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and have been more recently shown to guarantee non-manipulability against strategic adversaries. The only computationally efficient algorithm for minimizing linear swap regret over a general convex set in $\mathbb{R}^d$ was developed recently by Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '25). However, it incurs a highly suboptimal regret bound of $\Omega(d^4 \sqrt{T})$ and also relies on computationally intensive calls to the ellipsoid algorithm at each iteration. In this paper, we develop a significantly simpler, computationally efficient algorithm that guarantees $O(d^{3/2} \sqrt{T})$ linear swap regret for a general convex set and $O(d \sqrt{T})$ when the set is centrally symmetric. Our approach leverages the powerful response-based approachability framework of Bernstein and Shimkin (JMLR '15) -- previously overlooked in the line of work on swap regret minimization -- combined with geometric preconditioning via the John ellipsoid. Our algorithm simultaneously minimizes profile swap regret, which was recently shown to guarantee non-manipulability. Moreover, we establish a matching information-theoretic lower bound: any learner must incur in expectation $\Omega(d \sqrt{T})$ linear swap regret for large enough $T$, even when the set is centrally symmetric. This also shows that the classic algorithm of Gordon, Greenwald, and Marks (ICML '08) is existentially optimal for minimizing linear swap regret, although it is computationally inefficient. Finally, we extend our approach to minimize regret with respect to the set of swap deviations with polynomial dimension, unifying and strengthening recent results in equilibrium computation and online learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a simpler algorithm for minimizing linear swap regret over convex sets in online optimization by combining the response-based approachability framework of Bernstein and Shimkin with one-time John ellipsoid preconditioning. It achieves O(d^{3/2} √T) regret for general convex sets and O(d √T) for centrally symmetric sets, simultaneously handles profile swap regret, proves a matching Ω(d √T) information-theoretic lower bound, and extends the method to swap deviations of polynomial dimension.
Significance. If the central claims hold, the work substantially improves computational efficiency and regret bounds over the recent STOC '25 result while establishing optimality for the symmetric case. The application of the previously overlooked Bernstein-Shimkin framework is a clear strength, as is the unification of results across equilibrium computation and online learning. The matching lower bound and reproducible algorithmic construction add to the contribution.
Major comments (2)
- [§3] §3 (Algorithm description): The one-time John ellipsoid preconditioning must be shown to preserve the linear structure of the swap deviation set without introducing per-round oracle calls or hidden factors that would affect the claimed O(d^{3/2} √T) bound; the current sketch leaves the distortion analysis implicit.
- [Theorem 5.2] Theorem 5.2 (lower bound): The information-theoretic construction for Ω(d √T) should explicitly verify that it applies to centrally symmetric sets under the same loss assumptions used in the upper bound; any gap in the adversary construction would weaken the optimality claim for the Gordon et al. algorithm.
Minor comments (2)
- [Eq. (8)] Notation for the preconditioned swap deviation set (around Eq. (8)) could be clarified to distinguish it from the original set after the linear transformation.
- [Related Work] The related-work discussion of Daskalakis et al. (STOC '25) should include a brief comparison table of per-iteration oracle complexity to highlight the efficiency gain.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation for minor revision. We appreciate the suggestions for clarifying the algorithmic details and lower-bound construction, and we will incorporate the requested expansions in the revised manuscript.
Point-by-point responses
Referee: [§3] §3 (Algorithm description): The one-time John ellipsoid preconditioning must be shown to preserve the linear structure of the swap deviation set without introducing per-round oracle calls or hidden factors that would affect the claimed O(d^{3/2} √T) bound; the current sketch leaves the distortion analysis implicit.
Authors: We agree that the distortion analysis should be made fully explicit. In the revised manuscript we will expand §3 with a complete, self-contained argument showing that the one-time John ellipsoid transformation preserves the linear structure of the swap deviation set. The preconditioning is performed only once before the online phase begins, after which the response-based approachability procedure uses exactly the same per-round oracles as the un-preconditioned version; no additional factors appear in the O(d^{3/2} √T) bound.
Revision: yes
Referee: [Theorem 5.2] Theorem 5.2 (lower bound): The information-theoretic construction for Ω(d √T) should explicitly verify that it applies to centrally symmetric sets under the same loss assumptions used in the upper bound; any gap in the adversary construction would weaken the optimality claim for the Gordon et al. algorithm.
Authors: We will add an explicit paragraph in the proof of Theorem 5.2 verifying that the information-theoretic adversary construction applies verbatim to centrally symmetric convex sets under the identical loss assumptions used for the upper bound. This will confirm that the Ω(d √T) lower bound holds in the symmetric case and therefore establishes the existential optimality of the Gordon et al. algorithm for that setting as well.
Revision: yes
Circularity Check
No significant circularity: the derivation combines external frameworks and does not reduce to the paper's own inputs or prior results.
Full rationale
The paper derives its O(d^{3/2} √T) and O(d √T) regret bounds by applying the Bernstein-Shimkin response-based approachability framework (JMLR '15) after a one-time John ellipsoid preconditioning of the decision set. These are independent external tools whose properties are invoked directly; the resulting bounds follow from standard approachability guarantees on the transformed set and do not reduce by construction to any fitted parameter, self-definition, or prior result by the same authors. The matching Ω(d √T) lower bound is information-theoretic. Overlapping-author citations to Daskalakis et al. (STOC '25) are used only for context on prior suboptimal algorithms and do not carry the central claim. The construction is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: the decision set is a compact convex subset of R^d.
- Domain assumption: the response-based approachability framework applies directly to linear swap regret after preconditioning.
Forward citations
Cited by 1 Pith paper
An Efficient Black-Box Reduction from Online Learning to Multicalibration, and a New Route to $\Phi$-Regret Minimization
Black-box reductions from no-regret online learning to multicalibration and from multicalibration to Phi-regret minimization are established, resolving the main open question in Garg et al. (SODA '24).
discussion (0)