Swap Regret Minimization Through Response-Based Approachability
Pith reviewed 2026-05-22 11:28 UTC · model grok-4.3
The pith
A simpler algorithm achieves optimal O(d sqrt(T)) linear swap regret over preconditioned convex sets using response-based approachability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop a significantly simpler, computationally efficient algorithm that guarantees O(d sqrt(T)) linear swap regret for a general convex set that has been preconditioned via the John ellipsoid. Our algorithm leverages the response-based approachability framework and simultaneously minimizes profile swap regret. We establish a matching lower bound showing that any learner must incur Omega(d sqrt(T)) linear swap regret even on centrally symmetric sets, and extend the method to polynomial-dimension swap deviations.
What carries the argument
Response-based approachability framework, which reduces swap regret minimization to approaching a target set of response functions without requiring per-step ellipsoid calls.
If this is right
- Linear swap regret of O(d sqrt(T)) is achieved efficiently without repeated ellipsoid algorithm calls.
- Profile swap regret is minimized at the same time, guaranteeing non-manipulability against strategic adversaries.
- The classic Gordon-Greenwald-Marks algorithm is existentially optimal despite being computationally inefficient.
- Regret minimization extends to the set of all swap deviations whose dimension is polynomial in d.
Where Pith is reading between the lines
- Preconditioning steps like the John ellipsoid could transfer to other online learning problems involving high-dimensional convex bodies.
- The lower bound indicates that dimension-linear dependence is unavoidable in worst-case swap regret even without computational constraints.
- Equilibrium computation in games may become simpler by replacing complex regret minimizers with this response-based method.
Load-bearing premise
The convex decision set admits effective preconditioning via the John ellipsoid so the response-based approachability framework applies directly without extra computational cost.
What would settle it
Running the algorithm on a centrally symmetric convex set preconditioned by the John ellipsoid and observing average linear swap regret exceeding c d sqrt(T) for large T and some constant c would falsify the claimed bound.
read the original abstract
We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and have been more recently shown to guarantee non-manipulability against strategic adversaries. The only computationally efficient algorithm for minimizing linear swap regret over a general convex set in $\mathbb{R}^d$ was developed recently by Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '25). However, it incurs a highly suboptimal regret bound of $\Omega(d^4 \sqrt{T})$ and also relies on computationally intensive calls to the ellipsoid algorithm at each iteration. In this paper, we develop a significantly simpler, computationally efficient algorithm that guarantees $O(d \sqrt{T})$ linear swap regret for a general convex set that has been preconditioned via the John ellipsoid. Our algorithm leverages the powerful response-based approachability framework of Bernstein and Shimkin (JMLR~'15) -- previously overlooked in the line of work on swap regret minimization -- and simultaneously minimizes profile swap regret, which was recently shown to guarantee non-manipulability. Moreover, we establish a matching information-theoretic lower bound: any learner must incur in expectation $\Omega(d \sqrt{T})$ linear swap regret for large enough $T$, even when the set is centrally symmetric. This also shows that the classic algorithm of Gordon, Greenwald, and Marks (ICML '08) is existentially optimal for minimizing linear swap regret, although it is computationally inefficient. Finally, we extend our approach to minimize regret with respect to the set of swap deviations with polynomial dimension, unifying and strengthening recent results in equilibrium computation and online learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a computationally efficient algorithm for minimizing linear swap regret (and profile swap regret) over general convex sets in R^d. By preconditioning the decision set via the John ellipsoid, the authors apply the response-based approachability framework of Bernstein and Shimkin to obtain an O(d sqrt(T)) regret guarantee. They also prove a matching information-theoretic lower bound of Omega(d sqrt(T)) even for centrally symmetric sets, and extend the method to swap deviations whose dimension is polynomial in d. The approach is positioned as simpler than the recent STOC '25 algorithm of Daskalakis et al. while achieving optimal dependence on d.
Significance. If the central claims hold, the work is significant for closing the gap between the best known regret bound and the information-theoretic lower bound for linear swap regret. It provides an optimal O(d sqrt(T)) guarantee together with computational efficiency, leverages an overlooked approachability framework, and unifies results across online learning and equilibrium computation. The matching lower bound and the explicit demonstration that the Gordon et al. algorithm is existentially optimal (though inefficient) are notable strengths.
minor comments (3)
- [Abstract and §3] Abstract and §3: the statement that the set 'has been preconditioned via the John ellipsoid' should include a brief remark on the one-time computational cost of computing the ellipsoid and how it interacts with the per-round response computation.
- [§4.2] §4.2, the response function definition: clarify whether the response map is computed exactly or approximately and how any approximation error propagates into the O(d sqrt(T)) bound.
- [§5] §5, lower-bound construction: explicitly state the dimension and symmetry assumptions under which the Omega(d sqrt(T)) bound is shown to be tight.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript and the recommendation of minor revision. The summary accurately reflects our contributions: an efficient algorithm achieving O(d sqrt(T)) linear swap regret via response-based approachability after John ellipsoid preconditioning, a matching Omega(d sqrt(T)) lower bound even for centrally symmetric sets, and extensions to polynomial-dimension swap deviations. We appreciate the recognition that this closes the gap to the information-theoretic optimum while being simpler than the STOC '25 result.
Circularity Check
No significant circularity detected
full rationale
The derivation relies on the external response-based approachability framework of Bernstein and Shimkin (JMLR '15) and the John ellipsoid preconditioning technique, both cited as independent prior results with no overlap in the load-bearing steps. The O(d sqrt(T)) upper bound follows directly from applying the cited approachability theorem to the preconditioned convex body, while the matching Omega(d sqrt(T)) lower bound is established via a separate information-theoretic construction that does not reduce to any fitted parameter or self-defined quantity from the authors' prior work. The efficiency claim avoids the ellipsoid method invocations of the cited STOC '25 paper by the same overlapping authors, instead using simpler response computations. No self-definitional, fitted-input, or self-citation-load-bearing reductions appear in the claimed chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The decision set is a convex body in R^d that can be preconditioned via the John ellipsoid.
- domain assumption Response-based approachability framework applies directly to the swap regret minimization setting.
Forward citations
Cited by 1 Pith paper
-
An Efficient Black-Box Reduction from Online Learning to Multicalibration, and a New Route to $\Phi$-Regret Minimization
Black-box reductions from no-regret online learning to multicalibration and from multicalibration to Phi-regret minimization are established, resolving the main open question in Garg et al. (SODA '24).
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.