Swap Regret Minimization Through Response-Based Approachability

Gabriele Farina; Haipeng Luo; Ioannis Anagnostides; Jon Schneider; Maxwell Fishelson

arxiv: 2602.06264 · v3 · pith:PE6GDVQInew · submitted 2026-02-05 · 💻 cs.LG

Swap Regret Minimization Through Response-Based Approachability

Ioannis Anagnostides , Gabriele Farina , Maxwell Fishelson , Haipeng Luo , Jon Schneider This is my paper

Pith reviewed 2026-05-22 11:28 UTC · model grok-4.3

classification 💻 cs.LG

keywords swap regretonline optimizationapproachabilityconvex setsJohn ellipsoidcorrelated equilibriumnon-manipulability

0 comments

The pith

A simpler algorithm achieves optimal O(d sqrt(T)) linear swap regret over preconditioned convex sets using response-based approachability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a computationally efficient method to minimize linear swap regret in online optimization over general convex decision sets. By preconditioning the set with the John ellipsoid and applying the response-based approachability framework, it reaches a regret bound of O(d sqrt(T)). The approach simultaneously handles profile swap regret to prevent strategic manipulation and matches an information-theoretic lower bound of Omega(d sqrt(T)). It also extends to minimizing regret against swap deviations of polynomial dimension, unifying prior results on equilibria and learning.

Core claim

We develop a significantly simpler, computationally efficient algorithm that guarantees O(d sqrt(T)) linear swap regret for a general convex set that has been preconditioned via the John ellipsoid. Our algorithm leverages the response-based approachability framework and simultaneously minimizes profile swap regret. We establish a matching lower bound showing that any learner must incur Omega(d sqrt(T)) linear swap regret even on centrally symmetric sets, and extend the method to polynomial-dimension swap deviations.

What carries the argument

Response-based approachability framework, which reduces swap regret minimization to approaching a target set of response functions without requiring per-step ellipsoid calls.

If this is right

Linear swap regret of O(d sqrt(T)) is achieved efficiently without repeated ellipsoid algorithm calls.
Profile swap regret is minimized at the same time, guaranteeing non-manipulability against strategic adversaries.
The classic Gordon-Greenwald-Marks algorithm is existentially optimal despite being computationally inefficient.
Regret minimization extends to the set of all swap deviations whose dimension is polynomial in d.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Preconditioning steps like the John ellipsoid could transfer to other online learning problems involving high-dimensional convex bodies.
The lower bound indicates that dimension-linear dependence is unavoidable in worst-case swap regret even without computational constraints.
Equilibrium computation in games may become simpler by replacing complex regret minimizers with this response-based method.

Load-bearing premise

The convex decision set admits effective preconditioning via the John ellipsoid so the response-based approachability framework applies directly without extra computational cost.

What would settle it

Running the algorithm on a centrally symmetric convex set preconditioned by the John ellipsoid and observing average linear swap regret exceeding c d sqrt(T) for large T and some constant c would falsify the claimed bound.

read the original abstract

We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and have been more recently shown to guarantee non-manipulability against strategic adversaries. The only computationally efficient algorithm for minimizing linear swap regret over a general convex set in $\mathbb{R}^d$ was developed recently by Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '25). However, it incurs a highly suboptimal regret bound of $\Omega(d^4 \sqrt{T})$ and also relies on computationally intensive calls to the ellipsoid algorithm at each iteration. In this paper, we develop a significantly simpler, computationally efficient algorithm that guarantees $O(d \sqrt{T})$ linear swap regret for a general convex set that has been preconditioned via the John ellipsoid. Our algorithm leverages the powerful response-based approachability framework of Bernstein and Shimkin (JMLR~'15) -- previously overlooked in the line of work on swap regret minimization -- and simultaneously minimizes profile swap regret, which was recently shown to guarantee non-manipulability. Moreover, we establish a matching information-theoretic lower bound: any learner must incur in expectation $\Omega(d \sqrt{T})$ linear swap regret for large enough $T$, even when the set is centrally symmetric. This also shows that the classic algorithm of Gordon, Greenwald, and Marks (ICML '08) is existentially optimal for minimizing linear swap regret, although it is computationally inefficient. Finally, we extend our approach to minimize regret with respect to the set of swap deviations with polynomial dimension, unifying and strengthening recent results in equilibrium computation and online learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They tighten linear swap regret to O(d sqrt(T)) with a simpler response-based algorithm after John preconditioning, plus a matching lower bound.

read the letter

The main point is that this work improves the regret bound for linear swap regret from the recent d^4 sqrt(T) result down to O(d sqrt(T)) using the response-based approachability framework from Bernstein and Shimkin. They precondition the convex set with the John ellipsoid so the response function satisfies the conditions for the approachability theorem, and the algorithm stays computationally lighter by skipping repeated ellipsoid calls. They also get the same bound for profile swap regret, which matters for non-manipulability, and they prove a matching Omega(d sqrt(T)) lower bound that holds even for centrally symmetric sets. That lower bound shows the older Gordon-Greenwald-Marks algorithm was already information-theoretically optimal, though still inefficient. The extension to swap deviations of polynomial dimension pulls together some recent threads in equilibrium computation and online learning. The argument looks internally consistent; the preconditioning step is presented as enabling the linear dependence on d without hidden costs in the main loop. One minor practical question is how expensive the initial John ellipsoid computation is in high dimensions, but the paper treats it as a one-time cost and focuses on per-round efficiency. This is useful reading for people working on regret minimization in games or on correlated equilibria with computational constraints. The bounds are tight enough and the technique is clean enough that it deserves a full referee process rather than a desk reject.

Referee Report

0 major / 3 minor

Summary. The paper develops a computationally efficient algorithm for minimizing linear swap regret (and profile swap regret) over general convex sets in R^d. By preconditioning the decision set via the John ellipsoid, the authors apply the response-based approachability framework of Bernstein and Shimkin to obtain an O(d sqrt(T)) regret guarantee. They also prove a matching information-theoretic lower bound of Omega(d sqrt(T)) even for centrally symmetric sets, and extend the method to swap deviations whose dimension is polynomial in d. The approach is positioned as simpler than the recent STOC '25 algorithm of Daskalakis et al. while achieving optimal dependence on d.

Significance. If the central claims hold, the work is significant for closing the gap between the best known regret bound and the information-theoretic lower bound for linear swap regret. It provides an optimal O(d sqrt(T)) guarantee together with computational efficiency, leverages an overlooked approachability framework, and unifies results across online learning and equilibrium computation. The matching lower bound and the explicit demonstration that the Gordon et al. algorithm is existentially optimal (though inefficient) are notable strengths.

minor comments (3)

[Abstract and §3] Abstract and §3: the statement that the set 'has been preconditioned via the John ellipsoid' should include a brief remark on the one-time computational cost of computing the ellipsoid and how it interacts with the per-round response computation.
[§4.2] §4.2, the response function definition: clarify whether the response map is computed exactly or approximately and how any approximation error propagates into the O(d sqrt(T)) bound.
[§5] §5, lower-bound construction: explicitly state the dimension and symmetry assumptions under which the Omega(d sqrt(T)) bound is shown to be tight.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation of minor revision. The summary accurately reflects our contributions: an efficient algorithm achieving O(d sqrt(T)) linear swap regret via response-based approachability after John ellipsoid preconditioning, a matching Omega(d sqrt(T)) lower bound even for centrally symmetric sets, and extensions to polynomial-dimension swap deviations. We appreciate the recognition that this closes the gap to the information-theoretic optimum while being simpler than the STOC '25 result.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation relies on the external response-based approachability framework of Bernstein and Shimkin (JMLR '15) and the John ellipsoid preconditioning technique, both cited as independent prior results with no overlap in the load-bearing steps. The O(d sqrt(T)) upper bound follows directly from applying the cited approachability theorem to the preconditioned convex body, while the matching Omega(d sqrt(T)) lower bound is established via a separate information-theoretic construction that does not reduce to any fitted parameter or self-defined quantity from the authors' prior work. The efficiency claim avoids the ellipsoid method invocations of the cited STOC '25 paper by the same overlapping authors, instead using simpler response computations. No self-definitional, fitted-input, or self-citation-load-bearing reductions appear in the claimed chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions from online convex optimization without introducing new free parameters or invented entities.

axioms (2)

domain assumption The decision set is a convex body in R^d that can be preconditioned via the John ellipsoid.
Invoked to achieve the improved regret bound in the main algorithmic result.
domain assumption Response-based approachability framework applies directly to the swap regret minimization setting.
Core technical tool cited from Bernstein and Shimkin.

pith-pipeline@v0.9.0 · 5849 in / 1331 out tokens · 85445 ms · 2026-05-22T11:28:27.200642+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

An Efficient Black-Box Reduction from Online Learning to Multicalibration, and a New Route to $\Phi$-Regret Minimization
cs.LG 2026-04 unverdicted novelty 8.0

Black-box reductions from no-regret online learning to multicalibration and from multicalibration to Phi-regret minimization are established, resolving the main open question in Garg et al. (SODA '24).