Swap Regret Minimization Through Response-Based Approachability
Pith reviewed 2026-05-16 06:29 UTC · model grok-4.3
The pith
A simpler algorithm achieves O(d^{3/2} √T) linear swap regret over general convex sets via response-based approachability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a simpler algorithm for linear swap regret minimization that guarantees O(d^{3/2} √T) regret for general convex sets and O(d √T) regret when the set is centrally symmetric. The method relies on the response-based approachability framework combined with John ellipsoid preconditioning. It also minimizes profile swap regret and extends to the set of swap deviations with polynomial dimension. We prove a matching Ω(d √T) lower bound showing that the classic Gordon-Greenwald-Marks algorithm is existentially optimal for linear swap regret, albeit computationally inefficient.
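The central quantity, linear swap regret, compares the learner's cumulative loss to the best fixed linear self-map M of the decision set K applied to its own plays: max over M with M(K) ⊆ K of Σ_t ⟨ℓ_t, x_t − M x_t⟩. A minimal sketch of that definition, assuming the standard formulation; the toy trajectory and the hand-picked candidate maps are illustrative (they only lower-bound the true maximum over all valid M, which is exactly the hard optimization the paper's algorithm sidesteps), not anything taken from the paper.

```python
import numpy as np

def linear_swap_regret(plays, losses, candidate_maps):
    """Empirical linear swap regret lower bound:
    max over the supplied maps M of sum_t <loss_t, x_t - M x_t>."""
    base = sum(loss @ x for x, loss in zip(plays, losses))
    best = -np.inf
    for M in candidate_maps:
        deviated = sum(loss @ (M @ x) for x, loss in zip(plays, losses))
        best = max(best, base - deviated)
    return best

# Toy trajectory on the centrally symmetric cube K = [-1, 1]^2.
plays  = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
losses = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
maps = [np.eye(2),         # identity: contributes zero regret
        -np.eye(2),        # negation: maps K into K since K is symmetric
        np.zeros((2, 2))]  # collapse every play to the origin

print(linear_swap_regret(plays, losses, maps))  # negation is best here: 4.0
```

Here the negation map wins: the learner's plays were perfectly aligned with the losses, so deviating to −x at every round flips a total loss of 2 into −2, for regret 4.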
What carries the argument
Response-based approachability framework applied to linear swap regret after John ellipsoid preconditioning of the convex decision set.
Load-bearing premise
The response-based approachability framework of Bernstein and Shimkin can be directly applied to linear swap regret minimization over convex sets after John ellipsoid preconditioning without hidden computational costs or extra assumptions on the losses.
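The geometric fact this premise leans on is John's theorem, a standard result; connecting the distortion factor to the d^{3/2}-versus-d gap in the claimed bounds is our reading of the abstract, not a verified step of the paper's proof.

```latex
% John's theorem: every convex body $K \subset \mathbb{R}^d$ admits an
% ellipsoid $E$ (the John ellipsoid) with, after translating to $E$'s center,
%   $E \subseteq K \subseteq d\,E$,
% and, when $K$ is centrally symmetric,
%   $E \subseteq K \subseteq \sqrt{d}\,E$.
% Under the affine map $T$ sending $E$ to the unit ball $B$:
\[
  B \;\subseteq\; T(K) \;\subseteq\; d\,B \quad\text{(general)},
  \qquad
  B \;\subseteq\; T(K) \;\subseteq\; \sqrt{d}\,B \quad\text{(symmetric)}.
\]
```

The √d-versus-d distortion after this one-time preconditioning mirrors the O(d √T)-versus-O(d^{3/2} √T) split between the symmetric and general bounds.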
What would settle it
A concrete convex set in dimension d where the algorithm's realized linear swap regret exceeds O(d^{3/2} √T) for large T, or an explicit strategy for a learner that achieves o(d √T) regret on a centrally symmetric set.
Original abstract
We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and have been more recently shown to guarantee non-manipulability against strategic adversaries. The only computationally efficient algorithm for minimizing linear swap regret over a general convex set in $\mathbb{R}^d$ was developed recently by Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '25). However, it incurs a highly suboptimal regret bound of $\Omega(d^4 \sqrt{T})$ and also relies on computationally intensive calls to the ellipsoid algorithm at each iteration. In this paper, we develop a significantly simpler, computationally efficient algorithm that guarantees $O(d^{3/2} \sqrt{T})$ linear swap regret for a general convex set and $O(d \sqrt{T})$ when the set is centrally symmetric. Our approach leverages the powerful response-based approachability framework of Bernstein and Shimkin (JMLR '15) -- previously overlooked in the line of work on swap regret minimization -- combined with geometric preconditioning via the John ellipsoid. Our algorithm simultaneously minimizes profile swap regret, which was recently shown to guarantee non-manipulability. Moreover, we establish a matching information-theoretic lower bound: any learner must incur in expectation $\Omega(d \sqrt{T})$ linear swap regret for large enough $T$, even when the set is centrally symmetric. This also shows that the classic algorithm of Gordon, Greenwald, and Marks (ICML '08) is existentially optimal for minimizing linear swap regret, although it is computationally inefficient. Finally, we extend our approach to minimize regret with respect to the set of swap deviations with polynomial dimension, unifying and strengthening recent results in equilibrium computation and online learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a simpler algorithm for minimizing linear swap regret over convex sets in online optimization by combining the response-based approachability framework of Bernstein and Shimkin with one-time John ellipsoid preconditioning. It achieves O(d^{3/2} √T) regret for general convex sets and O(d √T) for centrally symmetric sets, simultaneously handles profile swap regret, proves a matching Ω(d √T) information-theoretic lower bound, and extends the method to swap deviations of polynomial dimension.
Significance. If the central claims hold, the work substantially improves computational efficiency and regret bounds over the recent STOC '25 result while establishing optimality for the symmetric case. The application of the previously overlooked Bernstein-Shimkin framework is a clear strength, as is the unification of results across equilibrium computation and online learning. The matching lower bound and reproducible algorithmic construction add to the contribution.
Major comments (2)
- [§3] §3 (Algorithm description): The one-time John ellipsoid preconditioning must be shown to preserve the linear structure of the swap deviation set without introducing per-round oracle calls or hidden factors that would affect the claimed O(d^{3/2} √T) bound; the current sketch leaves the distortion analysis implicit.
- [Theorem 5.2] Theorem 5.2 (lower bound): The information-theoretic construction for Ω(d √T) should explicitly verify that it applies to centrally symmetric sets under the same loss assumptions used in the upper bound; any gap in the adversary construction would weaken the optimality claim for the Gordon et al. algorithm.
Minor comments (2)
- [Eq. (8)] Notation for the preconditioned swap deviation set (around Eq. (8)) could be clarified to distinguish it from the original set after the linear transformation.
- [Related Work] The related-work discussion of Daskalakis et al. (STOC '25) should include a brief comparison table of per-iteration oracle complexity to highlight the efficiency gain.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation for minor revision. We appreciate the suggestions for clarifying the algorithmic details and lower-bound construction, and we will incorporate the requested expansions in the revised manuscript.
Point-by-point responses
Referee: [§3] §3 (Algorithm description): The one-time John ellipsoid preconditioning must be shown to preserve the linear structure of the swap deviation set without introducing per-round oracle calls or hidden factors that would affect the claimed O(d^{3/2} √T) bound; the current sketch leaves the distortion analysis implicit.
Authors: We agree that the distortion analysis should be made fully explicit. In the revised manuscript we will expand §3 with a complete, self-contained argument showing that the one-time John ellipsoid transformation preserves the linear structure of the swap deviation set. The preconditioning is performed only once before the online phase begins, after which the response-based approachability procedure uses exactly the same per-round oracles as the un-preconditioned version; no additional factors appear in the O(d^{3/2} √T) bound.
Revision: yes
Referee: [Theorem 5.2] Theorem 5.2 (lower bound): The information-theoretic construction for Ω(d √T) should explicitly verify that it applies to centrally symmetric sets under the same loss assumptions used in the upper bound; any gap in the adversary construction would weaken the optimality claim for the Gordon et al. algorithm.
Authors: We will add an explicit paragraph in the proof of Theorem 5.2 verifying that the information-theoretic adversary construction applies verbatim to centrally symmetric convex sets under the identical loss assumptions used for the upper bound. This will confirm that the Ω(d √T) lower bound holds in the symmetric case and therefore establishes the existential optimality of the Gordon et al. algorithm for that setting as well.
Revision: yes
Circularity Check
No significant circularity: the derivation combines external frameworks and does not reduce to the paper's own inputs or prior results.
Full rationale
The paper derives its O(d^{3/2} √T) and O(d √T) regret bounds by applying the Bernstein-Shimkin response-based approachability framework (JMLR '15) after a one-time John ellipsoid preconditioning of the decision set. These are independent external tools whose properties are invoked directly; the resulting bounds follow from standard approachability guarantees on the transformed set and do not reduce by construction to any fitted parameter, self-definition, or prior result by the same authors. The matching Ω(d √T) lower bound is information-theoretic. Overlapping-author citations to Daskalakis et al. (STOC '25) are used only for context on prior suboptimal algorithms and do not carry the central claim. The construction is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: the decision set is a compact convex subset of R^d.
- Domain assumption: the response-based approachability framework applies directly to linear swap regret after preconditioning.
Forward citations
Cited by 1 Pith paper
An Efficient Black-Box Reduction from Online Learning to Multicalibration, and a New Route to $\Phi$-Regret Minimization
Black-box reductions from no-regret online learning to multicalibration and from multicalibration to Phi-regret minimization are established, resolving the main open question in Garg et al. (SODA '24).
discussion (0)