Frank-Wolfe Beyond 1/t Convergence

Sebastian Pokutta

arxiv: 2604.28006 · v2 · submitted 2026-04-30 · 🧮 math.OC

Frank-Wolfe Beyond 1/t Convergence

Sebastian Pokutta This is my paper

Pith reviewed 2026-05-07 06:47 UTC · model grok-4.3

classification 🧮 math.OC

keywords Frank-Wolfe algorithmconvex optimizationconvergence ratesuniform convexitylocal dual sharpnesslinear minimization oraclesmooth convex minimization

0 comments

The pith

A property of the feasible set lets Frank-Wolfe converge faster than 1/t for any smooth convex objective.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Local Dual Sharpness condition on the feasible region and its linear minimization oracle. Under this condition, the Frank-Wolfe algorithm achieves o(1/t) convergence for smooth convex minimization problems, even without assuming strong convexity or other favorable properties of the function. This breaks the known Ω(1/t) lower bound that holds in general. The condition generalizes uniform convexity of sets and holds for any such set. If correct, this means faster convergence is possible purely due to the geometry of the domain rather than the objective function.

Core claim

The central discovery is that the Local Dual Sharpness (LDS) condition on the feasible set C and its linear minimization oracle suffices for the Frank-Wolfe algorithm to converge with primal gap o(1/t) for any smooth convex f. This rules out the standard Ω(1/t) worst-case lower bound under LDS. LDS is a localized generalization of uniform convexity, satisfied by uniformly convex sets, providing the first unconditional o(1/t) result for those sets.

What carries the argument

The Local Dual Sharpness (LDS) condition, a property of the feasible region and its linear minimization oracle that generalizes uniform convexity of sets.

If this is right

The Frank-Wolfe algorithm achieves o(1/t) convergence on any uniformly convex set for smooth convex objectives.
Combining Local Dual Sharpness with a local variant of Hölder error bounds allows explicit quantification of the convergence rate.
Standard worst-case Ω(1/t) analyses for Frank-Wolfe no longer apply when the feasible set satisfies the Local Dual Sharpness property.
Faster convergence can arise solely from the geometry of the constraint set rather than from extra assumptions on the objective.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar localized sharpness conditions on feasible sets could improve rates for other projection-free first-order methods.
In applied settings one could verify whether a problem's constraint set satisfies Local Dual Sharpness before choosing an algorithm.
The result suggests shifting some convergence analysis effort from function properties toward characterizing favorable feasible-set geometries.
Future work might derive simple sufficient conditions on C that guarantee Local Dual Sharpness without checking the oracle directly.

Load-bearing premise

The Local Dual Sharpness condition must hold for the feasible set and its linear minimization oracle.

What would settle it

A counterexample of a compact convex set whose linear minimization oracle satisfies Local Dual Sharpness yet Frank-Wolfe still shows only Ω(1/t) convergence on some smooth convex function would disprove the result.

Figures

Figures reproduced from arXiv: 2604.28006 by Sebastian Pokutta.

**Figure 1.** Figure 1: Primal-gap decay and geometry for the stadium. The left panel shows primal gaps for iterations view at source ↗

read the original abstract

We consider smooth convex minimization over compact convex sets, i.e., $\min_{x \in C} f(x)$ with the (vanilla) Frank-Wolfe algorithm. Well-known lower bounds establish a worst-case $\Omega(1/t)$ primal-gap barrier in the general smooth convex case, and faster convergence usually requires favorable function properties such as H\"older error bounds or strong convexity. We present a new Local Dual Sharpness (LDS) condition, essentially a property of the feasible region and its LMO, under which the Frank-Wolfe algorithm converges in $o(1/t)$ for any smooth convex function, ruling out an $\Omega(1/t)$ lower bound under LDS. The condition is a generalization (and localization) of uniform convexity of sets and it is satisfied by any uniformly convex set. To our knowledge, this is the first unconditional $o(1/t)$ convergence result for uniformly convex sets. Combining LDS with stronger function properties, e.g., a local variant of H\"older error bounds, allows us to quantify the actual rates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Pokutta introduces a Local Dual Sharpness condition on the feasible set alone that lets standard Frank-Wolfe reach o(1/t) rates for any smooth convex objective when the set is uniformly convex.

read the letter

The main takeaway is that this paper shows how a property of the constraint set and its linear minimization oracle can push Frank-Wolfe past the usual 1/t barrier without any extra assumptions on the objective. Pokutta defines Local Dual Sharpness (LDS) as a localized form of uniform convexity tied to the oracle, proves that every uniformly convex set satisfies it, and derives o(1/t) convergence for arbitrary smooth convex f. This is new because prior faster rates always required function-side conditions such as strong convexity or error bounds; here the gain comes purely from set geometry and rules out the standard lower-bound constructions that use non-sharp sets. The argument stays modular: LDS is defined independently of f, the core proof extends the classic gap analysis, and the paper then shows how to combine LDS with local Hölder conditions on f to get explicit rates. That separation is clean and useful. The soft spots are limited but real. The o(1/t) result is asymptotic, so finite-time behavior could still be slow, and the manuscript would be stronger with a few concrete examples on standard uniformly convex sets (ball, simplex) to illustrate the actual improvement and to show how one checks LDS in practice. Verification of the condition for non-obvious sets is left somewhat open. This work is aimed at people who already know the standard Frank-Wolfe analysis and care about the role of constraint geometry in first-order methods. A reader in that group will get a clear, self-contained extension without needing to re-derive everything. The paper is coherent on its own terms, with no circularity or hidden function dependence, so it deserves a serious referee. I would send it out for review, with a note to add one or two worked examples for concreteness.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces a new Local Dual Sharpness (LDS) condition, defined solely in terms of the compact convex feasible set C and its linear minimization oracle. Under this condition, the vanilla Frank-Wolfe algorithm is shown to converge in o(1/t) (primal gap) for every smooth convex objective f, thereby circumventing the known Ω(1/t) lower bound that holds in the absence of such a condition. The authors prove that LDS is satisfied by every uniformly convex set, yielding the first unconditional o(1/t) guarantee for Frank-Wolfe on such domains, and further combine LDS with a local Hölder error bound to obtain explicit rates.

Significance. If the central claims are correct, the work is significant because it decouples convergence-rate improvements from function-specific assumptions (strong convexity, error bounds) and instead ties them to a verifiable geometric property of the constraint set and its LMO. This is the first result that unconditionally improves upon the 1/t barrier for the entire class of uniformly convex sets without additional restrictions on f. The localization of uniform convexity via the dual sharpness notion is a technically clean generalization that may apply to other first-order methods.

major comments (2)

[§4, Theorem 4.1] §4, Theorem 4.1 (main o(1/t) result): the inductive argument establishing that the gap sequence satisfies g_{t+1} = o(g_t) under LDS appears to rely on a uniform lower bound on the dual sharpness parameter over a neighborhood of the optimum; it is not immediately clear whether this neighborhood radius can be chosen independently of the smoothness constant L of f while still guaranteeing the little-o statement for every smooth convex f.
[§5, Proposition 5.2] §5, Proposition 5.2 (uniform convexity implies LDS): the proof that every uniformly convex set satisfies LDS is stated to hold for any LMO; however, the argument uses a global modulus of convexity, and it would be useful to see an explicit verification that the resulting LDS constant remains positive when the LMO is only an approximate linear minimization oracle (as is common in practice).

minor comments (3)

[§2] The notation for the dual gap and the LMO output is introduced in §2 but reused with slight variations in later sections; a single consolidated notation table or consistent use of subscripts would improve readability.
[Introduction / Related Work] The abstract claims this is the 'first unconditional o(1/t) result for uniformly convex sets,' yet the related-work section does not explicitly contrast the new result with the recent line of work on FW over strongly convex sets (e.g., the 2022–2023 papers that obtain 1/t^{2} rates under additional function assumptions). A short clarifying sentence would help.
[Figure 1] Figure 1 (illustrating the geometry of LDS) has axis labels that are too small for print; increasing font size and adding a caption that explicitly ties the shaded region to the LDS definition would help.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and the encouraging recommendation for minor revision. We appreciate the positive assessment of the significance of our work on the Local Dual Sharpness condition. Below, we provide point-by-point responses to the major comments.

read point-by-point responses

Referee: [§4, Theorem 4.1] §4, Theorem 4.1 (main o(1/t) result): the inductive argument establishing that the gap sequence satisfies g_{t+1} = o(g_t) under LDS appears to rely on a uniform lower bound on the dual sharpness parameter over a neighborhood of the optimum; it is not immediately clear whether this neighborhood radius can be chosen independently of the smoothness constant L of f while still guaranteeing the little-o statement for every smooth convex f.

Authors: We thank the referee for pointing out this potential ambiguity in the presentation. The Local Dual Sharpness (LDS) condition is a property solely of the compact convex set C and the linear minimization oracle, and is thus independent of the objective function f and its smoothness constant L. In Definition 4.1, the neighborhood radius r > 0 around the optimum x* is chosen based on the geometry of C to ensure the dual sharpness inequality holds with some positive constant δ. This choice of r does not depend on f or L. In the proof of Theorem 4.1, we use this fixed neighborhood to show that once the iterates enter a certain region (which happens in finite time for any f), the gap satisfies g_{t+1} ≤ (1 - c) g_t for some c > 0 depending on δ, but actually the argument establishes the little-o by showing faster than 1/t decrease eventually. Since the little-o is asymptotic and can depend on f (including L), the independence holds. We will revise the manuscript to explicitly state that the LDS parameters are independent of f and L, and clarify the proof steps to make the independence evident. revision: yes
Referee: [§5, Proposition 5.2] §5, Proposition 5.2 (uniform convexity implies LDS): the proof that every uniformly convex set satisfies LDS is stated to hold for any LMO; however, the argument uses a global modulus of convexity, and it would be useful to see an explicit verification that the resulting LDS constant remains positive when the LMO is only an approximate linear minimization oracle (as is common in practice).

Authors: We agree that extending the result to approximate linear minimization oracles is valuable for practical considerations. The current proof of Proposition 5.2 assumes an exact LMO to leverage the global modulus of convexity δ(·) of the set. For an approximate LMO with error ε_t (where the returned direction satisfies the linear minimization up to additive ε_t), the effective dual sharpness parameter would be reduced by a term proportional to ε_t. As long as the approximation error is bounded (which it is in practice, e.g., by solving the LMO to sufficient accuracy), the resulting LDS constant remains positive. We will add a remark or a short appendix subsection providing this explicit verification, showing that the o(1/t) convergence still holds under bounded approximation errors, with the constant adjusted accordingly. This is a minor addition that strengthens the applicability of the result. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines the Local Dual Sharpness (LDS) condition explicitly as a geometric property of the compact convex set C and its linear minimization oracle, independent of the objective function f. It then derives that this condition implies o(1/t) Frank-Wolfe convergence uniformly for every smooth convex f, and separately verifies that every uniformly convex set satisfies LDS. No step in the claimed chain reduces the convergence guarantee to a fitted parameter, a self-referential definition, or a load-bearing self-citation whose validity depends on the present result. The o(1/t) rate is obtained from the LDS assumption via standard Frank-Wolfe analysis rather than by construction or renaming of known bounds. The derivation is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on the newly defined LDS condition together with the standard background assumptions of compactness, convexity of the set, and smoothness/convexity of the objective. No numerical parameters are fitted to data. The LDS condition itself is the main invented element.

axioms (2)

domain assumption The feasible set C is compact and convex.
Standard setup for the Frank-Wolfe algorithm stated in the problem formulation.
domain assumption The objective function f is smooth and convex.
Given explicitly in the minimization problem considered.

invented entities (1)

Local Dual Sharpness (LDS) condition no independent evidence
purpose: A property of the feasible region and its linear minimization oracle that guarantees o(1/t) convergence for any smooth convex objective.
Newly introduced in the paper as a generalization and localization of uniform convexity; no independent evidence outside the definition is provided in the abstract.

pith-pipeline@v0.9.0 · 5476 in / 1502 out tokens · 85649 ms · 2026-05-07T06:47:00.953359+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 2 canonical work pages

[1]

doi: 10.1137/1.9781611978568

ISBN 978-1-61197-856-8. doi: 10.1137/1.9781611978568. M. D. Canon and C. D. Cullum. A tight upper bound on the rate of convergence of Frank–Wolfe algorithm.SIAM Journal on Control, 6(4):509–516, 1968. doi: 10.1137/0306032. 16 Vladimir F. Demyanov and Alexander M. Rubinov.Approximate Methods in Optimization Problems, volume 32 ofModern Analytic and Computa...

work page doi:10.1137/1.9781611978568 1968
[2]

Evgeny S

URLhttps://optimization-online.org/2013/05/3904/. Evgeny S. Levitin and Boris T. Polyak. Constrained minimization methods.U.S.S.R. Computational Mathematics and Mathematical Physics, 6(5):1–50, 1966. doi: 10.1016/0041-5553(66)90114-5. Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language. InInternational Conference on...

work page doi:10.1016/0041-5553(66)90114-5 2013
[3]

the stadium Cstad(a) def = [−a, a]× {0} +B 2(0,1), a >0, together with any compact M contained in the relative interior of the right rounded cap, equivalently of the open semicircle Γ+ a def ={(a+ cosθ,sinθ) :−π/2< θ < π/2}, K + a def = (a,0) +B 2(0,1)
[4]

Neither family is globally uniformly convex: the stadium contains a line segment, while the truncated disk contains a facet

the truncated disk Ctr(b) def =B 2(0,1)∩ {x 1 ≤b},0< b <1, together with any compactMcontained more specifically in the lower arc Γ− b def ={(x 1, x2)∈∂B 2(0,1) :x 2 <− p 1−b 2}. Neither family is globally uniformly convex: the stadium contains a line segment, while the truncated disk contains a facet. Nevertheless, in both cases the active curved patch h...

[1] [1]

doi: 10.1137/1.9781611978568

ISBN 978-1-61197-856-8. doi: 10.1137/1.9781611978568. M. D. Canon and C. D. Cullum. A tight upper bound on the rate of convergence of Frank–Wolfe algorithm.SIAM Journal on Control, 6(4):509–516, 1968. doi: 10.1137/0306032. 16 Vladimir F. Demyanov and Alexander M. Rubinov.Approximate Methods in Optimization Problems, volume 32 ofModern Analytic and Computa...

work page doi:10.1137/1.9781611978568 1968

[2] [2]

Evgeny S

URLhttps://optimization-online.org/2013/05/3904/. Evgeny S. Levitin and Boris T. Polyak. Constrained minimization methods.U.S.S.R. Computational Mathematics and Mathematical Physics, 6(5):1–50, 1966. doi: 10.1016/0041-5553(66)90114-5. Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language. InInternational Conference on...

work page doi:10.1016/0041-5553(66)90114-5 2013

[3] [3]

the stadium Cstad(a) def = [−a, a]× {0} +B 2(0,1), a >0, together with any compact M contained in the relative interior of the right rounded cap, equivalently of the open semicircle Γ+ a def ={(a+ cosθ,sinθ) :−π/2< θ < π/2}, K + a def = (a,0) +B 2(0,1)

[4] [4]

Neither family is globally uniformly convex: the stadium contains a line segment, while the truncated disk contains a facet

the truncated disk Ctr(b) def =B 2(0,1)∩ {x 1 ≤b},0< b <1, together with any compactMcontained more specifically in the lower arc Γ− b def ={(x 1, x2)∈∂B 2(0,1) :x 2 <− p 1−b 2}. Neither family is globally uniformly convex: the stadium contains a line segment, while the truncated disk contains a facet. Nevertheless, in both cases the active curved patch h...