Frank-Wolfe Beyond 1/t Convergence
Pith reviewed 2026-05-07 06:47 UTC · model grok-4.3
The pith
A property of the feasible set lets Frank-Wolfe converge faster than 1/t for any smooth convex objective.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that the Local Dual Sharpness (LDS) condition on the feasible set C and its linear minimization oracle suffices for the Frank-Wolfe algorithm to converge with primal gap o(1/t) for any smooth convex f. This rules out the standard Ω(1/t) worst-case lower bound under LDS. LDS is a localized generalization of uniform convexity, satisfied by uniformly convex sets, providing the first unconditional o(1/t) result for those sets.
What carries the argument
The Local Dual Sharpness (LDS) condition, a property of the feasible region and its linear minimization oracle that generalizes uniform convexity of sets.
If this is right
- The Frank-Wolfe algorithm achieves o(1/t) convergence on any uniformly convex set for smooth convex objectives.
- Combining Local Dual Sharpness with a local variant of Hölder error bounds allows explicit quantification of the convergence rate.
- Standard worst-case Ω(1/t) analyses for Frank-Wolfe no longer apply when the feasible set satisfies the Local Dual Sharpness property.
- Faster convergence can arise solely from the geometry of the constraint set rather than from extra assumptions on the objective.
Where Pith is reading between the lines
- Similar localized sharpness conditions on feasible sets could improve rates for other projection-free first-order methods.
- In applied settings one could verify whether a problem's constraint set satisfies Local Dual Sharpness before choosing an algorithm.
- The result suggests shifting some convergence analysis effort from function properties toward characterizing favorable feasible-set geometries.
- Future work might derive simple sufficient conditions on C that guarantee Local Dual Sharpness without checking the oracle directly.
Load-bearing premise
The Local Dual Sharpness condition must hold for the feasible set and its linear minimization oracle.
What would settle it
A counterexample of a compact convex set whose linear minimization oracle satisfies Local Dual Sharpness yet Frank-Wolfe still shows only Ω(1/t) convergence on some smooth convex function would disprove the result.
Figures
read the original abstract
We consider smooth convex minimization over compact convex sets, i.e., $\min_{x \in C} f(x)$ with the (vanilla) Frank-Wolfe algorithm. Well-known lower bounds establish a worst-case $\Omega(1/t)$ primal-gap barrier in the general smooth convex case, and faster convergence usually requires favorable function properties such as H\"older error bounds or strong convexity. We present a new Local Dual Sharpness (LDS) condition, essentially a property of the feasible region and its LMO, under which the Frank-Wolfe algorithm converges in $o(1/t)$ for any smooth convex function, ruling out an $\Omega(1/t)$ lower bound under LDS. The condition is a generalization (and localization) of uniform convexity of sets and it is satisfied by any uniformly convex set. To our knowledge, this is the first unconditional $o(1/t)$ convergence result for uniformly convex sets. Combining LDS with stronger function properties, e.g., a local variant of H\"older error bounds, allows us to quantify the actual rates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a new Local Dual Sharpness (LDS) condition, defined solely in terms of the compact convex feasible set C and its linear minimization oracle. Under this condition, the vanilla Frank-Wolfe algorithm is shown to converge in o(1/t) (primal gap) for every smooth convex objective f, thereby circumventing the known Ω(1/t) lower bound that holds in the absence of such a condition. The authors prove that LDS is satisfied by every uniformly convex set, yielding the first unconditional o(1/t) guarantee for Frank-Wolfe on such domains, and further combine LDS with a local Hölder error bound to obtain explicit rates.
Significance. If the central claims are correct, the work is significant because it decouples convergence-rate improvements from function-specific assumptions (strong convexity, error bounds) and instead ties them to a verifiable geometric property of the constraint set and its LMO. This is the first result that unconditionally improves upon the 1/t barrier for the entire class of uniformly convex sets without additional restrictions on f. The localization of uniform convexity via the dual sharpness notion is a technically clean generalization that may apply to other first-order methods.
major comments (2)
- [§4, Theorem 4.1] §4, Theorem 4.1 (main o(1/t) result): the inductive argument establishing that the gap sequence satisfies g_{t+1} = o(g_t) under LDS appears to rely on a uniform lower bound on the dual sharpness parameter over a neighborhood of the optimum; it is not immediately clear whether this neighborhood radius can be chosen independently of the smoothness constant L of f while still guaranteeing the little-o statement for every smooth convex f.
- [§5, Proposition 5.2] §5, Proposition 5.2 (uniform convexity implies LDS): the proof that every uniformly convex set satisfies LDS is stated to hold for any LMO; however, the argument uses a global modulus of convexity, and it would be useful to see an explicit verification that the resulting LDS constant remains positive when the LMO is only an approximate linear minimization oracle (as is common in practice).
minor comments (3)
- [§2] The notation for the dual gap and the LMO output is introduced in §2 but reused with slight variations in later sections; a single consolidated notation table or consistent use of subscripts would improve readability.
- [Introduction / Related Work] The abstract claims this is the 'first unconditional o(1/t) result for uniformly convex sets,' yet the related-work section does not explicitly contrast the new result with the recent line of work on FW over strongly convex sets (e.g., the 2022–2023 papers that obtain 1/t^{2} rates under additional function assumptions). A short clarifying sentence would help.
- [Figure 1] Figure 1 (illustrating the geometry of LDS) has axis labels that are too small for print; increasing font size and adding a caption that explicitly ties the shaded region to the LDS definition would help.
Simulated Author's Rebuttal
We thank the referee for the thorough review and the encouraging recommendation for minor revision. We appreciate the positive assessment of the significance of our work on the Local Dual Sharpness condition. Below, we provide point-by-point responses to the major comments.
read point-by-point responses
-
Referee: [§4, Theorem 4.1] §4, Theorem 4.1 (main o(1/t) result): the inductive argument establishing that the gap sequence satisfies g_{t+1} = o(g_t) under LDS appears to rely on a uniform lower bound on the dual sharpness parameter over a neighborhood of the optimum; it is not immediately clear whether this neighborhood radius can be chosen independently of the smoothness constant L of f while still guaranteeing the little-o statement for every smooth convex f.
Authors: We thank the referee for pointing out this potential ambiguity in the presentation. The Local Dual Sharpness (LDS) condition is a property solely of the compact convex set C and the linear minimization oracle, and is thus independent of the objective function f and its smoothness constant L. In Definition 4.1, the neighborhood radius r > 0 around the optimum x* is chosen based on the geometry of C to ensure the dual sharpness inequality holds with some positive constant δ. This choice of r does not depend on f or L. In the proof of Theorem 4.1, we use this fixed neighborhood to show that once the iterates enter a certain region (which happens in finite time for any f), the gap satisfies g_{t+1} ≤ (1 - c) g_t for some c > 0 depending on δ, but actually the argument establishes the little-o by showing faster than 1/t decrease eventually. Since the little-o is asymptotic and can depend on f (including L), the independence holds. We will revise the manuscript to explicitly state that the LDS parameters are independent of f and L, and clarify the proof steps to make the independence evident. revision: yes
-
Referee: [§5, Proposition 5.2] §5, Proposition 5.2 (uniform convexity implies LDS): the proof that every uniformly convex set satisfies LDS is stated to hold for any LMO; however, the argument uses a global modulus of convexity, and it would be useful to see an explicit verification that the resulting LDS constant remains positive when the LMO is only an approximate linear minimization oracle (as is common in practice).
Authors: We agree that extending the result to approximate linear minimization oracles is valuable for practical considerations. The current proof of Proposition 5.2 assumes an exact LMO to leverage the global modulus of convexity δ(·) of the set. For an approximate LMO with error ε_t (where the returned direction satisfies the linear minimization up to additive ε_t), the effective dual sharpness parameter would be reduced by a term proportional to ε_t. As long as the approximation error is bounded (which it is in practice, e.g., by solving the LMO to sufficient accuracy), the resulting LDS constant remains positive. We will add a remark or a short appendix subsection providing this explicit verification, showing that the o(1/t) convergence still holds under bounded approximation errors, with the constant adjusted accordingly. This is a minor addition that strengthens the applicability of the result. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper defines the Local Dual Sharpness (LDS) condition explicitly as a geometric property of the compact convex set C and its linear minimization oracle, independent of the objective function f. It then derives that this condition implies o(1/t) Frank-Wolfe convergence uniformly for every smooth convex f, and separately verifies that every uniformly convex set satisfies LDS. No step in the claimed chain reduces the convergence guarantee to a fitted parameter, a self-referential definition, or a load-bearing self-citation whose validity depends on the present result. The o(1/t) rate is obtained from the LDS assumption via standard Frank-Wolfe analysis rather than by construction or renaming of known bounds. The derivation is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The feasible set C is compact and convex.
- domain assumption The objective function f is smooth and convex.
invented entities (1)
-
Local Dual Sharpness (LDS) condition
no independent evidence
Reference graph
Works this paper leans on
-
[1]
ISBN 978-1-61197-856-8. doi: 10.1137/1.9781611978568. M. D. Canon and C. D. Cullum. A tight upper bound on the rate of convergence of Frank–Wolfe algorithm.SIAM Journal on Control, 6(4):509–516, 1968. doi: 10.1137/0306032. 16 Vladimir F. Demyanov and Alexander M. Rubinov.Approximate Methods in Optimization Problems, volume 32 ofModern Analytic and Computa...
-
[2]
URLhttps://optimization-online.org/2013/05/3904/. Evgeny S. Levitin and Boris T. Polyak. Constrained minimization methods.U.S.S.R. Computational Mathematics and Mathematical Physics, 6(5):1–50, 1966. doi: 10.1016/0041-5553(66)90114-5. Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language. InInternational Conference on...
-
[3]
the stadium Cstad(a) def = [−a, a]× {0} +B 2(0,1), a >0, together with any compact M contained in the relative interior of the right rounded cap, equivalently of the open semicircle Γ+ a def ={(a+ cosθ,sinθ) :−π/2< θ < π/2}, K + a def = (a,0) +B 2(0,1)
-
[4]
Neither family is globally uniformly convex: the stadium contains a line segment, while the truncated disk contains a facet
the truncated disk Ctr(b) def =B 2(0,1)∩ {x 1 ≤b},0< b <1, together with any compactMcontained more specifically in the lower arc Γ− b def ={(x 1, x2)∈∂B 2(0,1) :x 2 <− p 1−b 2}. Neither family is globally uniformly convex: the stadium contains a line segment, while the truncated disk contains a facet. Nevertheless, in both cases the active curved patch h...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.