A conditional-gradient-based single-loop augmented Lagrangian method for inequality constrained optimization

Ting Kei Pong; Xiaozhou Wang; Zev Woodstock

arxiv: 2605.22539 · v1 · pith:HZS466ORnew · submitted 2026-05-21 · 🧮 math.OC

A conditional-gradient-based single-loop augmented Lagrangian method for inequality constrained optimization

Xiaozhou Wang , Ting Kei Pong , Zev Woodstock This is my paper

Pith reviewed 2026-05-22 04:18 UTC · model grok-4.3

classification 🧮 math.OC

keywords conditional gradientaugmented Lagrangianinequality constraintsconvex optimizationconvergence rateslinear minimization oracle

0 comments

The pith

A single conditional gradient step inside each augmented Lagrangian iteration achieves the best-known convergence rate for convex problems with smooth inequality constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a single-loop algorithm for minimizing a Lipschitz-differentiable convex function plus a convex function that has an efficient linear minimization oracle, all subject to multiple smooth convex inequality constraints. Each iteration applies one conditional gradient step to the augmented Lagrangian, then updates the dual variable using a diminishing stepsize exactly as in classical augmented Lagrangian methods. The authors prove that this procedure attains convergence rates matching the best-known complexity results for the problem class under both open-loop and short stepsize rules for conditional gradient, and they obtain accelerated rates when the nonsmooth term is the indicator of a uniformly convex set.

Core claim

The algorithm performs one conditional gradient step on the augmented Lagrangian function followed by a dual update with a diminishing stepsize. Under standard open-loop and short stepsize rules for conditional gradient, the method achieves a convergence rate that matches the best-known complexity for minimizing the sum of a Lipschitz differentiable convex function and a proper closed convex function admitting linear minimization oracles, subject to smooth convex inequality constraints. Accelerated rates hold when the nonsmooth term is the indicator function of a uniformly convex set.

What carries the argument

Single conditional gradient step applied to the augmented Lagrangian, followed by a diminishing dual stepsize update.

Load-bearing premise

The proof requires exact adherence to a diminishing dual stepsize rule together with convexity of the objective and smoothness plus convexity of the inequality constraints.

What would settle it

Execute the algorithm on a concrete convex test problem with known optimal value and check whether the observed objective gap decreases at the claimed rate under the stated stepsize rules.

read the original abstract

We consider the problem of minimizing the sum of a Lipschitz differentiable convex function $f$ and a proper closed convex function $h$ that admits efficient linear minimization oracles, subject to multiple smooth convex inequality constraints. We adapt the classical augmented Lagrangian (AL) method for these problems: in each iteration, our algorithm consists of one step of the conditional gradient (CG) method applied to the AL function, followed by an update of the dual variable as in classical AL methods with a diminishing dual stepsize. We study the convergence rate of our algorithm under two standard stepsize rules for the CG method, namely, an open-loop stepsize and the short stepsize, and obtain a convergence rate that matches the best-known complexity for this class of problems. We also establish accelerated rates when $h$ is the indicator function of a uniformly convex set.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper puts one CG step inside a single-loop AL scheme with diminishing dual steps and claims to recover the best-known rates for convex problems with smooth inequalities; the analysis looks plausible but needs a close check on how it controls drift from the changing AL function.

read the letter

The main takeaway is that this combines a single conditional gradient step on the augmented Lagrangian with a standard dual update and diminishing stepsize to get an overall rate that matches the best known for this problem class. It also gets accelerated rates when h is the indicator of a uniformly convex set. That single-loop structure is the concrete advance over the usual inner-loop AL or multi-step CG approaches in the cited literature. The algorithm itself is straightforward to implement if you already have a linear minimization oracle for h, and the two stepsize rules (open-loop and short) are standard CG choices, so the method stays practical. The abstract and setup are clean, and the claims about matching complexity are stated precisely enough to evaluate. The soft spot is exactly the one in the stress-test note: the AL function changes every outer iteration because of the penalty growth and dual ascent, so the usual fixed-function CG error bounds do not apply directly. The paper must supply a separate argument that absorbs the function drift without extra factors that would ruin the target rate. If that telescoping or Lipschitz control is tight and the diminishing dual stepsize is used exactly as stated, the result holds; otherwise the claimed rate could slip. The assumptions on f and the constraints are standard and not hidden. This is the kind of incremental but usable paper that optimization theorists working on first-order constrained methods will want to see. It is worth sending to a serious referee who can verify the convergence details and the interaction between the CG inexactness and the AL updates. I would bring it to a reading group for the proof section.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a single-loop augmented Lagrangian (AL) algorithm for minimizing f + h subject to smooth convex inequality constraints, where f is Lipschitz differentiable and convex, and h admits an efficient linear minimization oracle. In each iteration, the algorithm applies one conditional gradient (CG) step to the AL function and then updates the dual variables using a diminishing stepsize. The paper analyzes the convergence under open-loop and short stepsize rules for CG, claiming rates that match the best-known complexity for this problem class. It also provides accelerated rates when h is the indicator function of a uniformly convex set.

Significance. This approach is significant as it offers a single-loop method that integrates CG with AL, potentially improving efficiency over methods requiring multiple inner iterations. Matching the best-known rates and providing accelerated convergence in the uniformly convex case would be a valuable contribution to the field of constrained optimization if the analysis is rigorous. The use of standard CG stepsizes and the focus on inequality constraints with smooth convex functions aligns with practical needs in large-scale problems.

major comments (2)

[Convergence Analysis] The central claim that one CG step suffices to match the best-known rates relies on controlling the error from the inexact minimization of the changing AL function. The standard analysis for CG on fixed functions does not immediately extend here due to the penalty parameter increase and dual update at each iteration. A specific telescoping argument or Lipschitz-drift bound is needed to absorb the function variation; without it, the claimed rate may not hold. Please clarify in the proof how the CG error term is bounded in the presence of these updates.
[Accelerated Rates Theorem] For the accelerated rates when h is the indicator of a uniformly convex set, the analysis should specify the dependence on the convexity parameter and ensure that the single CG step still achieves the acceleration without additional assumptions on the constraint functions.

minor comments (2)

[Notation] Ensure consistent use of notation for the augmented Lagrangian function and the penalty parameter across sections.
[Abstract] The abstract mentions 'matching the best-known complexity' but does not specify what that complexity is (e.g., O(1/ε) or O(1/√ε)); clarifying this would help readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We appreciate the recognition of the potential significance of our single-loop conditional-gradient augmented Lagrangian approach. Below we respond to each major comment, providing clarifications on the analysis and indicating where revisions will be made.

read point-by-point responses

Referee: The central claim that one CG step suffices to match the best-known rates relies on controlling the error from the inexact minimization of the changing AL function. The standard analysis for CG on fixed functions does not immediately extend here due to the penalty parameter increase and dual update at each iteration. A specific telescoping argument or Lipschitz-drift bound is needed to absorb the function variation; without it, the claimed rate may not hold. Please clarify in the proof how the CG error term is bounded in the presence of these updates.

Authors: We agree that the variation of the augmented Lagrangian must be carefully controlled. In the proof of the main convergence theorem, we derive a Lipschitz-drift bound on the change in the AL function value between iterations. This bound uses the smoothness of the constraint functions together with the fact that the dual stepsize is diminishing; the resulting perturbation terms are summable and can be absorbed into the standard CG error telescope without degrading the overall rate. We will add an explicit lemma stating this drift bound and a short remark immediately after the main proof to make the argument self-contained. revision: yes
Referee: For the accelerated rates when h is the indicator of a uniformly convex set, the analysis should specify the dependence on the convexity parameter and ensure that the single CG step still achieves the acceleration without additional assumptions on the constraint functions.

Authors: We thank the referee for this observation. The accelerated rate stated in the relevant theorem depends explicitly on the modulus of uniform convexity of the feasible set (appearing in the denominator of the leading term). The proof shows that the single conditional-gradient step exploits this geometry directly; the smoothness and convexity of the inequality constraints are already sufficient and no further assumptions are imposed. In the revision we will restate the theorem with the explicit dependence on the convexity parameter and add a clarifying sentence confirming that the existing assumptions on the constraints suffice. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper defines a single-loop algorithm that performs one CG step on the current augmented Lagrangian followed by a dual update with diminishing stepsize. Convergence rates are derived by adapting standard CG guarantees to the sequence of changing AL functions under the stated Lipschitz and convexity assumptions. No equation reduces a claimed rate to a fitted parameter, self-referential definition, or load-bearing self-citation; the analysis treats the AL drift explicitly via the algorithm's own update rules rather than assuming a fixed objective. The result is therefore independent of its inputs and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claim rests on standard domain assumptions from convex optimization rather than new free parameters or invented entities.

axioms (3)

domain assumption f is Lipschitz differentiable and convex
Required for the augmented Lagrangian to remain convex and for the CG step to be well-defined.
domain assumption h is proper closed convex and admits efficient linear minimization oracles
Enables the conditional gradient step to be computationally cheap.
domain assumption Inequality constraints are smooth and convex
Ensures the augmented Lagrangian penalty terms behave appropriately for convergence analysis.

pith-pipeline@v0.9.0 · 5674 in / 1363 out tokens · 45326 ms · 2026-05-22T04:18:11.492860+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1]

Hestenes, Multiplier and gradient methods

M.R. Hestenes, Multiplier and gradient methods. J. Optim. Theory Appl.4(5), 303–320 (1969)

work page 1969
[2]

Powell, A method for nonlinear constraints in minimization problems

M.J. Powell, A method for nonlinear constraints in minimization problems. Optimization pp. 283–298 (1969)

work page 1969
[3]

Rockafellar, A dual approach to solving nonlinear programming problems by unconstrained optimization

R.T. Rockafellar, A dual approach to solving nonlinear programming problems by unconstrained optimization. Math. Program.5(1), 354–373 (1973)

work page 1973
[4]

Bertsekas, On penalty and multiplier methods for constrained minimization

D.P. Bertsekas, On penalty and multiplier methods for constrained minimization. SIAM J. Control Optim.14(2), 216–235 (1976)

work page 1976
[5]

Kort, D.P

B.W. Kort, D.P. Bertsekas, Combined primal–dual and penalty methods for convex programming. SIAM J. Control Optim.14(2), 268–294 (1976)

work page 1976
[6]

Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming

R.T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res.1(2), 97–116 (1976)

work page 1976
[7]

Bertsekas,Constrained Optimization and Lagrange Multiplier Methods (Academic Press, 1982)

D.P. Bertsekas,Constrained Optimization and Lagrange Multiplier Methods (Academic Press, 1982)

work page 1982
[8]

Iusem, Augmented Lagrangian methods and proximal point methods for convex optimization

A.N. Iusem, Augmented Lagrangian methods and proximal point methods for convex optimization. Investigaci´ on Operativa8, 11–49 (1999)

work page 1999
[9]

De Marchi, X

A. De Marchi, X. Jia, C. Kanzow, P. Mehlitz, Constrained composite optimization and augmented Lagrangian methods. Math. Program.201(1), 863–896 (2023)

work page 2023
[10]

Tapia, Diagonalized multiplier methods and quasi-Newton methods for constrained optimization

R.A. Tapia, Diagonalized multiplier methods and quasi-Newton methods for constrained optimization. J. Optim. Theory Appl.22(2), 135–194 (1977)

work page 1977
[11]

Gidel, F

G. Gidel, F. Pedregosa, S. Lacoste-Julien,Frank-Wolfe splitting via augmented Lagrangian method, inInternational Conference on Artificial Intelligence and Statistics(PMLR, 2018), pp. 1456–1465

work page 2018
[12]

Yurtsever, O

A. Yurtsever, O. Fercoq, V. Cevher,A Conditional-Gradient-Based Augmented Lagrangian Framework, inProceedings of the 36th International Conference on 32 Machine Learning,Proceedings of Machine Learning Research, vol. 97, ed. by K. Chaudhuri, R. Salakhutdinov (PMLR, 2019), pp. 7272–7281

work page 2019
[13]

Silveti-Falls, C

A. Silveti-Falls, C. Molinari, J. Fadili, Generalized conditional gradient with augmented Lagrangian for composite minimization. SIAM J. Optim.30(4), 2687–2725 (2020)

work page 2020
[14]

Garber, T

D. Garber, T. Livney, S. Sabach,Faster Projection-Free Augmented Lagrangian Methods via Weak Proximal Oracle, inProceedings of The 26th International Con- ference on Artificial Intelligence and Statistics,Proceedings of Machine Learning Research, vol. 206, ed. by F. Ruiz, J. Dy, J.W. van de Meent (PMLR, 2023), pp. 7213–7238

work page 2023
[15]

Freund, P

R.M. Freund, P. Grigas, New analysis and results for the Frank-Wolfe method. Math. Program.155, 199–230 (2016)

work page 2016
[16]

Combettes, S

C.W. Combettes, S. Pokutta, Complexity of linear minimization and projection on some sets. Oper. Res. Lett.49(4), 565–571 (2021)

work page 2021
[17]

Woodstock, High-precision linear minimization is no slower than projection

Z. Woodstock, High-precision linear minimization is no slower than projection. Optim. Lett. (2026)

work page 2026
[18]

N. He, Z. Harchaoui,Semi-Proximal Mirror-Prox for Nonsmooth Composite Minimization, inAdvances in Neural Information Processing Systems, vol. 28, ed. by C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, R. Garnett (Curran Associates, Inc., 2015)

work page 2015
[19]

Y.F. Liu, X. Liu, S. Ma, On the nonergodic convergence rate of an inexact augmented Lagrangian framework for composite convex programming. Math. Oper. Res.44(2), 632–650 (2019)

work page 2019
[20]

Kolmogorov, T

V. Kolmogorov, T. Pock,One-sided Frank-Wolfe algorithms for saddle problems, in Proceedings of the 38th International Conference on Machine Learning,Proceedings of Machine Learning Research, vol. 139, ed. by M. Meila, T. Zhang (PMLR, 2021), pp. 5665–5675

work page 2021
[21]

Mill´ an, O.P

R.D. Mill´ an, O.P. Ferreira, L.F. Prudente, Alternating conditional gradient method for convex feasibility problems. Comput. Optim. Appl.80(1), 245––269 (2021)

work page 2021
[22]

A. Beck, E. Pauwels, S. Sabach, The cyclic block conditional gradient method for convex optimization problems. SIAM J. Optim.25(4), 2024–2049 (2015)

work page 2024
[23]

Richard, P

E. Richard, P. Savalle, N. Vayatis,Estimation of Simultaneously Sparse and Low Rank Matrices, inProceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012(icml.cc / Omnipress, 2012) 33

work page 2012
[24]

G. Lan, E. Romeijn, Z. Zhou, Conditional gradient methods for convex optimization with general affine and nonlinear constraints. SIAM J. Optim.31(3), 2307–2339 (2021)

work page 2021
[25]

Woodstock, S

Z. Woodstock, S. Pokutta, Splitting the conditional gradient algorithm. SIAM J. Optim.35(1), 347–368 (2025)

work page 2025
[26]

Asgari, M.J

K. Asgari, M.J. Neely, Nonsmooth projection-free optimization with functional constraints. Comput. Optim. Appl.89(3), 927–975 (2024)

work page 2024
[27]

Giang-Tran, N

K.H. Giang-Tran, N. Ho-Nguyen, D. Lee, A projection-free method for solving convex bilevel optimization problems. Math. Program.213(1), 907–940 (2025)

work page 2025
[28]

Garber, E

D. Garber, E. Hazan,Faster rates for the Frank-Wolfe method over strongly- convex sets, inInternational Conference on Machine Learning(PMLR, 2015), pp. 541–549

work page 2015
[29]

Kerdreux, A

T. Kerdreux, A. d’Aspremont, S. Pokutta,Projection-free optimization on uni- formly convex sets, inInternational Conference on Artificial Intelligence and Statistics(PMLR, 2021), pp. 19–27

work page 2021
[30]

Grimmer, N

B. Grimmer, N. Liu, Lower bounds for linear minimization oracle methods optimizing over strongly convex sets. arXiv preprint arXiv:2602.22608 (2026)

work page arXiv 2026
[31]

Lower Bounds for Frank-Wolfe on Strongly Convex Sets

J. Halbey, D. Deza, M. Zimmer, C. Roux, B. Stellato, S. Pokutta, Lower bounds for Frank-Wolfe on strongly convex sets. arXiv preprint arXiv:2602.04378 (2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026
[32]

Xu, Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming

Y. Xu, Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming. Math. Program.185(1), 199–244 (2021)

work page 2021
[33]

Xu, First-order methods for constrained convex programming based on lin- earized augmented Lagrangian function

Y. Xu, First-order methods for constrained convex programming based on lin- earized augmented Lagrangian function. INFORMS J. Optim.3(1), 89–117 (2021)

work page 2021
[34]

Bach, Duality between subgradient and conditional gradient methods

F. Bach, Duality between subgradient and conditional gradient methods. SIAM J. Optim.25(1), 115–129 (2015)

work page 2015
[35]

Jaggi,Revisiting Frank-Wolfe: Projection-free sparse convex optimization, in International Conference on Machine Learning(PMLR, 2013), pp

M. Jaggi,Revisiting Frank-Wolfe: Projection-free sparse convex optimization, in International Conference on Machine Learning(PMLR, 2013), pp. 427–435

work page 2013
[36]

Pe˜ na, Affine invariant convergence rates of the conditional gradient method

J.F. Pe˜ na, Affine invariant convergence rates of the conditional gradient method. SIAM J. Optim.33(4), 2654–2674 (2023)

work page 2023
[37]

Z˘ alinescu,Convex Analysis in General Vector Spaces(World scientific, 2002)

C. Z˘ alinescu,Convex Analysis in General Vector Spaces(World scientific, 2002)

work page 2002
[38]

Wirth, J

E. Wirth, J. Pe˜ na, S. Pokutta, Accelerated affine-invariant convergence rates of the Frank-Wolfe algorithm with open-loop step-sizes. Math. Program.214, 201–245 (2025) 34

work page 2025
[39]

Wirth, T

E. Wirth, T. Kerdreux, S. Pokutta,Acceleration of Frank-Wolfe algorithms with open-loop step-sizes, inInternational Conference on Artificial Intelligence and Statistics(PMLR, 2023), pp. 77–100

work page 2023
[40]

Gurobi Optimization,Gurobi Optimizer Reference Manual(2026)

L. Gurobi Optimization,Gurobi Optimizer Reference Manual(2026). URL https: //github.com/jump-dev/Gurobi.jl. Version 1.9.2

work page 2026
[41]

Besan¸ con, A

M. Besan¸ con, A. Carderera, S. Pokutta, FrankWolfe.jl: A high-performance and flexible toolbox for Frank-Wolfe algorithms and conditional gradients. INFORMS J. Comput. (2022)

work page 2022
[42]

Kuhn, The Hungarian method for the assignment problem

H.W. Kuhn, The Hungarian method for the assignment problem. Nav. Res. Logist. Q.2(1-2), 83–97 (1955) 35

work page 1955

[1] [1]

Hestenes, Multiplier and gradient methods

M.R. Hestenes, Multiplier and gradient methods. J. Optim. Theory Appl.4(5), 303–320 (1969)

work page 1969

[2] [2]

Powell, A method for nonlinear constraints in minimization problems

M.J. Powell, A method for nonlinear constraints in minimization problems. Optimization pp. 283–298 (1969)

work page 1969

[3] [3]

Rockafellar, A dual approach to solving nonlinear programming problems by unconstrained optimization

R.T. Rockafellar, A dual approach to solving nonlinear programming problems by unconstrained optimization. Math. Program.5(1), 354–373 (1973)

work page 1973

[4] [4]

Bertsekas, On penalty and multiplier methods for constrained minimization

D.P. Bertsekas, On penalty and multiplier methods for constrained minimization. SIAM J. Control Optim.14(2), 216–235 (1976)

work page 1976

[5] [5]

Kort, D.P

B.W. Kort, D.P. Bertsekas, Combined primal–dual and penalty methods for convex programming. SIAM J. Control Optim.14(2), 268–294 (1976)

work page 1976

[6] [6]

Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming

R.T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res.1(2), 97–116 (1976)

work page 1976

[7] [7]

Bertsekas,Constrained Optimization and Lagrange Multiplier Methods (Academic Press, 1982)

D.P. Bertsekas,Constrained Optimization and Lagrange Multiplier Methods (Academic Press, 1982)

work page 1982

[8] [8]

Iusem, Augmented Lagrangian methods and proximal point methods for convex optimization

A.N. Iusem, Augmented Lagrangian methods and proximal point methods for convex optimization. Investigaci´ on Operativa8, 11–49 (1999)

work page 1999

[9] [9]

De Marchi, X

A. De Marchi, X. Jia, C. Kanzow, P. Mehlitz, Constrained composite optimization and augmented Lagrangian methods. Math. Program.201(1), 863–896 (2023)

work page 2023

[10] [10]

Tapia, Diagonalized multiplier methods and quasi-Newton methods for constrained optimization

R.A. Tapia, Diagonalized multiplier methods and quasi-Newton methods for constrained optimization. J. Optim. Theory Appl.22(2), 135–194 (1977)

work page 1977

[11] [11]

Gidel, F

G. Gidel, F. Pedregosa, S. Lacoste-Julien,Frank-Wolfe splitting via augmented Lagrangian method, inInternational Conference on Artificial Intelligence and Statistics(PMLR, 2018), pp. 1456–1465

work page 2018

[12] [12]

Yurtsever, O

A. Yurtsever, O. Fercoq, V. Cevher,A Conditional-Gradient-Based Augmented Lagrangian Framework, inProceedings of the 36th International Conference on 32 Machine Learning,Proceedings of Machine Learning Research, vol. 97, ed. by K. Chaudhuri, R. Salakhutdinov (PMLR, 2019), pp. 7272–7281

work page 2019

[13] [13]

Silveti-Falls, C

A. Silveti-Falls, C. Molinari, J. Fadili, Generalized conditional gradient with augmented Lagrangian for composite minimization. SIAM J. Optim.30(4), 2687–2725 (2020)

work page 2020

[14] [14]

Garber, T

D. Garber, T. Livney, S. Sabach,Faster Projection-Free Augmented Lagrangian Methods via Weak Proximal Oracle, inProceedings of The 26th International Con- ference on Artificial Intelligence and Statistics,Proceedings of Machine Learning Research, vol. 206, ed. by F. Ruiz, J. Dy, J.W. van de Meent (PMLR, 2023), pp. 7213–7238

work page 2023

[15] [15]

Freund, P

R.M. Freund, P. Grigas, New analysis and results for the Frank-Wolfe method. Math. Program.155, 199–230 (2016)

work page 2016

[16] [16]

Combettes, S

C.W. Combettes, S. Pokutta, Complexity of linear minimization and projection on some sets. Oper. Res. Lett.49(4), 565–571 (2021)

work page 2021

[17] [17]

Woodstock, High-precision linear minimization is no slower than projection

Z. Woodstock, High-precision linear minimization is no slower than projection. Optim. Lett. (2026)

work page 2026

[18] [18]

N. He, Z. Harchaoui,Semi-Proximal Mirror-Prox for Nonsmooth Composite Minimization, inAdvances in Neural Information Processing Systems, vol. 28, ed. by C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, R. Garnett (Curran Associates, Inc., 2015)

work page 2015

[19] [19]

Y.F. Liu, X. Liu, S. Ma, On the nonergodic convergence rate of an inexact augmented Lagrangian framework for composite convex programming. Math. Oper. Res.44(2), 632–650 (2019)

work page 2019

[20] [20]

Kolmogorov, T

V. Kolmogorov, T. Pock,One-sided Frank-Wolfe algorithms for saddle problems, in Proceedings of the 38th International Conference on Machine Learning,Proceedings of Machine Learning Research, vol. 139, ed. by M. Meila, T. Zhang (PMLR, 2021), pp. 5665–5675

work page 2021

[21] [21]

Mill´ an, O.P

R.D. Mill´ an, O.P. Ferreira, L.F. Prudente, Alternating conditional gradient method for convex feasibility problems. Comput. Optim. Appl.80(1), 245––269 (2021)

work page 2021

[22] [22]

A. Beck, E. Pauwels, S. Sabach, The cyclic block conditional gradient method for convex optimization problems. SIAM J. Optim.25(4), 2024–2049 (2015)

work page 2024

[23] [23]

Richard, P

E. Richard, P. Savalle, N. Vayatis,Estimation of Simultaneously Sparse and Low Rank Matrices, inProceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012(icml.cc / Omnipress, 2012) 33

work page 2012

[24] [24]

G. Lan, E. Romeijn, Z. Zhou, Conditional gradient methods for convex optimization with general affine and nonlinear constraints. SIAM J. Optim.31(3), 2307–2339 (2021)

work page 2021

[25] [25]

Woodstock, S

Z. Woodstock, S. Pokutta, Splitting the conditional gradient algorithm. SIAM J. Optim.35(1), 347–368 (2025)

work page 2025

[26] [26]

Asgari, M.J

K. Asgari, M.J. Neely, Nonsmooth projection-free optimization with functional constraints. Comput. Optim. Appl.89(3), 927–975 (2024)

work page 2024

[27] [27]

Giang-Tran, N

K.H. Giang-Tran, N. Ho-Nguyen, D. Lee, A projection-free method for solving convex bilevel optimization problems. Math. Program.213(1), 907–940 (2025)

work page 2025

[28] [28]

Garber, E

D. Garber, E. Hazan,Faster rates for the Frank-Wolfe method over strongly- convex sets, inInternational Conference on Machine Learning(PMLR, 2015), pp. 541–549

work page 2015

[29] [29]

Kerdreux, A

T. Kerdreux, A. d’Aspremont, S. Pokutta,Projection-free optimization on uni- formly convex sets, inInternational Conference on Artificial Intelligence and Statistics(PMLR, 2021), pp. 19–27

work page 2021

[30] [30]

Grimmer, N

B. Grimmer, N. Liu, Lower bounds for linear minimization oracle methods optimizing over strongly convex sets. arXiv preprint arXiv:2602.22608 (2026)

work page arXiv 2026

[31] [31]

Lower Bounds for Frank-Wolfe on Strongly Convex Sets

J. Halbey, D. Deza, M. Zimmer, C. Roux, B. Stellato, S. Pokutta, Lower bounds for Frank-Wolfe on strongly convex sets. arXiv preprint arXiv:2602.04378 (2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026

[32] [32]

Xu, Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming

Y. Xu, Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming. Math. Program.185(1), 199–244 (2021)

work page 2021

[33] [33]

Xu, First-order methods for constrained convex programming based on lin- earized augmented Lagrangian function

Y. Xu, First-order methods for constrained convex programming based on lin- earized augmented Lagrangian function. INFORMS J. Optim.3(1), 89–117 (2021)

work page 2021

[34] [34]

Bach, Duality between subgradient and conditional gradient methods

F. Bach, Duality between subgradient and conditional gradient methods. SIAM J. Optim.25(1), 115–129 (2015)

work page 2015

[35] [35]

Jaggi,Revisiting Frank-Wolfe: Projection-free sparse convex optimization, in International Conference on Machine Learning(PMLR, 2013), pp

M. Jaggi,Revisiting Frank-Wolfe: Projection-free sparse convex optimization, in International Conference on Machine Learning(PMLR, 2013), pp. 427–435

work page 2013

[36] [36]

Pe˜ na, Affine invariant convergence rates of the conditional gradient method

J.F. Pe˜ na, Affine invariant convergence rates of the conditional gradient method. SIAM J. Optim.33(4), 2654–2674 (2023)

work page 2023

[37] [37]

Z˘ alinescu,Convex Analysis in General Vector Spaces(World scientific, 2002)

C. Z˘ alinescu,Convex Analysis in General Vector Spaces(World scientific, 2002)

work page 2002

[38] [38]

Wirth, J

E. Wirth, J. Pe˜ na, S. Pokutta, Accelerated affine-invariant convergence rates of the Frank-Wolfe algorithm with open-loop step-sizes. Math. Program.214, 201–245 (2025) 34

work page 2025

[39] [39]

Wirth, T

E. Wirth, T. Kerdreux, S. Pokutta,Acceleration of Frank-Wolfe algorithms with open-loop step-sizes, inInternational Conference on Artificial Intelligence and Statistics(PMLR, 2023), pp. 77–100

work page 2023

[40] [40]

Gurobi Optimization,Gurobi Optimizer Reference Manual(2026)

L. Gurobi Optimization,Gurobi Optimizer Reference Manual(2026). URL https: //github.com/jump-dev/Gurobi.jl. Version 1.9.2

work page 2026

[41] [41]

Besan¸ con, A

M. Besan¸ con, A. Carderera, S. Pokutta, FrankWolfe.jl: A high-performance and flexible toolbox for Frank-Wolfe algorithms and conditional gradients. INFORMS J. Comput. (2022)

work page 2022

[42] [42]

Kuhn, The Hungarian method for the assignment problem

H.W. Kuhn, The Hungarian method for the assignment problem. Nav. Res. Logist. Q.2(1-2), 83–97 (1955) 35

work page 1955