Projection-Free Functional Constrained Optimization for Risk Aversion and Sparsity Control

Guanghui Lan; H. Edwin Romeijn; Saeed Masiha; Yi Cheng

arxiv: 2210.05108 · v2 · submitted 2022-10-11 · 🧮 math.OC · cs.LG

Projection-Free Functional Constrained Optimization for Risk Aversion and Sparsity Control

Yi Cheng , Guanghui Lan , Saeed Masiha , H. Edwin Romeijn This is my paper

Pith reviewed 2026-05-24 10:35 UTC · model grok-4.3

classification 🧮 math.OC cs.LG

keywords projection-free optimizationconditional gradient methodfunctional constraintsrisk aversionsparsity controlportfolio optimizationlevel-set methodsnear-KKT points

0 comments

The pith

Level conditional gradient methods solve functional constrained convex optimization in O(ε^{-2} log(1/ε)) iterations independent of optimal dual multiplier size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops projection-free algorithms for optimization problems that include functional constraints, a setting that arises when risk-aware criteria must be balanced against sparsity requirements in portfolio selection and radiation therapy planning. For convex objectives the authors introduce the Level Conditional Gradient method, which places a level-set outer loop around an inner conditional gradient oracle that solves saddle-point subproblems induced by the constraints. This construction yields an iteration complexity of order ε^{-2} log(ε^{-1}) that holds uniformly for smooth and nonsmooth convex cases and does not grow with the size of the optimal dual Lagrange multiplier. An inexact proximal-point outer loop extends the same inner solver to nonconvex objectives, producing an (ε,ε)-near-KKT point in O(ε^{-3} log(ε^{-1})) iterations.

Core claim

For convex functional constrained problems the Level Conditional Gradient algorithm attains O(ε^{-2} log(ε^{-1})) iteration complexity for both smooth and nonsmooth objectives without any dependence on the magnitude of the optimal dual Lagrange multiplier; the nonconvex case is handled by an inexact proximal-point outer loop that calls the same inner solver and reaches an (ε,ε)-near-KKT point in O(ε^{-3} log(ε^{-1})) iterations.

What carries the argument

Level Conditional Gradient (LCG) method, which wraps a level-set outer loop around a conditional gradient oracle that solves saddle-point subproblems arising from the functional constraints.

If this is right

The algorithm can be applied to large-scale problems where dual multipliers grow with problem size without incurring extra iterations.
The same inner oracle works for both smooth and nonsmooth convex objectives.
Numerical experiments on portfolio selection and IMRT problems demonstrate concrete sparsity versus risk trade-offs.
The nonconvex extension produces near-KKT points with one additional order of complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dual-independent bound may allow the same framework to scale to problems containing many functional constraints without additional tuning.
Level-set ideas of this type could be tested inside other projection-free frameworks that already possess conditional gradient oracles.
Whether the observed practical sparsity control extends to stochastic or online variants of the same problems is left open by the current analysis.

Load-bearing premise

The functional constraints must admit an efficient conditional gradient oracle for the saddle-point subproblems that arise inside the level-set loop.

What would settle it

Construct a family of functional constrained problems in which the optimal dual multiplier grows arbitrarily while all other problem parameters remain fixed, then check whether the observed number of iterations needed to reach a target accuracy stays bounded by O(ε^{-2} log(ε^{-1})) independent of that growth.

read the original abstract

We study projection-free methods for functional constrained optimization with convex or smooth nonconvex objectives. Such problems arise in applications such as portfolio optimization and radiation therapy planning, where risk-aware criteria and sparsity frequently appear together. For the convex setting, we propose a Level Conditional Gradient (LCG) method that combines a level-set outer loop with a conditional gradient oracle for saddle-point subproblems, and we show an iteration complexity of $\mathcal{O}\big(\epsilon^{-2}\log(\epsilon^{-1})\big)$ for smooth and nonsmooth cases without dependence on the magnitude of an optimal dual Lagrange multiplier. For the nonconvex setting, we propose the Inexact Proximal Point LCG (IPP-LCG) method, which solves a sequence of convex subproblems by LCG and attains $\mathcal{O}\big(\epsilon^{-3}\log(\epsilon^{-1})\big)$ complexity for computing an $(\epsilon,\epsilon)$-near-KKT point. Numerical results on portfolio selection and IMRT illustrate the practical sparsity/risk trade-offs of the proposed methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces LCG and IPP-LCG algorithms that claim projection-free rates for functional constraints without dual-norm dependence, but the rates rest on cheap saddle-point oracles.

read the letter

The core contribution is a level-set conditional gradient method (LCG) for the convex case that reaches O(ε^{-2} log(1/ε)) iterations without explicit dependence on the optimal dual multiplier, plus an inexact proximal-point version (IPP-LCG) for nonconvex problems that reaches O(ε^{-3} log(1/ε)) for near-KKT points. Both target problems that mix risk measures with sparsity, such as portfolio selection and IMRT planning, and they stay projection-free by routing subproblems through conditional-gradient oracles on saddle-point formulations induced by the functional constraints. That template is new relative to earlier projection-free work on simpler constraint sets. The numerical examples on portfolio and radiotherapy instances show the sparsity-risk trade-offs in practice, which is useful for readers who need implementable methods rather than just rates. The main soft spot is the oracle assumption. The claimed rates hold only if each saddle-point subproblem can be solved to the needed accuracy with a number of CG steps whose cost does not grow with ε or with ||λ*||. For arbitrary functional constraints the inner min-max problem may itself require inner iterations that reintroduce that dependence, and the abstract gives no explicit bound on the inner cost. Without the full proofs it is unclear how tightly the paper controls this. The work is aimed at optimization researchers who already work with conditional-gradient oracles and who care about risk-aware or sparsity-constrained problems. A reader in that niche can extract the algorithmic idea and the complexity statements even if the oracle-efficiency claim needs verification. It is worth sending to peer review because the problem setting is practical, the algorithmic departure is clear, and the claimed independence from dual magnitude would be a genuine improvement if the inner-oracle accounting holds up.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes projection-free methods for functional constrained optimization with convex or smooth nonconvex objectives. For convex problems, the Level Conditional Gradient (LCG) method uses a level-set outer loop combined with a conditional gradient oracle on saddle-point subproblems induced by the functional constraints, claiming an iteration complexity of O(ε^{-2} log(ε^{-1})) independent of the magnitude of the optimal dual Lagrange multiplier for both smooth and nonsmooth cases. For nonconvex problems, the Inexact Proximal Point LCG (IPP-LCG) method solves a sequence of convex subproblems via LCG to attain O(ε^{-3} log(ε^{-1})) complexity for an (ε,ε)-near-KKT point. Numerical illustrations on portfolio selection and IMRT planning are included to demonstrate sparsity/risk trade-offs.

Significance. If the claimed complexities can be established under the stated oracle assumptions without reintroducing dependence on dual magnitudes, the work would advance the design of projection-free algorithms for constrained problems where functional constraints encode risk aversion and sparsity. Removing explicit dependence on ||λ*|| is a desirable property in such settings, and the application examples suggest relevance to practical domains.

major comments (1)

[Abstract] Abstract (LCG method paragraph): The O(ε^{-2} log(ε^{-1})) iteration complexity for the convex case is asserted to hold without dependence on ||λ*||. This rests on the assumption that each saddle-point subproblem admits an efficient conditional gradient oracle whose total cost (number of linear optimization steps) remains independent of both ε and ||λ*||. For general functional constraints the induced min-max saddle-point problem may itself require inner iterations whose cost scales with dual magnitude or smoothness parameters, which would reintroduce the dependence the claim asserts is absent. This assumption is load-bearing for the central complexity result.

minor comments (1)

The abstract states that numerical results illustrate practical sparsity/risk trade-offs but provides no information on the specific metrics, comparison baselines, or problem dimensions used in the portfolio and IMRT experiments.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the thoughtful review and for identifying a key assumption underlying our complexity claims. We address the major comment below.

read point-by-point responses

Referee: [Abstract] Abstract (LCG method paragraph): The O(ε^{-2} log(ε^{-1})) iteration complexity for the convex case is asserted to hold without dependence on ||λ*||. This rests on the assumption that each saddle-point subproblem admits an efficient conditional gradient oracle whose total cost (number of linear optimization steps) remains independent of both ε and ||λ*||. For general functional constraints the induced min-max saddle-point problem may itself require inner iterations whose cost scales with dual magnitude or smoothness parameters, which would reintroduce the dependence the claim asserts is absent. This assumption is load-bearing for the central complexity result.

Authors: We agree that the claimed complexity depends on the inner saddle-point subproblems admitting a conditional gradient oracle whose total linear optimization cost is independent of both ε and ||λ*||. In the LCG analysis, the level-set outer loop is constructed precisely so that each subproblem is a saddle-point problem over a compact domain in which the dual variables remain normalized (e.g., within the probability simplex for the functional constraints). Under this normalization the standard CG convergence bound for the inner min-max problem depends only on the smoothness constant of the Lagrangian and the target subproblem accuracy δ (chosen proportionally to ε), without explicit scaling by ||λ*||. The overall iteration count therefore aggregates to O(ε^{-2} log(ε^{-1})) linear optimization steps. For completely arbitrary functional constraints the referee’s concern would be valid; our statements are made under the modeling assumption that the constraints admit such a normalized saddle-point formulation (as is the case for the risk and sparsity functionals considered in the applications). To remove any ambiguity we will add an explicit sentence in the abstract and a short paragraph in Section 2 clarifying the oracle model and the normalization step. revision: partial

Circularity Check

0 steps flagged

No circularity; complexity claims rest on explicit oracle assumptions

full rationale

The paper states its O(ε^{-2} log(ε^{-1})) bound for the convex case under the assumption that functional constraints admit an efficient conditional gradient oracle for the induced saddle-point subproblems. This is an external modeling assumption, not a quantity derived from the result itself. No self-definitional loops, fitted inputs renamed as predictions, load-bearing self-citations, or ansatz smuggling appear in the provided text. The derivation chain therefore remains self-contained against the stated oracles and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions of convex or smooth-nonconvex objectives and the existence of conditional-gradient oracles; no free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption Objectives are convex or smooth nonconvex
Required to separate the convex LCG analysis from the nonconvex IPP-LCG analysis.
domain assumption Conditional gradient oracle exists for the saddle-point subproblems
Invoked by the LCG method description.

pith-pipeline@v0.9.0 · 5723 in / 1277 out tokens · 29022 ms · 2026-05-24T10:35:30.705740+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Uniformly Optimal and Parameter-free First-order Methods for Convex and Function-constrained Optimization
math.OC 2024-12 unverdicted novelty 6.0

Parameter-free first-order methods attain optimal oracle complexity O(ε^{-2/(1+3ρ)}) for convex function-constrained optimization under Hölder smoothness by combining modified Polyak steps, Nesterov momentum, and APL ...

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · cited by 1 Pith paper

[1]

International conference on machine learning , 699–707 (PMLR)

Allen-Zhu Z, Hazan E (2016) Variance reduction for faster non-co nvex optimization. International conference on machine learning , 699–707 (PMLR). Aravkin AY, Burke JV, Drusvyatskiy D, Friedlander MP, Roy S (2019 ) Level-set methods for convex opti- mization. Mathematical Programming 174(1):359–390. Aravkin AY, Burke JV, Friedlander MP (2013) Variational ...

work page 2016
[2]

Bertsekas DP (1997) Nonlinear programming

28 Bertsekas D (2015) Convex optimization algorithms (Athena Scientiﬁc). Bertsekas DP (1997) Nonlinear programming. Journal of the Operational Research Society 48(3):334–334. Bertsimas D, Shioda R (2009) Algorithm for cardinality-constrained quadratic optimization. Computational Optimization and Applications 43(1):1–22. Blumensath T, Davies ME (2008) Iter...

work page arXiv 2015
[3]

McNeil AJ, Frey R, Embrechts P (2015) Quantitative risk management: concepts, techniques and to ols-revised edition (Princeton university press). Nemirovski A (2004) Prox-method with rate of convergence o (1/ t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave sa ddle point problems. SIAM Journal on Optim...

work page 2015
[4]

Springer Optimization and Its Applications (Springer International Publishing)

Nesterov Y (2018) Lectures on Convex Optimization . Springer Optimization and Its Applications (Springer International Publishing). Reddi SJ, Hefny A, Sra S, Poczos B, Smola A (2016) Stochastic var iance reduction for nonconvex optimization. International conference on machine learning , 314–323 (PMLR). Rockafellar RT, Uryasev S (2002) Conditional value-a...

work page 2018
[5]

three-point

a1 Appendix A.1. Convergence Analysis of CGO A.1.1. Auxiliary Lemmas To establish the convergence for CGO, we tap into the followi ng three well-studied results. Through- out the analysis, we need to use the following notation: let α t be deﬁned in Algorithm 2, deﬁne the sequence {Γt} as Γ t := { 1, if t = 1, (1 − α t)Γt− 1, if t> 1, t = 1, 2 · · ·. The ﬁ...

work page 2020
[6]

Let (x∗,y ∗) ∈ Rn × Rm + be the saddle point of the convex constrained problem ( 1.1)

a2 Lemma A.1.4. Let (x∗,y ∗) ∈ Rn × Rm + be the saddle point of the convex constrained problem ( 1.1). Let (γ ∗,z ∗) be the optimal dual solution of the root ﬁnding problem min x∈ X max (γ,z )∈ Z L(x, (γ,z )) :=γ[f (x) − f ∗] + ⟨h(x),z ⟩ (A.1.1) (i.e. problem (2.1) with l =f ∗). Denote [·]+ := max{0, ·}. Then ∀x ∈ X, f (x) − f ∗ is lower bounded such that...

work page 2013
[7]

(A.1.15) In this way, we show the conclusion in ( 2.23)

+ 18 √ T ¯V T + 1 ] ≤ 2(L ¯f +z⊤L¯h)D2 ¯X T + 1 + ¯MD ¯X√ T + 1 [ 18 ¯V + 7 6 ] , ∀w ∈ ( ¯X, ¯Z). (A.1.15) In this way, we show the conclusion in ( 2.23). □ a7 A.1.3. CGO for Structured Nonsmooth Functions In this section, we focus on problem ( 2.8) where ¯f (·) and ¯hi(·), i = 1, · · ·, ¯m are structured nons- mooth functions represented by the following...

work page 2005
[8]

(A.1.17) It can be shown that (see Nesterov (2005)), ¯fη0 and ¯hi,η i are diﬀerentiable with Lipschitz constants L ¯f,η := ∥B0∥2 ω 0+η0 andL¯hi,η := ∥Bi∥2 ω i+ηi

to approximate the possibly nonsmooth functions ¯f and ¯ hi by ¯fη0 and ¯hi,η i stated below: ¯fη0(x) := max y∈ Y0 {⟨B0x,y ⟩ − ˆf (y) − η0U0(y)}, (A.1.16) ¯hi,η i(x) := max y∈ Yi {⟨Bix,y ⟩ − ˆhi(y) − ηiUi(y)},i = 1, · · ·, ¯m. (A.1.17) It can be shown that (see Nesterov (2005)), ¯fη0 and ¯hi,η i are diﬀerentiable with Lipschitz constants L ¯f,η := ∥B0∥2 ω...

work page 2005
[9]

In this way, ˜f ≤ f ∗ and ˜κ ≤ κ

We can choose ˜κ = − U1 l1− ˜f , where ˜f := min{f (x0) +⟨∇ f (x0),x − x0⟩ :x ∈ X}. In this way, ˜f ≤ f ∗ and ˜κ ≤ κ. Similar to ( Lin et al. 2018 ), the outer loop iteration complexity relies on κ and we will demonstrate later how the terminal condition Lk ≥ − ǫ˜κ implies the convergence of the outer loop. The MLCG method is summari zed in Algorithm

work page 2018
[10]

Coincidentally, inf u∈ R {u +α − 1E[X − u]+} is the Conditional-Value-at-Risk ( CVaR) measure that is convex

Note that inf t>0 E[φ(tX)] − α = inf u∈ R {u +α − 1E[X − u]+} and the minimum of the right-hand-side of the above inequali ty falls in [ u, ¯u], where where u and ¯u are the respective left and right side 1 − α quantile of the distribution of X. Coincidentally, inf u∈ R {u +α − 1E[X − u]+} is the Conditional-Value-at-Risk ( CVaR) measure that is convex. L...

work page 2021
[11]

01] 3 180 262144 2000 0.25 [40 , 50, 100] [0 . 01,

work page 2000
[12]

05] 4 180 262144 2000 0.25 [50 , 60, 80] [0 . 01,

work page 2000

[1] [1]

International conference on machine learning , 699–707 (PMLR)

Allen-Zhu Z, Hazan E (2016) Variance reduction for faster non-co nvex optimization. International conference on machine learning , 699–707 (PMLR). Aravkin AY, Burke JV, Drusvyatskiy D, Friedlander MP, Roy S (2019 ) Level-set methods for convex opti- mization. Mathematical Programming 174(1):359–390. Aravkin AY, Burke JV, Friedlander MP (2013) Variational ...

work page 2016

[2] [2]

Bertsekas DP (1997) Nonlinear programming

28 Bertsekas D (2015) Convex optimization algorithms (Athena Scientiﬁc). Bertsekas DP (1997) Nonlinear programming. Journal of the Operational Research Society 48(3):334–334. Bertsimas D, Shioda R (2009) Algorithm for cardinality-constrained quadratic optimization. Computational Optimization and Applications 43(1):1–22. Blumensath T, Davies ME (2008) Iter...

work page arXiv 2015

[3] [3]

McNeil AJ, Frey R, Embrechts P (2015) Quantitative risk management: concepts, techniques and to ols-revised edition (Princeton university press). Nemirovski A (2004) Prox-method with rate of convergence o (1/ t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave sa ddle point problems. SIAM Journal on Optim...

work page 2015

[4] [4]

Springer Optimization and Its Applications (Springer International Publishing)

Nesterov Y (2018) Lectures on Convex Optimization . Springer Optimization and Its Applications (Springer International Publishing). Reddi SJ, Hefny A, Sra S, Poczos B, Smola A (2016) Stochastic var iance reduction for nonconvex optimization. International conference on machine learning , 314–323 (PMLR). Rockafellar RT, Uryasev S (2002) Conditional value-a...

work page 2018

[5] [5]

three-point

a1 Appendix A.1. Convergence Analysis of CGO A.1.1. Auxiliary Lemmas To establish the convergence for CGO, we tap into the followi ng three well-studied results. Through- out the analysis, we need to use the following notation: let α t be deﬁned in Algorithm 2, deﬁne the sequence {Γt} as Γ t := { 1, if t = 1, (1 − α t)Γt− 1, if t> 1, t = 1, 2 · · ·. The ﬁ...

work page 2020

[6] [6]

Let (x∗,y ∗) ∈ Rn × Rm + be the saddle point of the convex constrained problem ( 1.1)

a2 Lemma A.1.4. Let (x∗,y ∗) ∈ Rn × Rm + be the saddle point of the convex constrained problem ( 1.1). Let (γ ∗,z ∗) be the optimal dual solution of the root ﬁnding problem min x∈ X max (γ,z )∈ Z L(x, (γ,z )) :=γ[f (x) − f ∗] + ⟨h(x),z ⟩ (A.1.1) (i.e. problem (2.1) with l =f ∗). Denote [·]+ := max{0, ·}. Then ∀x ∈ X, f (x) − f ∗ is lower bounded such that...

work page 2013

[7] [7]

(A.1.15) In this way, we show the conclusion in ( 2.23)

+ 18 √ T ¯V T + 1 ] ≤ 2(L ¯f +z⊤L¯h)D2 ¯X T + 1 + ¯MD ¯X√ T + 1 [ 18 ¯V + 7 6 ] , ∀w ∈ ( ¯X, ¯Z). (A.1.15) In this way, we show the conclusion in ( 2.23). □ a7 A.1.3. CGO for Structured Nonsmooth Functions In this section, we focus on problem ( 2.8) where ¯f (·) and ¯hi(·), i = 1, · · ·, ¯m are structured nons- mooth functions represented by the following...

work page 2005

[8] [8]

(A.1.17) It can be shown that (see Nesterov (2005)), ¯fη0 and ¯hi,η i are diﬀerentiable with Lipschitz constants L ¯f,η := ∥B0∥2 ω 0+η0 andL¯hi,η := ∥Bi∥2 ω i+ηi

to approximate the possibly nonsmooth functions ¯f and ¯ hi by ¯fη0 and ¯hi,η i stated below: ¯fη0(x) := max y∈ Y0 {⟨B0x,y ⟩ − ˆf (y) − η0U0(y)}, (A.1.16) ¯hi,η i(x) := max y∈ Yi {⟨Bix,y ⟩ − ˆhi(y) − ηiUi(y)},i = 1, · · ·, ¯m. (A.1.17) It can be shown that (see Nesterov (2005)), ¯fη0 and ¯hi,η i are diﬀerentiable with Lipschitz constants L ¯f,η := ∥B0∥2 ω...

work page 2005

[9] [9]

In this way, ˜f ≤ f ∗ and ˜κ ≤ κ

We can choose ˜κ = − U1 l1− ˜f , where ˜f := min{f (x0) +⟨∇ f (x0),x − x0⟩ :x ∈ X}. In this way, ˜f ≤ f ∗ and ˜κ ≤ κ. Similar to ( Lin et al. 2018 ), the outer loop iteration complexity relies on κ and we will demonstrate later how the terminal condition Lk ≥ − ǫ˜κ implies the convergence of the outer loop. The MLCG method is summari zed in Algorithm

work page 2018

[10] [10]

Coincidentally, inf u∈ R {u +α − 1E[X − u]+} is the Conditional-Value-at-Risk ( CVaR) measure that is convex

Note that inf t>0 E[φ(tX)] − α = inf u∈ R {u +α − 1E[X − u]+} and the minimum of the right-hand-side of the above inequali ty falls in [ u, ¯u], where where u and ¯u are the respective left and right side 1 − α quantile of the distribution of X. Coincidentally, inf u∈ R {u +α − 1E[X − u]+} is the Conditional-Value-at-Risk ( CVaR) measure that is convex. L...

work page 2021

[11] [11]

01] 3 180 262144 2000 0.25 [40 , 50, 100] [0 . 01,

work page 2000

[12] [12]

05] 4 180 262144 2000 0.25 [50 , 60, 80] [0 . 01,

work page 2000