Single-loop approaches to nonsmooth bilevel optimisation
Pith reviewed 2026-06-26 19:51 UTC · model grok-4.3
The pith
A single-loop algorithm solves nonsmooth bilevel optimization problems with set-valued inner constraints and is proven to converge.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study bilevel optimisation problems in which the inner problem is represented as a set-valued, parametric constraint. We develop relevant optimistic and pessimistic calculus rules, derive corresponding optimality conditions, and formulate nonsmooth adjoint inclusions based on both the Fréchet and limiting coderivatives. Founded on these results, we propose a single-loop algorithm that accommodates a wide range of inner and adjoint steps, including those of primal-dual methods. We prove its convergence.
What carries the argument
The single-loop algorithm built on optimistic and pessimistic calculus rules and Fréchet/limiting coderivative adjoint inclusions for set-valued parametric constraints.
If this is right
- The algorithm accommodates a wide range of inner and adjoint steps including primal-dual methods.
- Convergence of the algorithm is proven under the stated conditions.
- The method applies to total variation regularised inverse problems.
- Optimality conditions are derived for both optimistic and pessimistic cases.
Where Pith is reading between the lines
- The single-loop structure may reduce the computational burden compared to alternating inner-outer iterations.
- This framework could apply to other nested optimization problems in imaging and machine learning.
- Extensions might involve different nonsmooth regularizers beyond total variation.
Load-bearing premise
The inner problem can be represented as a set-valued parametric constraint for which the optimistic and pessimistic calculus rules and Fréchet/limiting coderivative adjoint inclusions hold.
What would settle it
Running the algorithm on a bilevel problem with set-valued inner constraint and observing that it does not converge or produces solutions that violate the derived optimality conditions.
Figures
read the original abstract
We study bilevel optimisation problems in which the inner problem is represented as a set-valued, parametric constraint. We develop relevant optimistic and pessimistic calculus rules, derive corresponding optimality conditions, and formulate nonsmooth adjoint inclusions based on both the Fr\'echet and limiting coderivatives. Founded on these results, we propose a single-loop algorithm that accommodates a wide range of inner and adjoint steps, including those of primal-dual methods. We prove its convergence. Numerical experiments on total variation regularised inverse problems demonstrate the practicality of the approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies bilevel optimization problems in which the inner problem is represented as a set-valued parametric constraint. It develops optimistic and pessimistic calculus rules, derives corresponding optimality conditions, and formulates nonsmooth adjoint inclusions based on both the Fréchet and limiting coderivatives. On this foundation the authors propose a single-loop algorithm accommodating a range of inner and adjoint steps (including primal-dual methods), prove its convergence, and illustrate the approach with numerical experiments on total-variation-regularized inverse problems.
Significance. If the derivations and convergence result hold, the paper supplies a unified variational-analysis pipeline for single-loop nonsmooth bilevel problems with set-valued constraints, explicitly accommodating primal-dual inner solvers. This extends existing coderivative-based optimality conditions and offers a practical algorithmic template for applications such as inverse problems, with the numerical demonstration providing concrete evidence of implementability.
minor comments (3)
- [§1] §1 (Introduction): the statement that the inner problem is 'represented as a set-valued parametric constraint' would benefit from an explicit reference to the precise regularity assumptions (e.g., closed-graph or Aubin continuity) that are later used to justify the optimistic/pessimistic calculus rules.
- [Numerical experiments] The numerical section would be strengthened by reporting iteration counts or CPU times alongside the reconstruction quality metrics, so that the claimed practicality of the single-loop scheme can be directly compared with existing double-loop baselines.
- [Adjoint inclusions] Notation: the distinction between the optimistic and pessimistic value functions is introduced early but the corresponding adjoint inclusions are presented without a side-by-side comparison table; adding such a table would improve readability.
Simulated Author's Rebuttal
We thank the referee for the positive summary and significance assessment of our manuscript on single-loop approaches to nonsmooth bilevel optimisation. The recommendation for minor revision is noted. However, the report lists no specific major comments, so we have no point-by-point responses to provide. We remain available to address any additional points the editor or referee may raise.
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper states an assumption on the inner problem as a set-valued parametric constraint, develops optimistic/pessimistic calculus rules and Fréchet/limiting coderivative inclusions from first principles in variational analysis, then uses those inclusions to construct and prove convergence of a single-loop algorithm accommodating primal-dual steps. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness or ansatz is imported via self-citation. The convergence result rests on the derived adjoint inclusions rather than on any input quantity defined by the same construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
F. J. A. Artacho and M. H. Geoffroy, Characterization of metric regularity of subdifferentials, Journal of Convex Analysis15 (2008), 365–380
2008
-
[2]
Benko, On inner calmness*, generalized calculus, and derivatives of the normal cone mapping, Journal of Nonsmooth Analysis and Optimization2 (2021)
M. Benko, On inner calmness*, generalized calculus, and derivatives of the normal cone mapping, Journal of Nonsmooth Analysis and Optimization2 (2021)
2021
-
[3]
Benko, H
M. Benko, H. Gfrerer, and J. V. Outrata, Calculus for directional limiting normal cones and subd- ifferentials,Set-Valued and Variational Analysis27 (2019), 713–745
2019
-
[4]
Bertrand, Q
Q. Bertrand, Q. Klopfenstein, M. Massias, M. Blondel, S. Vaiter, A. Gramfort, and J. Salmon, Im- plicit Differentiation for Fast Hyperparameter Selection in Non-Smooth Convex Learning,Journal of Machine Learning Research23 (2022), 1–43,http://jmlr.org/papers/v23/21-0486.html
2022
-
[5]
L. Bogensperger, M. J. Ehrhardt, T. Pock, M. S. Salehi, and H. S. Wong, An Adaptively Inexact Method for Bilevel Learning Using Primal–Dual-Style Differentiation,Journal of Mathematical Imaging and Vision67 (2025),doi:10.1007/s10851-025-01262-w
-
[6]
Bolte, T
J. Bolte, T. Le, E. Pauwels, and T. Silveti-Falls, Nonsmooth Implicit Differentiation for Machine-Learning and Optimization, inAdvances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. W. Vaughan (eds.), volume 34, Curran Associates, Inc., 2021, 13537–13549,https://proceedings.neurips.cc/paper_files/pap...
2021
-
[7]
J. Bolte, E. Pauwels, and A. Silveti-Falls, Differentiating Nonsmooth Solutions to Parametric Mono- tone Inclusion Problems,SIAM Journal on Optimization34 (2024), 71–97,doi:10.1137/22m1541630. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 46 of 56 0 5 10 15 100 200 300 ˜𝑥 𝑥 𝐽◦𝑆 𝑢 (𝑥) (a)funct...
-
[8]
J. F. Bonnans and A. Shapiro,Perturbation Analysis of Optimization Problems, Springer New York, 2000,doi:10.1007/978-1-4612-1394-9
-
[9]
Journal of Mathematical Imaging and Vision , year =
A. Chambolle, An algorithm for total variation minimization and applications,Journal of Mathe- matical Imaging and Vision20 (2004), 89–97,doi:10.1023/b:jmiv.0000011325.36760.1e
-
[10]
Chambolle and T
A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with appli- cations to imaging,Journal of Mathematical Imaging and Vision40 (2011), 120–145,doi:10.1007/ s10851-010-0251-1
2011
- [11]
-
[12]
C. Christof, J. C. De Los Reyes, and C. Meyer, A non-smooth trust-region method for locally Lips- chitz functions with application to optimization problems constrained by variational inequalities, SIAM Journal on Optimization30 (2020), 2163–2196,doi:10.1137/18m1164925,arXiv:1711.03208
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1137/18m1164925 2020
-
[13]
C. Clason and T. Valkonen,Introduction to Nonsmooth Analysis and Optimization, MOS-SIAM Series on Optimization, SIAM, 2026,doi:10.1137/1.9781611978995
-
[14]
M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau, A framework for bilevel optimization that enables stochastic and global variance reduction algorithms, 2022,arXiv:2201.13409
-
[15]
J. C. De Los Reyes,Numerical PDE-Constrained Optimization, SpringerBriefs in Optimization, Springer International Publishing, 2015,doi:10.1007/978-3-319-13395-9
-
[16]
J. C. De Los Reyes, C. B. Schönlieb, and T. Valkonen, Bilevel parameter learning for higher-order total variation regularisation models,Journal of Mathematical Imaging and Vision57 (2017), 1–25, doi:10.1007/s10851-016-0662-8,arXiv:1508.07243. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 47 o...
-
[17]
S. Dempe and P. Mehlitz, Duality-Based Single-Level Reformulations of Bilevel Optimization Prob- lems,Journal of Optimization Theory and Applications205 (2025),doi:10.1007/s10957-025-02627-2
-
[18]
N. Dizon and T. Valkonen, Differential estimates for fast first-order multilevel nonconvex optimi- sation, 2024,arXiv:2412.01481v3. submitted
-
[19]
A. L. Dontchev and R. T. Rockafellar,Implicit Functions and Solution Mappings: A View from Variational Analysis, Springer Series in Operations Research and Financial Engineering, Springer, 2014,doi:10.1007/978-1-4939-1037-3
-
[20]
M. J. Ehrhardt and L. Roberts, Inexact derivative-free optimization for bilevel learning,Journal of mathematical imaging and vision63 (2021), 580–600,doi:10.1007/s10851-021-01020-8
-
[21]
M. C. Ferris, O. L. Mangasarian, and J. S. Pang,Complementarity: Applications, Algorithms and Extensions, Springer US, 2001,doi:10.1007/978-1-4757-3279-5
-
[22]
Franzen, Kodak lossless true color image suite, PhotoCD PCD0992
R. Franzen, Kodak lossless true color image suite, PhotoCD PCD0992. Lossless, true color images released by the Eastman Kodak Company, 1999,http://r0k.us/graphics/kodak/
1999
-
[23]
Ghadimi and M
S. Ghadimi and M. Wang, Approximation methods for bilevel programming, 2018,arXiv:1802. 02246. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 48 of 56
2018
-
[24]
N. T. V. Hang, W. Jung, and E. Sarabi, Role of Subgradients in Variational Analysis of Polyhedral Functions,Journal of Optimization Theory and Applications200 (2024), 1160–1192,doi:10.1007/ s10957-024-02378-6
2024
-
[25]
N. T. V. Hang and M. E. Sarabi, A Chain Rule for Strict Twice Epi-Differentiability and Its Appli- cations,SIAM Journal on Optimization34 (2024), 918–945,doi:10.1137/22m1520025
-
[26]
He and X
B. He and X. Yuan, Convergence Analysis of Primal-Dual Algorithms for a Saddle-Point Problem: From Contraction Perspective,SIAM Journal on Imaging Sciences5 (2012), 119–149,doi:10.1137/ 100814494
2012
-
[27]
Hinze, R
M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich,Optimization with PDE Constraints, number 23 in Mathematical Modelling: Theory and Applications, Springer Netherlands, 2009,doi:10.1007/ 978-1-4020-8839-1
2009
- [28]
-
[29]
A. D. Ioffe,Variational Analysis of Regular Mappings: Theory and Applications, Springer Mono- graphs in Mathematics, Springer, 2017,doi:10.1007/978-3-319-64277-2
-
[30]
Dynamic inverse problems: Single-loop online algorithms
J. Jauhiainen, Y. Nabou, and T. Valkonen, Dynamic inverse problems: Single-loop online algo- rithms, 2026,arXiv:2606.10951. submitted
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[31]
Ji and Y
K. Ji and Y. Liang, Lower bounds and accelerated algorithms for bilevel optimization,Journal of Machine Learning Research23 (2022), 1–56
2022
-
[32]
K. Ji, J. Yang, and Y. Liang, Bilevel optimization: Convergence analysis and enhanced design, in International Conference on Machine Learning, 2021, 4882–4892
2021
- [33]
-
[34]
J. Kwon, D. Kwon, S. Wright, and R. D. Nowak, A fully first-order method for stochastic bilevel optimization, inInternational Conference on Machine Learning, 2023, 18083–18113
2023
-
[35]
J. Li, B. Gu, and H. Huang, A fully single loop algorithm for bilevel optimization without hessian inverse, inProceedings of the AAAI Conference on Artificial Intelligence, volume 36, 2022, 7426–7434
2022
-
[36]
Z. Q. Luo, J. S. Pang, and D. Ralph,Mathematical Programs with Equilibrium Constraints, Cam- bridge University Press, 1996,doi:10.1017/cbo9780511983658
-
[37]
B. S. Mordukhovich,Variational Analysis and Generalized Differentiation I: Basic Theory, vol- ume 330 of Grundlehren der mathematischen Wissenschaften, Springer, 2006,doi:10.1007/ 3-540-31247-1
2006
-
[38]
B. S. Mordukhovich, N. M. Nam, and N. D. Yen, Fréchet subdifferential calculus and optimal- ity conditions in nondifferentiable programming,Optimization55 (2006), 685–708,doi:10.1080/ 02331930600816395
2006
-
[39]
J. V. Outrata, On the numerical solution of a class of Stackelberg problems,ZOR Zeitschrift für Operations Research Methods and Models of Operations Research34 (1990), 255–277,doi:10.1007/ bf01416737. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 49 of 56
1990
-
[40]
L. Qi and J. Sun, A trust region algorithm for minimization of locally Lipschitzian functions, Mathematical Programming66 (1994), 25–43,doi:10.1007/bf01581136
-
[41]
R. T. Rockafellar and R. J. B. Wets,Variational Analysis, Springer, 1998,doi:10.1007/ 978-3-642-02431-3
1998
-
[42]
M. S. Salehi, S. Mukherjee, L. Roberts, , and M. J. Ehrhardt, An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning, 2024,arXiv:2308. 10098
2024
-
[43]
E. Suonperä and T. Valkonen, Linearly convergent bilevel optimization with single-step inner methods,Computational Optimization and Applications(2023),doi:10.1007/s10589-023-00527-7, arXiv:2205.04862
-
[44]
E. Suonperä and T. Valkonen, Single-loop methods for bilevel parameter learning in inverse imaging,Journal of Nonsmooth Analysis and Optimization5 (2025),doi:10.46298/jnsao-2025-15577, arXiv:2408.08123
-
[45]
T. Valkonen, Preconditioned proximal point methods and notions of partial subregularity,Journal of Convex Analysis28 (2021), 251–278,arXiv:1711.05123
-
[46]
Vilches, D
E. Vilches, D. Villacís, and P. Pérez-Aros, A Variational Analysis Approach for Bilevel Hyperpa- rameter Optimization with Sparse Regularization, 2025,https://optimization-online.org/?p=30877
2025
-
[47]
Wachsmuth and D
G. Wachsmuth and D. Walter, Proximal gradient methods in Banach spaces, 2025,arXiv:2509. 24685
2025
-
[48]
a priori
J. Yang, K. Ji, and Y. Liang, Provably faster algorithms for bilevel optimization,Advances in Neural Information Processing Systems34 (2021), 13670–13682. appendix a contractivity under metric subregularity With the goal of verifying the inner tracking property for standard inner algorithms, subject to metric subregularity, we prove here the linear conver...
2021
-
[49]
Otherwise: the point (𝑧, 𝑦∗) does not belong to the closed Graph𝜕𝛿 𝐵ℝ2 (0,𝐶) and thus (b.1) cannot be satisfied. The limiting coderivative equals the Fréchet coderivative with the following exceptions: • when ∥𝑧∥=𝐶 and ⟨𝑦, 𝑧⟩=0 , we have b𝐷 ∗𝜕𝛿𝐵ℝ2 (0,𝐶) (𝑧|0) (𝑦)=𝑧[0,∞) but taking 𝑦𝑘 ∗ ∈𝑧(0,∞) such that 𝑦𝑘 ∗ →0 we obtain 𝐷 ∗𝜕𝛿𝐵ℝ2 (0,𝐶) (𝑧|0) (𝑦)=𝑧ℝ from t...
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.