Single-loop approaches to nonsmooth bilevel optimisation

Ensio Suonper\"a; Tuomo Valkonen

arxiv: 2606.19143 · v1 · pith:R5XJFEPTnew · submitted 2026-06-17 · 🧮 math.OC

Single-loop approaches to nonsmooth bilevel optimisation

Ensio Suonper\"a , Tuomo Valkonen This is my paper

Pith reviewed 2026-06-26 19:51 UTC · model grok-4.3

classification 🧮 math.OC

keywords bilevel optimizationnonsmooth optimizationset-valued constraintscoderivativessingle-loop algorithmprimal-dual methodsinverse problemsadjoint inclusions

0 comments

The pith

A single-loop algorithm solves nonsmooth bilevel optimization problems with set-valued inner constraints and is proven to converge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops optimistic and pessimistic calculus rules for bilevel problems where the inner problem is a set-valued parametric constraint. It derives optimality conditions and nonsmooth adjoint inclusions using Fréchet and limiting coderivatives. Based on these, it proposes a single-loop algorithm compatible with various inner and adjoint steps, including primal-dual methods, and proves convergence. Numerical tests on total variation regularized inverse problems show practicality. This approach addresses the challenge of solving nested optimization problems without relying on double-loop iterations.

Core claim

We study bilevel optimisation problems in which the inner problem is represented as a set-valued, parametric constraint. We develop relevant optimistic and pessimistic calculus rules, derive corresponding optimality conditions, and formulate nonsmooth adjoint inclusions based on both the Fréchet and limiting coderivatives. Founded on these results, we propose a single-loop algorithm that accommodates a wide range of inner and adjoint steps, including those of primal-dual methods. We prove its convergence.

What carries the argument

The single-loop algorithm built on optimistic and pessimistic calculus rules and Fréchet/limiting coderivative adjoint inclusions for set-valued parametric constraints.

If this is right

The algorithm accommodates a wide range of inner and adjoint steps including primal-dual methods.
Convergence of the algorithm is proven under the stated conditions.
The method applies to total variation regularised inverse problems.
Optimality conditions are derived for both optimistic and pessimistic cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single-loop structure may reduce the computational burden compared to alternating inner-outer iterations.
This framework could apply to other nested optimization problems in imaging and machine learning.
Extensions might involve different nonsmooth regularizers beyond total variation.

Load-bearing premise

The inner problem can be represented as a set-valued parametric constraint for which the optimistic and pessimistic calculus rules and Fréchet/limiting coderivative adjoint inclusions hold.

What would settle it

Running the algorithm on a bilevel problem with set-valued inner constraint and observing that it does not converge or produces solutions that violate the derived optimality conditions.

Figures

Figures reproduced from arXiv: 2606.19143 by Ensio Suonper\"a, Tuomo Valkonen.

**Figure 4.** Figure 4 [PITH_FULL_IMAGE:figures/full_fig_p027_4.png] view at source ↗

**Figure 6.** Figure 6: b [PITH_FULL_IMAGE:figures/full_fig_p042_6.png] view at source ↗

**Figure 6.** Figure 6: a [PITH_FULL_IMAGE:figures/full_fig_p043_6.png] view at source ↗

**Figure 6.** Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p044_6.png] view at source ↗

**Figure 6.** Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p046_6.png] view at source ↗

**Figure 6.** Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p047_6.png] view at source ↗

read the original abstract

We study bilevel optimisation problems in which the inner problem is represented as a set-valued, parametric constraint. We develop relevant optimistic and pessimistic calculus rules, derive corresponding optimality conditions, and formulate nonsmooth adjoint inclusions based on both the Fr\'echet and limiting coderivatives. Founded on these results, we propose a single-loop algorithm that accommodates a wide range of inner and adjoint steps, including those of primal-dual methods. We prove its convergence. Numerical experiments on total variation regularised inverse problems demonstrate the practicality of the approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper works out optimistic/pessimistic calculus rules for set-valued bilevel constraints, derives adjoint inclusions, and gives a convergent single-loop algorithm compatible with primal-dual steps.

read the letter

The main point is that they develop calculus rules for optimistic and pessimistic set-valued parametric constraints, turn those into Fréchet and limiting coderivative adjoint inclusions, and then build a single-loop algorithm around the resulting optimality conditions. The algorithm is designed to accept a range of inner and adjoint steps, including primal-dual ones, and they prove convergence. Numerical tests on total variation inverse problems are included to show it runs.

What is actually new is the specific combination of those calculus rules with a flexible single-loop scheme for the nonsmooth set-valued case. Most existing bilevel work either assumes smoother inner problems or uses double-loop methods, so this fills a narrower but practical gap.

The paper does a reasonable job staying within the standard variational-analysis pipeline and targeting applications like hyperparameter tuning and imaging. The flexibility claim for the algorithm is the part that could be useful if the proofs hold.

The soft spots are the usual ones for this style of work. The whole construction rests on the inner problem admitting a set-valued representation where the optimistic/pessimistic rules apply without extra qualification conditions that turn out to be hard to check. The convergence result is stated but the abstract gives no indication of how restrictive the required assumptions are in practice. The numerical section is mentioned without enough detail to judge whether the method is competitive or just feasible.

This is for readers already comfortable with coderivatives and bilevel formulations who need single-loop methods for nonsmooth problems. Someone working on inverse problems or meta-learning might find the algorithmic template worth trying.

It deserves peer review. The claims are concrete, the approach follows established methods without obvious internal contradictions, and the topic is relevant enough that referees can evaluate the details.

Referee Report

0 major / 3 minor

Summary. The manuscript studies bilevel optimization problems in which the inner problem is represented as a set-valued parametric constraint. It develops optimistic and pessimistic calculus rules, derives corresponding optimality conditions, and formulates nonsmooth adjoint inclusions based on both the Fréchet and limiting coderivatives. On this foundation the authors propose a single-loop algorithm accommodating a range of inner and adjoint steps (including primal-dual methods), prove its convergence, and illustrate the approach with numerical experiments on total-variation-regularized inverse problems.

Significance. If the derivations and convergence result hold, the paper supplies a unified variational-analysis pipeline for single-loop nonsmooth bilevel problems with set-valued constraints, explicitly accommodating primal-dual inner solvers. This extends existing coderivative-based optimality conditions and offers a practical algorithmic template for applications such as inverse problems, with the numerical demonstration providing concrete evidence of implementability.

minor comments (3)

[§1] §1 (Introduction): the statement that the inner problem is 'represented as a set-valued parametric constraint' would benefit from an explicit reference to the precise regularity assumptions (e.g., closed-graph or Aubin continuity) that are later used to justify the optimistic/pessimistic calculus rules.
[Numerical experiments] The numerical section would be strengthened by reporting iteration counts or CPU times alongside the reconstruction quality metrics, so that the claimed practicality of the single-loop scheme can be directly compared with existing double-loop baselines.
[Adjoint inclusions] Notation: the distinction between the optimistic and pessimistic value functions is introduced early but the corresponding adjoint inclusions are presented without a side-by-side comparison table; adding such a table would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our manuscript on single-loop approaches to nonsmooth bilevel optimisation. The recommendation for minor revision is noted. However, the report lists no specific major comments, so we have no point-by-point responses to provide. We remain available to address any additional points the editor or referee may raise.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper states an assumption on the inner problem as a set-valued parametric constraint, develops optimistic/pessimistic calculus rules and Fréchet/limiting coderivative inclusions from first principles in variational analysis, then uses those inclusions to construct and prove convergence of a single-loop algorithm accommodating primal-dual steps. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness or ansatz is imported via self-citation. The convergence result rests on the derived adjoint inclusions rather than on any input quantity defined by the same construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; insufficient detail to populate the ledger.

pith-pipeline@v0.9.1-grok · 5606 in / 1113 out tokens · 20322 ms · 2026-06-26T19:51:05.325416+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 25 canonical work pages · 2 internal anchors

[1]

F. J. A. Artacho and M. H. Geoffroy, Characterization of metric regularity of subdifferentials, Journal of Convex Analysis15 (2008), 365–380

2008
[2]

Benko, On inner calmness*, generalized calculus, and derivatives of the normal cone mapping, Journal of Nonsmooth Analysis and Optimization2 (2021)

M. Benko, On inner calmness*, generalized calculus, and derivatives of the normal cone mapping, Journal of Nonsmooth Analysis and Optimization2 (2021)

2021
[3]

Benko, H

M. Benko, H. Gfrerer, and J. V. Outrata, Calculus for directional limiting normal cones and subd- ifferentials,Set-Valued and Variational Analysis27 (2019), 713–745

2019
[4]

Bertrand, Q

Q. Bertrand, Q. Klopfenstein, M. Massias, M. Blondel, S. Vaiter, A. Gramfort, and J. Salmon, Im- plicit Differentiation for Fast Hyperparameter Selection in Non-Smooth Convex Learning,Journal of Machine Learning Research23 (2022), 1–43,http://jmlr.org/papers/v23/21-0486.html

2022
[5]

Bogensperger, M

L. Bogensperger, M. J. Ehrhardt, T. Pock, M. S. Salehi, and H. S. Wong, An Adaptively Inexact Method for Bilevel Learning Using Primal–Dual-Style Differentiation,Journal of Mathematical Imaging and Vision67 (2025),doi:10.1007/s10851-025-01262-w

work page doi:10.1007/s10851-025-01262-w 2025
[6]

Bolte, T

J. Bolte, T. Le, E. Pauwels, and T. Silveti-Falls, Nonsmooth Implicit Differentiation for Machine-Learning and Optimization, inAdvances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. W. Vaughan (eds.), volume 34, Curran Associates, Inc., 2021, 13537–13549,https://proceedings.neurips.cc/paper_files/pap...

2021
[7]

Bolte, E

J. Bolte, E. Pauwels, and A. Silveti-Falls, Differentiating Nonsmooth Solutions to Parametric Mono- tone Inclusion Problems,SIAM Journal on Optimization34 (2024), 71–97,doi:10.1137/22m1541630. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 46 of 56 0 5 10 15 100 200 300 ˜𝑥 𝑥 𝐽◦𝑆 𝑢 (𝑥) (a)funct...

work page doi:10.1137/22m1541630 2024
[8]

J. F. Bonnans and A. Shapiro,Perturbation Analysis of Optimization Problems, Springer New York, 2000,doi:10.1007/978-1-4612-1394-9

work page doi:10.1007/978-1-4612-1394-9 2000
[9]

Journal of Mathematical Imaging and Vision , year =

A. Chambolle, An algorithm for total variation minimization and applications,Journal of Mathe- matical Imaging and Vision20 (2004), 89–97,doi:10.1023/b:jmiv.0000011325.36760.1e

work page doi:10.1023/b:jmiv.0000011325.36760.1e 2004
[10]

Chambolle and T

A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with appli- cations to imaging,Journal of Mathematical Imaging and Vision40 (2011), 120–145,doi:10.1007/ s10851-010-0251-1

2011
[11]

T. Chen, Y. Sun, and W. Yin, A single-timescale stochastic bilevel optimization method, 2021, arXiv:2102.04671

work page arXiv 2021
[12]

A non-smooth trust-region method for locally Lipschitz functions with application to optimization problems constrained by variational inequalities

C. Christof, J. C. De Los Reyes, and C. Meyer, A non-smooth trust-region method for locally Lips- chitz functions with application to optimization problems constrained by variational inequalities, SIAM Journal on Optimization30 (2020), 2163–2196,doi:10.1137/18m1164925,arXiv:1711.03208

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1137/18m1164925 2020
[13]

Clason and T

C. Clason and T. Valkonen,Introduction to Nonsmooth Analysis and Optimization, MOS-SIAM Series on Optimization, SIAM, 2026,doi:10.1137/1.9781611978995

work page doi:10.1137/1.9781611978995 2026
[14]

Dagréou, P

M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau, A framework for bilevel optimization that enables stochastic and global variance reduction algorithms, 2022,arXiv:2201.13409

work page arXiv 2022
[15]

J. C. De Los Reyes,Numerical PDE-Constrained Optimization, SpringerBriefs in Optimization, Springer International Publishing, 2015,doi:10.1007/978-3-319-13395-9

work page doi:10.1007/978-3-319-13395-9 2015
[16]

computational resources

J. C. De Los Reyes, C. B. Schönlieb, and T. Valkonen, Bilevel parameter learning for higher-order total variation regularisation models,Journal of Mathematical Imaging and Vision57 (2017), 1–25, doi:10.1007/s10851-016-0662-8,arXiv:1508.07243. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 47 o...

work page doi:10.1007/s10851-016-0662-8 2017
[17]

Dempe and P

S. Dempe and P. Mehlitz, Duality-Based Single-Level Reformulations of Bilevel Optimization Prob- lems,Journal of Optimization Theory and Applications205 (2025),doi:10.1007/s10957-025-02627-2

work page doi:10.1007/s10957-025-02627-2 2025
[18]

Dizon and T

N. Dizon and T. Valkonen, Differential estimates for fast first-order multilevel nonconvex optimi- sation, 2024,arXiv:2412.01481v3. submitted

work page arXiv 2024
[19]

A. L. Dontchev and R. T. Rockafellar,Implicit Functions and Solution Mappings: A View from Variational Analysis, Springer Series in Operations Research and Financial Engineering, Springer, 2014,doi:10.1007/978-1-4939-1037-3

work page doi:10.1007/978-1-4939-1037-3 2014
[20]

M. J. Ehrhardt and L. Roberts, Inexact derivative-free optimization for bilevel learning,Journal of mathematical imaging and vision63 (2021), 580–600,doi:10.1007/s10851-021-01020-8

work page doi:10.1007/s10851-021-01020-8 2021
[21]

M. C. Ferris, O. L. Mangasarian, and J. S. Pang,Complementarity: Applications, Algorithms and Extensions, Springer US, 2001,doi:10.1007/978-1-4757-3279-5

work page doi:10.1007/978-1-4757-3279-5 2001
[22]

Franzen, Kodak lossless true color image suite, PhotoCD PCD0992

R. Franzen, Kodak lossless true color image suite, PhotoCD PCD0992. Lossless, true color images released by the Eastman Kodak Company, 1999,http://r0k.us/graphics/kodak/

1999
[23]

Ghadimi and M

S. Ghadimi and M. Wang, Approximation methods for bilevel programming, 2018,arXiv:1802. 02246. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 48 of 56

2018
[24]

N. T. V. Hang, W. Jung, and E. Sarabi, Role of Subgradients in Variational Analysis of Polyhedral Functions,Journal of Optimization Theory and Applications200 (2024), 1160–1192,doi:10.1007/ s10957-024-02378-6

2024
[25]

N. T. V. Hang and M. E. Sarabi, A Chain Rule for Strict Twice Epi-Differentiability and Its Appli- cations,SIAM Journal on Optimization34 (2024), 918–945,doi:10.1137/22m1520025

work page doi:10.1137/22m1520025 2024
[26]

He and X

B. He and X. Yuan, Convergence Analysis of Primal-Dual Algorithms for a Saddle-Point Problem: From Contraction Perspective,SIAM Journal on Imaging Sciences5 (2012), 119–149,doi:10.1137/ 100814494

2012
[27]

Hinze, R

M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich,Optimization with PDE Constraints, number 23 in Mathematical Modelling: Theory and Applications, Springer Netherlands, 2009,doi:10.1007/ 978-1-4020-8839-1

2009
[28]

M. Hong, H. T. Wai, Z. Wang, and Z. Yang, A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic, 2020,arXiv:2007.05170

work page arXiv 2020
[29]

A. D. Ioffe,Variational Analysis of Regular Mappings: Theory and Applications, Springer Mono- graphs in Mathematics, Springer, 2017,doi:10.1007/978-3-319-64277-2

work page doi:10.1007/978-3-319-64277-2 2017
[30]

Dynamic inverse problems: Single-loop online algorithms

J. Jauhiainen, Y. Nabou, and T. Valkonen, Dynamic inverse problems: Single-loop online algo- rithms, 2026,arXiv:2606.10951. submitted

work page internal anchor Pith review Pith/arXiv arXiv 2026
[31]

Ji and Y

K. Ji and Y. Liang, Lower bounds and accelerated algorithms for bilevel optimization,Journal of Machine Learning Research23 (2022), 1–56

2022
[32]

K. Ji, J. Yang, and Y. Liang, Bilevel optimization: Convergence analysis and enhanced design, in International Conference on Machine Learning, 2021, 4882–4892

2021
[33]

L. O. Jolaoso, P. Mehlitz, and A. B. Zemkoho, A fresh look at nonsmooth Levenberg–Marquardt methods with applications to bilevel optimization,Optimization74 (2024), 2745–2792,doi:10.1080/ 02331934.2024.2313688

work page arXiv 2024
[34]

J. Kwon, D. Kwon, S. Wright, and R. D. Nowak, A fully first-order method for stochastic bilevel optimization, inInternational Conference on Machine Learning, 2023, 18083–18113

2023
[35]

J. Li, B. Gu, and H. Huang, A fully single loop algorithm for bilevel optimization without hessian inverse, inProceedings of the AAAI Conference on Artificial Intelligence, volume 36, 2022, 7426–7434

2022
[36]

Z. Q. Luo, J. S. Pang, and D. Ralph,Mathematical Programs with Equilibrium Constraints, Cam- bridge University Press, 1996,doi:10.1017/cbo9780511983658

work page doi:10.1017/cbo9780511983658 1996
[37]

B. S. Mordukhovich,Variational Analysis and Generalized Differentiation I: Basic Theory, vol- ume 330 of Grundlehren der mathematischen Wissenschaften, Springer, 2006,doi:10.1007/ 3-540-31247-1

2006
[38]

B. S. Mordukhovich, N. M. Nam, and N. D. Yen, Fréchet subdifferential calculus and optimal- ity conditions in nondifferentiable programming,Optimization55 (2006), 685–708,doi:10.1080/ 02331930600816395

2006
[39]

J. V. Outrata, On the numerical solution of a class of Stackelberg problems,ZOR Zeitschrift für Operations Research Methods and Models of Operations Research34 (1990), 255–277,doi:10.1007/ bf01416737. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 49 of 56

1990
[40]

Qi and J

L. Qi and J. Sun, A trust region algorithm for minimization of locally Lipschitzian functions, Mathematical Programming66 (1994), 25–43,doi:10.1007/bf01581136

work page doi:10.1007/bf01581136 1994
[41]

R. T. Rockafellar and R. J. B. Wets,Variational Analysis, Springer, 1998,doi:10.1007/ 978-3-642-02431-3

1998
[42]

M. S. Salehi, S. Mukherjee, L. Roberts, , and M. J. Ehrhardt, An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning, 2024,arXiv:2308. 10098

2024
[43]

Suonperä and T

E. Suonperä and T. Valkonen, Linearly convergent bilevel optimization with single-step inner methods,Computational Optimization and Applications(2023),doi:10.1007/s10589-023-00527-7, arXiv:2205.04862

work page doi:10.1007/s10589-023-00527-7 2023
[44]

Suonperä and T

E. Suonperä and T. Valkonen, Single-loop methods for bilevel parameter learning in inverse imaging,Journal of Nonsmooth Analysis and Optimization5 (2025),doi:10.46298/jnsao-2025-15577, arXiv:2408.08123

work page doi:10.46298/jnsao-2025-15577 2025
[45]

Valkonen, Preconditioned proximal point methods and notions of partial subregularity,Journal of Convex Analysis28 (2021), 251–278,arXiv:1711.05123

T. Valkonen, Preconditioned proximal point methods and notions of partial subregularity,Journal of Convex Analysis28 (2021), 251–278,arXiv:1711.05123

work page arXiv 2021
[46]

Vilches, D

E. Vilches, D. Villacís, and P. Pérez-Aros, A Variational Analysis Approach for Bilevel Hyperpa- rameter Optimization with Sparse Regularization, 2025,https://optimization-online.org/?p=30877

2025
[47]

Wachsmuth and D

G. Wachsmuth and D. Walter, Proximal gradient methods in Banach spaces, 2025,arXiv:2509. 24685

2025
[48]

a priori

J. Yang, K. Ji, and Y. Liang, Provably faster algorithms for bilevel optimization,Advances in Neural Information Processing Systems34 (2021), 13670–13682. appendix a contractivity under metric subregularity With the goal of verifying the inner tracking property for standard inner algorithms, subject to metric subregularity, we prove here the linear conver...

2021
[49]

Otherwise: the point (𝑧, 𝑦∗) does not belong to the closed Graph𝜕𝛿 𝐵ℝ2 (0,𝐶) and thus (b.1) cannot be satisfied. The limiting coderivative equals the Fréchet coderivative with the following exceptions: • when ∥𝑧∥=𝐶 and ⟨𝑦, 𝑧⟩=0 , we have b𝐷 ∗𝜕𝛿𝐵ℝ2 (0,𝐶) (𝑧|0) (𝑦)=𝑧[0,∞) but taking 𝑦𝑘 ∗ ∈𝑧(0,∞) such that 𝑦𝑘 ∗ →0 we obtain 𝐷 ∗𝜕𝛿𝐵ℝ2 (0,𝐶) (𝑧|0) (𝑦)=𝑧ℝ from t...

2026

[1] [1]

F. J. A. Artacho and M. H. Geoffroy, Characterization of metric regularity of subdifferentials, Journal of Convex Analysis15 (2008), 365–380

2008

[2] [2]

Benko, On inner calmness*, generalized calculus, and derivatives of the normal cone mapping, Journal of Nonsmooth Analysis and Optimization2 (2021)

M. Benko, On inner calmness*, generalized calculus, and derivatives of the normal cone mapping, Journal of Nonsmooth Analysis and Optimization2 (2021)

2021

[3] [3]

Benko, H

M. Benko, H. Gfrerer, and J. V. Outrata, Calculus for directional limiting normal cones and subd- ifferentials,Set-Valued and Variational Analysis27 (2019), 713–745

2019

[4] [4]

Bertrand, Q

Q. Bertrand, Q. Klopfenstein, M. Massias, M. Blondel, S. Vaiter, A. Gramfort, and J. Salmon, Im- plicit Differentiation for Fast Hyperparameter Selection in Non-Smooth Convex Learning,Journal of Machine Learning Research23 (2022), 1–43,http://jmlr.org/papers/v23/21-0486.html

2022

[5] [5]

Bogensperger, M

L. Bogensperger, M. J. Ehrhardt, T. Pock, M. S. Salehi, and H. S. Wong, An Adaptively Inexact Method for Bilevel Learning Using Primal–Dual-Style Differentiation,Journal of Mathematical Imaging and Vision67 (2025),doi:10.1007/s10851-025-01262-w

work page doi:10.1007/s10851-025-01262-w 2025

[6] [6]

Bolte, T

J. Bolte, T. Le, E. Pauwels, and T. Silveti-Falls, Nonsmooth Implicit Differentiation for Machine-Learning and Optimization, inAdvances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. W. Vaughan (eds.), volume 34, Curran Associates, Inc., 2021, 13537–13549,https://proceedings.neurips.cc/paper_files/pap...

2021

[7] [7]

Bolte, E

J. Bolte, E. Pauwels, and A. Silveti-Falls, Differentiating Nonsmooth Solutions to Parametric Mono- tone Inclusion Problems,SIAM Journal on Optimization34 (2024), 71–97,doi:10.1137/22m1541630. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 46 of 56 0 5 10 15 100 200 300 ˜𝑥 𝑥 𝐽◦𝑆 𝑢 (𝑥) (a)funct...

work page doi:10.1137/22m1541630 2024

[8] [8]

J. F. Bonnans and A. Shapiro,Perturbation Analysis of Optimization Problems, Springer New York, 2000,doi:10.1007/978-1-4612-1394-9

work page doi:10.1007/978-1-4612-1394-9 2000

[9] [9]

Journal of Mathematical Imaging and Vision , year =

A. Chambolle, An algorithm for total variation minimization and applications,Journal of Mathe- matical Imaging and Vision20 (2004), 89–97,doi:10.1023/b:jmiv.0000011325.36760.1e

work page doi:10.1023/b:jmiv.0000011325.36760.1e 2004

[10] [10]

Chambolle and T

A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with appli- cations to imaging,Journal of Mathematical Imaging and Vision40 (2011), 120–145,doi:10.1007/ s10851-010-0251-1

2011

[11] [11]

T. Chen, Y. Sun, and W. Yin, A single-timescale stochastic bilevel optimization method, 2021, arXiv:2102.04671

work page arXiv 2021

[12] [12]

A non-smooth trust-region method for locally Lipschitz functions with application to optimization problems constrained by variational inequalities

C. Christof, J. C. De Los Reyes, and C. Meyer, A non-smooth trust-region method for locally Lips- chitz functions with application to optimization problems constrained by variational inequalities, SIAM Journal on Optimization30 (2020), 2163–2196,doi:10.1137/18m1164925,arXiv:1711.03208

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1137/18m1164925 2020

[13] [13]

Clason and T

C. Clason and T. Valkonen,Introduction to Nonsmooth Analysis and Optimization, MOS-SIAM Series on Optimization, SIAM, 2026,doi:10.1137/1.9781611978995

work page doi:10.1137/1.9781611978995 2026

[14] [14]

Dagréou, P

M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau, A framework for bilevel optimization that enables stochastic and global variance reduction algorithms, 2022,arXiv:2201.13409

work page arXiv 2022

[15] [15]

J. C. De Los Reyes,Numerical PDE-Constrained Optimization, SpringerBriefs in Optimization, Springer International Publishing, 2015,doi:10.1007/978-3-319-13395-9

work page doi:10.1007/978-3-319-13395-9 2015

[16] [16]

computational resources

J. C. De Los Reyes, C. B. Schönlieb, and T. Valkonen, Bilevel parameter learning for higher-order total variation regularisation models,Journal of Mathematical Imaging and Vision57 (2017), 1–25, doi:10.1007/s10851-016-0662-8,arXiv:1508.07243. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 47 o...

work page doi:10.1007/s10851-016-0662-8 2017

[17] [17]

Dempe and P

S. Dempe and P. Mehlitz, Duality-Based Single-Level Reformulations of Bilevel Optimization Prob- lems,Journal of Optimization Theory and Applications205 (2025),doi:10.1007/s10957-025-02627-2

work page doi:10.1007/s10957-025-02627-2 2025

[18] [18]

Dizon and T

N. Dizon and T. Valkonen, Differential estimates for fast first-order multilevel nonconvex optimi- sation, 2024,arXiv:2412.01481v3. submitted

work page arXiv 2024

[19] [19]

A. L. Dontchev and R. T. Rockafellar,Implicit Functions and Solution Mappings: A View from Variational Analysis, Springer Series in Operations Research and Financial Engineering, Springer, 2014,doi:10.1007/978-1-4939-1037-3

work page doi:10.1007/978-1-4939-1037-3 2014

[20] [20]

M. J. Ehrhardt and L. Roberts, Inexact derivative-free optimization for bilevel learning,Journal of mathematical imaging and vision63 (2021), 580–600,doi:10.1007/s10851-021-01020-8

work page doi:10.1007/s10851-021-01020-8 2021

[21] [21]

M. C. Ferris, O. L. Mangasarian, and J. S. Pang,Complementarity: Applications, Algorithms and Extensions, Springer US, 2001,doi:10.1007/978-1-4757-3279-5

work page doi:10.1007/978-1-4757-3279-5 2001

[22] [22]

Franzen, Kodak lossless true color image suite, PhotoCD PCD0992

R. Franzen, Kodak lossless true color image suite, PhotoCD PCD0992. Lossless, true color images released by the Eastman Kodak Company, 1999,http://r0k.us/graphics/kodak/

1999

[23] [23]

Ghadimi and M

S. Ghadimi and M. Wang, Approximation methods for bilevel programming, 2018,arXiv:1802. 02246. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 48 of 56

2018

[24] [24]

N. T. V. Hang, W. Jung, and E. Sarabi, Role of Subgradients in Variational Analysis of Polyhedral Functions,Journal of Optimization Theory and Applications200 (2024), 1160–1192,doi:10.1007/ s10957-024-02378-6

2024

[25] [25]

N. T. V. Hang and M. E. Sarabi, A Chain Rule for Strict Twice Epi-Differentiability and Its Appli- cations,SIAM Journal on Optimization34 (2024), 918–945,doi:10.1137/22m1520025

work page doi:10.1137/22m1520025 2024

[26] [26]

He and X

B. He and X. Yuan, Convergence Analysis of Primal-Dual Algorithms for a Saddle-Point Problem: From Contraction Perspective,SIAM Journal on Imaging Sciences5 (2012), 119–149,doi:10.1137/ 100814494

2012

[27] [27]

Hinze, R

M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich,Optimization with PDE Constraints, number 23 in Mathematical Modelling: Theory and Applications, Springer Netherlands, 2009,doi:10.1007/ 978-1-4020-8839-1

2009

[28] [28]

M. Hong, H. T. Wai, Z. Wang, and Z. Yang, A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic, 2020,arXiv:2007.05170

work page arXiv 2020

[29] [29]

A. D. Ioffe,Variational Analysis of Regular Mappings: Theory and Applications, Springer Mono- graphs in Mathematics, Springer, 2017,doi:10.1007/978-3-319-64277-2

work page doi:10.1007/978-3-319-64277-2 2017

[30] [30]

Dynamic inverse problems: Single-loop online algorithms

J. Jauhiainen, Y. Nabou, and T. Valkonen, Dynamic inverse problems: Single-loop online algo- rithms, 2026,arXiv:2606.10951. submitted

work page internal anchor Pith review Pith/arXiv arXiv 2026

[31] [31]

Ji and Y

K. Ji and Y. Liang, Lower bounds and accelerated algorithms for bilevel optimization,Journal of Machine Learning Research23 (2022), 1–56

2022

[32] [32]

K. Ji, J. Yang, and Y. Liang, Bilevel optimization: Convergence analysis and enhanced design, in International Conference on Machine Learning, 2021, 4882–4892

2021

[33] [33]

L. O. Jolaoso, P. Mehlitz, and A. B. Zemkoho, A fresh look at nonsmooth Levenberg–Marquardt methods with applications to bilevel optimization,Optimization74 (2024), 2745–2792,doi:10.1080/ 02331934.2024.2313688

work page arXiv 2024

[34] [34]

J. Kwon, D. Kwon, S. Wright, and R. D. Nowak, A fully first-order method for stochastic bilevel optimization, inInternational Conference on Machine Learning, 2023, 18083–18113

2023

[35] [35]

J. Li, B. Gu, and H. Huang, A fully single loop algorithm for bilevel optimization without hessian inverse, inProceedings of the AAAI Conference on Artificial Intelligence, volume 36, 2022, 7426–7434

2022

[36] [36]

Z. Q. Luo, J. S. Pang, and D. Ralph,Mathematical Programs with Equilibrium Constraints, Cam- bridge University Press, 1996,doi:10.1017/cbo9780511983658

work page doi:10.1017/cbo9780511983658 1996

[37] [37]

B. S. Mordukhovich,Variational Analysis and Generalized Differentiation I: Basic Theory, vol- ume 330 of Grundlehren der mathematischen Wissenschaften, Springer, 2006,doi:10.1007/ 3-540-31247-1

2006

[38] [38]

B. S. Mordukhovich, N. M. Nam, and N. D. Yen, Fréchet subdifferential calculus and optimal- ity conditions in nondifferentiable programming,Optimization55 (2006), 685–708,doi:10.1080/ 02331930600816395

2006

[39] [39]

J. V. Outrata, On the numerical solution of a class of Stackelberg problems,ZOR Zeitschrift für Operations Research Methods and Models of Operations Research34 (1990), 255–277,doi:10.1007/ bf01416737. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 49 of 56

1990

[40] [40]

Qi and J

L. Qi and J. Sun, A trust region algorithm for minimization of locally Lipschitzian functions, Mathematical Programming66 (1994), 25–43,doi:10.1007/bf01581136

work page doi:10.1007/bf01581136 1994

[41] [41]

R. T. Rockafellar and R. J. B. Wets,Variational Analysis, Springer, 1998,doi:10.1007/ 978-3-642-02431-3

1998

[42] [42]

M. S. Salehi, S. Mukherjee, L. Roberts, , and M. J. Ehrhardt, An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning, 2024,arXiv:2308. 10098

2024

[43] [43]

Suonperä and T

E. Suonperä and T. Valkonen, Linearly convergent bilevel optimization with single-step inner methods,Computational Optimization and Applications(2023),doi:10.1007/s10589-023-00527-7, arXiv:2205.04862

work page doi:10.1007/s10589-023-00527-7 2023

[44] [44]

Suonperä and T

E. Suonperä and T. Valkonen, Single-loop methods for bilevel parameter learning in inverse imaging,Journal of Nonsmooth Analysis and Optimization5 (2025),doi:10.46298/jnsao-2025-15577, arXiv:2408.08123

work page doi:10.46298/jnsao-2025-15577 2025

[45] [45]

Valkonen, Preconditioned proximal point methods and notions of partial subregularity,Journal of Convex Analysis28 (2021), 251–278,arXiv:1711.05123

T. Valkonen, Preconditioned proximal point methods and notions of partial subregularity,Journal of Convex Analysis28 (2021), 251–278,arXiv:1711.05123

work page arXiv 2021

[46] [46]

Vilches, D

E. Vilches, D. Villacís, and P. Pérez-Aros, A Variational Analysis Approach for Bilevel Hyperpa- rameter Optimization with Sparse Regularization, 2025,https://optimization-online.org/?p=30877

2025

[47] [47]

Wachsmuth and D

G. Wachsmuth and D. Walter, Proximal gradient methods in Banach spaces, 2025,arXiv:2509. 24685

2025

[48] [48]

a priori

J. Yang, K. Ji, and Y. Liang, Provably faster algorithms for bilevel optimization,Advances in Neural Information Processing Systems34 (2021), 13670–13682. appendix a contractivity under metric subregularity With the goal of verifying the inner tracking property for standard inner algorithms, subject to metric subregularity, we prove here the linear conver...

2021

[49] [49]

Otherwise: the point (𝑧, 𝑦∗) does not belong to the closed Graph𝜕𝛿 𝐵ℝ2 (0,𝐶) and thus (b.1) cannot be satisfied. The limiting coderivative equals the Fréchet coderivative with the following exceptions: • when ∥𝑧∥=𝐶 and ⟨𝑦, 𝑧⟩=0 , we have b𝐷 ∗𝜕𝛿𝐵ℝ2 (0,𝐶) (𝑧|0) (𝑦)=𝑧[0,∞) but taking 𝑦𝑘 ∗ ∈𝑧(0,∞) such that 𝑦𝑘 ∗ →0 we obtain 𝐷 ∗𝜕𝛿𝐵ℝ2 (0,𝐶) (𝑧|0) (𝑦)=𝑧ℝ from t...

2026