pith. sign in

arxiv: 2606.19143 · v1 · pith:R5XJFEPTnew · submitted 2026-06-17 · 🧮 math.OC

Single-loop approaches to nonsmooth bilevel optimisation

Pith reviewed 2026-06-26 19:51 UTC · model grok-4.3

classification 🧮 math.OC
keywords bilevel optimizationnonsmooth optimizationset-valued constraintscoderivativessingle-loop algorithmprimal-dual methodsinverse problemsadjoint inclusions
0
0 comments X

The pith

A single-loop algorithm solves nonsmooth bilevel optimization problems with set-valued inner constraints and is proven to converge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops optimistic and pessimistic calculus rules for bilevel problems where the inner problem is a set-valued parametric constraint. It derives optimality conditions and nonsmooth adjoint inclusions using Fréchet and limiting coderivatives. Based on these, it proposes a single-loop algorithm compatible with various inner and adjoint steps, including primal-dual methods, and proves convergence. Numerical tests on total variation regularized inverse problems show practicality. This approach addresses the challenge of solving nested optimization problems without relying on double-loop iterations.

Core claim

We study bilevel optimisation problems in which the inner problem is represented as a set-valued, parametric constraint. We develop relevant optimistic and pessimistic calculus rules, derive corresponding optimality conditions, and formulate nonsmooth adjoint inclusions based on both the Fréchet and limiting coderivatives. Founded on these results, we propose a single-loop algorithm that accommodates a wide range of inner and adjoint steps, including those of primal-dual methods. We prove its convergence.

What carries the argument

The single-loop algorithm built on optimistic and pessimistic calculus rules and Fréchet/limiting coderivative adjoint inclusions for set-valued parametric constraints.

If this is right

  • The algorithm accommodates a wide range of inner and adjoint steps including primal-dual methods.
  • Convergence of the algorithm is proven under the stated conditions.
  • The method applies to total variation regularised inverse problems.
  • Optimality conditions are derived for both optimistic and pessimistic cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single-loop structure may reduce the computational burden compared to alternating inner-outer iterations.
  • This framework could apply to other nested optimization problems in imaging and machine learning.
  • Extensions might involve different nonsmooth regularizers beyond total variation.

Load-bearing premise

The inner problem can be represented as a set-valued parametric constraint for which the optimistic and pessimistic calculus rules and Fréchet/limiting coderivative adjoint inclusions hold.

What would settle it

Running the algorithm on a bilevel problem with set-valued inner constraint and observing that it does not converge or produces solutions that violate the derived optimality conditions.

Figures

Figures reproduced from arXiv: 2606.19143 by Ensio Suonper\"a, Tuomo Valkonen.

Figure 4
Figure 4. Figure 4 [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗
Figure 4
Figure 4. Figure 4 [PITH_FULL_IMAGE:figures/full_fig_p027_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: b [PITH_FULL_IMAGE:figures/full_fig_p042_6.png] view at source ↗
Figure 6
Figure 6. Figure 6: a [PITH_FULL_IMAGE:figures/full_fig_p043_6.png] view at source ↗
Figure 6
Figure 6. Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p044_6.png] view at source ↗
Figure 6
Figure 6. Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p046_6.png] view at source ↗
Figure 6
Figure 6. Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p047_6.png] view at source ↗
read the original abstract

We study bilevel optimisation problems in which the inner problem is represented as a set-valued, parametric constraint. We develop relevant optimistic and pessimistic calculus rules, derive corresponding optimality conditions, and formulate nonsmooth adjoint inclusions based on both the Fr\'echet and limiting coderivatives. Founded on these results, we propose a single-loop algorithm that accommodates a wide range of inner and adjoint steps, including those of primal-dual methods. We prove its convergence. Numerical experiments on total variation regularised inverse problems demonstrate the practicality of the approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript studies bilevel optimization problems in which the inner problem is represented as a set-valued parametric constraint. It develops optimistic and pessimistic calculus rules, derives corresponding optimality conditions, and formulates nonsmooth adjoint inclusions based on both the Fréchet and limiting coderivatives. On this foundation the authors propose a single-loop algorithm accommodating a range of inner and adjoint steps (including primal-dual methods), prove its convergence, and illustrate the approach with numerical experiments on total-variation-regularized inverse problems.

Significance. If the derivations and convergence result hold, the paper supplies a unified variational-analysis pipeline for single-loop nonsmooth bilevel problems with set-valued constraints, explicitly accommodating primal-dual inner solvers. This extends existing coderivative-based optimality conditions and offers a practical algorithmic template for applications such as inverse problems, with the numerical demonstration providing concrete evidence of implementability.

minor comments (3)
  1. [§1] §1 (Introduction): the statement that the inner problem is 'represented as a set-valued parametric constraint' would benefit from an explicit reference to the precise regularity assumptions (e.g., closed-graph or Aubin continuity) that are later used to justify the optimistic/pessimistic calculus rules.
  2. [Numerical experiments] The numerical section would be strengthened by reporting iteration counts or CPU times alongside the reconstruction quality metrics, so that the claimed practicality of the single-loop scheme can be directly compared with existing double-loop baselines.
  3. [Adjoint inclusions] Notation: the distinction between the optimistic and pessimistic value functions is introduced early but the corresponding adjoint inclusions are presented without a side-by-side comparison table; adding such a table would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our manuscript on single-loop approaches to nonsmooth bilevel optimisation. The recommendation for minor revision is noted. However, the report lists no specific major comments, so we have no point-by-point responses to provide. We remain available to address any additional points the editor or referee may raise.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper states an assumption on the inner problem as a set-valued parametric constraint, develops optimistic/pessimistic calculus rules and Fréchet/limiting coderivative inclusions from first principles in variational analysis, then uses those inclusions to construct and prove convergence of a single-loop algorithm accommodating primal-dual steps. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness or ansatz is imported via self-citation. The convergence result rests on the derived adjoint inclusions rather than on any input quantity defined by the same construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; insufficient detail to populate the ledger.

pith-pipeline@v0.9.1-grok · 5606 in / 1113 out tokens · 20322 ms · 2026-06-26T19:51:05.325416+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

49 extracted references · 25 canonical work pages · 2 internal anchors

  1. [1]

    F. J. A. Artacho and M. H. Geoffroy, Characterization of metric regularity of subdifferentials, Journal of Convex Analysis15 (2008), 365–380

  2. [2]

    Benko, On inner calmness*, generalized calculus, and derivatives of the normal cone mapping, Journal of Nonsmooth Analysis and Optimization2 (2021)

    M. Benko, On inner calmness*, generalized calculus, and derivatives of the normal cone mapping, Journal of Nonsmooth Analysis and Optimization2 (2021)

  3. [3]

    Benko, H

    M. Benko, H. Gfrerer, and J. V. Outrata, Calculus for directional limiting normal cones and subd- ifferentials,Set-Valued and Variational Analysis27 (2019), 713–745

  4. [4]

    Bertrand, Q

    Q. Bertrand, Q. Klopfenstein, M. Massias, M. Blondel, S. Vaiter, A. Gramfort, and J. Salmon, Im- plicit Differentiation for Fast Hyperparameter Selection in Non-Smooth Convex Learning,Journal of Machine Learning Research23 (2022), 1–43,http://jmlr.org/papers/v23/21-0486.html

  5. [5]

    Bogensperger, M

    L. Bogensperger, M. J. Ehrhardt, T. Pock, M. S. Salehi, and H. S. Wong, An Adaptively Inexact Method for Bilevel Learning Using Primal–Dual-Style Differentiation,Journal of Mathematical Imaging and Vision67 (2025),doi:10.1007/s10851-025-01262-w

  6. [6]

    Bolte, T

    J. Bolte, T. Le, E. Pauwels, and T. Silveti-Falls, Nonsmooth Implicit Differentiation for Machine-Learning and Optimization, inAdvances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. W. Vaughan (eds.), volume 34, Curran Associates, Inc., 2021, 13537–13549,https://proceedings.neurips.cc/paper_files/pap...

  7. [7]

    Bolte, E

    J. Bolte, E. Pauwels, and A. Silveti-Falls, Differentiating Nonsmooth Solutions to Parametric Mono- tone Inclusion Problems,SIAM Journal on Optimization34 (2024), 71–97,doi:10.1137/22m1541630. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 46 of 56 0 5 10 15 100 200 300 ˜𝑥 𝑥 𝐽◦𝑆 𝑢 (𝑥) (a)funct...

  8. [8]

    J. F. Bonnans and A. Shapiro,Perturbation Analysis of Optimization Problems, Springer New York, 2000,doi:10.1007/978-1-4612-1394-9

  9. [9]

    Journal of Mathematical Imaging and Vision , year =

    A. Chambolle, An algorithm for total variation minimization and applications,Journal of Mathe- matical Imaging and Vision20 (2004), 89–97,doi:10.1023/b:jmiv.0000011325.36760.1e

  10. [10]

    Chambolle and T

    A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with appli- cations to imaging,Journal of Mathematical Imaging and Vision40 (2011), 120–145,doi:10.1007/ s10851-010-0251-1

  11. [11]

    T. Chen, Y. Sun, and W. Yin, A single-timescale stochastic bilevel optimization method, 2021, arXiv:2102.04671

  12. [12]

    A non-smooth trust-region method for locally Lipschitz functions with application to optimization problems constrained by variational inequalities

    C. Christof, J. C. De Los Reyes, and C. Meyer, A non-smooth trust-region method for locally Lips- chitz functions with application to optimization problems constrained by variational inequalities, SIAM Journal on Optimization30 (2020), 2163–2196,doi:10.1137/18m1164925,arXiv:1711.03208

  13. [13]

    Clason and T

    C. Clason and T. Valkonen,Introduction to Nonsmooth Analysis and Optimization, MOS-SIAM Series on Optimization, SIAM, 2026,doi:10.1137/1.9781611978995

  14. [14]

    Dagréou, P

    M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau, A framework for bilevel optimization that enables stochastic and global variance reduction algorithms, 2022,arXiv:2201.13409

  15. [15]

    J. C. De Los Reyes,Numerical PDE-Constrained Optimization, SpringerBriefs in Optimization, Springer International Publishing, 2015,doi:10.1007/978-3-319-13395-9

  16. [16]

    computational resources

    J. C. De Los Reyes, C. B. Schönlieb, and T. Valkonen, Bilevel parameter learning for higher-order total variation regularisation models,Journal of Mathematical Imaging and Vision57 (2017), 1–25, doi:10.1007/s10851-016-0662-8,arXiv:1508.07243. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 47 o...

  17. [17]

    Dempe and P

    S. Dempe and P. Mehlitz, Duality-Based Single-Level Reformulations of Bilevel Optimization Prob- lems,Journal of Optimization Theory and Applications205 (2025),doi:10.1007/s10957-025-02627-2

  18. [18]

    Dizon and T

    N. Dizon and T. Valkonen, Differential estimates for fast first-order multilevel nonconvex optimi- sation, 2024,arXiv:2412.01481v3. submitted

  19. [19]

    A. L. Dontchev and R. T. Rockafellar,Implicit Functions and Solution Mappings: A View from Variational Analysis, Springer Series in Operations Research and Financial Engineering, Springer, 2014,doi:10.1007/978-1-4939-1037-3

  20. [20]

    M. J. Ehrhardt and L. Roberts, Inexact derivative-free optimization for bilevel learning,Journal of mathematical imaging and vision63 (2021), 580–600,doi:10.1007/s10851-021-01020-8

  21. [21]

    M. C. Ferris, O. L. Mangasarian, and J. S. Pang,Complementarity: Applications, Algorithms and Extensions, Springer US, 2001,doi:10.1007/978-1-4757-3279-5

  22. [22]

    Franzen, Kodak lossless true color image suite, PhotoCD PCD0992

    R. Franzen, Kodak lossless true color image suite, PhotoCD PCD0992. Lossless, true color images released by the Eastman Kodak Company, 1999,http://r0k.us/graphics/kodak/

  23. [23]

    Ghadimi and M

    S. Ghadimi and M. Wang, Approximation methods for bilevel programming, 2018,arXiv:1802. 02246. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 48 of 56

  24. [24]

    N. T. V. Hang, W. Jung, and E. Sarabi, Role of Subgradients in Variational Analysis of Polyhedral Functions,Journal of Optimization Theory and Applications200 (2024), 1160–1192,doi:10.1007/ s10957-024-02378-6

  25. [25]

    N. T. V. Hang and M. E. Sarabi, A Chain Rule for Strict Twice Epi-Differentiability and Its Appli- cations,SIAM Journal on Optimization34 (2024), 918–945,doi:10.1137/22m1520025

  26. [26]

    He and X

    B. He and X. Yuan, Convergence Analysis of Primal-Dual Algorithms for a Saddle-Point Problem: From Contraction Perspective,SIAM Journal on Imaging Sciences5 (2012), 119–149,doi:10.1137/ 100814494

  27. [27]

    Hinze, R

    M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich,Optimization with PDE Constraints, number 23 in Mathematical Modelling: Theory and Applications, Springer Netherlands, 2009,doi:10.1007/ 978-1-4020-8839-1

  28. [28]

    M. Hong, H. T. Wai, Z. Wang, and Z. Yang, A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic, 2020,arXiv:2007.05170

  29. [29]

    A. D. Ioffe,Variational Analysis of Regular Mappings: Theory and Applications, Springer Mono- graphs in Mathematics, Springer, 2017,doi:10.1007/978-3-319-64277-2

  30. [30]

    Dynamic inverse problems: Single-loop online algorithms

    J. Jauhiainen, Y. Nabou, and T. Valkonen, Dynamic inverse problems: Single-loop online algo- rithms, 2026,arXiv:2606.10951. submitted

  31. [31]

    Ji and Y

    K. Ji and Y. Liang, Lower bounds and accelerated algorithms for bilevel optimization,Journal of Machine Learning Research23 (2022), 1–56

  32. [32]

    K. Ji, J. Yang, and Y. Liang, Bilevel optimization: Convergence analysis and enhanced design, in International Conference on Machine Learning, 2021, 4882–4892

  33. [33]

    L. O. Jolaoso, P. Mehlitz, and A. B. Zemkoho, A fresh look at nonsmooth Levenberg–Marquardt methods with applications to bilevel optimization,Optimization74 (2024), 2745–2792,doi:10.1080/ 02331934.2024.2313688

  34. [34]

    J. Kwon, D. Kwon, S. Wright, and R. D. Nowak, A fully first-order method for stochastic bilevel optimization, inInternational Conference on Machine Learning, 2023, 18083–18113

  35. [35]

    J. Li, B. Gu, and H. Huang, A fully single loop algorithm for bilevel optimization without hessian inverse, inProceedings of the AAAI Conference on Artificial Intelligence, volume 36, 2022, 7426–7434

  36. [36]

    Z. Q. Luo, J. S. Pang, and D. Ralph,Mathematical Programs with Equilibrium Constraints, Cam- bridge University Press, 1996,doi:10.1017/cbo9780511983658

  37. [37]

    B. S. Mordukhovich,Variational Analysis and Generalized Differentiation I: Basic Theory, vol- ume 330 of Grundlehren der mathematischen Wissenschaften, Springer, 2006,doi:10.1007/ 3-540-31247-1

  38. [38]

    B. S. Mordukhovich, N. M. Nam, and N. D. Yen, Fréchet subdifferential calculus and optimal- ity conditions in nondifferentiable programming,Optimization55 (2006), 685–708,doi:10.1080/ 02331930600816395

  39. [39]

    J. V. Outrata, On the numerical solution of a class of Stackelberg problems,ZOR Zeitschrift für Operations Research Methods and Models of Operations Research34 (1990), 255–277,doi:10.1007/ bf01416737. E. Suonperä, T. Valkonen Single-loop approaches to nonsmooth bilevel optimisation Manuscript, 2026-06-17 page 49 of 56

  40. [40]

    Qi and J

    L. Qi and J. Sun, A trust region algorithm for minimization of locally Lipschitzian functions, Mathematical Programming66 (1994), 25–43,doi:10.1007/bf01581136

  41. [41]

    R. T. Rockafellar and R. J. B. Wets,Variational Analysis, Springer, 1998,doi:10.1007/ 978-3-642-02431-3

  42. [42]

    M. S. Salehi, S. Mukherjee, L. Roberts, , and M. J. Ehrhardt, An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning, 2024,arXiv:2308. 10098

  43. [43]

    Suonperä and T

    E. Suonperä and T. Valkonen, Linearly convergent bilevel optimization with single-step inner methods,Computational Optimization and Applications(2023),doi:10.1007/s10589-023-00527-7, arXiv:2205.04862

  44. [44]

    Suonperä and T

    E. Suonperä and T. Valkonen, Single-loop methods for bilevel parameter learning in inverse imaging,Journal of Nonsmooth Analysis and Optimization5 (2025),doi:10.46298/jnsao-2025-15577, arXiv:2408.08123

  45. [45]

    Valkonen, Preconditioned proximal point methods and notions of partial subregularity,Journal of Convex Analysis28 (2021), 251–278,arXiv:1711.05123

    T. Valkonen, Preconditioned proximal point methods and notions of partial subregularity,Journal of Convex Analysis28 (2021), 251–278,arXiv:1711.05123

  46. [46]

    Vilches, D

    E. Vilches, D. Villacís, and P. Pérez-Aros, A Variational Analysis Approach for Bilevel Hyperpa- rameter Optimization with Sparse Regularization, 2025,https://optimization-online.org/?p=30877

  47. [47]

    Wachsmuth and D

    G. Wachsmuth and D. Walter, Proximal gradient methods in Banach spaces, 2025,arXiv:2509. 24685

  48. [48]

    a priori

    J. Yang, K. Ji, and Y. Liang, Provably faster algorithms for bilevel optimization,Advances in Neural Information Processing Systems34 (2021), 13670–13682. appendix a contractivity under metric subregularity With the goal of verifying the inner tracking property for standard inner algorithms, subject to metric subregularity, we prove here the linear conver...

  49. [49]

    Otherwise: the point (𝑧, 𝑦∗) does not belong to the closed Graph𝜕𝛿 𝐵ℝ2 (0,𝐶) and thus (b.1) cannot be satisfied. The limiting coderivative equals the Fréchet coderivative with the following exceptions: • when ∥𝑧∥=𝐶 and ⟨𝑦, 𝑧⟩=0 , we have b𝐷 ∗𝜕𝛿𝐵ℝ2 (0,𝐶) (𝑧|0) (𝑦)=𝑧[0,∞) but taking 𝑦𝑘 ∗ ∈𝑧(0,∞) such that 𝑦𝑘 ∗ →0 we obtain 𝐷 ∗𝜕𝛿𝐵ℝ2 (0,𝐶) (𝑧|0) (𝑦)=𝑧ℝ from t...