pith. sign in

arxiv: 2605.30531 · v1 · pith:GDTA7U6Cnew · submitted 2026-05-28 · 🧮 math.OC

Benchmarking Bilevel Derivative-Free Optimization Algorithms

Pith reviewed 2026-06-29 05:36 UTC · model grok-4.3

classification 🧮 math.OC
keywords bilevel optimizationderivative-free optimizationbenchmarking methodologyrefereeing procedureadmissibilitycomputational costnested optimizationperformance profiles
0
0 comments X

The pith

A post-optimization refereeing procedure validates solution admissibility and accounts for total solver effort to enable fair benchmarking of bilevel derivative-free optimization algorithms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a new benchmarking methodology for BL-DFO algorithms that addresses two common shortcomings in existing comparisons. It defines a refereeing procedure applied after optimization to discard points that fail to satisfy all constraints or are not optimal for the lower-level problem. The approach also incorporates the computational work performed by both the upper-level and lower-level solvers into the overall cost metric. A sympathetic reader would care because prior benchmarks risk crediting algorithms that return invalid solutions or that conceal expensive nested solves. Numerical experiments in the paper demonstrate the methodology on test problems.

Core claim

Existing BL-DFO benchmarking techniques often fail to rigorously validate the admissibility of proposed solutions and do not adequately account for the computational effort deployed by the upper- and lower-level solvers; the proposed methodology introduces a post-optimization refereeing procedure that discards non-admissible points and folds both levels' effort into the total cost, thereby producing fairer algorithm comparisons.

What carries the argument

The refereeing procedure, a post-optimization check that verifies constraint satisfaction and lower-level optimality to filter admissible points for comparison.

If this is right

  • Only admissible points contribute to performance profiles and data profiles in the benchmark.
  • Total computational cost reflects the combined work of upper- and lower-level solvers rather than upper-level calls alone.
  • Algorithms that produce many non-admissible points are penalized by exclusion from the comparison set.
  • Benchmark results become reproducible across different solver implementations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same refereeing idea could be adapted to benchmark other nested or hierarchical optimization methods beyond bilevel problems.
  • Standard test libraries for bilevel problems would need to supply explicit lower-level optimality certificates for the refereeing step to scale.
  • If the refereeing procedure proves cheap relative to the solves, it could be inserted into the algorithms themselves rather than used only for post-hoc comparison.

Load-bearing premise

The refereeing procedure can be defined and applied uniformly to any BL-DFO algorithm to correctly identify non-admissible points without needing solver internals or introducing selection bias.

What would settle it

Applying the refereeing procedure to a set of known admissible and non-admissible test solutions from multiple BL-DFO algorithms and finding that it either accepts clearly invalid points or rejects valid ones in a solver-dependent pattern.

Figures

Figures reproduced from arXiv: 2605.30531 by Charles Audet, Valentin Dijon, Youssef Diouane.

Figure 1
Figure 1. Figure 1: Data profiles considering as effort metric solely upper-level evaluations [PITH_FULL_IMAGE:figures/full_fig_p016_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Data profiles employing (from left to right) No Referee, End-Point, Reverse and Complete [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Data profiles employing (from left to right) No Referee and Internal End-Point, Reverse [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗
read the original abstract

Bilevel optimization involves an upper-level and a lower-level decision maker. The lower-level optimization problem is nested within the constraints of the upper-level one. A point is said to be admissible for the bilevel problem if it satisfies all constraints and is optimal for the lower-level decision-maker. Bilevel derivative-free optimization (BL-DFO) algorithms address bilevel optimization problems in which either the upper-level or the lower-level problem is solved using a derivative-free optimization method. In this context, existing BL-DFO benchmarking techniques often do not rigorously validate the admissibility of proposed solutions, and do not adequately account for the computational effort deployed by the upper- and lower-level solvers. This work proposes a benchmarking methodology for BL-DFO algorithms. A post-optimization procedure, named refereeing procedure, is introduced to discard non-admissible points and ensure a fair comparison between the algorithms. The computational effort deployed by upper- and lower-level solvers are also taken into account into the overall computational cost. Numerical experiments illustrate the benchmarking methodology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a benchmarking methodology for bilevel derivative-free optimization (BL-DFO) algorithms. It introduces a post-optimization 'refereeing procedure' to discard non-admissible points (those not optimal for the lower-level problem) and ensure fair comparisons between algorithms. The methodology incorporates the computational effort of both upper- and lower-level solvers into the overall cost metric. Numerical experiments are presented to illustrate the approach.

Significance. If the refereeing procedure can be shown to work generally for arbitrary BL-DFO methods without introducing selection bias or unaccounted costs, the work would address a clear gap in existing benchmarking practices that fail to validate admissibility or properly track solver effort. This could lead to more reliable algorithm comparisons in the field.

major comments (2)
  1. [Abstract and refereeing-procedure section] The refereeing procedure (described in the abstract and the section introducing the methodology) is presented as a general post-optimization step that identifies admissible points using only the candidate point and problem definition. No explicit mechanism is given for verifying lower-level optimality when the lower level is solved by a black-box DFO solver; any practical implementation would appear to require either re-solving the lower level (risking bias from a different solver) or access to algorithm internals (violating the derivative-free and general setting). This is load-bearing for the central claim of fair, unbiased comparison.
  2. [Cost-model section] The cost model (in the section defining the overall computational cost) includes only upper- and lower-level solver effort but excludes any computational cost incurred by the refereeing procedure itself. If verification of lower-level optimality requires additional function evaluations or optimization steps, the metric as stated does not fully account for total effort.
minor comments (1)
  1. [Abstract] The abstract states that numerical experiments illustrate the methodology, but provides no information on the test problems used, the specific BL-DFO algorithms compared, performance metrics, or quantitative outcomes; this makes it impossible to assess whether the experiments support the claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify key aspects of our proposed benchmarking methodology. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract and refereeing-procedure section] The refereeing procedure (described in the abstract and the section introducing the methodology) is presented as a general post-optimization step that identifies admissible points using only the candidate point and problem definition. No explicit mechanism is given for verifying lower-level optimality when the lower level is solved by a black-box DFO solver; any practical implementation would appear to require either re-solving the lower level (risking bias from a different solver) or access to algorithm internals (violating the derivative-free and general setting). This is load-bearing for the central claim of fair, unbiased comparison.

    Authors: The refereeing procedure is defined to operate post-optimization using only the reported candidate point and the mathematical definition of the bilevel problem. In the benchmarking setting we consider, verification of lower-level optimality is performed by the benchmarker via an independent, standardized solver that is not one of the algorithms under test; this avoids both access to algorithm internals and direct reuse of the tested solver. We acknowledge that the original manuscript does not spell out this implementation detail with sufficient precision. In the revised version we will add an explicit subsection describing the verification protocol, including the requirement that the verification solver be fixed across all compared algorithms and that its effort be recorded separately. revision: yes

  2. Referee: [Cost-model section] The cost model (in the section defining the overall computational cost) includes only upper- and lower-level solver effort but excludes any computational cost incurred by the refereeing procedure itself. If verification of lower-level optimality requires additional function evaluations or optimization steps, the metric as stated does not fully account for total effort.

    Authors: We agree that any function evaluations or optimization steps performed during refereeing constitute additional computational effort. Our current cost metric is deliberately restricted to the effort expended by the upper- and lower-level solvers inside each tested algorithm; the refereeing step is viewed as part of the external benchmarking infrastructure rather than part of an algorithm’s runtime. Nevertheless, the referee’s observation is valid for transparency. We will revise the cost-model section to state explicitly that refereeing costs are tracked and reported separately, and we will include a short discussion of when those costs may become non-negligible. revision: partial

Circularity Check

0 steps flagged

No circularity: methodological proposal with independent refereeing procedure

full rationale

The paper introduces a benchmarking methodology and post-optimization refereeing procedure for BL-DFO algorithms to validate admissibility and account for solver effort. No derivations, equations, or predictions are present that reduce claims to inputs by construction. No self-citations are invoked as load-bearing for uniqueness or ansatzes. The central contribution is a forward methodology definition, self-contained against external benchmarks without fitted parameters or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The central claim rests on the effectiveness of the newly introduced refereeing procedure, which has no independent evidence or prior validation provided.

invented entities (1)
  • refereeing procedure no independent evidence
    purpose: Discard non-admissible points after optimization to ensure fair benchmarking
    Newly introduced post-optimization step with no external validation mentioned.

pith-pipeline@v0.9.1-grok · 5707 in / 1212 out tokens · 37232 ms · 2026-06-29T05:36:41.220918+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

61 extracted references · 6 canonical work pages · 2 internal anchors

  1. [1]

    Aghasi and S

    A. Aghasi and S. Ghadimi. Fully zeroth-order bilevel programming via Gaussian smoothing.Journal of Optimization Theory and Applications, 205(2):31, 2025

  2. [2]

    Aghasi, I

    A. Aghasi, I. Mendoza-Sanchez, L. Abriola, and E. Miller. Joint electrical and hydrological inversion for reconstruction of subsurface contaminant source zones. InIEEE International Geoscience and Remote Sensing Symposium, pages 7059–7062, 2012

  3. [3]

    Audet and J

    C. Audet and J. Dennis, Jr. Mesh Adaptive Direct Search Algorithms for Constrained Optimization. SIAM Journal on Optimization, 17(1):188–217, 2006

  4. [4]

    Audet and W

    C. Audet and W. Hare.Derivative-Free and Blackbox Optimization. Springer Series in Operations Research and Financial Engineering. Springer, Cham, Switzerland, 2017

  5. [5]

    Audet, S

    C. Audet, S. Le Digabel, V. Rochon Montplaisir, and C. Tribes. Algorithm 1027: NOMAD version 4: Nonlinear optimization with the MADS algorithm.ACM Transactions on Mathematical Software, 48 (3):35:1–35:22, 2022

  6. [6]

    Audet, W

    C. Audet, W. Hare, and C. Tribes. A summary of benchmarking constrained, multi-objective and surrogate-assisted derivative-free optimization methods.Optimization Letters, 2026. In press

  7. [7]

    F. Bao, G. Wu, C. Li, J. Zhu, and B. Zhang. Stability and generalization of bilevel programming in hyperparameter optimization.Advances in neural information processing systems, 34:4529–4541, 2021

  8. [8]

    J. Bard. An algorithm for solving the general bilevel programming problem.Mathematics of Operations research, 8(2):260–272, 1983

  9. [9]

    Beck and M

    A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009

  10. [10]

    Beiranvand, W

    V. Beiranvand, W. Hare, and Y. Lucet. Best practices for comparing optimization algorithms.Opti- mization and Engineering, 18(4):815–848, 2017. 19

  11. [11]

    Cesaroni, G

    E. Cesaroni, G. Liuzzi, and S. Lucidi. Derivative-Free Bilevel Optimization with Inexact Lower-Level Solutions. Technical Report 2603.20759, arXiv, 2026

  12. [12]

    R. Chew, Q. Nguyen, and B. Low. BILBO: BILevel Bayesian Optimization. Technical Report 2502.02121, arXiv, 2025

  13. [13]

    Colson, P

    B. Colson, P. Marcotte, and G. Savard. A trust-region method for nonlinear bilevel programming: algorithm and computational experience.Computational Optimization and Applications, 30(3):211– 227, 2005

  14. [14]

    Colson, P

    B. Colson, P. Marcotte, and G. Savard. An overview of bilevel optimization.Annals of Operations Research, 153(1):235–256, 2007

  15. [15]

    Conn and S

    A. Conn and S. Le Digabel. Use of quadratic models with mesh-adaptive direct search for constrained black box optimization.Optimization Methods and Software, 28(1):139–158, 2013

  16. [16]

    Conn and L

    A. Conn and L. Vicente. Bilevel derivative-free optimization and its application to robust optimization. Optimization Methods and Software, 27(3):561–577, 2012

  17. [17]

    Das and P

    S. Das and P. Suganthan. Differential evolution: A survey of the state-of-the-art.IEEE transactions on evolutionary computation, 15(1):4–31, 2010

  18. [18]

    S. Dempe. A necessary and a sufficient optimality condition for bilevel programming problems.Opti- mization, 25(4):341–354, 1992

  19. [19]

    Dempe.Foundations of bilevel programming

    S. Dempe.Foundations of bilevel programming. MA: Springer US, 2002

  20. [20]

    Diouane, V

    Y. Diouane, V. Kungurtsev, F. Rinaldi, and D. Zeffiro. Inexact direct-search methods for bilevel optimization problems.Computational Optimization and Applications, 88(2):469–490, 2024

  21. [21]

    Dogan and S

    V. Dogan and S. Prestwich. Bayesian optimization with multi-objective acquisition function for bilevel problems. InIrish Conference on Artificial Intelligence and Cognitive Science, pages 409–422. Cham: Springer Nature Switzerland, 2022

  22. [22]

    Dogan and S

    V. Dogan and S. Prestwich. Bilevel optimization by conditional Bayesian optimization. InInternational Conference on Machine Learning, Optimization, and Data Science, pages 243–258. Cham: Springer Nature Switzerland, 2023

  23. [23]

    Dolan and J

    E. Dolan and J. Mor´ e. Benchmarking optimization software with performance profiles.Mathematical Programming, 91(2):201–213, 2002

  24. [24]

    Dzahini, F

    K. Dzahini, F. Rinaldi, C. Royer, and D. Zeffiro. Direct-search methods in the year 2025: Theoretical guarantees and algorithmic paradigms.EURO Journal on Computational Optimization, 13:100110, 2025

  25. [25]

    Eggleston and W

    K. Eggleston and W. Yip. Hospital competition under regulated prices: application to urban health sector reforms in China.International Journal of health care finance and economics, 4(4):343–368, 2004

  26. [26]

    Ehrhardt and L

    M. Ehrhardt and L. Roberts. Inexact derivative-free optimization for bilevel learning.Journal of mathematical imaging and vision, 63(5):580–600, 2021

  27. [27]

    Ekmekcioglu, N

    O. Ekmekcioglu, N. Aydin, and J. Branke. Bayesian Optimization of Bilevel Problems. Technical Report 2412.18518, arXiv, 2024

  28. [28]

    Fermi and N

    E. Fermi and N. Metropolis. Numerical solution of a minimum problem. Los Alamos Unclassified Report LA–1492, Los Alamos National Laboratory, Los Alamos, USA, 1952

  29. [29]

    Franceschi, P

    L. Franceschi, P. Frasconi, S. Salzo, R. Grazzi, and M. Pontil. Bilevel Programming for Hyperparameter Optimization and Meta-Learning. InInternational Conference on Machine Learning, pages 1568–1577, 2018

  30. [30]

    I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets.Advances in neural information processing systems, 27, 2014

  31. [31]

    Gulrajani, F

    I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasser- stein gans.Advances in neural information processing systems, 30, 2017

  32. [32]

    Hansen, B

    P. Hansen, B. Jaumard, and G. Savard. New Branch-and-Bound Rules for Linear Bilevel Programming. SIAM Journal on Scientific and Statistical Computing, 13(5):1194–1217, 1992

  33. [33]

    Direct Search

    R. Hooke and T. Jeeves. “Direct Search” Solution of Numerical and Statistical Problems.Journal of the Association for Computing Machinery, 8(2):212–229, 1961

  34. [34]

    Islam, H

    M. Islam, H. Singh, and T. Ray. A surrogate assisted approach for single-objective bilevel optimization. IEEE Transactions on Evolutionary Computation, 21(5):681–696, 2017

  35. [35]

    Islam, H

    M. Islam, H. Singh, and T. Ray. Efficient global optimization for solving computationally expensive bilevel optimization problems. In2018 IEEE congress on evolutionary computation (CEC), pages 1–8, 20 2018

  36. [36]

    Jiang, K

    H. Jiang, K. Chou, Y. Tian, X. Zhang, and Y. Jin. Efficient surrogate modeling method for evolutionary algorithm to solve bilevel optimization problems.IEEE transactions on cybernetics, 54(7):4335–4347, 2023

  37. [37]

    Jones, M

    D. Jones, M. Schonlau, and W. Welch. Efficient Global Optimization of Expensive Black Box Functions. Journal of Global Optimization, 13(4):455–492, 1998

  38. [38]

    Kieffer, G

    E. Kieffer, G. Danoy, P. Bouvry, and A. Nagih. Bayesian optimization approach of general bi-level problems. InProceedings of the genetic and evolutionary computation conference companion, pages 1614–1621, 2017

  39. [39]

    Kleinert, M

    T. Kleinert, M. Labb´ e, I. Ljubi´ c, and M. Schmidt. A survey on mixed-integer programming techniques in bilevel optimization.EURO Journal on Computational Optimization, 9:100007, 2021

  40. [40]

    D. Kraft. A software package for sequential quadratic programming. DFVLR-FB 88–28, Forschungsbericht- Deutsche Forschungs- und Versuchsanstalt fur Luft- und Raumfahrt, Braunschweig, Germany, 1988

  41. [41]

    LeBlanc and D

    L. LeBlanc and D. Boyce. A bilevel programming algorithm for exact solution of the network design problem with user-optimal flows.Transportation Research Part B: Methodological, 20(3):259–265, 1986

  42. [42]

    Lewis, V

    R. Lewis, V. Torczon, and M. Trosset. Direct Search Methods: Then and Now.Journal of Computational and Applied Mathematics, 124(1–2):191–207, 2000

  43. [43]

    Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljaˇ ci´ c, T. Hou, and M. Tegmark. Kan: Kolmogorov-arnold networks. Technical Report 2404.19756, arXiv, 2024

  44. [44]

    K. Ma, L. Rios, H. Zheng, N. Sahinidis, and S. Rajagopalan. Model-and-search: a derivative-free local optimization algorithm.Computational Optimization and Applications, 92(3):889–921, 2025

  45. [45]

    Z. Ma, Z. Huang, J. Chen, Z. Cao, and Y. Gong. Surrogate learning in meta-black-box optimization: A preliminary study. InProceedings of the Genetic and Evolutionary Computation Conference, pages 1137–1145, 2025

  46. [46]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. Technical Report 1706.06083, arXiv, 2017

  47. [47]

    Maheshwari, S

    C. Maheshwari, S. Sasty, L. Ratliff, and E. Mazumdar. Convergent first-order methods for bi-level optimization and stackelberg games. Technical Report 2302.01421, arXiv, 2023

  48. [48]

    T. Man, X. Li, Z. Liu, H. Zhang, B. Gu, and Y. Chang. Enhancing Black-Box Adversarial Attacks on Discrete Sequential Data via Bilevel Bayesian Optimization in Hybrid Spaces. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1032–1043, 2025

  49. [49]

    Mersha and S

    A. Mersha and S. Dempe. Direct search algorithm for bilevel programming problems.Computational Optimization and Applications, 49(1):1–15, 2011

  50. [50]

    Mor´ e and S

    J. Mor´ e and S. Wild. Benchmarking Derivative-Free Optimization Algorithms.SIAM Journal on Optimization, 20(1):172–191, 2009

  51. [51]

    Nguyen, R

    V. Nguyen, R. Johnson, F. Sup, and B. Umberger. Bilevel optimization for cost function determina- tion in dynamic simulation of human gait.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(7):1426–1435, 2019

  52. [52]

    J. Outrata. On the numerical solution of a class of Stackelberg problems.Zeitschrift f¨ ur Operations Research, 34(4):255–277, 1990

  53. [53]

    Papavassilopoulos

    G. Papavassilopoulos. Algorithms for static Stackelberg games with linear costs and polyhedra con- straints. In1982 21st IEEE Conference on Decision and Control, pages 647–652, 1982

  54. [54]

    Sinha, P

    A. Sinha, P. Malo, and K. Deb. Test problem construction for single-objective bilevel optimization. Evolutionary computation, 22(3):439–477, 2014

  55. [55]

    Staudigl, S

    M. Staudigl, S. Weissmann, and T. Van Leeuwen. Derivative-free stochastic bilevel optimization for inverse problems.Computational Optimization and Applications, 93(3):967–1021, 2025

  56. [56]

    Von Stackelberg, A

    H. Von Stackelberg, A. Peacock, E. Schneider, and T. Hutchison. The theory of the market economy. Economica, 20(80):384, 1953

  57. [57]

    W¨ achter and L

    A. W¨ achter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming.Mathematical Programming, 106(1):25–57, 2006

  58. [58]

    Y. Yuan. A trust region algorithm for Nash equilibrium problems.Pacific Journal of Optimization, 7 (1):125–138, 2011

  59. [59]

    Zemkoho and S

    A. Zemkoho and S. Zhou. Theoretical and numerical comparison of the Karush–Kuhn–Tucker and value 21 function reformulations in bilevel optimization.Computational Optimization and Applications, 78(2): 625–674, 2021

  60. [60]

    Zhang and G

    D. Zhang and G. Lin. Bilevel direct search method for leader–follower problems and application in health insurance.Computers and Operations Research, 41:359–373, 2014

  61. [61]

    S. Zhou, A. Zemkoho, and A. Tin. BOLIB: bilevel optimization LIBrary of test problems.Bilevel Optimization: Advances and Next Challenges, 161:563–580, 2020. 22