Benchmarking Bilevel Derivative-Free Optimization Algorithms
Pith reviewed 2026-06-29 05:36 UTC · model grok-4.3
The pith
A post-optimization refereeing procedure validates solution admissibility and accounts for total solver effort to enable fair benchmarking of bilevel derivative-free optimization algorithms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Existing BL-DFO benchmarking techniques often fail to rigorously validate the admissibility of proposed solutions and do not adequately account for the computational effort deployed by the upper- and lower-level solvers; the proposed methodology introduces a post-optimization refereeing procedure that discards non-admissible points and folds both levels' effort into the total cost, thereby producing fairer algorithm comparisons.
What carries the argument
The refereeing procedure, a post-optimization check that verifies constraint satisfaction and lower-level optimality to filter admissible points for comparison.
If this is right
- Only admissible points contribute to performance profiles and data profiles in the benchmark.
- Total computational cost reflects the combined work of upper- and lower-level solvers rather than upper-level calls alone.
- Algorithms that produce many non-admissible points are penalized by exclusion from the comparison set.
- Benchmark results become reproducible across different solver implementations.
Where Pith is reading between the lines
- The same refereeing idea could be adapted to benchmark other nested or hierarchical optimization methods beyond bilevel problems.
- Standard test libraries for bilevel problems would need to supply explicit lower-level optimality certificates for the refereeing step to scale.
- If the refereeing procedure proves cheap relative to the solves, it could be inserted into the algorithms themselves rather than used only for post-hoc comparison.
Load-bearing premise
The refereeing procedure can be defined and applied uniformly to any BL-DFO algorithm to correctly identify non-admissible points without needing solver internals or introducing selection bias.
What would settle it
Applying the refereeing procedure to a set of known admissible and non-admissible test solutions from multiple BL-DFO algorithms and finding that it either accepts clearly invalid points or rejects valid ones in a solver-dependent pattern.
Figures
read the original abstract
Bilevel optimization involves an upper-level and a lower-level decision maker. The lower-level optimization problem is nested within the constraints of the upper-level one. A point is said to be admissible for the bilevel problem if it satisfies all constraints and is optimal for the lower-level decision-maker. Bilevel derivative-free optimization (BL-DFO) algorithms address bilevel optimization problems in which either the upper-level or the lower-level problem is solved using a derivative-free optimization method. In this context, existing BL-DFO benchmarking techniques often do not rigorously validate the admissibility of proposed solutions, and do not adequately account for the computational effort deployed by the upper- and lower-level solvers. This work proposes a benchmarking methodology for BL-DFO algorithms. A post-optimization procedure, named refereeing procedure, is introduced to discard non-admissible points and ensure a fair comparison between the algorithms. The computational effort deployed by upper- and lower-level solvers are also taken into account into the overall computational cost. Numerical experiments illustrate the benchmarking methodology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a benchmarking methodology for bilevel derivative-free optimization (BL-DFO) algorithms. It introduces a post-optimization 'refereeing procedure' to discard non-admissible points (those not optimal for the lower-level problem) and ensure fair comparisons between algorithms. The methodology incorporates the computational effort of both upper- and lower-level solvers into the overall cost metric. Numerical experiments are presented to illustrate the approach.
Significance. If the refereeing procedure can be shown to work generally for arbitrary BL-DFO methods without introducing selection bias or unaccounted costs, the work would address a clear gap in existing benchmarking practices that fail to validate admissibility or properly track solver effort. This could lead to more reliable algorithm comparisons in the field.
major comments (2)
- [Abstract and refereeing-procedure section] The refereeing procedure (described in the abstract and the section introducing the methodology) is presented as a general post-optimization step that identifies admissible points using only the candidate point and problem definition. No explicit mechanism is given for verifying lower-level optimality when the lower level is solved by a black-box DFO solver; any practical implementation would appear to require either re-solving the lower level (risking bias from a different solver) or access to algorithm internals (violating the derivative-free and general setting). This is load-bearing for the central claim of fair, unbiased comparison.
- [Cost-model section] The cost model (in the section defining the overall computational cost) includes only upper- and lower-level solver effort but excludes any computational cost incurred by the refereeing procedure itself. If verification of lower-level optimality requires additional function evaluations or optimization steps, the metric as stated does not fully account for total effort.
minor comments (1)
- [Abstract] The abstract states that numerical experiments illustrate the methodology, but provides no information on the test problems used, the specific BL-DFO algorithms compared, performance metrics, or quantitative outcomes; this makes it impossible to assess whether the experiments support the claims.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify key aspects of our proposed benchmarking methodology. We respond to each major comment below.
read point-by-point responses
-
Referee: [Abstract and refereeing-procedure section] The refereeing procedure (described in the abstract and the section introducing the methodology) is presented as a general post-optimization step that identifies admissible points using only the candidate point and problem definition. No explicit mechanism is given for verifying lower-level optimality when the lower level is solved by a black-box DFO solver; any practical implementation would appear to require either re-solving the lower level (risking bias from a different solver) or access to algorithm internals (violating the derivative-free and general setting). This is load-bearing for the central claim of fair, unbiased comparison.
Authors: The refereeing procedure is defined to operate post-optimization using only the reported candidate point and the mathematical definition of the bilevel problem. In the benchmarking setting we consider, verification of lower-level optimality is performed by the benchmarker via an independent, standardized solver that is not one of the algorithms under test; this avoids both access to algorithm internals and direct reuse of the tested solver. We acknowledge that the original manuscript does not spell out this implementation detail with sufficient precision. In the revised version we will add an explicit subsection describing the verification protocol, including the requirement that the verification solver be fixed across all compared algorithms and that its effort be recorded separately. revision: yes
-
Referee: [Cost-model section] The cost model (in the section defining the overall computational cost) includes only upper- and lower-level solver effort but excludes any computational cost incurred by the refereeing procedure itself. If verification of lower-level optimality requires additional function evaluations or optimization steps, the metric as stated does not fully account for total effort.
Authors: We agree that any function evaluations or optimization steps performed during refereeing constitute additional computational effort. Our current cost metric is deliberately restricted to the effort expended by the upper- and lower-level solvers inside each tested algorithm; the refereeing step is viewed as part of the external benchmarking infrastructure rather than part of an algorithm’s runtime. Nevertheless, the referee’s observation is valid for transparency. We will revise the cost-model section to state explicitly that refereeing costs are tracked and reported separately, and we will include a short discussion of when those costs may become non-negligible. revision: partial
Circularity Check
No circularity: methodological proposal with independent refereeing procedure
full rationale
The paper introduces a benchmarking methodology and post-optimization refereeing procedure for BL-DFO algorithms to validate admissibility and account for solver effort. No derivations, equations, or predictions are present that reduce claims to inputs by construction. No self-citations are invoked as load-bearing for uniqueness or ansatzes. The central contribution is a forward methodology definition, self-contained against external benchmarks without fitted parameters or renaming of known results.
Axiom & Free-Parameter Ledger
invented entities (1)
-
refereeing procedure
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Aghasi and S
A. Aghasi and S. Ghadimi. Fully zeroth-order bilevel programming via Gaussian smoothing.Journal of Optimization Theory and Applications, 205(2):31, 2025
2025
-
[2]
Aghasi, I
A. Aghasi, I. Mendoza-Sanchez, L. Abriola, and E. Miller. Joint electrical and hydrological inversion for reconstruction of subsurface contaminant source zones. InIEEE International Geoscience and Remote Sensing Symposium, pages 7059–7062, 2012
2012
-
[3]
Audet and J
C. Audet and J. Dennis, Jr. Mesh Adaptive Direct Search Algorithms for Constrained Optimization. SIAM Journal on Optimization, 17(1):188–217, 2006
2006
-
[4]
Audet and W
C. Audet and W. Hare.Derivative-Free and Blackbox Optimization. Springer Series in Operations Research and Financial Engineering. Springer, Cham, Switzerland, 2017
2017
-
[5]
Audet, S
C. Audet, S. Le Digabel, V. Rochon Montplaisir, and C. Tribes. Algorithm 1027: NOMAD version 4: Nonlinear optimization with the MADS algorithm.ACM Transactions on Mathematical Software, 48 (3):35:1–35:22, 2022
2022
-
[6]
Audet, W
C. Audet, W. Hare, and C. Tribes. A summary of benchmarking constrained, multi-objective and surrogate-assisted derivative-free optimization methods.Optimization Letters, 2026. In press
2026
-
[7]
F. Bao, G. Wu, C. Li, J. Zhu, and B. Zhang. Stability and generalization of bilevel programming in hyperparameter optimization.Advances in neural information processing systems, 34:4529–4541, 2021
2021
-
[8]
J. Bard. An algorithm for solving the general bilevel programming problem.Mathematics of Operations research, 8(2):260–272, 1983
1983
-
[9]
Beck and M
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009
2009
-
[10]
Beiranvand, W
V. Beiranvand, W. Hare, and Y. Lucet. Best practices for comparing optimization algorithms.Opti- mization and Engineering, 18(4):815–848, 2017. 19
2017
-
[11]
E. Cesaroni, G. Liuzzi, and S. Lucidi. Derivative-Free Bilevel Optimization with Inexact Lower-Level Solutions. Technical Report 2603.20759, arXiv, 2026
- [12]
-
[13]
Colson, P
B. Colson, P. Marcotte, and G. Savard. A trust-region method for nonlinear bilevel programming: algorithm and computational experience.Computational Optimization and Applications, 30(3):211– 227, 2005
2005
-
[14]
Colson, P
B. Colson, P. Marcotte, and G. Savard. An overview of bilevel optimization.Annals of Operations Research, 153(1):235–256, 2007
2007
-
[15]
Conn and S
A. Conn and S. Le Digabel. Use of quadratic models with mesh-adaptive direct search for constrained black box optimization.Optimization Methods and Software, 28(1):139–158, 2013
2013
-
[16]
Conn and L
A. Conn and L. Vicente. Bilevel derivative-free optimization and its application to robust optimization. Optimization Methods and Software, 27(3):561–577, 2012
2012
-
[17]
Das and P
S. Das and P. Suganthan. Differential evolution: A survey of the state-of-the-art.IEEE transactions on evolutionary computation, 15(1):4–31, 2010
2010
-
[18]
S. Dempe. A necessary and a sufficient optimality condition for bilevel programming problems.Opti- mization, 25(4):341–354, 1992
1992
-
[19]
Dempe.Foundations of bilevel programming
S. Dempe.Foundations of bilevel programming. MA: Springer US, 2002
2002
-
[20]
Diouane, V
Y. Diouane, V. Kungurtsev, F. Rinaldi, and D. Zeffiro. Inexact direct-search methods for bilevel optimization problems.Computational Optimization and Applications, 88(2):469–490, 2024
2024
-
[21]
Dogan and S
V. Dogan and S. Prestwich. Bayesian optimization with multi-objective acquisition function for bilevel problems. InIrish Conference on Artificial Intelligence and Cognitive Science, pages 409–422. Cham: Springer Nature Switzerland, 2022
2022
-
[22]
Dogan and S
V. Dogan and S. Prestwich. Bilevel optimization by conditional Bayesian optimization. InInternational Conference on Machine Learning, Optimization, and Data Science, pages 243–258. Cham: Springer Nature Switzerland, 2023
2023
-
[23]
Dolan and J
E. Dolan and J. Mor´ e. Benchmarking optimization software with performance profiles.Mathematical Programming, 91(2):201–213, 2002
2002
-
[24]
Dzahini, F
K. Dzahini, F. Rinaldi, C. Royer, and D. Zeffiro. Direct-search methods in the year 2025: Theoretical guarantees and algorithmic paradigms.EURO Journal on Computational Optimization, 13:100110, 2025
2025
-
[25]
Eggleston and W
K. Eggleston and W. Yip. Hospital competition under regulated prices: application to urban health sector reforms in China.International Journal of health care finance and economics, 4(4):343–368, 2004
2004
-
[26]
Ehrhardt and L
M. Ehrhardt and L. Roberts. Inexact derivative-free optimization for bilevel learning.Journal of mathematical imaging and vision, 63(5):580–600, 2021
2021
-
[27]
O. Ekmekcioglu, N. Aydin, and J. Branke. Bayesian Optimization of Bilevel Problems. Technical Report 2412.18518, arXiv, 2024
-
[28]
Fermi and N
E. Fermi and N. Metropolis. Numerical solution of a minimum problem. Los Alamos Unclassified Report LA–1492, Los Alamos National Laboratory, Los Alamos, USA, 1952
1952
-
[29]
Franceschi, P
L. Franceschi, P. Frasconi, S. Salzo, R. Grazzi, and M. Pontil. Bilevel Programming for Hyperparameter Optimization and Meta-Learning. InInternational Conference on Machine Learning, pages 1568–1577, 2018
2018
-
[30]
I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets.Advances in neural information processing systems, 27, 2014
2014
-
[31]
Gulrajani, F
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasser- stein gans.Advances in neural information processing systems, 30, 2017
2017
-
[32]
Hansen, B
P. Hansen, B. Jaumard, and G. Savard. New Branch-and-Bound Rules for Linear Bilevel Programming. SIAM Journal on Scientific and Statistical Computing, 13(5):1194–1217, 1992
1992
-
[33]
Direct Search
R. Hooke and T. Jeeves. “Direct Search” Solution of Numerical and Statistical Problems.Journal of the Association for Computing Machinery, 8(2):212–229, 1961
1961
-
[34]
Islam, H
M. Islam, H. Singh, and T. Ray. A surrogate assisted approach for single-objective bilevel optimization. IEEE Transactions on Evolutionary Computation, 21(5):681–696, 2017
2017
-
[35]
Islam, H
M. Islam, H. Singh, and T. Ray. Efficient global optimization for solving computationally expensive bilevel optimization problems. In2018 IEEE congress on evolutionary computation (CEC), pages 1–8, 20 2018
2018
-
[36]
Jiang, K
H. Jiang, K. Chou, Y. Tian, X. Zhang, and Y. Jin. Efficient surrogate modeling method for evolutionary algorithm to solve bilevel optimization problems.IEEE transactions on cybernetics, 54(7):4335–4347, 2023
2023
-
[37]
Jones, M
D. Jones, M. Schonlau, and W. Welch. Efficient Global Optimization of Expensive Black Box Functions. Journal of Global Optimization, 13(4):455–492, 1998
1998
-
[38]
Kieffer, G
E. Kieffer, G. Danoy, P. Bouvry, and A. Nagih. Bayesian optimization approach of general bi-level problems. InProceedings of the genetic and evolutionary computation conference companion, pages 1614–1621, 2017
2017
-
[39]
Kleinert, M
T. Kleinert, M. Labb´ e, I. Ljubi´ c, and M. Schmidt. A survey on mixed-integer programming techniques in bilevel optimization.EURO Journal on Computational Optimization, 9:100007, 2021
2021
-
[40]
D. Kraft. A software package for sequential quadratic programming. DFVLR-FB 88–28, Forschungsbericht- Deutsche Forschungs- und Versuchsanstalt fur Luft- und Raumfahrt, Braunschweig, Germany, 1988
1988
-
[41]
LeBlanc and D
L. LeBlanc and D. Boyce. A bilevel programming algorithm for exact solution of the network design problem with user-optimal flows.Transportation Research Part B: Methodological, 20(3):259–265, 1986
1986
-
[42]
Lewis, V
R. Lewis, V. Torczon, and M. Trosset. Direct Search Methods: Then and Now.Journal of Computational and Applied Mathematics, 124(1–2):191–207, 2000
2000
-
[43]
Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljaˇ ci´ c, T. Hou, and M. Tegmark. Kan: Kolmogorov-arnold networks. Technical Report 2404.19756, arXiv, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[44]
K. Ma, L. Rios, H. Zheng, N. Sahinidis, and S. Rajagopalan. Model-and-search: a derivative-free local optimization algorithm.Computational Optimization and Applications, 92(3):889–921, 2025
2025
-
[45]
Z. Ma, Z. Huang, J. Chen, Z. Cao, and Y. Gong. Surrogate learning in meta-black-box optimization: A preliminary study. InProceedings of the Genetic and Evolutionary Computation Conference, pages 1137–1145, 2025
2025
-
[46]
Towards Deep Learning Models Resistant to Adversarial Attacks
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. Technical Report 1706.06083, arXiv, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[47]
C. Maheshwari, S. Sasty, L. Ratliff, and E. Mazumdar. Convergent first-order methods for bi-level optimization and stackelberg games. Technical Report 2302.01421, arXiv, 2023
-
[48]
T. Man, X. Li, Z. Liu, H. Zhang, B. Gu, and Y. Chang. Enhancing Black-Box Adversarial Attacks on Discrete Sequential Data via Bilevel Bayesian Optimization in Hybrid Spaces. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1032–1043, 2025
2025
-
[49]
Mersha and S
A. Mersha and S. Dempe. Direct search algorithm for bilevel programming problems.Computational Optimization and Applications, 49(1):1–15, 2011
2011
-
[50]
Mor´ e and S
J. Mor´ e and S. Wild. Benchmarking Derivative-Free Optimization Algorithms.SIAM Journal on Optimization, 20(1):172–191, 2009
2009
-
[51]
Nguyen, R
V. Nguyen, R. Johnson, F. Sup, and B. Umberger. Bilevel optimization for cost function determina- tion in dynamic simulation of human gait.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(7):1426–1435, 2019
2019
-
[52]
J. Outrata. On the numerical solution of a class of Stackelberg problems.Zeitschrift f¨ ur Operations Research, 34(4):255–277, 1990
1990
-
[53]
Papavassilopoulos
G. Papavassilopoulos. Algorithms for static Stackelberg games with linear costs and polyhedra con- straints. In1982 21st IEEE Conference on Decision and Control, pages 647–652, 1982
1982
-
[54]
Sinha, P
A. Sinha, P. Malo, and K. Deb. Test problem construction for single-objective bilevel optimization. Evolutionary computation, 22(3):439–477, 2014
2014
-
[55]
Staudigl, S
M. Staudigl, S. Weissmann, and T. Van Leeuwen. Derivative-free stochastic bilevel optimization for inverse problems.Computational Optimization and Applications, 93(3):967–1021, 2025
2025
-
[56]
Von Stackelberg, A
H. Von Stackelberg, A. Peacock, E. Schneider, and T. Hutchison. The theory of the market economy. Economica, 20(80):384, 1953
1953
-
[57]
W¨ achter and L
A. W¨ achter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming.Mathematical Programming, 106(1):25–57, 2006
2006
-
[58]
Y. Yuan. A trust region algorithm for Nash equilibrium problems.Pacific Journal of Optimization, 7 (1):125–138, 2011
2011
-
[59]
Zemkoho and S
A. Zemkoho and S. Zhou. Theoretical and numerical comparison of the Karush–Kuhn–Tucker and value 21 function reformulations in bilevel optimization.Computational Optimization and Applications, 78(2): 625–674, 2021
2021
-
[60]
Zhang and G
D. Zhang and G. Lin. Bilevel direct search method for leader–follower problems and application in health insurance.Computers and Operations Research, 41:359–373, 2014
2014
-
[61]
S. Zhou, A. Zemkoho, and A. Tin. BOLIB: bilevel optimization LIBrary of test problems.Bilevel Optimization: Advances and Next Challenges, 161:563–580, 2020. 22
2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.