pith. sign in

arxiv: 2506.20335 · v2 · pith:EVSRTUIPnew · submitted 2025-06-25 · 🧮 math.OC

CLARSTA: A random subspace trust-region algorithm for convex-constrained derivative-free optimization

Pith reviewed 2026-05-19 08:17 UTC · model grok-4.3

classification 🧮 math.OC
keywords random subspacetrust-region algorithmderivative-free optimizationconvex constraintsglobal convergenceworst-case complexityGrassmann manifoldprojected models
0
0 comments X

The pith

A random subspace trust-region algorithm converges almost surely to first-order stationary points for convex-constrained derivative-free optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops CLARSTA, a trust-region method that works in randomly chosen low-dimensional subspaces to solve convex-constrained problems when derivatives are unavailable. It introduces a new model class that only needs to be accurate after the constraints are projected onto the current subspace, together with a geometry measure that makes such models straightforward to build and verify. Subspaces are drawn so that they retain a guaranteed fraction of the true first-order criticality measure with a probability lower bound taken from concentration results on the Grassmann manifold. These ingredients together deliver an almost-sure global convergence guarantee to a stationary point plus a worst-case complexity bound, and the method is shown to run reliably on problems with up to ten thousand variables.

Core claim

The algorithm achieves almost-sure global convergence to a first-order stationary point and admits a worst-case complexity analysis for general convex-constrained derivative-free optimization problems, provided the sampled subspaces preserve a sufficient fraction of the criticality measure with a probability lower bound derived from concentration of measure on the Grassmann manifold and the models satisfy the new accuracy requirement only on the projected constraint set.

What carries the argument

Random subspace sampling on the Grassmann manifold that preserves a positive fraction of the first-order criticality measure with a computable probability lower bound, paired with a new class of models required to be accurate only on the projection of the feasible set onto the current subspace.

If this is right

  • The method extends existing random-subspace DFO techniques from unconstrained to convex-constrained problems while retaining almost-sure convergence.
  • A new geometry measure simplifies the construction and management of models that are accurate only on projected feasible sets.
  • Worst-case complexity bounds become available for high-dimensional convex DFO under the stated subspace and model conditions.
  • Numerical tests indicate reliable performance on problems whose dimension reaches at least 10000.
  • The framework separates model accuracy from full-space geometry, allowing cheaper local models in each iteration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same projected-model idea could be tested on non-convex constraints or on problems with stochastic noise to see whether the convergence theory carries over.
  • Because each iteration works in a low-dimensional subspace, the approach may lower the per-iteration cost enough to make derivative-free methods practical for engineering design tasks with thousands of variables.
  • The Grassmann concentration argument supplies an explicit sampling probability that could be reused to design subspace methods for other derivative-free settings such as multi-objective or constrained least-squares problems.

Load-bearing premise

The randomly sampled subspaces must retain enough of the true first-order criticality measure with a positive probability lower bound, and the models need only be accurate after the constraints are projected into the subspace.

What would settle it

A sequence of independent runs on a simple convex-constrained DFO test problem in which the algorithm fails to reach a first-order stationary point with probability bounded away from zero, or in which the observed number of function evaluations exceeds the predicted worst-case bound by more than a constant factor.

read the original abstract

This paper proposes a random subspace trust-region algorithm for general convex-constrained derivative-free optimization (DFO) problems. Similar to previous random subspace DFO methods, the convergence of our algorithm requires a certain accuracy of models and a certain quality of subspaces. For model accuracy, we define a new class of models that is only required to provide reasonable accuracy on the projection of the constraint set onto the subspace. We provide a new geometry measure to make these models easy to analyze, construct, and manage. For subspace quality, we use the concentration of measure on the Grassmann manifold to provide a method to sample subspaces that preserve the first-order criticality measure by a certain fraction with a certain probability lower bound. Based on all these new theoretical results, we present an almost-sure global convergence and a worst-case complexity analysis of our algorithm. Numerical experiments on problems with dimensions up to 10000 demonstrate the reliable performance of our algorithm in high dimensions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes CLARSTA, a random subspace trust-region algorithm for general convex-constrained derivative-free optimization. It defines a new class of models required to be accurate only on the projection of the constraint set onto the sampled subspace, introduces a new geometry measure to facilitate analysis and construction of such models, derives subspace sampling probabilities via concentration of measure on the Grassmann manifold that preserve a fraction of the first-order criticality measure, and establishes almost-sure global convergence to a first-order stationary point together with a worst-case complexity bound. Numerical experiments on problems with dimensions up to 10000 are presented to illustrate performance.

Significance. If the central claims hold, the work meaningfully extends random-subspace DFO methods to convex constraints by replacing full-space model accuracy with a projected-set condition and by supplying an explicit Grassmannian concentration argument for subspace quality. The new geometry measure is a targeted device that renders the projected accuracy condition amenable to standard trust-region decrease arguments, and the almost-sure convergence plus complexity results are obtained via Borel-Cantelli and standard DFO counting arguments once the probabilistic ingredients are in place. The high-dimensional numerical tests provide practical evidence that the approach scales beyond the reach of full-space methods. These elements constitute a solid, self-contained contribution to the derivative-free optimization literature.

major comments (2)
  1. [§3.2] §3.2, Definition 3.3 and Lemma 3.6: The new geometry measure is defined on the projected constraint set; the proof that this measure implies a uniform Cauchy decrease for the trust-region subproblem on the original feasible set is only sketched and appears to require an additional uniform bound on the distance between the projected and original feasible sets that is not stated explicitly.
  2. [§5.3] §5.3, Theorem 5.2: The worst-case complexity bound is stated as O(1/ε²) iterations with high probability; the dependence of the hidden constant on the Grassmann concentration probability p and on the subspace dimension k is not displayed, making it impossible to verify whether the bound remains useful when k grows with the ambient dimension.
minor comments (3)
  1. [§7] The numerical section reports results up to dimension 10000 but does not list the specific test-problem families, the choice of subspace dimension k, or the baseline solvers used for comparison; adding a table with these details would improve reproducibility.
  2. [§2.1] Notation for the projected feasible set P_C(x) is introduced without an explicit statement of how the projection is computed in practice when C is given only by oracles.
  3. [§1] A few sentences in the introduction repeat the abstract almost verbatim; a more concise motivation paragraph would improve flow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and constructive comments. We address each major comment below and will incorporate the suggested clarifications in the revised manuscript.

read point-by-point responses
  1. Referee: [§3.2] §3.2, Definition 3.3 and Lemma 3.6: The new geometry measure is defined on the projected constraint set; the proof that this measure implies a uniform Cauchy decrease for the trust-region subproblem on the original feasible set is only sketched and appears to require an additional uniform bound on the distance between the projected and original feasible sets that is not stated explicitly.

    Authors: We agree that the sketch in the proof of Lemma 3.6 can be expanded for clarity. The required uniform bound on the distance between the projected and original feasible sets follows directly from the convexity of the constraint set together with the fact that the orthogonal projection onto a subspace is non-expansive. We will insert an explicit auxiliary lemma stating and proving this bound (with a short argument using the definition of convexity and the projection operator) immediately before Lemma 3.6, and we will expand the subsequent decrease argument to reference the new lemma at each step. revision: yes

  2. Referee: [§5.3] §5.3, Theorem 5.2: The worst-case complexity bound is stated as O(1/ε²) iterations with high probability; the dependence of the hidden constant on the Grassmann concentration probability p and on the subspace dimension k is not displayed, making it impossible to verify whether the bound remains useful when k grows with the ambient dimension.

    Authors: We agree that making the dependence explicit will strengthen the result. The constant hidden in the O(1/ε²) bound depends on p through the Borel–Cantelli summation of the failure probabilities and on k through the Lipschitz constants of the model and the geometry measure in the k-dimensional subspace. We will restate Theorem 5.2 with the explicit factors (including the precise polynomial dependence on 1/p and on k) and will add a short remark after the proof indicating how the bound behaves when k = o(n) or k grows slowly with n. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces an independent new model class whose accuracy is required only on the projection of the constraint set, defines a new geometry measure to support analysis of these models, and derives subspace sampling guarantees from concentration of measure on the Grassmann manifold. These elements are used to establish almost-sure convergence and worst-case complexity bounds. None of the load-bearing steps reduce by construction to fitted inputs, self-definitions, or unverified self-citations; the central claims rest on newly stated and analyzed properties rather than renaming or circular reuse of prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claims rest on the existence of models satisfying the new projected accuracy condition and on probabilistic guarantees from concentration of measure; these are introduced rather than derived from prior results.

axioms (1)
  • domain assumption Concentration of measure on the Grassmann manifold yields a probability lower bound that subspaces preserve a fraction of the first-order criticality measure.
    Invoked to justify the random subspace sampling procedure.
invented entities (2)
  • New class of models accurate on the projection of the constraint set onto the subspace no independent evidence
    purpose: To relax full-space model accuracy requirements while enabling convergence analysis
    Defined to make models easier to construct and analyze for constrained problems
  • New geometry measure for the projected models no independent evidence
    purpose: To quantify and control model quality in the subspace setting
    Introduced to support construction and management of the new model class

pith-pipeline@v0.9.0 · 5695 in / 1406 out tokens · 36123 ms · 2026-05-19T08:17:29.788472+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Accuracy and Relationships of Quadratic Models in Derivative-free Optimization

    math.OC 2026-05 unverdicted novelty 6.0

    The paper derives fully linear error bounds for minimum norm, minimum Frobenius norm, and quadratic generalized simplex derivative models, establishes directional fully quadratic Hessian accuracy under mild sample set...

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    M. A. Abramson and C. Audet , Convergence of mesh adaptive direct search to second-order stationary points, SIAM Journal on Optimization, 17 (2006), pp. 606–619

  2. [2]

    Alarie, C

    S. Alarie, C. Audet, A. E. Gheribi, M. Kokkolaras, and S. Le Digabel , Two decades of blackbox optimization applications, EURO Journal on Computational Optimization, 9 (2021), p. 100011

  3. [3]

    Audet and J

    C. Audet and J. E. Dennis Jr , Mesh adaptive direct search algorithms for constrained optimization , SIAM Journal on optimization, 17 (2006), pp. 188–217

  4. [4]

    Audet, J

    C. Audet, J. E. Dennis Jr, and S. Le Digabel , Parallel space decomposition of the mesh adaptive direct search algorithm, SIAM Journal on Optimization, 19 (2008), pp. 1150–1170

  5. [5]

    Audet and W

    C. Audet and W. Hare , Derivative-free and blackbox optimization , Springer, Cham, 2017

  6. [6]

    Beck , First-order methods in optimization , SIAM, 2017

    A. Beck , First-order methods in optimization , SIAM, 2017

  7. [7]

    J. P. Boyle and R. L. Dykstra , A method for finding projections onto the intersection of convex sets in Hilbert spaces, in Advances in Order Restricted Statistical Inference, Springer, 1986, pp. 28–47

  8. [8]

    Cartis, J

    C. Cartis, J. Fowkes, and Z. Shao , Randomised subspace methods for non-convex optimization, with applications to nonlinear least-squares, 2022. arXiv:2211.09873

  9. [9]

    Cartis and L

    C. Cartis and L. Roberts , Scalable subspace methods for derivative-free nonlinear least-squares optimiza- tion, Mathematical Programming, 199 (2023), pp. 461–524

  10. [10]

    arXiv:2412.14431

    , Randomized subspace derivative-free optimization with quadratic models and second-order conver- gence, 2024. arXiv:2412.14431

  11. [11]

    Y. Chen, W. Hare, and A. Wiebe , Q-fully quadratic modeling and its application in a random subspace derivative-free method, Computational Optimization and Applications, (2024)

  12. [12]

    Chikuse , Statistics on special manifolds , vol

    Y. Chikuse , Statistics on special manifolds , vol. 174, Springer, 2012

  13. [13]

    P. D. Conejo, E. W. Karas, L. G. Pedroso, A. A. Ribeiro, and M. Sachine , Global convergence of trust-region algorithms for convex constrained minimization without derivatives , Applied Mathematics and Computation, 220 (2013), pp. 324–330

  14. [14]

    A. R. Conn, N. I. M. Gould, and P. L. Toint , Trust region methods, SIAM, 2000

  15. [15]

    A. R. Conn, K. Scheinberg, and L. N. Vicente , Geometry of interpolation sets in derivative free opti- mization, Mathematical programming, 111 (2008), pp. 141–172. CLARSTA: A random subspace trust-region algorithm for convex-constrained derivative-free optimization 25

  16. [16]

    , Global convergence of general derivative-free trust-region algorithms to first-and second-order critical points, SIAM Journal on Optimization, 20 (2009), pp. 387–415

  17. [17]

    , Introduction to derivative-free optimization, SIAM, 2009

  18. [18]

    K. J. Dzahini and S. M. Wild , Stochastic trust-region algorithm in random subspaces with convergence and expected complexity analyses, SIAM Journal on Optimization, 34 (2024), pp. 2671–2699

  19. [19]

    Feurer and F

    M. Feurer and F. Hutter , Hyperparameter optimization, Springer, Cham, 2019, pp. 3–33

  20. [20]

    Black-Box Optimization in Machine Learning with Trust Region Based Derivative Free Algorithm

    H. Ghanbari and K. Scheinberg , Black-box optimization in machine learning with trust region based deriva- tive free algorithm, 2017. arXiv:1703.06925

  21. [21]

    G ¨otze and H

    F. G ¨otze and H. Sambale , Higher order concentration on Stiefel and Grassmann manifolds , Electronic Journal of Probability, 28 (2023), pp. 1 – 30

  22. [22]

    Gratton, C

    S. Gratton, C. W. Royer, L. N. Vicente, and Z. Zhang , Direct search based on probabilistic descent , SIAM Journal on Optimization, 25 (2015), pp. 1515–1541

  23. [23]

    , Direct search based on probabilistic feasible descent for bound and linearly constrained problems , Computational Optimization and Applications, 72 (2019), pp. 525–559

  24. [24]

    Gratton, P

    S. Gratton, P. L. Toint, and A. Tr ¨oltzsch, An active-set trust-region method for derivative-free nonlinear bound-constrained optimization, Optimization Methods and Software, 26 (2011), pp. 873–894

  25. [25]

    Grimmett and D

    G. Grimmett and D. Stirzaker , Probability and random processes, Oxford university press, 2020

  26. [26]

    Hare and G

    W. Hare and G. Jarry-Bolduc , Calculus identities for generalized simplex gradients: rules and applications, SIAM Journal on Optimization, 30 (2020), pp. 853–884

  27. [27]

    Hough and L

    M. Hough and L. Roberts , Model-based derivative-free methods for convex-constrained optimization, SIAM Journal on Optimization, 32 (2022), pp. 2552–2579

  28. [28]

    Larson, M

    J. Larson, M. Menickelly, and S. M. Wild , Derivative-free optimization methods , Acta Numerica, 28 (2019), pp. 287–404

  29. [29]

    Ledoux , Isoperimetry and Gaussian analysis , Springer, Berlin, Heidelberg, 1996, pp

    M. Ledoux , Isoperimetry and Gaussian analysis , Springer, Berlin, Heidelberg, 1996, pp. 165–294

  30. [30]

    89, American Mathematical Society, 2001

    , The concentration of measure phenomenon , no. 89, American Mathematical Society, 2001

  31. [31]

    R. M. Lewis and V. Torczon , Pattern search algorithms for bound constrained minimization, SIAM Journal on optimization, 9 (1999), pp. 1082–1099

  32. [32]

    , Pattern search methods for linearly constrained minimization , SIAM Journal on Optimization, 10 (2000), pp. 917–941

  33. [33]

    1075–1089

    , A globally convergent augmented lagrangian pattern search algorithm for optimization with general constraints and simple bounds , SIAM Journal on Optimization, 12 (2002), pp. 1075–1089

  34. [34]

    Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou , A survey of convolutional neural networks: analysis, applica- tions, and prospects, IEEE Transactions on Neural Networks and Learning Systems, 33 (2021), pp. 6999–7019

  35. [35]

    Menickelly, Avoiding geometry improvement in derivative-free model-based methods via randomization,

    M. Menickelly, Avoiding geometry improvement in derivative-free model-based methods via randomization,

  36. [36]

    R. J. Muirhead , Aspects of multivariate statistical theory , John Wiley & Sons, 2009

  37. [37]

    Penrose, A generalized inverse for matrices, in Mathematical Proceedings of the Cambridge Philosophical Society, vol

    R. Penrose, A generalized inverse for matrices, in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 51(3), Cambridge University Press, 1955, pp. 406–413

  38. [38]

    M. J. D. Powell , A direct search optimization method that models the objective and constraint functions by linear interpolation, Springer, Dordrecht, 1994, pp. 51–67

  39. [39]

    , On fast trust region methods for quadratic models with linear constraints, Mathematical Programming Computation, 7 (2015), pp. 237–267

  40. [40]

    T. M. Ragonneau and Z. Zhang , PDFO: a cross-platform package for Powell’s derivative-free optimization solvers, Mathematical Programming Computation, 16 (2024), pp. 535–559

  41. [41]

    Roberts, Model construction for convex-constrained derivative-free optimization, SIAM Journal on Opti- mization, 35 (2025), pp

    L. Roberts, Model construction for convex-constrained derivative-free optimization, SIAM Journal on Opti- mization, 35 (2025), pp. 622–650

  42. [42]

    Roberts and C

    L. Roberts and C. W. Royer , Direct search based on probabilistic descent in reduced spaces, SIAM Journal on Optimization, 33 (2023), pp. 3057–3082