pith. sign in

arxiv: 2605.16687 · v1 · pith:AAAISQVLnew · submitted 2026-05-15 · 🧮 math.OC

New Bounds for Exact Penalized Cardinality-Constrained Optimization with Pseudonormality Conditions

Pith reviewed 2026-05-20 15:40 UTC · model grok-4.3

classification 🧮 math.OC
keywords cardinality-constrained optimizationexact penalizationpseudonormality conditionprojected subgradient methodsparse optimizationconvergence analysisnonconvex optimization
0
0 comments X

The pith

Extending the pseudonormality condition to cardinality-constrained problems enables local exact penalization without requiring Lipschitz continuity on the objective function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on cardinality-constrained optimization problems that include general equality and inequality constraints. It extends the pseudonormality condition from standard nonlinear programming to this setting to prove local exact penalization of the cardinality constraint. This penalization works even when the objective function is not Lipschitz continuous. The authors then study the projected subgradient method and a stochastic version applied to the penalized problem, providing convergence results. They also derive more precise bounds on the iterates and objective values compared to prior work.

Core claim

Under an extension of the pseudonormality condition, a cardinality-constrained optimization problem with general constraints can be locally reformulated as an exact penalty problem. The reformulation does not require the objective to be Lipschitz continuous. Projected subgradient methods, both deterministic and stochastic, applied to the penalized problem converge to stationary points, and the analysis yields improved bounds on the iterate sequence and function values.

What carries the argument

The extended pseudonormality condition, which acts as a constraint qualification to guarantee local exact penalization for the discontinuous cardinality function in the presence of other constraints.

If this is right

  • The original problem is equivalent to the penalized one locally for large penalty parameters.
  • The projected subgradient method converges to a stationary point of the exact penalty formulation.
  • The stochastic projected subgradient method also converges with similar guarantees.
  • Sharper bounds hold for the distance of iterates to the solution set and for the objective function values.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could reduce the need for specialized solvers in sparse signal recovery applications.
  • Similar extensions might apply to other discontinuous constraints in optimization.
  • Precise bounds may improve practical stopping rules for iterative algorithms.
  • Results suggest potential for broader use in non-Lipschitz optimization problems.

Load-bearing premise

The pseudonormality condition extends meaningfully to cardinality-constrained problems with general constraints and suffices for local exact penalization.

What would settle it

A specific example of a cardinality-constrained problem satisfying the extended pseudonormality condition but for which no finite penalty parameter achieves local exact penalization, or where the stated bounds on iterates fail to hold.

read the original abstract

Cardinality-constrained optimization (CCO) is a popular topic in sparse learning and signal recovery, yet remains challenging due to the inherent nonconvexity and discontinuity of cardinality constraints. This paper investigates the exact penalty theory for CCO problems with general equality and inequality constraints. In particular, we extend the pseudonormality condition to the cardinality-constrained framework and establish the local exact penalization without imposing Lipschitz continuity on the objective function. We further analyze both the projected subgradient method and its stochastic variant with convergence guarantees for the derived exact penalty formulation. Compared with the existing results, we give some more precise bounds of the iterate sequence and the objective function value.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript extends the pseudonormality condition to cardinality-constrained optimization problems that include general equality and inequality constraints. It establishes local exact penalization for the original problem without assuming Lipschitz continuity of the objective function. The paper then derives convergence guarantees for the projected subgradient method and its stochastic variant applied to the resulting exact-penalty formulation, claiming strictly more precise bounds on the iterate sequence and objective values than those available in the literature.

Significance. If the central claims hold, the work would strengthen exact-penalty theory for nonconvex, discontinuous cardinality-constrained problems that arise in sparse learning and signal recovery. Removing the Lipschitz assumption broadens the class of admissible objectives, while the tighter iterate and value bounds for both deterministic and stochastic subgradient schemes could improve practical convergence rates. The combination of a new regularity condition with algorithmic analysis is a coherent contribution to the field.

major comments (3)
  1. [§4, Theorem 4.2] §4, Theorem 4.2: the local exact-penalization result is stated to hold under the extended pseudonormality condition alone, yet the proof sketch invokes a uniform bound on the distance to the feasible set that appears to rely on a local Lipschitz property of the objective near the reference point; without an explicit replacement for this step the claim that Lipschitz continuity is unnecessary remains unverified.
  2. [§5.3, Eq. (27)] §5.3, Eq. (27): the claimed improvement in the iterate bound for the projected subgradient method is expressed as O(1/k) with a smaller constant than in prior work; however, the derivation uses the same step-size schedule as the reference results, so it is unclear whether the reduction is genuine or merely a re-arrangement of existing terms.
  3. [§6, Theorem 6.1] §6, Theorem 6.1: the stochastic variant is asserted to inherit the same precise bounds under an unbiased-gradient assumption, but the variance term is controlled only in expectation; the paper does not supply a high-probability version or a concrete counter-example showing when the bound fails, which weakens the comparison with deterministic guarantees.
minor comments (2)
  1. [§2] The notation distinguishing the cardinality function from the ℓ0 quasi-norm is introduced only in passing; a short clarifying paragraph in the preliminaries would prevent reader confusion.
  2. [References] Several references to earlier pseudonormality results are cited by number only; adding one-sentence summaries of the key hypotheses in those works would help readers assess the novelty of the extension.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [§4, Theorem 4.2] §4, Theorem 4.2: the local exact-penalization result is stated to hold under the extended pseudonormality condition alone, yet the proof sketch invokes a uniform bound on the distance to the feasible set that appears to rely on a local Lipschitz property of the objective near the reference point; without an explicit replacement for this step the claim that Lipschitz continuity is unnecessary remains unverified.

    Authors: We are grateful for this observation. The uniform bound on the distance to the feasible set in the proof of Theorem 4.2 is in fact derived from the extended pseudonormality condition and the continuity properties of the constraint functions alone. It does not rely on any Lipschitz continuity of the objective function. To eliminate any potential ambiguity, we will revise the proof to include a more explicit derivation of this bound, thereby strengthening the verification that the Lipschitz assumption is indeed unnecessary. revision: yes

  2. Referee: [§5.3, Eq. (27)] §5.3, Eq. (27): the claimed improvement in the iterate bound for the projected subgradient method is expressed as O(1/k) with a smaller constant than in prior work; however, the derivation uses the same step-size schedule as the reference results, so it is unclear whether the reduction is genuine or merely a re-arrangement of existing terms.

    Authors: The reduction in the constant is genuine. It results from the application of the new pseudonormality-based estimates to the convergence analysis, which yield sharper bounds on both the iterate distances and the objective values compared to previous approaches. Although the step-size schedule remains unchanged, the preceding inequalities in the proof are tighter due to the regularity condition. We will add a brief comparison paragraph in §5.3 to highlight the origin of the improved constant. revision: partial

  3. Referee: [§6, Theorem 6.1] §6, Theorem 6.1: the stochastic variant is asserted to inherit the same precise bounds under an unbiased-gradient assumption, but the variance term is controlled only in expectation; the paper does not supply a high-probability version or a concrete counter-example showing when the bound fails, which weakens the comparison with deterministic guarantees.

    Authors: We note that the convergence guarantees for the stochastic projected subgradient method are provided in expectation, as is common under the unbiased gradient assumption without further noise restrictions. A high-probability bound would necessitate additional hypotheses, such as uniformly bounded variance. We will revise the discussion in §6 to explicitly state that the bounds hold in expectation and to compare them appropriately with the deterministic case. If desired, a simple illustrative example where high-probability convergence fails without extra assumptions can be included. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper claims an extension of the pseudonormality condition to cardinality-constrained optimization problems with general constraints, yielding local exact penalization without a Lipschitz assumption on the objective, followed by convergence analysis of projected subgradient methods and improved bounds on iterates and objective values. These steps are presented as theoretical extensions of prior penalty theory rather than reductions to quantities defined by the authors' own fits or self-referential equations. No load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results appear in the abstract or claim structure; the non-Lipschitz feature is explicitly framed as a deliberate strengthening. The derivation chain remains independent of the paper's inputs and does not reduce by construction to fitted parameters or self-defined constructs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed from abstract only; no explicit free parameters, axioms, or invented entities are identifiable without the full manuscript.

pith-pipeline@v0.9.0 · 5646 in / 1100 out tokens · 35730 ms · 2026-05-20T15:40:58.084049+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    A. M. Bruckstein, D. L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Review 51 (1) (2009) 34–81. 15

  2. [2]

    W. Wang, S. Lu, H. Mao, J. Cheng, Multi-parameter Tikhonov regularization with theℓ 0 sparsity constraint, Inverse Problems 29 (6) (2013) 065018

  3. [3]

    H. Zou, L. Xue, A selective overview of sparse principal component analysis, Proceedings of the IEEE 106 (8) (2018) 1311–1320

  4. [4]

    Hazimeh, R

    H. Hazimeh, R. Mazumder, P. Radchenko, Grouped variable selection with discrete optimization: Computational and statistical perspectives, The Annals of Statistics 51 (1) (2023) 1–32

  5. [5]

    Bertsimas, R

    D. Bertsimas, R. Cory-Wright, A scalable algorithm for sparse portfolio selection, INFORMS Journal on Computing 34 (3) (2022) 1489–1511

  6. [6]

    H. T. Anis, R. H. Kwon, End-to-end, decision-based, cardinality-constrained portfolio optimiza- tion, European Journal of Operational Research 320 (3) (2025) 739–753

  7. [7]

    Y. He, L. Xiao, Structured pruning for deep convolutional neural networks: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (5) (2024) 2900–2919

  8. [8]

    Benbaki, W

    R. Benbaki, W. Chen, X. Meng, H. Hazimeh, N. Ponomareva, Z. Zhao, R. Mazumder, Fast as chita: Neural network pruning with combinatorial optimization, in: International Conference on Machine Learning, PMLR, 2023, pp. 2031–2049

  9. [9]

    A. M. Tillmann, D. Bienstock, A. Lodi, A. Schwartz, Cardinality minimization, constraints, and regularization: A survey, SIAM Review 66 (3) (2024) 403–477

  10. [10]

    Bonami, M

    P. Bonami, M. A. Lejeune, An exact solution approach for portfolio optimization problems under stochastic and integer constraints, Operations Research 57 (3) (2009) 650–670

  11. [11]

    Burdakov, C

    O. Burdakov, C. Kanzow, A. Schwartz, Mathematical programs with cardinality constraints: reformulation by complementarity-type constraints and a regularization method, SIAM Journal on Optimization 26 (1) (2016) 397–425

  12. [12]

    ˇCervinka, C

    M. ˇCervinka, C. Kanzow, A. Schwartz, Constraint qualifications and optimality conditions for optimization problems with cardinality constraints, Mathematical Programming 160 (2016) 353– 377

  13. [13]

    Kanzow, A

    C. Kanzow, A. B. Raharja, A. Schwartz, An augmented Lagrangian method for cardinality- constrained optimization problems, Journal of Optimization Theory and Applications 189 (3) (2021) 793–813

  14. [14]

    Liang, G.-H

    Y.-C. Liang, G.-H. Lin, Relaxed method for optimization problems with cardinality constraints, Journal of Global Optimization 88 (2) (2024) 359–375

  15. [15]

    Kanzow, A

    C. Kanzow, A. Schwartz, F. Weiß, The sparse (st) optimization problem: reformulations, opti- mality, stationarity, and numerical results, Computational Optimization and Applications 90 (1) (2025) 77–112

  16. [16]

    L. Pan, S. Zhou, N. Xiu, H.-D. Qi, A convergent iterative hard thresholding for nonnegative sparsity optimization, Pacific Journal of Optimization 13 (2) (2017) 325–353

  17. [17]

    A. Beck, Y. C. Eldar, Sparsity constrained nonlinear optimization: Optimality conditions and algorithms, SIAM Journal on Optimization 23 (3) (2013) 1480–1509

  18. [18]

    A. Beck, N. Hallak, On the minimization over sparse symmetric sets: projections, optimality conditions, and algorithms, Mathematics of Operations Research 41 (1) (2015) 196–223

  19. [19]

    L. Pan, Z. Luo, N. Xiu, Restricted robinson constraint qualification and optimality for cardinality- constrained cone programming, Journal of Optimization Theory and Applications 175 (1) (2017) 104–118. 16

  20. [20]

    C. Zhao, N. Xiu, H. Qi, Z. Luo, A lagrange-newton algorithm for sparse nonlinear programming, Mathematical Programming 195 (1) (2022) 903–928

  21. [21]

    C. Kan, W. Song, Second-order conditions for the existence of augmented Lagrange multipliers for sparse optimization, Journal of Optimization Theory and Applications 201 (1) (2024) 103–129

  22. [22]

    S. Li, S. Zhou, Z. Luo, Sparse quadratically constrained quadratic programming via semismooth newton method, arXiv preprint arXiv:2503.15109 (2025)

  23. [23]

    Nocedal, S

    J. Nocedal, S. J. Wright, Numerical Optimization, Springer, 2006

  24. [24]

    X. Chen, Z. Lu, T. K. Pong, Penalty methods for a class of non-Lipschitz optimization problems, SIAM Journal on Optimization 26 (3) (2016) 1465–1492

  25. [25]

    L. Guo, J. J. Ye, Necessary optimality conditions and exact penalization for non-lipschitz nonlinear programs, Mathematical Programming 168 (1) (2018) 571–598

  26. [26]

    Z. Xiao, J. Y. Jane, Optimality conditions and constraint qualifications for cardinality constrained optimization problems, Numerical Algebra, Control and Optimization 14 (3) (2024) 614–635

  27. [27]

    L. Pan, N. Xiu, J. Fan, Optimality conditions for sparse nonlinear programming, Science China Mathematics 60 (5) (2017) 759–776

  28. [28]

    D. P. Bertsekas, A. E. Ozdaglar, Pseudonormality and a Lagrange multiplier theory for constrained optimization, Journal of Optimization Theory and Applications 114 (2) (2002) 287–343

  29. [29]

    Jacques, J

    L. Jacques, J. N. Laska, P. T. Boufounos, R. G. Baraniuk, Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors, IEEE Transactions on Information Theory 59 (4) (2013) 2082–2102

  30. [30]

    S. Li, D. Liu, Y. Shen, Adaptive iterative hard thresholding for least absolute deviation problems with sparsity constraints, Journal of Fourier Analysis and Applications 29 (1) (2023) 5

  31. [31]

    D. Liu, S. Li, Y. Shen, One-bit compressive sensing with projected subgradient method under sparsity constraints, IEEE Transactions on Information Theory 65 (10) (2019) 6650–6663

  32. [32]

    Nguyen, D

    N. Nguyen, D. Needell, T. Woolf, Linear convergence of stochastic iterative greedy algorithms with sparse constraints, IEEE Transactions on Information Theory 63 (11) (2017) 6869–6895

  33. [33]

    P. Zhou, X. Yuan, J. Feng, Efficient stochastic gradient hard thresholding, Advances in Neural Information Processing Systems 31 (2018) 1988–1997

  34. [34]

    Shang, B

    F. Shang, B. Wei, H. Liu, Y. Liu, P. Zhou, M. Gong, Efficient gradient support pursuit with less hard thresholding for cardinality-constrained learning, IEEE Transactions on Neural Networks and Learning Systems 33 (12) (2021) 7806–7817

  35. [35]

    C. Li, Z. Ma, D. Sun, G. Zhang, J. Wen, Stochastic IHT with stochastic Polyak step-size for sparse signal recovery, IEEE Signal Processing Letters (2024)

  36. [36]

    Y. I. Alber, A. N. Iusem, M. V. Solodov, On the projected subgradient method for nonsmooth convex optimization in a Hilbert space, Mathematical Programming 81 (1998) 23–35

  37. [37]

    Auslender, M

    A. Auslender, M. Teboulle, Projected subgradient methods with non-Euclidean distances for non- differentiable convex minimization and variational inequalities, Mathematical Programming 120 (2009) 27–48

  38. [38]

    R. T. Rockafellar, R. J. Wets, Variational Analysis, Springer, 1998

  39. [39]

    Andreani, G

    R. Andreani, G. Haeser, M. L. Schuverdt, P. J. Silva, A relaxed constant positive linear dependence constraint qualification and applications, Mathematical Programming 135 (1) (2012) 255–273. 17

  40. [40]

    L. Guo, J. Zhang, G.-H. Lin, New results on constraint qualifications for nonlinear extremum problems and extensions, Journal of Optimization Theory and Applications 163 (3) (2014) 737– 754

  41. [41]

    Allen, R

    E. Allen, R. Helgason, J. Kennington, B. Shetty, A generalization of Polyak’s convergence result for subgradient optimization, Mathematical Programming 37 (1987) 309–317

  42. [42]

    S. Boyd, L. Xiao, A. Mutapcic, Subgradient methods, lecture notes of EE392o, Stanford Univer- sity, Autumn Quarter 2004 (01) (2003) 175. 18