Sparsity-Constraint Optimization via Splicing Iteration

Borui Tang; Hongmei Lin; Jin Zhu; Junxian Zhu; Xueqin Wang; Zezhi Wang

arxiv: 2406.12017 · v2 · submitted 2024-06-17 · 📊 stat.ML · cs.LG· stat.CO

Sparsity-Constraint Optimization via Splicing Iteration

Jin Zhu , Junxian Zhu , Zezhi Wang , Borui Tang , Hongmei Lin , Xueqin Wang This is my paper

Pith reviewed 2026-05-23 23:47 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.CO

keywords sparsity-constrained optimizationsplicing iterationsupport recoverylinear convergencehard-thresholdingsparse Markov networksstrong convexity

0 comments

The pith

SCOPE achieves linear convergence and exact support recovery in sparsity-constrained optimization by replacing gradient steps with objective-guided splicing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SCOPE, an iterative algorithm for minimizing nonlinear differentiable objectives subject to a cardinality constraint on the solution vector. SCOPE performs a splicing operation that adds and removes candidate indices from the current support and selects the update that most improves the objective value, eliminating any need to choose a continuous step-size parameter. The authors prove that this procedure converges linearly to the global minimizer and recovers the exact support set whenever the sparsity level matches the true one. These guarantees hold under strong convexity and smoothness restricted to low-dimensional subspaces and do not require restricted-isometry-property conditions. The method is illustrated on sparse quadratic programs, sparse logistic regression, and structure learning for binary Markov networks.

Core claim

SCOPE is a splicing iteration that optimizes sparsity-constrained differentiable objectives which are strongly convex and smooth on low-dimensional subspaces, attaining a linear convergence rate and exact support recovery without continuous hyperparameter tuning or restricted-isometry-property assumptions.

What carries the argument

The splicing operation, which generates a small set of candidate supports by adding and deleting indices from the current support and chooses the candidate yielding the lowest objective value.

If this is right

The algorithm applies without modification to sparse quadratic optimization, sparse classifier training, and recovery of sparse Markov networks on binary data.
When the sparsity level is correctly specified, the method recovers the true support set with high probability under the stated convexity and smoothness conditions.
Numerical experiments using the provided C++ implementation demonstrate improved support recovery over existing hard-thresholding procedures.
Theoretical guarantees are established both with and without restricted-isometry-property-type assumptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Removing the requirement that the user supply the exact sparsity level would require an outer loop or information criterion that the paper does not analyze.
The splicing mechanism could be tested on objectives that are convex only on the full space but not on every subspace of dimension k.
Because no continuous step size is tuned, SCOPE may reduce sensitivity to initialization compared with methods whose step-size choice interacts with the starting point.

Load-bearing premise

The objective functions are strongly convex and smooth when restricted to the low-dimensional subspaces spanned by any set of variables whose size equals the target sparsity level.

What would settle it

Run SCOPE on a strongly convex and smooth objective whose minimizer has known support of size k; linear convergence or exact support recovery failing on this instance would falsify the claimed guarantees.

read the original abstract

Sparsity-constrained optimization underlies many problems in signal processing, statistics, and machine learning. State-of-the-art hard-thresholding (HT) algorithms rely on an appropriately selected continuous step-size parameter to ensure convergence. In this paper, we propose a naturally convergent iterative algorithm, SCOPE (Sparsity-Constrained Optimization via sPlicing itEration). The algorithm is capable of optimizing nonlinear differentiable objective functions that are strongly convex and smooth on low-dimensional subspaces. SCOPE replaces the gradient step with a splicing operation guided directly by the objective value, thereby eliminating the need to tune any continuous hyperparameter. Theoretically, it achieves a linear convergence rate and recovers the true support set when the sparsity level is correctly specified. We also establish parallel theoretical results without relying on restricted-isometry-property-type conditions. We apply SCOPE's versatility and power to solve sparse quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables. With our C++ implementation of SCOPE, numerical experiments on these tasks show that it achieves superior support recovery performance, confirming both its algorithmic efficiency and theoretical guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces SCOPE, an iterative algorithm for sparsity-constrained optimization of nonlinear differentiable objectives. It replaces gradient steps with a splicing operation driven directly by the objective value, eliminating continuous step-size tuning. The method targets objectives that are strongly convex and smooth on low-dimensional subspaces, claiming linear convergence and exact support recovery when the sparsity level is correctly specified. Parallel guarantees are derived without RIP-type conditions. Applications include sparse quadratic optimization, sparse classifiers, and sparse Markov networks for binary variables, with C++ experiments demonstrating improved support recovery.

Significance. If the central claims hold, the work offers a parameter-free alternative to hard-thresholding methods with linear rates and support recovery, plus results that avoid RIP assumptions. This could impact signal processing and statistical learning where sparsity constraints arise. The C++ implementation supports reproducibility of the reported experiments.

major comments (1)

[Abstract] Abstract (paragraph 2): the linear convergence rate and support-recovery guarantees are conditioned on the objective being strongly convex and smooth on low-dimensional subspaces. This assumption is load-bearing for both the main results and the no-RIP parallel results; if the restricted objective on the true support fails to satisfy it, the claims lose their justification. The manuscript should explicitly verify or bound this property for the three applications.

minor comments (1)

The abstract states superior support recovery but does not name the competing methods or report quantitative metrics (e.g., exact recovery rates or F1 scores); adding one sentence with these details would improve clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback on our manuscript. We address the major comment point by point below.

read point-by-point responses

Referee: [Abstract] Abstract (paragraph 2): the linear convergence rate and support-recovery guarantees are conditioned on the objective being strongly convex and smooth on low-dimensional subspaces. This assumption is load-bearing for both the main results and the no-RIP parallel results; if the restricted objective on the true support fails to satisfy it, the claims lose their justification. The manuscript should explicitly verify or bound this property for the three applications.

Authors: We agree that the strong convexity and smoothness of the objective when restricted to low-dimensional subspaces is a key assumption underlying both the linear convergence and support recovery results, as well as the RIP-free guarantees. In the revised manuscript we will add an explicit subsection in the applications section that verifies and bounds this property for each of the three examples. For sparse quadratic optimization the restricted Hessian is the corresponding principal submatrix; we will state the standard eigenvalue lower bound that yields the strong-convexity parameter. For sparse logistic classifiers the restricted Hessian takes the form of a weighted Gram matrix whose minimum eigenvalue is bounded away from zero whenever the design submatrix on the true support has full column rank and the logistic probabilities are bounded away from the extremes (a mild condition satisfied in the reported experiments). For sparse Markov networks the log-likelihood restricted to the true support is strongly convex under linear independence of the sufficient statistics on that support, which we will note holds for the binary pairwise models considered. These additions will make the load-bearing assumption explicit without altering the main theorems. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation rests on explicit structural assumptions rather than self-definition or fitted inputs

full rationale

The paper introduces SCOPE as a new splicing-based algorithm and derives linear convergence plus support recovery from the stated assumption that the objective is strongly convex and smooth on low-dimensional subspaces. No step in the provided abstract or description reduces a claimed prediction to a fitted parameter, renames a known result, or relies on a load-bearing self-citation whose content is unverified. The no-RIP parallel results are presented as additional analysis under the same subspace assumption, not as removal of all structural conditions. The derivation chain is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the domain assumption that objectives are strongly convex and smooth on subspaces; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption Objective functions are strongly convex and smooth on low-dimensional subspaces
Explicitly stated as the class of functions for which the algorithm and its convergence guarantees apply.

pith-pipeline@v0.9.0 · 5733 in / 1256 out tokens · 21161 ms · 2026-05-23T23:47:28.681393+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel; Jcost_pos_of_ne_one unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

f(·) is m_{3s}-RSC and M_{3s}-RSS (Assumption 1); ∥∇f(θ*)∥_∞ ≤ 0.35√s (1.49 m_{3s} − M_{3s}) ϑ (Assumption 2); linear convergence |f(θ_{t+1}) − f(θ*)| ≤ δ_1^{-1} |f(θ_t) − f(θ*)| (Theorem 2)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Splicing operator selects S(k)_A, S(k)_I by ranking ξ_j ∝ θ_j² (in support) or (∇_j f)^2 (outside); objective decreases monotonically until termination (Remark 2)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

[1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in "" FUNCTION format.date year ...

work page
[2]

Journal of Machine Learning Research 14(25):807--841

Bahmani S, Raj B, Boufounos PT (2013) Greedy sparsity-constrained optimization. Journal of Machine Learning Research 14(25):807--841

work page 2013
[3]

SIAM Journal on Optimization 23(3):1480--1509

Beck A, Eldar YC (2013) Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM Journal on Optimization 23(3):1480--1509

work page 2013
[4]

SIAM Journal on Optimization 31(3):2340--2367

Bertsimas D, Cory-Wright R, Pauphilet J (2021) A unified approach to mixed-integer optimization problems with logical constraints. SIAM Journal on Optimization 31(3):2340--2367

work page 2021
[5]

Journal of Fourier Analysis and Applications 14(5):629--654

Blumensath T, Davies ME (2008) Iterative thresholding for sparse approximations. Journal of Fourier Analysis and Applications 14(5):629--654

work page 2008
[6]

Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 59(8):1207--1223

Candes EJ, Romberg JK, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 59(8):1207--1223

work page 2006
[7]

IEEE Transactions on Information Theory 51(12):4203--4215, ://dx.doi.org/10.1109/TIT.2005.858979

Candes EJ, Tao T (2005) Decoding by linear programming. IEEE Transactions on Information Theory 51(12):4203--4215, ://dx.doi.org/10.1109/TIT.2005.858979

work page doi:10.1109/tit.2005.858979 2005
[8]

Journal of Machine Learning Research 17(83):1--5

Diamond S, Boyd S (2016) CVXPY : a python-embedded modeling language for convex optimization. Journal of Machine Learning Research 17(83):1--5

work page 2016
[9]

IEEE Transactions on Information Theory 52(4):1289--1306

Donoho DL (2006) Compressed sensing. IEEE Transactions on Information Theory 52(4):1289--1306

work page 2006
[10]

SIAM Journal on Numerical Analysis 49(6):2543--2563

Foucart S (2011) Hard thresholding pursuit: an algorithm for compressive sensing. SIAM Journal on Numerical Analysis 49(6):2543--2563

work page 2011
[11]

://www.gurobi.com

Gurobi Optimization, LLC (2022) Gurobi optimizer reference manual . ://www.gurobi.com

work page 2022
[12]

Operations Research 68(5):1517--1537

Hazimeh H, Mazumder R (2020) Fast best subset selection: coordinate descent and local combinatorial optimization algorithms. Operations Research 68(5):1517--1537

work page 2020
[13]

Journal of Machine Learning Research 10(32):883--906

H \"o fling H, Tibshirani R (2009) Estimation of sparse binary pairwise markov networks using pseudo-likelihoods. Journal of Machine Learning Research 10(32):883--906

work page 2009
[14]

Advances in Neural Information Processing Systems 27

Jain P, Tewari A, Kar P (2014) On iterative hard thresholding methods for high-dimensional M -estimation. Advances in Neural Information Processing Systems 27

work page 2014
[15]

International Conference on Machine Learning, 503--511 (PMLR)

Liu J, Ye J, Fujimaki R (2014) Forward-backward greedy algorithms for general convex smooth functions over a cardinality constraint. International Conference on Machine Learning, 503--511 (PMLR)

work page 2014
[16]

Science advances 4(3):e1700791

Lokhov AY, Vuffray M, Misra S, Chertkov M (2018) Optimal structure and parameter learning of ising models. Science advances 4(3):e1700791

work page 2018
[17]

Applied and Computational Harmonic Analysis 26(3):301--321

Needell D, Tropp JA (2009) CoSaMP : iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis 26(3):301--321

work page 2009
[18]

Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69(4):659--677

Park MY, Hastie T (2007) L_1 -regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69(4):659--677

work page 2007
[19]

Annals of Statistics 639--665

Rigollet P (2012) Kullback-leibler aggregation and misspecified generalized linear models. Annals of Statistics 639--665

work page 2012
[20]

IEEE Transactions on Information Theory 58(7):4117--4134

Santhanam NP, Wainwright MJ (2012) Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory 58(7):4117--4134

work page 2012
[21]

SIAM Journal on Optimization 20(6):2807--2832, ISSN 1052-6234, 1095-7189

Shalev-Shwartz S, Srebro N, Zhang T (2010) Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM Journal on Optimization 20(6):2807--2832, ISSN 1052-6234, 1095-7189

work page 2010
[22]

Journal of Machine Learning Research 18(1):7650--7691, ISSN 1532-4435

Shen J, Li P (2017) A tight bound of hard thresholding. Journal of Machine Learning Research 18(1):7650--7691, ISSN 1532-4435

work page 2017
[23]

Journal of the American Statistical Association 107(497):223--232

Shen X, Pan W, Zhu Y (2012) Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association 107(497):223--232

work page 2012
[24]

Journal of the Royal Statistical Society: Series B (Statistical Methodology) 58(1):267--288, ISSN 00359246

Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 58(1):267--288, ISSN 00359246

work page 1996
[25]

IEEE Transactions on Information Theory 53(12):4655--4666

Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory 53(12):4655--4666

work page 2007
[26]

Wainwright MJ (2019) High-dimensional statistics: a non-asymptotic viewpoint (Cambridge University Press)

work page 2019
[27]

Journal of Statistical Software 94:1--24

Wen C, Zhang A, Quan S, Wang X (2020) BeSS : an R package for best subset selection in linear, logistic and cox proportional hazards models. Journal of Statistical Software 94:1--24

work page 2020
[28]

Annals of Statistics 40(3):1403--1429

Xue L, Zou H, Cai T (2012) Nonconcave penalized composite conditional likelihood estimation of sparse ising models. Annals of Statistics 40(3):1403--1429

work page 2012
[29]

Journal of Machine Learning Research 18:1--43

Yuan X, Li P, Zhang T (2017) Gradient hard thresholding pursuit. Journal of Machine Learning Research 18:1--43

work page 2017
[30]

Journal of Machine Learning Research 21(152):1--50

Yuan X, Liu B, Wang L, Liu Q, Metaxas DN (2020) Dual iterative hard thresholding. Journal of Machine Learning Research 21(152):1--50

work page 2020
[31]

Annals of Statistics 36(4):1567--1594

Zhang C, Huang J (2008) The sparsity and bias of the lasso selection in high-dimensional linear regression. Annals of Statistics 36(4):1567--1594

work page 2008
[32]

Journal of Machine Learning Research 45

Zhou S, Xiu N, Qi H (2021) Global and quadratic convergence of newton hard-thresholding pursuit. Journal of Machine Learning Research 45

work page 2021
[33]

Proceedings of the National Academy of Sciences 117(52):33117--33123

Zhu J, Wen C, Zhu J, Zhang H, Wang X (2020) A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences 117(52):33117--33123

work page 2020
[34]

Operations Research 70(4):2384--2398

Zhu Z, Li X, Wang M, Zhang A (2022) Learning markov models via low-rank optimization. Operations Research 70(4):2384--2398

work page 2022

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in "" FUNCTION format.date year ...

work page

[2] [2]

Journal of Machine Learning Research 14(25):807--841

Bahmani S, Raj B, Boufounos PT (2013) Greedy sparsity-constrained optimization. Journal of Machine Learning Research 14(25):807--841

work page 2013

[3] [3]

SIAM Journal on Optimization 23(3):1480--1509

Beck A, Eldar YC (2013) Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM Journal on Optimization 23(3):1480--1509

work page 2013

[4] [4]

SIAM Journal on Optimization 31(3):2340--2367

Bertsimas D, Cory-Wright R, Pauphilet J (2021) A unified approach to mixed-integer optimization problems with logical constraints. SIAM Journal on Optimization 31(3):2340--2367

work page 2021

[5] [5]

Journal of Fourier Analysis and Applications 14(5):629--654

Blumensath T, Davies ME (2008) Iterative thresholding for sparse approximations. Journal of Fourier Analysis and Applications 14(5):629--654

work page 2008

[6] [6]

Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 59(8):1207--1223

Candes EJ, Romberg JK, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 59(8):1207--1223

work page 2006

[7] [7]

IEEE Transactions on Information Theory 51(12):4203--4215, ://dx.doi.org/10.1109/TIT.2005.858979

Candes EJ, Tao T (2005) Decoding by linear programming. IEEE Transactions on Information Theory 51(12):4203--4215, ://dx.doi.org/10.1109/TIT.2005.858979

work page doi:10.1109/tit.2005.858979 2005

[8] [8]

Journal of Machine Learning Research 17(83):1--5

Diamond S, Boyd S (2016) CVXPY : a python-embedded modeling language for convex optimization. Journal of Machine Learning Research 17(83):1--5

work page 2016

[9] [9]

IEEE Transactions on Information Theory 52(4):1289--1306

Donoho DL (2006) Compressed sensing. IEEE Transactions on Information Theory 52(4):1289--1306

work page 2006

[10] [10]

SIAM Journal on Numerical Analysis 49(6):2543--2563

Foucart S (2011) Hard thresholding pursuit: an algorithm for compressive sensing. SIAM Journal on Numerical Analysis 49(6):2543--2563

work page 2011

[11] [11]

://www.gurobi.com

Gurobi Optimization, LLC (2022) Gurobi optimizer reference manual . ://www.gurobi.com

work page 2022

[12] [12]

Operations Research 68(5):1517--1537

Hazimeh H, Mazumder R (2020) Fast best subset selection: coordinate descent and local combinatorial optimization algorithms. Operations Research 68(5):1517--1537

work page 2020

[13] [13]

Journal of Machine Learning Research 10(32):883--906

H \"o fling H, Tibshirani R (2009) Estimation of sparse binary pairwise markov networks using pseudo-likelihoods. Journal of Machine Learning Research 10(32):883--906

work page 2009

[14] [14]

Advances in Neural Information Processing Systems 27

Jain P, Tewari A, Kar P (2014) On iterative hard thresholding methods for high-dimensional M -estimation. Advances in Neural Information Processing Systems 27

work page 2014

[15] [15]

International Conference on Machine Learning, 503--511 (PMLR)

Liu J, Ye J, Fujimaki R (2014) Forward-backward greedy algorithms for general convex smooth functions over a cardinality constraint. International Conference on Machine Learning, 503--511 (PMLR)

work page 2014

[16] [16]

Science advances 4(3):e1700791

Lokhov AY, Vuffray M, Misra S, Chertkov M (2018) Optimal structure and parameter learning of ising models. Science advances 4(3):e1700791

work page 2018

[17] [17]

Applied and Computational Harmonic Analysis 26(3):301--321

Needell D, Tropp JA (2009) CoSaMP : iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis 26(3):301--321

work page 2009

[18] [18]

Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69(4):659--677

Park MY, Hastie T (2007) L_1 -regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69(4):659--677

work page 2007

[19] [19]

Annals of Statistics 639--665

Rigollet P (2012) Kullback-leibler aggregation and misspecified generalized linear models. Annals of Statistics 639--665

work page 2012

[20] [20]

IEEE Transactions on Information Theory 58(7):4117--4134

Santhanam NP, Wainwright MJ (2012) Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory 58(7):4117--4134

work page 2012

[21] [21]

SIAM Journal on Optimization 20(6):2807--2832, ISSN 1052-6234, 1095-7189

Shalev-Shwartz S, Srebro N, Zhang T (2010) Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM Journal on Optimization 20(6):2807--2832, ISSN 1052-6234, 1095-7189

work page 2010

[22] [22]

Journal of Machine Learning Research 18(1):7650--7691, ISSN 1532-4435

Shen J, Li P (2017) A tight bound of hard thresholding. Journal of Machine Learning Research 18(1):7650--7691, ISSN 1532-4435

work page 2017

[23] [23]

Journal of the American Statistical Association 107(497):223--232

Shen X, Pan W, Zhu Y (2012) Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association 107(497):223--232

work page 2012

[24] [24]

Journal of the Royal Statistical Society: Series B (Statistical Methodology) 58(1):267--288, ISSN 00359246

Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 58(1):267--288, ISSN 00359246

work page 1996

[25] [25]

IEEE Transactions on Information Theory 53(12):4655--4666

Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory 53(12):4655--4666

work page 2007

[26] [26]

Wainwright MJ (2019) High-dimensional statistics: a non-asymptotic viewpoint (Cambridge University Press)

work page 2019

[27] [27]

Journal of Statistical Software 94:1--24

Wen C, Zhang A, Quan S, Wang X (2020) BeSS : an R package for best subset selection in linear, logistic and cox proportional hazards models. Journal of Statistical Software 94:1--24

work page 2020

[28] [28]

Annals of Statistics 40(3):1403--1429

Xue L, Zou H, Cai T (2012) Nonconcave penalized composite conditional likelihood estimation of sparse ising models. Annals of Statistics 40(3):1403--1429

work page 2012

[29] [29]

Journal of Machine Learning Research 18:1--43

Yuan X, Li P, Zhang T (2017) Gradient hard thresholding pursuit. Journal of Machine Learning Research 18:1--43

work page 2017

[30] [30]

Journal of Machine Learning Research 21(152):1--50

Yuan X, Liu B, Wang L, Liu Q, Metaxas DN (2020) Dual iterative hard thresholding. Journal of Machine Learning Research 21(152):1--50

work page 2020

[31] [31]

Annals of Statistics 36(4):1567--1594

Zhang C, Huang J (2008) The sparsity and bias of the lasso selection in high-dimensional linear regression. Annals of Statistics 36(4):1567--1594

work page 2008

[32] [32]

Journal of Machine Learning Research 45

Zhou S, Xiu N, Qi H (2021) Global and quadratic convergence of newton hard-thresholding pursuit. Journal of Machine Learning Research 45

work page 2021

[33] [33]

Proceedings of the National Academy of Sciences 117(52):33117--33123

Zhu J, Wen C, Zhu J, Zhang H, Wang X (2020) A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences 117(52):33117--33123

work page 2020

[34] [34]

Operations Research 70(4):2384--2398

Zhu Z, Li X, Wang M, Zhang A (2022) Learning markov models via low-rank optimization. Operations Research 70(4):2384--2398

work page 2022