Approximate separability of symmetrically penalized least squares in high dimensions: characterization and consequences

Michael Celentano

arxiv: 1906.10319 · v1 · pith:Q4GZVT3Gnew · submitted 2019-06-25 · 🧮 math.ST · stat.TH


Pith reviewed 2026-05-25 16:32 UTC · model grok-4.3


keywords high-dimensional estimation · penalized least squares · symmetric penalties · Gaussian sequence model · concentration inequalities · separability · M-estimation · adaptive procedures


The pith

Symmetrically penalized least squares with a non-separable penalty behaves nearly like least squares with a separable penalty in high-dimensional Gaussian models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that in the Gaussian sequence model and the linear model with uncorrelated Gaussian designs, symmetrically penalized least squares using a possibly non-separable symmetric convex penalty has high-dimensional behavior that closely matches least squares with a suitably chosen separable penalty. This match is quantified by finite-sample concentration inequalities. A reader would care because the result clarifies the role of non-separability: when the empirical distribution of the parameter coordinates is known, non-separable penalties offer at most limited advantages, while when unknown they automatically implement a specific adaptive procedure, with a partial converse characterizing which adaptive procedures arise this way.

Core claim

The high-dimensional behavior of symmetrically penalized least squares with a possibly non-separable, symmetric, convex penalty in both the Gaussian sequence model and the linear model with uncorrelated Gaussian designs nearly matches the behavior of least squares with an appropriately chosen separable penalty in these same models, with the similarity precisely quantified by a finite-sample concentration inequality in both cases.
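The setting behind this claim can be written out as follows. This is a sketch in notation chosen here for illustration: the symbols $\Pi$, $\rho$ and $\eta$ are assumptions and need not match the paper's, while the model and the proximal form follow the Figure 2 caption. The last line is an informal reading of "nearly matches", not the paper's precise statement.

```latex
% Gaussian sequence model: observe y in R^p
y = \theta + \tau z, \qquad z \sim N(0, I_p).

% Penalized least squares in this model is a proximal operator
\hat\theta
  = \operatorname*{arg\,min}_{x \in \mathbb{R}^p}
    \Big\{ \tfrac{1}{2}\,\|y - x\|_2^2 + f_p(x) \Big\}
  = \operatorname{prox}_{f_p}(y).

% Symmetric penalty: invariant under every permutation matrix \Pi
f_p(\Pi x) = f_p(x).

% Separable penalty: the estimator acts coordinate by coordinate
f_p(x) = \sum_{j=1}^{p} \rho(x_j)
  \quad\Longrightarrow\quad
  \hat\theta_j = \operatorname{prox}_{\rho}(y_j).

% Approximate separability (informal): for symmetric convex f_p,
% there is a scalar map \eta such that, with high probability,
\hat\theta_j \approx \eta(y_j) \quad \text{for most } j.
```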

What carries the argument

A finite-sample concentration inequality that bounds the difference between the non-separable penalized estimator and its separable counterpart.
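A small simulation can make this concrete. The sketch below uses the sorted-L1 (SLOPE) norm as one example of a symmetric, convex, non-separable penalty; that choice, the three-point parameter distribution, and the helper names `prox_sorted_l1`, `simulate` and `curve_on_grid` are assumptions made for illustration, not the paper's own setup. For any such penalty the points $(y_j, \hat\theta_j)$ lie on a monotone curve; the question the concentration inequality addresses is whether that data-dependent curve stays close to a fixed one, which the script probes by comparing two independent draws.

```python
import numpy as np


def prox_sorted_l1(y, lam):
    """Proximal operator of the sorted-L1 norm sum_i lam[i] * |x|_(i).

    `lam` must be non-negative and non-increasing; |x|_(1) >= |x|_(2) >= ...
    """
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    order = np.argsort(-np.abs(y))      # indices sorting |y| in decreasing order
    w = np.abs(y)[order] - lam          # shifted magnitudes, may break monotonicity

    # Pool-adjacent-violators: project w onto non-increasing sequences.
    sums, counts = [], []
    for v in w:
        s, c = v, 1
        # Merge with the previous block while its mean does not exceed the current mean.
        while sums and sums[-1] / counts[-1] <= s / c:
            s += sums.pop()
            c += counts.pop()
        sums.append(s)
        counts.append(c)
    fitted = np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])

    out = np.empty_like(y)
    out[order] = np.maximum(fitted, 0.0)  # clip at zero, then undo the sort
    return np.sign(y) * out


def simulate(p, tau, rng):
    """One draw from y = theta + tau * z with a hypothetical three-point prior."""
    theta = rng.choice([-1.0, 0.0, 1.0], size=p, p=[0.05, 0.9, 0.05])
    y = theta + tau * rng.standard_normal(p)
    # Three-level, non-increasing weights (illustrative values).
    lam = np.repeat([2.0, 1.0, 0.5], [p // 3, p // 3, p - 2 * (p // 3)])
    return y, prox_sorted_l1(y, lam)


def curve_on_grid(y, theta_hat, grid):
    """Interpolate the monotone map y_j -> theta_hat_j onto a common grid."""
    idx = np.argsort(y)
    return np.interp(grid, y[idx], theta_hat[idx])


grid = np.linspace(-2.0, 2.0, 41)
for p in (300, 3000):
    y1, th1 = simulate(p, 1.0, np.random.default_rng(0))
    y2, th2 = simulate(p, 1.0, np.random.default_rng(1))
    gap = np.max(np.abs(curve_on_grid(y1, th1, grid) - curve_on_grid(y2, th2, grid)))
    print(f"p={p}: max gap between the two empirical curves = {gap:.3f}")
```

If the concentration claim holds in this example, one would expect the printed gap to shrink as `p` grows; the script reports the gap rather than asserting a rate.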

If this is right

When the empirical distribution of the parameter coordinates is known exactly or approximately, non-separable symmetric penalties yield at most limited improvements over separable ones.
When that distribution is unknown, non-separable symmetric penalties automatically implement an adaptive procedure that the paper characterizes.
A partial converse identifies which adaptive procedures can be realized via such non-separable symmetric penalties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The concentration result suggests that explicit knowledge of the coordinate distribution can be replaced by a non-separable penalty without much loss in these models.
Similar approximate separability may hold in other high-dimensional settings if comparable concentration can be established around a separable surrogate.
The characterization of the induced adaptive procedure offers a way to interpret non-separable penalties as implicit empirical-Bayes rules.

Load-bearing premise

The analysis requires the Gaussian sequence model or the linear model with uncorrelated Gaussian designs together with a symmetric convex penalty.

What would settle it

A concrete counterexample in the Gaussian sequence model where a symmetric convex non-separable penalty produces an estimator whose deviation from the matching separable penalty exceeds the stated concentration bound for large dimension.

Figures

Figures reproduced from arXiv: 1906.10319 by Michael Celentano.

**Figure 1.** Plots of $\hat\theta_j$ vs. $y_j$ with penalty (1.7) in model (1.1) at three different noise levels $\tau = .5$, $1$, and $2.5$. Dimension $p = 1000$; parameter distribution $\mu_\theta = \frac{1}{20}\mu_{-1} + \frac{1}{10}\mu_0 + \frac{1}{20}\mu_1$; regularization parameters $\lambda_j = 2$ for $j \le 333$, $\lambda_j = 1$ for $334 \le j \le 667$, and $\lambda_j = .5$ for $668 \le j \le 1000$. Also shown are curves computed based on the theory developed in the paper on which $(y_j, \hat\theta_j)$ are predicted to approximatel…

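**Figure 2.** Plots of $A_{f_p}(y_j)$ vs. $y_j$ (theory) and $\hat\theta_j$ vs. $y_j$ (simulation) for various choices of penalty $f_p$ in proximal operator (1.3). In all plots, $y = \theta + \tau z$ with $z \sim N(0, I_p)$. Top row: $f_p(x) = p^{1-\alpha/2}\|x\|_2^{\alpha}$, $\mu_\theta \approx N(0, 1)$. Middle row: $f_p(x) = p^{1-\alpha/2}\|x\|_1^{\alpha}$, $\mu_\theta = .05\delta_{-1} + .9\delta_1 + .05\delta_1$. Bottom row: $f_p(x) = \frac{1}{2}\min_{\eta \in \mathbb{R}^p_+} \sum_{j=1}^p \left(\frac{w_j^2}{\eta_j} + \lambda_j \eta_{(j)}\right)$, $\mu_\theta = .05\delta_{-M} + .9\delta_1 + .05\delta_M$, $\mu_\lambda = \frac{1}{3}\delta_2 + \frac{1}{3}\delta_1 + \frac{1}{3}\delta_{.5}$. Bo…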
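The top-row penalty of Figure 2 gives a case where approximate separability can be checked by hand. Reading that penalty as $f_p(x) = p^{1-\alpha/2}\|x\|_2^{\alpha}$ (an interpretation of the extracted caption), its proximal operator rescales $y$ by a single factor that depends on the data only through $\|y\|_2$, and $\|y\|_2/\sqrt{p}$ concentrates when the coordinates of $\theta$ are roughly $N(0,1)$. The sketch below covers only $\alpha = 1$ and $\alpha = 2$, where the factor has a closed form; the function name `prox_scaled_l2` and the simulation settings are hypothetical.

```python
import numpy as np


def prox_scaled_l2(y, alpha):
    """Prox of f_p(x) = p**(1 - alpha/2) * ||x||_2**alpha for alpha in {1, 2}.

    Returns (theta_hat, shrink) with theta_hat = shrink * y.
    """
    y = np.asarray(y, dtype=float)
    p = y.size
    r = np.linalg.norm(y)
    if alpha == 1:
        # f_p = sqrt(p) * ||x||_2: soft-threshold the norm of y by sqrt(p).
        shrink = max(1.0 - np.sqrt(p) / r, 0.0) if r > 0 else 0.0
    elif alpha == 2:
        # f_p = ||x||_2**2: first-order condition x - y + 2x = 0 gives x = y / 3.
        shrink = 1.0 / 3.0
    else:
        raise ValueError("this sketch only covers alpha = 1 and alpha = 2")
    return shrink * y, shrink


# Illustrative settings (assumptions): theta_j ~ N(0, 1), tau = 1, large p.
rng = np.random.default_rng(0)
p, tau = 100_000, 1.0
theta = rng.standard_normal(p)
y = theta + tau * rng.standard_normal(p)

_, shrink = prox_scaled_l2(y, alpha=1)
# ||y||_2 / sqrt(p) -> sqrt(1 + tau**2), so the factor has a deterministic limit.
limit = 1.0 - 1.0 / np.sqrt(1.0 + tau**2)
print(f"data-dependent shrink = {shrink:.4f}, deterministic limit = {limit:.4f}")
```

In this example the non-separable estimator is exactly $\hat\theta_j = s(y)\,y_j$, and replacing $s(y)$ by its limit yields a separable linear rule, which is the kind of separable surrogate the summary describes.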

Original abstract

We show that the high-dimensional behavior of symmetrically penalized least squares with a possibly non-separable, symmetric, convex penalty in both (i) the Gaussian sequence model and (ii) the linear model with uncorrelated Gaussian designs nearly matches the behavior of least squares with an appropriately chosen separable penalty in these same models. The similarity in behavior is precisely quantified by a finite-sample concentration inequality in both cases. Our results help clarify the role non-separability can play in high-dimensional M-estimation. In particular, if the empirical distribution of the coordinates of the parameter is known --exactly or approximately-- there are at most limited advantages to using non-separable, symmetric penalties over separable ones. In contrast, if the empirical distribution of the coordinates of the parameter is unknown, we argue that non-separable, symmetric penalties automatically implement an adaptive procedure which we characterize. We also provide a partial converse which characterizes adaptive procedures which can be implemented in this way.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Non-separable symmetric penalties mostly match separable ones in these Gaussian models, with adaptation as the main extra when the coordinate distribution is unknown.


The core result is that symmetrically penalized least squares with a possibly non-separable convex penalty tracks the behavior of a well-chosen separable penalty in the Gaussian sequence model and the uncorrelated-design linear model, and the paper quantifies the gap with a finite-sample concentration inequality. When the empirical distribution of the parameter coordinates is known, the advantage of non-separability is limited; when it is unknown, the non-separable penalty automatically adapts in a way the paper characterizes, along with a partial converse on what adaptations are possible this way. That distinction and the concentration bound are the concrete new pieces relative to earlier work on separable penalties. The argument stays within symmetric convex penalties and the two Gaussian settings, which keeps the claims clean. The modeling restrictions are explicit and necessary for the concentration statements, so they do not feel like hidden weaknesses. The main limitation is scope: the results do not speak to correlated designs or non-Gaussian noise, but the paper does not pretend they do. Proof details are not visible here, so tightness of the bounds and any technical gaps cannot be checked directly, but the abstract-level logic shows no circularity or obvious fitting issues. This is useful reading for people working on high-dimensional M-estimation and the practical value of penalty structure. It is worth sending to referees because the central characterization is precise and addresses a standing question in the area.

Referee Report

0 major / 3 minor

Summary. The manuscript claims that symmetrically penalized least squares with a possibly non-separable symmetric convex penalty exhibits high-dimensional behavior in the Gaussian sequence model and the linear model with uncorrelated Gaussian designs that nearly matches least squares with an appropriately chosen separable penalty; this equivalence is quantified by finite-sample concentration inequalities. The paper further argues that non-separability offers at most limited advantages when the empirical distribution of the parameter coordinates is known (exactly or approximately), while automatically implementing a characterized adaptive procedure when the distribution is unknown, and provides a partial converse on realizable adaptive procedures.

Significance. If the central results hold, the work supplies a precise, finite-sample characterization of the role of non-separability versus separability in high-dimensional M-estimation under symmetric convex penalties. The concentration inequalities and the distinction between known versus unknown empirical distributions provide concrete guidance on when non-separable penalties can or cannot yield meaningful gains, strengthening the theoretical understanding of adaptive estimation in these models.

minor comments (3)

§1 (Introduction): The transition from the Gaussian sequence model to the linear model could be made more explicit by stating the precise design assumptions (uncorrelated Gaussian) immediately after the sequence-model result.
§2: Notation for the symmetric penalty and its separable counterpart is introduced gradually; a single displayed definition early in §2 would improve readability.
Final section: The partial converse would benefit from a brief remark on whether the characterization extends beyond the Gaussian setting or remains model-specific.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive evaluation of the manuscript. The referee's summary accurately reflects the paper's contributions on the approximate equivalence between symmetrically penalized least squares with non-separable penalties and separable penalties, along with the implications for known versus unknown empirical distributions of the parameters. We appreciate the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via concentration inequalities


The paper establishes finite-sample concentration inequalities showing that symmetrically penalized least squares (possibly non-separable) behaves similarly to separable penalties in the Gaussian sequence model and linear model with uncorrelated designs. These are direct mathematical derivations under stated convexity/symmetry assumptions, with explicit distinctions drawn for known vs. unknown empirical distributions and a partial converse on adaptive procedures. No steps reduce by construction to fitted inputs, self-citations, or renamings; the results are externally falsifiable concentration bounds independent of the target claims.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Result rests on standard domain assumptions in high-dimensional statistics; no free parameters, invented entities, or ad-hoc axioms are indicated in the abstract.

axioms (2)

domain assumption: The penalty is symmetric and convex.
Explicitly required for the approximate separability result in the abstract.
domain assumption: The model is the Gaussian sequence model or the linear model with uncorrelated Gaussian designs.
Model assumptions stated as the settings where the concentration holds.

pith-pipeline@v0.9.0 · 5685 in / 1218 out tokens · 52458 ms · 2026-05-25T16:32:33.775632+00:00 · methodology


