
arXiv: 2205.13469 · v4 · submitted 2022-05-26 · math.ST · stat.ME · stat.TH

Proximal Estimation and Inference

Pith reviewed 2026-05-24 11:57 UTC · model grok-4.3

classification: math.ST · stat.ME · stat.TH
keywords: proximal estimators · penalized estimators · asymptotic distribution · oracle property · linear regression · ridgeless estimators · convex analysis · irregular design

The pith

Penalized estimators are proximal operators whose asymptotics depend only on the initial estimator, its penalty subgradient, and the proximal inner product.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a convex analysis framework that treats penalized estimators as the result of applying a proximal operator to an initial estimator. This representation yields a closed-form asymptotic distribution for the estimator that depends solely on three quantities: the initial estimator's asymptotics, the limit penalty subgradient, and the inner product of the proximal operator. The framework applies to both regular and irregular designs in linear regression, where it produces new ridgeless-type estimators that are √n-consistent, asymptotically normal, and possess the oracle property.

Core claim

Penalized estimators admit an exact representation as proximal operators applied to corresponding initial estimators. Their asymptotic distribution follows a closed-form formula that depends only on the asymptotic distribution of the initial estimator, the estimator's limit penalty subgradient, and the inner product defining the associated proximal operator. In linear regression settings, this leads to new √n-consistent, asymptotically normal Ridgeless-type proximal estimators that feature the Oracle property.
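As a reading aid, the representation can be written in one common textbook form. The notation here is an assumption made for this sketch and need not match the paper's: $\hat\theta_n^{s}$ is the initial estimator, $f_n$ the convex penalty, $\lambda_n$ the tuning parameter, and $W_n$ a positive definite matrix defining the inner product.

```latex
\begin{align*}
\hat\theta_n
  &= \operatorname{prox}^{W_n}_{\lambda_n f_n}\bigl(\hat\theta_n^{s}\bigr)
   := \arg\min_{\theta \in \mathbb{R}^p}
      \Bigl\{ \lambda_n f_n(\theta)
        + \tfrac{1}{2}\,\bigl\| \theta - \hat\theta_n^{s} \bigr\|_{W_n}^{2} \Bigr\},
  \qquad \| v \|_{W_n}^{2} = v^{\top} W_n v, \\
% First-order (subgradient) condition characterizing the minimizer:
W_n \bigl( \hat\theta_n^{s} - \hat\theta_n \bigr)
  &\in \lambda_n \, \partial f_n\bigl(\hat\theta_n\bigr).
\end{align*}
```

The second line is the standard optimality condition for a proximal point. It involves only the initial estimator, a penalty subgradient, and the matrix of the inner product, which is consistent with the three ingredients named in the claim.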

What carries the argument

The proximal operator, which defines the penalized estimator as its application to an initial estimator under a convex penalty.
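A familiar special case may help fix ideas. The sketch below assumes an orthonormal design (X'X = I), the Euclidean inner product, and an L1 penalty; under those assumptions the Lasso solution is the soft-thresholding operator applied to the least-squares estimator. The function names, the simulated data, and the value of `lam` are illustrative choices, not taken from the paper.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1 under the Euclidean inner product."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def lasso_objective(b, X, y, lam):
    """Penalized least-squares criterion 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    resid = y - X @ b
    return 0.5 * resid @ resid + lam * np.sum(np.abs(b))

# Simulated data with orthonormal columns (assumption: X'X = I).
rng = np.random.default_rng(0)
n, p, lam = 50, 4, 0.5
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
beta_true = np.array([2.0, 0.0, -1.5, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Initial estimator: least squares, which reduces to X'y when X'X = I.
b_init = X.T @ y

# Penalized estimator obtained by applying the proximal operator to b_init.
b_prox = soft_threshold(b_init, lam)

print("initial estimator :", np.round(b_init, 3))
print("proximal estimator:", np.round(b_prox, 3))
print("lasso objective   :", lasso_objective(b_prox, X, y, lam))
```

Outside the orthonormal case the Euclidean inner product is no longer the right one, which is one reason the inner product appears as a separate ingredient in the paper's claim.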

If this is right

  • The asymptotic distribution of proximal estimators is fully characterized in closed form for both regular and irregular designs.
  • New Ridgeless-type proximal estimators achieve √n-consistency and asymptotic normality in linear regression.
  • These estimators satisfy the Oracle property based on the properties of the penalty's subgradients.
  • The framework systematically covers linear regression under both regular and irregular designs.
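For readers unfamiliar with the term, "ridgeless" is used here in its generic sense: the minimum-norm least-squares solution, which is the limit of ridge regression as the ridge parameter shrinks to zero. The sketch below illustrates that generic fact under a deliberately singular design (one column duplicated). It is not the paper's Ridgeless-type proximal estimator, and the data, the `rcond` cutoff, and the value `lam=1e-6` are assumptions chosen for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator (X'X + lam * I)^{-1} X'y for lam > 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridgeless(X, y):
    """Minimum-norm least-squares solution via the Moore-Penrose pseudoinverse."""
    # Explicit cutoff so the numerically zero singular value is discarded.
    return np.linalg.pinv(X, rcond=1e-10) @ y

# Singular design: the third column duplicates the first.
rng = np.random.default_rng(1)
n = 100
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x1])
y = 1.0 * x1 - 2.0 * x2 + 0.1 * rng.standard_normal(n)

b_ridgeless = ridgeless(X, y)
b_ridge_small = ridge(X, y, lam=1e-6)

print("ridgeless         :", np.round(b_ridgeless, 4))
print("ridge, lam = 1e-6 :", np.round(b_ridge_small, 4))
```

Because the solution is orthogonal to the null space of the design, the two duplicated columns receive equal coefficients, and only their sum is identified by the data.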

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The proximal representation may simplify construction of valid confidence intervals in high-dimensional settings where direct analysis is intractable.
  • Extensions to dependent data or nonlinear models could follow by preserving the same three-ingredient structure for the limiting distribution.
  • The Monte Carlo results suggest the new estimators may be usable in regression applications with irregular designs.

Load-bearing premise

That a large class of penalized estimators can be exactly represented as proximal operators of an initial estimator under convex penalties satisfying subdifferentiability conditions.

What would settle it

A concrete penalized estimator whose asymptotic distribution deviates from the closed-form expression predicted by its proximal operator representation under the stated conditions.

Figures

Figures reproduced from arXiv: 2205.13469 by Alberto Quaini, Fabio Trojani.

Figure 1: Heatmap of population design matrices Qr and Qs in our Monte Carlo simulation settings.
Figure 2: Monte Carlo quartiles of sample squared errors
Figure 3: Monte Carlo quartiles of sample squared errors
Figure 4: Monte Carlo quartiles of normalized sample squared errors
Figure 5: Monte Carlo quartiles of sample squared errors
Figure 6: Monte Carlo detection probabilities $P(\hat{A}_n = A)$ for Ridgeless Adaptive Lasso (RLAL, blue solid line) and modified Ridgeless Adaptive Lasso (MRLAL, red dashed line) proximal estimators, using tuning parameters $\lambda_n = n^{-\gamma}$ with $\gamma \in (0.5, 1)$ and sample sizes n = 100, 200, under a regular, singular and nearly singular design, respectively.
Figure 7: Monte Carlo inclusion probabilities $P(\hat{A}_n \supset A)$ for Ridgeless Adaptive Lasso (RLAL, blue solid line) and modified Ridgeless Adaptive Lasso (MRLAL, red dashed line) proximal estimators, using tuning parameters $\lambda_n = n^{-\gamma}$ with $\gamma \in (0.5, 1)$ and sample sizes n = 100, 200, under a regular, singular and nearly singular design, respectively.
Original abstract

We build a unifying convex analysis framework characterizing the statistical properties of a large class of penalized estimators, both under a regular and an irregular design. Our framework interprets penalized estimators as proximal estimators, defined by a proximal operator applied to a corresponding initial estimator. We characterize the asymptotic properties of proximal estimators, showing that their asymptotic distribution follows a closed-form formula depending only on (i) the asymptotic distribution of the initial estimator, (ii) the estimator's limit penalty subgradient and (iii) the inner product defining the associated proximal operator. In parallel, we characterize the Oracle features of proximal estimators from the properties of their penalty's subgradients. We exploit our approach to systematically cover linear regression settings with a regular or irregular design. For these settings, we build new $\sqrt{n}$-consistent, asymptotically normal Ridgeless-type proximal estimators, which feature the Oracle property and are shown to perform satisfactorily in practically relevant Monte Carlo settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a unifying convex-analysis framework that represents a broad class of penalized estimators as proximal operators applied to an initial estimator. It derives a closed-form asymptotic distribution for these proximal estimators that depends only on the limiting law of the initial estimator, the limiting penalty subgradient, and the inner product of the proximal mapping. Oracle properties are characterized via subgradient conditions. The framework is applied to linear regression under both regular and irregular designs, producing new √n-consistent, asymptotically normal Ridgeless-type proximal estimators that are claimed to satisfy the Oracle property and to perform well in Monte Carlo experiments.

Significance. If the proximal representation and the resulting closed-form asymptotics hold without additional null-space corrections, the framework would supply a systematic route to asymptotic analysis for many penalized estimators, reducing the need for case-by-case derivations. The explicit construction of new estimators for irregular designs that remain √n-consistent and Oracle would be a concrete advance.

major comments (2)
  1. [irregular-design linear-regression section (and the general proximal-asymptotics theorem)] The central claim that the asymptotic distribution depends only on the three listed quantities (initial-estimator limit, penalty subgradient, proximal inner product) rests on an exact proximal fixed-point representation. In the irregular-design linear-regression case the design matrix need not have full column rank; an extra term involving the null space of the Gram matrix can appear in the subdifferential condition. This term is not among the three listed quantities, so the claimed closed-form formula would no longer hold. The manuscript must exhibit the explicit derivation for the irregular case and show that the null-space contribution vanishes or is absorbed.
  2. [Oracle-property characterization and the irregular-design estimator construction] The abstract states that the new Ridgeless-type proximal estimators are √n-consistent and asymptotically normal with the Oracle property. Because the Oracle property is derived from subgradient conditions that presuppose the exact proximal representation, any failure of that representation in irregular designs would also invalidate the Oracle claim for those estimators. The Monte Carlo evidence alone does not substitute for a corrected asymptotic derivation.
minor comments (2)
  1. [Notation and general framework] Notation for the proximal operator and the inner product should be introduced once with a single consistent symbol rather than re-defined in each application section.
  2. [Monte Carlo experiments] The Monte Carlo section should report the precise values of the penalty parameters and the dimension-to-sample-size ratios used, so that readers can replicate the irregular-design experiments.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and insightful comments. We address the two major comments point by point below, clarifying the treatment of irregular designs and indicating where explicit derivations will be added.

  1. Referee: [irregular-design linear-regression section (and the general proximal-asymptotics theorem)] The central claim that the asymptotic distribution depends only on the three listed quantities (initial-estimator limit, penalty subgradient, proximal inner product) rests on an exact proximal fixed-point representation. In the irregular-design linear-regression case the design matrix need not have full column rank; an extra term involving the null space of the Gram matrix can appear in the subdifferential condition. This term is not among the three listed quantities, so the claimed closed-form formula would no longer hold. The manuscript must exhibit the explicit derivation for the irregular case and show that the null-space contribution vanishes or is absorbed.

    Authors: We agree that an explicit derivation for the irregular-design case is required to confirm the claimed closed-form asymptotics. In the proximal framework, the operator is defined with respect to the seminorm induced by the (possibly singular) Gram matrix; this automatically projects away the null-space component, which is absorbed into the limiting distribution of the initial estimator under the paper's assumptions on the penalty and the initial estimator. We will add a self-contained derivation (expanding the current Section 4.2) that starts from the subdifferential inclusion, isolates the null-space term, and shows it vanishes in the limiting proximal fixed-point equation, thereby preserving dependence on only the three listed quantities.
    Revision: yes

  2. Referee: [Oracle-property characterization and the irregular-design estimator construction] The abstract states that the new Ridgeless-type proximal estimators are √n-consistent and asymptotically normal with the Oracle property. Because the Oracle property is derived from subgradient conditions that presuppose the exact proximal representation, any failure of that representation in irregular designs would also invalidate the Oracle claim for those estimators. The Monte Carlo evidence alone does not substitute for a corrected asymptotic derivation.

    Authors: The Oracle characterization is obtained directly from the limiting subgradient conditions once the proximal representation is established. Because the added derivation will confirm that the representation continues to hold without extra null-space corrections, the Oracle claims for the new Ridgeless-type estimators remain valid. The Monte Carlo study is presented only as numerical corroboration; the primary justification is the theoretical argument. We will revise the text to make this logical dependence explicit and to cross-reference the new irregular-design derivation.
    Revision: yes
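The linear-algebra fact behind the first response can be checked numerically. The sketch below is a generic illustration, not the derivation promised for Section 4.2: it takes a hypothetical singular Gram matrix `Q`, splits a vector into its null-space and range components, and confirms that the seminorm induced by `Q` ignores the null-space part. The matrix, the vector, and the helper names are assumptions made for this example.

```python
import numpy as np

def gram_seminorm(v, Q):
    """Seminorm sqrt(v' Q v) induced by a positive semidefinite matrix Q."""
    return float(np.sqrt(max(float(v @ Q @ v), 0.0)))

def null_space_projector(Q, tol=1e-10):
    """Orthogonal projector onto the null space of a symmetric PSD matrix Q."""
    eigvals, eigvecs = np.linalg.eigh(Q)
    null_basis = eigvecs[:, np.abs(eigvals) < tol]
    return null_basis @ null_basis.T

# Hypothetical singular Gram matrix: first and third regressors coincide.
Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

v = np.array([1.0, 2.0, 3.0])
P_null = null_space_projector(Q)
v_null = P_null @ v      # component the seminorm cannot see
v_range = v - v_null     # component the seminorm measures

print("null-space component :", v_null)
print("seminorm of null part:", gram_seminorm(v_null, Q))
print("seminorm of v        :", gram_seminorm(v, Q))
print("seminorm of range part:", gram_seminorm(v_range, Q))
```

Whether the null-space term also drops out of the limiting subdifferential condition is the point the referee asks the authors to prove; this example only shows that the seminorm itself is blind to that component.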

Circularity Check

0 steps flagged

No circularity; derivation applies standard convex analysis to proximal representations


The paper defines proximal estimators via the proximal operator applied to an initial estimator and derives their asymptotic distribution from the initial estimator's limiting law, the limiting subgradient, and the proximal inner product using subdifferential calculus. This chain relies on convex analysis identities that hold independently of the target asymptotic result and does not reduce any claimed prediction to a fitted input, self-citation, or definitional tautology. The Oracle property characterization and Ridgeless-type constructions follow directly from the same subgradient conditions without circular steps. The framework is self-contained against external convex-analysis benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the assumption that penalized estimators admit proximal representations and that convex analysis supplies the necessary subdifferential and proximal-operator calculus; no free parameters or invented entities are mentioned in the abstract.

axioms (2)
  • domain assumption: Penalized estimators can be exactly represented as proximal operators applied to an initial estimator under a convex penalty.
    This representation is the foundational step that allows the asymptotic and Oracle characterizations.
  • domain assumption: The penalty function admits a well-defined limit subgradient that governs both the asymptotic distribution and Oracle features.
    Invoked when linking subgradient properties to the closed-form asymptotics and Oracle behavior.


