pith.

arxiv: 2511.11054 · v2 · pith:64PTCJN7 · submitted 2025-11-14 · 🧮 math.ST · stat.TH

Tuning free Catoni type joint robust estimation

Pith reviewed 2026-05-21 20:07 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords heavy-tailed estimation, Catoni estimator, joint estimation, robust statistics, mean estimation, linear regression, non-asymptotic bounds, Poincaré-Miranda theorem

The pith

A system of two coupled Catoni-type equations estimates both a parameter and its unknown variance at sub-Gaussian rates under heavy tails, without tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a joint estimation procedure that solves two linked Catoni-type equations to recover the target parameter and the noise variance at the same time. This is done for mean estimation, linear regression, and ℓ2-penalized regression under the sole assumption that the noise has a finite moment of order 2β for some β ∈ (1, 2]. The resulting non-asymptotic deviation bounds for both quantities match, up to absolute constants, the rates that would be available if the variance were known in advance. Because the equations are non-convex, the analysis replaces standard convex M-estimation tools with the Poincaré-Miranda theorem, which guarantees that a solution exists inside a region around the true values whose size matches the target error rates.
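The page does not reproduce the authors' two coupled equations, so what follows is only a hypothetical Python sketch of the general idea: a Catoni mean equation whose scale depends on the current variance estimate, alternated with a Catoni-type equation for the mean of the squared residuals. The scales `alpha` and `beta`, the helper names, and the alternating fixed-point scheme are assumptions made for illustration, not the paper's construction.

```python
import math


def catoni_psi(x):
    # Catoni's widest influence function: sign(x) * log(1 + |x| + x^2 / 2).
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def bisect_decreasing(f, lo, hi, tol=1e-10, max_iter=200):
    # Root of a non-increasing f with f(lo) >= 0 >= f(hi).
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def joint_catoni_sketch(xs, delta=0.05, n_outer=100, tol=1e-10):
    """HYPOTHETICAL joint (location, variance) estimator; not the paper's equations."""
    n = len(xs)
    log_term = math.log(2.0 / delta)
    theta = sorted(xs)[n // 2]  # start from a sample median
    s = sum((x - theta) ** 2 for x in xs) / n  # crude starting variance
    if s == 0.0:
        return theta, 0.0  # all observations coincide
    for _ in range(n_outer):
        # Mean equation: sum psi(alpha * (X_i - theta)) = 0, using Catoni's
        # scale alpha = sqrt(2 log(2/delta) / (n * sigma^2)) evaluated at sigma^2 = s.
        alpha = math.sqrt(2.0 * log_term / (n * s))
        theta = bisect_decreasing(
            lambda t: sum(catoni_psi(alpha * (x - t)) for x in xs), min(xs), max(xs)
        )
        # Variance equation (assumed form): sum psi(beta * (r_i - v)) = 0 over the
        # squared residuals r_i, with a hypothetical scale beta tied to s.
        r2 = [(x - theta) ** 2 for x in xs]
        beta = math.sqrt(2.0 * log_term / n) / s
        s_new = bisect_decreasing(
            lambda v: sum(catoni_psi(beta * (r - v)) for r in r2), 0.0, max(r2)
        )
        if s_new <= 0.0:
            return theta, 0.0
        converged = abs(s_new - s) <= tol * max(1.0, s)
        s = s_new
        if converged:
            break
    return theta, s
```

For data split evenly between -1 and 1, symmetry of ψ forces the location estimate to 0 and every squared residual to 1, so `joint_catoni_sketch([-1.0, 1.0] * 10)` should return values very close to (0, 1). Both equations are solved by bisection because each score is monotone in its own argument once the other quantity is held fixed.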

Core claim

The central claim is that the coupled system of two Catoni-type estimating equations admits solutions whose joint deviation from the true parameter and true variance satisfies sub-Gaussian-type bounds under a finite 2β-moment condition with β∈(1,2], with rates that match those of oracle procedures knowing the variance in advance.

What carries the argument

The pair of coupled, non-convex Catoni-type estimating equations for the parameter and the variance, whose joint solutions are controlled via the Poincaré-Miranda theorem.
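For reference, the two-dimensional Poincaré-Miranda theorem can be stated as follows. The symbols f_1, f_2, the rectangle R, and the half-widths r_θ, r_σ are generic notation assumed here, and the orientation shown (nonnegative on the lower face, nonpositive on the upper face) is the natural one for estimating functions that decrease in their argument. As the referee report notes, the theorem says nothing about zeros outside R.

```latex
\textbf{Poincar\'e-Miranda theorem (two dimensions).}
Let $R = [a_1, b_1] \times [a_2, b_2]$ and let $F = (f_1, f_2) \colon R \to \mathbb{R}^2$ be continuous. If
\[
  f_1(a_1, y) \ge 0 \ge f_1(b_1, y) \quad \text{for all } y \in [a_2, b_2]
  \qquad\text{and}\qquad
  f_2(x, a_2) \ge 0 \ge f_2(x, b_2) \quad \text{for all } x \in [a_1, b_1],
\]
then $F(x^\star, y^\star) = 0$ for some $(x^\star, y^\star) \in R$.
In a joint-estimation argument one may take, for example,
\[
  R = [\theta^\ast - r_\theta,\ \theta^\ast + r_\theta] \times [\sigma_\ast^2 - r_\sigma,\ \sigma_\ast^2 + r_\sigma],
\]
with $r_\theta$ and $r_\sigma$ the target deviation rates.
```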

If this is right

  • The same joint rates hold in mean estimation, linear regression, and ℓ2-penalized regression.
  • The bounds remain valid without knowledge of the variance or any tuning parameters.
  • The rates are optimal up to absolute constants in the heavy-tailed regime.
  • The proof strategy may extend to other problems that require simultaneous estimation of parameters of different types.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same topological control might simplify proofs for joint robust estimation in generalized linear models.
  • Practitioners facing data with unknown scale could replace separate variance estimation and cross-validation steps with this single procedure.
  • Because β may be arbitrarily close to 1, the moment condition asks for little more than a finite variance, which suggests the method remains useful even when the noise has tails far heavier than Gaussian.

Load-bearing premise

The non-convex coupled equations must possess solutions whose joint deviations can be bounded using a topological theorem instead of convexity arguments.

What would settle it

Generate noise whose moments are finite only up to order about 2.1 (so that β ≈ 1.05) and check whether the observed joint deviation of the estimator from the true parameter and variance exceeds the claimed sub-Gaussian-type bound by more than a small constant factor.
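A minimal Python sketch of such a check, under stated assumptions: Student-t noise with ν = 2.1 degrees of freedom has finite moments only of order below 2.1 and variance ν/(ν − 2). Because the joint estimator's equations are not given on this page, the sketch uses the oracle Catoni mean (true variance plugged in) as a placeholder to be swapped for the joint estimator, and reports the empirical (1 − δ)-quantile of |error| divided by the sub-Gaussian benchmark σ√(2 log(2/δ)/n). All function names are hypothetical.

```python
import math
import random


def student_t(nu, rng):
    # t_nu = Z / sqrt(chi2_nu / nu), with chi2_nu ~ Gamma(shape=nu/2, scale=2).
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return z / math.sqrt(chi2 / nu)


def catoni_psi(x):
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def oracle_catoni_mean(xs, sigma2, delta, iters=60):
    # Placeholder estimator: Catoni mean with the true variance plugged in.
    n = len(xs)
    alpha = math.sqrt(2.0 * math.log(2.0 / delta) / (n * sigma2))
    lo, hi = min(xs), max(xs)
    for _ in range(iters):  # bisection; the score is non-increasing in mid
        mid = 0.5 * (lo + hi)
        if sum(catoni_psi(alpha * (x - mid)) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def deviation_ratio_quantile(nu=2.1, n=100, reps=100, delta=0.05, seed=0):
    rng = random.Random(seed)
    sigma2 = nu / (nu - 2.0)  # variance of t_nu, finite because nu > 2
    benchmark = math.sqrt(2.0 * sigma2 * math.log(2.0 / delta) / n)
    ratios = sorted(
        abs(oracle_catoni_mean([student_t(nu, rng) for _ in range(n)], sigma2, delta))
        / benchmark
        for _ in range(reps)
    )
    k = min(reps - 1, math.ceil((1.0 - delta) * reps) - 1)
    return ratios[k]  # empirical (1 - delta)-quantile of |error| / benchmark
```

Ratios that stay bounded as `n` grows are what a sub-Gaussian-type bound predicts; a ratio that keeps growing with `n` would point the other way. The same loop can track the variance estimate once a joint estimator is substituted.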

Original abstract

This paper develops a Catoni-type joint (tuning-free) estimation framework for parametric models with heavy-tailed noise, in which the target parameter and the unknown noise variance are estimated simultaneously through a system of two coupled Catoni-type estimating equations. We instantiate the framework in three canonical settings: mean estimation, linear regression, and $\ell_{2}$-penalized regression. Theoretically, we establish non-asymptotic, sub-Gaussian-type deviation bounds that hold jointly for the target parameter and the variance estimator, under only a finite $2\beta$-th moment assumption with $\beta\in (1,2]$. The resulting rates match, up to absolute constants, those of oracle procedures that know the variance in advance, thereby attaining optimality in the heavy-tailed regime. Methodologically, because the coupled equations are intrinsically non-convex and non-linear, classical convex M-estimation arguments are inapplicable. We develop a new analytical toolkit based on the Poincaré-Miranda theorem. The resulting proof strategy is of independent methodological interest, and we expect it to be applicable to a broad class of other statistical problems in which several parameters of heterogeneous nature must be estimated jointly.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper develops a tuning-free joint robust estimation framework for parametric models with heavy-tailed noise, simultaneously estimating the target parameter and unknown noise variance via a system of two coupled Catoni-type estimating equations. It instantiates the approach for mean estimation, linear regression, and ℓ₂-penalized regression. The central theoretical claim is the derivation of non-asymptotic sub-Gaussian-type joint deviation bounds under only a finite 2β-th moment assumption (β ∈ (1,2]), with rates matching those of oracle procedures that know the variance in advance. The proofs rely on the Poincaré-Miranda theorem to establish existence of solutions to the non-convex system, bypassing classical convex M-estimation arguments.

Significance. If the central claims hold, the work would provide a valuable contribution to robust statistics by delivering optimal, variance-adaptive estimators under weak moment conditions without tuning parameters. The methodological innovation of adapting Poincaré-Miranda for joint non-convex estimation could extend to other problems involving heterogeneous parameters, and the oracle-matching rates under 2β moments represent a strong theoretical achievement.

major comments (2)
  1. [Proofs of the main deviation bounds (theorems establishing joint sub-Gaussian rates)] The proof strategy invokes Poincaré-Miranda to guarantee a zero of the coupled estimating functions inside a rectangle whose dimensions are set to the target deviation rates. However, the theorem only ensures existence within the rectangle once opposing sign conditions hold on the faces; it provides no control over possible additional zeros outside the rectangle. Under the stated polynomial integrability of the fluctuation terms, far-field behavior is not automatically dominated, so the non-asymptotic bounds may apply only to some solutions rather than to every solution of the system. This affects the well-definedness of the estimator and the validity of the joint deviation claim.
  2. [Section 2 (definition of the joint estimators) and the subsequent theoretical analysis] The estimators are defined as solutions to the coupled non-convex, non-linear equations. Without an additional argument showing that no solutions exist outside the target deviation ball (e.g., via uniform domination of the expectation term by the fluctuation term for large deviations), or a constructive selection rule for the solution inside the rectangle, it remains unclear which root is being bounded and how the procedure is implemented in practice.
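The sign conditions discussed in these comments can be probed numerically. The hypothetical helper below samples points on the four faces of a rectangle and checks the orientation used for decreasing estimating functions. Passing it is evidence, not proof, since only finitely many boundary points are inspected; and even a rigorous verification certifies a root inside the rectangle without excluding roots outside it, which is exactly the referee's concern.

```python
def miranda_faces_ok(F, a1, b1, a2, b2, m=200):
    """Sampled check of the Poincare-Miranda sign conditions on R = [a1, b1] x [a2, b2].

    Orientation assumed: f1 >= 0 on the face x = a1 and f1 <= 0 on x = b1;
    f2 >= 0 on the face y = a2 and f2 <= 0 on y = b2. F(x, y) returns (f1, f2).
    """
    for k in range(m + 1):
        t = k / m
        x = a1 + t * (b1 - a1)
        y = a2 + t * (b2 - a2)
        # Left/right faces constrain f1; bottom/top faces constrain f2.
        if F(a1, y)[0] < 0 or F(b1, y)[0] > 0:
            return False
        if F(x, a2)[1] < 0 or F(x, b2)[1] > 0:
            return False
    return True
```

For instance, the decreasing map F(x, y) = (−x, −y) passes on [−1, 1]², while F(x, y) = (5 − x, −y), whose root has x = 5, fails because f1 stays positive on the right face.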
minor comments (1)
  1. [Section 2] Notation for the Catoni function and the coupled equations could be made more explicit when first introduced to aid readability for readers unfamiliar with the original Catoni estimator.
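As a reminder of the notation this comment asks for, Catoni's (2012) original single-equation estimator for i.i.d. observations X_1, …, X_n with mean μ and variance σ² can be written as follows. The scale α and the deviation bound are shown only to leading order, for a two-sided confidence level 1 − δ, and the display does not reproduce the paper's coupled system. The dependence of α on σ² is what makes an unknown variance a problem in the first place.

```latex
\begin{gather*}
  \psi \text{ non-decreasing}, \qquad
  -\log\Bigl(1 - x + \tfrac{x^2}{2}\Bigr) \le \psi(x) \le \log\Bigl(1 + x + \tfrac{x^2}{2}\Bigr), \\
  \sum_{i=1}^{n} \psi\bigl(\alpha\,(X_i - \hat\theta)\bigr) = 0,
  \qquad
  \alpha \approx \sqrt{\frac{2\log(2/\delta)}{n\,\sigma^2}}, \\
  |\hat\theta - \mu| \lesssim \sigma\sqrt{\frac{2\log(2/\delta)}{n}}
  \quad\text{with probability at least } 1 - \delta .
\end{gather*}
```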

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable comments on our manuscript. We appreciate the recognition of the potential contributions and address the major comments point by point below. We plan to make revisions to clarify the well-definedness of the estimators.

Point-by-point responses
  1. Referee: The proof strategy invokes Poincaré-Miranda to guarantee a zero of the coupled estimating functions inside a rectangle whose dimensions are set to the target deviation rates. However, the theorem only ensures existence within the rectangle once opposing sign conditions hold on the faces; it provides no control over possible additional zeros outside the rectangle. Under the stated polynomial integrability of the fluctuation terms, far-field behavior is not automatically dominated, so the non-asymptotic bounds may apply only to some solutions rather than to every solution of the system. This affects the well-definedness of the estimator and the validity of the joint deviation claim.

    Authors: We thank the referee for highlighting this important subtlety. The Poincaré-Miranda theorem is invoked solely to guarantee existence of at least one solution inside the target rectangle. We agree that this does not automatically rule out other zeros outside the rectangle. In the revision we will explicitly define the joint estimator as any solution lying inside the rectangle whose existence is assured by the theorem. The deviation bounds are then stated for this defined estimator. A short remark will be added noting that the selection is by construction inside the region of interest. This resolves the well-definedness issue while remaining faithful to the minimal moment assumptions. revision: yes

  2. Referee: The estimators are defined as solutions to the coupled non-convex, non-linear equations. Without an additional argument showing that no solutions exist outside the target deviation ball (e.g., via uniform domination of the expectation term by the fluctuation term for large deviations), or a constructive selection rule for the solution inside the rectangle, it remains unclear which root is being bounded and how the procedure is implemented in practice.

    Authors: We agree that the current wording leaves ambiguity about which root is intended. We will revise Section 2 to state that the estimator is defined to be a solution of the coupled system that lies inside the rectangle for which existence is guaranteed by Poincaré-Miranda. For implementation we will add a brief discussion indicating that the low-dimensional (two-equation) system can be solved numerically by standard methods such as Newton iteration or a merit-function minimization, initialized at a point scaled to the target deviation rates. This makes both the theoretical object and the practical procedure unambiguous. revision: yes
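A minimal Python sketch of the Newton step mentioned in this response, for a generic two-equation system F(x, y) = 0 with a forward-difference Jacobian. The function name and defaults are hypothetical and nothing here is taken from the paper. Because Newton's method can converge to any root, a practical version would also check that the returned point lies in the rectangle on which the theory is stated.

```python
def newton_2d(F, x0, y0, tol=1e-12, max_iter=50, h=1e-7):
    """Newton iteration for F(x, y) = (f1, f2) = 0 with a finite-difference Jacobian."""
    x, y = x0, y0
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        if abs(f1) + abs(f2) < tol:
            break
        # Forward-difference Jacobian [[a, b], [c, d]] at (x, y).
        fx1, fx2 = F(x + h, y)
        fy1, fy2 = F(x, y + h)
        a, b = (fx1 - f1) / h, (fy1 - f1) / h
        c, d = (fx2 - f2) / h, (fy2 - f2) / h
        det = a * d - b * c
        if det == 0.0:
            raise ValueError("singular Jacobian; try another starting point")
        # Solve J @ (dx, dy) = (f1, f2) via the 2x2 inverse, then step.
        dx = (d * f1 - b * f2) / det
        dy = (-c * f1 + a * f2) / det
        x, y = x - dx, y - dy
    return x, y
```

For example, starting from (1, 0.5) on the system x² + y² = 4, x = y, the iteration converges to (√2, √2).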

Circularity Check

0 steps flagged

Derivation chain is self-contained; no reductions to inputs by construction

Full rationale

The estimators are defined directly as solutions to the coupled Catoni-type equations. Non-asymptotic joint deviation bounds are then derived relative to oracle procedures that know the variance, under explicit 2β-moment assumptions with β∈(1,2]. The proof invokes the Poincaré-Miranda theorem to establish existence inside a target rectangle whose dimensions are set to the claimed rates; this is a standard existence argument applied to the estimating functions and does not presuppose the target bounds or rename fitted quantities as predictions. No self-citations are load-bearing for the central claims, no ansatz is smuggled, and no known empirical pattern is merely relabeled. The derivation therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on a standard moment condition and an analytic theorem; no free parameters are introduced because the method is tuning-free.

axioms (2)
  • domain assumption Finite 2β-th moment assumption for β in (1,2]
    Invoked to obtain the sub-Gaussian-type deviation bounds for the joint estimators.
  • domain assumption Poincaré-Miranda theorem applies to the coupled estimating equations
    Used to establish existence and control the non-convex non-linear system.

pith-pipeline@v0.9.0 · 5732 in / 1420 out tokens · 34822 ms · 2026-05-21T20:07:36.945345+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation/AlexanderDuality.lean alexander_duality_circle_linking echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    We employ the Poincaré–Miranda Theorem to show that the solutions lie within certain geometric regions, such as cylinders or cones, centered around the true parameter values.

  • Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    ψ1 satisfying -log(1-x+|x|²/2) ≤ ψ1(x) ≤ log(1+x+|x|²/2)
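The displayed inequality is Catoni's classical envelope. The short Python check below assumes ψ1 is Catoni's widest admissible choice ψ(x) = sign(x) log(1 + |x| + x²/2) (the paper's ψ1 may differ) and confirms on a grid that it stays between the two bounds. The envelope is never empty because (1 − x + x²/2)(1 + x + x²/2) = 1 + x⁴/4 ≥ 1, so the lower bound never exceeds the upper one.

```python
import math


def psi_widest(x):
    # Equals the upper bound for x >= 0 and the lower bound for x < 0.
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def within_catoni_envelope(psi, xs, slack=1e-12):
    for x in xs:
        lower = -math.log(1.0 - x + 0.5 * x * x)  # argument is > 0 for every real x
        upper = math.log(1.0 + x + 0.5 * x * x)
        if not (lower - slack <= psi(x) <= upper + slack):
            return False
    return True


grid = [k / 100.0 for k in range(-1000, 1001)]  # x in [-10, 10]
```

The identity map, by contrast, leaves the envelope already at x = 1, since log(2.5) ≈ 0.916 < 1; this is why Catoni's ψ grows only logarithmically.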

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 1 internal anchor

  1. [1]

    Auddy, A., & Yuan, M. (2022). On estimating rank-one spiked tensors in the presence of heavy tailed errors. IEEE Transactions on Information Theory, 68(12), 8053-8075.

  2. [2]

    Belloni, A., Chernozhukov, V., & Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791-806.

  3. [3]

    Babii, A., Ghysels, E., & Striaukas, J. (2022). Machine learning time series regressions with an application to nowcasting. Journal of Business & Economic Statistics, 40(3), 1094-1106.

  4. [4]

    Bubeck, S., Cesa-Bianchi, N., & Lugosi, G. (2013). Bandits with heavy tail. IEEE Transactions on Information Theory, 59(11), 7711-7717.

  5. [5]

    Bertrand, Q., Massias, M., Gramfort, A., & Salmon, J. (2019). Handling correlated and repeated measurements with the smoothed multivariate square-root lasso. Advances in Neural Information Processing Systems 32 (NeurIPS 2019).

  6. [6]

    Catoni, O. (2012). Challenging the Empirical Mean and Empirical Variance: A Deviation Study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48, 1148-1185.

  7. [7]

    Croux, C., Gelper, S., & Mahieu, K. (2010). Robust exponential smoothing of multivariate time series. Computational Statistics & Data Analysis, 54(12), 2999-3006.

  8. [8]

    Chen, P., Jin, X., Li, X., & Xu, L. (2021). A generalized Catoni's M-estimator under finite α-th moment assumption with α∈(1,2). Electronic Journal of Statistics, 15(2), 5523-5544.

  9. [9]

    Connor, J. T., Martin, R. D., & Atlas, L. E. (1994). Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2), 240-254.

  10. [10]

    Eom, Y. H., & Jo, H. H. (2015). Tail-scope: Using friends to estimate heavy tails of degree distributions in large-scale complex networks. Scientific Reports, 5, 09752.

  11. [11]

    Fan, J., Ke, Y., Sun, Q., & Zhou, W. X. (2019). FarmTest: Factor-adjusted robust multiple testing with approximate false discovery control. Journal of the American Statistical Association, 114(526), 1684-1696.

  12. [12]

    Fan, J., Li, Q., & Wang, Y. (2017). Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society, Series B, 79(1), 247-265.

  13. [13]

    Fan, J., Liu, H., & Wang, W. (2018). Large covariance estimation through elliptical factor models. Annals of Statistics, 46(4), 1383.

  14. [14]

    Finkenstadt, B., & Rootzén, H. (Eds.). (2003). Extreme values in finance, telecommunications, and the environment. CRC Press.

  15. [15]

    Frankowska, H. (2018). The Poincaré–Miranda theorem and viability condition. Journal of Mathematical Analysis and Applications, 463(2), 832-837.

  16. [16]

    Fan, J., Wang, W., & Zhong, Y. (2019). Robust covariance estimation for approximate factor models. Journal of Econometrics, 208(1), 5-22.

  17. [17]

    Fan, J., Wang, W., & Zhu, Z. (2021). A shrinkage principle for heavy-tailed data: High-dimensional robust low-rank matrix recovery. Annals of Statistics, 49(3), 1239.

  18. [18]

    Fan, J., Wang, K., Zhong, Y., & Zhu, Z. (2021). Robust high dimensional factor models with applications to statistical machine learning. Statistical Science: A Review Journal of the Institute of Mathematical Statistics, 36(2), 303.

  19. [19]

    Guerrier, S., Molinari, R., Victoria-Feser, M. P., & Xu, H. (2022). Robust two-step wavelet-based inference for time series models. Journal of the American Statistical Association, 117(540), 1996-

  20. [20]

    Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1), 73-101.

  21. [21]

    Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics, 799-821.

  22. [22]

    Huber, P. J., & Ronchetti, E. M. (2011). Robust statistics. John Wiley & Sons.

  23. [23]

    Jia, J., Xie, F., & Xu, L. (2019). Sparse Poisson regression with penalized weighted score function. Electronic Journal of Statistics, 13(2), 2898-2920.

  24. [24]

    Jiang, B., Sun, Q., & Fan, J. (2018). Bernstein's inequalities for general Markov chains. arXiv preprint arXiv:1805.10721.

  25. [25]

    Ke, Y., Minsker, S., Ren, Z., Sun, Q., & Zhou, W. X. (2019). User-friendly covariance estimation for heavy-tailed distributions. Statistical Science, 34(3), 454-471.

  26. [26]

    Lecué, G., & Lerasle, M. (2020). Robust machine learning by median-of-means: theory and practice. Annals of Statistics, 48(2), 906-931.

  27. [27]

    Lugosi, G., & Mendelson, S. (2019). Sub-Gaussian estimators of the mean of a random vector. The Annals of Statistics, 47(2), 783-794.

  28. [28]

    Lugosi, G., & Mendelson, S. (2019). Mean estimation and regression under heavy-tailed distributions: A survey. Foundations of Computational Mathematics, 19(5), 1145-1190.

  29. [29]

    Mammen, E. (1989). Asymptotics with increasing dimension for robust regression with applications to the bootstrap. The Annals of Statistics, 382-400.

  30. [30]

    Minsker, S. (2018). Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A), 2871-2903.

  31. [31]

    Molstad, A. J. (2022). New insights for the multivariate square-root lasso. Journal of Machine Learning Research, 23(66), 1-52.

  32. [32]

    Pillutla, K., Kakade, S. M., & Harchaoui, Z. (2022). Robust aggregation for federated learning. IEEE Transactions on Signal Processing, 70, 1142-1154.

  33. [33]

    Prasad, A., Suggala, A. S., Balakrishnan, S., & Ravikumar, P. (2020). Robust estimation via robust gradient estimation. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(3), 601-627.

  34. [34]

    Qu, L. (2021). A new approach to estimating earnings forecasting models: Robust regression MM-estimation. International Journal of Forecasting, 37(2), 1011-1030.

  35. [35]

    Sun, Q. (2021). Do we need to estimate the variance in robust mean estimation. arXiv preprint arXiv:2107.00118.

  36. [36]

    Sun, T., & Zhang, C. H. (2012). Scaled sparse linear regression. Biometrika, 99(4), 879-898.

  37. [37]

    Sun, Q., Zhou, W. X., & Fan, J. (2020). Adaptive Huber regression. Journal of the American Statistical Association, 115(529), 254-265.

  38. [38]

    Van de Geer, S., Bühlmann, P., Ritov, Y. A., & Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42(3), 1166-1202.

  39. [39]

    Vershynin, R. (2010). Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027.

  40. [40]

    Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science (Vol. 47). Cambridge University Press.

  41. [41]

    Wang, Y., Li, G., Xiao, Z., Xu, L., & Zhang, W. (2024). Robust estimation for high-dimensional time series with heavy tails. arXiv preprint arXiv:2411.05217.

  42. [42]

    Wang, L., Peng, B., & Li, R. (2015). A high-dimensional nonparametric multivariate test for mean vector. Journal of the American Statistical Association, 110(512), 1658-1669.

  43. [43]

    Wang, H., & Ramdas, A. (2023). Catoni-style confidence sequences for heavy-tailed mean estimation. Stochastic Processes and Their Applications, 163, 168-202.

  44. [44]

    Wang, Y., Zhong, X., He, F., Chen, H., & Tao, D. (2021, October). Huber additive models for non-stationary time series analysis. In International Conference on Learning Representations.

  45. [45]

    Wang, L., Zheng, C., Zhou, W., & Zhou, W. X. (2021). A new principle for tuning-free Huber regression. Statistica Sinica, 31(4), 2153-2177.

  46. [46]

    Yohai, V. J., & Maronna, R. A. (1979). Asymptotic behavior of M-estimators for the linear model. The Annals of Statistics, 258-268.

  47. [47]

    Zhou, W. X., Bose, K., Fan, J., & Liu, H. (2018). A new perspective on robust M-estimation: Finite sample theory and applications to dependence-adjusted multiple testing. Annals of Statistics, 46(5),

  48. [48]

    X. Li, J. S. Liu, Q. Sun, and L. Xu. Department of Statistics and Data Science, Southern University of Science and Technology; email address: lixiang3@sustech.edu.cn. Department of Statistics and Data Science, Tsinghua University; email address: junsliu@tsinghua.edu.cn. Department of Statistical Sciences, University of Toronto; email address: qiang.sun@utoronto.ca. Department of Mathematics, University of Macau ...