Denoising clustering covariance matrices with Rotational Invariant Estimators
Pith reviewed 2026-05-10 12:35 UTC · model grok-4.3
The pith
The Rotational Invariant Estimator stabilizes best-fit recovery in galaxy clustering analyses even with few mocks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Among the estimators tested, RIE emerges as the most effective at stabilizing best-fit recovery, particularly in Fourier space, where it closely reproduces the reference posteriors even when the number of mocks barely exceeds the data-vector dimension.
What carries the argument
The Rotational Invariant Estimator (RIE), which denoises the sample covariance matrix by keeping its eigenvectors and correcting only its eigenvalues, the defining property of a rotationally invariant estimator.
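To make the structure of a rotationally invariant estimator concrete, here is a minimal NumPy sketch. It is not the RIE formula used in the paper, which this page does not spell out. It implements the "oracle" variant instead, which assumes the true covariance is known and replaces each sample eigenvalue with $v_i^T C_{\rm true} v_i$. The function names, the toy covariance, and the sizes `D` and `N` are all hypothetical choices for illustration.

```python
import numpy as np

def sample_covariance(mocks):
    """Unbiased sample covariance of an (N, D) array of mock data vectors."""
    centered = mocks - mocks.mean(axis=0)
    return centered.T @ centered / (mocks.shape[0] - 1)

def oracle_rie(sample_cov, true_cov):
    """Keep the sample eigenvectors; replace eigenvalue i by v_i^T C_true v_i.

    This is the oracle form (it needs the true covariance), used here only
    to illustrate the structure shared by rotationally invariant estimators.
    """
    _, vecs = np.linalg.eigh(sample_cov)  # columns are eigenvectors
    cleaned = np.einsum("ji,jk,ki->i", vecs, true_cov, vecs)
    return (vecs * cleaned) @ vecs.T      # V diag(cleaned) V^T

rng = np.random.default_rng(0)
D, N = 20, 30                                 # hypothetical sizes, q = N/D = 1.5
true_cov = np.diag(np.linspace(1.0, 5.0, D))  # toy "analytically known" truth
mocks = rng.multivariate_normal(np.zeros(D), true_cov, size=N)

S = sample_covariance(mocks)
C_rie = oracle_rie(S, true_cov)

# Frobenius distance to the truth for both estimates
err_sample = np.linalg.norm(S - true_cov)
err_rie = np.linalg.norm(C_rie - true_cov)
print(f"sample: {err_sample:.3f}  oracle RIE: {err_rie:.3f}")
```

In a real analysis the true covariance is unknown, so practical RIE formulas estimate the cleaned eigenvalues from the sample spectrum alone. The oracle version is useful mainly as the best result any estimator of this form can reach.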
Load-bearing premise
The controlled synthetic data sets with analytically known covariance matrices accurately represent the statistical properties and noise structure encountered in real galaxy clustering observations.
What would settle it
Repeating the full inference pipeline on the same synthetic data but with a much larger number of mocks to serve as ground truth, then checking whether RIE still yields best-fit values and posterior volumes that match the high-mock reference within expected statistical fluctuations.
Original abstract
Cosmological parameter inference from galaxy clustering relies critically on accurate estimates of the covariance and precision matrices. These are often obtained from a limited number of mock catalogs, introducing noise and bias in the precision matrix when the data-vector dimension becomes comparable to the number of available realizations. We present the first application of the Rotational Invariant Estimator (RIE) to the large-scale clustering of galaxies, benchmarking it against the standard sample covariance and the non-linear shrinkage estimator NERCOME for both the two-point correlation function (2PCF) and power spectrum. Using controlled synthetic data sets with analytically known covariance matrices, we estimate the covariance with all three methods across a range of mock-to-dimension ratios $q = N/D$ and data-vector sizes $D$. We then perform Bayesian inference with an EFT-based model and quantify each estimator through the Figure of Bias (FoB) and Figure of Merit (FoM). After correction for finite-$N$ effects, the sample covariance recovers unbiased average uncertainty volumes but suffers from growing best-fit scatter and bias at small $q$ due to the Dodelson--Schneider effect. Both NERCOME and RIE substantially reduce these stochastic shifts; however, the uncertainties they assign are probe-dependent. In configuration space, both estimators can yield overly tight constraints, with a bias that grows with $D$. In Fourier space, RIE delivers markedly improved best-fit stability with only mild FoM bias, whereas NERCOME tends to overestimate the constraining power. Among the estimators tested, RIE emerges as the most effective at stabilizing best-fit recovery, particularly in Fourier space, where it closely reproduces the reference posteriors even when the number of mocks barely exceeds the data-vector dimension.
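The abstract refers to a correction for finite-$N$ effects without naming the formula. As a hedged illustration, the sketch below uses the widely used Hartlap factor, which rescales the inverse sample covariance by $(N - D - 2)/(N - 1)$ to obtain an unbiased precision matrix for Gaussian data. Whether the paper uses exactly this correction is an assumption. $D = 270$ is a data-vector size mentioned on this page; the mock counts are hypothetical.

```python
def mock_to_dimension_ratio(n_mocks, dim):
    """Ratio q = N/D as defined in the abstract."""
    return n_mocks / dim

def hartlap_factor(n_mocks, dim):
    """Multiplicative debiasing factor for the inverse sample covariance.

    Assumes Gaussian mocks and a sample covariance normalized by N - 1.
    Only defined for N > D + 2.
    """
    if n_mocks <= dim + 2:
        raise ValueError("Hartlap factor requires n_mocks > dim + 2")
    return (n_mocks - dim - 2) / (n_mocks - 1)

dim = 270  # data-vector size mentioned on this page
for n_mocks in (300, 540, 2700):  # hypothetical mock counts
    q = mock_to_dimension_ratio(n_mocks, dim)
    print(f"N={n_mocks}  q={q:.2f}  factor={hartlap_factor(n_mocks, dim):.4f}")
```

The factor shrinks quickly as $q$ approaches 1, one way to see why the regime where the number of mocks barely exceeds the data-vector dimension is demanding for the sample covariance.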
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the first application of the Rotational Invariant Estimator (RIE) to covariance matrix estimation for galaxy clustering, benchmarking it against the sample covariance estimator (with finite-N corrections) and the NERCOME shrinkage estimator. Using synthetic datasets with analytically known true covariances, the authors compare performance for both the two-point correlation function and power spectrum across a range of mock-to-dimension ratios q = N/D. They quantify results via Figure of Bias (FoB) and Figure of Merit (FoM) after Bayesian inference with an EFT-based model, concluding that RIE provides the best stabilization of best-fit parameter recovery (especially in Fourier space) while mitigating Dodelson-Schneider effects, though uncertainty calibration is probe-dependent.
Significance. If the benchmark results hold, the work offers a practical advance for cosmological parameter inference when the number of mocks is comparable to or only modestly exceeds the data-vector dimension, a common limitation in analyses of surveys such as DESI or Euclid. Credit is due for the controlled synthetic setup with analytically known truth, which enables direct, independent validation of FoB/FoM metrics rather than circular self-consistency tests, and for the explicit finite-N corrections applied to the sample estimator.
minor comments (4)
- The abstract and methods description would benefit from an explicit early definition of the ratio q = N/D and a brief statement of how the finite-N correction is implemented for the sample covariance (e.g., which formula or reference is used).
- Implementation details for the RIE (e.g., any regularization parameters, choice of rotationally invariant shrinkage form, or numerical stability checks) are not fully specified; adding a short paragraph or appendix would improve reproducibility.
- The text should clarify whether the reported FoM and FoB values are averaged over multiple independent realizations of the synthetic data or derived from a single run, and how error bars on these metrics are obtained.
- Figure captions and axis labels could be expanded to indicate the exact data-vector dimension D and the number of mocks N for each panel, rather than relying solely on the q values.
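The question of how FoB and FoM are averaged and how their error bars are obtained can be made concrete with a short sketch. The definitions below are common conventions and not necessarily those of the manuscript: FoM as $1/\sqrt{\det C_\theta}$ for a parameter covariance $C_\theta$, and FoB as the distance of the best fit from the truth in units of the posterior covariance. All function names and numerical values are hypothetical.

```python
import numpy as np

def figure_of_merit(param_cov):
    """FoM = 1 / sqrt(det(param_cov)); larger means tighter constraints."""
    sign, logdet = np.linalg.slogdet(param_cov)
    if sign <= 0:
        raise ValueError("parameter covariance must be positive definite")
    return float(np.exp(-0.5 * logdet))

def figure_of_bias(best_fit, truth, param_cov):
    """FoB = sqrt(delta^T param_cov^{-1} delta), with delta = best_fit - truth."""
    delta = np.asarray(best_fit, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(delta @ np.linalg.solve(param_cov, delta)))

def mean_and_stderr(values):
    """Mean over independent realizations and its standard error."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(values.size))

# Toy two-parameter posterior covariance (hypothetical numbers)
param_cov = np.diag([0.04, 0.01])
fom = figure_of_merit(param_cov)
fob = figure_of_bias([0.70, 0.12], [0.68, 0.12], param_cov)

# Averaging a metric over several synthetic realizations (hypothetical values)
fob_mean, fob_err = mean_and_stderr([0.9, 1.1, 1.3, 0.7])
print(fom, fob, fob_mean, fob_err)
```

Reporting the mean with its standard error over realizations, as in `mean_and_stderr`, is one simple way to attach error bars to these metrics; a bootstrap over realizations would be an alternative.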
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript, positive summary of our results, and recommendation for minor revision. The controlled synthetic setup with known truth and the focus on practical performance at modest q = N/D are indeed central to the work. We address the minor comments below.
Circularity Check
No significant circularity
full rationale
The paper conducts an empirical benchmarking study on synthetic datasets where the true covariance matrix is known analytically by construction. Reference posteriors are obtained directly from this known truth using an EFT-based model, and the FoB/FoM metrics compare estimator performance against these external references. No derivation, prediction, or central claim reduces by the paper's equations to a fitted input or self-defined quantity. Any references to prior RIE work or finite-N corrections (e.g., Dodelson-Schneider) are external to the benchmarking results and do not form a load-bearing self-citation chain that would force the reported outcomes.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Synthetic data sets with analytically known covariance matrices faithfully reproduce the statistical properties of real galaxy clustering observations.
- domain assumption: The EFT-based model provides an adequate description for the Bayesian inference step used to evaluate estimator performance.
Reference graph
Works this paper leans on
- [1] Abadir, K. M., Distaso, W., & Žikeš, F. 2014, Journal of Econometrics, 181, 165
  A. Farina et al.: Denoising cosmological covariance matrices with Rotational Invariant estimators. Fig. 6. Posterior distributions for the model parameters obtained from the power spectrum analysis for $D = 270$ using different covariance matrix estimator...
- [2] as a function of the cosmological parameters $h$ and $\omega_c$. The predicted basis functions are then contracted with the EFT bias and counterterm coefficients at each step of the MCMC chain, which allows a single set of trained networks to serve any combination of nuisance parameters without retraining. Input preprocessing. The two raw cosmological inputs (h, ω ...