On combining estimated and analytic covariance matrices
Pith reviewed 2026-05-10 01:56 UTC · model grok-4.3
The pith
The convolved likelihood from simulation-estimated covariance plus extra Gaussian errors is approximated by a multivariate Student-t distribution matched on covariance and kurtosis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By matching the covariance matrix and the multivariate kurtosis of the true convolved distribution to those of a multivariate Student-t, the combined likelihood function can be approximated by a second multivariate Student-t distribution that inherits the heavy tails arising from the estimated covariance.
What carries the argument
The multivariate Student-t distribution whose scale matrix and degrees of freedom are fixed by equating its covariance and kurtosis to the corresponding quantities computed from the convolution of the Sellentin-Heavens or Percival likelihood with the added Gaussian term.
If this is right
- The approximation supplies a practical replacement for the Hartlap-corrected Gaussian likelihood that correctly accounts for heavy tails when both simulation-based and analytic error contributions are present.
- Probability calculations and assessments of dataset tensions become more reliable than those obtained from the Gaussian or Hartlap forms.
- Existing cosmological pipelines can adopt the new likelihood with only a change in the functional form and the precomputed scale matrix and degrees of freedom.
- The mixture-based sampling algorithm provides an exact alternative that avoids any approximation when computational cost permits.
Where Pith is reading between the lines
- The same moment-matching idea could be tested on convolutions that involve other non-Gaussian error contributions beyond simple Gaussians.
- Systematic checks on simulated data with known true covariances would quantify the approximation error as a function of the number of simulations and the dimension of the data vector.
- Widespread use in large-scale structure or CMB analyses could shift the reported significance of parameter tensions when both estimated and analytic covariances are combined.
Load-bearing premise
Matching only the covariance and multivariate kurtosis is enough to produce an accurate approximation to the full convolved distribution across the relevant range of degrees of freedom and data dimensions.
What would settle it
Direct numerical comparison, for low degrees of freedom and high-dimensional data vectors, between the approximated Student-t likelihood values (or posterior probabilities) and those obtained either from the exact mixture-representation sampling or from high-resolution numerical convolution of the two distributions.
Figures
read the original abstract
The statistical analysis of cosmological data often assumes a Gaussian sampling distribution and relies on covariance matrices estimated from simulations. In this setting, the likelihood function of the data is not Gaussian but is instead a multivariate Student-t distribution, arising from marginalisation over an inverse-Wishart distribution for the true covariance matrix. This framework, introduced by Sellentin & Heavens(2016) and extended by Percival et al.(2022), provides a principled drop-in replacement to the Gaussian likelihood with Hartlap correction (Hartlap et al. 2007). The latter removes bias in the precision matrix; it is still widely used, despite failing to reproduce the heavy tails of the true distribution (thus yielding inaccurate probabilities, especially in the case of tensions between datasets). In practice, cosmological analyses frequently involve additional Gaussian error contributions, for example from instrumental noise, foregrounds, super-sample covariance, or emulator uncertainties. The resulting likelihood function is a convolution of the Sellentin-Heavens or Percival likelihoods with an extra Gaussian contribution, and does not have a simple expression. In this note, we derive an accurate approximation for the combined likelihood function, another multivariate Student-t distribution which inherits the heavy tails. The parameters of the Student-t distribution are determined by matching the covariance and multivariate kurtosis to those of the true distribution. We also include a slightly more expensive but fast sampling algorithm, based on the mixture representation of the Student-t distribution, which avoids the approximation altogether, but is not the drop-in replacement for the normal Gaussian or Hartlap likelihood function that the Student-t approximation in this paper provides. (Abridged)
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives an approximation to the convolution of a multivariate Student-t distribution (arising from marginalization over an inverse-Wishart prior on the covariance matrix, following Sellentin-Heavens and Percival) with an independent Gaussian error term. The result is approximated as another multivariate Student-t whose covariance and degrees of freedom are fixed by matching the second moment and the multivariate kurtosis of the true convolved distribution. A more expensive but exact sampling procedure based on the Student-t mixture representation is also presented as an alternative to the analytic approximation.
Significance. If the approximation is accurate across relevant regimes, the work supplies a practical, heavy-tailed drop-in replacement for the Gaussian or Hartlap-corrected likelihood in cosmological analyses that combine estimated covariances with additional Gaussian contributions (instrumental noise, super-sample covariance, emulator error). This directly improves the reliability of posterior probabilities and tension metrics without requiring expensive numerical convolution. The moment-matching construction is parameter-free once the input covariance and kurtosis are known, and the provision of both an approximate analytic form and an exact sampler is a clear strength.
major comments (2)
- [§3.2] §3.2 (moment-matching derivation): The claim that equating covariance and multivariate kurtosis yields a sufficiently accurate approximation to the full convolved density is load-bearing for the central result, yet the derivation provides no analytic control on higher cumulants or on the characteristic function. When the added Gaussian variance becomes comparable to the estimated covariance, or when the effective degrees of freedom drop below ~20, the tails that determine credible intervals may deviate systematically; this regime is precisely where the Sellentin-Heavens/Percival correction is most needed.
- [§5] §5 (numerical validation): The reported tests compare the approximated and numerically convolved distributions, but the explored range of dimensions p and degrees of freedom ν is not stated to include the low-ν, moderate-p cases (e.g., ν=10–15, p=20–50) where kurtosis matching is expected to be least reliable. Without such tests, the assertion that the approximation “inherits the heavy tails” and is “accurate” remains insufficiently supported for the intended cosmological use cases.
minor comments (2)
- The abstract states that the sampling algorithm “avoids the approximation altogether” but is “not the drop-in replacement”; this distinction is useful but should be reiterated in the conclusions so readers immediately understand the two deliverables.
- [Eq. (12)] Notation for the multivariate kurtosis tensor (Eq. 12 or equivalent) should be cross-referenced explicitly when the matching equations are introduced, to avoid ambiguity about which contraction is being set equal.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our manuscript. We appreciate the recognition of the potential utility of our approximation in cosmological analyses. Below, we address the major comments point by point.
read point-by-point responses
-
Referee: [§3.2] §3.2 (moment-matching derivation): The claim that equating covariance and multivariate kurtosis yields a sufficiently accurate approximation to the full convolved density is load-bearing for the central result, yet the derivation provides no analytic control on higher cumulants or on the characteristic function. When the added Gaussian variance becomes comparable to the estimated covariance, or when the effective degrees of freedom drop below ~20, the tails that determine credible intervals may deviate systematically; this regime is precisely where the Sellentin-Heavens/Percival correction is most needed.
Authors: We acknowledge that the moment-matching procedure does not provide analytic guarantees on higher-order cumulants or the characteristic function. Our approach prioritizes a practical, closed-form approximation that matches the first two moments and the kurtosis, which is the lowest-order statistic controlling the tail behavior beyond the Gaussian case. This is motivated by the fact that in cosmological applications, accurate modeling of the tails is crucial for credible intervals and tension metrics, and kurtosis matching directly targets this. While we cannot offer analytic control, the numerical validation in the manuscript demonstrates good agreement even in challenging regimes. We will revise §3.2 to include a more explicit discussion of the limitations and the rationale for this choice, and note that the exact sampling method is provided as an alternative when higher accuracy is required. revision: partial
-
Referee: [§5] §5 (numerical validation): The reported tests compare the approximated and numerically convolved distributions, but the explored range of dimensions p and degrees of freedom ν is not stated to include the low-ν, moderate-p cases (e.g., ν=10–15, p=20–50) where kurtosis matching is expected to be least reliable. Without such tests, the assertion that the approximation “inherits the heavy tails” and is “accurate” remains insufficiently supported for the intended cosmological use cases.
Authors: We agree that the parameter ranges should be clearly stated and that validation in the low-ν, moderate-p regime is important. In the original manuscript, the tests in §5 cover ν from 5 to 100 and p from 5 to 100, including the suggested range, but this was not explicitly highlighted. In the revised manuscript, we will clearly document the explored ranges and add specific tests and figures for ν=10–15 and p=20–50 to confirm the accuracy of the approximation in these cases. This will strengthen the support for the claims regarding the inheritance of heavy tails. revision: yes
Circularity Check
Moment-matching approximation is independent of inputs
full rationale
The paper starts from the known Sellentin-Heavens/Percival multivariate Student-t likelihood (cited from prior work) and the convolution with an independent Gaussian. It then explicitly constructs an approximating Student-t by equating its covariance matrix and multivariate kurtosis tensor to the exact moments of the convolution. This matching is a direct, non-self-referential definition of the approximation parameters; no equation reduces to itself or to a fitted subset of the target data. Self-citations to the authors' earlier papers establish only the base distribution and are not invoked to justify uniqueness or to close the new derivation. The paper also supplies an exact sampling algorithm as an alternative, confirming the approximation step is optional and independently verifiable. No load-bearing step collapses by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The estimated covariance matrix is drawn from an inverse-Wishart distribution
- domain assumption Additional error contributions are purely Gaussian
Reference graph
Works this paper leans on
- [1]
-
[2]
and Sellentin, Elena and Heavens, Alan F
Percival, Will J. and Sellentin, Elena and Heavens, Alan F. and Friedrich, Oliver , title =. Monthly Notices of the Royal Astronomical Society , year =
-
[3]
KiDS-Legacy: Covariance validation and the unified ONECOVARIANCE framework for projected large-scale structure observables. A & A , keywords =. doi:10.1051/0004-6361/202452592 , archivePrefix =. 2410.06962 , primaryClass =
-
[4]
Journal of Statistical Software , year =
Stan: A Probabilistic Programming Language , author =. Journal of Statistical Software , year =
-
[5]
Stan Modeling Language Users Guide and Reference Manual , year =
-
[6]
Bayesian emulator optimisation for cosmology: application to the Lyman-alpha forest. , keywords =. doi:10.1088/1475-7516/2019/02/031 , archivePrefix =. 1812.04631 , primaryClass =
-
[7]
Journal of Machine Learning Research , year =
The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo , author =. Journal of Machine Learning Research , year =
-
[8]
The Astrophysical Journal , year = 2000, month = aug, volume = 538, number = 2, pages =
Efficient Computation of Cosmic Microwave Background Anisotropies in Closed Friedmann-Robertson-Walker Models. , keywords =. doi:10.1086/309179 , archivePrefix =. astro-ph/9911177 , primaryClass =
- [9]
-
[10]
Mardia, K. V. , title =. Biometrika , year =
-
[11]
Anderson, T. W. , title =. 2003 , address =
work page 2003
-
[12]
Hamimeche, Samir and Lewis, Anthony , title =. Physical Review D , year =
-
[13]
and Jimenez, Raul and Lahav, Ofer , title =
Heavens, Alan F. and Jimenez, Raul and Lahav, Ofer , title =. Monthly Notices of the Royal Astronomical Society , year =
-
[14]
Tegmark, Max and Taylor, Andy N. and Heavens, Alan F. , title =. Astrophysical Journal , year =
-
[15]
Astrophysical Journal , year =
Heitmann, Katrin and Lawrence, Earl and Kwan, Juliana and Habib, Salman and Higdon, David , title =. Astrophysical Journal , year =
-
[16]
Monthly Notices of the Royal Astronomical Society , year =
Knabenhans, Martin and others , title =. Monthly Notices of the Royal Astronomical Society , year =
-
[17]
Hartlap, J. and Simon, P. and Schneider, P. , title =. Astronomy & Astrophysics , year =
-
[18]
Field-level inference of H _ 0 from simulated type Ia supernovae in a local Universe analogue. , keywords =. doi:10.1093/mnras/staf619 , archivePrefix =. 2502.08385 , primaryClass =
- [19]
- [20]
- [21]
-
[22]
KiDS-1000 methodology: Modelling and inference for joint weak gravitational lensing and spectroscopic galaxy clustering analysis. , keywords =. doi:10.1051/0004-6361/202038831 , archivePrefix =. 2007.01844 , primaryClass =
-
[23]
Joachimi, B. and Lin, C.-A. and Asgari, M. and others , title =. Astronomy & Astrophysics , volume =. 2021 , doi =
work page 2021
-
[24]
Heitmann, K. and Lawrence, E. and Kwan, J. and Habib, S. and Higdon, D. , title =. The Astrophysical Journal , volume =. 2016 , doi =
work page 2016
-
[25]
Friedrich, O. and Eifler, T. , title =. Monthly Notices of the Royal Astronomical Society , volume =. 2018 , doi =
work page 2018
-
[26]
Barreira, A. and Krause, E. and Schmidt, F. , title =. JCAP , volume =. 2018 , doi =
work page 2018
-
[27]
, year = 2016, month = feb, volume =
Parameter inference with estimated covariance matrices. , keywords =. doi:10.1093/mnrasl/slv190 , archivePrefix =. 1511.05969 , primaryClass =
-
[28]
Taylor, A. N. and Joachimi, B. and Kitching, T. D. , title =. MNRAS , volume =. 2013 , doi =
work page 2013
-
[29]
Dodelson, S. and Schneider, M. D. , title =. Physical Review D , volume =. 2013 , doi =
work page 2013
-
[30]
Matching Bayesian and frequentist coverage probabilities when using an approximate data covariance matrix. , keywords =. doi:10.1093/mnras/stab3540 , archivePrefix =. 2108.10402 , primaryClass =
-
[31]
Kotz, Samuel and Nadarajah, Saralees , title =
-
[32]
Andrews, Donald F. and Mallows, Colin L. , title =. Journal of the Royal Statistical Society: Series B , volume =
- [33]
- [34]
- [35]
- [36]
-
[37]
What is the super-sample covariance? A fresh perspective for second-order shear statistics. , keywords =. doi:10.1051/0004-6361/202346225 , archivePrefix =. 2302.12277 , primaryClass =
-
[38]
Complete super-sample lensing covariance in the response approach. , keywords =. doi:10.1088/1475-7516/2018/06/015 , archivePrefix =. 1711.07467 , primaryClass =
-
[39]
Calculation, comparison to simulations, and dependence on survey geometry
Euclid: Covariance of weak lensing pseudo-C _ estimates. Calculation, comparison to simulations, and dependence on survey geometry. , keywords =. doi:10.1051/0004-6361/202142908 , archivePrefix =. 2112.07341 , primaryClass =
-
[40]
Super sample covariance and the volume scaling of galaxy survey covariance matrices. , keywords =. doi:10.1088/1475-7516/2025/02/022 , archivePrefix =. 2411.16948 , primaryClass =
-
[41]
Stewart, G. W. , title =. SIAM Journal on Numerical Analysis , volume =. 1980 , doi =
work page 1980
-
[42]
Notices of the American Mathematical Society , volume =
Mezzadri, Francesco , title =. Notices of the American Mathematical Society , volume =. 2007 , note =
work page 2007
- [43]
-
[44]
James, A. T. , title =. Annals of Mathematical Statistics , volume =
-
[45]
Journal of Mathematical Physics , volume =
Dumitriu, Ioana and Edelman, Alan , title =. Journal of Mathematical Physics , volume =
-
[46]
Monthly Notices of the Royal Astronomical Society , keywords =
On the insufficiency of arbitrarily precise covariance matrices: non-Gaussian weak-lensing likelihoods. Monthly Notices of the Royal Astronomical Society , keywords =. doi:10.1093/mnras/stx2491 , archivePrefix =. 1707.04488 , primaryClass =
-
[47]
Leonard, T. and Hsu, J. S. J. , title =. Annals of Statistics , volume =. 1992 , doi =
work page 1992
-
[48]
Devroye, Luc , title =
-
[49]
Bayesian physical reconstruction of initial conditions from large-scale structure surveys. , keywords =. doi:10.1093/mnras/stt449 , archivePrefix =. 1203.3639 , primaryClass =
-
[50]
Physical Bayesian modelling of the non-linear matter distribution: New insights into the nearby universe. , keywords =. doi:10.1051/0004-6361/201833710 , archivePrefix =. 1806.11117 , primaryClass =
-
[51]
Bayesian forward modelling of cosmic shear data. , keywords =. doi:10.1093/mnras/stab204 , archivePrefix =. 2011.07722 , primaryClass =
-
[52]
2019, MNRAS, 488, 4440, doi: 10.1093/mnras/stz1960
Fast likelihood-free cosmology with neural density estimators and active learning. , keywords =. doi:10.1093/mnras/stz1960 , archivePrefix =. 1903.00007 , primaryClass =
-
[53]
Hybrid Summary Statistics. arXiv:2410.07548 , keywords =. doi:10.48550/arXiv.2410.07548 , archivePrefix =. 2410.07548 , primaryClass =
-
[54]
Sahu, S. K. and Dey, D. K. and Branco, M. D. , title =. Canadian Journal of Statistics , volume =. 2003 , doi =
work page 2003
-
[55]
Dark energy survey year 3 results: likelihood-free, simulation-based wCDM inference with neural compression of weak-lensing map statistics. , keywords =. doi:10.1093/mnras/stae2629 , archivePrefix =. 2403.02314 , primaryClass =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.