arxiv: 2605.22640 · v1 · pith:EUXDLDC2new · submitted 2026-05-21 · 📊 stat.ME

Positive-definiteness in separable priors: effects on prior interpretability and inference

Pith reviewed 2026-05-22 03:44 UTC · model grok-4.3

classification 📊 stat.ME
keywords positive definite matrices · separable priors · truncation effects · sparse Bayesian inference · prior interpretability · matrix shrinkage

The pith

Truncation to enforce positive-definiteness on separable matrix priors can unintentionally shift mass toward sparser structures in both the prior and posterior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines priors for symmetric positive-definite matrices that begin with independent entries and then truncate to enforce positive-definiteness. It shows that this truncation alters the distribution relative to the untruncated version, complicating interpretability and shrinkage properties. For sparse settings the effect is pronounced: the truncated prior and resulting posterior place systematically higher probability on sparser matrices than the untruncated counterpart would. The authors derive parameter choices, such as the variance assigned to off-diagonal entries, that reduce these discrepancies as matrix dimension increases. A sympathetic reader would care because many Bayesian analyses rely on the assumption that the prior behaves as intended before truncation; when it does not, posterior conclusions about sparsity become harder to justify directly from the modeling choices.

Core claim

Unless the variance parameters of the untruncated separable prior are chosen with care, the truncation that enforces positive-definiteness causes the resulting prior (and its induced posterior) to assign higher probability mass to sparser matrix structures than the original untruncated prior would have assigned.
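
The mechanism behind this claim can be written in two lines. The notation is an assumption made here for illustration: p is the untruncated density, p^+ its truncated version, Z the sparsity pattern (which off-diagonal entries are non-zero), and c = P(Θ ≻ 0) as in the figure captions.

```latex
\begin{align*}
p^{+}(\Theta) &= \frac{p(\Theta)\,\mathbb{1}\{\Theta \succ 0\}}{c},
  \qquad c = P_{p}(\Theta \succ 0), \\
p^{+}(Z = z) &= p(Z = z)\,\frac{P_{p}(\Theta \succ 0 \mid Z = z)}{c}.
\end{align*}
```

Truncation therefore reweights each pattern z by the ratio of its conditional probability of positive-definiteness to the average c. Whenever sparser patterns are more likely to be positive-definite, that ratio exceeds 1 for them, which is the shift the claim describes. Since the total variation distance between p and p^+ equals 1 − c, the shift in pattern probabilities is bounded by 1 − c and vanishes when c is close to 1.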

What carries the argument

Truncation applied to a separable prior whose entries are initially independent, used to restrict support to the cone of positive-definite matrices.
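
A minimal sketch of how the acceptance probability c = P(Θ ≻ 0) of such a truncation could be estimated by Monte Carlo, assuming a fixed unit diagonal and Gaussian off-diagonals as in Figure 1. The function names (`sample_theta`, `estimate_c`, `sigma_from_delta`) are hypothetical and the code is not taken from the paper.

```python
import numpy as np

def sample_theta(k, sigma, rng, mu=1.0):
    """One symmetric k x k draw: diagonal fixed at mu, i.i.d. N(0, sigma^2) off-diagonals."""
    upper = np.triu(rng.normal(0.0, sigma, size=(k, k)), 1)  # strictly upper triangle
    return upper + upper.T + mu * np.eye(k)

def estimate_c(k, sigma, n_draws=1000, seed=0, mu=1.0):
    """Monte Carlo estimate of c = P(Theta is positive-definite) under the untruncated prior."""
    rng = np.random.default_rng(seed)
    n_pd = 0
    for _ in range(n_draws):
        theta = sample_theta(k, sigma, rng, mu)
        # eigvalsh returns eigenvalues in ascending order; index 0 is the smallest
        if np.linalg.eigvalsh(theta)[0] > 0:
            n_pd += 1
    return n_pd / n_draws

def sigma_from_delta(k, delta, mu=1.0):
    """Off-diagonal sd implied by delta_k = mu / (sigma * sqrt(k)) - 2 (Figure 1 parametrisation)."""
    return mu / ((2.0 + delta) * np.sqrt(k))

# Illustrative comparison of one positive and one negative delta at k = 50.
for delta in (0.5, -0.5):
    print(delta, estimate_c(k=50, sigma=sigma_from_delta(50, delta)))
```

When c is far below 1, most untruncated draws are rejected, which is the situation where the truncated prior can differ most from the untruncated one.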

If this is right

  • Sparse inference procedures that rely on these priors will report higher posterior probabilities for sparse matrices than the modeler may have intended.
  • Interpretability of shrinkage or regularization effects becomes difficult without explicit adjustment of prior variances as dimension grows.
  • Posterior inference on matrix structure can be made to match the untruncated case more closely by scaling the off-diagonal variance appropriately with dimension.
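
The first consequence can be illustrated with a small rejection-sampling experiment. This is a sketch under stated assumptions, not the paper's simulation design: off-diagonals follow a spike-and-slab prior (non-zero with probability `incl_prob`, then Gaussian with standard deviation `sigma`), the diagonal is fixed at 1, and the parameter values are arbitrary. All names are hypothetical.

```python
import numpy as np

def draw_sparse_prior(k, incl_prob, sigma, n_draws, seed=0):
    """Draw from an untruncated spike-and-slab prior with unit diagonal.

    Returns the number of non-zero off-diagonals of each draw and whether
    the draw is positive-definite (i.e. would survive the truncation).
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.triu_indices(k, 1)
    n_edges = np.empty(n_draws, dtype=int)
    is_pd = np.empty(n_draws, dtype=bool)
    for t in range(n_draws):
        nonzero = rng.random(rows.size) < incl_prob
        values = np.where(nonzero, rng.normal(0.0, sigma, rows.size), 0.0)
        theta = np.eye(k)
        theta[rows, cols] = values  # fill upper triangle
        theta[cols, rows] = values  # mirror to keep the matrix symmetric
        n_edges[t] = nonzero.sum()
        is_pd[t] = np.linalg.eigvalsh(theta)[0] > 0
    return n_edges, is_pd

n_edges, is_pd = draw_sparse_prior(k=8, incl_prob=0.5, sigma=0.3, n_draws=4000)
print("estimated c:", is_pd.mean())
print("mean non-zeros, untruncated prior:", n_edges.mean())
print("mean non-zeros, truncated prior:  ", n_edges[is_pd].mean())
```

Keeping only the positive-definite draws is exactly what the truncation does, so comparing the two means shows how far the truncated prior has moved toward sparser patterns for these particular settings.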

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar truncation effects could appear in other constrained parameter spaces where independence is assumed before projection, such as correlation matrices or covariance matrices with additional sign restrictions.
  • If the goal is to preserve the marginal behavior of each entry, one might instead work directly with priors that are already supported on the positive-definite cone rather than truncating after the fact.

Load-bearing premise

That the untruncated version with independent entries already encodes the intended prior behavior, so any systematic change introduced by truncation is a distortion that needs to be corrected.

What would settle it

A simulation or analytic calculation for growing matrix dimension showing that, after the recommended adjustment of off-diagonal variances, the probability mass assigned to sparse versus dense structures becomes statistically indistinguishable between the truncated and untruncated priors.
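
One way such a check could be set up is to estimate the total variation distance between the distribution of the number of non-zero off-diagonals before and after truncation. The sketch below assumes a unit diagonal and spike-and-slab Gaussian off-diagonals. The shrinking choice `sigma = 0.3 / k` is purely illustrative and is not the paper's recommended rule, and `edge_count_tv` is a hypothetical name.

```python
import numpy as np

def edge_count_tv(k, incl_prob, sigma, n_draws=4000, seed=0):
    """Estimated total variation distance between the distribution of the
    number of non-zero off-diagonals under the untruncated prior and under
    its positive-definite truncation, using the same set of draws."""
    rng = np.random.default_rng(seed)
    rows, cols = np.triu_indices(k, 1)
    max_edges = rows.size
    all_counts = np.zeros(max_edges + 1)
    pd_counts = np.zeros(max_edges + 1)
    for _ in range(n_draws):
        nonzero = rng.random(max_edges) < incl_prob
        values = np.where(nonzero, rng.normal(0.0, sigma, max_edges), 0.0)
        theta = np.eye(k)
        theta[rows, cols] = values
        theta[cols, rows] = values
        m = int(nonzero.sum())
        all_counts[m] += 1
        if np.linalg.eigvalsh(theta)[0] > 0:  # draw survives the truncation
            pd_counts[m] += 1
    if pd_counts.sum() == 0:
        return float("nan")  # no draw survived, so the distance cannot be estimated
    p_all = all_counts / all_counts.sum()
    p_pd = pd_counts / pd_counts.sum()
    return 0.5 * np.abs(p_all - p_pd).sum()

# Fixed sigma versus an (illustrative) sigma that shrinks with dimension.
for k in (6, 10):
    fixed = edge_count_tv(k, incl_prob=0.5, sigma=0.3)
    shrinking = edge_count_tv(k, incl_prob=0.5, sigma=0.3 / k)
    print(k, fixed, shrinking)
```

A distance that stays near zero as k grows would support the claim that the adjusted prior preserves the intended mass on sparse versus dense structures. Monte Carlo error should be accounted for before calling two distributions indistinguishable.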

Figures

Figures reproduced from arXiv: 2605.22640 by David Rossell, Jack Storror Carter.

Figure 1
Figure 1: Monte Carlo estimate of c = P(Θ ≻ 0) for fixed unit diagonal and Gaussian off-diagonals for different δk = µ/(σ√k) − 2. Left: fixed δk, with δk ∈ {0.1, 0.05, 0, −0.05, −0.1}. Right: δk varies with k, with δk ∈ {k^(−1/2), k^(−2/3), k^(−1), k^(−2), 0}. Horizontal axis: k (0 to 200); vertical axis: c.
Figure 2
Figure 2: Monte Carlo estimate of c = P(Θ ≻ 0) for θii ∼ Exp(1) (left) and θii ∼ Gamma(2, 2) (right) and Gaussian off-diagonals with different standard deviations σ. Left: σ ∈ {k^(−2), k^(−3/2), k^(−5/4), k^(−9/8), k^(−1)}. Right: σ ∈ {k^(−5/4), k^(−9/8), k^(−1), k^(−7/8), k^(−3/4)}. Horizontal axis: k (0 to 200); vertical axis: c.
Figure 3
Figure 3: Monte Carlo estimate of c = P(Θ ≻ 0) in the sparse case for fixed θii = 1 and sparse Gaussian off-diagonals for different sparsity levels ηk = Pp(θij = 0). Panels: ηk = 0.5k^(−1/4) and ηk = 0.5k^(−1/2). Curves are indexed by δk (values shown in the legends: 0.1, 0.05, 0, −0.05, −0.1). Horizontal axis: k (0 to 200); vertical axis: c.
Original abstract

A popular class of priors for symmetric positive-definite matrices assumes independent entries and adds a truncation to ensure positive-definiteness. While conceptually simple and often computationally convenient, unless done carefully this truncation can have unintended effects. If the truncated prior or its margins are significantly different from their untruncated counterpart, then its interpretability may suffer, its shrinkage properties become harder to characterise, and posterior inference may be affected in unanticipated ways. We investigate the effect of the truncation both for dense and sparse matrices, and show how to set prior parameters such as the variance of off-diagonal entries such that said effect is mitigated as the matrix dimension grows. We pay particular attention to sparse inference where, unless prior parameters are set carefully, the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures than the untruncated prior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript examines separable priors for symmetric positive-definite matrices that assume independent entries and apply truncation to enforce positive-definiteness. It argues that this truncation can distort the prior (and thus the posterior) relative to the untruncated version, causing the truncated prior to assign systematically higher mass to sparser structures unless parameters such as the variance of off-diagonal entries are chosen carefully; the authors investigate this for both dense and sparse regimes and provide guidance on parameter scaling to mitigate the distortion as matrix dimension grows.

Significance. If the central derivations and any accompanying simulations hold, the work is significant for Bayesian covariance estimation and Gaussian graphical modeling, where separable priors are widely used. It clarifies interpretability and shrinkage issues that arise from truncation and supplies concrete parameter-setting rules that could improve prior elicitation and posterior behavior in high-dimensional settings. The emphasis on sparse inference is timely given the prevalence of sparsity-inducing models.

major comments (2)
  1. Abstract and §3 (or equivalent section deriving the posterior effect): the claim that 'the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures' presupposes likelihood neutrality with respect to sparsity. The manuscript must demonstrate this explicitly, for example by deriving the posterior under a standard observation model (Wishart or Gaussian graphical model) or by providing simulations that isolate the prior distortion under realistic data-generating processes; without such evidence the 'hence' step remains unsupported and could reverse under density-correlated likelihoods.
  2. Section on parameter mitigation (likely §4 or §5): the proposed scaling of the variance of off-diagonal entries to counteract the truncation effect as dimension p grows should be shown to be robust across sparsity levels. If the mitigation is derived under a specific sparsity regime, the manuscript should state the range of validity and provide a counter-example or bound when the assumption is violated.
minor comments (2)
  1. Notation for the truncated versus untruncated margins should be introduced earlier and used consistently; current usage in the abstract and early sections risks ambiguity when comparing marginal distributions.
  2. Figures comparing prior mass on sparsity patterns would benefit from explicit axis labels indicating the matrix dimension p and the specific variance value used, to allow readers to reproduce the mitigation effect.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below, indicating where we will revise the manuscript to strengthen the presentation and support for our claims.

Point-by-point responses
  1. Referee: Abstract and §3 (or equivalent section deriving the posterior effect): the claim that 'the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures' presupposes likelihood neutrality with respect to sparsity. The manuscript must demonstrate this explicitly, for example by deriving the posterior under a standard observation model (Wishart or Gaussian graphical model) or by providing simulations that isolate the prior distortion under realistic data-generating processes; without such evidence the 'hence' step remains unsupported and could reverse under density-correlated likelihoods.

    Authors: We agree that the transition from the prior distortion to its effect on the posterior requires explicit justification rather than an implicit assumption of likelihood neutrality. In the revised manuscript we will expand Section 3 to include a short derivation of the posterior under a multivariate Gaussian likelihood (with known mean) that isolates the contribution of the truncated prior. We will also add simulation results under both a Wishart observation model and a sparse Gaussian graphical model to demonstrate that the prior-induced preference for sparser structures persists in the posterior under standard data-generating processes. These additions will directly support the claim in the abstract and main text. revision: yes

  2. Referee: Section on parameter mitigation (likely §4 or §5): the proposed scaling of the variance of off-diagonal entries to counteract the truncation effect as dimension p grows should be shown to be robust across sparsity levels. If the mitigation is derived under a specific sparsity regime, the manuscript should state the range of validity and provide a counter-example or bound when the assumption is violated.

    Authors: The scaling rules presented in Sections 4 and 5 are derived under both the dense regime (all off-diagonal entries non-zero) and the sparse regime (fixed or slowly growing number of non-zero off-diagonals). We will revise the text to state explicitly the range of validity: the recommended scaling holds when the number of non-zero off-diagonals is o(p^2). In the dense limit the truncation bias vanishes without adjustment, which we already note. We will add a brief theoretical bound on the residual distortion for intermediate sparsity levels together with a simple counter-example (a moderately sparse matrix with sparsity rate p^{-1/2}) showing when the scaling must be further modified. These clarifications will be incorporated in the revised version. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; analysis remains self-contained

full rationale

The paper examines truncation effects on separable priors for positive-definite matrices and provides guidance on parameter choice to mitigate interpretability and inference issues as dimension grows. No equations or claims in the provided abstract reduce a derived result to a fitted input, self-definition, or load-bearing self-citation chain. The central statements about mass assignment to sparse structures are presented as consequences of the truncation mechanism itself rather than as predictions forced by prior fitting or renaming. The derivation chain is independent of the target conclusions and does not collapse by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard construction of separable priors via independent entries plus truncation; main adjustable element is off-diagonal variance.

free parameters (1)
  • variance of off-diagonal entries
    Key parameter highlighted for careful setting to reduce truncation effects as matrix dimension increases.
axioms (1)
  • domain assumption: entries of the matrix are independent before applying the positive-definiteness truncation
    Core assumption in the separable prior class described in the abstract.
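
A rough sketch of why the off-diagonal standard deviation σ has to shrink with the dimension k, assuming a fixed diagonal µ and i.i.d. zero-mean, unit-variance entries in the off-diagonal part X_k (the dense setting of Figure 1). It relies on the standard Wigner-matrix result that the smallest eigenvalue of X_k/√k converges to −2. The approximation sign hides finite-k fluctuations.

```latex
\begin{align*}
\Theta &= \mu I + \sigma X_k, \qquad W_k = \frac{X_k}{\sqrt{k}}, \qquad
  \lambda_{\min}(W_k) \to -2 \ \text{ as } k \to \infty, \\
\lambda_{\min}(\Theta) &= \mu + \sigma\sqrt{k}\,\lambda_{\min}(W_k)
  \approx \mu - 2\sigma\sqrt{k}, \\
\sigma = \frac{\mu}{(2+\delta_k)\sqrt{k}}
  \;&\Longrightarrow\;
  \lambda_{\min}(\Theta) \approx \mu\,\frac{\delta_k}{2+\delta_k}.
\end{align*}
```

Under this approximation the sign of δk decides whether the limiting smallest eigenvalue is positive, which is the role δk plays in Figure 1. The paper's proof adds that c still tends to 1 when δk shrinks to zero, provided it does so more slowly than k^(−2/3).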

pith-pipeline@v0.9.0 · 5666 in / 1299 out tokens · 53074 ms · 2026-05-22T03:44:28.401713+00:00 · methodology

