Positive-definiteness in separable priors: effects on prior interpretability and inference
Pith reviewed 2026-05-22 03:44 UTC · model grok-4.3
The pith
Truncation to enforce positive-definiteness on separable matrix priors can unintentionally shift mass toward sparser structures in both the prior and posterior.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Unless the variance parameters of the untruncated separable prior are chosen with care, the truncation that enforces positive-definiteness causes the resulting prior (and its induced posterior) to assign higher probability mass to sparser matrix structures than the original untruncated prior would have assigned.
What carries the argument
Truncation applied to a separable prior whose entries are initially independent, used to restrict support to the cone of positive-definite matrices.
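The mechanism can be seen in a few lines of Monte Carlo. The sketch below assumes a simple spike-and-slab separable prior: the diagonal is fixed at µ, the edge indicators are i.i.d. Bernoulli(ρ), and the off-diagonal slab is Gaussian. These choices and the function names are illustrative, not the paper's specification. Truncation is implemented by rejection, so draws outside the PD cone are discarded, and the mean number of edges among the surviving draws is compared with the mean over all draws.

```python
import numpy as np

def sample_separable(k, mu, sigma, rho, rng):
    """One draw from an assumed untruncated separable prior: diagonal fixed
    at mu, edge indicators z_ij ~ Bernoulli(rho) for i < j, and Gaussian
    N(0, sigma^2) off-diagonal values where z_ij = 1 (zero elsewhere)."""
    z = np.triu(rng.random((k, k)) < rho, 1)
    upper = np.where(z, rng.normal(0.0, sigma, (k, k)), 0.0)
    theta = upper + upper.T                # symmetric, zero diagonal so far
    theta[np.diag_indices(k)] = mu
    return theta, int(z.sum())

def edge_counts(k=8, mu=1.0, sigma=0.3, rho=0.5, n=5000, seed=0):
    """Mean number of edges before and after truncation to the PD cone."""
    rng = np.random.default_rng(seed)
    all_m, kept_m = [], []
    for _ in range(n):
        theta, m = sample_separable(k, mu, sigma, rho, rng)
        all_m.append(m)
        # Truncation by rejection: keep the draw only if it is positive definite.
        if np.linalg.eigvalsh(theta)[0] > 0:
            kept_m.append(m)
    return float(np.mean(all_m)), float(np.mean(kept_m)), len(kept_m) / n

untrunc, trunc, accept = edge_counts()
print(f"mean edges: untruncated={untrunc:.2f}, truncated={trunc:.2f}, acceptance={accept:.2f}")
```

Sparser draws are more likely to be PD, so the accepted draws carry fewer edges on average. This is the shift of prior mass toward sparser structures, and it happens before any data enter.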
If this is right
- Sparse inference procedures that rely on these priors will report higher posterior probabilities for sparse matrices than the modeler may have intended.
- Shrinkage and regularization effects become harder to interpret unless prior variances are explicitly adjusted as the dimension grows.
- Posterior inference on matrix structure can be made to match the untruncated case more closely by scaling the off-diagonal variance appropriately with dimension.
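The dimension scaling can be motivated by a one-line eigenvalue computation. The sketch below assumes the dense decomposition Θ = µI + σX_k used in the appendix proof of Theorem 1. Here X_k has zero diagonal and i.i.d. off-diagonals; the unit variance for those off-diagonals is our assumption, needed for the semicircle edge to sit at 2. It also writes W_k = X_k/√k.

```latex
% Dense case: \Theta = \mu I + \sigma X_k = \mu I + \sigma\sqrt{k}\,W_k, with \sigma > 0.
\lambda_{\min}(\Theta) = \mu + \sigma\sqrt{k}\,\lambda_{\min}(W_k),
\qquad \lambda_{\min}(W_k) \to -2 \quad (k \to \infty).

% Fixed \sigma: \lambda_{\min}(\Theta) \approx \mu - 2\sigma\sqrt{k} \to -\infty,
% so the untruncated prior puts vanishing mass on the PD cone.

% Scaled \sigma_k = \mu / ((2+\delta_k)\sqrt{k}):
\lambda_{\min}(\Theta) = \mu\left(1 + \frac{\lambda_{\min}(W_k)}{2+\delta_k}\right) > 0
\iff \lambda_{\min}(W_k) > -(2+\delta_k).
```

The smallest eigenvalue of W_k fluctuates around -2 on the scale k^{-2/3}. Any margin δ_k with k^{-2/3} = o(δ_k) therefore makes the PD event's probability tend to 1, which is the condition appearing in the appendix proof of Theorem 1.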
Where Pith is reading between the lines
- Similar truncation effects could appear in other constrained parameter spaces where independence is assumed before truncation, such as correlation matrices or covariance matrices with additional sign restrictions.
- If the goal is to preserve the marginal behavior of each entry, one might instead work directly with priors that are already supported on the positive-definite cone rather than truncating after the fact.
Load-bearing premise
That the untruncated version with independent entries already encodes the intended prior behavior, so any systematic change introduced by truncation is a distortion that needs to be corrected.
What would settle it
A simulation or analytic calculation for growing matrix dimension showing that, after the recommended adjustment of off-diagonal variances, the probability mass assigned to sparse versus dense structures becomes statistically indistinguishable between the truncated and untruncated priors.
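A first ingredient of such a simulation, covering the dense case only, is sketched below. It estimates the acceptance probability c = P(Θ is PD) of the truncation under the untruncated prior. It compares a fixed off-diagonal scale with the scaling σ_k = µ/((2+δ)√k) quoted in the appendix proof of Theorem 1. The sketch makes three assumptions: Gaussian off-diagonals, a diagonal fixed at µ, and a constant margin δ, which trivially satisfies k^{-2/3} = o(δ_k). The helper `pd_probability` is a hypothetical name.

```python
import numpy as np

def pd_probability(k, mu, sigma, n_rep, rng):
    """Monte Carlo estimate of c = P(Theta is PD) under the assumed dense
    untruncated prior Theta = mu*I + sigma*X_k with Gaussian off-diagonals."""
    hits = 0
    for _ in range(n_rep):
        upper = np.triu(rng.standard_normal((k, k)), 1)
        x = upper + upper.T                 # zero diagonal, unit-variance off-diagonals
        theta = mu * np.eye(k) + sigma * x
        hits += int(np.linalg.eigvalsh(theta)[0] > 0)
    return hits / n_rep

rng = np.random.default_rng(1)
mu, delta, sigma_fixed = 1.0, 0.5, 0.2
dims = [10, 40, 160]
c_fixed = [pd_probability(k, mu, sigma_fixed, 200, rng) for k in dims]
# Dimension-dependent scaling sigma_k = mu / ((2 + delta) * sqrt(k)).
c_scaled = [pd_probability(k, mu, mu / ((2 + delta) * np.sqrt(k)), 200, rng) for k in dims]
for k, cf, cs in zip(dims, c_fixed, c_scaled):
    print(f"k={k:4d}  c(fixed sigma)={cf:.2f}  c(scaled sigma)={cs:.2f}")
```

With a fixed scale, c collapses toward 0 as k grows, so the truncation dominates the prior. With the scaled variance, c stays near 1. A full check of the claim would repeat this comparison separately for sparse and dense sparsity patterns.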
Abstract
A popular class of priors for symmetric positive-definite matrices assumes independent entries and adds a truncation to ensure positive-definiteness. While conceptually simple and often computationally convenient, unless done carefully this truncation can have unintended effects. If the truncated prior or its margins are significantly different from their untruncated counterpart, then its interpretability may suffer, its shrinkage properties become harder to characterise, and posterior inference may be affected in unanticipated ways. We investigate the effect of the truncation both for dense and sparse matrices, and show how to set prior parameters such as the variance of off-diagonal entries such that said effect is mitigated as the matrix dimension grows. We pay particular attention to sparse inference where, unless prior parameters are set carefully, the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures than the untruncated prior.
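One standard way to quantify how "significantly different" the truncated prior is from its untruncated counterpart is total variation. Write p for the untruncated density on the symmetric matrices S, S^+ for the PD cone, and c for the mass p assigns to S^+. Assuming the truncated prior is p restricted to S^+ and renormalised (notation ours), the distance reduces to a single number.

```latex
% p: untruncated density on symmetric matrices S; S^+ \subset S: the PD cone.
c = \int_{S^+} p(\Theta)\,d\Theta, \qquad
p^+(\Theta) = \frac{p(\Theta)\,\mathbf{1}\{\Theta \in S^+\}}{c}.

% On S^+, p^+ - p = p\,(1/c - 1) \ge 0; off S^+, p^+ - p = -p \le 0.
\mathrm{TV}(p, p^+) = \sup_{A \subseteq S} \lvert p(A) - p^+(A) \rvert
= p^+(S^+) - p(S^+) = 1 - c.
```

So the truncation is negligible in total variation exactly when the untruncated prior already places almost all its mass on PD matrices, i.e. when c is close to 1.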
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines separable priors for symmetric positive-definite matrices that assume independent entries and apply truncation to enforce positive-definiteness. It argues that this truncation can distort the prior (and thus the posterior) relative to the untruncated version, causing the truncated prior to assign systematically higher mass to sparser structures unless parameters such as the variance of off-diagonal entries are chosen carefully; the authors investigate this for both dense and sparse regimes and provide guidance on parameter scaling to mitigate the distortion as matrix dimension grows.
Significance. If the central derivations and any accompanying simulations hold, the work is significant for Bayesian covariance estimation and Gaussian graphical modeling, where separable priors are widely used. It clarifies interpretability and shrinkage issues that arise from truncation and supplies concrete parameter-setting rules that could improve prior elicitation and posterior behavior in high-dimensional settings. The emphasis on sparse inference is timely given the prevalence of sparsity-inducing models.
major comments (2)
- Abstract and §3 (or equivalent section deriving the posterior effect): the claim that 'the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures' presupposes likelihood neutrality with respect to sparsity. The manuscript must demonstrate this explicitly, for example by deriving the posterior under a standard observation model (Wishart or Gaussian graphical model) or by providing simulations that isolate the prior distortion under realistic data-generating processes; without such evidence the 'hence' step remains unsupported and could reverse under density-correlated likelihoods.
- Section on parameter mitigation (likely §4 or §5): the proposed scaling of the variance of off-diagonal entries to counteract the truncation effect as dimension p grows should be shown to be robust across sparsity levels. If the mitigation is derived under a specific sparsity regime, the manuscript should state the range of validity and provide a counter-example or bound when the assumption is violated.
minor comments (2)
- Notation for the truncated versus untruncated margins should be introduced earlier and used consistently; current usage in the abstract and early sections risks ambiguity when comparing marginal distributions.
- Figures comparing prior mass on sparsity patterns would benefit from explicit axis labels indicating the matrix dimension p and the specific variance value used, to allow readers to reproduce the mitigation effect.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment below, indicating where we will revise the manuscript to strengthen the presentation and support for our claims.
Referee: Abstract and §3 (or equivalent section deriving the posterior effect): the claim that 'the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures' presupposes likelihood neutrality with respect to sparsity. The manuscript must demonstrate this explicitly, for example by deriving the posterior under a standard observation model (Wishart or Gaussian graphical model) or by providing simulations that isolate the prior distortion under realistic data-generating processes; without such evidence the 'hence' step remains unsupported and could reverse under density-correlated likelihoods.
Authors: We agree that the transition from the prior distortion to its effect on the posterior requires explicit justification rather than an implicit assumption of likelihood neutrality. In the revised manuscript we will expand Section 3 to include a short derivation of the posterior under a multivariate Gaussian likelihood (with known mean) that isolates the contribution of the truncated prior. We will also add simulation results under both a Wishart observation model and a sparse Gaussian graphical model to demonstrate that the prior-induced preference for sparser structures persists in the posterior under standard data-generating processes. These additions will directly support the claim in the abstract and main text.
Revision: yes
Referee: Section on parameter mitigation (likely §4 or §5): the proposed scaling of the variance of off-diagonal entries to counteract the truncation effect as dimension p grows should be shown to be robust across sparsity levels. If the mitigation is derived under a specific sparsity regime, the manuscript should state the range of validity and provide a counter-example or bound when the assumption is violated.
Authors: The scaling rules presented in Sections 4 and 5 are derived under both the dense regime (all off-diagonal entries non-zero) and the sparse regime (fixed or slowly growing number of non-zero off-diagonals). We will revise the text to state explicitly the range of validity: the recommended scaling holds when the number of non-zero off-diagonals is o(p^2). In the dense limit the truncation bias vanishes without adjustment, which we already note. We will add a brief theoretical bound on the residual distortion for intermediate sparsity levels together with a simple counter-example (a moderately sparse matrix with sparsity rate p^{-1/2}) showing when the scaling must be further modified. These clarifications will be incorporated in the revised version.
Revision: yes
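The following is a rough numerical check of why sparsity changes the relevant scale. It assumes i.i.d. Bernoulli(η) edge indicators and standard Gaussian weights. Under those assumptions, the off-diagonal part normalised by √(kη) rather than √k has its spectral edge near 2. This √(kη) is the q that appears in the sparse Wigner results of Erdős et al. (2012). Heuristically, µI + σ(Z∘X) then stays PD with high probability only when σ is below roughly µ/(2√(kη)). This is a back-of-envelope reading, not the revised rule promised in the rebuttal, and the helper name is hypothetical.

```python
import numpy as np

def sparse_wigner_edge(k, eta, rng):
    """Largest eigenvalue of a symmetric matrix with i.i.d. upper entries
    z_ij * g_ij / sqrt(k * eta), z_ij ~ Bernoulli(eta), g_ij ~ N(0, 1).
    Each entry then has variance 1/k, so the semicircle edge sits at 2."""
    mask = np.triu(rng.random((k, k)) < eta, 1)
    upper = np.where(mask, rng.standard_normal((k, k)), 0.0)
    w = (upper + upper.T) / np.sqrt(k * eta)
    return np.linalg.eigvalsh(w)[-1]

rng = np.random.default_rng(2)
k = 400
edges = {eta: sparse_wigner_edge(k, eta, rng) for eta in (1.0, 0.2, 0.05)}
for eta, lam in edges.items():
    print(f"eta={eta:4.2f}  q=sqrt(k*eta)={np.sqrt(k * eta):5.1f}  lambda_max={lam:.2f}")
```

The Gaussian weights are symmetric, so λ_min of the same matrix has the same distribution as -λ_max. The edge at 2 therefore translates directly into the PD threshold above.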
Circularity Check
No significant circularity detected; analysis remains self-contained
The paper examines truncation effects on separable priors for positive-definite matrices and provides guidance on parameter choice to mitigate interpretability and inference issues as dimension grows. No equations or claims in the provided abstract reduce a derived result to a fitted input, self-definition, or load-bearing self-citation chain. The central statements about mass assignment to sparse structures are presented as consequences of the truncation mechanism itself rather than as predictions forced by prior fitting or renaming. The derivation chain is independent of the target conclusions and does not collapse by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- variance of off-diagonal entries
axioms (1)
- Domain assumption: entries of the matrix are independent before the positive-definiteness truncation is applied.
Reference graph
Works this paper leans on
- [1] doi: 10.1016/j.jmva.2015.01.015 (ISSN 1095-7243)
  S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, Oxford,
- [2] doi: 10.1093/biostatistics/kxm045 (ISSN 1465-4644)
  Lingrui Gan, Naveen N. Narisetty, and Feng Liang. Bayesian Regularization for Graphical Models With Unequal Shrinkage. Journal of the American Statistical Association, 114(527):1218–1231,
- [3] L., Athanasopoulos, G., and Hyndman, R
  doi: 10.1080/01621459.2018.1482755 (ISSN 1537-274X). Jack Jewson, Li Li, Laura Battaglia, Stephen Hansen, David Rossell, and Piotr Zwiernik. Graphical model inference with external network data. Biometrics, 80(4):ujae151,
- [4] Steffen Lauritzen and Piotr Zwiernik. Locally associated graphical models and mixed convex exponential families. arXiv, 2008.04688:1–34,
- [5] Deborah Sulem, Jack Jewson, and David Rossell. Bayesian computation for high-dimensional Gaussian graphical models with spike-and-slab priors. arXiv, 2511.01875:1–139,
- [6] doi: 10.1214/12-BA729 (ISSN 1936-0975)
  Hao Wang. Scaling it up: Stochastic search structure learning in graphical models. Bayesian Analysis, 10(2):351–377,
- [7] doi: 10.1214/14-BA916 (ISSN 1931-6690)
  Appendix A (Section 1 proofs), A.1 Proof of Proposition 1. Let $S$ be the set of symmetric matrices and $S^+$ the set of PD matrices. The TV distance is given by $\mathrm{TV}(p, p^+) = \sup_{A \subseteq S} \lvert p(A) - p^+(A) \rvert$. The supremum is achieved by taking any $A$ such that $\{\Theta : p(\Theta) < p^+(\Theta)\} \subseteq A \subseteq \{\Theta : p(\Theta) \le p^+(\Theta)\}$, provided that $A$ is measurable. If $\Theta$...
- [8] B.3 Proof of Theorem 1. We decompose $\Theta$ as $\Theta = \mu I + \sigma X_k$, where $X_k$ has zero diagonal and i.i.d. off-diagonals with density $\pi$. Standard Wigner matrix theory shows that $W_k = X_k/\sqrt{k} = (\Theta - \mu I)/(\sigma\sqrt{k})$ has eigenvalues converging to the semicircle distribution and, in particular, has minimum eigenvalue $\lambda_{\min}(W_k) \to -2$ with probability 1 as $k \to \infty$ (Bai and Yin, 1988). Since $\lambda_{\min}(\Theta)$ ...
- [9] This follows from the Wigner matrix theory in Lee and Yin (2014), which shows that deviations of the smallest eigenvalue of $\tilde{\Theta}$ from $-2$ are of order $k^{-2/3}$ in probability. It follows that we still have $\lim_{k \to \infty} c = 1$ if one sets any $\sigma = \mu / ((2 + \delta_k)\sqrt{k})$ such that $k^{-2/3} = o(\delta_k)$. B.4 Proof of Theorem 2. We decompose $\Theta$ as $\Theta = D + \sigma X_k = D + \sigma\sqrt{k}\,W_k$, where $D$ is the diagonal of...
- [10] From this, any other $z'$ such that $z' \ge z$ entrywise follows by induction. To ease notation, let $l_z(\Theta)$ and $l_{z'}(\Theta)$ be random variables with distributions equal to the conditional distributions of $\lambda_{\min}(\Theta) \mid Z = z$ and $\lambda_{\min}(\Theta) \mid Z = z'$, respectively. The goal is to show that $E[l_z(\Theta)] \ge E[l_{z'}(\Theta)]$. The proof strategy is to express $l_z(\Theta)$ and $l_{z'}(\Theta)$ as infima over a (conditiona...
- [11] To apply Theorem 4, we need to find $\nu = \lVert E(W^2) \rVert = \bigl\lVert \sum_{i>j} E\bigl[(z_{ij}\theta_{ij}(E_{ij} + E_{ji}))^2\bigr] \bigr\rVert = \bigl\lVert \sum_{i>j} z_{ij}(E_{ii} + E_{jj})\,E(\theta_{ij}^2) \bigr\rVert$, where we used that the $z_{ij}\theta_{ij}(E_{ij} + E_{ji})$ are independent, that $z_{ij}^2 = z_{ij}$, that simple algebra shows $(E_{ij} + E_{ji})^2 = E_{ii} + E_{jj}$, and that for any set of independent, zero-mean random matrices $A_1, \dots, A_n$ it holds that $E\bigl[(\sum_{i=1}^{n} A_i)^2\bigr] = E$ ...
- [12] C.8 Proof of Corollaries 5-6. Erdős et al. (2012), Theorem 2.7, shows that deviations of the maximum eigenvalue of a sparse Wigner matrix from 2 are of order $k^{-2/3}$. In particular, the maximum eigenvalue converges to 2 as $k \to \infty$. Note the condition $q > N^{1/3}$, where $q = \sqrt{k\eta_k}$, which corresponds to the condition $k^{-1/3} = o(\eta_k)$. The proof then follows directly...
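The variance proxy ν in the fragment of entry [11] has a simple closed form. For i ≠ j we have (E_ij + E_ji)^2 = E_ii + E_jj. The sum of z_ij(E_ii + E_jj) over i > j is therefore the diagonal matrix of node degrees of the graph z. If the θ_ij share a common second moment (our simplifying assumption), then ν = E(θ^2) times the maximum degree. The sketch below checks both facts numerically for one random graph. The helper `unit` and the value `m2` are hypothetical.

```python
import numpy as np

def unit(k, i, j):
    """Matrix unit E_ij: a k-by-k zero matrix with a 1 in position (i, j)."""
    e = np.zeros((k, k))
    e[i, j] = 1.0
    return e

rng = np.random.default_rng(3)
k, m2 = 6, 0.7                       # m2 plays the role of a common E(theta_ij^2)
z = np.triu(rng.random((k, k)) < 0.4, 1).astype(int)   # z[j, i] = z_ij for i > j

# Identity used in the proof: (E_ij + E_ji)^2 = E_ii + E_jj for i != j.
for i in range(k):
    for j in range(i):
        s = unit(k, i, j) + unit(k, j, i)
        assert np.allclose(s @ s, unit(k, i, i) + unit(k, j, j))

# nu = || sum_{i>j} z_ij (E_ii + E_jj) E(theta_ij^2) ||  (spectral norm).
total = sum((z[j, i] * (unit(k, i, i) + unit(k, j, j)) * m2
             for i in range(k) for j in range(i)), np.zeros((k, k)))
nu = np.linalg.norm(total, 2)
degrees = (z + z.T).sum(axis=1)
print("nu =", nu, " m2 * max degree =", m2 * degrees.max())
```

So, for a fixed graph, ν depends on the sparsity pattern only through its maximum degree. This is one concrete way sparsity enters the eigenvalue bounds.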