Positive-definiteness in separable priors: effects on prior interpretability and inference

David Rossell; Jack Storror Carter

arxiv: 2605.22640 · v1 · pith:EUXDLDC2new · submitted 2026-05-21 · 📊 stat.ME

Pith reviewed 2026-05-22 03:44 UTC · model grok-4.3

classification 📊 stat.ME

keywords: positive definite matrices · separable priors · truncation effects · sparse Bayesian inference · prior interpretability · matrix shrinkage

The pith

Truncation to enforce positive-definiteness on separable matrix priors can unintentionally shift mass toward sparser structures in both the prior and posterior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines priors for symmetric positive-definite matrices that begin with independent entries and then truncate to enforce positive-definiteness. It shows that this truncation alters the distribution relative to the untruncated version, complicating interpretability and shrinkage properties. For sparse settings the effect is pronounced: the truncated prior and resulting posterior place systematically higher probability on sparser matrices than the untruncated counterpart would. The authors derive parameter choices, such as the variance assigned to off-diagonal entries, that reduce these discrepancies as matrix dimension increases. A sympathetic reader would care because many Bayesian analyses rely on the assumption that the prior behaves as intended before truncation; when it does not, posterior conclusions about sparsity become harder to justify directly from the modeling choices.

Core claim

Unless the variance parameters of the untruncated separable prior are chosen with care, the truncation that enforces positive-definiteness causes the resulting prior (and its induced posterior) to assign higher probability mass to sparser matrix structures than the original untruncated prior would have assigned.
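
The reweighting behind this claim can be written in one line. What follows is a sketch in notation assumed here rather than taken from the paper: $Z$ is the sparsity pattern (which off-diagonal entries are non-zero), $p$ is the untruncated prior, $p^{+}$ is its truncation to positive-definite matrices, $c(Z) = P_p(\Theta \succ 0 \mid Z)$ and $c = P_p(\Theta \succ 0)$.

```latex
p^{+}(Z) \;=\; P_p(Z \mid \Theta \succ 0)
        \;=\; \frac{p(Z)\, c(Z)}{c},
\qquad
c \;=\; \sum_{Z'} p(Z')\, c(Z').
```

Truncation therefore multiplies the prior mass of each pattern by $c(Z)/c$. Whenever sparser patterns are more likely to yield a positive-definite matrix, so that $c(Z)$ is larger for sparser $Z$, they gain mass relative to the untruncated prior.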

What carries the argument

Truncation applied to a separable prior whose entries are initially independent, used to restrict support to the cone of positive-definite matrices.

If this is right

Sparse inference procedures that rely on these priors will report higher posterior probabilities for sparse matrices than the modeler may have intended.
Interpretability of shrinkage or regularization effects becomes difficult without explicit adjustment of prior variances as dimension grows.
Posterior inference on matrix structure can be made to match the untruncated case more closely by scaling the off-diagonal variance appropriately with dimension.
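
A heuristic for why the off-diagonal scale has to shrink with dimension, sketched for the simplest case of a $k \times k$ matrix with fixed diagonal $\mu$ and i.i.d. zero-mean off-diagonals with standard deviation $\sigma$. It relies on the standard Wigner-matrix fact that the smallest eigenvalue of the rescaled off-diagonal part $W_k$ tends to $-2$. The symbols follow the paper's figure captions, and the statement is asymptotic, not a finite-$k$ guarantee.

```latex
\Theta = \mu I + \sigma \sqrt{k}\, W_k,
\qquad
\lambda_{\min}(\Theta) = \mu + \sigma \sqrt{k}\, \lambda_{\min}(W_k)
\;\approx\; \mu - 2 \sigma \sqrt{k}
\quad \text{for large } k .

\sigma = \frac{\mu}{(2 + \delta_k)\sqrt{k}}
\;\Longrightarrow\;
\lambda_{\min}(\Theta) \;\approx\; \frac{\mu\, \delta_k}{2 + \delta_k},
\quad \text{which is positive when } \delta_k > 0 .
```

Under this reading, $\sigma$ of order $k^{-1/2}$ with $\delta_k > 0$ keeps the untruncated draw positive-definite with high probability, so the truncation rarely bites.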

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar truncation effects could appear in other constrained parameter spaces where independence is assumed before projection, such as correlation matrices or covariance matrices with additional sign restrictions.
If the goal is to preserve the marginal behavior of each entry, one might instead work directly with priors that are already supported on the positive-definite cone rather than truncating after the fact.

Load-bearing premise

That the untruncated version with independent entries already encodes the intended prior behavior, so any systematic change introduced by truncation is a distortion that needs to be corrected.

What would settle it

A simulation or analytic calculation for growing matrix dimension showing that, after the recommended adjustment of off-diagonal variances, the probability mass assigned to sparse versus dense structures becomes statistically indistinguishable between the truncated and untruncated priors.
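
A small simulation of this kind can be sketched as follows. It is a minimal illustration under assumptions that are not taken from the paper: unit diagonal, each off-diagonal non-zero with probability `eta` and then Gaussian with standard deviation `sigma`, and truncation implemented by rejection. The function names are hypothetical. It compares the mean number of non-zero off-diagonals before and after truncation.

```python
import numpy as np

def sample_sparse_theta(k, eta, sigma, rng):
    """Draw a symmetric k x k matrix with unit diagonal.

    Each upper-triangular entry is non-zero with probability eta
    (an assumed inclusion probability) and then N(0, sigma^2).
    Returns the matrix and its number of non-zero off-diagonal pairs.
    """
    mask = np.triu(rng.random((k, k)) < eta, k=1)
    vals = np.triu(rng.normal(0.0, sigma, size=(k, k)), k=1) * mask
    return vals + vals.T + np.eye(k), int(mask.sum())

def mean_edges(k, eta, sigma, n_draws=2000, seed=0):
    """Mean edge count under the untruncated prior and under its
    truncation to positive-definite matrices (via rejection),
    plus the acceptance rate, which estimates c = P(Theta > 0)."""
    rng = np.random.default_rng(seed)
    edges = np.empty(n_draws)
    accepted = np.zeros(n_draws, dtype=bool)
    for i in range(n_draws):
        theta, edges[i] = sample_sparse_theta(k, eta, sigma, rng)
        # eigvalsh returns eigenvalues in ascending order
        accepted[i] = np.linalg.eigvalsh(theta)[0] > 0
    return edges.mean(), edges[accepted].mean(), accepted.mean()
```

Calling `mean_edges(20, 0.5, 0.17)` returns the untruncated mean, the truncated mean and the acceptance rate. Repeating the call over a grid of `k` with `sigma` scaled as the paper recommends would be one way to check whether the gap between the two means closes.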

Figures

Figures reproduced from arXiv: 2605.22640 by David Rossell, Jack Storror Carter.

**Figure 1.** Monte Carlo estimate of c = P(Θ ≻ 0) for fixed unit diagonal and Gaussian off-diagonals, for different δ_k = μ/(σ√k) − 2. Left: fixed δ_k (0.1, 0.05, 0, −0.05, −0.1). Right: δ_k varying with k (k^(−1/2), k^(−2/3), k^(−1), k^(−2), 0). Axes: k from 0 to 200 (horizontal), c (vertical).

**Figure 2.** Monte Carlo estimate of c = P(Θ ≻ 0) for θ_ii ∼ Exp(1) (left) and θ_ii ∼ Gamma(2, 2) (right), with Gaussian off-diagonals of different standard deviations σ. Left: σ = k^(−2), k^(−3/2), k^(−5/4), k^(−9/8), k^(−1). Right: σ = k^(−5/4), k^(−9/8), k^(−1), k^(−7/8), k^(−3/4). Axes: k from 0 to 200 (horizontal), c from 0 to 1 (vertical).

**Figure 3.** Monte Carlo estimate of c = P(Θ ≻ 0) in the sparse case, for fixed θ_ii = 1 and sparse Gaussian off-diagonals, at different sparsity levels η_k = P(θ_ij ≠ 0). Panels: η_k = 0.5 k^(−1/4) and η_k = 0.5 k^(−1/2). Curves: δ_k = 0.1, 0.05, 0, −0.05, −0.1. Axes: k from 0 to 200 (horizontal), c from 0 to 1 (vertical).
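
The quantity plotted in these figures can be approximated with a few lines of NumPy. The sketch below is not the authors' code. It assumes Gaussian off-diagonals with standard deviation `sigma` and either a fixed diagonal `mu` or Exp(1) diagonal entries, and the helper names `sample_theta`, `estimate_c` and `sigma_from_delta` are made up for illustration. The mapping from δ_k to σ inverts the relation δ_k = μ/(σ√k) − 2 from the Figure 1 caption.

```python
import numpy as np

def sample_theta(k, sigma, rng, diag="fixed", mu=1.0):
    """Symmetric k x k matrix with i.i.d. N(0, sigma^2) off-diagonals.

    diag="fixed": every diagonal entry equals mu.
    diag="exp":   diagonal entries are i.i.d. Exp(1).
    """
    upper = np.triu(rng.normal(0.0, sigma, size=(k, k)), k=1)
    if diag == "fixed":
        d = np.full(k, mu)
    elif diag == "exp":
        d = rng.exponential(1.0, size=k)
    else:
        raise ValueError("diag must be 'fixed' or 'exp'")
    return upper + upper.T + np.diag(d)

def estimate_c(k, sigma, diag="fixed", mu=1.0, n_draws=200, seed=0):
    """Monte Carlo estimate of c = P(Theta is positive-definite)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_draws):
        theta = sample_theta(k, sigma, rng, diag=diag, mu=mu)
        # Positive-definite iff the smallest eigenvalue is positive
        hits += int(np.linalg.eigvalsh(theta)[0] > 0)
    return hits / n_draws

def sigma_from_delta(k, delta, mu=1.0):
    """Invert delta = mu / (sigma * sqrt(k)) - 2 for sigma."""
    return mu / ((2.0 + delta) * np.sqrt(k))
```

For example, `estimate_c(50, sigma_from_delta(50, 1.0))` uses a clearly positive δ and `estimate_c(50, sigma_from_delta(50, -1.0))` a clearly negative one, while `estimate_c(30, 30.0 ** -2, diag="exp")` switches to random diagonals. The number of draws is kept small for speed, so the estimates are noisy.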

Original abstract

A popular class of priors for symmetric positive-definite matrices assumes independent entries and adds a truncation to ensure positive-definiteness. While conceptually simple and often computationally convenient, unless done carefully this truncation can have unintended effects. If the truncated prior or its margins are significantly different from their untruncated counterpart, then its interpretability may suffer, its shrinkage properties become harder to characterise, and posterior inference may be affected in unanticipated ways. We investigate the effect of the truncation both for dense and sparse matrices, and show how to set prior parameters such as the variance of off-diagonal entries such that said effect is mitigated as the matrix dimension grows. We pay particular attention to sparse inference where, unless prior parameters are set carefully, the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures than the untruncated prior.
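
One way to quantify how far the truncated prior is from its untruncated counterpart, offered as a sketch: if $p^{+}$ is $p$ restricted to the set $\mathcal{S}^{+}$ of positive-definite matrices and renormalised by $c = P_p(\Theta \succ 0)$, the total variation distance between the two has a closed form. The notation $p$, $p^{+}$, $\mathcal{S}^{+}$ is assumed here.

```latex
p^{+}(\Theta) = \frac{p(\Theta)\, \mathbf{1}\{\Theta \in \mathcal{S}^{+}\}}{c},
\qquad
\mathrm{TV}(p, p^{+}) = \sup_{A} \lvert p(A) - p^{+}(A) \rvert
= p^{+}(\mathcal{S}^{+}) - p(\mathcal{S}^{+}) = 1 - c .
```

The supremum is attained at $A = \mathcal{S}^{+}$, since $p^{+}$ inflates every subset of $\mathcal{S}^{+}$ by the factor $1/c$ and puts no mass outside it. The two priors are therefore close in total variation exactly when $c$ is close to 1, which is why the figures track Monte Carlo estimates of $c$.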

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Truncation to enforce positive definiteness in separable priors quietly biases both prior and posterior toward sparser matrices unless off-diagonal variance is scaled with dimension.

The core observation is that adding a truncation step to an independent-entry prior on a symmetric positive-definite matrix can shift probability mass toward sparser structures, and that this shift persists into the posterior unless the prior variance on off-diagonals is adjusted as dimension grows. The authors examine the effect in both dense and sparse regimes and supply explicit scaling rules to keep the distortion small for large p. That guidance is the practical takeaway for anyone coding these priors for covariance or precision matrices.

The work is useful because it treats a common implementation choice as something that needs explicit control rather than assuming the truncation is harmless. The scaling advice looks straightforward to apply and addresses a real gap between the conceptual prior and what actually gets used in computation.

The main soft spot is the direct claim that the posterior inherits the same sparsity bias. The abstract moves from prior distortion to posterior effect without showing that the likelihood term is neutral or that simulations isolate the prior contribution under realistic data models. If the observation model itself favors denser matrices, the net posterior shift could be smaller than stated. The paper assumes the untruncated independent prior is the intended target; that is reasonable for the cases considered but worth noting as a modeling choice.

This note is aimed at people already working with separable priors in high-dimensional Bayesian settings, particularly sparse graphical models or covariance estimation. A reader who has seen unexpectedly strong sparsity in their results will get immediate value from the parameter rules. The paper deserves a serious referee because the issue is concrete, the proposed fix is usable, and the underlying math on truncation effects appears solid even if the posterior step needs tighter support.

Referee Report

2 major / 2 minor

Summary. The manuscript examines separable priors for symmetric positive-definite matrices that assume independent entries and apply truncation to enforce positive-definiteness. It argues that this truncation can distort the prior (and thus the posterior) relative to the untruncated version, causing the truncated prior to assign systematically higher mass to sparser structures unless parameters such as the variance of off-diagonal entries are chosen carefully; the authors investigate this for both dense and sparse regimes and provide guidance on parameter scaling to mitigate the distortion as matrix dimension grows.

Significance. If the central derivations and any accompanying simulations hold, the work is significant for Bayesian covariance estimation and Gaussian graphical modeling, where separable priors are widely used. It clarifies interpretability and shrinkage issues that arise from truncation and supplies concrete parameter-setting rules that could improve prior elicitation and posterior behavior in high-dimensional settings. The emphasis on sparse inference is timely given the prevalence of sparsity-inducing models.

major comments (2)

Abstract and §3 (or equivalent section deriving the posterior effect): the claim that 'the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures' presupposes likelihood neutrality with respect to sparsity. The manuscript must demonstrate this explicitly, for example by deriving the posterior under a standard observation model (Wishart or Gaussian graphical model) or by providing simulations that isolate the prior distortion under realistic data-generating processes; without such evidence the 'hence' step remains unsupported and could reverse under density-correlated likelihoods.
Section on parameter mitigation (likely §4 or §5): the proposed scaling of the variance of off-diagonal entries to counteract the truncation effect as dimension p grows should be shown to be robust across sparsity levels. If the mitigation is derived under a specific sparsity regime, the manuscript should state the range of validity and provide a counter-example or bound when the assumption is violated.

minor comments (2)

Notation for the truncated versus untruncated margins should be introduced earlier and used consistently; current usage in the abstract and early sections risks ambiguity when comparing marginal distributions.
Figures comparing prior mass on sparsity patterns would benefit from explicit axis labels indicating the matrix dimension p and the specific variance value used, to allow readers to reproduce the mitigation effect.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below, indicating where we will revise the manuscript to strengthen the presentation and support for our claims.

Referee: Abstract and §3 (or equivalent section deriving the posterior effect): the claim that 'the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures' presupposes likelihood neutrality with respect to sparsity. The manuscript must demonstrate this explicitly, for example by deriving the posterior under a standard observation model (Wishart or Gaussian graphical model) or by providing simulations that isolate the prior distortion under realistic data-generating processes; without such evidence the 'hence' step remains unsupported and could reverse under density-correlated likelihoods.

Authors: We agree that the transition from the prior distortion to its effect on the posterior requires explicit justification rather than an implicit assumption of likelihood neutrality. In the revised manuscript we will expand Section 3 to include a short derivation of the posterior under a multivariate Gaussian likelihood (with known mean) that isolates the contribution of the truncated prior. We will also add simulation results under both a Wishart observation model and a sparse Gaussian graphical model to demonstrate that the prior-induced preference for sparser structures persists in the posterior under standard data-generating processes. These additions will directly support the claim in the abstract and main text.

Revision: yes

Referee: Section on parameter mitigation (likely §4 or §5): the proposed scaling of the variance of off-diagonal entries to counteract the truncation effect as dimension p grows should be shown to be robust across sparsity levels. If the mitigation is derived under a specific sparsity regime, the manuscript should state the range of validity and provide a counter-example or bound when the assumption is violated.

Authors: The scaling rules presented in Sections 4 and 5 are derived under both the dense regime (all off-diagonal entries non-zero) and the sparse regime (fixed or slowly growing number of non-zero off-diagonals). We will revise the text to state explicitly the range of validity: the recommended scaling holds when the number of non-zero off-diagonals is o(p^2). In the dense limit the truncation bias vanishes without adjustment, which we already note. We will add a brief theoretical bound on the residual distortion for intermediate sparsity levels together with a simple counter-example (a moderately sparse matrix with sparsity rate p^{-1/2}) showing when the scaling must be further modified. These clarifications will be incorporated in the revised version.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; analysis remains self-contained

The paper examines truncation effects on separable priors for positive-definite matrices and provides guidance on parameter choice to mitigate interpretability and inference issues as dimension grows. No equations or claims in the provided abstract reduce a derived result to a fitted input, self-definition, or load-bearing self-citation chain. The central statements about mass assignment to sparse structures are presented as consequences of the truncation mechanism itself rather than as predictions forced by prior fitting or renaming. The derivation chain is independent of the target conclusions and does not collapse by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard construction of separable priors via independent entries plus truncation; main adjustable element is off-diagonal variance.

free parameters (1)

variance of off-diagonal entries
Key parameter highlighted for careful setting to reduce truncation effects as matrix dimension increases.

axioms (1)

domain assumption: Entries of the matrix are independent before applying the positive-definiteness truncation
Core assumption in the separable prior class described in the abstract.

pith-pipeline@v0.9.0 · 5666 in / 1299 out tokens · 53074 ms · 2026-05-22T03:44:28.401713+00:00 · methodology
