Posterior Bayesian Neural Networks with Dependent Weights
Pith reviewed 2026-05-19 02:43 UTC · model grok-4.3
The pith
If the random covariance is positive definite, the posterior of wide Bayesian neural network outputs is identified.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
If the random covariance matrix of the infinite-width limit is positive definite under the prior, the posterior distribution of the output is identified in the wide-width limit according to a sequential regime. Mild sufficient conditions ensure the invertibility of this matrix under the prior. Sufficient conditions on the activation function and associated Lévy measures ensure the sequential limits are independent of order.
What carries the argument
The random covariance matrix of the infinite-width Gaussian mixture limit, whose positive definiteness under the prior enables identification of the posterior.
If this is right
- The output posterior becomes identifiable in terms of the prior Gaussian mixture when the covariance condition holds.
- The sequential limit does not depend on the order in which the layers are widened, under suitable conditions on the activation function and Lévy measures.
- The invertibility results extend to networks with dependent and heavy-tailed weights.
- Numerical simulations illustrate the posterior identification in concrete cases.
Where Pith is reading between the lines
- This identification could support more reliable uncertainty estimates when using dependent weight priors in large models.
- Dependent weights may better capture data correlations while keeping wide-limit analysis tractable.
- Checking positive definiteness on finite but large networks could serve as a practical test of the limit result.
- Order independence suggests that width scaling order need not affect theoretical predictions in many cases.
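The finite-width check suggested in the list above can be sketched as follows. This is a minimal illustration, not the paper's construction: the one-hidden-layer architecture, the Pareto law for the per-node variances, and the helper names `empirical_covariance` and `min_eigenvalue` are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def empirical_covariance(X, width, rng, tail_index=1.5):
    """Finite-width stand-in for the random covariance of a one-hidden-layer net.

    Hypothetical prior: hidden node j has N(0, I/d) input weights and a
    heavy-tailed (Pareto) variance lam_j, normalized so the variances sum to one.
    """
    n, d = X.shape
    W = rng.normal(size=(d, width)) / np.sqrt(d)    # input-to-hidden weights
    H = relu(X @ W)                                  # (n, width) post-activations
    lam = rng.pareto(tail_index, size=width) + 1.0   # heavy-tailed per-node variances
    lam /= lam.sum()
    return (H * lam) @ H.T                           # sum_j lam_j * phi(h_j) phi(h_j)^T

def min_eigenvalue(K):
    # eigvalsh is the symmetric eigenvalue solver; the smallest value decides PD-ness.
    return float(np.linalg.eigvalsh(K).min())

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                          # 5 inputs in R^3
min_eigs = [min_eigenvalue(empirical_covariance(X, 2000, rng)) for _ in range(20)]
print("smallest eigenvalue over 20 prior draws:", min(min_eigs))
```

A strictly positive minimum over many prior draws is only numerical evidence that the covariance is positive definite with high prior probability; it does not replace the sufficient conditions stated in the paper.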
Load-bearing premise
The random covariance matrix of the infinite-width limit is positive definite under the prior.
What would settle it
A computation showing the random covariance matrix fails to be positive definite under the prior for some activation or Lévy measure, so that the posterior cannot be identified.
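As a toy illustration of what such a failure looks like, consider an identity activation, for which the Gaussian-limit kernel over inputs in R^d is proportional to the Gram matrix X Xᵀ / d and so has rank at most d. The sketch below assumes a single random heavy-tailed scale (a hypothetical stand-in for the random part of the covariance); whether the paper's conditions admit or exclude this activation is not stated on this page.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2))              # 4 inputs in R^2: more inputs than dimensions

K_linear = X @ X.T / X.shape[1]          # identity-activation kernel, rank <= 2
scale = rng.pareto(1.5) + 1.0            # hypothetical random heavy-tailed scale
K_random = scale * K_linear              # rescaling cannot repair a singular matrix

eigs = np.linalg.eigvalsh(K_random)      # ascending eigenvalues
rank = np.linalg.matrix_rank(K_random)
print("rank:", rank, "eigenvalues:", eigs)
```

Here the covariance is singular for every draw of the scale, so positive definiteness fails with prior probability one rather than on a negligible set.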
Original abstract
We consider fully connected and feedforward deep neural networks with dependent and possibly heavy-tailed weights, as introduced in [26], to address limitations of the standard Gaussian prior. It has been proved in [26] that, as the number of nodes in the hidden layers grows large, according to a sequential and ordered limit, the law of the output converges weakly to a Gaussian mixture. In this paper, we study the neural network through the lens of the posterior distribution with a Gaussian likelihood. If the random covariance matrix of the infinite-width limit is positive definite under the prior, we identify the posterior distribution of the output in the wide-width limit according to a sequential regime. Remarkably, we provide mild sufficient conditions to ensure the aforementioned invertibility of the random covariance matrix under the prior, thereby extending the results in [8]. Among our results, we present sufficient conditions on some model parameters (the activation function and the associated Lévy measures) which ensure that the sequential limits are independent of the order. We illustrate our findings with examples and numerical simulations.
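To make the combination of a Gaussian-mixture prior and a Gaussian likelihood concrete, here is a sketch of the textbook conjugate computation: conditional on a covariance draw K, the output at the training inputs is N(0, K), each component is updated by standard Gaussian conditioning, and the mixture weights are reweighted by the marginal likelihood N(y; 0, K + σ²I). This is not the paper's Theorem 4.1, whose exact statement is not reproduced on this page; the Pareto-scaled covariance draws and the names `log_marginal`, `component_posterior` and `mixture_posterior` are assumptions for illustration.

```python
import numpy as np

def log_marginal(y, K, noise_var):
    """log N(y; 0, K + noise_var * I), computed from a Cholesky factor."""
    n = y.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L, y)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def component_posterior(y, K, noise_var):
    """Gaussian conditioning for one mixture component (K symmetric)."""
    n = y.shape[0]
    A = np.linalg.solve(K + noise_var * np.eye(n), K)   # (K + s I)^{-1} K
    mean = A.T @ y                                       # K (K + s I)^{-1} y
    cov = K - K @ A                                      # K - K (K + s I)^{-1} K
    return mean, cov

def mixture_posterior(y, K_draws, noise_var):
    """Self-normalized weights over prior draws of K, plus per-component posteriors."""
    log_w = np.array([log_marginal(y, K, noise_var) for K in K_draws])
    w = np.exp(log_w - log_w.max())                      # subtract max for stability
    w /= w.sum()
    comps = [component_posterior(y, K, noise_var) for K in K_draws]
    return w, comps

rng = np.random.default_rng(0)
base = np.array([[1.0, 0.5], [0.5, 1.0]])                # toy positive definite kernel
K_draws = [s * base for s in rng.pareto(1.5, size=200) + 1.0]  # hypothetical prior draws
y = np.array([0.3, -0.1])
weights, comps = mixture_posterior(y, K_draws, noise_var=0.1)
post_mean = sum(w * m for w, (m, _) in zip(weights, comps))
print("posterior mean at the training inputs:", post_mean)
```

The reweighting step is what makes the posterior differ from a plain Gaussian-process update: covariance draws that explain the observations better receive more posterior mass.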
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines fully connected feedforward Bayesian neural networks with dependent, possibly heavy-tailed weights. Building on the weak convergence of the output law to a Gaussian mixture in the sequential infinite-width limit established in [26], the authors show that, conditional on the random covariance matrix of this limit being positive definite under the prior, the posterior distribution of the output can be identified when a Gaussian likelihood is used. They supply mild sufficient conditions on the activation function and associated Lévy measures that guarantee this positive definiteness (extending [8]) and that render the sequential limits independent of layer-order. The results are illustrated with theoretical examples and numerical simulations.
Significance. If the positive-definiteness condition holds for the models under study, the work supplies a concrete characterization of limiting posteriors for BNNs outside the standard Gaussian-prior regime. This could support theoretical analysis of uncertainty quantification and generalization when heavy-tailed or dependent priors are employed. The explicit sufficient conditions and the order-independence result constitute clear technical contributions; the numerical illustrations provide initial empirical grounding.
major comments (1)
- [§4, §5] §4 (Main Results) and §5 (Numerical Experiments): The central identification of the limiting posterior (Theorem 4.1) is explicitly conditional on positive definiteness of the random covariance under the prior. The paper states mild sufficient conditions on the activation and Lévy measure that guarantee this property, yet the numerical examples in §5 do not verify that these conditions are satisfied for the concrete activations (e.g., ReLU) and Lévy measures chosen in the simulations. Because singularity on a set of positive prior probability would invalidate the posterior identification and the subsequent Gaussian-mixture inversion, this verification is load-bearing for the applicability of the claimed results to the reported experiments.
minor comments (3)
- [§2] §2 (Model Setup): The notation for the sequential width limits and the dependence structure induced by the Lévy measure could be clarified with an explicit diagram or additional sentence relating the finite-width covariance to the limiting random measure.
- [Figure 2] Figure 2 caption: The parameters of the Lévy measure and the network depth used in the plotted trajectories should be stated explicitly so that readers can reproduce the positive-definiteness check if desired.
- [Introduction] References: [8] and [26] are central; ensure that the precise statements being extended (e.g., the invertibility result in [8]) are quoted or paraphrased in the introduction for immediate context.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. We appreciate the positive assessment of the significance of our results on posterior identification for BNNs with dependent weights. We address the major comment below and will revise the manuscript accordingly.
- Referee: [§4, §5] §4 (Main Results) and §5 (Numerical Experiments): The central identification of the limiting posterior (Theorem 4.1) is explicitly conditional on positive definiteness of the random covariance under the prior. The paper states mild sufficient conditions on the activation and Lévy measure that guarantee this property, yet the numerical examples in §5 do not verify that these conditions are satisfied for the concrete activations (e.g., ReLU) and Lévy measures chosen in the simulations. Because singularity on a set of positive prior probability would invalidate the posterior identification and the subsequent Gaussian-mixture inversion, this verification is load-bearing for the applicability of the claimed results to the reported experiments.
  Authors: We agree that explicit verification of the positive-definiteness conditions is necessary to ensure the numerical experiments fall within the regime where Theorem 4.1 applies. In the revised manuscript we will add a short subsection (or appendix paragraph) in §5 that checks the sufficient conditions of Theorem 4.2 for each activation function and Lévy measure used in the simulations. For the ReLU examples we will confirm that the associated Lévy measure satisfies the integrability and non-degeneracy requirements that guarantee the random covariance matrix is positive definite almost surely under the prior; analogous checks will be provided for the other activations and measures appearing in the figures. These verifications are straightforward given the explicit criteria already stated in the paper and will not alter any of the theoretical statements.
  Revision: yes
Circularity Check
No circularity; posterior identification is conditional on an explicit assumption with independent sufficient conditions
The derivation begins from the Gaussian-mixture convergence established in prior work [26] and then conditions the posterior identification on positive definiteness of the limiting random covariance. The paper supplies separate mild sufficient conditions on activations and Lévy measures to guarantee this invertibility, extending [8] without re-deriving the same quantities or fitting parameters that are then renamed as predictions. No step equates the target posterior to its inputs by construction, nor does any load-bearing claim reduce solely to an unverified self-citation chain. The sequential-limit results and order-independence conditions are derived from the stated assumptions rather than presupposing the final form.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The random covariance matrix of the infinite-width limit is positive definite under the prior.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
If the random covariance matrix of the infinite-width limit is positive definite under the prior, we identify the posterior distribution of the output in the wide-width limit according to a sequential regime. ... mild sufficient conditions on the activation function and the associated Lévy measures
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
Theorem 4.1 ... $\lim_{n_L\to\infty} \cdots \lim_{n_1\to\infty} Z^{(L+1)}_{B}(x) = G^{(L+1)}(x)$ ... $\mathrm{MG}(0, \mathrm{Id} \otimes K^{(L+1)}(x))$ with Markov chain on positive semi-definite matrices $K^{(\ell)}(x)$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
- [2] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, London, 2000.
- [3] L. Andreis, F. Bassetti, and C. Hirsch. LDP for the covariance process in fully connected neural networks. arXiv:6431605, 2025.
- [4] N. Apollonio, D. De Canditiis, G. Franzina, P. Stolfi, and G. L. Torrisi. Normal approximation of random Gaussian neural networks. Stochastic Systems, 2025. To appear.
- [5] K. Balasubramanian, L. Goldstein, and A. Salim. Gaussian random field approximation via Stein's method with applications to wide random neural networks. Appl. Comput. Harm. Anal., 72, 2024.
- [6] K. Balasubramanian and N. Ross. Finite-dimensional Gaussian approximation for deep neural networks: Universality in random weights. arXiv:2507.12686, 2025.
- [7] A. Basteri and D. Trevisan. Quantitative Gaussian approximation of randomly initialized deep neural networks. Machine Learning, 113:6373–6393, 2024.
- [8] A. Bordino, S. Favaro, and S. Fortini. Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities. In Proceedings of Machine Learning Research (AABI24), 2024.
- [9] V. Cammarota, D. Marinucci, M. Salvi, and S. Vigogna. A quantitative functional central limit theorem for shallow neural networks. Modern Stochastics: Theory and Applications, 11:85–108, 2024.
- [10] F. Caporali, S. Favaro, and D. Trevisan. Student-t processes as infinite-width limits of posterior Bayesian neural networks. arXiv:2502.0427, 2025.
- [11] L. Celli and G. Peccati. Entropic bounds for conditionally Gaussian vectors and application to neural networks. arXiv:2504.08335, 2025.
- [12]
- [13] L. C. Evans and R. F. Gariepy. Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton, Florida, 1992.
- [14]
- [15]
- [16] H. Federer. Geometric Measure Theory. Springer, New York, 1969.
- [17]
- [18] B. Hanin. Random neural networks in the infinite width limit as Gaussian processes. The Annals of Applied Probability, 33:4798–4819, 2023.
- [19] J. Hron, Y. Bahri, R. Novak, J. Pennington, and J. Sohl-Dickstein. Exact posterior distributions of wide Bayesian neural networks. In Workshop on Uncertainty and Robustness in Deep Learning, 2020.
- [20] P. Izmailov, S. Vikram, M. D. Hoffman, and A. G. Wilson. What are Bayesian neural network posteriors really like? In Proceedings of the 38th International Conference on Machine Learning, 2021.
- [21] P. Jung, H. Lee, J. Lee, and H. Yang. α-stable convergence of heavy-tailed infinitely wide neural networks. Advances in Applied Probability, 55:1415–1441, 2023.
- [22] P. Lancaster and M. Tismenetsky. Theory of Matrices: With Applications. San Diego University Press, San Diego, 1985.
- [23] H. Lee, F. Ayed, P. Jung, J. Lee, H. Yang, and F. Caron. Deep neural networks with dependent weights: Gaussian process mixture limit, heavy tails, sparsity and compressibility. Journal of Machine Learning Research, 24:78 pp., 2023.
- [24] J. Lee, Y. Bahri, R. Novak, S. Schoenholz, J. Pennington, and J. Sohl-Dickstein. Deep neural networks as Gaussian processes. In Proceedings of the 6th International Conference on Learning Representations, 2018.
- [25]
- [26] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Probability and Mathematical Statistics. Academic Press, Waltham, 1995.
- [27] C. H. Martin and M. W. Mahoney. Traditional and heavy-tailed self regularization in neural network models. In International Conference on Machine Learning, 2019.
- [28] A. G. D. G. Matthews, J. Hron, M. Rowland, R. E. Turner, and Z. Ghahramani. Gaussian process behaviour in wide deep neural networks. In Proceedings of the 6th International Conference on Learning Representations, 2018.
- [29] S. Mei, A. Montanari, and P.-M. Nguyen. A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences (PNAS), 2018.
- [30] R. M. Neal. Bayesian Learning for Neural Networks. PhD thesis, Department of Computer Science, University of Toronto, 1995.
- [31] R. M. Neal. Priors for infinite networks. In Bayesian Learning for Neural Networks. Lecture Notes in Statistics, volume 118, pages 29–53. Springer, New York, 1996.
- [32] L. Pezzetti, S. Favaro, and S. Peluchetti. Function-space MCMC for Bayesian wide neural networks. arXiv:2408.14325, 2024.
- [33]
- [34] K. Sato. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge, 1999.
- [35]
- [36]
- [37] F. Wenzel, K. Roth, B. S. Veeling, J. Swiatkowski, L. Tran, S. Mandt, J. Snoek, T. Salimans, R. Jenatton, and S. Nowozin. How good is the Bayes posterior in deep neural networks really? In Proceedings of the 37th International Conference on Machine Learning, pages 10248–10259, 2020.