Posterior Bayesian Neural Networks with Dependent Weights

Giovanni Franzina; Giovanni Luca Torrisi; Nicola Apollonio

arxiv: 2507.22095 · v5 · pith:BQ5HSWBHnew · submitted 2025-07-29 · 📊 stat.ML · cs.LG· math.PR

Posterior Bayesian Neural Networks with Dependent Weights

Nicola Apollonio , Giovanni Franzina , Giovanni Luca Torrisi This is my paper

Pith reviewed 2026-05-19 02:43 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.PR

keywords Bayesian neural networksdependent weightsinfinite-width limitGaussian mixtureposterior distributionpositive definite covariancesequential regimeLevy measures

0 comments

The pith

If the random covariance is positive definite, the posterior of wide Bayesian neural network outputs is identified.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes conditions under which the posterior distribution of outputs from deep neural networks with dependent weights can be identified in the infinite-width limit. Building on the convergence of the prior to a Gaussian mixture, the authors show that positive definiteness of the random covariance matrix allows for this identification in a sequential widening regime. Sympathetic readers would care because this provides a theoretical handle on Bayesian inference for networks with more general weight distributions than independent Gaussians. The work also gives conditions on activations and Levy measures to make the results robust to the order of limits.

Core claim

If the random covariance matrix of the infinite-width limit is positive definite under the prior, the posterior distribution of the output is identified in the wide-width limit according to a sequential regime. Mild sufficient conditions ensure the invertibility of this matrix under the prior. Sufficient conditions on the activation function and associated Levy measures ensure the sequential limits are independent of order.

What carries the argument

The random covariance matrix of the infinite-width Gaussian mixture limit, whose positive definiteness under the prior enables identification of the posterior.

If this is right

The output posterior becomes identifiable in terms of the prior Gaussian mixture when the covariance condition holds.
The limit does not depend on widening order under suitable conditions on activations and Levy measures.
The invertibility results extend to networks with dependent and heavy-tailed weights.
Numerical simulations confirm the posterior identification in concrete cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This identification could support more reliable uncertainty estimates when using dependent weight priors in large models.
Dependent weights may better capture data correlations while keeping wide-limit analysis tractable.
Checking positive definiteness on finite but large networks could serve as a practical test of the limit result.
Order independence suggests that width scaling order need not affect theoretical predictions in many cases.

Load-bearing premise

The random covariance matrix of the infinite-width limit is positive definite under the prior.

What would settle it

A computation showing the random covariance matrix fails to be positive definite under the prior for some activation or Levy measure, so that the posterior cannot be identified.

Figures

Figures reproduced from arXiv: 2507.22095 by Giovanni Franzina, Giovanni Luca Torrisi, Nicola Apollonio.

**Figure 1.** Figure 1: Model 1. Distribution function of the marginals of the 3-variate posterior Bayesian neural network for different widths (n = 4, 8, 16, 32) compared to the corresponding marginal distribution functions for the wide width limit. All the parameters are specified in Section 10.1. 10.2 Simulation of the Model 2 We consider the Model 2 specified by: CB := 1, CW := 1, L = 1, n0 := 4, n1 ∈ {2, 4, 8, 16}, n2 := 1, … view at source ↗

**Figure 2.** Figure 2: Model 2. Distribution function of the marginals of the 3-variate posterior Bayesian neural network for different widths (n = 2, 4, 8, 16) compared to the corresponding marginal distribution functions for the wide width limit. All the parameters are specified in Section 10.2. and, as noticed in Appendix E.3.2 of [23], T (1) (j) = 4 ( Pj k=1 Ek) 2 , where {Ek}k≥1 is a sequence of independent random variables… view at source ↗

read the original abstract

We consider fully connected and feedforward deep neural networks with dependent and possibly heavy-tailed weights, as introduced in [26], to address limitations of the standard Gaussian prior. It has been proved in [26] that, as the number of nodes in the hidden layers grows large, according to a sequential and ordered limit, the law of the output converges weakly to a Gaussian mixture. In this paper, we study the neural network through the lens of the posterior distribution with a Gaussian likelihood. If the random covariance matrix of the infinite-width limit is positive definite under the prior, we identify the posterior distribution of the output in the wide-width limit according to a sequential regime. Remarkably, we provide mild sufficient conditions to ensure the aforementioned invertibility of the random covariance matrix under the prior, thereby extending the results in [8]. Among our results, we present sufficient conditions on some model parameters (the activation function and the associated L\'evy measures) which ensure that the sequential limits are independent of the order. We illustrate our findings with examples and numerical simulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They give conditions to keep the limiting covariance positive definite for dependent-weight networks, allowing posterior identification, but the examples don't confirm the conditions hold.

read the letter

The main thing to know is that they supply mild sufficient conditions on the activation function and associated Levy measures to guarantee that the random covariance in the infinite-width limit stays positive definite under the prior. This lets them identify the posterior distribution of the output in the wide limit, extending the convergence result from their earlier work. They also give conditions that make the sequential limits independent of the order in which layers go to infinity. The setup uses a Gaussian likelihood and builds directly on the Gaussian mixture limit for the prior. This is useful technical progress for handling more realistic priors in Bayesian neural nets. The conditions address a gap left by the standard Gaussian case and seem to follow from the invertibility results they cite. The abstract states the claims clearly without obvious circularity. The soft spot is that the identification is conditional on positive definiteness, and the paper does not appear to verify that the supplied conditions hold for the specific activations and Levy measures in their numerical examples. If the covariance is singular with positive probability, the posterior can't be recovered that way. The sequential nature makes this check important across different orders, and it would be good to see that confirmed. This paper is for specialists in infinite-width Bayesian deep learning. A reader focused on dependent or heavy-tailed weights would get the most out of the new conditions. It has enough new math to merit a serious referee who can go through the proofs and check the examples against the conditions. I would recommend sending it to peer review.

Referee Report

1 major / 3 minor

Summary. The manuscript examines fully connected feedforward Bayesian neural networks with dependent, possibly heavy-tailed weights. Building on the weak convergence of the output law to a Gaussian mixture in the sequential infinite-width limit established in [26], the authors show that, conditional on the random covariance matrix of this limit being positive definite under the prior, the posterior distribution of the output can be identified when a Gaussian likelihood is used. They supply mild sufficient conditions on the activation function and associated Lévy measures that guarantee this positive definiteness (extending [8]) and that render the sequential limits independent of layer-order. The results are illustrated with theoretical examples and numerical simulations.

Significance. If the positive-definiteness condition holds for the models under study, the work supplies a concrete characterization of limiting posteriors for BNNs outside the standard Gaussian-prior regime. This could support theoretical analysis of uncertainty quantification and generalization when heavy-tailed or dependent priors are employed. The explicit sufficient conditions and the order-independence result constitute clear technical contributions; the numerical illustrations provide initial empirical grounding.

major comments (1)

[§4, §5] §4 (Main Results) and §5 (Numerical Experiments): The central identification of the limiting posterior (Theorem 4.1) is explicitly conditional on positive definiteness of the random covariance under the prior. The paper states mild sufficient conditions on the activation and Lévy measure that guarantee this property, yet the numerical examples in §5 do not verify that these conditions are satisfied for the concrete activations (e.g., ReLU) and Lévy measures chosen in the simulations. Because singularity on a set of positive prior probability would invalidate the posterior identification and the subsequent Gaussian-mixture inversion, this verification is load-bearing for the applicability of the claimed results to the reported experiments.

minor comments (3)

[§2] §2 (Model Setup): The notation for the sequential width limits and the dependence structure induced by the Lévy measure could be clarified with an explicit diagram or additional sentence relating the finite-width covariance to the limiting random measure.
[Figure 2] Figure 2 caption: The parameters of the Lévy measure and the network depth used in the plotted trajectories should be stated explicitly so that readers can reproduce the positive-definiteness check if desired.
[Introduction] References: [8] and [26] are central; ensure that the precise statements being extended (e.g., the invertibility result in [8]) are quoted or paraphrased in the introduction for immediate context.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We appreciate the positive assessment of the significance of our results on posterior identification for BNNs with dependent weights. We address the major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [§4, §5] §4 (Main Results) and §5 (Numerical Experiments): The central identification of the limiting posterior (Theorem 4.1) is explicitly conditional on positive definiteness of the random covariance under the prior. The paper states mild sufficient conditions on the activation and Lévy measure that guarantee this property, yet the numerical examples in §5 do not verify that these conditions are satisfied for the concrete activations (e.g., ReLU) and Lévy measures chosen in the simulations. Because singularity on a set of positive prior probability would invalidate the posterior identification and the subsequent Gaussian-mixture inversion, this verification is load-bearing for the applicability of the claimed results to the reported experiments.

Authors: We agree that explicit verification of the positive-definiteness conditions is necessary to ensure the numerical experiments fall within the regime where Theorem 4.1 applies. In the revised manuscript we will add a short subsection (or appendix paragraph) in §5 that checks the sufficient conditions of Theorem 4.2 for each activation function and Lévy measure used in the simulations. For the ReLU examples we will confirm that the associated Lévy measure satisfies the integrability and non-degeneracy requirements that guarantee the random covariance matrix is positive definite almost surely under the prior; analogous checks will be provided for the other activations and measures appearing in the figures. These verifications are straightforward given the explicit criteria already stated in the paper and will not alter any of the theoretical statements. revision: yes

Circularity Check

0 steps flagged

No circularity; posterior identification is conditional on an explicit assumption with independent sufficient conditions

full rationale

The derivation begins from the Gaussian-mixture convergence established in prior work [26] and then conditions the posterior identification on positive definiteness of the limiting random covariance. The paper supplies separate mild sufficient conditions on activations and Lévy measures to guarantee this invertibility, extending [8] without re-deriving the same quantities or fitting parameters that are then renamed as predictions. No step equates the target posterior to its inputs by construction, nor does any load-bearing claim reduce solely to an unverified self-citation chain. The sequential-limit results and order-independence conditions are derived from the stated assumptions rather than presupposing the final form.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the positive-definiteness assumption for the random covariance and on the convergence result established in the cited reference [26]. No free parameters or new invented entities are introduced in the abstract.

axioms (1)

domain assumption The random covariance matrix of the infinite-width limit is positive definite under the prior
This is the explicit sufficient condition required to identify the posterior distribution.

pith-pipeline@v0.9.0 · 5712 in / 997 out tokens · 52490 ms · 2026-05-19T02:43:17.248208+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

If the random covariance matrix of the infinite-width limit is positive definite under the prior, we identify the posterior distribution of the output in the wide-width limit according to a sequential regime. ... mild sufficient conditions on the activation function and the associated Lévy measures
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 4.1 ... lim nL→∞ … lim n1→∞ Z(L+1)B(x) = G(L+1)(x) ... MG(0, Id ⊗ K(L+1)(x)) with Markov chain on positive semi-definite matrices K(ℓ)(x)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1]

Adams, S

S. Adams, S. Patan` e, M. Lahijanian, and L. Laurenti. Finite neural networks as mixtures of gaussian processes: From provable error bounds to prior selection.arXiv:2407.18707, 2024. 40

work page arXiv 2024
[2]

Ambrosio, N

L. Ambrosio, N. Fusco, and D. Pallara.Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, London, 2000

work page 2000
[3]

Andreis, F

L. Andreis, F. Bassetti, and C. Hirsch. LDP for the covariance process in fully connected neural networks.arXiv:6431605, 2025

work page 2025
[4]

Apollonio, D

N. Apollonio, D. De Canditiis, G. Franzina, P. Stolfi, and G. L. Torrisi. Normal approximation of random gaussian neural networks.Stochastic Systems, 2025. to appear

work page 2025
[5]

Balasubramanian, L

K. Balasubramanian, L. Goldstein, and A. Salim. Gaussian random field approximation via stein’s method with applications to wide random neural networks.Appl. Comput. Harm. Anal., 72, 2024

work page 2024
[6]

Balasubramanian and N

K. Balasubramanian and N. Ross. Finite-dimensional Gaussian approximation for deep neural networks: Universality in random weights.arXiv:2507.12686, 2025

work page arXiv 2025
[7]

Basteri and D

A. Basteri and D. Trevisan. Quantitative Gaussian approximation of randomly initialized deep neural networks.Machine Learning, 113:6373–6393, 2024

work page 2024
[8]

Bordino, S

A. Bordino, S. Favaro, and S. Fortini. Non-asymptotic approximations of Gaussian neural networks via second-order Poincar´ e inequalities. InProceedings of Machine Learning Research (AABI24), 2024

work page 2024
[9]

Cammarota, D

V. Cammarota, D. Marinucci, M. Salvi, and S. Vigogna. A quantitative functional central limit theorem for shallow neural networks.Modern Stochastics: Theory and Applications, 11:85–108, 2024

work page 2024
[10]

Caporali, S

F. Caporali, S. Favaro, and D. Trevisan. Student-t processes as infinite-width limits of posterior Bayesian neural networks.arXiv:2502.0427, 2025

work page arXiv 2025
[11]

Celli and G

L. Celli and G. Peccati. Entropic bounds for conditionally Gaussian vectors and application to neural networks.arXiv:2504.08335, 2025

work page arXiv 2025
[12]

Eldan, D

R. Eldan, D. Mikulincer, and T. Schramm. Non-asymptotic approximations of neural networks by Gaussian processes. InConference on Learning Theory, pages 1754–1775, 2021

work page 2021
[13]

L. C. Evans and R. F. Gariepy.Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton, Florida, 1992

work page 1992
[14]

Favaro, S

S. Favaro, S. Fortini, and S. Peluchetti. Deep stable neural networks: Large-width asymptotics and convergence rates.Bernoulli, 29:2574–2597, 2023

work page 2023
[15]

Favaro, B

S. Favaro, B. Hanin, D. Marinucci, I. Nourdin, and G. Peccati. Quantitative CLTs in deep neural networks.Probability Theory and Related Fields, 2025. to appear

work page 2025
[16]

Federer.Geometric Measure Theory

H. Federer.Geometric Measure Theory. Springer, New York, 1969

work page 1969
[17]

Hajjar, L

K. Hajjar, L. Chizat, and C. Giraud. Training integrable parameterizations of deep neural networks in the infinite-width limit.Journal of Machine Learning Research, 25, 2024

work page 2024
[18]

B. Hanin. Random neural networks in the infinite width limit as Gaussian processes.The Annals of Applied Probability, 33:4798–4819, 2023. 41

work page 2023
[19]

J. Hron, Y. Bahri, R. Novak, J. Pennington, and J. Sohl-Dickstein. Exact posterior distribu- tions of wide Bayesian neural networks. InWorkshop on Uncertainty and Robustness in Deep Learning, 2020

work page 2020
[20]

Izmailov, S

P. Izmailov, S. Vikram, M. D. Hoffman, and A. G. Wilson. What are Bayesian neural net- work posteriors really like? InProceedings of the 38th International Conference on Machine Learning, 2021

work page 2021
[21]

P. Jung, H. Lee, J. Lee, and H. Yang.α-stable convergence of heavy-tailed infinitely wide neural networks.Advances in Applied Probability, 55:1415–1441, 2023

work page 2023
[22]

Lancaster and M

P. Lancaster and M. Tismenetsky.Theory of Matrices: With Applications. San Diego Univer- sity Press, San Diego, 1985

work page 1985
[23]

H. Lee, F. Ayed, P. Jung, J. Lee, H. Yang, and F. Caron. Deep neural networks with dependent weights: Gaussian process mixture limit, heavy tails, sparsity and compressibility.Journal of Machine Learning Research, 24:78 pp., 2023

work page 2023
[24]

J. Lee, Y. Bahri, R. Novak, S. Schoenholz, J. Pennington, and J. Sohl-Dickstein. Deep neural networks as Gaussian processes. InProceedings of the 6th International Conference on Learning Representations, 2018

work page 2018
[25]

Macci, B

C. Macci, B. Pacchiarotti, and G. L. Torrisi. Large and moderate deviations for Gaussian neural networks.Journal of Applied Probability, 2025. to appear

work page 2025
[26]

K. V. Mardia, J. T. Kent, and J. M. Bibby.Multivariate Analysis Probability and Mathematical Statistics. Academic Press, Waltham, 1995

work page 1995
[27]

C. H. Martin and M. W. Mahoney. Traditional and heavy-tailed self regularization in neural network models. InInternational Conference on Machine Learning, 2019

work page 2019
[28]

A. G. D. G. Matthews, J. Hron, M. Rowland, R. E. Turner, and Z. Ghahramani. Gaus- sian process behaviour in wide deep neural networks. InProceedings of the 6th International Conference on Learning Representations, 2018

work page 2018
[29]

S. Mei, A. Montanari, and P.M. Nguyen. A mean field view of the landscape of two-layers neural network.Proceedings of the National Academy of Sciences (PNAS), 2018

work page 2018
[30]

R. M. Neal.Bayesian Learning for Neural Networks. PhD thesis, Department of Computer Science, University of Toronto, 1995

work page 1995
[31]

R. M. Neal. Priors for infinite networks. InBayesian Learning for Neural Networks. Lecture Notes in Statistics, volume 118, pages 29–53. Springer, New York, 1996

work page 1996
[32]

Pezzetti, S

L. Pezzetti, S. Favaro, and S. Peluchetti. Function-space MCMC for Bayesian wide neural networks.arXiv:2408.14325, 2024

work page arXiv 2024
[33]

Rolski, H

T. Rolski, H. Schmidli, V. Schmidt, and J. Teugels.Stochastic Processes for Insurance and Finance. Wiley, Chichester, 1999

work page 1999
[34]

Sato.L´ evy Processes and Infinitely Divisible Distributions

K. Sato.L´ evy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge, 1999. 42

work page 1999
[35]

Trevisan

D. Trevisan. Wide deep neural networks with Gaussian weights are very close to Gaussian processes.arXiv:2312.11737, 2025

work page arXiv 2025
[36]

Q. Vogel. Large deviations of Gaussian neural networks with ReLU activation. arXiv:2405.16958, 2024

work page arXiv 2024
[37]

Wentzel, K

F. Wentzel, K. Roth, B. S. Veeling, J. Swiatkowski, L. Tran, S. Mandt, J. Snoek, T. Salimans, R. Jenatton, and S. Nowozin. How good is the Bayes posterior in deep neural networks really? InProceedings of the 37th International Conference on Machine Learning, pages 10248–10259, 2020. 43

work page 2020

[1] [1]

Adams, S

S. Adams, S. Patan` e, M. Lahijanian, and L. Laurenti. Finite neural networks as mixtures of gaussian processes: From provable error bounds to prior selection.arXiv:2407.18707, 2024. 40

work page arXiv 2024

[2] [2]

Ambrosio, N

L. Ambrosio, N. Fusco, and D. Pallara.Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, London, 2000

work page 2000

[3] [3]

Andreis, F

L. Andreis, F. Bassetti, and C. Hirsch. LDP for the covariance process in fully connected neural networks.arXiv:6431605, 2025

work page 2025

[4] [4]

Apollonio, D

N. Apollonio, D. De Canditiis, G. Franzina, P. Stolfi, and G. L. Torrisi. Normal approximation of random gaussian neural networks.Stochastic Systems, 2025. to appear

work page 2025

[5] [5]

Balasubramanian, L

K. Balasubramanian, L. Goldstein, and A. Salim. Gaussian random field approximation via stein’s method with applications to wide random neural networks.Appl. Comput. Harm. Anal., 72, 2024

work page 2024

[6] [6]

Balasubramanian and N

K. Balasubramanian and N. Ross. Finite-dimensional Gaussian approximation for deep neural networks: Universality in random weights.arXiv:2507.12686, 2025

work page arXiv 2025

[7] [7]

Basteri and D

A. Basteri and D. Trevisan. Quantitative Gaussian approximation of randomly initialized deep neural networks.Machine Learning, 113:6373–6393, 2024

work page 2024

[8] [8]

Bordino, S

A. Bordino, S. Favaro, and S. Fortini. Non-asymptotic approximations of Gaussian neural networks via second-order Poincar´ e inequalities. InProceedings of Machine Learning Research (AABI24), 2024

work page 2024

[9] [9]

Cammarota, D

V. Cammarota, D. Marinucci, M. Salvi, and S. Vigogna. A quantitative functional central limit theorem for shallow neural networks.Modern Stochastics: Theory and Applications, 11:85–108, 2024

work page 2024

[10] [10]

Caporali, S

F. Caporali, S. Favaro, and D. Trevisan. Student-t processes as infinite-width limits of posterior Bayesian neural networks.arXiv:2502.0427, 2025

work page arXiv 2025

[11] [11]

Celli and G

L. Celli and G. Peccati. Entropic bounds for conditionally Gaussian vectors and application to neural networks.arXiv:2504.08335, 2025

work page arXiv 2025

[12] [12]

Eldan, D

R. Eldan, D. Mikulincer, and T. Schramm. Non-asymptotic approximations of neural networks by Gaussian processes. InConference on Learning Theory, pages 1754–1775, 2021

work page 2021

[13] [13]

L. C. Evans and R. F. Gariepy.Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton, Florida, 1992

work page 1992

[14] [14]

Favaro, S

S. Favaro, S. Fortini, and S. Peluchetti. Deep stable neural networks: Large-width asymptotics and convergence rates.Bernoulli, 29:2574–2597, 2023

work page 2023

[15] [15]

Favaro, B

S. Favaro, B. Hanin, D. Marinucci, I. Nourdin, and G. Peccati. Quantitative CLTs in deep neural networks.Probability Theory and Related Fields, 2025. to appear

work page 2025

[16] [16]

Federer.Geometric Measure Theory

H. Federer.Geometric Measure Theory. Springer, New York, 1969

work page 1969

[17] [17]

Hajjar, L

K. Hajjar, L. Chizat, and C. Giraud. Training integrable parameterizations of deep neural networks in the infinite-width limit.Journal of Machine Learning Research, 25, 2024

work page 2024

[18] [18]

B. Hanin. Random neural networks in the infinite width limit as Gaussian processes.The Annals of Applied Probability, 33:4798–4819, 2023. 41

work page 2023

[19] [19]

J. Hron, Y. Bahri, R. Novak, J. Pennington, and J. Sohl-Dickstein. Exact posterior distribu- tions of wide Bayesian neural networks. InWorkshop on Uncertainty and Robustness in Deep Learning, 2020

work page 2020

[20] [20]

Izmailov, S

P. Izmailov, S. Vikram, M. D. Hoffman, and A. G. Wilson. What are Bayesian neural net- work posteriors really like? InProceedings of the 38th International Conference on Machine Learning, 2021

work page 2021

[21] [21]

P. Jung, H. Lee, J. Lee, and H. Yang.α-stable convergence of heavy-tailed infinitely wide neural networks.Advances in Applied Probability, 55:1415–1441, 2023

work page 2023

[22] [22]

Lancaster and M

P. Lancaster and M. Tismenetsky.Theory of Matrices: With Applications. San Diego Univer- sity Press, San Diego, 1985

work page 1985

[23] [23]

H. Lee, F. Ayed, P. Jung, J. Lee, H. Yang, and F. Caron. Deep neural networks with dependent weights: Gaussian process mixture limit, heavy tails, sparsity and compressibility.Journal of Machine Learning Research, 24:78 pp., 2023

work page 2023

[24] [24]

J. Lee, Y. Bahri, R. Novak, S. Schoenholz, J. Pennington, and J. Sohl-Dickstein. Deep neural networks as Gaussian processes. InProceedings of the 6th International Conference on Learning Representations, 2018

work page 2018

[25] [25]

Macci, B

C. Macci, B. Pacchiarotti, and G. L. Torrisi. Large and moderate deviations for Gaussian neural networks.Journal of Applied Probability, 2025. to appear

work page 2025

[26] [26]

K. V. Mardia, J. T. Kent, and J. M. Bibby.Multivariate Analysis Probability and Mathematical Statistics. Academic Press, Waltham, 1995

work page 1995

[27] [27]

C. H. Martin and M. W. Mahoney. Traditional and heavy-tailed self regularization in neural network models. InInternational Conference on Machine Learning, 2019

work page 2019

[28] [28]

A. G. D. G. Matthews, J. Hron, M. Rowland, R. E. Turner, and Z. Ghahramani. Gaus- sian process behaviour in wide deep neural networks. InProceedings of the 6th International Conference on Learning Representations, 2018

work page 2018

[29] [29]

S. Mei, A. Montanari, and P.M. Nguyen. A mean field view of the landscape of two-layers neural network.Proceedings of the National Academy of Sciences (PNAS), 2018

work page 2018

[30] [30]

R. M. Neal.Bayesian Learning for Neural Networks. PhD thesis, Department of Computer Science, University of Toronto, 1995

work page 1995

[31] [31]

R. M. Neal. Priors for infinite networks. InBayesian Learning for Neural Networks. Lecture Notes in Statistics, volume 118, pages 29–53. Springer, New York, 1996

work page 1996

[32] [32]

Pezzetti, S

L. Pezzetti, S. Favaro, and S. Peluchetti. Function-space MCMC for Bayesian wide neural networks.arXiv:2408.14325, 2024

work page arXiv 2024

[33] [33]

Rolski, H

T. Rolski, H. Schmidli, V. Schmidt, and J. Teugels.Stochastic Processes for Insurance and Finance. Wiley, Chichester, 1999

work page 1999

[34] [34]

Sato.L´ evy Processes and Infinitely Divisible Distributions

K. Sato.L´ evy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge, 1999. 42

work page 1999

[35] [35]

Trevisan

D. Trevisan. Wide deep neural networks with Gaussian weights are very close to Gaussian processes.arXiv:2312.11737, 2025

work page arXiv 2025

[36] [36]

Q. Vogel. Large deviations of Gaussian neural networks with ReLU activation. arXiv:2405.16958, 2024

work page arXiv 2024

[37] [37]

Wentzel, K

F. Wentzel, K. Roth, B. S. Veeling, J. Swiatkowski, L. Tran, S. Mandt, J. Snoek, T. Salimans, R. Jenatton, and S. Nowozin. How good is the Bayes posterior in deep neural networks really? InProceedings of the 37th International Conference on Machine Learning, pages 10248–10259, 2020. 43

work page 2020