pith. machine review for the scientific record.

arxiv: 2604.20551 · v1 · submitted 2026-04-22 · 📊 stat.ML · cs.LG

Recognition: unknown

On Bayesian Softmax-Gated Mixture-of-Experts Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 23:13 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords mixture-of-experts · softmax gating · Bayesian inference · posterior contraction · density estimation · parameter estimation · model selection · Voronoi losses

The pith

Bayesian softmax-gated mixture-of-experts models achieve posterior contraction for density estimation and consistent parameter recovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the large-sample behavior of the posterior distribution in Bayesian mixture-of-experts models that rely on a softmax gating function. It proves that the posterior contracts to the true density at explicit rates, whether the number of experts is fixed ahead of time or is itself learned from the data. The analysis further shows that the model parameters can be recovered consistently when measured with specially constructed Voronoi-type losses that resolve the label-switching and identifiability problems typical of mixtures. Two practical procedures for choosing the number of experts are introduced and their statistical properties are derived. These guarantees matter because mixture-of-experts architectures are widely deployed for flexible regression and classification, and Bayesian versions now have a theoretical footing that was previously missing.

Core claim

For Bayesian mixture-of-experts models equipped with softmax gating, the posterior distribution contracts at explicit rates for density estimation both when the number of experts is fixed and known and when it is treated as random and learnable from the data. Parameter estimation is shown to converge under tailored Voronoi-type losses that properly account for the non-identifiability structure of the model. Two complementary strategies for selecting the number of experts are proposed and analyzed, supplying the first systematic asymptotic theory for this class of models.
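
In the standard formulation (e.g., Ghosal, Ghosh and van der Vaart, 2000), posterior contraction at rate ε_n means the posterior asymptotically places all its mass on a shrinking neighborhood of the truth. The generic statement below is offered as background, not as the paper's exact theorem:

```latex
% Generic posterior contraction statement (background, not the paper's theorem).
% f_0: true density; d: a metric such as Hellinger; \varepsilon_n \to 0: the rate;
% \Pi(\cdot \mid X_1, \dots, X_n): the posterior after n observations.
\Pi\bigl( f : d(f, f_0) > M \varepsilon_n \,\big|\, X_1, \dots, X_n \bigr)
  \;\longrightarrow\; 0
  \quad \text{in } P_{f_0}\text{-probability, for a sufficiently large constant } M > 0.
```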

What carries the argument

The posterior distribution over the parameters and gating weights of the softmax-gated mixture-of-experts model, together with Voronoi-type losses that resolve label switching to enable consistent parameter estimation.
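
For concreteness, a softmax-gated mixture of K experts models the conditional density as a gate-weighted combination of expert densities. The parameterization below is a common one and is assumed here rather than copied from the paper:

```latex
% Softmax-gated mixture-of-experts conditional density with K experts.
% (\beta_k, b_k): gating parameters; \theta_k: the k-th expert's parameters;
% f(\cdot \mid x, \theta_k): the expert density (e.g., Gaussian).
p(y \mid x) \;=\; \sum_{k=1}^{K}
  \frac{\exp(\beta_k^{\top} x + b_k)}{\sum_{j=1}^{K} \exp(\beta_j^{\top} x + b_j)}
  \, f(y \mid x, \theta_k).
```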

If this is right

  • Posterior contraction rates hold for density estimation when the number of experts is fixed and known.
  • The same contraction rates hold when the number of experts is random and must be learned.
  • Parameter estimates converge in probability under the tailored Voronoi-type losses (a sketch of such a loss follows this list).
  • Two distinct strategies for choosing the number of experts are valid and their error properties are characterized.
  • The results supply concrete guidance on prior specification and model design for practical use.
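
To make the Voronoi-loss bullet concrete: each fitted atom is assigned to the Voronoi cell of its nearest true atom, and per-cell location and weight discrepancies are accumulated. The function name, exponents, and Euclidean metric below are illustrative assumptions, not the paper's exact definition:

```python
import numpy as np

def voronoi_loss(true_atoms, true_weights, fit_atoms, fit_weights, r=2):
    """Illustrative Voronoi-type loss between mixing measures.

    true_atoms: (K0, d) array; fit_atoms: (K, d) array; weights: 1-D arrays.
    Each fitted atom is assigned to its nearest true atom (its Voronoi cell);
    within each cell we penalize the location error (raised to the power r
    when a cell is over-fitted, mimicking the slower rate there) and the
    mismatch between the cell's total fitted weight and the true weight.
    A sketch, not the paper's loss.
    """
    true_atoms = np.asarray(true_atoms, dtype=float)
    true_weights = np.asarray(true_weights, dtype=float)
    fit_atoms = np.asarray(fit_atoms, dtype=float)
    fit_weights = np.asarray(fit_weights, dtype=float)

    # Pairwise distances between fitted and true atoms; nearest true atom
    # determines each fitted atom's Voronoi cell.
    dists = np.linalg.norm(fit_atoms[:, None, :] - true_atoms[None, :, :], axis=-1)
    cells = dists.argmin(axis=1)

    loss = 0.0
    for j in range(len(true_atoms)):
        members = np.where(cells == j)[0]
        if members.size == 0:
            loss += true_weights[j]  # true atom left uncovered
            continue
        exponent = 1 if members.size == 1 else r  # over-fitted cells decay slower
        loss += float(np.sum(fit_weights[members] * dists[members, j] ** exponent))
        loss += abs(float(fit_weights[members].sum()) - true_weights[j])
    return loss
```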

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same contraction techniques may extend to mixture-of-experts models that use gating functions other than softmax.
  • The Voronoi losses suggest new evaluation metrics for mixture models that could be useful even outside the Bayesian setting.
  • The model-selection procedures could be combined with computational approximations such as variational inference to scale to large data.
  • These guarantees provide a benchmark against which frequentist mixture-of-experts estimators can be compared.

Load-bearing premise

The true data-generating density must belong to the mixture-of-experts model class and the prior distributions on the parameters must satisfy standard regularity conditions.
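
In the Bayesian contraction literature, these "standard regularity conditions" are typically instances of the prior-mass, sieve, and entropy conditions below. The trio is offered as representative background, not as the paper's verbatim assumptions:

```latex
% Representative contraction-rate conditions (background, not quoted from the paper):
% (i) prior mass on Kullback--Leibler neighborhoods of the true density f_0,
\Pi\bigl( f : \mathrm{KL}(f_0, f) \le \varepsilon_n^2,\;
          V(f_0, f) \le \varepsilon_n^2 \bigr) \;\ge\; e^{-c_1 n \varepsilon_n^2};
% (ii) a sieve \mathcal{F}_n capturing all but an exponentially small prior mass,
\Pi(\mathcal{F}_n^{c}) \;\le\; e^{-c_2 n \varepsilon_n^2};
% (iii) controlled metric entropy of the sieve,
\log N(\varepsilon_n, \mathcal{F}_n, d) \;\le\; c_3 n \varepsilon_n^2 .
```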

What would settle it

Simulated data drawn from a known softmax-gated mixture-of-experts density where the posterior fails to contract to the true density at the stated rate would falsify the contraction results.
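
A minimal generator for such a falsification study, assuming Gaussian experts with linear means (every parameter value and the function name are illustrative choices, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_softmax_moe(n, gate_w, gate_b, expert_coefs, expert_scales, d=2):
    """Draw (x, y) pairs from a softmax-gated Gaussian mixture of experts.

    gate_w: (K, d) gating weights; gate_b: (K,) gating biases;
    expert_coefs: (K, d) linear-mean coefficients; expert_scales: (K,) noise sds.
    All values are illustrative; this only fixes a known truth to test against.
    """
    X = rng.normal(size=(n, d))
    logits = X @ gate_w.T + gate_b                      # (n, K)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)           # softmax gate
    ks = np.array([rng.choice(len(gate_b), p=p) for p in probs])
    y = (X * expert_coefs[ks]).sum(axis=1) + expert_scales[ks] * rng.normal(size=n)
    return X, y, ks

# Known truth with K0 = 2 experts; a contraction check would fit the Bayesian
# model to (X, y) at increasing n and track posterior mass near this truth.
X, y, _ = sample_softmax_moe(
    n=1000,
    gate_w=np.array([[1.0, -0.5], [-1.0, 0.5]]),
    gate_b=np.array([0.0, 0.3]),
    expert_coefs=np.array([[2.0, 0.0], [-1.0, 1.0]]),
    expert_scales=np.array([0.5, 0.8]),
)
```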

Figures

Figures reproduced from arXiv: 2604.20551 by Alessandro Rinaldo, Huy Nguyen, Nhat Ho, Nicola Bariletto.

Figure 1. Model selection by VI-derived ELBO maximization.
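
The caption describes choosing the number of experts by maximizing a variational evidence lower bound (ELBO) across candidate expert counts. A schematic of that loop, where fit_variational is a hypothetical stand-in for one's VI routine rather than an API from the paper:

```python
def select_num_experts(X, y, candidate_ks, fit_variational):
    """Pick the number of experts K by maximizing the variational ELBO.

    fit_variational(X, y, K) is assumed to return a fitted variational
    approximation exposing an .elbo attribute; it is a hypothetical
    placeholder, not code from the paper.
    """
    fits = {K: fit_variational(X, y, K) for K in candidate_ks}
    best_k = max(fits, key=lambda K: fits[K].elbo)
    return best_k, fits
```
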
Original abstract

Mixture-of-experts models provide a flexible framework for learning complex probabilistic input-output relationships by combining multiple expert models through an input-dependent gating mechanism. These models have become increasingly prominent in modern machine learning, yet their theoretical properties in the Bayesian framework remain largely unexplored. In this paper, we study Bayesian mixture-of-experts models, focusing on the ubiquitous softmax-based gating mechanism. Specifically, we investigate the asymptotic behavior of the posterior distribution for three fundamental statistical tasks: density estimation, parameter estimation, and model selection. First, we establish posterior contraction rates for density estimation, both in the regimes with a fixed, known number of experts and with a random learnable number of experts. We then analyze parameter estimation and derive convergence guarantees based on tailored Voronoi-type losses, which account for the complex identifiability structure of mixture-of-experts models. Finally, we propose and analyze two complementary strategies for selecting the number of experts. Taken together, these results provide one of the first systematic theoretical analyses of Bayesian mixture-of-experts models with softmax gating, and yield several theory-grounded insights for practical model design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper studies Bayesian mixture-of-experts models with softmax gating. It claims to establish posterior contraction rates for density estimation both when the number of experts is fixed and known and when it is random and learnable. It further derives convergence guarantees for parameter estimation under tailored Voronoi-type losses that address label-switching and gating non-identifiability, and proposes and analyzes two complementary strategies for selecting the number of experts.

Significance. If the derivations hold, the work supplies one of the first systematic theoretical treatments of Bayesian softmax-gated MoE models, a class widely used in modern ML. The use of Voronoi-type losses to handle the complex identifiability structure and the coverage of both fixed and overfitted regimes are strengths. The results rest on standard regularity conditions from Bayesian mixture theory and provide theory-grounded guidance for practical model design.

major comments (1)
  1. Abstract: the claim that posterior contraction rates and convergence guarantees are established is not accompanied by explicit rates, listed assumptions, or proof sketches. This is load-bearing for the central claims, as the abstract supplies no concrete technical conditions (e.g., entropy bounds on the softmax-gated class or prior positivity on KL neighborhoods) under which the rates are asserted to hold.
minor comments (2)
  1. The manuscript would benefit from an explicit statement of all regularity conditions in a dedicated assumptions subsection early in the paper, rather than leaving them implicit as 'standard'.
  2. Notation for the gating function and expert parameters should be introduced with a clear table or diagram to aid readability, especially given the label-switching discussion.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for the constructive comment. We address the point below and have incorporated the suggested improvement.

Point-by-point responses
  1. Referee: Abstract: the claim that posterior contraction rates and convergence guarantees are established is not accompanied by explicit rates, listed assumptions, or proof sketches. This is load-bearing for the central claims, as the abstract supplies no concrete technical conditions (e.g., entropy bounds on the softmax-gated class or prior positivity on KL neighborhoods) under which the rates are asserted to hold.

    Authors: We agree that the abstract would benefit from greater specificity on the rates and assumptions. The detailed posterior contraction rates (for both the fixed-expert and overfitted regimes), the entropy bounds on the softmax-gated class, the prior positivity conditions on KL neighborhoods, and the proof sketches are fully stated in the main theorems and appendices. To address the comment, we have revised the abstract to briefly reference the key rates and the standard regularity conditions under which the results hold, while retaining the high-level overview style typical of abstracts.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper derives new posterior contraction rates for density estimation (fixed and random number of experts), convergence under Voronoi-type losses for parameter estimation, and consistency for two model-selection strategies. These follow from standard regularity conditions in Bayesian nonparametric mixture theory (true density in the model class, prior positivity on KL neighborhoods, entropy bounds on the softmax-gated class) and are tailored to the gating/identifiability structure without reducing to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The central claims remain independently verifiable against external benchmarks in the literature on Bayesian mixtures and do not collapse by construction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit list of free parameters, axioms, or invented entities; all technical conditions are left unspecified.

pith-pipeline@v0.9.0 · 5495 in / 1019 out tokens · 30297 ms · 2026-05-09T23:13:07.594438+00:00 · methodology

