Dirichlet process mixtures of block g priors for model selection and prediction in linear models
Pith reviewed 2026-05-23 18:19 UTC · model grok-4.3
The pith
Dirichlet process mixtures of block g priors are consistent for model selection and avoid the conditional Lindley paradox in linear models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Dirichlet process mixtures of block g priors are consistent in various senses and, in particular, avoid the conditional Lindley paradox. They permit differential shrinkage across data-selected blocks of regression coefficients while fully accounting for the correlation structure of the predictors, thereby bridging model-selection and continuous-shrinkage approaches.
What carries the argument
Dirichlet process mixture of block g priors, which clusters coefficients into blocks that share a common shrinkage parameter while each block prior incorporates the predictors' full covariance.
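As a rough illustration of the construction described here, the sketch below builds a prior covariance in which every coefficient inherits the shrinkage parameter of its block while the full Gram matrix of the predictors enters the covariance. The specific form sigma2 * G^{1/2} (X'X)^{-1} G^{1/2}, the function name `block_g_prior_cov`, and the inputs `labels` and `g_values` are assumptions made for illustration; this page does not state the paper's exact formula.

```python
import numpy as np

def block_g_prior_cov(X, labels, g_values, sigma2=1.0):
    """Hypothetical block g prior covariance: sigma2 * G^{1/2} (X'X)^{-1} G^{1/2}.

    labels[j] is the block index of coefficient j and g_values[k] is the
    shrinkage parameter shared by all coefficients in block k.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    g = np.asarray(g_values, dtype=float)[labels]   # one g per coefficient
    gram_inv = np.linalg.inv(X.T @ X)               # carries the full predictor correlation
    root_g = np.sqrt(g)
    # For diagonal G, scaling rows and columns by sqrt(g) equals G^{1/2} A G^{1/2}.
    return sigma2 * (root_g[:, None] * gram_inv * root_g[None, :])

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
labels = [0, 0, 1, 1]          # two blocks of two coefficients each (illustrative)
cov = block_g_prior_cov(X, labels, g_values=[50.0, 2.0])
```

Under this assumed form, placing all coefficients in a single block recovers the familiar g prior covariance g * sigma2 * (X'X)^{-1}.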
If this is right
- The priors achieve consistency for model selection and prediction under standard linear-model assumptions.
- They avoid the conditional Lindley paradox highlighted for certain other priors.
- In datasets containing a small number of very large effects, the priors yield higher power for smaller significant effects with only a minimal rise in false discoveries.
- Posterior inference is feasible via an MCMC algorithm that requires only minimal ad-hoc tuning.
Where Pith is reading between the lines
- The data-driven block construction may reveal latent grouping structure among predictors that is useful beyond prediction accuracy.
- The same mixture construction could be tested in settings with missing predictors or non-Gaussian errors to check whether the consistency properties persist.
- When predictors exhibit strong multicollinearity, the explicit accounting for correlation inside each block g prior may reduce sensitivity to arbitrary variable ordering.
Load-bearing premise
The Dirichlet process mixture can identify data-selected blocks of parameters that permit differential shrinkage while the block g construction fully accounts for the predictors' correlation structure.
What would settle it
A simulation study or real dataset in which the conditional Lindley paradox appears under standard g-prior mixtures but is absent under the Dirichlet process block version, or in which the prior fails to recover blocks and loses power relative to simpler alternatives.
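A simulation of the kind mentioned here could start from a data generator like the sketch below, which produces correlated predictors with a few very large effects, a few small ones, and the rest exactly zero. The function name `simulate_design`, the AR(1)-type correlation, and every numeric default are hypothetical choices, not settings reported by the paper.

```python
import numpy as np

def simulate_design(n=200, p=10, rho=0.5, n_large=2, n_small=3,
                    large=5.0, small=0.4, sigma=1.0, seed=0):
    """Simulate y = X beta + noise with a few large and a few small effects."""
    rng = np.random.default_rng(seed)
    # Assumed AR(1)-type correlation among predictors: corr(x_i, x_j) = rho^|i-j|
    idx = np.arange(p)
    corr = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), corr, size=n)
    beta = np.zeros(p)
    beta[:n_large] = large                       # very large effects
    beta[n_large:n_large + n_small] = small      # small but non-zero effects
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta

X, y, beta = simulate_design()
```

Fitting a standard mixture of g priors and the Dirichlet process block version to such data, and comparing how often the small effects are detected, would be one way to probe the claim.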
Original abstract
This paper introduces Dirichlet process mixtures of block $g$ priors for model selection and prediction in linear models. These priors are extensions of traditional mixtures of $g$ priors that allow for differential shrinkage for various (data-selected) blocks of parameters while fully accounting for the predictors' correlation structure, providing a bridge between the literatures on model selection and continuous shrinkage priors. We show that Dirichlet process mixtures of block $g$ priors are consistent in various senses and, in particular, that they avoid the conditional Lindley "paradox" highlighted by Som et al. (2016). Further, we develop a Markov chain Monte Carlo algorithm for posterior inference that requires only minimal ad-hoc tuning. Finally, we investigate the empirical performance of the prior in various real and simulated datasets. In the presence of a small number of very large effects, Dirichlet process mixtures of block $g$ priors lead to higher power for detecting smaller but significant effects with only a minimal increase in the number of false discoveries.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Dirichlet process mixtures of block g priors as a nonparametric extension of g-prior mixtures for Bayesian linear model selection and prediction. The construction permits differential shrinkage across data-selected blocks of coefficients while incorporating the full correlation structure among predictors within blocks. The authors establish posterior consistency in multiple senses and show explicit avoidance of the conditional Lindley paradox of Som et al. (2016). They supply an MCMC algorithm requiring only minimal tuning and report empirical results on simulated and real data, highlighting improved power to detect small effects when a few large effects are present, with only minimal increase in false discoveries.
Significance. If the consistency and paradox-avoidance results hold under the stated conditions, the work supplies a principled bridge between discrete model-selection priors and continuous shrinkage priors. The nonparametric block structure and the MCMC sampler with limited tuning are practical strengths; the empirical demonstration of power gains without substantial false-discovery inflation is a concrete contribution to high-dimensional linear modeling.
major comments (2)
- [§3.2, Theorem 3] §3.2, Theorem 3 (consistency for model selection): the proof sketch invokes the block-g construction to control the marginal likelihood ratio, but the argument appears to condition on the realized partition; it is unclear whether the result remains valid when the posterior on the number of blocks is allowed to grow with n, which is the generic behavior of the DP mixture.
- [§3.3, Proposition 1] §3.3, Proposition 1 (avoidance of conditional Lindley paradox): the derivation shows that the posterior odds remain bounded away from zero when a block contains both large and small signals, yet the bound depends on the fixed value of the DP concentration parameter; the paper does not state whether the result continues to hold for data-driven choices of this hyperparameter.
minor comments (3)
- [§2.1] §2.1: the definition of the block-g prior matrix R_b should explicitly display how the within-block correlation matrix is formed from the design submatrix X_b.
- [Table 2] Table 2: the reported posterior inclusion probabilities for the small-effect variables lack standard errors or credible intervals, making it difficult to assess variability across replications.
- [§4] The MCMC description in §4 states that only minimal tuning is required, but the supplementary material does not report effective sample sizes or mixing diagnostics for the block-allocation variables.
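The mixing diagnostics requested in the last comment could be approximated along the lines of the sketch below, which estimates an effective sample size from a chain's autocorrelations. The function name `effective_sample_size` and the truncation rule (stop at the first non-positive autocorrelation) are assumptions chosen for brevity, not the diagnostic used by the authors. Because block-allocation labels are categorical, the input would presumably be a numeric summary such as the number of occupied blocks per iteration.

```python
import numpy as np

def effective_sample_size(chain):
    """Crude ESS estimate: n / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float("nan")     # a constant chain gives no information about mixing
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0.0:          # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# Illustrative chains: independent draws versus a strongly autocorrelated AR(1) series.
rng = np.random.default_rng(1)
iid = rng.standard_normal(2000)
ar = np.empty(2000)
ar[0] = 0.0
for t in range(1, 2000):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()
ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)
```

A slowly mixing summary shows up as an effective sample size far below the number of iterations, which is the pattern the autocorrelated chain is meant to mimic.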
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight important points regarding the scope of our consistency and paradox-avoidance results. We address each major comment below and indicate the revisions that will be incorporated.
Point-by-point responses
-
Referee: [§3.2, Theorem 3] §3.2, Theorem 3 (consistency for model selection): the proof sketch invokes the block-g construction to control the marginal likelihood ratio, but the argument appears to condition on the realized partition; it is unclear whether the result remains valid when the posterior on the number of blocks is allowed to grow with n, which is the generic behavior of the DP mixture.
Authors: We agree that the current proof of Theorem 3 proceeds conditionally on a fixed partition. To establish unconditional consistency, the argument must also control the posterior mass on partitions whose number of blocks grows too rapidly with n. We will revise §3.2 to include an additional lemma showing that the DP prior (with fixed concentration) places vanishing posterior probability on partitions with more than O(log n) blocks under the stated conditions, thereby extending the marginal-likelihood ratio bound to the unconditional case. This revision will be made. revision: yes
-
Referee: [§3.3, Proposition 1] §3.3, Proposition 1 (avoidance of conditional Lindley paradox): the derivation shows that the posterior odds remain bounded away from zero when a block contains both large and small signals, yet the bound depends on the fixed value of the DP concentration parameter; the paper does not state whether the result continues to hold for data-driven choices of this hyperparameter.
Authors: The derivation of Proposition 1 indeed treats the DP concentration parameter α as fixed. The lower bound on the posterior odds is monotone in α, so the qualitative avoidance of the conditional Lindley paradox continues to hold for any fixed α in a compact interval away from zero and infinity. We will add an explicit remark after the proposition stating this assumption and noting that fully data-driven selection of α lies outside the present scope. No further technical change is required. revision: yes
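The logarithmic growth invoked in the first response has a well-known prior counterpart: when a Dirichlet process with concentration α partitions m items, the expected number of occupied blocks is the sum of α/(α + i) for i = 0, ..., m - 1, which grows roughly like α log m. The sketch below computes that prior expectation only; it says nothing about the posterior bound the authors promise, and the function name `expected_num_blocks` is hypothetical.

```python
import math

def expected_num_blocks(alpha, m):
    """Prior expected number of occupied blocks when a DP(alpha) partitions m items."""
    return sum(alpha / (alpha + i) for i in range(m))

# Compare the expectation with log(m) for a few sizes (alpha = 1 is an arbitrary choice).
for m in (10, 100, 1000):
    print(m, round(expected_num_blocks(1.0, m), 2), round(math.log(m), 2))
```

The same function makes the dependence on α easy to inspect, which is relevant to the second response, where α is treated as fixed.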
Circularity Check
No significant circularity identified
Full rationale
The paper introduces Dirichlet process mixtures of block g priors as a nonparametric extension of existing g-prior mixtures, with consistency claims and avoidance of the conditional Lindley paradox positioned as theoretical properties of the construction. No load-bearing steps reduce by definition, fitted input, or self-citation chain to the inputs themselves; the derivation relies on external arguments for consistency rather than internal tautologies or renamed empirical patterns. The central claims remain independent of any self-referential fitting or uniqueness imported solely from the authors' prior work.
Axiom & Free-Parameter Ledger
free parameters (1)
- Dirichlet process concentration parameter
axioms (2)
- domain assumption: The block g prior correctly encodes the correlation structure of the design matrix
- standard math: Standard consistency results for Dirichlet process mixtures carry over to the block-g setting
invented entities (1)
- block g prior: no independent evidence
Reference graph
Works this paper leans on
- [3] Andrade, J. A. A. & O'Hagan, A. (2011). Bayesian robustness modelling of location and scale parameters. Scandinavian Journal of Statistics 38, 691–711
- [4] Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics pp. 1152–1174
- [5] Bai, R. & Ghosh, M. (2018). On the beta prime prior for scale parameters in high-dimensional Bayesian regression models. arXiv preprint arXiv:1807.06539
- [6] Bayarri, M. J., Berger, J. O., Forte, A., García-Donato, G. et al. (2012). Criteria for Bayesian model choice with application to variable selection. The Annals of Statistics 40, 1550–1577
- [7] Berger, J. O., Bernardo, J. M. & Sun, D. (2009). The formal definition of reference priors. Annals of Statistics 37, 905–938
- [8] Berger, J. O. & Pericchi, L. R. (1996). The intrinsic Bayes factor for linear models. In Bayesian Statistics 5, Eds. A. P. D. J. M. Bernardo, J. O. Berger & A. F. M. Smith, pp. 25–44. Oxford Univ. Press
- [9] Berger, J. O., Pericchi, L. R. & Varshavsky, J. A. (1998). Bayes factors and marginal distributions in invariant situations. Sankhyā: The Indian Journal of Statistics, Series A pp. 307–321
- [10] Bertoin, J. (2006). Random fragmentation and coagulation processes, volume 102. Cambridge University Press
- [11] Bhadra, A., Datta, J., Polson, N. G., Willard, B. et al. (2017). The horseshoe+ estimator of ultra-sparse signals. Bayesian Analysis 12, 1105–1131
- [12]
- [13] Blackwell, D. & MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. The Annals of Statistics 1, 353–355
- [14] Boss, J., Datta, J., Wang, X., Park, S. K., Kang, J. & Mukherjee, B. (2023). Group inverse-gamma gamma shrinkage for sparse linear models with block-correlated regressors. Bayesian Analysis 1, 1–30
- [15] Bové, D. S. & Held, L. (2011). Hyper-g priors for generalized linear models. Bayesian Analysis 6, 387–410
- [16] Breiman, L. & Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80, 580–598
- [17] Brown, P. J. & Griffin, J. E. (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 5, 171–188
- [18] Carvalho, C. M., Polson, N. G. & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97, 465–480
- [19] Carvalho, C. M. & Scott, J. G. (2009). Objective Bayesian model selection in Gaussian graphical models. Biometrika 96, 497–512
- [20] Casella, G. & Moreno, E. (2006). Objective Bayesian variable selection. Journal of the American Statistical Association 101, 157–167
- [21] Consonni, G., Fouskakis, D., Liseo, B., Ntzoufras, I. et al. (2018). Prior distributions for objective Bayesian analysis. Bayesian Analysis 13, 627–679
- [22] De Santis, F. & Spezzaferri, F. (2001). Consistent fractional Bayes factor for nested normal linear models. Journal of Statistical Planning and Inference 97, 305–321
- [23] Denti, F., Azevedo, R., Lo, C., Wheeler, D. G., Gandhi, S. P., Guindani, M. & Shahbaba, B. (2023). A horseshoe mixture model for Bayesian screening with an application to light sheet fluorescence microscopy in brain imaging. The Annals of Applied Statistics 17, 2639–2658
- [24] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics pp. 209–230
- [25] Finegold, M. & Drton, M. (2014). Robust Bayesian graphical modeling using Dirichlet t-distributions. Bayesian Analysis 9, 521–550
- [26] Forte, A., Garcia-Donato, G. & Steel, M. F. J. (2018). Methods and tools for Bayesian variable selection and model averaging in normal linear regression. International Statistical Review 86, 237–258
- [27] Fouskakis, D., Ntzoufras, I. & Draper, D. (2015). Power-expected-posterior priors for variable selection in Gaussian linear models. Bayesian Analysis 10, 75–107
- [28] Gneiting, T. & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102, 359–378
- [29] Gordy, M. B. (1998). A generalization of generalized Beta distributions. Technical report, Division of Research and Statistics, Division of Monetary Affairs, Federal Reserve
- [30] Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732
- [31] Griffin, J. & Brown, P. (2005). Alternative prior distributions for variable selection with very many more variables than observations. University of Kent Technical Report
- [32] Hans, C. (2009). Bayesian lasso regression. Biometrika 96, 835–845
- [33] Huang, J., Ma, S. & Zhang, C.-H. (2008). Adaptive lasso for sparse high-dimensional regression models. Statistica Sinica pp. 1603–1618
- [34] Johnson, V. E. & Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 143–170
- [35] Johnson, V. E. & Rossell, D. (2012). Bayesian model selection in high-dimensional settings. Journal of the American Statistical Association 107, 649–660
- [36] Kass, R. E. & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association 90, 928–934
- [37]
- [38] Leng, C., Tran, M.-N. & Nott, D. (2014). Bayesian adaptive lasso. Annals of the Institute of Statistical Mathematics 66, 221–244
- [39]
- [40] Li, Y. & Clyde, M. A. (2018). Mixtures of g-priors in generalized linear models. Journal of the American Statistical Association 113, 1828–1845
- [41] Liang, F., Paulo, R., Molina, G., Clyde, M. A. & Berger, J. O. (2008). Mixtures of g-priors for Bayesian variable selection. Journal of the American Statistical Association 103, 410–423
- [42] Liu, Y., Wichura, M. J. & Drton, M. (2012). Rejection sampling for an extended gamma distribution. Unpublished manuscript
- [43] Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9, 249–265
- [44] O'Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B (Methodological) 57, 99–118
- [45] Park, T. & Casella, G. (2008). The Bayesian lasso. Journal of the American Statistical Association 103, 681–686
- [46] Polson, N. G. & Scott, J. G. (2012). Local shrinkage rules, Lévy processes and regularized regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74, 287–311
- [47] Polson, N. G., Scott, J. G. & Windle, J. (2013). Bayesian inference for logistic models using Pólya–Gamma latent variables. Journal of the American Statistical Association 108, 1339–1349
- [48] Porwal, A. & Raftery, A. E. (2022). Effect of model space priors on statistical inference with model uncertainty. The New England Journal of Statistics in Data Science pp. 1–10
- [49] Porwal, A. & Rodríguez, A. (2023). Laplace power-expected-posterior priors for logistic regression. Bayesian Analysis 1, 1–24
- [50] Rodríguez, A. (2013). On the Jeffreys prior for the multivariate Ewens distribution. Statistics & Probability Letters 83, 1539–1546
- [51] Scott, J. G. & Berger, J. O. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics pp. 2587–2619
- [52] Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica pp. 639–650
- [53] Som, A. (2014). Paradoxes and Priors in Bayesian Regression. Ph.D. thesis, The Ohio State University
- [54]
- [55] Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244
- [56] Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, Eds. P. K. Goel & A. Zellner, pp. 233–243. Amsterdam: North-Holland/Elsevier
- [57]