
arxiv: 2411.00471 · v3 · submitted 2024-11-01 · 📊 stat.ME · cs.LG

Dirichlet process mixtures of block g priors for model selection and prediction in linear models

Pith reviewed 2026-05-23 18:19 UTC · model grok-4.3

classification: stat.ME, cs.LG
keywords: Dirichlet process, block g priors, model selection, linear models, shrinkage priors, consistency, Lindley paradox, MCMC

The pith

Dirichlet process mixtures of block g priors are consistent for model selection and avoid the conditional Lindley paradox in linear models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Dirichlet process mixtures of block g priors to handle model selection and prediction in linear models. These priors extend mixtures of g priors by letting the data select blocks of coefficients that receive different shrinkage while respecting the full correlation structure among predictors. The authors establish that the resulting priors are consistent in several senses. They specifically show avoidance of the conditional Lindley paradox that arises under some other priors. An MCMC algorithm with little tuning is provided, and simulations plus real data illustrate gains in power for smaller effects when a few large signals are present.

Core claim

Dirichlet process mixtures of block g priors are consistent in various senses and, in particular, avoid the conditional Lindley paradox. They permit differential shrinkage across data-selected blocks of regression coefficients while fully accounting for the correlation structure of the predictors, thereby bridging model-selection and continuous-shrinkage approaches.
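
For orientation, Zellner's (1986) classical g prior scales the prior covariance of the coefficients by the inverse Gram matrix of the design, with a single global factor g. The second display is one form consistent with the description above, with one shrinkage factor per coefficient and ties among the factors defining the blocks. Its notation ($G$, $g_j$, $H$, $H_0$, $\alpha$) is an assumption made here for illustration and is not quoted from the paper.

```latex
% Classical g prior (Zellner, 1986): a single global shrinkage factor g
\beta \mid g, \sigma^2 \sim \mathrm{N}\!\left(0,\; g\,\sigma^2 (X^\top X)^{-1}\right)

% Assumed block version: G = diag(g_1, ..., g_p); ties among the g_j define the blocks
\beta \mid G, \sigma^2 \sim \mathrm{N}\!\left(0,\; \sigma^2\, G^{1/2} (X^\top X)^{-1} G^{1/2}\right),
\qquad g_j \mid H \overset{\text{iid}}{\sim} H,
\qquad H \sim \mathrm{DP}(\alpha, H_0)
```

Under this assumed form, setting every $g_j$ equal to a common $g$ gives $G^{1/2} = \sqrt{g}\, I$, which recovers the classical g prior in the first display.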

What carries the argument

Dirichlet process mixture of block g priors, which clusters coefficients into blocks that share a common shrinkage parameter while each block prior incorporates the predictors' full covariance.
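
The clustering step described above can be sketched with a Chinese restaurant process, the partition law induced by a Dirichlet process. This is a minimal illustration, not the paper's sampler: the function names `crp_partition` and `draw_block_g` are hypothetical, and the hyper-g base measure (g / (1 + g) ~ Beta(1, a/2 - 1), as in Liang et al., 2008) is an assumed choice.

```python
import random


def crp_partition(p, alpha, rng):
    """Assign p items to blocks via the Chinese restaurant process.

    Each item joins an existing block with probability proportional to the
    block's current size, or opens a new block with probability
    proportional to alpha (alpha must be positive).
    """
    labels = []
    sizes = []  # sizes[k] = number of items currently in block k
    for _ in range(p):
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new block
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


def draw_block_g(labels, rng, a=3.0):
    """Draw one shrinkage factor per block and broadcast it to the coefficients.

    Assumed base measure (illustrative only): hyper-g with parameter a,
    i.e. g / (1 + g) ~ Beta(1, a/2 - 1).
    """
    n_blocks = max(labels) + 1
    block_g = []
    for _ in range(n_blocks):
        u = rng.betavariate(1.0, a / 2.0 - 1.0)
        u = min(u, 1.0 - 1e-12)  # guard against division by zero
        block_g.append(u / (1.0 - u))
    return [block_g[k] for k in labels]


rng = random.Random(0)
labels = crp_partition(p=10, alpha=1.0, rng=rng)
g = draw_block_g(labels, rng)
print(labels)
print([round(x, 3) for x in g])
```

Coefficients with the same label share the same value of g, which is the sense in which blocks receive differential shrinkage. Larger values of `alpha` tend to produce more, smaller blocks.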

If this is right

  • The priors achieve consistency for model selection and prediction under standard linear-model assumptions.
  • They avoid the conditional Lindley paradox highlighted for certain other priors.
  • In datasets containing a small number of very large effects, the priors yield higher power for smaller significant effects with only a minimal rise in false discoveries.
  • Posterior inference is feasible via an MCMC algorithm that requires only minimal ad-hoc tuning.
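
As background for the paradox-related bullet, the sketch below evaluates the closed-form Bayes factor under a fixed-g Zellner prior, as given in Liang et al. (2008). It illustrates the better-known information paradox (with g fixed, the evidence stays bounded even when the fit is perfect), not the conditional Lindley paradox itself and not the paper's prior. The function name `log_bf_fixed_g` and the numerical settings are illustrative assumptions.

```python
import math


def log_bf_fixed_g(n, p, r2, g):
    """Log Bayes factor of a p-predictor model against the intercept-only
    model under Zellner's g prior with fixed g (Liang et al., 2008).

    n: sample size, p: number of predictors, r2: coefficient of determination.
    """
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))


n, p, g = 100, 3, 100.0
for r2 in (0.0, 0.5, 0.9, 1.0):
    print(r2, round(log_bf_fixed_g(n, p, r2, g), 2))

# With g fixed, the log Bayes factor cannot exceed its value at r2 = 1.
cap = 0.5 * (n - 1 - p) * math.log1p(g)
print("cap:", round(cap, 2))
```

Mixtures of g priors place a prior on g partly to remove this cap; the block construction discussed here is concerned with what happens when different groups of coefficients call for different values of g.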

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The data-driven block construction may reveal latent grouping structure among predictors that is useful beyond prediction accuracy.
  • The same mixture construction could be tested in settings with missing predictors or non-Gaussian errors to check whether the consistency properties persist.
  • When predictors exhibit strong multicollinearity, the explicit accounting for correlation inside each block g prior may reduce sensitivity to arbitrary variable ordering.

Load-bearing premise

The Dirichlet process mixture can identify data-selected blocks of parameters that permit differential shrinkage while the block g construction fully accounts for the predictors' correlation structure.

What would settle it

A simulation study or real dataset in which the conditional Lindley paradox appears under standard g-prior mixtures but is absent under the Dirichlet process block version, or in which the prior fails to recover blocks and loses power relative to simpler alternatives.

Figures

Figures reproduced from arXiv: 2411.00471 by Abel Rodriguez, Anupreet Porwal.

Figure 1: Empirical illustration of the conditional Lindley paradox under hyper-…
Figure 2: Scatterplots of random samples from the Dirichlet mixture of block…
Figure 3: Behavior of log(B_{a,0}(y)) (left column) and Pr(ξ1 ≠ ξ2 | y) (right column) under the DP mixture of block g priors in our first simulation study. Each thin grey line corresponds to one replicate of the simulation, while the thicker blue line corresponds to the mean curve. Figures in the top row correspond to design matrices generated under η = 0, while the bottom row corresponds to η = 0.5.
Figure 4: F1 scores for model selection procedures based on various priors for our second simulation study.
Figure 5: Prediction MSE for η = 0 and η = 0.5. … the relative MSE with respect to that under the g-prior for each dataset. Hence, values less than 1 correspond to methods with smaller (better) prediction MSE. Note that, with…
Figure 6: Joint and marginal posterior distributions for…
Figure 7: Posterior inclusion probabilities for individual variables and model sizes for var…
Figure 8: Joint and marginal posterior distributions for…
Figure 9: Predictive mean squared error (MSE) and median interval scores (MIS) for…
Figure 10: Relative mean squared error of the coefficients for…
Figure 11: Relative mean squared error of the coefficients for…
Original abstract

This paper introduces Dirichlet process mixtures of block $g$ priors for model selection and prediction in linear models. These priors are extensions of traditional mixtures of $g$ priors that allow for differential shrinkage for various (data-selected) blocks of parameters while fully accounting for the predictors' correlation structure, providing a bridge between the literatures on model selection and continuous shrinkage priors. We show that Dirichlet process mixtures of block $g$ priors are consistent in various senses and, in particular, that they avoid the conditional Lindley "paradox" highlighted by Som et al. (2016). Further, we develop a Markov chain Monte Carlo algorithm for posterior inference that requires only minimal ad-hoc tuning. Finally, we investigate the empirical performance of the prior in various real and simulated datasets. In the presence of a small number of very large effects, Dirichlet process mixtures of block $g$ priors lead to higher power for detecting smaller but significant effects with only a minimal increase in the number of false discoveries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces Dirichlet process mixtures of block g priors as a nonparametric extension of g-prior mixtures for Bayesian linear model selection and prediction. The construction permits differential shrinkage across data-selected blocks of coefficients while incorporating the full correlation structure among predictors within blocks. The authors establish posterior consistency in multiple senses and show explicit avoidance of the conditional Lindley paradox of Som et al. (2016). They supply an MCMC algorithm requiring only minimal tuning and report empirical results on simulated and real data, highlighting improved power to detect small effects when a few large effects are present, with only minimal increase in false discoveries.

Significance. If the consistency and paradox-avoidance results hold under the stated conditions, the work supplies a principled bridge between discrete model-selection priors and continuous shrinkage priors. The nonparametric block structure and the MCMC sampler with limited tuning are practical strengths; the empirical demonstration of power gains without substantial false-discovery inflation is a concrete contribution to high-dimensional linear modeling.

major comments (2)
  1. [§3.2, Theorem 3] (consistency for model selection): the proof sketch invokes the block-g construction to control the marginal likelihood ratio, but the argument appears to condition on the realized partition; it is unclear whether the result remains valid when the posterior on the number of blocks is allowed to grow with n, which is the generic behavior of the DP mixture.
  2. [§3.3, Proposition 1] (avoidance of conditional Lindley paradox): the derivation shows that the posterior odds remain bounded away from zero when a block contains both large and small signals, yet the bound depends on the fixed value of the DP concentration parameter; the paper does not state whether the result continues to hold for data-driven choices of this hyperparameter.
minor comments (3)
  1. [§2.1] The definition of the block-g prior matrix R_b should explicitly display how the within-block correlation matrix is formed from the design submatrix X_b.
  2. [Table 2] The reported posterior inclusion probabilities for the small-effect variables lack standard errors or credible intervals, making it difficult to assess variability across replications.
  3. [§4] The MCMC description in §4 states that only minimal tuning is required, but the supplementary material does not report effective sample sizes or mixing diagnostics for the block-allocation variables.
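
The mixing diagnostic requested in minor comment 3 could look like the following sketch, which estimates an effective sample size (ESS) from a scalar MCMC trace. It is a crude heuristic that truncates the autocorrelation sum at the first non-positive lag, not a full Geyer-type estimator, and the traces below are simulated stand-ins (a hypothetical indicator of whether two coefficients share a block), not output from the paper's sampler.

```python
import random


def effective_sample_size(chain):
    """Crude ESS estimate: n / (1 + 2 * sum of leading positive autocorrelations).

    The sum is truncated at the first non-positive sample autocorrelation.
    """
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float("nan")  # constant chain: ESS is undefined
    acf_sum = 0.0
    for lag in range(1, n):
        rho = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        acf_sum += rho
    return n / (1.0 + 2.0 * acf_sum)


rng = random.Random(0)

# Hypothetical trace: 1 if two coefficients share a block at an iteration, else 0.
# A sticky chain (flips with probability 0.05) mimics slow mixing.
state, sticky = 0, []
for _ in range(2000):
    if rng.random() < 0.05:
        state = 1 - state
    sticky.append(state)

# Independent draws for comparison.
iid = [rng.randint(0, 1) for _ in range(2000)]

print(effective_sample_size(sticky), effective_sample_size(iid))
```

An ESS far below the number of iterations for the block-allocation indicators would suggest that the sampler moves slowly between partitions, which is the concern the comment raises.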

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important points regarding the scope of our consistency and paradox-avoidance results. We address each major comment below and indicate the revisions that will be incorporated.

  1. Referee: [§3.2, Theorem 3] (consistency for model selection): the proof sketch invokes the block-g construction to control the marginal likelihood ratio, but the argument appears to condition on the realized partition; it is unclear whether the result remains valid when the posterior on the number of blocks is allowed to grow with n, which is the generic behavior of the DP mixture.

    Authors: We agree that the current proof of Theorem 3 proceeds conditionally on a fixed partition. To establish unconditional consistency, the argument must also control the posterior mass on partitions whose number of blocks grows too rapidly with n. We will revise §3.2 to include an additional lemma showing that the DP prior (with fixed concentration) places vanishing posterior probability on partitions with more than O(log n) blocks under the stated conditions, thereby extending the marginal-likelihood ratio bound to the unconditional case. This revision will be made.
    Revision: yes

  2. Referee: [§3.3, Proposition 1] (avoidance of conditional Lindley paradox): the derivation shows that the posterior odds remain bounded away from zero when a block contains both large and small signals, yet the bound depends on the fixed value of the DP concentration parameter; the paper does not state whether the result continues to hold for data-driven choices of this hyperparameter.

    Authors: The derivation of Proposition 1 indeed treats the DP concentration parameter α as fixed. The lower bound on the posterior odds is monotone in α, so the qualitative avoidance of the conditional Lindley paradox continues to hold for any fixed α in a compact interval away from zero and infinity. We will add an explicit remark after the proposition stating this assumption and noting that fully data-driven selection of α lies outside the present scope. No further technical change is required.
    Revision: yes
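
The logarithmic growth invoked in the first response has a simple prior-side counterpart: under a Dirichlet process with concentration α, the expected number of distinct blocks among m clustered items is the sum of α / (α + i) for i = 0, …, m − 1, which grows roughly like α log(1 + m/α). The sketch below checks this numerically. It concerns the prior expectation only and says nothing about the posterior bound the proposed lemma would need; the function name `expected_blocks` and the use of a generic item count m are assumptions for illustration.

```python
import math


def expected_blocks(m, alpha):
    """Prior expected number of distinct blocks among m items under a
    Dirichlet process (Chinese restaurant process) with concentration alpha."""
    return sum(alpha / (alpha + i) for i in range(m))


alpha = 1.0
for m in (10, 100, 1000, 10000):
    exact = expected_blocks(m, alpha)
    approx = alpha * math.log(1.0 + m / alpha)  # logarithmic approximation
    print(m, round(exact, 2), round(approx, 2))
```

The same function also shows how the expected block count rises with α, which is why the second response's restriction to a fixed α in a compact interval matters.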

Circularity Check

0 steps flagged

No significant circularity identified


The paper introduces Dirichlet process mixtures of block g priors as a nonparametric extension of existing g-prior mixtures, with consistency claims and avoidance of the conditional Lindley paradox positioned as theoretical properties of the construction. No load-bearing steps reduce by definition, fitted input, or self-citation chain to the inputs themselves; the derivation relies on external arguments for consistency rather than internal tautologies or renamed empirical patterns. The central claims remain independent of any self-referential fitting or uniqueness imported solely from the authors' prior work.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central contribution is a new prior construction whose properties are asserted but not derived in the abstract; the ledger therefore records the minimal set of modeling choices needed to state the claim.

free parameters (1)
  • Dirichlet process concentration parameter
    Hyperparameter controlling the number of blocks; its value or prior is required for the mixture but not specified in the abstract.
axioms (2)
  • domain assumption: The block g prior correctly encodes the correlation structure of the design matrix
    Invoked when extending the classical g-prior to blocks.
  • standard math: Standard consistency results for Dirichlet process mixtures carry over to the block-g setting
    Required for the consistency claims.
invented entities (1)
  • block g prior (no independent evidence)
    purpose: To permit differential shrinkage across data-selected groups of coefficients
    New modeling object introduced by the paper.

pith-pipeline@v0.9.0 · 5701 in / 1229 out tokens · 51477 ms · 2026-05-23T18:19:00.162528+00:00 · methodology


Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 1 internal anchor


  3. Andrade, J. A. A. & O'Hagan, A. (2011). Bayesian robustness modelling of location and scale parameters. Scandinavian Journal of Statistics 38, 691–711.

  4. Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, pp. 1152–1174.

  5. Bai, R. & Ghosh, M. (2018). On the beta prime prior for scale parameters in high-dimensional Bayesian regression models. arXiv preprint arXiv:1807.06539.

  6. Bayarri, M. J., Berger, J. O., Forte, A., García-Donato, G. et al. (2012). Criteria for Bayesian model choice with application to variable selection. The Annals of Statistics 40, 1550–1577.

  7. Berger, J. O., Bernardo, J. M. & Sun, D. (2009). The formal definition of reference priors. The Annals of Statistics 37, 905–938.

  8. Berger, J. O. & Pericchi, L. R. (1996). The intrinsic Bayes factor for linear models. In Bayesian Statistics 5, Eds. J. M. Bernardo, J. O. Berger, A. P. Dawid & A. F. M. Smith, pp. 25–44. Oxford Univ. Press.

  9. Berger, J. O., Pericchi, L. R. & Varshavsky, J. A. (1998). Bayes factors and marginal distributions in invariant situations. Sankhyā: The Indian Journal of Statistics, Series A, pp. 307–321.

  10. Bertoin, J. (2006). Random fragmentation and coagulation processes, volume 102. Cambridge University Press.

  11. Bhadra, A., Datta, J., Polson, N. G., Willard, B. et al. (2017). The horseshoe+ estimator of ultra-sparse signals. Bayesian Analysis 12, 1105–1131.

  12. Bhattacharya, A., Pati, D., Pillai, N. S. & Dunson, D. B. (2015). Dirichlet–Laplace priors for optimal shrinkage. Journal of the American Statistical Association 110, 1479–1490.

  13. Blackwell, D. & MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. The Annals of Statistics 1, 353–355.

  14. Boss, J., Datta, J., Wang, X., Park, S. K., Kang, J. & Mukherjee, B. (2023). Group inverse-gamma gamma shrinkage for sparse linear models with block-correlated regressors. Bayesian Analysis 1, 1–30.

  15. Bové, D. S. & Held, L. (2011). Hyper-g priors for generalized linear models. Bayesian Analysis 6, 387–410.

  16. Breiman, L. & Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80, 580–598.

  17. Brown, P. J. & Griffin, J. E. (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 5, 171–188.

  18. Carvalho, C. M., Polson, N. G. & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97, 465–480.

  19. Carvalho, C. M. & Scott, J. G. (2009). Objective Bayesian model selection in Gaussian graphical models. Biometrika 96, 497–512.

  20. Casella, G. & Moreno, E. (2006). Objective Bayesian variable selection. Journal of the American Statistical Association 101, 157–167.

  21. Consonni, G., Fouskakis, D., Liseo, B., Ntzoufras, I. et al. (2018). Prior distributions for objective Bayesian analysis. Bayesian Analysis 13, 627–679.

  22. De Santis, F. & Spezzaferri, F. (2001). Consistent fractional Bayes factor for nested normal linear models. Journal of Statistical Planning and Inference 97, 305–321.

  23. Denti, F., Azevedo, R., Lo, C., Wheeler, D. G., Gandhi, S. P., Guindani, M. & Shahbaba, B. (2023). A horseshoe mixture model for Bayesian screening with an application to light sheet fluorescence microscopy in brain imaging. The Annals of Applied Statistics 17, 2639–2658.

  24. Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, pp. 209–230.

  25. Finegold, M. & Drton, M. (2014). Robust Bayesian graphical modeling using Dirichlet t-distributions. Bayesian Analysis 9, 521–550.

  26. Forte, A., Garcia-Donato, G. & Steel, M. F. J. (2018). Methods and tools for Bayesian variable selection and model averaging in normal linear regression. International Statistical Review 86, 237–258.

  27. Fouskakis, D., Ntzoufras, I. & Draper, D. (2015). Power-expected-posterior priors for variable selection in Gaussian linear models. Bayesian Analysis 10, 75–107.

  28. Gneiting, T. & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102, 359–378.

  29. Gordy, M. B. (1998). A generalization of generalized Beta distributions. Technical report, Division of Research and Statistics, Division of Monetary Affairs, Federal Reserve.

  30. Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732.

  31. Griffin, J. & Brown, P. (2005). Alternative prior distributions for variable selection with very many more variables than observations. University of Kent Technical Report.

  32. Hans, C. (2009). Bayesian lasso regression. Biometrika 96, 835–845.

  33. Huang, J., Ma, S. & Zhang, C.-H. (2008). Adaptive lasso for sparse high-dimensional regression models. Statistica Sinica, pp. 1603–1618.

  34. Johnson, V. E. & Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 143–170.

  35. Johnson, V. E. & Rossell, D. (2012). Bayesian model selection in high-dimensional settings. Journal of the American Statistical Association 107, 649–660.

  36. Kass, R. E. & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association 90, 928–934.

  37. Lee, S. Y., Pati, D. & Mallick, B. K. (2020). Continuous shrinkage prior revisited: a collapsing behavior and remedy. arXiv preprint arXiv:2007.02192.

  38. Leng, C., Tran, M.-N. & Nott, D. (2014). Bayesian adaptive lasso. Annals of the Institute of Statistical Mathematics 66, 221–244.

  39. Li, H. & Pati, D. (2017). Variable selection using shrinkage priors. Computational Statistics & Data Analysis 107, 107–119.

  40. Li, Y. & Clyde, M. A. (2018). Mixtures of g-priors in generalized linear models. Journal of the American Statistical Association 113, 1828–1845.

  41. Liang, F., Paulo, R., Molina, G., Clyde, M. A. & Berger, J. O. (2008). Mixtures of g-priors for Bayesian variable selection. Journal of the American Statistical Association 103, 410–423.

  42. Liu, Y., Wichura, M. J. & Drton, M. (2012). Rejection sampling for an extended gamma distribution. Unpublished manuscript.

  43. Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9, 249–265.

  44. O'Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B (Methodological) 57, 99–118.

  45. Park, T. & Casella, G. (2008). The Bayesian lasso. Journal of the American Statistical Association 103, 681–686.

  46. Polson, N. G. & Scott, J. G. (2012). Local shrinkage rules, Lévy processes and regularized regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74, 287–311.

  47. Polson, N. G., Scott, J. G. & Windle, J. (2013). Bayesian inference for logistic models using Pólya–Gamma latent variables. Journal of the American Statistical Association 108, 1339–1349.

  48. Porwal, A. & Raftery, A. E. (2022). Effect of model space priors on statistical inference with model uncertainty. The New England Journal of Statistics in Data Science, pp. 1–10.

  49. Porwal, A. & Rodríguez, A. (2023). Laplace power-expected-posterior priors for logistic regression. Bayesian Analysis 1, 1–24.

  50. Rodríguez, A. (2013). On the Jeffreys prior for the multivariate Ewens distribution. Statistics & Probability Letters 83, 1539–1546.

  51. Scott, J. G. & Berger, J. O. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics, pp. 2587–2619.

  52. Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, pp. 639–650.

  53. Som, A. (2014). Paradoxes and Priors in Bayesian Regression. Ph.D. thesis, The Ohio State University.

  54. Som, A., Hans, C. M. & MacEachern, S. N. (2016). A conditional Lindley paradox in Bayesian linear models. Biometrika 103, 993–999.

  55. Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244.

  56. Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, Eds. P. K. Goel & A. Zellner, pp. 233–243. Amsterdam: North-Holland/Elsevier.

  57. Zellner, A. & Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. Trabajos de Estadística y de Investigación Operativa 31, 585–603.