Revisiting Bayesian Variable Selection via Optimization

Leo L Duan

arXiv: 2604.21009 · v1 · submitted 2026-04-22 · 📊 stat.ME · stat.CO

Pith reviewed 2026-05-09 23:21 UTC · model grok-4.3

classification: stat.ME · stat.CO

keywords: Bayesian variable selection · difference of convex functions · marginal likelihood optimization · type-II maximum likelihood · global convergence · linear regression · MCMC initialization · structured sparsity

The pith

The marginal likelihood in Bayesian variable selection can be globally optimized by a difference-of-convex algorithm despite lacking log-concavity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles the long-standing concern that the marginal likelihood obtained by integrating out coefficients and variance in Bayesian variable selection is not log-concave, so standard optimization offers no guarantee of reaching the global mode. By recasting the negative log-marginal likelihood as a difference of convex functions of the latent precision parameters, the author applies a simple iterative DC algorithm. Under mild compact-set conditions this procedure converges linearly to the global optimum. The guarantee covers type-II maximum likelihood and extends to maximum marginal posterior under suitable priors, showing that the optimization problem is more tractable than non-concavity alone would imply. The resulting method requires no tuning parameters and can serve as a fast initializer for MCMC.

Core claim

Treating the negative log-marginal likelihood as a loss function of the latent precision parameters, the paper rewrites it as the difference of two convex functions. The DC algorithm applied to this loss converges to the global minimizer at a linear rate under mild compact-set conditions on the feasible set. The same convergence holds for the type-II maximum-likelihood estimator and for the maximum marginal posterior under priors that preserve the DC structure.
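For orientation, the generic textbook form of a DC iteration is written out here. The symbols f, g, h and the feasible set K are generic placeholders rather than the paper's notation, and h is assumed differentiable for simplicity.

```latex
\begin{aligned}
f(x) &= g(x) - h(x), \qquad g,\ h \ \text{convex}, \\
x^{(k+1)} &\in \arg\min_{x \in \mathcal{K}} \; g(x) - \langle \nabla h(x^{(k)}),\, x \rangle , \\
f(x^{(k+1)}) &\le g(x^{(k+1)}) - h(x^{(k)}) - \langle \nabla h(x^{(k)}),\, x^{(k+1)} - x^{(k)} \rangle \;\le\; f(x^{(k)}) .
\end{aligned}
```

The first inequality uses convexity of h and the second uses optimality of the new iterate, so each step cannot increase the loss. Monotone descent by itself only yields a stationary point in general; the global optimum and the linear rate are the paper's additional claims.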

What carries the argument

Difference-of-convex (DC) algorithm applied to the negative log-marginal likelihood written as the difference of two convex functions of the latent precision parameters.
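To make the mechanism concrete, here is a minimal sketch of a DC iteration under an assumed conjugate parameterization, which may differ from the paper's loss (2): y | θ, σ² ~ N(Xθ, σ²I), θ_j | σ², d_j ~ N(0, σ²/d_j), and σ² ~ Inverse-Gamma(a0, b0). Under that assumption the negative log-marginal likelihood is, up to a constant, ½ log|D + XᵀX| − ½ Σ log d_j + (a0 + n/2) log(b0 + q(d)/2), where D = diag(d) and q(d) = min over θ of ‖y − Xθ‖² + θᵀDθ. The term −½ Σ log d_j is convex and the other two terms are concave in d, so linearizing the concave part gives a closed-form update. The function names and the update are this sketch's own derivation, not the released source code.

```python
import numpy as np


def neg_log_marginal(d, X, y, a0=1.0, b0=1.0):
    """Assumed loss: negative log-marginal likelihood in d, up to a constant."""
    n = X.shape[0]
    A = np.diag(d) + X.T @ X                  # D + X'X
    theta = np.linalg.solve(A, X.T @ y)       # minimizer of the penalized least squares
    q = y @ y - (X.T @ y) @ theta             # q(d) = min ||y - X theta||^2 + theta' D theta
    _, logdet = np.linalg.slogdet(A)
    return (0.5 * logdet - 0.5 * np.sum(np.log(d))
            + (a0 + 0.5 * n) * np.log(b0 + 0.5 * q))


def dc_step(d, X, y, a0=1.0, b0=1.0):
    """One DC update: linearize the concave part, minimize the convex surrogate."""
    n = X.shape[0]
    A_inv = np.linalg.inv(np.diag(d) + X.T @ X)
    theta = A_inv @ (X.T @ y)
    q = y @ y - (X.T @ y) @ theta
    # Gradient of the concave part c(d) = 0.5*log|D + X'X| + (a0 + n/2)*log(b0 + q/2);
    # the derivative of q with respect to d_j is theta_j^2 (envelope theorem).
    grad_c = 0.5 * np.diag(A_inv) + 0.5 * (a0 + 0.5 * n) * theta**2 / (b0 + 0.5 * q)
    # Minimizing -0.5*sum(log d) + grad_c @ d coordinate-wise gives d_j = 1/(2*grad_c_j).
    return 1.0 / (2.0 * grad_c)


def run_dc(X, y, d0, iters=50, a0=1.0, b0=1.0):
    """Iterate dc_step without any compact-set constraint and record the loss."""
    d = np.asarray(d0, dtype=float)
    losses = [neg_log_marginal(d, X, y, a0, b0)]
    for _ in range(iters):
        d = dc_step(d, X, y, a0, b0)
        losses.append(neg_log_marginal(d, X, y, a0, b0))
    return d, np.array(losses)


# Settings borrowed from the Figure 1 caption: p = 1, X = 1_n, n = 20, a0 = b0 = 1.
n = 20
X = np.ones((n, 1))
noise = np.tile([1.0, -1.0], n // 2)          # deterministic residuals with mean exactly zero
d_signal, loss_signal = run_dc(X, 3.0 + noise, d0=[1.0])
d_null, loss_null = run_dc(X, noise, d0=[1.0], iters=5)
```

In this toy setup the response with mean 3 settles at a finite precision, while the response with mean exactly zero makes the precision grow by n at every step. That matches the Figure 1 caption's description of the no-signal optimum at d_1 → ∞, and it shows why an unbounded domain needs separate treatment.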

If this is right

Type-II maximum-likelihood variable selection can be solved to global optimality without risk of spurious local modes.
The same global guarantee applies to maximum marginal posterior modes under priors that keep the DC structure intact.
The algorithm supplies a tuning-free, fast alternative or warm start for collapsed Gibbs samplers in linear regression.
The approach is extensible to models with structured sparsity.
Numerical experiments on simulated data and an application to aftershock-risk mapping after the 2019 Ridgecrest earthquakes illustrate the method in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar DC decompositions may exist for marginal likelihoods in generalized linear models or non-linear regression, allowing the same convergence argument.
Practitioners could run the DC optimizer first to locate a high-quality mode before launching full posterior sampling, reducing the chance that MCMC chains start far from the main mass.
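The linear convergence rate implies that the iteration count grows only logarithmically in the target accuracy; whether it also scales modestly with dimension depends on how the contraction factor varies with the number of predictors, which the abstract does not specify.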
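The iteration count implied by a linear rate can be written out explicitly. The constants C and ρ below are generic symbols, not values reported in the paper.

```latex
\|x_k - x^\star\| \le C\,\rho^{k}, \quad 0 < \rho < 1
\quad\Longrightarrow\quad
\|x_k - x^\star\| \le \varepsilon \ \ \text{whenever} \ \
k \ \ge\ \frac{\log(C/\varepsilon)}{\log(1/\rho)} .
```

As a purely illustrative calculation, C = 1, ρ = 0.5 and ε = 10⁻⁶ give k ≥ log(10⁶)/log(2) ≈ 19.9, so 20 iterations suffice. Any dependence on dimension enters only through C and ρ.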

Load-bearing premise

The negative log-marginal likelihood must admit an expression as the difference of two convex functions, and the domain of the latent precision parameters must satisfy mild compactness.

What would settle it

A low-dimensional linear regression example in which the global mode is known by exhaustive enumeration, yet the DC algorithm returns a strictly inferior point when the compactness condition is removed.
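A check of this kind could be set up along the following lines for p = 1, where enumeration over a fine grid is cheap. This is a sketch under the same kind of assumption as any toy model here: the loss uses an assumed conjugate parameterization (Gaussian coefficients with precision d_j scaled by σ², inverse-gamma σ² with hyperparameters A0 and B0), which may not match the paper's loss (2), and `loss` and `dc_update` are hypothetical helper names. It illustrates the comparison and does not settle the question.

```python
import numpy as np

A0, B0 = 1.0, 1.0  # hyperparameters, set as in the Figure 1 caption


def loss(d, X, y):
    """Assumed negative log-marginal likelihood in d, up to a constant."""
    n = X.shape[0]
    A = np.diag(d) + X.T @ X
    q = y @ y - (X.T @ y) @ np.linalg.solve(A, X.T @ y)
    return (0.5 * np.linalg.slogdet(A)[1] - 0.5 * np.log(d).sum()
            + (A0 + 0.5 * n) * np.log(B0 + 0.5 * q))


def dc_update(d, X, y):
    """Closed-form DC step obtained by linearizing the concave part of the loss."""
    n = X.shape[0]
    A_inv = np.linalg.inv(np.diag(d) + X.T @ X)
    theta = A_inv @ (X.T @ y)
    q = y @ y - (X.T @ y) @ theta
    return 1.0 / (np.diag(A_inv) + (A0 + 0.5 * n) * theta**2 / (B0 + 0.5 * q))


# One-predictor example with a clear signal (response mean 3).
n = 20
X = np.ones((n, 1))
y = 3.0 + np.tile([1.0, -1.0], n // 2)

# Exhaustive enumeration: evaluate the loss on a fine log-spaced grid of precisions.
grid = np.logspace(-4, 4, 4001)
grid_losses = np.array([loss(np.array([d]), X, y) for d in grid])
d_grid = grid[np.argmin(grid_losses)]

# DC iterations from a deliberately poor starting value, with no compact-set constraint.
d_dc = np.array([50.0])
for _ in range(200):
    d_dc = dc_update(d_dc, X, y)

print("grid minimizer:", d_grid, "DC iterate:", d_dc[0])
```

In this example the two answers agree up to the grid resolution. A counterexample of the kind described above would instead show the DC loss strictly above the grid minimum.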

Figures

Figures reproduced from arXiv: 2604.21009 by Leo L Duan.

**Figure 1.** The loss function (2) for p = 1, X = 1_n, n = 20, a_0 = b_0 = 1, under three signal strengths. Red dashed lines mark the interior minimizer. Note that in the no-signal case, the optimum is at d_1 → ∞, corresponding to shrinking θ_1 to zero. When the signal is strong or moderate (left and center panels), the loss has a finite interior minimizer d_1^⋆. The no-signal case (right panel) is qualitatively different: t…

**Figure 2.** Iterations to convergence for the difference-of-convex algorithm (DC), automatic …

**Figure 3.** Wall-clock time (seconds, log scale) for the same three algorithms and scenarios …

**Figure 4.** DC algorithm applied to the Dorothea dataset ( …

**Figure 5.** DC spatial smoothing of the 2019 Ridgecrest aftershock sequence (M7.1, …

Original abstract

Variable selection in linear regression has been a central topic in statistical research for decades. Bayesian variable selection methods, which account for uncertainty in both the regression coefficients and the noise variance, have achieved broad success through the use of discrete or continuous shrinkage priors and efficient collapsed Gibbs samplers. Despite their popularity and strong empirical performance, an enigma remains: the marginal likelihood, obtained by integrating out the regression coefficients and noise variance, is not log-concave; therefore, there is no guarantee of reliably finding its global optimum. In this article, we study this problem from an optimization perspective. Taking the negative log-marginal likelihood as a loss function of the latent precision parameters, we can rewrite it as a difference of convex functions (DC), and then optimize it via a simple iterative algorithm. Under mild compact set conditions, the DC algorithm converges to the global optimum at a linear rate. The positive finding applies to type-II maximum likelihood and extends to maximum marginal posterior under suitable priors, indicating that the problem of mode finding in Bayesian variable selection is much more benign than the lack of log-concavity might suggest. Besides the theoretical insight, the proposed algorithm is easy to implement, free of tuning, and extensible to structured sparsity, and thus can serve as an efficient alternative or warm-start for traditional Markov chain Monte Carlo solutions. The method is illustrated through numerical studies and a spatial data application for quantifying the aftershock risk following the 2019 Ridgecrest earthquakes. The source code for the algorithm is publicly available at https://github.com/leoduan/dca_optimization_variable_selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DC reformulation gives a tuning-free optimizer for marginal likelihood modes in variable selection, but compact-set conditions likely miss the boundary solutions that matter for inclusion and exclusion.

The one thing to know is that this paper claims a DC reformulation of the negative log-marginal likelihood lets a simple algorithm find its global mode at a linear rate under compact-set conditions, offering a tuning-free optimizer for Bayesian variable selection. The new element is the DC decomposition itself along with the convergence result, which is not standard in the shrinkage-prior literature. The paper does well by keeping the method easy to code, releasing the source, and applying it to a spatial aftershock analysis. That practical side makes the work more than just theory.

The main soft spot is the compact-set assumption in the convergence theorem. Variable selection modes typically push some precision parameters to zero or infinity, which lie outside any compact subset of the positive reals. The paper needs to show that the global minimizer stays inside the compact set, or explain how to handle the boundary; otherwise the guarantee does not reach the most relevant cases. The abstract calls the conditions mild, but only the proof can confirm that.

This is for people working on high-dimensional or spatial Bayesian regression who want an optimization route instead of full MCMC. Readers focused on mode finding or warm starts will see the most value. It deserves peer review. The claim is substantive enough that referees should check the details and the boundary handling.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that the negative log-marginal likelihood for Bayesian variable selection in linear regression, expressed as a function of latent precision parameters, can be rewritten as a difference of convex functions. It then proposes a DC algorithm that converges to the global optimum at a linear rate under mild compact set conditions. The result is stated to apply to type-II maximum likelihood and to extend to maximum marginal posterior under suitable priors. The approach is presented as tuning-free and easy to implement, serving as an alternative or warm-start for MCMC, with illustrations on numerical studies and a spatial data application; source code is publicly released.

Significance. If the DC decomposition and convergence result hold and extend to the boundary regimes of interest, the work would provide useful theoretical insight that mode-finding in Bayesian variable selection is more tractable than the absence of log-concavity suggests, together with a practical algorithm. The public availability of the source code is a positive feature that supports reproducibility and potential adoption as a complement to existing MCMC methods.

major comments (1)

[Convergence theorem and surrounding discussion of the DC algorithm] The convergence theorem (the statement that the DC algorithm converges to the global optimum at linear rate under mild compact set conditions) is load-bearing for the central claim. The relevant modes in variable selection routinely lie at the boundary of the domain, with some precision parameters τ_j → 0 (inclusion) or τ_j → ∞ (exclusion). Any compact set is closed and bounded and therefore excludes at least one of these extremes. The manuscript provides no explicit argument that the global minimizer remains interior to the chosen compact set or that boundary solutions can be recovered by a limiting argument. Without this, the guarantee does not transfer to the statistical regimes the method is intended to address.

minor comments (2)

[Numerical studies] The numerical studies section would benefit from explicit side-by-side reporting of variable recovery rates, runtime, and sensitivity to initialization when comparing the DC algorithm against collapsed Gibbs sampling.
[Method section] Notation for the latent precision vector τ and the DC decomposition could be introduced with a short table or explicit listing of the convex and concave parts to improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and for identifying a key theoretical point regarding the scope of the convergence result. We address this comment directly below and outline the revisions we will make to the manuscript.

Referee: [Convergence theorem and surrounding discussion of the DC algorithm] The convergence theorem (the statement that the DC algorithm converges to the global optimum at linear rate under mild compact set conditions) is load-bearing for the central claim. The relevant modes in variable selection routinely lie at the boundary of the domain, with some precision parameters τ_j → 0 (inclusion) or τ_j → ∞ (exclusion). Any compact set is closed and bounded and therefore excludes at least one of these extremes. The manuscript provides no explicit argument that the global minimizer remains interior to the chosen compact set or that boundary solutions can be recovered by a limiting argument. Without this, the guarantee does not transfer to the statistical regimes the method is intended to address.

Authors: We thank the referee for highlighting this important limitation in the current statement of the theorem. The convergence result is indeed established only on compact sets, which necessarily exclude the boundary regimes τ_j → 0 and τ_j → ∞ that arise in variable selection. The manuscript does not supply an explicit interiority argument or a limiting procedure to recover boundary solutions. To address this, we will revise the manuscript by adding a dedicated remark that considers a sequence of expanding compact sets K_n whose union is the positive orthant. Under the same mild conditions used in the original proof, we will show that the sequence of DC iterates on K_n converges to a point that is a global minimizer on the closure whenever such a minimizer exists in the extended reals. We will also include additional numerical experiments that initialize the algorithm near the boundaries and confirm recovery of the expected modes. These additions will be incorporated in the revised version.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity: DC form and convergence derived from standard marginal likelihood

The paper starts from the standard negative log-marginal likelihood obtained by integrating out regression coefficients and noise variance in the linear model. It then performs an algebraic rewrite of this expression into a difference of convex functions, which is a direct derivation rather than a self-referential definition or fitted construction. The DC algorithm is applied to this derived objective, and the linear-rate global convergence is asserted under explicitly stated mild compact-set conditions drawn from general DC programming theory. No step reduces the target result to a parameter fitted on the same data, a self-citation chain, or a renamed empirical pattern. The central claim therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the negative log-marginal likelihood admits a DC decomposition; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: the negative log-marginal likelihood is a difference of convex functions.
This rewrite is required to apply the DC algorithm and obtain the global convergence guarantee.
