Fast Mixing of Data Augmentation Algorithms: Bayesian Probit, Logit, and Lasso Regression
Pith reviewed 2026-05-23 07:47 UTC · model grok-4.3
The pith
Probit and logit data augmentation samplers mix in O(n log(log η/ε)) steps with high probability over data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a modified conductance-based method, the first non-asymptotic polynomial upper bounds are proved for the mixing times of ProbitDA, LogitDA, and LassoDA, with explicit dependence on design matrix for the first two, leading to O(n log(log η/ε)) under data assumptions.
What carries the argument
A modified conductance-based method for analyzing the mixing time of two-block Gibbs samplers in data augmentation algorithms.
If this is right
- ProbitDA and LogitDA achieve ε-mixing in O(n log(log η/ε)) steps with η-warm start.
- LassoDA requires O(d²(d log d + n log n)² log(η/ε)) steps for TV distance ε.
- The bounds hold with high probability for data from sub-Gaussian or log-concave distributions.
- These bounds are applicable even with highly imbalanced response data.
- The results provide comparisons to Langevin Monte Carlo methods.
Where Pith is reading between the lines
- The conductance method could be adapted to analyze other two-block samplers beyond DA.
- Fast mixing suggests these algorithms are practical for large n in Bayesian inference.
- Extensions might include other priors or models where DA is used.
- Initialization strategies could be further optimized based on these bounds.
Load-bearing premise
The data must be independently generated from a sub-Gaussian or log-concave distribution and properly scaled.
What would settle it
Finding a dataset generated from a sub-Gaussian distribution where the mixing time of ProbitDA exceeds the stated O(n log(log η/ε)) bound would falsify the result.
Figures
read the original abstract
We propose using a modified conductance-based method to study the mixing time of an important class of two-block Gibbs samplers, the data augmentation (DA) algorithm. %, which is of prominent interest in both theoretical and empirical research. Using this method, we prove the first non-asymptotic polynomial upper bounds on mixing times of three important DA algorithms: DA algorithms for Bayesian Probit regression (Albert and Chib, 1993, ProbitDA) and Bayesian Logit regression (Polson, Scott, and Windle, 2013, LogitDA), and Bayesian Lasso Regression (Park and Casella, 2008, Rajaratnam et al., 2015, LassoDA). Concretely, for ProbitDA and LogitDA, we demonstrate a tight bound that explicitly depends on the design matrix and prior covariance matrix. Under the assumption that data are independently generated from either a sub-Gaussian or log-concave distribution and properly scaled, the bound implies that with $\eta$-warm start, parameter dimension $d$, and sample size $n$, with high probability over data, the two algorithms require $\mathcal{O}\left(n\log \left(\frac{\log \eta}{\epsilon}\right)\right)$ steps to obtain samples with at most $\epsilon$ error in TV, KL, or $\chi^2$ distance. Meanwhile, we show that under minimal data assumptions, LassoDA requires $\mathcal{O}\left(d^2(d\log d +n \log n)^2 \log \left(\frac{\eta}{\epsilon}\right)\right)$ steps to achieve $\epsilon$-accuracy in TV distance. The results are generally applicable to settings with large $n$ and large $d$, including settings with highly imbalanced response data in Probit and Logit regression. We compare them with the best known guarantees of Langevin Monte Carlo and Metropolis Adjusted Langevin Algorithm. We evaluate our theoretical results using numerical examples, and discuss the mixing times of the three algorithms under feasible initialization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a modified conductance-based method to derive non-asymptotic upper bounds on the mixing times of data augmentation (DA) Gibbs samplers for Bayesian probit regression (ProbitDA), logit regression (LogitDA), and lasso regression (LassoDA). It claims the first polynomial bounds: for ProbitDA and LogitDA, under independent sub-Gaussian or log-concave data with proper scaling, an O(n log(log η / ε)) mixing time (with high probability over data) from an η-warm start to ε-accuracy in total variation, KL, or χ² distance; for LassoDA, an O(d²(d log d + n log n)² log(η/ε)) bound in TV distance. The work includes comparisons to Langevin Monte Carlo and MALA, plus numerical validation, and applies to large-n, large-d regimes including imbalanced responses.
Significance. If the derivations hold, the results supply the first explicit, non-asymptotic polynomial mixing-time guarantees for these standard DA algorithms, which are widely used in Bayesian computation. The explicit dependence on the design matrix and prior covariance, the high-probability simplification under standard data assumptions, and the direct comparison with gradient-based samplers are useful for both theory and practice. The numerical examples provide concrete support for the claimed rates.
minor comments (3)
- The abstract states that the general bound 'explicitly depends on the design matrix and prior covariance matrix' before simplifying under the sub-Gaussian/log-concave assumption; the main text should state this general bound (with the precise matrix dependence) as a numbered theorem early in the results section so readers can see the reduction.
- The phrase 'properly scaled' in the data assumption for the O(n log(...)) bound is used without an explicit definition or reference to a scaling condition on the design matrix; a short clarifying sentence or display equation would remove ambiguity.
- The comparison with LMC and MALA is mentioned but the precise regimes (e.g., step-size choices or dimension dependence) under which the DA bounds are competitive are not summarized in a table or remark; adding such a comparison would improve readability.
Simulated Author's Rebuttal
We thank the referee for their careful reading and positive assessment of our manuscript. Their summary correctly identifies the main contributions: the first explicit non-asymptotic polynomial mixing-time bounds for ProbitDA, LogitDA, and LassoDA via a modified conductance argument, together with high-probability simplifications under standard data assumptions and comparisons to gradient-based methods. We are pleased that the referee finds the explicit dependence on the design matrix, prior, and the applicability to large-n/large-d and imbalanced regimes useful.
Circularity Check
No significant circularity
full rationale
The paper derives non-asymptotic mixing time bounds for the three DA algorithms by applying a modified conductance method to the two-block Gibbs samplers. The resulting bounds are stated to depend explicitly on the design matrix and prior covariance; they simplify to the claimed O(n log(log η / ε)) form only after imposing the external sub-Gaussian or log-concave data assumptions and proper scaling. No equation reduces by construction to a fitted parameter, self-referential definition, or load-bearing self-citation chain; the conductance argument supplies independent analytic content that is not presupposed by the inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Standard conductance bounds for Markov chains apply after the proposed modification
- domain assumption Data are i.i.d. from sub-Gaussian or log-concave distributions and properly scaled
Forward citations
Cited by 1 Pith paper
-
Encoder-Free Human Motion Understanding via Structured Motion Descriptions
SMD converts human motion data into structured text descriptions, enabling LLMs to reach new state-of-the-art results on motion question answering and captioning without learned encoders.
Reference graph
Works this paper leans on
-
[1]
A DAMCZAK , R., L ITVAK , A., P AJOR , A. and T OMCZAK -JAEGERMANN , N. (2010). Quantitative esti- mates of the convergence of the empirical covariance matrix in log-concave ensembles. Journal of the American Mathematical Society 23 535–561
work page 2010
-
[2]
A DAMCZAK , R., L ITVAK , A. E., P AJOR , A. and T OMCZAK -JAEGERMANN , N. (2011). Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus. Mathématique 349 195–200
work page 2011
-
[3]
A LBERT , J. H. and C HIB , S. (1993). Bayesian analysis of binary and polychotomous response data. Jour- nal of the American statistical Association 88 669–679
work page 1993
-
[4]
A LONSO -GUTIÉRREZ , D. and B ASTERO , J. (2015). Approaching the Kannan-Lovász-Simonovits and variance conjectures 2131. Springer
work page 2015
-
[5]
A LTSCHULER , J. M. and C HEWI , S. (2024). Faster high-accuracy log-concave sampling via algorithmic warm starts. Journal of the ACM 71 1–55
work page 2024
-
[6]
A SCOLANI , F., LAVENANT , H. and ZANELLA , G. (2024). Entropy contraction of the Gibbs sampler under log-concavity. arXiv preprint arXiv:2410.00858
-
[7]
B ALASUBRAMANIAN , K., C HEWI , S., E RDOGDU , M. A., S ALIM , A. and Z HANG , S. (2022). Towards a theory of non-log-concave sampling: first-order stationarity guarantees for langevin monte carlo. In Conference on Learning Theory 2896–2923. PMLR
work page 2022
-
[8]
Spectral gaps, symmetries and log-concave perturbations
B ARTHE , F. and K LARTAG , B. (2019). Spectral gaps, symmetries and log-concave perturbations. arXiv preprint arXiv:1907.01823
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[9]
B ARTHE , F. and M ILMAN , E. (2013). Transference principles for log-Sobolev and spectral-gap with ap- plications to conservative spin systems. Communications in Mathematical Physics 323 575–625
work page 2013
-
[10]
B OBKOV, S. G. (1999). Isoperimetric and analytic inequalities for log-concave probability measures. The Annals of Probability 27 1903–1921
work page 1999
-
[11]
B OBKOV, S. G. and H OUDRÉ , C. (1997). Isoperimetric constants for product probability measures. The Annals of Probability 184–205
work page 1997
-
[12]
C AFFARELLI , L. A. (2000). Monotonicity properties of optimal transportation and the FKG and related inequalities. Communications in Mathematical Physics 214 547–563
work page 2000
-
[13]
C ATTIAUX , P. and GUILLIN , A. (2020). On the Poincaré constant of log-concave measures. In Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2017-2019 Volume I171–217. Springer
work page 2020
-
[14]
C HEN , Y. and GATMIRY, K. (2023). When does Metropolized Hamiltonian Monte Carlo provably outper- form Metropolis-adjusted Langevin algorithm? arXiv preprint arXiv:2304.04724
-
[15]
C HEN , Y., DWIVEDI , R., W AINWRIGHT , M. J. and Y U, B. (2018). Fast MCMC sampling algorithms on polytopes. Journal of Machine Learning Research 19 1–86
work page 2018
-
[16]
C HEN , Y., DWIVEDI , R., WAINWRIGHT , M. J. and Y U, B. (2020). Fast mixing of Metropolized Hamilto- nian Monte Carlo: Benefits of multi-step gradients.Journal of Machine Learning Research21 1–72
work page 2020
-
[17]
C HENG , X. and B ARTLETT , P. (2018). Convergence of Langevin MCMC in KL-divergence. In Algorith- mic Learning Theory 186–211. PMLR
work page 2018
-
[18]
C HEWI , S. (2023). Log-concave sampling. Book draft available at https://chewisinho. github. io
work page 2023
-
[19]
C HEWI , S., L U, C., A HN, K., C HENG , X., L E GOUIC , T. and R IGOLLET , P. (2021). Optimal dimen- sion dependence of the Metropolis-adjusted Langevin algorithm. In Conference on Learning Theory 1260–1300. PMLR
work page 2021
-
[20]
C HEWI , S., E RDOGDU , M. A., L I, M., S HEN , R. and Z HANG , M. S. (2024). Analysis of langevin monte carlo from poincare to log-sobolev. Foundations of Computational Mathematics 1–51
work page 2024
-
[21]
C HOI , H. M. and H OBERT , J. P. (2013). The Polya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic
work page 2013
-
[22]
C OHEN , A. and EINAV, L. (2007). Estimating risk preferences from deductible choice.American economic review 97 745–788
work page 2007
-
[23]
C OURTADE , T. A. (2020). Bounds on the Poincaré constant for convolution measures
work page 2020
-
[24]
C OUSINS , B. and V EMPALA , S. (2014). A cubic algorithm for computing Gaussian volume. In Proceed- ings of the twenty-fifth annual ACM-SIAM symposium on discrete algorithms 1215–1228. SIAM
work page 2014
-
[25]
D AI, Y., GAO, Y., HUANG , J., J IAO, Y., KANG , L. and L IU, J. (2023). Lipschitz Transport Maps via the Follmer Flow. arXiv preprint arXiv:2309.03490
-
[26]
D ALALYAN , A. (2017a). Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Conference on Learning Theory 678–689. PMLR
-
[27]
D ALALYAN , A. S. (2017b). Theoretical guarantees for approximate sampling from smooth and log- concave densities. Journal of the Royal Statistical Society Series B: Statistical Methodology 79 651– 676. FAST MIXING OF DATA AUGMENTATION ALGORITHMS 27
-
[28]
D ALALYAN , A. S. and K ARAGULYAN , A. (2019). User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications 129 5278–5311
work page 2019
-
[29]
D ALALYAN , A. S. and T SYBAKOV , A. B. (2012). Sparse regression learning by aggregation and Langevin Monte-Carlo. Journal of Computer and System Sciences 78 1423–1443
work page 2012
-
[30]
D IEBOLT , J. and R OBERT , C. P. (1994). Estimation of finite mixture distributions through Bayesian sam- pling. Journal of the Royal Statistical Society: Series B (Methodological) 56 363–375
work page 1994
-
[31]
D URANTE , D. and D UNSON , D. B. (2018). Bayesian inference and testing of group differences in brain networks
work page 2018
-
[32]
D URMUS , A., M AJEWSKI , S. and M IASOJEDOW , B. (2019). Analysis of Langevin Monte Carlo via con- vex optimization. Journal of Machine Learning Research 20 1–46
work page 2019
-
[33]
D URMUS , A. and M OULINES , E. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm
work page 2017
-
[34]
D URMUS , A. and M OULINES , E. (2019). High-dimensional Bayesian inference via the unadjusted Langevin algorithm
work page 2019
-
[35]
D VORZAK , M. and W AGNER , H. (2016). Sparse Bayesian modelling of underreported count data. Statis- tical Modelling 16 24–46
work page 2016
-
[36]
D WIVEDI , R., C HEN , Y., WAINWRIGHT , M. J. and Y U, B. (2019). Log-concave sampling: Metropolis- Hastings algorithms are fast. Journal of Machine Learning Research 20 1–42
work page 2019
-
[37]
D YER , M., F RIEZE , A. and KANNAN , R. (1991). A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM (JACM) 38 1–17
work page 1991
-
[38]
E LDAN , R. (2013). Thin shell implies spectral gap up to polylog via a stochastic localization scheme. Geometric and Functional Analysis 23 532–569
work page 2013
-
[39]
E RDOGDU , M. A., H OSSEINZADEH , R. and ZHANG , S. (2022). Convergence of Langevin Monte Carlo in chi-squared and Rényi divergence. In International Conference on Artificial Intelligence and Statis- tics 8151–8175. PMLR
work page 2022
-
[40]
F RUEHWIRTH -S CHNATTER , S. and F RÜHWIRTH , R. (2007). Auxiliary mixture sampling with applica- tions to logistic models. Computational Statistics & Data Analysis 51 3509–3528
work page 2007
-
[41]
G RANT , E. H. C., M ILLER , D. A., S CHMIDT , B. R., A DAMS , M. J., A MBURGEY , S. M., C HAM - BERT, T., C RUICKSHANK , S. S., F ISHER , R. N., G REEN , D. M., H OSSACK , B. R. et al. (2016). Quantitative evidence for the effects of multiple drivers on continental-scale amphibian declines.Sci- entific reports 6 25625
work page 2016
-
[42]
E., M ATECHOU , E., B UXTON , A
G RIFFIN , J. E., M ATECHOU , E., B UXTON , A. S., B ORMPOUDAKIS , D. and G RIFFITHS , R. A. (2020). Modelling environmental DNA data; Bayesian variable selection accounting for false positive and false negative errors. Journal of the Royal Statistical Society Series C: Applied Statistics 69 377– 392
work page 2020
-
[43]
H ANS , C. (2009). Bayesian lasso regression. Biometrika 96 835–845
work page 2009
-
[44]
H ELD , L. and H OLMES , C. C. (2006). Bayesian auxiliary variable models for binary and multinomial regression
work page 2006
-
[45]
H OBERT , J. P. (2011). The data augmentation algorithm: Theory and methodology. Handbook of Markov Chain Monte Carlo 253–293
work page 2011
-
[46]
H OLLEY , R. and S TROOCK , D. W. (1986). Logarithmic Sobolev inequalities and stochastic Ising models
work page 1986
-
[47]
J ERRUM , M. and S INCLAIR , A. (1989). Approximating the permanent. SIAM journal on computing 18 1149–1178
work page 1989
-
[48]
J OHNDROW , J. E., S MITH , A., P ILLAI , N. and D UNSON , D. B. (2019). MCMC for imbalanced categor- ical data. Journal of the American Statistical Association
work page 2019
-
[49]
J ONES , G. L. and H OBERT , J. P. (2001). Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statistical Science 312–334
work page 2001
-
[50]
J OSEPH , L., G YORKOS , T. W. and COUPAL , L. (1995). Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. American journal of epidemiology 141 263–272
work page 1995
-
[51]
J USTINIANO , A. and P RIMICERI , G. E. (2008). The time-varying volatility of macroeconomic fluctua- tions. American Economic Review 98 604–641
work page 2008
-
[52]
K ANNAN , R., L OVÁSZ , L. and S IMONOVITS , M. (1995). Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry 13 541–559
work page 1995
-
[53]
K ANNAN , R., L OVÁSZ , L. and S IMONOVITS , M. (1997). Random walks and an o*(n5) volume algorithm for convex bodies. Random Structures & Algorithms 11 1–50
work page 1997
-
[54]
K HARE , K. and H OBERT , J. P. (2013). Geometric ergodicity of the Bayesian lasso
work page 2013
-
[55]
K IM, Y.-H. and M ILMAN , E. (2011). A Generalization of Caffarelli’s Contraction Theorem via (reverse) Heat Flow. 28 LEE AND ZHANG
work page 2011
- [56]
-
[57]
K OLESNIKOV , A. V. (2011). Mass transportation and contractions. arXiv preprint arXiv:1103.1479
work page internal anchor Pith review Pith/arXiv arXiv 2011
-
[58]
L AWLER , G. F. and S OKAL , A. D. (1988). Bounds on the L2 spectrum for Markov chains and Markov processes: a generalization of Cheeger’s inequality. Transactions of the American mathematical so- ciety 309 557–580
work page 1988
-
[59]
L EE, Y. T., S HEN , R. and T IAN , K. (2020). Logsmooth gradient concentration and tighter runtimes for Metropolized Hamiltonian Monte Carlo. In Conference on learning theory 2565–2597. PMLR
work page 2020
-
[60]
L EE, Y. T. and V EMPALA , S. S. (2017). Eldan’s stochastic localization and the KLS hyperplane conjec- ture: an improved lower bound for expansion. In2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) 998–1007. IEEE
work page 2017
-
[61]
L EVIN , D. A. and P ERES , Y. (2017). Markov chains and mixing times 107. American Mathematical Soc
work page 2017
-
[62]
L IU, J. S., W ONG , W. H. and K ONG , A. (1994). Covariance structure of the Gibbs sampler with applica- tions to the comparisons of estimators and augmentation schemes. Biometrika 81 27–40
work page 1994
-
[63]
L OVÁSZ , L. (1999). Hit-and-run mixes fast. Mathematical programming 86 443–461
work page 1999
-
[64]
L OVÁSZ , L. and S IMONOVITS , M. (1993). Random walks in a convex body and an improved volume algorithm. Random structures & algorithms 4 359–412
work page 1993
-
[65]
L OVÁSZ , L. and V EMPALA , S. (2003). Hit-and-run is fast and fun. preprint, Microsoft Research
work page 2003
-
[66]
L OVÁSZ , L. and VEMPALA , S. (2004). Hit-and-run from a corner. InProceedings of the thirty-sixth annual ACM symposium on Theory of computing 310–314
work page 2004
-
[67]
L OVÁSZ , L. and V EMPALA , S. (2006). Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06)57–68. IEEE
work page 2006
-
[68]
L OVÁSZ , L. and V EMPALA , S. (2007). The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms 30 307–358
work page 2007
-
[69]
S., C HENG , X., F LAMMARION , N., B ARTLETT , P
M A, Y.-A., C HATTERJI , N. S., C HENG , X., F LAMMARION , N., B ARTLETT , P. L. and J ORDAN , M. I. (2021). Is there an analog of Nesterov acceleration for gradient-based MCMC?
work page 2021
-
[70]
M ALLICK , H. and Y I, N. (2014). A new Bayesian lasso. Statistics and its interface 7 571
work page 2014
-
[71]
M ANGOUBI , O. and S MITH , A. (2021). Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions: Continuous dynamics. The Annals of Applied Probability 31 2019–2045
work page 2021
-
[72]
M IKULINCER , D. and S HENFELD , Y. (2024). The Brownian transport map. Probability Theory and Re- lated Fields 1–66
work page 2024
-
[73]
M ILMAN , E. (2010). Isoperimetric and concentration inequalities: equivalence under curvature lower bound
work page 2010
-
[74]
M ILMAN , E. (2012). Properties of isoperimetric, functional and transport-entropy inequalities via concen- tration. Probability Theory and Related Fields 152 475–507
work page 2012
-
[75]
M ILMAN , E. and S ODIN , S. (2008). An isoperimetric inequality for uniformly log-concave measures and uniformly convex bodies. Journal of Functional Analysis 254 1235–1268
work page 2008
-
[76]
M ORGAN , F. (2005). Manifolds with density. Notices of the AMS 52 853–858
work page 2005
-
[77]
M OU, W., H O, N., W AINWRIGHT , M. J., B ARTLETT , P. L. and J ORDAN , M. I. (2019). Sampling for bayesian mixture models: Mcmc with polynomial-time mixing. arXiv preprint arXiv:1912.05153
-
[78]
K., H E, Y., B ALASUBRAMANIAN , K
M OUSAVI -H OSSEINI , A., F ARGHLY, T. K., H E, Y., B ALASUBRAMANIAN , K. and E RDOGDU , M. A. (2023). Towards a complete analysis of langevin monte carlo: Beyond poincaré inequality. In The Thirty Sixth Annual Conference on Learning Theory 1–35. PMLR
work page 2023
-
[79]
N ARAYANAN , H. (2016). Randomized interior point methods for sampling and optimization
work page 2016
-
[80]
N GUYEN , H. H., N GO, V. M. and T RAN , A. N. T. (2021). Financial performances, entrepreneurial fac- tors and coping strategy to survive in the COVID-19 pandemic: case of Vietnam. Research in Inter- national Business and Finance 56 101380
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.