Fast Mixing of Data Augmentation Algorithms: Bayesian Probit, Logit, and Lasso Regression

Holden Lee; Kexin Zhang

arXiv: 2412.07999 · v3 · submitted 2024-12-11 · math.ST · math.PR · stat.ML · stat.TH


Pith reviewed 2026-05-23 07:47 UTC · model grok-4.3

classification math.ST math.PR stat.ML stat.TH

keywords mixing time · data augmentation · Gibbs sampler · Bayesian probit · Bayesian logit · Bayesian lasso · conductance method · non-asymptotic bounds


The pith

Probit and logit data augmentation samplers mix in O(n log(log η/ε)) steps with high probability over data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a modified conductance method to bound the mixing times of data augmentation Gibbs samplers for Bayesian regression models. It proves the first polynomial upper bounds on the number of steps needed for ProbitDA and LogitDA to reach ε-accuracy in total variation, KL, or chi-squared distance. The bounds depend explicitly on the design matrix and prior, and simplify to O(n log(log η/ε)) when data are sub-Gaussian or log-concave and scaled properly. For LassoDA the bound is polynomial in d and n but higher order. These results apply to large-scale settings with imbalanced responses.

Core claim

Using a modified conductance-based method, the first non-asymptotic polynomial upper bounds are proved for the mixing times of ProbitDA, LogitDA, and LassoDA, with explicit dependence on design matrix for the first two, leading to O(n log(log η/ε)) under data assumptions.
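For readers unfamiliar with the terms in the claim, the standard definitions can be written as below. The notation (ν for the initial distribution, P for the transition kernel, π for the target posterior, τ_mix for the mixing time) is an assumption based on common usage; the paper may state these in a slightly different form.

```latex
% Total variation distance between probability measures \mu and \pi
\mathrm{TV}(\mu, \pi) = \sup_{A} \, |\mu(A) - \pi(A)|

% \nu is an \eta-warm start with respect to \pi
\sup_{A} \frac{\nu(A)}{\pi(A)} \le \eta

% \epsilon-mixing time in TV of a chain with kernel P started from \nu
\tau_{\mathrm{mix}}(\epsilon; \nu) = \min\{\, m \ge 0 : \mathrm{TV}(\nu P^{m}, \pi) \le \epsilon \,\}
```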

What carries the argument

A modified conductance-based method for analyzing the mixing time of two-block Gibbs samplers in data augmentation algorithms.
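As background, the classical (unmodified) conductance argument can be illustrated on a tiny finite chain. The sketch below is not the paper's modified method, which this page does not describe. It computes the conductance by brute force and then applies the Lovász–Simonovits bound TV ≤ √η (1 − Φ²/2)^m, which assumes a lazy, reversible chain and an η-warm start. The function names and the two-state example are hypothetical.

```python
import math
from itertools import combinations

def conductance(P, pi):
    """Brute-force conductance: min over sets S with 0 < pi(S) <= 1/2
    of (stationary flow from S to its complement) / pi(S)."""
    n = len(pi)
    best = math.inf
    for size in range(1, n):
        for S in combinations(range(n), size):
            mass = sum(pi[x] for x in S)
            if mass <= 0.0 or mass > 0.5 + 1e-12:
                continue
            inside = set(S)
            flow = sum(pi[x] * P[x][y]
                       for x in S for y in range(n) if y not in inside)
            best = min(best, flow / mass)
    return best

def steps_from_conductance(phi, eta, eps):
    """Smallest m with sqrt(eta) * (1 - phi**2 / 2)**m <= eps
    (classical bound; assumes a lazy reversible chain)."""
    return math.ceil(math.log(math.sqrt(eta) / eps)
                     / -math.log(1.0 - phi ** 2 / 2.0))

# Hypothetical lazy two-state chain with uniform stationary distribution.
P = [[0.75, 0.25], [0.25, 0.75]]
pi = [0.5, 0.5]
phi = conductance(P, pi)
print("conductance:", phi)
print("steps for eta=2, eps=0.01:", steps_from_conductance(phi, 2.0, 0.01))
```

Note that this classical bound scales with log(√η/ε); the log(log η/ε) dependence reported for ProbitDA and LogitDA is attributed to the paper's modified method.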

If this is right

ProbitDA and LogitDA achieve ε-mixing in O(n log(log η/ε)) steps with η-warm start.
LassoDA requires O(d²(d log d + n log n)² log(η/ε)) steps for TV distance ε.
The bounds hold with high probability for data from sub-Gaussian or log-concave distributions.
These bounds are applicable even with highly imbalanced response data.
The results provide comparisons to Langevin Monte Carlo methods.
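To make the two rates above concrete, the following Python sketch evaluates the expressions inside the O(·) with every hidden constant set to 1. That choice is an assumption for illustration only, so the outputs indicate how the bounds scale with n and d, not actual step counts. The function names are hypothetical.

```python
import math

def probit_logit_rate(n, eta, eps):
    """n * log(log(eta) / eps): the ProbitDA/LogitDA rate with constants dropped."""
    if eta <= 1.0 or eps <= 0.0 or math.log(eta) <= eps:
        raise ValueError("need eta > 1 and 0 < eps < log(eta)")
    return n * math.log(math.log(eta) / eps)

def lasso_rate(n, d, eta, eps):
    """d^2 * (d log d + n log n)^2 * log(eta / eps): the LassoDA rate, constants dropped."""
    if n < 2 or d < 2 or eps <= 0.0 or eta <= eps:
        raise ValueError("need n, d >= 2 and 0 < eps < eta")
    return d ** 2 * (d * math.log(d) + n * math.log(n)) ** 2 * math.log(eta / eps)

# Illustrative values only: eta, eps, and d are arbitrary choices.
for n in (1_000, 10_000, 100_000):
    print(n, probit_logit_rate(n, eta=1e6, eps=0.01),
          lasso_rate(n, d=50, eta=1e6, eps=0.01))
```

The first expression is linear in n for fixed η and ε, while the second grows roughly like n² log² n for fixed d, which is the sense in which the LassoDA bound is of higher order.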

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The conductance method could be adapted to analyze other two-block samplers beyond DA.
Fast mixing suggests these algorithms are practical for large n in Bayesian inference.
Extensions might include other priors or models where DA is used.
Initialization strategies could be further optimized based on these bounds.

Load-bearing premise

The data must be independently generated from a sub-Gaussian or log-concave distribution and properly scaled.

What would settle it

Finding a dataset generated from a sub-Gaussian distribution where the mixing time of ProbitDA exceeds the stated O(n log(log η/ε)) bound would falsify the result.

Figures

Figures reproduced from arXiv: 2412.07999 by Holden Lee, Kexin Zhang.

**Figure 1.** Illustration of the transition kernels of ProbitDA, LogitDA, and LassoDA. Here, the arrow represents conditional dependency.

**Figure 2.** Simulation results for ProbitDA with imbalance factor Υ = 0.6.

**Figure 3.** Simulation results for ProbitDA with imbalance factor Υ = 1.

**Figure 4.** Simulation results for LogitDA with imbalance factor Υ = 0.6.

**Figure 5.** Simulation results for LogitDA with imbalance factor Υ = 1.

**Figure 6.** Simulation results for LassoDA. We report the autocorrelation time for both the v coordinate and the first coordinate of β. We plot the autocorrelation time for the three scenarios in…

**Figure 7.** Illustration of the kernel of T-transformed LassoDA. The surrounding derivation, in which step (i) follows from the data processing inequality, concludes with $\mathrm{TV}(\nu P^m, \pi) = \mathrm{TV}(\nu_T P_T^m, \pi_T) \le 2\,\mathrm{TV}(\nu_{T_\phi} P_{T_\phi}^{m-1}, \pi_{T_\phi})$ (Equation 31), which gives a way to control the mixing time of LassoDA by that of the φ-marginal of…
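The LassoDA simulations are summarized by an autocorrelation time. The estimator used in the paper is not stated on this page, so the following Python sketch shows one common choice as an assumption: the integrated autocorrelation time 1 + 2 Σ ρ_k, with the sum truncated at the first non-positive sample autocorrelation. The function name and the AR(1) test chain are hypothetical.

```python
import random

def autocorrelation_time(chain):
    """Integrated autocorrelation time 1 + 2 * sum_k rho_k of a scalar chain,
    truncating the sum at the first lag whose sample autocorrelation is <= 0."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [c - mean for c in chain]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        raise ValueError("constant chain has undefined autocorrelation")
    tau = 1.0
    for lag in range(1, n):
        rho = sum(centered[t] * centered[t + lag]
                  for t in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau

# Hypothetical check on an AR(1) chain x_t = 0.5 * x_{t-1} + noise,
# whose exact integrated autocorrelation time is (1 + 0.5) / (1 - 0.5) = 3.
rng = random.Random(0)
x, ar1 = 0.0, []
for _ in range(20_000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    ar1.append(x)
print("estimated autocorrelation time:", autocorrelation_time(ar1))
```

Larger values mean that more iterations are needed per effectively independent draw, which is why the quantity is a practical proxy for mixing speed.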

Original abstract

We propose using a modified conductance-based method to study the mixing time of an important class of two-block Gibbs samplers, the data augmentation (DA) algorithm. Using this method, we prove the first non-asymptotic polynomial upper bounds on the mixing times of three important DA algorithms: those for Bayesian Probit regression (Albert and Chib, 1993, ProbitDA), Bayesian Logit regression (Polson, Scott, and Windle, 2013, LogitDA), and Bayesian Lasso regression (Park and Casella, 2008, Rajaratnam et al., 2015, LassoDA). Concretely, for ProbitDA and LogitDA, we demonstrate a tight bound that explicitly depends on the design matrix and prior covariance matrix. Under the assumption that data are independently generated from either a sub-Gaussian or log-concave distribution and properly scaled, the bound implies that with $\eta$-warm start, parameter dimension $d$, and sample size $n$, with high probability over data, the two algorithms require $\mathcal{O}\left(n\log \left(\frac{\log \eta}{\epsilon}\right)\right)$ steps to obtain samples with at most $\epsilon$ error in TV, KL, or $\chi^2$ distance. Meanwhile, we show that under minimal data assumptions, LassoDA requires $\mathcal{O}\left(d^2(d\log d +n \log n)^2 \log \left(\frac{\eta}{\epsilon}\right)\right)$ steps to achieve $\epsilon$-accuracy in TV distance. The results are generally applicable to settings with large $n$ and large $d$, including settings with highly imbalanced response data in Probit and Logit regression. We compare them with the best known guarantees of Langevin Monte Carlo and Metropolis Adjusted Langevin Algorithm. We evaluate our theoretical results using numerical examples, and discuss the mixing times of the three algorithms under feasible initialization.
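For orientation, the two-block structure of ProbitDA (Albert and Chib, 1993) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: it assumes a single covariate (d = 1), a N(0, prior_var) prior on the coefficient, and an inverse-CDF truncated-normal draw that is not reliable far in the tails. The function names `sample_truncated_normal` and `probit_da` are hypothetical.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf and inverse cdf

def sample_truncated_normal(mean, positive, rng):
    """Draw from N(mean, 1) restricted to (0, inf) if positive, else (-inf, 0).
    Uses the inverse CDF; a sketch that is inaccurate for extreme tails."""
    cut = _STD.cdf(-mean)  # P(N(mean, 1) <= 0)
    u = rng.random()
    p = cut + u * (1.0 - cut) if positive else u * cut
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return mean + _STD.inv_cdf(p)

def probit_da(x, y, prior_var, n_steps, beta0=0.0, seed=0):
    """Two-block Gibbs sampler for scalar probit regression (assumed d = 1)."""
    rng = random.Random(seed)
    post_var = 1.0 / (sum(xi * xi for xi in x) + 1.0 / prior_var)
    beta, draws = beta0, []
    for _ in range(n_steps):
        # Block 1: latent variables z_i | beta, y_i are truncated normals.
        z = [sample_truncated_normal(xi * beta, yi == 1, rng)
             for xi, yi in zip(x, y)]
        # Block 2: beta | z is Gaussian under the Gaussian prior.
        post_mean = post_var * sum(xi * zi for xi, zi in zip(x, z))
        beta = rng.gauss(post_mean, post_var ** 0.5)
        draws.append(beta)
    return draws

# Hypothetical intercept-only data: 15 successes out of 20 trials.
x = [1.0] * 20
y = [1] * 15 + [0] * 5
draws = probit_da(x, y, prior_var=10.0, n_steps=2000)
print("posterior mean estimate:", sum(draws[500:]) / len(draws[500:]))
```

Each iteration alternates between the latent block and the parameter block, which is the two-block structure whose mixing time the paper bounds.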

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first explicit non-asymptotic polynomial mixing bounds for ProbitDA, LogitDA, and LassoDA via modified conductance.


This paper establishes the first non-asymptotic polynomial upper bounds on mixing times for the three named data augmentation Gibbs samplers. The bounds depend explicitly on the design matrix and prior covariance, which is the concrete advance over prior work that lacked such rates. For ProbitDA and LogitDA, under sub-Gaussian or log-concave data with proper scaling, the bound reduces to O(n log(log η / ε)) steps with high probability over the data to reach ε accuracy in TV, KL, or χ². The LassoDA bound is O(d² (d log d + n log n)² log(η/ε)) in TV distance under weaker assumptions. They also compare these rates to Langevin Monte Carlo and MALA and include numerical checks plus discussion of initialization. The results apply to large n and d, including imbalanced cases. The modified conductance method is the technical device that delivers the explicit dependence, and the abstract shows no circularity or hidden dimension blowup. The main limitation is that the full proof details are not visible here, so the precise error control in the conductance application remains to be checked in the manuscript. This work is aimed at researchers who need mixing time guarantees for these specific Bayesian regression samplers. A reader focused on non-asymptotic MCMC analysis would get direct value from the rates and the comparisons. It deserves peer review because the claims are new, the problem is standard, and the evidence presented in the abstract is internally consistent.

Referee Report

0 major / 3 minor

Summary. The manuscript develops a modified conductance-based method to derive non-asymptotic upper bounds on the mixing times of data augmentation (DA) Gibbs samplers for Bayesian probit regression (ProbitDA), logit regression (LogitDA), and lasso regression (LassoDA). It claims the first polynomial bounds: for ProbitDA and LogitDA, under independent sub-Gaussian or log-concave data with proper scaling, an O(n log(log η / ε)) mixing time (with high probability over data) from an η-warm start to ε-accuracy in total variation, KL, or χ² distance; for LassoDA, an O(d²(d log d + n log n)² log(η/ε)) bound in TV distance. The work includes comparisons to Langevin Monte Carlo and MALA, plus numerical validation, and applies to large-n, large-d regimes including imbalanced responses.

Significance. If the derivations hold, the results supply the first explicit, non-asymptotic polynomial mixing-time guarantees for these standard DA algorithms, which are widely used in Bayesian computation. The explicit dependence on the design matrix and prior covariance, the high-probability simplification under standard data assumptions, and the direct comparison with gradient-based samplers are useful for both theory and practice. The numerical examples provide concrete support for the claimed rates.

minor comments (3)

The abstract states that the general bound 'explicitly depends on the design matrix and prior covariance matrix' before simplifying under the sub-Gaussian/log-concave assumption; the main text should state this general bound (with the precise matrix dependence) as a numbered theorem early in the results section so readers can see the reduction.
The phrase 'properly scaled' in the data assumption for the O(n log(...)) bound is used without an explicit definition or reference to a scaling condition on the design matrix; a short clarifying sentence or display equation would remove ambiguity.
The comparison with LMC and MALA is mentioned but the precise regimes (e.g., step-size choices or dimension dependence) under which the DA bounds are competitive are not summarized in a table or remark; adding such a comparison would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive assessment of our manuscript. Their summary correctly identifies the main contributions: the first explicit non-asymptotic polynomial mixing-time bounds for ProbitDA, LogitDA, and LassoDA via a modified conductance argument, together with high-probability simplifications under standard data assumptions and comparisons to gradient-based methods. We are pleased that the referee finds the explicit dependence on the design matrix, prior, and the applicability to large-n/large-d and imbalanced regimes useful.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper derives non-asymptotic mixing time bounds for the three DA algorithms by applying a modified conductance method to the two-block Gibbs samplers. The resulting bounds are stated to depend explicitly on the design matrix and prior covariance; they simplify to the claimed O(n log(log η / ε)) form only after imposing the external sub-Gaussian or log-concave data assumptions and proper scaling. No equation reduces by construction to a fitted parameter, self-referential definition, or load-bearing self-citation chain; the conductance argument supplies independent analytic content that is not presupposed by the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard Markov chain conductance theory and the data distribution assumptions stated in the abstract; no free parameters or invented entities are apparent from the abstract.

axioms (2)

standard math: Standard conductance bounds for Markov chains apply after the proposed modification. Invoked to obtain the mixing time upper bounds for the two-block Gibbs samplers.
domain assumption: Data are i.i.d. from sub-Gaussian or log-concave distributions and properly scaled. Required for the high-probability O(n log(log η / ε)) bound to hold.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Encoder-Free Human Motion Understanding via Structured Motion Descriptions
cs.CV 2026-04 unverdicted novelty 7.0

SMD converts human motion data into structured text descriptions, enabling LLMs to reach new state-of-the-art results on motion question answering and captioning without learned encoders.

Reference graph

Works this paper leans on

117 extracted references · 117 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1] Adamczak, R., Litvak, A., Pajor, A. and Tomczak-Jaegermann, N. (2010). Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. Journal of the American Mathematical Society 23 535–561.

[2] Adamczak, R., Litvak, A. E., Pajor, A. and Tomczak-Jaegermann, N. (2011). Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus. Mathématique 349 195–200.

[3] Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88 669–679.

[4] Alonso-Gutiérrez, D. and Bastero, J. (2015). Approaching the Kannan-Lovász-Simonovits and variance conjectures 2131. Springer.

[5] Altschuler, J. M. and Chewi, S. (2024). Faster high-accuracy log-concave sampling via algorithmic warm starts. Journal of the ACM 71 1–55.

[6] Ascolani, F., Lavenant, H. and Zanella, G. (2024). Entropy contraction of the Gibbs sampler under log-concavity. arXiv preprint arXiv:2410.00858.

[7] Balasubramanian, K., Chewi, S., Erdogdu, M. A., Salim, A. and Zhang, S. (2022). Towards a theory of non-log-concave sampling: first-order stationarity guarantees for Langevin Monte Carlo. In Conference on Learning Theory 2896–2923. PMLR.

[8] Barthe, F. and Klartag, B. (2019). Spectral gaps, symmetries and log-concave perturbations. arXiv preprint arXiv:1907.01823.

[9] Barthe, F. and Milman, E. (2013). Transference principles for log-Sobolev and spectral-gap with applications to conservative spin systems. Communications in Mathematical Physics 323 575–625.

[10] Bobkov, S. G. (1999). Isoperimetric and analytic inequalities for log-concave probability measures. The Annals of Probability 27 1903–1921.

[11] Bobkov, S. G. and Houdré, C. (1997). Isoperimetric constants for product probability measures. The Annals of Probability 184–205.

[12] Caffarelli, L. A. (2000). Monotonicity properties of optimal transportation and the FKG and related inequalities. Communications in Mathematical Physics 214 547–563.

[13] Cattiaux, P. and Guillin, A. (2020). On the Poincaré constant of log-concave measures. In Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2017-2019 Volume I 171–217. Springer.

[14] Chen, Y. and Gatmiry, K. (2023). When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm? arXiv preprint arXiv:2304.04724.

[15] Chen, Y., Dwivedi, R., Wainwright, M. J. and Yu, B. (2018). Fast MCMC sampling algorithms on polytopes. Journal of Machine Learning Research 19 1–86.

[16] Chen, Y., Dwivedi, R., Wainwright, M. J. and Yu, B. (2020). Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients. Journal of Machine Learning Research 21 1–72.

[17] Cheng, X. and Bartlett, P. (2018). Convergence of Langevin MCMC in KL-divergence. In Algorithmic Learning Theory 186–211. PMLR.

[18] Chewi, S. (2023). Log-concave sampling. Book draft available at https://chewisinho.github.io.

[19] Chewi, S., Lu, C., Ahn, K., Cheng, X., Le Gouic, T. and Rigollet, P. (2021). Optimal dimension dependence of the Metropolis-adjusted Langevin algorithm. In Conference on Learning Theory 1260–1300. PMLR.

[20] Chewi, S., Erdogdu, M. A., Li, M., Shen, R. and Zhang, M. S. (2024). Analysis of Langevin Monte Carlo from Poincaré to log-Sobolev. Foundations of Computational Mathematics 1–51.

[21] Choi, H. M. and Hobert, J. P. (2013). The Polya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic.

[22] Cohen, A. and Einav, L. (2007). Estimating risk preferences from deductible choice. American Economic Review 97 745–788.

[23] Courtade, T. A. (2020). Bounds on the Poincaré constant for convolution measures.

[24] Cousins, B. and Vempala, S. (2014). A cubic algorithm for computing Gaussian volume. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms 1215–1228. SIAM.

[25] Dai, Y., Gao, Y., Huang, J., Jiao, Y., Kang, L. and Liu, J. (2023). Lipschitz Transport Maps via the Follmer Flow. arXiv preprint arXiv:2309.03490.

[26] Dalalyan, A. (2017a). Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Conference on Learning Theory 678–689. PMLR.

[27] Dalalyan, A. S. (2017b). Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society Series B: Statistical Methodology 79 651–676.

[28] Dalalyan, A. S. and Karagulyan, A. (2019). User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications 129 5278–5311.

[29] Dalalyan, A. S. and Tsybakov, A. B. (2012). Sparse regression learning by aggregation and Langevin Monte-Carlo. Journal of Computer and System Sciences 78 1423–1443.

[30] Diebolt, J. and Robert, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society: Series B (Methodological) 56 363–375.

[31] Durante, D. and Dunson, D. B. (2018). Bayesian inference and testing of group differences in brain networks.

[32] Durmus, A., Majewski, S. and Miasojedow, B. (2019). Analysis of Langevin Monte Carlo via convex optimization. Journal of Machine Learning Research 20 1–46.

[33] Durmus, A. and Moulines, E. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm.

[34] Durmus, A. and Moulines, E. (2019). High-dimensional Bayesian inference via the unadjusted Langevin algorithm.

[35] Dvorzak, M. and Wagner, H. (2016). Sparse Bayesian modelling of underreported count data. Statistical Modelling 16 24–46.

[36] Dwivedi, R., Chen, Y., Wainwright, M. J. and Yu, B. (2019). Log-concave sampling: Metropolis-Hastings algorithms are fast. Journal of Machine Learning Research 20 1–42.

[37] Dyer, M., Frieze, A. and Kannan, R. (1991). A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM (JACM) 38 1–17.

[38] Eldan, R. (2013). Thin shell implies spectral gap up to polylog via a stochastic localization scheme. Geometric and Functional Analysis 23 532–569.

[39] Erdogdu, M. A., Hosseinzadeh, R. and Zhang, S. (2022). Convergence of Langevin Monte Carlo in chi-squared and Rényi divergence. In International Conference on Artificial Intelligence and Statistics 8151–8175. PMLR.

[40] Fruehwirth-Schnatter, S. and Frühwirth, R. (2007). Auxiliary mixture sampling with applications to logistic models. Computational Statistics & Data Analysis 51 3509–3528.

[41] Grant, E. H. C., Miller, D. A., Schmidt, B. R., Adams, M. J., Amburgey, S. M., Chambert, T., Cruickshank, S. S., Fisher, R. N., Green, D. M., Hossack, B. R. et al. (2016). Quantitative evidence for the effects of multiple drivers on continental-scale amphibian declines. Scientific Reports 6 25625.

[42] Griffin, J. E., Matechou, E., Buxton, A. S., Bormpoudakis, D. and Griffiths, R. A. (2020). Modelling environmental DNA data; Bayesian variable selection accounting for false positive and false negative errors. Journal of the Royal Statistical Society Series C: Applied Statistics 69 377–392.

[43] Hans, C. (2009). Bayesian lasso regression. Biometrika 96 835–845.

[44] Held, L. and Holmes, C. C. (2006). Bayesian auxiliary variable models for binary and multinomial regression.

[45] Hobert, J. P. (2011). The data augmentation algorithm: Theory and methodology. Handbook of Markov Chain Monte Carlo 253–293.

[46] Holley, R. and Stroock, D. W. (1986). Logarithmic Sobolev inequalities and stochastic Ising models.

[47] Jerrum, M. and Sinclair, A. (1989). Approximating the permanent. SIAM Journal on Computing 18 1149–1178.

[48] Johndrow, J. E., Smith, A., Pillai, N. and Dunson, D. B. (2019). MCMC for imbalanced categorical data. Journal of the American Statistical Association.

[49] Jones, G. L. and Hobert, J. P. (2001). Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statistical Science 312–334.

[50] Joseph, L., Gyorkos, T. W. and Coupal, L. (1995). Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. American Journal of Epidemiology 141 263–272.

[51] Justiniano, A. and Primiceri, G. E. (2008). The time-varying volatility of macroeconomic fluctuations. American Economic Review 98 604–641.

[52] Kannan, R., Lovász, L. and Simonovits, M. (1995). Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry 13 541–559.

[53] Kannan, R., Lovász, L. and Simonovits, M. (1997). Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures & Algorithms 11 1–50.

[54] Khare, K. and Hobert, J. P. (2013). Geometric ergodicity of the Bayesian lasso.

[55] Kim, Y.-H. and Milman, E. (2011). A Generalization of Caffarelli's Contraction Theorem via (reverse) Heat Flow.

[56] Klartag, B. (2023). Logarithmic bounds for isoperimetry and slices of convex sets. arXiv preprint arXiv:2303.14938.

[57] Kolesnikov, A. V. (2011). Mass transportation and contractions. arXiv preprint arXiv:1103.1479.

[58] Lawler, G. F. and Sokal, A. D. (1988). Bounds on the L2 spectrum for Markov chains and Markov processes: a generalization of Cheeger's inequality. Transactions of the American Mathematical Society 309 557–580.

[59] Lee, Y. T., Shen, R. and Tian, K. (2020). Logsmooth gradient concentration and tighter runtimes for Metropolized Hamiltonian Monte Carlo. In Conference on Learning Theory 2565–2597. PMLR.

[60] Lee, Y. T. and Vempala, S. S. (2017). Eldan's stochastic localization and the KLS hyperplane conjecture: an improved lower bound for expansion. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) 998–1007. IEEE.

[61] Levin, D. A. and Peres, Y. (2017). Markov chains and mixing times 107. American Mathematical Soc.

[62] Liu, J. S., Wong, W. H. and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81 27–40.

[63] Lovász, L. (1999). Hit-and-run mixes fast. Mathematical Programming 86 443–461.

[64] Lovász, L. and Simonovits, M. (1993). Random walks in a convex body and an improved volume algorithm. Random Structures & Algorithms 4 359–412.

[65] Lovász, L. and Vempala, S. (2003). Hit-and-run is fast and fun. Preprint, Microsoft Research.

[66] Lovász, L. and Vempala, S. (2004). Hit-and-run from a corner. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing 310–314.

[67] Lovász, L. and Vempala, S. (2006). Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06) 57–68. IEEE.

[68] Lovász, L. and Vempala, S. (2007). The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms 30 307–358.

[69] Ma, Y.-A., Chatterji, N. S., Cheng, X., Flammarion, N., Bartlett, P. L. and Jordan, M. I. (2021). Is there an analog of Nesterov acceleration for gradient-based MCMC?

[70] Mallick, H. and Yi, N. (2014). A new Bayesian lasso. Statistics and Its Interface 7 571.

[71] Mangoubi, O. and Smith, A. (2021). Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions: Continuous dynamics. The Annals of Applied Probability 31 2019–2045.

[72] Mikulincer, D. and Shenfeld, Y. (2024). The Brownian transport map. Probability Theory and Related Fields 1–66.

[73] Milman, E. (2010). Isoperimetric and concentration inequalities: equivalence under curvature lower bound.

[74] Milman, E. (2012). Properties of isoperimetric, functional and transport-entropy inequalities via concentration. Probability Theory and Related Fields 152 475–507.

[75] Milman, E. and Sodin, S. (2008). An isoperimetric inequality for uniformly log-concave measures and uniformly convex bodies. Journal of Functional Analysis 254 1235–1268.

[76] Morgan, F. (2005). Manifolds with density. Notices of the AMS 52 853–858.

[77] Mou, W., Ho, N., Wainwright, M. J., Bartlett, P. L. and Jordan, M. I. (2019). Sampling for Bayesian mixture models: MCMC with polynomial-time mixing. arXiv preprint arXiv:1912.05153.

[78] Mousavi-Hosseini, A., Farghly, T. K., He, Y., Balasubramanian, K. and Erdogdu, M. A. (2023). Towards a complete analysis of Langevin Monte Carlo: Beyond Poincaré inequality. In The Thirty Sixth Annual Conference on Learning Theory 1–35. PMLR.

[79] Narayanan, H. (2016). Randomized interior point methods for sampling and optimization.

[80] Nguyen, H. H., Ngo, V. M. and Tran, A. N. T. (2021). Financial performances, entrepreneurial factors and coping strategy to survive in the COVID-19 pandemic: case of Vietnam. Research in International Business and Finance 56 101380.

Showing first 80 references.

[1] [1]

and T OMCZAK -JAEGERMANN , N

A DAMCZAK , R., L ITVAK , A., P AJOR , A. and T OMCZAK -JAEGERMANN , N. (2010). Quantitative esti- mates of the convergence of the empirical covariance matrix in log-concave ensembles. Journal of the American Mathematical Society 23 535–561

work page 2010

[2] [2]

E., P AJOR , A

A DAMCZAK , R., L ITVAK , A. E., P AJOR , A. and T OMCZAK -JAEGERMANN , N. (2011). Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus. Mathématique 349 195–200

work page 2011

[3] [3]

A LBERT , J. H. and C HIB , S. (1993). Bayesian analysis of binary and polychotomous response data. Jour- nal of the American statistical Association 88 669–679

work page 1993

[4] [4]

and B ASTERO , J

A LONSO -GUTIÉRREZ , D. and B ASTERO , J. (2015). Approaching the Kannan-Lovász-Simonovits and variance conjectures 2131. Springer

work page 2015

[5] [5]

A LTSCHULER , J. M. and C HEWI , S. (2024). Faster high-accuracy log-concave sampling via algorithmic warm starts. Journal of the ACM 71 1–55

work page 2024

[6] [6]

and ZANELLA , G

A SCOLANI , F., LAVENANT , H. and ZANELLA , G. (2024). Entropy contraction of the Gibbs sampler under log-concavity. arXiv preprint arXiv:2410.00858

work page arXiv 2024

[7] [7]

A., S ALIM , A

B ALASUBRAMANIAN , K., C HEWI , S., E RDOGDU , M. A., S ALIM , A. and Z HANG , S. (2022). Towards a theory of non-log-concave sampling: first-order stationarity guarantees for langevin monte carlo. In Conference on Learning Theory 2896–2923. PMLR

work page 2022

[8] [8]

Spectral gaps, symmetries and log-concave perturbations

B ARTHE , F. and K LARTAG , B. (2019). Spectral gaps, symmetries and log-concave perturbations. arXiv preprint arXiv:1907.01823

work page internal anchor Pith review Pith/arXiv arXiv 2019

[9] [9]

and M ILMAN , E

B ARTHE , F. and M ILMAN , E. (2013). Transference principles for log-Sobolev and spectral-gap with ap- plications to conservative spin systems. Communications in Mathematical Physics 323 575–625

work page 2013

[10] [10]

B OBKOV, S. G. (1999). Isoperimetric and analytic inequalities for log-concave probability measures. The Annals of Probability 27 1903–1921

work page 1999

[11] [11]

B OBKOV, S. G. and H OUDRÉ , C. (1997). Isoperimetric constants for product probability measures. The Annals of Probability 184–205

work page 1997

[12] [12]

C AFFARELLI , L. A. (2000). Monotonicity properties of optimal transportation and the FKG and related inequalities. Communications in Mathematical Physics 214 547–563

work page 2000

[13] [13]

and GUILLIN , A

C ATTIAUX , P. and GUILLIN , A. (2020). On the Poincaré constant of log-concave measures. In Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2017-2019 Volume I171–217. Springer

work page 2020

[14] CHEN, Y. and GATMIRY, K. (2023). When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm? arXiv preprint arXiv:2304.04724.

[15] CHEN, Y., DWIVEDI, R., WAINWRIGHT, M. J. and YU, B. (2018). Fast MCMC sampling algorithms on polytopes. Journal of Machine Learning Research 19 1–86.

[16] CHEN, Y., DWIVEDI, R., WAINWRIGHT, M. J. and YU, B. (2020). Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients. Journal of Machine Learning Research 21 1–72.

[17] CHENG, X. and BARTLETT, P. (2018). Convergence of Langevin MCMC in KL-divergence. In Algorithmic Learning Theory 186–211. PMLR.

[18] CHEWI, S. (2023). Log-concave sampling. Book draft available at https://chewisinho.github.io.

[19] CHEWI, S., LU, C., AHN, K., CHENG, X., LE GOUIC, T. and RIGOLLET, P. (2021). Optimal dimension dependence of the Metropolis-adjusted Langevin algorithm. In Conference on Learning Theory 1260–1300. PMLR.

[20] CHEWI, S., ERDOGDU, M. A., LI, M., SHEN, R. and ZHANG, M. S. (2024). Analysis of Langevin Monte Carlo from Poincaré to log-Sobolev. Foundations of Computational Mathematics 1–51.

[21] CHOI, H. M. and HOBERT, J. P. (2013). The Polya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic.

[22] COHEN, A. and EINAV, L. (2007). Estimating risk preferences from deductible choice. American Economic Review 97 745–788.

[23] COURTADE, T. A. (2020). Bounds on the Poincaré constant for convolution measures.

[24] COUSINS, B. and VEMPALA, S. (2014). A cubic algorithm for computing Gaussian volume. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on discrete algorithms 1215–1228. SIAM.

[25] DAI, Y., GAO, Y., HUANG, J., JIAO, Y., KANG, L. and LIU, J. (2023). Lipschitz Transport Maps via the Follmer Flow. arXiv preprint arXiv:2309.03490.

[26] DALALYAN, A. (2017a). Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Conference on Learning Theory 678–689. PMLR.

[27] DALALYAN, A. S. (2017b). Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society Series B: Statistical Methodology 79 651–676.

[28] DALALYAN, A. S. and KARAGULYAN, A. (2019). User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications 129 5278–5311.

[29] DALALYAN, A. S. and TSYBAKOV, A. B. (2012). Sparse regression learning by aggregation and Langevin Monte-Carlo. Journal of Computer and System Sciences 78 1423–1443.

[30] DIEBOLT, J. and ROBERT, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society: Series B (Methodological) 56 363–375.

[31] DURANTE, D. and DUNSON, D. B. (2018). Bayesian inference and testing of group differences in brain networks.

[32] DURMUS, A., MAJEWSKI, S. and MIASOJEDOW, B. (2019). Analysis of Langevin Monte Carlo via convex optimization. Journal of Machine Learning Research 20 1–46.

[33] DURMUS, A. and MOULINES, E. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm.

[34] DURMUS, A. and MOULINES, E. (2019). High-dimensional Bayesian inference via the unadjusted Langevin algorithm.

[35] DVORZAK, M. and WAGNER, H. (2016). Sparse Bayesian modelling of underreported count data. Statistical Modelling 16 24–46.

[36] DWIVEDI, R., CHEN, Y., WAINWRIGHT, M. J. and YU, B. (2019). Log-concave sampling: Metropolis-Hastings algorithms are fast. Journal of Machine Learning Research 20 1–42.

[37] DYER, M., FRIEZE, A. and KANNAN, R. (1991). A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM (JACM) 38 1–17.

[38] ELDAN, R. (2013). Thin shell implies spectral gap up to polylog via a stochastic localization scheme. Geometric and Functional Analysis 23 532–569.

[39] ERDOGDU, M. A., HOSSEINZADEH, R. and ZHANG, S. (2022). Convergence of Langevin Monte Carlo in chi-squared and Rényi divergence. In International Conference on Artificial Intelligence and Statistics 8151–8175. PMLR.

[40] FRUEHWIRTH-SCHNATTER, S. and FRÜHWIRTH, R. (2007). Auxiliary mixture sampling with applications to logistic models. Computational Statistics & Data Analysis 51 3509–3528.

[41] GRANT, E. H. C., MILLER, D. A., SCHMIDT, B. R., ADAMS, M. J., AMBURGEY, S. M., CHAMBERT, T., CRUICKSHANK, S. S., FISHER, R. N., GREEN, D. M., HOSSACK, B. R. et al. (2016). Quantitative evidence for the effects of multiple drivers on continental-scale amphibian declines. Scientific Reports 6 25625.

[42] GRIFFIN, J. E., MATECHOU, E., BUXTON, A. S., BORMPOUDAKIS, D. and GRIFFITHS, R. A. (2020). Modelling environmental DNA data; Bayesian variable selection accounting for false positive and false negative errors. Journal of the Royal Statistical Society Series C: Applied Statistics 69 377–392.

[43] HANS, C. (2009). Bayesian lasso regression. Biometrika 96 835–845.

[44] HELD, L. and HOLMES, C. C. (2006). Bayesian auxiliary variable models for binary and multinomial regression.

[45] HOBERT, J. P. (2011). The data augmentation algorithm: Theory and methodology. Handbook of Markov Chain Monte Carlo 253–293.

[46] HOLLEY, R. and STROOCK, D. W. (1986). Logarithmic Sobolev inequalities and stochastic Ising models.

[47] JERRUM, M. and SINCLAIR, A. (1989). Approximating the permanent. SIAM Journal on Computing 18 1149–1178.

[48] JOHNDROW, J. E., SMITH, A., PILLAI, N. and DUNSON, D. B. (2019). MCMC for imbalanced categorical data. Journal of the American Statistical Association.

[49] JONES, G. L. and HOBERT, J. P. (2001). Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statistical Science 312–334.

[50] JOSEPH, L., GYORKOS, T. W. and COUPAL, L. (1995). Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. American Journal of Epidemiology 141 263–272.

[51] JUSTINIANO, A. and PRIMICERI, G. E. (2008). The time-varying volatility of macroeconomic fluctuations. American Economic Review 98 604–641.

[52] KANNAN, R., LOVÁSZ, L. and SIMONOVITS, M. (1995). Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry 13 541–559.

[53] KANNAN, R., LOVÁSZ, L. and SIMONOVITS, M. (1997). Random walks and an $O^*(n^5)$ volume algorithm for convex bodies. Random Structures & Algorithms 11 1–50.

[54] KHARE, K. and HOBERT, J. P. (2013). Geometric ergodicity of the Bayesian lasso.

[55] KIM, Y.-H. and MILMAN, E. (2011). A Generalization of Caffarelli’s Contraction Theorem via (reverse) Heat Flow.

[56] KLARTAG, B. (2023). Logarithmic bounds for isoperimetry and slices of convex sets. arXiv preprint arXiv:2303.14938.

[57] KOLESNIKOV, A. V. (2011). Mass transportation and contractions. arXiv preprint arXiv:1103.1479.

[58] LAWLER, G. F. and SOKAL, A. D. (1988). Bounds on the $L^2$ spectrum for Markov chains and Markov processes: a generalization of Cheeger’s inequality. Transactions of the American Mathematical Society 309 557–580.

[59] LEE, Y. T., SHEN, R. and TIAN, K. (2020). Logsmooth gradient concentration and tighter runtimes for Metropolized Hamiltonian Monte Carlo. In Conference on Learning Theory 2565–2597. PMLR.

[60] LEE, Y. T. and VEMPALA, S. S. (2017). Eldan’s stochastic localization and the KLS hyperplane conjecture: an improved lower bound for expansion. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) 998–1007. IEEE.

[61] LEVIN, D. A. and PERES, Y. (2017). Markov chains and mixing times 107. American Mathematical Soc.

[62] LIU, J. S., WONG, W. H. and KONG, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81 27–40.

[63] LOVÁSZ, L. (1999). Hit-and-run mixes fast. Mathematical Programming 86 443–461.

[64] LOVÁSZ, L. and SIMONOVITS, M. (1993). Random walks in a convex body and an improved volume algorithm. Random Structures & Algorithms 4 359–412.

[65] LOVÁSZ, L. and VEMPALA, S. (2003). Hit-and-run is fast and fun. Preprint, Microsoft Research.

[66] LOVÁSZ, L. and VEMPALA, S. (2004). Hit-and-run from a corner. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing 310–314.

[67] LOVÁSZ, L. and VEMPALA, S. (2006). Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06) 57–68. IEEE.

[68] LOVÁSZ, L. and VEMPALA, S. (2007). The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms 30 307–358.

[69] MA, Y.-A., CHATTERJI, N. S., CHENG, X., FLAMMARION, N., BARTLETT, P. L. and JORDAN, M. I. (2021). Is there an analog of Nesterov acceleration for gradient-based MCMC?

[70] MALLICK, H. and YI, N. (2014). A new Bayesian lasso. Statistics and Its Interface 7 571.

[71] MANGOUBI, O. and SMITH, A. (2021). Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions: Continuous dynamics. The Annals of Applied Probability 31 2019–2045.

[72] MIKULINCER, D. and SHENFELD, Y. (2024). The Brownian transport map. Probability Theory and Related Fields 1–66.

[73] MILMAN, E. (2010). Isoperimetric and concentration inequalities: equivalence under curvature lower bound.

[74] MILMAN, E. (2012). Properties of isoperimetric, functional and transport-entropy inequalities via concentration. Probability Theory and Related Fields 152 475–507.

[75] MILMAN, E. and SODIN, S. (2008). An isoperimetric inequality for uniformly log-concave measures and uniformly convex bodies. Journal of Functional Analysis 254 1235–1268.

[76] MORGAN, F. (2005). Manifolds with density. Notices of the AMS 52 853–858.

[77] MOU, W., HO, N., WAINWRIGHT, M. J., BARTLETT, P. L. and JORDAN, M. I. (2019). Sampling for Bayesian mixture models: MCMC with polynomial-time mixing. arXiv preprint arXiv:1912.05153.

[78] MOUSAVI-HOSSEINI, A., FARGHLY, T. K., HE, Y., BALASUBRAMANIAN, K. and ERDOGDU, M. A. (2023). Towards a complete analysis of Langevin Monte Carlo: Beyond Poincaré inequality. In The Thirty Sixth Annual Conference on Learning Theory 1–35. PMLR.

[79] NARAYANAN, H. (2016). Randomized interior point methods for sampling and optimization.

[80] NGUYEN, H. H., NGO, V. M. and TRAN, A. N. T. (2021). Financial performances, entrepreneurial factors and coping strategy to survive in the COVID-19 pandemic: case of Vietnam. Research in International Business and Finance 56 101380.