High-accuracy log-concave sampling with stochastic queries
Pith reviewed 2026-05-21 12:28 UTC · model grok-4.3
The pith
Log-concave sampling reaches poly-log accuracy with subexponential stochastic gradients.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
High-accuracy guarantees for log-concave sampling—that is, iteration and query complexities scaling as poly log(1/δ)—are achievable using stochastic gradients with subexponential tails. An information-theoretic argument shows that light-tailed stochastic gradients are necessary: in the bounded-variance case the minimax query complexity is Θ(1/δ).
What carries the argument
A sampling procedure driven by unbiased stochastic gradient oracles whose noise satisfies a subexponential tail bound, allowing exponential error reduction per step.
If this is right
- Sampling algorithms tolerate gradient noise whose tails are subexponential without sacrificing high accuracy.
- Convex optimization and sampling exhibit qualitatively different sensitivity to the same form of stochasticity.
- High-accuracy sampling extends to oracles that return only noisy function values.
- Finite-sum log-concave potentials admit improved query complexities under the same tail assumption.
Where Pith is reading between the lines
- In large-scale machine-learning settings where gradients are obtained from mini-batches, sampling may remain feasible at high precision while optimization does not.
- The subexponential-tail condition could be relaxed further or verified for common variance-reduction techniques used in practice.
- Similar tail-based arguments might apply to sampling from distributions that are only approximately log-concave.
Load-bearing premise
Stochastic gradient estimates are unbiased and their deviations obey a subexponential tail bound.
What would settle it
An explicit log-concave target distribution together with a subexponential stochastic gradient oracle for which every algorithm requires poly(1/δ) queries to reach accuracy δ would disprove the upper bound.
read the original abstract
We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/\delta)$, where $\delta$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/\delta)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $\Theta(1/\delta)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries, and an improved complexity result for sampling from finite-sum potentials.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that log-concave sampling admits iteration and query complexities scaling as poly-log(1/δ) when the stochastic gradient oracle supplies unbiased estimates with subexponential tails. It establishes a separation from convex optimization (which requires poly(1/δ) even under Gaussian noise), supplies an information-theoretic lower bound showing Θ(1/δ) minimax query complexity under bounded variance, and extends the high-accuracy guarantees to stochastic zeroth-order oracles and finite-sum potentials.
Significance. If the upper-bound construction and lower-bound argument hold, the result provides a clean separation between sampling and optimization under stochastic oracles and clarifies the role of tail assumptions in achieving high accuracy. The information-theoretic necessity argument and the finite-sum improvement are particularly useful for guiding algorithm design in high-dimensional sampling.
major comments (2)
- [§3] §3 (main upper-bound theorem): the analysis must demonstrate explicitly that the subexponential tail bound controls the accumulated bias and variance over the claimed poly-log(1/δ) number of steps without introducing hidden poly(1/δ) factors or requiring stronger moment assumptions; the current sketch leaves open whether the total-variation or Wasserstein error bound remains O(δ) after the stated number of queries.
- [Theorem 4.1] Theorem 4.1 (information-theoretic lower bound): the reduction from sampling to a hypothesis-testing problem under bounded-variance noise appears to rely on a specific choice of target distribution; confirm that the Ω(1/δ) query lower bound continues to hold for the standard class of log-concave densities used in the upper bound.
minor comments (2)
- [§2] Notation for the subexponential parameter (e.g., the constant K in the tail bound) should be introduced once in §2 and used consistently; several later statements treat it as absorbed into poly-log factors without explicit dependence.
- [Figure 1] Figure 1 (complexity comparison): the plotted curves for sampling versus optimization should include the precise dependence on dimension and the subexponential parameter so that the separation is quantitatively visible.
Simulated Author's Rebuttal
We thank the referee for the detailed review and valuable suggestions. The comments help clarify the presentation of our results on the separation between log-concave sampling and convex optimization under stochastic oracles. We address each major comment below and will incorporate revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [§3] §3 (main upper-bound theorem): the analysis must demonstrate explicitly that the subexponential tail bound controls the accumulated bias and variance over the claimed poly-log(1/δ) number of steps without introducing hidden poly(1/δ) factors or requiring stronger moment assumptions; the current sketch leaves open whether the total-variation or Wasserstein error bound remains O(δ) after the stated number of queries.
Authors: We appreciate this observation. The proof in Section 3 relies on the subexponential concentration to bound the error in each iteration, and because the total number of iterations is only polylogarithmic in 1/δ, a union bound over the steps allows us to control the total accumulated error with an additional logarithmic factor that is absorbed into the polylog(1/δ) complexity. The bias and variance terms do not accumulate to poly(1/δ) due to the light tails. To address the concern, we will expand the proof sketch into a full detailed proof in the revised manuscript, explicitly calculating the total variation or Wasserstein distance bound to confirm it is O(δ). revision: yes
-
Referee: [Theorem 4.1] Theorem 4.1 (information-theoretic lower bound): the reduction from sampling to a hypothesis-testing problem under bounded-variance noise appears to rely on a specific choice of target distribution; confirm that the Ω(1/δ) query lower bound continues to hold for the standard class of log-concave densities used in the upper bound.
Authors: The lower bound construction in Theorem 4.1 uses a family of log-concave distributions with bounded variance stochastic gradients that are representative of the standard class considered in the upper bound results (e.g., strongly log-concave potentials with sub-Gaussian or subexponential noise). The reduction to hypothesis testing is general and does not depend on a narrow choice; it applies to the minimax risk over log-concave densities satisfying the assumptions. We will add a clarifying paragraph in the revision to explicitly state that the lower bound holds for the same class of distributions as the upper bound. revision: yes
Circularity Check
No significant circularity; derivation self-contained
full rationale
The paper establishes an algorithmic upper bound achieving poly-log(1/δ) iteration and query complexity for log-concave sampling under subexponential-tailed stochastic gradient noise, together with an information-theoretic lower bound of Θ(1/δ) under bounded variance. These are presented as separate results: the upper bound via a stochastic-gradient MCMC construction whose error analysis uses the tail assumption to control accumulation, and the lower bound via a minimax argument. No equations or steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the separation from optimization and from bounded-variance sampling is derived from distinct assumptions and proof techniques. The framework is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We show that high-accuracy guarantees for log-concave sampling... are achievable using stochastic gradients with subexponential tails.
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leanalpha_pin_under_high_calibration unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.