pith. sign in

arxiv: 2605.27769 · v1 · pith:RDC4AS2Wnew · submitted 2026-05-26 · 💻 cs.DS · cs.IT· cs.LG· math.IT· stat.ML

Smoothed Score Queries and the Complexity of Sampling

Pith reviewed 2026-06-29 14:41 UTC · model grok-4.3

classification 💻 cs.DS cs.ITcs.LGmath.ITstat.ML
keywords sampling query complexitysmoothed scoresGaussian distributionscondition numberrational approximationtotal variation errorcommunication lower boundoracle model
0
0 comments X

The pith

Smoothed-score queries allow Gaussian sampling with only logarithmic dependence on condition number.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the query complexity of sampling from high-dimensional Gaussians when the algorithm may query smoothed scores rather than ordinary gradients. Ordinary gradients supply only matrix-vector products with the precision matrix and force sampling methods to scale with the square root of the condition number. Smoothed scores at a chosen noise level return the resolvent of the precision matrix, and spacing those levels geometrically while approximating the needed operations by rational functions reduces the query count to the product of two logarithmic terms. The construction also yields a finite-bit scheme whose total communicated information is polylogarithmic in the condition number. A matching lower bound of Omega(log kappa) is established by converting total-variation guarantees into communication requirements via a channel-synthesis argument.

Core claim

For a Gaussian target with precision matrix Lambda, a smoothed-score query at noise level tau gives access to the resolvent (Lambda + tau^{-1} I)^{-1}. Combining geometrically spaced noise levels with sinc-quadrature rational approximation produces a sampler using q = O((log kappa + log(e sqrt(d) / delta_TV)) log(e sqrt(d) / delta_TV)) such queries to achieve total-variation error delta_TV. This replaces the square-root dependence of standard gradient oracles with logarithmic dependence on kappa. The same approach with coordinatewise quantization and dithering yields total communicated gradient information that is O(log^2 kappa) for fixed dimension and accuracy, while a reverse-Shannon conve

What carries the argument

Smoothed-score query at noise level tau, which returns the resolvent (Lambda + tau^{-1} I)^{-1} for Gaussian target with precision matrix Lambda.

If this is right

  • The number of smoothed-score queries scales with the product of two logarithmic factors in kappa and accuracy rather than with the square root of kappa.
  • Total communicated bits of gradient information can be reduced to O(log squared kappa) for fixed dimension and accuracy.
  • An Omega(log kappa) lower bound on the required gradient information holds via the channel-synthesis technique.
  • Smoothed scores form a strictly stronger oracle than exact gradients for the Gaussian sampling task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Rational approximation of resolvents may reduce condition-number dependence in other matrix-function tasks that arise in sampling or integration.
  • The channel-synthesis lower-bound technique could be applied to prove communication requirements for sampling from distributions that are only approximately Gaussian.
  • If a target is close to Gaussian, the same geometrically spaced query schedule might still control total-variation error once the smoothing mismatch is bounded.
  • Defining smoothed-score oracles for non-Gaussian targets would require showing that the resolvent-like information can be obtained or approximated without the exact Gaussian convolution.

Load-bearing premise

The target distribution is exactly Gaussian so that each smoothed-score query equals the resolvent exactly and the algorithm may freely select the noise levels.

What would settle it

Measure the number of smoothed-score queries needed to reach fixed total-variation error on a non-Gaussian target whose effective condition number is large; if the count still scales only logarithmically then the claim holds, otherwise the Gaussian assumption is necessary.

Figures

Figures reproduced from arXiv: 2605.27769 by Jingbo Liu.

Figure 1
Figure 1. Figure 1: Channel-synthesis view of finite-information sampling. The only information about the [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
read the original abstract

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic \(\sqrt{\kappa}\) dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix \(\Lambda\), a smoothed-score query at noise level \(\tau\) gives access to the resolvent \((\Lambda+\tau^{-1}I)^{-1}\). Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\log\kappa+\log(e\sqrt d/\delta_{\rm TV})\bigr)\log(e\sqrt d/\delta_{\rm TV})\right)$ smoothed-score queries for total variation error \(\delta_{\rm TV}\), improving the condition-number dependence from \(\sqrt{\kappa}\) to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in \(\kappa\); in particular, for fixed dimension and accuracy, the bit complexity is \(O(\log^2\kappa)\). To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an \(\Omega(\log\kappa)\) lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims that smoothed-score queries (gradients of log-densities after Gaussian convolution) provide access to resolvents (Λ + τ^{-1}I)^{-1} for a Gaussian target with precision matrix Λ. By combining geometrically spaced noise levels τ with sinc-quadrature rational approximation, it obtains an algorithm using q = O((log κ + log(e√d/δ_TV)) log(e√d/δ_TV)) such queries to achieve total-variation error δ_TV, replacing the classic √κ dependence with logarithmic. It further gives a finite-bit variant whose communicated gradient information is O(log² κ) for fixed d and accuracy, and proves an Ω(log κ) lower bound on gradient information via a channel-synthesis (reverse-Shannon) converse that converts TV simulation guarantees into communication requirements.

Significance. If correct, the result establishes smoothed scores as a provably stronger oracle for exact Gaussian sampling, with nearly matching upper and lower bounds on query and bit complexity. Credit is due for the explicit reduction to standard resolvent approximation via sinc quadrature (which yields the logarithmic κ dependence without hidden constants or post-hoc tuning) and for introducing the channel-synthesis technique to obtain the matching Ω(log κ) lower bound. The bounds are parameter-free in the stated sense and directly falsifiable via the explicit construction.

minor comments (2)
  1. [Abstract] Abstract: the notation δ_TV is introduced without an explicit forward reference to its definition in the main text; a parenthetical “(see §2)” would improve readability.
  2. The finite-bit section states that coordinatewise quantization plus dithering yields polylogarithmic communication, but does not explicitly bound the bit length per coordinate; adding a short sentence with the per-coordinate bit cost would clarify the O(log² κ) claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their thorough summary of the manuscript and for the positive recommendation to accept. The report accurately captures the main contributions, including the resolvent reduction via smoothed scores, the sinc-quadrature construction yielding the logarithmic condition-number dependence, the finite-bit variant, and the channel-synthesis lower bound.

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external resolvent identities and approximation theory

full rationale

The paper derives its query-complexity upper bound by invoking the exact identity that a smoothed-score query at level τ equals the resolvent (Λ + τ^{-1}I)^{-1} for a Gaussian target, then applies geometrically spaced τ values together with standard sinc-quadrature rational approximation. Both the resolvent identity and the quadrature technique are standard facts from linear algebra and approximation theory, independent of any quantities defined or fitted inside the paper. The matching Ω(log κ) lower bound is obtained via a channel-synthesis converse that converts total-variation simulation guarantees into communication requirements; this is likewise an external information-theoretic argument. No equation reduces by construction to a fitted parameter, no uniqueness theorem is imported from the authors' prior work, and no self-citation is load-bearing for the central claim. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claims rest on the algebraic identity that the smoothed score of a Gaussian equals the resolvent of its precision matrix; this identity is treated as a standard fact rather than derived or fitted inside the paper. No free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)
  • standard math For a Gaussian with precision matrix Λ, the gradient of the log-density after convolution with N(0,τI) equals (Λ + τ^{-1}I)^{-1} applied to the query point.
    Invoked in the second sentence of the abstract as the definition of the smoothed-score oracle.

pith-pipeline@v0.9.1-grok · 5862 in / 1515 out tokens · 33779 ms · 2026-06-29T14:41:35.348501+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 10 canonical work pages · 3 internal anchors

  1. [1]

    Aune, E., Eidsvik, J., and Pokern, Y. (2013). Iterative numerical methods for sampling from high dimensional Gaussian distributions.Statistics and Computing, 23(4):501–521

  2. [2]

    H., Devetak, I., Harrow, A

    Bennett, C. H., Devetak, I., Harrow, A. W., Shor, P. W., and Winter, A. (2014). The quantum reverse Shannon theorem and resource tradeoffs for simulating quantum channels.IEEE Transactions on Information Theory, 60(5):2926–2959

  3. [3]

    H., Shor, P

    Bennett, C. H., Shor, P. W., Smolin, J. A., and Thapliyal, A. V. (2002). Entanglement-assisted ca- pacity of a quantum channel and the reverse Shannon theorem.IEEE Transactions on Information Theory, 48(10):2637–2655

  4. [4]

    Berta, M., Christandl, M., and Renner, R. (2011). The quantum reverse Shannon theorem based on one-shot information theory.Communications in Mathematical Physics, 306(3):579–615

  5. [5]

    Chen, F., Chewi, S., Daskalakis, C., and Rakhlin, A. (2026). High-accuracy sampling for diffusion models and log-concave distributions.arXiv preprint arXiv:2602.01338

  6. [6]

    Chen, S., Chewi, S., Li, J., Li, Y., Salim, A., and Zhang, A. R. (2023b). Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions. InInternational Conference on Learning Representations. arXiv:2209.11215

  7. [7]

    Chen, T., Greenbaum, A., Musco, C., and Musco, C. (2022). Error bounds for Lanczos-based matrix function approximation.SIAM Journal on Matrix Analysis and Applications, 43(2):787–811

  8. [8]

    Chewi, S., de Dios Pont, J., Li, J., Lu, C., and Narayanan, S. (2024). Query lower bounds for log-concave sampling.Journal of the ACM, 71(4):29:1–29:42. Preliminary version in FOCS 2023; arXiv:2304.02599

  9. [9]

    Chewi, S., Gerber, P., Lee, H., and Lu, C. (2023). Fisher information lower bounds for sampling. In Proceedings of the 34th International Conference on Algorithmic Learning Theory, volume 201 of Proceedings of Machine Learning Research, pages 375–410. PMLR. arXiv:2210.02482

  10. [10]

    R., Lu, C., Le Gouic, T., and Rigollet, P

    Chewi, S., Gerber, P. R., Lu, C., Le Gouic, T., and Rigollet, P. (2022). The query complexity of sampling from strongly log-concave distributions in one dimension. InProceedings of the Thirty Fifth Conference on Learning Theory, volume 178 ofProceedings of Machine Learning Research, pages 2041–2059. PMLR

  11. [11]

    and Saad, Y

    Chow, E. and Saad, Y. (2014). Preconditioned Krylov subspace methods for sampling multivariate Gaussian distributions.SIAM Journal on Scientific Computing, 36(2):A588–A608

  12. [12]

    Cuff, P. (2013). Distributed channel synthesis.IEEE Transactions on Information Theory, 59(11):7071–7096

  13. [13]

    Gallager, R. G. (1968).Information Theory and Reliable Communication. John Wiley & Sons, New York

  14. [14]

    Gao and L

    Gao, X. and Zhu, L. (2025). Convergence analysis for general probability-flow ODEs of diffusion models in wasserstein distances. InInternational Conference on Artificial Intelligence and Statistics, volume 258 ofProceedings of Machine Learning Research, pages 1009–1017. arXiv:2401.17958

  15. [15]

    Ho, J., Jain, A., and Abbeel, P. (2020). Denoising diffusion probabilistic models. InAdvances in Neural Information Processing Systems, volume 33, pages 6840–6851

  16. [16]

    and Li, G

    Jiao, Y. and Li, G. (2024). Instance-dependent convergence theory for diffusion models.arXiv preprint arXiv:2410.13738

  17. [17]

    Jiao, Y., Zhou, Y., and Li, G. (2025). Optimal convergence analysis of DDPM for general distributions. arXiv preprint arXiv:2510.27562

  18. [18]

    (2017).Eγ-resolvability.IEEE Transactions on Information Theory, 63(5):2629–2658

    Liu, J., Cuff, P., and Verdú, S. (2017).Eγ-resolvability.IEEE Transactions on Information Theory, 63(5):2629–2658

  19. [19]

    and Verdú, S

    Liu, J. and Verdú, S. (2018). Rejection sampling and noncausal sampling under moment constraints. In2018 IEEE International Symposium on Information Theory (ISIT), pages 1565–1569

  20. [20]

    (2023).Upper and Lower Bounds for Sampling

    Lu, C. (2023).Upper and Lower Bounds for Sampling. PhD thesis, Massachusetts Institute of Technology

  21. [21]

    Mahajan, J., Zhang, K., Liang, F., and Liu, J. (2025). The Picard-Lagrange framework for higher- order Langevin Monte Carlo.arXiv preprint arXiv:2510.18242. 34

  22. [22]

    Musco, C., Musco, C., and Sidford, A. (2018). Stability of the Lanczos method for matrix function approximation. InProceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1605–1624

  23. [23]

    Neal, R. M. (2001). Annealed importance sampling.Statistics and Computing, 11(2):125–139

  24. [24]

    Okayama, T., Matsuo, T., and Sugihara, M. (2013). Error estimates with explicit constants for sinc approximation, sinc quadrature and sinc indefinite integration.Numerische Mathematik, 124(2):361–394

  25. [25]

    P., Kumar, A., Ermon, S., and Poole, B

    Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. (2021). Score-based generative modeling through stochastic differential equations. InInternational Conference on Learning Representations

  26. [26]

    (1993).Numerical Methods Based on Sinc and Analytic Functions, volume 20 ofSpringer Series in Computational Mathematics

    Stenger, F. (1993).Numerical Methods Based on Sinc and Analytic Functions, volume 20 ofSpringer Series in Computational Mathematics. Springer

  27. [27]

    Trefethen, L. N. and Weideman, J. A. C. (2014). The exponentially convergent trapezoidal rule. SIAM Review, 56(3):385–458

  28. [28]

    (2018).High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47 ofCambridge Series in Statistical and Probabilistic Mathematics

    Vershynin, R. (2018).High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47 ofCambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press

  29. [29]

    Query Lower Bounds for Diffusion Sampling

    Xun, Z. and Price, E. (2026). Query lower bounds for diffusion sampling.arXiv preprint arXiv:2604.10857

  30. [30]

    S., Huan, S., Huang, J., Boffi, N

    Zhang, M. S., Huan, S., Huang, J., Boffi, N. M., Chen, S., and Chewi, S. (2025). Sublinear iterations can suffice even for DDPMs.arXiv preprint arXiv:2511.04844. 35