The Power of Test-Time Training for Approximate Sampling
Pith reviewed 2026-06-27 11:05 UTC · model grok-4.3
The pith
Sampling from large classes of distributions requires quadratically many queries to an approximate density oracle, proving random walks optimal, while bounded classes allow fewer queries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show a quadratic lower bound on the query complexity of sampling from μ∗ given query access to ˆμ (for sufficiently large classes F), thus showing that the random walk approach proposed by Jerrum & Sinclair (1989) and refined by Hayes & Sinclair (2010), is optimal. This answers an open question posed by Hayes & Sinclair. We then show that this lower bound can be circumvented if the size of F is bounded appropriately.
What carries the argument
Query complexity of producing a sample from μ* using an approximate-density oracle ˆμ, parameterized by the cardinality of the known class F of possible target distributions.
If this is right
- For every class F above the size threshold, no sampling procedure can improve on quadratic query cost.
- The Jerrum-Sinclair random walk achieves the optimal query complexity for all sufficiently large F.
- When |F| is below the threshold, there exist sampling algorithms whose query cost is sub-quadratic.
- The bounded-F regime supplies a formal abstraction of test-time training that can be analyzed with existing counting-to-sampling techniques.
Where Pith is reading between the lines
- In practice, the effective class size for an LLM may be small enough under a fixed prompt distribution that test-time adaptation yields measurable query savings.
- The results motivate studying how to certify or enforce small effective |F| in concrete generative-model settings.
- Similar lower-bound and circumvention arguments may apply to other adaptive-inference tasks such as test-time fine-tuning for reasoning.
Load-bearing premise
The oracle supplies sufficiently accurate approximate density estimates, and the quadratic lower bound applies precisely when the class F exceeds a certain size threshold.
What would settle it
An explicit algorithm that produces a sample from μ* using o(n²) queries to ˆμ when F is larger than the size threshold would falsify the lower bound.
read the original abstract
Efficiently sampling from a complex probability distribution is a fundamental problem which has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from LLMs have been proposed to solve challenging reasoning problems. The efficacy of such sampling algorithms is limited, however, by the relationship between the LLM and the particular sampling task at hand, which has motivated the framework of test-time training (TTT). TTT works by updating a model's weights in response to partial generations and reward feedback received at inference time, thus adapting to the particular problem. In this work, we propose a formalization for TTT as the problem of producing a sample from a given probability measure $\mu^\star$ belonging to a known class ${F}$ of distributions, given an oracle $\hat \mu$ which yields approximate density estimates for $\mu^\star$. This is closely related to the problem of reducing sampling to approximate counting studied in seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989): namely, when ${F}$ is the class of all distributions, it coincides exactly with the aforementioned counting-to-sampling reduction. In this paper, we first show a quadratic lower bound on the query complexity of sampling from $\mu^\star$ given query access to $\hat \mu$ (for sufficiently large classes ${F}$), thus showing that the random walk approach proposed by Jerrum & Sinclair (1989) and refined by Hayes & Sinclair (2010), is optimal. This answers an open question posed by Hayes & Sinclair. We then show that this lower bound can be circumvented if the size of ${F}$ is bounded appropriately. As we discuss, this latter result can be viewed as an abstraction of TTT, and thus represents a starting point for the development of a principled theoretical framework for TTT.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formalizes test-time training (TTT) as sampling from an unknown μ∗ ∈ F (known class of distributions) given query access to an oracle ˆμ supplying approximate density estimates. When F is the class of all distributions this reduces exactly to the classical counting-to-sampling problem. The central results are (i) a quadratic query-complexity lower bound that holds for all sufficiently large F, establishing optimality of the Jerrum–Sinclair random-walk approach and answering an open question of Hayes & Sinclair, and (ii) a circumvention of the quadratic bound when |F| is suitably bounded, which the authors present as a theoretical abstraction of TTT.
Significance. If the claimed proofs are correct, the work supplies the first query-complexity lower bound that matches the long-standing random-walk upper bound in the general case and simultaneously exhibits a regime (bounded |F|) in which substantially fewer queries suffice. This supplies a clean complexity-theoretic explanation for why TTT can be effective when the underlying model class is restricted, and it places modern LLM sampling techniques on the same footing as the classical Jerrum–Valiant–Vazirani and Jerrum–Sinclair results. The absence of free parameters or self-referential reductions in the stated argument is a strength.
minor comments (2)
- The abstract states that proofs exist for both the quadratic lower bound and the bounded-class circumvention, but the precise definition of the oracle model (additive vs. multiplicative error, dependence on the approximation parameter), the exact threshold on |F| that triggers the lower bound, and the error terms in the reductions are not visible in the provided abstract. These details are load-bearing for verifying tightness.
- Notation for the target measure μ∗ and the oracle ˆμ is introduced cleanly, but the manuscript should include an explicit comparison table or paragraph showing how the new oracle model specializes to the standard counting oracle when F is unrestricted.
Simulated Author's Rebuttal
We thank the referee for the positive summary and for highlighting the significance of the quadratic lower bound matching the Jerrum-Sinclair random walk, the resolution of the Hayes-Sinclair question, and the bounded-|F| circumvention as a theoretical abstraction of TTT. No specific major comments were listed in the report.
Circularity Check
No circularity; lower bounds are independent of self-referential inputs
full rationale
The paper establishes query-complexity lower bounds for sampling from μ* given ˆμ-oracle access when F is large, showing optimality of the Jerrum-Sinclair random walk, and shows circumvention for bounded |F|. These are standard information-theoretic and reduction arguments in the counting-to-sampling setting. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citations are load-bearing for the central claims (citations are to Jerrum-Sinclair 1989 and Hayes-Sinclair 2010, independent of the present authors), and the oracle model is explicitly aligned with prior work without ansatz smuggling or renaming. The derivation chain is self-contained against external complexity benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Existence of oracles that return approximate density estimates for distributions in a known class F.
- standard math Standard notions of query complexity and total variation distance from theoretical computer science.
Reference graph
Works this paper leans on
-
[1]
[ALOG+20] Nima Anari, Kuikui Liu, Shayan Oveis Gharan, Cynthia Vinzant, and Thuy Duong Vuong. Log-concave polynomials iv: Exchange properties, tight mixing times, and faster sampling of spanning trees.arXiv preprint arXiv:2004.07220,
-
[2]
Language Models are Few-Shot Learners
[BMR+20] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, et al. Language models are few-shot learners.arXiv preprint arXiv:2005.14165,
work page internal anchor Pith review Pith/arXiv arXiv 2005
-
[3]
[CLV21] Zongchen Chen, Kuikui Liu, and Eric Vigoda. Spectral independence via stability and applications to holant-type problems.arXiv preprint arXiv:2106.03366,
-
[4]
Foster, Zakaria Mhammedi, and Dhruv Rohatgi
[FMR25] Dylan J. Foster, Zakaria Mhammedi, and Dhruv Rohatgi. Is a good foundation necessary for efficient reinforcement learning? the computational role of the base model in exploration.arXiv preprint arXiv:2503.07453,
-
[5]
Foster, and Akshay Krishnamurthy
[GCR+26] Noah Golowich, Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, and Akshay Krishnamurthy. Reject, resample, repeat: Understand- ing parallel reasoning in language model inference.arXiv preprint arXiv:2603.07887,
-
[6]
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
[GMAM25] Jonathan Geuter, Youssef Mroueh, and David Alvarez-Melis. Guided speculative inference for efficient test-time alignment of LLMs.arXiv preprint arXiv:2506.04118,
work page internal anchor Pith review Pith/arXiv arXiv
-
[7]
Denoising Diffusion Probabilistic Models
[HJA20] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239,
work page internal anchor Pith review Pith/arXiv arXiv 2006
-
[8]
Test-time learning for large language models
[HZC+25] Jinwu Hu, Zhitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, and Mingkui Tan. Test-time learning for large language models. arXiv preprint arXiv:2505.20633,
-
[9]
[LKB+23] Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step.arXiv preprint arXiv:2305.20050,
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
[Ope23] OpenAI. Gpt-4 technical report.arXiv preprint arXiv:2303.08774,
work page internal anchor Pith review Pith/arXiv arXiv
-
[11]
53 [PSX+25] Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, and Akash Srivastava. A probabilistic inference approach to inference-time scaling of llms using particle-based monte carlo methods.arXiv preprint arXiv:2502.01618,
-
[12]
Lecture notes. [RSS+25] Dhruv Rohatgi, Abhishek Shetty, Donya Saless, Yuchen Li, Ankur Moitra, Andrej Risteski, and Dylan J Foster. Taming imperfect process verifiers: A sampling per- spective on backtracking.arXiv preprint arXiv:2510.03149,
-
[13]
Non-asymptotic Error Bounds for Sequential MCMC Methods in Multimodal Settings
[Sch12] Nikolaus Schweizer. Non-asymptotic error bounds for sequential mcmc methods in multimodal settings.arXiv preprint arXiv:1205.6733,
work page internal anchor Pith review Pith/arXiv arXiv
-
[14]
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
[SDWMG15] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics.arXiv preprint arXiv:1503.03585,
work page internal anchor Pith review Pith/arXiv arXiv
-
[15]
[SHT+25] Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models.arXiv preprint arXiv:2501.06848,
-
[16]
Learning to (learn at test time)
[SLD+23] Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiao- long Wang, Tatsunori Hashimoto, and Xinlei Chen. Learning to (learn at test time). arXiv preprint arXiv:2310.13807,
-
[17]
Score-Based Generative Modeling through Stochastic Differential Equations
[SSDK+20] Yang Song, Jascha Sohl-Dickstein, Diederik Kingma, Abhishek Kumar, Stefano Er- mon, and Ben Poole. Score-based generative modeling through stochastic differential equations.arXiv preprint arXiv:2011.13456,
work page internal anchor Pith review Pith/arXiv arXiv 2011
-
[18]
[SWL+19] Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A. Efros, and Moritz Hardt. Test-time training with self-supervision for generalization under distribution shifts. arXiv preprint arXiv:1909.13231,
-
[19]
Gemini: A Family of Highly Capable Multimodal Models
[Tea23] Gemini Team. Gemini: A family of highly capable multimodal models.arXiv preprint arXiv:2312.11805,
work page internal anchor Pith review Pith/arXiv arXiv
-
[20]
Value-guided search for efficient chain-of-thought reasoning
[WZC+25] Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kiant´ e Brantley, and Wen Sun. Value-guided search for efficient chain-of-thought reasoning. arXiv preprint arXiv:2505.17373,
-
[21]
54 [XDY+24] Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, and Tong Zhang. Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint.arXiv preprint arXiv:2312.11456,
-
[22]
Fudge: Controlled text generation with future discrimi- nators
[YK21] Kevin Yang and Dan Klein. Fudge: Controlled text generation with future discrimi- nators. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3511–3535,
2021
-
[23]
Learning to Discover at Test Time
[YKL+26] Mert Yuksekg¨ on¨ ul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xi- aolong Wang, Jan Kautz, Yejin Choi, James Zou, Yu Sun, and Carlos Guestrin. Learning to discover at test time.arXiv preprint arXiv:2601.16175,
work page internal anchor Pith review Pith/arXiv arXiv
-
[24]
Ttcs: Test-time curriculum synthesis for self- evolving.arXiv preprint arXiv:2601.22628,
[YXT+26] Chengyi Yang, Zhishang Xiang, Yunbo Tang, Zongpei Teng, Chengsong Huang, Fei Long, Yuhan Liu, and Jinsong Su. Ttcs: Test-time curriculum synthesis for self- evolving.arXiv preprint arXiv:2601.22628,
-
[25]
[ZL26] Youheng Zhu and Yiping Lu. On the power of (approximate) reward models for inference-time scaling.arXiv preprint arXiv:2602.01381,
-
[26]
TTRL: Test-Time Reinforcement Learning
[ZZS+25] Yuxin Zuo, Kaiyan Zhang, Li Sheng, Shang Qu, Ganqu Cui, Xuekai Zhu, Haozhan Li, Yuchen Zhang, Xinwei Long, Ermo Hua, Biqing Qi, Youbang Sun, Zhiyuan Ma, Lifan Yuan, Ning Ding, and Bowen Zhou. Ttrl: Test-time reinforcement learning. arXiv preprint arXiv:2504.16084,
work page internal anchor Pith review Pith/arXiv arXiv
-
[27]
The Lessons of Developing Process Reward Models in Mathematical Reasoning
[ZZW+25] Zhenru Zhang, Chujie Zheng, Yangzhen Wu, Beichen Zhang, Runji Lin, Bowen Yu, Dayiheng Liu, Jingren Zhou, and Junyang Lin. The lessons of developing process reward models in mathematical reasoning.arXiv preprint arXiv:2501.07301,
work page internal anchor Pith review Pith/arXiv arXiv
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.