
arxiv: 2605.23346 · v1 · pith:TMKIP2WYnew · submitted 2026-05-22 · 💻 cs.LG

Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion

Pith reviewed 2026-05-25 05:28 UTC · model grok-4.3

classification 💻 cs.LG
keywords: contrastive distribution matching · amortized sequential monte carlo · discrete diffusion · twist function · reward-tilted sampling · categorical data generation · controlled generation

The pith

CDM learns a parameterized twist function from positive and negative samples to amortize Twisted SMC for discrete diffusion models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Contrastive Distribution Matching to make Twisted Sequential Monte Carlo practical for discrete diffusion models by training a twist function on positive and negative samples rather than relying on expensive Monte Carlo estimates at every inference step. Training uses a reformulated gradient estimator that exploits the closed-form forward kernels of discrete diffusion, so the optimal twist does not have to be estimated by Monte Carlo during learning. The learned twist function then guides sampling from reward-tilted distributions while keeping the asymptotic exactness of SMC. This matters for tasks requiring controlled generation of categorical data, such as toxic text generation or biological sequence design, because it removes the main computational bottleneck while adding under 5 percent overhead relative to a single forward pass of the base model. A reader would see this as turning an asymptotically correct but slow method into a scalable one for real applications.
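
For orientation, the standard twisted-SMC setup that this summary refers to can be written as follows. The notation (base model p, reward r, temperature β, twist ψ_t) is the conventional one from the twisted SMC literature and is an assumption here, not copied from the paper.

```latex
p^{*}(x_0) \;\propto\; p(x_0)\,\exp\!\big(r(x_0)/\beta\big),
\qquad
\psi_t^{*}(x_t) \;=\; \mathbb{E}_{p(x_0 \mid x_t)}\!\left[\exp\!\big(r(x_0)/\beta\big)\right].
```

Approximating the expectation on the right with M Monte Carlo samples at every step is the inference cost that a learned twist function is meant to amortize.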

Core claim

We introduce Contrastive Distribution Matching (CDM), a novel framework that amortizes the cost of SMC inference by learning a parameterized twist function via positive and negative samples. For efficient training, we reformulate the gradient estimator to leverage the closed-form forward kernels of discrete diffusion models. In practice, evaluating our learned twist function incurs less than 5% additional computational overhead compared to a single forward pass of the base model. Through extensive empirical evaluations, we demonstrate that CDM consistently outperforms existing baselines under matched wall-clock time across applications including toxic text generation, regulatory DNA sequence

What carries the argument

Contrastive Distribution Matching (CDM), the framework that trains a parameterized twist function using positive and negative samples and a reformulated gradient based on closed-form forward kernels.

If this is right

  • The learned twist function adds less than 5% overhead relative to a single base-model forward pass at inference time.
  • CDM produces higher-quality samples than existing baselines when wall-clock time is held constant.
  • The method applies directly to reward-tilted sampling in toxic text generation, DNA sequence design, protein designability, and diffusion LLM alignment.
  • Training avoids costly Monte Carlo estimates of the optimal twist by using the closed-form kernels of discrete diffusion.
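
To make the role of the twist concrete, here is a minimal sketch of twisted SMC on a toy discrete Markov chain, not the paper's implementation. The names `twisted_smc` and `exact_log_twists`, the toy chain, the choice of the base transition as the proposal, and multinomial resampling are all assumptions made for illustration. In this toy setting the optimal twist can be computed exactly by a backward recursion; in a real discrete diffusion model it cannot, which is where a learned twist would be substituted.

```python
import numpy as np

def ess(log_w):
    """Effective sample size of unnormalized log-weights."""
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / (w ** 2).sum()

def exact_log_twists(P, reward, T):
    """Backward recursion psi_t = P @ psi_{t+1}, with psi_T = exp(reward).
    Only tractable for a toy chain (illustrative stand-in for a learned twist)."""
    psi = np.exp(reward)
    out = [np.log(psi)]
    for _ in range(T):
        psi = P @ psi
        out.append(np.log(psi))
    return np.stack(out[::-1])  # row t holds log psi_t

def twisted_smc(P, mu, log_psi, K, ess_frac=0.5, seed=0):
    """Twisted SMC with the base transition P as proposal (an assumption).
    Returns final particles and normalized weights."""
    rng = np.random.default_rng(seed)
    V, T = len(mu), log_psi.shape[0] - 1
    x = rng.choice(V, size=K, p=mu)
    log_w = log_psi[0, x].copy()
    for t in range(1, T + 1):
        # Resample only when the ESS drops below the threshold.
        if ess(log_w) < ess_frac * K:
            w = np.exp(log_w - log_w.max())
            x = x[rng.choice(K, size=K, p=w / w.sum())]
            log_w = np.zeros(K)
        # Propose from the base chain via inverse-CDF sampling.
        cdf = P[x].cumsum(axis=1)
        x_new = np.minimum((rng.random(K)[:, None] > cdf).sum(axis=1), V - 1)
        # Incremental weight: psi_t(x_t) / psi_{t-1}(x_{t-1}).
        log_w += log_psi[t, x_new] - log_psi[t - 1, x]
        x = x_new
    w = np.exp(log_w - log_w.max())
    return x, w / w.sum()
```

Because the final twist equals exp(reward), the weighted particles target the reward-tilted distribution regardless of how good the intermediate twists are; better intermediate twists only reduce variance, which is why the quality of the twist estimate matters at a fixed compute budget.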

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The contrastive training procedure could be reused across different reward functions without retraining the underlying diffusion model.
  • Because the overhead is low, the approach may enable Twisted SMC on larger discrete models where previous Monte Carlo costs were prohibitive.
  • The same reformulation might be tested on other sequential models that possess closed-form forward transitions.

Load-bearing premise

The reformulated gradient estimator based on closed-form forward kernels produces accurate updates for the twist function without requiring Monte Carlo approximations during training.
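
As an illustration of what a closed-form forward kernel looks like, the sketch below samples a noised sequence directly from a clean one under masked (absorbing-state) discrete diffusion, where each token is kept with probability alpha_t and otherwise replaced by a mask token. The choice of masked diffusion, the `MASK` id, and the function name `forward_mask` are assumptions for this example; the paper may use other kernels, and this is not its gradient estimator.

```python
import numpy as np

MASK = -1  # hypothetical mask-token id, chosen for illustration

def forward_mask(x0, alpha_t, rng):
    """Sample x_t ~ q(x_t | x_0) for absorbing-state diffusion.

    Each token is kept independently with probability alpha_t and
    replaced by MASK otherwise, so no step-by-step simulation of the
    forward chain is needed.
    """
    x0 = np.asarray(x0)
    keep = rng.random(x0.shape) < alpha_t
    return np.where(keep, x0, MASK)
```

Having x_t available in one shot from x_0 at any noise level is the property the premise above relies on: training pairs can be produced cheaply from clean samples instead of from simulated trajectories.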

What would settle it

An experiment in which the twist function trained via the contrastive reformulation yields no improvement in sample quality or effective sample size over standard SMC on the same reward-tilted task.

Figures

Figures reproduced from arXiv: 2605.23346 by Jaihoon Kim, Minhyuk Sung, Morteza Mardani, Prin Phunyaphibarn, Seungjun Kim, Taehoon Yoon.

Figure 1
Reward vs. Wall-Clock Time with Varying M. Increasing M yields a more accurate twist estimate, improving SMC performance, but incurs a substantial computational cost. CDM shows superior scalability by amortizing this cost.
Figure 2
Scaling Results. We present scaling results for toxic text generation (a-b), regulatory DNA sequence design (c-d), protein designability (e-f), and diffusion LLM alignment (g-h). For each case, we plot the given reward and a held-out reward not seen during training against inference wall-clock time. In all cases, CDM establishes a new Pareto front by consistently outperforming all baselines. … works [55, 73]. …
Figure 3
Compatibility with Fine-Tuned Proposals. (Left) Applying CDM on top of fine-tuned models improves performance for both toxic text and protein generation. (Right) CDM mitigates mode collapse commonly observed in fine-tuned models while achieving comparable rewards. … introduces a severe computational bottleneck at inference. We observe that BoN scales comparably to Soft Value [43], which we assume is because …
Figure 4
Training Comparison of CDM with Soft Value. In this section, we compare the training dynamics of Soft Value and CDM on toxic text and DNA sequence generation, plotting reward against wall-clock training time with fixed training parameters (e.g., optimizer, architecture, batch size). Additionally, for the Soft Value [43] baseline, we sweep the Monte Carlo sample size M used to estimate the optimal twist fun…
Figure 5
Amortized Twisted SMC Procedure. With the learned twist function, we can amortize the SMC inference with a single forward pass. In contrast, standard SMC relies on an expensive Monte Carlo estimate to approximate the twist function. Algorithm 1: Twisted Sequential Monte Carlo / Importance Sampling. Function TwistSMC(K, q, Ψ, ESSthres, tstop). Inputs: K: number of particles; q: proposal distribution; Ψ: Tw…
Figure 6
Twist Head Architecture. (Left) We parameterize the twisting function as a lightweight head that predicts the value based on the last hidden state of the denoising network. (Right) We consider three architectural choices for the twist head: (a) MLP, (b) MLP+PE, and (c) Transformer. … draw x_t^ϕ via importance sampling under the EMA-updated twist ψ_{ϕ_EMA}. We present implementation details and ablations of the …
Figure 7
Positive Buffer Ablation Results. We present an ablation study on the buffer update frequency, n_update, evaluating its impact on both (a) toxic text generation and (b) regulatory DNA sequence design. … from SMC. In particular, when the reward is expensive, increasing n_update reduces the number of reward evaluations required. In Figs. 7a and 7b, we show that CDM performs well across various update intervals n…
Figure 8
Scaling with Direct Backpropagation Fine-Tuned Proposal [82]. CDM is also compatible with DRAKES [82], a proposal fine-tuned via direct backpropagation for tasks with differentiable rewards.
Figure 10
Protein Qualitative Results. We display the generated protein in blue and the refolded protein (using ESMFold) in orange. CDM achieves designable proteins, as shown by the closely matching generated and refolded structures.
Original abstract

Discrete diffusion models have emerged as powerful frameworks for generating structured categorical data. However, efficiently sampling from reward-tilted distributions remains a fundamental challenge. While Twisted Sequential Monte Carlo (SMC) offers asymptotic exactness for this task, estimating the optimal twist function in discrete state spaces necessitates costly Monte Carlo approximations, resulting in a severe computational bottleneck at inference. To overcome this limitation, we introduce Contrastive Distribution Matching (CDM), a novel framework that amortizes the cost of SMC inference by learning a parameterized twist function via positive and negative samples. For efficient training, we reformulate the gradient estimator to leverage the closed-form forward kernels of discrete diffusion models. In practice, evaluating our learned twist function incurs less than 5% additional computational overhead compared to a single forward pass of the base model. Through extensive empirical evaluations, we demonstrate that CDM consistently outperforms existing baselines under matched wall-clock time. We validate the effectiveness and versatility of our approach across a diverse range of applications, including toxic text generation, regulatory DNA sequence design, protein designability, and diffusion large language model alignment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces Contrastive Distribution Matching (CDM), a framework that amortizes Twisted Sequential Monte Carlo (SMC) inference for reward-tilted sampling in discrete diffusion models. It learns a parameterized twist function contrastively from positive and negative samples and reformulates the gradient estimator to exploit the closed-form forward kernels of discrete diffusion, avoiding per-step Monte Carlo approximations during training. The paper claims the learned twist incurs <5% additional overhead relative to a single forward pass of the base model and demonstrates consistent outperformance over baselines under matched wall-clock time on tasks including toxic text generation, regulatory DNA sequence design, protein designability, and diffusion LLM alignment.

Significance. If the central construction and empirical claims hold, the work could meaningfully advance practical use of asymptotically exact SMC methods in discrete diffusion by removing a key computational bottleneck. The contrastive amortization approach, combined with the kernel-exploiting gradient reformulation, targets a real inference-time cost in reward-guided generation and is validated across multiple application domains, which strengthens its potential relevance for constrained structured data generation.

minor comments (3)
  1. [§3] §3 (Method): the precise definition of 'positive and negative samples' for the contrastive objective and how they are generated from the diffusion process should be stated explicitly with pseudocode or an algorithm box, as the current description leaves the sampling procedure for the contrastive pairs implicit.
  2. [Table 2, Figure 4] Table 2 and Figure 4: the wall-clock time comparisons would be strengthened by reporting the number of independent runs and standard deviations; without this, it is difficult to assess whether the reported gains are statistically reliable across the four application domains.
  3. [§4.2] §4.2 (Experiments): the base model architecture and training details for the twist function (e.g., whether it shares parameters with the diffusion model or is a separate network) are not fully specified; adding these details would improve reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work on Contrastive Distribution Matching (CDM) for amortizing Twisted SMC in discrete diffusion models, as well as for the encouraging significance assessment and the recommendation of minor revision. No specific major comments appear in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces CDM as a contrastive learning method to amortize twist function estimation for twisted SMC, with the gradient estimator reformulated to exploit closed-form discrete diffusion forward kernels. This is a standard technical construction for efficient training, supported by empirical evaluations on downstream tasks. No derivation step reduces by construction to its own inputs, no fitted parameter is renamed as a prediction, and no load-bearing self-citation chain or uniqueness theorem is invoked. The central claims remain independent of the learned parameters themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The method implicitly assumes that positive/negative sample contrast provides a useful training signal for the twist function and that the closed-form kernels remain valid under the reward tilt.

pith-pipeline@v0.9.0 · 5736 in / 1117 out tokens · 17078 ms · 2026-05-25T05:28:31.397969+00:00 · methodology


Reference graph

Works this paper leans on

104 extracted references · 104 canonical work pages · 12 internal anchors

  1. [1]

    Nets: A non-equilibrium transport sampler

    Michael Samuel Albergo and Eric Vanden-Eijnden. Nets: A non-equilibrium transport sampler. In International Conference on Machine Learning, pages 1026–1055. PMLR, 2025

  2. [2]

    Effective gene expression prediction from sequence by integrating long-range interactions

    Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R Ledsam, Agnieszka Grabska-Barwinska, Kyle R Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R Kelley. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18(10):1196–1203, 2021

  3. [3]

    Universal guidance for diffusion models

    Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Universal guidance for diffusion models. In CVPRW, 2023

  4. [4]

    Training diffusion models with reinforcement learning

    Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv, 2024

  5. [5]

    A continuous time framework for discrete denoising models

    Andrew Campbell, Joe Benton, Valentin De Bortoli, Thomas Rainforth, George Deligiannidis, and Arnaud Doucet. A continuous time framework for discrete denoising models. Advances in Neural Information Processing Systems, 2022

  6. [6]

    Monte carlo guided diffusion for bayesian linear inverse problems

    Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, and Eric Moulines. Monte carlo guided diffusion for bayesian linear inverse problems. In ICLR, 2024

  7. [7]

    Nft: Bridging supervised learning and reinforcement learning in math reasoning

    Huayu Chen, Kaiwen Zheng, Qinsheng Zhang, Ganqu Cui, Yin Cui, Haotian Ye, Tsung-Yi Lin, Ming-Yu Liu, Jun Zhu, and Haoxiang Wang. Nft: Bridging supervised learning and reinforcement learning in math reasoning. In The Fourteenth International Conference on Learning Representations, 2026

  8. [8]

    An introduction to sequential Monte Carlo

    Nicolas Chopin, Omiros Papaspiliopoulos, et al. An introduction to sequential Monte Carlo, volume 4. Springer, 2020

  9. [9]

    Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution

    Po-Wei Chou, Daniel Maturana, and Sebastian Scherer. Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution. In International conference on machine learning, pages 834–843. PMLR, 2017

  10. [10]

    Split gibbs discrete diffusion posterior sampling

    Wenda Chu, Zihui Wu, Yifan Chen, Yang Song, and Yisong Yue. Split gibbs discrete diffusion posterior sampling. arXiv preprint arXiv:2503.01161, 2025

  11. [11]

    Diffusion posterior sampling for general noisy inverse problems

    Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. In ICLR, 2023

  12. [12]

    Directly fine-tuning diffusion models on differentiable rewards

    Kevin Clark, Paul Vicol, Kevin Swersky, and David J Fleet. Directly fine-tuning diffusion models on differentiable rewards. In ICLR, 2024

  13. [13]

    Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement

    Meihua Dang, Jiaqi Han, Minkai Xu, Kai Xu, Akash Srivastava, and Stefano Ermon. Inference-time scaling of diffusion language models with particle gibbs sampling. arXiv preprint arXiv:2507.08390, 2025

  14. [14]

    Sequential monte carlo samplers

    Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential monte carlo samplers. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(3):411–436, 2006

  15. [15]

    Overview of the multilingual text detoxification task at pan 2024

    Daryna Dementieva, Daniil Moskovskiy, Nikolay Babakov, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Dmitry Ustalov, Elisei Stakovskii, et al. Overview of the multilingual text detoxification task at pan 2024. In CLEF (Working Notes), pages 2432–2461, 2024

  16. [16]

    Generative Modeling via Drifting

    Mingyang Deng, He Li, Tianhong Li, Yilun Du, and Kaiming He. Generative modeling via drifting. arXiv preprint arXiv:2602.04770, 2026

  17. [17]

    Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control

    Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, and Ricky TQ Chen. Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. arXiv preprint arXiv:2409.08861, 2024

  18. [18]

    An introduction to sequential monte carlo methods

    Arnaud Doucet, Nando De Freitas, and Neil Gordon. An introduction to sequential monte carlo methods. In Sequential Monte Carlo methods in practice, pages 3–14. Springer, 2001

  19. [19]

    Sequential Monte Carlo methods in practice

    Arnaud Doucet, Nando De Freitas, Neil James Gordon, et al. Sequential Monte Carlo methods in practice. Springer, 2001

  20. [20]

    Tweedie’s formula and selection bias

    Bradley Efron. Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106(496):1602–1614, 2011

  21. [21]

    Dpok: reinforcement learning for fine-tuning text-to-image diffusion models

    Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: reinforcement learning for fine-tuning text-to-image diffusion models. In NeurIPS, 2023

  22. [22]

    Scaling laws for reward model overoptimization

    Leo Gao, John Schulman, and Jacob Hilton. Scaling laws for reward model overoptimization. In ICML, 2023

  23. [23]

    Openwebtext corpus

    Aaron Gokaslan and Vanya Cohen. Openwebtext corpus. http://Skylion007.github.io/OpenWebTextCorpus, 2019

  24. [24]

    Machine-guided design of synthetic cell type-specific cis-regulatory elements

    Sager J Gosai, Rodrigo I Castro, Natalia Fuentes, John C Butts, Susan Kales, Ramil R Noche, Kousuke Mouri, Pardis C Sabeti, Steven K Reilly, and Ryan Tewhey. Machine-guided design of synthetic cell type-specific cis-regulatory elements. bioRxiv, 2023

  25. [25]

    Oops i took a gradient: Scalable sampling for discrete distributions

    Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, and Chris Maddison. Oops i took a gradient: Scalable sampling for discrete distributions. In International Conference on Machine Learning, pages 3831–3841. PMLR, 2021

  26. [26]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

  27. [27]

    Discrete feynman-kac correctors

    Mohsin Hasan, Viktor Ohanesian, Artem Gazizov, Yoshua Bengio, Alán Aspuru-Guzik, Roberto Bondesan, Marta Skreta, and Kirill Neklyudov. Discrete feynman-kac correctors. arXiv preprint arXiv:2601.10403, 2026

  28. [28]

    Adjoint sampling: Highly scalable diffusion samplers via adjoint matching

    Aaron Havens, Benjamin Kurt Miller, Bing Yan, Carles Domingo-Enrich, Anuroop Sriram, Brandon Wood, Daniel Levine, Bin Hu, Brandon Amos, Brian Karrer, et al. Adjoint sampling: Highly scalable diffusion samplers via adjoint matching. arXiv preprint arXiv:2504.11713, 2025

  29. [29]

    Leaps: A discrete neural sampler via locally equivariant networks

    Peter Holderrieth, Michael S Albergo, and Tommi Jaakkola. Leaps: A discrete neural sampler via locally equivariant networks. arXiv preprint arXiv:2502.10843, 2025

  30. [30]

    Categorical Reparameterization with Gumbel-Softmax

    Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144, 2016

  31. [31]

    Sequence tutor: Conservative fine-tuning of sequence generation models with kl-control

    Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E Turner, and Douglas Eck. Sequence tutor: Conservative fine-tuning of sequence generation models with kl-control. In ICML, 2017

  32. [32]

    Path integrals and symmetry breaking for optimal control theory

    Hilbert J Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of statistical mechanics: theory and experiment, 2005

  33. [33]

    Inference-time scaling for flow models via stochastic generation and rollover budget forcing

    Jaihoon Kim, Taehoon Yoon, Jisung Hwang, and Minhyuk Sung. Inference-time scaling for flow models via stochastic generation and rollover budget forcing. arXiv preprint arXiv:2503.19385, 2025

  34. [34]

    Test-time alignment of diffusion models without reward over-optimization

    Sunwoo Kim, Minkyu Kim, and Dongmin Park. Test-time alignment of diffusion models without reward over-optimization. In ICLR, 2025

  35. [35]

    Rl with kl penalties is better viewed as bayesian inference

    Tomasz Korbak, Ethan Perez, and Christopher Buckley. Rl with kl penalties is better viewed as bayesian inference. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1083–1091, 2022

  36. [36]

    Flux

    Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024

  37. [37]

    FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

    Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image ...

  38. [38]

    Rewardbench: Evaluating reward models for language modeling

    Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, et al. Rewardbench: Evaluating reward models for language modeling. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1755–1797, 2025

  39. [39]

    Sixo: Smoothing inference with twisted objectives

    Dieterich Lawson, Allan Raventós, Andrew Warrington, and Scott Linderman. Sixo: Smoothing inference with twisted objectives. In NeurIPS, 2022

  40. [40]

    Flow Map Language Models: One-step Language Modeling via Continuous Denoising

    Chanhyuk Lee, Jaehoon Yoo, Manan Agarwal, Sheel Shah, Jerry Huang, Aditi Raghunathan, Seunghoon Hong, Nicholas M Boffi, and Jinwoo Kim. Flow map language models: One-step language modeling via continuous denoising. arXiv preprint arXiv:2602.16813, 2026

  41. [41]

    Debiasing guidance for discrete diffusion with sequential monte carlo

    Cheuk Kit Lee, Paul Jeha, Jes Frellsen, Pietro Lio, Michael Samuel Albergo, and Francisco Vargas. Debiasing guidance for discrete diffusion with sequential monte carlo. arXiv preprint arXiv:2502.06079, 2025

  42. [42]

    Reinforcement learning and control as probabilistic inference: Tutorial and review

    Sergey Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv, 2018

  43. [43]

    Derivative-free guidance in continuous and discrete diffusion models with soft value-based decoding

    Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Aviv Regev, Sergey Levine, and Masatoshi Uehara. Derivative-free guidance in continuous and discrete diffusion models with soft value-based decoding. In NeurIPS, 2025

  44. [44]

    Evolutionary-scale prediction of atomic-level protein structure with a language model

    Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130, 2023

  45. [45]

    Critic sequential monte carlo

    Vasileios Lioutas, Jonathan Wilder Lavington, Justice Sefas, Matthew Niedoba, Yunpeng Liu, Berend Zwartsenberg, Setareh Dabiri, Frank Wood, and Adam Scibior. Critic sequential monte carlo. arXiv preprint arXiv:2205.15460, 2022

  46. [46]

    Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

    Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, and Yahui Zhou. Skywork-reward: Bag of tricks for reward modeling in llms. arXiv preprint arXiv:2410.18451, 2024

  47. [47]

    Bridging discrete and backpropagation: Straight-through and beyond

    Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin Yu, and Jianfeng Gao. Bridging discrete and backpropagation: Straight-through and beyond. In NeurIPS, 2023

  48. [48]

    Paradetox: Detoxification with parallel data

    Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, and Alexander Panchenko. Paradetox: Detoxification with parallel data. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6804–6818, 2022

  49. [49]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017

  50. [50]

    Inference-time scaling for diffusion models beyond scaling denoising steps

    Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, and Saining Xie. Inference-time scaling for diffusion models beyond scaling denoising steps. arXiv, 2025

  51. [51]

    Controlled decoding from language models

    Sidharth Mudgal, Jong Lee, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, et al. Controlled decoding from language models. arXiv preprint arXiv:2310.17022, 2023

  52. [52]

    Elements of sequential monte carlo

    Christian A Naesseth, Fredrik Lindsten, Thomas B Schön, et al. Elements of sequential monte carlo. Foundations and Trends in Machine Learning, 12(3):307–392, 2019

  53. [53]

    Large language diffusion models

    Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models. In NeurIPS, 2025

  54. [54]

    Unlocking guidance for discrete state-space diffusion and flow models

    Hunter Nisonoff, Junhao Xiong, Stephan Allenspach, and Jennifer Listgarten. Unlocking guidance for discrete state-space diffusion and flow models. arXiv preprint arXiv:2406.01572, 2024

  55. [55]

    Inference-time scaling of discrete diffusion models via importance weighting and optimal proposal design

    Zijing Ou, Chinmay Pani, and Yingzhen Li. Inference-time scaling of discrete diffusion models via importance weighting and optimal proposal design. arXiv preprint arXiv:2505.22524, 2025

  56. [56]

    Pairflow: Closed-form source-target coupling for few-step generation in discrete flow models

    Mingue Park, Jisung Hwang, Seungwoo Yoo, Kyeongmin Yeo, and Minhyuk Sung. Pairflow: Closed-form source-target coupling for few-step generation in discrete flow models. In ICLR, 2026

  57. [57]

    Gradient estimation with stochastic softmax tricks

    Max Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, and Chris J Maddison. Gradient estimation with stochastic softmax tricks. In NeurIPS, volume 33, pages 5691–5704, 2020

  58. [58]

    Reward-guided discrete diffusion via clean-sample markov chain for molecule and biological sequence design

    Prin Phunyaphibarn and Minhyuk Sung. Reward-guided discrete diffusion via clean-sample markov chain for molecule and biological sequence design. arXiv preprint arXiv:2602.09424, 2026

  59. [59]

    Probabilistic planning with sequential monte carlo methods

    Alexandre Piché, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, and Chris Pal. Probabilistic planning with sequential monte carlo methods. In International Conference on Learning Representations, 2018

  60. [60]

    Discrete Flow Maps

    Peter Potaptchik, Jason Yim, Adhi Saravanan, Peter Holderrieth, Eric Vanden-Eijnden, and Michael S Albergo. Discrete flow maps. arXiv preprint arXiv:2604.09784, 2026

  61. [61]

    Video diffusion alignment via reward gradients.arXiv preprint arXiv:2407.08737, 2024

    Mihir Prabhudesai, Russell Mendonca, Zheyang Qin, Katerina Fragkiadaki, and Deepak Pathak. Video diffusion alignment via reward gradients.arXiv preprint arXiv:2407.08737, 2024

  62. [62]

    Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

  63. [63]

    Direct preference optimization: Your language model is secretly a reward model

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. InNIPS, 2023

  64. [64]

    Test-time scaling of diffusion models via noise trajectory search

    Vignav Ramesh and Morteza Mardani. Test-time scaling of diffusion models via noise trajectory search. arXiv preprint arXiv:2506.03164, 2025

  65. [65]

    Least squares estimation without priors or supervision

    Martin Raphan and Eero P Simoncelli. Least squares estimation without priors or supervision. Neural computation, 23(2):374–420, 2011

  66. [66]

    On stochastic optimal control and reinforcement learning by approximate inference

    Konrad Rawlik, Marc Toussaint, and Sethu Vijayakumar. On stochastic optimal control and reinforcement learning by approximate inference. Proceedings of Robotics: Science and Systems VIII, 2012

  67. [67]

    Driftlite: Lightweight drift control for inference-time scaling of diffusion models

    Yinuo Ren, Wenhao Gao, Lexing Ying, Grant M Rotskoff, and Jiequn Han. Driftlite: Lightweight drift control for inference-time scaling of diffusion models. arXiv preprint arXiv:2509.21655, 2025

  68. [68]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022

  69. [69]

    Simple and effective masked diffusion language models

    Subham S Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models. In NeurIPS, 2024

  70. [70]

    Designing DNA with tunable regulatory activity using discrete diffusion

    Anirban Sarkar, Ziqi Tang, Chris Z Zhao, and Peter K Koo. Designing DNA with tunable regulatory activity using discrete diffusion. In NeurIPS 2024 Workshop on AI for New Drug Modalities, 2024. URL https://openreview.net/forum?id=Ioy8LCAyRj

  71. [71]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

  72. [72]

    Simplified and generalized masked diffusion for discrete data

    Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis Titsias. Simplified and generalized masked diffusion for discrete data. In NeurIPS, 2024

  73. [73]

    A general framework for inference-time scaling and steering of diffusion models

    Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models. arXiv preprint arXiv:2501.06848, 2025

  74. [74]

    Feynman-kac correctors in diffusion: Annealing, guidance, and product of experts

    Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian, Roberto Bondesan, Alán Aspuru-Guzik, Arnaud Doucet, Rob Brekelmans, Alexander Tong, and Kirill Neklyudov. Feynman-kac correctors in diffusion: Annealing, guidance, and product of experts. arXiv preprint arXiv:2503.02819, 2025

  75. [75]

    Discrete adjoint matching

    Oswin So, Brian Karrer, Chuchu Fan, Ricky TQ Chen, and Guan-Horng Liu. Discrete adjoint matching. arXiv preprint arXiv:2602.07132, 2026

  76. [76]

    Pseudoinverse-guided diffusion models for inverse problems

    Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. In International Conference on Learning Representations, 2023

  77. [77]

    Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review

    Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review. arXiv preprint arXiv:2407.13734, 2024

  78. [78]

    Fine-tuning of continuous-time diffusion models as entropy-regularized control

    Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, and Sergey Levine. Fine-tuning of continuous-time diffusion models as entropy-regularized control. arXiv, 2024

  79. [79]

    Fast and accurate protein structure search with foldseek

    Michel Van Kempen, Stephanie S Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron LM Gilchrist, Johannes Söding, and Martin Steinegger. Fast and accurate protein structure search with foldseek. Nature biotechnology, 42(2):243–246, 2024

  80. [80]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017
