Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion
Pith reviewed 2026-05-25 05:28 UTC · model grok-4.3
The pith
CDM learns a parameterized twist function from positive and negative samples to amortize Twisted SMC for discrete diffusion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce Contrastive Distribution Matching (CDM), a novel framework that amortizes the cost of SMC inference by learning a parameterized twist function via positive and negative samples. For efficient training, we reformulate the gradient estimator to leverage the closed-form forward kernels of discrete diffusion models. In practice, evaluating our learned twist function incurs less than 5% additional computational overhead compared to a single forward pass of the base model. Through extensive empirical evaluations, we demonstrate that CDM consistently outperforms existing baselines under matched wall-clock time across applications including toxic text generation, regulatory DNA sequence design, protein designability, and diffusion large language model alignment.
What carries the argument
Contrastive Distribution Matching (CDM), the framework that trains a parameterized twist function using positive and negative samples and a reformulated gradient based on closed-form forward kernels.
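To make the idea of learning a twist from positive and negative samples concrete, here is a minimal sketch using a generic logistic contrastive loss on log-twist scores. It is an illustration under stated assumptions, not the paper's objective: the names `log_twist` and `contrastive_loss`, the linear score over toy feature vectors, and the choice of a logistic loss are all hypothetical.

```python
import math

def log_twist(weights, features):
    """Hypothetical log-twist: a linear score over toy features (an assumption)."""
    return sum(w * f for w, f in zip(weights, features))

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def contrastive_loss(weights, positives, negatives):
    """Generic logistic contrastive loss: raise the log-twist on positives,
    lower it on negatives. Not claimed to match the CDM objective."""
    pos = sum(softplus(-log_twist(weights, x)) for x in positives) / len(positives)
    neg = sum(softplus(log_twist(weights, x)) for x in negatives) / len(negatives)
    return pos + neg

# Toy feature vectors standing in for noisy sequences; purely illustrative.
positives = [[1.0, 0.0], [0.9, 0.1]]
negatives = [[0.0, 1.0], [0.1, 0.9]]
loss_untrained = contrastive_loss([0.0, 0.0], positives, negatives)
loss_separating = contrastive_loss([2.0, -2.0], positives, negatives)
```

With zero weights every sample scores the same, so the loss sits at its uninformative value; weights that separate the two sets give a lower loss.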
If this is right
- The learned twist function adds less than 5% overhead relative to a single base-model forward pass at inference time.
- CDM produces higher-quality samples than existing baselines when wall-clock time is held constant.
- The method applies directly to reward-tilted sampling in toxic text generation, DNA sequence design, protein designability, and diffusion LLM alignment.
- Training avoids costly Monte Carlo estimates of the optimal twist by using the closed-form kernels of discrete diffusion.
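The phrase "closed-form forward kernels" in the last point can be illustrated with the absorbing (masked) kernel that is common in discrete diffusion. The sketch below assumes that kernel with a linear schedule; the paper may use a different kernel or schedule, and `MASK`, `alpha`, and `sample_forward` are hypothetical names.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] state

def alpha(t):
    """Assumed linear schedule: probability a token is still unmasked at time t in [0, 1]."""
    return 1.0 - t

def sample_forward(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot: each token is masked independently
    with probability 1 - alpha(t), so no step-by-step simulation is needed."""
    keep = alpha(t)
    return [tok if rng.random() < keep else MASK for tok in x0]

rng = random.Random(0)
x0 = [3, 1, 4, 1, 5, 9, 2, 6]
x_clean = sample_forward(x0, 0.0, rng)  # t = 0: nothing is masked
x_noise = sample_forward(x0, 1.0, rng)  # t = 1: everything is masked
x_half = sample_forward(x0, 0.5, rng)   # t = 0.5: each token masked with probability 0.5
```

Because a noisy state at any time can be drawn directly from a clean sequence, training pairs can be generated cheaply, which is the property the reformulated gradient is said to exploit.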
Where Pith is reading between the lines
- The contrastive training procedure could be reused across different reward functions without retraining the underlying diffusion model.
- Because the overhead is low, the approach may enable Twisted SMC on larger discrete models where previous Monte Carlo costs were prohibitive.
- The same reformulation might be tested on other sequential models that possess closed-form forward transitions.
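For readers unfamiliar with where a learned twist enters the sampler, the following is a minimal sketch of one Twisted SMC step with the base model as proposal. It is a toy under stated assumptions: `twisted_smc_step`, `toy_propose`, and `toy_log_twist` are hypothetical, the twist ignores the time index, and particles are integers rather than sequences.

```python
import math
import random

def normalize(log_w):
    """Turn log-weights into normalized probabilities (max shift for stability)."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

def twisted_smc_step(particles, log_w, log_psi_prev, propose, log_twist, rng, ess_frac=0.5):
    """Move each particle with the proposal, then reweight by the twist ratio
    psi(x_new) / psi(x_old). Resample when the ESS falls below a threshold."""
    new_particles = [propose(x, rng) for x in particles]
    log_psi_new = [log_twist(x) for x in new_particles]
    log_w = [lw + a - b for lw, a, b in zip(log_w, log_psi_new, log_psi_prev)]
    probs = normalize(log_w)
    ess = 1.0 / sum(p * p for p in probs)
    if ess < ess_frac * len(particles):
        # Multinomial resampling; weights reset to uniform afterwards.
        idx = rng.choices(range(len(particles)), weights=probs, k=len(particles))
        new_particles = [new_particles[i] for i in idx]
        log_psi_new = [log_psi_new[i] for i in idx]
        log_w = [0.0] * len(particles)
    return new_particles, log_w, log_psi_new

# Toy setup: a random walk on integers, with a twist that favors values near 5.
rng = random.Random(0)
toy_propose = lambda x, rng: x + rng.choice([-1, 1])
toy_log_twist = lambda x: -abs(x - 5)
particles = [0, 2, 4, 6, 8, 10]
log_w = [0.0] * len(particles)
log_psi = [toy_log_twist(x) for x in particles]
for _ in range(5):
    particles, log_w, log_psi = twisted_smc_step(
        particles, log_w, log_psi, toy_propose, toy_log_twist, rng)
```

The only per-step cost beyond the proposal is one twist evaluation per particle, which is why a cheap learned twist matters for scaling.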
Load-bearing premise
The reformulated gradient estimator based on closed-form forward kernels produces accurate updates for the twist function without requiring Monte Carlo approximations during training.
What would settle it
An experiment in which the twist function trained via the contrastive reformulation yields no improvement in sample quality or effective sample size over standard SMC on the same reward-tilted task.
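Effective sample size, one of the two quantities named above, has a standard estimator from importance weights: ESS = (sum of weights)^2 / (sum of squared weights). A minimal sketch, with `effective_sample_size` as a hypothetical helper name:

```python
import math

def effective_sample_size(log_weights):
    """ESS = (sum w)^2 / sum w^2, computed from log-weights with a max shift for stability."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(wi * wi for wi in w)

ess_uniform = effective_sample_size([0.0] * 8)             # equal weights
ess_collapsed = effective_sample_size([0.0] + [-1e9] * 7)  # one particle dominates
```

Equal weights give an ESS equal to the particle count, and a single dominant particle drives it toward 1, so a better twist should show up as a higher ESS at the same particle budget.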
Original abstract
Discrete diffusion models have emerged as powerful frameworks for generating structured categorical data. However, efficiently sampling from reward-tilted distributions remains a fundamental challenge. While Twisted Sequential Monte Carlo (SMC) offers asymptotic exactness for this task, estimating the optimal twist function in discrete state spaces necessitates costly Monte Carlo approximations, resulting in a severe computational bottleneck at inference. To overcome this limitation, we introduce Contrastive Distribution Matching (CDM), a novel framework that amortizes the cost of SMC inference by learning a parameterized twist function via positive and negative samples. For efficient training, we reformulate the gradient estimator to leverage the closed-form forward kernels of discrete diffusion models. In practice, evaluating our learned twist function incurs less than 5% additional computational overhead compared to a single forward pass of the base model. Through extensive empirical evaluations, we demonstrate that CDM consistently outperforms existing baselines under matched wall-clock time. We validate the effectiveness and versatility of our approach across a diverse range of applications, including toxic text generation, regulatory DNA sequence design, protein designability, and diffusion large language model alignment.
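The abstract's terms can be pinned down with the usual Twisted SMC formulation. The block below is a sketch in assumed notation, not taken from the paper: p_θ is the base discrete diffusion model, r the reward, β > 0 a temperature, and time runs from t = T (noise) down to t = 0 (data).

```latex
% Assumed notation; the paper's own symbols and conventions may differ.
\begin{align}
  \pi^*(x_0) &\propto p_\theta(x_0)\,\exp\!\big(r(x_0)/\beta\big)
    && \text{(reward-tilted target)} \\
  \psi_t^*(x_t) &= \mathbb{E}_{p_\theta(x_0 \mid x_t)}\!\big[\exp\!\big(r(x_0)/\beta\big)\big]
    && \text{(optimal twist)} \\
  w_{t-1} &= w_t \cdot \frac{\psi_{t-1}(x_{t-1})}{\psi_t(x_t)}
    && \text{(weight update, base-model proposal)}
\end{align}
```

The expectation in the second line is the quantity that otherwise requires Monte Carlo rollouts at every step; amortization replaces it with a learned approximation that is evaluated once per particle.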
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Contrastive Distribution Matching (CDM), a framework that amortizes Twisted Sequential Monte Carlo (SMC) inference for reward-tilted sampling in discrete diffusion models. It learns a parameterized twist function contrastively from positive and negative samples and reformulates the gradient estimator to exploit the closed-form forward kernels of discrete diffusion, avoiding per-step Monte Carlo approximations during training. The paper claims the learned twist incurs <5% additional overhead relative to a single forward pass of the base model and demonstrates consistent outperformance over baselines under matched wall-clock time on tasks including toxic text generation, regulatory DNA sequence design, protein designability, and diffusion LLM alignment.
Significance. If the central construction and empirical claims hold, the work could meaningfully advance practical use of asymptotically exact SMC methods in discrete diffusion by removing a key computational bottleneck. The contrastive amortization approach, combined with the kernel-exploiting gradient reformulation, targets a real inference-time cost in reward-guided generation and is validated across multiple application domains, which strengthens its potential relevance for constrained structured data generation.
minor comments (3)
- [§3] §3 (Method): the precise definition of 'positive and negative samples' for the contrastive objective and how they are generated from the diffusion process should be stated explicitly with pseudocode or an algorithm box, as the current description leaves the sampling procedure for the contrastive pairs implicit.
- [Table 2, Figure 4] Table 2 and Figure 4: the wall-clock time comparisons would be strengthened by reporting the number of independent runs and standard deviations; without this, it is difficult to assess whether the reported gains are statistically reliable across the four application domains.
- [§4.2] §4.2 (Experiments): the base model architecture and training details for the twist function (e.g., whether it shares parameters with the diffusion model or is a separate network) are not fully specified; adding these details would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our work on Contrastive Distribution Matching (CDM) for amortizing Twisted SMC in discrete diffusion models, as well as for the encouraging significance assessment and the recommendation of minor revision. No specific major comments appear in the report.
Circularity Check
No significant circularity detected
The paper introduces CDM as a contrastive learning method to amortize twist function estimation for twisted SMC, with the gradient estimator reformulated to exploit closed-form discrete diffusion forward kernels. This is a standard technical construction for efficient training, supported by empirical evaluations on downstream tasks. No derivation step reduces by construction to its own inputs, no fitted parameter is renamed as a prediction, and no load-bearing self-citation chain or uniqueness theorem is invoked. The central claims remain independent of the learned parameters themselves.