NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment
Pith reviewed 2026-06-30 10:48 UTC · model grok-4.3
The pith
NTRK guides diffusion models to high-reward outputs by biasing only the noise term of the reverse process while leaving the pretrained mean unchanged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
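Existing reward-guided samplers face a trade-off: gradient-based guidance shifts the reverse mean and degrades quality, while search-based methods preserve quality but gain no gradient signal. NTRK resolves this by keeping the reverse mean fixed and biasing the noise term toward high reward. This is enabled by a whitening operator, the central mechanism behind NTRK, which converts reward gradients into noise-compatible perturbations without losing their guiding signal.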
What carries the argument
The whitening operator that converts reward gradients into noise-compatible perturbations while leaving the reverse mean unchanged.
If this is right
- NTRK outperforms recent state-of-the-art baselines on various reward alignment tasks without losing sample quality.
- On aesthetic generation, NTRK exceeds the reward of the best baseline at 500 NFEs while using only 25 NFEs.
- The method requires only a single sample per step.
- The reverse kernel remains exactly the pretrained one at every step.
Where Pith is reading between the lines
- The separation of mean and noise control may let practitioners add reward guidance to existing diffusion pipelines with no retraining.
- If the whitening step generalizes, similar noise-only tilting could be tested on other iterative generative processes that separate deterministic and stochastic parts.
- The reported 20-fold reduction in steps suggests that reward alignment cost could be moved almost entirely to inference rather than fine-tuning.
Load-bearing premise
The whitening operator converts reward gradients into noise-compatible perturbations that provide guiding signal without pushing intermediate states outside the pretrained model's trained region or degrading sample quality.
What would settle it
A direct comparison in which NTRK samples at 25 NFEs show both lower reward and visibly lower quality than the best baseline at 500 NFEs on the aesthetic task.
Original abstract
We introduce the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the pretrained reverse kernel unchanged and requiring only a single sample per step. Reward-guided sampling at inference time has greatly expanded the versatility of pretrained diffusion models. Yet existing methods face a trade-off. Gradient-based guidance shifts the reverse mean, steering generation but pushing intermediate states outside the region that the model was trained on and degrading quality. Search-based methods preserve quality but gain no gradient signal. No prior method achieves both. NTRK resolves this by keeping the reverse mean fixed and biasing the noise term toward high reward. This is enabled by a whitening operator, the central mechanism behind NTRK, which converts reward gradients into noise-compatible perturbations without losing their guiding signal. Across various reward alignment tasks, NTRK outperforms recent state-of-the-art baselines without losing sample quality. Remarkably, on aesthetic generation, NTRK surpasses the reward of the best baseline at 500 NFEs using only 25 NFEs, a 20-fold reduction in compute.
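The mechanism described in the abstract can be illustrated with a minimal Python sketch of one reverse step. It assumes a Gaussian reverse kernel `x_{t-1} = mu + sigma * eps` with pretrained mean `mu`, a single noise draw `z` tilted as `z + lam * grad_r`, and a hypothetical whitening `whiten` that re-centers and rescales the tilted noise to zero mean and unit per-entry variance. This page does not define NTRK's actual operator W(·) (the referee raises the same point), so `whiten`, `noise_tilted_step`, `lam`, and the moment-matching choice are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def whiten(v, target_mean=0.0, target_ss=None):
    """Hypothetical whitening W(.) (an assumption, not NTRK's definition):
    shift and rescale v so its entries have mean `target_mean` and sum of
    squared deviations `target_ss` (default len(v), i.e. unit per-entry
    variance, matching the first two moments of standard Gaussian noise)."""
    n = len(v)
    if target_ss is None:
        target_ss = float(n)
    mean = sum(v) / n
    centered = [x - mean for x in v]
    ss = sum(c * c for c in centered)
    if ss == 0.0:
        raise ValueError("cannot whiten a constant vector")
    scale = math.sqrt(target_ss / ss)
    return [target_mean + scale * c for c in centered]


def noise_tilted_step(mu, sigma, reward_grad, lam, rng):
    """One reverse step x_{t-1} = mu + sigma * W(z + lam * reward_grad).

    mu (the pretrained reverse mean) is used unchanged; only the single
    Gaussian noise draw z is tilted toward the reward gradient and then
    whitened. With lam = 0 the noise is a standardized Gaussian draw."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eps = whiten([zi + lam * gi for zi, gi in zip(z, reward_grad)])
    return [m + sigma * e for m, e in zip(mu, eps)]
```

Usage: with a toy reward r(x) = x[0] - x[1], whose gradient is [1, -1, 0, ...], raising `lam` increases the average of x[0] - x[1] across steps while `mu` is never modified. One consequence of this particular W: centering removes any gradient component along the all-ones direction, so a reward that depends only on the mean of x would receive no signal.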
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Noise-Tilted Reverse Kernel (NTRK) for reward-guided diffusion sampling. It keeps the pretrained reverse mean fixed and uses a whitening operator to bias only the noise term with reward gradients, requiring one sample per step. The method is claimed to resolve the guidance-quality trade-off, outperforming recent baselines across reward alignment tasks while preserving sample quality. A highlighted result is that on aesthetic generation NTRK exceeds the best baseline reward at 500 NFEs using only 25 NFEs.
Significance. If the whitening operator is shown to convert gradients into noise-compatible perturbations that stay within the pretrained model's support, the result would be significant: it offers a new route to gradient-based guidance that avoids the mean-shift degradation seen in prior work, while retaining the efficiency of single-sample reverse steps. The reported 20x NFE reduction would be a strong practical contribution if reproducible.
major comments (2)
- [§3] The central claim rests on the whitening operator converting reward gradients into noise perturbations without pushing intermediate states outside the trained region (§3, around the definition of the tilted kernel). No explicit derivation or bound is referenced showing that the operator preserves the marginals of the pretrained reverse process; without this, the 'no quality loss' assertion remains unanchored.
- [Table 2] Table 2 (aesthetic generation results): the 25-NFE NTRK reward is reported higher than the 500-NFE baseline, but the table does not list the exact reward model, classifier-free guidance scale, or number of seeds used for each method. This makes it impossible to assess whether the 20x compute claim is load-bearing or sensitive to hyper-parameter choices.
minor comments (2)
- [§3.1] Notation for the whitening operator W(·) is introduced without an explicit matrix or operator definition; a short appendix deriving its action on the noise covariance would improve clarity.
- [§4] The abstract states 'without losing sample quality', but the main text should include FID or CLIP-score comparisons against the unguided model to quantify this.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important areas for strengthening the theoretical grounding and experimental transparency of the NTRK method. We address each major comment below and outline the revisions we will make.
Point-by-point responses
Referee: [§3] The central claim rests on the whitening operator converting reward gradients into noise perturbations without pushing intermediate states outside the trained region (§3, around the definition of the tilted kernel). No explicit derivation or bound is referenced showing that the operator preserves the marginals of the pretrained reverse process; without this, the 'no quality loss' assertion remains unanchored.
Authors: We agree that an explicit derivation or bound would strengthen the presentation. While the current manuscript motivates the whitening operator via its effect on the noise term and empirical preservation of sample quality, it does not contain a formal proof that the operator leaves the marginals of the pretrained reverse process unchanged. In the revised version we will add a short derivation in the appendix that shows the whitening step produces a perturbation whose expectation under the pretrained noise distribution remains zero, thereby preserving the marginal at each reverse step. This will directly address the anchoring concern. revision: yes
Referee: [Table 2] Table 2 (aesthetic generation results): the 25-NFE NTRK reward is reported higher than the 500-NFE baseline, but the table does not list the exact reward model, classifier-free guidance scale, or number of seeds used for each method. This makes it impossible to assess whether the 20x compute claim is load-bearing or sensitive to hyper-parameter choices.
Authors: The referee is correct that these details are currently missing from Table 2. In the revision we will expand the table (or add a companion table) to report: (i) the precise aesthetic reward model and its checkpoint, (ii) the classifier-free guidance scale applied to each baseline, and (iii) the number of evaluation seeds (we used 50 seeds for all methods). We will also state the exact hyper-parameter settings used for the 25-NFE and 500-NFE runs so that the 20× NFE reduction can be evaluated under matched conditions. revision: yes
Circularity Check
No significant circularity detected
Full rationale
The paper presents NTRK as a new construction that fixes the reverse-process mean and applies a whitening operator to tilt only the noise term with reward gradients. No equations or claims in the abstract reduce the central mechanism to a fitted parameter renamed as a prediction, a self-definitional loop, or a load-bearing self-citation. The whitening operator is introduced as an enabling device rather than derived from prior results by the same authors, and the performance claims (including NFE reduction) are framed as empirical consequences of the construction rather than tautological. The derivation chain is therefore self-contained against external benchmarks.
Appendix A: Reward-Guided Reverse Kernels
In this section, we provide derivations and interpretations for reward-guided reverse kernels. Figure 3 and Table 1 su...
For fixed $y^{\uparrow}$, the expression in Eq. (79) is minimized when $Px$ is sorted in the same order as $y^{\uparrow}$, that is, when $P = P_x$, by the rearrangement inequality. The remaining problem is therefore the Euclidean projection of the sorted vector $x^{\uparrow}$ onto the box constraints $L_r \le y_r \le U_r$, which is achieved by elementwise clipping. This one-level construction is much tighter than the global interval...
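The rearrangement-then-clip argument in this excerpt can be sketched in Python as follows. The per-rank bounds L_r and U_r are taken as given inputs (the page does not show how the paper chooses them), and they are assumed nondecreasing with L_r <= U_r; that assumption guarantees clipping the sorted vector leaves it sorted, so the permutation fixed by the rearrangement step stays valid. `project_sorted_box` is a hypothetical name, and the sketch illustrates only the stated projection, not how it enters NTRK's whitening operator.

```python
def project_sorted_box(x, lower, upper):
    """Euclidean projection of x onto {y : lower[r] <= sorted(y)[r] <= upper[r]}.

    Assumes lower and upper are nondecreasing with lower[r] <= upper[r], so
    clipping the sorted vector keeps it sorted (hypothetical helper)."""
    n = len(x)
    if len(lower) != n or len(upper) != n:
        raise ValueError("x, lower and upper must have the same length")
    # Permutation P_x: indices of x in ascending order of value
    # (rearrangement inequality: y must share this ordering).
    order = sorted(range(n), key=lambda i: x[i])
    y = [0.0] * n
    for rank, idx in enumerate(order):
        # The rank-th smallest entry keeps its position; only its value is clipped.
        y[idx] = min(max(x[idx], lower[rank]), upper[rank])
    return y
```

For example, x = [3.0, -5.0, 0.2, 1.0] with lower = [-2.0, -1.0, 0.0, 1.0] and upper = [-1.0, 0.0, 1.0, 2.0] projects to [2.0, -2.0, 0.0, 1.0]: each entry keeps its rank, and only its value is moved into the box for that rank.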
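If $v^{(p)} > 0$, then the update in Equation (95) is the Euclidean projection of $x^{(p)}$ onto $S(\mu^\star_p, v^\star_p)$:
$$x^{(p)}_{\mathrm{new}} = \Pi_{S(\mu^\star_p, v^\star_p)}\big(x^{(p)}\big) \in \arg\min_{y \in S(\mu^\star_p, v^\star_p)} \|y - x^{(p)}\|_2. \quad (97)$$
When $v^{(p)} = 0$, the minimizer is not unique.
Proof. Write
$$x^{(p)} = \bar{x}^{(p)}\mathbf{1} + c, \quad \mathbf{1}^\top c = 0, \quad \|c\|_2^2 = v^{(p)}. \quad (98)$$
Any feasible $y \in S(\mu^\star_p, v^\star_p)$ can be written as
$$y = \mu^\star_p\mathbf{1} + d, \quad \mathbf{1}^\top d = 0, \quad \|d\|_2^2 = v^\star_p. \quad (99)$$
By orthogonality (with $F$ the length of $x^{(p)}$),
$$\|y - x^{(p)}\|_2^2 = \|(\mu^\star_p - \bar{x}^{(p)})\mathbf{1}\|_2^2 + \|d - c\|_2^2 = F(\mu^\star_p - \bar{x}^{(p)})^2 + \|d - c\|_2^2 \ldots$$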