arxiv: 2606.03119 · v1 · pith:SDUP3FXVnew · submitted 2026-06-02 · 💻 cs.CV · cs.AI · cs.LG

GuidedBridge: Training-freely Improving Bridge Models with Prior Guidance

Pith reviewed 2026-06-28 10:42 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords bridge models · prior guidance · training-free guidance · image translation · frequency modulation · diffusion models · in-painting

The pith

A weak unseen prior contrasted with the seen prior improves bridge model performance without training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a training-free Prior Guidance method for bridge models, which generate data from data using an instructive clean prior. It introduces a weak prior unseen in pre-training to produce a degraded denoising result, then uses the contrast with the seen prior's result to scale and enhance exploitation of the good prior. This is refined into frequency-modulated prior guidance that adjusts scales for low- and high-frequency bands, plus a cascaded CFG-FMPG framework for in-painting. Experiments show consistent gains on image translation tasks. A reader would care because it upgrades existing models at inference time only.

Core claim

Prior Guidance (PG) introduces a weak prior unseen during bridge pre-training, which hinders prior exploitation and degrades the denoising result. Contrasting this degraded result with the seen prior's result via a scaling factor highlights and enhances prior exploitation. Frequency-modulated prior guidance (FMPG) tailors the guidance scale to the low- and high-frequency bands, in line with the bridge's generative dynamics. For in-painting, a cascaded CFG-FMPG framework first generates a noisy hidden representation via CFG, then exploits it as a generative prior with FMPG.

What carries the argument

Prior Guidance, which scales the difference between denoising results from a seen prior and a weak unseen prior to enhance exploitation in the bridge process.
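
The contrast step can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the abstract does not give the exact update rule, so the sketch assumes the extrapolation form familiar from CFG and auto-guidance, and it assumes the weak prior is built by adding Gaussian noise of scale sigma to the seen prior. The names prior_guidance, toy_denoiser, w, and sigma are hypothetical.

```python
import numpy as np


def prior_guidance(denoise, x_t, t, prior, w, sigma, rng):
    """Extrapolate from the weak-prior result toward the seen-prior result."""
    # Assumption: the weak prior is the seen prior plus Gaussian noise of scale sigma.
    weak_prior = prior + sigma * rng.standard_normal(prior.shape)
    d_seen = denoise(x_t, t, prior)       # result with the prior seen in pre-training
    d_weak = denoise(x_t, t, weak_prior)  # degraded result with the unseen weak prior
    # Assumed CFG-style form: w = 1 returns d_seen, w > 1 moves further along
    # the direction (d_seen - d_weak).
    return d_weak + w * (d_seen - d_weak)


def toy_denoiser(x_t, t, prior):
    # Hypothetical stand-in for a pre-trained bridge denoiser.
    return (1.0 - t) * x_t + t * prior


rng = np.random.default_rng(0)
prior = np.ones((4, 4))
x_t = np.zeros((4, 4))
unguided = toy_denoiser(x_t, 0.5, prior)
same = prior_guidance(toy_denoiser, x_t, 0.5, prior, w=1.0, sigma=0.3, rng=rng)
guided = prior_guidance(toy_denoiser, x_t, 0.5, prior, w=2.0, sigma=0.3, rng=rng)
```

Under this assumed form, each guided step costs two denoiser evaluations instead of one, and setting w = 1 recovers the unguided output.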

If this is right

  • PG methods improve pre-trained bridge models across diverse image translation tasks.
  • FMPG adjusts guidance scales to match the frequency bands in bridge generative dynamics.
  • CFG-FMPG combines CFG and FMPG strengths for in-painting while preserving inference speed.
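
A frequency-split version of the same contrast can be sketched as follows. The figure captions mention an FFT decomposition of the guidance signal, but the band cutoff, the per-band scales, and how they vary over sampling steps are not given here. The radial mask, the cutoff value, and the names split_bands and fmpg_combine are therefore assumptions made for illustration.

```python
import numpy as np


def split_bands(delta, cutoff):
    """Split a 2-D array into low- and high-frequency parts with a radial FFT mask."""
    h, w = delta.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies in cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in cycles per pixel
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff
    spectrum = np.fft.fft2(delta)
    low = np.fft.ifft2(spectrum * low_mask).real
    return low, delta - low  # the two bands sum back to delta


def fmpg_combine(d_seen, d_weak, w_low, w_high, cutoff=0.1):
    """Apply separate guidance scales to the low and high bands of the contrast."""
    low, high = split_bands(d_seen - d_weak, cutoff)
    return d_weak + w_low * low + w_high * high


rng = np.random.default_rng(0)
d_seen = rng.standard_normal((16, 16))
d_weak = rng.standard_normal((16, 16))
out = fmpg_combine(d_seen, d_weak, w_low=1.5, w_high=3.0)
```

When w_low equals w_high, the function reduces to the constant-scale contrast, which gives a simple sanity check. A per-step schedule would pass different w_low and w_high values at each sampling step.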

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The contrast approach could extend to creating guidance signals in other data-to-data or conditional generation settings.
  • Frequency-specific scaling might help in tasks where generative steps have varying frequency content.
  • Selecting different weak priors could be tested to optimize the degradation contrast for specific domains.

Load-bearing premise

A weak prior unseen during pre-training produces a sufficiently degraded denoising result whose contrast with the seen prior reliably enhances exploitation without new artifacts or instabilities.

What would settle it

Running the scaling procedure on a pre-trained bridge model for an image translation task and observing either no quality gain or newly introduced artifacts would show that the contrast mechanism does not work as claimed.
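
Such a check amounts to sweeping the guidance scale and comparing each result against the unguided baseline. The sketch below shows only the shape of that comparison. The sampler and the quality score are toy stand-ins (hypothetical names sample_fn and score_fn), built so that extrapolation helps, and they say nothing about the paper's actual results. A real test would plug in a pre-trained bridge model and a metric such as FID.

```python
import numpy as np


def sweep_guidance_scale(sample_fn, score_fn, scales):
    """Return (scale, score) pairs; a lower score is taken to mean better quality."""
    return [(w, score_fn(sample_fn(w))) for w in scales]


# Toy stand-ins: constant "denoising results" and a mean-squared-error score.
target = np.full((8, 8), 2.0)
d_seen = np.full((8, 8), 1.5)
d_weak = np.full((8, 8), 1.0)


def sample_fn(w):
    # Assumed extrapolation form; w = 1.0 is the unguided seen-prior result.
    return d_weak + w * (d_seen - d_weak)


def score_fn(x):
    return float(np.mean((x - target) ** 2))


results = sweep_guidance_scale(sample_fn, score_fn, [1.0, 1.5, 2.0, 2.5, 3.0])
baseline_score = results[0][1]
best_w, best_score = min(results, key=lambda pair: pair[1])
```

If no scale above 1.0 beats baseline_score on a real model and metric, the contrast mechanism has not helped on that task.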

Figures

Figures reproduced from arXiv: 2606.03119 by Binjie Yuan, Jun S. Liu, Jun Zhu, Kaiwen Zheng, Yucheng Yang, Zehua Chen.

Figure 1: Overview of guidance strategies. Classifier-free guidance (CFG, left) enhances condition alignment by extrapolating an unconditional denoising result D_θ^uc and a conditional denoising result D_θ^c. Auto-guidance (AG, middle) improves sample quality by contrasting a full-capacity denoiser D_θ^good against a less-capable denoiser D_θ^bad. Our proposed prior guidance (PG, right) encourages prior exploitation by co…
Figure 2: Guidance scale comparison and signal-to-noise (SNR) evolution. FMPG (Middle) adapts guidance to high-frequency (HF) and low-frequency (LF) bands, mirroring the typical U-shaped SNR (Right) observed in bridge models, whereas PG (Left) employs a constant scale and does not account for frequency dynamics.
Figure 3: Frequency energy distribution of residuals. This plot maps the energy transfer from the input residual after extra noise addition (Δx_t shown in the second row of figures) to the output residual (Δx_0 shown in the first row of figures). Brighter colors indicate higher energy.
    … 2024; Zheng et al., 2025) learn a data-to-data process between a prior p_T(x_T) ∼ p_prior and the target p_0(x_0) ∼ p_data. Specifically,…
Figure 4: Qualitative comparison on ImageNet restoration. The corruption is simulated using a center 128×128 mask. Our proposed hybrid guidance strategy recovers semantic layout and high-frequency details simultaneously. Notably, our method outperforms the standard CFG baseline.
    … U-shaped profile similar to the trend of bridge SNR. Namely, considering that the prior information on the HF band has been corrupted at t…
Figure 5: Corruption Scale (10 NFE). Evaluated with w = 38.0. Optimal σ ∈ [0.30, 0.32].
    C.2. Standard Regime (20 NFE). With 20 NFE, the optimal guidance decreases. The best performance is achieved at w = 20.0 (FID 3.79) and w = 21.0 (FID 3.81).
Figure 6: Guidance Scale (20 NFE). Convex shape confirms optimality at w = 20.0.
Figure 7: Corruption Scale (20 NFE). Robust around σ = 0.30.
    C.3. High-Fidelity Regime (40 NFE). Optimal guidance scale shifts to w ≈ 14.0. The top two scales are w = 14.0 (FID 2.96) and w = 13.0 (FID 3.05).
Figure 8: Guidance Scale (40 NFE). Shifted lower to w = 14.0.
Figure 9
Figure 10: Guidance Scale (100 NFE). Converged at w = 9.0.
Figure 11: Corruption Scale (100 NFE). Very clean prior required (σ = 0.22).
    D. Frequency-Specific Guidance Analysis. To validate our Frequency-Modulated Prior Guidance (FMPG), we conducted an ablation study by restricting the guidance signal to specific frequency bands using FFT decomposition. Algorithm 1: Frequency-modulated Prior Guidance for Bridge Models. 1: Input: Pre-trained denoiser D_θ, source image x_T, steps …
Figure 12: Guidance Scale Search for Static-PG (Edges2Handbags (Isola et al., 2017)). Ten discrete scale points are connected by line segments, forming a roughly U-shaped unimodal curve with the optimum at w = 18.
    D.2. Calibration of Dynamic FMPG Schedules. Building upon the optimal static baseline (w_base = 18), we further unlock the potential of our method by calibrating the dynamic modulation intensity. We independ…
Figure 13: 10 NFE Baseline. Optimal w = 2.0.
Figure 14: 20 NFE Baseline. Optimal w = 1.5.
Figure 15: 40 NFE Baseline. Optimal w ≈ 1.0.
    E.2. Sensitivity Analysis of FMPG Parameters. We conducted fine-grained ablation studies on the Start Ratio (τ_start) to identify the optimal configuration for Frequency-Modulated Prior Guidance (FMPG) under different computational budgets. Analysis at NFE = 10. For the low-budget regime, we fixed the hyperparameters as follows: • CFG (Ho & Salimans, 2021) Guidance Scale (w…
Figure 16: Qualitative Results on DIODE (Part I). Visual comparison of the first two samples.
    F.2. Edges2Handbags: Texture Evolution. We further demonstrate robustness on the Edges2Handbags dataset. F.3. ImageNet: High-Fidelity Synthesis. Finally, we present results on the challenging ImageNet (256 × 256) dataset.
Figure 17: Qualitative Results on DIODE (Part II). Visual comparison of additional samples.
Figure 18: Qualitative Results on Edges2Handbags (Part I). Visual comparison of additional samples.
Figure 19: Qualitative Results on Edges2Handbags (Part II). Visual comparison of additional samples.
Figure 20: Qualitative Results on ImageNet (Part I). Visual comparison of additional samples.
Figure 21: Qualitative Results on ImageNet (Part II). Visual comparison of additional samples.
Original abstract

Guidance methods, such as classifier-free guidance (CFG) and auto-guidance (AG), have advanced noise-to-data generation in diffusion models. Recently, bridge models have introduced a data-to-data generative process that can exploit an instructive clean prior. In this work, inspired by previous methods creating quality difference between denoising results as guidance, we propose a training-free bridge guidance method, termed Prior Guidance (PG). Specifically, we introduce a weak prior, which is unseen during bridge pre-training, hindering prior exploitation and thereby degrading denoising result. Then, we contrast it with the seen prior to highlight and enhance prior exploitation via a scaling factor. Moreover, we analyze the underlying mechanism of prior exploitation in the bridge process and design frequency-modulated prior guidance (FMPG), which tailors the guidance scale to low- and high-frequency bands coherent with bridge generative dynamics. To address prior exploitation in image in-painting, we develop a cascaded framework, CFG-FMPG, which first generates a noisy hidden representation via CFG and then exploits it as a generative prior with FMPG, fulfilling their complementary strengths without compromising inference efficiency. Experiments demonstrate that our PG methods consistently improve pre-trained bridge models across diverse image translation tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript proposes a training-free Prior Guidance (PG) method for bridge models that introduces a weak unseen prior to degrade the denoising result and then contrasts it with the seen prior via a scaling factor to enhance prior exploitation. It further develops Frequency-Modulated Prior Guidance (FMPG) by analyzing the mechanism of prior exploitation and tailoring the guidance scale to low- and high-frequency bands, plus a cascaded CFG-FMPG framework for inpainting that combines CFG and FMPG. The central claim is that these methods consistently improve pre-trained bridge models across diverse image translation tasks.

Significance. If the claimed improvements are robustly supported, the work would be significant for enabling practical enhancements to data-to-data bridge models without retraining or additional parameters. The training-free contrastive design and the frequency-modulated analysis represent strengths that extend guidance techniques from diffusion models to bridge models while preserving inference efficiency.

minor comments (1)
  1. [Abstract] The statement that 'Experiments demonstrate that our PG methods consistently improve pre-trained bridge models across diverse image translation tasks' supplies no quantitative metrics, baselines, error bars, or controls. This presentation issue reduces the abstract's informativeness, even though the full experiments section presumably contains the supporting data.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript, the recognition of the training-free nature and frequency-modulated analysis as strengths, and the recommendation for minor revision. No specific major comments were provided in the report, so we have no point-by-point responses to address. We are pleased that the central claims regarding consistent improvements to pre-trained bridge models are viewed as potentially significant if robustly supported.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives PG as a training-free contrast between a seen prior and a deliberately weak unseen prior, then analyzes bridge dynamics to motivate frequency-modulated scaling in FMPG and a cascaded CFG-FMPG for inpainting. These steps are presented as direct consequences of the stated mechanism analysis rather than reductions to fitted parameters, self-definitions, or load-bearing self-citations. Experimental gains on pre-trained models are reported separately and do not feed back into the construction of the guidance equations themselves. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; full derivation, assumptions, and experimental details unavailable.

axioms (1)
  • domain assumption: Bridge models can exploit an instructive clean prior in a data-to-data generative process.
    Stated directly in the abstract as the foundation for the guidance approach.

pith-pipeline@v0.9.1-grok · 5761 in / 1163 out tokens · 33179 ms · 2026-06-28T10:42:03.218806+00:00 · methodology

Reference graph

Works this paper leans on

100 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Ahn, D., Cho, H., Min, J., Jang, W., Kim, J., Kim, S., Park, H. H., Jin, K. H., and Kim, S. Self-rectifying diffusion sampling with perturbed-attention guidance. In Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., and Varol, G. (eds.), Computer Vision – ECCV 2024, 2025

  2. [2]

    Dynamic classifier-free diffusion guidance via online feedback

    Anonymous. Dynamic classifier-free diffusion guidance via online feedback. In The Fourteenth International Conference on Learning Representations, 2026

  3. [4]

    Bolton, A., Zhou, W., Chen, Z., Iacovides, G., and Mandic, D. P. Refinebridge: Generative bridge models improve financial forecasting by foundation models. In ICASSP, 2026

  4. [5]

    Stochastic self-guidance for training-free enhancement of diffusion models

    Chen, C., Zhu, J., Feng, X., Huang, N., Zhu, C., Wu, M., Mao, F., Wu, J., Chu, X., and Li, X. Stochastic self-guidance for training-free enhancement of diffusion models. In The Fourteenth International Conference on Learning Representations, 2026a

  5. [6]

    Normalized attention guidance: Universal negative guidance for diffusion models

    Chen, D.-Y., Bandyopadhyay, H., Zou, K., and Song, Y.-Z. Normalized attention guidance: Universal negative guidance for diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  6. [7]

    Chen, T., Liu, G.-H., and Theodorou, E. A. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. In ICLR, 2022a

  7. [8]

    Infergrad: Improving diffusion models for vocoder by considering inference in training

    Chen, Z., Tan, X., Wang, K., Pan, S., Mandic, D., He, L., and Zhao, S. Infergrad: Improving diffusion models for vocoder by considering inference in training. In ICASSP, 2022b

  8. [11]

    Chen, Z., Miao, Y., Wang, L., Fan, L., Mandic, D. P., and Zhu, J. Versatile cardiovascular signal generation with a unified diffusion transformer. Nature Machine Intelligence, 8(1):6–19, 2026b

  9. [12]

    Omni2sound: Towards unified video-text-to-audio generation

    Dai, Y., Chen, Z., Jiang, Y., Ke, Q., Cai, J., and Zhu, J. Omni2sound: Towards unified video-text-to-audio generation. In CVPR, 2026

  10. [13]

    Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009. doi:10.1109/CVPR.2009.5206848

  11. [14]

    Scaling rectified flow transformers for high-resolution image synthesis

    Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., Podell, D., Dockhorn, T., English, Z., and Rombach, R. Scaling rectified flow transformers for high-resolution image synthesis. In ICML, 2024

  12. [15]

    On the guidance of flow matching

    Feng, R., Yu, C., Deng, W., Hu, P., and Wu, T. On the guidance of flow matching. In Forty-second International Conference on Machine Learning, 2025

  13. [16]

    Gans trained by a two time-scale update rule converge to a local nash equilibrium

    Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

  14. [17]

    Ho, J. and Salimans, T. Classifier-free diffusion guidance. NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021

  15. [18]

    Denoising diffusion probabilistic models

    Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 6840–6851. Curran Associates, Inc., 2020

  16. [19]

    Smoothed energy guidance: Guiding diffusion models with reduced energy curvature of attention

    Hong, S. Smoothed energy guidance: Guiding diffusion models with reduced energy curvature of attention. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 66743–66772. Curran Associates, Inc., 2024

  17. [20]

    Improving sample quality of diffusion models using self-attention guidance

    Hong, S., Lee, G., Jang, W., and Kim, S. Improving sample quality of diffusion models using self-attention guidance. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7428–7437, 2023

  18. [21]

    Ifriqi, T. B., Romero-Soriano, A., Drozdzal, M., Verbeek, J., and Alahari, K. Entropy rectifying guidance for diffusion and flow models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  19. [22]

    Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

  20. [23]

    Freeaudio: Training-free timing planning for controllable long-form text-to-audio generation

    Jiang, Y., Chen, Z., Ju, Z., Li, C., Dou, W., and Zhu, J. Freeaudio: Training-free timing planning for controllable long-form text-to-audio generation. In ACM Multimedia, 2025

  21. [24]

    Elucidating the design space of diffusion-based generative models

    Karras, T., Aittala, M., Aila, T., and Laine, S. Elucidating the design space of diffusion-based generative models. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 26565–26577. Curran Associates, Inc., 2022

  22. [25]

    Guiding a diffusion model with a bad version of itself

    Karras, T., Aittala, M., Kynkäänniemi, T., Lehtinen, J., Aila, T., and Laine, S. Guiding a diffusion model with a bad version of itself. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 52996–53021. Curran Associates, Inc., 2024

  23. [26]

    Binauralgrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis

    Leng, Y., Chen, Z., Guo, J., Liu, H., Chen, J., Tan, X., Mandic, D., He, L., Li, X.-Y., Qin, T., Zhao, S., and Liu, T.-Y. Binauralgrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis. In NeurIPS, 2022

  24. [27]

    Bridge-sr: Schrödinger bridge for efficient sr

    Li, C., Chen, Z., Bao, F., and Zhu, J. Bridge-sr: Schrödinger bridge for efficient sr. In ICASSP, 2025a

  25. [28]

    Audio super-resolution with latent bridge models

    Li, C., Chen, Z., Wang, L., and Zhu, J. Audio super-resolution with latent bridge models. In NeurIPS, 2025b

  26. [29]

    Liu, G.-H., Vahdat, A., Huang, D.-A., Theodorou, E. A., Nie, W., and Anandkumar, A. I²SB: Image-to-image Schrödinger bridge. In International Conference on Machine Learning, pp. 21551–21568. PMLR, 2023a

  27. [30]

    Liu, G.-H., Lipman, Y., Nickel, M., Karrer, B., Theodorou, E. A., and Chen, R. T. Generalized Schrödinger bridge matching. In ICLR, 2024

  28. [31]

    Liu, H., Chen, Z., Yuan, Y., Mei, X., Liu, X., Mandic, D. P., Wang, W., and Plumbley, M. D. Audioldm: Text-to-audio generation with latent diffusion models. In ICML, 2023b

  29. [32]

    SDEdit: Guided image synthesis and editing with stochastic differential equations

    Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.-Y., and Ermon, S. SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022

  30. [33]

    Respdiff: An end-to-end multi-scale rnn diffusion model for respiratory waveform estimation from ppg signals

    Miao, Y., Chen, Z., Li, C., and Mandic, D. Respdiff: An end-to-end multi-scale rnn diffusion model for respiratory waveform estimation from ppg signals. In ICASSP, 2025

  31. [34]

    Diffgap: A lightweight diffusion module in contrastive space for bridging cross-model gap

    Mo, S., Chen, Z., Bao, F., and Zhu, J. Diffgap: A lightweight diffusion module in contrastive space for bridging cross-model gap. In ICASSP, 2025

  32. [35]

    He, G., Zheng, K., Chen, J., Bao, F., and Zhu, J. Consistency diffusion bridge models. In NeurIPS, 2024

  33. [36]

    Dynamic classifier-free diffusion guidance via online feedback

    Papalampidi, P., Wiles, O., Ktena, I., Shtedritski, A., Bugliarello, E., Kajic, I., Albuquerque, I., and Nematzadeh, A. Dynamic classifier-free diffusion guidance via online feedback. In The Fourteenth International Conference on Learning Representations, 2026

  34. [37]

    High-resolution image synthesis with latent diffusion models

    Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In CVPR, 2022

  35. [38]

    Sadat, S., Vontobel, T., Salehi, F., and Weber, R. M. Guidance in the frequency domain enables high-fidelity sampling at low cfg scales, 2025

  36. [39]

    Improved techniques for training gans

    Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training gans. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016

  37. [40]

    Denoising diffusion implicit models

    Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a

  38. [41]

    Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In ICLR, 2021b

  39. [42]

    Dual diffusion implicit bridges for image-to-image translation

    Su, X., Song, J., Meng, C., and Ermon, S. Dual diffusion implicit bridges for image-to-image translation. In The Eleventh International Conference on Learning Representations, 2023

  40. [43]

    Vasiljevic, I., Kolkin, N., Zhang, S., Luo, R., Wang, H., Dai, F. Z., Daniele, A. F., Mostajabi, M., Basart, S., Walter, M. R., and Shakhnarovich, G. Diode: A dense indoor and outdoor depth dataset, 2019

  41. [44]

    Audiomog: Guiding audio generation with mixture-of-guidance

    Wang, J., Chen, Z., Yuan, B., Zheng, K., Li, C., Jiang, Y., and Zhu, J. Audiomog: Guiding audio generation with mixture-of-guidance. In ICME, 2026

  42. [45]

    Towards a golden classifier-free guidance path via foresight fixed point iterations

    Wang, K., Mao, J., Wu, T., and Xiang, Y. Towards a golden classifier-free guidance path via foresight fixed point iterations. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025a

  43. [46]

    Tiva: Time-aligned video-to-audio generation

    Wang, X., Wang, Y., Wu, Y., Song, R., Tan, X., Chen, Z., Xu, H., and Sui, G. Tiva: Time-aligned video-to-audio generation. In ACM MM, 2024

  44. [47]

    Framebridge: Improving image-to-video generation with bridge models

    Wang, Y., Chen, Z., Chen, X., Wei, Y., Zhu, J., and Chen, J. Framebridge: Improving image-to-video generation with bridge models. In ICML, 2025b

  45. [49]

    Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018

  46. [50]

    Zhang, S., Cheng, Y., and Steeg, G. V. Exploring the design space of diffusion bridge models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025b

  47. [51]

    Diffusion bridge implicit models

    Zheng, K., He, G., Chen, J., Bao, F., and Zhu, J. Diffusion bridge implicit models. In The Thirteenth International Conference on Learning Representations, 2025

  48. [52]

    Denoising diffusion bridge models

    Zhou, L., Lou, A., Khanna, S., and Ermon, S. Denoising diffusion bridge models. In International Conference on Learning Representations, 2024

  49. [53]

    Denoising Diffusion Bridge Models. International Conference on Learning Representations.

  50. [54]

    Score-Based Generative Modeling through Stochastic Differential Equations. ICLR.

  51. [55]

    Generalized Schrödinger Bridge Matching. ICLR.

  52. [56]

    Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory. ICLR.

  53. [57]

    Bridge-SR: Schrödinger Bridge for Efficient SR. ICASSP.

  54. [58]

    Audio Super-Resolution with Latent Bridge Models. NeurIPS.

  55. [59]

    RefineBridge: Generative Bridge Models Improve Financial Forecasting by Foundation Models. ICASSP.

  56. [60]

    TiVA: Time-Aligned Video-to-Audio Generation. ACM MM.

  57. [61]

    Versatile cardiovascular signal generation with a unified diffusion transformer. Nature Machine Intelligence, 2026.

  58. [62]

    DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap. ICASSP.

  59. [63]

    RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory Waveform Estimation from PPG Signals. ICASSP.

  60. [64]

    AudioMoG: Guiding Audio Generation with Mixture-of-Guidance. ICME.

  61. [65]

    ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech. arXiv preprint arXiv:2212.14518.

  62. [66]

    VoiceBridge: General Speech Restoration with One-step Latent Bridge Models. arXiv preprint arXiv:2509.25275.

  63. [67]

    Guande He, Kaiwen Zheng, Jianfei Chen, Fan Bao, and Jun Zhu. NeurIPS.

  64. [68]

    Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis. arXiv preprint arXiv:2312.03491. Available: https://arxiv.org/abs/2312.03491

  65. [69]

    Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. arXiv preprint arXiv:2311.15127.

  66. [70]

    FrameBridge: Improving Image-to-Video Generation with Bridge Models. ICML.

  67. [71]

    Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. ICML.

  68. [72]

    High-Resolution Image Synthesis with Latent Diffusion Models. CVPR.

  69. [73]

    AudioLDM: Text-to-Audio Generation with Latent Diffusion Models. ICML.

  70. [74]

    InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training. ICASSP.

  71. [75]

    BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis. NeurIPS.

  72. [76]

    Omni2Sound: Towards Unified Video-Text-to-Audio Generation. CVPR.

  73. [77]

    FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation. ACM Multimedia.

  74. [78]

    Liu, G.-H., Vahdat, A., Huang, D.-A., Theodorou, E. A., Nie, W., and Anandkumar, A. I²SB: Image-to-Image Schr… 2023.

  75. [79]

    Classifier-Free Diffusion Guidance. NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications.

  76. [80]

    Karras, T., Aittala, M., Kynk… Guiding a Diffusion Model with a Bad Version of Itself. Advances in Neural Information Processing Systems.

  77. [81]

    Diffusion Bridge Implicit Models. The Thirteenth International Conference on Learning Representations.

  78. [82]

    Ho, J., Jain, A., and Abbeel, P. Denoising Diffusion Probabilistic Models.

  79. [83]

    Karras, T., Aittala, M., Aila, T., and Laine, S. Elucidating the Design Space of Diffusion-Based Generative Models.

  80. [84]

    Dual Diffusion Implicit Bridges for Image-to-Image Translation. The Eleventh International Conference on Learning Representations.

Showing first 80 references.