GuidedBridge: Training-freely Improving Bridge Models with Prior Guidance

Binjie Yuan; Jun S. Liu; Jun Zhu; Kaiwen Zheng; Yucheng Yang; Zehua Chen

arxiv: 2606.03119 · v1 · pith:SDUP3FXVnew · submitted 2026-06-02 · 💻 cs.CV · cs.AI · cs.LG

Zehua Chen, Yucheng Yang, Binjie Yuan, Kaiwen Zheng, Jun S. Liu, Jun Zhu

Pith reviewed 2026-06-28 10:42 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG

keywords bridge models · prior guidance · training-free guidance · image translation · frequency modulation · diffusion models · in-painting

The pith

A weak unseen prior contrasted with the seen prior improves bridge model performance without training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a training-free Prior Guidance method for bridge models, which generate data from data using an instructive clean prior. It introduces a weak prior unseen in pre-training to produce a degraded denoising result, then uses the contrast with the seen prior's result to scale and enhance exploitation of the good prior. This is refined into frequency-modulated prior guidance that adjusts scales for low- and high-frequency bands, plus a cascaded CFG-FMPG framework for in-painting. Experiments show consistent gains on image translation tasks. A reader would care because it upgrades existing models at inference time only.

Core claim

Prior Guidance (PG) introduces a weak prior unseen during bridge pre-training, hindering prior exploitation and degrading the denoising result. Contrasting it with the seen prior via a scaling factor highlights and enhances prior exploitation. Frequency-modulated prior guidance (FMPG) tailors the guidance scale to low- and high-frequency bands coherent with bridge generative dynamics. For in-painting, a cascaded CFG-FMPG first generates a noisy hidden representation via CFG then exploits it as a generative prior with FMPG.
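The contrast step described above can be sketched in a few lines. This page does not give the paper's exact guidance equation, so the sketch assumes the CFG-style extrapolation `d_weak + w * (d_seen - d_weak)`; the function name `prior_guidance` and the toy vectors are hypothetical stand-ins for denoiser outputs.

```python
from typing import List, Sequence


def prior_guidance(d_seen: Sequence[float], d_weak: Sequence[float], w: float) -> List[float]:
    """Extrapolate from the weak-prior denoising result toward the seen-prior one.

    ASSUMPTION: a CFG-style form, d_weak + w * (d_seen - d_weak).
    The paper's actual equation may differ.
    """
    if len(d_seen) != len(d_weak):
        raise ValueError("denoising results must have the same length")
    return [b + w * (a - b) for a, b in zip(d_seen, d_weak)]


# Toy denoising results (illustrative values only, not from the paper).
d_seen = [1.0, 2.0, 3.0]   # denoiser output given the prior seen in pre-training
d_weak = [0.5, 2.0, 2.0]   # denoiser output given the weak, unseen prior
guided = prior_guidance(d_seen, d_weak, w=2.0)
print(guided)
```

Under this assumed form, `w = 1` returns the seen-prior result unchanged and `w > 1` pushes the output further away from the degraded result.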

What carries the argument

Prior Guidance, which scales the difference between denoising results from a seen prior and a weak unseen prior to enhance exploitation in the bridge process.

If this is right

PG methods improve pre-trained bridge models across diverse image translation tasks.
FMPG adjusts guidance scales to match the frequency bands in bridge generative dynamics.
CFG-FMPG combines CFG and FMPG strengths for in-painting while preserving inference speed.
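To make the frequency-band idea concrete, here is a minimal sketch that splits a guidance difference into low- and high-frequency parts and scales each separately. Everything specific is an assumption: a 1-D signal stands in for an image, the band boundary `cutoff` and the names `split_bands` and `fmpg_combine` are hypothetical, and the paper's own band definition and time-dependent schedules are not given on this page.

```python
import cmath
from typing import List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2), fine for a toy example)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def idft_real(spec: Sequence[complex]) -> List[float]:
    """Inverse DFT, returning the real part."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)).real / n
            for j in range(n)]


def split_bands(signal: Sequence[float], cutoff: int) -> Tuple[List[float], List[float]]:
    """Split into a low band (|frequency index| <= cutoff) and the residual high band."""
    n = len(signal)
    spec = dft(signal)
    low_spec = [c if min(k, n - k) <= cutoff else 0j for k, c in enumerate(spec)]
    low = idft_real(low_spec)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def fmpg_combine(d_seen: Sequence[float], d_weak: Sequence[float],
                 w_lf: float, w_hf: float, cutoff: int) -> List[float]:
    """ASSUMED form: d_weak + w_lf * LF(diff) + w_hf * HF(diff)."""
    diff = [a - b for a, b in zip(d_seen, d_weak)]
    low, high = split_bands(diff, cutoff)
    return [b + w_lf * l + w_hf * h for b, l, h in zip(d_weak, low, high)]


low, high = split_bands([1.0, 2.0, 3.0, 4.0], cutoff=0)
print(low)   # the mean of the signal, repeated
print(high)  # what remains after removing the mean
```

Setting `w_lf == w_hf` collapses this sketch back to a single constant scale, and setting one of them to zero restricts guidance to the other band.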

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The contrast approach could extend to creating guidance signals in other data-to-data or conditional generation settings.
Frequency-specific scaling might help in tasks where generative steps have varying frequency content.
Selecting different weak priors could be tested to optimize the degradation contrast for specific domains.

Load-bearing premise

A weak prior unseen during pre-training produces a sufficiently degraded denoising result whose contrast with the seen prior reliably enhances exploitation without new artifacts or instabilities.
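One way to picture the weak prior is as a corrupted copy of the clean prior. The figure captions on this page mention a "corruption scale" σ and "extra noise addition", so the sketch below assumes additive Gaussian noise of standard deviation `sigma`; the function name `make_weak_prior` and the exact corruption rule are assumptions, not the paper's stated construction.

```python
import random
from typing import List, Sequence


def make_weak_prior(x_prior: Sequence[float], sigma: float, seed: int = 0) -> List[float]:
    """Return a degraded copy of the prior.

    ASSUMPTION: additive Gaussian noise with scale sigma; the paper may
    construct the weak prior differently.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [x + sigma * rng.gauss(0.0, 1.0) for x in x_prior]


clean = [0.1, 0.2, 0.3]
weak = make_weak_prior(clean, sigma=0.30)  # 0.30 is a corruption scale quoted in the captions
print(weak)
```

With `sigma = 0` the weak prior equals the clean prior and the contrast vanishes, which is one way to see why the premise needs a sufficiently degraded result.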

What would settle it

Running the scaling procedure on a pre-trained bridge model for an image translation task and observing either no quality gain or newly introduced artifacts would show that the contrast mechanism does not work as claimed.

Figures

Figures reproduced from arXiv: 2606.03119 by Binjie Yuan, Jun S. Liu, Jun Zhu, Kaiwen Zheng, Yucheng Yang, Zehua Chen.

**Figure 1.** Overview of guidance strategies. Classifier-free guidance (CFG, left) enhances condition alignment by extrapolating an unconditional denoising result D_θ^uc and a conditional denoising result D_θ^c. Auto-guidance (AG, middle) improves sample quality by contrasting a full-capacity denoiser D_θ^good against a less-capable denoiser D_θ^bad. Our proposed prior guidance (PG, right) encourages prior exploitation by co…

**Figure 2.** Guidance scale comparison and signal-to-noise ratio (SNR) evolution. FMPG (middle) adapts guidance to high-frequency (HF) and low-frequency (LF) bands, mirroring the typical U-shaped SNR (right) observed in bridge models, whereas PG (left) employs a constant scale and does not account for frequency dynamics.

**Figure 3.** Frequency energy distribution of residuals. This plot maps the energy transfer from the input residual after extra noise addition (∆x_t, shown in the second row of figures) to the output residual (∆x_0, shown in the first row of figures). Brighter colors indicate higher energy.

… 2024; Zheng et al., 2025) learn a data-to-data process between a prior p_T(x_T) ∼ p_prior and the target p_0(x_0) ∼ p_data. Specifically,…

**Figure 4.** Qualitative comparison on ImageNet restoration. The corruption is simulated using a center 128×128 mask. Our proposed hybrid guidance strategy recovers semantic layout and high-frequency details simultaneously. Notably, our method outperforms the standard CFG baseline.

… U-shaped profile similar to the trend of bridge SNR. Namely, considering that the prior information on the HF band has been corrupted at t…

**Figure 5.** Corruption Scale (10 NFE). Evaluated with w = 38.0. Optimal σ ∈ [0.30, 0.32].

C.2. Standard Regime (20 NFE). With 20 NFE, the optimal guidance decreases. The best performance is achieved at w = 20.0 (FID 3.79) and w = 21.0 (FID 3.81).

**Figure 6.** Guidance Scale (20 NFE). Convex shape confirms optimality at w = 20.0.

**Figure 7.** Corruption Scale (20 NFE). Robust around σ = 0.30.

C.3. High-Fidelity Regime (40 NFE). Optimal guidance scale shifts to w ≈ 14.0. The top two scales are w = 14.0 (FID 2.96) and w = 13.0 (FID 3.05).

**Figure 8.** Guidance Scale (40 NFE). Shifted lower to w = 14.0.

**Figure 9.**

**Figure 10.** Guidance Scale (100 NFE). Converged at w = 9.0.

**Figure 11.** Corruption Scale (100 NFE). Very clean prior required (σ = 0.22).

D. Frequency-Specific Guidance Analysis. To validate our Frequency-Modulated Prior Guidance (FMPG), we conducted an ablation study by restricting the guidance signal to specific frequency bands using FFT decomposition.

Algorithm 1: Frequency-modulated Prior Guidance for Bridge Models. 1: Input: Pre-trained denoiser D_θ, source image x_T, steps …

**Figure 12.** Guidance Scale Search for Static-PG (Edges2Handbags (Isola et al., 2017)). Ten discrete scale points are connected by line segments, forming a roughly U-shaped unimodal curve with the optimum at w = 18.

D.2. Calibration of Dynamic FMPG Schedules. Building upon the optimal static baseline (w_base = 18), we further unlock the potential of our method by calibrating the dynamic modulation intensity. We independ…

**Figure 13.** 10 NFE Baseline. Optimal w = 2.0.

**Figure 14.** 20 NFE Baseline. Optimal w = 1.5.

**Figure 15.** 40 NFE Baseline. Optimal w ≈ 1.0.

E.2. Sensitivity Analysis of FMPG Parameters. We conducted fine-grained ablation studies on the Start Ratio (τ_start) to identify the optimal configuration for Frequency-Modulated Prior Guidance (FMPG) under different computational budgets. Analysis at NFE = 10. For the low-budget regime, we fixed the hyperparameters as follows: • CFG (Ho & Salimans, 2021) Guidance Scale (w…

**Figure 16.** Qualitative Results on DIODE (Part I). Visual comparison of the first two samples.

F.2. Edges2Handbags: Texture Evolution. We further demonstrate robustness on the Edges2Handbags dataset.

F.3. ImageNet: High-Fidelity Synthesis. Finally, we present results on the challenging ImageNet (256 × 256) dataset.

**Figure 17.** Qualitative Results on DIODE (Part II). Visual comparison of additional samples.

**Figure 18.** Qualitative Results on Edges2Handbags (Part I). Visual comparison of additional samples.

**Figure 19.** Qualitative Results on Edges2Handbags (Part II). Visual comparison of additional samples.

**Figure 20.** Qualitative Results on ImageNet (Part I). Visual comparison of additional samples.

**Figure 21.** Qualitative Results on ImageNet (Part II). Visual comparison of additional samples.
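The captions for Figures 5 and 7 quote FID at the two best guidance scales for the 20 NFE and 40 NFE regimes. As a small illustration of how such a sweep is read, the sketch below picks the lowest-FID scale per regime. Only the four (w, FID) pairs quoted above are used; a real sweep would contain more points, and the helper name `best_scale` is hypothetical.

```python
from typing import Dict, List, Tuple

# (guidance scale w, FID) pairs as quoted in the figure captions above.
reported: Dict[int, List[Tuple[float, float]]] = {
    20: [(20.0, 3.79), (21.0, 3.81)],  # 20 NFE
    40: [(14.0, 2.96), (13.0, 3.05)],  # 40 NFE
}


def best_scale(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (w, FID) pair with the lowest FID (lower is better)."""
    return min(pairs, key=lambda p: p[1])


best = {nfe: best_scale(pairs) for nfe, pairs in reported.items()}
for nfe, (w, fid) in sorted(best.items()):
    print(f"NFE={nfe}: best w={w}, FID={fid}")
```

The selected scales, w = 20.0 at 20 NFE and w = 14.0 at 40 NFE, match the captions' statement that the optimal guidance scale decreases as the step budget grows.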

Original abstract

Guidance methods, such as classifier-free guidance (CFG) and auto-guidance (AG), have advanced noise-to-data generation in diffusion models. Recently, bridge models have introduced a data-to-data generative process that can exploit an instructive clean prior. In this work, inspired by previous methods creating quality difference between denoising results as guidance, we propose a training-free bridge guidance method, termed Prior Guidance (PG). Specifically, we introduce a weak prior, which is unseen during bridge pre-training, hindering prior exploitation and thereby degrading denoising result. Then, we contrast it with the seen prior to highlight and enhance prior exploitation via a scaling factor. Moreover, we analyze the underlying mechanism of prior exploitation in the bridge process and design frequency-modulated prior guidance (FMPG), which tailors the guidance scale to low- and high-frequency bands coherent with bridge generative dynamics. To address prior exploitation in image in-painting, we develop a cascaded framework, CFG-FMPG, which first generates a noisy hidden representation via CFG and then exploits it as a generative prior with FMPG, fulfilling their complementary strengths without compromising inference efficiency. Experiments demonstrate that our PG methods consistently improve pre-trained bridge models across diverse image translation tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives bridge models a training-free guidance method that contrasts a weak unseen prior against the training prior, with frequency modulation to match the process dynamics.

The main contribution is Prior Guidance (PG) and its frequency-modulated version (FMPG) for bridge models. They create a degraded result by feeding a weak prior that was not seen during pre-training, then scale the difference against the standard prior to strengthen exploitation. They also analyze how prior information flows through the bridge steps and adjust the scale separately for low- and high-frequency bands. For inpainting they add a simple cascade that first runs CFG then applies FMPG.
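The cascade mentioned at the end of this paragraph can be pictured as a two-phase sampling loop. The sketch below is only a control-flow illustration under stated assumptions: `cfg_combine` uses the standard CFG extrapolation, while `cascaded_sampling`, its `start_ratio` switch point, and the toy step functions are hypothetical and do not reproduce the paper's Algorithm or its τ_start definition.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Step = Callable[[Vector, int], Vector]


def cfg_combine(d_uncond: Sequence[float], d_cond: Sequence[float], w: float) -> Vector:
    """Standard CFG extrapolation in denoiser form: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for u, c in zip(d_uncond, d_cond)]


def cascaded_sampling(x_start: Sequence[float], n_steps: int, start_ratio: float,
                      cfg_step: Step, fmpg_step: Step) -> Tuple[Vector, List[str]]:
    """Run CFG-guided steps first, then hand the intermediate state to FMPG steps.

    ASSUMPTION: the hand-off happens after round(start_ratio * n_steps) steps,
    so the total number of steps is unchanged.
    """
    switch = int(round(start_ratio * n_steps))
    x = list(x_start)
    schedule: List[str] = []
    for i in range(n_steps):
        if i < switch:
            x = cfg_step(x, i)
            schedule.append("cfg")
        else:
            x = fmpg_step(x, i)
            schedule.append("fmpg")
    return x, schedule


# Toy step functions standing in for real guided denoising updates.
def toy_cfg_step(x: Vector, i: int) -> Vector:
    return [v + 1.0 for v in x]


def toy_fmpg_step(x: Vector, i: int) -> Vector:
    return [v * 2.0 for v in x]


final, schedule = cascaded_sampling([0.0], n_steps=4, start_ratio=0.5,
                                    cfg_step=toy_cfg_step, fmpg_step=toy_fmpg_step)
print(final, schedule)
```

Because the two phases share one step budget in this sketch, the cascade adds no extra steps, which is one plausible reading of the claim that inference efficiency is preserved.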

The adaptation to the data-to-data bridge setting is the clearest new piece. Standard CFG and auto-guidance were built for noise-to-data diffusion; this work starts from the bridge equations and the frequency tailoring follows from that analysis. The cascaded inpainting setup is a straightforward way to combine the two without extra training.

The experiments are described as showing consistent gains on several image translation tasks. That is the main support for the claim. The soft spot is that the abstract gives no numbers, error bars, or list of baselines, so the size of the improvement and its robustness are not visible here. If the full paper supplies those controls and the gains survive them, the practical value is real. The assumption that the weak-prior contrast stays stable is the natural point of fragility, but the paper says the frequency modulation is meant to handle it.

This is for people already working with bridge or flow-based models for translation or editing tasks. A reader who needs inference-time improvement on an existing bridge checkpoint would find the method worth testing.

I would send it to peer review. The procedure is new, the motivation is tied to the bridge dynamics, and the claims are concrete enough for referees to check.

Referee Report

0 major / 1 minor

Summary. The manuscript proposes a training-free Prior Guidance (PG) method for bridge models that introduces a weak unseen prior to degrade the denoising result and then contrasts it with the seen prior via a scaling factor to enhance prior exploitation. It further develops Frequency-Modulated Prior Guidance (FMPG) by analyzing the mechanism of prior exploitation and tailoring the guidance scale to low- and high-frequency bands, plus a cascaded CFG-FMPG framework for inpainting that combines CFG and FMPG. The central claim is that these methods consistently improve pre-trained bridge models across diverse image translation tasks.

Significance. If the claimed improvements are robustly supported, the work would be significant for enabling practical enhancements to data-to-data bridge models without retraining or additional parameters. The training-free contrastive design and the frequency-modulated analysis represent strengths that extend guidance techniques from diffusion models to bridge models while preserving inference efficiency.

minor comments (1)

[Abstract] The statement that 'Experiments demonstrate that our PG methods consistently improve pre-trained bridge models across diverse image translation tasks' supplies no quantitative metrics, baselines, error bars, or controls. This is a presentation issue that reduces the abstract's informativeness, even though the full experiments section presumably contains the supporting data.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript, the recognition of the training-free nature and frequency-modulated analysis as strengths, and the recommendation for minor revision. No specific major comments were provided in the report, so we have no point-by-point responses to address. We are pleased that the central claims regarding consistent improvements to pre-trained bridge models are viewed as potentially significant if robustly supported.

Circularity Check

0 steps flagged

No significant circularity detected

The paper derives PG as a training-free contrast between a seen prior and a deliberately weak unseen prior, then analyzes bridge dynamics to motivate frequency-modulated scaling in FMPG and a cascaded CFG-FMPG for inpainting. These steps are presented as direct consequences of the stated mechanism analysis rather than reductions to fitted parameters, self-definitions, or load-bearing self-citations. Experimental gains on pre-trained models are reported separately and do not feed back into the construction of the guidance equations themselves. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; full derivation, assumptions, and experimental details unavailable.

axioms (1)

Domain assumption: Bridge models can exploit an instructive clean prior in a data-to-data generative process. Stated directly in the abstract as the foundation for the guidance approach.

pith-pipeline@v0.9.1-grok · 5761 in / 1163 out tokens · 33179 ms · 2026-06-28T10:42:03.218806+00:00 · methodology

Reference graph

Works this paper leans on

100 extracted references · 5 canonical work pages · 1 internal anchor

[1] Ahn, D., Cho, H., Min, J., Jang, W., Kim, J., Kim, S., Park, H. H., Jin, K. H., and Kim, S. Self-rectifying diffusion sampling with perturbed-attention guidance. In Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., and Varol, G. (eds.), Computer Vision – ECCV 2024, 2025.
[2] Anonymous. Dynamic classifier-free diffusion guidance via online feedback. In The Fourteenth International Conference on Learning Representations, 2026.
[4] Bolton, A., Zhou, W., Chen, Z., Iacovides, G., and Mandic, D. P. RefineBridge: Generative bridge models improve financial forecasting by foundation models. In ICASSP, 2026.
[5] Chen, C., Zhu, J., Feng, X., Huang, N., Zhu, C., Wu, M., Mao, F., Wu, J., Chu, X., and Li, X. Stochastic self-guidance for training-free enhancement of diffusion models. In The Fourteenth International Conference on Learning Representations, 2026a.
[6] Chen, D.-Y., Bandyopadhyay, H., Zou, K., and Song, Y.-Z. Normalized attention guidance: Universal negative guidance for diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
[7] Chen, T., Liu, G.-H., and Theodorou, E. A. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. In ICLR, 2022a.
[8] Chen, Z., Tan, X., Wang, K., Pan, S., Mandic, D., He, L., and Zhao, S. InferGrad: Improving diffusion models for vocoder by considering inference in training. In ICASSP, 2022b.
[11] Chen, Z., Miao, Y., Wang, L., Fan, L., Mandic, D. P., and Zhu, J. Versatile cardiovascular signal generation with a unified diffusion transformer. Nature Machine Intelligence, 8(1):6–19, 2026b.
[12] Dai, Y., Chen, Z., Jiang, Y., Ke, Q., Cai, J., and Zhu, J. Omni2Sound: Towards unified video-text-to-audio generation. In CVPR, 2026.
[13] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009. doi:10.1109/CVPR.2009.5206848
[14] Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., Podell, D., Dockhorn, T., English, Z., and Rombach, R. Scaling rectified flow transformers for high-resolution image synthesis. In ICML, 2024.
[15] Feng, R., Yu, C., Deng, W., Hu, P., and Wu, T. On the guidance of flow matching. In Forty-second International Conference on Machine Learning, 2025.
[16] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[17] Ho, J. and Salimans, T. Classifier-free diffusion guidance. NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021.
[18] Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 6840–6851. Curran Associates, Inc., 2020.
[19] Hong, S. Smoothed energy guidance: Guiding diffusion models with reduced energy curvature of attention. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 66743–66772. Curran Associates, Inc., 2024.
[20] Hong, S., Lee, G., Jang, W., and Kim, S. Improving sample quality of diffusion models using self-attention guidance. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7428–7437, 2023.
[21] Ifriqi, T. B., Romero-Soriano, A., Drozdzal, M., Verbeek, J., and Alahari, K. Entropy rectifying guidance for diffusion and flow models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
[22] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[23] Jiang, Y., Chen, Z., Ju, Z., Li, C., Dou, W., and Zhu, J. FreeAudio: Training-free timing planning for controllable long-form text-to-audio generation. In ACM Multimedia, 2025.
[24] Karras, T., Aittala, M., Aila, T., and Laine, S. Elucidating the design space of diffusion-based generative models. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 26565–26577. Curran Associates, Inc., 2022.
[25] Karras, T., Aittala, M., Kynkäänniemi, T., Lehtinen, J., Aila, T., and Laine, S. Guiding a diffusion model with a bad version of itself. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 52996–53021. Curran Associates, Inc., 2024.
[26] Leng, Y., Chen, Z., Guo, J., Liu, H., Chen, J., Tan, X., Mandic, D., He, L., Li, X.-Y., Qin, T., Zhao, S., and Liu, T.-Y. BinauralGrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis. In NeurIPS, 2022.
[27] Li, C., Chen, Z., Bao, F., and Zhu, J. Bridge-SR: Schrödinger bridge for efficient SR. In ICASSP, 2025a.
[28] Li, C., Chen, Z., Wang, L., and Zhu, J. Audio super-resolution with latent bridge models. In NeurIPS, 2025b.
[29] Liu, G.-H., Vahdat, A., Huang, D.-A., Theodorou, E. A., Nie, W., and Anandkumar, A. I^2SB: Image-to-image Schrödinger bridge. In International Conference on Machine Learning, pp. 21551–21568. PMLR, 2023a.
[30] Liu, G.-H., Lipman, Y., Nickel, M., Karrer, B., Theodorou, E. A., and Chen, R. T. Generalized Schrödinger bridge matching. In ICLR, 2024.
[31] Liu, H., Chen, Z., Yuan, Y., Mei, X., Liu, X., Mandic, D. P., Wang, W., and Plumbley, M. D. AudioLDM: Text-to-audio generation with latent diffusion models. In ICML, 2023b.
[32] Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.-Y., and Ermon, S. SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022.
[33] Miao, Y., Chen, Z., Li, C., and Mandic, D. RespDiff: An end-to-end multi-scale RNN diffusion model for respiratory waveform estimation from PPG signals. In ICASSP, 2025.
[34] Mo, S., Chen, Z., Bao, F., and Zhu, J. DiffGAP: A lightweight diffusion module in contrastive space for bridging cross-model gap. In ICASSP, 2025.
[35] He, G., Zheng, K., Chen, J., Bao, F., and Zhu, J. Consistency diffusion bridge models. In NeurIPS, 2024.
[36] Papalampidi, P., Wiles, O., Ktena, I., Shtedritski, A., Bugliarello, E., Kajic, I., Albuquerque, I., and Nematzadeh, A. Dynamic classifier-free diffusion guidance via online feedback. In The Fourteenth International Conference on Learning Representations, 2026.
[37] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
[38] Sadat, S., Vontobel, T., Salehi, F., and Weber, R. M. Guidance in the frequency domain enables high-fidelity sampling at low CFG scales, 2025.
[39] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., and Chen, X. Improved techniques for training GANs. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
[40] Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a.
[41] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In ICLR, 2021b.
[42] Su, X., Song, J., Meng, C., and Ermon, S. Dual diffusion implicit bridges for image-to-image translation. In The Eleventh International Conference on Learning Representations, 2023.
[43] Vasiljevic, I., Kolkin, N., Zhang, S., Luo, R., Wang, H., Dai, F. Z., Daniele, A. F., Mostajabi, M., Basart, S., Walter, M. R., and Shakhnarovich, G. DIODE: A dense indoor and outdoor depth dataset, 2019.
[44] Wang, J., Chen, Z., Yuan, B., Zheng, K., Li, C., Jiang, Y., and Zhu, J. AudioMoG: Guiding audio generation with mixture-of-guidance. In ICME, 2026.
[45] Wang, K., Mao, J., Wu, T., and Xiang, Y. Towards a golden classifier-free guidance path via foresight fixed point iterations. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025a.
[46] Wang, X., Wang, Y., Wu, Y., Song, R., Tan, X., Chen, Z., Xu, H., and Sui, G. TiVA: Time-aligned video-to-audio generation. In ACM MM, 2024.
[47] Wang, Y., Chen, Z., Chen, X., Wei, Y., Zhu, J., and Chen, J. FrameBridge: Improving image-to-video generation with bridge models. In ICML, 2025b.
[49] Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[50] Zhang, S., Cheng, Y., and Steeg, G. V. Exploring the design space of diffusion bridge models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025b.
[51] Zheng, K., He, G., Chen, J., Bao, F., and Zhu, J. Diffusion bridge implicit models. In The Thirteenth International Conference on Learning Representations, 2025.
[52] Zhou, L., Lou, A., Khanna, S., and Ermon, S. Denoising diffusion bridge models. In International Conference on Learning Representations, 2024.
[53] Denoising Diffusion Bridge Models. International Conference on Learning Representations.
[54] Score-Based Generative Modeling through Stochastic Differential Equations. ICLR.
[55] Generalized Schrödinger Bridge Matching. ICLR.
[56] Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory. ICLR.
[57] Bridge-SR: Schrödinger Bridge for Efficient SR. ICASSP.
[58] Audio Super-Resolution with Latent Bridge Models. NeurIPS.
[59] RefineBridge: Generative Bridge Models Improve Financial Forecasting by Foundation Models. ICASSP.
[60] TiVA: Time-Aligned Video-to-Audio Generation. ACM MM.
[61] Versatile cardiovascular signal generation with a unified diffusion transformer. Nature Machine Intelligence, 2026.
[62] DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap. ICASSP.
[63] RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory Waveform Estimation from PPG Signals. ICASSP.
[64] AudioMoG: Guiding Audio Generation with Mixture-of-Guidance. ICME.
[65] ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech. arXiv preprint arXiv:2212.14518.
[66] VoiceBridge: General Speech Restoration with One-step Latent Bridge Models. arXiv preprint arXiv:2509.25275.
[67] He, G., Zheng, K., Chen, J., Bao, F., and Zhu, J. Consistency Diffusion Bridge Models. NeurIPS.
[68] Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis. arXiv preprint arXiv:2312.03491. Available: https://arxiv.org/abs/2312.03491
[69] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. arXiv preprint arXiv:2311.15127.
[70]

ICML , year=

FrameBridge: Improving Image-to-Video Generation with Bridge Models , author=. ICML , year=
[71]

ICML , year=

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis , author=. ICML , year=
[72]

CVPR , year=

High-Resolution Image Synthesis with Latent Diffusion Models , author=. CVPR , year=
[73]

ICML , year=

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models , author=. ICML , year=
[74]

ICASSP , year=

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training , author=. ICASSP , year=
[75]

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis. In NeurIPS, 2022.
[76]

Omni2Sound: Towards Unified Video-Text-to-Audio Generation. In CVPR, 2026.
[77]

FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation. In ACM Multimedia, 2025.
[78]

Liu, G.-H., Vahdat, A., Huang, D.-A., Theodorou, E. A., Nie, W., and Anandkumar, A. I²SB: Image-to-Image Schrödinger Bridge. In International Conference on Machine Learning, 2023.
[79]

Classifier-Free Diffusion Guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021.
[80]

Karras, T., Aittala, M., Kynkäänniemi, T., Lehtinen, J., Aila, T., and Laine, S. Guiding a Diffusion Model with a Bad Version of Itself. In Advances in Neural Information Processing Systems, volume 37, 2024.
[81]

Diffusion Bridge Implicit Models. In The Thirteenth International Conference on Learning Representations, 2025.
[82]

Ho, J., Jain, A., and Abbeel, P. Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems, volume 33, 2020.
[83]

Karras, T., Aittala, M., Aila, T., and Laine, S. Elucidating the Design Space of Diffusion-Based Generative Models. In Advances in Neural Information Processing Systems, volume 35, 2022.
[84]

Dual Diffusion Implicit Bridges for Image-to-Image Translation. In The Eleventh International Conference on Learning Representations, 2023.


[1]

Ahn, D., Cho, H., Min, J., Jang, W., Kim, J., Kim, S., Park, H. H., Jin, K. H., and Kim, S. Self-rectifying diffusion sampling with perturbed-attention guidance. In Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., and Varol, G. (eds.), Computer Vision – ECCV 2024, 2025.

[2]

Anonymous. Dynamic classifier-free diffusion guidance via online feedback. In The Fourteenth International Conference on Learning Representations, 2026.

[3]

Bolton, A., Zhou, W., Chen, Z., Iacovides, G., and Mandic, D. P. RefineBridge: Generative bridge models improve financial forecasting by foundation models. In ICASSP, 2026.

[4]

Chen, C., Zhu, J., Feng, X., Huang, N., Zhu, C., Wu, M., Mao, F., Wu, J., Chu, X., and Li, X. Stochastic self-guidance for training-free enhancement of diffusion models. In The Fourteenth International Conference on Learning Representations, 2026a.

[5]

Chen, D.-Y., Bandyopadhyay, H., Zou, K., and Song, Y.-Z. Normalized attention guidance: Universal negative guidance for diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[6]

Chen, T., Liu, G.-H., and Theodorou, E. A. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. In ICLR, 2022a.

[7]

Chen, Z., Tan, X., Wang, K., Pan, S., Mandic, D., He, L., and Zhao, S. InferGrad: Improving diffusion models for vocoder by considering inference in training. In ICASSP, 2022b.

[8]

Chen, Z., Miao, Y., Wang, L., Fan, L., Mandic, D. P., and Zhu, J. Versatile cardiovascular signal generation with a unified diffusion transformer. Nature Machine Intelligence, 8(1):6–19, 2026b.

[9]

Dai, Y., Chen, Z., Jiang, Y., Ke, Q., Cai, J., and Zhu, J. Omni2Sound: Towards unified video-text-to-audio generation. In CVPR, 2026.

[10]

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009. doi:10.1109/CVPR.2009.5206848

[11]

Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., Podell, D., Dockhorn, T., English, Z., and Rombach, R. Scaling rectified flow transformers for high-resolution image synthesis. In ICML, 2024.

[12]

Feng, R., Yu, C., Deng, W., Hu, P., and Wu, T. On the guidance of flow matching. In Forty-second International Conference on Machine Learning, 2025.

[13]

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.

[14]

Ho, J. and Salimans, T. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021.

[15]

Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 6840–6851. Curran Associates, Inc., 2020.

[16]

Hong, S. Smoothed energy guidance: Guiding diffusion models with reduced energy curvature of attention. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 66743–66772. Curran Associates, Inc., 2024.

[17]

Hong, S., Lee, G., Jang, W., and Kim, S. Improving sample quality of diffusion models using self-attention guidance. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7428–7437, 2023.

[18]

Ifriqi, T. B., Romero-Soriano, A., Drozdzal, M., Verbeek, J., and Alahari, K. Entropy rectifying guidance for diffusion and flow models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[19]

Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[20]

Jiang, Y., Chen, Z., Ju, Z., Li, C., Dou, W., and Zhu, J. FreeAudio: Training-free timing planning for controllable long-form text-to-audio generation. In ACM Multimedia, 2025.

[21]

Karras, T., Aittala, M., Aila, T., and Laine, S. Elucidating the design space of diffusion-based generative models. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 26565–26577. Curran Associates, Inc., 2022.

[22]

Karras, T., Aittala, M., Kynkäänniemi, T., Lehtinen, J., Aila, T., and Laine, S. Guiding a diffusion model with a bad version of itself. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C. (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 52996–53021. Curran Associates, Inc., 2024.

[23]

Leng, Y., Chen, Z., Guo, J., Liu, H., Chen, J., Tan, X., Mandic, D., He, L., Li, X.-Y., Qin, T., Zhao, S., and Liu, T.-Y. BinauralGrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis. In NeurIPS, 2022.

[24]

Li, C., Chen, Z., Bao, F., and Zhu, J. Bridge-SR: Schrödinger bridge for efficient SR. In ICASSP, 2025a.

[25]

Li, C., Chen, Z., Wang, L., and Zhu, J. Audio super-resolution with latent bridge models. In NeurIPS, 2025b.

[26]

Liu, G.-H., Vahdat, A., Huang, D.-A., Theodorou, E. A., Nie, W., and Anandkumar, A. I²SB: Image-to-image Schrödinger bridge. In International Conference on Machine Learning, pp. 21551–21568. PMLR, 2023a.

[27]

Liu, G.-H., Lipman, Y., Nickel, M., Karrer, B., Theodorou, E. A., and Chen, R. T. Generalized Schrödinger bridge matching. In ICLR, 2024.

[28]

Liu, H., Chen, Z., Yuan, Y., Mei, X., Liu, X., Mandic, D. P., Wang, W., and Plumbley, M. D. AudioLDM: Text-to-audio generation with latent diffusion models. In ICML, 2023b.

[29]

Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.-Y., and Ermon, S. SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022.

[30]

Miao, Y., Chen, Z., Li, C., and Mandic, D. RespDiff: An end-to-end multi-scale RNN diffusion model for respiratory waveform estimation from PPG signals. In ICASSP, 2025.

[31]

Mo, S., Chen, Z., Bao, F., and Zhu, J. DiffGAP: A lightweight diffusion module in contrastive space for bridging cross-model gap. In ICASSP, 2025.

[32]

He, G., Zheng, K., Chen, J., Bao, F., and Zhu, J. Consistency diffusion bridge models. In NeurIPS, 2024.

[33]

Papalampidi, P., Wiles, O., Ktena, I., Shtedritski, A., Bugliarello, E., Kajic, I., Albuquerque, I., and Nematzadeh, A. Dynamic classifier-free diffusion guidance via online feedback. In The Fourteenth International Conference on Learning Representations, 2026.

[34]

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.

[35]

Sadat, S., Vontobel, T., Salehi, F., and Weber, R. M. Guidance in the frequency domain enables high-fidelity sampling at low CFG scales, 2025.

[36]

Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training GANs. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.

[37]

Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a.

[38]

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In ICLR, 2021b.

[39]

Su, X., Song, J., Meng, C., and Ermon, S. Dual diffusion implicit bridges for image-to-image translation. In The Eleventh International Conference on Learning Representations, 2023.

[40]

Vasiljevic, I., Kolkin, N., Zhang, S., Luo, R., Wang, H., Dai, F. Z., Daniele, A. F., Mostajabi, M., Basart, S., Walter, M. R., and Shakhnarovich, G. DIODE: A dense indoor and outdoor depth dataset, 2019.

[41]

Wang, J., Chen, Z., Yuan, B., Zheng, K., Li, C., Jiang, Y., and Zhu, J. AudioMoG: Guiding audio generation with mixture-of-guidance. In ICME, 2026.

[42]

Wang, K., Mao, J., Wu, T., and Xiang, Y. Towards a golden classifier-free guidance path via foresight fixed point iterations. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025a.

[43]

Wang, X., Wang, Y., Wu, Y., Song, R., Tan, X., Chen, Z., Xu, H., and Sui, G. TiVA: Time-aligned video-to-audio generation. In ACM MM, 2024.

[44]

Wang, Y., Chen, Z., Chen, X., Wei, Y., Zhu, J., and Chen, J. FrameBridge: Improving image-to-video generation with bridge models. In ICML, 2025b.

[45]

Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[46]

Zhang, S., Cheng, Y., and Steeg, G. V. Exploring the design space of diffusion bridge models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025b.

[47]

Zheng, K., He, G., Chen, J., Bao, F., and Zhu, J. Diffusion bridge implicit models. In The Thirteenth International Conference on Learning Representations, 2025.

[48]

Zhou, L., Lou, A., Khanna, S., and Ermon, S. Denoising diffusion bridge models. In International Conference on Learning Representations, 2024.
