arxiv: 2512.05070 · v2 · submitted 2025-12-04 · 📊 stat.ML · cs.LG

Recognition: no theorem link

Control Consistency Losses for Diffusion Bridges

Samuel Howard , Nikolas N\"usken , Jakiw Pidstrigach

Authors on Pith no claims yet

Pith reviewed 2026-05-17 00:40 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords diffusion bridgesoptimal controlself-consistency lossconditioned dynamicsstochastic processesonline trainingrare events

0 comments

The pith

A self-consistency property of optimal control lets models learn diffusion bridges through iterative online training without differentiating through trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to train models that simulate diffusion processes conditioned on both a starting point and an ending point. It does so by turning a self-consistency property of the optimal control into a training loss. The resulting algorithm updates the model online in successive iterations and avoids the need to back-propagate gradients through full simulated paths. The approach is tested on several empirical tasks and also points to links with recent work on stochastic optimal control.

Core claim

Learning the conditioned dynamics of a diffusion process can be achieved by enforcing a self-consistency loss on the optimal control, which produces an iterative online training procedure that does not require differentiation through simulated trajectories.

What carries the argument

The self-consistency property of the optimal control for the conditioned diffusion, converted directly into a training loss.

Load-bearing premise

The optimal control for the conditioned diffusion process has a usable self-consistency property that can be turned into a stable training loss without extra assumptions on the drift or diffusion coefficients.

What would settle it

Apply the trained model to a diffusion bridge whose true conditioned dynamics are known exactly and check whether the generated trajectories match the known distribution to within sampling error.

Figures

Figures reproduced from arXiv: 2512.05070 by Jakiw Pidstrigach, Nikolas N\"usken, Samuel Howard.

**Figure 1.** Figure 1: The self-consistency property u(s, Xs) = EPu [u(t, Xt)Jt|s|Xs] relates the control value at an earlier time s to its value at a later time t, along the trajectories. Diffusion Bridge Bridges (3) from to Processes satisfying selfconsistency Enforced by construction Enforced by self-consistency loss [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 3.** Figure 3: Comparison of our proposed approach with Neural Guided Diffusion Bridge (NGDB) [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 5.** Figure 5: Visualisation of obtained bridge trajectories for the 2d Muller-Brown potential. ¨ 4 Experiments We now provide experiments to validate our approach, in a range of different settings. We present a selection below, with additional experiments and comparisons to other diffusion bridge methods in Appendix C. We first provide a comparison with the neural-guided bridge approach (NGDB) of Yang et al. (2025) for… view at source ↗

**Figure 6.** Figure 6: Comparison of performance on the double-well experiment, for different barrier height [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗

**Figure 7.** Figure 7: Rare event example for cell diffusion process, from Yang et al. (2025), showing Xt,1 (•) and Xt,2 (•). Left: NGDB; Right: Ours. 0 2 4 t 1 0 1 Xt; i 0 2 4 t 1 0 1 Xt; i [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

**Figure 9.** Figure 9: Comparison of the trajectories obtained by taking auxiliary drifts [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

read the original abstract

Simulating the conditioned dynamics of diffusion processes, given their initial and terminal states, is an important but challenging problem in the sciences. The difficulty is particularly pronounced for rare events, for which the unconditioned dynamics rarely reach the terminal state. In this work, we propose a novel approach for learning diffusion bridges based on a self-consistency property of the optimal control. The resulting algorithm learns the conditioned dynamics in an iterative online manner, and exhibits strong performance in a range of empirical settings without requiring differentiation through simulated trajectories. Beyond the diffusion bridge setting, we draw connections between our self-consistency framework and recent advances in the wider stochastic optimal control literature.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The self-consistency loss gives a practical way to train diffusion bridges online without trajectory differentiation, but stability for rare terminals needs checking.

read the letter

The core contribution is a training loss that exploits a self-consistency property of the optimal control for the conditioned process. This lets the method learn the bridge drift iteratively from online samples rather than differentiating through full trajectories. That setup is new relative to the usual score-matching or Doob-h-transform approaches cited in the abstract, and it directly targets the rare-event case where standard simulation fails to hit the terminal state often enough. The connection they draw to recent stochastic optimal control work is also useful and not just decorative. If the derivation is clean, this could be a handy algorithmic tool for people simulating conditioned SDEs in physics or biology. The empirical claims in the abstract sound reasonable for the settings they tested, and avoiding backprop through paths is a real practical win when trajectories are long or stiff. The main soft spot is the stability question. The abstract says the approach works without extra assumptions on the coefficients, yet turning an optimal-control fixed point into a stable loss for rare terminals often requires some Lipschitz control or regularization to keep the iteration from drifting. If the paper only shows good results on moderate cases and does not spell out why the iteration converges in general, that part will need tightening in review. The experiments will also have to demonstrate that the online updates do not accumulate bias over iterations. This paper is for readers already working on diffusion bridges, controlled SDEs, or rare-event sampling. Someone looking for a new loss construction in that niche would find it worth reading. It is coherent enough on its own terms to deserve a serious referee rather than a desk reject; the idea is distinct and the practical angle is clear even if the convergence details need more work.

Referee Report

2 major / 2 minor

Summary. The paper proposes a novel method for learning diffusion bridges by exploiting a self-consistency property of the optimal control for conditioned diffusions. The resulting algorithm performs iterative online training of the conditioned dynamics and reports strong empirical performance across settings without requiring differentiation through simulated trajectories. Connections are drawn to the broader stochastic optimal control literature.

Significance. If the central derivation holds and the iteration is stable, the approach could provide a practical alternative to Doob h-transform or score-matching methods for rare-event conditioned sampling, with the online nature and avoidance of trajectory differentiation offering computational advantages in applications such as molecular simulation or rare-event analysis. The empirical results, if reproducible, would strengthen the case for control-based consistency losses in diffusion models.

major comments (2)

[§3.2] §3.2, Eq. (8): The self-consistency relation for the optimal control u* is stated to hold without additional assumptions on the drift or diffusion coefficients, yet the fixed-point iteration for online learning implicitly requires contraction or Lipschitz conditions to guarantee convergence to the Doob h-transform; this is load-bearing for the claim that the loss is stable for rare terminal conditions.
[§4.1] §4.1, Algorithm 1: The online update rule is presented as exact, but no analysis is given for bias introduced when the terminal state is reached with low probability under the uncontrolled process; this directly affects whether the learned bridge matches the true conditioned dynamics.

minor comments (2)

[Figure 2] Figure 2: The caption does not specify the number of independent runs or whether shaded regions represent standard deviation; this reduces clarity of the reported performance gains.
[§2] §2: The notation for the controlled SDE and the value function is introduced without a consolidated symbol table, making cross-references between the optimal-control derivation and the diffusion-bridge setting harder to follow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment point by point below and indicate where revisions will be made to the manuscript.

read point-by-point responses

Referee: [§3.2] §3.2, Eq. (8): The self-consistency relation for the optimal control u* is stated to hold without additional assumptions on the drift or diffusion coefficients, yet the fixed-point iteration for online learning implicitly requires contraction or Lipschitz conditions to guarantee convergence to the Doob h-transform; this is load-bearing for the claim that the loss is stable for rare terminal conditions.

Authors: The self-consistency relation in Eq. (8) follows directly from the dynamic programming principle applied to the stochastic optimal control formulation and holds under the standard assumptions required for the existence of an optimal control (finite cost, admissible controls, and well-posed SDE). No further Lipschitz or contraction assumptions are needed for the relation itself to be valid. We agree, however, that convergence of the fixed-point iteration to the Doob h-transform does rely on the operator being contractive, which in general requires additional conditions such as Lipschitz continuity of the drift and diffusion coefficients or boundedness of the control. In the revised manuscript we will add a clarifying paragraph in §3.2 that distinguishes the validity of the relation from the convergence of the iteration, state sufficient conditions drawn from the stochastic control literature, and note the implications for rare terminal conditions. This is a targeted clarification rather than a change to the core derivation. revision: yes
Referee: [§4.1] §4.1, Algorithm 1: The online update rule is presented as exact, but no analysis is given for bias introduced when the terminal state is reached with low probability under the uncontrolled process; this directly affects whether the learned bridge matches the true conditioned dynamics.

Authors: The update rule is exact when the expectation is taken with respect to trajectories sampled from the current iterate of the conditioned process. We acknowledge that the manuscript does not provide a formal bias analysis for the transient phase in which the terminal state remains rare under the initial uncontrolled dynamics. In the revised version we will add a short discussion in §4.1 describing this initialization effect and its progressive reduction over iterations, together with additional numerical results on rare-event conditioning tasks that illustrate empirical convergence. These changes address the concern while preserving the online, differentiation-free character of the algorithm. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper grounds its loss in a self-consistency property of the optimal control for conditioned diffusions, presented as a general feature of stochastic optimal control rather than a definition or fit internal to the learned parameters. The abstract and description show the method as an iterative online algorithm that avoids trajectory differentiation, with connections drawn to external literature. No equations or steps reduce the claimed prediction or loss to a quantity already fitted to the same data by construction, and no load-bearing self-citation chain is indicated. The approach remains self-contained against external benchmarks from control theory.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on the existence and usability of a self-consistency property for the optimal control of conditioned diffusions; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption The optimal control for a diffusion bridge satisfies a self-consistency property that can be exploited as a training signal.
Invoked as the foundation for the proposed consistency loss.

pith-pipeline@v0.9.0 · 5401 in / 1150 out tokens · 42687 ms · 2026-05-17T00:40:24.458615+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models
cs.LG 2026-05 unverdicted novelty 7.0

Reinforce Adjoint Matching derives a simple consistency loss for RL post-training of diffusion models by tilting the clean distribution toward higher-reward samples under KL regularization while keeping the noising pr...

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · cited by 1 Pith paper

[1]

Diffusion Bridges for Stochastic Hamiltonian Systems and Shape Evolutions

Alexis Arnaudon, Frank van der Meulen, Moritz Schauer, and Stefan Sommer (2022). “Diffusion Bridges for Stochastic Hamiltonian Systems and Shape Evolutions”. In:SIAM Journal on Imaging Sciences15.1, pp. 293–323. Elizabeth Louise Baker, Moritz Schauer, and Stefan Sommer (2025). “Score matching for bridges without time-reversals”. In:The 28th International ...

work page arXiv 2022
[2]

Particle filters for partially observed diffusions

Berlin ; Springer. Paul Fearnhead, Omiros Papaspiliopoulos, and Gareth O Roberts (2008). “Particle filters for partially observed diffusions”. In:Journal of the Royal Statistical Society Series B: Statistical Methodology 70.4, pp. 755–777. A. Golightly and D.J. Wilkinson (2008). “Bayesian inference for nonlinear multivariate diffusion models observed with...

work page 2008
[3]

On Generating Monte Carlo Samples of Continuous Diffusion Bridges

Ed. by P. L. Hennequin. Springer Berlin Heidelberg, pp. 143–303. Hiroshi Kunita (1986).Lectures on Stochastic Flows and Applications. Ming Lin, Rong Chen, and Per Mykland (2010). “On Generating Monte Carlo Samples of Continuous Diffusion Bridges”. In:Journal of the American Statistical Association105.490, pp. 820–838. Christian L´eonard (2013).A survey of...

work page arXiv 1986
[4]

CRC Monographs on Statistics & Applied Probability, pp. 311–340. Amnon Pazy (2012).Semigroups of linear operators and applications to partial differential equations. V ol

work page 2012
[5]

Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes

Springer Science & Business Media. Asger Roer Pedersen (1995). “Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes”. In:Bernoulli1.3, pp. 257 –279. Jakiw Pidstrigach, Elizabeth Louise Baker, Carles Domingo-Enrich, George Deligiannidis, and Nikolas N¨usken (2025). “Conditioning Di...

work page 1995
[6]

Pedersen (1995) use the unconditional dynamics, and Clark (1990) and Delyon and Hu (2006) utilise the Brownian bridge to ensure termination at the correct state

or Metropolis-Hastings steps. Pedersen (1995) use the unconditional dynamics, and Clark (1990) and Delyon and Hu (2006) utilise the Brownian bridge to ensure termination at the correct state. Schauer et al. (2017) consider a class of guided diffusions that includes an additional drift term while preserving tractability. We also highlight the works Stuart ...

work page 1995
[7]

We also discuss potential issues that might arise in certain settings, and present a generalised version of our algorithm which can help to mitigate them

We now discuss further implementation details regarding the proposed approach. We also discuss potential issues that might arise in certain settings, and present a generalised version of our algorithm which can help to mitigate them. B.1 Choice of neural parameterisation Recall from the discussion around equation (7) that we use the neural parameterisatio...

work page 2021
[8]

can be included in the training target recursions, similar to the adjustments discussed in Domingo-Enrich (2024). The idea of the STL adjustments is to remove the variance in the training targets at the solution, by leveraging the knowledge that the corresponding potential must solve the appropriate HJB equation. Below, we provide a sketch proof showing h...

work page 2024
[9]

B.5 Connections to stochastic optimal control methods We now discuss connections of the self-consistency property with recent methodological develop- ments in the stochastic optimal control literature. In Section 3, we discussed how the self-consistency property is a necessary property for the optimal control, but for sufficiency we must also impose addit...

work page 2025
[10]

andforward-bridge(FB) (Baker et al., 2025), which both learn from uncontrolled simulations. As expected, we find these approaches to generally perform well in settings where the terminal point occurs frequently under the unconditioned dynamics, but to struggle otherwise, which agrees with the findings in Yang et al. (2025). All experiments were carried ou...

work page 2025
[11]

We used 500 time-discretisation steps, simulated 64 trajectories at each step, and used a learning rate of 1e-4

0.023±0.010 0.148±0.073 6.53±0.14 66.1±0.9 792.8±0.9 Ours (˜b=−b) 0.055±0.009 0.041±0.013 6.53±0.17 66.0±0.5 792.4±1.0 NGDB 0.051±0.012 0.098±0.047 6.48±0.15 65.5±0.6 783.4±0.8 SDB 0.104±0.011 0.229±0.024 5.56±0.39* 75.0±5.1 1845±230 FB 0.074±0.011 - 6.51±0.14 237±98 4836±954 of trajectory simulations per step. We used 500 time-discretisation steps, simul...

work page 1985
[12]

This again allows us to report KL(P∗∥Pθ) and verify that the learned diffusion bridges are correct

is a 1-dimensional stochastic process that evolves according to the SDE dXt =a(b−X t)dt+ε p XtdBt.(37) We consider this example as it allows us to verify our method on an SDE with aspatially-dependent diffusion coefficient σ(t, x) =ε √x, and also because the ground-truth is again known in closed form, as the transition densities can be written asce−u−v v ...

work page 2025
[13]

(2025): anormalevent, arareevent, and a multi-modalexample

38.6±0.2 -32.5±5.9 We consider the three settings used in Yang et al. (2025): anormalevent, arareevent, and a multi-modalexample. For the normal event, the starting state is x0 = [0.1,−0.1] , the termination state is xT = [2.0,−0.1] , and the termination time is T= 4 . For the rare event, the starting state is again x0 = [0.1,−0.1] , the termination state...

work page 2025
[14]

Including the STL adjustments generally appears to improve performance slightly, with the largest improvements seen in cases where large Jacobians arise (that is, the double-well and M¨uller-Brown experiments). However, as the STL adjustments require computing the terms (∇u σ+u∇σ)(t, X u t )·δB t, they increase the computational cost of the method and tra...

work page 1984
[15]

To see this, take ˜T < T , and apply the above result with ˜T in place of T and with F(X ˜T ) =p(X T =x T |X ˜T ), which is smooth

Finally, we remark that this self-consistency property also holds in the singular case when F=δ xT , which corresponds to diffusion bridges. To see this, take ˜T < T , and apply the above result with ˜T in place of T and with F(X ˜T ) =p(X T =x T |X ˜T ), which is smooth. We thus obtain the result for s < t < ˜T , and as ˜T was arbitrary we see that the s...

work page 2013
[16]

From this, it follows that (iv) implies (iii), and the converse implication follows directly from the definition. Theorem 3.1.Within the class of controlled diffusion processes of the form (3) that ter- minate at xT , there is a unique process X u t that satisfies the self-consistency property u(s, Xs) =E Pu[J ⊤ t|s u(t, Xt)|Xs] and whose control is of gr...

work page 2003