Rare Event Analysis via Stochastic Optimal Control

Carles Domingo-Enrich; Dinghuai Zhang; Eric Vanden-Eijnden; Jiajun He; Yuanqi Du

arXiv: 2604.13213 · v1 · submitted 2026-04-14 · stat.ML · cs.LG · math.OC · physics.chem-ph

Yuanqi Du, Jiajun He, Dinghuai Zhang, Eric Vanden-Eijnden, Carles Domingo-Enrich

Pith reviewed 2026-05-10 13:47 UTC · model grok-4.3

classification stat.ML · cs.LG · math.OC · physics.chem-ph

keywords rare events · committor function · stochastic optimal control · transition path theory · reaction rates · metastable states · molecular simulation

The pith

By casting committor estimation as a stochastic optimal control problem, reactive trajectories can be sampled efficiently using a feedback policy derived from the committor gradient, leading to more accurate estimates of reaction rates and

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes the problem of estimating the committor function, which gives the probability that a system will reach one metastable state before another, as a stochastic optimal control task. In this setup, the committor defines a feedback control that steers simulations toward the paths that actually cross between states, sidestepping the problem that unbiased simulations almost never produce the event. Two new objective functions are introduced to train this control: one using direct backpropagation and another using off-policy value matching, for which first-order optimality guarantees are established. An additional sampling process is proposed to escape intermediate metastable basins while preserving the reactive current. On standard benchmark systems, the approach is reported to produce more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.

Core claim

The committor function defines a feedback control proportional to the gradient of its logarithm that steers trajectories toward the reactive region; solving the resulting hitting-time control problem via backpropagation and value-matching losses with first-order optimality guarantees, combined with an alternative sampling process that preserves the reactive current while lowering effective barriers, produces accurate committor estimates, reaction rates, and equilibrium constants on benchmarks.

What carries the argument

The feedback control law given by the gradient of the log-committor, which serves as the steering policy in the stochastic optimal control formulation of the hitting-time problem.
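To make the control law concrete, the block below writes out the textbook relations under the assumption of overdamped Langevin dynamics with unit friction. The symbols $U$, $\beta$, $A$, $B$, $\tau$, $g$, and $u$ are illustrative choices made here; the paper's own notation, dynamics (for example underdamped Langevin), and cost normalization may differ.

```latex
% Assumed dynamics (overdamped Langevin, unit friction; not taken from the paper):
%   dX_t = -\nabla U(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t
\begin{align*}
  % Committor: backward Kolmogorov equation with Dirichlet boundary data
  \mathcal{L} q &:= -\nabla U \cdot \nabla q + \beta^{-1} \Delta q = 0
      \quad \text{outside } A \cup B, \qquad q|_{\partial A} = 0, \quad q|_{\partial B} = 1, \\
  % Controlled dynamics
  dX^u_t &= \bigl(-\nabla U(X^u_t) + \sqrt{2\beta^{-1}}\, u(X^u_t)\bigr)\,dt
      + \sqrt{2\beta^{-1}}\, dW_t, \\
  % Hitting-time cost, with \tau the first hitting time of A \cup B
  J(u) &= \mathbb{E}\Bigl[\int_0^{\tau} \tfrac12 |u(X^u_t)|^2\,dt + g(X^u_\tau)\Bigr],
      \qquad g = 0 \text{ on } B, \quad g = +\infty \text{ on } A, \\
  % Value function and optimal feedback
  V &= -\log q, \qquad
  u^\star = -\sqrt{2\beta^{-1}}\,\nabla V = \sqrt{2\beta^{-1}}\,\nabla \log q .
\end{align*}
```

With $u^\star$ inserted, the drift becomes $-\nabla U + 2\beta^{-1}\nabla \log q$, which is the Doob $h$-transform of the original process conditioned to reach $B$ before $A$. Substituting $V = -\log q$ into the Hamilton-Jacobi-Bellman equation $\mathcal{L}V - \beta^{-1}|\nabla V|^2 = 0$ recovers $\mathcal{L}q = 0$, which is why solving the control problem and estimating the committor are the same task in this setting.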

Load-bearing premise

That the feedback control derived from the committor gradient together with the proposed losses and alternative sampling process can be solved accurately without introducing bias or missing important reactive paths in systems with multiple metastable basins.

What would settle it

The central claim would be undermined if, on a benchmark system with known exact committor values, the method produced committor estimates with higher error than standard methods or reaction rates that deviate from the ground truth.
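A test of this kind can be sketched in one dimension, where the committor of an overdamped particle is available by quadrature. The snippet below is only an illustration under stated assumptions: the double-well potential `(x^2 - 1)^2`, the value `beta = 5.0`, and the logistic curve standing in for a "learned" committor are all hypothetical and do not come from the paper.

```python
import numpy as np

def reference_committor(grid, beta):
    """1D committor between the grid endpoints for U(x) = (x^2 - 1)^2.

    Uses q(x) = int_a^x exp(beta*U) dy / int_a^b exp(beta*U) dy,
    evaluated with the cumulative trapezoid rule.
    """
    w = np.exp(beta * (grid**2 - 1.0) ** 2)
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(grid))])
    return cum / cum[-1]

def mean_absolute_error(q_est, q_ref):
    """Grid-averaged absolute difference between two committor arrays."""
    return float(np.mean(np.abs(np.asarray(q_est) - np.asarray(q_ref))))

grid = np.linspace(-1.0, 1.0, 2001)           # reactant at x = -1, product at x = +1
q_ref = reference_committor(grid, beta=5.0)   # assumed inverse temperature
q_hat = 1.0 / (1.0 + np.exp(-8.0 * grid))     # hypothetical stand-in for a learned model
mae = mean_absolute_error(q_hat, q_ref)
print(f"MAE of the stand-in estimate: {mae:.4f}")
```

In higher dimensions the reference would have to come from a PDE solver or long unbiased simulations, so the comparison is only as reliable as that reference.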

Figures

Figures reproduced from arXiv: 2604.13213 by Carles Domingo-Enrich, Dinghuai Zhang, Eric Vanden-Eijnden, Jiajun He, Yuanqi Du.

**Figure 2.** Transition path ensemble and hitting time comparison between TPS and REACT-VM.

**Figure 3.** Visualization of reactive flux on the three potentials with

**Figure 4.** Mean absolute error map between ground truth and learned committor functions.

Original abstract

Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.
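The steering effect described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a 1D overdamped particle in the double well `(x^2 - 1)^2`, uses the exact quadrature committor instead of a learned one, and integrates with plain Euler-Maruyama. All names and parameter values (`fraction_reaching_b`, `beta = 5.0`, the cap of `100.0` on the control) are choices made here for illustration.

```python
import numpy as np

def potential(x):
    return (x**2 - 1.0) ** 2

def grad_potential(x):
    return 4.0 * x * (x**2 - 1.0)

def committor_1d(beta, a=-1.0, b=1.0, n_grid=4001):
    """Return grid, q, dq/dx with q(x) = int_a^x e^{beta U} / int_a^b e^{beta U}."""
    grid = np.linspace(a, b, n_grid)
    w = np.exp(beta * potential(grid))
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(grid))])
    return grid, cum / cum[-1], w / cum[-1]

def fraction_reaching_b(controlled, beta=5.0, x0=-0.8, a=-1.0, b=1.0,
                        n_traj=200, dt=1e-3, max_steps=20000, seed=0):
    """Fraction of trajectories that reach x >= b before x <= a."""
    rng = np.random.default_rng(seed)
    grid, q, dq = committor_1d(beta, a, b)
    x = np.full(n_traj, x0)
    active = np.ones(n_traj, dtype=bool)
    hit_b = np.zeros(n_traj, dtype=bool)
    for _ in range(max_steps):
        if not active.any():
            break
        xa = x[active]
        drift = -grad_potential(xa)
        if controlled:
            # Extra drift 2/beta * d/dx log q (assumed overdamped form).
            qa = np.maximum(np.interp(xa, grid, q), 1e-12)
            ctrl = (2.0 / beta) * np.interp(xa, grid, dq) / qa
            # Cap the singular control near a so explicit Euler steps stay bounded.
            drift = drift + np.minimum(ctrl, 100.0)
        noise = np.sqrt(2.0 * dt / beta) * rng.standard_normal(xa.size)
        x[active] = xa + drift * dt + noise
        hit_b |= active & (x >= b)          # record product hits
        active &= (x > a) & (x < b)         # absorb at either boundary
    return hit_b.mean()

p_ctrl = fraction_reaching_b(controlled=True)
p_free = fraction_reaching_b(controlled=False)
print(f"reached B first: controlled={p_ctrl:.3f}, uncontrolled={p_free:.3f}")
```

In continuous time the controlled process reaches the product with probability one; the cap on the control and the finite step size mean the discrete simulation can still lose a few trajectories to the reactant boundary. In the paper's setting the committor is unknown and must be learned, which is where the two losses come in.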

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper recasts committor estimation as a hitting-time SOC problem with two new losses and a metastability sampler, but the control-sampling loop risks a biased fixed point.

The core move is to treat the committor as defining a feedback control proportional to the gradient of its logarithm, then solve the resulting hitting-time problem with a direct backpropagation loss plus an off-policy value-matching loss that carries first-order optimality guarantees. The authors also introduce an alternative sampling process that lowers effective barriers while aiming to keep the reactive current intact. This framing is new relative to standard TPT and rare-event methods, and the guarantees and the metastability handling stand out as the useful engineering contributions. The benchmarks are reported to give more accurate committor values, rates, and equilibrium constants than existing approaches, which would matter if the gains are real and stable across systems.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a stochastic optimal control (SOC) framework for estimating the committor function central to Transition Path Theory for rare events. The committor defines a feedback control proportional to its log-gradient that steers trajectories toward reactive regions for efficient sampling. Two complementary objectives are developed: a direct backpropagation loss and an off-policy Value Matching loss for which first-order optimality guarantees are established. An alternative sampling process is proposed to address metastability by preserving the reactive current while lowering effective barriers. Benchmark results are reported to show markedly improved accuracy in committor estimates, reaction rates, and equilibrium constants relative to existing methods.

Significance. If the central claims hold, the work offers a principled control-theoretic approach to rare-event sampling that could improve efficiency and accuracy in computational studies of biomolecular transitions, chemical reactions, and phase changes. The first-order optimality guarantees for the Value Matching loss and the explicit treatment of metastability via controlled sampling represent concrete strengths that distinguish the contribution from purely heuristic importance-sampling techniques.

major comments (2)

[SOC formulation and Value Matching loss (methods section)] The iterative construction in which the feedback control (derived from the current committor estimate) generates the trajectories used to train the next estimate creates an implicit fixed-point problem. The first-order optimality guarantees established for the Value Matching loss do not automatically guarantee convergence to the unbiased committor when intermediate metastable basins are present; any systematic under-sampling of certain reactive channels would bias both the committor and the derived rates. A convergence analysis or targeted numerical test on multi-basin systems is needed to support the central accuracy claims.
[Alternative sampling process for metastability] The alternative sampling process is stated to preserve the reactive current while lowering barriers, yet the manuscript provides no explicit derivation or proof that the resulting measure remains equivalent to the original reactive current. Without this, the claim that reaction rates and equilibrium constants remain unbiased cannot be verified, particularly when gradient estimation or barrier-lowering approximations are inexact.
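For reference, the quantity at issue in the second comment can be written down explicitly in the simplest setting. The block below gives the standard Transition Path Theory expressions under the assumption of overdamped Langevin dynamics $dX_t = -\nabla U\,dt + \sqrt{2\beta^{-1}}\,dW_t$ with unit friction; the notation ($\rho$, $J_{AB}$, $k_{AB}$, $S$) is chosen here and may not match the paper, which may also treat other dynamics.

```latex
\begin{align*}
  % Equilibrium density
  \rho(x) &= Z^{-1} e^{-\beta U(x)}, \\
  % Reactive current carried by A-to-B trajectories
  J_{AB}(x) &= \beta^{-1} \rho(x)\, \nabla q(x), \\
  % The current is divergence-free wherever the committor equation holds
  \nabla \cdot J_{AB} &= \rho\, \bigl(-\nabla U \cdot \nabla q + \beta^{-1} \Delta q\bigr) = 0
      \quad \text{outside } A \cup B, \\
  % Reaction rate: flux through any dividing surface S between A and B
  k_{AB} &= \int_S J_{AB} \cdot n \, d\sigma
      = \beta^{-1} \int_{(A \cup B)^c} \rho(x)\, |\nabla q(x)|^2 \, dx .
\end{align*}
```

A proof that a modified sampling process "preserves the reactive current" would need to show that its own flux through every dividing surface $S$ matches $J_{AB}$, or that a stated reweighting recovers it.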

minor comments (2)

[Results and benchmarks] The abstract and results sections would benefit from explicit reporting of error bars, number of independent runs, and data-exclusion criteria for the benchmark comparisons, as these details are required to assess the statistical significance of the reported accuracy gains.
[Introduction to SOC formulation] Notation for the control policy and the precise definition of the hitting-time objective should be introduced with a short equation reference early in the methods to improve readability for readers unfamiliar with SOC.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important aspects of the iterative procedure and the sampling process that warrant clarification and additional support. We address each point below and have revised the manuscript to strengthen the presentation.

Referee: [SOC formulation and Value Matching loss (methods section)] The iterative construction in which the feedback control (derived from the current committor estimate) generates the trajectories used to train the next estimate creates an implicit fixed-point problem. The first-order optimality guarantees established for the Value Matching loss do not automatically guarantee convergence to the unbiased committor when intermediate metastable basins are present; any systematic under-sampling of certain reactive channels would bias both the committor and the derived rates. A convergence analysis or targeted numerical test on multi-basin systems is needed to support the central accuracy claims.

Authors: We agree that the iterative fixed-point structure requires care, and that first-order optimality conditions for the Value Matching loss do not by themselves rule out bias from incomplete exploration of reactive channels in multi-basin landscapes. In the revised manuscript we have added a short convergence argument (Section 3.2) showing that, because the Value Matching loss is off-policy and uses the exact importance weights derived from the controlled dynamics, any stationary point of the iteration satisfies the committor PDE under the original measure. To provide concrete evidence, we have included a new numerical experiment on a triple-well potential with two intermediate basins; the committor, rate, and equilibrium constant all converge to reference values obtained from long unbiased simulations, with the error monotonically decreasing over iterations. These results are reported in the new Figure 5 and accompanying text.
revision: yes
Referee: [Alternative sampling process for metastability] The alternative sampling process is stated to preserve the reactive current while lowering barriers, yet the manuscript provides no explicit derivation or proof that the resulting measure remains equivalent to the original reactive current. Without this, the claim that reaction rates and equilibrium constants remain unbiased cannot be verified, particularly when gradient estimation or barrier-lowering approximations are inexact.

Authors: We thank the referee for noting the missing derivation. In the revised version we have inserted an explicit proof (Section 3.3 and Appendix B) that the alternative sampling dynamics preserve the reactive current exactly: the modified drift is constructed so that the probability flux through any dividing surface between reactant and product is identical to that of the original reactive trajectories, while the potential is lowered only in the non-reactive regions. Consequently the reweighted expectations for rates and equilibrium constants remain unbiased; any residual error arises solely from finite-sample gradient estimation, which is controlled by the same Value Matching loss. The proof also states the precise conditions under which the barrier-lowering approximation remains valid.
revision: yes

Circularity Check

0 steps flagged

No circularity: committor-SOC formulation is solved via independent losses with optimality guarantees

The paper casts committor estimation as an SOC problem in which the committor defines a feedback control for sampling, then introduces backpropagation and off-policy Value Matching losses together with first-order optimality guarantees and an alternative sampling process that preserves reactive current. No derivation step reduces the final committor, rates, or constants to the inputs by construction; the losses are derived to have the true committor as their minimizer, the guarantees are stated independently of the target result, and benchmark accuracy is reported as external validation. The fixed-point character of the iteration is addressed by the theoretical claims rather than assumed, so the central result remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract relies on standard Transition Path Theory assumptions about the committor and reactive trajectories but introduces no explicit free parameters, new axioms, or invented entities beyond the control formulation itself.

axioms (1)

domain assumption: Transition Path Theory provides a rigorous statistical framework in which the committor encodes all essential kinetic and thermodynamic information.
Invoked in the first paragraph as background for the new control formulation.
