Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space

Giosue Migliorini; Padhraic Smyth

arxiv: 2510.12916 · v2 · submitted 2025-10-14 · stat.ML · cs.LG

Pith reviewed 2026-05-18 07:12 UTC · model grok-4.3

keywords: continuous-time Markov chains · sequential Monte Carlo · hidden Markov models · interacting particle systems · twisted SMC · SIRS model · inference

The pith

Interacting continuous-time Markov chains admit tractable posterior inference through parameterized twist potentials in twisted SMC.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Latent Interacting Particle Systems to parameterize the generators of each chain in a collection of interacting continuous-time Markov chains. It develops an efficient way to estimate look-ahead functions, or twist potentials, that incorporate future observations into the inference, then folds this approximation into a twisted Sequential Monte Carlo procedure. The resulting sampler targets the intractable posterior process that arises when discrete-time observations are added via Doob's h-transform. A reader would care because these models describe real processes such as epidemic spread on networks and wildfire dynamics, yet exact inference has remained out of reach in high dimensions.

Core claim

We introduce Latent Interacting Particle Systems, a model class parameterizing the generator of each Markov chain in the system. Our inference method involves estimating look-ahead functions (twist potentials) that anticipate future information, for which we introduce an efficient parameterization. We incorporate this approximation in a twisted Sequential Monte Carlo sampling scheme. We demonstrate the effectiveness of our approach on a challenging posterior inference task for a latent SIRS model on a graph, and on a neural model for wildfire spread dynamics trained on real data.

What carries the argument

Latent Interacting Particle Systems, which parameterize the infinitesimal generator of each chain so that interactions and the effect of future observations can be expressed through twist potentials.
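To make the role of the twist potentials concrete, here is a minimal sketch of how a look-ahead function can reweight per-node jump rates. It follows the standard Doob h-transform identity (twisted rate = base rate × h(new configuration) / h(current configuration)), which is how we read the twist-induced proposal quoted in the Lean-theorem section of this page. The SIRS rate form, the parameter values, and the names `base_rates`, `twisted_rates`, and `toy_twist` are illustrative assumptions, not the paper's parameterization.

```python
import numpy as np

S, I, R = 0, 1, 2  # node states of a SIRS model


def base_rates(z, adjacency, infect=0.5, recover=0.4, wane=0.05):
    """Hypothetical SIRS rates: rates[i, s] is node i's rate of jumping to state s."""
    z = np.asarray(z)
    rates = np.zeros((len(z), 3))
    pressure = adjacency @ (z == I).astype(float)  # infected-neighbour counts
    for i, state in enumerate(z):
        if state == S:
            rates[i, I] = infect * pressure[i]
        elif state == I:
            rates[i, R] = recover
        else:
            rates[i, S] = wane
    return rates


def twisted_rates(z, adjacency, twist):
    """Scale each single-node rate by twist(z with node i set to s) / twist(z)."""
    rates = base_rates(z, adjacency)
    h_now = twist(tuple(z))
    out = np.zeros_like(rates)
    for i, s in zip(*np.nonzero(rates)):
        z_new = list(z)
        z_new[int(i)] = int(s)
        out[i, s] = rates[i, s] * twist(tuple(z_new)) / h_now
    return out


def toy_twist(z, strength=0.7):
    """Assumed toy look-ahead: future observations favour more infected nodes."""
    return float(np.exp(strength * sum(state == I for state in z)))


if __name__ == "__main__":
    path_graph = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    print(twisted_rates((I, S, S), path_graph, toy_twist))
```

With a constant twist the rates are unchanged, so the prior dynamics are recovered; a learned twist would replace `toy_twist` and depend on future observations.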

If this is right

Posterior inference becomes feasible for high-dimensional systems such as epidemic models on large graphs.
Discrete-time noisy observations can be incorporated into continuous-time dynamics without requiring exact conditioning.
Neural parameterizations of the dynamics, as in the wildfire example, can be trained using the approximate posterior samples.
The same twisted sampling scheme extends to other auxiliary-variable problems that produce intractable h-transformed processes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The parameterization of twist potentials may transfer to other continuous-time models that admit similar look-ahead approximations.
Scalability tests on graphs with thousands of nodes would clarify whether the method remains practical when interaction structure grows denser.
Replacing the current twist parameterization with learned neural approximators could further reduce bias in very large systems.

Load-bearing premise

The efficient parameterization of the twist potentials provides a sufficiently accurate approximation to the true look-ahead functions so that the twisted SMC sampler yields useful posterior samples even in high-dimensional interacting systems.

What would settle it

On a small graph where exact posterior samples or long-run MCMC references are computable, the twisted SMC output deviates substantially in marginal statistics or predictive checks from the reference distribution.

Figures

Figures reproduced from arXiv: 2510.12916 by Giosue Migliorini, Padhraic Smyth.

**Figure 2.** Latent trajectory reconstruction, measured by cross-entropy loss on the test set of ground truth trajectories with respect to the posterior approximations. Error bars correspond to two standard errors computed across trajectories. Methods. For our TwistNet parameterization, we experiment learning the twist using both our forward KL loss from equation 18 and the DRE loss introduced by (Lawson et al., 202…

**Figure 3.** Evolution of the parameters and log relative parameter error (…

**Figure 4.** Observed active fire maps from WildfireSpreadTS (Gerard et al., 2023) and snapshot…

**Figure 5.** Latent trajectory reconstruction of each inference method, measured by Brier score of true latent trajectories with respect to the posterior approximations. Methods using TwistNet are highlighted by a thicker line. Error bars correspond to two standard errors. Evaluation. In our latent trajectory inference experiment we are interested in understanding whether our method can be used to perform posterio…

**Figure 6.** Evolution of the parameters and log relative parameter error (…

**Figure 7.** First example of latent trajectories, with counts of each state over the graph at…

**Figure 8.** Second example of latent trajectories, with counts of each state over the graph at…

**Figure 9.** Samples in between observations at times…

**Figure 10.** Samples in between observations at times…

**Figure 11.** VIIRS covariates and empirical distribution of active fires obtained from samples…

**Figure 12.** Third and fourth examples of wildfire trajectories.

**Figure 13.** Fifth and sixth examples of wildfire trajectories.

Original abstract

Systems of interacting continuous-time Markov chains are a powerful model class, but inference is typically intractable in high dimensional settings. Auxiliary information, such as noisy observations, is typically only available at discrete times, and incorporating it via a Doob's $h$-transform gives rise to an intractable posterior process that requires approximation. We introduce Latent Interacting Particle Systems, a model class parameterizing the generator of each Markov chain in the system. Our inference method involves estimating look-ahead functions (twist potentials) that anticipate future information, for which we introduce an efficient parameterization. We incorporate this approximation in a twisted Sequential Monte Carlo sampling scheme. We demonstrate the effectiveness of our approach on a challenging posterior inference task for a latent SIRS model on a graph, and on a neural model for wildfire spread dynamics trained on real data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces Latent Interacting Particle Systems with a compact twist-potential parameterization for twisted SMC on coupled CTMCs, but lacks error bounds or exact comparisons for the approximation.

The main takeaway is that this work gives a concrete parameterization for generators in systems of interacting continuous-time Markov chains and pairs it with an efficient form for the twist potentials used in twisted SMC. That combination is the actual new piece, and it is not just a rephrasing of prior abstract twisted-SMC results. They show the method on a latent SIRS model on a graph and on a neural wildfire-spread model fit to real data, which are reasonable test cases for the target applications in epidemiology and spatial dynamics. The setup is practical and the experiments are end-to-end, so the reader can see that the sampler runs on non-trivial instances. The soft spot is exactly the one flagged in the stress test: there is no derivation or numerical check showing that the chosen twist parameterization stays close to the true Doob h-transform look-ahead function, either uniformly in dimension or in a small tractable regime where the exact quantity can be computed. The reported results therefore leave open whether performance comes from good approximation quality or from the base models being tolerant of moderate error. The citation pattern looks standard and the math is presented at the level needed for a methods paper, but the missing error control is the load-bearing gap. This is for readers already working on inference for continuous-time interacting systems who need a workable sampler rather than a fully rigorous one. A serious referee should see it because the problem is well-motivated and the proposed fix is specific enough to critique in detail. I would send it to review and ask for at least one controlled comparison against exact look-ahead on a low-dimensional case.

Referee Report

2 major / 2 minor

Summary. The paper introduces Latent Interacting Particle Systems (LIPS), a model class that parameterizes the generator of each Markov chain within a system of interacting continuous-time Markov chains. For posterior inference given discrete-time auxiliary observations, it develops an efficient parameterization of look-ahead functions (twist potentials) that approximate the Doob h-transform and embeds this approximation inside a twisted Sequential Monte Carlo sampler. The approach is illustrated on a latent SIRS epidemic model defined on a graph and on a neural-network parameterization of wildfire spread dynamics fitted to real data.

Significance. If the twist-potential approximation is shown to be sufficiently accurate, the work would supply a practical route to scalable inference for high-dimensional coupled CTMCs, a setting that arises in epidemiology, ecology, and environmental modeling. The explicit construction of a new model class together with a reusable parameterization for twisted SMC constitutes a concrete, potentially extensible contribution that could be reproduced and extended by others working on continuous-time interacting systems.

major comments (2)

[Section describing the twist-potential parameterization] The central claim that the efficient parameterization of twist potentials yields a sufficiently accurate approximation to the true look-ahead functions (so that twisted SMC produces useful posterior samples in high-dimensional interacting systems) is not supported by any derivation or uniform error bound. No analysis is given showing that the chosen functional form controls approximation error as a function of system size or interaction strength.
[Experimental results section] The experimental demonstrations on the latent SIRS graph model and the wildfire neural model are entirely end-to-end. There are no comparisons against exact look-ahead functions (or Doob h-transform) in any low-dimensional tractable regime, nor ablation studies that isolate the contribution of twist-approximation error versus other modeling choices.

minor comments (2)

[Notation and preliminaries] Notation for generators, twist potentials, and the resulting twisted process should be introduced once and used consistently; several symbols appear to be redefined across sections.
[Abstract and introduction] The abstract and introduction would benefit from a short explicit statement of the approximation error that the new parameterization is intended to control.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which identify key opportunities to strengthen the theoretical discussion and empirical validation in our work. We address each major comment below and describe the corresponding revisions.

Referee: [Section describing the twist-potential parameterization] The central claim that the efficient parameterization of twist potentials yields a sufficiently accurate approximation to the true look-ahead functions (so that twisted SMC produces useful posterior samples in high-dimensional interacting systems) is not supported by any derivation or uniform error bound. No analysis is given showing that the chosen functional form controls approximation error as a function of system size or interaction strength.

Authors: We acknowledge that the manuscript does not include a formal derivation or uniform error bounds for the twist-potential approximation. Deriving such bounds for general coupled CTMCs is technically challenging because the exact Doob h-transform depends on the full interaction structure and future observations in a non-local manner. Our parameterization is instead motivated by the need for a computationally tractable form that respects the conditional independence structure of the latent interacting particle system while incorporating auxiliary information. We will revise the relevant section to provide a clearer exposition of the design choices, their relation to the exact h-transform, and the regimes (e.g., weak interactions or localized observations) in which the approximation is expected to remain accurate. We will also add a limitations paragraph explicitly noting the absence of uniform guarantees.

revision: partial

Referee: [Experimental results section] The experimental demonstrations on the latent SIRS graph model and the wildfire neural model are entirely end-to-end. There are no comparisons against exact look-ahead functions (or Doob h-transform) in any low-dimensional tractable regime, nor ablation studies that isolate the contribution of twist-approximation error versus other modeling choices.

Authors: We agree that controlled experiments in tractable regimes would strengthen the evidence for the twist approximation. In the revised manuscript we will add a new subsection presenting results on small, low-dimensional SIRS graphs (e.g., 3–5 nodes) where the exact look-ahead functions can be computed via matrix exponentiation or dynamic programming. These experiments will report quantitative error metrics between the parameterized twists and the exact h-transform. We will also include ablation studies that compare twisted SMC against untwisted SMC and against simpler twist forms, thereby isolating the contribution of the proposed parameterization from other modeling decisions.

revision: yes
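The kind of tractable reference the rebuttal mentions (exact look-ahead functions via matrix exponentiation) can be sketched for a small finite-state chain as follows. This is an illustration under stated assumptions, not the authors' experiment: the helper names `expm`, `exact_twists`, and `twist_at`, the single-node generator `Q`, and the observation likelihoods are all hypothetical, and we adopt the convention that the twist at an observation time includes that observation's likelihood.

```python
import numpy as np


def expm(a, terms=20):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    norm = np.linalg.norm(a, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = a / (2.0 ** squarings)
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def exact_twists(Q, obs_times, obs_liks):
    """Backward recursion: h[k][z] = E[prod_{j>=k} g_j(Z_{tau_j}) | Z_{tau_k} = z]."""
    K = len(obs_times)
    h = [None] * K
    h[K - 1] = np.array(obs_liks[K - 1], dtype=float)
    for k in range(K - 2, -1, -1):
        gap = obs_times[k + 1] - obs_times[k]
        h[k] = obs_liks[k] * (expm(Q * gap) @ h[k + 1])
    return h


def twist_at(Q, t, obs_times, h):
    """Exact look-ahead h_t(z) for any t no later than the last observation time."""
    k = int(np.searchsorted(obs_times, t, side="left"))  # first tau_k >= t
    return expm(Q * (obs_times[k] - t)) @ h[k]


if __name__ == "__main__":
    # Hypothetical single-node SIRS generator over states (S, I, R).
    Q = np.array([[-0.3, 0.3, 0.0], [0.0, -0.4, 0.4], [0.05, 0.0, -0.05]])
    taus = np.array([1.0, 2.5, 4.0])
    liks = [np.array([0.8, 0.1, 0.1]), np.array([0.1, 0.8, 0.1]), np.array([0.1, 0.1, 0.8])]
    h = exact_twists(Q, taus, liks)
    print(twist_at(Q, 0.0, taus, h))
```

For a graph with n nodes the joint state space has 3^n configurations, so this reference is only practical for a handful of nodes, which is the regime where it would serve as a ground truth for a learned twist.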

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper introduces Latent Interacting Particle Systems as a novel model class by directly parameterizing the generator of each Markov chain in the interacting system, then proposes an efficient parameterization for the twist potentials (look-ahead functions) that approximate the intractable Doob h-transform posterior process. These elements are incorporated into a twisted SMC sampler as a new approximation scheme. No load-bearing step reduces a claimed result or prediction to its own inputs by construction, such as defining a quantity in terms of itself, fitting a parameter on a subset and renaming the fit as a prediction, or relying on a self-citation chain whose content is unverified. The derivation builds on standard twisted SMC concepts with the new parameterization and model class providing independent content, and effectiveness is shown via empirical demonstration on SIRS and wildfire examples rather than tautological reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review; no explicit free parameters, axioms, or invented entities beyond the new model class are stated. The central claim rests on the unverified accuracy of the twist-potential approximation in high dimensions.

invented entities (1)

Latent Interacting Particle Systems · no independent evidence
purpose: Parameterize the generator of each Markov chain in an interacting system
New model class introduced to enable the subsequent inference method.

pith-pipeline@v0.9.0 · 5665 in / 1084 out tokens · 31883 ms · 2026-05-18T07:12:00.571983+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce Latent Interacting Particle Systems, a model class parameterizing the generator of each Markov chain... estimating look-ahead functions (twist potentials)... twisted Sequential Monte Carlo sampling scheme.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

Twist-induced proposal... $r^{\theta,\psi}_{i,t}(z^i \mid z_t) := r^{\theta}_{i,t}(z^i \mid z_t)\, s^{\psi}_{i,t}(z^i, z_t)$ ... $s^{\psi}_{i,t}(z^i, z_t) := h^{\psi}_t(z_t^{i \to z^i}) / h^{\psi}_t(z_t)$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 8 internal anchors

[1]

Foundation inference models for Markov jump processes. arXiv preprint arXiv:2406.06419,

David Berghaus, Kostadin Cvejoski, Patrick Seifner, Cesar Ojeda, and Ramses J Sanchez. Foundation inference models for Markov jump processes. arXiv preprint arXiv:2406.06419,

[2]

Reweighted Wake-Sleep

Jörg Bornschein and Yoshua Bengio. Reweighted wake-sleep. arXiv preprint arXiv:1406.2751,

[3]

doi: 10.1007/978-3-030-47845-2

ISBN 978-3-030-47844-5 978-3-030-47845-2. doi: 10.1007/978-3-030-47845-2. URL https://link.springer.com/10.1007/978-3-030-47845-2. Nicolas Chopin, Andras Fulop, Jeremy Heng, and Alexandre H. Thiery. Computational Doob’s h-transforms for Online Filtering of Discretely Observed Diffusions, May

[4]

arXiv:2206.03369 [stat]

URL http://arxiv.org/abs/2206.03369. arXiv:2206.03369 [stat]. Keith C Clarke, James A Brass, and Philip J Riggan. A cellular automation model of wildfire propagation and extinction. Photogrammetric Engineering and Remote Sensing, 60(11):1355–1367,

[5]

Guided simulation of conditioned chemical reaction networks. arXiv preprint arXiv:2312.04457,

Marc Corstanje and Frank van der Meulen. Guided simulation of conditioned chemical reaction networks. arXiv preprint arXiv:2312.04457,

[6]

Discrete flow matching. arXiv preprint arXiv:2407.15595,

URL http://arxiv.org/abs/2407.15595. arXiv:2407.15595 [cs]. Sebastian Gerard, Yu Zhao, and Josephine Sullivan. Wildfirespreadts: A dataset of multimodal time series for wildfire spread prediction. Advances in Neural Information Processing Systems, 36:74515–74529,

[7]

The iterated auxiliary particle filter

URL http://arxiv.org/abs/1511.06286. arXiv:1511.06286 [stat]. Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using networkx. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman, editors, Proceedings of the 7th Python in Science Conference, pages 11–15, Pasadena, CA USA,

[8]

Retrobridge: Modeling retrosynthesis with Markov bridges. arXiv preprint arXiv:2308.16212,

Ilia Igashov, Arne Schneuing, Marwin Segler, Michael Bronstein, and Bruno Correia. Retrobridge: Modeling retrosynthesis with Markov bridges. arXiv preprint arXiv:2308.16212,

[9]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,

[10]

arXiv:1805.10469 [stat]

URL http://arxiv.org/abs/1805.10469. arXiv:1805.10469 [stat]. Cheuk Kit Lee, Paul Jeha, Jes Frellsen, Pietro Lio, Michael Samuel Albergo, and Francisco Vargas. Debiasing Guidance for Discrete Diffusion with Sequential Monte Carlo, February

[11]

arXiv preprint arXiv:2502.06079

URL http://arxiv.org/abs/2502.06079. arXiv:2502.06079 [cs]. Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, and Masatoshi Uehara. Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding, October

[12]

arXiv preprint arXiv:2408.08252

URL http://arxiv.org/abs/2408.08252. arXiv:2408.08252 [cs]. Xuechen Li, Ting-Kam Leonard Wong, Ricky TQ Chen, and David Duvenaud. Scalable gradients for stochastic differential equations. In International Conference on Artificial Intelligence and Statistics, pages 3870–3882. PMLR,

[13]

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise

Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, and Cho-Jui Hsieh. Neural SDE: Stabilizing neural ODE networks with stochastic noise. arXiv preprint arXiv:1906.02355,

[14]

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion language modeling by estimating the ratios of the data distribution. arXiv preprint arXiv:2310.16834,

[15]

Guidance for twisted particle filter: a continuous-time perspective

Jianfeng Lu and Yuliang Wang. Guidance for twisted particle filter: a continuous-time perspective, September 2024. URL http://arxiv.org/abs/2409.02399. arXiv:2409.02399 [stat]. Robert T McGibbon and Vijay S Pande. Efficient maximum likelihood parameterization of continuous-time Markov processes. The Journal of chemical physics, 143(3),

[16]

Variational Sequential Monte Carlo

URL https://arxiv.org/abs/1705.11140v2. Christian A. Naesseth, Fredrik Lindsten, and Thomas B. Schön. Elements of Sequential Monte Carlo, December

[17]

and Lindsten, Fredrik and Sch

URL http://arxiv.org/abs/1903.04797. arXiv:1903.04797 [stat]. Hunter Nisonoff, Junhao Xiong, Stephan Allenspach, and Jennifer Listgarten. Unlocking Guidance for Discrete State-Space Diffusion and Flow Models, March

[18]

Unlocking guidance for discrete state-space diffusion and flow models

URL http://arxiv.org/abs/2406.01572. arXiv:2406.01572 [cs]. James R Norris. Markov chains. Cambridge university press,

[19]

arXiv:2410.05602 [stat]

URL http://arxiv.org/abs/2410.05602. arXiv:2410.05602 [stat]. William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205,

[20]

Prapas, S

Ioannis Prapas, Spyros Kondylatos, Ioannis Papoutsis, Gustau Camps-Valls, Michele Ronco, Miguel-Ángel Fernández-Torres, Maria Piles Guillem, and Nuno Carvalhais. Deep learning methods for daily wildfire danger forecasting. arXiv preprint arXiv:2111.02736,

[21]

Score-based continuous-time discrete diffusion models

URL http://arxiv.org/abs/2211.16750. arXiv:2211.16750 [cs]. Belinda Tzen and Maxim Raginsky. Neural stochastic differential equations: Deep latent Gaussian models in the diffusion limit. arXiv preprint arXiv:1905.09883,

[22]

Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review, January 2025

Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, and Tommaso Biancalani. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review. arXiv preprint arXiv:2501.09685,

[23]

arXiv preprint arXiv:2209.14734 (2022) 13

Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. Digress: Discrete denoising diffusion for graph generation. arXiv preprint arXiv:2209.14734,

[24]

Fine-tuning discrete diffusion models via reward optimization with applications to dna and protein design

URL http://arxiv.org/abs/2410.13643. arXiv:2410.13643 [cs]. Christian Wildner and Heinz Koeppl. Moment-based variational inference for Markov jump processes. In International Conference on Machine Learning, pages 6766–6775. PMLR,

[25]

Deep Sets

URL http://arxiv.org/abs/1703.06114. arXiv:1703.06114 [cs]. Boqian Zhang, Jiangwei Pan, and Vinayak A Rao. Collapsed variational Bayes for Markov jump processes. Advances in Neural Information Processing Systems, 30,

[26]

org/abs/2404.17546

URL http://arxiv.org/abs/2404.17546. arXiv:2404.17546 [cs]. Yuchen Zhu, Wei Guo, Jaemoo Choi, Guan-Horng Liu, Yongxin Chen, and Molei Tao. Mdns: Masked diffusion neural sampler via stochastic optimal control. arXiv preprint arXiv:2508.10684,

[27]

Background. Inference for CTMCs. Inference methods for CTMCs have been extensively studied

Appendix A. Background. Inference for CTMCs. Inference methods for CTMCs have been extensively studied. Maximum likelihood estimation for time-homogeneous CTMCs is discussed in Jackson (2011); Bladt and Sørensen (2005); McGibbon and Pande (2015). Expectation-maximization techniques for continuous-time hidden Markov models can be found in Bureau et al. (2...

[28]

and variational methods. The latter include mean-field (Opper and Sanguinetti, 2007; Cohn et al., 2010), moment-based methods (Wildner and Koeppl, 2019), combinations with MCMC (Zhang et al., 2017), and extensions to hybrid processes (Köhs et al., 2021). More recent methods include black-box variational inference with neural networks (Seifner and Sánchez,...

[29]

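By definition, we can express the generator $L^\star_t$ of $P^\star_t$ as

$L^\star_t(f)(z) = \lim_{\Delta t \to 0} \dfrac{\mathbb{E}_{P^\star}[f(Z_{t+\Delta t}) \mid Z_t = z] - f(z)}{\Delta t}$ (26)

$= \lim_{\Delta t \to 0} \dfrac{\mathbb{E}_{P}\left[f(Z_{t+\Delta t})\, \dfrac{h^\star_{t+\Delta t}(Z_{t+\Delta t})}{h^\star_t(z)} \,\middle|\, Z_t = z\right] - f(z)}{\Delta t}$ (27)

Moreover, we can approximate $h^\star_{t+\Delta t}(z)$ for $t \in [\tau_k, \tau_{k+1} - \Delta t)$, $k \in [1 : K]$ using a Taylor expansion around time $t$

$h^\star_{t+\Delta t}(z) = h^\star_t(z) + \Delta t\, \dfrac{\partial h^\star_t(z)}{\partial t} + o(\Delta t)$ (28)

$= h^\star_t$...
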
[30]

The time and memory cost of the loss terms can be reduced to Θ(Cs) and Θ(Cr) by employing a Monte Carlo approximation of time, only considering a single timestep for each update

In our experiments, we let Φ bear the cost of heavy operations such as processing future observations, observation times, as well as covariates and positional information, while ρ is a simple two-layer MLP. The time and memory cost of the loss terms can be reduced to Θ(Cs) and Θ(Cr) by employing a Monte Carlo approximation of time, only considering a single time...

[31]

, G do 4: # Sleep phase 5: for n = 1,

3: for $g = 1, \dots, G$ do
4: # Sleep phase
5: for $n = 1, \dots, N$ do
6: for $b = 1, \dots, B$ do
7: Simulate $z^{(b)}_{[t_0:t_M]}$ from the prior $P_\theta$ via Euler steps.
8: Simulate synthetic observations $\tilde{y}^{(b)}_k \sim p_\theta(\cdot \mid z^{(b)}_{\tau_k})$, for $k = 1, \dots, K$.
9: if mc_loss then
10: $m \sim U(\{0, \dots, M-1\})$
11: $\ell^{(b)}_{\mathrm{sleep}} = -\log q^{\psi}_0(z^{(b)}_0) + M \sum_{i \in I} \Delta t_{m+1}\, r^{\theta,\psi}_{i,t_m}(z^{i,(b)}_{t_m} \mid z^{(b)}_{t_m}) - \delta$...

[32]

(2025) proposed to compute a first-order Taylor approximation of the log-twist evaluated at a specific value $z$, i.e.

+ 1 forward passes of the twist function, Nisonoff et al. (2025) proposed to compute a first-order Taylor approximation of the log-twist evaluated at a specific value $z$, i.e. $\log h^{\psi}_t(z_t) \approx \log h^{\psi}_t(z) + \mathbf{z}_t^{\top} \nabla_{\mathbf{z}} \log h^{\psi}_t(\mathbf{z})$ (47) where $\mathbf{z}, \mathbf{z}_t$ are one-hot encoded versions of $z, z_t$, enabling a single forward pass at $z$ of the twist function. Note that backpropagat...

[33]

Positive samples $z^+_t$ are generated by the forward model using ancestral sampling of $z^+_{[0,T]} \sim P_\theta$ first, and then $y_{1:K}, \tau_{1:K}$

used to learn the twist in NAS-X and TAG is $\hat{L}_{\mathrm{DRE}}(\psi) = \sum_{t \in \mathcal{T}} \sum_{i \in \mathcal{I}} \log \sigma(\log h^{\psi}_t(z^+_t; y_{\geq t}, \tau_{\geq t})) + \log(1 - \sigma(\log h^{\psi}_t(z^-_t; y_{\geq t}, \tau_{\geq t})))$, (48) where $\sigma : \mathbb{R} \to [0, 1]$ is the logistic function. Positive samples $z^+_t$ are generated by the forward model using ancestral sampling of $z^+_{[0,T]} \sim P_\theta$ first, and then $y_{1:K}, \tau_{1:K}$. Negative samples $z^-_t \sim P_\theta$, and are hence un...

[34]

Each node $i$ has a feature vector $\xi_i \in \mathbb{R}^{16}$, included to make posterior inference more challenging. Ground truth paths on $[0, T] = [0, 10]$ are drawn using Gillespie’s algorithm (Gillespie, 1977; Wilkinson, 2018), with rate parameters in equation 20 fixed to $(\alpha_0, \alpha_1, \beta, \gamma) = (0.1, 1.0, 0.4, 0.05)$. We assign 50 trajectories for the training set, and 50 to the...

[35]

D.3 Wildfires trajectories. Dataset. As mentioned in Section 4.2, we consider a subset of the trajectories in the WildfireSpreadTS dataset (Gerard et al., 2023). We filter them based on the following criteria: • We take a $64 \times 64$ crop at the center of the image at the day corresponding to the starting date in GlobFire (Artés et al., 2019), and consider trajec...
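
The appendix excerpt in entry [34] says ground-truth paths are drawn with Gillespie's algorithm. A minimal sketch of that simulation for a SIRS model on a graph is given below. Equation 20 is not reproduced on this page, so the rate form (a baseline infection rate plus a term proportional to the number of infected neighbours) and the mapping of the quoted values (0.1, 1.0, 0.4, 0.05) onto the hypothetical arguments `infect_base`, `infect_nbr`, `recover`, and `wane` are assumptions made for illustration only.

```python
import numpy as np

S, I, R = 0, 1, 2  # node states; transitions cycle S -> I -> R -> S


def node_rates(z, adjacency, infect_base, infect_nbr, recover, wane):
    """Assumed per-node exit rates; each node has exactly one possible jump."""
    pressure = adjacency @ (z == I).astype(float)  # infected-neighbour counts
    return np.where(z == S, infect_base + infect_nbr * pressure,
                    np.where(z == I, recover, wane))


def gillespie_sirs(z0, adjacency, horizon, rng, **rate_kwargs):
    """Simulate one exact path on [0, horizon); returns jump times and states."""
    t, z = 0.0, np.array(z0)
    times, states = [0.0], [z.copy()]
    while True:
        rates = node_rates(z, adjacency, **rate_kwargs)
        total = rates.sum()
        if total <= 0:
            break
        t += rng.exponential(1.0 / total)  # waiting time to the next jump
        if t >= horizon:
            break
        i = rng.choice(len(z), p=rates / total)  # which node jumps
        z[i] = (z[i] + 1) % 3
        times.append(t)
        states.append(z.copy())
    return np.array(times), np.array(states)


if __name__ == "__main__":
    n = 5
    ring = np.zeros((n, n))
    for i in range(n):
        ring[i, (i + 1) % n] = ring[(i + 1) % n, i] = 1.0
    times, states = gillespie_sirs(
        [I, S, S, S, S], ring, 10.0, np.random.default_rng(0),
        infect_base=0.1, infect_nbr=1.0, recover=0.4, wane=0.05,
    )
    print(len(times), states[-1])
```

Because the path is piecewise constant, storing only jump times and post-jump states is enough to evaluate the configuration at any observation time.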

[1] [1]

Foundation inference models for markov jump processes.arXiv preprint arXiv:2406.06419,

David Berghaus, Kostadin Cvejoski, Patrick Seifner, Cesar Ojeda, and Ramses J Sanchez. Foundation inference models for markov jump processes.arXiv preprint arXiv:2406.06419,

work page arXiv

[2] [2]

Reweighted Wake-Sleep

Jörg Bornschein and Yoshua Bengio. Reweighted wake-sleep.arXiv preprint arXiv:1406.2751,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

doi: 10.1007/978-3-030-47845-2

ISBN 978- 3-030-47844-5 978-3-030-47845-2. doi: 10.1007/978-3-030-47845-2. URL https://link. springer.com/10.1007/978-3-030-47845-2. Nicolas Chopin, Andras Fulop, Jeremy Heng, and Alexandre H. Thiery. Computational Doob’s h-transforms for Online Filtering of Discretely Observed Diffusions, May

work page doi:10.1007/978-3-030-47845-2

[4] [4]

arXiv:2206.03369 [stat]

URLhttp://arxiv.org/abs/2206.03369. arXiv:2206.03369 [stat]. Keith C Clarke, James A Brass, and Philip J Riggan. A cellular automation model of wildfire propagation and extinction.Photogrammetric Engineering and Remote Sensing, 60(11): 1355–1367,

work page arXiv

[5] [5]

Guided simulation of conditioned chemical reaction networks.arXiv preprint arXiv:2312.04457,

Marc Corstanje and Frank van der Meulen. Guided simulation of conditioned chemical reaction networks.arXiv preprint arXiv:2312.04457,

work page arXiv

[6] Discrete flow matching. arXiv preprint arXiv:2407.15595.
URL http://arxiv.org/abs/2407.15595. arXiv:2407.15595 [cs].
Sebastian Gerard, Yu Zhao, and Josephine Sullivan. WildfireSpreadTS: A dataset of multi-modal time series for wildfire spread prediction. Advances in Neural Information Processing Systems, 36:74515–74529.

[7] The iterated auxiliary particle filter.
URL http://arxiv.org/abs/1511.06286. arXiv:1511.06286 [stat].
Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman, editors, Proceedings of the 7th Python in Science Conference, pages 11–15, Pasadena, CA USA.

[8] Ilia Igashov, Arne Schneuing, Marwin Segler, Michael Bronstein, and Bruno Correia. RetroBridge: Modeling retrosynthesis with Markov bridges. arXiv preprint arXiv:2308.16212.

[9] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[10] URL http://arxiv.org/abs/1805.10469. arXiv:1805.10469 [stat].
Cheuk Kit Lee, Paul Jeha, Jes Frellsen, Pietro Lio, Michael Samuel Albergo, and Francisco Vargas. Debiasing Guidance for Discrete Diffusion with Sequential Monte Carlo, February

[11] URL http://arxiv.org/abs/2502.06079. arXiv:2502.06079 [cs].
Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, and Masatoshi Uehara. Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding, October

[12] URL http://arxiv.org/abs/2408.08252. arXiv:2408.08252 [cs].
Xuechen Li, Ting-Kam Leonard Wong, Ricky TQ Chen, and David Duvenaud. Scalable gradients for stochastic differential equations. In International Conference on Artificial Intelligence and Statistics, pages 3870–3882. PMLR.

[13] Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, and Cho-Jui Hsieh. Neural SDE: Stabilizing neural ODE networks with stochastic noise. arXiv preprint arXiv:1906.02355.

[14] Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion language modeling by estimating the ratios of the data distribution. arXiv preprint arXiv:2310.16834.

[15] Jianfeng Lu and Yuliang Wang. Guidance for twisted particle filter: a continuous-time perspective, September 2024. URL http://arxiv.org/abs/2409.02399. arXiv:2409.02399 [stat].
Robert T McGibbon and Vijay S Pande. Efficient maximum likelihood parameterization of continuous-time Markov processes. The Journal of Chemical Physics, 143(3).

[16] Variational Sequential Monte Carlo.
URL https://arxiv.org/abs/1705.11140v2.
Christian A. Naesseth, Fredrik Lindsten, and Thomas B. Schön. Elements of Sequential Monte Carlo, December

[17] URL http://arxiv.org/abs/1903.04797. arXiv:1903.04797 [stat].
Hunter Nisonoff, Junhao Xiong, Stephan Allenspach, and Jennifer Listgarten. Unlocking Guidance for Discrete State-Space Diffusion and Flow Models, March

[18] URL http://arxiv.org/abs/2406.01572. arXiv:2406.01572 [cs].
James R Norris. Markov chains. Cambridge University Press.

[19] URL http://arxiv.org/abs/2410.05602. arXiv:2410.05602 [stat].
William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205.

[20] Ioannis Prapas, Spyros Kondylatos, Ioannis Papoutsis, Gustau Camps-Valls, Michele Ronco, Miguel-Ángel Fernández-Torres, Maria Piles Guillem, and Nuno Carvalhais. Deep learning methods for daily wildfire danger forecasting. arXiv preprint arXiv:2111.02736.

[21] Score-based continuous-time discrete diffusion models.
URL http://arxiv.org/abs/2211.16750. arXiv:2211.16750 [cs].
Belinda Tzen and Maxim Raginsky. Neural stochastic differential equations: Deep latent Gaussian models in the diffusion limit. arXiv preprint arXiv:1905.09883.

[22] Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, and Tommaso Biancalani. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review. arXiv preprint arXiv:2501.09685, January 2025.

[23] Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. DiGress: Discrete denoising diffusion for graph generation. arXiv preprint arXiv:2209.14734, 2022.

[24] Fine-tuning discrete diffusion models via reward optimization with applications to DNA and protein design.
URL http://arxiv.org/abs/2410.13643. arXiv:2410.13643 [cs].
Christian Wildner and Heinz Koeppl. Moment-based variational inference for Markov jump processes. In International Conference on Machine Learning, pages 6766–6775. PMLR.

[25] Deep Sets.
URL http://arxiv.org/abs/1703.06114. arXiv:1703.06114 [cs].
Boqian Zhang, Jiangwei Pan, and Vinayak A Rao. Collapsed variational Bayes for Markov jump processes. Advances in Neural Information Processing Systems, 30.

[26] URL http://arxiv.org/abs/2404.17546. arXiv:2404.17546 [cs].
Yuchen Zhu, Wei Guo, Jaemoo Choi, Guan-Horng Liu, Yongxin Chen, and Molei Tao. MDNS: Masked diffusion neural sampler via stochastic optimal control. arXiv preprint arXiv:2508.10684.

[27] Appendix A. Background

Inference for CTMCs. Inference methods for CTMCs have been extensively studied. Maximum likelihood estimation for time-homogeneous CTMCs is discussed in Jackson (2011); Bladt and Sørensen (2005); McGibbon and Pande (2015). Expectation-maximization techniques for continuous-time hidden Markov models can be found in Bureau et al. (2...

[28] and variational methods. The latter include mean-field (Opper and Sanguinetti, 2007; Cohn et al., 2010), moment-based methods (Wildner and Koeppl, 2019), combinations with MCMC (Zhang et al., 2017), and extensions to hybrid processes (Köhs et al., 2021). More recent methods include black-box variational inference with neural networks (Seifner and Sánchez,...

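[29] By definition, we can express the generator $L^\star_t$ of $P^\star_t$ as

$$L^\star_t(f)(z) = \lim_{\Delta t \to 0} \frac{\mathbb{E}_{P^\star}\left[f(Z_{t+\Delta t}) \mid Z_t = z\right] - f(z)}{\Delta t} \tag{26}$$

$$= \lim_{\Delta t \to 0} \frac{\mathbb{E}_{P}\left[f(Z_{t+\Delta t})\, \frac{h^\star_{t+\Delta t}(Z_{t+\Delta t})}{h^\star_t(z)} \,\middle|\, Z_t = z\right] - f(z)}{\Delta t} \tag{27}$$

Moreover, we can approximate $h^\star_{t+\Delta t}(z)$ for $t \in [\tau_k, \tau_{k+1} - \Delta t)$, $k \in [1:K]$, using a Taylor expansion around time $t$:

$$h^\star_{t+\Delta t}(z) = h^\star_t(z) + \Delta t\, \frac{\partial h^\star_t(z)}{\partial t} + o(\Delta t) \tag{28}$$

=h ⋆ t...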
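For a finite state space, the reweighting by the ratio of look-ahead values in (27) corresponds to the standard Doob h-transform of the jump rates: the conditioned chain leaves z for z' at the prior rate multiplied by h(z')/h(z). The following is a minimal sketch under that assumption, for a single time instant and a single small chain. The names `h_transform_rates`, `Q`, and `h`, and all numbers, are illustrative and not taken from the paper.

```python
def h_transform_rates(Q, h):
    """Generator of the h-transformed chain at one time instant.

    Q : list of rows, prior generator (off-diagonals >= 0, rows sum to 0).
    h : list of strictly positive look-ahead values, one per state.
    """
    n = len(Q)
    Q_star = [[0.0] * n for _ in range(n)]
    for z in range(n):
        for z_next in range(n):
            if z_next != z:
                # Prior jump rate reweighted by the look-ahead ratio.
                Q_star[z][z_next] = Q[z][z_next] * h[z_next] / h[z]
        # Diagonal is minus the total exit rate, so each row sums to zero.
        Q_star[z][z] = -sum(Q_star[z][j] for j in range(n) if j != z)
    return Q_star


# Hypothetical 3-state prior generator and look-ahead values.
Q = [[-1.0, 0.7, 0.3],
     [0.2, -0.5, 0.3],
     [0.0, 0.4, -0.4]]
h = [0.1, 0.5, 2.0]
Q_star = h_transform_rates(Q, h)
```

Jumps toward states with a larger look-ahead value are sped up and jumps toward states with a smaller one are slowed down, while transitions with zero prior rate stay impossible.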


[30] In our experiments, we let $\Phi$ bear the cost of heavy operations such as processing future observations, observation times, as well as covariates and positional information, while $\rho$ is a simple two-layer MLP. The time and memory cost of the loss terms can be reduced to $\Theta(C_s)$ and $\Theta(C_r)$ by employing a Monte Carlo approximation of time, only considering a single time...
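The Monte Carlo approximation of time can be illustrated on a generic time-discretized sum: instead of evaluating all M terms, one draws a single index uniformly and rescales by M, which leaves the expectation unchanged. The sketch below is a toy illustration under that reading. The function names, step sizes, and per-step values are hypothetical, and `dts[m]` stands for the step that follows grid point m.

```python
import random


def full_time_sum(terms, dts):
    """Exact time-discretized sum: one evaluation per grid point."""
    return sum(dt * r for dt, r in zip(dts, terms))


def single_step_estimate(terms, dts, rng):
    """Estimate of the same sum from a single grid point."""
    M = len(terms)
    m = rng.randrange(M)  # m ~ U({0, ..., M-1})
    return M * dts[m] * terms[m]


rng = random.Random(0)
dts = [0.1, 0.2, 0.1, 0.4]    # hypothetical step sizes
terms = [1.5, 0.3, 2.0, 0.7]  # hypothetical per-step loss values

exact = full_time_sum(terms, dts)
# Expectation of the estimator, enumerated over all M equally likely indices.
expected = sum(len(terms) * dt * r for dt, r in zip(dts, terms)) / len(terms)
estimate = single_step_estimate(terms, dts, rng)
```

Here `expected` matches `exact` up to floating-point error, so the single-step estimator is unbiased. The price is extra variance, which averaging over a batch of trajectories can reduce.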


[31] 3: for $g = 1, \ldots, G$ do
4:   # Sleep phase
5:   for $n = 1, \ldots, N$ do
6:     for $b = 1, \ldots, B$ do
7:       Simulate $z^{(b)}_{[t_0:t_M]}$ from the prior $P_\theta$ via Euler steps.
8:       Simulate synthetic observations $\tilde{y}^{(b)}_k \sim p_\theta(\cdot \mid z^{(b)}_{\tau_k})$, for $k = 1, \ldots, K$.
9:       if mc_loss then
10:        $m \sim \mathcal{U}(\{0, \ldots, M-1\})$
11:        $\ell^{(b)}_{\mathrm{sleep}} = -\log q^{\psi}_0(z^{(b)}_0) + M \sum_{i \in \mathcal{I}} \Delta t_{m+1}\, r^{\theta,\psi}_{i,t_m}(z^{i,(b)}_{t_m} \mid z^{(b)}_{t_m})$ −δ zi,(...

[32] + 1 forward passes of the twist function, Nisonoff et al. (2025) proposed to compute a first-order Taylor approximation of the log-twist evaluated at a specific value $z$, i.e.

$$\log h^{\psi}_t(z_t) \approx \log h^{\psi}_t(z) + \mathbf{z}_t^{\top} \nabla_{\mathbf{z}} \log h^{\psi}_t(\mathbf{z}) \tag{47}$$

where $\mathbf{z}, \mathbf{z}_t$ are one-hot encoded versions of $z, z_t$, enabling a single forward pass at $z$ of the twist function. Note that backpropagat...

[33] used to learn the twist in NAS-X and TAG is

$$\hat{L}_{\mathrm{DRE}}(\psi) = \sum_{t \in \mathcal{T}} \sum_{i \in \mathcal{I}} \log \sigma\big(\log h^{\psi}_t(z^{+}_t; y_{\geq t}, \tau_{\geq t})\big) + \log\big(1 - \sigma(\log h^{\psi}_t(z^{-}_t; y_{\geq t}, \tau_{\geq t}))\big), \tag{48}$$

where $\sigma : \mathbb{R} \to [0, 1]$ is the logistic function. Positive samples $z^{+}_t$ are generated by the forward model using ancestral sampling of $z^{+}_{[0,T]} \sim P_\theta$ first, and then $y_{1:K}, \tau_{1:K}$. Negative samples $z^{-}_t \sim P_\theta$, and are hence un...
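The quantity in (48) is a binary classification log-likelihood whose logits are the log-twist values at positive and negative samples. A minimal sketch of how it could be evaluated is given below, assuming the log-twist values have already been computed and flattened over the time and chain indices. The function names and input numbers are illustrative, not the authors' implementation.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigma(x)) for the logistic function sigma."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dre_objective(log_h_pos, log_h_neg):
    """Sum of log sigma(a+) + log(1 - sigma(a-)) over paired samples.

    log_h_pos : log-twist values at positive samples (hypothetical inputs).
    log_h_neg : log-twist values at negative samples (hypothetical inputs).
    """
    total = 0.0
    for a_pos, a_neg in zip(log_h_pos, log_h_neg):
        # log(1 - sigma(a)) equals log sigma(-a), which avoids cancellation.
        total += log_sigmoid(a_pos) + log_sigmoid(-a_neg)
    return total


value = dre_objective([2.0, 0.5, 1.0], [-1.0, 0.0, -3.0])
uninformative = dre_objective([0.0], [0.0])  # both samples scored at sigma = 1/2
```

Using `log_sigmoid(-a)` in place of `log(1 - sigma(a))` keeps the computation finite for large positive or negative log-twist values.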


[34] Each node $i$ has a feature vector $\xi_i \in \mathbb{R}^{16}$, included to make posterior inference more challenging. Ground truth paths on $[0, T] = [0, 10]$ are drawn using Gillespie’s algorithm (Gillespie, 1977; Wilkinson, 2018), with rate parameters in equation 20 fixed to $(\alpha_0, \alpha_1, \beta, \gamma) = (0.1, 1.0, 0.4, 0.05)$. We assign 50 trajectories for the training set, and 50 to the...
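Gillespie's algorithm for a system of interacting chains can be sketched as follows. Since equation 20 is not reproduced here, the rate form is an assumption made purely for illustration: a susceptible node becomes infected at rate α0 + α1 × (fraction of infected neighbors), an infected node recovers at rate β, and a recovered node becomes susceptible again at rate γ. The ring graph, the initial state, and all function names are hypothetical as well.

```python
import random

S, I, R = 0, 1, 2  # node states, cycling S -> I -> R -> S


def node_rate(i, state, neighbors, alpha0, alpha1, beta, gamma):
    """Exit rate of node i under the ASSUMED SIRS rates (not equation 20)."""
    if state[i] == S:
        infected = sum(1 for j in neighbors[i] if state[j] == I)
        return alpha0 + alpha1 * infected / max(len(neighbors[i]), 1)
    if state[i] == I:
        return beta
    return gamma


def gillespie_sirs(neighbors, state0, T, params, rng):
    """Draw one exact path on [0, T] as a list of (time, state) pairs."""
    t, state = 0.0, list(state0)
    path = [(t, tuple(state))]
    while True:
        rates = [node_rate(i, state, neighbors, *params)
                 for i in range(len(state))]
        total = sum(rates)
        if total <= 0.0:
            break  # no transition is possible
        t += rng.expovariate(total)  # waiting time until the next jump
        if t >= T:
            break
        # The jumping node is chosen with probability proportional to its rate.
        i = rng.choices(range(len(state)), weights=rates)[0]
        state[i] = (state[i] + 1) % 3
        path.append((t, tuple(state)))
    return path


# Hypothetical ring graph with 6 nodes and a single initial infection.
n = 6
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
state0 = [I] + [S] * (n - 1)
params = (0.1, 1.0, 0.4, 0.05)  # (alpha0, alpha1, beta, gamma) from the text
path = gillespie_sirs(neighbors, state0, 10.0, params, random.Random(0))
```

Each event changes exactly one node, so the path is piecewise constant between the recorded jump times.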

