
arxiv: 2508.17090 · v3 · submitted 2025-08-23 · 📊 stat.ML · cs.LG

Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling

Pith reviewed 2026-05-18 21:24 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords neural SDE · compact state space · suicide risk modeling · EMA data · constrained dynamics · latent SDE · clinical time series · stochastic processes

The pith

Neural SDEs can be parameterized to remain inside any prescribed compact polyhedral state space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for building neural-network-parameterized stochastic differential equations whose solutions stay within a specified compact region of state space, such as the bounded range of responses in ecological momentary assessment (EMA) data on suicidal thoughts. Standard approaches often produce outputs that violate these natural bounds, which reduces trust in the model for clinical use. The authors derive constraints that the drift and diffusion must satisfy to keep solutions inside the space, for both general and stationary SDEs. They also give a practical way to turn any proposed neural dynamics into dynamics that obey these constraints. On real EMA datasets, including a suicide-risk study, the constrained models forecast better and train more stably than standard latent neural SDE baselines.

Core claim

We derive constraints on drift and diffusion for general and stationary SDEs so their solutions remain in the desired state space; and we introduce a parameterization that maps arbitrary (neural or expert-given) dynamics into constraint-satisfying SDEs. On several real EMA datasets, including a large suicide-risk study, our parameterization improves forecasts and optimization dynamics over standard latent neural SDE baselines.

What carries the argument

The parameterization of drift and diffusion that maps arbitrary dynamics into forms satisfying the derived constraints for keeping solutions inside the compact polyhedral state space.
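
As a rough one-dimensional illustration of what such a map can look like, the sketch below blends an unconstrained drift with a pull toward the center of K = [0, 1] and scales the diffusion by a weight that vanishes at the boundary. The weight `w`, the blend in `constrain`, the names `h_tilde` and `g_tilde`, and all constants are assumptions chosen for illustration; this is not the paper's Eq. 4 or Eq. 5.

```python
import math
import random

LO, HI = 0.0, 1.0          # K = [0, 1]
CENTER = 0.5 * (LO + HI)   # center of the interval
DELTA = 0.2                # width of the boundary layer (illustrative choice)

def w(z):
    """Weight: 0 on the boundary, 1 once z is at least DELTA inside K."""
    d = min(z - LO, HI - z)
    return max(0.0, min(1.0, d / DELTA))

def h_tilde(t, z):
    """Hypothetical unconstrained drift that pushes the state out of K."""
    return 2.0

def g_tilde(t, z):
    """Hypothetical unconstrained diffusion."""
    return 1.0

def constrain(h_raw, g_raw):
    """Map raw (drift, diffusion) to a pair that respects the boundary of K."""
    def h(t, z):
        wz = w(z)
        # Interior: raw drift. Near the boundary: pull toward the center.
        return wz * h_raw(t, z) + (1.0 - wz) * (CENTER - z)

    def g(t, z):
        # Diffusion is scaled to zero at the boundary.
        return w(z) * g_raw(t, z)

    return h, g

def euler_maruyama(h, g, z0, dt, n_steps, rng):
    """Simulate dz = h dt + g dB with a fixed-step Euler-Maruyama scheme."""
    z, path = z0, [z0]
    sqrt_dt = math.sqrt(dt)
    for k in range(n_steps):
        t = k * dt
        z = z + h(t, z) * dt + g(t, z) * sqrt_dt * rng.gauss(0.0, 1.0)
        path.append(z)
    return path

h, g = constrain(h_tilde, g_tilde)
rng = random.Random(0)
# Start close to the upper boundary to stress the construction.
path = euler_maruyama(h, g, z0=0.99, dt=1e-4, n_steps=10_000, rng=rng)
inside = all(LO <= z <= HI for z in path)
```

In the interior the constrained drift equals the raw drift, and at the endpoints the diffusion is zero and the drift points inward. A finite-step Euler-Maruyama scheme does not automatically inherit a continuous-time guarantee, so the small step size here matters: a much coarser step could overshoot the boundary.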

If this is right

  • Solutions of the SDE remain confined to the target compact domain by construction.
  • Training proceeds without the numerical instabilities that forced prior models to oversimplify dynamics.
  • Forecasts on real EMA suicide risk data improve relative to standard latent neural SDE baselines.
  • The construction extends SDE-based modeling to other domains that impose hard state constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same constraint approach could be extended to non-polyhedral domains such as spheres.
  • Expert-specified dynamics could be fed into the parameterization before neural components are added.
  • The method may enable safer real-time clinical decision tools that cannot output impossible values.

Load-bearing premise

The parameterization preserves sufficient expressiveness for modeling real EMA data without introducing new numerical instabilities.

What would settle it

Checking whether trajectories generated by the trained model ever exit the prescribed compact state space on held-out EMA sequences, or whether training still exhibits the instabilities of unconstrained neural SDEs.
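
A check of this kind could be sketched as follows, assuming the state space is written as a polyhedron K = {z : A z ≤ b} and that trajectories are available as lists of points. The function name `exit_report`, the tolerance, and the toy paths are hypothetical and only illustrate the bookkeeping.

```python
def exit_report(trajectory, A, b, tol=1e-9):
    """Return (first_exit_index or None, worst_violation) for K = {z : A z <= b}."""
    first_exit, worst = None, 0.0
    for k, z in enumerate(trajectory):
        # Largest amount by which any facet inequality a_i . z <= b_i is violated.
        violation = max(
            sum(a_ij * z_j for a_ij, z_j in zip(row, z)) - b_i
            for row, b_i in zip(A, b)
        )
        if violation > tol and first_exit is None:
            first_exit = k
        worst = max(worst, violation)
    return first_exit, worst

# Unit square [0, 1]^2 written as A z <= b.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 0.0, 1.0, 0.0]

ok_path = [(0.2, 0.5), (0.9, 0.1), (1.0, 0.0)]    # touches the boundary, never leaves
bad_path = [(0.2, 0.5), (1.05, 0.5), (0.7, 0.5)]  # leaves K at index 1

ok_result = exit_report(ok_path, A, b)
bad_result = exit_report(bad_path, A, b)
```

Reporting the worst violation alongside the first exit index helps separate genuine exits from solver round-off at the boundary.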

Figures

Figures reproduced from arXiv: 2508.17090 by Malinda Lu, Matthew K. Nock, Yaniv Yacoby, Yue-Jane Liu.

Figure 1: Intuition behind WSP for different polyhedra. Top left: w(z) from Eq. 4, approaching 0 at the boundaries and 1 in the interior. Top right: c_h(z) from Eq. 5, pointing towards the Chebyshev center ⋆. Bottom left: solutions to WSP SDE, successfully remaining in K. Bottom right: some unconstrained drift h̃ vs. WSP drift h (Eq. 5) matching in the interior of K, but differing near the bounds.
Figure 2: Top: WSP exhibits better inductive bias than baselines. Left: unconstrained SDE (Eq. 1) with NN quickly leaves K = [0, 1]. Middle: SDE transformed via sigmoid (Eqs. 2 and 3) sticks to the boundary. Right: SDE with WSP (Eq. 5) successfully remains in K. Bottom: Stationary WSP exhibits favorable inductive bias. Given a target time-marginal and WSP diffusion, drift derived from Theorem 3.3 yields an SDE viabl…
Figure 3: WSP exhibits better inductive bias than baselines given smooth, pathwise expansion of Brownian motion. Top left: Stratonovich-SDE with NN quickly leaves K = [0, 1]. Top & bottom right: Stratonovich-SDE transformed via sigmoid sticks to the boundary. Bottom left: Stratonovich-SDE with WSP successfully remains in K. Note: for Eqs. 2 and 3, we used the Stratonovich chain-rule instead of Itô's lemma.
Figure 4: Stationary SDE exhibits better inductive bias than baselines. Given a target time-marginal, given any diffusion with WSP, we can always derive a corresponding drift via Theorem 3.3 that is viable in K and has the target stationary distribution. Like the non-stationary dynamics, these dynamics overcome the shortcomings of the baselines dynamics in Eqs. 1–3. Here, our diffusion is a NN with randomly initiali…
Abstract

Ecological Momentary Assessment (EMA) studies enable the collection of high-frequency self-reports of suicidal thoughts and behaviors (STBs) via smartphones. Latent stochastic differential equations (SDEs) are a promising model class for EMA data, as it is irregularly sampled, noisy, and partially observed. But SDE-based models suffer from two key limitations. (a) These models often violate domain constraints, undermining scientific validity and clinical trust of the model. (b) Training is numerically unstable without ad hoc fixes (e.g. oversimplified dynamics) that are ill-suited for high-stakes applications. Here, we develop a novel class of expressive SDEs whose solutions are provably confined to a prescribed compact polyhedral state space, matching the domains of EMA data. In this work, (1) we show why chain-rule based constructions of SDEs on compact domains fail, theoretically and empirically; (2) we derive constraints on drift and diffusion for general and stationary SDEs so their solutions remain in the desired state space; and (3), we introduce a parameterization that maps arbitrary (neural or expert-given) dynamics into constraint-satisfying SDEs. On several real EMA datasets, including a large suicide-risk study, our parameterization improves forecasts and optimization dynamics over standard latent neural SDE baselines. These contributions pave the way for principled, trustworthy continuous-time models of suicide risk and other clinical time series and extend applications of SDE-based methods (e.g. diffusion models) to domains with hard state constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a class of neural SDEs whose solutions remain provably inside a prescribed compact polyhedral domain, motivated by irregularly sampled EMA data on suicidal thoughts and behaviors. It shows that standard chain-rule constructions fail to enforce the constraints, derives necessary conditions on the drift and diffusion coefficients for both general and stationary SDEs, and introduces a parameterization that converts arbitrary (neural or expert-specified) dynamics into constraint-satisfying versions. Experiments on multiple real EMA datasets, including a large suicide-risk study, report improved forecast accuracy and more stable optimization relative to unconstrained latent neural SDE baselines.

Significance. If the derived boundary conditions are necessary and sufficient and the parameterization is sufficiently expressive, the work supplies a principled mechanism for enforcing hard state-space constraints in continuous-time latent models. This would strengthen the applicability of SDE-based methods to clinical time series with bounded domains and could improve both scientific validity and numerical reliability in high-stakes settings.

major comments (2)
  1. [Parameterization section (following constraint derivations)] The parameterization (described after the constraint derivations) maps arbitrary neural dynamics f and g to constrained f' and g'. It is not shown that this map is surjective onto the set of all valid (drift, diffusion) pairs whose Itô processes remain inside the polyhedron. If the construction fixes components of the diffusion matrix or adds a fixed reflection term, the representable processes form a strict subclass; this would undermine the claim of 'arbitrary (neural or expert-given) dynamics' without domain-specific simplifications and weaken the interpretation of the reported gains on EMA data.
  2. [Theory section on diffusion constraints] The necessity of the normal-component-zero condition on the diffusion at the boundary is asserted for general SDEs. A concrete counter-example or a reference to the supporting stochastic-process argument (e.g., via local time or exit-time analysis) should be supplied to confirm that violating the condition can produce positive exit probability.
minor comments (2)
  1. [Notation and preliminaries] Notation for the polyhedral facets and inward normals should be introduced once and used consistently; currently the same symbol appears to be reused for different quantities in the stationary-case subsection.
  2. [Experiments] The experimental section reports improvements over 'standard latent neural SDE baselines' but does not list the exact baseline architectures, hyper-parameter search ranges, or number of random seeds; these details are needed to assess whether the gains are robust.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major point below, indicating planned revisions where appropriate.

  1. Referee: The parameterization (described after the constraint derivations) maps arbitrary neural dynamics f and g to constrained f' and g'. It is not shown that this map is surjective onto the set of all valid (drift, diffusion) pairs whose Itô processes remain inside the polyhedron. If the construction fixes components of the diffusion matrix or adds a fixed reflection term, the representable processes form a strict subclass; this would undermine the claim of 'arbitrary (neural or expert-given) dynamics' without domain-specific simplifications and weaken the interpretation of the reported gains on EMA data.

    Authors: We agree that the parameterization is not shown to be surjective onto every possible valid constrained (drift, diffusion) pair. It is constructed to take arbitrary input dynamics and adjust them to satisfy the necessary boundary conditions derived in the paper, rather than parametrizing the full set of constrained processes. This approach still fulfills the goal of allowing flexible neural or expert-specified dynamics while provably confining solutions to the polyhedron. The gains on EMA datasets relative to unconstrained baselines stem from this practical flexibility and improved numerical stability. In revision we will explicitly note the lack of a surjectivity claim and add discussion of the representable class of processes. revision: partial

  2. Referee: The necessity of the normal-component-zero condition on the diffusion at the boundary is asserted for general SDEs. A concrete counter-example or a reference to the supporting stochastic-process argument (e.g., via local time or exit-time analysis) should be supplied to confirm that violating the condition can produce positive exit probability.

    Authors: We will strengthen this section in revision. We will add a simple one-dimensional counterexample: an Itô process on [0,1] with diffusion coefficient identically equal to 1 (non-zero normal component at the endpoints) has positive probability of exiting in arbitrarily small time, as the exit time τ satisfies P(τ < ε) > 0 for every ε > 0. For the general case we will cite standard results on boundary behavior of diffusions (via local-time and exit-time analysis) establishing that a non-vanishing normal diffusion component implies positive exit probability. This material will be inserted into the Theory section. revision: yes
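
    The one-dimensional counterexample can be made quantitative with a short calculation. The sketch below assumes the process is standard Brownian motion started at z0 in (0, 1) and uses the reflection principle, P(max_{s ≤ ε} B_s ≥ a) = erfc(a / √(2ε)), to bound the probability of leaving through the upper endpoint. The function name `exit_prob_lower_bound` is hypothetical, and the bound ignores exits through 0, so it is a lower bound only.

```python
import math

def exit_prob_lower_bound(z0, eps):
    """Lower bound on P(tau <= eps) for dZ = dB on [0, 1] with Z_0 = z0 in (0, 1).

    If the running maximum of the Brownian displacement reaches a = 1 - z0 by
    time eps, the process has left [0, 1] by then (through 1, or earlier through 0).
    """
    a = 1.0 - z0
    return math.erfc(a / math.sqrt(2.0 * eps))

near_boundary = exit_prob_lower_bound(0.99, 1e-4)  # a equals one standard deviation
mid_interval = exit_prob_lower_bound(0.5, 0.01)    # small but strictly positive
```

    The bound is strictly positive for every ε > 0 in exact arithmetic, although for very small ε relative to the distance to the boundary it underflows to zero in floating point.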

Circularity Check

0 steps flagged

No significant circularity in mathematical derivations of SDE constraints


The paper's central claims rest on deriving constraints for drift and diffusion terms so that SDE solutions remain inside a prescribed compact polyhedron, plus an explicit parameterization that maps arbitrary dynamics into ones satisfying those boundary conditions. These steps are presented as first-principles results from stochastic calculus (showing why naive chain-rule constructions fail, then stating the necessary inward/tangent drift and zero-normal diffusion conditions). No quoted equation reduces a prediction to a fitted quantity defined by the same data, no self-citation is invoked as a uniqueness theorem that forces the construction, and the parameterization is introduced as a constructive map rather than an ansatz smuggled from prior work. The derivation chain is therefore self-contained and does not collapse to its inputs by definition. This matches the reader's assessment that the contribution is theoretical rather than a statistical tautology.
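
The two boundary conditions named above (drift that is inward or tangent, diffusion with zero normal component) can be checked numerically at a boundary point. The sketch below assumes K = {z : A z ≤ b} with each row of A an outward facet normal, and it covers only these two conditions; the paper's Theorem 3.2 may impose further requirements (for example growth or Lipschitz conditions) that are not tested here. The function name and the example values are hypothetical.

```python
def boundary_conditions_hold(z, drift, diffusion, A, b, tol=1e-9):
    """Check inward/tangent drift and zero-normal diffusion on facets active at z.

    drift: vector h(z). diffusion: matrix G(z), rows indexed by state dimension,
    columns indexed by Brownian component.
    """
    for row, b_i in zip(A, b):
        slack = b_i - sum(a * x for a, x in zip(row, z))
        if abs(slack) > tol:
            continue  # facet is not active at z
        # Drift must not point outward: <a_i, h> <= 0 for outward normal a_i.
        if sum(a * h for a, h in zip(row, drift)) > tol:
            return False
        # Diffusion must have no normal component: a_i^T G = 0, column by column.
        for col in zip(*diffusion):
            if abs(sum(a * g for a, g in zip(row, col))) > tol:
                return False
    return True

# Unit square [0, 1]^2 written as A z <= b.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 0.0, 1.0, 0.0]
z = (1.0, 0.5)  # on the facet z_1 = 1

tangent_noise = [[0.0, 0.0], [0.4, 0.1]]  # noise only along the facet
normal_noise = [[0.1, 0.0], [0.0, 0.1]]   # noise with a component across the facet

ok = boundary_conditions_hold(z, (-0.3, 0.2), tangent_noise, A, b)
bad_diffusion = boundary_conditions_hold(z, (-0.3, 0.2), normal_noise, A, b)
bad_drift = boundary_conditions_hold(z, (0.3, 0.0), tangent_noise, A, b)
```

A pointwise check like this can only falsify the conditions at sampled boundary points; it does not replace the proof that they hold everywhere on the boundary.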

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on standard stochastic calculus background while introducing new domain-specific constraints; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: SDE solutions remain inside a prescribed compact polyhedral state space when drift and diffusion satisfy derived inequalities.
    Central to the theoretical contribution stated in the abstract.

pith-pipeline@v0.9.0 · 5820 in / 1232 out tokens · 33725 ms · 2026-05-18T21:24:05.635869+00:00 · methodology



Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 5 internal anchors

  1. [1]

    Barron, J. T. Continuously differentiable exponential linear units. arXiv preprint arXiv:1704.07483,

  2. [2]

    Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

    Clevert, D.-A., Unterthiner, T., and Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289,

  3. [3]

    URL:https://math.stackexchange.com/q/2171798 (version: 2021-10-12)

    URL https://math.stackexchange.com/q/2171798. URL:https://math.stackexchange.com/q/2171798 (version: 2021-10-12). Fishman, N., Klarner, L., Bortoli, V . D., Mathieu, E., and Hutchinson, M. J. Diffusion models for constrained domains. Transactions on Machine Learning Research, 2023a. ISSN 2835-8856. Fishman, N., Klarner, L., Mathieu, E., Hutchinson, M., an...

  4. [4]

    Enhancing mixup-based semi-supervised learning with explicit Lipschitz regularization

    Gyawali, P., Ghimire, S., and Wang, L. Enhancing mixup-based semi-supervised learning with explicit Lipschitz regularization. In 2020 IEEE International Conference on Data Mining (ICDM), pp. 1046–1051. IEEE,

  5. [5]

    Gaussian Error Linear Units (GELUs)

    Hendrycks, D. and Gimpel, K. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415,

  6. [6]

    Learning smooth neural functions via Lipschitz regularization

    Liu, H.-T. D., Williams, F., Jacobson, A., Fidler, S., and Litany, O. Learning smooth neural functions via Lipschitz regularization. In ACM SIGGRAPH 2022 Conference Proceedings, pp. 1–13,

  7. [7]

    Spectral Normalization for Generative Adversarial Networks

    Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957,

  8. [8]

    Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro

    URL https://chrisorm.github.io/ SDE-S.html. Phan, D., Pradhan, N., and Jankowiak, M. Composable effects for flexible and accelerated probabilistic programming in numpyro. arXiv preprint arXiv:1912.11554,

  9. [9]

    Extending Theorem 3.2 to Stratonovich SDEs on Compact Polyhedra Corollary A.1

    8 Neural SDEs on Compact State-Spaces A. Extending Theorem 3.2 to Stratonovich SDEs on Compact Polyhedra Corollary A.1. Suppose that the drift and diffusion, h(t, zt) and g(t, zt), of a Stratonovich SDE, defined for t ≥ 0 and zt ∈ RDz, satisfy conditions (i)-(iii) from Theorem 3.2. Suppose further that for all T > 0, zt, z′ t ∈ K, and t ∈ [0, T], ∥diag(∇z...

  10. [10]

    Proof of Theorem 3.3 Proof

    B. Proof of Theorem 3.3 Proof. To prove Theorem 3.3, we will show that the form for the drift listed in condition (a) results in a stationary SDE. Then, we will prove that this stationary SDE satisfies all conditions from Theorem 3.2, implying it is viable in K. We find h by drawing inspiration from the derivation Cai & Lin (1996), which sets the Fokker-P...

  11. [11]

    Proof that WSP satisfies (ii) from Theorem 3.2

    · cg(zt)∥ (86) = ∥˜g(t, zt)∥ + ∥cg(zt)∥ (87) = ∥˜g(t, zt)∥ + ∥0∥ (88) = ∥˜g(t, zt)∥ (89) ≤ p CT · (1 + ∥zt∥2) (90) Thus, ∥g(t, zt)∥2 ≤ CT · (1 + ∥zt∥2). Proof that WSP satisfies (ii) from Theorem 3.2. We now prove that for all T > 0, zt, z′ t ∈ K, and t ∈ [0, T], ∥h(t, zt) − h(t, z′ t)∥ + ∥g(t, zt) − g(t, z′ t)∥ ≤ CT · ∥zt − z′ t∥. We do this as follows: ...

  12. [12]

    Lipschitz smoothness,

    · cg(zt)) ⊙ ed, vs⟩ (112) = ⟨cg(zt) ⊙ ed, vs⟩ (113) = ⟨0 ⊙ ed, vs⟩ (114) = ⟨0, vs⟩ (115) = 0 (116) D. Discussion of Assumptions The assumptions in Theorems 3.2 and 3.3 are easily satisfied when h, g, and log ˜p(zt) are parameterized by NNs. Lipschitz continuity with respect to inputs. Lipschitz continuous functions are closed under composition, making a l...

  13. [13]

    1–3), we solve SDEs given by NNs h and g with randomly sampled weights

    against baselines (Eqs. 1–3), we solve SDEs given by NNs h and g with randomly sampled weights. We define the viable region, K = [0, 1], to be a compact rectangle, and specifically choose to set z0 = 0.99 near the boundary to stress-test the chain-rule based SDEs in Eqs. 2 and 3 to show that once close to the boundary, they will struggle to return to the ...

  14. [14]

    with NumPyro (Phan et al., 2019), Diffrax (Kidger,

  15. [15]

    Gaussian Assumed Approximation

    WSP exhibits better inductive bias than baselines given smooth, pathwise expansion of Brownian motion. Top left: Stratonovich-SDE with NN quickly leaves K = [0 , 1]. Top & bottom right: Stratonovich-SDE transformed via sigmoid sticks to the boundary. Bottom left: Stratonovich-SDE with WSP successfully remains in K. Note: for Eqs. 2 and 3, we used the Stra...

  16. [16]

    This expansion replaces Brownian motion, dBt, with a randomly weighted sum of ODEs, allowing us to use an ODE solver

    · π · t 2T · ξr · dt, (117) where T is the end-time of the process. This expansion replaces Brownian motion, dBt, with a randomly weighted sum of ODEs, allowing us to use an ODE solver. As R → ∞ , the distribution of dbBt converges to that of dBt, and overall differential equation converges to the Stratonovich SDE (Wong & Zakai, 1965). In Fig. 3, we empir...