
arXiv: 2602.04907 · v2 · pith:MBUUJWD3new · submitted 2026-02-03 · 💻 cs.LG · cs.AI · stat.ME

Causal Discovery from Heteroscedastic Stochastic Dynamical Systems under Imperfect Physical Models

Pith reviewed 2026-05-21 13:13 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ME
keywords causal discovery · stochastic differential equations · dynamical systems · heteroscedastic SDE · physical models · graph recovery · ODE misspecification

The pith

A framework recovers causal structures in dynamical systems by modeling known physics in the SDE drift and unknown couplings in the diffusion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes integrating partial physical knowledge from ODEs into causal discovery for dynamical systems using stochastic differential equations. The known dynamics are placed in the drift term, allowing the diffusion term to reveal causal interactions that go beyond the prescribed physics. This approach handles cyclic interactions and nonstationarity, providing recovery guarantees for both stable and unstable systems along with robustness to imperfect physical models. Experiments demonstrate better performance than purely data-driven methods on simulated and real epidemic data.

Core claim

The authors develop a causal discovery framework for heteroscedastic SDEs where the drift encodes known ODE dynamics and the diffusion term captures unknown causal couplings. They introduce a sparsity-inducing maximum quasi-likelihood estimator with a stabilization technique, prove causal graph recovery guarantees under mild conditions for stable and unstable SDEs, and show robustness to ODE misspecification.
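A minimal sketch of what a sparsity-inducing, stabilized quasi-likelihood fit could look like for a single node. Everything specific here is an assumption for illustration and not the authors' estimator: the node's noise scale is taken to be linear in the state (`u_k = theta @ X_k`), `r` holds drift-corrected increments divided by `sqrt(dt)`, the constant `c > 0` stands in for the stabilization, `lam` is the L1 weight, and the solver is plain proximal gradient descent.

```python
import numpy as np

def stabilized_nll(theta, X, r, c):
    """Assumed stabilized Gaussian quasi-likelihood for one node (illustrative).
    X: (n, p) states, r: (n,) normalized residuals, c > 0: stabilization constant."""
    v = (X @ theta) ** 2 + c            # stabilized variance u_k^2 + c
    return np.mean(0.5 * np.log(v) + 0.5 * r ** 2 / v)

def stabilized_grad(theta, X, r, c):
    """Analytic gradient of stabilized_nll with respect to theta."""
    u = X @ theta
    v = u ** 2 + c
    w = u / v - r ** 2 * u / v ** 2     # per-sample derivative with respect to u_k
    return X.T @ w / len(r)

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_node(X, r, lam, c, step=0.05, iters=500):
    """Proximal gradient descent on stabilized_nll + lam * ||theta||_1.
    Nonzero entries of the result are read as candidate parents of the node."""
    theta = np.full(X.shape[1], 0.5)    # nonzero start: theta = 0 is a stationary point
    for _ in range(iters):
        theta = soft_threshold(theta - step * stabilized_grad(theta, X, r, c), step * lam)
    return theta
```

In this sketch the objective is nonconvex and unchanged when `theta` flips sign, so only the support of the fit is meaningful, and the step size and starting point would need tuning on real data.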

What carries the argument

The heteroscedastic SDE model with fixed drift from ODE physics and learnable diffusion matrix representing causal structure.
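In symbols, the model class can be sketched as follows. The notation is assumed here and not taken from the paper: $f$ is the known ODE vector field, $G_\theta$ is a state-dependent diffusion matrix with learnable parameters $\theta$, and $W_t$ is a standard Brownian motion.

```latex
\[
  dX_t = \underbrace{f(X_t)}_{\text{known physics}}\,dt
       + \underbrace{G_\theta(X_t)}_{\text{learned couplings}}\,dW_t,
  \qquad X_t \in \mathbb{R}^p .
\]
\[
  \text{One natural reading of the graph: } j \to i
  \iff \text{row } i \text{ of } G_\theta(x) \text{ depends on } x_j .
\]
```

"Heteroscedastic" refers to the noise scale $G_\theta(X_t)$ varying with the state; the edge rule in the second line is an interpretation of the summary, not a quoted definition.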

If this is right

  • Causal graphs can be recovered from time series data of both stable and unstable dynamical systems.
  • The estimate remains reliable even when the ODE model is misspecified.
  • The stabilization technique improves optimization while preserving statistical recoverability.
  • Graph recovery improves over data-driven baselines on nonlinear benchmarks such as Lotka-Volterra and Lorenz dynamics, including cyclic structures.
  • Stochastic SIR dynamics can be reconstructed from real-world epidemic data.
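For orientation, the textbook deterministic SIR equations are the kind of known physics that would sit in the drift term. The sketch below encodes only that standard ODE; the values of `beta` and `gamma` are illustrative, and the paper's stochastic SIR parameterization is not given in this summary.

```python
import numpy as np

def sir_drift(x, beta=0.3, gamma=0.1):
    """Deterministic SIR vector field for x = (S, I, R).
    beta: transmission rate, gamma: recovery rate (illustrative values)."""
    S, I, R = x
    N = S + I + R                      # total population, conserved by the ODE
    infection = beta * S * I / N       # flow S -> I
    recovery = gamma * I               # flow I -> R
    return np.array([-infection, infection - recovery, recovery])

d = sir_drift(np.array([990.0, 10.0, 0.0]))
```

The three components sum to zero, which reflects conservation of the total population in the deterministic model.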

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could be extended to systems with partial knowledge of higher-order dynamics.
  • Combining physics and data-driven causal discovery may improve forecasting in complex real-world processes.
  • The method suggests a way to handle nonstationary data without assuming equilibrium.
  • Future work might test the framework on high-dimensional systems beyond the current benchmarks.

Load-bearing premise

The system can be represented as a heteroscedastic SDE where the drift term exactly encodes the known ODE dynamics and the diffusion term captures the unknown causal couplings.
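A short Euler-Maruyama simulation makes the premise concrete. This is a sketch under assumptions that are not taken from the paper: each node's noise scale is linear in the state (`Theta[i] @ x`), a nonzero `Theta[i, j]` is read as an edge from node `j` to node `i`, and the drift in the example is a hypothetical linear decay.

```python
import numpy as np

def simulate_sde(drift, Theta, x0, dt, n_steps, seed=0):
    """Euler-Maruyama path of dX_i = drift(X)_i dt + (Theta[i] @ X) dW_i.
    drift: known ODE vector field; Theta: (p, p) assumed diffusion couplings."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for k in range(n_steps):
        noise = rng.normal(size=x.size) * np.sqrt(dt)   # Brownian increments
        x = x + drift(x) * dt + (Theta @ x) * noise     # state-dependent noise scale
        path[k + 1] = x
    return path

# Hypothetical example: linear decay as the known physics, one noise coupling 1 -> 0.
Theta = np.array([[0.0, 0.3],
                  [0.0, 0.0]])
path = simulate_sde(lambda x: -x, Theta, x0=[1.0, 1.0], dt=0.01, n_steps=1000)
```

Node 1 has no diffusion term in this example, so its trajectory is the deterministic Euler solution of the drift alone, while node 0 fluctuates with an amplitude set by node 1.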

What would settle it

A simulation in which the true dynamics follow a known heteroscedastic SDE, the supplied drift is misspecified, and the recovered causal graph fails to match the ground truth would falsify the robustness analysis.
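One heuristic for how such a test interacts with the sampling step, offered as intuition and not as the paper's robustness argument: if the estimator works with drift-corrected increments divided by `sqrt(dt)`, a drift error `delta(x)` shifts each residual by `-delta(x) * sqrt(dt)`. The sketch below checks this on a noise-free path; the function name and the constant offset `delta` are hypothetical.

```python
import numpy as np

def normalized_residuals(path, drift, dt):
    """r_k = (X_{k+1} - X_k - drift(X_k) * dt) / sqrt(dt), per step and node."""
    x_now, x_next = path[:-1], path[1:]
    f = np.apply_along_axis(drift, 1, x_now)   # drift evaluated row by row
    return (x_next - x_now - f * dt) / np.sqrt(dt)

# Noise-free Euler path under the "true" drift f(x) = -x (hypothetical example).
dt, delta = 0.01, 0.2
path = np.empty((101, 2))
path[0] = [1.0, 2.0]
for k in range(100):
    path[k + 1] = path[k] - path[k] * dt

r_true = normalized_residuals(path, lambda x: -x, dt)          # correct drift
r_mis = normalized_residuals(path, lambda x: -x + delta, dt)   # drift off by +delta
```

With the correct drift the residuals vanish up to rounding; with the offset drift every residual equals `-delta * sqrt(dt)`, so in this toy setting the bias shrinks as the step size decreases.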

Figures

Figures reproduced from arXiv: 2602.04907 by Jianhong Chen, Naichen Shi, Xubo Yue.

Figure 1: A motivating example of causal discovery under partially known physics.
Figure 2: Results on DAGs (stable system): mean/std over 10 runs (SHD, TPR, FDR). The full …
Figure 3: Results on directed loop graphs (stable system): mean/std over 10 runs (SHD, TPR, …
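The three metrics named in the captions can be computed from adjacency matrices as sketched below. The convention is an assumption: `A[i, j] = 1` encodes an edge from `j` to `i`, and SHD is taken as the number of extra plus missing directed edges, so a reversed edge counts twice. Some papers count a reversal once, and the summary does not say which variant the authors use.

```python
import numpy as np

def graph_metrics(A_true, A_est):
    """Return (SHD, TPR, FDR) for binary directed adjacency matrices."""
    A_true = np.asarray(A_true, dtype=bool)
    A_est = np.asarray(A_est, dtype=bool)
    tp = int(np.sum(A_true & A_est))     # edges correctly recovered
    fp = int(np.sum(~A_true & A_est))    # spurious edges
    fn = int(np.sum(A_true & ~A_est))    # missed edges
    shd = fp + fn                        # simple structural Hamming distance
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return shd, tpr, fdr

A_true = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
A_est = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
result = graph_metrics(A_true, A_est)    # one hit, one spurious edge, one miss
```

Because the matrices are compared entry by entry, cyclic graphs and self-loops need no special handling in this variant.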
Original abstract

Causal discovery is a data-driven paradigm for analyzing complex systems, while physics-based models, such as ordinary differential equations (ODEs), provide mechanistic structure for real-world dynamical processes. Integrating these paradigms can improve identifiability, stability, and robustness. However, real dynamical systems often exhibit cyclic interactions and nonstationarity, whereas many causal discovery methods rely on acyclicity, stationarity, or equilibrium assumptions. We propose an integrative causal discovery framework for dynamical systems that leverages partial physical knowledge through stochastic differential equations (SDEs). The drift term encodes known ODE dynamics, while the diffusion term captures unknown causal couplings beyond the prescribed physics. We develop a scalable sparsity-inducing maximum quasi-likelihood estimator with a theoretically justified stabilization technique to improve the optimization landscape. Under mild conditions, we establish causal graph recovery guarantees for both stable and unstable SDEs. We also analyze robustness of our causal graph estimate to ODE misspecification and clarify how the introduced stabilization technique balances numerical stability and statistical recoverability. Experiments on linear SDEs and nonlinear benchmarks, including Lotka-Volterra and Lorenz dynamics with acyclic and cyclic structures, show improved graph recovery and robustness over data-driven baselines. We also demonstrate practical utility on real-world epidemic data by reconstructing stochastic SIR dynamics within our causal discovery framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an integrative causal discovery framework for dynamical systems modeled as heteroscedastic SDEs, with the drift term encoding known (but imperfect) ODE physics and the diffusion term capturing unknown causal couplings. It develops a sparsity-inducing maximum quasi-likelihood estimator equipped with a stabilization technique, establishes causal graph recovery guarantees under mild conditions for both stable and unstable SDEs, analyzes robustness to ODE misspecification, and reports improved performance over data-driven baselines on linear SDEs, nonlinear benchmarks (Lotka-Volterra, Lorenz with acyclic/cyclic structures), and real epidemic data via stochastic SIR reconstruction.

Significance. If the recovery guarantees and robustness analysis hold, the work provides a principled way to leverage partial physical knowledge for causal discovery in cyclic and nonstationary systems, addressing limitations of acyclicity or stationarity assumptions in existing methods. The explicit treatment of unstable SDEs and misspecification, together with the empirical validation on nonlinear benchmarks and real-world data, strengthens the contribution; the stabilization technique is presented as theoretically justified and practically useful.

major comments (2)
  1. [§4 (Recovery Guarantees)] The claim of causal graph recovery for unstable SDEs rests on the stabilized quasi-likelihood estimator remaining consistent when trajectories diverge, yet the argument invokes the same mild conditions as the stable case without an explicit uniform integrability or moment-control argument to bound the integrated quasi-likelihood (involving inverse diffusion and derivatives) under unbounded growth. This is load-bearing for the central claim covering both stable and unstable regimes.
  2. [§3.2 (Stabilization Technique, Eq. defining the stabilized estimator)] The stabilization parameter is described as balancing numerical stability and statistical recoverability, but its effect on the sparsity penalty's ability to select diffusion-term edges under partial ODE misspecification is not fully characterized; if the stabilization introduces implicit normalization that depends on trajectory scale, it could undermine identifiability of the causal structure in the diffusion term.
minor comments (2)
  1. [Experiments] The experimental section would benefit from explicit reporting of the number of independent trajectories, discretization step size, and exact hyperparameter ranges for the sparsity and stabilization parameters to support reproducibility of the benchmark improvements.
  2. [Method] Notation for the quasi-likelihood versus its stabilized version should be clarified in the equations to distinguish the stabilization from standard maximum-likelihood forms.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive feedback on our manuscript. We address each major comment point-by-point below, indicating revisions where appropriate to strengthen the theoretical arguments.

Point-by-point responses
  1. Referee: The claim of causal graph recovery for unstable SDEs rests on the stabilized quasi-likelihood estimator remaining consistent when trajectories diverge, yet the argument invokes the same mild conditions as the stable case without an explicit uniform integrability or moment-control argument to bound the integrated quasi-likelihood (involving inverse diffusion and derivatives) under unbounded growth. This is load-bearing for the central claim covering both stable and unstable regimes.

    Authors: We appreciate this observation on the load-bearing nature of the argument. The current proof extends the stable-case analysis by invoking the stabilization to control estimator growth, but we agree that an explicit uniform integrability step is not detailed for the unstable regime. In the revised manuscript, we will add a supporting lemma in §4 that establishes moment bounds on the integrated quasi-likelihood (including terms involving the inverse diffusion and its derivatives) under the paper's mild growth conditions on the SDE coefficients, ensuring the consistency result holds uniformly for both regimes. revision: yes

  2. Referee: The stabilization parameter is described as balancing numerical stability and statistical recoverability, but its effect on the sparsity penalty's ability to select diffusion-term edges under partial ODE misspecification is not fully characterized; if the stabilization introduces implicit normalization that depends on trajectory scale, it could undermine identifiability of the causal structure in the diffusion term.

    Authors: Thank you for this comment. The stabilization parameter is introduced as a fixed scalar (independent of trajectory scale) chosen to ensure well-conditioned optimization while preserving the population identifiability of the diffusion coefficients. Our robustness analysis already shows that edge selection in the diffusion term remains consistent under bounded misspecification, as the sparsity penalty acts on the stabilized estimator whose asymptotic distribution is unaffected by the fixed stabilization. We will nevertheless expand §3.2 with an additional remark and short derivation clarifying that the stabilization does not introduce scale-dependent normalization that alters identifiability or the sparsity-induced selection under the considered misspecification model. revision: yes
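To make the stability-versus-recoverability question concrete, here is a worked derivation for one assumed form of the stabilized loss. The form is an assumption: a per-node noise scale $u_k = \theta^\top X_k$, a fixed constant $c > 0$ added to the variance, and a normalized residual $r_k$. It is chosen because its second derivative reproduces the bracketed weight in the Eq. (112) fragment that appears in the extract at the end of this page; the summary itself does not state the paper's definition.

```latex
\[
\ell_c(\theta) = \frac{1}{n}\sum_{k=0}^{n-1}
  \left[\tfrac{1}{2}\log\!\big(u_k^2 + c\big) + \frac{r_k^2}{2\,(u_k^2 + c)}\right],
\qquad u_k = \theta^\top X_k ,
\]
\[
\nabla \ell_c(\theta) = \frac{1}{n}\sum_{k=0}^{n-1}
  \left[\frac{u_k}{u_k^2 + c} - \frac{r_k^2\,u_k}{(u_k^2 + c)^2}\right] X_k ,
\]
\[
\nabla^2 \ell_c(\theta) = \frac{1}{n}\sum_{k=0}^{n-1}
  \left[\frac{c - u_k^2}{(u_k^2 + c)^2} - \frac{r_k^2\,(c - 3u_k^2)}{(u_k^2 + c)^3}\right] X_k X_k^\top .
\]
```

Since $|c - u_k^2| \le u_k^2 + c$ and $|c - 3u_k^2| \le 3(u_k^2 + c)$, the two bracketed Hessian weights are bounded in absolute value by $1/c$ and $3 r_k^2 / c^2$. Under this assumed form a larger fixed $c$ tames the curvature, while $c$ also enters the fitted variance, which is where a cost in recoverability could arise.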

Circularity Check

0 steps flagged

No significant circularity in derivation of causal recovery guarantees

full rationale

The paper develops a new integrative framework using SDEs where the drift encodes known ODE dynamics and the diffusion term captures unknown causal couplings. It introduces a sparsity-inducing maximum quasi-likelihood estimator equipped with a stabilization technique, then states that under mild conditions causal graph recovery guarantees hold for both stable and unstable SDEs, with additional robustness analysis to ODE misspecification. No quoted step reduces a claimed prediction or guarantee to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified within the paper. The stabilization is presented as theoretically justified inside the derivation rather than smuggled via prior self-work or ansatz. The overall chain therefore remains self-contained against external statistical theory for quasi-likelihood estimators in SDEs and does not match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on partial physical knowledge being encodable in the drift term, statistical assumptions for quasi-likelihood, and unspecified mild conditions for recovery; no new physical entities are postulated.

free parameters (2)
  • sparsity regularization parameter
    Induces sparsity in the causal graph estimate within the maximum quasi-likelihood objective.
  • stabilization parameter
    Balances numerical stability and statistical recoverability in the optimization technique.
axioms (2)
  • domain assumption: Mild conditions sufficient for causal graph recovery in stable and unstable SDEs
    Invoked to establish theoretical guarantees for both stable and unstable systems.
  • domain assumption: The system can be represented as a heteroscedastic SDE with separable drift and diffusion terms
    Underpins the encoding of known ODE dynamics in drift and unknown causal couplings in diffusion.

pith-pipeline@v0.9.0 · 5762 in / 1462 out tokens · 44936 ms · 2026-05-21T13:13:21.032137+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

  1. [1]

    Alqudah, M. A., Shah, K., Abdeljawad, T., and Mofarreh, F. Study on nonlinear fractional order conformable model of infectious disease using updated analytical techniques. Fractals, 33(04):2540080, 2025

  2. [2]

    Bellot, A., Branson, K., and van der Schaar, M. Neural graphical modelling in continuous-time: consistency guarantees and algorithms. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=SsHBkfeRF9L

  3. [3]

    Benjamini, Y. and Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995

  4. [4]

    Bierkens, J., Gaans, O., and Verduyn Lunel, S. Estimate on the pathwise Lyapunov exponent of linear stochastic differential equations with constant coefficients. Stochastic Analysis and Applications, 28:747–762, 09 2010. doi: 10.1080/07362994.2010.503453

  5. [5]

    Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. Neural ordinary differential equations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pp. 6572–6583, Red Hook, NY, USA, 2018. Curran Associates Inc.

  6. [6]

    Dettling, P., Drton, M., and Kolar, M. On the lasso for graphical continuous Lyapunov models. In Locatello, F. and Didelez, V. (eds.), Proceedings of the Third Conference on Causal Learning and Reasoning, volume 236 of Proceedings of Machine Learning Research, pp. 514–550. PMLR, 01–03 Apr 2024. URL https://proceedings.mlr.press/v236/dettling24a.html

  7. [7]

    Feng, K., Jiang, H., Yin, C., and Sun, H. Gene regulatory network inference based on causal discovery integrating with graph neural network. Quantitative Biology, 11:434–450, 12 2023. doi: 10.1002/qub2.26

  8. [8]

    Glymour, C., Zhang, K., and Spirtes, P. Review of causal discovery methods based on graphical models. Frontiers in Genetics, Volume 10 - 2019, 2019. ISSN 1664-8021. doi: 10.3389/fgene.2019.00524. URL https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2019.00524

  9. [9]

    Glymour, C., Zhang, K., and Spirtes, P. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10:524, 2019

  10. [10]

    Gong, C., Zhang, C., Yao, D., Bi, J., Li, W., and Xu, Y. Causal discovery from temporal data: An overview and new perspectives. ACM Comput. Surv., 57(4), December 2024. ISSN 0360-0300. doi: 10.1145/3705297. URL https://doi.org/10.1145/3705297

  11. [11]

    Granger, C. W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3):424–438, 1969. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1912791

  12. [12]

    Han, S., Awasthi, U., and Bollas, G. M. Physics-informed symbolic regression for tool wear and remaining useful life predictions in manufacturing. Journal of Manufacturing Systems, 80:734–748, 2025

  13. [13]

    Hansen, N. and Sokol, A. Causal interpretation of stochastic differential equations. Electronic Journal of Probability, 19(none), January 2014. ISSN 1083-6489. doi: 10.1214/ejp.v19-2891. URL http://dx.doi.org/10.1214/EJP.v19-2891

  14. [14]

    Higham, D. J., Mao, X., and Stuart, A. M. Exponential mean-square stability of numerical solutions to stochastic differential equations. LMS Journal of Computation and Mathematics, 6:297–313, 2003. doi: 10.1112/S1461157000000462

  15. [15]

    Hoyer, P., Janzing, D., Mooij, J. M., Peters, J., and Schölkopf, B. Nonlinear causal discovery with additive noise models. In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L. (eds.), Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008. URL https://proceedings.neurips.cc/paper_files/paper/2008/file/f7664...

  16. [16]

    Hyvärinen, A., Zhang, K., Shimizu, S., and Hoyer, P. O. Estimation of a structural vector autoregression model using non-gaussianity. Journal of Machine Learning Research, 11(56):1709–1731, 2010. URL http://jmlr.org/papers/v11/hyvarinen10a.html

  17. [17]

    Jalali, A., Ravikumar, P., Vasuki, V., and Sanghavi, S. On learning discrete graphical models using group-sparse regularization. In Gordon, G., Dunson, D., and Dudík, M. (eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pp. 378–387, Fort Lauder...

  18. [18]

    Khasminskii, R. Stochastic stability of differential equations. Springer, 2012

  19. [19]

    Loh, P.-L. and Wainwright, M. J. Regularized m-estimators with nonconvexity: Statistical and algorithmic theory for local optima. Journal of Machine Learning Research, 16(19):559–616, 2015. URL http://jmlr.org/papers/v16/loh15a.html

  21. [21]

    Manten, G., Casolo, C., Ferrucci, E., Mogensen, S. W., Salvi, C., and Kilbertus, N. Signature kernel conditional independence tests in causal discovery for stochastic processes. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=Nx4PMtJ1ER

  22. [22]

    Meinshausen, N. and Bühlmann, P. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), June 2006. ISSN 0090-5364. doi: 10.1214/009053606000000281. URL http://dx.doi.org/10.1214/009053606000000281

  23. [23]

    Mooij, J. M., Janzing, D., and Schölkopf, B. From ordinary differential equations to structural causal models: the deterministic case. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI'13, pp. 440–448, Arlington, Virginia, USA, 2013. AUAI Press

  24. [24]

    Øksendal, B. Stochastic Differential Equations: An Introduction with Applications (Universitext). Springer, 6th edition, January 2014. ISBN 3540047581. URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/3540047581

  25. [25]

    Pamfil, R., Bauer, S., Schölkopf, B., and Buhmann, J. M. DYNOTEARS: Structure learning from time-series data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2020. URL http://proceedings.mlr.press/v108/pamfil20a.html

  26. [26]

    Reiser, C. Causal discovery for time series with latent confounders, 2022. URL https://arxiv.org/abs/2209.03427

  27. [27]

    Runge, J. Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. In Peters, J. and Sontag, D. (eds.), Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), volume 124 of Proceedings of Machine Learning Research, pp. 1388–1397. PMLR, 03–06 Aug 2020. URL https://proceedings.mlr.pr...

  28. [28]

    Runge, J., Bathiany, S., Bollt, E. M., Camps-Valls, G., Coumou, D., Deyle, E. R., Glymour, C., Kretschmer, M., Mahecha, M. D., Muñoz-Marí, J., van Nes, E. H., Peters, J., Quax, R., Reichstein, M., Scheffer, M., Schölkopf, B., Spirtes, P., Sugihara, G., Sun, J., Zhang, K., and Zscheischler, J. Inferring causation from time series in earth system scienc...

  29. [30]

    Runge, J., Nowack, P., Kretschmer, M., Flaxman, S., and Sejdinovic, D. Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances, 5(11):eaau4996, 2019. doi: 10.1126/sciadv.aau4996. URL https://www.science.org/doi/abs/10.1126/sciadv.aau4996

  30. [31]

    Sanchez-Romero, R., Ramsey, J. D., Zhang, K., Glymour, M. R., Huang, B., and Glymour, C. Estimating feedforward and feedback effective connections from fMRI time series: Assessments of statistical methods. Network Neuroscience, 3(2):274–306, 2019

  31. [32]

    Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., and Bengio, Y. Towards causal representation learning. CoRR, abs/2102.11107, 2021. URL https://arxiv.org/abs/2102.11107

  32. [33]

    Shimizu, S., Hoyer, P. O., Hyvärinen, A., and Kerminen, A. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(72):2003–2030, 2006. URL http://jmlr.org/papers/v7/shimizu06a.html

  33. [34]

    Shojaie, A. and Michailidis, G. Discovering graphical Granger causality using the truncating lasso penalty. Bioinformatics, 26(18):i517–i523, 09 2010

  34. [35]

    Spirtes, P., Glymour, C., and Scheines, R. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000

  35. [36]

    Särkkä, S. and Solin, A. Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks. Cambridge University Press, 2019

  36. [37]

    Tank, A., Covert, I., Foti, N., Shojaie, A., and Fox, E. B. Neural Granger causality. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021. ISSN 1939-3539. doi: 10.1109/tpami.2021.3065601. URL http://dx.doi.org/10.1109/TPAMI.2021.3065601

  37. [38]

    Tsamardinos, I., Brown, L. E., and Aliferis, C. F. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006

  38. [39]

    Varando, G. and Hansen, N. R. Graphical continuous Lyapunov models, 2020. URL https://arxiv.org/abs/2005.10483

  39. [40]

    Vignesh, D., He, S., and Banerjee, S. A review on the complexities of brain activity: insights from nonlinear dynamics in neuroscience. Nonlinear Dynamics, 2024. URL https://api.semanticscholar.org/CorpusID:273840637

  40. [41]

    Wang, B., Jennings, J., and Gong, W. Neural structure learning with stochastic differential equations. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=V1GM9xDvIY

  41. [42]

    Zhang, K., Huang, B., Zhang, J., Glymour, C., and Schölkopf, B. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 1347–1353, 2017. doi: 10.24963/ijcai.2017/187. URL https://doi.org/10.2496...

  42. [43]

    Zheng, X., Aragam, B., Ravikumar, P. K., and Xing, E. P. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31, pp. 9472–9483. Curran Associates, Inc., 2018

  43. [44]

    Zhou, W., Bai, S., Yu, S., Zhao, Q., and Chen, B. Jacobian regularizer-based neural Granger causality. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org, 2024.

    S1 Supplementary Materials I

    S1.1 Literature Review on traditional causal discovery methods

    Constraint-based methods infer causal structure via conditiona...

  44. [45]

    handle latent confounders, and CD-NOD [41] targets nonstationary, heterogeneous time series. Without additional functional assumptions, these methods typically identify graphs only up to a Markov equivalence class. Score-based approaches instead posit an explicit SEM for the temporal process and learn the graph by optimizing a likelihood or score. Hyvär...

  45. [46]

    Denote $s := |\mathrm{pa}(i)|$. By Proposition 1, we have
    \[ \|\nabla_{\theta_i} \ell_i(\theta_i^*)\|_\infty = O_P\!\left(\sqrt{\frac{s}{n}} + \sqrt{\frac{\log p}{n}}\right). \tag{23} \]

  46. [47]

    Let $V_i := \mathrm{pa}(i) \subset [p]$ be the set of all parents of node $i$, let $Q^n_{V_i V_i}$ be the empirical Fisher information matrix, and let $Q^*_{V_i V_i}$ be the population Fisher information matrix. We have
    \[ \|Q^n_{V_i V_i} - Q^*_{V_i V_i}\|_\infty = O_P\!\left(s\sqrt{\frac{\log s}{n}} + s\,\frac{\log s}{n}\right). \tag{24} \]

  47. [48]

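    … $\Big\|\frac{1}{n}\sum_{k=1}^{n} Z_{i,k}\Big\|_\infty \,\Big|\, \{X_k\}\Big] \le p \sum_{\ell=1}^{p} \max_{j \le p} E$ …

    By Assumption 3 and Assumption 1, the empirical Fisher information also has the following property:
    \[ \left\| Q^n_{V_i^c V_i} \big(Q^n_{V_i V_i}\big)^{-1} \right\|_\infty \le 1 - \frac{\alpha}{2} + O_P\!\left( \frac{K^2}{c^2 C_{\min}}\, s^{3/2} \sqrt{\frac{\log d}{n}} + \frac{K^4}{c^4 C_{\min}^2}\, s^2\, \frac{\log d}{n} \right), \tag{25} \]
    where $C_{\min} = \lambda_{\min}(Q^*_{V_i V_i})$. Here $\|\cdot\|_\infty$ denotes the matrix $\ell_\infty$ norm (maximum absolute row sum) in Eq. (24) and the vector $\ell_\infty$ norm in Eq. (23). The next two lemmas es...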

  48. [49]

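    … $\Big[ \frac{c - u_k^2}{(u_k^2 + c)^2} - \frac{r_{i,k}^2\,(c - 3u_k^2)}{(u_k^2 + c)^3} \Big] X_k X_k^\top$ (112). Thus, we have $\Delta^\top \nabla^2 \ell(\theta_i) \Delta = \frac{1}{n} \sum_{k=0}^{n-1} (\Delta^\top X_k)^2$ …

    Thus, $[Z_{i,k}]_{j\ell} \in \mathrm{SE}(\nu^2, \alpha)$ with parameters $\nu^2 \asymp C_2^2$ and $\alpha \asymp C_2$. By Bernstein's inequality for sums of independent mean-zero sub-exponential random variables, for any $t > 0$,
    \[ P\!\left( \Big| \frac{1}{n} \sum_{k=1}^{n} [Z_{i,k}]_{j\ell} \Big| > t \,\Big|\, \{X_k\} \right) \le 2 \exp\!\left( -\frac{n}{2} \min\!\left\{ \frac{t^2}{\nu^2}, \frac{t}{\alpha} \right\} \right). \tag{64} \]
    Next, using $\|A\|_\infty \le p \max_{j,\ell \le p} |A_{j\ell}|$, we have $P\big( \big\| \frac{1}{n} \sum_{k=1}^{n} Z_{i,k} \big\|_\infty > t \,\big|\, \{X_k\} \big) \le P\big( \max_{j,\ell \le p} \big| \frac{1}{n} \sum_{k=1}^{n} [Z_{i,k}]_{j\ell} \big| > $ ...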

  49. [50]

    Moreover,
    \[ \Delta^\top \hat\Sigma \Delta = \Delta^\top E[\hat\Sigma] \Delta + \Delta^\top \big( \hat\Sigma - E[\hat\Sigma] \big) \Delta. \tag{121} \]
    Using the bound $|\Delta^\top A \Delta| \le \|A\|_\infty \|\Delta\|_1^2$ on the event $\|\hat\Sigma - E\hat\Sigma\|_\infty \le r_n$, we have
    \[ \Delta^\top \Big( \frac{1}{n} \sum_{k=0}^{n-1} X_k X_k^\top \Big) \Delta \ge \frac{7}{72c}\, m \|\Delta\|_2^2 - \frac{7}{72c}\, r_n \|\Delta\|_1^2 \ge \alpha \|\Delta\|_2^2 - \tau r_n \|\Delta\|_1^2. \tag{122} \]
    To complete the proof, we need to show that $r_n \asymp \sqrt{\log p / n}$. Let $\hat\Sigma = \frac{1}{n} \sum_{k=0}^{n-1} X_k X_k^\top$ and $\Sigma_k = E[X_k X_k^\top]$. For the $(a, b)$ entries, we define ce...