Causal Discovery from Heteroscedastic Stochastic Dynamical Systems under Imperfect Physical Models

Jianhong Chen; Naichen Shi; Xubo Yue

arXiv: 2602.04907 · v2 · pith:MBUUJWD3new · submitted 2026-02-03 · 💻 cs.LG · cs.AI · stat.ME

Pith reviewed 2026-05-21 13:13 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ME

keywords causal discovery · stochastic differential equations · dynamical systems · heteroscedastic SDE · physical models · graph recovery · ODE misspecification

The pith

A framework recovers causal structures in dynamical systems by modeling known physics in the SDE drift and unknown couplings in the diffusion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes integrating partial physical knowledge from ODEs into causal discovery for dynamical systems using stochastic differential equations. The known dynamics are placed in the drift term, allowing the diffusion term to reveal causal interactions that go beyond the prescribed physics. This approach handles cyclic interactions and nonstationarity, providing recovery guarantees for both stable and unstable systems along with robustness to imperfect physical models. Experiments demonstrate better performance than purely data-driven methods on simulated and real epidemic data.

Core claim

The authors develop a causal discovery framework for heteroscedastic SDEs where the drift encodes known ODE dynamics and the diffusion term captures unknown causal couplings. They introduce a sparsity-inducing maximum quasi-likelihood estimator with a stabilization technique, prove causal graph recovery guarantees under mild conditions for stable and unstable SDEs, and show robustness to ODE misspecification.

What carries the argument

The heteroscedastic SDE model with fixed drift from ODE physics and learnable diffusion matrix representing causal structure.
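
To make the drift/diffusion split concrete, here is a minimal Euler-Maruyama sketch. It is an illustration under stated assumptions, not the authors' implementation: the diagonal diffusion form `diag(A @ x)`, the function names `simulate_sde` and `lotka_volterra`, and all parameter values are hypothetical choices made for this example. The drift plays the role of the known ODE physics, and the nonzero pattern of `A` plays the role of the unknown causal couplings.

```python
import numpy as np

def simulate_sde(drift, A, x0, dt=1e-3, n_steps=1000, seed=0):
    """Euler-Maruyama for dX = drift(X) dt + diag(A @ X) dW.

    The diffusion form diag(A @ X) is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for k in range(n_steps):
        # Independent Brownian increments with variance dt.
        dw = rng.normal(scale=np.sqrt(dt), size=x.size)
        # Known physics enters through the drift, couplings through A.
        x = x + drift(x) * dt + (A @ x) * dw
        path[k + 1] = x
    return path

def lotka_volterra(x, alpha=1.0, beta=0.5, delta=0.5, gamma=1.0):
    """Deterministic predator-prey drift (hypothetical parameter values)."""
    prey, pred = x
    return np.array([alpha * prey - beta * prey * pred,
                     delta * prey * pred - gamma * pred])

# Hypothetical coupling: the predator state scales the noise on the prey.
A = np.array([[0.0, 0.1],
              [0.0, 0.0]])
path = simulate_sde(lotka_volterra, A, x0=[1.0, 1.0])
```

With `A` set to zero the simulation reduces to a forward-Euler solve of the ODE, which is one way to see that, in this sketch, all information about the couplings lives in the noise scale.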

If this is right

Causal graphs can be recovered from time series data of both stable and unstable dynamical systems.
The estimate remains reliable even when the ODE model is misspecified.
The stabilization technique improves optimization while preserving statistical recoverability.
Graph recovery improves on nonlinear benchmarks such as Lotka-Volterra and Lorenz dynamics with cyclic structures.
Stochastic SIR dynamics can be reconstructed from real-world epidemic data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could be extended to systems with partial knowledge of higher-order dynamics.
Combining physics and data-driven causal discovery may improve forecasting in complex real-world processes.
The method suggests a way to handle nonstationary data without assuming equilibrium.
Future work might test the framework on high-dimensional systems beyond the current benchmarks.

Load-bearing premise

The system can be represented as a heteroscedastic SDE where the drift term exactly encodes the known ODE dynamics and the diffusion term captures the unknown causal couplings.

What would settle it

Observing that the recovered causal graph fails to match the ground truth in a simulation where the true dynamics are a known heteroscedastic SDE but the drift is misspecified would falsify the robustness analysis.

Figures

Figures reproduced from arXiv: 2602.04907 by Jianhong Chen, Naichen Shi, Xubo Yue.

**Figure 2.** Results on DAGs (stable system): mean/std over 10 runs (SHD, TPR, FDR). The full …

**Figure 3.** Results on directed loop graphs (stable system): mean/std over 10 runs (SHD, TPR, …)
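
The figure captions report SHD, TPR, and FDR. As a rough illustration of how such scores are commonly computed from adjacency matrices, the sketch below uses entry-wise definitions. The paper's exact conventions (for example, how reversed edges or self-loops are counted) are not stated on this page, so these definitions are assumptions, and `graph_metrics` is a hypothetical helper.

```python
import numpy as np

def graph_metrics(true_adj, est_adj):
    """Entry-wise SHD, TPR, and FDR between two directed adjacency matrices.

    Assumption: SHD is the number of mismatched entries, so a reversed edge
    counts twice. Other conventions count a reversal once.
    """
    t = np.asarray(true_adj).astype(bool)
    e = np.asarray(est_adj).astype(bool)
    tp = int(np.sum(t & e))    # edges present in both graphs
    fp = int(np.sum(~t & e))   # estimated edges absent from the truth
    fn = int(np.sum(t & ~e))   # true edges that were missed
    return {
        "shd": fp + fn,
        "tpr": tp / max(tp + fn, 1),
        "fdr": fp / max(tp + fp, 1),
    }

# A directed 3-cycle as ground truth; the estimate misses one edge
# and adds one spurious edge.
true_adj = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
est_adj = [[0, 1, 0],
           [0, 0, 0],
           [1, 1, 0]]
scores = graph_metrics(true_adj, est_adj)
```

Entry-wise comparison works for cyclic graphs as well as DAGs, which matters here because the benchmarks include directed loops.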

Original abstract

Causal discovery is a data-driven paradigm for analyzing complex systems, while physics-based models, such as ordinary differential equations (ODEs), provide mechanistic structure for real-world dynamical processes. Integrating these paradigms can improve identifiability, stability, and robustness. However, real dynamical systems often exhibit cyclic interactions and nonstationarity, whereas many causal discovery methods rely on acyclicity, stationarity, or equilibrium assumptions. We propose an integrative causal discovery framework for dynamical systems that leverages partial physical knowledge through stochastic differential equations (SDEs). The drift term encodes known ODE dynamics, while the diffusion term captures unknown causal couplings beyond the prescribed physics. We develop a scalable sparsity-inducing maximum quasi-likelihood estimator with a theoretically justified stabilization technique to improve the optimization landscape. Under mild conditions, we establish causal graph recovery guarantees for both stable and unstable SDEs. We also analyze robustness of our causal graph estimate to ODE misspecification and clarify how the introduced stabilization technique balances numerical stability and statistical recoverability. Experiments on linear SDEs and nonlinear benchmarks, including Lotka-Volterra and Lorenz dynamics with acyclic and cyclic structures, show improved graph recovery and robustness over data-driven baselines. We also demonstrate practical utility on real-world epidemic data by reconstructing stochastic SIR dynamics within our causal discovery framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper splits known ODE drift from causal structure in the diffusion of an SDE and adds stabilization to claim recovery guarantees even for unstable systems, but the unstable-case argument looks thin without the full moment controls.

The paper's main move is to model a dynamical system as an SDE whose drift is taken from an imperfect but known ODE while the diffusion term is left free to encode unknown causal couplings. They then build a sparsity-inducing maximum quasi-likelihood estimator and add a stabilization step that is supposed to keep the optimization tractable for both stable and unstable trajectories.

The abstract reports recovery guarantees under mild conditions, a robustness analysis to ODE misspecification, and better graph recovery than data-driven baselines on linear SDEs plus nonlinear benchmarks including Lotka-Volterra and Lorenz, with an application to epidemic data for stochastic SIR reconstruction. That combination of partial physics and causal discovery for cyclic, nonstationary systems is the part that feels new and practically motivated. The experiments are described at a high level but the reported improvements over baselines suggest the framework can deliver usable gains in settings where pure data-driven methods struggle.

The soft spot is the handling of unstable SDEs. When trajectories diverge, the quasi-likelihood terms that involve the inverse diffusion can become ill-behaved, and it is not obvious from the given material how the stabilization fully supplies the uniform integrability or moment bounds needed to keep the sparsity penalty from selecting the wrong edges. The claim that the technique is “theoretically justified” and balances numerical stability with statistical recoverability is stated but not yet visible in detail, so the central guarantee for the unstable case rests on steps that still need checking.

This is work for people who already work at the boundary of causal discovery and physics-informed modeling, especially in epidemiology or nonlinear dynamics where some mechanistic structure is available but incomplete. A reader who wants concrete ways to relax stationarity and acyclicity assumptions while still getting graph recovery would get value from it. The paper deserves a serious referee because the integrative idea is coherent and the empirical side looks promising, even if the theory for diverging processes may require tightening. I would send it to review with a request that referees focus on the justification of the stabilization and the moment conditions for the unstable case.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an integrative causal discovery framework for dynamical systems modeled as heteroscedastic SDEs, with the drift term encoding known (but imperfect) ODE physics and the diffusion term capturing unknown causal couplings. It develops a sparsity-inducing maximum quasi-likelihood estimator equipped with a stabilization technique, establishes causal graph recovery guarantees under mild conditions for both stable and unstable SDEs, analyzes robustness to ODE misspecification, and reports improved performance over data-driven baselines on linear SDEs, nonlinear benchmarks (Lotka-Volterra, Lorenz with acyclic/cyclic structures), and real epidemic data via stochastic SIR reconstruction.

Significance. If the recovery guarantees and robustness analysis hold, the work provides a principled way to leverage partial physical knowledge for causal discovery in cyclic and nonstationary systems, addressing limitations of acyclicity or stationarity assumptions in existing methods. The explicit treatment of unstable SDEs and misspecification, together with the empirical validation on nonlinear benchmarks and real-world data, strengthens the contribution; the stabilization technique is presented as theoretically justified and practically useful.

major comments (2)

[§4 (Recovery Guarantees)] The claim of causal graph recovery for unstable SDEs rests on the stabilized quasi-likelihood estimator remaining consistent when trajectories diverge, yet the argument invokes the same mild conditions as the stable case without an explicit uniform integrability or moment-control argument to bound the integrated quasi-likelihood (involving inverse diffusion and derivatives) under unbounded growth. This is load-bearing for the central claim covering both stable and unstable regimes.
[§3.2 (Stabilization Technique, Eq. defining the stabilized estimator)] The stabilization parameter is described as balancing numerical stability and statistical recoverability, but its effect on the sparsity penalty's ability to select diffusion-term edges under partial ODE misspecification is not fully characterized; if the stabilization introduces implicit normalization that depends on trajectory scale, it could undermine identifiability of the causal structure in the diffusion term.

minor comments (2)

[Experiments] The experimental section would benefit from explicit reporting of the number of independent trajectories, discretization step size, and exact hyperparameter ranges for the sparsity and stabilization parameters to support reproducibility of the benchmark improvements.
[Method] Notation for the quasi-likelihood versus its stabilized version should be clarified in the equations to distinguish the stabilization from standard maximum-likelihood forms.
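
The notational distinction requested in the last minor comment can be illustrated with a generic Euler-discretized Gaussian quasi-likelihood for a single coordinate. This is a sketch under assumptions that this page does not confirm: a diffusion scale $u_k = \theta^\top X_k$, a normalized residual $r_k$ obtained after subtracting the known drift, and a stabilization constant $c > 0$ added to the variance. The paper's own Eq. 4 may differ.

```latex
% Unstabilized per-coordinate negative quasi-log-likelihood (assumed form)
\ell_n(\theta) = \frac{1}{2n} \sum_{k=0}^{n-1}
  \left[ \log u_k^2 + \frac{r_k^2}{u_k^2} \right],
  \qquad u_k = \theta^\top X_k .

% Stabilized version with constant c > 0 (assumed form)
\ell_n^{(c)}(\theta) = \frac{1}{2n} \sum_{k=0}^{n-1}
  \left[ \log\!\left(u_k^2 + c\right) + \frac{r_k^2}{u_k^2 + c} \right].
```

In the unstabilized form both summands are unbounded as $u_k \to 0$. With $c > 0$, the logarithm is bounded below by $\log c$ and the quadratic term is bounded above by $r_k^2 / c$, which is one plausible reading of how a constant of this kind would trade numerical stability against fidelity to the original likelihood.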

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive feedback on our manuscript. We address each major comment point-by-point below, indicating revisions where appropriate to strengthen the theoretical arguments.

Referee: The claim of causal graph recovery for unstable SDEs rests on the stabilized quasi-likelihood estimator remaining consistent when trajectories diverge, yet the argument invokes the same mild conditions as the stable case without an explicit uniform integrability or moment-control argument to bound the integrated quasi-likelihood (involving inverse diffusion and derivatives) under unbounded growth. This is load-bearing for the central claim covering both stable and unstable regimes.

Authors: We appreciate this observation on the load-bearing nature of the argument. The current proof extends the stable-case analysis by invoking the stabilization to control estimator growth, but we agree that an explicit uniform integrability step is not detailed for the unstable regime. In the revised manuscript, we will add a supporting lemma in §4 that establishes moment bounds on the integrated quasi-likelihood (including terms involving the inverse diffusion and its derivatives) under the paper's mild growth conditions on the SDE coefficients, ensuring the consistency result holds uniformly for both regimes.

Revision: yes

Referee: The stabilization parameter is described as balancing numerical stability and statistical recoverability, but its effect on the sparsity penalty's ability to select diffusion-term edges under partial ODE misspecification is not fully characterized; if the stabilization introduces implicit normalization that depends on trajectory scale, it could undermine identifiability of the causal structure in the diffusion term.

Authors: Thank you for this comment. The stabilization parameter is introduced as a fixed scalar (independent of trajectory scale) chosen to ensure well-conditioned optimization while preserving the population identifiability of the diffusion coefficients. Our robustness analysis already shows that edge selection in the diffusion term remains consistent under bounded misspecification, as the sparsity penalty acts on the stabilized estimator whose asymptotic distribution is unaffected by the fixed stabilization. We will nevertheless expand §3.2 with an additional remark and short derivation clarifying that the stabilization does not introduce scale-dependent normalization that alters identifiability or the sparsity-induced selection under the considered misspecification model.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation of causal recovery guarantees

The paper develops a new integrative framework using SDEs where the drift encodes known ODE dynamics and the diffusion term captures unknown causal couplings. It introduces a sparsity-inducing maximum quasi-likelihood estimator equipped with a stabilization technique, then states that under mild conditions causal graph recovery guarantees hold for both stable and unstable SDEs, with additional robustness analysis to ODE misspecification. No quoted step reduces a claimed prediction or guarantee to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified within the paper. The stabilization is presented as theoretically justified inside the derivation rather than smuggled via prior self-work or ansatz. The overall chain therefore remains self-contained against external statistical theory for quasi-likelihood estimators in SDEs and does not match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on partial physical knowledge being encodable in the drift term, statistical assumptions for quasi-likelihood, and unspecified mild conditions for recovery; no new physical entities are postulated.

free parameters (2)

sparsity regularization parameter
Induces sparsity in the causal graph estimate within the maximum quasi-likelihood objective.
stabilization parameter
Balances numerical stability and statistical recoverability in the optimization technique.
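
To show where the two free parameters could enter, here is a hedged sketch of an L1-penalized fit by proximal gradient descent. It is not the authors' estimator: the loss form (a Gaussian quasi-likelihood with variance `(X @ theta)**2 + c`), the optimizer, and the names `stabilized_loss`, `stabilized_grad`, and `fit_row` are all assumptions made for illustration. Here `lam` stands in for the sparsity regularization parameter and `c` for the stabilization parameter.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stabilized_loss(theta, X, r, c):
    """Assumed loss: 0.5 * mean(log(u^2 + c) + r^2 / (u^2 + c)), u = X @ theta."""
    v = (X @ theta) ** 2 + c
    return 0.5 * np.mean(np.log(v) + r ** 2 / v)

def stabilized_grad(theta, X, r, c):
    """Gradient of stabilized_loss with respect to theta."""
    u = X @ theta
    v = u ** 2 + c
    w = u / v - r ** 2 * u / v ** 2
    return X.T @ w / len(r)

def fit_row(X, r, lam, c, step=0.05, n_iter=500, theta0=None):
    """Proximal gradient (ISTA-style) iterations for one node's coefficients.

    theta = 0 is a stationary point of the smooth part, so the default
    start is a small nonzero vector. The objective is nonconvex and
    symmetric in the sign of theta, so this finds a local solution only.
    """
    if theta0 is None:
        theta = np.full(X.shape[1], 0.1)
    else:
        theta = np.array(theta0, dtype=float)
    for _ in range(n_iter):
        grad = stabilized_grad(theta, X, r, c)
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Synthetic residuals whose scale depends on two of four states.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
theta_true = np.array([1.0, 0.0, 0.0, -0.5])
r = (X @ theta_true) * rng.normal(size=500)
theta_hat = fit_row(X, r, lam=0.05, c=0.1)
support = np.abs(theta_hat) > 1e-8  # estimated parents of this node
```

Because `c` keeps the variance away from zero, the gradient stays bounded for any `theta`, which is the numerical benefit. The cost is that the fitted variance no longer matches the data-generating one exactly, so both `lam` and `c` would need tuning in practice.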

axioms (2)

domain assumption: Mild conditions sufficient for causal graph recovery in stable and unstable SDEs
Invoked to establish theoretical guarantees for both stable and unstable systems.
domain assumption: The system can be represented as a heteroscedastic SDE with separable drift and diffusion terms
Underpins the encoding of known ODE dynamics in drift and unknown causal couplings in diffusion.

pith-pipeline@v0.9.0 · 5762 in / 1462 out tokens · 44936 ms · 2026-05-21T13:13:21.032137+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

dX_t = g(t,x(t),γ)dt + S_A(X_t)dW_t with sparsity-inducing MLE on diffusion coefficients A; stabilization constant c>0 in quasi-likelihood (Eq. 4)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Alqudah, M. A., Shah, K., Abdeljawad, T., and Mofarreh, F. Study on nonlinear fractional order conformable model of infectious disease using updated analytical techniques. Fractals, 33(04):2540080, 2025

[2] [2]

Bellot, A., Branson, K., and van der Schaar, M. Neural graphical modelling in continuous-time: consistency guarantees and algorithms. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=SsHBkfeRF9L

[3] [3]

Benjamini, Y. and Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995

[4] [4]

Bierkens, J., Gaans, O., and Verduyn Lunel, S. Estimate on the pathwise Lyapunov exponent of linear stochastic differential equations with constant coefficients. Stochastic Analysis and Applications, 28:747–762, 09 2010. doi: 10.1080/07362994.2010.503453

[5] [5]

Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. Neural ordinary differential equations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 6572–6583, Red Hook, NY, USA, 2018. Curran Associates Inc.

[6] [6]

Dettling, P., Drton, M., and Kolar, M. On the lasso for graphical continuous Lyapunov models. In Locatello, F. and Didelez, V. (eds.), Proceedings of the Third Conference on Causal Learning and Reasoning, volume 236 of Proceedings of Machine Learning Research, pp. 514–550. PMLR, 01–03 Apr 2024. URL https://proceedings.mlr.press/v236/dettling24a.html

[7] [7]

Feng, K., Jiang, H., Yin, C., and Sun, H. Gene regulatory network inference based on causal discovery integrating with graph neural network. Quantitative Biology, 11:434–450, 12 2023. doi: 10.1002/qub2.26

[8] [8]

Glymour, C., Zhang, K., and Spirtes, P. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10, 2019. ISSN 1664-8021. doi: 10.3389/fgene.2019.00524. URL https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2019.00524

[9] [9]

Glymour, C., Zhang, K., and Spirtes, P. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10:524, 2019

[10] [10]

Gong, C., Zhang, C., Yao, D., Bi, J., Li, W., and Xu, Y. Causal discovery from temporal data: An overview and new perspectives. ACM Comput. Surv., 57(4), December 2024. ISSN 0360-0300. doi: 10.1145/3705297. URL https://doi.org/10.1145/3705297

[11] [11]

Granger, C. W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3):424–438, 1969. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1912791

[12] [12]

Han, S., Awasthi, U., and Bollas, G. M. Physics-informed symbolic regression for tool wear and remaining useful life predictions in manufacturing. Journal of Manufacturing Systems, 80:734–748, 2025

[13] [13]

Hansen, N. and Sokol, A. Causal interpretation of stochastic differential equations. Electronic Journal of Probability, 19, January 2014. ISSN 1083-6489. doi: 10.1214/ejp.v19-2891. URL http://dx.doi.org/10.1214/EJP.v19-2891

[14] [14]

Higham, D. J., Mao, X., and Stuart, A. M. Exponential mean-square stability of numerical solutions to stochastic differential equations. LMS Journal of Computation and Mathematics, 6:297–313, 2003. doi: 10.1112/S1461157000000462

[15] [15]

Hoyer, P., Janzing, D., Mooij, J. M., Peters, J., and Schölkopf, B. Nonlinear causal discovery with additive noise models. In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L. (eds.), Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008. URL https://proceedings.neurips.cc/paper_files/paper/2008/file/f7664...

[16] [16]

Hyvärinen, A., Zhang, K., Shimizu, S., and Hoyer, P. O. Estimation of a structural vector autoregression model using non-Gaussianity. Journal of Machine Learning Research, 11(56):1709–1731, 2010. URL http://jmlr.org/papers/v11/hyvarinen10a.html

[17] [17]

Jalali, A., Ravikumar, P., Vasuki, V., and Sanghavi, S. On learning discrete graphical models using group-sparse regularization. In Gordon, G., Dunson, D., and Dudík, M. (eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pp. 378–387, Fort Lauder...

[18] [18]

Khasminskii, R. Stochastic stability of differential equations. Springer, 2012

[19] [19]

Loh, P.-L. and Wainwright, M. J. Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. Journal of Machine Learning Research, 16(19):559–616, 2015. URL http://jmlr.org/papers/v16/loh15a.html

[21] [21]

Manten, G., Casolo, C., Ferrucci, E., Mogensen, S. W., Salvi, C., and Kilbertus, N. Signature kernel conditional independence tests in causal discovery for stochastic processes. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=Nx4PMtJ1ER

[22] [22]

Meinshausen, N. and Bühlmann, P. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), June 2006. ISSN 0090-5364. doi: 10.1214/009053606000000281. URL http://dx.doi.org/10.1214/009053606000000281

[23] [23]

Mooij, J. M., Janzing, D., and Schölkopf, B. From ordinary differential equations to structural causal models: the deterministic case. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI’13, pp. 440–448, Arlington, Virginia, USA, 2013. AUAI Press

[24] [24]

Øksendal, B. Stochastic Differential Equations: An Introduction with Applications (Universitext). Springer, 6th edition, January 2014. ISBN 3540047581. URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/3540047581

[25] [25]

Pamfil, R., Bauer, S., Schölkopf, B., and Buhmann, J. M. DYNOTEARS: Structure learning from time-series data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2020. URL http://proceedings.mlr.press/v108/pamfil20a.html

[26] [26]

Reiser, C. Causal discovery for time series with latent confounders, 2022. URL https://arxiv.org/abs/2209.03427

[27] [27]

Runge, J. Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. In Peters, J. and Sontag, D. (eds.), Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), volume 124 of Proceedings of Machine Learning Research, pp. 1388–1397. PMLR, 03–06 Aug 2020. URL https://proceedings.mlr.pr...

[28] [28]

Runge, J., Bathiany, S., Bollt, E. M., Camps-Valls, G., Coumou, D., Deyle, E. R., Glymour, C., Kretschmer, M., Mahecha, M. D., Muñoz-Marí, J., van Nes, E. H., Peters, J., Quax, R., Reichstein, M., Scheffer, M., Schölkopf, B., Spirtes, P., Sugihara, G., Sun, J., Zhang, K., and Zscheischler, J. Inferring causation from time series in Earth system scienc...

[29] [30]

Runge, J., Nowack, P., Kretschmer, M., Flaxman, S., and Sejdinovic, D. Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances, 5(11):eaau4996, 2019. doi: 10.1126/sciadv.aau4996. URL https://www.science.org/doi/abs/10.1126/sciadv.aau4996

[30] [31]

Sanchez-Romero, R., Ramsey, J. D., Zhang, K., Glymour, M. R., Huang, B., and Glymour, C. Estimating feedforward and feedback effective connections from fMRI time series: Assessments of statistical methods. Network Neuroscience, 3(2):274–306, 2019

[31] [32]

Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., and Bengio, Y. Towards causal representation learning. CoRR, abs/2102.11107, 2021. URL https://arxiv.org/abs/2102.11107

[32] [33]

Shimizu, S., Hoyer, P. O., Hyvärinen, A., and Kerminen, A. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(72):2003–2030, 2006. URL http://jmlr.org/papers/v7/shimizu06a.html

[33] [34]

Shojaie, A. and Michailidis, G. Discovering graphical Granger causality using the truncating lasso penalty. Bioinformatics, 26(18):i517–i523, 09 2010

[34] [35]

Spirtes, P., Glymour, C., and Scheines, R. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000

[35] [36]

Särkkä, S. and Solin, A. Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks. Cambridge University Press, 2019

[36] [37]

Tank, A., Covert, I., Foti, N., Shojaie, A., and Fox, E. B. Neural Granger causality. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021. ISSN 1939-3539. doi: 10.1109/tpami.2021.3065601. URL http://dx.doi.org/10.1109/TPAMI.2021.3065601

[37] [38]

Tsamardinos, I., Brown, L. E., and Aliferis, C. F. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006

[38] [39]

Varando, G. and Hansen, N. R. Graphical continuous Lyapunov models, 2020. URL https://arxiv.org/abs/2005.10483

[39] [40]

Vignesh, D., He, S., and Banerjee, S. A review on the complexities of brain activity: insights from nonlinear dynamics in neuroscience. Nonlinear Dynamics, 2024. URL https://api.semanticscholar.org/CorpusID:273840637

[40] [41]

Wang, B., Jennings, J., and Gong, W. Neural structure learning with stochastic differential equations. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=V1GM9xDvIY

[41] [42]

Zhang, K., Huang, B., Zhang, J., Glymour, C., and Schölkopf, B. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 1347–1353, 2017. doi: 10.24963/ijcai.2017/187. URL https://doi.org/10.2496...

[42] [43]

Zheng, X., Aragam, B., Ravikumar, P. K., and Xing, E. P. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31, pp. 9472–9483. Curran Associates, Inc., 2018

[43] [44]

Zhou, W., Bai, S., Yu, S., Zhao, Q., and Chen, B. Jacobian regularizer-based neural Granger causality. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org, 2024

S1 Supplementary Materials I

S1.1 Literature Review on traditional causal discovery methods

Constraint-based methods infer causal structure via conditiona...

[44] [45]

handle latent confounders, and CD-NOD [41] targets nonstationary, heterogeneous time series. Without additional functional assumptions, these methods typically identify graphs only up to a Markov equivalence class. Score-based approaches instead posit an explicit SEM for the temporal process and learn the graph by optimizing a likelihood or score. Hyvär...

[45] [46]

Let $s := |\mathrm{pa}(i)|$. By Proposition 1, we have
$$\|\nabla_{\theta_i}\ell_i(\theta_i^*)\|_\infty = O_P\left(\sqrt{\frac{s}{n}} + \sqrt{\frac{\log p}{n}}\right). \tag{23}$$

[46] [47]

Let $V_i := \mathrm{pa}(i) \subset [p]$ be the set of all parents of node $i$, and let $Q^n_{V_iV_i}$ and $Q^*_{V_iV_i}$ be the empirical and population Fisher information matrices, respectively. We have
$$\|Q^n_{V_iV_i} - Q^*_{V_iV_i}\|_\infty = O_P\left(s\sqrt{\frac{\log s}{n}} + s\,\frac{\log s}{n}\right). \tag{24}$$

[47] [48]

$\Big\|\frac{1}{n}\sum_{k=1}^{n} Z_{i,k}\Big\|_\infty \,\Big|\, \{X_k\}\Big] \le p \sum_{\ell=1}^{p} \max_{j \le p} \mathbb{E}$

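By Assumption 3 and Assumption 1, the empirical Fisher information also has the following property:
$$\left\|Q^n_{V_i^c V_i}\left(Q^n_{V_iV_i}\right)^{-1}\right\|_\infty \le 1 - \frac{\alpha}{2} + O_P\left(\frac{K^2}{c^2 C_{\min}}\, s^{3/2}\sqrt{\frac{\log d}{n}} + \frac{K^4}{c^4 C_{\min}^2}\, s^2\,\frac{\log d}{n}\right), \tag{25}$$
where $C_{\min} = \lambda_{\min}(Q^*_{V_iV_i})$. Here $\|\cdot\|_\infty$ denotes the matrix $\ell_\infty$ norm (maximum absolute row sum) in Eq. (24) and the vector $\ell_\infty$ norm in Eq. (23). The next two lemmas es...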

[48] [49]

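$\frac{c - u_k^2}{(u_k^2 + c)^2} - \frac{r_{i,k}^2\,(c - 3u_k^2)}{(u_k^2 + c)^3}\Big]\, X_k X_k^\top$ (112) Thus, we have $\Delta^\top \nabla^2 \ell(\theta_i)\Delta = \frac{1}{n}\sum_{k=0}^{n-1} (\Delta^\top X_k)^2$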

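Thus, $[Z_{i,k}]_{j\ell} \in \mathrm{SE}(\nu^2, \alpha)$ with parameters $\nu^2 \asymp C_2^2$ and $\alpha \asymp C_2$. By Bernstein’s inequality for sums of independent mean-zero sub-exponential random variables, for any $t > 0$,
$$P\left(\left|\frac{1}{n}\sum_{k=1}^{n} [Z_{i,k}]_{j\ell}\right| > t \,\middle|\, \{X_k\}\right) \le 2\exp\left(-\frac{n}{2}\min\left\{\frac{t^2}{\nu^2}, \frac{t}{\alpha}\right\}\right). \tag{64}$$
Next, using $\|A\|_\infty \le p \max_{j,\ell \le p} |A_{j\ell}|$, we have $P\big(\|\frac{1}{n}\sum_{k=1}^{n} Z_{i,k}\|_\infty > t \mid \{X_k\}\big) \le P\big(\max_{j,\ell \le p} |\frac{1}{n}\sum_{k=1}^{n} [Z_{i,k}]_{j\ell}| >$ ...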

[49] [50]

Moreover,
$$\Delta^\top \hat\Sigma \Delta = \Delta^\top \mathbb{E}[\hat\Sigma]\Delta + \Delta^\top\left(\hat\Sigma - \mathbb{E}[\hat\Sigma]\right)\Delta. \tag{121}$$
Using the bound $|\Delta^\top A \Delta| \le \|A\|_\infty \|\Delta\|_1^2$ on the event $\|\hat\Sigma - \mathbb{E}\hat\Sigma\|_\infty \le r_n$, we have
$$\Delta^\top\left(\frac{1}{n}\sum_{k=0}^{n-1} X_k X_k^\top\right)\Delta \ge \frac{7}{72c}\, m\|\Delta\|_2^2 - \frac{7}{72c}\, r_n\|\Delta\|_1^2 \ge \alpha\|\Delta\|_2^2 - \tau r_n\|\Delta\|_1^2. \tag{122}$$
To complete the proof, we need to show that $r_n \asymp \sqrt{\log p / n}$. Let $\hat\Sigma = \frac{1}{n}\sum_{k=0}^{n-1} X_k X_k^\top$ and $\Sigma_k = \mathbb{E}[X_k X_k^\top]$. For the $a, b$ entries, we define ce...