pith.

arxiv: 2512.06238 · v2 · pith:5Y4ZMBNI · submitted 2025-12-06 · 💻 cs.IT · math.IT · math.ST · stat.TH

Non-Asymptotic Error Bounds for Causally Conditioned Directed Information Rates of Gaussian Sequences

Pith reviewed 2026-05-21 18:53 UTC · model grok-4.3

classification 💻 cs.IT · math.IT · math.ST · stat.TH
keywords: directed information, causal conditioning, Gaussian sequences, non-asymptotic bounds, estimation error, information rate, optimal prediction

The pith

For Gaussian vector sequences, an estimator of causally conditioned directed information rates achieves error O(N^{-1/2} log N) with high probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes non-asymptotic error bounds for estimators of causally conditioned directed information rates from sequences of Gaussian vectors. Directed information measures causal influence between random processes; because it must be estimated from data, practical use requires finite-sample estimators with explicit accuracy guarantees. The authors derive an explicit formula for the rate based on optimal prediction and construct an estimator from that formula. They prove that, with high probability, the estimator's error is of order O(N^{-1/2} log N), where N is the total sample size. A sympathetic reader would care because these bounds support reliable causal analysis of real-valued time series without relying on asymptotic approximations.

Core claim

We provide an explicit formula for the causally conditioned directed information rate based on optimal prediction for sequences of Gaussian vectors. We define an estimator based on this formula and show that our estimator gives an error of order O(N^{-1/2} log N) with high probability, where N is the total sample size.

What carries the argument

A plug-in estimator built from the explicit optimal-prediction formula for the causally conditioned directed information rate, with the population covariances in that formula replaced by their empirical counterparts.
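For orientation, the standard (Kramer) definition of causally conditioned directed information and its Gaussian log-determinant form can be written as below. This is a sketch, not the paper's statement: it assumes the paper follows Kramer's convention, in which the current samples X_t and Z_t enter the conditioning, that the processes are jointly stationary, and that the limiting error covariances exist and are nonsingular. The horizon is written n to avoid a clash with the total sample size N.

```latex
\begin{aligned}
I(X^n \to Y^n \,\|\, Z^n) &= \sum_{t=1}^{n} I\bigl(X^t; Y_t \,\big|\, Y^{t-1}, Z^t\bigr), \\
I\bigl(X^t; Y_t \,\big|\, Y^{t-1}, Z^t\bigr) &= \tfrac{1}{2}\log\frac{\det \Sigma_t^{(Y,Z)}}{\det \Sigma_t^{(Y,Z,X)}}, \\
\Sigma_t^{(Y,Z)} &= \operatorname{Cov}\bigl(Y_t - \mathbb{E}[Y_t \mid Y^{t-1}, Z^t]\bigr), \\
\Sigma_t^{(Y,Z,X)} &= \operatorname{Cov}\bigl(Y_t - \mathbb{E}[Y_t \mid Y^{t-1}, Z^t, X^t]\bigr), \\
\lim_{n\to\infty} \frac{1}{n}\, I(X^n \to Y^n \,\|\, Z^n) &= \tfrac{1}{2}\log\frac{\det \Sigma_\infty^{(Y,Z)}}{\det \Sigma_\infty^{(Y,Z,X)}}.
\end{aligned}
```

For jointly Gaussian variables the conditional expectation is linear, so each Σ is the error covariance of an optimal linear predictor, and Σ_∞ denotes its limit as the conditioning extends to the infinite past.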

If this is right

  • Finite samples from Gaussian processes produce estimates of causal influences with explicit high-probability accuracy guarantees.
  • The method applies directly to vector-valued sequences without requiring discretization to finite alphabets.
  • Optimal prediction supplies a closed-form route to the information rate for this class of data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The logarithmic factor means that reaching precision ε takes somewhat more than the order 1/ε² samples a pure N^{-1/2} rate would need; inverting ε ≈ N^{-1/2} log N gives N on the order of log²(1/ε)/ε².
  • The technique may extend to linear models or data close to Gaussian in practice.
  • Simulations on synthetic Gaussian sequences could isolate whether the log N term is tight.
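To see how the logarithmic factor affects sample requirements, the sketch below assumes the bound has the form C·N^{-1/2}·log N with a hypothetical constant C = 1 (the paper's constant depends on d, λ_min and other fixed parameters and is not given here). The hypothetical helper `min_samples` finds the smallest N meeting a target precision ε and compares it with the 1/ε² samples a pure N^{-1/2} rate with the same constant would need.

```python
import math


def min_samples(eps, C=1.0):
    """Smallest integer N >= 8 with C * log(N) / sqrt(N) <= eps.

    Assumes the hypothetical bound form C * N**-0.5 * log(N); C is an
    illustrative constant, not one taken from the paper.
    """
    def bound(n):
        return C * math.log(n) / math.sqrt(n)

    lo, hi = 8, 16
    if bound(lo) <= eps:
        return lo
    # Double the upper end until the bound is met.
    while bound(hi) > eps:
        lo, hi = hi, 2 * hi
    # Invariant: bound(lo) > eps >= bound(hi); bisect to the boundary.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if bound(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


for eps in (0.1, 0.01, 0.001):
    n_log = min_samples(eps)
    n_plain = math.ceil(1 / eps**2)  # requirement of a pure N^{-1/2} rate with C = 1
    print(f"eps={eps}: N={n_log}, ratio to 1/eps^2 = {n_log / n_plain:.1f}")
```

Binary search is valid because N ↦ log N / √N is decreasing for N > e² ≈ 7.39; the printed ratio shows how much the log N factor inflates the requirement relative to 1/ε².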

Load-bearing premise

The data consist of sequences of Gaussian vectors.

What would settle it

A large-sample experiment on Gaussian vector sequences in which the estimator's error grows faster than order N^{-1/2} log N (that is, exceeds C·N^{-1/2} log N with non-negligible probability for every fixed constant C) would falsify the bound.

Original abstract

Directed information and its causally conditioned variations are often used to measure causal influences between random processes. In practice, these quantities must be measured from data. Non-asymptotic error bounds for these estimates are known for sequences over finite alphabets, but less is known for real-valued data. This paper examines the case in which the data are sequences of Gaussian vectors. We provide an explicit formula for causally conditioned directed information rate based on optimal prediction and define an estimator based on this formula. We show that our estimator gives an error of order $O\left(N^{-1/2}\log(N)\right)$ with high probability, where $N$ is the total sample size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript derives an explicit formula for the causally conditioned directed information rate of Gaussian vector sequences using optimal linear prediction, defines an estimator based on this formula, and proves a non-asymptotic high-probability error bound of O(N^{-1/2} log N) for the estimator (N total sample size).

Significance. If correct, the result supplies non-asymptotic concentration guarantees for directed information estimation under Gaussian assumptions, extending beyond finite-alphabet cases. The explicit prediction-based formula is a clear strength, as it reduces the rate to log-determinants of prediction-error covariances without introducing auxiliary parameters.

major comments (1)
  1. Abstract and central claim: the stated O(N^{-1/2} log N) high-probability bound does not indicate dependence on vector dimension d or a uniform lower bound on the smallest eigenvalue of the relevant covariance matrices. Standard matrix concentration inequalities used to control estimation of the linear predictors and prediction-error covariances introduce factors of d or 1/λ_min that can dominate the claimed rate when d grows with N or λ_min approaches zero; these must be tracked explicitly for the bound to hold as stated.
minor comments (1)
  1. Clarify in the introduction or notation section whether the processes are assumed strictly stationary and whether the covariance matrices are uniformly positive definite across all lags.
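One standard way to see where d and 1/λ_min enter (a sketch, not an argument taken from the paper): for symmetric positive definite d × d matrices A and B whose smallest eigenvalues are at least λ > 0, the mean value theorem applied to log det along the segment between them gives

```latex
\bigl|\log\det B - \log\det A\bigr|
  = \bigl|\operatorname{tr}\bigl(M^{-1}(B - A)\bigr)\bigr|
  \le \|M^{-1}\|_{\mathrm{op}}\,\|B - A\|_{*}
  \le \frac{d}{\lambda}\,\|B - A\|_{\mathrm{op}},
```

where M = A + s(B − A) for some s ∈ [0, 1], so that λ_min(M) ≥ λ, and ‖·‖_* is the nuclear norm. An operator-norm concentration bound of order N^{-1/2} log N on an empirical covariance therefore becomes an error of order (d/λ)·N^{-1/2} log N on the corresponding log-determinant, which is why both d and λ_min must stay controlled for the stated rate to hold.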

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the single major comment below and will incorporate clarifications into the revised manuscript.

Point-by-point responses
  1. Referee: Abstract and central claim: the stated O(N^{-1/2} log N) high-probability bound does not indicate dependence on vector dimension d or a uniform lower bound on the smallest eigenvalue of the relevant covariance matrices. Standard matrix concentration inequalities used to control estimation of the linear predictors and prediction-error covariances introduce factors of d or 1/λ_min that can dominate the claimed rate when d grows with N or λ_min approaches zero; these must be tracked explicitly for the bound to hold as stated.

    Authors: We agree that the dependence on dimension d and the minimal eigenvalue λ_min must be stated explicitly for precision. Our analysis assumes fixed d and that the relevant covariance matrices (of the vector processes and their linear predictors) satisfy a uniform lower bound λ_min ≥ λ > 0 independent of N; these are standard modeling assumptions for Gaussian vector sequences with well-conditioned second-order statistics. Under these conditions, standard matrix concentration results (e.g., matrix Bernstein or Vershynin-type bounds on the sample covariances of the stacked predictor vectors) produce an error term whose leading N-dependence is indeed O(N^{-1/2} log N), with the multiplicative constant depending on d, λ, the process memory length, and the sub-Gaussian parameters. The abstract and theorem statements therefore omit these fixed parameters from the big-O notation. To address the referee's point directly, we will revise the abstract to read “O(N^{-1/2} log N) (with implicit constant depending on d and λ_min)” and will add an explicit statement of the same form to the main theorem, together with a short remark discussing the regime in which d may grow with N. These changes will be made in the next version.

    Revision: yes

Circularity Check

0 steps flagged

No circularity in derivation of explicit formula or non-asymptotic estimator bound

Full rationale

The paper first states an explicit formula for the causally conditioned directed information rate that follows directly from the properties of multivariate Gaussian processes and the optimality of linear predictors (reducing to log-determinants of prediction-error covariances). The estimator is then constructed by replacing the population covariances in this formula with their empirical counterparts. The claimed O(N^{-1/2} log N) high-probability bound is obtained by applying standard matrix concentration inequalities to control the deviation of these empirical quantities. None of these steps reduces a claimed result to its own inputs by construction, invokes a load-bearing self-citation, or renames a fitted quantity as a prediction. The chain is therefore self-contained: its only external inputs are standard results such as Gaussian concentration theory.
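A minimal sketch of such a plug-in estimator, for illustration only and not the paper's estimator: it assumes Kramer's convention (current X_t and Z_t are included as regressors), truncates the infinite past to a hypothetical lag order p, fits the linear predictors by ordinary least squares with an intercept, and returns half the difference of the log-determinants of the two empirical prediction-error covariances, in nats. The names `cc_di_rate`, `lagged_design` and `residual_logdet` are hypothetical; only standard NumPy calls (`numpy.linalg.lstsq`, `numpy.linalg.slogdet`) are used.

```python
import numpy as np


def _as_2d(a):
    """Return a (T, d) float array; 1-D input becomes a single column."""
    a = np.asarray(a, dtype=float)
    return a.reshape(a.shape[0], -1)


def lagged_design(series_list, p, include_current):
    """Stack lagged copies of (T, d_i) series into one regressor matrix.

    Row k corresponds to time t = p + k. Each series contributes lags 1..p,
    plus lag 0 if its index is in include_current.
    """
    T = series_list[0].shape[0]
    cols = []
    for i, s in enumerate(series_list):
        first_lag = 0 if i in include_current else 1
        for lag in range(first_lag, p + 1):
            cols.append(s[p - lag:T - lag])
    return np.hstack(cols)


def residual_logdet(target, regressors):
    """Log-det of the empirical least-squares prediction-error covariance."""
    design = np.hstack([regressors, np.ones((regressors.shape[0], 1))])  # intercept
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    cov = resid.T @ resid / resid.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("residual covariance is not positive definite")
    return logdet


def cc_di_rate(x, y, z, p):
    """Plug-in estimate, in nats, of the rate of I(X -> Y || Z).

    Compares the prediction error of Y_t from (Y_{t-1..t-p}, Z_{t..t-p})
    with that from the same regressors plus X_{t..t-p}: a p-lag
    truncation of Kramer's definition (an assumption, see lead-in).
    """
    x, y, z = _as_2d(x), _as_2d(y), _as_2d(z)
    target = y[p:]
    restricted = lagged_design([y, z], p, include_current={1})
    full = lagged_design([y, z, x], p, include_current={1, 2})
    return 0.5 * (residual_logdet(target, restricted) - residual_logdet(target, full))


# Usage: X drives Y with one step of delay; Z is independent noise.
rng = np.random.default_rng(0)
T = 20_000
x, z, e = rng.standard_normal((3, T))
y = np.empty(T)
y[0] = e[0]
y[1:] = 0.8 * x[:-1] + e[1:]
print(cc_di_rate(x, y, z, p=2))  # population value: 0.5 * log(1.64)
print(cc_di_rate(y, x, z, p=2))  # population value: 0
```

Because the full regression contains every regressor of the restricted one, the estimate is nonnegative up to floating-point rounding. Truncating to p lags biases the estimate when the true predictors have longer memory, a source of error that any finite-p implementation would need to account for.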

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the sequences are Gaussian vectors, which permits the explicit prediction-based formula; no free parameters or invented entities are mentioned.

axioms (1)
  • Domain assumption: the observed sequences are jointly Gaussian vector processes.
    This property is invoked to obtain an explicit formula for the causally conditioned directed information rate via optimal prediction.

pith-pipeline@v0.9.0 · 5643 in / 1235 out tokens · 54809 ms · 2026-05-21T18:53:08.059752+00:00 · methodology



Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages


  3. Amblard, P.O. and Michel, O.J. (2011). On directed information theory and Granger causality graphs. Journal of Computational Neuroscience, 30(1), 7–16.
  4. Amblard, P.O. and Michel, O.J. (2012). The relation between Granger causality and directed information theory: A review. Entropy, 15(1), 113–143.
  5. Barnett, L., Barrett, A.B., and Seth, A.K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103(23), 238701.
  6. Chicharro, D. (2011). On the spectral formulation of Granger causality. Biological Cybernetics, 105(5), 331–347.
  7. Divernois, M.A., Etesami, J., Filipovic, D., and Kiyavash, N. (2024). Analysis of large market data using neural networks: A causal approach. IEEE Journal on Selected Areas in Information Theory, 4, 833–847.
  8. Etesami, J., Habibnia, A., and Kiyavash, N. (2017). Econometric modeling of systemic risk: Going beyond pairwise comparison and allowing for nonlinearity.
  9. Etesami, J., Habibnia, A., and Kiyavash, N. (2023). Modeling systemic risk: A time-varying nonparametric causal inference framework. arXiv preprint arXiv:2312.16707.
  10. Etesami, J. and Kiyavash, N. (2014). Directed information graphs: A generalization of linear dynamical graphs. In 2014 American Control Conference, 2563–2568. IEEE.
  11. Geweke, J. (1982). Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 77(378), 304–313.
  12. Goldenshluger, A. and Zeevi, A. (2001). Nonasymptotic bounds for autoregressive time series modeling. The Annals of Statistics, 29(2), 417–444.
  13. Granger, C.W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 424–438.
  14. Hempel, T. and Loos, S.A. (2024). A simple reconstruction method to infer nonreciprocal interactions and local driving in complex systems. arXiv preprint arXiv:2403.09243.
  15. Horn, R.A. and Johnson, C.R. (2012). Matrix Analysis. Cambridge University Press.
  16. Jiao, J., Permuter, H.H., Zhao, L., Kim, Y.H., and Weissman, T. (2013). Universal estimation of directed information. IEEE Transactions on Information Theory, 59(10), 6220–6242.
  17. Kailath, T., Sayed, A.H., and Hassibi, B. (2000). Linear Estimation. Prentice-Hall Information and System Sciences Series. Prentice Hall, Upper Saddle River, NJ.
  18. Kramer, G. (1998). Directed information for channels with feedback, volume 11. Hartung-Gorre, Konstanz, Germany.
  19. Lamperski, A. (2023). Nonasymptotic pointwise and worst-case bounds for classical spectrum estimators. IEEE Transactions on Signal Processing, 71, 4273–4287.
  20. Lee, B. and Lamperski, A. (2020). Non-asymptotic closed-loop system identification using autoregressive processes and Hankel model reduction. In 2020 59th IEEE Conference on Decision and Control (CDC), 3419–3424. IEEE.
  21. Marko, H. (1973). The bidirectional communication theory: A generalization of information theory. IEEE Transactions on Communications, 21(12), 1345–1351.
  22. Massey, J. (1990). Causality, feedback and directed information. In Proc. Int. Symp. Inf. Theory Applic. (ISITA-90), volume 2, 1.
  23. Permuter, H.H., Kim, Y.H., and Weissman, T. (2011). Interpretations of directed information in portfolio theory, data compression, and hypothesis testing. IEEE Transactions on Information Theory, 57(6), 3248–3259.
  24. Quinn, C.J., Coleman, T.P., Kiyavash, N., and Hatsopoulos, N.G. (2011a). Estimating the directed information to infer causal relationships in ensemble neural spike train recordings. Journal of Computational Neuroscience, 30, 17–44.
  25. Quinn, C.J., Kiyavash, N., and Coleman, T.P. (2011b). Equivalence between minimal generative model graphs and directed information graphs. In 2011 IEEE International Symposium on Information Theory Proceedings, 293–297. IEEE.
  26. Quinn, C.J., Kiyavash, N., and Coleman, T.P. (2015). Directed information graphs. IEEE Transactions on Information Theory, 61(12), 6887–6909.
  27. Quinn, C.J., Pinar, A., and Kiyavash, N. (2017). Bounded-degree connected approximations of stochastic networks. IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, 3(2), 79–88.
  28. Rao, A., Hero III, A.O., States, D.J., and Engel, J.D. (2007a). Motif discovery in tissue-specific regulatory sequences using directed information. EURASIP Journal on Bioinformatics and Systems Biology, 2007(1), 13853.
  29. Rao, A., Hero III, A.O., States, D.J., and Engel, J.D. (2007b). Using directed information to build biologically relevant influence networks. In Computational Systems Bioinformatics (Volume 6), 145–156. World Scientific.
  30. Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461.
  31. Subramanian, V.R., Lamperski, A., and Salapaka, M.V. (2021). Effects of data corruption on network identification using directed information. IEEE Transactions on Automatic Control.
  32. Tanaka, T., Esfahani, P.M., and Mitter, S.K. (2017a). LQG control with minimum directed information: Semidefinite programming approach. IEEE Transactions on Automatic Control, 63(1), 37–52.
  33. Tanaka, T., Skoglund, M., Sandberg, H., and Johansson, K.H. (2017b). Directed information and privacy loss in cloud-based control. In 2017 American Control Conference (ACC), 1666–1672. IEEE.
  34. Tsur, D., Aharoni, Z., Goldfeld, Z., and Permuter, H. (2023). Neural estimation and optimization of directed information over continuous spaces. IEEE Transactions on Information Theory.
  35. Ver Steeg, G. and Galstyan, A. (2012). Information transfer in social media. In Proceedings of the 21st International Conference on World Wide Web, 509–518.
  36. Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press.
  37. Wainwright, M.J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press.
  38. Wang, Z., Liang, Y., Zhu, D.C., and Li, T. (2018). The relationship of discrete DCM and directed information in fMRI-based causality analysis. IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, 4(1), 3–13.
  39. Wiener, N. and Masani, P. (1957). The prediction theory of multivariate stochastic processes. Acta Mathematica, 98(1), 111–150.
  40. Young, J., Neveu, C.L., Byrne, J.H., and Aazhang, B. (2021). Inferring functional connectivity through graphical directed information. Journal of Neural Engineering, 18(4), 046019.