Non-Asymptotic Error Bounds for Causally Conditioned Directed Information Rates of Gaussian Sequences
Pith reviewed 2026-05-21 18:53 UTC · model grok-4.3
The pith
For Gaussian vector sequences, an estimator of causally conditioned directed information rates achieves error O(N^{-1/2} log N) with high probability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We provide an explicit formula for the causally conditioned directed information rate based on optimal prediction for sequences of Gaussian vectors. We define an estimator based on this formula and show that our estimator gives an error of order O(N^{-1/2} log N) with high probability, where N is the total sample size.
What carries the argument
The estimator defined from the explicit formula for the causally conditioned directed information rate using optimal prediction.
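As a rough illustration of a prediction-based plug-in estimator of this kind, the sketch below fits two least-squares linear predictors of y (one from the past of (y, z), one from the past of (x, y, z)) and compares the log-determinants of their residual covariances. It is a minimal sketch under stated assumptions, not the paper's estimator: the finite prediction order `p`, the use of strictly past samples only, and the names `lagged_design`, `residual_cov`, and `cc_di_rate` are all hypothetical choices made for this example.

```python
import numpy as np

def lagged_design(series, p):
    """Stack p past samples of `series` (shape (T, k)) into rows aligned with times p..T-1."""
    T = series.shape[0]
    return np.hstack([series[p - lag : T - lag] for lag in range(1, p + 1)])

def residual_cov(target, regressors):
    """Empirical covariance of the least-squares residual of `target` given `regressors`."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    resid = target - regressors @ coef
    return resid.T @ resid / resid.shape[0]

def cc_di_rate(x, y, z, p):
    """Plug-in value of 0.5 * log(det(Gamma) / det(Sigma)) using order-p linear predictors.

    Gamma: residual covariance of y given the past of (y, z)      (assumed convention).
    Sigma: residual covariance of y given the past of (x, y, z)   (assumed convention).
    """
    target = y[p:]
    gamma = residual_cov(target, lagged_design(np.hstack([y, z]), p))
    sigma = residual_cov(target, lagged_design(np.hstack([x, y, z]), p))
    _, logdet_gamma = np.linalg.slogdet(gamma)
    _, logdet_sigma = np.linalg.slogdet(sigma)
    return 0.5 * (logdet_gamma - logdet_sigma)

# Toy data (hypothetical): x drives y with one step of delay, z is an unrelated side process.
rng = np.random.default_rng(0)
T = 5000
x = rng.standard_normal((T, 1))
z = rng.standard_normal((T, 1))
e = rng.standard_normal((T, 1))
y = np.zeros((T, 1))
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + e[t]

estimate = cc_di_rate(x, y, z, p=2)   # influence of x on y, conditioned on z
reverse = cc_di_rate(y, x, z, p=2)    # influence of y on x, conditioned on z
print(f"x -> y: {estimate:.4f}   y -> x: {reverse:.4f}")
```

For this toy model the innovation variance of y is 0.8² + 1 = 1.64 without the past of x and 1 with it, so the population value in the x → y direction is 0.5 · log(1.64), while the reverse direction is zero. Because the restricted regressors are a subset of the full ones, the plug-in value is nonnegative up to floating-point error.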
If this is right
- Finite samples from Gaussian processes produce estimates of causal influences with explicit high-probability accuracy guarantees.
- The method applies directly to vector-valued sequences without requiring discretization to finite alphabets.
- Optimal prediction supplies a closed-form route to the information rate for this class of data.
Where Pith is reading between the lines
- The logarithmic factor suggests that the sample size needed to reach a given precision ε grows somewhat faster than ε^{-2}, the scaling a pure N^{-1/2} rate would give.
- The technique may extend to linear models or data close to Gaussian in practice.
- Simulations on synthetic Gaussian sequences could isolate whether the log N term is tight.
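To make the effect of the logarithmic factor concrete, the short calculation below inverts the bound for a target precision. It is only a sketch: the constant $C$ and the target precision $\varepsilon$ are symbols introduced here for illustration, and the paper's actual constants are not reproduced.

```latex
% Assumption (illustrative): with high probability, error <= C N^{-1/2} \log N
% for some constant C that does not depend on N.
\[
  C\,N^{-1/2}\log N \le \varepsilon
  \quad\Longleftrightarrow\quad
  \frac{N}{\log^{2} N} \ge \frac{C^{2}}{\varepsilon^{2}} .
\]
% Choosing N = K \varepsilon^{-2} \log^{2}(1/\varepsilon) with K a sufficiently large
% constant (depending on C) satisfies this for small \varepsilon, so
\[
  N = O\!\left(\varepsilon^{-2}\log^{2}(1/\varepsilon)\right)
  \quad\text{samples suffice, versus}\quad
  N = O\!\left(\varepsilon^{-2}\right)
  \text{ for a pure } N^{-1/2} \text{ rate.}
\]
```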
Load-bearing premise
The data consist of sequences of Gaussian vectors.
What would settle it
A large-sample experiment on Gaussian vector sequences in which the estimation error exceeds order N^{-1/2} log N with non-vanishing probability would falsify the bound.
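An experiment of this kind could be sketched as follows. The code is a hedged illustration on a hypothetical scalar toy model with no side process z and an order-1 least-squares predictor, not the paper's estimator or experimental setup. It tracks the 95th percentile of the absolute error across repeated trials and rescales it by sqrt(N)/log(N); the function names and the toy model are assumptions made for this example.

```python
import numpy as np

# Population rate for the toy model below: 0.5 * log((0.8**2 + 1) / 1).
TRUE_RATE = 0.5 * np.log(1.64)

def simulate(n, rng):
    """Scalar toy model: y[t] = 0.5*y[t-1] + 0.8*x[t-1] + e[t], with x and e white N(0, 1)."""
    x = rng.standard_normal(n)
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + e[t]
    return x, y

def residual_variance(target, regressors):
    """Mean squared least-squares residual of `target` given `regressors`."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return np.mean((target - regressors @ coef) ** 2)

def plug_in_rate(x, y):
    """0.5 * log of the ratio of residual variances without and with the past of x."""
    target = y[1:]
    restricted = y[:-1, None]
    full = np.column_stack([y[:-1], x[:-1]])
    return 0.5 * np.log(residual_variance(target, restricted)
                        / residual_variance(target, full))

rng = np.random.default_rng(0)
sample_sizes = [500, 2000, 8000]
trials = 40
q95 = {}
for n in sample_sizes:
    errors = [abs(plug_in_rate(*simulate(n, rng)) - TRUE_RATE) for _ in range(trials)]
    q95[n] = np.quantile(errors, 0.95)
    scaled = q95[n] * np.sqrt(n) / np.log(n)
    print(f"N={n:5d}  95% error={q95[n]:.4f}  scaled by sqrt(N)/log(N)={scaled:.3f}")
```

If the rescaled quantile stays bounded as N grows, the run is consistent with the claimed rate; a rescaled quantile that keeps increasing would be evidence against it. A rescaled quantile that shrinks toward zero would instead hint that the log N factor is not tight for this particular model.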
Original abstract
Directed information and its causally conditioned variations are often used to measure causal influences between random processes. In practice, these quantities must be measured from data. Non-asymptotic error bounds for these estimates are known for sequences over finite alphabets, but less is known for real-valued data. This paper examines the case in which the data are sequences of Gaussian vectors. We provide an explicit formula for causally conditioned directed information rate based on optimal prediction and define an estimator based on this formula. We show that our estimator gives an error of order $O\left(N^{-1/2}\log(N)\right)$ with high probability, where $N$ is the total sample size.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives an explicit formula for the causally conditioned directed information rate of Gaussian vector sequences using optimal linear prediction, defines an estimator based on this formula, and proves a non-asymptotic high-probability error bound of O(N^{-1/2} log N) for the estimator (N total sample size).
Significance. If correct, the result supplies non-asymptotic concentration guarantees for directed information estimation under Gaussian assumptions, extending beyond finite-alphabet cases. The explicit prediction-based formula is a clear strength, as it reduces the rate to log-determinants of prediction-error covariances without introducing auxiliary parameters.
major comments (1)
- Abstract and central claim: the stated O(N^{-1/2} log N) high-probability bound does not indicate dependence on vector dimension d or a uniform lower bound on the smallest eigenvalue of the relevant covariance matrices. Standard matrix concentration inequalities used to control estimation of the linear predictors and prediction-error covariances introduce factors of d or 1/λ_min that can dominate the claimed rate when d grows with N or λ_min approaches zero; these must be tracked explicitly for the bound to hold as stated.
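One way to see where d and 1/λ_min can enter is a standard log-determinant perturbation argument, sketched below. This is an illustration only and is not taken from the paper's proof; the symbols $A$ (a population prediction-error covariance), $\hat{A}$ (its empirical counterpart), and $\delta$ (an operator-norm deviation bound) are assumptions introduced for this example.

```latex
% Assumptions (illustrative): A and \hat{A} are d x d symmetric positive definite,
% and \|\hat{A} - A\|_2 \le \delta < \lambda_{\min}(A).
% Let \mu_1, \dots, \mu_d be the eigenvalues of A^{-1/2} \hat{A} A^{-1/2}. Then
\[
  \log\det\hat{A} - \log\det A = \sum_{i=1}^{d} \log \mu_i ,
  \qquad
  |\mu_i - 1| \le \frac{\delta}{\lambda_{\min}(A)} =: t < 1 .
\]
% Since |\log \mu| \le t/(1-t) whenever |\mu - 1| \le t < 1,
\[
  \bigl|\log\det\hat{A} - \log\det A\bigr|
  \le \frac{d\,t}{1 - t}
  = \frac{d\,\delta}{\lambda_{\min}(A) - \delta} .
\]
```

Under these assumptions, a deviation δ of order N^{-1/2} log N translates into a log-determinant error of the same order in N only when d and λ_min(A) are held fixed, and δ itself may carry further dimension dependence.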
minor comments (1)
- Clarify in the introduction or notation section whether the processes are assumed strictly stationary and whether the covariance matrices are uniformly positive definite across all lags.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. We address the single major comment below and will incorporate clarifications into the revised manuscript.
Point-by-point responses
- Referee: Abstract and central claim: the stated O(N^{-1/2} log N) high-probability bound does not indicate dependence on vector dimension d or a uniform lower bound on the smallest eigenvalue of the relevant covariance matrices. Standard matrix concentration inequalities used to control estimation of the linear predictors and prediction-error covariances introduce factors of d or 1/λ_min that can dominate the claimed rate when d grows with N or λ_min approaches zero; these must be tracked explicitly for the bound to hold as stated.
  Authors: We agree that the dependence on dimension d and the minimal eigenvalue λ_min must be stated explicitly for precision. Our analysis assumes fixed d and that the relevant covariance matrices (of the vector processes and their linear predictors) satisfy a uniform lower bound λ_min ≥ λ > 0 independent of N; these are standard modeling assumptions for Gaussian vector sequences with well-conditioned second-order statistics. Under these conditions, standard matrix concentration results (e.g., matrix Bernstein or Vershynin-type bounds on the sample covariances of the stacked predictor vectors) produce an error term whose leading N-dependence is indeed O(N^{-1/2} log N), with the multiplicative constant depending on d, λ, the process memory length, and the sub-Gaussian parameters. The abstract and theorem statements therefore omit these fixed parameters from the big-O notation. To address the referee's point directly, we will revise the abstract to read “O(N^{-1/2} log N) (with implicit constant depending on d and λ_min)” and will add an explicit statement of the same form to the main theorem, together with a short remark discussing the regime in which d may grow with N. These changes will be made in the next version.
  Revision: yes
Circularity Check
No circularity in derivation of explicit formula or non-asymptotic estimator bound
The paper first states an explicit formula for the causally conditioned directed information rate that follows directly from the properties of multivariate Gaussian processes and the optimality of linear predictors (reducing to log-determinants of prediction-error covariances). The estimator is then constructed by replacing the population covariances in this formula with their empirical counterparts. The claimed O(N^{-1/2} log N) high-probability bound is obtained by applying standard matrix concentration inequalities to control the deviation of these empirical quantities. None of these steps reduces a claimed result to its own inputs by construction, invokes a load-bearing self-citation, or renames a fitted quantity as a prediction. The chain is therefore self-contained against external benchmarks such as Gaussian concentration theory.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The observed sequences are jointly Gaussian vector processes.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Theorem 1: $I_\infty(x \to y \,\|\, z) = \frac{1}{2}\log\frac{\det(\Gamma_{yy})}{\det(\Sigma_{yy})}$, where $\Sigma$, $\Gamma$ are prediction-error covariances
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Assumption 1: $c_{\min} I \preceq \Phi(e^{j\omega}) \preceq c_{\max} I$; rational PSD; Gaussian vectors
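The spectral condition in Assumption 1 quoted above can be checked numerically for a concrete model. The sketch below does this for a hypothetical stable VAR(1) process x_t = A x_{t-1} + w_t with noise covariance Q, whose spectral density is taken as Φ(e^{jω}) = (I − A e^{−jω})^{-1} Q (I − A e^{−jω})^{-*}. The model, the omission of any 1/(2π) normalization, and the function name `spectral_bounds` are assumptions for illustration, and a frequency grid only approximates the true extremes.

```python
import numpy as np

def spectral_bounds(A, Q, n_grid=513):
    """Grid estimate of (c_min, c_max) such that c_min*I <= Phi(e^{jw}) <= c_max*I.

    Assumes a stable VAR(1) model with real A and Q, so the eigenvalues of Phi at
    -w and w coincide and scanning w in [0, pi] is enough.
    """
    d = A.shape[0]
    eye = np.eye(d)
    c_min, c_max = np.inf, 0.0
    for omega in np.linspace(0.0, np.pi, n_grid):
        H = np.linalg.inv(eye - A * np.exp(-1j * omega))  # transfer function at e^{jw}
        Phi = H @ Q @ H.conj().T
        Phi = 0.5 * (Phi + Phi.conj().T)                  # enforce Hermitian symmetry numerically
        eig = np.linalg.eigvalsh(Phi)                     # ascending real eigenvalues
        c_min = min(c_min, eig[0])
        c_max = max(c_max, eig[-1])
    return c_min, c_max

# Hypothetical example: upper-triangular A with eigenvalues 0.5 and 0.3, unit noise covariance.
A = np.array([[0.5, 0.2],
              [0.0, 0.3]])
Q = np.eye(2)
c_min, c_max = spectral_bounds(A, Q)
print(f"c_min ~ {c_min:.4f}, c_max ~ {c_max:.4f}")
```

A small c_min on the grid signals a nearly singular spectrum at some frequency, which is the regime where constants depending on 1/c_min would be expected to become large.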
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.