Non-Asymptotic Error Bounds for Causally Conditioned Directed Information Rates of Gaussian Sequences

Andrew Lamperski; Yuping Zheng

arXiv: 2512.06238 · v2 · pith:5Y4ZMBNInew · submitted 2025-12-06 · 💻 cs.IT · math.IT · math.ST · stat.TH

Pith reviewed 2026-05-21 18:53 UTC · model grok-4.3

keywords: directed information · causal conditioning · Gaussian sequences · non-asymptotic bounds · estimation error · information rate · optimal prediction

The pith

For Gaussian vector sequences, an estimator of causally conditioned directed information rates achieves error O(N^{-1/2} log N) with high probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes non-asymptotic error bounds for estimators of causally conditioned directed information rates from sequences of Gaussian vectors. Directed information measures causal influences between random processes, so practical use requires finite-sample estimators with explicit accuracy guarantees. The authors derive an explicit formula for the rate based on optimal prediction and construct an estimator from that formula. They prove the estimator's error is of order O(N^{-1/2} log N) with high probability for total sample size N. A sympathetic reader would care because these bounds support reliable causal analysis of real-valued time series without asymptotic approximations.

Core claim

We provide an explicit formula for the causally conditioned directed information rate based on optimal prediction for sequences of Gaussian vectors. We define an estimator based on this formula and show that our estimator gives an error of order O(N^{-1/2} log N) with high probability, where N is the total sample size.

What carries the argument

The estimator defined from the explicit formula for the causally conditioned directed information rate using optimal prediction.
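A minimal sketch of a plug-in estimator of this kind, for a scalar toy model with one lag of history and no conditioning process z. The model, its coefficients, and the function names `simulate` and `estimate_rate` are illustrative assumptions, not the paper's construction; the sketch uses only strictly past values of x, and the paper's exact conditioning set may differ.

```python
import math
import random


def simulate(n, a=0.5, b=1.0, seed=0):
    """Simulate y_t = a*y_{t-1} + b*x_{t-1} + e_t with unit-variance Gaussian x_t, e_t."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0] * n
    for t in range(1, n):
        y[t] = a * y[t - 1] + b * x[t - 1] + rng.gauss(0.0, 1.0)
    return x, y


def estimate_rate(x, y):
    """Plug-in estimate 0.5 * log(restricted error variance / full error variance)."""
    ts = range(1, len(y))
    # Empirical second moments (the common 1/N factor cancels in the ratio).
    s_tt = sum(y[t] * y[t] for t in ts)
    s_tp = sum(y[t] * y[t - 1] for t in ts)
    s_tx = sum(y[t] * x[t - 1] for t in ts)
    s_pp = sum(y[t - 1] * y[t - 1] for t in ts)
    s_px = sum(y[t - 1] * x[t - 1] for t in ts)
    s_xx = sum(x[t - 1] * x[t - 1] for t in ts)
    # Residual energy when predicting y_t from y_{t-1} alone (Schur complement).
    restricted = s_tt - s_tp ** 2 / s_pp
    # Residual energy when predicting y_t from (y_{t-1}, x_{t-1}).
    det = s_pp * s_xx - s_px ** 2
    explained = (s_tp ** 2 * s_xx - 2.0 * s_tp * s_tx * s_px + s_tx ** 2 * s_pp) / det
    full = s_tt - explained
    return 0.5 * math.log(restricted / full)


if __name__ == "__main__":
    x, y = simulate(20000)
    # For this toy model the population value is 0.5 * log(1 + b^2) = 0.5 * log(2).
    print(estimate_rate(x, y), 0.5 * math.log(2.0))
```

Because the full regression nests the restricted one, the in-sample estimate is never negative, which matches the non-negativity of the population quantity.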

If this is right

Finite samples from Gaussian processes produce estimates of causal influences with explicit high-probability accuracy guarantees.
The method applies directly to vector-valued sequences without requiring discretization to finite alphabets.
Optimal prediction supplies a closed-form route to the information rate for this class of data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The logarithmic factor suggests that the sample size needed for a target precision ε grows somewhat faster than the ε^{-2} implied by a pure N^{-1/2} rate.
The technique may extend to linear models or data close to Gaussian in practice.
Simulations on synthetic Gaussian sequences could isolate whether the log N term is tight.

Load-bearing premise

The data consist of sequences of Gaussian vectors.

What would settle it

A large-sample experiment on Gaussian vector sequences where the estimator deviates from the true rate by more than order N^{-1/2} log N with high probability would falsify the bound.
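An experiment of this kind could be sketched as follows, using a scalar toy model and a one-lag plug-in estimator. Both are assumptions chosen for illustration (as are the names `rate_error` and `mean_error`); they are not the paper's estimator or experimental setup. The script reports the mean absolute error at several sample sizes together with the error normalized by N^{-1/2} log N.

```python
import math
import random


def rate_error(n, rng, a=0.5, b=1.0):
    """Absolute error of a one-lag plug-in rate estimate on one simulated path."""
    y_prev = 0.0
    x_prev = rng.gauss(0.0, 1.0)
    s_tt = s_tp = s_tx = s_pp = s_px = s_xx = 0.0
    for _ in range(n):
        # Toy model: y_t = a*y_{t-1} + b*x_{t-1} + e_t, unit-variance Gaussian noise.
        y_t = a * y_prev + b * x_prev + rng.gauss(0.0, 1.0)
        s_tt += y_t * y_t
        s_tp += y_t * y_prev
        s_tx += y_t * x_prev
        s_pp += y_prev * y_prev
        s_px += y_prev * x_prev
        s_xx += x_prev * x_prev
        y_prev, x_prev = y_t, rng.gauss(0.0, 1.0)
    # Residual energies with and without the past of x as a regressor.
    restricted = s_tt - s_tp ** 2 / s_pp
    det = s_pp * s_xx - s_px ** 2
    full = s_tt - (s_tp ** 2 * s_xx - 2.0 * s_tp * s_tx * s_px + s_tx ** 2 * s_pp) / det
    true_rate = 0.5 * math.log(1.0 + b ** 2)
    return abs(0.5 * math.log(restricted / full) - true_rate)


def mean_error(n, trials=30, seed=0):
    """Average absolute error over independent simulated paths of length n."""
    rng = random.Random(seed)
    return sum(rate_error(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (250, 1000, 4000):
        err = mean_error(n)
        # If the error is O(N^{-1/2} log N), the last column should stay bounded.
        print(n, err, err * math.sqrt(n) / math.log(n))
```

A normalized error that keeps growing with N would be evidence against the bound; one that shrinks toward zero would suggest the log factor is not tight for this model.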

Original abstract

Directed information and its causally conditioned variations are often used to measure causal influences between random processes. In practice, these quantities must be measured from data. Non-asymptotic error bounds for these estimates are known for sequences over finite alphabets, but less is known for real-valued data. This paper examines the case in which the data are sequences of Gaussian vectors. We provide an explicit formula for causally conditioned directed information rate based on optimal prediction and define an estimator based on this formula. We show that our estimator gives an error of order $O\left(N^{-1/2}\log(N)\right)$ with high probability, where $N$ is the total sample size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives explicit non-asymptotic bounds for causally conditioned directed information rates of Gaussian sequences, but the O(N^{-1/2} log N) claim may hide dimension factors.

The one thing to know is that this paper supplies an explicit formula for the causally conditioned directed information rate when the observations are Gaussian vectors, and then builds an estimator from it that comes with a high-probability error bound of order O(N^{-1/2} log N). The authors start from the fact that for jointly Gaussian processes the optimal causal predictor is linear. This reduces the directed information rate to a combination of log-determinants of the prediction-error covariance matrices at each lag. From there they define a natural sample-based estimator and analyze its deviation using matrix concentration inequalities. That is the concrete advance over the finite-alphabet literature, where such closed forms are not available.

The result is useful because many practical time series are modeled as Gaussian or approximately so, and a rate that matches the usual parametric rate up to a log factor helps in deciding how much data is needed. The approach avoids fitting extra parameters and stays close to the population quantities.

One place that could be tighter is the handling of dimension. The bound as stated in the abstract does not say whether the vector dimension is held fixed or allowed to grow. If the dimension increases, even slowly, the union bound or the matrix deviation terms will introduce polynomial factors in the dimension that might overwhelm the log N. Likewise, if the covariance matrices can have small eigenvalues, the concentration constants blow up. I would look in the proof to see whether they assume uniform eigenvalue bounds or derive the dependence explicitly. If those assumptions are stated and the bound is adjusted accordingly, the claim is fine; otherwise it is an implicit restriction.

Readers who care about finite-sample analysis of information measures on continuous data will get the most out of this. It is a targeted extension rather than a broad new framework, so it fits best in a specialized journal or conference in information theory. The work shows clear engagement with the existing bounds and makes a reasonable claim, so it is worth sending out for review.
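The concern about dimension and small eigenvalues can be illustrated numerically. The sketch below is an assumption-laden toy, not taken from the paper: it treats a covariance matrix through its eigenvalues only and shifts each eigenvalue by the same hypothetical estimation error `delta`, then reports how much the log-determinant moves.

```python
import math


def logdet_shift(eigenvalues, delta):
    """Change in log det when every eigenvalue of a covariance is shifted by delta."""
    return sum(math.log((lam + delta) / lam) for lam in eigenvalues)


if __name__ == "__main__":
    delta = 0.01  # hypothetical estimation error in each eigenvalue
    for d in (2, 8):
        for lam_min in (1.0, 0.05):
            # One eigenvalue equal to lam_min, the remaining d - 1 equal to 1.
            spectrum = [lam_min] + [1.0] * (d - 1)
            print(d, lam_min, logdet_shift(spectrum, delta))
```

The same perturbation moves the log-determinant more when the dimension is larger or when the smallest eigenvalue is closer to zero, which is why both quantities matter for the constant in front of the rate.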

Referee Report

1 major / 1 minor

Summary. The manuscript derives an explicit formula for the causally conditioned directed information rate of Gaussian vector sequences using optimal linear prediction, defines an estimator based on this formula, and proves a non-asymptotic high-probability error bound of O(N^{-1/2} log N) for the estimator (N total sample size).

Significance. If correct, the result supplies non-asymptotic concentration guarantees for directed information estimation under Gaussian assumptions, extending beyond finite-alphabet cases. The explicit prediction-based formula is a clear strength, as it reduces the rate to log-determinants of prediction-error covariances without introducing auxiliary parameters.

major comments (1)

Abstract and central claim: the stated O(N^{-1/2} log N) high-probability bound does not indicate dependence on vector dimension d or a uniform lower bound on the smallest eigenvalue of the relevant covariance matrices. Standard matrix concentration inequalities used to control estimation of the linear predictors and prediction-error covariances introduce factors of d or 1/λ_min that can dominate the claimed rate when d grows with N or λ_min approaches zero; these must be tracked explicitly for the bound to hold as stated.
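One standard way to see where such factors enter, offered as a sketch rather than as the paper's argument: let $\Sigma \succ 0$ be a $d \times d$ prediction-error covariance with smallest eigenvalue $\lambda_{\min}$, let $\Delta$ be a symmetric estimation error with operator norm $\|\Delta\| < \lambda_{\min}$, and let $\mu_1, \dots, \mu_d$ be the eigenvalues of $\Sigma^{-1/2}\Delta\,\Sigma^{-1/2}$. These symbols are introduced here for illustration only.

```latex
\begin{align*}
\log\det(\Sigma+\Delta) - \log\det\Sigma
  &= \log\det\bigl(I + \Sigma^{-1/2}\Delta\,\Sigma^{-1/2}\bigr)
   = \sum_{i=1}^{d} \log(1+\mu_i), \\
|\mu_i| &\le \rho := \frac{\|\Delta\|}{\lambda_{\min}} < 1,
\qquad
|\log(1+\mu_i)| \le \frac{\rho}{1-\rho}, \\
\bigl|\log\det(\Sigma+\Delta) - \log\det\Sigma\bigr|
  &\le d\,\frac{\rho}{1-\rho}
   = \frac{d\,\|\Delta\|}{\lambda_{\min} - \|\Delta\|}.
\end{align*}
```

A covariance error of order $N^{-1/2}\log N$ therefore yields a log-determinant error of the same order only while $d$ and $1/\lambda_{\min}$ stay bounded.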

minor comments (1)

Clarify in the introduction or notation section whether the processes are assumed strictly stationary and whether the covariance matrices are uniformly positive definite across all lags.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the single major comment below and will incorporate clarifications into the revised manuscript.

Referee: Abstract and central claim: the stated O(N^{-1/2} log N) high-probability bound does not indicate dependence on vector dimension d or a uniform lower bound on the smallest eigenvalue of the relevant covariance matrices. Standard matrix concentration inequalities used to control estimation of the linear predictors and prediction-error covariances introduce factors of d or 1/λ_min that can dominate the claimed rate when d grows with N or λ_min approaches zero; these must be tracked explicitly for the bound to hold as stated.

Authors: We agree that the dependence on dimension d and the minimal eigenvalue λ_min must be stated explicitly for precision. Our analysis assumes fixed d and that the relevant covariance matrices (of the vector processes and their linear predictors) satisfy a uniform lower bound λ_min ≥ λ > 0 independent of N; these are standard modeling assumptions for Gaussian vector sequences with well-conditioned second-order statistics. Under these conditions, standard matrix concentration results (e.g., matrix Bernstein or Vershynin-type bounds on the sample covariances of the stacked predictor vectors) produce an error term whose leading N-dependence is indeed O(N^{-1/2} log N), with the multiplicative constant depending on d, λ, the process memory length, and the sub-Gaussian parameters. The abstract and theorem statements therefore omit these fixed parameters from the big-O notation. To address the referee's point directly, we will revise the abstract to read “O(N^{-1/2} log N) (with implicit constant depending on d and λ_min)” and will add an explicit statement of the same form to the main theorem, together with a short remark discussing the regime in which d may grow with N. These changes will be made in the next version.

Revision: yes

Circularity Check

0 steps flagged

No circularity in derivation of explicit formula or non-asymptotic estimator bound

The paper first states an explicit formula for the causally conditioned directed information rate that follows directly from the properties of multivariate Gaussian processes and the optimality of linear predictors (reducing to log-determinants of prediction-error covariances). The estimator is then constructed by replacing the population covariances in this formula with their empirical counterparts. The claimed O(N^{-1/2} log N) high-probability bound is obtained by applying standard matrix concentration inequalities to control the deviation of these empirical quantities. None of these steps reduces a claimed result to its own inputs by construction, invokes a load-bearing self-citation, or renames a fitted quantity as a prediction. The chain is therefore self-contained against external benchmarks such as Gaussian concentration theory.
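The plug-in step can be written out under a simplifying assumption that is ours, not necessarily the paper's: the predictor uses a finite stacked regressor $w_t$ of past values, and all processes are zero-mean. The symbols $R$, $E$, and their hatted versions are introduced here for illustration.

```latex
\begin{align*}
R &= \operatorname{Cov}\begin{pmatrix} y_t \\ w_t \end{pmatrix}
   = \begin{pmatrix} R_{yy} & R_{yw} \\ R_{wy} & R_{ww} \end{pmatrix},
&
E &= R_{yy} - R_{yw} R_{ww}^{-1} R_{wy}, \\
\hat R &= \frac{1}{N}\sum_{t=1}^{N}
   \begin{pmatrix} y_t \\ w_t \end{pmatrix}
   \begin{pmatrix} y_t \\ w_t \end{pmatrix}^{\top},
&
\hat E &= \hat R_{yy} - \hat R_{yw} \hat R_{ww}^{-1} \hat R_{wy}.
\end{align*}
```

Here $E$ is the error covariance of the best linear predictor of $y_t$ from $w_t$, and $\hat E$ is its empirical counterpart. Computing $\hat E$ once with and once without the past of $x$ in $w_t$, and taking half the log-ratio of the two determinants, gives an estimator of the form described above.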

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the sequences are Gaussian vectors, which permits the explicit prediction-based formula; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption: The observed sequences are jointly Gaussian vector processes.
This property is invoked to obtain an explicit formula for the causally conditioned directed information rate via optimal prediction.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation: unclear
Paper passage: Theorem 1: I∞(x→y∥z) = ½ log( det(Γyy) / det(Σyy) ), where Σ, Γ are prediction-error covariances

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation: unclear
Paper passage: Assumption 1: c_min I ≼ Φ(e^{jω}) ≼ c_max I; rational PSD; Gaussian vectors

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[3] Amblard, P.O. and Michel, O.J. (2011). On directed information theory and Granger causality graphs. Journal of Computational Neuroscience, 30(1), 7–16.
[4] Amblard, P.O. and Michel, O.J. (2012). The relation between Granger causality and directed information theory: A review. Entropy, 15(1), 113–143.
[5] Barnett, L., Barrett, A.B., and Seth, A.K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103(23), 238701.
[6] Chicharro, D. (2011). On the spectral formulation of Granger causality. Biological Cybernetics, 105(5), 331–347.
[7] Divernois, M.A., Etesami, J., Filipovic, D., and Kiyavash, N. (2024). Analysis of large market data using neural networks: A causal approach. IEEE Journal on Selected Areas in Information Theory, 4, 833–847.
[8] Etesami, J., Habibnia, A., and Kiyavash, N. (2017). Econometric modeling of systemic risk: going beyond pairwise comparison and allowing for nonlinearity.
[9] Etesami, J., Habibnia, A., and Kiyavash, N. (2023). Modeling systemic risk: A time-varying nonparametric causal inference framework. arXiv preprint arXiv:2312.16707.
[10] Etesami, J. and Kiyavash, N. (2014). Directed information graphs: A generalization of linear dynamical graphs. In 2014 American Control Conference, 2563–2568. IEEE.
[11] Geweke, J. (1982). Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 77(378), 304–313.
[12] Goldenshluger, A., Zeevi, A., et al. (2001). Nonasymptotic bounds for autoregressive time series modeling. The Annals of Statistics, 29(2), 417–444.
[13] Granger, C.W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 424–438.
[14] Hempel, T. and Loos, S.A. (2024). A simple reconstruction method to infer nonreciprocal interactions and local driving in complex systems. arXiv preprint arXiv:2403.09243.
[15] Horn, R.A. and Johnson, C.R. (2012). Matrix analysis. Cambridge University Press.
[16] Jiao, J., Permuter, H.H., Zhao, L., Kim, Y.H., and Weissman, T. (2013). Universal estimation of directed information. IEEE Transactions on Information Theory, 59(10), 6220–6242.
[17] Kailath, T., Sayed, A.H., and Hassibi, B. (2000). Linear estimation. Prentice-Hall information and system sciences series. Prentice Hall, Upper Saddle River, N.J.
[18] Kramer, G. (1998). Directed information for channels with feedback, volume 11. Hartung-Gorre Konstanz, Germany.
[19] Lamperski, A. (2023). Nonasymptotic pointwise and worst-case bounds for classical spectrum estimators. IEEE Transactions on Signal Processing, 71, 4273–4287.
[20] Lee, B. and Lamperski, A. (2020). Non-asymptotic closed-loop system identification using autoregressive processes and Hankel model reduction. In 2020 59th IEEE Conference on Decision and Control (CDC), 3419–3424. IEEE.
[21] Marko, H. (1973). The bidirectional communication theory - a generalization of information theory. IEEE Transactions on Communications, 21(12), 1345–1351.
[22] Massey, J. et al. (1990). Causality, feedback and directed information. In Proc. Int. Symp. Inf. Theory Applic. (ISITA-90), volume 2, 1.
[23] Permuter, H.H., Kim, Y.H., and Weissman, T. (2011). Interpretations of directed information in portfolio theory, data compression, and hypothesis testing. IEEE Transactions on Information Theory, 57(6), 3248–3259.
[24] Quinn, C.J., Coleman, T.P., Kiyavash, N., and Hatsopoulos, N.G. (2011a). Estimating the directed information to infer causal relationships in ensemble neural spike train recordings. Journal of Computational Neuroscience, 30, 17–44.
[25] Quinn, C.J., Kiyavash, N., and Coleman, T.P. (2011b). Equivalence between minimal generative model graphs and directed information graphs. In 2011 IEEE International Symposium on Information Theory Proceedings, 293–297. IEEE.
[26] Quinn, C.J., Kiyavash, N., and Coleman, T.P. (2015). Directed information graphs. IEEE Transactions on Information Theory, 61(12), 6887–6909.
[27] Quinn, C.J., Pinar, A., and Kiyavash, N. (2017). Bounded-degree connected approximations of stochastic networks. IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, 3(2), 79–88.
[28] Rao, A., Hero III, A.O., States, D.J., and Engel, J.D. (2007a). Motif discovery in tissue-specific regulatory sequences using directed information. EURASIP Journal on Bioinformatics and Systems Biology, 2007(1), 13853.
[29] Rao, A., Hero III, A.O., States, D.J., and Engel, J.D. (2007b). Using directed information to build biologically relevant influence networks. In Computational Systems Bioinformatics: (Volume 6), 145–156. World Scientific.
[30] Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461.
[31] Subramanian, V.R., Lamperski, A., and Salapaka, M.V. (2021). Effects of data corruption on network identification using directed information. IEEE Transactions on Automatic Control.
[32] Tanaka, T., Esfahani, P.M., and Mitter, S.K. (2017a). LQG control with minimum directed information: Semidefinite programming approach. IEEE Transactions on Automatic Control, 63(1), 37–52.
[33] Tanaka, T., Skoglund, M., Sandberg, H., and Johansson, K.H. (2017b). Directed information and privacy loss in cloud-based control. In 2017 American Control Conference (ACC), 1666–1672. IEEE.
[34] Tsur, D., Aharoni, Z., Goldfeld, Z., and Permuter, H. (2023). Neural estimation and optimization of directed information over continuous spaces. IEEE Transactions on Information Theory.
[35] Ver Steeg, G. and Galstyan, A. (2012). Information transfer in social media. In Proceedings of the 21st International Conference on World Wide Web, 509–518.
[36] Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press.
[37] Wainwright, M.J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press.
[38] Wang, Z., Liang, Y., Zhu, D.C., and Li, T. (2018). The relationship of discrete DCM and directed information in fMRI-based causality analysis. IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, 4(1), 3–13.
[39] Wiener, N. and Masani, P. (1957). The prediction theory of multivariate stochastic processes. Acta Mathematica, 98(1), 111–150.
[40] Young, J., Neveu, C.L., Byrne, J.H., and Aazhang, B. (2021). Inferring functional connectivity through graphical directed information. Journal of Neural Engineering, 18(4), 046019.
