TENDE: Transfer Entropy Neural Diffusion Estimation
Pith reviewed 2026-05-18 06:42 UTC · model grok-4.3
The pith
TENDE estimates transfer entropy by learning conditional score functions with diffusion models and minimal assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TENDE estimates transfer entropy through conditional mutual information by training score-based diffusion models to learn the score functions of the relevant conditional distributions, thereby achieving flexible and scalable estimation that makes minimal assumptions about the underlying data-generating process.
What carries the argument
Score functions of conditional distributions learned via score-based diffusion models, used to approximate the terms in the conditional mutual information expression for transfer entropy.
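For readers who want the quantities written out, the following is a sketch of the standard (Schreiber-style) definition of transfer entropy as a conditional mutual information, using the y_past and x_past notation from the referee report below. The direction X to Y and the symbols g(t), p_t, q_t and T are assumptions introduced here for illustration. The last line is a general identity from the score-based diffusion literature that relates a KL divergence to a difference of scores along a shared forward noising process; whether TENDE uses exactly this form is not stated in this summary.

```latex
\begin{align}
\mathrm{TE}_{X \to Y}
  &= I\left(Y_{t+1};\, X_{\text{past}} \mid Y_{\text{past}}\right) \\
  &= \mathbb{E}\left[\log
       \frac{p(y_{t+1} \mid y_{\text{past}}, x_{\text{past}})}
            {p(y_{t+1} \mid y_{\text{past}})}\right] \\
  &= \mathbb{E}_{y_{\text{past}},\, x_{\text{past}}}\left[
       D_{\mathrm{KL}}\left(
         p(\cdot \mid y_{\text{past}}, x_{\text{past}})
         \,\middle\|\,
         p(\cdot \mid y_{\text{past}})
       \right)\right] \\[4pt]
% Assumed illustration: p_t and q_t are p_0 and q_0 pushed through the same
% forward diffusion with diffusion coefficient g(t) up to time T.
D_{\mathrm{KL}}(p_0 \,\|\, q_0)
  &= D_{\mathrm{KL}}(p_T \,\|\, q_T)
   + \frac{1}{2}\int_0^T g(t)^2\,
     \mathbb{E}_{z \sim p_t}\left[
       \left\| \nabla_z \log p_t(z) - \nabla_z \log q_t(z) \right\|^2
     \right] dt
\end{align}
```

Under this reading, the two densities in the ratio are the distributions whose scores would need to be learned, which is why errors in either score feed directly into the estimate.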
If this is right
- Transfer entropy estimation becomes feasible for high-dimensional time series without requiring exponentially large datasets.
- Applications in neuroscience and finance can proceed without imposing restrictive distributional assumptions on the observed variables.
- Estimation remains accurate and stable even when sample sizes are moderate, improving reliability for real-world noisy recordings.
- The method supports analysis of longer or more complex time series that exceed the practical reach of earlier estimators.
Where Pith is reading between the lines
- The same diffusion-based score estimation could be reused to compute other conditional information quantities such as directed information.
- Combining TENDE with existing dynamical models might allow joint inference of system dynamics and information flow.
- Testing the estimator on irregularly sampled or partially observed time series would reveal whether the diffusion approach tolerates missing data better than alternatives.
Load-bearing premise
Score-based diffusion models can accurately recover the score functions of the conditional distributions from finite data samples without introducing bias into the transfer entropy estimate.
What would settle it
On synthetic time series where the true transfer entropy value is known exactly by construction, compare TENDE estimates against that ground truth and against estimates from competing neural methods.
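A minimal sketch of such a ground-truth check, assuming a toy linear-Gaussian system that is not taken from the paper: x_t is i.i.d. Gaussian noise and y_{t+1} = b*y_t + c*x_t + noise, for which the transfer entropy from X to Y has the closed form 0.5 * log(1 + c^2 * sigma_x^2 / sigma_y^2). The function names and parameters are hypothetical, and the plug-in estimator below is a simple linear-Gaussian stand-in for whichever estimator is being tested; it is not TENDE.

```python
import math
import random


def simulate(n, b=0.5, c=0.8, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Toy system (assumed for illustration): X is i.i.d. noise that drives Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    y = [0.0] * n
    for t in range(n - 1):
        y[t + 1] = b * y[t] + c * x[t] + rng.gauss(0.0, sigma_y)
    return x, y


def true_te(c=0.8, sigma_x=1.0, sigma_y=1.0):
    """Closed-form TE from X to Y (in nats) for the toy system above."""
    return 0.5 * math.log(1.0 + (c * sigma_x) ** 2 / sigma_y ** 2)


def _resid_var_1(target, u):
    """Residual variance of a no-intercept least-squares fit target ~ u."""
    beta = sum(t * a for t, a in zip(target, u)) / sum(a * a for a in u)
    return sum((t - beta * a) ** 2 for t, a in zip(target, u)) / len(target)


def _resid_var_2(target, u, v):
    """Residual variance of a no-intercept least-squares fit target ~ u + v."""
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    stu = sum(t * a for t, a in zip(target, u))
    stv = sum(t * b for t, b in zip(target, v))
    det = suu * svv - suv * suv
    bu = (stu * svv - stv * suv) / det
    bv = (stv * suu - stu * suv) / det
    resid = [t - bu * a - bv * b for t, a, b in zip(target, u, v)]
    return sum(r * r for r in resid) / len(target)


def gaussian_te_estimate(x, y):
    """Stand-in estimator: 0.5 * log of the ratio of residual variances
    when predicting y[t+1] from y[t] alone versus from (y[t], x[t])."""
    target, y_past, x_past = y[1:], y[:-1], x[:-1]
    restricted = _resid_var_1(target, y_past)
    full = _resid_var_2(target, y_past, x_past)
    return 0.5 * math.log(restricted / full)
```

Calling `simulate(20000)` and comparing `gaussian_te_estimate(x, y)` with `true_te()` gives the kind of estimate-versus-ground-truth comparison described above; a neural estimator would be substituted for the stand-in and evaluated on the same samples.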
Original abstract
Transfer entropy measures directed information flow in time series, and it has become a fundamental quantity in applications spanning neuroscience, finance, and complex systems analysis. However, existing estimation methods suffer from the curse of dimensionality, require restrictive distributional assumptions, or need exponentially large datasets for reliable convergence. We address these limitations in the literature by proposing TENDE (Transfer Entropy Neural Diffusion Estimation), a novel approach that leverages score-based diffusion models to estimate transfer entropy through conditional mutual information. By learning score functions of the relevant conditional distributions, TENDE provides flexible, scalable estimation while making minimal assumptions about the underlying data-generating process. We demonstrate superior accuracy and robustness compared to existing neural estimators and other state-of-the-art approaches across synthetic benchmarks and real data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TENDE, a method that leverages score-based diffusion models to estimate transfer entropy (TE) by approximating it via conditional mutual information (CMI) computed from learned score functions of the relevant conditional distributions p(y_{t+1} | y_past, x_past) and marginals. It claims this yields flexible, scalable estimation with minimal assumptions on the data-generating process, along with superior accuracy and robustness relative to existing neural estimators on synthetic benchmarks and real data.
Significance. If the central claims hold after addressing the identified gaps, the work would provide a practically useful advance for TE estimation in high-dimensional time series, addressing known limitations such as the curse of dimensionality in applications like neuroscience and finance. The integration of diffusion-based score matching with information-theoretic estimation is a promising direction, and any reproducible code or parameter-free derivations would strengthen its contribution.
major comments (2)
- [§3] §3 (Method): The derivation that learned score functions of the conditional distributions yield a consistent estimator of CMI (and thus TE) lacks an explicit error-propagation analysis or convergence guarantee. Finite-sample score-matching bias is known to persist in high dimensions and could systematically distort the resulting information-theoretic quantity; this assumption is load-bearing for the superiority claims but is not justified theoretically or via bounds.
- [§5.1, Table 2] §5.1 and Table 2 (Experiments): The reported accuracy improvements over neural baselines do not include controls for score-estimation error (e.g., varying diffusion steps, network capacity, or sample size) or statistical significance tests on the TE estimates. Without these, it is impossible to confirm that observed gains arise from the diffusion approach rather than uncontrolled bias or variance in the CMI computation.
minor comments (2)
- [Abstract, §1] The abstract and introduction should explicitly state the precise definition of TE used (e.g., the standard Schreiber formulation) and the exact CMI expression implemented via scores.
- [§2, §3] Notation for the time-series variables (y_past, x_past) is introduced without a clear diagram or pseudocode showing the conditioning sets; this reduces clarity for readers unfamiliar with TE.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the paper where appropriate to strengthen the theoretical discussion and experimental controls.
- Referee: [§3] §3 (Method): The derivation that learned score functions of the conditional distributions yield a consistent estimator of CMI (and thus TE) lacks an explicit error-propagation analysis or convergence guarantee. Finite-sample score-matching bias is known to persist in high dimensions and could systematically distort the resulting information-theoretic quantity; this assumption is load-bearing for the superiority claims but is not justified theoretically or via bounds.
  Authors: We appreciate this observation. Section 3 derives the estimator by expressing CMI (and thus TE) in terms of the learned score functions of the relevant conditional distributions, leveraging the fact that score-based diffusion models can approximate the score of p(y_{t+1} | y_past, x_past) and the marginals. While the literature on score matching provides consistency guarantees under suitable conditions, we acknowledge that the current manuscript does not include an explicit finite-sample error-propagation analysis or high-dimensional convergence bounds for the resulting information-theoretic estimates. This is a valid gap. In the revised version we will add a dedicated paragraph in §3 discussing potential bias propagation from score estimation errors, citing relevant convergence results for diffusion models, and clarifying that the superiority claims rest primarily on the empirical benchmarks rather than a complete theoretical guarantee.
  Revision: partial
- Referee: [§5.1, Table 2] §5.1 and Table 2 (Experiments): The reported accuracy improvements over neural baselines do not include controls for score-estimation error (e.g., varying diffusion steps, network capacity, or sample size) or statistical significance tests on the TE estimates. Without these, it is impossible to confirm that observed gains arise from the diffusion approach rather than uncontrolled bias or variance in the CMI computation.
  Authors: We agree that stronger experimental controls are needed to isolate the contribution of the diffusion-based approach. In the revised manuscript we will augment §5.1 with additional ablation studies that systematically vary the number of diffusion steps, network capacity, and training sample size while reporting the resulting TE estimation errors. We will also add statistical significance testing (e.g., paired Wilcoxon tests with bootstrap confidence intervals) on the differences between TENDE and the neural baselines across the synthetic benchmarks in Table 2. These revisions will be included in the next version of the paper.
  Revision: yes
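The significance testing promised in the second response could look roughly like the sketch below. It assumes each benchmark yields a paired difference, for example the absolute TE error of a baseline minus that of TENDE on the same dataset; the function names are hypothetical, the Wilcoxon test uses the large-sample normal approximation without tie or continuity corrections, and nothing here reproduces the authors' actual analysis.

```python
import math
import random


def wilcoxon_signed_rank(diffs):
    """Two-sided Wilcoxon signed-rank test on paired differences.

    Returns (W_plus, p_value). Zero differences are dropped, tied absolute
    values share their average rank, and the p-value comes from the normal
    approximation, so it is only indicative for small samples.
    """
    nonzero = [d for d in diffs if d != 0.0]
    n = len(nonzero)
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value


def bootstrap_mean_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean paired difference."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

A positive mean difference whose interval excludes zero, together with a small p-value, would suggest that the gain is not explained by run-to-run variance alone; it would not by itself rule out a systematic bias shared across benchmarks.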
Circularity Check
TENDE derivation is self-contained; no reduction of claims to inputs by construction
The paper introduces TENDE as a method that learns score functions of conditional distributions via diffusion models and combines them into a conditional mutual information estimator for transfer entropy. This follows standard practice of using score matching for flexible density estimation followed by information-theoretic functionals; the central claim is an empirical demonstration of accuracy on benchmarks rather than a mathematical derivation that collapses to fitted parameters or self-citations. No load-bearing step equates the output estimate to its training objective by definition, and external benchmarks are invoked for validation.
Axiom & Free-Parameter Ledger
free parameters (1)
- Diffusion model neural network parameters
axioms (1)
- domain assumption: Score functions of conditional distributions can be learned accurately by diffusion models from observed time series.