TENDE: Transfer Entropy Neural Diffusion Estimation

Giulio Franzese; Maurizio Filippone; Mustapha Bounoua; Pietro Michiardi; Simon Pedro Galeano Munoz

arxiv: 2510.14096 · v3 · submitted 2025-10-15 · 💻 cs.LG

Simon Pedro Galeano Munoz, Mustapha Bounoua, Giulio Franzese, Pietro Michiardi, Maurizio Filippone

Pith reviewed 2026-05-18 06:42 UTC · model grok-4.3

classification 💻 cs.LG

keywords transfer entropy · diffusion models · neural estimation · conditional mutual information · time series · information flow · score-based models

The pith

TENDE estimates transfer entropy by learning conditional score functions with diffusion models and minimal assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Transfer entropy quantifies directed information flow between time series and matters for understanding causal influences in neuroscience, finance, and complex systems. Existing estimators often fail in high dimensions, demand strong distributional assumptions, or need impractically large datasets. The paper proposes TENDE to compute transfer entropy as conditional mutual information by training score-based diffusion models on the relevant conditional distributions. This yields estimates that require fewer assumptions about the data and converge with practical sample sizes. Experiments on synthetic benchmarks and real data show higher accuracy and robustness than previous neural and non-neural methods.

Core claim

TENDE estimates transfer entropy through conditional mutual information by training score-based diffusion models to learn the score functions of the relevant conditional distributions, thereby achieving flexible and scalable estimation that makes minimal assumptions about the underlying data-generating process.
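For readers who want the quantity written out, the standard (Schreiber-style) definition of transfer entropy as a conditional mutual information is sketched below. The symbols $y_{\text{past}}$ and $x_{\text{past}}$ follow the notation used in the referee report on this page; the exact history lengths and the precise expression TENDE implements are not stated here, so this should be read as the textbook form rather than the paper's own equation.

```latex
\begin{align}
\mathrm{TE}_{X \to Y}
  &= I\bigl(Y_{t+1};\, X_{\text{past}} \mid Y_{\text{past}}\bigr) \\
  &= \mathbb{E}\left[
       \log \frac{p(y_{t+1} \mid y_{\text{past}}, x_{\text{past}})}
                 {p(y_{t+1} \mid y_{\text{past}})}
     \right] \\
  &= I\bigl(Y_{t+1};\, (X_{\text{past}}, Y_{\text{past}})\bigr)
     - I\bigl(Y_{t+1};\, Y_{\text{past}}\bigr)
\end{align}
```

The last line is the chain rule for mutual information. It shows that the conditional quantity can be obtained as the difference of two unconditional mutual informations, which is one way an estimator might organize the computation.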

What carries the argument

Score functions of conditional distributions learned via score-based diffusion models, used to approximate the terms in the conditional mutual information expression for transfer entropy.
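One known route from score functions to information quantities is the identity that writes a KL divergence as a time integral of squared score differences along a noising process. Whether TENDE uses exactly this form is not stated on this page, so the following is only a sanity-check sketch under stated assumptions: both distributions are 1-D Gaussians, the noising process is plain Brownian motion (variance `t` added at time `t`), and the function names are hypothetical. Because the Gaussian scores are analytic, no network is trained here.

```python
import numpy as np

def gaussian_kl(m1, v1, m2, v2):
    """Closed-form KL( N(m1, v1) || N(m2, v2) ) for 1-D Gaussians (v = variance)."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def score_gap(t, m1, v1, m2, v2):
    """E_{p_t}[(score_p - score_q)^2] after adding N(0, t) noise to both p and q."""
    a, b = v1 + t, v2 + t  # variances of the noised distributions p_t and q_t
    return a * (1.0 / b - 1.0 / a) ** 2 + (m1 - m2) ** 2 / b ** 2

def kl_from_scores(m1, v1, m2, v2, horizon=50.0, steps=200_001):
    """KL(p || q) = 0.5 * integral_0^T score_gap(t) dt + KL(p_T || q_T)."""
    t = np.linspace(0.0, horizon, steps)
    gap = score_gap(t, m1, v1, m2, v2)
    dt = t[1] - t[0]
    # Trapezoidal rule written out by hand to avoid depending on a NumPy version.
    integral = dt * (gap.sum() - 0.5 * (gap[0] + gap[-1]))
    terminal = gaussian_kl(m1, v1 + horizon, m2, v2 + horizon)
    return 0.5 * integral + terminal

if __name__ == "__main__":
    print(kl_from_scores(0.0, 1.0, 1.0, 2.0), gaussian_kl(0.0, 1.0, 1.0, 2.0))
```

The point of the check is that any error in a learned score enters such an estimate through the squared gap inside the integral, which is why the accuracy of the learned scores matters for the final number.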

If this is right

Transfer entropy estimation becomes feasible for high-dimensional time series without requiring exponentially large datasets.
Applications in neuroscience and finance can proceed without imposing restrictive distributional assumptions on the observed variables.
Estimation remains accurate and stable even when sample sizes are moderate, improving reliability for real-world noisy recordings.
The method supports analysis of longer or more complex time series that exceed the practical reach of earlier estimators.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same diffusion-based score estimation could be reused to compute other conditional information quantities such as directed information.
Combining TENDE with existing dynamical models might allow joint inference of system dynamics and information flow.
Testing the estimator on irregularly sampled or partially observed time series would reveal whether the diffusion approach tolerates missing data better than alternatives.

Load-bearing premise

Score-based diffusion models can accurately recover the score functions of the conditional distributions from finite data samples without introducing bias into the transfer entropy estimate.

What would settle it

On synthetic time series where the true transfer entropy value is known exactly by construction, compare TENDE estimates against that ground truth and against estimates from competing neural methods.
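A minimal sketch of such a test, assuming a linear-Gaussian process that is not taken from the paper: `y[t+1] = a*y[t] + b*x[t] + noise` with `x` drawn i.i.d. For this process the transfer entropy from X to Y has the closed form `0.5 * log(1 + b^2 * sigma_x^2 / sigma_e^2)`. The Gaussian plug-in estimator shown is a simple stand-in, not TENDE, and all function names and parameter values are illustrative.

```python
import numpy as np

def simulate(n, a=0.5, b=0.8, sigma_x=1.0, sigma_e=1.0, seed=0):
    """Simulate y[t+1] = a*y[t] + b*x[t] + e[t] with i.i.d. Gaussian x and e."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, size=n)
    e = rng.normal(0.0, sigma_e, size=n)
    y = np.zeros(n)
    for t in range(n - 1):
        y[t + 1] = a * y[t] + b * x[t] + e[t]
    return x, y

def true_te(b=0.8, sigma_x=1.0, sigma_e=1.0):
    """Exact TE(X -> Y) in nats for the process above."""
    return 0.5 * np.log(1.0 + (b * sigma_x) ** 2 / sigma_e ** 2)

def residual_variance(target, predictors):
    """Variance of the least-squares residual of target on predictors plus intercept."""
    design = np.column_stack([predictors, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)

def gaussian_te(source, target):
    """Gaussian plug-in TE(source -> target) with history length 1."""
    nxt, tgt_past, src_past = target[1:], target[:-1], source[:-1]
    var_reduced = residual_variance(nxt, tgt_past[:, None])
    var_full = residual_variance(nxt, np.column_stack([tgt_past, src_past]))
    return 0.5 * np.log(var_reduced / var_full)

if __name__ == "__main__":
    x, y = simulate(20_000)
    print("true TE(X->Y):", true_te())
    print("estimated TE(X->Y):", gaussian_te(x, y))
    print("estimated TE(Y->X):", gaussian_te(y, x))  # should be near zero
```

Any candidate estimator could be dropped in place of `gaussian_te` and scored against `true_te()`. The reverse direction is a useful second check, since nothing flows from Y to X in this construction.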

Original abstract

Transfer entropy measures directed information flow in time series, and it has become a fundamental quantity in applications spanning neuroscience, finance, and complex systems analysis. However, existing estimation methods suffer from the curse of dimensionality, require restrictive distributional assumptions, or need exponentially large datasets for reliable convergence. We address these limitations in the literature by proposing TENDE (Transfer Entropy Neural Diffusion Estimation), a novel approach that leverages score-based diffusion models to estimate transfer entropy through conditional mutual information. By learning score functions of the relevant conditional distributions, TENDE provides flexible, scalable estimation while making minimal assumptions about the underlying data-generating process. We demonstrate superior accuracy and robustness compared to existing neural estimators and other state-of-the-art approaches across synthetic benchmarks and real data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This combines diffusion models with transfer entropy estimation in a new way, but the bias risk from finite-sample score estimates is the part worth watching.

Colleague,

The one thing to flag is that this work tries to estimate transfer entropy using score-based diffusion models to get at the conditional distributions for conditional mutual information. It's a fresh angle on a long-standing problem in time series analysis.

They handle the limitations of prior methods reasonably well by avoiding strong distributional assumptions and targeting scalability in high dimensions. The comparisons to existing neural estimators and state-of-the-art approaches on both synthetic benchmarks and real data give some evidence that it performs better in accuracy and robustness. That's the kind of practical validation that matters for adoption.

Where it falls short is in addressing how errors in the learned score functions might affect the transfer entropy calculation. Finite data means the scores won't be perfect, and in high dimensions those imperfections can introduce bias that doesn't go away easily. The central claim depends on the diffusion model recovering the necessary distributions accurately enough, but without more on error analysis or sensitivity to that, it's hard to be fully confident in the results.

This kind of method would appeal to people in neuroscience or complex systems who analyze directed flows in large datasets. Anyone looking for ML-based tools to compute information measures in time series could get use out of the ideas here.

I'd send it for peer review. The combination is new enough and the problem important enough that referees should have a look, even if they end up asking for more on the bias issue.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes TENDE, a method that leverages score-based diffusion models to estimate transfer entropy (TE) by approximating it via conditional mutual information (CMI) computed from learned score functions of the relevant conditional distributions p(y_{t+1} | y_past, x_past) and marginals. It claims this yields flexible, scalable estimation with minimal assumptions on the data-generating process, along with superior accuracy and robustness relative to existing neural estimators on synthetic benchmarks and real data.

Significance. If the central claims hold after addressing the identified gaps, the work would provide a practically useful advance for TE estimation in high-dimensional time series, addressing known limitations such as the curse of dimensionality in applications like neuroscience and finance. The integration of diffusion-based score matching with information-theoretic estimation is a promising direction, and any reproducible code or parameter-free derivations would strengthen its contribution.

major comments (2)

[§3] §3 (Method): The derivation that learned score functions of the conditional distributions yield a consistent estimator of CMI (and thus TE) lacks an explicit error-propagation analysis or convergence guarantee. Finite-sample score-matching bias is known to persist in high dimensions and could systematically distort the resulting information-theoretic quantity; this assumption is load-bearing for the superiority claims but is not justified theoretically or via bounds.
[§5.1, Table 2] §5.1 and Table 2 (Experiments): The reported accuracy improvements over neural baselines do not include controls for score-estimation error (e.g., varying diffusion steps, network capacity, or sample size) or statistical significance tests on the TE estimates. Without these, it is impossible to confirm that observed gains arise from the diffusion approach rather than uncontrolled bias or variance in the CMI computation.

minor comments (2)

[Abstract, §1] The abstract and introduction should explicitly state the precise definition of TE used (e.g., the standard Schreiber formulation) and the exact CMI expression implemented via scores.
[§2, §3] Notation for the time-series variables (y_past, x_past) is introduced without a clear diagram or pseudocode showing the conditioning sets; this reduces clarity for readers unfamiliar with TE.
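To make the conditioning sets concrete, the sketch below builds the three sample arrays that a transfer entropy estimator from X to Y would consume: the next target value, the target's own past, and the source's past. The history length `k`, the function name, and the use of equal history lengths for both series are assumptions made for illustration; the paper's actual embedding choices are not given on this page.

```python
import numpy as np

def build_te_samples(x, y, k=2):
    """Return (y_next, y_past, x_past) for TE(X -> Y) with history length k.

    Row i pairs y[i + k] with the windows y[i : i + k] and x[i : i + k].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    if k < 1 or len(y) <= k:
        raise ValueError("need 1 <= k < len(y)")
    n = len(y) - k
    y_past = np.stack([y[i:i + k] for i in range(n)])  # shape (n, k)
    x_past = np.stack([x[i:i + k] for i in range(n)])  # shape (n, k)
    y_next = y[k:]                                     # shape (n,)
    return y_next, y_past, x_past

if __name__ == "__main__":
    x = np.arange(6)
    y = 10 * np.arange(6)
    y_next, y_past, x_past = build_te_samples(x, y, k=2)
    print(y_next[0], y_past[0], x_past[0])
```

With these arrays, the estimator compares how well `y_next` is predicted from `y_past` alone against how well it is predicted from `y_past` and `x_past` together.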

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the paper where appropriate to strengthen the theoretical discussion and experimental controls.

Referee: [§3] §3 (Method): The derivation that learned score functions of the conditional distributions yield a consistent estimator of CMI (and thus TE) lacks an explicit error-propagation analysis or convergence guarantee. Finite-sample score-matching bias is known to persist in high dimensions and could systematically distort the resulting information-theoretic quantity; this assumption is load-bearing for the superiority claims but is not justified theoretically or via bounds.

Authors: We appreciate this observation. Section 3 derives the estimator by expressing CMI (and thus TE) in terms of the learned score functions of the relevant conditional distributions, leveraging the fact that score-based diffusion models can approximate the score of p(y_{t+1} | y_past, x_past) and the marginals. While the literature on score matching provides consistency guarantees under suitable conditions, we acknowledge that the current manuscript does not include an explicit finite-sample error-propagation analysis or high-dimensional convergence bounds for the resulting information-theoretic estimates. This is a valid gap. In the revised version we will add a dedicated paragraph in §3 discussing potential bias propagation from score estimation errors, citing relevant convergence results for diffusion models, and clarifying that the superiority claims rest primarily on the empirical benchmarks rather than a complete theoretical guarantee.
Revision: partial
Referee: [§5.1, Table 2] §5.1 and Table 2 (Experiments): The reported accuracy improvements over neural baselines do not include controls for score-estimation error (e.g., varying diffusion steps, network capacity, or sample size) or statistical significance tests on the TE estimates. Without these, it is impossible to confirm that observed gains arise from the diffusion approach rather than uncontrolled bias or variance in the CMI computation.

Authors: We agree that stronger experimental controls are needed to isolate the contribution of the diffusion-based approach. In the revised manuscript we will augment §5.1 with additional ablation studies that systematically vary the number of diffusion steps, network capacity, and training sample size while reporting the resulting TE estimation errors. We will also add statistical significance testing (e.g., paired Wilcoxon tests with bootstrap confidence intervals) on the differences between TENDE and the neural baselines across the synthetic benchmarks in Table 2. These revisions will be included in the next version of the paper.
Revision: yes
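As an illustration of the bootstrap half of the testing promised above, here is a minimal percentile-bootstrap sketch for the mean paired difference in estimation error between two methods evaluated on the same benchmark tasks. The function name, argument names, and defaults are hypothetical, and no values from the paper are used; the Wilcoxon half could be handled separately, for example with `scipy.stats.wilcoxon`.

```python
import numpy as np

def paired_bootstrap_ci(err_a, err_b, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(err_a - err_b) over paired tasks.

    err_a[i] and err_b[i] must be errors of two methods on the same task i.
    Returns (mean_difference, lower_bound, upper_bound).
    """
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    if err_a.shape != err_b.shape or err_a.ndim != 1:
        raise ValueError("err_a and err_b must be 1-D arrays of equal length")
    diff = err_a - err_b
    rng = np.random.default_rng(seed)
    # Resample task indices with replacement, keeping each pair together.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), lower, upper
```

An interval that excludes zero would suggest a systematic difference between the two methods on the tasks considered. With only a handful of benchmark tasks the interval will be wide, so the number of paired tasks matters as much as the test itself.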

Circularity Check

0 steps flagged

TENDE derivation is self-contained; no reduction of claims to inputs by construction

The paper introduces TENDE as a method that learns score functions of conditional distributions via diffusion models and combines them into a conditional mutual information estimator for transfer entropy. This follows standard practice of using score matching for flexible density estimation followed by information-theoretic functionals; the central claim is an empirical demonstration of accuracy on benchmarks rather than a mathematical derivation that collapses to fitted parameters or self-citations. No load-bearing step equates the output estimate to its training objective by definition, and external benchmarks are invoked for validation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the ability of diffusion models to learn conditional score functions; this is treated as a domain assumption rather than a derived result.

free parameters (1)

Diffusion model neural network parameters
Parameters of the score network are fitted to data during training and directly affect the estimated transfer entropy.

axioms (1)

Domain assumption: Score functions of conditional distributions can be learned accurately by diffusion models from observed time series.
This premise is required for the estimation procedure to produce valid transfer entropy values.
