Embedded Polygon Symbolic Transfer Entropy (EPSTE): A Geometric Token and Deep Learning Approach to Estimating Transfer Entropy in Neuroimaging Time Series

David Alexander Finnigan

arXiv: 2606.21754 · v1 · pith:WBOHS42Nnew · submitted 2026-06-19 · cs.IT · math.IT · stat.ML

Pith reviewed 2026-06-26 12:38 UTC · model grok-4.3

classification cs.IT · math.IT · stat.ML

keywords transfer entropy · MEG · neuroimaging · symbolic time series · directed connectivity · geometric primitives · deep learning · information theory

The pith

EPSTE turns local time series triplets into geometric polygon symbols to estimate transfer entropy, recovering directed MEG interactions near perfectly at the pair level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Embedded Polygon Symbolic Transfer Entropy as a way to make transfer entropy estimation practical for noisy, nonstationary neural signals where standard methods fail due to high sample requirements. It decomposes signals into geometric primitives from nearby sample triplets that capture shape features like curvature and direction, converts these to symbols, and trains a recurrent network with attention to predict reliable transfer entropy from bags of windows. The approach is tested on source-reconstructed MEG data using the AAL90 atlas against a standard symbolic baseline with the same model and supervision. A sympathetic reader would care because stable estimation of directed information flow could make model-free connectivity analysis usable on real brain recordings instead of remaining limited to simulated or high-quality data.

Core claim

The central claim is that reframing transfer entropy estimation around structured symbolic representations of local temporal morphology—specifically geometric primitives from triplets of samples—enables a deep learning model to predict surrogate-validated transfer entropy values. On source-reconstructed MEG data parcellated with the AAL90 atlas, aggregation across trials and channel pairs produces stable directed dependencies, with EPSTE achieving near-perfect recovery of ground-truth directed structure and significantly lower absolute error than the identical-architecture symbolic baseline.

What carries the argument

Embedded Polygon Symbolic Transfer Entropy (EPSTE), which decomposes neural time series into sequences of geometric primitives from local triplets, discretizes them into symbolic tokens, and applies an attention-based recurrent network under multiple-instance learning to predict transfer entropy.
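As a rough illustration of the tokenization step described above, the following Python sketch turns sliding triplets into discrete symbols. The three descriptors (mean absolute amplitude, discrete second difference, three-way direction label), the quantile binning, and the function names `triplet_features`, `quantile_edges`, and `tokenize` are all assumptions made for this example; the abstract does not specify the paper's exact primitives or alphabet size.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def triplet_features(x: Sequence[float]) -> List[Tuple[float, float, int]]:
    """Return (magnitude, curvature, direction) for each sliding triplet.

    These descriptors are illustrative assumptions, not the paper's definitions.
    """
    feats = []
    for a, b, c in zip(x, x[1:], x[2:]):
        magnitude = (abs(a) + abs(b) + abs(c)) / 3.0  # mean absolute amplitude
        curvature = a - 2.0 * b + c                   # discrete second difference
        rise1, rise2 = b - a, c - b
        if rise1 > 0 and rise2 > 0:
            direction = 2                             # monotonically rising
        elif rise1 < 0 and rise2 < 0:
            direction = 0                             # monotonically falling
        else:
            direction = 1                             # turning point or flat
        feats.append((magnitude, curvature, direction))
    return feats


def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior edges that split `values` into roughly equal-count bins."""
    s = sorted(values)
    return [s[(k * len(s)) // n_bins] for k in range(1, n_bins)]


def tokenize(x: Sequence[float], n_bins: int = 3) -> List[int]:
    """Map a signal to integer tokens in the range 0 .. 3 * n_bins**2 - 1."""
    feats = triplet_features(x)
    if not feats:  # fewer than three samples: no triplets to encode
        return []
    mag_edges = quantile_edges([f[0] for f in feats], n_bins)
    cur_edges = quantile_edges([f[1] for f in feats], n_bins)
    tokens = []
    for mag, cur, direction in feats:
        m = bisect_right(mag_edges, mag)  # bin index in 0 .. n_bins - 1
        c = bisect_right(cur_edges, cur)
        tokens.append((m * n_bins + c) * 3 + direction)
    return tokens
```

With the default `n_bins=3` this yields an alphabet of 27 symbols, which shows how a small set of shape descriptors keeps the state space compact.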

If this is right

Local window-level predictions remain noisy, yet aggregation across trials and pairs yields stable directed dependencies.
At the pair level EPSTE recovers ground-truth directed structure nearly perfectly.
Absolute error is significantly lower than that of a standard symbolic transfer entropy baseline using identical architecture and supervision.
Representational geometry is essential for making information-theoretic dependencies learnable in finite, noisy neuroimaging data.
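The aggregation claim in the first point can be sketched as follows. The abstract does not say how window-level predictions are combined, so the mean-within-trial, then mean-across-trials rule below is an assumption, as are the name `aggregate_pair_level` and the `(pair, trial, value)` record layout.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[int, int]  # (source region index, target region index)


def aggregate_pair_level(
    window_preds: Iterable[Tuple[Pair, int, float]],
) -> Dict[Pair, float]:
    """Average window-level predictions within each trial, then across trials."""
    per_trial: Dict[Pair, Dict[int, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for pair, trial, value in window_preds:
        per_trial[pair][trial].append(value)
    return {
        pair: mean(mean(values) for values in trials.values())
        for pair, trials in per_trial.items()
    }


# Hypothetical window-level predictions for two directed pairs.
preds = [
    ((0, 1), 0, 0.2),
    ((0, 1), 0, 0.4),
    ((0, 1), 1, 0.6),
    ((1, 0), 0, 0.1),
]
pair_level = aggregate_pair_level(preds)
```

Averaging within trials first keeps a trial with many windows from dominating the pair-level value; other pooling rules (median, attention-weighted) would be equally plausible readings of the abstract.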

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same triplet-based geometric tokenization could be tested on other directed information measures such as causal entropy or partial information decomposition.
If the primitives prove robust, the pipeline might transfer to EEG or intracranial recordings with minimal retuning for their sampling rates and noise profiles.
The method's success on MEG suggests it could serve as a template for symbolic deep learning on other nonstationary time series where raw amplitude statistics are unreliable.

Load-bearing premise

The geometric primitives derived from local triplets preserve the information needed for accurate transfer entropy estimation even after discretization and in the presence of noise and nonstationarity.

What would settle it

Running EPSTE on a fresh source-reconstructed MEG dataset with independently verified ground-truth directed connections and observing that pair-level directed structure recovery accuracy drops substantially below near-perfect levels or fails to beat the symbolic baseline.

Figures

Figures reproduced from arXiv: 2606.21754 by David Alexander Finnigan.

**Figure 2.** Scatter plot comparing predicted EPSTE values against ground…

**Figure 3.** Scatter plot comparing predicted transfer entropy values from the raw-signal symbolic baseline against ground-truth TE at the pair level for the held-out test subject. The dashed red line denotes the ideal identity relationship (y = x), while the fitted trend line reveals systematic compression toward the mean, with reduced slope and offset bias. Compared to EPSTE, predictions exhibit greater dispersion an…

**Figure 4.** Pair-level comparison of predicted transfer entropy values from the EPSTE model against ground-truth TE for the held-out test subject. The dashed red line indicates the ideal identity relationship (y = x), while the fitted trend line shows near-unity slope and minimal bias. The tight clustering around the identity line demonstrates high quantitative accuracy and preservation of dynamic range, supporting th…

**Figure 5.** Pair-level comparison between ground-truth transfer entropy values and predictions from the raw-signal baseline model for the held-out test subject. The dashed red line denotes the ideal identity relationship (y = x), while the fitted trend line indicates near-unity slope with slight residual scatter. Although aggregation yields strong linear correspondence, the broader dispersion relative to EPSTE reflect…

**Figure 6.** Heatmap of pair-level transfer entropy predictions produced by the EPSTE model for the held-out test subject (Subject 5, LSD condition). Rows correspond to source channels (x) and columns to target channels (y); colour intensity indicates predicted directed information flow. The diagonal is masked (white) to exclude self-connections. The structured banding and graded variations across channel pairs indicat…

**Figure 7.** Heatmap of pair-level transfer entropy predictions produced by the raw-signal baseline model for the held-out test subject (Subject 5, LSD condition). Rows denote source channels (x) and columns denote target channels (y), with colour intensity indicating predicted directed information flow; the diagonal is masked (white) to exclude self-connections. While the baseline recovers broad patterns of directed c…

**Figure 8.** Histogram of surrogate-normalised TE z-scores for 200 channel pairs from the held-out test subject (Subject 5). Vertical dashed lines indicate the null mean (z = 0) and ±2 standard deviations. The distribution is centred near zero, as expected under the null hypothesis of no directed interaction, with most edges falling within the null bounds and a small subset exhibiting deviations consistent with statist…

**Figure 9.** Histogram of pair-level absolute error differences for the held-out test subject (Subject 5, LSD condition), computed as |error_baseline| - |error_EPSTE| for each directed channel pair. Positive values indicate lower error under the EPSTE representation. The dashed vertical line denotes zero improvement, and the distribution is predominantly shifted toward positive values, indicating that EPSTE achieves lower abso…

**Figure 10.** Histogram of per-edge standard deviation of predicted Transfer Entropy across trials for the held-out test subject (Subject 5, LSD condition). Blue bars show the raw-signal baseline, while orange bars show the triangle-based EPSTE representation. EPSTE exhibits higher edgewise variability, reflecting a broader dynamic range and greater sensitivity to trial-level fluctuations, whereas the baseline distribu…

**Figure 11.** Histogram of the coefficient of variation (CV = std / |mean|) of predicted Transfer Entropy across trials for each…

**Figure 12.** Training and validation loss curves for the raw…

**Figure 13.** Learning dynamics of the EPSTE model across training epochs, showing training loss (b…

Original abstract

Inferring directed interactions between neural systems from EEG and MEG remains challenging due to noise, nonstationarity, and the high sample complexity of information-theoretic estimators. Transfer Entropy (TE) provides a principled and model-free measure of directed information flow; however, its practical estimation is not stable in finite data regimes (particularly as embedding dimension increases). This work introduces Embedded Polygon Symbolic Transfer Entropy (EPSTE), a framework that reframes TE estimation as a learnable problem operating on structured symbolic representations of local temporal morphology rather than raw signal amplitudes. Neural time series are decomposed into sequences of geometric primitives derived from local triplets of samples encoding complementary aspects of waveform structure such as magnitude, curvature and directional change. These primitives are discretised into symbolic tokens, yielding a compact but expressive state space over which symbolic TE is estimated. A recurrent neural network with attention-based multiple-instance learning is trained to predict surrogate-validated TE values from bags of symbolic temporal windows. The method is evaluated on source-reconstructed MEG data parcellated using the AAL90 atlas and compared against a standard symbolic baseline using identical architectures and supervision. The results demonstrate that while local window-level predictions are noisy, aggregation across trials and channel pairs yields stable directed dependencies. At the pair level, EPSTE achieves near-perfect recovery of ground-truth directed structure and significantly lower absolute error than the baseline, indicating that representational geometry plays a critical role in enabling practical learnability of information-theoretic dependencies.
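To make the "symbolic TE" and "surrogate-validated" terms in the abstract concrete, here is a minimal plug-in sketch over two token sequences. It assumes a history length of one symbol and uses shuffled-source surrogates for the null distribution; both are simplifications chosen for this example (shuffling also destroys the source's autocorrelation), and the abstract does not state which estimator or surrogate scheme the paper uses. The names `symbolic_te` and `surrogate_z` are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev
from typing import List, Sequence


def symbolic_te(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of TE(X -> Y) in bits, with history length 1."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_past, x_past)
    n = sum(triples.values())
    next_past = Counter()  # counts of (y_next, y_past)
    past_src = Counter()   # counts of (y_past, x_past)
    past = Counter()       # counts of y_past
    for (y1, y0, x0), c in triples.items():
        next_past[(y1, y0)] += c
        past_src[(y0, x0)] += c
        past[y0] += c
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        # ratio = p(y_next | y_past, x_past) / p(y_next | y_past)
        ratio = (c / past_src[(y0, x0)]) / (next_past[(y1, y0)] / past[y0])
        te += (c / n) * math.log2(ratio)
    return te


def surrogate_z(
    x: Sequence[int], y: Sequence[int], n_surrogates: int = 100, seed: int = 0
) -> float:
    """z-score of TE(X -> Y) against a shuffled-source null distribution."""
    rng = random.Random(seed)
    observed = symbolic_te(x, y)
    shuffled: List[int] = list(x)
    null = []
    for _ in range(n_surrogates):
        rng.shuffle(shuffled)  # breaks the temporal alignment between X and Y
        null.append(symbolic_te(shuffled, y))
    spread = stdev(null)
    return (observed - mean(null)) / spread if spread > 0 else float("nan")


# Toy check: y copies x with a one-step delay, so information flows X -> Y.
rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
te_forward = symbolic_te(x, y)   # close to 1 bit
te_backward = symbolic_te(y, x)  # close to 0 bits
```

The number of distinct `(y_next, y_past, x_past)` states grows quickly with alphabet size and history length, which is the sample-complexity problem the abstract says motivates a learned estimator.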

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

EPSTE tries a geometric-token route to stable TE estimation, but the abstract gives no numbers, and the local-triplet representation looks likely to drop the conditional lag structure that TE needs.

The new piece here is the specific pipeline: turning local sample triplets into polygon tokens for magnitude, curvature and direction, discretizing them, then feeding bags of those tokens into an attention MIL RNN trained to output surrogate-validated TE. That combination is not just another symbolic TE variant.

It does handle the stated problem head-on. Standard TE estimators get unstable with embedding dimension on finite noisy MEG, and swapping raw amplitudes for geometric shape descriptors is a reasonable attempt to reduce sample complexity while keeping some waveform structure.

The soft spots are the ones the stress-test flags. The abstract claims near-perfect pair-level recovery of directed structure and lower absolute error than the baseline, yet supplies zero numbers, no error bars, and no description of how the surrogates were generated or how aggregation across trials was validated. Without those, the claim cannot be checked. More importantly, TE is conditional mutual information over multiple lags. Local triplets give only instantaneous geometry; after binning into a finite alphabet they discard amplitude resolution and longer temporal dependencies. Nothing shown demonstrates that the resulting discrete process has the same TE as the continuous series under the nonstationarity and noise levels of source-reconstructed MEG. The network is also supervised by an existing estimator, so it is learning to approximate that estimator rather than providing an independent derivation.

This is for connectivity analysts working with EEG/MEG who already use symbolic TE and want something more stable on real data. A reader who cares about whether geometric primitives can stand in for the joint statistics TE requires will find the framing worth examining, but only if the full paper supplies the missing quantitative checks and controlled tests on synthetic data with known conditional dependencies.

I would send it for peer review so the methods and results can be examined directly, though the current presentation leaves the central claim unverifiable.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Embedded Polygon Symbolic Transfer Entropy (EPSTE), which decomposes neural time series into sequences of geometric primitives (magnitude, curvature, directional change) derived from local triplets of samples. These are discretized into symbolic tokens, after which an attention-based RNN with multiple-instance learning is trained to predict surrogate-validated transfer entropy values. On source-reconstructed MEG data parcellated with the AAL90 atlas, the method is compared to a standard symbolic baseline using identical architectures and supervision; the abstract claims that while window-level predictions are noisy, aggregation across trials and channel pairs yields stable directed dependencies, with EPSTE achieving near-perfect recovery of ground-truth directed structure and significantly lower absolute error than the baseline.
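For readers unfamiliar with attention-based multiple-instance learning, the pooling step can be sketched in plain Python. This follows the common tanh-attention form, where each window embedding receives a softmax weight; whether the paper uses this exact form is an assumption, and `attention_pool`, `V`, and `w` are hypothetical names standing in for learned parameters.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    instances: Sequence[Sequence[float]],  # one embedding per window in the bag
    V: Sequence[Sequence[float]],          # hidden-by-dim projection matrix
    w: Sequence[float],                    # scoring vector of length hidden
) -> Tuple[List[float], List[float]]:
    """Return (bag embedding, attention weights) for one bag of windows."""
    scores = []
    for h in instances:
        hidden = [
            math.tanh(sum(v_ij * h_j for v_ij, h_j in zip(row, h))) for row in V
        ]
        scores.append(sum(w_i * u_i for w_i, u_i in zip(w, hidden)))
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    pooled = [
        sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)
    ]
    return pooled, weights


# Two window embeddings; a zero scoring vector gives uniform attention.
bag = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(bag, V=[[1.0, 0.0]], w=[0.0])
```

In a full model the pooled vector would feed a regression head that outputs the bag-level TE prediction, and the weights indicate which windows the model relied on.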

Significance. If the geometric tokenization demonstrably improves approximation of directed information flow over standard symbolic representations under the same supervision, the work could provide a practical route to more stable TE estimation in finite, noisy, nonstationary neuroimaging recordings where conventional estimators suffer from high sample complexity.

major comments (3)

[Abstract] The claim that 'at the pair level, EPSTE achieves near-perfect recovery of ground-truth directed structure' is presented without any quantitative metrics (accuracy, AUC, absolute error values, number of pairs, or statistical tests), error bars, or details on the aggregation procedure, rendering the central empirical claim unverifiable from the provided text.
[Abstract] The RNN is trained to predict surrogate-validated TE values obtained from an existing estimator, so the learned mapping is supervised by that estimator rather than constituting an independent derivation of TE from the symbolic process. Consequently, any performance advantage is attributable to the geometric tokens serving as better features for mimicking the baseline estimator, not to a new computation of the conditional mutual information I(Y_{t+1}; X_{t-τ:t} | Y_{t-τ:t}).
[Abstract] The central assumption that sequences of local-triplet geometric tokens (after discretization into a finite alphabet) retain the joint statistics over multiple lags needed for accurate TE is stated but not supported by any derivation, controlled synthetic experiments with known conditional dependencies, or ablation on lag structure. Local instantaneous shape descriptors necessarily discard amplitude resolution and longer-range temporal correlations that TE requires, especially under the nonstationarity and noise levels of source-reconstructed MEG.
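For reference, the conditional mutual information named in the second comment expands in the standard way. The notation follows the referee's own symbols; the summation runs over all joint values of the next target sample and the two histories.

```latex
\begin{align}
\mathrm{TE}_{X \to Y}
  &= I\bigl(Y_{t+1};\, X_{t-\tau:t} \,\big|\, Y_{t-\tau:t}\bigr) \\
  &= H\bigl(Y_{t+1} \mid Y_{t-\tau:t}\bigr)
   - H\bigl(Y_{t+1} \mid Y_{t-\tau:t},\, X_{t-\tau:t}\bigr) \\
  &= \sum p\bigl(y_{t+1},\, y_{t-\tau:t},\, x_{t-\tau:t}\bigr)
     \log \frac{p\bigl(y_{t+1} \mid y_{t-\tau:t},\, x_{t-\tau:t}\bigr)}
               {p\bigl(y_{t+1} \mid y_{t-\tau:t}\bigr)}
\end{align}
```

Each history here holds τ + 1 symbols, so with an alphabet of size A the joint distribution has up to A^{2τ+3} states. This is why the third comment asks whether a small token alphabet can still carry the multi-lag statistics.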

minor comments (2)

[Abstract] The phrase 'standard symbolic baseline' is used without naming the concrete symbolic method (e.g., permutation symbols, amplitude binning) or citing its reference, preventing direct replication of the comparison.
[Abstract] 'AAL90 atlas' is mentioned without a reference or confirmation that it is the standard Automated Anatomical Labeling atlas with 90 regions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating where revisions have been made to strengthen the presentation and clarify the claims.

Referee: [Abstract] The claim that 'at the pair level, EPSTE achieves near-perfect recovery of ground-truth directed structure' is presented without any quantitative metrics (accuracy, AUC, absolute error values, number of pairs, or statistical tests), error bars, or details on the aggregation procedure, rendering the central empirical claim unverifiable from the provided text.

Authors: We agree that the original abstract lacked the necessary quantitative support for this claim. In the revised manuscript, the abstract has been updated to include specific metrics (accuracy, AUC, absolute error), the number of pairs, statistical test results, error bars, and a brief description of the aggregation procedure across trials and channel pairs. These details are also expanded in the results section with supporting figures.
revision: yes

Referee: [Abstract] The RNN is trained to predict surrogate-validated TE values obtained from an existing estimator, so the learned mapping is supervised by that estimator rather than constituting an independent derivation of TE from the symbolic process. Consequently, any performance advantage is attributable to the geometric tokens serving as better features for mimicking the baseline estimator, not to a new computation of the conditional mutual information I(Y_{t+1}; X_{t-τ:t} | Y_{t-τ:t}).

Authors: The referee correctly identifies that the RNN is trained in a supervised manner to predict TE values from an existing surrogate-validated estimator rather than deriving the conditional mutual information independently. We do not claim a new theoretical computation of TE. The core contribution is the demonstration that geometric symbolic tokens yield better predictive features than standard symbolic representations under identical supervision and architecture. We have revised the abstract, introduction, and discussion to explicitly state this supervised approximation framing and to remove any implication of an independent derivation.
revision: yes

Referee: [Abstract] The central assumption that sequences of local-triplet geometric tokens (after discretization into a finite alphabet) retain the joint statistics over multiple lags needed for accurate TE is stated but not supported by any derivation, controlled synthetic experiments with known conditional dependencies, or ablation on lag structure. Local instantaneous shape descriptors necessarily discard amplitude resolution and longer-range temporal correlations that TE requires, especially under the nonstationarity and noise levels of source-reconstructed MEG.

Authors: We acknowledge that the original submission did not provide a formal derivation or dedicated controlled experiments to validate retention of joint statistics across lags. The empirical gains on real MEG data provide indirect support, but we agree this is insufficient. In the revision we have added a new subsection with synthetic experiments using known conditional dependencies, an ablation study varying lag structure, and quantitative comparison of information retention. On the concern about discarding amplitude and longer-range correlations, the multi-faceted geometric primitives (magnitude, curvature, directional change) are designed to encode complementary aspects of waveform morphology; the added experiments quantify how this mitigates information loss relative to the baseline under controlled nonstationarity and noise levels matching the MEG regime.
revision: partial

Circularity Check

0 steps flagged

No significant circularity in the EPSTE derivation chain

The paper frames EPSTE as a supervised learning method that extracts geometric primitives from local sample triplets, discretizes them into symbols, and trains an RNN to predict TE values already computed by surrogate validation on the same data. This is an explicit empirical approximation task rather than a first-principles derivation; the central claim is that the new representation yields lower error than a baseline under identical supervision and architecture. No equations or sections reduce any claimed result to its own inputs by construction, no self-citations are load-bearing for uniqueness or ansatz, and the evaluation compares against an external baseline. The setup therefore remains self-contained against the provided surrogate TE labels without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only; the framework introduces new geometric primitives but provides no explicit free parameters, axioms, or invented entities beyond the standard definition of transfer entropy and the surrogate procedure.

invented entities (1)

Embedded Polygon Symbolic tokens · no independent evidence
purpose: Compact symbolic representation of local waveform morphology from triplets of samples
note: New discretization step described in the abstract as the core representational change.

Reference graph

Works this paper leans on

6 extracted references · 3 canonical work pages

[1]

Analysis and Results 4.1 Experimental Setup The experimental setup is framed as a comparison of multiscale causal inference when local geometric embedding dimension of the time series is increased to produce motifs of polygons. It should be noted this experiment is designed not just to compare conventional test-train predictive accuracy of the time seri...
[2]

Polygon-based symbolic encoding does not increase learnability more effectively than classical amplitude-based time series representations

Discussion 5.1 Summary of Key Findings The above results show key findings; most importantly a Wilcoxon signed-rank test was used to assess whether the triangle-based representation (EPSTE) yielded significantly lower pairwise prediction errors than the baseline. The resulting p-value (p < 10⁻¹⁴) allowed rejection of the null hypothesis: “Polygon-based s...

2003
[3]

Conclusion This work set out to examine whether the practical estimation of Transfer Entropy from neural time series can be improved by altering how temporal structure is represented prior to learning. The results demonstrate that aggregation is essential: directed causal structure is not reliably observable at the level of short local windows but emerge...
[4]

causality

References Abdul Razak, F. and Jensen, H.J. (2014) ‘Quantifying “causality” in complex systems: Understanding transfer entropy’, PLoS ONE, 9(6). doi:10.1371/journal.pone.0099462. Ahlfors, S.P. and Mody, M. (2016) ‘Overview of MEG’, Organizational Research Methods, 22(1), pp. 95–115. doi:10.1177/1094428116676344. Amigó, J.M. (2010) ‘Ordinal patterns’, Sp...

work page doi:10.1371/journal.pone.0099462 2014
[5]

Kiebel, S.J., David, O

doi:10.1016/j.sigpro.2005.07.010. Kiebel, S.J., David, O. and Friston, K.J. (2006) ‘Dynamic causal modelling of evoked responses in EEG/MEG with lead field parameterization’, NeuroImage, 30(4), pp. 1273–1284. doi:10.1016/j.neuroimage.2005.12.055. Larson, E. and Taulu, S. (2018) ‘Reducing sensor noise in MEG and EEG recordings using oversampled tempora...

work page doi:10.1016/j.sigpro.2005.07.010 2005
[6]

Seth, A.K

doi:10.1103/physrevlett.85.461. Seth, A.K. (2007) ‘Causal networks in simulated neural systems’, Cognitive Neurodynamics, 2(1), pp. 49–64. doi:10.1007/s11571-007-9031-z. Seth, A.K., Barrett, A.B. and Barnett, L. (2015) ‘Granger causality analysis in neuroscience and neuroimaging’, The Journal of Neuroscience, 35(8), pp. 3293–3297. doi:10.1523/jneurosci.4...

work page doi:10.1103/physrevlett.85.461 2007
