Embedded Polygon Symbolic Transfer Entropy (EPSTE): A Geometric Token and Deep Learning Approach to Estimating Transfer Entropy in Neuroimaging Time Series
Pith reviewed 2026-06-26 12:38 UTC · model grok-4.3
The pith
EPSTE turns local time series triplets into geometric polygon symbols to estimate transfer entropy, recovering directed MEG interactions near perfectly at the pair level.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that reframing transfer entropy estimation around structured symbolic representations of local temporal morphology (specifically, geometric primitives computed from triplets of samples) enables a deep learning model to predict surrogate-validated transfer entropy values. On source-reconstructed MEG data parcellated with the AAL90 atlas, aggregating across trials and channel pairs produces stable directed dependencies, and EPSTE achieves near-perfect recovery of ground-truth directed structure with significantly lower absolute error than a symbolic baseline of identical architecture.
What carries the argument
Embedded Polygon Symbolic Transfer Entropy (EPSTE), which decomposes neural time series into sequences of geometric primitives from local triplets, discretizes them into symbolic tokens, and applies an attention-based recurrent network under multiple-instance learning to predict transfer entropy.
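The tokenization step can be illustrated with a minimal sketch. The paper names magnitude, curvature and directional change as the aspects encoded by each triplet, but the exact formulas, the quantile binning, and the alphabet size used here are assumptions chosen for illustration, not the authors' definitions. The function names `triplet_features`, `quantile_edges` and `tokenize` are hypothetical.

```python
import math
from bisect import bisect_right

def triplet_features(x):
    """Return (magnitude, curvature, direction) for each consecutive triplet.

    These three formulas are illustrative assumptions, not the paper's.
    """
    feats = []
    for a, b, c in zip(x, x[1:], x[2:]):
        magnitude = (abs(a) + abs(b) + abs(c)) / 3.0  # mean absolute amplitude
        curvature = a - 2.0 * b + c                   # discrete second difference
        direction = c - a                             # net change across the triplet
        feats.append((magnitude, curvature, direction))
    return feats

def quantile_edges(values, n_bins):
    """Interior bin edges that split `values` into n_bins roughly equal groups."""
    s = sorted(values)
    return [s[(k * len(s)) // n_bins] for k in range(1, n_bins)]

def tokenize(x, n_bins=3):
    """Map a series to integer tokens in [0, n_bins**3) via per-feature binning."""
    feats = triplet_features(x)
    edges = [quantile_edges([f[i] for f in feats], n_bins) for i in range(3)]
    tokens = []
    for f in feats:
        digits = [bisect_right(edges[i], f[i]) for i in range(3)]
        tokens.append(digits[0] * n_bins * n_bins + digits[1] * n_bins + digits[2])
    return tokens

# Toy signal standing in for one source-reconstructed channel.
signal = [math.sin(0.3 * t) + 0.1 * math.cos(1.7 * t) for t in range(200)]
tokens = tokenize(signal)
```

With three bins per feature the alphabet has at most 27 symbols, which keeps the state space compact in the sense the abstract describes. Whether this particular binning preserves the information needed for transfer entropy is exactly the premise the review flags as load-bearing.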
If this is right
- Local window-level predictions remain noisy, yet aggregation across trials and pairs yields stable directed dependencies.
- At the pair level EPSTE recovers ground-truth directed structure nearly perfectly.
- Absolute error is significantly lower than that of a standard symbolic transfer entropy baseline using identical architecture and supervision.
- Representational geometry is essential for making information-theoretic dependencies learnable in finite, noisy neuroimaging data.
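The first point above, that noisy window-level predictions can still yield stable pair-level estimates, follows from ordinary averaging. The sketch below uses made-up numbers (`TRUE_TE`, `WINDOW_NOISE`, `N_WINDOWS` are hypothetical) and plain mean aggregation, which may differ from the aggregation procedure used in the paper.

```python
import random
import statistics

random.seed(0)

TRUE_TE = 0.05       # hypothetical pair-level transfer entropy
WINDOW_NOISE = 0.5   # hypothetical window-level noise, ten times the effect size
N_WINDOWS = 2000     # hypothetical number of windows pooled across trials

# Simulated window-level predictions for one channel pair.
window_preds = [random.gauss(TRUE_TE, WINDOW_NOISE) for _ in range(N_WINDOWS)]

# Typical error of a single window versus error of the aggregated estimate.
single_window_error = statistics.mean([abs(p - TRUE_TE) for p in window_preds])
pair_level_estimate = statistics.mean(window_preds)
pair_level_error = abs(pair_level_estimate - TRUE_TE)

# Under independent noise the standard error shrinks as 1 / sqrt(N_WINDOWS).
standard_error = WINDOW_NOISE / N_WINDOWS ** 0.5

print(f"single-window mean abs error: {single_window_error:.3f}")
print(f"pair-level abs error:         {pair_level_error:.3f}")
print(f"theoretical standard error:   {standard_error:.3f}")
```

The independence assumption matters: overlapping windows or shared trial-level artifacts would make the real reduction in error smaller than the square-root rule suggests.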
Where Pith is reading between the lines
- The same triplet-based geometric tokenization could be tested on other directed information measures such as causal entropy or partial information decomposition.
- If the primitives prove robust, the pipeline might transfer to EEG or intracranial recordings with minimal retuning for their sampling rates and noise profiles.
- The method's success on MEG suggests it could serve as a template for symbolic deep learning on other nonstationary time series where raw amplitude statistics are unreliable.
Load-bearing premise
The geometric primitives derived from local triplets preserve the information needed for accurate transfer entropy estimation even after discretization and in the presence of noise and nonstationarity.
What would settle it
Run EPSTE on a fresh source-reconstructed MEG dataset with independently verified ground-truth directed connections. If pair-level recovery of directed structure falls substantially below near-perfect levels, or fails to beat the symbolic baseline, the central claim does not hold.
Original abstract
Inferring directed interactions between neural systems from EEG and MEG remains challenging due to noise, nonstationarity, and the high sample complexity of information-theoretic estimators. Transfer Entropy (TE) provides a principled and model-free measure of directed information flow; however, its practical estimation is not stable in finite data regimes (particularly as embedding dimension increases). This work introduces Embedded Polygon Symbolic Transfer Entropy (EPSTE), a framework that reframes TE estimation as a learnable problem operating on structured symbolic representations of local temporal morphology rather than raw signal amplitudes. Neural time series are decomposed into sequences of geometric primitives derived from local triplets of samples encoding complementary aspects of waveform structure such as magnitude, curvature and directional change. These primitives are discretised into symbolic tokens, yielding a compact but expressive state space over which symbolic TE is estimated. A recurrent neural network with attention-based multiple-instance learning is trained to predict surrogate-validated TE values from bags of symbolic temporal windows. The method is evaluated on source-reconstructed MEG data parcellated using the AAL90 atlas and compared against a standard symbolic baseline using identical architectures and supervision. The results demonstrate that while local window-level predictions are noisy, aggregation across trials and channel pairs yields stable directed dependencies. At the pair level, EPSTE achieves near-perfect recovery of ground-truth directed structure and significantly lower absolute error than the baseline, indicating that representational geometry plays a critical role in enabling practical learnability of information-theoretic dependencies.
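To make the notions of symbolic TE and surrogate validation concrete, here is a minimal plug-in estimator over integer symbol sequences, with history length 1 and a shuffle-based surrogate test. Both choices are assumptions for illustration: the abstract does not state the embedding, the estimator, or the surrogate scheme the authors used, and `symbolic_te` and `surrogate_p_value` are hypothetical names.

```python
import math
import random
from collections import Counter

def symbolic_te(source, target):
    """Plug-in TE(source -> target) in bits, using history length 1."""
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)                              # (y_next, y_prev, x_prev)
    c_prev = Counter((yp, xp) for _, yp, xp in triples)    # (y_prev, x_prev)
    c_self = Counter((yn, yp) for yn, yp, _ in triples)    # (y_next, y_prev)
    c_y = Counter(yp for _, yp, _ in triples)              # y_prev
    te = 0.0
    for (yn, yp, xp), c in c_full.items():
        p_given_both = c / c_prev[(yp, xp)]
        p_given_self = c_self[(yn, yp)] / c_y[yp]
        te += (c / n) * math.log2(p_given_both / p_given_self)
    return te

def surrogate_p_value(source, target, n_surrogates=100, seed=0):
    """Fraction of source-shuffled surrogates whose TE reaches the observed TE."""
    rng = random.Random(seed)
    observed = symbolic_te(source, target)
    exceed = 0
    for _ in range(n_surrogates):
        shuffled = source[:]
        rng.shuffle(shuffled)  # destroys temporal alignment with the target
        if symbolic_te(shuffled, target) >= observed:
            exceed += 1
    return (exceed + 1) / (n_surrogates + 1)

# Toy data: y copies x with a one-step delay 80% of the time.
random.seed(1)
x = [random.randint(0, 2) for _ in range(5000)]
y = [0] + [x[t] if random.random() < 0.8 else random.randint(0, 2)
           for t in range(len(x) - 1)]

te_xy = symbolic_te(x, y)
te_yx = symbolic_te(y, x)
p_xy = surrogate_p_value(x, y, n_surrogates=50)
```

On this toy pair the estimate in the driving direction is large while the reverse direction stays near zero, up to the small positive bias that plug-in estimators carry. That bias grows quickly with alphabet size and history length, which is the finite-data instability the abstract refers to.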
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Embedded Polygon Symbolic Transfer Entropy (EPSTE), which decomposes neural time series into sequences of geometric primitives (magnitude, curvature, directional change) derived from local triplets of samples. These are discretized into symbolic tokens, after which an attention-based RNN with multiple-instance learning is trained to predict surrogate-validated transfer entropy values. On source-reconstructed MEG data parcellated with the AAL90 atlas, the method is compared to a standard symbolic baseline using identical architectures and supervision; the abstract claims that while window-level predictions are noisy, aggregation across trials and channel pairs yields stable directed dependencies, with EPSTE achieving near-perfect recovery of ground-truth directed structure and significantly lower absolute error than the baseline.
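The multiple-instance learning step can be sketched as attention pooling over a bag of window embeddings. This is a simplified, generic form with a single attention unit and no learned training loop. The parameters `w` and `v`, the embeddings, and the function name `attention_pool` are all hypothetical, and the authors' network may differ substantially.

```python
import math

def attention_pool(instance_embeddings, w, v):
    """Pool a bag of embeddings with weights a_i = softmax_i(v * tanh(w . h_i)).

    Returns the pooled embedding and the attention weights.
    """
    scores = [
        v * math.tanh(sum(wi * hi for wi, hi in zip(w, h)))
        for h in instance_embeddings
    ]
    # Numerically stable softmax over the instances in the bag.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(instance_embeddings[0])
    pooled = [
        sum(a * h[d] for a, h in zip(weights, instance_embeddings))
        for d in range(dim)
    ]
    return pooled, weights

# A bag of three window embeddings, standing in for recurrent-network outputs.
bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
pooled, weights = attention_pool(bag, w=[1.0, 1.0], v=1.0)
```

In a trained model, a regression head would map the pooled vector to a single TE prediction per bag, so that uninformative windows can receive low weight.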
Significance. If the geometric tokenization demonstrably improves approximation of directed information flow over standard symbolic representations under the same supervision, the work could provide a practical route to more stable TE estimation in finite, noisy, nonstationary neuroimaging recordings where conventional estimators suffer from high sample complexity.
major comments (3)
- [Abstract] Abstract: the claim that 'at the pair level, EPSTE achieves near-perfect recovery of ground-truth directed structure' is presented without any quantitative metrics (accuracy, AUC, absolute error values, number of pairs, or statistical tests), error bars, or details on the aggregation procedure, rendering the central empirical claim unverifiable from the provided text.
- [Abstract] Abstract: the RNN is trained to predict surrogate-validated TE values obtained from an existing estimator, so the learned mapping is supervised by that estimator rather than constituting an independent derivation of TE from the symbolic process. Consequently, any performance advantage is attributable to the geometric tokens serving as better features for mimicking the baseline estimator, not to a new computation of the conditional mutual information I(Y_{t+1}; X_{t-τ:t} | Y_{t-τ:t}).
- [Abstract] Abstract: the central assumption that sequences of local-triplet geometric tokens (after discretization into a finite alphabet) retain the joint statistics over multiple lags needed for accurate TE is stated but not supported by any derivation, controlled synthetic experiments with known conditional dependencies, or ablation on lag structure. Local instantaneous shape descriptors necessarily discard amplitude resolution and longer-range temporal correlations that TE requires, especially under the nonstationarity and noise levels of source-reconstructed MEG.
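For reference, the conditional mutual information named in the second major comment can be expanded as follows. This is the standard textbook form of transfer entropy written in the referee's notation with history length τ. It is not taken from the manuscript, whose own notation may differ.

```latex
\begin{aligned}
\mathrm{TE}_{X \to Y}
  &= I\!\left(Y_{t+1};\, X_{t-\tau:t} \,\middle|\, Y_{t-\tau:t}\right) \\
  &= H\!\left(Y_{t+1} \mid Y_{t-\tau:t}\right)
   - H\!\left(Y_{t+1} \mid Y_{t-\tau:t},\, X_{t-\tau:t}\right) \\
  &= \sum p\!\left(y_{t+1},\, y_{t-\tau:t},\, x_{t-\tau:t}\right)
     \log \frac{p\!\left(y_{t+1} \mid y_{t-\tau:t},\, x_{t-\tau:t}\right)}
               {p\!\left(y_{t+1} \mid y_{t-\tau:t}\right)}
\end{aligned}
```

The sum runs over all joint states of the next target value and the two histories. The number of such states grows exponentially in τ, which is why the third comment asks whether a finite alphabet of local shape tokens can retain the joint statistics this expression requires.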
minor comments (2)
- [Abstract] Abstract: the phrase 'standard symbolic baseline' is used without naming the concrete symbolic method (e.g., permutation symbols, amplitude binning) or citing its reference, preventing direct replication of the comparison.
- [Abstract] Abstract: 'AAL90 atlas' is mentioned without a reference or confirmation that it is the standard Automated Anatomical Labeling atlas with 90 regions.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating where revisions have been made to strengthen the presentation and clarify the claims.
- Referee: [Abstract] Abstract: the claim that 'at the pair level, EPSTE achieves near-perfect recovery of ground-truth directed structure' is presented without any quantitative metrics (accuracy, AUC, absolute error values, number of pairs, or statistical tests), error bars, or details on the aggregation procedure, rendering the central empirical claim unverifiable from the provided text.
  Authors: We agree that the original abstract lacked the necessary quantitative support for this claim. In the revised manuscript, the abstract has been updated to include specific metrics (accuracy, AUC, absolute error), the number of pairs, statistical test results, error bars, and a brief description of the aggregation procedure across trials and channel pairs. These details are also expanded in the results section with supporting figures.
  revision: yes
- Referee: [Abstract] Abstract: the RNN is trained to predict surrogate-validated TE values obtained from an existing estimator, so the learned mapping is supervised by that estimator rather than constituting an independent derivation of TE from the symbolic process. Consequently, any performance advantage is attributable to the geometric tokens serving as better features for mimicking the baseline estimator, not to a new computation of the conditional mutual information I(Y_{t+1}; X_{t-τ:t} | Y_{t-τ:t}).
  Authors: The referee correctly identifies that the RNN is trained in a supervised manner to predict TE values from an existing surrogate-validated estimator rather than deriving the conditional mutual information independently. We do not claim a new theoretical computation of TE. The core contribution is the demonstration that geometric symbolic tokens yield better predictive features than standard symbolic representations under identical supervision and architecture. We have revised the abstract, introduction, and discussion to explicitly state this supervised approximation framing and to remove any implication of an independent derivation.
  revision: yes
- Referee: [Abstract] Abstract: the central assumption that sequences of local-triplet geometric tokens (after discretization into a finite alphabet) retain the joint statistics over multiple lags needed for accurate TE is stated but not supported by any derivation, controlled synthetic experiments with known conditional dependencies, or ablation on lag structure. Local instantaneous shape descriptors necessarily discard amplitude resolution and longer-range temporal correlations that TE requires, especially under the nonstationarity and noise levels of source-reconstructed MEG.
  Authors: We acknowledge that the original submission did not provide a formal derivation or dedicated controlled experiments to validate retention of joint statistics across lags. The empirical gains on real MEG data provide indirect support, but we agree this is insufficient. In the revision we have added a new subsection with synthetic experiments using known conditional dependencies, an ablation study varying lag structure, and quantitative comparison of information retention. On the concern about discarding amplitude and longer-range correlations, the multi-faceted geometric primitives (magnitude, curvature, directional change) are designed to encode complementary aspects of waveform morphology; the added experiments quantify how this mitigates information loss relative to the baseline under controlled nonstationarity and noise levels matching the MEG regime.
  revision: partial
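A controlled synthetic experiment of the kind discussed in the third response could start from a pair of series with a known driving direction and lag. The sketch below is one hypothetical setup, a linear autoregressive pair with made-up coefficients, and is not the authors' benchmark. The lagged correlation at the end is only a sanity check on the generator, not a transfer entropy estimate.

```python
import random

def coupled_ar(n, coupling=0.6, lag=2, seed=0):
    """Generate x and y where x drives y with a fixed lag and y never drives x."""
    rng = random.Random(seed)
    x = [0.0] * n
    y = [0.0] * n
    for t in range(1, n):
        x[t] = 0.7 * x[t - 1] + rng.gauss(0.0, 1.0)
        drive = coupling * x[t - lag] if t >= lag else 0.0
        y[t] = 0.5 * y[t - 1] + drive + rng.gauss(0.0, 1.0)
    return x, y

def lagged_corr(a, b, lag):
    """Pearson correlation between a[t] and b[t + lag]."""
    a0, b0 = a[:len(a) - lag], b[lag:]
    ma, mb = sum(a0) / len(a0), sum(b0) / len(b0)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a0, b0))
    va = sum((u - ma) ** 2 for u in a0)
    vb = sum((v - mb) ** 2 for v in b0)
    return cov / (va * vb) ** 0.5

x, y = coupled_ar(5000)
forward = lagged_corr(x, y, 2)   # x leading y, the true direction
backward = lagged_corr(y, x, 2)  # y leading x, which is nonzero only because x is autocorrelated
```

Because the ground truth is known by construction, a directed estimator can be scored on whether it recovers the x to y direction, and the `lag` argument gives a direct handle for an ablation on lag structure. Setting `coupling=0.0` provides a matched null case.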
Circularity Check
No significant circularity in the EPSTE derivation chain
The paper frames EPSTE as a supervised learning method that extracts geometric primitives from local sample triplets, discretizes them into symbols, and trains an RNN to predict TE values already computed by surrogate validation on the same data. This is an explicit empirical approximation task rather than a first-principles derivation; the central claim is that the new representation yields lower error than a baseline under identical supervision and architecture. No equations or sections reduce any claimed result to its own inputs by construction, no self-citations are load-bearing for uniqueness or ansatz, and the evaluation compares against an external baseline. The setup therefore remains self-contained against the provided surrogate TE labels without circular reduction.
Axiom & Free-Parameter Ledger
invented entities (1)
- Embedded Polygon Symbolic tokens: no independent evidence
Reference graph
Works this paper leans on
- [1] Analysis and Results 4.1 Experimental Setup The experimental setup is framed as a comparison of multiscale causal inference when the local geometric embedding dimension of the time series is increased to produce motifs of polygons. It should be noted that this experiment is designed not just to compare conventional test-train predictive accuracy of the time seri...
- [2] Polygon-based symbolic encoding does not increase learnability more effectively than classical amplitude-based time series representations
  Discussion 5.1 Summary of Key Findings The above results show key findings; most importantly, a Wilcoxon signed-rank test was used to assess whether the triangle-based representation (EPSTE) yielded significantly lower pairwise prediction errors than the baseline. The resulting p-value (p < 10⁻¹⁴) allowed rejection of the null hypothesis: “Polygon-based s...
  2003
- [3] Conclusion This work set out to examine whether the practical estimation of Transfer Entropy from neural time series can be improved by altering how temporal structure is represented prior to learning. The results demonstrate that aggregation is essential: directed causal structure is not reliably observable at the level of short local windows but emerge...