Pretrained, Frozen, Still Leaking: Auditing Cross-Encoder Attribute Transfer in EEG Foundation Models
Pith reviewed 2026-06-27 16:18 UTC · model grok-4.3
The pith
A ridge attribute decoder trained on one frozen EEG encoder transfers to every other encoder through a fitted linear bridge, with a 95% CI lower bound of at least 0.081.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Each single-endpoint audit clears releases that still leak spectral attributes. The decisive evidence is a cross-encoder transfer audit: a single ridge attribute decoder learned from one frozen encoder transfers, via a fitted linear bridge, to held-out-subject test splits of every other encoder, with a subject-disjoint matched-control 95% CI lower bound of at least 0.081 across all six BIOT/LaBraM/EEGPT directions. The paper also proves a sufficient condition: two encoders sharing a nontrivial attribute-coordinate projector overlap beta admit a chained ridge bridge attacker with centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0, and beta is back-solved to lie in [0.008, 0.198].
What carries the argument
a fitted linear bridge between encoders that enables transfer of a ridge attribute decoder across frozen models
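The mechanism can be illustrated with a minimal sketch on synthetic data. Everything below is an assumption made for illustration: the two "encoders" are random linear views of shared latent factors, the attribute is a noisy copy of one latent, and the names `ridge_fit`, `emb_a`, `emb_b`, `w_dec`, and `W_bridge` are hypothetical. It is not the paper's pipeline and uses no real EEG data or subject-disjoint splits.

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge on centered data: W = (X^T X + lam I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
n_train, n_test, k = 400, 200, 4                 # k shared latent factors (hypothetical)
n = n_train + n_test
Z = rng.normal(size=(n, k))                      # shared latent structure
attr = Z[:, 0] + 0.1 * rng.normal(size=n)        # attribute driven by latent 0

# Two synthetic "frozen encoders": different linear views of the same latents.
A = rng.normal(size=(k, 16))
B = rng.normal(size=(k, 24))
emb_a = Z @ A + 0.3 * rng.normal(size=(n, 16))
emb_b = Z @ B + 0.3 * rng.normal(size=(n, 24))

tr, te = slice(0, n_train), slice(n_train, None)
mu_a, mu_b, mu_y = emb_a[tr].mean(0), emb_b[tr].mean(0), attr[tr].mean()

# 1) Ridge attribute decoder fitted on encoder A only.
w_dec = ridge_fit(emb_a[tr] - mu_a, attr[tr] - mu_y)
# 2) Linear bridge mapping encoder-B embeddings into encoder-A space.
W_bridge = ridge_fit(emb_b[tr] - mu_b, emb_a[tr] - mu_a)
# 3) Transfer: bridge held-out B embeddings, then reuse the A decoder unchanged.
pred = ((emb_b[te] - mu_b) @ W_bridge) @ w_dec + mu_y
transfer_corr = np.corrcoef(pred, attr[te])[0, 1]
print(f"transfer correlation on held-out rows: {transfer_corr:.3f}")
```

The decoder is never fitted on encoder-B embeddings; only the bridge sees both spaces, which is the sense in which a single decoder "transfers".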
If this is right
- Joint multi-endpoint audits can block releases that pass any single audit.
- The audit-endpoint disagreement score is positive in all eight matched-CI cells with p<0.001.
- Wiener-style noise-aware attackers, LiRA membership audits, and DP-SGD at every utility-preserving epsilon leave the attribute channel essentially unchanged.
- The cross-encoder bridge theorem supplies a release-blocking criterion grounded in embedding overlap.
Where Pith is reading between the lines
- The same linear-bridge transfer may appear in foundation models for other biosignals or modalities.
- Attribute sanitization stronger than DP-SGD on the head may be required to close the channel.
- Measuring projector overlap beta directly on new model pairs would test whether the observed range is typical.
Load-bearing premise
A fitted linear bridge between encoders accurately captures attribute transfer without requiring additional unstated conditions on embedding distributions or subject matching beyond the stated disjoint splits.
What would settle it
A new pair of encoders where no linear bridge achieves a subject-disjoint 95% CI lower bound of 0.081 or higher for the transferred ridge decoder on held-out test splits.
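A check of this kind could be sketched as a subject-level percentile bootstrap. The sketch assumes one gain value per held-out subject (transferred-decoder score minus matched-control score); that definition, the function name `bootstrap_ci_lower`, and the synthetic gains are illustrative assumptions, since the page does not spell out the paper's exact matched-control protocol.

```python
import numpy as np

def bootstrap_ci_lower(gain_per_subject, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap lower bound of the mean gain, resampling whole subjects."""
    rng = np.random.default_rng(seed)
    gains = np.asarray(gain_per_subject, dtype=float)
    # Each bootstrap replicate draws len(gains) subjects with replacement.
    idx = rng.integers(0, len(gains), size=(n_boot, len(gains)))
    boot_means = gains[idx].mean(axis=1)
    return float(np.quantile(boot_means, alpha / 2))

# Hypothetical per-subject gains; real values would come from the audit itself.
rng = np.random.default_rng(1)
gains = rng.normal(loc=0.15, scale=0.10, size=40)

THRESHOLD = 0.081  # lower bound quoted in the abstract
lower = bootstrap_ci_lower(gains)
print(f"95% CI lower bound: {lower:.3f}; clears threshold: {lower >= THRESHOLD}")
```

Resampling subjects rather than individual windows keeps the interval honest when windows from the same subject are correlated.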
Original abstract
EEG foundation-model releases are usually audited one endpoint at a time: raw-reconstruction, membership inference, identity linkage, or DP-SGD on the downstream head. We audit the same released embeddings under all four endpoints jointly, on BIOT, LaBraM, and EEGPT, and show that each single-endpoint audit clears releases that still leak spectral attributes. The decisive evidence is a cross-encoder transfer audit: a single ridge attribute decoder learned from one frozen encoder transfers, via a fitted linear bridge, to held-out-subject test splits of every other encoder, with subject-disjoint matched-control 95% CI lower bound at least 0.081 across all six BIOT/LaBraM/EEGPT directions. We prove a sufficient condition: two encoders sharing a nontrivial attribute-coordinate projector overlap beta admit a chained ridge bridge attacker with centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0, and back-solve beta in [0.008, 0.198]. To turn the joint audit into a deployment-readable decision rule we introduce an audit-endpoint disagreement score (AEDS), prove sufficient conditions for its positivity, and bootstrap-calibrate it per cell; AEDS is positive in all eight matched-CI cells (BIOT/LaBraM/EEGPT on EEGMMI; LaBraM on Sleep-EDF, 54-channel LIMO, CHB-MIT pediatric scalp EEG) with p<0.001, while a head-level Carlini LiRA membership audit reaches AUC only 0.50-0.70. Standard defenses fail under audit: a Wiener-style noise-aware adaptive attacker, the LiRA audit, and DP-SGD at every utility-preserving epsilon in {4,8} leave the attribute channel essentially unchanged. The contribution is an audit framework that turns scattered single-endpoint defenses into a joint release decision, supported by a cross-encoder bridge theorem and adaptive-attacker, LiRA, and DP-SGD baselines; the audit licenses release-blocking, not raw-waveform exfiltration or held-out-subject identity recovery.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that single-endpoint audits of EEG foundation models (BIOT, LaBraM, EEGPT) miss spectral attribute leakage, which is revealed by a cross-encoder transfer audit: a ridge attribute decoder trained on one frozen encoder transfers via a fitted linear bridge to held-out-subject splits of the others, yielding a subject-disjoint matched-control 95% CI lower bound of at least 0.081 across all six directions. It proves a sufficient condition for a chained ridge bridge attacker based on attribute-coordinate projector overlap beta (back-solved to [0.008, 0.198]), introduces a bootstrap-calibrated audit-endpoint disagreement score (AEDS) that is positive in all eight cells, and shows that Wiener-style, LiRA, and DP-SGD defenses leave the attribute channel intact.
Significance. If the cross-encoder transfer result and the supporting theorem hold after verification of the distributional assumptions, the work supplies a joint audit framework stronger than isolated membership or reconstruction attacks, directly supporting release-blocking decisions for EEG foundation models.
major comments (3)
- [Abstract] Abstract (sufficient condition paragraph): the proof states a centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0 and then back-solves beta in [0.008, 0.198] directly from the observed transfer performance; this makes the claimed lower bound dependent on the same fitted quantities it is intended to explain, creating a circularity risk for the joint-audit conclusion.
- [Abstract] Abstract (ridge bridge derivation): the sufficient condition implicitly requires centering and bounded-gain conditions on the embedding distributions plus dominant attribute-projector overlap; the reported experiments state subject-disjoint splits and matched-control CIs but do not report checks (e.g., cross-encoder covariance spectra or residual nonlinearity tests) that would confirm these conditions hold.
- [Results] Results (AEDS and CI cells): the claim that AEDS is positive with p<0.001 in all eight matched-CI cells and that the attribute lower bound is at least 0.081 rests on the linear bridge isolating the attribute channel; without explicit verification that residual subject or dataset covariates are not driving the transfer, the load-bearing claim that single-endpoint audits are insufficient does not yet follow.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment below with clarifications and revisions to strengthen the presentation of assumptions and evidence, while preserving the core joint-audit findings.
- Referee: [Abstract] Abstract (sufficient condition paragraph): the proof states a centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0 and then back-solves beta in [0.008, 0.198] directly from the observed transfer performance; this makes the claimed lower bound dependent on the same fitted quantities it is intended to explain, creating a circularity risk for the joint-audit conclusion.
Authors: The theorem derives the lower bound on chained-ridge attack performance strictly as a function of the (unobserved) projector overlap beta under the stated centering and bounded-gain assumptions; this derivation is independent of any particular empirical transfer value. The back-solving step is a separate, post-hoc calculation that converts the observed transfer performance into an implied range for beta to aid interpretation. We agree the abstract does not separate these steps clearly enough. In revision we will (i) state the theorem bound first without reference to the numerical interval, (ii) move the back-solving calculation and its confidence interval to the methods/appendix, and (iii) add an explicit sentence that the bound itself does not rely on the fitted quantities. revision: partial
- Referee: [Abstract] Abstract (ridge bridge derivation): the sufficient condition implicitly requires centering and bounded-gain conditions on the embedding distributions plus dominant attribute-projector overlap; the reported experiments state subject-disjoint splits and matched-control CIs but do not report checks (e.g., cross-encoder covariance spectra or residual nonlinearity tests) that would confirm these conditions hold.
Authors: We accept that explicit diagnostic checks for the centering, bounded-gain, and approximate linearity assumptions would increase confidence in the applicability of the sufficient condition. In the revised manuscript we will add (a) cross-encoder covariance spectra for the three model pairs, (b) residual plots and a simple nonlinearity test (e.g., quadratic term significance) on the fitted bridges, and (c) confirmation that subject-disjoint splits preserve zero-mean centering after standardization. These diagnostics will appear in a new appendix subsection. revision: yes
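One of the promised diagnostics, the quadratic-term check on a fitted bridge, might look like the sketch below. It compares held-out residual error of a linear ridge bridge against the same bridge with squared features added; a ratio well below 1 suggests structure a linear bridge misses. The helper names, the synthetic embeddings, and the choice of a held-out MSE ratio instead of a formal significance test are all assumptions for illustration.

```python
import numpy as np

def heldout_mse(F_tr, Y_tr, F_te, Y_te, lam=1.0):
    """Fit ridge on training features, return mean squared residual on held-out rows."""
    W = np.linalg.solve(F_tr.T @ F_tr + lam * np.eye(F_tr.shape[1]), F_tr.T @ Y_tr)
    return float(np.mean((F_te @ W - Y_te) ** 2))

def linear_features(X):
    return np.hstack([X, np.ones((len(X), 1))])           # embedding + intercept

def quadratic_features(X):
    return np.hstack([X, X ** 2, np.ones((len(X), 1))])   # adds elementwise squares

def nonlinearity_ratio(X, Y, n_train):
    """MSE(linear + squares) / MSE(linear) on held-out rows; near 1 means little gain."""
    tr, te = slice(0, n_train), slice(n_train, None)
    mse_lin = heldout_mse(linear_features(X[tr]), Y[tr], linear_features(X[te]), Y[te])
    mse_quad = heldout_mse(quadratic_features(X[tr]), Y[tr], quadratic_features(X[te]), Y[te])
    return mse_quad / mse_lin

# Synthetic source embeddings and two hypothetical targets: one linear, one curved.
rng = np.random.default_rng(0)
src = rng.normal(size=(600, 8))
M, M2 = rng.normal(size=(8, 6)), rng.normal(size=(8, 6))
noise = 0.1 * rng.normal(size=(600, 6))
tgt_linear = src @ M + noise
tgt_curved = src @ M + 0.5 * (src ** 2) @ M2 + noise

ratio_linear = nonlinearity_ratio(src, tgt_linear, n_train=400)
ratio_curved = nonlinearity_ratio(src, tgt_curved, n_train=400)
print(f"linear target ratio: {ratio_linear:.2f}, curved target ratio: {ratio_curved:.3f}")
```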
- Referee: [Results] Results (AEDS and CI cells): the claim that AEDS is positive with p<0.001 in all eight matched-CI cells and that the attribute lower bound is at least 0.081 rests on the linear bridge isolating the attribute channel; without explicit verification that residual subject or dataset covariates are not driving the transfer, the load-bearing claim that single-endpoint audits are insufficient does not yet follow.
Authors: The matched-control protocol already pairs each transfer trial with a same-subject, same-dataset control that receives identical preprocessing and split structure; any residual subject or dataset covariate would therefore appear equally in the control distribution, which is subtracted in the reported lower bounds. Nevertheless, to further isolate the attribute channel we will add two supplementary controls in revision: (i) attribute-label permutation tests that destroy the attribute signal while preserving all other covariates, and (ii) an additional covariate-matched subset analysis on the largest dataset (EEGMMI). These will be reported alongside the existing AEDS results. revision: partial
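The attribute-label permutation control in (i) can be sketched as follows. Shuffling labels breaks the link between predictions and the attribute while leaving the predictions, and whatever covariates shaped them, untouched. The function name, the correlation statistic, and the synthetic data are assumptions; a real audit would likely permute within subject or dataset blocks, which this sketch does not do.

```python
import numpy as np

def permutation_pvalue(pred, labels, n_perm=2000, seed=0):
    """One-sided permutation p-value for the correlation between pred and labels."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(pred, labels)[0, 1]
    # Null distribution: same predictions, attribute labels shuffled.
    null = np.array([np.corrcoef(pred, rng.permutation(labels))[0, 1]
                     for _ in range(n_perm)])
    # Add-one correction keeps the estimate strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return float(observed), float(p_value)

# Hypothetical transferred-decoder predictions that partly track the attribute.
rng = np.random.default_rng(2)
labels = rng.normal(size=200)
pred = 0.5 * labels + rng.normal(size=200)

observed, p_value = permutation_pvalue(pred, labels)
print(f"observed correlation: {observed:.3f}, permutation p-value: {p_value:.4f}")
```

With `n_perm` permutations the smallest reportable p-value is 1/(n_perm + 1), so a claim of p<0.001 needs at least 1000 permutations.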
Circularity Check
Sufficient-condition bound on attribute overlap beta is back-solved from the same fitted ridge-bridge transfer performance it purports to explain
specific steps
- fitted input called prediction [Abstract]
"We prove a sufficient condition: two encoders sharing a nontrivial attribute-coordinate projector overlap beta admit a chained ridge bridge attacker with centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0, and back-solve beta in [0.008, 0.198]."
The lower-bound expression is derived under the stated sufficient condition; beta is then obtained by inverting the observed transfer performance of the fitted linear bridge on the identical subject-disjoint test splits. The numerical claim therefore depends on the same ridge-regression outputs it is invoked to interpret, rendering the 'proof' non-independent.
full rationale
The paper states a theorem giving a lower bound on chained-ridge attacker gain in terms of an attribute-projector overlap parameter beta, then immediately back-solves numerical values for beta directly from the observed cross-encoder transfer accuracies on the held-out splits. Because the reported interval [0.008, 0.198] and the claim of 'nontrivial' overlap are obtained by inverting the same linear-bridge fit that constitutes the headline empirical result, the mathematical 'proof' does not supply an independent constraint; the bound is a re-expression of the fitted quantities. No external verification of the centering, bounded-gain, or projector-dominance assumptions is reported, so the derivation chain reduces to the input data by construction. This matches the fitted-input-called-prediction pattern at the level of the central theorem.
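The inversion at issue can be written out. This is one plausible reading, not the paper's own derivation: the symbol g for the observed centered gain is introduced here for illustration, and treating the bound as tight at g is an assumption.

```latex
% Bound as quoted in the abstract; g is a hypothetical symbol for the observed centered gain.
\begin{align}
  g &\ge \sqrt{\frac{\beta}{1+\tau^{2}}} - \varepsilon_{\mathrm{br}} - \rho_{0} \\
% Assumption: treat the bound as an equality at the observed g and solve for beta.
  \beta &= (1+\tau^{2})\,\bigl(g + \varepsilon_{\mathrm{br}} + \rho_{0}\bigr)^{2},
  \qquad g + \varepsilon_{\mathrm{br}} + \rho_{0} \ge 0
\end{align}
```

Read this way, each back-solved beta is a deterministic function of the fitted transfer gain, which is the dependence the check flags. Taken strictly as an inequality, the same bound only constrains beta from above by the right-hand side of the second line.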
Axiom & Free-Parameter Ledger
free parameters (2)
- beta = [0.008, 0.198]
- ridge regularization parameter
axioms (2)
- domain assumption: Encoders share a nontrivial attribute-coordinate projector overlap beta
- domain assumption: Subject-disjoint matched-control splits provide valid 95% CI bounds