PRiSE-EEG: A Prior-Guided Foundation Model with Depth-Stratified Experts for Cross-Paradigm EEG Representation Learning
Pith reviewed 2026-05-20 01:01 UTC · model grok-4.3
The pith
PRiSE-EEG learns reusable EEG representations by patching signals with cortical priors and allocating experts according to layer-wise CKA sharedness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
From gradient-similarity analysis, the authors report substantial optimization conflicts among EEG paradigms. From CKA analysis on mixed-paradigm batches, they observe that shallow layers preserve stronger cross-paradigm similarity while deeper layers become increasingly specialized. They then design PRiSE-EEG to form patches using weak static cortical and network priors together with dynamic short-time channel interactions, and to place shared and specialized experts in MoE blocks via a sigmoid mapping of layer-wise CKA sharedness. The authors report strong cross-paradigm results on 12 public EEG benchmarks under matched protocols.
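The CKA analysis can be illustrated with linear CKA as defined by Kornblith et al. (2019). This is a hedged sketch. It assumes that cross-paradigm similarity at a layer is measured by passing the same mixed-paradigm batch through models fine-tuned on different paradigms and comparing their activations. It also assumes that a layer's "sharedness" is the mean pairwise CKA. This review confirms neither assumption, and `linear_cka` and `layer_sharedness` are hypothetical names.

```python
import itertools
from typing import Sequence

import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activations of shape (n_examples, dim_x) and (n_examples, dim_y).

    Rows must correspond to the same examples, e.g. one mixed-paradigm batch
    passed through two differently fine-tuned models (an assumption here).
    """
    x = x - x.mean(axis=0, keepdims=True)  # center each feature over examples
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def layer_sharedness(acts_per_model: Sequence[np.ndarray]) -> float:
    """Hypothetical sharedness score: mean pairwise linear CKA at one layer."""
    pairs = itertools.combinations(acts_per_model, 2)
    return float(np.mean([linear_cka(a, b) for a, b in pairs]))


rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 32))
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
print(linear_cka(acts, acts @ rotation))  # ~1.0: invariant to orthogonal transforms
print(linear_cka(acts, rng.normal(size=(256, 32))))  # much lower for unrelated features
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling of either representation. That invariance is what lets it compare layers whose feature bases differ across separately fine-tuned models.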
What carries the argument
CKA-calibrated Depth-Stratified Experts, which allocate shared versus specialized experts across MoE Transformer blocks based on a sigmoid function of layer-wise CKA similarity.
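Below is a minimal sketch of such an allocation rule. It assumes a logistic map from sharedness to the fraction of shared experts in each block. The slope, midpoint, total expert count, and the function name `shared_expert_split` are hypothetical. The review does not give the paper's actual parameters or rounding scheme.

```python
import math


def shared_expert_split(sharedness, n_experts=8, slope=10.0, midpoint=0.5):
    """Map per-layer CKA sharedness to (n_shared, n_specialized) expert counts.

    Hypothetical rule: the shared fraction is a logistic function of sharedness,
    so blocks with high cross-paradigm similarity receive mostly shared experts.
    """
    split = []
    for s in sharedness:
        shared_frac = 1.0 / (1.0 + math.exp(-slope * (s - midpoint)))
        n_shared = round(shared_frac * n_experts)  # Python rounds exact halves to even
        split.append((n_shared, n_experts - n_shared))
    return split


# Illustrative sharedness values that fall with depth (not the paper's numbers).
print(shared_expert_split([0.9, 0.7, 0.5, 0.3]))  # [(8, 0), (7, 1), (4, 4), (1, 7)]
```

A steeper `slope` sharpens the switch from shared to specialized capacity around `midpoint`. Those two numbers are the "sigmoid mapping parameters" listed as free parameters in the ledger below.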
If this is right
- Common EEG regularities are preserved in early blocks while later blocks gain specialized capacity.
- Optimization conflicts among EEG paradigms are reduced through the depth-dependent expert allocation.
- Performance improves on heterogeneous benchmarks under consistent evaluation protocols.
- In compact ablations, CKA-derived expert allocation outperforms dense Transformers, uniform MoE, and manually fixed shared-to-specialized expert ratios.
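The sketch below shows one MoE feed-forward layer built on a common shared-plus-routed pattern, to make "shared versus specialized experts" concrete. Shared experts process every token, and a softmax router sends each token to its top-k specialized experts. This is an assumption about how such blocks are typically built, not the paper's confirmed design. The single-matrix ReLU experts and the name `moe_block` are hypothetical.

```python
import numpy as np


def moe_block(x, shared_w, special_w, router_w, top_k=2):
    """One hypothetical MoE feed-forward layer with shared and routed experts.

    x:         (tokens, d) float input token features
    shared_w:  list of (d, d) expert weights applied to every token
    special_w: list of (d, d) expert weights, top-k routed per token
    router_w:  (d, len(special_w)) router projection
    """
    out = np.zeros_like(x)
    # Shared experts: always active, so every token receives their output.
    for w in shared_w:
        out += np.maximum(x @ w, 0.0)
    if not special_w:
        return out
    # Softmax router over the specialized experts (max-subtracted for stability).
    logits = x @ router_w
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    k = min(top_k, len(special_w))
    top = np.argsort(-probs, axis=1)[:, :k]  # indices of the k highest-scoring experts
    for t in range(x.shape[0]):
        gates = probs[t, top[t]]
        gates = gates / gates.sum()  # renormalise over the selected experts
        for g, e in zip(gates, top[t]):
            out[t] += g * np.maximum(x[t] @ special_w[e], 0.0)
    return out


rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(4, d))
# e.g. a deep block with 1 shared and 7 specialized experts
shared = [rng.normal(size=(d, d)) / np.sqrt(d)]
special = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(7)]
router = rng.normal(size=(d, 7))
print(moe_block(x, shared, special, router).shape)  # (4, 16)
```

Under a depth-stratified scheme, early blocks would receive more entries in `shared_w` and later blocks more in `special_w`. The forward pass itself stays the same.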
Where Pith is reading between the lines
- Similar CKA-based expert allocation might reduce conflicts in other heterogeneous signal domains, such as multi-site recordings.
- Checking whether the depth transition stays stable when more datasets are added would test how general the allocation rule is.
- Replacing the fixed sigmoid with a learned router could further adapt the sharing pattern during training.
Load-bearing premise
The depth-wise decrease in cross-paradigm similarity seen in the CKA analysis on the six downstream datasets used for analysis will hold for other EEG data, justifying a fixed sigmoid-based expert split.
What would settle it
A new set of EEG benchmarks where the CKA sharedness does not decrease consistently with depth, or where the PRiSE-EEG model fails to improve over dense baselines, would challenge the central design choice.
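One way to run this check is to summarize how sharedness changes with depth using a rank statistic. The sketch below computes the Goodman-Kruskal gamma between layer index and sharedness. A value of -1 means sharedness strictly decreases with depth. Any cutoff for a "consistent" decrease would be a hypothetical choice, and `depth_trend` is an illustrative name.

```python
from itertools import combinations


def depth_trend(sharedness):
    """Goodman-Kruskal gamma between layer index and sharedness.

    -1.0: sharedness strictly decreases with depth; +1.0: strictly increases;
    values near 0: no consistent monotone trend. Tied pairs are ignored.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(sharedness)), 2):  # i < j, so layer j is deeper
        diff = sharedness[j] - sharedness[i]
        if diff > 0:
            concordant += 1
        elif diff < 0:
            discordant += 1
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total


print(depth_trend([0.92, 0.85, 0.61, 0.40]))  # -1.0: consistent decrease
print(depth_trend([0.50, 0.70, 0.40, 0.60]))  # 0.0: no consistent trend
```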
Original abstract
EEG foundation models aim to learn reusable representations across heterogeneous paradigms, yet existing approaches often use uniform adaptation mechanisms and are typically reported under separate downstream fine-tuning protocols. In this work, we first analyze dense EEG Transformers from two complementary perspectives. Gradient similarity across six downstream datasets reveals substantial optimization conflicts among EEG paradigms, while CKA analysis on mixed-paradigm batches shows a consistent depth-wise transition: shallow layers preserve stronger cross-paradigm similarity, whereas deeper layers become increasingly specialized. Motivated by these findings, we propose PRiSE-EEG, a prior-guided EEG foundation model with CKA-calibrated Depth-Stratified Experts. PRiSE-EEG forms continuous multi-channel EEG patches using weak static cortical and network priors and dynamic short-time channel interactions, then allocates shared and specialized experts across MoE Transformer blocks according to a sigmoid mapping from layer-wise CKA sharedness. This design preserves common EEG regularities in early blocks while assigning more specialized capacity to later task-specific transformations. Experiments on 12 public EEG benchmarks show strong cross-paradigm performance under matched protocols. Compact ablations further show that CKA-derived expert allocation improves over dense Transformers, uniform MoE, and manually fixed shared-specific expert ratios.
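The gradient-similarity analysis mentioned in the abstract is commonly done by comparing the directions of per-dataset gradients on shared parameters. The minimal sketch below assumes those gradients have already been computed and flattened per dataset. The function name and dataset keys are hypothetical. Cosine values below zero indicate conflicting update directions.

```python
import numpy as np


def gradient_conflicts(grads):
    """Pairwise cosine similarity of per-dataset gradients on shared parameters.

    grads: dict mapping a dataset name to its gradient (any shape; flattened here).
    Returns {(name_a, name_b): cosine}; negative values mean conflicting directions.
    """
    names = sorted(grads)
    sims = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ga = np.ravel(grads[a]).astype(float)
            gb = np.ravel(grads[b]).astype(float)
            sims[(a, b)] = float(ga @ gb / (np.linalg.norm(ga) * np.linalg.norm(gb)))
    return sims


# Toy gradients with hypothetical dataset names, not real measurements.
toy = {"motor_imagery": [1.0, 0.0], "p300": [-2.0, 0.0], "sleep": [0.0, 3.0]}
for pair, cos in gradient_conflicts(toy).items():
    print(pair, round(cos, 3))
```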
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PRiSE-EEG, a foundation model for cross-paradigm EEG representation learning. It constructs continuous multi-channel EEG patches from weak static cortical/network priors combined with dynamic short-time channel interactions, then deploys a MoE Transformer with depth-stratified experts whose shared-versus-specialized allocation is set by a sigmoid mapping of layer-wise CKA similarity. The design is motivated by gradient-similarity analysis revealing optimization conflicts across paradigms and by CKA analysis on mixed batches from six datasets showing a consistent shallow-to-deep transition from shared to specialized representations. Experiments on 12 public EEG benchmarks under matched protocols, together with compact ablations, are reported to demonstrate gains over dense Transformers and uniform MoE baselines.
Significance. If the empirical improvements are shown to be statistically robust and the CKA-derived allocation rule generalizes, the work would supply a concrete, data-driven mechanism for balancing shared and specialized capacity inside EEG foundation models, directly addressing the paradigm-heterogeneity problem that uniform adaptation strategies have left unresolved.
major comments (2)
- [Motivation and Method (CKA analysis and expert allocation)] The sigmoid expert-allocation rule is calibrated exclusively on layer-wise CKA values obtained from mixed-paradigm batches drawn from the same six datasets used for the gradient and CKA analysis. No verification is provided that the observed depth-wise sharedness transition (and therefore the same sigmoid parameters) remains near-optimal when the paradigm mix is expanded to the remaining six benchmarks or altered in other ways; if the transition point or slope shifts, the allocation becomes an arbitrary hyperparameter and the performance advantage over uniform MoE cannot be confidently attributed to the CKA calibration.
- [Experiments and Ablations] The central performance claims rest on results from 12 benchmarks and compact ablations that are presented without error bars, without statistical significance tests, and without complete training details or hyperparameter specifications. This absence prevents assessment of whether the reported gains are reliable or could be explained by other design choices such as the prior-guided patch construction.
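The sketch below illustrates the kind of reporting the referee asks for. It computes the mean and sample standard deviation over seeds, plus an exact paired sign-flip permutation test over per-benchmark score differences. The permutation test is a swapped-in, assumption-light alternative to parametric tests such as the paired t-test, and the function names are hypothetical. With 12 benchmarks, the exact test enumerates 2^12 = 4096 sign patterns.

```python
import itertools
import statistics


def mean_std(seed_scores):
    """Mean and sample standard deviation of one metric across random seeds."""
    return statistics.mean(seed_scores), statistics.stdev(seed_scores)


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    scores_a, scores_b: per-benchmark scores of two models, in the same order.
    Under the null hypothesis each paired difference is equally likely to have
    either sign; the p-value is the share of sign patterns whose absolute
    summed difference is at least the observed one.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    extreme = 0
    for signs in patterns:
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / len(patterns)


# Hypothetical accuracies on 12 benchmarks: model A beats model B by 0.01 everywhere.
a = [0.80 + 0.01 * i for i in range(12)]
b = [x - 0.01 for x in a]
print(paired_sign_flip_test(a, b))  # 0.00048828125 (= 2 / 4096)
```

When every difference has the same sign and size, only the all-positive and all-negative patterns are as extreme as the observed one. That gives 2 / 4096, the smallest p-value this test can produce with 12 benchmarks.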
minor comments (1)
- [Method] Notation for the sigmoid mapping parameters and the precise definition of the CKA-based "sharedness" score should be introduced earlier and used consistently throughout the method and ablation sections.
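One way to probe whether the transition point or slope shifts with the paradigm mix is to refit the transition on each new mix and compare the fitted parameters. The sketch below fits a decreasing logistic to a min-max normalized sharedness curve by brute-force grid search. The parameterization, grids, and the name `fit_depth_transition` are hypothetical and not taken from the paper.

```python
import math


def fit_depth_transition(sharedness, slopes=None, steps=50):
    """Fit s(l) ~ 1 / (1 + exp(k * (l - l0))) to a min-max normalised curve.

    sharedness: per-layer sharedness values, ordered from shallow to deep.
    Returns (l0, k): the estimated transition depth and its steepness.
    """
    lo, hi = min(sharedness), max(sharedness)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat curve
    y = [(s - lo) / span for s in sharedness]
    n_layers = len(y)
    if slopes is None:
        slopes = [0.25 * i for i in range(1, 41)]  # candidate k: 0.25 .. 10.0
    best = None
    for step in range(steps + 1):
        l0 = (n_layers - 1) * step / steps  # candidate transition depth
        for k in slopes:
            err = sum(
                (yi - 1.0 / (1.0 + math.exp(k * (layer - l0)))) ** 2
                for layer, yi in enumerate(y)
            )
            if best is None or err < best[0]:
                best = (err, l0, k)
    return best[1], best[2]


# Illustrative synthetic curves (not paper data): an early and a late transition.
early = [1.0 / (1.0 + math.exp(2.0 * (l - 3.5))) for l in range(8)]
late = [1.0 / (1.0 + math.exp(2.0 * (l - 5.6))) for l in range(8)]
print(fit_depth_transition(early))
print(fit_depth_transition(late))
```

If the fitted `l0` moves substantially when benchmarks are added, a fixed sigmoid calibrated on the original six datasets would no longer track the data. That is the scenario the referee describes.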
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. We address each major comment below and outline the revisions we will make to improve the manuscript.
point-by-point responses
- Referee: The sigmoid expert-allocation rule is calibrated exclusively on layer-wise CKA values obtained from mixed-paradigm batches drawn from the same six datasets used for the gradient and CKA analysis. No verification is provided that the observed depth-wise sharedness transition (and therefore the same sigmoid parameters) remains near-optimal when the paradigm mix is expanded to the remaining six benchmarks or altered in other ways; if the transition point or slope shifts, the allocation becomes an arbitrary hyperparameter and the performance advantage over uniform MoE cannot be confidently attributed to the CKA calibration.
  Authors: We appreciate this observation on the scope of the CKA calibration. The six datasets selected for the gradient and CKA analyses represent a diverse collection of EEG paradigms (motor imagery, P300, SSVEP, and others) that capture the primary sources of heterogeneity present across the full 12 benchmarks. The consistent shallow-to-deep transition in cross-paradigm similarity appears to reflect a general property of how dense Transformers process mixed EEG data rather than a narrow dataset artifact. To directly verify robustness, we will add CKA analyses on mixed batches that include the remaining six benchmarks, recompute the sigmoid parameters, and report the resulting performance deltas versus uniform MoE in the revised manuscript. This will strengthen the claim that the allocation rule is data-driven and generalizable.
  Revision: yes
- Referee: The central performance claims rest on results from 12 benchmarks and compact ablations that are presented without error bars, without statistical significance tests, and without complete training details or hyperparameter specifications. This absence prevents assessment of whether the reported gains are reliable or could be explained by other design choices such as the prior-guided patch construction.
  Authors: We agree that the current experimental reporting lacks the statistical detail and transparency needed for rigorous evaluation. In the revised manuscript we will include error bars (standard deviation over three independent runs with different seeds) for all main results and ablations. We will add paired statistical significance tests (t-tests or Wilcoxon signed-rank) comparing PRiSE-EEG against the dense Transformer and uniform MoE baselines. A new appendix will provide complete hyperparameter tables, optimizer settings, learning-rate schedules, batch sizes, and hardware specifications for every experiment. These additions will allow readers to assess whether the observed gains are robust and attributable to the CKA-calibrated depth-stratified experts.
  Revision: yes
Circularity Check
CKA-derived sigmoid expert allocation is a data-informed design choice but does not reduce the central claim by construction
specific steps
- fitted input called prediction
[Abstract (CKA analysis and PRiSE-EEG proposal)]
"Gradient similarity across six downstream datasets reveals substantial optimization conflicts among EEG paradigms, while CKA analysis on mixed-paradigm batches shows a consistent depth-wise transition: shallow layers preserve stronger cross-paradigm similarity, whereas deeper layers become increasingly specialized. Motivated by these findings, we propose PRiSE-EEG, a prior-guided EEG foundation model with CKA-calibrated Depth-Stratified Experts. ... allocates shared and specialized experts across MoE Transformer blocks according to a sigmoid mapping from layer-wise CKA sharedness."
The sigmoid mapping is calibrated directly from the observed CKA transition on the same six datasets used for the gradient/CKA analysis and downstream training. The rule is not a direct fit to accuracy. It is, however, informed by layer-wise similarity statistics of the same data, so performance gains over uniform MoE baselines are partly attributable to this post-observation design choice rather than to an independent prior.
full rationale
The paper computes CKA on held-out mixed-paradigm batches from six datasets solely to observe the depth-wise sharedness pattern and then selects a sigmoid mapping to allocate experts. This is a modest data-dependent hyperparameter choice rather than fitting the final performance metric or redefining the result in terms of itself. The core contributions (continuous patch formation with priors, MoE blocks, and cross-paradigm evaluation on 12 benchmarks) retain independent content, so the derivation chain is largely self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- sigmoid mapping parameters
axioms (2)
- domain assumption: Gradient similarity across six downstream datasets reveals substantial optimization conflicts among EEG paradigms
- domain assumption: CKA analysis on mixed-paradigm batches shows a consistent depth-wise transition from shared to specialized representations
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "allocates shared and specialized experts across MoE Transformer blocks according to a sigmoid mapping from layer-wise CKA sharedness"
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "CKA analysis on mixed-paradigm batches shows a consistent depth-wise transition"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.