STAMP: Spatial-Temporal Adapter with Multi-Head Pooling

Abby Turner; Artur Dubrawski; Brad Shook; Jieshi Chen; Jonathan Elmer; Michał Wiliński; Mononito Goswami

arXiv: 2511.10848 · v2 · submitted 2025-11-13 · cs.LG · cs.AI

Brad Shook, Abby Turner, Jieshi Chen, Michał Wiliński, Mononito Goswami, Jonathan Elmer, Artur Dubrawski

Pith reviewed 2026-05-17 21:41 UTC · model grok-4.3

keywords: EEG classification · time series foundation models · adapters · spatial-temporal modeling · multi-head pooling · brain signals · clinical tasks

The pith

A lightweight adapter lets general time series foundation models match specialized EEG models on clinical classification tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces STAMP, an adapter that takes univariate embeddings from any general time series foundation model and applies multi-head pooling to handle EEG recordings. It demonstrates that this setup implicitly captures the spatial and temporal structure of brain signals and reaches performance levels comparable to EEG-specific foundation models across eight benchmark clinical datasets. The approach requires only a small number of trainable parameters and works with flexible input formats. A sympathetic reader would care because it reduces the need to train separate large models for every specialized domain like EEG.

Core claim

We introduce a novel Spatial-Temporal Adapter with Multi-Head Pooling (STAMP), which leverages univariate embeddings produced by a general TSFM, implicitly models spatial-temporal characteristics of EEG data, and achieves performance comparable to state-of-the-art EEGFMs on classification tasks.

What carries the argument

Multi-head pooling applied to univariate embeddings from a general time series foundation model to implicitly capture spatial relationships across EEG channels.
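As a rough illustration of the pooling idea described above, the following Python sketch reduces a variable number of token embeddings to one fixed-size vector using several attention heads. The source does not give the exact MHAP equations, so the learned-query formulation, the function name `multi_head_pool`, and all shapes are assumptions; random numbers stand in for real TSFM embeddings.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_pool(tokens, queries):
    """Pool N token embeddings (each of length D) into one D-vector.

    tokens  : list of N lists, each of length D
    queries : list of H query vectors, each of length D // H. These are
              hypothetical learned parameters (random in this sketch).
    """
    num_heads = len(queries)
    head_dim = len(tokens[0]) // num_heads
    pooled = []
    for h, query in enumerate(queries):
        start, stop = h * head_dim, (h + 1) * head_dim
        chunks = [tok[start:stop] for tok in tokens]
        # One relevance score per token for this head.
        scores = [
            sum(c * q for c, q in zip(chunk, query)) / math.sqrt(head_dim)
            for chunk in chunks
        ]
        weights = softmax(scores)
        # Weighted average of this head's slice across all tokens.
        pooled.extend(
            sum(w * chunk[i] for w, chunk in zip(weights, chunks))
            for i in range(head_dim)
        )
    return pooled


rng = random.Random(0)
channels, patches, dim, heads = 4, 3, 8, 2
# Stand-in for per-channel TSFM embeddings: one token per (channel, patch).
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(channels * patches)]
queries = [[rng.gauss(0, 1) for _ in range(dim // heads)] for _ in range(heads)]
pooled = multi_head_pool(tokens, queries)
print(len(pooled))  # 8: heads * head_dim, independent of the token count
```

Because the softmax runs over however many tokens are supplied, the output size stays fixed when the channel count changes, which is one plausible reading of the flexibility claim.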

If this is right

General time series foundation models can be reused for EEG tasks with only lightweight adaptation.
Domain-specific pretraining may not be required to reach competitive results on EEG benchmarks.
The adapter supports different numbers of channels and input configurations for multivariate signals.
Computational cost for EEG modeling decreases because most parameters come from an already-trained general model.
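To put the last point in numbers, the short calculation below uses the parameter counts quoted in the Figure 5 and Figure 6 captions. It assumes the MOMENT Large backbone stays frozen, which is how the "only finetuning STAMP" setting in the Figure 14 caption reads; the variable names are illustrative.

```python
# Parameter counts in millions, as quoted in the Figure 5 and Figure 6 captions.
stamp_trainable = 0.74       # STAMP adapter on MOMENT Large
moment_large_frozen = 385.0  # MOMENT Large backbone (assumed frozen)
cbramod_trainable = 29.0     # CBraMod, an EEG-specific foundation model

total = stamp_trainable + moment_large_frozen
trainable_share = stamp_trainable / total
ratio_vs_cbramod = cbramod_trainable / stamp_trainable

print(f"trainable share of STAMP + MOMENT Large: {trainable_share:.2%}")
print(f"CBraMod trainable parameters relative to STAMP: {ratio_vs_cbramod:.1f}x")
```

This counts trainable parameters only. The frozen backbone still has to run a forward pass for every channel, so the saving applies to training and storage of task-specific weights rather than to inference compute.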

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Rich temporal embeddings may contain enough information for pooling to recover channel interactions that would otherwise require explicit spatial layers.
The same adapter pattern could apply to other multivariate time series where channels have latent spatial or structural meaning.
Testing on datasets with higher channel counts or different electrode montages would show how far the implicit modeling extends.

Load-bearing premise

That univariate embeddings from a general time series foundation model combined with multi-head pooling can sufficiently capture spatial relationships across EEG channels without explicit spatial modeling or EEG-specific pretraining.

What would settle it

On a new clinical EEG classification dataset, if the STAMP-adapted general model shows a large performance gap below a dedicated EEG foundation model, that would challenge the claim of comparability.

Figures

Figures reproduced from arXiv: 2511.10848 by Abby Turner, Artur Dubrawski, Brad Shook, Jieshi Chen, Jonathan Elmer, Michał Wiliński, Mononito Goswami.

**Figure 1.** A diagram showing how EEG data is processed by MOMENT and STAMP. The EEG data is separated into tokens, which are embedded using MOMENT before positional encoding is applied. The resulting tokens are passed through the CC-GMLP, where spatial and temporal relationships are incorporated into embeddings. MHAP then determines relevant features and generates final predictions by projecting embeddings into lower…

**Figure 2.** Performance comparison between four positional encoding options: No PE (0.71M), PE-N (0.73M), PE-ST (0.72M), and PE-NST (0.74M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets.

**Figure 4.** Performance comparison between token aggregation strategies: mean pooling (0.70M) and MHAP (0.74M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets. Performance between mean pooling and MHAP on every dataset except BCIC-IV-2a is similar. However, MHAP demonstrates a significant performance boos…

**Figure 3.** Performance comparison between four different token mixer options: B-GMLP (0.79M), CC-GMLP (0.74M), B-TF (1.25M), and CC-TF (0.99M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets. In our ablation study comparing token mixing strategies, we see that CC-GMLP performs strongly across each dataset. Across all four datasets, the GMLP architecture performs b…

**Figure 5.** Performance comparison between the full evaluation of 5 methods: STAMP (0.74M), CBraMod (29M), LaBraM (5.8M), ST-Transformer (3.5M), and EEG Conformer (0.55M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets. Further analysis demonstrated that STAMP can often provide the same level of performance with even fewer parameters (see Appendix D). …

**Figure 6.** Performance comparison between using the following TSFMs with STAMP: TSPulse (1M, 0.63M), MOMENT Small (40M, 0.67M), MOMENT Base (125M, 0.7M), MOMENT Large (385M, 0.74M), and Chronos Large (710M, 0.74M). The first value in the parentheses indicates the size of the TSFM and the second value denotes the average number of trainable parameters in STAMP across the 4 datasets.

**Figure 7.** Performance comparison of STAMP with varying D values.

**Figure 9.** Performance comparison of baselines, MOMENT embeddings with STAMP, and Chronos Large embeddings with STAMP on the two emotion recognition datasets: SEED-V and FACED. To further analyze the reason that STAMP performs poorly for the two emotion recognition datasets, SEED-V and FACED, we ran an additional STAMP experiment using embeddings from Chronos Large. The previously mentioned 5 seeds were used for th…

**Figure 8.** Performance comparison of STAMP when using embeddings from Chronos Large with only the EOS embedding versus an embedding from mean pooling. The first 200 embeddings correspond to the length of the time series and the last embedding corresponds to an EOS token. For use with STAMP, we reduced these embeddings to a single representative embedding. We tested two aggregation methods: 1) mean pooling acr…

**Figure 10.** Performance comparison of STAMP using varying numbers of temporal channels. To investigate how the availability of temporal channels affects performance, we performed STAMP experiments using the first t temporal channels, where t ∈ {1, 2, 3, 4, 5}.

**Figure 12.** Performance comparison between MOMENT with mean pooling (0.04M) versus using MOMENT with STAMP (0.74M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets. The most naive baseline involving MOMENT is to use only mean pooling on the embeddings

**Figure 11.** Performance comparison of STAMP using D = 96 versus EEG Conformer. Both methods have ≈ 0.55M trainable parameters. STAMP provides superior performance compared to EEG Conformer for nearly all datasets and metrics evaluated. One may argue that this is due to increased capacity in STAMP and that EEG Conformer is a more efficient method for EEG modeling. In order to demonstrate that this is not the case, …

**Figure 13.** Performance comparison between three variants of STAMP: PE-NST + Mean Pooling, PE-NST + MHAP, PE-NST + CC-GMLP + MHAP. This comparison demonstrates that the adapter requires some form of relationship modeling in the form of either token mixing or MHAP

**Figure 14.** Performance comparison between finetuning MOMENT using LoRA and STAMP (2.3M) versus only finetuning STAMP (0.74M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets.

**Figure 15.** Performance comparison (across all metrics) between the full evaluation of 5 methods: STAMP, CBraMod, LaBraM, ST-Transformer, and EEG Conformer on SHU-MI, PhysioNet-MI, MentalArithmetic, and BCIC-IV-2a.

**Figure 16.** Performance comparison (across all metrics) between the full evaluation of 5 methods: STAMP, CBraMod, LaBraM, ST-Transformer, and EEG Conformer on TUEV, Mumtaz2016, SEED-V, and FACED.

Original abstract

Time series foundation models (TSFMs) pretrained on data from multiple domains have shown strong performance on diverse modeling tasks. Various efforts have been made to develop foundation models specific to electroencephalography (EEG) data, which records brain electrical activity as time series. However, no comparative analysis of EEG-specific foundation models (EEGFMs) versus general TSFMs has been performed on EEG-specific tasks. We introduce a novel Spatial-Temporal Adapter with Multi-Head Pooling (STAMP), which leverages univariate embeddings produced by a general TSFM, implicitly models spatial-temporal characteristics of EEG data, and achieves performance comparable to state-of-the-art EEGFMs. A comprehensive analysis is performed on 8 benchmark datasets of clinical tasks using EEG for classification, along with ablation studies. Our proposed adapter is lightweight in trainable parameters and flexible in the inputs it can accommodate, supporting easy modeling of EEG data using TSFMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

STAMP gives a lightweight adapter for general TSFMs on EEG tasks, but the implicit spatial modeling needs stronger evidence.

The main point is that this paper presents STAMP as a simple adapter that makes general time series foundation models work on EEG tasks with minimal extra parameters, and it gets results close to specialized EEG models on the benchmarks tested. The authors introduce the multi-head pooling mechanism to handle spatial-temporal features implicitly from univariate embeddings. This is new in the context of bridging general TSFMs to EEG without domain-specific pretraining.

The paper does well by evaluating on eight clinical benchmark datasets and including ablation studies to test the components. Keeping the adapter lightweight is a clear practical win, and the flexibility for different inputs makes it easy to apply.

One soft spot is the assumption that multi-head pooling alone can recover spatial channel correlations. The base model processes channels independently, so it lacks any electrode layout information. Standard pooling aggregates without regard to positions, which might mean it learns co-occurrence patterns instead of true spatial structure. This concern stands unless the ablations specifically demonstrate that the heads encode spatial relationships. Without detailed comparisons to models that use explicit graphs or positional encodings, it is hard to be sure the performance is due to implicit modeling rather than other factors.

This work is aimed at the intersection of foundation models and biomedical time series. Readers working on adapting large pretrained models to specialized data like EEG would get the most out of it, especially if they care about parameter efficiency. The citation pattern seems appropriate, focusing on relevant TSFM and EEGFM papers. The evidence is empirical, which is fine for this type of contribution.

I recommend sending it for peer review. It raises a worthwhile question about minimal adaptation strategies and deserves referee input on the experimental rigor.

Referee Report

2 major / 2 minor

Summary. The paper introduces STAMP, a lightweight Spatial-Temporal Adapter with Multi-Head Pooling that processes univariate embeddings produced by feeding each EEG channel independently through a general time series foundation model (TSFM). It claims this adapter implicitly captures spatial-temporal characteristics of EEG signals and achieves performance comparable to state-of-the-art EEG-specific foundation models (EEGFMs) across 8 clinical classification benchmark datasets, supported by ablation studies. The approach is presented as flexible and parameter-efficient for adapting general TSFMs to EEG tasks without domain-specific pretraining.

Significance. If the central empirical claims hold under rigorous validation, the work would be significant for demonstrating that general TSFMs can be adapted to EEG without explicit spatial operators or EEG-specific pretraining, potentially simplifying model development in neuroscience applications. The reported ablation studies and multi-benchmark evaluation provide a foundation for assessing flexibility and efficiency, though stronger evidence for the implicit spatial modeling would strengthen the contribution.

major comments (2)

[Abstract] Abstract and method description: The central claim that STAMP 'implicitly models spatial-temporal characteristics of EEG data' relies on multi-head pooling over per-channel univariate TSFM embeddings recovering spatial channel correlations. However, the TSFM is pretrained on scalar time series (no topographic bias) and standard multi-head pooling is permutation-invariant, so it does not inherently encode electrode geometry, adjacency, or relative positions. Ablation studies isolating this component (e.g., comparing against random channel permutations or explicit spatial baselines) are needed to substantiate implicit modeling rather than statistical co-occurrence.
[Experimental section] Experimental evaluation: The comparability to SOTA EEGFMs is reported on 8 benchmarks with ablations, but without visible full details on error bars, exact baseline re-implementations, or statistical significance tests, the support for performance parity cannot be fully assessed. This affects the strength of the claim that the adapter matches specialized EEGFMs.
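The permutation-invariance point in the first major comment can be checked numerically. The sketch below is a minimal illustration, not the paper's implementation: `attention_pool` is a hypothetical single-query pooling function, the embeddings are random, and the channel reordering is a simple reversal. It also shows that adding an index-dependent positional vector before pooling (a positional encoding step is described in the Figure 1 caption) removes the invariance.

```python
import math
import random


def attention_pool(tokens, query):
    """Softmax-weighted average of token vectors using one query vector."""
    scores = [sum(t * q for t, q in zip(tok, query)) for tok in tokens]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(tokens[0])
    return [sum(e / total * tok[i] for e, tok in zip(exps, tokens)) for i in range(dim)]


def max_gap(a, b):
    """Largest absolute element-wise difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


rng = random.Random(1)
num_channels, dim = 6, 4
# Hypothetical per-channel embeddings, query, and positional vectors.
embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_channels)]
query = [rng.gauss(0, 1) for _ in range(dim)]
positions = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_channels)]

reordered = embeddings[::-1]  # same channels, reversed order


def add_positions(tokens):
    """Add the i-th positional vector to the token at index i."""
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, positions)]


# Plain pooling: channel order does not matter (up to float rounding).
gap_plain = max_gap(attention_pool(embeddings, query),
                    attention_pool(reordered, query))
# With positional vectors tied to the index, order does matter.
gap_with_pe = max_gap(attention_pool(add_positions(embeddings), query),
                      attention_pool(add_positions(reordered), query))

print(gap_plain < 1e-9, gap_with_pe > 1e-6)
```

A random channel permutation ablation of the kind the referee requests would compare trained models rather than a single forward pass, so this only demonstrates the structural property at issue.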

minor comments (2)

[Abstract] Clarify the specific general TSFM backbone used and report the exact number of trainable parameters in STAMP for reproducibility.
[Results] Ensure figures or tables comparing STAMP to baselines include standard deviations or confidence intervals to aid interpretation of results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major comment below and have revised the manuscript accordingly to clarify our claims and strengthen the experimental reporting.

Referee: [Abstract] Abstract and method description: The central claim that STAMP 'implicitly models spatial-temporal characteristics of EEG data' relies on multi-head pooling over per-channel univariate TSFM embeddings recovering spatial channel correlations. However, the TSFM is pretrained on scalar time series (no topographic bias) and standard multi-head pooling is permutation-invariant, so it does not inherently encode electrode geometry, adjacency, or relative positions. Ablation studies isolating this component (e.g., comparing against random channel permutations or explicit spatial baselines) are needed to substantiate implicit modeling rather than statistical co-occurrence.

Authors: We appreciate this observation regarding the mechanism of implicit modeling. The STAMP adapter aggregates per-channel TSFM embeddings via multi-head pooling, enabling the model to learn inter-channel statistical dependencies directly from the EEG data without explicit spatial operators or topographic pretraining. This data-driven aggregation captures the spatial correlations necessary for clinical tasks, as demonstrated by competitive performance across benchmarks. To address the request for isolating ablations, we will add a random channel permutation experiment in the revised supplementary material and include a brief comparison to an explicit spatial baseline (e.g., a lightweight graph-based aggregator on electrode positions) to highlight the parameter efficiency of our approach. We have also revised the abstract and method description to more precisely characterize the implicit nature of the spatial-temporal modeling. revision: partial
Referee: [Experimental section] Experimental evaluation: The comparability to SOTA EEGFMs is reported on 8 benchmarks with ablations, but without visible full details on error bars, exact baseline re-implementations, or statistical significance tests, the support for performance parity cannot be fully assessed. This affects the strength of the claim that the adapter matches specialized EEGFMs.

Authors: We thank the referee for this feedback on experimental rigor. In the revised manuscript, we will expand the experimental section and appendix to report: standard deviation error bars computed over five independent runs with different random seeds for all methods and datasets; detailed re-implementation protocols for each EEGFM baseline, including exact hyperparameters, data preprocessing, and training settings; and statistical significance results using paired t-tests (with p-values) between STAMP and the EEGFMs on each of the eight benchmarks. These additions will provide a more complete assessment of performance parity. revision: yes
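The reporting promised in this response could be computed along the following lines. This is a sketch under stated assumptions: the per-seed scores are invented for illustration and are not results from the paper, the helper names are hypothetical, and the significance check uses the tabulated two-sided 5% critical value for Student's t with 4 degrees of freedom instead of an exact p-value.

```python
import math
import statistics

# Hypothetical per-seed scores for two methods on one dataset (five seeds).
# These numbers are made up for illustration, not taken from the paper.
stamp_scores = [0.62, 0.64, 0.61, 0.63, 0.65]
eegfm_scores = [0.63, 0.64, 0.63, 0.62, 0.66]


def mean_and_std(scores):
    """Mean and sample standard deviation, as used for error bars."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(a, b):
    """t statistic of the per-seed differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


T_CRITICAL_DF4 = 2.776  # two-sided 5% critical value, 4 degrees of freedom

stamp_mean, stamp_std = mean_and_std(stamp_scores)
eegfm_mean, eegfm_std = mean_and_std(eegfm_scores)
t_stat = paired_t_statistic(stamp_scores, eegfm_scores)
significant = abs(t_stat) > T_CRITICAL_DF4

print(f"STAMP {stamp_mean:.3f} +/- {stamp_std:.3f}")
print(f"EEGFM {eegfm_mean:.3f} +/- {eegfm_std:.3f}")
print(f"paired t = {t_stat:.2f}, significant at 5%: {significant}")
```

With only five paired runs the test has little power, so a non-significant difference is weak evidence of parity; an equivalence test with a stated margin would address the comparability claim more directly.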

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external benchmarks

The paper presents STAMP as a lightweight adapter that feeds per-channel univariate time series into a pretrained general TSFM and applies multi-head pooling to implicitly capture spatial-temporal EEG structure. All performance claims are supported by direct empirical comparisons on 8 external benchmark datasets plus ablation studies, rather than any derivation that reduces by construction to fitted parameters or self-referential definitions. No equations, uniqueness theorems, or self-citations are invoked as load-bearing premises that would collapse the central result to its own inputs. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the transferability of general TSFM embeddings to EEG and the sufficiency of implicit spatial-temporal modeling via pooling; no explicit free parameters or invented physical entities are described in the abstract.

axioms (1)

Domain assumption: Univariate embeddings from a general TSFM pretrained on other domains transfer usefully to EEG channels.
Invoked when the adapter is applied to EEG data without additional pretraining.

invented entities (1)

STAMP adapter with multi-head pooling · no independent evidence
purpose: To implicitly model spatial-temporal EEG characteristics from univariate TSFM embeddings
New method component introduced to bridge general and EEG-specific modeling.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

The CC-GMLP is made up of L blocks... gT(Z), gS(Z) ... spatial gating unit ... linear mapping W:R^S→R^S

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We introduce a novel Spatial-Temporal Adapter with Multi-Head Pooling (STAMP), which leverages univariate embeddings produced by a general TSFM, implicitly models spatial-temporal characteristics

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
