
arXiv: 2602.23410 · v3 · pith:QCB73FYInew · submitted 2026-02-26 · 💻 cs.LG · cs.AI · eess.SP · q-bio.NC

Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG

Pith reviewed 2026-05-21 11:24 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · eess.SP · q-bio.NC
keywords brain foundation model, fMRI, EEG, MEG, multimodal pretraining, self-supervised learning, neural signal processing, Any-Resolution Neural Signal Sampler

The pith

Brain-OF jointly processes fMRI, EEG and MEG in one model by mapping them to a shared semantic space and using dual-domain pretraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Brain-OF, an omnifunctional foundation model for brain signals from fMRI, EEG, and MEG. It is jointly pretrained on around 40 datasets and handles both single-modality and combined-modality inputs in one framework. To bridge the differences in resolution and meaning between modalities, it uses the Any-Resolution Neural Signal Sampler, which maps them into a common space. It further combines shared and specialized experts with reconstruction tasks in the time and frequency domains to learn general representations. This matters because these measurement techniques are complementary, so a model that integrates them successfully could yield more powerful tools for understanding brain function.

Core claim

Brain-OF is an omnifunctional brain foundation model jointly pretrained on fMRI, EEG and MEG, capable of handling both unimodal and multimodal inputs within a unified framework. It introduces the Any-Resolution Neural Signal Sampler to project diverse brain signals into a shared semantic space, integrates DINT attention with a Sparse Mixture of Experts to capture invariant and specific representations, and uses Masked Temporal-Frequency Modeling as a dual-domain pretraining objective. Pretrained on around 40 datasets, it demonstrates superior performance across diverse downstream tasks.

What carries the argument

Any-Resolution Neural Signal Sampler that projects signals of varying resolutions into a shared semantic space, enabling unified processing across modalities.
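To make the idea of resolution alignment concrete, here is a minimal sketch under a strong simplifying assumption: each signal is linearly resampled onto a fixed number of time points so that recordings made at different rates share one grid. The function name `resample_linear`, the toy inputs, and the target length are hypothetical. The paper's sampler is a learned module, and this sketch does not reproduce it.

```python
def resample_linear(signal, target_len):
    """Linearly interpolate a 1-D sequence onto `target_len` evenly spaced points."""
    n = len(signal)
    if n == 0 or target_len <= 0:
        raise ValueError("signal must be non-empty and target_len positive")
    if n == 1 or target_len == 1:
        return [float(signal[0])] * target_len
    out = []
    for i in range(target_len):
        # Position of output sample i on the input index axis, in [0, n - 1].
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


# Hypothetical inputs: a slowly sampled series and a densely sampled one.
slow_series = [0.0, 1.0, 2.0]                 # 3 samples
fast_series = [float(t) for t in range(9)]    # 9 samples
shared_len = 5                                # assumed common grid length

slow_on_grid = resample_linear(slow_series, shared_len)
fast_on_grid = resample_linear(fast_series, shared_len)
```

After resampling, both series have the same length and can be fed to one downstream model. Whether task-relevant information survives such a projection is exactly the question the load-bearing premise below raises.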

If this is right

  • The model achieves superior performance on diverse downstream tasks compared to existing single-modality approaches.
  • Joint multimodal integration allows exploitation of complementary spatiotemporal dynamics across neuroimaging techniques.
  • Dual-domain pretraining helps internalize characteristics of neural activity through self-supervised reconstruction in time and frequency domains.
  • Shared experts capture modality-invariant representations while routed experts handle modality-specific semantics.
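The split between shared and routed experts in the last point can be illustrated with a toy layer. This is a sketch under stated assumptions, not the paper's backbone: the experts are fixed elementwise scalings, the router is a plain dot product, and the names `moe_layer`, `shared`, `routed`, and `router` are hypothetical.

```python
import math


def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, shared_expert, routed_experts, router_weights, k=1):
    """The shared expert always runs; only the top-k routed experts run per token."""
    # Router logit per routed expert: dot product of x with that expert's router row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    chosen = top_k_indices(logits, k)
    gates = softmax([logits[i] for i in chosen])
    out = list(shared_expert(x))
    for gate, idx in zip(gates, chosen):
        expert_out = routed_experts[idx](x)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, chosen


# Toy experts (hypothetical): each is a fixed elementwise scaling.
shared = lambda x: [1.0 * v for v in x]
routed = [lambda x: [2.0 * v for v in x], lambda x: [-1.0 * v for v in x]]
router = [[1.0, 0.0], [0.0, 1.0]]  # expert 0 prefers dim 0, expert 1 prefers dim 1

y_a, picked_a = moe_layer([3.0, 1.0], shared, routed, router, k=1)
y_b, picked_b = moe_layer([0.0, 2.0], shared, routed, router, k=1)
```

The two tokens are sent to different routed experts while both pass through the same shared expert, which is the mechanism by which a design of this kind could separate modality-specific from modality-invariant processing.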

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could support transfer of learned representations between different neuroimaging modalities.
  • It opens the possibility of developing more robust models for brain-computer interfaces that fuse multiple signal types.
  • Similar resolution alignment techniques might apply to other heterogeneous sensor data in scientific domains.

Load-bearing premise

The Any-Resolution Neural Signal Sampler successfully projects diverse brain signals with severe semantic heterogeneity and resolution discrepancies into a shared semantic space while preserving task-relevant information.

What would settle it

If single-modality models, each pretrained only on the modality of the task at hand (for example, an EEG-only model on EEG tasks or an MEG-only model on MEG tasks), match or exceed Brain-OF on those tasks, this would challenge the value of joint multimodal pretraining.

Figures

Figures reproduced from arXiv: 2602.23410 by Abigail Morrison, Andrei Galbenus, Farah Abdellatif, Hanning Guo, Hanwen Bi, Jon. N. Shah, Jürgen Dammers.

Figure 1: Overview of the Brain-OF Architecture. Brain-OF is an omnifunctional foundation model jointly pretrained on fMRI, EEG and MEG. (a) Pretraining pipeline: The original signals are masked in both the temporal and frequency domains to encourage the model to jointly learn coupled time–frequency representations. (b) ARNESS projects arbitrary resolution signals into a unified semantic space and also serves as a d…
Figure 2: Visualization of Brain-OF Interpretability. (a) EEG channel contribution topomap for abnormality detection on TUAB. (b) top 10% most influential AD-related brain regions on ADNI. (c) MEG sensor contribution map for brain-age prediction on CamCAN. 4.5. Performance Gains via Multimodal Fusion Beyond unimodal evaluation, we assess multimodal fusion on two paired-modality benchmarks: brain-age prediction on Ca…
Figure 3: Modality Importance Analysis. (a) Local Modality Importance. Heatmap showing the relative performance drop (%) on each downstream dataset when a specific modality (fMRI, EEG, or MEG) is removed during pretraining. Higher values indicate stronger reliance on that modality. (b) Global Modality Importance. Importance scores aggregated across all downstream tasks, illustrating that EEG and fMRI contribute most…
Figure 4: The impact of the router bias update rate γ during pretraining. The performances are evaluated on downstream tasks across three modalities. F. Visualization of Reconstruction We visualize the reconstruction quality of Brain-OF Huge to assess the effectiveness of the proposed Masked Temporal-Frequency Modeling (MTFM) objective across fMRI, EEG, and MEG. Beyond simple waveform reconstruction, we examine whet…
Figure 5: Qualitative visualization of signal reconstruction across fMRI, EEG and MEG. G. Hyperparameter Configurations In this section, we provide the detailed hyperparameter configurations in the Brain-OF pretraining, as well as the optimization settings for downstream tasks. All models were implemented in PyTorch and trained on NVIDIA A100 GPUs using automatic mixed precision.
Figure 6: t-SNE visualization of learned representations across model scales. Comparison of raw input features (Left) with representations from Brain-OF Base, Large and Huge (Right). Rows correspond to different modalities and tasks: Top: TUAB (EEG, binary classification); Middle: ADNI (fMRI, binary classification); Bottom: CamCAN (MEG, age regression). As model size increases, the representations exhibit clearer cl…
Figure 7: Label efficiency of Brain-OF under limited supervision. Brain-OF Base is evaluated across varying training data fractions (10%, 30%, 50%, 70%, 100%). Shaded regions denote standard deviation across five random seeds. Brain-OF achieves strong performance even with limited labeled data, highlighting the effectiveness of heterogeneous multimodal pretraining for low-resource settings.
Original abstract

Brain foundation models have achieved remarkable advances across a wide range of neuroscience tasks. However, most existing models are limited to a single functional modality, restricting their ability to exploit complementary spatiotemporal dynamics and the collective data scale across different neuroimaging techniques. This limitation largely arises from severe semantic heterogeneity and resolution discrepancies among modalities. To address these challenges, we propose Brain-OF, an omnifunctional brain foundation model jointly pretrained on fMRI, EEG and MEG, capable of handling both unimodal and multimodal inputs within a unified framework. To reconcile heterogeneous spatiotemporal resolutions, we introduce the Any-Resolution Neural Signal Sampler, which projects diverse brain signals into a shared semantic space. To further manage semantic shifts, the Brain-OF backbone integrates DINT attention with a Sparse Mixture of Experts, where shared experts capture modality-invariant representations and routed experts specialize in modality-specific semantics. Furthermore, to explicitly internalize the characteristics of neural activity through self-supervised learning, we propose Masked Temporal-Frequency Modeling, a dual-domain pretraining objective that jointly reconstructs brain signals in both the time and frequency domains. Brain-OF is pretrained on a large-scale corpus comprising around 40 datasets and demonstrates superior performance across diverse downstream tasks, highlighting the benefits of joint multimodal integration and dual-domain pretraining.
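A dual-domain reconstruction objective of the kind the abstract describes can be sketched as a time-domain error plus a frequency-domain error. This is a minimal illustration under assumptions that do not come from the paper: the mask is applied in time only, the frequency term compares magnitude spectra from a direct DFT, and the weight `freq_weight` and all function names are hypothetical.

```python
import cmath


def dft_magnitudes(x):
    """Magnitude spectrum of a real sequence via a direct O(n^2) DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def dual_domain_loss(target, recon, time_mask, freq_weight=1.0):
    """Time-domain MSE on masked samples plus MSE between magnitude spectra."""
    masked_target = [t for t, m in zip(target, time_mask) if m]
    masked_recon = [r for r, m in zip(recon, time_mask) if m]
    time_term = mse(masked_target, masked_recon) if masked_target else 0.0
    freq_term = mse(dft_magnitudes(target), dft_magnitudes(recon))
    return time_term + freq_weight * freq_term


# Toy example: one cycle of a sampled oscillation, with two masked samples.
target = [0.0, 1.0, 0.0, -1.0]
mask = [0, 1, 0, 1]

loss_perfect = dual_domain_loss(target, target, mask)
loss_flat = dual_domain_loss(target, [0.0, 0.0, 0.0, 0.0], mask)
```

A flat reconstruction is penalized twice here: once for missing the masked samples and once for missing the oscillation's spectral peaks. That double penalty is the intuition behind reconstructing in both domains.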

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Brain-OF, an omnifunctional foundation model jointly pretrained on fMRI, EEG, and MEG data from approximately 40 datasets. It introduces the Any-Resolution Neural Signal Sampler to project signals with heterogeneous resolutions and semantics into a shared space, integrates DINT attention with a Sparse Mixture of Experts backbone (shared experts for invariant features, routed experts for modality-specific semantics), and uses Masked Temporal-Frequency Modeling as a dual-domain self-supervised objective. The central claim is that this multimodal joint pretraining yields superior performance on diverse downstream tasks compared to single-modality approaches.

Significance. If the empirical results and ablations hold, the work would be significant for the field of brain foundation models. It directly addresses the limitation of modality-specific models by demonstrating scalable joint pretraining across fMRI, EEG, and MEG, which could enable better exploitation of complementary spatiotemporal information. The dual-domain pretraining and MoE design are potentially generalizable contributions if supported by rigorous controls.

major comments (1)
  1. [§3.2] §3.2 (Any-Resolution Neural Signal Sampler): The manuscript does not report isolated validation experiments, such as per-modality reconstruction fidelity, mutual information between projected representations and task labels, or controlled ablations that disable the sampler while keeping the rest of the pipeline fixed. This component is load-bearing for the central claim, because without evidence that it maps signals with severe semantic heterogeneity and resolution discrepancies into a shared space without substantial information loss, the reported benefits of joint pretraining and the DINT+MoE backbone cannot be confidently attributed to multimodal integration rather than superficial invariants.
minor comments (2)
  1. [Abstract] The abstract asserts superior performance across downstream tasks but does not include any quantitative metrics, baselines, error bars, or dataset statistics; while the full manuscript presumably contains these in the experiments section, their absence from the summary reduces immediate evaluability.
  2. [§4] Notation for the DINT attention mechanism and the routing in the Sparse Mixture of Experts could be clarified with an explicit equation or diagram in the methods section to aid reproducibility.
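The mutual-information check requested in the major comment could be approximated with a plug-in estimator once the projected representations are discretized. The sketch below assumes the embeddings have already been reduced to discrete codes (for instance cluster ids), which is an illustrative choice and not something the manuscript specifies. The function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two paired discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical discrete codes for four samples, paired with binary task labels.
labels = [0, 0, 1, 1]
informative_codes = [0, 0, 1, 1]    # codes determine the label exactly
uninformative_codes = [0, 1, 0, 1]  # codes are independent of the label

mi_informative = mutual_information(informative_codes, labels)
mi_uninformative = mutual_information(uninformative_codes, labels)
```

Plug-in estimates are biased upward on small samples, so a real validation would need enough data per code or a bias-corrected estimator.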

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comment point by point below and have incorporated revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (Any-Resolution Neural Signal Sampler): The manuscript does not report isolated validation experiments, such as per-modality reconstruction fidelity, mutual information between projected representations and task labels, or controlled ablations that disable the sampler while keeping the rest of the pipeline fixed. This component is load-bearing for the central claim, because without evidence that it maps signals with severe semantic heterogeneity and resolution discrepancies into a shared space without substantial information loss, the reported benefits of joint pretraining and the DINT+MoE backbone cannot be confidently attributed to multimodal integration rather than superficial invariants.

    Authors: We agree that providing isolated validation for the Any-Resolution Neural Signal Sampler is necessary to rigorously support attribution of gains to multimodal integration. In the revised manuscript we add per-modality reconstruction fidelity metrics (MSE and cosine similarity) for fMRI, EEG, and MEG after projection into the shared space. We further report mutual information between the projected embeddings and downstream task labels across representative datasets. Finally, we include a controlled ablation that replaces the sampler with naive zero-padding and linear interpolation while freezing all other components (DINT attention, Sparse MoE, and Masked Temporal-Frequency Modeling). The ablation shows a clear performance drop on cross-modal and multimodal downstream tasks, confirming that the sampler preserves task-relevant information and enables effective joint pretraining rather than relying on superficial invariants. These results are reported in an expanded §3.2 and new supplementary tables. revision: yes
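The two fidelity metrics named in the response, MSE and cosine similarity, measure different things, which a short sketch makes visible. The vectors below are made-up toy values, not data from the manuscript, and the function names are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; undefined for zero vectors."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy case: the reconstruction has the right shape but twice the amplitude.
original = [1.0, 2.0, 3.0]
reconstruction = [2.0, 4.0, 6.0]

fidelity_mse = mse(original, reconstruction)
fidelity_cos = cosine_similarity(original, reconstruction)
```

Here cosine similarity is 1 while MSE is large, because cosine similarity ignores amplitude scaling. Reporting both, as the authors propose, separates shape errors from scale errors.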

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained with independent empirical claims

full rationale

The paper motivates its components (Any-Resolution Neural Signal Sampler, DINT+MoE backbone, Masked Temporal-Frequency Modeling) as architectural responses to modality heterogeneity and then reports empirical gains from pretraining on ~40 datasets. No equations, fitted parameters renamed as predictions, or self-citations are shown that reduce the central performance claim to a tautology or input by construction. The sampler projects signals into a shared space but is not defined in terms of the downstream task outcomes it is claimed to enable; results remain externally falsifiable via replication. This is the normal case of a non-circular empirical foundation model paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unstated premise that heterogeneous neuroimaging modalities can be aligned into a shared semantic space without critical information loss and that dual-domain reconstruction yields modality-invariant representations; no free parameters, axioms, or invented entities are explicitly quantified in the abstract.

pith-pipeline@v0.9.0 · 5785 in / 1183 out tokens · 34557 ms · 2026-05-21T11:24:44.886837+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel (tag: unclear)

    Relation between the paper passage and the cited Recognition theorem.

    Any-Resolution Neural Signal Sampler... projects diverse brain signals into a shared semantic space... Masked Temporal-Frequency Modeling, a dual-domain pretraining objective that jointly reconstructs brain signals in both the time and frequency domains.

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat (tag: unclear)

    Relation between the paper passage and the cited Recognition theorem.

    Sparse Mixture of Experts... shared experts capture modality-invariant representations and routed experts specialize in modality-specific semantics

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
