
arxiv: 2605.14048 · v3 · pith:Y5TDPHCKnew · submitted 2026-05-13 · 💻 cs.AI · cs.LG

Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning

Pith reviewed 2026-05-20 20:42 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords functional connectivity · masked autoencoders · self-supervised learning · network tokenization · bilinear factorization · brain networks · representation learning · neuroimaging

The pith

Partitioning functional connectivity matrices into network-pair patches and embedding them bilinearly produces more stable, cross-cohort transferable representations than uniform or graph-based alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that standard ways of breaking functional connectivity matrices into tokens ignore the brain's large-scale modular organization, and that respecting network boundaries during tokenization improves self-supervised learning. It does this by splitting each matrix into intra- and inter-network blocks defined by an anatomical parcellation, then handling the resulting patches of unequal size through a bilinear factorization that keeps each network's identity explicit while keeping the number of parameters linear rather than quadratic. If the approach is correct, the learned representations should support more reliable prediction of behavior and psychopathology when a model trained on one developmental cohort is applied to another. The evaluations on ABCD, PNC, and CCNP data, together with targeted ablations, are presented as evidence that both the network-grounded partitioning and the bilinear embedding step are necessary for the observed gains in stability and transfer.

Core claim

By redefining FC tokenization as patches of intra- and inter-network connectivity blocks drawn from anatomically grounded network pairs and embedding those heterogeneous patches with a structured bilinear factorization that preserves network identity, NERVE produces self-supervised representations that are more stable and transferable across cohorts than those obtained from structurally agnostic MAE variants or graph-based baselines.

What carries the argument

Structured bilinear factorization that embeds variable-sized network-pair FC patches while preserving each network's distinct functional role and achieving linear parameter scaling with the number of networks.
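
To make the mechanism concrete, here is a minimal NumPy sketch of one way a bilinear tokenizer over network-pair patches could look. It is not taken from the paper or from the NERVE repository. The toy network sizes, the embedding width `d`, the per-network weight matrices, and the function names `partition_into_patches`, `init_network_weights`, and `bilinear_tokens` are all illustrative assumptions. The sketch assumes each network `a` with `n_a` regions owns one weight matrix of shape `(d, n_a)`, and that the token for patch `(a, b)` is obtained by applying the weights of `a` on the rows and the weights of `b` on the columns.

```python
import numpy as np

def partition_into_patches(fc, labels):
    """Split a square FC matrix into blocks keyed by (network_a, network_b)."""
    networks = [int(a) for a in np.unique(labels)]
    idx = {a: np.flatnonzero(labels == a) for a in networks}
    return {(a, b): fc[np.ix_(idx[a], idx[b])] for a in networks for b in networks}

def init_network_weights(labels, d, rng):
    """Hypothetical parameterization: one (d, n_a) weight matrix per network."""
    weights = {}
    for a in np.unique(labels):
        n_a = int(np.sum(labels == a))
        weights[int(a)] = rng.standard_normal((d, n_a)) / np.sqrt(n_a)
    return weights

def bilinear_tokens(patches, weights):
    """token_ab[k] = w_a[k] @ X_ab @ w_b[k], one scalar per embedding dimension k."""
    return {
        (a, b): np.einsum("ki,ij,kj->k", weights[a], x, weights[b])
        for (a, b), x in patches.items()
    }

# Toy data: 9 regions in 3 networks of unequal size (assumed, not a real atlas).
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
timeseries = rng.standard_normal((200, labels.size))
fc = np.corrcoef(timeseries, rowvar=False)  # (9, 9) correlation matrix

patches = partition_into_patches(fc, labels)
weights = init_network_weights(labels, d=8, rng=rng)
tokens = bilinear_tokens(patches, weights)

print(patches[(0, 1)].shape)  # patch sizes differ across network pairs
print(tokens[(0, 1)].shape)   # every token has the same width d
```

Under these assumptions, patches of different shapes all map to tokens of the same width, and the weights used for a patch depend only on which two networks define it. The paper's actual factorization may differ in rank, normalization, or how row and column weights are shared.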

If this is right

  • Representations trained with network-aware bilinear tokenization generalize more reliably to unseen cohorts for predicting individual differences in behavior and psychopathology.
  • The bilinear formulation reduces parameter count from quadratic to linear in the number of networks while retaining network-specific identity.
  • Ablations establish that both the anatomically grounded parcellation and the bilinear embedding are required for the reported stability and transfer gains.
  • Incorporating domain-specific structural priors into self-supervised pipelines for functional connectomics yields measurable improvements over treating connectivity matrices as homogeneous or purely graph-structured data.
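
The quadratic-versus-linear scaling in the second point can be illustrated with a small counting exercise. The two parameterizations below are assumptions made for illustration, not figures from the paper: the baseline gives every ordered network pair its own linear map from a flattened patch to a `d`-dimensional token, while the bilinear variant stores one `(d, n_a)` weight matrix per network. Network sizes are set to a hypothetical 10 regions each.

```python
def per_pair_params(sizes, d):
    """Assumed baseline: an independent (d, n_a * n_b) linear map per ordered pair."""
    return sum(d * n_a * n_b for n_a in sizes for n_b in sizes)

def shared_network_params(sizes, d):
    """Assumed bilinear variant: one (d, n_a) weight matrix per network."""
    return sum(d * n_a for n_a in sizes)

d = 8
for k in (7, 14):
    sizes = [10] * k  # k networks with 10 regions each (hypothetical)
    print(k, per_pair_params(sizes, d), shared_network_params(sizes, d))
```

With equal-sized networks, doubling the number of networks from 7 to 14 multiplies the per-pair count by four (39200 to 156800) and the per-network count by two (560 to 1120).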

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar network-boundary priors could be tested on dynamic or task-based connectivity to see whether the same modular alignment improves representation quality in those settings.
  • The linear scaling property would allow the method to be applied to finer-grained atlases containing more networks without a prohibitive increase in model size.
  • If the gains stem from explicit preservation of network identity, the approach might also serve as a way to inject known network-level priors into supervised or contrastive learning frameworks for neuroimaging.

Load-bearing premise

That anatomically defined network pairs create patches that match the brain's intrinsic modular organization and that this match is required to learn representations that transfer across cohorts.

What would settle it

If, in head-to-head cross-cohort tests, the network-aware bilinear model shows no improvement or worse performance than a standard fixed-size patch MAE on the same behavior and psychopathology prediction tasks, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.14048 by Bahram Jafrasteh, Leo Milecki, Mert R. Sabuncu, Qingyu Hu, Qingyu Zhao.

Figure 1: Overview of NERVE. A. The functional connectivity (FC) matrix is partitioned into patches defined by pairs of functional brain networks. B. Network-aware Bilinear Tokenization. Each functional network is assigned learnable network-specific weights at initialization, and patch tokens are computed through structured bilinear interactions between network weights during forward. C. MAE Framework. We apply a s…
Original abstract

Masked autoencoders (MAEs) have recently shown promise for self-supervised representation learning of resting-state brain functional connectivity (FC). However, a fundamental question remains unresolved: how should FC matrices be tokenized to align with the intrinsic modular organization of large-scale brain networks? Existing approaches typically adopt region-centric or graph-based schemes that treat FC as structurally homogeneous elements and overlook the large-scale network brain organization. We introduce NERVE (Network-Aware Representations of Brain Functional Connectivity via Bilinear Tokenization), a self-supervised learning framework that redefines FC tokenization by partitioning FC matrices into patches of intra- and inter-network connectivity blocks. Unlike image-based MAE, where fixed-size patches share a common tokenizer, FC patches defined by network pairs are heterogeneous in size and correspond to distinct functional roles. To resolve this problem, NERVE embeds FC patches through a novel structured bilinear factorization. This formulation preserves network identity and reduces parameter complexity from quadratic to linear scaling in the number of networks. We evaluate NERVE across three large-scale developmental cohorts (ABCD, PNC, and CCNP) for behavior and psychopathology prediction. Compared to structurally agnostic MAE variants and graph-based self-supervised baselines, the proposed network-aware formulation yields more stable and transferable representations, particularly in cross-cohort evaluation. Ablation studies confirm that the proposed bilinear network embedding and anatomically grounded parcellation are critical for performance. These findings highlight the importance of incorporating domain-specific structural priors into self-supervised learning for functional connectomics. Code is available at: https://github.com/leomlck/NERVE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces NERVE, a masked autoencoder framework for self-supervised representation learning on resting-state functional connectivity (FC) matrices. It partitions FC into intra- and inter-network patches using an anatomically grounded parcellation and embeds these heterogeneous patches via a structured bilinear factorization that preserves network identity while scaling linearly with the number of networks. Evaluations across ABCD, PNC, and CCNP cohorts for behavior and psychopathology prediction tasks claim superior stability and cross-cohort transferability relative to structurally agnostic MAE variants and graph-based self-supervised baselines, with ablations attributing gains to the bilinear embedding and anatomical parcellation.

Significance. If the central claims hold, the work would advance self-supervised learning for functional connectomics by demonstrating how domain-specific priors on large-scale brain network organization can be incorporated into tokenization schemes. The bilinear formulation addresses a practical scaling issue for heterogeneous FC patches, and the multi-cohort cross-evaluation design provides a meaningful test of transferability. Code availability supports reproducibility.

major comments (2)
  1. [Ablation studies] Ablation studies: The results show that anatomically grounded parcellation improves performance over alternatives, but the experiments do not include a control using random or non-anatomical structured partitioning of comparable heterogeneity. This leaves open whether gains derive from alignment with intrinsic functional modularity or simply from introducing any bilinear low-rank structure into the MAE training.
  2. [Cross-cohort evaluation] Cross-cohort evaluation and embedding analysis: The central claim that network-aware tokenization yields more transferable representations would be strengthened by direct verification that learned embeddings respect modular boundaries (e.g., quantitative intra- versus inter-network similarity metrics on the resulting tokens). Without this, the motivation for the anatomical prior remains partially unverified even if performance improves.
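
One way the control requested in the first major comment could be constructed is to shuffle the region-to-network assignment while keeping every network's size, so the resulting patches have the same shape distribution as the anatomical ones but no anatomical meaning. The sketch below is an assumption about how such a control might be built, not the procedure used in the paper. The function name `size_matched_random_labels` and the toy labels are hypothetical.

```python
import numpy as np

def size_matched_random_labels(labels, rng):
    """Randomly reassign regions to networks while keeping every network's size."""
    return rng.permutation(np.asarray(labels))

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])  # toy anatomical assignment
random_labels = size_matched_random_labels(labels, rng)

# Network sizes are unchanged, so patch shapes match the anatomical partition.
print(np.bincount(labels), np.bincount(random_labels))
```

Because only the assignment changes, any performance gap between the two partitions under identical training could then be attributed to anatomical alignment rather than to patch-size heterogeneity.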
minor comments (2)
  1. [Abstract] The abstract would benefit from reporting at least one key quantitative metric (e.g., improvement in R² or AUC with confidence interval) rather than purely qualitative statements about stability and transferability.
  2. [Methods] Notation in the bilinear factorization section could be expanded with an explicit equation showing how network-pair identity is preserved across heterogeneous patch sizes.
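
As an illustration of the kind of explicit equation the second minor comment asks for, one plausible form is given below. It is a sketch under stated assumptions, not the paper's notation: $X_{ab}$ denotes the FC patch between networks $a$ and $b$, $n_a$ and $n_b$ are their region counts, $d$ is the token width, and $W_a$, $W_b$ are hypothetical per-network weight matrices.

```latex
\[
X_{ab} \in \mathbb{R}^{n_a \times n_b}, \qquad
W_a \in \mathbb{R}^{d \times n_a}, \qquad
W_b \in \mathbb{R}^{d \times n_b},
\]
\[
z_{ab}[k] \;=\; \sum_{i=1}^{n_a} \sum_{j=1}^{n_b} W_a[k,i]\, X_{ab}[i,j]\, W_b[k,j],
\qquad k = 1, \dots, d .
\]
```

In this form the token $z_{ab} \in \mathbb{R}^{d}$ has the same width for every patch regardless of $n_a$ and $n_b$, and network identity enters only through which weight matrices are applied to the rows and columns.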

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional experiments and analyses that directly respond to the concerns raised.

  1. Referee: Ablation studies: The results show that anatomically grounded parcellation improves performance over alternatives, but the experiments do not include a control using random or non-anatomical structured partitioning of comparable heterogeneity. This leaves open whether gains derive from alignment with intrinsic functional modularity or simply from introducing any bilinear low-rank structure into the MAE training.

    Authors: We agree that a random partitioning control is needed to isolate the contribution of anatomical alignment from the bilinear structure itself. In the revised manuscript we have added this ablation: FC matrices were randomly partitioned into patches of comparable size heterogeneity and the model was retrained under identical conditions. Results show that the bilinear factorization alone yields moderate gains, but the anatomically grounded parcellation produces statistically significant further improvements in cross-cohort transfer performance. These new results are reported in the updated ablation section with appropriate statistical tests. revision: yes

  2. Referee: Cross-cohort evaluation and embedding analysis: The central claim that network-aware tokenization yields more transferable representations would be strengthened by direct verification that learned embeddings respect modular boundaries (e.g., quantitative intra- versus inter-network similarity metrics on the resulting tokens). Without this, the motivation for the anatomical prior remains partially unverified even if performance improves.

    Authors: We thank the referee for this suggestion. In the revised manuscript we now include a quantitative embedding analysis that computes average cosine similarity among tokens belonging to the same network versus tokens from different networks. The analysis demonstrates that NERVE embeddings exhibit markedly higher intra-network similarity than inter-network similarity and than the corresponding metrics from the structurally agnostic baselines. This verification is presented in a new subsection on embedding properties and supports the role of the anatomical prior in producing modular representations. revision: yes
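
The embedding analysis described in the second response can be sketched as a comparison of mean cosine similarity within and between groups of tokens. The code below is a minimal illustration under assumptions, not the authors' analysis: the function name `within_between_cosine`, the grouping of tokens by a single network label, and the toy vectors are all hypothetical.

```python
import numpy as np

def within_between_cosine(tokens, groups):
    """Mean cosine similarity for same-group and different-group token pairs."""
    tokens = np.asarray(tokens, dtype=float)
    groups = np.asarray(groups)
    # Normalize rows to unit length (assumes no all-zero token).
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = unit @ unit.T
    same = groups[:, None] == groups[None, :]
    off_diag = ~np.eye(len(groups), dtype=bool)  # drop self-similarity
    within = sim[same & off_diag].mean()
    between = sim[~same].mean()
    return within, between

# Toy tokens: two groups pointing along orthogonal axes.
toy_tokens = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
toy_groups = [0, 0, 1, 1]
within, between = within_between_cosine(toy_tokens, toy_groups)
print(within, between)
```

A larger gap between the two averages for one model than for another would indicate that its tokens are more strongly organized by network, which is the property the referee asks to see verified.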

Circularity Check

0 steps flagged

No significant circularity; method is a design choice evaluated empirically


The paper proposes NERVE as a new tokenization scheme using anatomically defined network-pair patches and a structured bilinear factorization to handle heterogeneous patch sizes while preserving network identity. This is presented as an architectural prior motivated by domain knowledge of brain networks, not as a derived prediction or first-principles result. Claims of improved stability and transferability rest on direct empirical comparisons against baselines and ablations across held-out cohorts (ABCD, PNC, CCNP), without any reduction of outputs to fitted parameters from the same data or load-bearing self-citations. The bilinear formulation is introduced to address a practical scaling issue and is validated by performance, not by construction equivalence to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that brain networks form a meaningful modular structure that should guide tokenization, plus the new bilinear embedding technique; no explicit free parameters beyond standard model training are described.

axioms (1)
  • domain assumption: Large-scale brain networks exhibit distinct functional roles that should be preserved when tokenizing functional connectivity matrices.
    This premise motivates the network-pair partitioning and is invoked to justify why standard homogeneous tokenization is insufficient.
invented entities (1)
  • Structured bilinear factorization for network-pair FC patches (no independent evidence)
    purpose: Embed heterogeneous-sized intra- and inter-network blocks while preserving network identity and achieving linear parameter scaling.
    New mechanism introduced to address the variable patch sizes that arise from network-aware partitioning.

pith-pipeline@v0.9.0 · 5825 in / 1351 out tokens · 72168 ms · 2026-05-20T20:42:11.998120+00:00 · methodology


