Deep Transfer Learning For Whole-Brain fMRI Analyses
Pith reviewed 2026-05-25 11:03 UTC · model grok-4.3
The pith
A deep learning model pre-trained on a large public fMRI dataset decodes cognitive states from new tasks after fine-tuning on data from only three subjects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A deep learning model pre-trained on a large, openly available fMRI dataset can be fine-tuned on a new, unrelated task and correctly decode 67.51 percent of the cognitive states in a test dataset of 100 individuals, even when the fine-tuning set is only as large as the data of three subjects. It outperforms a model with the same architecture that is trained from scratch on the new task data.
What carries the argument
Transfer learning via a pre-trained deep neural network on a large public fMRI dataset followed by fine-tuning on the target task.
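The hand-over from pre-training to fine-tuning can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the parameter layout (a dict with `features` and `head` entries), the function name `init_for_target_task`, and the Glorot-style range used for the new output layer are all assumptions made for the example. The sketch copies the pre-trained feature weights unchanged and replaces only the output layer, so that it matches the number of cognitive states in the target task.

```python
import copy
import random


def init_for_target_task(pretrained, n_target_classes, seed=0):
    """Build starting parameters for fine-tuning (hypothetical layout).

    `pretrained` is a dict with:
      "features": weight matrix (list of rows) learned on the source dataset
      "head":     output-layer matrix, one row per source class
    The feature weights are copied unchanged; the head is re-initialised
    because the target task has its own set of cognitive states.
    """
    rng = random.Random(seed)
    n_hidden = len(pretrained["head"][0])
    # Glorot-style uniform range; an illustrative choice, not taken from the source.
    limit = (6.0 / (n_hidden + n_target_classes)) ** 0.5
    new_head = [
        [rng.uniform(-limit, limit) for _ in range(n_hidden)]
        for _ in range(n_target_classes)
    ]
    return {
        "features": copy.deepcopy(pretrained["features"]),
        "head": new_head,
    }


# Toy source model: 3 input features -> 2 hidden units -> 4 source classes.
source = {
    "features": [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]],
    "head": [[0.2, 0.1], [-0.3, 0.4], [0.0, 0.0], [0.1, -0.1]],
}
target = init_for_target_task(source, n_target_classes=3)
```

A from-scratch baseline would use the same function shape but draw `features` randomly as well, which keeps initialisation as the only difference between the two variants.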
If this is right
- Deep learning becomes feasible for cognitive-state decoding in settings where only small amounts of new fMRI data can be collected.
- Openly available large datasets can improve performance on unrelated tasks without requiring matched experimental designs.
- The same pre-training plus fine-tuning pipeline can be applied to other small-sample fMRI decoding problems.
- Clinical applications that rely on patient-specific data become more accessible because the required training set size drops dramatically.
Where Pith is reading between the lines
- The approach may generalize to other neuroimaging modalities if comparable large public datasets exist for pre-training.
- Performance gains could be tested by varying the degree of mismatch between pre-training and target datasets to map transfer boundaries.
- If the learned features capture general brain patterns, the method could support cross-task or cross-population prediction with minimal additional data.
- Downstream tasks such as individualized diagnosis or treatment monitoring might become practical with fewer scans per patient.
Load-bearing premise
Features learned from a large public fMRI dataset are sufficiently relevant and transferable to a new unrelated fMRI task despite differences in experimental design, scanner, and participant population.
What would settle it
A model trained from scratch on the new task with the same three-subject fine-tuning set achieves equal or higher accuracy than the pre-trained model on the 100-individual test set.
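The criterion above only means something if the two runs differ in initialisation and nothing else. The following sketch shows one way to make that explicit. The `RunConfig` fields, the learning-rate value, and the helper names are hypothetical; only the 3-subject and 100-subject sizes come from the claim itself.

```python
from dataclasses import dataclass, replace, asdict


@dataclass(frozen=True)
class RunConfig:
    """Settings for one training run (all field names are hypothetical)."""
    init: str                 # "pretrained" or "scratch"
    n_finetune_subjects: int
    n_test_subjects: int
    learning_rate: float      # placeholder value, not reported in the source
    seed: int


def differing_fields(a, b):
    """Return the names of fields whose values differ between two configs."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])


def transfer_claim_refuted(acc_pretrained, acc_scratch):
    """The criterion above: scratch matches or beats the pre-trained model."""
    return acc_scratch >= acc_pretrained


pretrained_run = RunConfig("pretrained", 3, 100, 1e-3, seed=0)
scratch_run = replace(pretrained_run, init="scratch")

# The comparison is only informative if initialisation is the sole difference.
assert differing_fields(pretrained_run, scratch_run) == ["init"]
```

Deriving the baseline config with `replace` rather than writing it out by hand makes it harder for a second setting to drift between the two runs unnoticed.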
Original abstract
The application of deep learning (DL) models to the decoding of cognitive states from whole-brain functional Magnetic Resonance Imaging (fMRI) data is often hindered by the small sample size and high dimensionality of these datasets, especially in clinical settings, where patient data are scarce. In this work, we demonstrate that transfer learning represents a solution to this problem. In particular, we show that a DL model previously trained on a large, openly available fMRI dataset of the Human Connectome Project outperforms a model variant with the same architecture that is trained from scratch, when both are applied to the data of a new, unrelated fMRI task. Furthermore, the pre-trained DL model variant already correctly decodes 67.51% of the cognitive states from a test dataset with 100 individuals when fine-tuned on a dataset of the size of only three subjects.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that transfer learning from a deep learning model pre-trained on the large Human Connectome Project (HCP) fMRI dataset enables effective decoding of cognitive states on a new, unrelated task. The pre-trained model, when fine-tuned on data from only three subjects, achieves 67.51% accuracy on a test set of 100 individuals and outperforms an identical architecture trained from scratch on the same small dataset.
Significance. If the empirical result holds after addressing verification gaps, the work would be significant for clinical fMRI applications where sample sizes are limited, as it shows how large public datasets can bootstrap models for new tasks. The direct performance comparison to a from-scratch baseline is a clear strength that allows assessment of the transfer benefit.
major comments (3)
- [Abstract] Abstract: The central claim of 67.51% accuracy after fine-tuning on only three subjects is load-bearing for the transfer-learning contribution, yet no details are provided on the cross-validation scheme, preprocessing steps, independence of the 100-subject test set, or statistical significance of the gain over the from-scratch baseline.
- [Abstract] Abstract / Results: No ablation or control experiment isolates the contribution of the HCP pre-trained weights from other factors such as architecture choice or regularization; this is required to substantiate that the reported gain arises from transfer rather than other modeling decisions.
- [Abstract] Abstract: The assertion that the target task is 'unrelated' to HCP is not accompanied by any quantitative measure of task similarity, scanner effects, or population shift, leaving the transferability assumption (the weakest link in the argument) without direct support.
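The independence question raised in the first major comment comes down to splitting by subject rather than by scan or trial. The sketch below illustrates such a split under stated assumptions: the `sub-001` style labels and the function name `split_subjects` are invented for the example, and the source does not describe how its subjects were actually assigned.

```python
import random


def split_subjects(subject_ids, n_finetune, n_test, seed=0):
    """Draw disjoint fine-tuning and test subject groups.

    Splitting by subject (not by scan or trial) keeps every test
    individual entirely unseen during fine-tuning.
    """
    ids = sorted(set(subject_ids))
    if n_finetune + n_test > len(ids):
        raise ValueError("not enough subjects for the requested split")
    rng = random.Random(seed)
    rng.shuffle(ids)
    finetune = ids[:n_finetune]
    test = ids[n_finetune:n_finetune + n_test]
    return finetune, test


# Hypothetical subject labels; sizes follow the 3 / 100 split in the claim.
subjects = [f"sub-{i:03d}" for i in range(1, 104)]
finetune_ids, test_ids = split_subjects(subjects, n_finetune=3, n_test=100)
assert set(finetune_ids).isdisjoint(test_ids)
```

Repeating the draw with different seeds would also show how much the reported accuracy depends on which three subjects happen to be chosen for fine-tuning.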
minor comments (1)
- The manuscript should report error bars, standard deviations, or confidence intervals on all accuracy figures and include a table comparing the two model variants across multiple runs or folds.
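What the requested table could look like is sketched below. The function names are hypothetical, and the interval is a normal approximation, which is rough for a handful of runs (a t-based interval would be wider). The accuracy values in `demo` are placeholders for illustration only and are not results from the paper.

```python
import statistics


def summarize_runs(accuracies, z=1.96):
    """Mean, sample standard deviation, and a normal-approximation 95% CI."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)      # sample SD (n - 1 denominator)
    half_width = z * sd / n ** 0.5
    return {"n": n, "mean": mean, "sd": sd,
            "ci_low": mean - half_width, "ci_high": mean + half_width}


def comparison_table(results):
    """Format one row per model variant; `results` maps name -> accuracies."""
    lines = [f"{'variant':<12}{'runs':>5}{'mean':>8}{'sd':>8}{'95% CI':>18}"]
    for name, accs in results.items():
        s = summarize_runs(accs)
        ci = f"[{s['ci_low']:.3f}, {s['ci_high']:.3f}]"
        lines.append(
            f"{name:<12}{s['n']:>5}{s['mean']:>8.3f}{s['sd']:>8.3f}{ci:>18}"
        )
    return "\n".join(lines)


# Placeholder numbers for illustration only; they are not results from the paper.
demo = {"pretrained": [0.60, 0.62, 0.64], "scratch": [0.40, 0.50, 0.60]}
print(comparison_table(demo))
```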
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our manuscript. We address each major comment below.
- Referee: [Abstract] Abstract: The central claim of 67.51% accuracy after fine-tuning on only three subjects is load-bearing for the transfer-learning contribution, yet no details are provided on the cross-validation scheme, preprocessing steps, independence of the 100-subject test set, or statistical significance of the gain over the from-scratch baseline.
  Authors: We agree that the abstract would benefit from additional methodological details to support the central claim. In the revised version, we have updated the abstract to briefly describe the cross-validation scheme (subject-wise leave-one-out cross-validation), the standard preprocessing steps applied to the fMRI data, the independence of the 100-subject test set from the three subjects used for fine-tuning, and the statistical significance of the performance improvement (p-value from a paired statistical test). These details were already present in the methods and results sections but are now summarized in the abstract for completeness.
  revision: yes
- Referee: [Abstract] Abstract / Results: No ablation or control experiment isolates the contribution of the HCP pre-trained weights from other factors such as architecture choice or regularization; this is required to substantiate that the reported gain arises from transfer rather than other modeling decisions.
  Authors: We respectfully disagree that additional ablations are necessary to isolate the contribution of the pre-trained weights. The manuscript already includes a direct comparison to an identical model architecture trained from scratch on the same small dataset, with all other factors (including regularization, optimizer, and hyperparameters) held constant. This standard control in the transfer learning literature directly attributes any performance gain to the use of pre-trained weights rather than other modeling decisions. We have added a clarifying sentence in the results section to emphasize this point.
  revision: partial
- Referee: [Abstract] Abstract: The assertion that the target task is 'unrelated' to HCP is not accompanied by any quantitative measure of task similarity, scanner effects, or population shift, leaving the transferability assumption (the weakest link in the argument) without direct support.
  Authors: We acknowledge the value of a quantitative measure of task dissimilarity. However, the target task originates from a completely different study with distinct cognitive demands and experimental design compared to the HCP tasks, and was acquired on different scanners with a different subject population. We have expanded the discussion section to elaborate on these qualitative differences and why they support the transfer learning scenario. A formal quantitative similarity metric would require additional analysis beyond the scope of the current work but could be explored in future studies.
  revision: partial
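The first response mentions a p-value from a paired statistical test without naming the test. As one possible reading, the sketch below implements an exact paired sign-flip permutation test on matched accuracies, for example one pair per fold or per run. The choice of test, the one-sided alternative, and the function name are assumptions made for illustration, not the authors' stated procedure.

```python
from itertools import product


def paired_sign_flip_test(acc_a, acc_b):
    """One-sided exact sign-flip test for mean(acc_a - acc_b) > 0.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so every sign pattern is equally likely. The p-value is
    the share of patterns whose summed difference is at least as large
    as the observed one. All 2**n patterns are enumerated, so keep the
    number of pairs small (roughly 20 or fewer).
    """
    if len(acc_a) != len(acc_b):
        raise ValueError("inputs must be paired and of equal length")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    observed = sum(diffs)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = sum(s * d for s, d in zip(signs, diffs))
        if permuted >= observed - 1e-12:   # tolerance for float rounding
            count += 1
    return count / total
```

With n pairs the smallest attainable p-value is 1 / 2**n, so a handful of folds cannot reach conventional significance thresholds however large the gain is. That is one reason the number of runs or folds belongs next to any reported p-value.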
Circularity Check
No circularity: purely empirical model comparison on held-out data.
The paper reports measured test accuracies (e.g., 67.51 % after fine-tuning on three subjects) obtained by training and evaluating neural networks on separate fMRI datasets. No equations, uniqueness theorems, or self-citations are invoked to derive these numbers from fitted parameters inside the paper; the results are direct empirical outcomes on external test subjects. The transferability claim is an empirical observation, not a derivation that reduces to its own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- DL model hyperparameters
Reference graph
Works this paper leans on
- [1] Barch, D.M., Burgess, G.C., Harms, M.P., et al.: Function in the human connectome: task-fMRI and individual differences in behavior. Neuroimage 80, 169–189 (2013)
- [2] Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., et al.: The minimal preprocessing pipelines for the human connectome project. Neuroimage 80, 105–124 (2013)
- [3] Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: AISTATS. pp. 249–256 (2010)
- [4] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
- [5] Huang, H., Hu, X., Zhao, Y., et al.: Modeling task fMRI data via deep convolutional autoencoder. IEEE transactions on medical imaging 37(7), 1551–1561 (2017)
- [6] Jang, H., Plis, S.M., Calhoun, V.D., Lee, J.H.: Task-specific feature extraction and classification of fMRI volumes using a deep neural network initialized with a deep belief network: Evaluation using sensorimotor tasks. NeuroImage 145, 314–328 (2017)
- [7] Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., Müller, K.R.: Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications 10, 1096 (2019)
- [8] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
- [9] LeCun, Y., Bengio, Y., et al.: Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10), 1995 (1995)
- [10] Lemm, S., Blankertz, B., Dickhaus, T., Müller, K.R.: Introduction to machine learning for brain imaging. Neuroimage 56(2), 387–399 (2011)
- [11] Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1717–1724 (2014)
- [12] Plis, S.M., Hjelm, D.R., Salakhutdinov, R., et al.: Deep learning for neuroimaging: a validation study. Frontiers in neuroscience 8, 229 (2014)
- [13] Thomas, A.W., Heekeren, H.R., Müller, K.R., Samek, W.: Analyzing neuroimaging data through recurrent deep learning models. arXiv preprint arXiv:1810.09945 (2018)
- [14] Uğurbil, K., Xu, J., Auerbach, E.J., et al.: Pushing spatial and temporal resolution for functional and diffusion MRI in the human connectome project. Neuroimage 80, 80–104 (2013)