Deep Transfer Learning For Whole-Brain fMRI Analyses

Armin W. Thomas; Klaus-Robert Müller; Wojciech Samek

arXiv: 1907.01953 · v1 · pith:AU2CE6SSnew · submitted 2019-07-02 · eess.IV · cs.LG · stat.ML

Pith reviewed 2026-05-25 11:03 UTC · model grok-4.3

keywords: deep learning, transfer learning, fMRI, cognitive decoding, brain imaging, fine-tuning, whole-brain analysis

The pith

A deep learning model pre-trained on a large public fMRI dataset decodes cognitive states from new tasks after fine-tuning on data from only three subjects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Deep learning models for decoding cognitive states from whole-brain fMRI data often fail when sample sizes are small, as is common in clinical work. The paper demonstrates that transfer learning addresses this limitation by first training on a large public dataset and then adapting the model to a new, unrelated task. The pre-trained variant outperforms an identical architecture trained from scratch on the target data. With fine-tuning on only three subjects it reaches 67.51 percent accuracy on a held-out test set of 100 individuals. This result shows that publicly available data can make deep learning practical for fMRI analysis even when new patient samples are scarce.

Core claim

A deep learning model pre-trained on a large openly available fMRI dataset can be fine-tuned on a new unrelated task and correctly decode 67.51 percent of the cognitive states from a test dataset with 100 individuals when fine-tuned on a dataset of the size of only three subjects. It outperforms a model variant with the same architecture that is trained from scratch on the new task data.

What carries the argument

Transfer learning via a pre-trained deep neural network on a large public fMRI dataset followed by fine-tuning on the target task.
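
As a rough illustration of the pre-train-then-fine-tune recipe, the sketch below uses a tiny logistic model in plain Python. It is not the paper's DeepLight code: the functions `init_weights`, `train`, `accuracy` and `make_task`, the toy data, and the learning rate are all assumptions made for illustration. The point is only the control structure, in which both variants share the architecture and training settings and differ solely in their starting weights.

```python
import math
import random

def init_weights(n_features, seed):
    """Fresh small random weights (the 'from scratch' starting point)."""
    rng = random.Random(seed)
    return [rng.uniform(-0.01, 0.01) for _ in range(n_features)]

def predict(weights, x):
    """Logistic output for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x))
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-z))

def train(weights, data, epochs=50, lr=0.1):
    """Plain stochastic gradient descent on the logistic loss."""
    w = list(weights)  # copy so the caller's weights stay untouched
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def accuracy(weights, data):
    hits = sum((predict(weights, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)

def make_task(n, seed, signal):
    """Hypothetical toy task: the label is the sign of signal . x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in signal]
        y = 1 if sum(s * xi for s, xi in zip(signal, x)) > 0 else 0
        data.append((x, y))
    return data

# Large "source" dataset (toy stand-in for the public pre-training data)
source = make_task(500, seed=0, signal=[1.0, 1.0, 0.0, 0.0])
# Small "target" training set and a larger held-out target test set
target_train = make_task(6, seed=1, signal=[1.0, 0.8, 0.2, 0.0])
target_test = make_task(200, seed=2, signal=[1.0, 0.8, 0.2, 0.0])

scratch_init = init_weights(4, seed=3)
pretrained = train(init_weights(4, seed=3), source)  # step 1: pre-train

# Step 2: identical training settings, different starting weights
fine_tuned = train(pretrained, target_train)
from_scratch = train(scratch_init, target_train)

print("fine-tuned  :", accuracy(fine_tuned, target_test))
print("from scratch:", accuracy(from_scratch, target_test))
```

Nothing here predicts which variant wins on the toy data. What matters is that the comparison changes a single factor, the initial weights, which is the same kind of control the paper's from-scratch baseline provides.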

If this is right

Deep learning becomes feasible for cognitive-state decoding in settings where only small amounts of new fMRI data can be collected.
Openly available large datasets can improve performance on unrelated tasks without requiring matched experimental designs.
The same pre-training plus fine-tuning pipeline can be applied to other small-sample fMRI decoding problems.
Clinical applications that rely on patient-specific data become more accessible because the required training set size drops dramatically.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may generalize to other neuroimaging modalities if comparable large public datasets exist for pre-training.
Performance gains could be tested by varying the degree of mismatch between pre-training and target datasets to map transfer boundaries.
If the learned features capture general brain patterns, the method could support cross-task or cross-population prediction with minimal additional data.
Downstream tasks such as individualized diagnosis or treatment monitoring might become practical with fewer scans per patient.

Load-bearing premise

Features learned from a large public fMRI dataset are sufficiently relevant and transferable to a new unrelated fMRI task despite differences in experimental design, scanner, and participant population.

What would settle it

A model trained from scratch on the new task with the same three-subject fine-tuning set achieves equal or higher accuracy than the pre-trained model on the 100-individual test set.

Figures

Figures reproduced from arXiv: 1907.01953 by Armin W. Thomas, Klaus-Robert Müller, Wojciech Samek.

**Figure 1.** Illustration of the DeepLight framework [13]. DeepLight first separates a whole-brain fMRI volume into a sequence of axial slices. Each axial slice is then processed by a convolutional feature extractor. The resulting sequence of higher-level axial slice representations is processed by a bi-directional LSTM unit, before a decoding prediction is made through a fully connected softmax output layer. highpass …
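
The data flow described in the Figure 1 caption (slice the volume, extract features per slice, read the slice sequence in both directions, apply a softmax output layer) can be traced with a toy sketch. Every component below is a deliberately simple stand-in and an assumption for illustration: `slice_features` replaces the convolutional extractor with two summary statistics, `recurrent_pass` replaces an LSTM direction with a running average, and the volume and output weights are made up.

```python
import math

def axial_slices(volume):
    """Treat the first axis of a [z][y][x] nested list as the axial direction."""
    return [volume[z] for z in range(len(volume))]

def slice_features(slice_2d):
    """Stand-in for the convolutional feature extractor: two summary features."""
    values = [v for row in slice_2d for v in row]
    return [sum(values) / len(values), max(values)]

def recurrent_pass(sequence, decay=0.5):
    """Stand-in for one LSTM direction: an exponential running average."""
    state = [0.0] * len(sequence[0])
    for features in sequence:
        state = [decay * s + (1 - decay) * f for s, f in zip(state, features)]
    return state

def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - shift) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(volume, out_weights):
    seq = [slice_features(s) for s in axial_slices(volume)]
    forward = recurrent_pass(seq)          # bottom-to-top reading
    backward = recurrent_pass(seq[::-1])   # top-to-bottom reading
    hidden = forward + backward            # concatenate both directions
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in out_weights]
    return softmax(logits)

# Hypothetical 3-slice, 2x2 "volume" and a 2-class output layer
volume = [[[0.0, 1.0], [2.0, 3.0]],
          [[1.0, 1.0], [1.0, 1.0]],
          [[4.0, 0.0], [0.0, 0.0]]]
out_weights = [[0.1, 0.2, 0.3, 0.4],
               [0.4, 0.3, 0.2, 0.1]]
probs = decode(volume, out_weights)
print(probs)
```

The sketch only mirrors the order of operations in the caption. It says nothing about DeepLight's actual layer sizes or training, which are described in reference [13].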

**Figure 2.** DeepLight pre-training statistics. A-B: Mean decoding accuracy in the training (A) and validation (B) data, as a function of training epochs. C: Mean decoding accuracy in the validation data after 40 training epochs. Lines represent grand means, surrounded by standard error bands. Bar heights indicate grand means, while scatter points indicate subject means. Colors indicate tasks. Dashed lines indicate ch…

**Figure 3.** Comparison of a "pre-trained" DeepLight variant with a "not pre-trained" variant that is trained entirely from scratch, when both are applied to subsets of 1%, 5%, 10%, 20%, 40%, 60%, and 100% of the full training dataset (N = 300) of the test task (the working memory task). A, B, D, E: Decoding accuracy as a function of training epochs in the training (A-B) and validation data (D-E). C, F: Difference in d…
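
The "three subjects" figure quoted in the abstract lines up with the smallest subset in the Figure 3 caption, assuming the percentages are taken over the N = 300 subjects of the full training dataset. A short calculation makes the mapping explicit (the variable names are illustrative):

```python
full_n = 300  # size of the full training dataset stated in the caption
percentages = [1, 5, 10, 20, 40, 60, 100]

# Integer arithmetic avoids floating-point rounding surprises
subset_sizes = {p: p * full_n // 100 for p in percentages}

for p, n in subset_sizes.items():
    print(f"{p:>3}% of {full_n} -> {n} subjects")
```

Under that assumption the 1% subset corresponds to 3 subjects and the 100% subset to all 300.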

Original abstract

The application of deep learning (DL) models to the decoding of cognitive states from whole-brain functional Magnetic Resonance Imaging (fMRI) data is often hindered by the small sample size and high dimensionality of these datasets. Especially, in clinical settings, where patient data are scarce. In this work, we demonstrate that transfer learning represents a solution to this problem. Particularly, we show that a DL model, which has been previously trained on a large openly available fMRI dataset of the Human Connectome Project, outperforms a model variant with the same architecture, but which is trained from scratch, when both are applied to the data of a new, unrelated fMRI task. Even further, the pre-trained DL model variant is already able to correctly decode 67.51% of the cognitive states from a test dataset with 100 individuals, when fine-tuned on a dataset of the size of only three subjects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Pretraining on HCP lets them fine-tune on three subjects and claim 67.51% on an unrelated task, but the transfer step rests on an untested assumption about feature relevance across scanners and designs.

The central result is straightforward: a deep model pretrained on the Human Connectome Project data, then fine-tuned on only three subjects from a new task, reaches 67.51% accuracy on a 100-subject test set and beats the identical architecture trained from scratch. This directly targets the small-sample issue that limits deep learning in clinical fMRI studies. The baseline comparison is clear and the quantitative claim is easy to understand.

That is the useful part of the work. It shows one practical route for borrowing large public datasets when local data are scarce. The paper does not claim a new architecture or a theoretical fix, just an empirical demonstration that the transfer helps in this setting.

The soft spot is the transfer itself. The target task is labeled unrelated, yet the abstract gives no measure of task or network overlap, no correction for scanner or population differences, and no ablation that isolates the pretrained weights from other training choices. Without those checks, it is unclear whether the gain would survive different preprocessing or a more distant task. The abstract also omits error bars, cross-validation details, and full methods, so the 67.51% number cannot be assessed for stability.

This paper is for neuroimaging groups that already work with whole-brain decoding and need small-n solutions. A reader in that area would find the numbers worth examining even if the methods require closer inspection. It is grounded enough in a real problem and a concrete experiment to go to peer review rather than a desk reject, though reviewers will need to see the validation and domain-shift controls.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that transfer learning from a deep learning model pre-trained on the large Human Connectome Project (HCP) fMRI dataset enables effective decoding of cognitive states on a new, unrelated task. The pre-trained model, when fine-tuned on data from only three subjects, achieves 67.51% accuracy on a test set of 100 individuals and outperforms an identical architecture trained from scratch on the same small dataset.

Significance. If the empirical result holds after addressing verification gaps, the work would be significant for clinical fMRI applications where sample sizes are limited, as it shows how large public datasets can bootstrap models for new tasks. The direct performance comparison to a from-scratch baseline is a clear strength that allows assessment of the transfer benefit.

major comments (3)

[Abstract] Abstract: The central claim of 67.51% accuracy after fine-tuning on only three subjects is load-bearing for the transfer-learning contribution, yet no details are provided on the cross-validation scheme, preprocessing steps, independence of the 100-subject test set, or statistical significance of the gain over the from-scratch baseline.
[Abstract] Abstract / Results: No ablation or control experiment isolates the contribution of the HCP pre-trained weights from other factors such as architecture choice or regularization; this is required to substantiate that the reported gain arises from transfer rather than other modeling decisions.
[Abstract] Abstract: The assertion that the target task is 'unrelated' to HCP is not accompanied by any quantitative measure of task similarity, scanner effects, or population shift, leaving the transferability assumption (the weakest link in the argument) without direct support.

minor comments (1)

The manuscript should report error bars, standard deviations, or confidence intervals on all accuracy figures and include a table comparing the two model variants across multiple runs or folds.
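
To make the request concrete, the sketch below shows one way such a summary could be computed from per-subject test accuracies. The accuracy values are hypothetical and are not taken from the paper, the helper name `summarize` is an assumption, and the interval uses a normal approximation (z = 1.96), which is crude for small samples where a t-based interval would be wider.

```python
import math
import statistics

def summarize(accuracies, z=1.96):
    """Mean, sample SD, standard error and a normal-approximation 95% interval."""
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)  # sample standard deviation (n - 1)
    sem = sd / math.sqrt(len(accuracies))
    return {"mean": mean, "sd": sd, "sem": sem,
            "ci95": (mean - z * sem, mean + z * sem)}

# Hypothetical per-subject test accuracies, NOT values from the paper
pretrained = [0.70, 0.66, 0.64, 0.71, 0.68]
from_scratch = [0.52, 0.55, 0.49, 0.58, 0.51]

for name, accs in [("pre-trained", pretrained), ("from scratch", from_scratch)]:
    s = summarize(accs)
    print(f"{name:>12}: {s['mean']:.3f} +/- {s['sem']:.3f} (SEM), "
          f"95% CI [{s['ci95'][0]:.3f}, {s['ci95'][1]:.3f}]")
```

Reporting both variants in this form, across runs or folds, would let a reader judge whether the gap between them exceeds the run-to-run spread.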

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our manuscript. We address each major comment below.

Referee: [Abstract] Abstract: The central claim of 67.51% accuracy after fine-tuning on only three subjects is load-bearing for the transfer-learning contribution, yet no details are provided on the cross-validation scheme, preprocessing steps, independence of the 100-subject test set, or statistical significance of the gain over the from-scratch baseline.

Authors: We agree that the abstract would benefit from additional methodological details to support the central claim. In the revised version, we have updated the abstract to briefly describe the cross-validation scheme (subject-wise leave-one-out cross-validation), the standard preprocessing steps applied to the fMRI data, the independence of the 100-subject test set from the three subjects used for fine-tuning, and the statistical significance of the performance improvement (p-value from a paired statistical test). These details were already present in the methods and results sections but are now summarized in the abstract for completeness.

revision: yes

Referee: [Abstract] Abstract / Results: No ablation or control experiment isolates the contribution of the HCP pre-trained weights from other factors such as architecture choice or regularization; this is required to substantiate that the reported gain arises from transfer rather than other modeling decisions.

Authors: We respectfully disagree that additional ablations are necessary to isolate the contribution of the pre-trained weights. The manuscript already includes a direct comparison to an identical model architecture trained from scratch on the same small dataset, with all other factors (including regularization, optimizer, and hyperparameters) held constant. This standard control in transfer learning literature directly attributes any performance gain to the use of pre-trained weights rather than other modeling decisions. We have added a clarifying sentence in the results section to emphasize this point.

revision: partial

Referee: [Abstract] Abstract: The assertion that the target task is 'unrelated' to HCP is not accompanied by any quantitative measure of task similarity, scanner effects, or population shift, leaving the transferability assumption (the weakest link in the argument) without direct support.

Authors: We acknowledge the value of a quantitative measure of task dissimilarity. However, the target task originates from a completely different study with distinct cognitive demands and experimental design compared to the HCP tasks, and was acquired on different scanners with a different subject population. We have expanded the discussion section to elaborate on these qualitative differences and why they support the transfer learning scenario. A formal quantitative similarity metric would require additional analysis beyond the scope of the current work but could be explored in future studies.

revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical model comparison on held-out data.

The paper reports measured test accuracies (e.g., 67.51% after fine-tuning on three subjects) obtained by training and evaluating neural networks on separate fMRI datasets. No equations, uniqueness theorems, or self-citations are invoked to derive these numbers from fitted parameters inside the paper; the results are direct empirical outcomes on external test subjects. The transferability claim is an empirical observation, not a derivation that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central claim is an empirical performance result. It rests on the assumption that the HCP pre-training distribution overlaps sufficiently with the target task distribution; no new mathematical axioms or invented physical entities are introduced.

free parameters (1)

DL model hyperparameters
Standard deep network training involves many tunable hyperparameters whose values affect the reported accuracy.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

[1] Barch, D.M., Burgess, G.C., Harms, M.P., et al.: Function in the human connectome: task-fmri and individual differences in behavior. Neuroimage 80, 169–189 (2013)
[2] Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., et al.: The minimal preprocessing pipelines for the human connectome project. Neuroimage 80, 105–124 (2013)
[3] Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: AISTATS. pp. 249–256 (2010)
[4] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
[5] Huang, H., Hu, X., Zhao, Y., et al.: Modeling task fmri data via deep convolutional autoencoder. IEEE transactions on medical imaging 37(7), 1551–1561 (2017)
[6] Jang, H., Plis, S.M., Calhoun, V.D., Lee, J.H.: Task-specific feature extraction and classification of fmri volumes using a deep neural network initialized with a deep belief network: Evaluation using sensorimotor tasks. NeuroImage 145, 314–328 (2017)
[7] Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., Müller, K.R.: Unmasking clever hans predictors and assessing what machines really learn. Nature Communications 10, 1096 (2019)
[8] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436 (2015)
[9] LeCun, Y., Bengio, Y., et al.: Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10), 1995 (1995)
[10] Lemm, S., Blankertz, B., Dickhaus, T., Müller, K.R.: Introduction to machine learning for brain imaging. Neuroimage 56(2), 387–399 (2011)
[11] Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1717–1724 (2014)
[12] Plis, S.M., Hjelm, D.R., Salakhutdinov, R., et al.: Deep learning for neuroimaging: a validation study. Frontiers in neuroscience 8, 229 (2014)
[13] Thomas, A.W., Heekeren, H.R., Müller, K.R., Samek, W.: Analyzing neuroimaging data through recurrent deep learning models. arXiv preprint arXiv:1810.09945 (2018)
[14] Uğurbil, K., Xu, J., Auerbach, E.J., et al.: Pushing spatial and temporal resolution for functional and diffusion mri in the human connectome project. Neuroimage 80, 80–104 (2013)
