Recognition: no theorem link
Knowledge Transfer Scaling Laws for 3D Medical Imaging
Pith reviewed 2026-05-11 01:27 UTC · model grok-4.3
The pith
Optimizing data allocation using scaling laws for asymmetric knowledge transfer improves 3D medical imaging pretraining by up to 58 percent over proportional sampling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Different medical imaging domains scale at variable rates during pretraining, and knowledge transfer between domains is strongly asymmetric. Both MAE reconstruction loss and cross-domain transfer follow predictable power-law trends with domain-specific behaviors. Formulating data allocation as a scaling-law optimization problem reveals an interpretable hub-and-island structure, with highly transferable domains as hubs and isolated ones as islands. The derived transfer-aware allocations outperform data-proportional sampling by up to 58%, generalize well to unseen budgets with r=0.989, and provide stronger pretrained representations validated on disease classification and organ/lesion segment
What carries the argument
Scaling-law optimization of data allocation based on observed power-law trends in MAE loss and asymmetric cross-domain knowledge transfer.
If this is right
- Transfer-aware allocation outperforms data-proportional sampling by up to 58%.
- The allocations generalize well to unseen budgets with a correlation of r=0.989.
- Derived mixtures provide stronger pretrained representations for clinical 3D medical imaging tasks such as disease classification and organ/lesion segmentation.
- Highly transferable domains emerge as hubs that benefit many others and deserve strategic allocation.
- Isolated domains act as islands requiring direct investment.
Where Pith is reading between the lines
- The approach could extend to other multi-modal pretraining scenarios where domains have asymmetric transfer properties.
- If the power laws hold at larger scales, it may allow more efficient use of compute for building larger 3D foundation models.
- The hub-and-island structure might inform which modalities to prioritize when expanding datasets with new imaging types.
- Better pretrained representations could reduce the need for labeled data in downstream clinical applications.
Load-bearing premise
The power-law trends observed in MAE loss and cross-domain transfer on the studied datasets and model sizes will continue to hold for new data budgets, model scales, and unseen modality combinations.
What would settle it
Training a model using the transfer-aware allocation for a new unseen data budget and verifying that its downstream performance on classification or segmentation tasks significantly exceeds that of a data-proportional allocation would confirm the claim; failure to do so would falsify it.
Figures
read the original abstract
Vision foundation models are increasingly moving beyond 2D to volumetric domains such as 3D medical imaging, where unified pretraining across different imaging modalities (i.e. CT, MRI, and PET) could provide foundational models for diverse clinical tasks. However, training such models requires mixing heterogeneous imaging domains, and current mixture strategies remain largely heuristic. In this work, we observe that different medical imaging domains scale at variable rates during pretraining, and knowledge transfer between domains is strongly asymmetric: training on one domain can substantially improve another, but the reverse may be much weaker. Interestingly, both MAE reconstruction loss and cross-domain transfer follow predictable power-law trends with domain-specific behaviors. Motivated by these findings, we formulate data allocation as a scaling-law optimization problem. The derived allocations reveal an interpretable hub-and-island structure: highly transferable domains emerge as hubs that benefit many others and deserve strategic allocation, while isolated domains act as islands requiring direct investment. Empirically, transfer-aware allocation outperforms data-proportional sampling by up to 58% and generalizes well to unseen budgets with r=0.989. Downstream validation on disease classification and organ/lesion segmentation further confirms that the derived transfer-aware mixtures provide stronger pretrained representations for clinical 3D medical imaging tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that MAE reconstruction loss and asymmetric cross-domain knowledge transfer in 3D medical imaging (CT, MRI, PET) follow domain-specific power-law trends. These observations are used to cast data allocation as a scaling-law optimization problem whose solution yields interpretable 'hub-and-island' mixtures. The resulting transfer-aware allocations are reported to outperform data-proportional sampling by up to 58%, to generalize to unseen budgets (r=0.989), and to produce stronger representations on downstream disease classification and organ/lesion segmentation tasks.
Significance. If the fitted exponents and transfer coefficients remain valid outside the observed regime, the work supplies a principled, non-heuristic method for mixing heterogeneous 3D medical volumes that could improve data efficiency in multi-modal foundation-model pretraining. The reported generalization correlation and downstream-task gains constitute concrete, falsifiable support for the approach. The main limitation is that significance is conditional on the robustness of the post-experiment fitting and optimization steps.
major comments (2)
- [Abstract (empirical claims and optimization)] Abstract and empirical results: the 58% gain and r=0.989 generalization rest on power-law parameters fitted to the same pretraining runs that are later used to evaluate transfer. No residuals, R² values, or sensitivity to the number of fitting points are reported, leaving open whether the optimization truly identifies performance-maximizing allocations or merely reproduces artifacts of the chosen functional form.
- [Downstream task experiments] Downstream validation: gains on disease classification and organ/lesion segmentation are presented, yet the manuscript does not state whether total pretraining data volume or compute was held constant across transfer-aware and data-proportional conditions. Without this control, attribution of improvements to the allocation rule rather than dataset idiosyncrasies remains incomplete.
minor comments (1)
- [Methods (scaling-law formulation)] The hub-and-island interpretation is conceptually useful but would be clearer if accompanied by an explicit transfer-matrix equation or table showing the fitted asymmetric coefficients between each modality pair.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the presentation of our work on transfer-aware data allocation for 3D medical imaging. We address each major comment below and indicate revisions where the manuscript will be updated.
read point-by-point responses
-
Referee: Abstract and empirical results: the 58% gain and r=0.989 generalization rest on power-law parameters fitted to the same pretraining runs that are later used to evaluate transfer. No residuals, R² values, or sensitivity to the number of fitting points are reported, leaving open whether the optimization truly identifies performance-maximizing allocations or merely reproduces artifacts of the chosen functional form.
Authors: We agree that additional fit diagnostics would improve transparency. The power-law parameters are derived directly from the observed MAE and transfer curves across the pretraining runs, which is the standard approach for empirical scaling laws. In the revision we will report R² values, residual distributions, and sensitivity of the fitted exponents to the number of data points used for fitting. These additions will show that the functional form captures the observed trends with high fidelity (R² > 0.95 in all domains) and that the resulting allocations remain stable under reasonable perturbations of the fitting set. The reported generalization correlation (r=0.989) on held-out budgets further indicates that the optimization extrapolates beyond the fitting data rather than merely reproducing in-sample artifacts. revision: partial
-
Referee: Downstream validation: gains on disease classification and organ/lesion segmentation are presented, yet the manuscript does not state whether total pretraining data volume or compute was held constant across transfer-aware and data-proportional conditions. Without this control, attribution of improvements to the allocation rule rather than dataset idiosyncrasies remains incomplete.
Authors: The total pretraining data volume and compute budget were identical for the transfer-aware and data-proportional conditions; only the mixing proportions differed. We will add an explicit statement to this effect in the experimental setup section and in the figure captions of the downstream-task results to make the controlled comparison unambiguous. revision: yes
Circularity Check
No significant circularity; scaling-law optimization is applied to fitted trends with independent downstream validation
full rationale
The paper observes power-law scaling in MAE loss and cross-domain transfer from pretraining runs, fits parameters to those observations, and uses the resulting model to solve an optimization problem for data allocations. This is a standard extrapolation procedure rather than a reduction by construction. The allocations are then tested for generalization to unseen budgets (r=0.989) and evaluated via downstream disease classification and segmentation tasks, providing external grounding independent of the fitting data. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are indicated in the provided text. The central claims rest on empirical validation outside the fitted points.
Axiom & Free-Parameter Ledger
free parameters (2)
- domain-specific scaling exponents for MAE loss
- asymmetric transfer coefficients between modality pairs
axioms (2)
- domain assumption MAE reconstruction loss and cross-domain transfer performance follow power-law scaling with data volume in each domain.
- ad hoc to paper The linear combination of domain contributions under the fitted transfer matrix accurately predicts mixture performance.
Reference graph
Works this paper leans on
-
[1]
Adewole, M., Rudie, J.D., Gbdamosi, A., Toyobo, O., Raymond, C., Zhang, D., Omidiji, O., Akinola, R., Suwaid, M.A., Emegoakor, A., et al.: The brain tumor segmentation (brats) challenge 2023: Glioma segmentation in sub-saharan africa patient population (brats-africa). ArXiv pp. arXiv–2305 (2023)
work page 2023
-
[2]
Massively multilingual neural machine translation in the wild: Findings and challenges
Arivazhagan, N., Bapna, A., Firat, O., Lepikhin, D., Johnson, M., Krikun, M., Chen, M.X., Cao, Y ., Foster, G., Cherry, C., et al.: Massively multilingual neural machine translation in the wild: Findings and challenges. arXiv preprint arXiv:1907.05019 (2019)
-
[3]
Advances in Neural Information Processing Systems36, 78142–78167 (2023)
Bai, Y ., Ying, J., Cao, Y ., Lv, X., He, Y ., Wang, X., Yu, J., Zeng, K., Xiao, Y ., Lyu, H., et al.: Benchmarking foundation models with language-model-as-an-examiner. Advances in Neural Information Processing Systems36, 78142–78167 (2023)
work page 2023
-
[4]
Neurology105(2), e213831 (2025)
Barnard, L., Botha, H., Corriveau-Lecavalier, N., Graff-Radford, J., Dicks, E., Gogineni, V ., Zhang, G., Burkett, B.J., Johnson, D.R., Huls, S.J., et al.: An fdg-pet–based machine learning framework to support neurologic decision-making in alzheimer disease and related disorders. Neurology105(2), e213831 (2025)
work page 2025
-
[5]
Bassi, P.R., Li, W., Tang, Y ., Isensee, F., Wang, Z., Chen, J., Chou, Y .C., Roy, S., Kirchhoff, Y ., Rokuss, M., et al.: Touchstone benchmark: Are we on the right way for evaluating ai algorithms for medical segmentation? Advances in Neural Information Processing Systems37, 15184–15201 (2024)
work page 2024
-
[6]
SIAM Journal on Scientific Computing 21(1), 1–23 (1999)
Branch, M.A., Coleman, T.F., Li, Y .: A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems. SIAM Journal on Scientific Computing 21(1), 1–23 (1999)
work page 1999
-
[7]
Med3d: Transfer learning for 3d medical image analysis
Chen, S., Ma, K., Zheng, Y .: Med3d: Transfer learning for 3d medical image analysis. arXiv preprint arXiv:1904.00625 (2019)
-
[8]
Available: https://arxiv.org/abs/2304.09151
Chung, H.W., Constant, N., Garcia, X., Roberts, A., Tay, Y ., Narang, S., Firat, O.: Unimax: Fairer and more effective language sampling for large-scale multilingual pretraining. arXiv preprint arXiv:2304.09151 (2023)
-
[9]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[10]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Gao, Y .: Training like a medical resident: Context-prior learning toward universal medical image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11194–11204 (2024)
work page 2024
-
[11]
Bimix: A bivariate data mixing law for language model pretraining
Ge, C., Ma, Z., Chen, D., Li, Y ., Ding, B.: Bimix: A bivariate data mixing law for language model pretraining. arXiv preprint arXiv:2405.14908 (2024) 10
-
[12]
In: International MICCAI Brainlesion Workshop
Hatamizadeh, A., Nath, V ., Tang, Y ., Yang, D., Roth, H.R., Xu, D.: Swin unetr: Swin trans- formers for semantic segmentation of brain tumors in mri images. In: International MICCAI Brainlesion Workshop. pp. 272–284. Springer (2022)
work page 2022
-
[13]
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision
Hatamizadeh, A., Tang, Y ., Nath, V ., Yang, D., Myronenko, A., Landman, B., Roth, H.R., Xu, D.: Unetr: Transformers for 3d medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 574–584 (2022)
work page 2022
-
[14]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
He, K., Chen, X., Xie, S., Li, Y ., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16000–16009 (2022)
work page 2022
-
[15]
Hernandez, D., Kaplan, J., Henighan, T., McCandlish, S.: Scaling laws for transfer. arXiv preprint arXiv:2102.01293 (2021)
-
[16]
Training Compute-Optimal Large Language Models
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D., Hendricks, L.A., Welbl, J., Clark, A., et al.: Training compute-optimal large language models. arXiv preprint arXiv:2203.1555610(2022)
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[17]
Data scaling laws for radiology foundation models.arXiv preprint arXiv:2509.12818, 2025
Ilse, M., Sharma, H., Schwaighofer, A., Bond-Taylor, S., Pérez-García, F., Melnichenko, O., Sykes, A.M.G., Horst, K.K., Khandelwal, A., Reynolds, M., et al.: Data scaling laws for radiology foundation models. arXiv preprint arXiv:2509.12818 (2025)
-
[18]
Nature methods18(2), 203–211 (2021)
Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods18(2), 203–211 (2021)
work page 2021
-
[19]
arXiv preprint arXiv:2206.08023 (2022)
Ji, Y ., Bai, H., Yang, J., Ge, C., Zhu, Y ., Zhang, R., Li, Z., Zhang, L., Ma, W., Wan, X., et al.: Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation. arXiv preprint arXiv:2206.08023 (2022)
-
[20]
Advances in Neural Information Processing Systems37, 111318–111357 (2024)
Jin, R., Xu, Z., Zhong, Y ., Yao, Q., Dou, Q., Zhou, S.K., Li, X.: Fairmedfm: fairness bench- marking for medical imaging foundation models. Advances in Neural Information Processing Systems37, 111318–111357 (2024)
work page 2024
-
[21]
Scaling Laws for Neural Language Models
Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Rad- ford, A., Wu, J., Amodei, D.: Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2001
-
[22]
Forschungsbericht- Deutsche Forschungs- und Versuchsanstalt fur Luft- und Raumfahrt (1988)
Kraft, D.: A software package for sequential quadratic programming. Forschungsbericht- Deutsche Forschungs- und Versuchsanstalt fur Luft- und Raumfahrt (1988)
work page 1988
-
[23]
Nature Biomedical Engineering6(12), 1346–1352 (2022)
Krishnan, R., Rajpurkar, P., Topol, E.J.: Self-supervised learning in medicine and healthcare. Nature Biomedical Engineering6(12), 1346–1352 (2022)
work page 2022
-
[24]
In: The Eleventh Inter- national Conference on Learning Representations (2022)
Lee, H.H., Bao, S., Huo, Y ., Landman, B.A.: 3d ux-net: A large kernel volumetric convnet modernizing hierarchical transformer for medical image segmentation. In: The Eleventh Inter- national Conference on Learning Representations (2022)
work page 2022
-
[25]
arXiv preprint arXiv:2505.19603 (2025)
Lee, H.H., Liu, Q., Bao, S., Huo, Y ., Landman, B.A.: Rep3d: Re-parameterize large 3d kernels with low-rank receptive modeling for medical imaging. arXiv preprint arXiv:2505.19603 (2025)
-
[26]
arXiv preprint arXiv:2303.05785 (2023)
Lee, H.H., Liu, Q., Bao, S., Yang, Q., Yu, X., Cai, L.Y ., Li, T., Huo, Y ., Koutsoukos, X., Landman, B.A.: Scaling up 3d kernels with bayesian frequency re-parameterization for medical image segmentation. arXiv preprint arXiv:2303.05785 (2023)
-
[27]
arXiv preprint arXiv:2407.01492 , year=
Liu, Q., Zheng, X., Muennighoff, N., Zeng, G., Dou, L., Pang, T., Jiang, J., Lin, M.: Regmix: Data mixture as regression for language model pre-training. arXiv preprint arXiv:2407.01492 (2024)
-
[28]
Decoupled Weight Decay Regularization
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017) 11
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[29]
Journal of cognitive neuroscience19(9), 1498–1507 (2007)
Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner, R.L.: Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults. Journal of cognitive neuroscience19(9), 1498–1507 (2007)
work page 2007
-
[30]
In: Fourth Head and Neck Cancer Tumor Lesion Segmentation, Diagnosis and Prognosis (2025)
Quetin, S., Enger, S.A.: Automatic lesion and lymph node segmentation from pet and ct scans of the head and neck region: a hecktor 2025 challenge report. In: Fourth Head and Neck Cancer Tumor Lesion Segmentation, Diagnosis and Prognosis (2025)
work page 2025
-
[31]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Rui, S., Chen, L., Tang, Z., Wang, L., Liu, M., Zhang, S., Wang, X.: Multi-modal vision pre-training for medical image analysis. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5164–5174 (2025)
work page 2025
-
[32]
Scaling laws for optimal data mixtures.arXiv preprint arXiv:2507.09404,
Shukor, M., Bethune, L., Busbridge, D., Grangier, D., Fini, E., El-Nouby, A., Ablin, P.: Scaling laws for optimal data mixtures. arXiv preprint arXiv:2507.09404 (2025)
-
[33]
arXiv preprint arXiv:2006.03829 (2020)
Taleb, A., Loetzsch, W., Danz, N., Severin, J., Gaertner, T., Bergner, B., Lippert, C.: 3d self-supervised methods for medical imaging. arXiv preprint arXiv:2006.03829 (2020)
-
[34]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Tang, Y ., Yang, D., Li, W., Roth, H.R., Landman, B., Xu, D., Nath, V ., Hatamizadeh, A.: Self- supervised pre-training of swin transformers for 3d medical image analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20730–20740 (2022)
work page 2022
-
[35]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Wald, T., Ulrich, C., Suprijadi, J., Ziegler, S., Nohel, M., Peretzke, R., Kohler, G., Maier- Hein, K.: An openmind for 3d medical vision self-supervised learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 23839–23879 (2025)
work page 2025
-
[36]
Scientific Data10(1), 574 (2023)
Wang, D., Wang, X., Wang, L., Li, M., Da, Q., Liu, X., Gao, X., Shen, J., He, J., Shen, T., et al.: A real-world dataset and benchmark for foundation model adaptation in medical image classification. Scientific Data10(1), 574 (2023)
work page 2023
-
[37]
Wang, G., Wu, J., Luo, X., Liu, X., Li, K., Zhang, S.: Mis-fm: 3d medical image segmenta- tion using foundation models pretrained on a large-scale unannotated dataset. arXiv preprint arXiv:2306.16925 (2023)
-
[38]
Advances in Neural Information Processing Systems37, 1082–1116 (2024)
Wang, J., Wang, X., Lyu, L., Chen, J., Ma, F.: Fedmeki: A benchmark for scaling medical foun- dation models via federated knowledge injection. Advances in Neural Information Processing Systems37, 1082–1116 (2024)
work page 2024
-
[39]
In: International Workshop on Machine Learning in Medical Imaging
Wu, B., Xie, Y ., Zhang, Z., Ge, J., Yaxley, K., Bahadir, S., Wu, Q., Liu, Y ., To, M.S.: Bhsd: A 3d multi-class brain hemorrhage segmentation dataset. In: International Workshop on Machine Learning in Medical Imaging. pp. 147–156. Springer (2023)
work page 2023
-
[40]
Advances in Neural Information Processing Systems36, 69798–69818 (2023)
Xie, S.M., Pham, H., Dong, X., Du, N., Liu, H., Lu, Y ., Liang, P.S., Le, Q.V ., Ma, T., Yu, A.W.: Doremi: Optimizing data mixtures speeds up language model pretraining. Advances in Neural Information Processing Systems36, 69798–69818 (2023)
work page 2023
-
[41]
In: European Conference on Computer Vision
Xie, Y ., Zhang, J., Xia, Y ., Wu, Q.: Unimiss: Universal medical self-supervised learning via breaking dimensionality barrier. In: European Conference on Computer Vision. pp. 558–575. Springer (2022)
work page 2022
-
[42]
npj Digital Medicine8(1), 639 (2025)
Xu, T., Hosseini, S., Anderson, C., Rinaldi, A., Krishnan, R.G., Martel, A.L., Goubran, M.: A generalizable 3d framework and model for self-supervised learning in medical imaging. npj Digital Medicine8(1), 639 (2025)
work page 2025
-
[43]
Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mt5: A massively multilingual pre-trained text-to-text transformer. In: Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: Human language technologies. pp. 483–498 (2021)
work page 2021
-
[44]
Scientific data10(1), 41 (2023) 12
Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Scientific data10(1), 41 (2023) 12
work page 2023
-
[45]
arXiv preprint arXiv:2403.16952 , year=
Ye, J., Liu, P., Sun, T., Zhan, J., Zhou, Y ., Qiu, X.: Data mixing laws: Optimizing data mixtures by predicting language modeling performance. arXiv preprint arXiv:2403.16952 (2024)
-
[46]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Ye, Y ., Xie, Y ., Zhang, J., Chen, Z., Wu, Q., Xia, Y .: Continual self-supervised learning: Towards universal multi-modal medical data representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11114–11124 (2024)
work page 2024
-
[47]
Yu, X., Yang, Q., Zhou, Y ., Cai, L.Y ., Gao, R., Lee, H.H., Li, T., Bao, S., Xu, Z., Lasko, T.A., et al.: Unest: local spatial representation learning with hierarchical transformer for efficient medical segmentation. Medical Image Analysis p. 102939 (2023)
work page 2023
-
[48]
Zhou, Z., Sodha, V ., Pang, J., Gotway, M.B., Liang, J.: Models genesis. Medical image analysis 67, 101840 (2021) 13 A Experimental Details A.1 Dataset Description Pretraining datasets.Our six pretraining domains are drawn from three publicly available 3D medical imaging collections spanning CT, MRI, and PET (Table 4). To simulate realistic data imbalance...
work page 2021
-
[49]
Load the MAE checkpoint containing unetr_state_dict and metadata (scale_factor, num_layers)
-
[50]
Build a fresh UNETR without_channels = num_classesfor the target task
-
[51]
Extract all keys matchingvit.* from the checkpoint and load them into the new UNETR’s ViT encoder, verifying shape compatibility
-
[52]
All remaining parameters (CNN decoder, skip connections, output head) retain random initialization. Classification: ViT encoder with MLP head.For classification tasks, we extract the same pretrained ViT encoder and attach a lightweight MLP classification head. The ViT processes the input 963 volume into a sequence of patch embeddings, which are globally a...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.