Layer Selection in Feature-Based Losses Affects Image Quality and Microstructural Consistency in Deep Learning Super-Resolution of Brain Diffusion MRI
Pith reviewed 2026-05-19 18:37 UTC · model grok-4.3
The pith
Choosing the shallowest VGG16 layer for feature-based losses avoids grid-like artifacts in super-resolved brain diffusion MRI and maintains microstructural consistency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deeper layers and combinations thereof resulted in grid-like artifacts in super-resolution DWIs, which persisted in diffusion parameters like quantitative and fractional anisotropy. No such artifacts were present when using the shallowest layer. Downstream analysis for this layer showed great consistency with the ground truth, even for 9-fold super-resolution. Image SNR and used VGG16-layer depths modulated artifact appearance and severity.
What carries the argument
Feature-based loss computed from activations at different depths of a pretrained VGG16 network, serving as the objective for training a UNet on 2D super-resolution of diffusion weighted images.
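The mechanism can be illustrated with a minimal sketch. It is not the paper's implementation: a tiny hand-written stack of convolution and pooling stages (the hypothetical `TOY_STAGES`) stands in for the pretrained VGG16, and the L1 distance on activations is an assumption, since this page does not state which distance the authors used. The point is only that the loss is computed on activations at a chosen set of depths.

```python
from typing import Callable, List, Sequence, Set

Image = List[List[float]]


def conv3x3_relu(img: Image, kernel: Sequence[Sequence[float]]) -> Image:
    """Valid 3x3 cross-correlation followed by ReLU."""
    h, w = len(img), len(img[0])
    return [
        [
            max(0.0, sum(kernel[i][j] * img[r + i][c + j]
                         for i in range(3) for j in range(3)))
            for c in range(w - 2)
        ]
        for r in range(h - 2)
    ]


def max_pool2(img: Image) -> Image:
    """2x2 max pooling with stride 2 (a trailing odd row/column is dropped)."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]) - 1, 2)]
        for r in range(0, len(img) - 1, 2)
    ]


# Toy stand-in for a pretrained feature extractor (NOT VGG16).
SMOOTH = [[1.0 / 9.0] * 3 for _ in range(3)]
TOY_STAGES: List[Callable[[Image], Image]] = [
    lambda x: conv3x3_relu(x, SMOOTH),             # stage 0: shallowest
    lambda x: conv3x3_relu(max_pool2(x), SMOOTH),  # stage 1
    lambda x: conv3x3_relu(max_pool2(x), SMOOTH),  # stage 2: deepest
]


def feature_loss(pred: Image, target: Image,
                 stages: Sequence[Callable[[Image], Image]],
                 layer_ids: Set[int]) -> float:
    """Sum over selected stages of the mean absolute activation difference."""
    total = 0.0
    fp, ft = pred, target
    for idx, stage in enumerate(stages):
        fp, ft = stage(fp), stage(ft)
        if idx in layer_ids:
            n = len(fp) * len(fp[0])
            total += sum(abs(a - b)
                         for row_p, row_t in zip(fp, ft)
                         for a, b in zip(row_p, row_t)) / n
    return total
```

In this sketch, passing `{0}` as `layer_ids` corresponds to using only the shallowest stage, while `{0, 2}` combines a shallow and a deep stage.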
Load-bearing premise
The ablation and isolation studies sufficiently isolate the effect of VGG16 layer depth from other training choices such as optimizer settings, data augmentation, or network capacity.
What would settle it
Retraining the super-resolution network with deeper VGG16 layers under otherwise identical conditions and checking if grid-like artifacts reappear in the output images and in maps of fractional anisotropy.
Original abstract
Clinical application of high-resolution diffusion MRI is hindered by hardware limitations and prohibitive scan times, motivating computational super-resolution. This study investigates the efficacy of a feature-based loss function in preserving diffusion signal consistency in deep learning super-resolution. Using 7T data from the human connectome project to generate pairs of low- and high-resolution diffusion weighted images (DWI), we trained UNets for 2D super-resolution. Ablation and isolation studies evaluated different VGG16-layers for feature-based losses against an image-based L1 baseline. Deeper layers and combinations thereof resulted in grid-like artifacts in super-resolution DWIs, which persisted in diffusion parameters like quantitative and fractional anisotropy. No such artifacts were present when using the shallowest layer. Downstream analysis for this layer showed great consistency with the ground truth, even for 9-fold super-resolution. Image SNR and used VGG16-layer depths modulated artifact appearance and severity, mandating careful selection of contributing layers for application in and beyond diffusion MRI.
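The abstract does not say how the low-resolution images were derived from the high-resolution data. As a rough sketch under stated assumptions, the following uses in-plane block averaging and treats "9-fold" as a factor of 3 along each in-plane axis (3 x 3 = 9 voxels merged into one). Both choices, and the function names, are illustrative assumptions rather than the authors' method.

```python
from typing import List, Tuple

Image = List[List[float]]


def block_average(img: Image, factor: int) -> Image:
    """Downsample a 2D image by averaging non-overlapping factor x factor blocks."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(img[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(w)
        ]
        for r in range(h)
    ]


def make_training_pair(hr: Image, factor: int = 3) -> Tuple[Image, Image]:
    """Return (low-res input, high-res target); hr is cropped to a multiple of factor."""
    h = (len(hr) // factor) * factor
    w = (len(hr[0]) // factor) * factor
    hr_cropped = [row[:w] for row in hr[:h]]
    return block_average(hr_cropped, factor), hr_cropped
```

With `factor=3`, each low-resolution pixel summarizes nine high-resolution pixels, which is one possible reading of 9-fold super-resolution in 2D.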
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical study on the effects of using different layers of a pre-trained VGG16 network for feature-based loss functions in training a UNet for super-resolving diffusion-weighted images (DWIs) from the 7T Human Connectome Project dataset. Using ablation and isolation studies, the authors compare these perceptual losses to an L1 baseline and report that deeper VGG16 layers and their combinations introduce grid-like artifacts in the super-resolved DWIs, which carry over to derived diffusion parameters such as fractional anisotropy (FA) and quantitative anisotropy. In contrast, the shallowest layer avoids these artifacts and yields high consistency with ground-truth high-resolution images, even for 9-fold super-resolution. The study also observes that image SNR and chosen layer depths influence artifact severity.
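For readers less familiar with the downstream metric, fractional anisotropy is conventionally computed from the three eigenvalues of the diffusion tensor as the normalized spread of those eigenvalues. The sketch below implements that standard definition; the function name is illustrative, and the paper's own processing pipeline is not described on this page.

```python
import math


def fractional_anisotropy(l1: float, l2: float, l3: float) -> float:
    """FA = sqrt(3/2 * sum((l_i - mean)^2) / sum(l_i^2)); 0 = isotropic, 1 = fully anisotropic."""
    mean = (l1 + l2 + l3) / 3.0
    spread = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    norm = l1 ** 2 + l2 ** 2 + l3 ** 2
    if norm == 0.0:
        return 0.0  # all-zero tensor: define FA as 0 to avoid division by zero
    return math.sqrt(1.5 * spread / norm)
```

Because FA depends nonlinearly on the fitted tensor, a periodic pattern in the super-resolved DWIs can plausibly carry over into the FA map, which is the kind of persistence the referee summary describes.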
Significance. If the results hold once the training controls are documented, this work is significant for deep learning super-resolution in diffusion MRI because it shows that perceptual losses can introduce artifacts that propagate into microstructural parameters. The direct empirical comparisons to ground truth and the focus on downstream consistency provide practical guidance. The ablation approach is a strength for isolating effects, although the unconfirmed training controls weaken the causal claims.
major comments (3)
- [Ablation and isolation studies] Ablation and isolation studies: The description does not confirm that all training parameters (optimizer, learning rate schedule, data augmentation, batch size, training duration, and network capacity) were held identical when varying only the VGG16 layer(s) for the feature loss. This control is load-bearing for the central claim that grid artifacts are due to deeper layers rather than confounding setup differences.
- [Downstream analysis] Downstream analysis for shallowest layer: The claim of 'great consistency' with ground truth at 9-fold super-resolution lacks quantitative support such as sample size, error bars, statistical tests, or specific metrics (e.g., RMSE or SSIM on DWIs and FA maps). This is central to recommending the shallowest layer and is undermined by the absence of these details.
- [Results] Results on artifact persistence: The observation that artifacts in DWIs persist in quantitative and fractional anisotropy maps is important, but without explicit controls for SNR variation across experiments (as noted to modulate severity), it is unclear if layer depth is the primary driver independent of data quality factors.
minor comments (2)
- [Abstract] Abstract: Clarify the exact VGG16 layers tested (e.g., which specific layer is the 'shallowest') and the precise meaning of 'quantitative anisotropy' to aid reproducibility.
- The manuscript would benefit from a table summarizing the ablation configurations, including which layers were used alone or in combination.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our manuscript. These have helped us clarify key aspects of our experimental design and strengthen the presentation of our results. We provide point-by-point responses to each major comment below and indicate the revisions made to the manuscript.
- Referee: [Ablation and isolation studies] Ablation and isolation studies: The description does not confirm that all training parameters (optimizer, learning rate schedule, data augmentation, batch size, training duration, and network capacity) were held identical when varying only the VGG16 layer(s) for the feature loss. This control is load-bearing for the central claim that grid artifacts are due to deeper layers rather than confounding setup differences.
  Authors: We thank the referee for identifying this ambiguity in the description. In our ablation and isolation studies, all training parameters were held strictly identical across conditions, with the sole variation being the VGG16 layer(s) selected for the feature-based loss. The optimizer, learning rate schedule, data augmentation, batch size, training duration, and network capacity remained fixed. We have revised the Methods section to explicitly document these controls, thereby reinforcing the attribution of observed grid artifacts to layer depth rather than experimental confounds.
  Revision: yes
- Referee: [Downstream analysis] Downstream analysis for shallowest layer: The claim of 'great consistency' with ground truth at 9-fold super-resolution lacks quantitative support such as sample size, error bars, statistical tests, or specific metrics (e.g., RMSE or SSIM on DWIs and FA maps). This is central to recommending the shallowest layer and is undermined by the absence of these details.
  Authors: We agree that quantitative support is essential for this central claim. While the original manuscript emphasized visual consistency for the shallowest layer at 9-fold super-resolution, we have now incorporated quantitative metrics in the revised Results section. These include RMSE and SSIM values computed on the super-resolved DWIs and FA maps, reported with error bars across the subject cohort, sample sizes, and statistical comparisons against the L1 baseline and ground truth.
  Revision: yes
- Referee: [Results] Results on artifact persistence: The observation that artifacts in DWIs persist in quantitative and fractional anisotropy maps is important, but without explicit controls for SNR variation across experiments (as noted to modulate severity), it is unclear if layer depth is the primary driver independent of data quality factors.
  Authors: We appreciate the referee's emphasis on disentangling SNR effects. The manuscript already notes that SNR modulates artifact severity, but to directly address independence from data quality, we have added a controlled analysis in the revised Results. This includes experiments at matched SNR levels across layer depths, confirming that deeper layers introduce grid artifacts even under controlled SNR conditions, while the shallowest layer remains robust.
  Revision: yes
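The metrics named in the exchange above (RMSE and SSIM, reported with error bars across subjects) could be computed along the following lines. This is a simplified sketch, not the authors' evaluation code: `global_ssim` evaluates the SSIM formula once over the whole image instead of averaging over local windows as standard implementations do, the constants 0.01 and 0.03 are the commonly used defaults, and all function names are illustrative.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Image = List[List[float]]


def _flatten(img: Image) -> List[float]:
    return [v for row in img for v in row]


def rmse(pred: Image, ref: Image) -> float:
    """Root-mean-square error between two images of equal shape."""
    p, r = _flatten(pred), _flatten(ref)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, r)) / len(r))


def global_ssim(pred: Image, ref: Image, data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole image (simplification of windowed SSIM)."""
    p, r = _flatten(pred), _flatten(ref)
    mp, mr = mean(p), mean(r)
    var_p = sum((a - mp) ** 2 for a in p) / len(p)
    var_r = sum((b - mr) ** 2 for b in r) / len(r)
    cov = sum((a - mp) * (b - mr) for a, b in zip(p, r)) / len(p)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mp * mr + c1) * (2 * cov + c2)) / (
        (mp ** 2 + mr ** 2 + c1) * (var_p + var_r + c2))


def summarize(per_subject: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across subjects (std is 0 for one subject)."""
    return mean(per_subject), (stdev(per_subject) if len(per_subject) > 1 else 0.0)
```

Computing one value per subject and then calling `summarize` gives the kind of cohort-level mean and spread that the referee asks for.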
Circularity Check
No circularity: purely empirical ablation study with direct measurements
The paper reports results from training UNets on 7T HCP data for 2D super-resolution of diffusion-weighted images, comparing an L1 baseline against feature-based losses using different VGG16 layers. All reported outcomes (grid artifacts in deeper layers, absence in the shallowest layer, consistency of diffusion parameters like FA and quantitative anisotropy, and SNR modulation) are measured quantities obtained by comparing model outputs to ground-truth high-resolution images. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains are present; the work contains no claimed derivation chain that could reduce to its inputs by construction. The ablation studies are experimental controls whose validity is a separate question of experimental design, not circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: VGG16 layers pre-trained on natural images extract features relevant to diffusion-weighted MRI contrast.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Deeper layers and combinations thereof resulted in grid-like artifacts in super-resolution DWIs, which persisted in diffusion parameters like quantitative and fractional anisotropy. No such artifacts were present when using the shallowest layer."
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean, J_uniquely_calibrated_via_higher_derivative (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Ablation and isolation studies evaluated different VGG16-layers for feature-based losses against an image-based L1 baseline."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.