arxiv: 2605.15895 · v1 · pith:SJ6V5MBNnew · submitted 2026-05-15 · 📡 eess.IV · cs.CV

Layer Selection in Feature-Based Losses Affects Image Quality and Microstructural Consistency in Deep Learning Super-Resolution of Brain Diffusion MRI

Pith reviewed 2026-05-19 18:37 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords super-resolution · diffusion MRI · feature-based loss · VGG16 layers · grid artifacts · fractional anisotropy · brain imaging · microstructural consistency

The pith

Choosing the shallowest VGG16 layer for feature-based losses avoids grid-like artifacts in super-resolved brain diffusion MRI and maintains microstructural consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests how the depth of layers used from a VGG16 network to build a feature-based loss function influences the results of deep learning super-resolution applied to diffusion-weighted brain images. Deeper layers produce grid patterns that show up in the enhanced images and also distort measures of fractional anisotropy and quantitative anisotropy. The shallowest layer prevents these patterns and delivers outputs that align closely with real high-resolution scans, even when the resolution is increased nine times. Clinicians and researchers interested in faster ways to obtain detailed diffusion data would find this relevant because it identifies a specific choice that keeps the method reliable without adding scan time.

Core claim

Deeper layers and combinations thereof resulted in grid-like artifacts in super-resolution DWIs, which persisted in diffusion parameters like quantitative and fractional anisotropy. No such artifacts were present when using the shallowest layer. Downstream analysis for this layer showed great consistency with the ground truth, even for 9-fold super-resolution. Image SNR and used VGG16-layer depths modulated artifact appearance and severity.
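For readers less familiar with the diffusion parameters named above, fractional anisotropy (FA) is conventionally computed from the three eigenvalues of the diffusion tensor. The sketch below uses that standard definition. The function name and the example eigenvalues are illustrative and are not taken from the paper, whose processing pipeline is not described in this summary.

```python
import math

def fractional_anisotropy(l1: float, l2: float, l3: float) -> float:
    """Standard FA from the three diffusion-tensor eigenvalues."""
    denom = l1 * l1 + l2 * l2 + l3 * l3
    if denom == 0.0:
        return 0.0  # convention for an all-zero tensor (e.g. background voxels)
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    return math.sqrt(0.5 * num / denom)

# Illustrative eigenvalues (units of 1e-3 mm^2/s), not taken from the paper.
fa_isotropic = fractional_anisotropy(1.0, 1.0, 1.0)  # equal diffusion in all directions
fa_fibre = fractional_anisotropy(1.7, 0.3, 0.3)      # strongly directional diffusion
```

Because FA depends on differences between eigenvalues fitted to the DWI signal, a spatially periodic error in the DWIs can plausibly carry over into the FA map, which is consistent with the persistence of artifacts reported above.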

What carries the argument

Feature-based loss computed from activations at different depths of a pretrained VGG16 network, serving as the objective for training a UNet on 2D super-resolution of diffusion weighted images.
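One common way to write such a loss is given here as a sketch, not as the paper's exact formulation. Let $\phi_\ell$ denote the activations of the frozen VGG16 at layer $\ell$, $\hat{y}$ the UNet output, $y$ the high-resolution target, and $\mathcal{S}$ the set of selected layers. The weights $w_\ell$, the normalization by the number of activations $N_\ell$, and the choice of the L1 norm inside the feature term are assumptions; the summary does not specify them.

```latex
\mathcal{L}_{\text{feat}}(\hat{y}, y)
  = \sum_{\ell \in \mathcal{S}} \frac{w_\ell}{N_\ell}
    \left\lVert \phi_\ell(\hat{y}) - \phi_\ell(y) \right\rVert_1,
\qquad
\mathcal{L}_{\text{L1}}(\hat{y}, y)
  = \frac{1}{N} \left\lVert \hat{y} - y \right\rVert_1
```

In this notation, choosing a single shallow or deep layer corresponds to a one-element $\mathcal{S}$, and layer combinations correspond to larger sets; $\mathcal{L}_{\text{L1}}$ is the image-based baseline computed directly on the $N$ pixels.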

Load-bearing premise

The ablation and isolation studies sufficiently isolate the effect of VGG16 layer depth from other training choices such as optimizer settings, data augmentation, or network capacity.

What would settle it

Retraining the super-resolution network with deeper VGG16 layers under otherwise identical conditions and checking if grid-like artifacts reappear in the output images and in maps of fractional anisotropy.
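One hypothetical way to make the check for grid-like artifacts less subjective is to look for a dominant spatial period in the error map (super-resolved image minus ground truth). The sketch below is an assumption of this note, not a procedure from the paper: it averages the error over rows and reports the strongest non-DC frequency together with its share of the spectral energy. All function names and the synthetic data are illustrative.

```python
import cmath
import math

def column_profile(error_map):
    """Average a 2D error map (list of rows) over rows: one value per column."""
    rows = len(error_map)
    return [sum(row[j] for row in error_map) / rows for j in range(len(error_map[0]))]

def dominant_period(profile):
    """Return (period_in_pixels, energy_share) of the strongest non-DC frequency."""
    n = len(profile)
    mean = sum(profile) / n
    centred = [v - mean for v in profile]
    power = []
    # Naive DFT over positive frequencies 1..n//2 (adequate for short profiles).
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n) for i, c in enumerate(centred))
        # Bins below Nyquist have a mirrored negative-frequency partner, so count them twice.
        weight = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
        power.append(weight * abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        return None, 0.0  # perfectly flat profile: no periodic structure
    best = max(range(len(power)), key=power.__getitem__)
    return n / (best + 1), power[best] / total

# Synthetic error map with a vertical stripe every 4 pixels (hypothetical data).
grid_error = [[1.0 if j % 4 == 0 else 0.0 for j in range(32)] for _ in range(8)]
period, share = dominant_period(column_profile(grid_error))
```

A large energy share concentrated at one period would be suggestive of a grid pattern, whereas noise-like errors spread their energy across many frequencies. Any threshold on the share would need to be calibrated on real data.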

Original abstract

Clinical application of high-resolution diffusion MRI is hindered by hardware limitations and prohibitive scan times, motivating computational super-resolution. This study investigates the efficacy of a feature-based loss function in preserving diffusion signal consistency in deep learning super-resolution. Using 7T data from the human connectome project to generate pairs of low- and high-resolution diffusion weighted images (DWI), we trained UNets for 2D super-resolution. Ablation and isolation studies evaluated different VGG16-layers for feature-based losses against an image-based L1 baseline. Deeper layers and combinations thereof resulted in grid-like artifacts in super-resolution DWIs, which persisted in diffusion parameters like quantitative and fractional anisotropy. No such artifacts were present when using the shallowest layer. Downstream analysis for this layer showed great consistency with the ground truth, even for 9-fold super-resolution. Image SNR and used VGG16-layer depths modulated artifact appearance and severity, mandating careful selection of contributing layers for application in and beyond diffusion MRI.
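The abstract does not say how the low-resolution images were derived from the 7T data. As a minimal sketch under stated assumptions, the block below builds a low-/high-resolution pair by block averaging, and reads "9-fold" as a factor of 3 along each of the two in-plane axes. Both choices are assumptions of this note; MRI studies often downsample in k-space instead. The function name and the toy image are hypothetical.

```python
def block_average(image, factor):
    """Downsample a 2D image (list of rows) by averaging factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by the factor")
    low = []
    for r in range(0, rows, factor):
        low_row = []
        for c in range(0, cols, factor):
            block = [image[r + dr][c + dc] for dr in range(factor) for dc in range(factor)]
            low_row.append(sum(block) / len(block))
        low.append(low_row)
    return low

# Hypothetical 6 x 6 "high-resolution" slice; factor 3 per axis leaves 9 times fewer pixels.
high_res = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
low_res = block_average(high_res, 3)
```

Under these assumptions, `(low_res, high_res)` plays the role of one training pair: the network receives the coarse slice and is asked to recover the fine one.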

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents an empirical study on the effects of using different layers of a pre-trained VGG16 network for feature-based loss functions in training a UNet for super-resolving diffusion-weighted images (DWIs) from the 7T Human Connectome Project dataset. Using ablation and isolation studies, the authors compare these perceptual losses to an L1 baseline and report that deeper VGG16 layers and their combinations introduce grid-like artifacts in the super-resolved DWIs, which carry over to derived diffusion parameters such as fractional anisotropy (FA) and quantitative anisotropy. In contrast, the shallowest layer avoids these artifacts and yields high consistency with ground-truth high-resolution images, even for 9-fold super-resolution. The study also observes that image SNR and chosen layer depths influence artifact severity.

Significance. If the results hold after addressing controls, this work is significant for deep learning super-resolution in diffusion MRI by highlighting risks of perceptual losses that can introduce artifacts affecting microstructural parameters. The direct empirical comparisons to ground truth and focus on downstream consistency provide practical guidance. The ablation approach is a strength for isolating effects, though unconfirmed training controls weaken causal claims.

major comments (3)
  1. [Ablation and isolation studies] Ablation and isolation studies: The description does not confirm that all training parameters (optimizer, learning rate schedule, data augmentation, batch size, training duration, and network capacity) were held identical when varying only the VGG16 layer(s) for the feature loss. This control is load-bearing for the central claim that grid artifacts are due to deeper layers rather than confounding setup differences.
  2. [Downstream analysis] Downstream analysis for shallowest layer: The claim of 'great consistency' with ground truth at 9-fold super-resolution lacks quantitative support such as sample size, error bars, statistical tests, or specific metrics (e.g., RMSE or SSIM on DWIs and FA maps). This is central to recommending the shallowest layer and is undermined by the absence of these details.
  3. [Results] Results on artifact persistence: The observation that artifacts in DWIs persist in quantitative and fractional anisotropy maps is important, but without explicit controls for SNR variation across experiments (as noted to modulate severity), it is unclear if layer depth is the primary driver independent of data quality factors.
minor comments (2)
  1. [Abstract] Abstract: Clarify the exact VGG16 layers tested (e.g., which specific layer is the 'shallowest') and the precise meaning of 'quantitative anisotropy' to aid reproducibility.
  2. The manuscript would benefit from a table summarizing the ablation configurations, including which layers were used alone or in combination.
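Major comment 1 and minor comment 2 both concern how the ablation configurations are specified. A minimal sketch of one way to make the control explicit is to derive every run from a single frozen baseline configuration and change only the loss layers. Every field name, value, and layer name below is a placeholder chosen for illustration, not a setting reported by the authors.

```python
from dataclasses import dataclass, replace, asdict

@dataclass(frozen=True)
class TrainConfig:
    # Every value below is a placeholder, not a setting reported by the authors.
    loss_layers: tuple = ()        # empty tuple stands for the image-based L1 baseline
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    batch_size: int = 16
    epochs: int = 50
    augmentation: str = "flip"
    unet_base_channels: int = 64

baseline = TrainConfig()
# Hypothetical layer names; the summary does not list the VGG16 layers tested.
layer_sets = [("block1",), ("block3",), ("block5",), ("block1", "block3", "block5")]
runs = [baseline] + [replace(baseline, loss_layers=layers) for layers in layer_sets]

def differing_fields(a, b):
    """Names of the config fields whose values differ between two runs."""
    da, db = asdict(a), asdict(b)
    return {name for name in da if da[name] != db[name]}

# Each run should differ from the baseline in the loss layers only.
controlled = all(differing_fields(baseline, run) <= {"loss_layers"} for run in runs)
```

Printing `asdict(run)` for each entry of `runs` would also yield the kind of configuration table requested in minor comment 2.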

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. These have helped us clarify key aspects of our experimental design and strengthen the presentation of our results. We provide point-by-point responses to each major comment below and indicate the revisions made to the manuscript.

Point-by-point responses
  1. Referee: [Ablation and isolation studies] Ablation and isolation studies: The description does not confirm that all training parameters (optimizer, learning rate schedule, data augmentation, batch size, training duration, and network capacity) were held identical when varying only the VGG16 layer(s) for the feature loss. This control is load-bearing for the central claim that grid artifacts are due to deeper layers rather than confounding setup differences.

    Authors: We thank the referee for identifying this ambiguity in the description. In our ablation and isolation studies, all training parameters were held strictly identical across conditions, with the sole variation being the VGG16 layer(s) selected for the feature-based loss. The optimizer, learning rate schedule, data augmentation, batch size, training duration, and network capacity remained fixed. We have revised the Methods section to explicitly document these controls, thereby reinforcing the attribution of observed grid artifacts to layer depth rather than experimental confounds.
    revision: yes

  2. Referee: [Downstream analysis] Downstream analysis for shallowest layer: The claim of 'great consistency' with ground truth at 9-fold super-resolution lacks quantitative support such as sample size, error bars, statistical tests, or specific metrics (e.g., RMSE or SSIM on DWIs and FA maps). This is central to recommending the shallowest layer and is undermined by the absence of these details.

    Authors: We agree that quantitative support is essential for this central claim. While the original manuscript emphasized visual consistency for the shallowest layer at 9-fold super-resolution, we have now incorporated quantitative metrics in the revised Results section. These include RMSE and SSIM values computed on the super-resolved DWIs and FA maps, reported with error bars across the subject cohort, sample sizes, and statistical comparisons against the L1 baseline and ground truth.
    revision: yes

  3. Referee: [Results] Results on artifact persistence: The observation that artifacts in DWIs persist in quantitative and fractional anisotropy maps is important, but without explicit controls for SNR variation across experiments (as noted to modulate severity), it is unclear if layer depth is the primary driver independent of data quality factors.

    Authors: We appreciate the referee's emphasis on disentangling SNR effects. The manuscript already notes that SNR modulates artifact severity, but to directly address independence from data quality, we have added a controlled analysis in the revised Results. This includes experiments at matched SNR levels across layer depths, confirming that deeper layers introduce grid artifacts even under controlled SNR conditions, while the shallowest layer remains robust.
    revision: yes
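The second response above refers to RMSE and SSIM on DWIs and FA maps. As a sketch of what those two metrics compute, the block below implements RMSE and a simplified SSIM that uses whole-image statistics in place of the usual sliding window. The simplification, the function names, and the sample values are assumptions of this note and do not reproduce the authors' evaluation.

```python
import math

def rmse(a, b):
    """Root-mean-square error between two equally sized flat lists of intensities."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def global_ssim(a, b, data_range=1.0):
    """SSIM from whole-image statistics (no sliding window), a simplification."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Stabilizing constants with the customary 0.01 and 0.03 factors.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

# Hypothetical FA values for a handful of voxels, not data from the paper.
reference = [0.10, 0.45, 0.80, 0.30]
prediction = [0.12, 0.43, 0.78, 0.33]
error = rmse(reference, prediction)
similarity = global_ssim(reference, prediction)
```

Reporting such values per subject, with their spread across the cohort, would supply the error bars and sample sizes the referee asked for.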

Circularity Check

0 steps flagged

No circularity: purely empirical ablation study with direct measurements

full rationale

The paper reports results from training UNets on 7T HCP data for 2D super-resolution of diffusion-weighted images, comparing an L1 baseline against feature-based losses using different VGG16 layers. All reported outcomes (grid artifacts in deeper layers, absence in the shallowest layer, consistency of diffusion parameters like FA and quantitative anisotropy, and SNR modulation) are measured quantities obtained by comparing model outputs to ground-truth high-resolution images. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains are present; the work contains no claimed derivation chain that could reduce to its inputs by construction. The ablation studies are experimental controls whose validity is a separate question of experimental design, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard deep-learning training assumptions and the transferability of ImageNet-pretrained VGG16 features to diffusion MRI images. No free parameters are fitted to produce the artifact observation, and no new physical entities are introduced.

axioms (1)
  • domain assumption: VGG16 layers pre-trained on natural images extract features relevant to diffusion-weighted MRI contrast
    The feature-based loss is constructed directly from these layers without additional justification or domain-specific pre-training mentioned in the abstract.
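Part of what this axiom hides is an input mismatch: an ImageNet-pretrained VGG16 expects three-channel images normalized with ImageNet statistics, whereas a DWI slice has one channel and arbitrary signal units. A minimal sketch of one common adaptation follows. The min-max scaling and channel replication are assumptions of this note, since the summary does not describe the authors' preprocessing. The mean and standard deviation are the widely used ImageNet values, and the function name is hypothetical.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def to_vgg_input(slice_2d):
    """Scale a 2D slice to [0, 1], copy it to three channels, apply ImageNet statistics."""
    flat = [v for row in slice_2d for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant slice
    scaled = [[(v - lo) / span for v in row] for row in slice_2d]
    return [
        [[(v - mean) / std for v in row] for row in scaled]
        for mean, std in zip(IMAGENET_MEAN, IMAGENET_STD)
    ]

# Hypothetical 2 x 2 DWI slice with arbitrary signal units.
channels = to_vgg_input([[0.0, 500.0], [1000.0, 250.0]])
```

The result is indexed as `channels[channel][row][column]`; the three channels carry the same image content and differ only in their normalization.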

pith-pipeline@v0.9.0 · 5709 in / 1507 out tokens · 51775 ms · 2026-05-19T18:37:48.344528+00:00 · methodology


