pith. machine review for the scientific record.

arxiv: 2605.09639 · v2 · submitted 2026-05-10 · 📡 eess.IV · cs.CV

Recognition: no theorem link

XTinyU-Net: Training-Free U-Net Scaling via Initialization-Time Sensitivity

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:35 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords U-Net scaling · training-free compression · Jacobian sensitivity · medical segmentation · model efficiency · initialization analysis

The pith

Jacobian sensitivity at initialization identifies the smallest stable U-Net configuration for medical image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for selecting ultralight U-Net models without training them. It observes that scaling down channel widths causes a sharp drop in performance past a certain point. By measuring the total variation of a Jacobian-based sensitivity score computed on unlabeled images, it locates the boundary between stable and collapsed performance. The resulting XTinyU-Net matches full nnU-Net accuracy on six datasets while using 400x to 1600x fewer parameters. The approach avoids exhaustive training trials when searching for efficient models.

Core claim

XTinyU-Net is the smallest width-capped U-Net variant whose Jacobian sensitivity curve at initialization shows low total variation, indicating it remains in the stable performance plateau after full training. Across six medical datasets in the nnU-Net framework, it delivers segmentation accuracy comparable to the standard heavy nnU-Net while requiring 400x to 1600x fewer parameters.

What carries the argument

The Jacobian-based sensitivity metric computed at initialization on unlabeled images, whose total variation is used to detect the transition from stable to collapsed representational capacity.
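The review does not reproduce the paper's exact metric definition, so the following is a hypothetical sketch of one plausible reading: "Jacobian sensitivity at initialization" approximated as the mean Frobenius norm of the input-output Jacobian of a randomly initialized network, estimated by finite differences on a few unlabeled samples. The toy two-layer net, the dimensions, and the finite-difference estimator are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(width, in_dim=16, out_dim=4):
    """Randomly initialized 2-layer net standing in for a width-capped U-Net
    (illustrative stand-in; the paper uses full U-Net variants)."""
    w1 = rng.normal(0, np.sqrt(2 / in_dim), (width, in_dim))
    w2 = rng.normal(0, np.sqrt(2 / width), (out_dim, width))
    return lambda x: np.tanh(w2 @ np.maximum(w1 @ x, 0.0))

def jacobian_sensitivity(net, samples, eps=1e-4):
    """Mean Frobenius norm of the finite-difference input-output Jacobian,
    averaged over unlabeled samples -- no labels or training involved."""
    norms = []
    for x in samples:
        # central differences along each input coordinate
        cols = [(net(x + eps * e) - net(x - eps * e)) / (2 * eps)
                for e in np.eye(len(x))]
        norms.append(np.linalg.norm(np.stack(cols, axis=1)))
    return float(np.mean(norms))

# a small set of "unlabeled images", flattened
samples = [rng.normal(size=16) for _ in range(8)]
# score each discrete width-capped variant at initialization
scores = {w: jacobian_sensitivity(make_net(w), samples) for w in (2, 4, 8, 16, 32)}
```

Sweeping this score across the discrete width-capped variants produces the sensitivity curve whose total variation the paper then analyzes.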

If this is right

  • XTinyU-Net can be deployed in resource-limited medical imaging environments.
  • It outperforms other lightweight architectures with 5x-72x fewer parameters.
  • The framework allows dataset-specific model selection at initialization time.
  • It reduces the need for compute-intensive hyperparameter searches for U-Net scaling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar sensitivity analysis might apply to other encoder-decoder architectures beyond U-Net.
  • Using only unlabeled images suggests the method could work in semi-supervised settings.
  • Extending to other compression techniques like pruning could be tested.

Load-bearing premise

The total variation of the Jacobian-based sensitivity curve computed at initialization on unlabeled images accurately locates the boundary between stable performance and representational collapse after full training.
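The review names total variation as the boundary detector but not the exact selection algorithm, so here is a hedged sketch of one plausible rule: treat the largest jump in the sensitivity-vs-width curve as the collapse boundary and select the smallest width on its stable side. The function name, the toy curve, and the largest-jump heuristic are assumptions for illustration only.

```python
def select_smallest_stable(widths, scores):
    """widths ascending; scores[i] is the init-time sensitivity at widths[i].
    Returns (selected width, total variation of the curve)."""
    jumps = [abs(scores[i + 1] - scores[i]) for i in range(len(scores) - 1)]
    total_variation = sum(jumps)           # sum of successive |differences|
    boundary = jumps.index(max(jumps))     # collapse regime sits below this gap
    return widths[boundary + 1], total_variation

# Toy curve: erratic high-sensitivity collapse regime below width 16,
# flat stable plateau from width 16 upward (illustrative numbers).
widths = [2, 4, 8, 16, 32, 64]
scores = [9.1, 7.8, 6.5, 1.2, 1.1, 1.0]
xtiny_width, tv = select_smallest_stable(widths, scores)
# xtiny_width == 16: the smallest configuration on the stable side
```

If the premise holds, the width chosen this way at initialization coincides with the smallest variant that trains to plateau accuracy.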

What would settle it

Train the selected XTinyU-Net and the next larger width variant on one dataset; if the smaller model shows clear accuracy loss relative to the larger one after full training, the selection method is falsified.

Figures

Figures reproduced from arXiv: 2605.09639 by Alvin Kimbowa, David Liu, Ilker Hacihaliloglu, Moein Heidari.

Figure 1
Figure 1. U-Net width scaling curves (Dice vs. parameters) and corresponding initialization-time Jacobian sensitivity across datasets. The estimated collapse boundary separates stable and unstable regimes; XTinyU-Net is selected as the smallest configuration on the stable side.
Figure 2
Figure 2. Qualitative results on BUS-BRA (top row), EchoNet-Dynamic (middle row), and ISIC 2018 (bottom row). XTinyU-Net achieves segmentation performance on par with nnU-Net across datasets despite using over 400x–1600x fewer parameters. For instance, on BUS-BRA, XTinyU-Net achieves 90.64% Dice compared to 89.90% for nnU-Net while reducing parameters from 33,472k to 20.39k (over 1600× fewer parameters).
Figure 3
Figure 3. Qualitative results on FiVES, BraTS2020, and ACDC (bottom row). XTinyU-Net achieves high-quality segmentation performance.
Figure 4
Figure 4. Ablation study on EchoNet-Dynamic showing robustness of the proposed Jacobian-based collapse detection to batch size and random initialization, and comparison with existing NAS metrics. Across random seeds, the selected architectures cluster tightly around two adjacent configurations and achieve competitive segmentation performance.
Original abstract

While U-Net architectures remain the gold standard for medical image segmentation, their deployment in resource-constrained environments demands aggressive model compression. However, finding an optimally efficient configuration is computationally prohibitive, typically requiring exhaustive train-and-evaluate cycles to find the smallest model that maintains peak performance. In this paper, we introduce a training-free selection framework to automatically identify ultralightweight, dataset-specific U-Net configurations directly at initialization. We observe that systematically scaling down U-Net channel width induces a sharp transition from a stable performance plateau to representational capacity collapse. To pinpoint this boundary without training, we propose a Jacobian-based sensitivity metric that scores discrete, width-capped U-Net variants using a small set of unlabeled images. By analyzing the total variation of this sensitivity curve, we isolate the smallest stable configuration, which we denote as XTinyU-Net. Evaluated across six diverse medical datasets within the nnU-Net framework, XTinyU-Net achieves segmentation accuracy comparable to the heavy nnU-Net baseline with 400x-1600x fewer parameters, and outperforms contemporary lightweight architectures while utilizing 5x-72x fewer parameters. Code is publicly accessible on https://github.com/alvinkimbowa/nntinyunet.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces XTinyU-Net, a training-free framework that selects the smallest stable channel-width U-Net configuration for medical image segmentation by computing a Jacobian-based sensitivity metric at initialization on unlabeled images and locating the plateau-to-collapse transition via total variation of the resulting curve. It claims this yields dataset-specific models that match nnU-Net accuracy with 400x–1600x fewer parameters and outperform other lightweight architectures with 5x–72x fewer parameters across six diverse medical datasets inside the nnU-Net pipeline.

Significance. If the initialization-time metric reliably predicts post-training stability, the approach would remove the need for exhaustive train-and-evaluate searches when compressing U-Nets, enabling rapid deployment of ultralight models in resource-constrained clinical settings. Public code release supports reproducibility.

major comments (3)
  1. [§3] §3 (sensitivity metric definition): the claim that total variation of the Jacobian sensitivity curve at initialization identifies the stable plateau boundary is presented without derivation or monotonicity argument; the metric appears chosen empirically, yet it is load-bearing for the entire training-free selection procedure.
  2. [§4.3, Table 2] §4.3 and Table 2: reported Dice/Hausdorff values for XTinyU-Net versus nnU-Net baseline lack error bars, standard deviations across runs, or statistical tests, so the assertion of 'comparable' accuracy cannot be verified from the presented data.
  3. [§5] §5 (cross-dataset validation): no per-dataset scatter plots or correlation coefficients are shown between the init-time total-variation values and final segmentation metrics, leaving the key assumption that the relationship is monotonic and dataset-independent untested.
minor comments (2)
  1. The exact parameter counts and FLOPs for each of the six datasets should be tabulated alongside the reduction factors to allow direct comparison.
  2. Sensitivity curves in the figures would benefit from explicit markers indicating the selected XTinyU-Net width and the location of the total-variation threshold.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments point by point below, indicating the revisions we plan to make to strengthen the paper.

Point-by-point responses
  1. Referee: [§3] §3 (sensitivity metric definition): the claim that total variation of the Jacobian sensitivity curve at initialization identifies the stable plateau boundary is presented without derivation or monotonicity argument; the metric appears chosen empirically, yet it is load-bearing for the entire training-free selection procedure.

    Authors: We acknowledge that the use of total variation to detect the plateau-to-collapse transition is motivated by empirical observations of the sensitivity curve's behavior rather than a formal derivation. In the revised manuscript, we will expand the discussion in §3 to provide a more detailed justification for this choice, including an analysis of the curve's properties and why total variation is suitable for identifying the boundary. While a complete theoretical proof of monotonicity may not be feasible within the scope of this work, we believe this addition will clarify the rationale and address the concern. revision: partial

  2. Referee: [§4.3, Table 2] §4.3 and Table 2: reported Dice/Hausdorff values for XTinyU-Net versus nnU-Net baseline lack error bars, standard deviations across runs, or statistical tests, so the assertion of 'comparable' accuracy cannot be verified from the presented data.

    Authors: This observation is correct, and we agree that including measures of variability and statistical analysis would strengthen the claims. We will perform additional experiments with multiple random initializations (at least 3-5 runs per dataset) to compute standard deviations and include error bars in the updated Table 2. Additionally, we will include appropriate statistical tests (e.g., Wilcoxon signed-rank test) to support the comparability of results. These changes will be incorporated in the revised version. revision: yes

  3. Referee: [§5] §5 (cross-dataset validation): no per-dataset scatter plots or correlation coefficients are shown between the init-time total-variation values and final segmentation metrics, leaving the key assumption that the relationship is monotonic and dataset-independent untested.

    Authors: We agree that visualizing and quantifying the correlation would provide stronger support for the method's generalizability. In the revised manuscript, we will add per-dataset scatter plots in §5 illustrating the relationship between the initialization-time total variation metric and the final Dice/Hausdorff scores. We will also report Pearson or Spearman correlation coefficients for each dataset to demonstrate the monotonicity of the relationship. revision: yes
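The rebuttal commits to reporting Spearman correlation between the init-time total-variation values and final segmentation metrics. A minimal dependency-free sketch of that check, with purely hypothetical numbers (not from the paper), might look like:

```python
def rank(xs):
    """1-based ranks; assumes no ties (sufficient for this illustration)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1.0
    return r

def spearman(xs, ys):
    """Spearman rho via the rank-difference formula (no ties)."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

tv_values = [8.1, 6.9, 7.4, 3.2, 4.0, 5.5]        # hypothetical per-config TV
dice      = [0.62, 0.70, 0.66, 0.90, 0.88, 0.80]  # hypothetical final Dice
rho = spearman(tv_values, dice)  # strongly negative: high TV, low final Dice
```

A consistently negative rho per dataset would support the monotonicity assumption the referee asks about; production analyses would typically use `scipy.stats.spearmanr`, which also handles ties.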

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained

full rationale

The paper computes its Jacobian-based sensitivity metric directly from the untrained network weights and a small set of unlabeled images, then uses total variation of that curve to locate the stable-to-collapse boundary. This step does not invoke any fitted parameters derived from post-training Dice scores, does not rename a known empirical pattern, and contains no load-bearing self-citations or uniqueness theorems imported from prior author work. The selection rule is therefore independent of the target segmentation performance it is later validated against, satisfying the criteria for a non-circular, externally falsifiable procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested domain assumption that initialization-time Jacobian sensitivity predicts post-training stability; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Jacobian sensitivity at initialization correlates with post-training representational capacity for width-scaled U-Nets
    This correlation is required for the training-free selection to work but is not demonstrated in the provided abstract.

pith-pipeline@v0.9.0 · 5525 in / 1194 out tokens · 57944 ms · 2026-05-15T05:35:49.748971+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 2 internal anchors

  1. [1] Azad, R., Heidokoohi, A., Baseri, S., et al.: Medical image segmentation review: The success of U-Net. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  2. [2] Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P.A., Cetin, I., Lekadir, K., Camara, O., Ballester, M.A.G., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Transactions on Medical Imaging 37(11), 2514–2525 (2018)

  3. [3] Chen, J., Chen, R., Wang, W., Cheng, J., Zhang, L., Chen, L.: TinyU-Net: Lighter yet Better U-Net with Cascaded Multi-Receptive Fields. In: MICCAI 2024, LNCS 15009. Springer Nature Switzerland (2024). https://doi.org/10.1007/978-3-031-72114-4_60

  4. [4] Chen, W., Gong, X., Wang, Z.: Neural architecture search on ImageNet in four GPU hours: A theoretically inspired perspective. In: ICLR (2021), https://openreview.net/forum?id=6z_BEpN6Y0

  5. [5] Codella, N., Rotemberg, V., Tschandl, P., Celebi, M.E., Dusza, S., Gutman, D., Helba, B., Kalloo, A., Liopyris, K., Marchetti, M., et al.: Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1902.03368 (2019)

  6. [6] Gómez-Flores, W., Gregorio-Calas, M.J., Coelho de Albuquerque Pereira, W.: BUS-BRA: A breast ultrasound dataset for assessing computer-aided diagnosis systems. Medical Physics 51(4), 3110–3123 (2024)

  7. [7] Hassler, T., Åkerholm, I., Nordström, M., Balletti, G., Goksel, O.: Lean UNet: A compact model for image segmentation (2025). https://doi.org/10.48550/arXiv.2512.03834

  8. [8] Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021). https://doi.org/10.1038/s41592-020-01008-z

  9. [9] Jin, K., Huang, X., Zhou, J., Li, Y., Yan, Y., Sun, Y., Zhang, Q., Wang, Y., Ye, J.: FIVES: A fundus image dataset for artificial intelligence based vessel segmentation. Scientific Data 9(1), 475 (2022)

  10. [10] Kalkhof, J., Ihm, N., Köhler, T., Gregori, B., Mukhopadhyay, A.: Med-NCA: Bio-inspired medical image segmentation. Medical Image Analysis 103, 103601 (2025). https://doi.org/10.1016/j.media.2025.103601

  11. [11] Lee, N., Ajanthan, T., Torr, P.H.S.: SNIP: Single-shot network pruning based on connection sensitivity. In: ICLR (2019). https://arxiv.org/abs/1810.02340

  12. [12] Lin, M., Wang, P., Sun, Z., Chen, H., Sun, X., Qian, Q., Li, H., Jin, R.: Zen-NAS: A zero-shot NAS for high-performance image recognition. In: ICCV, pp. 347–356 (2021)

  13. [13] Mellor, J., Turner, J., Storkey, A., Crowley, E.J.: Neural architecture search without training. In: ICML, PMLR vol. 139, pp. 7588–7598 (2021), https://proceedings.mlr.press/v139/mellor21a.html

  14. [14] Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2014)

  15. [15] Moor, M., Banerjee, O., Abad, Z.S.H., Krumholz, H.M., Leskovec, J., Topol, E.J., Rajpurkar, P.: Foundation models for generalist medical artificial intelligence. Nature 616(7956), 259–265 (2023)

  16. [16] Ouyang, D., He, B., Ghorbani, A., Yuan, N., Ebinger, J., Langlotz, C.P., Heidenreich, P.A., Harrington, R.A., Liang, D.H., Ashley, E.A., et al.: Video-based AI for beat-to-beat assessment of cardiac function. Nature 580(7802), 252–256 (2020)

  17. [17] Peng, Y., Song, A., Fayek, H.M., Ciesielski, V., Chang, X.: SWAP-NAS: Sample-wise activation patterns for ultra-fast NAS. In: ICLR (2024), https://openreview.net/forum?id=tveiUXU2aa

  18. [18] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI 2015, pp. 234–241. Springer (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  19. [19] Tanaka, H., Kunin, D., Yamins, D.L.K., Ganguli, S.: Pruning neural networks without any data by iteratively conserving synaptic flow. In: NeurIPS (2020). https://arxiv.org/abs/2006.05467

  20. [20] Tang, F., Ding, J., Quan, Q., Wang, L., Ning, C., Zhou, S.K.: CMUNeXt: An efficient medical image segmentation network based on large kernel and skip fusion. In: ISBI 2024, pp. 1–5 (2024). https://doi.org/10.1109/ISBI56570.2024.10635609

  21. [21] Valanarasu, J.M.J., Patel, V.M.: UNeXt: MLP-based rapid medical image segmentation network. In: MICCAI 2022, pp. 23–33. Springer Nature Switzerland (2022). https://doi.org/10.1007/978-3-031-16443-9_3