pith. machine review for the scientific record.

arxiv: 2605.09639 · v2 · submitted 2026-05-10 · 📡 eess.IV · cs.CV

Recognition: no theorem link

XTinyU-Net: Training-Free U-Net Scaling via Initialization-Time Sensitivity

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:35 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords U-Net scaling · training-free compression · Jacobian sensitivity · medical segmentation · model efficiency · initialization analysis

The pith

Jacobian sensitivity at initialization identifies the smallest stable U-Net configuration for medical image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for selecting ultralight U-Net models without training them. It observes that scaling down channel widths causes a sharp drop in performance past a certain point. By measuring the total variation of a Jacobian-based sensitivity score computed on unlabeled images, it locates the boundary between stable and collapsed performance. The resulting XTinyU-Net matches full nnU-Net accuracy on six datasets while using 400x to 1600x fewer parameters. The approach avoids exhaustive training trials when searching for efficient models.

Core claim

XTinyU-Net is the smallest width-capped U-Net variant whose Jacobian sensitivity curve at initialization shows low total variation, indicating it remains in the stable performance plateau after full training. Across six medical datasets in the nnU-Net framework, it delivers segmentation accuracy comparable to the standard heavy nnU-Net while requiring 400x to 1600x fewer parameters.

What carries the argument

The Jacobian-based sensitivity metric computed at initialization on unlabeled images, whose total variation is used to detect the transition from stable to collapsed representational capacity.
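The review does not reproduce the paper's exact metric definition, so the following is a hypothetical sketch of one plausible reading: "Jacobian sensitivity at initialization" approximated as the mean Frobenius norm of the input-output Jacobian of a randomly initialized network, estimated by finite differences on a few unlabeled samples. The toy two-layer net, the dimensions, and the finite-difference estimator are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(width, in_dim=16, out_dim=4):
    """Randomly initialized 2-layer net standing in for a width-capped U-Net
    (illustrative stand-in; the paper uses full U-Net variants)."""
    w1 = rng.normal(0, np.sqrt(2 / in_dim), (width, in_dim))
    w2 = rng.normal(0, np.sqrt(2 / width), (out_dim, width))
    return lambda x: np.tanh(w2 @ np.maximum(w1 @ x, 0.0))

def jacobian_sensitivity(net, samples, eps=1e-4):
    """Mean Frobenius norm of the finite-difference input-output Jacobian,
    averaged over unlabeled samples -- no labels or training involved."""
    norms = []
    for x in samples:
        # central differences along each input coordinate
        cols = [(net(x + eps * e) - net(x - eps * e)) / (2 * eps)
                for e in np.eye(len(x))]
        norms.append(np.linalg.norm(np.stack(cols, axis=1)))
    return float(np.mean(norms))

# a small set of "unlabeled images", flattened
samples = [rng.normal(size=16) for _ in range(8)]
# score each discrete width-capped variant at initialization
scores = {w: jacobian_sensitivity(make_net(w), samples) for w in (2, 4, 8, 16, 32)}
```

Sweeping this score across the discrete width-capped variants produces the sensitivity curve whose total variation the paper then analyzes.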

If this is right

  • XTinyU-Net can be deployed in resource-limited medical imaging environments.
  • It outperforms other lightweight architectures with 5x-72x fewer parameters.
  • The framework allows dataset-specific model selection at initialization time.
  • It reduces the need for compute-intensive hyperparameter searches for U-Net scaling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar sensitivity analysis might apply to other encoder-decoder architectures beyond U-Net.
  • Using only unlabeled images suggests the method could work in semi-supervised settings.
  • Extending to other compression techniques like pruning could be tested.

Load-bearing premise

The total variation of the Jacobian-based sensitivity curve computed at initialization on unlabeled images accurately locates the boundary between stable performance and representational collapse after full training.
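The review names total variation as the boundary detector but not the exact selection algorithm, so here is a hedged sketch of one plausible rule: treat the largest jump in the sensitivity-vs-width curve as the collapse boundary and select the smallest width on its stable side. The function name, the toy curve, and the largest-jump heuristic are assumptions for illustration only.

```python
def select_smallest_stable(widths, scores):
    """widths ascending; scores[i] is the init-time sensitivity at widths[i].
    Returns (selected width, total variation of the curve)."""
    jumps = [abs(scores[i + 1] - scores[i]) for i in range(len(scores) - 1)]
    total_variation = sum(jumps)           # sum of successive |differences|
    boundary = jumps.index(max(jumps))     # collapse regime sits below this gap
    return widths[boundary + 1], total_variation

# Toy curve: erratic high-sensitivity collapse regime below width 16,
# flat stable plateau from width 16 upward (illustrative numbers).
widths = [2, 4, 8, 16, 32, 64]
scores = [9.1, 7.8, 6.5, 1.2, 1.1, 1.0]
xtiny_width, tv = select_smallest_stable(widths, scores)
# xtiny_width == 16: the smallest configuration on the stable side
```

If the premise holds, the width chosen this way at initialization coincides with the smallest variant that trains to plateau accuracy.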

What would settle it

Train the selected XTinyU-Net and the next larger width variant on one dataset; if the smaller model shows clear accuracy loss relative to the larger one after full training, the selection method is falsified.

Figures

Figures reproduced from arXiv: 2605.09639 by Alvin Kimbowa, David Liu, Ilker Hacihaliloglu, Moein Heidari.

Figure 1
Figure 1. U-Net width scaling curves (Dice vs. parameters) and corresponding initialization-time Jacobian sensitivity across datasets. The estimated collapse boundary separates stable and unstable regimes; XTinyU-Net is selected as the smallest configuration on the stable side.
Figure 2
Figure 2. Qualitative results on BUS-BRA (top row), EchoNet-Dynamic (middle row), and ISIC 2018 (bottom row). XTinyU-Net achieves segmentation performance on par with nnU-Net across datasets despite using over 400x–1600x fewer parameters. For instance, on BUS-BRA, XTinyU-Net achieves 90.64% Dice compared to 89.90% for nnU-Net while reducing parameters from 33,472k to 20.39k (over 1600× fewer parameters).
Figure 3
Figure 3. Qualitative results on FiVES, BraTS2020, and ACDC (bottom row). XTinyU-Net achieves high-quality segmentation performance.
Figure 4
Figure 4. Ablation study on EchoNet-Dynamic showing robustness of the proposed Jacobian-based collapse detection to batch size and random initialization, and comparison with existing NAS metrics. Across random seeds, the selected architectures cluster tightly around two adjacent configurations and achieve competitive segmentation performance.
Original abstract

While U-Net architectures remain the gold standard for medical image segmentation, their deployment in resource-constrained environments demands aggressive model compression. However, finding an optimally efficient configuration is computationally prohibitive, typically requiring exhaustive train-and-evaluate cycles to find the smallest model that maintains peak performance. In this paper, we introduce a training-free selection framework to automatically identify ultralightweight, dataset-specific U-Net configurations directly at initialization. We observe that systematically scaling down U-Net channel width induces a sharp transition from a stable performance plateau to representational capacity collapse. To pinpoint this boundary without training, we propose a Jacobian-based sensitivity metric that scores discrete, width-capped U-Net variants using a small set of unlabeled images. By analyzing the total variation of this sensitivity curve, we isolate the smallest stable configuration, which we denote as XTinyU-Net. Evaluated across six diverse medical datasets within the nnU-Net framework, XTinyU-Net achieves segmentation accuracy comparable to the heavy nnU-Net baseline with 400x-1600x fewer parameters, and outperforms contemporary lightweight architectures while utilizing 5x-72x fewer parameters. Code is publicly accessible on https://github.com/alvinkimbowa/nntinyunet.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces XTinyU-Net, a training-free framework that selects the smallest stable channel-width U-Net configuration for medical image segmentation by computing a Jacobian-based sensitivity metric at initialization on unlabeled images and locating the plateau-to-collapse transition via total variation of the resulting curve. It claims this yields dataset-specific models that match nnU-Net accuracy with 400x–1600x fewer parameters and outperform other lightweight architectures with 5x–72x fewer parameters across six diverse medical datasets inside the nnU-Net pipeline.

Significance. If the initialization-time metric reliably predicts post-training stability, the approach would remove the need for exhaustive train-and-evaluate searches when compressing U-Nets, enabling rapid deployment of ultralight models in resource-constrained clinical settings. Public code release supports reproducibility.

major comments (3)
  1. [§3] §3 (sensitivity metric definition): the claim that total variation of the Jacobian sensitivity curve at initialization identifies the stable plateau boundary is presented without derivation or monotonicity argument; the metric appears chosen empirically, yet it is load-bearing for the entire training-free selection procedure.
  2. [§4.3, Table 2] §4.3 and Table 2: reported Dice/Hausdorff values for XTinyU-Net versus nnU-Net baseline lack error bars, standard deviations across runs, or statistical tests, so the assertion of 'comparable' accuracy cannot be verified from the presented data.
  3. [§5] §5 (cross-dataset validation): no per-dataset scatter plots or correlation coefficients are shown between the init-time total-variation values and final segmentation metrics, leaving the key assumption that the relationship is monotonic and dataset-independent untested.
minor comments (2)
  1. The exact parameter counts and FLOPs for each of the six datasets should be tabulated alongside the reduction factors to allow direct comparison.
  2. Sensitivity curves in the figures would benefit from explicit markers indicating the selected XTinyU-Net width and the location of the total-variation threshold.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments point by point below, indicating the revisions we plan to make to strengthen the paper.

Point-by-point responses
  1. Referee: [§3] §3 (sensitivity metric definition): the claim that total variation of the Jacobian sensitivity curve at initialization identifies the stable plateau boundary is presented without derivation or monotonicity argument; the metric appears chosen empirically, yet it is load-bearing for the entire training-free selection procedure.

    Authors: We acknowledge that the use of total variation to detect the plateau-to-collapse transition is motivated by empirical observations of the sensitivity curve's behavior rather than a formal derivation. In the revised manuscript, we will expand the discussion in §3 to provide a more detailed justification for this choice, including an analysis of the curve's properties and why total variation is suitable for identifying the boundary. While a complete theoretical proof of monotonicity may not be feasible within the scope of this work, we believe this addition will clarify the rationale and address the concern. revision: partial

  2. Referee: [§4.3, Table 2] §4.3 and Table 2: reported Dice/Hausdorff values for XTinyU-Net versus nnU-Net baseline lack error bars, standard deviations across runs, or statistical tests, so the assertion of 'comparable' accuracy cannot be verified from the presented data.

    Authors: This observation is correct, and we agree that including measures of variability and statistical analysis would strengthen the claims. We will perform additional experiments with multiple random initializations (at least 3-5 runs per dataset) to compute standard deviations and include error bars in the updated Table 2. Additionally, we will include appropriate statistical tests (e.g., Wilcoxon signed-rank test) to support the comparability of results. These changes will be incorporated in the revised version. revision: yes

  3. Referee: [§5] §5 (cross-dataset validation): no per-dataset scatter plots or correlation coefficients are shown between the init-time total-variation values and final segmentation metrics, leaving the key assumption that the relationship is monotonic and dataset-independent untested.

    Authors: We agree that visualizing and quantifying the correlation would provide stronger support for the method's generalizability. In the revised manuscript, we will add per-dataset scatter plots in §5 illustrating the relationship between the initialization-time total variation metric and the final Dice/Hausdorff scores. We will also report Pearson or Spearman correlation coefficients for each dataset to demonstrate the monotonicity of the relationship. revision: yes
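The rebuttal commits to reporting Spearman correlation between the init-time total-variation values and final segmentation metrics. A minimal dependency-free sketch of that check, with purely hypothetical numbers (not from the paper), might look like:

```python
def rank(xs):
    """1-based ranks; assumes no ties (sufficient for this illustration)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1.0
    return r

def spearman(xs, ys):
    """Spearman rho via the rank-difference formula (no ties)."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

tv_values = [8.1, 6.9, 7.4, 3.2, 4.0, 5.5]        # hypothetical per-config TV
dice      = [0.62, 0.70, 0.66, 0.90, 0.88, 0.80]  # hypothetical final Dice
rho = spearman(tv_values, dice)  # strongly negative: high TV, low final Dice
```

A consistently negative rho per dataset would support the monotonicity assumption the referee asks about; production analyses would typically use `scipy.stats.spearmanr`, which also handles ties.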

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained

full rationale

The paper computes its Jacobian-based sensitivity metric directly from the untrained network weights and a small set of unlabeled images, then uses total variation of that curve to locate the stable-to-collapse boundary. This step does not invoke any fitted parameters derived from post-training Dice scores, does not rename a known empirical pattern, and contains no load-bearing self-citations or uniqueness theorems imported from prior author work. The selection rule is therefore independent of the target segmentation performance it is later validated against, satisfying the criteria for a non-circular, externally falsifiable procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested domain assumption that initialization-time Jacobian sensitivity predicts post-training stability; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Jacobian sensitivity at initialization correlates with post-training representational capacity for width-scaled U-Nets
    This correlation is required for the training-free selection to work but is not demonstrated in the provided abstract.

pith-pipeline@v0.9.0 · 5525 in / 1194 out tokens · 57944 ms · 2026-05-15T05:35:49.748971+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 2 internal anchors

  1. [1] Azad, R., Heidokoohi, A., Baseri, S., et al.: Medical image segmentation review: The success of U-Net. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  2. [2] Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P.A., Cetin, I., Lekadir, K., Camara, O., Ballester, M.A.G., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Transactions on Medical Imaging 37(11), 2514–2525 (2018)

  3. [3] Chen, J., Chen, R., Wang, W., Cheng, J., Zhang, L., Chen, L.: TinyU-Net: Lighter yet Better U-Net with Cascaded Multi-Receptive Fields. In: MICCAI 2024, LNCS 15009. Springer Nature Switzerland (2024). https://doi.org/10.1007/978-3-031-72114-4_60

  4. [4] Chen, W., Gong, X., Wang, Z.: Neural architecture search on ImageNet in four GPU hours: A theoretically inspired perspective. In: ICLR (2021), https://openreview.net/forum?id=6z_BEpN6Y0

  5. [5] Codella, N., Rotemberg, V., Tschandl, P., Celebi, M.E., Dusza, S., Gutman, D., Helba, B., Kalloo, A., Liopyris, K., Marchetti, M., et al.: Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1902.03368 (2019)

  6. [6] Gómez-Flores, W., Gregorio-Calas, M.J., Coelho de Albuquerque Pereira, W.: BUS-BRA: A breast ultrasound dataset for assessing computer-aided diagnosis systems. Medical Physics 51(4), 3110–3123 (2024)

  7. [7] Hassler, T., Åkerholm, I., Nordström, M., Balletti, G., Goksel, O.: Lean UNet: A compact model for image segmentation (2025). https://doi.org/10.48550/arXiv.2512.03834

  8. [8] Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021). https://doi.org/10.1038/s41592-020-01008-z

  9. [9] Jin, K., Huang, X., Zhou, J., Li, Y., Yan, Y., Sun, Y., Zhang, Q., Wang, Y., Ye, J.: FIVES: A fundus image dataset for artificial intelligence based vessel segmentation. Scientific Data 9(1), 475 (2022)

  10. [10] Kalkhof, J., Ihm, N., Köhler, T., Gregori, B., Mukhopadhyay, A.: Med-NCA: Bio-inspired medical image segmentation. Medical Image Analysis 103, 103601 (2025). https://doi.org/10.1016/j.media.2025.103601

  11. [11] Lee, N., Ajanthan, T., Torr, P.H.S.: SNIP: Single-shot network pruning based on connection sensitivity. In: ICLR (2019). https://arxiv.org/abs/1810.02340

  12. [12] Lin, M., Wang, P., Sun, Z., Chen, H., Sun, X., Qian, Q., Li, H., Jin, R.: Zen-NAS: A zero-shot NAS for high-performance image recognition. In: ICCV, pp. 347–356 (2021)

  13. [13] Mellor, J., Turner, J., Storkey, A., Crowley, E.J.: Neural architecture search without training. In: ICML, PMLR vol. 139, pp. 7588–7598 (2021), https://proceedings.mlr.press/v139/mellor21a.html

  14. [14] Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2014)

  15. [15] Moor, M., Banerjee, O., Abad, Z.S.H., Krumholz, H.M., Leskovec, J., Topol, E.J., Rajpurkar, P.: Foundation models for generalist medical artificial intelligence. Nature 616(7956), 259–265 (2023)

  16. [16] Ouyang, D., He, B., Ghorbani, A., Yuan, N., Ebinger, J., Langlotz, C.P., Heidenreich, P.A., Harrington, R.A., Liang, D.H., Ashley, E.A., et al.: Video-based AI for beat-to-beat assessment of cardiac function. Nature 580(7802), 252–256 (2020)

  17. [17] Peng, Y., Song, A., Fayek, H.M., Ciesielski, V., Chang, X.: SWAP-NAS: Sample-wise activation patterns for ultra-fast NAS. In: ICLR (2024), https://openreview.net/forum?id=tveiUXU2aa

  18. [18] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI 2015, pp. 234–241. Springer (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  19. [19] Tanaka, H., Kunin, D., Yamins, D.L.K., Ganguli, S.: Pruning neural networks without any data by iteratively conserving synaptic flow. In: NeurIPS (2020). https://arxiv.org/abs/2006.05467

  20. [20] Tang, F., Ding, J., Quan, Q., Wang, L., Ning, C., Zhou, S.K.: CMUNeXt: An efficient medical image segmentation network based on large kernel and skip fusion. In: ISBI 2024, pp. 1–5 (2024). https://doi.org/10.1109/ISBI56570.2024.10635609

  21. [21] Valanarasu, J.M.J., Patel, V.M.: UNeXt: MLP-based rapid medical image segmentation network. In: MICCAI 2022, pp. 23–33. Springer Nature Switzerland (2022). https://doi.org/10.1007/978-3-031-16443-9_3