Submanifold Sparse Convolutional Networks for Automated 3D Segmentation of Kidneys and Kidney Tumours in Computed Tomography
Pith reviewed 2026-05-18 00:58 UTC · model grok-4.3
The pith
A two-stage sparse convolutional network matches or exceeds patch-based baselines for kidney tumor segmentation in CT while cutting VRAM use by up to 75 percent and inference time by up to 60 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that submanifold sparse convolutional networks arranged in a two-stage pipeline reach Dice similarity coefficients of 95.8 percent for kidneys plus masses, 85.7 percent for tumours plus cysts, and 80.3 percent for tumours alone on the KiTS23 renal cancer CT dataset. These results are competitive with top challenge entries and, for the two tumour targets, comparable to or slightly higher than a patch-based nnU-Net baseline. Relative to an equivalent dense implementation, the sparse pipeline also achieves up to 60 percent shorter inference time and up to 75 percent lower VRAM usage across the tested CPU and GPU hardware.
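For readers unfamiliar with the metric, the following is a minimal sketch of the Dice similarity coefficient, computed on sets of voxel coordinates. The function name `dice_coefficient` and the convention that two empty masks score 1.0 are assumptions made for illustration; the paper's own evaluation code is not shown in the source.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two sets of (z, y, x) voxel indices."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


# Toy example: 4 predicted voxels, 4 reference voxels, 3 in common.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
truth = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
score = dice_coefficient(pred, truth)
print(score)  # 2 * 3 / (4 + 4) = 0.75
```

A reported score of 80.3 percent for tumours alone therefore means that, on average, twice the overlap is about 80 percent of the combined size of prediction and reference.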
What carries the argument
A two-stage sparse segmentation pipeline: a low-resolution submanifold sparse network first identifies a region of interest (ROI), and a high-resolution submanifold sparse network then refines the segmentation inside the cropped ROI.
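The hand-off between the two stages can be sketched as a bounding-box computation. This is an illustrative sketch only: the function `roi_from_coarse_mask`, the isotropic integer `scale` factor, and the `margin` padding are all hypothetical, since the source does not describe how the ROI is derived from the stage-one output.

```python
def roi_from_coarse_mask(coarse_voxels, scale, margin, full_shape):
    """Map a coarse foreground mask to a padded bounding box at full resolution.

    coarse_voxels: iterable of (z, y, x) indices flagged by the coarse stage
    scale:         integer downsampling factor between the grids (assumed isotropic)
    margin:        padding in full-resolution voxels (hypothetical parameter)
    full_shape:    (Z, Y, X) size of the full-resolution volume
    Returns one half-open (start, stop) range per axis, or None for an empty mask.
    """
    coarse_voxels = list(coarse_voxels)
    if not coarse_voxels:
        return None
    roi = []
    for axis, size in enumerate(full_shape):
        lo = min(v[axis] for v in coarse_voxels) * scale - margin
        hi = (max(v[axis] for v in coarse_voxels) + 1) * scale + margin
        # Clip the padded box to the volume bounds.
        roi.append((max(lo, 0), min(hi, size)))
    return tuple(roi)


roi = roi_from_coarse_mask(
    [(2, 3, 3), (3, 4, 5)], scale=4, margin=2, full_shape=(16, 32, 32)
)
print(roi)  # ((6, 16), (10, 22), (10, 26))
```

The second stage would then run only on voxels inside these ranges, which is where the memory saving of the crop comes from.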
If this is right
- Native high-resolution 3D processing of entire CT volumes becomes feasible without downsampling or patch-based workarounds.
- Segmentation accuracy stays at or above the level of a standard patch-based nnU-Net on the same KiTS23 cross-validation folds.
- VRAM consumption drops by as much as 75 percent and inference time by as much as 60 percent relative to a dense version of the same architecture.
- The method outperforms a zero-shot foundation model on small heterogeneous lesions while still localizing kidneys reliably.
Where Pith is reading between the lines
- The same sparse two-stage pattern could be applied to other large-volume 3D medical segmentation tasks where memory limits currently force downsampling.
- Efficiency gains might allow on-premise or edge deployment of high-resolution models in settings without high-end GPUs.
- Combining the ROI localization stage with multi-modal inputs or uncertainty estimates could further improve robustness on variable lesion sizes.
Load-bearing premise
The low-resolution first stage always produces a region of interest that fully contains every kidney and tumor voxel, including small or peripherally located lesions.
What would settle it
A single test volume in which the stage-one ROI excludes part of a tumor, causing the stage-two network to output an incomplete or zero segmentation for that lesion.
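Such a check could be run per case as a simple coverage measurement: the fraction of reference tumour voxels that fall inside the stage-one ROI. The sketch below is an assumption about how one might measure this; `roi_coverage` and the half-open box convention are hypothetical and do not come from the paper.

```python
def roi_coverage(truth_voxels, roi):
    """Fraction of reference voxels inside a half-open ROI box.

    truth_voxels: iterable of (z, y, x) reference voxel indices
    roi:          one (start, stop) range per axis, stop exclusive
    """
    truth_voxels = list(truth_voxels)
    if not truth_voxels:
        return 1.0  # nothing to miss
    inside = sum(
        all(start <= v[axis] < stop for axis, (start, stop) in enumerate(roi))
        for v in truth_voxels
    )
    return inside / len(truth_voxels)


roi = ((6, 16), (10, 22), (10, 26))
# The last voxel lies just outside the box on the y axis (22 is excluded).
tumour = [(8, 12, 12), (8, 12, 13), (8, 21, 25), (8, 22, 25)]
coverage = roi_coverage(tumour, roi)
print(coverage)  # 0.75
```

Any case with coverage below 1.0 is one where the second stage cannot recover the full lesion, whatever its own accuracy.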
Original abstract
Accurate delineation of kidney tumours in Computed Tomography (CT) is essential for downstream quantitative analysis and precision oncology, but manual segmentation is a specialised task, time-consuming and difficult to scale. Automated 3D segmentation remains challenging because CT scans are large volumetric images, making high-resolution dense convolutional networks computationally expensive and often dependent on downsampling or patch-based inference. We propose a two-stage 3D segmentation methodology based on voxel sparsification and submanifold sparse convolutional networks (SSCNs). Stage 1 uses a low-resolution sparse network to identify a region of interest (ROI); Stage 2 applies a high-resolution sparse network for refined segmentation within the cropped ROI. This enables native high-resolution 3D processing while reducing memory use and inference time. We evaluate the method on the KiTS23 renal cancer CT dataset using 5-fold cross-validation. Our method achieved Dice similarity coefficients of 95.8% for kidneys + masses, 85.7% for tumours + cysts, and 80.3% for tumours alone, competitive with top KiTS23 approaches. In direct comparisons on the same cross-validation folds, the proposed sparse method achieves tumour + cyst and tumour-only Dice scores comparable to, and slightly higher than, a patch-based nnU-Net baseline, while consistently requiring less VRAM and shorter inference time across the tested hardware. Across the tested GPUs, our sparse model is markedly faster than both nnU-Net and the zero-shot zoom-out/zoom-in foundation model SegVol, which localises kidneys well but underperforms on small heterogeneous lesions. Compared to an equivalent dense implementation of the same architecture, the proposed sparse approach achieves up to a 60% reduction in inference time and up to a 75% reduction in VRAM usage across both CPU and the GPU configurations tested.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-stage 3D segmentation pipeline for kidneys and kidney tumours in CT using submanifold sparse convolutional networks (SSCNs). Stage 1 employs a low-resolution sparse network to localize an ROI; Stage 2 applies a high-resolution sparse network for refinement inside the cropped ROI. On the KiTS23 dataset with 5-fold cross-validation, it reports Dice scores of 95.8% (kidneys + masses), 85.7% (tumours + cysts), and 80.3% (tumours alone). The tumour + cyst and tumour-only scores are claimed to be comparable to or slightly higher than a patch-based nnU-Net baseline on the same folds. The paper also claims up to 60% shorter inference time and up to 75% lower VRAM usage than an equivalent dense implementation, and faster GPU inference than both nnU-Net and SegVol.
Significance. If the ROI-coverage assumption holds, the work supplies concrete empirical support for efficiency gains in high-resolution 3D medical segmentation through direct side-by-side measurements of Dice, runtime, and VRAM on identical cross-validation folds. The explicit multi-target Dice reporting and hardware-specific comparisons constitute a strength that would be useful for practitioners facing memory constraints on large volumetric CT data.
major comments (1)
- [Methods (two-stage pipeline) and Results (Dice reporting)] The central efficiency and accuracy claims rest on the premise that the low-resolution stage-1 sparse network produces an ROI containing every kidney and tumor voxel (including small or peripheral lesions). No stage-1 recall, missed-lesion count, or per-case ROI coverage statistics are reported in the methods or results, so the final Dice scores cannot be interpreted as guaranteed full-volume performance. This assumption is load-bearing for the two-stage design.
minor comments (1)
- [Abstract and Methods] The abstract and results would benefit from a brief statement of the exact voxel sparsity thresholds or submanifold convolution parameters used in each stage for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. The comment raises an important consideration for the two-stage pipeline, which we address below. We have made revisions to incorporate additional supporting analysis as suggested.
Point-by-point responses
- Referee: [Methods (two-stage pipeline) and Results (Dice reporting)] The central efficiency and accuracy claims rest on the premise that the low-resolution stage-1 sparse network produces an ROI containing every kidney and tumor voxel (including small or peripheral lesions). No stage-1 recall, missed-lesion count, or per-case ROI coverage statistics are reported in the methods or results, so the final Dice scores cannot be interpreted as guaranteed full-volume performance. This assumption is load-bearing for the two-stage design.
  Authors: We agree that the central claims depend on the stage-1 network producing an ROI that encompasses all kidney and tumor voxels. The manuscript does not currently report stage-1 recall or per-case ROI coverage statistics, which limits the ability to fully interpret the Dice scores as guaranteed full-volume results. To rectify this, we will add in the revised manuscript a new subsection under Methods describing how we evaluate stage-1 ROI coverage, along with quantitative results in the Results section, including recall rates and any instances of missed lesions. This will allow readers to assess the validity of the assumption directly.
  Revision: yes
Circularity Check
No significant circularity; empirical evaluation only
The paper proposes a practical two-stage sparse CNN architecture for 3D CT segmentation and reports Dice scores from 5-fold cross-validation on KiTS23, with direct runtime/VRAM comparisons to nnU-Net and SegVol. No mathematical derivations, parameter fits redefined as predictions, uniqueness theorems, or self-citation chains appear in the described method or results. All performance claims rest on external measurement against held-out data and independent baselines rather than any reduction to the method's own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Submanifold sparse convolutions preserve segmentation accuracy when applied to medical CT volumes at native resolution.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: voxel sparsification... 0.5th and 99.5th percentiles of the HU values... threshold range of (-53.4, 283.2) HU... retains approximately 99% of segmentation voxels while removing 76.8% of background voxels
- IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: two-stage... Stage 1 uses a low-resolution sparse network to identify a region of interest (ROI); Stage 2 applies a high-resolution sparse network
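The voxel sparsification passage quoted above can be illustrated with a short sketch that keeps only voxels inside the quoted HU range and stores them as coordinate and feature lists, the usual input form for sparse convolution libraries. The function `sparsify`, the strict inequality at the range boundaries, and the toy HU values are assumptions; the source does not say how the percentiles were computed or which library is used.

```python
HU_MIN, HU_MAX = -53.4, 283.2  # threshold range quoted in the paper passage


def sparsify(volume, lo=HU_MIN, hi=HU_MAX):
    """Return (coords, feats) for voxels whose HU value lies strictly inside (lo, hi).

    volume: nested lists indexed as volume[z][y][x], values in Hounsfield units.
    """
    coords, feats = [], []
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, hu in enumerate(row):
                if lo < hu < hi:
                    coords.append((z, y, x))
                    feats.append(hu)
    return coords, feats


# Toy 1 x 2 x 4 volume with illustrative HU values (air, fat, soft tissue, bone).
volume = [[[-1000.0, -100.0, 40.0, 120.0],
           [30.0, 700.0, -1000.0, 250.0]]]
coords, feats = sparsify(volume)
print(coords)           # [(0, 0, 2), (0, 0, 3), (0, 1, 0), (0, 1, 3)]
print(len(coords) / 8)  # 0.5 of the voxels are kept
```

Only the retained coordinates and their values are passed on, so memory scales with the number of active voxels rather than with the full volume size.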
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation
  SEMIR replaces dense voxel computation with a learned topology-preserving graph minor that supports exact decoding and GNN-based inference for small-structure segmentation in large medical images.