
arxiv: 2604.07141 · v1 · submitted 2026-04-08 · 💻 cs.CV

USCNet: Transformer-Based Multimodal Fusion with Segmentation Guidance for Urolithiasis Classification

Pith reviewed 2026-05-10 17:53 UTC · model grok-4.3

classification 💻 cs.CV
keywords urolithiasis · kidney stones · multimodal fusion · transformer · CT imaging · segmentation guidance · preoperative classification · electronic health records

The pith

A transformer network fuses CT images and health records with segmentation guidance to classify kidney stones before surgery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents USCNet as a way to classify the composition of kidney stones preoperatively by combining CT scans with data from electronic health records. It builds a transformer framework that adds CT-EHR attention and segmentation-guided attention modules, plus a dynamic loss that balances the segmentation and classification tasks. Experiments on the authors' in-house dataset show the model beats mainstream methods across standard metrics. A reader would care because current practice waits until after surgery to analyze stone samples, which delays tailored treatment and recurrence prevention. The work releases its code publicly.

Core claim

USCNet is a Transformer-based multimodal fusion framework with CT-EHR attention and segmentation-guided attention modules that employs a dynamic loss function to balance segmentation and classification objectives, achieving superior classification performance on an in-house kidney stone dataset compared to existing mainstream methods.

What carries the argument

The Transformer-based multimodal fusion framework that incorporates CT-EHR attention and segmentation-guided attention modules, balanced by a dynamic loss function.

Load-bearing premise

The in-house kidney stone dataset distribution matches real-world clinical cases across hospitals so the model generalizes without retraining or domain adaptation.

What would settle it

Evaluating the model on an external multi-hospital dataset collected under different imaging protocols or patient demographics would show whether the reported classification gains hold.
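As a rough sketch of what such an external check would compute, the snippet below scores binary predictions on a held-out cohort with accuracy, F1, and a rank-based AUC. The helper name `binary_metrics`, the 0.5 decision threshold, and the binary framing (for example, infectious versus non-infectious stones) are assumptions made for illustration; the labels and scores in the usage lines are toy values, not results from the paper.

```python
def binary_metrics(labels, scores, threshold=0.5):
    """Accuracy, F1, and AUC for binary labels (0/1) and real-valued scores.

    ASSUMPTION: a fixed 0.5 threshold; a real study would choose and
    report its own operating point.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)

    accuracy = (tp + tn) / len(labels)
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    # AUC as the probability that a random positive outranks a random
    # negative (ties count as half), i.e. the Mann-Whitney statistic.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "f1": f1, "auc": auc}


if __name__ == "__main__":
    # Toy external cohort, purely illustrative.
    print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Running the same function on the in-house test split and on the external cohort would give directly comparable numbers, which is the comparison the paragraph above calls for.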

Figures

Figures reproduced from arXiv: 2604.07141 by Ahmed Elazab, Changmiao Wang, Hai Wang, Jiexin Pan, Liya Liu, Nannan Li, Songqi Zhang, Xiang Wan, Xingzhi Li, Yifei Wang, Yi Jiang, Yongquan Zhang.

Figure 1
Figure 1. The architecture of USCNet: (a) the visual and textual transformation module, (b) the ViT-UNetSeg module, (c) the MSAF feature fusion module, (d) the classification module, (e) the Transformer encoder block, and (f) the cross-attention principle.
Figure 2
Figure 2. ROC curve comparison of diagnostic accuracy between USCNet and established methods for infectious urolithiasis classification.
Figure 3
Figure 3. Confusion matrices comparison of USCNet versus multimodal baseline methods, quantitatively evaluating classification performance for infectious urolithiasis diagnosis.
Figure 5
Figure 5. Empirical validation of the dynamic loss threshold through systematic sensitivity analysis.
Original abstract

Kidney stone disease ranks among the most prevalent conditions in urology, and understanding the composition of these stones is essential for creating personalized treatment plans and preventing recurrence. Current methods for analyzing kidney stones depend on postoperative specimens, which prevents rapid classification before surgery. To overcome this limitation, we introduce a new approach called the Urinary Stone Segmentation and Classification Network (USCNet). This innovative method allows for precise preoperative classification of kidney stones by integrating Computed Tomography (CT) images with clinical data from Electronic Health Records (EHR). USCNet employs a Transformer-based multimodal fusion framework with CT-EHR attention and segmentation-guided attention modules for accurate classification. Moreover, a dynamic loss function is introduced to effectively balance the dual objectives of segmentation and classification. Experiments on an in-house kidney stone dataset show that USCNet demonstrates outstanding performance across all evaluation metrics, with its classification efficacy significantly surpassing existing mainstream methods. This study presents a promising solution for the precise preoperative classification of kidney stones, offering substantial clinical benefits. The source code has been made publicly available: https://github.com/ZhangSongqi0506/KidneyStone.
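The cross-attention principle behind a CT-EHR attention module can be illustrated with plain scaled dot-product attention. The sketch below is a generic version, not the authors' module: it omits learned projections and multiple heads, and the choice of EHR tokens as queries over CT patch tokens as keys and values is an assumption, since this page does not specify the direction.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention on lists of vectors.

    ASSUMPTION: queries come from one modality (e.g. EHR tokens) and
    keys/values from the other (e.g. CT patch tokens); no learned
    projection matrices are applied in this sketch.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


if __name__ == "__main__":
    ehr_tokens = [[1.0, 0.0]]                # one hypothetical EHR token
    ct_tokens = [[1.0, 0.0], [0.0, 1.0]]     # two hypothetical CT patch tokens
    print(cross_attention(ehr_tokens, ct_tokens, ct_tokens))
```

Each output vector is a mixture of the second modality's features, weighted by how well they match the query, which is how clinical variables could steer attention toward relevant image regions.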

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces USCNet, a Transformer-based multimodal fusion network that combines CT images with EHR clinical data using CT-EHR attention and segmentation-guided attention modules, along with a dynamic loss function balancing segmentation and classification objectives. It claims to enable accurate preoperative kidney stone composition classification and reports outstanding performance surpassing mainstream methods on an in-house dataset, with public code release.

Significance. If the empirical superiority holds under rigorous validation, the work could advance preoperative urolithiasis management by reducing reliance on postoperative analysis, enabling personalized treatment plans. The multimodal transformer design with segmentation guidance and dynamic loss is a reasonable technical contribution, and the public code supports reproducibility.

major comments (2)
  1. [Experiments] Experiments section: The central claim of 'outstanding performance across all evaluation metrics' and 'significantly surpassing existing mainstream methods' rests entirely on a single in-house kidney stone dataset; no external validation, multi-center testing, public dataset evaluation, or domain-shift experiments are described, which is load-bearing for any assertion of clinical generalizability or preoperative utility.
  2. [Abstract] Abstract and §1: The performance assertions lack any quantitative metrics, baseline details, dataset size/composition/statistics, or statistical tests (e.g., p-values or confidence intervals), preventing assessment of whether reported gains are meaningful or architecture-driven versus dataset-specific.
minor comments (2)
  1. [Method] The description of the dynamic loss function and attention modules would benefit from explicit equations or pseudocode to clarify the balancing mechanism and fusion process.
  2. [Figures/Tables] Figure captions and table headers should explicitly state the evaluation metrics used (e.g., accuracy, AUC, Dice) and the exact baselines compared.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive feedback. We address the major comments point by point below. Revisions have been made to the abstract, introduction, and experiments section to improve transparency and acknowledge limitations.

Point-by-point responses
  1. Referee: [Abstract] Abstract and §1: The performance assertions lack any quantitative metrics, baseline details, dataset size/composition/statistics, or statistical tests (e.g., p-values or confidence intervals), preventing assessment of whether reported gains are meaningful or architecture-driven versus dataset-specific.

    Authors: We agree that the abstract and introduction would benefit from explicit quantitative support. In the revised manuscript we have added key metrics (accuracy, AUC, F1-score for USCNet and all baselines), dataset statistics (number of patients, CT scans, stone composition distribution), and results of paired statistical tests (p-values < 0.01) comparing USCNet against the strongest baselines. These additions allow readers to assess the magnitude and significance of the reported gains directly. revision: yes

  2. Referee: [Experiments] Experiments section: The central claim of 'outstanding performance across all evaluation metrics' and 'significantly surpassing existing mainstream methods' rests entirely on a single in-house kidney stone dataset; no external validation, multi-center testing, public dataset evaluation, or domain-shift experiments are described, which is load-bearing for any assertion of clinical generalizability or preoperative utility.

    Authors: We acknowledge that all quantitative results are derived from a single in-house dataset and that external or multi-center validation is absent. We have expanded the experiments section with additional dataset characterization and inserted a new limitations paragraph that explicitly discusses the lack of external validation, potential domain shift, and the practical difficulties of obtaining paired CT-EHR data across institutions. The released code enables others to perform such tests on their own data. We do not claim clinical generalizability beyond the reported cohort. revision: partial
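The first response refers to paired statistical tests without naming one. As one common option for comparing two classifiers on the same test cases, the sketch below implements an exact McNemar test; the choice of test and the function name `mcnemar_exact` are assumptions for illustration, not a description of what the authors ran.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on paired cases.

    ASSUMPTION: McNemar's test is an illustrative choice; the rebuttal
    does not say which paired test was used.
    """
    # Discordant pairs: exactly one of the two models is correct.
    a_only = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a == y and b != y)
    b_only = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a != y and b == y)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the models never disagree on correctness
    # Under the null, discordant pairs split 50/50 (binomial with p = 0.5).
    tail = sum(comb(n, k) for k in range(min(a_only, b_only) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    truth = [1] * 10
    model_a = [1] * 10   # toy predictions, always correct
    model_b = [0] * 10   # toy predictions, always wrong
    print(mcnemar_exact(truth, model_a, model_b))
```

Only the cases where the two models disagree carry information here, so a small test set with few discordant pairs can yield a large p-value even when the accuracy gap looks substantial.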

standing simulated objections not resolved
  • Absence of external validation, multi-center testing, or public-dataset evaluation, as no such data were available for this study.

Circularity Check

0 steps flagged

No circularity; empirical claims rest on held-out test performance

Full rationale

The paper introduces USCNet as a multimodal architecture and reports classification metrics on an in-house kidney-stone dataset. No equations, derivations, or parameter-fitting steps are described that would reduce any reported result to its own inputs by construction. Performance is measured on held-out data in the standard supervised-learning manner; the central claim does not rely on self-citations, uniqueness theorems, or ansatzes imported from prior author work. The absence of external validation is a generalizability concern, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated beyond standard deep-learning training practices.

pith-pipeline@v0.9.0 · 5530 in / 981 out tokens · 32405 ms · 2026-05-10T17:53:23.962970+00:00 · methodology



Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    Epidemiological trends of urolithiasis at the global, regional, and national levels: a population-based study,

    X. Qian, J. Wan, J. Xu, C. Liu, M. Zhong, J. Zhang, Y. Zhang, and S. Wang, “Epidemiological trends of urolithiasis at the global, regional, and national levels: a population-based study,” International Journal of Clinical Practice, Mar. 2022

  2. [2]

    Epidemiological research progress on urological stones and stone composition,

    X. W. Lin, Y. H. Luo, J. L. Li, and B. Y. Zhang, “Epidemiological research progress on urological stones and stone composition,” Asian Journal of Clinical Medicine, vol. 4, no. 2, p. 49, Mar. 2021

  3. [3]

    Prevalence of kidney stones in mainland china: A systematic review,

    W. Wang, J. Fan, G. Huang, J. Li, X. Zhu, Y. Tian, and L. Su, “Prevalence of kidney stones in mainland china: A systematic review,” Scientific Reports, vol. 7, no. 1, p. 41630, Jan. 2017

  4. [4]

    Stone composition pattern of kidney stone,

    A. Alpendri and H. R. Danarto, “Stone composition pattern of kidney stone,” Jurnal Urologi Indonesia, vol. 20, no. 1, p. 44, Jan. 2013

  5. [5]

    Research advances of ct and ai technology in predicting the composition of urinary calculi,

    B. Yang, D. Wang, Y. Zhou, G. Zhou, C. Wan, J. Xu, and J. Liu, “Research advances of ct and ai technology in predicting the composition of urinary calculi,” Journal of Clinical Urology, vol. 38, no. 2, pp. 139–145, Feb. 2023

  6. [6]

    Eau guidelines on diagnosis and conservative management of urolithiasis,

    C. Türk, A. Petřík, K. Sarica, C. Seitz, A. Skolarikos, M. Straub, and T. Knoll, “Eau guidelines on diagnosis and conservative management of urolithiasis,” European Urology, vol. 69, no. 3, pp. 468–474, Mar. 2016

  7. [7]

    Medical management of kidney stones: Aua guideline,

    M. S. Pearle, D. S. Goldfarb, D. G. Assimos, G. Curhan, C. J. Denu-Ciocca, B. R. Matlaga, M. Monga, K. L. Penniston, G. M. Preminger, T. M. Turk et al., “Medical management of kidney stones: Aua guideline,” Journal of Urology, vol. 192, no. 2, pp. 316–324, Aug. 2014

  8. [8]

    Kidney stone prediction based on urine analysis using ensemble learning,

    S. Gayathri, J. Gowthami, S. Jayavarshini, K. Karthika, and K. Nandhini, “Kidney stone prediction based on urine analysis using ensemble learning,” 2025 4th OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 5.0, Apr. 2025

  9. [9]

    What is the state of the stone analysis techniques in urolithiasis?

    A. Basiri, M. Taheri, and F. Taheri, “What is the state of the stone analysis techniques in urolithiasis?” Urology Journal, vol. 9, no. 2, pp. 445–454, May 2012

  10. [10]

    Deep learning for medical image processing: Overview, challenges and the future,

    M. I. Razzak, S. Naz, and A. Zaib, “Deep learning for medical image processing: Overview, challenges and the future,” Classification in BioApps: Automation of Decision Making, pp. 323–350, Nov. 2017

  11. [11]

    New and evolving concepts in the imaging and management of urolithiasis: urologists’ perspective,

    A. R. Kambadakone, B. H. Eisner, O. A. Catalano, and D. V. Sahani, “New and evolving concepts in the imaging and management of urolithiasis: urologists’ perspective,” Radiographics, vol. 30, no. 3, pp. 603–623, May 2010

  12. [12]

    Vision transformers, ensemble model, and transfer learning leveraging explainable ai for brain tumor detection and classification,

    S. Hossain, A. Chakrabarty, T. R. Gadekallu, M. Alazab, and M. J. Piran, “Vision transformers, ensemble model, and transfer learning leveraging explainable ai for brain tumor detection and classification,” IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 3, pp. 1261–1272, 2024

  13. [13]

    Hybrid neural network framework for multiclass classification of kidney stones from ct scans,

    M. Y. Sayed, Y. Waykar, B. Nepal, S. Y. Pabalkar, M. Vaishnavi, and P. K. Sudhakar, “Hybrid neural network framework for multiclass classification of kidney stones from ct scans,” 2025 6th International Conference for Emerging Technology (INCET), May 2025

  14. [14]

    Clinical-inspired framework for automatic kidney stone recognition and analysis on transverse ct images,

    W. Xu, C. Lai, Z. Mo, C. Liu, M. Li, G. Zhao, and K. Xu, “Clinical-inspired framework for automatic kidney stone recognition and analysis on transverse ct images,” IEEE Journal of Biomedical and Health Informatics, 2024

  15. [15]

    Advances on artificial intelligence in the diagnosis and treatment of urinary calculi,

    Z. H. A. N. Xiaofei, C. U. I. Zhenyu, and Z. H. A. O. Chunli, “Advances on artificial intelligence in the diagnosis and treatment of urinary calculi,” Journal of Hebei Medical College for Continuing Education, vol. 39, no. 4, p. 30, 2022

  16. [16]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

  17. [17]

    Imaging the Urinary Tract: Fundamentals of Ultrasound, Computed Tomography, and Magnetic Resonance Imaging,

    J. Hines, J. A. Karajgikar, J. D. Giardina, and B. Friedman, Imaging the Urinary Tract: Fundamentals of Ultrasound, Computed Tomography, and Magnetic Resonance Imaging. Interventional Urology, Nov. 2021

  18. [18]

    Deep learning in medical image analysis,

    D. Shen, G. Wu, and H.-I. Suk, “Deep learning in medical image analysis,” Annual Review of Biomedical Engineering, vol. 19, no. 1, pp. 221–248, Mar. 2017

  19. [19]

    Stonenet: An efficient lightweight model based on depthwise separable convolutions for kidney stone detection from ct images,

    S. Asif, M. Zhao, X. Chen, and Y. Zhu, “Stonenet: An efficient lightweight model based on depthwise separable convolutions for kidney stone detection from ct images,” Interdisciplinary Sciences: Computational Life Sciences, vol. 15, no. 4, pp. 633–652, Jul. 2023

  20. [20]

    A deep learning system for automated kidney stone detection and volumetric segmentation on noncontrast ct scans,

    D. C. Elton, E. B. Turkbey, P. J. Pickhardt, and R. M. Summers, “A deep learning system for automated kidney stone detection and volumetric segmentation on noncontrast ct scans,” Medical Physics, vol. 49, no. 4, pp. 2545–2554, Feb. 2022

  21. [21]

    Application of kronecker convolutions in deep learning technique for automated detection of kidney stones with coronal ct images,

    K. K. Patro, J. P. Allam, B. C. Neelapu, R. Tadeusiewicz, U. R. Acharya, M. Hammad, O. Yildirim, and P. Plawiak, “Application of kronecker convolutions in deep learning technique for automated detection of kidney stones with coronal ct images,” Information Sciences, vol. 640, p. 119005, Sep. 2023

  22. [22]

    Resganet: Residual group attention network for medical image classification and segmentation,

    J. Cheng, S. Tian, L. Yu, C. Gao, X. Kang, X. Ma, W. Wu, S. Liu, and H. Lu, “Resganet: Residual group attention network for medical image classification and segmentation,” Medical Image Analysis (MIA), vol. 76, p. 102313, Feb. 2022

  23. [23]

    Hymnet: a multimodal deep learning system for hypertension classification using fundus photographs and cardiometabolic risk factors,

    M. Baharoon, H. Almatar, R. Alduhayan, T. Aldebasi, B. Alahmadi, Y. Bokhari, M. Alawad, A. Almazroa, and A. Aljouie, “Hymnet: a multimodal deep learning system for hypertension classification using fundus photographs and cardiometabolic risk factors,” arXiv preprint arXiv:2310.01099, Mar. 2024

  24. [24]

    Ich-prnet: a cross-modal intracerebral haemorrhage prognostic prediction method using joint-attention interaction mechanism,

    X. Yu, A. Elazab, R. Ge, J. Zhu, L. Zhang, G. Jia, Q. Wu, X. Wan, L. Li, and C. Wang, “Ich-prnet: a cross-modal intracerebral haemorrhage prognostic prediction method using joint-attention interaction mechanism,” Neural Networks, vol. 184, p. 107096, Apr. 2025

  25. [25]

    Ich-scnet: Intracerebral hemorrhage segmentation and prognosis classification network using clip-guided sam mechanism,

    X. Yu, A. Elazab, R. Ge, H. Jin, X. Jiang, G. Jia, Q. Wu, Q. Shi, and C. Wang, “Ich-scnet: Intracerebral hemorrhage segmentation and prognosis classification network using clip-guided sam mechanism,” in 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, Dec. 2024, pp. 2795–2800

  26. [26]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2023

  27. [27]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, Jun. 2020

  28. [28]

    Unetr: Transformers for 3d medical image segmentation,

    A. Hatamizadeh, Y. Tang, V. Nath, D. Yang, A. Myronenko, B. Landman, H. R. Roth, and D. Xu, “Unetr: Transformers for 3d medical image segmentation,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 574–584

  29. [29]

    U-net: Convolutional networks for biomedical image segmentation,

    O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Nov. 2015, pp. 234–241

  30. [30]

    Hybrid masked image modeling for 3d medical image segmentation,

    Z. Xing, L. Zhu, L. Yu, Z. Xing, and L. Wan, “Hybrid masked image modeling for 3d medical image segmentation,” IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 4, pp. 2115–2125, 2024

  31. [31]

    Segprompt: Using segmentation map as a better prompt to finetune deep models for kidney stone classification,

    W. Zhu, R. Zhou, Y. Yao, T. D. Campbell, R. K. Jain, and J. Luo, “Segprompt: Using segmentation map as a better prompt to finetune deep models for kidney stone classification,” in Medical Imaging with Deep Learning. PMLR, 2024, pp. 1680–1690

  32. [32]

    Tmss: an end-to-end transformer-based multimodal network for segmentation and survival prediction,

    N. Saeed, I. Sobirov, R. Al Majzoub, and M. Yaqub, “Tmss: an end-to-end transformer-based multimodal network for segmentation and survival prediction,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer Nature Switzerland, Sep. 2022, pp. 319–329

  33. [33]

    Ehr-hgcn: An enhanced hybrid approach for text classification using heterogeneous graph convolutional networks in electronic health records,

    G. Wang, X. Lou, F. Guo, D. Kwok, and C. Cao, “Ehr-hgcn: An enhanced hybrid approach for text classification using heterogeneous graph convolutional networks in electronic health records,” IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 3, pp. 1668–1679, 2023

  34. [34]

    V-net: Fully convolutional neural networks for volumetric medical image segmentation,

    F. Milletari, N. Navab, and S. A. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV), 2016, pp. 565–571

  35. [35]

    Focal loss for dense object detection,

    T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988

  36. [36]

    nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,

    F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature Methods, vol. 18, no. 2, pp. 203–211, Dec. 2021