pith.

arxiv: 2502.02097 · v3 · submitted 2025-02-04 · 💻 cs.CV

VerteNet -- A Multi-Context Hybrid CNN Transformer for Accurate Vertebral Landmark Localization in Lateral Spine DXA Images

Pith reviewed 2026-05-23 03:54 UTC · model grok-4.3

classification 💻 cs.CV
keywords vertebral landmark localization · DXA spine imaging · hybrid CNN transformer · deep learning · fracture assessment · abdominal aortic calcification · multi-scanner generalization

The pith

A hybrid CNN-Transformer model localizes vertebral corners in lateral DXA spine images with a normalized mean error of 4.92 pixels across four scanner models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and tests VerteNet, a dual-resolution self- and cross-attention network, to place landmarks at the corners of vertebrae T12 through L5 on lateral spine DXA scans. These images are low-contrast and vary by manufacturer, so manual placement is slow and inconsistent, yet the landmarks are needed for fracture grading and abdominal aortic calcification scoring. The model reports a normalized mean error of 4.92 pixels and a median error of 2.35 pixels, beating prior methods on data from four different scanners. The system also includes an abdominal aorta crop detector that reaches 96 percent accuracy on a held-out test set, and it produces intervertebral guides that raise inter-reader agreement. The result is a pipeline that supplies reliable landmarks for the two clinical tasks without requiring reader-specific retraining.
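Inter-reader agreement on ordinal scores is summarized in the abstract with Cohen's weighted kappa. A minimal NumPy sketch of that statistic follows. The weighting scheme is an assumption, since this page does not say whether linear or quadratic weights were used, and the `weighted_kappa` helper and its arguments are illustrative rather than taken from the paper's code.

```python
import numpy as np

def weighted_kappa(ratings_a, ratings_b, n_categories, weights="quadratic"):
    """Cohen's weighted kappa for two readers scoring the same items.

    Ratings are integers in 0..n_categories-1 (for example 0-3 segment grades).
    weights: "linear" or "quadratic" disagreement weights (an assumption here).
    """
    a = np.asarray(ratings_a, dtype=int)
    b = np.asarray(ratings_b, dtype=int)
    if a.shape != b.shape or a.size == 0:
        raise ValueError("need two equally long, non-empty rating lists")

    # Observed joint distribution of (reader A, reader B) ratings.
    observed = np.zeros((n_categories, n_categories))
    np.add.at(observed, (a, b), 1.0)
    observed /= observed.sum()

    # Joint distribution expected if the two readers rated independently.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Disagreement weights: 0 on the diagonal, growing with |i - j|.
    i, j = np.indices((n_categories, n_categories))
    dist = np.abs(i - j) / (n_categories - 1)
    if weights == "linear":
        w = dist
    elif weights == "quadratic":
        w = dist ** 2
    else:
        raise ValueError("weights must be 'linear' or 'quadratic'")

    return 1.0 - (w * observed).sum() / (w * expected).sum()


if __name__ == "__main__":
    # Two hypothetical readers grading five segments on a 0-3 scale.
    print(weighted_kappa([0, 1, 2, 3, 3], [0, 1, 2, 2, 3], n_categories=4))
```

Weighted kappa credits near-misses (a 2 versus a 3) more than distant disagreements, which suits ordinal grades better than plain percent agreement.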

Core claim

The dual-resolution self- and cross-attention hybrid CNN Transformer achieves a normalized mean localization error of 4.92 pixels and a median error of 2.35 pixels on manually annotated vertebral corner landmarks (T12–L5) drawn from four DXA scanner models, outperforming baseline methods; a companion abdominal aorta crop detector reaches 100 percent validation accuracy and 96 percent test accuracy.
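The headline figures are a mean and a median of per-landmark distances between predicted and annotated corners. The NumPy sketch below computes both. How the paper normalizes its errors is not stated on this page, so the rescaling to a common `ref_size` grid is an assumption, as is the (n_images, n_landmarks, 2) array layout; with four corners per vertebra, T12 to L5 would give 6 × 4 = 24 landmarks per image.

```python
import numpy as np

def landmark_errors(pred, gt, image_sizes, ref_size=(512, 512)):
    """Mean and median Euclidean landmark error after rescaling to ref_size.

    pred, gt: arrays of shape (n_images, n_landmarks, 2) holding (x, y) pixels.
    image_sizes: array of shape (n_images, 2) holding each image's (width, height).
    ref_size: hypothetical common (width, height) used for normalization.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    sizes = np.asarray(image_sizes, dtype=float)
    # Per-image scale factors that map each image onto the reference grid.
    scale = np.asarray(ref_size, dtype=float) / sizes        # (n_images, 2)
    diff = (pred - gt) * scale[:, None, :]                   # broadcast over landmarks
    dist = np.linalg.norm(diff, axis=-1)                     # (n_images, n_landmarks)
    return float(dist.mean()), float(np.median(dist))
```

Reporting the median alongside the mean limits the influence of a few badly placed landmarks, such as the failure cases shown in Figure 7.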

What carries the argument

Dual-resolution self- and cross-attention hybrid CNN Transformer that fuses multi-scale context to predict vertebral corner coordinates.
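The mechanism named above can be illustrated with a toy NumPy sketch. It is not the VerteNet implementation: it assumes both inputs already share one (H, W, C) shape with H and W divisible by r·w, uses a single attention head, and leaves out the window overlap, normalization, and feed-forward layers a real block would have. Following the Figure 3 and Figure 4 captions, queries come from one map and keys and values from the other, and the full- and reduced-resolution paths use the same window size.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool(x, r):
    """Downsample an (H, W, C) map by averaging non-overlapping r x r blocks."""
    H, W, C = x.shape
    return x.reshape(H // r, r, W // r, r, C).mean(axis=(1, 3))

def window_partition(x, w):
    """Split an (H, W, C) map into w x w windows -> (n_windows, w*w, C)."""
    H, W, C = x.shape
    return (x.reshape(H // w, w, W // w, w, C)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, w * w, C))

def window_merge(win, H, W, w):
    """Inverse of window_partition."""
    C = win.shape[-1]
    return (win.reshape(H // w, W // w, w, w, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(H, W, C))

def window_attention(x_q, x_kv, wq, wk, wv, w):
    """Scaled dot-product attention inside each window.

    Queries come from x_q and keys/values from x_kv: the same map twice gives
    self-attention, two different maps give cross-attention.
    """
    H, W, _ = x_q.shape
    q = window_partition(x_q, w) @ wq
    k = window_partition(x_kv, w) @ wk
    v = window_partition(x_kv, w) @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return window_merge(softmax(scores) @ v, H, W, w)

def dual_resolution_attention(x_q, x_kv, params, w=2, r=2):
    """Full-resolution path plus an r-times pooled path, both with window size w."""
    hi = window_attention(x_q, x_kv, *params["hi"], w)
    lo = window_attention(avg_pool(x_q, r), avg_pool(x_kv, r), *params["lo"], w)
    # Nearest-neighbour upsampling brings the low-resolution output back to H x W.
    lo_up = np.repeat(np.repeat(lo, r, axis=0), r, axis=1)
    return np.concatenate([hi, lo_up], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, d = 8, 4
    params = {p: tuple(rng.normal(size=(C, d)) for _ in range(3)) for p in ("hi", "lo")}
    skip_a = rng.normal(size=(8, 8, C))   # hypothetical skip-connection features
    skip_b = rng.normal(size=(8, 8, C))
    self_out = dual_resolution_attention(skip_a, skip_a, params)    # DRSA-like
    cross_out = dual_resolution_attention(skip_a, skip_b, params)   # DRCA-like
    print(self_out.shape, cross_out.shape)  # (8, 8, 8) (8, 8, 8)
```

Because both paths share the window size `w`, a window in the pooled path spans (w·r) × (w·r) original pixels rather than w × w, which is the sense in which the Figure 4 caption says the low-resolution path gains context.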

If this is right

  • Landmark coordinates become reliable enough for automatic fracture assessment on DXA scans.
  • Generated intervertebral guides raise agreement between different human readers on the same images.
  • The same pipeline supplies the vertebral positions required for the 24-point Kauppila method of scoring abdominal aortic calcification.
  • Performance holds across four distinct DXA scanner models without scanner-specific retraining.
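The Kauppila AAC-24 score mentioned in the list above grades calcification along the anterior and posterior aortic walls opposite each of L1 to L4 on a 0 to 3 scale, giving a 0 to 24 total; this is why the segment boundaries supplied by the landmarks matter. The sketch uses the commonly cited length thresholds (up to one third, up to two thirds, more than two thirds of the segment; see Kauppila et al., 1997, in the reference list) and a hypothetical input of calcified-length fractions per segment. Measuring those fractions from an image is outside its scope.

```python
VERTEBRAE = ("L1", "L2", "L3", "L4")
WALLS = ("anterior", "posterior")

def kauppila_segment_score(calcified_fraction):
    """Map the calcified fraction of one aortic wall segment to a 0-3 grade.

    0: no calcification; 1: up to 1/3 of the segment length;
    2: more than 1/3 up to 2/3; 3: more than 2/3.
    """
    if not 0.0 <= calcified_fraction <= 1.0:
        raise ValueError("calcified_fraction must lie in [0, 1]")
    if calcified_fraction == 0.0:
        return 0
    if calcified_fraction <= 1 / 3:
        return 1
    if calcified_fraction <= 2 / 3:
        return 2
    return 3

def aac24(fractions):
    """Sum the eight segment grades: 4 vertebrae x 2 walls x max 3 = 24.

    fractions: hypothetical dict {(vertebra, wall): calcified length / segment length}.
    """
    return sum(kauppila_segment_score(fractions[(v, wall)])
               for v in VERTEBRAE for wall in WALLS)
```

For example, calcification covering half of the anterior wall opposite L3 contributes 2 points, and a fully calcified aorta along all eight segments scores the maximum of 24.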

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The model could be inserted into existing DXA reporting software to shorten the time from scan acquisition to fracture and calcification reports.
  • If the landmark accuracy transfers to new scanner models released after the study, the method would lower the barrier to multi-center DXA research.
  • The dual-resolution attention design may generalize to other low-contrast landmark tasks such as pelvis or hand radiographs.

Load-bearing premise

The manually placed ground-truth corner positions are treated as accurate and consistent reference points for all scanners and readers.

What would settle it

Re-annotation of the same test images by multiple independent readers or comparison against fracture status confirmed on follow-up imaging would show whether the reported error reduction persists.

Figures

Figures reproduced from arXiv: 2502.02097 by Afsah Saleem, Arooba Maqsood, David Suter, Erchuan Zhang, John T. Schousboe, Jonathan M. Hodgson, Joshua R. Lewis, Parminder Raina, Syed Zulqarnain Gilani, William D. Leslie, Zaid Ilyas.

Figure 1. (a) CT Lateral Spine Imaging – the gold standard, slower with highest radiation exposure [29]. (b) Digital X-Ray Imaging – a faster option with lower radiation exposure than CT [7]. (c) Hologic DXA SE variant – quickest with lowest radiation exposure, though susceptible to artifacts such as bowel gas. (d) and (e) GE DXA SE and DE variants – equipped with radiation-reducing technology (black regions), offer…
Figure 2. (a) A DXA LSI example with red arrows marking the location of AAC. (b) An illustration of Kauppila’s AAC-24 scoring method [13]. (c) A DXA image example showing unclear vertebral boundaries, with (d) indicating the intended placement of IVGs needed for AAC-24 scoring.
Figure 3. (a) Proposed framework, VerteNet. (b) Dual Resolution Self Attention (DRSA). (c) Dual Resolution Cross Attention (DRCA), which takes two feature maps as input and generates the Query from one feature map and the Keys and Values from the other. (d) Multi-Context Feature Fusion Block (MCFB) that employs DRSA and DRCA to calculate self-attention within, and cross attention among features from skip connection of l…
Figure 4. (a) HiLo Attention [17] uses the same window count s in both the low- and actual-resolution SA paths, which limits the context to individual patches. (b) DRSA uses the same window size in both resolution SA paths, which increases the context in the low-resolution SA path and also introduces overlapping. In (a) and (b), r is the size reduction factor and s is the window count.
Figure 5. (a) and (b) describe the types of input images that our algorithm can process, originating from two different machines: one with black regions (GE machine) and one without (Hologic machines). (c) illustrates the block-level structure of the abdominal aorta crop detection algorithm. The image classifier categorizes the input images into two groups: those with black regions and those without (d). (e) shows t…
Figure 6. (a) Possible abdominal crop detection (based on both the black region and the image width) in a DE DXA image from a GE machine. (b) No aorta crop detection in the range L1 to L4 in the SE DXA image from a GE machine. (c) and (d) No abdominal aorta crop in SE Hologic machine images.
Figure 7. (a) and (b) show failure cases of the proposed algorithm, where improper localization of landmarks in images from Hologic and GE machines leads to suboptimal performance, as the algorithm relies on accurate landmark detection.
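Figure 7 ties the quality of the downstream output to the quality of the corner landmarks. The sketch below shows, purely hypothetically, how intervertebral guides could be derived from those corners: each guide joins the midpoint of the two anterior corners and the midpoint of the two posterior corners across one disc space. The corner ordering, the `intervertebral_guides` name, and the midpoint rule are assumptions rather than the method of this paper or of GuideNet, and extending the guides forward across the aorta is omitted.

```python
import numpy as np

ORDER = ["T12", "L1", "L2", "L3", "L4", "L5"]

def intervertebral_guides(corners):
    """Hypothetical guide construction from vertebral corner landmarks.

    corners: dict mapping a vertebra name to a (4, 2) array of (x, y) points ordered
    superior-anterior, superior-posterior, inferior-anterior, inferior-posterior.
    Returns a dict mapping "upper/lower" disc-space labels to a (2, 2) array holding
    the anterior and posterior endpoints of the guide.
    """
    guides = {}
    for upper, lower in zip(ORDER, ORDER[1:]):
        up = np.asarray(corners[upper], dtype=float)
        lo = np.asarray(corners[lower], dtype=float)
        anterior = (up[2] + lo[0]) / 2    # midpoint of the two anterior corners
        posterior = (up[3] + lo[1]) / 2   # midpoint of the two posterior corners
        guides[f"{upper}/{lower}"] = np.stack([anterior, posterior])
    return guides
```

The five guides from T12/L1 to L4/L5 bracket the four vertebral levels, L1 to L4, used in AAC-24 scoring, so a misplaced corner shifts a segment boundary directly.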
Original abstract

This study aims to develop and validate a deep learning model that can accurately locate vertebral landmarks in lateral spine Dual-energy X-ray Absorptiometry (DXA) scans. Accurate vertebral landmark localization is critical for reliable fracture assessment and for scoring abdominal aortic calcification using the Kauppila 24-point method; however, DXA lateral spine images are low-contrast, artifact-prone, and manufacturer-dependent, while manual annotation is time-consuming and reader-dependent. This study addresses these challenges by developing a dual-resolution self- and cross-attention model for robust vertebral landmark localization using lateral spine DXA scans from four different scanner models. Ground-truth vertebral corner landmarks (T12 to L5) were manually annotated, and performance was evaluated using normalized mean and median localization errors against baseline and state-of-the-art methods. The proposed framework achieved superior localization accuracy across all four DXA scanner models, with a normalized mean error of 4.92 pixels and a median error of 2.35 pixels, outperforming baseline methods. The abdominal aorta crop detection algorithm achieved 100% accuracy in validation and 96% accuracy (sensitivity 0.93, specificity 0.98) in an independent test set. Generated intervertebral guides further improved inter-reader agreement, reflected by higher Cohen's weighted kappa and inter-reader correlation. The proposed deep learning framework enables accurate and robust vertebral landmark localization in lateral spine DXA images across heterogeneous imaging systems to support clinically relevant downstream analyses. The code for this work can be found at: https://github.com/zaidilyas89/VerteNet
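The crop detector is summarized by accuracy, sensitivity, and specificity. The sketch below shows how the three follow from a confusion matrix, and why accuracy on its own depends on the mix of positive and negative images. Treating "aorta crop present" as the positive class is an assumption, and the counts used to exercise the functions are invented for illustration, not the paper's data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp / fn: positive images (hypothetically, "aorta crop present") detected / missed.
    tn / fp: negative images correctly rejected / wrongly flagged.
    """
    positives, negatives = tp + fn, tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative image")
    accuracy = (tp + tn) / (positives + negatives)
    sensitivity = tp / positives      # true-positive rate
    specificity = tn / negatives      # true-negative rate
    return accuracy, sensitivity, specificity


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is a prevalence-weighted average of sensitivity and specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity
```

Because accuracy is a weighted average of the two rates, any value between a sensitivity of 0.93 and a specificity of 0.98 is attainable depending on the share of positive images, so the 96% figure is fully interpretable only alongside the test-set composition.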

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper introduces VerteNet, a dual-resolution hybrid CNN-Transformer model using self- and cross-attention for localizing vertebral corner landmarks (T12–L5) in lateral spine DXA images acquired on four scanner models. Ground-truth corners were manually annotated; the model reports a normalized mean error of 4.92 pixels and median error of 2.35 pixels, outperforming baselines, while an auxiliary aorta-crop module reaches 96% accuracy on an independent test set. Generated intervertebral guides improve downstream inter-reader agreement (Cohen’s weighted kappa and correlation). Code is released at the cited GitHub repository.

Significance. If the performance numbers hold under reliable ground truth, the work supplies a practical, multi-scanner solution for automating a time-consuming and reader-dependent step that directly supports fracture grading and Kauppila AAC scoring. The explicit multi-vendor evaluation and public code release are concrete strengths that improve reproducibility and potential for clinical adoption.

major comments (1)
  1. [Abstract and Results] The headline claim of 'accurate' localization (normalized mean error 4.92 px, median 2.35 px) and of superiority rests on comparison to a single set of manual corner annotations whose inter-rater reliability is never quantified. The manuscript reports improved inter-reader agreement only for the derived intervertebral guides, not for the landmark coordinates themselves. Without an inter-annotator Euclidean distance or similar metric on the same images, it is impossible to determine whether the reported errors lie below, at, or above typical human variability, which leaves the absolute accuracy and clinical relevance of the numbers uncertain.
minor comments (3)
  1. [Abstract] Dataset size, train-test split ratios, number of images per scanner model, and any statistical testing (e.g., paired t-tests or Wilcoxon tests against baselines) are not stated, making it difficult to gauge the robustness of the reported superiority.
  2. [Methods and Results] Exact implementations, hyper-parameters, and training protocols of the baseline and state-of-the-art methods are not detailed, preventing independent verification of the performance gap.
  3. [Results] The manuscript states that the aorta-crop module was validated at 100% and tested at 96%, but does not report the size or composition of the independent test set used for the 96% figure.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for this constructive observation on the interpretation of our localization results. We address the point directly below.

Point-by-point responses
  1. Referee: [Abstract and Results] The headline claim of 'accurate' localization (normalized mean error 4.92 px, median 2.35 px) and of superiority rests on comparison to a single set of manual corner annotations whose inter-rater reliability is never quantified. The manuscript reports improved inter-reader agreement only for the derived intervertebral guides, not for the landmark coordinates themselves. Without an inter-annotator Euclidean distance or similar metric on the same images, it is impossible to determine whether the reported errors lie below, at, or above typical human variability, which leaves the absolute accuracy and clinical relevance of the numbers uncertain.

    Authors: We agree that an inter-annotator variability metric on the landmark coordinates would strengthen the absolute interpretation of the reported errors. Our ground-truth annotations were produced by a single experienced musculoskeletal radiologist using a standardized protocol on the full dataset; consequently, independent multi-rater annotations are not available and we cannot compute Euclidean inter-rater distances. The relative superiority of VerteNet over the baselines remains valid because all methods were evaluated against the identical annotation set. Clinical relevance is evidenced by the statistically significant improvement in inter-reader agreement when the model-derived intervertebral guides (rather than the raw landmarks) are supplied to readers. In the revised manuscript we will (i) explicitly state that landmark annotations were single-rater, (ii) report the limitation that inter-rater reliability for the corner coordinates themselves was not quantified, and (iii) temper the abstract wording from “accurate” to “robust and superior to baselines under the annotation protocol used.” revision: partial

standing simulated objections not resolved
  • Multiple independent landmark annotations do not exist, preventing direct computation of inter-rater Euclidean distances for the corner coordinates.

Circularity Check

0 steps flagged

No circularity: empirical evaluation on held-out annotations

full rationale

The paper trains a hybrid CNN-Transformer on manually annotated vertebral corner landmarks (T12-L5) from DXA scans and reports normalized mean/median localization error on held-out test images across four scanner models. All claims rest on direct comparison of model outputs to external ground-truth coordinates; there are no equations, derivations, fitted parameters, or self-citations that reduce any reported result to its own inputs by construction. The evaluation protocol is standard supervised learning and remains falsifiable via the released code.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the quality and consistency of the manual landmark annotations used as ground truth and on the assumption that the four-scanner training distribution is representative of future clinical images.

free parameters (1)
  • neural network weights
    All model parameters are fitted to the annotated training images from the four scanner models.
axioms (1)
  • domain assumption: manual annotations provide reliable ground-truth landmark positions
    All error metrics and comparisons are computed against these annotations.

pith-pipeline@v0.9.0 · 5878 in / 1254 out tokens · 83836 ms · 2026-05-23T03:54:44.291524+00:00 · methodology


Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

  1. Blake, G.M., Fogelman, I., 2007. The role of DXA bone density scans in the diagnosis and treatment of osteoporosis. Postgraduate Medical Journal 83, 509–517.
  2. Chaplin, L., Cootes, T., 2019. Automated scoring of aortic calcification in vertebral fracture assessment images, in: Medical Imaging 2019: Computer-Aided Diagnosis, SPIE.
  3. Chen, Z., Zhang, Y., Gu, J., Kong, L., Yang, X., Yu, F., 2023. Dual aggregation transformer for image super-resolution, in: Proceedings of the IEEE/CVF ICCV, pp. 12312–12321.
  4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Hossain, I., Kaiser, L., Hou, Z., Moczulski, M., 2016. Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 1734–1747.
  6. Elmasri, K., Hicks, Y., Yang, X., Sun, X., Pettit, R., Evans, W. Automatic detection and quantification of abdominal aortic calcification in dual energy x-ray absorptiometry. Procedia Computer Science 96, 1011–1021.
  7. Fraiwan, M., Audat, Z., Manasreh, T., 2022. A dataset of scoliosis, spondylolisthesis, and normal vertebrae x-ray images. Mendeley Data. doi:10.17632/xkt857dsxk.1.
  8. Gilani, S.Z., Sharif, N., Suter, D., Schousboe, J.T., Reid, S., Leslie, W.D., Lewis, J.R., 2022. Show, attend and detect: Towards fine-grained assessment of abdominal aortic calcification on vertebral fracture assessment scans, in: MICCAI, Springer. pp. 439–450.
  9. Guo, Y., Li, Y., Zhou, X., He, W., 2021. A keypoint transformer to discover spine structure for Cobb angle estimation, in: ICME, IEEE. pp. 1–6.
  10. Huang, Z., Zhao, R., Leung, F.H., Banerjee, S., Lam, K.M., Zheng, Y.P., Ling, S.H., 2024. Landmark localization from medical images with generative distribution prior. IEEE TMI.
  11. Ilyas, Z., Saleem, A., Suter, D., Schousboe, J.T., Leslie, W.D., Lewis, J.R., Gilani, S.Z., 2024. A hybrid CNN-transformer feature pyramid network for granular abdominal aortic calcification detection from DXA images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer Nature Switzerland. pp. 14–25.
  12. Ilyas, Z., Sharif, N., Schousboe, J.T., Lewis, J.R., Suter, D., Gilani, S.Z., 2021. GuideNet: Learning inter-vertebral guides in DXA lateral spine images, in: 2021 DICTA, IEEE. pp. 1–7.
  13. Kauppila, L.I., Polak, J.F., Cupples, L.A., Hannan, M.T., Kiel, D.P., Wilson, P.W., 1997. New indices to classify location, severity and progression of calcific lesions in the abdominal aorta: a 25-year follow-up study. Atherosclerosis 132, 245–250.
  14. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P., 2017. Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988.
  15. Liu, C., Ge, R., Li, H., Zhu, Z., Xia, W., Liu, H., 2023. Thoracolumbar/lumbar degenerative kyphosis—the importance of thoracolumbar junction in sagittal alignment and balance. Journal of Personalized Medicine 14, 36.
  16. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022.
  17. Pan, Z., Cai, J., Zhuang, B., 2022. Fast vision transformers with HiLo attention. Advances in Neural Information Processing Systems 35, 14541–14554.
  18. Payer, C., Štern, D., Bischof, H., Urschler, M., 2019. Integrating spatial configuration into heatmap regression based CNNs for landmark localization. Medical Image Analysis 54, 207–219.
  19. Reid, S., Schousboe, J.T., Kimelman, D., Monchka, B.A., Jozani, M.J., Leslie, W.D., 2021. Machine learning for automated abdominal aortic calcification scoring of DXA vertebral fracture assessment images: A pilot study. Bone 148, 115943.
  20. Saleem, A., Ilyas, Z., Suter, D., Hassan, G.M., Reid, S., Schousboe, J.T., Prince, R., Leslie, W.D., Lewis, J.R., Gilani, S.Z., 2023. SCOL: Supervised contrastive ordinal loss for abdominal aortic calcification scoring on vertebral fracture assessment scans, in: MICCAI, Springer. pp. 273–283.
  21. Schousboe, J.T., Lewis, J.R., Kiel, D.P., 2017. Abdominal aortic calcification on dual-energy x-ray absorptiometry: methods of assessment and clinical significance. Bone 104, 91–100.
  22. Schousboe, J.T., Wilson, K.E., Kiel, D.P., 2006. Detection of abdominal aortic calcification with lateral spine imaging using DXA. Journal of Clinical Densitometry 9, 302–308.
  23. Sharif, N., Gilani, S.Z., Suter, D., Reid, S., Szulc, P., Kimelman, D., Monchka, B.A., Jozani, M.J., Hodgson, J.M., Sim, M., Zhu, K., 2023. Machine learning for abdominal aortic calcification assessment from bone density machine-derived lateral spine images. EBioMedicine 94.
  24. Sun, H., Zhen, X., Bailey, C., Rasoulinejad, P., Yin, Y., Li, S., 2017. Direct estimation of spinal Cobb angles by structured multi-output regression, in: IPMI, Springer. pp. 529–540.
  25. Sun, K., Xiao, B., Liu, D., Wang, J., 2019. Deep high-resolution representation learning for human pose estimation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5693–5703.
  26. Tekeli, M., Erdem, H., Kilic, N., Boyan, N., Oguz, O., Soames, R.W., 2023a. Evaluation of lumbar lordosis in symptomatic individuals and comparative analysis of six different techniques: a retrospective radiologic study. European Spine Journal 32, 4118–4127.
  27. Tekeli, M., Erdem, H., Kilic, N., Boyan, N., Oguz, O., Soames, R.W., 2023b. Evaluation of lumbar lordosis in symptomatic individuals and comparative analysis of six different techniques: a retrospective radiologic study. European Spine Journal 32, 4118–4127.
  28. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention is all you need, in: Advances in NeurIPS.
  29. Wasserthal, J., 2023. Dataset with segmentations of 117 important anatomical structures in 1228 CT images. Zenodo. doi:10.5281/zenodo.10047292. Accessed: Oct. 27, 2023.
  30. Wu, H., Bailey, C., Rasoulinejad, P., Li, S., 2017. Automatic landmark estimation for adolescent idiopathic scoliosis assessment using BoostNet, in: MICCAI, Springer. pp. 127–135.
  31. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G., 2022. Vision transformer with deformable attention, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4794–4803.
  32. Yang, D., Xiong, T., Xu, D., Huang, Q., Liu, D., Zhou, S.K., Xu, Z., Park, J., Chen, M., Tran, T.D., et al., 2017. Automatic vertebra labeling in large-scale 3D CT using deep image-to-image network with message passing and sparsity regularization, in: IPMI, Springer. pp. 633–644.
  33. Yi, J., Wu, P., Huang, Q., Qu, H., Metaxas, D.N., 2020. Vertebra-focused landmark detection for scoliosis assessment, in: ISBI, IEEE. pp. 736–740.
  34. Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration, in: Proceedings of the IEEE/CVF CVPR, pp. 5728–5739.
  35. Zhao, M., Meng, N., Cheung, J.P.Y., Yu, C., Lu, P., Zhang, T. SpineHRformer: a transformer-based deep learning model for automatic spine deformity assessment with prospective validation. Bioengineering 10, 1333.