VerteNet -- A Multi-Context Hybrid CNN Transformer for Accurate Vertebral Landmark Localization in Lateral Spine DXA Images

Afsah Saleem; Arooba Maqsood; David Suter; Erchuan Zhang; John T. Schousboe; Jonathan M. Hodgson; Joshua R. Lewis; Parminder Raina; Syed Zulqarnain Gilani; William D. Leslie

arxiv: 2502.02097 · v3 · submitted 2025-02-04 · 💻 cs.CV

Arooba Maqsood, Zaid Ilyas, Afsah Saleem, Erchuan Zhang, David Suter, Parminder Raina, Jonathan M. Hodgson, John T. Schousboe, William D. Leslie, Joshua R. Lewis, Syed Zulqarnain Gilani

Pith reviewed 2026-05-23 03:54 UTC · model grok-4.3

classification 💻 cs.CV

keywords vertebral landmark localization · DXA spine imaging · hybrid CNN transformer · deep learning · fracture assessment · abdominal aortic calcification · multi-scanner generalization

The pith

A hybrid CNN-Transformer model localizes vertebral corners in lateral DXA spine images with 4.92 pixel normalized mean error across four scanner models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and tests VerteNet, a dual-resolution self- and cross-attention network, to place landmarks at the corners of vertebrae T12 through L5 on lateral spine DXA scans. These images are low-contrast and vary by manufacturer, so manual placement is slow and inconsistent, yet the landmarks are needed for fracture grading and abdominal aortic calcification scoring. The model reports a normalized mean error of 4.92 pixels and median error of 2.35 pixels, beating prior methods on data from four different scanners. It also detects an abdominal aorta crop with 96 percent accuracy on a held-out test set and produces intervertebral guides that raise inter-reader agreement. The result is a pipeline that supplies reliable landmarks for the two clinical tasks without requiring reader-specific retraining.

Core claim

The dual-resolution self- and cross-attention hybrid CNN Transformer achieves a normalized mean localization error of 4.92 pixels and a median error of 2.35 pixels on manually annotated vertebral corner landmarks (T12-L5) drawn from four DXA scanner models, outperforming baseline methods while also delivering 100 percent validation accuracy and 96 percent test accuracy for an abdominal aorta crop detector.
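
Summary figures of this kind can be computed along the following lines. This is a minimal sketch, not the paper's evaluation code: it assumes the per-landmark error is the Euclidean distance between the predicted and the annotated corner, and the `scale` argument is a hypothetical stand-in for the paper's normalization, which is not described here. The function name and all coordinates are made up for illustration.

```python
import math
import statistics


def localization_errors(predicted, annotated, scale=1.0):
    """Per-landmark Euclidean distances, multiplied by an assumed scale factor.

    predicted, annotated: equal-length lists of (x, y) pixel coordinates.
    scale: hypothetical normalization factor (1.0 leaves errors in raw pixels).
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    return [scale * math.dist(p, a) for p, a in zip(predicted, annotated)]


# Made-up coordinates for three landmarks, for illustration only.
pred = [(10.0, 10.0), (20.0, 24.0), (33.0, 44.0)]
truth = [(10.0, 10.0), (20.0, 20.0), (30.0, 40.0)]

errors = localization_errors(pred, truth)
mean_error = statistics.mean(errors)
median_error = statistics.median(errors)
print(f"mean={mean_error:.2f} px, median={median_error:.2f} px")
```

Reporting both statistics is informative because the mean is pulled upward by a few badly placed landmarks while the median is not; a mean well above the median, as in the reported 4.92 versus 2.35 pixels, is consistent with a small number of large outlier errors.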

What carries the argument

Dual-resolution self- and cross-attention hybrid CNN Transformer that fuses multi-scale context to predict vertebral corner coordinates.
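
The Figure 3 caption describes the cross-attention block as taking the Query from one feature map and the Keys and Values from another. The sketch below illustrates only that generic idea with plain scaled dot-product attention. It is not the paper's DRSA or DRCA module: the dual-resolution windowing, learned projections, and multi-head structure are all omitted, and the function and variable names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens):
    """Single-head scaled dot-product cross-attention without learned weights.

    query_tokens: tokens from one feature map (each a list of floats).
    context_tokens: tokens from a second feature map, used here as both
    keys and values (an assumption made to keep the sketch short).
    Returns one output vector per query token.
    """
    dim = len(query_tokens[0])
    outputs = []
    for q in query_tokens:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in context_tokens
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, context_tokens))
            for d in range(dim)
        ])
    return outputs


# Toy tokens: two queries from one map, three key/value tokens from another.
map_a = [[1.0, 0.0], [0.0, 1.0]]
map_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(map_a, map_b)
```

Each output row is a weighted average of the second map's tokens, so information flows from the context map into the positions of the query map. Self-attention is the special case where both arguments are the same set of tokens.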

If this is right

Landmark coordinates become reliable enough for automatic fracture assessment on DXA scans.
Generated intervertebral guides raise agreement between different human readers on the same images.
The same pipeline supplies the vertebral positions required for the 24-point Kauppila method of scoring abdominal aortic calcification.
Performance holds across four distinct DXA scanner models without scanner-specific retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The model could be inserted into existing DXA reporting software to shorten the time from scan acquisition to fracture and calcification reports.
If the landmark accuracy transfers to new scanner models released after the study, the method would lower the barrier to multi-center DXA research.
The dual-resolution attention design may generalize to other low-contrast landmark tasks such as pelvis or hand radiographs.

Load-bearing premise

The manually placed ground-truth corner positions are treated as accurate and consistent reference points for all scanners and readers.

What would settle it

Re-annotation of the same test images by multiple independent readers or comparison against fracture status confirmed on follow-up imaging would show whether the reported error reduction persists.

Figures

Figures reproduced from arXiv: 2502.02097 by Afsah Saleem, Arooba Maqsood, David Suter, Erchuan Zhang, John T. Schousboe, Jonathan M. Hodgson, Joshua R. Lewis, Parminder Raina, Syed Zulqarnain Gilani, William D. Leslie, Zaid Ilyas.

**Figure 1.** (a) CT Lateral Spine Imaging – the gold standard, slower with highest radiation exposure [29]. (b) Digital X-Ray Imaging – a faster option with lower radiation exposure than CT [7]. (c) Hologic DXA SE variant – quickest with lowest radiation exposure, though susceptible to artifacts such as bowel gas. (d) and (e) GE DXA SE and DE variants – equipped with radiation-reducing technology (black regions), offer…

**Figure 2.** (a) A DXA LSI example with red arrows marking the location of AAC. (b) An illustration of Kauppila’s AAC24 scoring method [13]. (c) A DXA image example showing unclear vertebral boundaries, with (d) indicating the intended placement of IVGs needed for AAC-24 scoring.

**Figure 3.** (a) Proposed Framework VerteNet (b) Dual Resolution Self Attention (DRSA) (c) Dual Resolution Cross Attention (DRCA) - takes two feature maps as input and generates Query from one feature map and Keys and Values from the other feature map. (d) Multi-Context Feature Fusion Block (MCFB) that employs DRSA and DRCA to calculate self-attention within, and cross attention among features from skip connection of l…

**Figure 4.** (a) HiLo Attention [17] - Uses the same window count s in both the low and actual resolution SA paths, which limits the context to individual patches. (b) DRSA - Uses the same window size in both resolution SA paths which increases the context in low-resolution SA path, and also introduces overlapping. In (a) and (b), r is the size reduction factor, and s is the window count.

**Figure 5.** (a) and (b) describe the types of input images that our algorithm can process, originating from two different machines: one with black regions (GE machine) and one without (Hologic machines). (c) illustrates the block-level structure of the abdominal aorta crop detection algorithm. The image classifier categorizes the input images into two groups: those with black regions and those without (d). (e) shows t…

**Figure 6.** (a) Possible abdominal crop detection (both based on black region and image width) in DE DXA image from GE machine. (b) No aorta crop detection in the range L1 to L4 in the SE DXA image from GE machine. (c)(d) No abdominal aorta crop in SE Hologic machine images.

**Figure 7.** (a) and (b) represent the failure cases of the proposed algorithm, where improper localization of landmarks in images from Hologic and GE machines leads to suboptimal performance, as the algorithm relies on accurate landmark detection.

Original abstract

This study aims to develop and validate a deep learning model that can accurately locate vertebral landmarks in lateral spine Dual-energy X-ray Absorptiometry (DXA) scans. Accurate vertebral landmark localization is critical for reliable fracture assessment and scoring of abdominal aortic calcification using the Kauppila 24-point method; however, DXA lateral spine images are low-contrast, artifact-prone, and manufacturer-dependent, while manual annotation is time-consuming and reader-dependent. This study aimed to address these challenges by developing a dual-resolution self- and cross-attention model for robust vertebral landmark localization using lateral spine DXA scans from four different scanner models. Ground-truth vertebral corner landmarks (T12 to L5) were manually annotated, and performance was evaluated using normalized mean and median localization errors against baseline and state-of-the-art methods. The proposed framework achieved superior localization accuracy across all four DXA scanner models, with a normalized mean error of 4.92 pixels and a median error of 2.35 pixels, outperforming baseline methods. The abdominal aorta crop detection algorithm achieved 100% accuracy in validation and 96% accuracy (sensitivity 0.93, specificity 0.98) in an independent test set. Generated intervertebral guides further improved inter-reader agreement, reflected by higher Cohen's weighted kappa and inter-reader correlation. The proposed deep learning framework enables accurate and robust vertebral landmark localization in lateral spine DXA images across heterogeneous imaging systems to support clinically relevant downstream analyses. The code for this work can be found at: https://github.com/zaidilyas89/VerteNet
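
The accuracy, sensitivity, and specificity quoted for the crop detector are all derived from the same four confusion-matrix counts. The sketch below shows the standard definitions. The function name is hypothetical and the counts are invented for illustration; they are not taken from the paper, which does not give the underlying counts here.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp, fn: positive cases detected and missed.
    tn, fp: negative cases correctly rejected and wrongly flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for illustration only (not the paper's data).
metrics = binary_metrics(tp=9, fp=1, tn=19, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is a prevalence-weighted blend of the other two figures, so the three numbers together constrain, but do not determine, how many positive and negative cases the test set contained.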

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

VerteNet posts lower landmark errors than baselines on four DXA scanners and ships code, but the single-set manual annotations have no inter-rater numbers so the headline gains are hard to interpret.

The paper's main contribution is a dual-resolution hybrid CNN-Transformer that combines self- and cross-attention for vertebral corner detection on lateral DXA images. It tests the model on scans from four scanner models, reports normalized mean error of 4.92 pixels and median of 2.35 pixels, and beats the baselines it compares against. Code is released, which is useful for anyone who wants to reproduce or extend the numbers. The 100% validation accuracy on the abdominal aorta crop step and the downstream gain in inter-reader kappa for the guides are concrete practical results.

Referee Report

1 major / 3 minor

Summary. The paper introduces VerteNet, a dual-resolution hybrid CNN-Transformer model using self- and cross-attention for localizing vertebral corner landmarks (T12–L5) in lateral spine DXA images acquired on four scanner models. Ground-truth corners were manually annotated; the model reports a normalized mean error of 4.92 pixels and median error of 2.35 pixels, outperforming baselines, while an auxiliary aorta-crop module reaches 96% accuracy on an independent test set. Generated intervertebral guides improve downstream inter-reader agreement (Cohen’s weighted kappa and correlation). Code is released at the cited GitHub repository.
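
For readers unfamiliar with the agreement statistic mentioned in the summary, the following sketch computes Cohen's weighted kappa for two readers who grade the same cases on an ordinal scale. It assumes linear disagreement weights; the weighting scheme used in the paper is not stated here, and the function name and ratings are made up for illustration.

```python
def weighted_kappa(ratings_a, ratings_b, num_categories):
    """Cohen's kappa with linear disagreement weights for two readers.

    ratings_a, ratings_b: equal-length lists of integer categories
    in the range 0 .. num_categories - 1.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if num_categories < 2:
        raise ValueError("need at least two categories")
    n = len(ratings_a)
    k = num_categories

    # Observed joint proportions of (reader A category, reader B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each reader.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when both readers use one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Made-up ratings on a three-level scale, for illustration only.
reader_1 = [0, 1, 2, 2, 1, 0]
reader_2 = [0, 1, 2, 1, 1, 0]
kappa = weighted_kappa(reader_1, reader_2, num_categories=3)
print(f"weighted kappa = {kappa:.2f}")
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; the weights make a one-grade disagreement count less than a two-grade one.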

Significance. If the performance numbers hold under reliable ground truth, the work supplies a practical, multi-scanner solution for automating a time-consuming and reader-dependent step that directly supports fracture grading and Kauppila AAC scoring. The explicit multi-vendor evaluation and public code release are concrete strengths that improve reproducibility and potential for clinical adoption.

major comments (1)

[Abstract and Results] The headline claim of 'accurate' localization (normalized mean error 4.92 px, median 2.35 px) and superiority rests on comparison to a single set of manual corner annotations whose inter-rater reliability is never quantified. The manuscript reports improved inter-reader agreement only for the derived intervertebral guides, not for the landmark coordinates themselves. Without an inter-annotator Euclidean distance or similar metric on the same images, it is impossible to determine whether the reported errors lie below, at, or above typical human variability, rendering the absolute accuracy interpretation and clinical relevance of the numbers uncertain.

minor comments (3)

[Abstract] Dataset size, train-test split ratios, number of images per scanner model, and any statistical testing (e.g., paired t-tests or Wilcoxon tests against baselines) are not stated, making it difficult to gauge the robustness of the reported superiority.
[Methods and Results] Exact implementations, hyper-parameters, and training protocols of the baseline and state-of-the-art methods are not detailed, preventing independent verification of the performance gap.
[Results] The manuscript states that the aorta-crop module was validated at 100% and tested at 96%, but does not report the size or composition of the independent test set used for the 96% figure.
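
Why the missing test-set size matters can be shown with a confidence interval for a proportion. The sketch below uses the Wilson score interval, chosen here as one common option rather than anything the paper reports. The two test-set sizes are hypothetical, picked only to show how the uncertainty around a 96% accuracy changes with sample size.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical test-set sizes; the actual size is not reported.
intervals = {}
for n in (50, 500):
    correct = round(0.96 * n)
    intervals[n] = wilson_interval(correct, n)
    low, high = intervals[n]
    print(f"n={n}: {correct}/{n} correct, interval {low:.3f} to {high:.3f}")
```

The same headline accuracy supports a much wider range of plausible true accuracies on a small test set than on a large one, which is why the referee asks for the size and composition of the set.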

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for this constructive observation on the interpretation of our localization results. We address the point directly below.

Referee: [Abstract and Results] The headline claim of 'accurate' localization (normalized mean error 4.92 px, median 2.35 px) and superiority rests on comparison to a single set of manual corner annotations whose inter-rater reliability is never quantified. The manuscript reports improved inter-reader agreement only for the derived intervertebral guides, not for the landmark coordinates themselves. Without an inter-annotator Euclidean distance or similar metric on the same images, it is impossible to determine whether the reported errors lie below, at, or above typical human variability, rendering the absolute accuracy interpretation and clinical relevance of the numbers uncertain.

Authors: We agree that an inter-annotator variability metric on the landmark coordinates would strengthen the absolute interpretation of the reported errors. Our ground-truth annotations were produced by a single experienced musculoskeletal radiologist using a standardized protocol on the full dataset; consequently, independent multi-rater annotations are not available and we cannot compute Euclidean inter-rater distances. The relative superiority of VerteNet over the baselines remains valid because all methods were evaluated against the identical annotation set. Clinical relevance is evidenced by the statistically significant improvement in inter-reader agreement when the model-derived intervertebral guides (rather than the raw landmarks) are supplied to readers. In the revised manuscript we will (i) explicitly state that landmark annotations were single-rater, (ii) report the limitation that inter-rater reliability for the corner coordinates themselves was not quantified, and (iii) temper the abstract wording from “accurate” to “robust and superior to baselines under the annotation protocol used.”

Revision: partial

standing simulated objections not resolved

Multiple independent landmark annotations do not exist, preventing direct computation of inter-rater Euclidean distances for the corner coordinates.

Circularity Check

0 steps flagged

No circularity: empirical evaluation on held-out annotations

The paper trains a hybrid CNN-Transformer on manually annotated vertebral corner landmarks (T12-L5) from DXA scans and reports normalized mean/median localization error on held-out test images across four scanner models. All claims rest on direct comparison of model outputs to external ground-truth coordinates; there are no equations, derivations, fitted parameters, or self-citations that reduce any reported result to its own inputs by construction. The evaluation protocol is standard supervised learning and remains falsifiable via the released code.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the quality and consistency of the manual landmark annotations used as ground truth and on the assumption that the four-scanner training distribution is representative of future clinical images.

free parameters (1)

neural network weights
All model parameters are fitted to the annotated training images from the four scanner models.

axioms (1)

Domain assumption: Manual annotations provide reliable ground-truth landmark positions.
All error metrics and comparisons are computed against these annotations.

Reference graph

Works this paper leans on

[1] Blake, G.M., Fogelman, I., 2007. The role of dxa bone density scans in the diagnosis and treatment of osteoporosis. Postgraduate Medical Journal 83, 509–517.
[2] Chaplin, L., Cootes, T., 2019. Automated scoring of aortic calcification in vertebral fracture assessment images, in: Medical Imaging 2019: Computer-Aided Diagnosis, SPIE.
[3] Chen, Z., Zhang, Y., Gu, J., Kong, L., Yang, X., Yu, F., 2023. Dual aggregation transformer for image super-resolution, in: Proceedings of the IEEE/CVF ICCV, pp. 12312–12321.
[4] Devlin, J., Chang, M.W., Lee, K., Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[5] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Hossain, I., Kaiser, L., Hou, Z., Moczulski, M., 2016. Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 1734–1747.
[6] Elmasri, K., Hicks, Y., Yang, X., Sun, X., Pettit, R., Evans, W. Automatic detection and quantification of abdominal aortic calcification in dual energy x-ray absorptiometry. Procedia Computer Science 96, 1011–1021.
[7] Fraiwan, M., Audat, Z., Manasreh, T., 2022. A dataset of scoliosis, spondylolisthesis, and normal vertebrae x-ray images. Mendeley Data. doi:10.17632/xkt857dsxk.1.
[8] Gilani, S.Z., Sharif, N., Suter, D., Schousboe, J.T., Reid, S., Leslie, W.D., Lewis, J.R., 2022. Show, attend and detect: Towards fine-grained assessment of abdominal aortic calcification on vertebral fracture assessment scans, in: MICCAI, Springer. pp. 439–450.
[9] Guo, Y., Li, Y., Zhou, X., He, W., 2021. A keypoint transformer to discover spine structure for cobb angle estimation, in: ICME, IEEE. pp. 1–6.
[10] Huang, Z., Zhao, R., Leung, F.H., Banerjee, S., Lam, K.M., Zheng, Y.P., Ling, S.H., 2024. Landmark localization from medical images with generative distribution prior. IEEE TMI.
[11] Ilyas, Z., Saleem, A., Suter, D., Schousboe, J.T., Leslie, W.D., Lewis, J.R., Gilani, S.Z., 2024. A hybrid cnn-transformer feature pyramid network for granular abdominal aortic calcification detection from dxa images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer Nature Switzerland. pp. 14–25.
[12] Ilyas, Z., Sharif, N., Schousboe, J.T., Lewis, J.R., Suter, D., Gilani, S.Z., 2021. Guidenet: Learning inter-vertebral guides in dxa lateral spine images, in: 2021 DICTA, IEEE. pp. 1–7.
[13] Kauppila, L.I., Polak, J.F., Cupples, L.A., Hannan, M.T., Kiel, D.P., Wilson, P.W., 1997. New indices to classify location, severity and progression of calcific lesions in the abdominal aorta: a 25-year follow-up study. Atherosclerosis 132, 245–250.
[14] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P., 2017. Focal loss for dense object detection, in: Proceedings of the IEEE international conference on computer vision, pp. 2980–2988.
[15] Liu, C., Ge, R., Li, H., Zhu, Z., Xia, W., Liu, H., 2023. Thoracolumbar/lumbar degenerative kyphosis—the importance of thoracolumbar junction in sagittal alignment and balance. Journal of Personalized Medicine 14, 36.
[16] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012–10022.
[17] Pan, Z., Cai, J., Zhuang, B., 2022. Fast vision transformers with hilo attention. Advances in Neural Information Processing Systems 35, 14541–14554.
[18] Payer, C., Štern, D., Bischof, H., Urschler, M., 2019. Integrating spatial configuration into heatmap regression based cnns for landmark localization. Medical Image Analysis 54, 207–219.
[19] Reid, S., Schousboe, J.T., Kimelman, D., Monchka, B.A., Jozani, M.J., Leslie, W.D., 2021. Machine learning for automated abdominal aortic calcification scoring of dxa vertebral fracture assessment images: A pilot study. Bone 148, 115943.
[20] Saleem, A., Ilyas, Z., Suter, D., Hassan, G.M., Reid, S., Schousboe, J.T., Prince, R., Leslie, W.D., Lewis, J.R., Gilani, S.Z., 2023. Scol: Supervised contrastive ordinal loss for abdominal aortic calcification scoring on vertebral fracture assessment scans, in: MICCAI, Springer. pp. 273–283.
[21] Schousboe, J.T., Lewis, J.R., Kiel, D.P., 2017. Abdominal aortic calcification on dual-energy x-ray absorptiometry: methods of assessment and clinical significance. Bone 104, 91–100.
[22] Schousboe, J.T., Wilson, K.E., Kiel, D.P., 2006. Detection of abdominal aortic calcification with lateral spine imaging using dxa. Journal of Clinical Densitometry 9, 302–308.
[23] Sharif, N., Gilani, S.Z., Suter, D., Reid, S., Szulc, P., Kimelman, D., Monchka, B.A., Jozani, M.J., Hodgson, J.M., Sim, M., Zhu, K., 2023. Machine learning for abdominal aortic calcification assessment from bone density machine-derived lateral spine images. EBioMedicine 94.
[24] Sun, H., Zhen, X., Bailey, C., Rasoulinejad, P., Yin, Y., Li, S., 2017. Direct estimation of spinal cobb angles by structured multi-output regression, in: IPMI, Springer. pp. 529–540.
[25] Sun, K., Xiao, B., Liu, D., Wang, J., 2019. Deep high-resolution representation learning for human pose estimation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5693–5703.
[26] Tekeli, M., Erdem, H., Kilic, N., Boyan, N., Oguz, O., Soames, R.W., 2023a. Evaluation of lumbar lordosis in symptomatic individuals and comparative analysis of six different techniques: a retrospective radiologic study. European Spine Journal 32, 4118–4127.
[27] Tekeli, M., Erdem, H., Kilic, N., Boyan, N., Oguz, O., Soames, R.W., 2023b. Evaluation of lumbar lordosis in symptomatic individuals and comparative analysis of six different techniques: a retrospective radiologic study. European Spine Journal 32, 4118–4127.
[28] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention is all you need, in: Advances in NeurIPS.
[29] Wasserthal, J., 2023. Dataset with segmentations of 117 important anatomical structures in 1228 ct images. Zenodo. doi:10.5281/zenodo.10047292. Accessed: Oct. 27, 2023.
[30] Wu, H., Bailey, C., Rasoulinejad, P., Li, S., 2017. Automatic landmark estimation for adolescent idiopathic scoliosis assessment using boostnet, in: MICCAI, Springer. pp. 127–135.
[31] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G., 2022. Vision transformer with deformable attention, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 4794–4803.
[32] Yang, D., Xiong, T., Xu, D., Huang, Q., Liu, D., Zhou, S.K., Xu, Z., Park, J., Chen, M., Tran, T.D., et al., 2017. Automatic vertebra labeling in large-scale 3d ct using deep image-to-image network with message passing and sparsity regularization, in: IPMI, Springer. pp. 633–644.
[33] Yi, J., Wu, P., Huang, Q., Qu, H., Metaxas, D.N., 2020. Vertebra-focused landmark detection for scoliosis assessment, in: ISBI, IEEE. pp. 736–740.
[34] Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration, in: Proceedings of the IEEE/CVF CVPR, pp. 5728–5739.
[35] Zhao, M., Meng, N., Cheung, J.P.Y., Yu, C., Lu, P., Zhang, T. Spinehrformer: a transformer-based deep learning model for automatic spine deformity assessment with prospective validation. Bioengineering 10, 1333.

Preprint submitted to Elsevier.

[1] [1]

The role of dxa bone density scans inthediagnosisandtreatmentofosteoporosis

Blake, G.M., Fogelman, I., 2007. The role of dxa bone density scans inthediagnosisandtreatmentofosteoporosis. PostgraduateMedical Journal 83, 509–517

work page 2007

[2] [2]

Automated scoring of aortic calcifi- cation in vertebral fracture assessment images, in: Medical Imaging 2019: Computer-Aided Diagnosis, SPIE

Chaplin, L., Cootes, T., 2019. Automated scoring of aortic calcifi- cation in vertebral fracture assessment images, in: Medical Imaging 2019: Computer-Aided Diagnosis, SPIE

work page 2019

[3] [3]

Dual aggregation transformer for image super-resolution, in: Proceedings of the IEEE/CVF ICCV, pp

Chen, Z., Zhang, Y., Gu, J., Kong, L., Yang, X., Yu, F., 2023. Dual aggregation transformer for image super-resolution, in: Proceedings of the IEEE/CVF ICCV, pp. 12312–12321. : Preprint submitted to Elsevier Page 10 of 11

work page 2023

[4] [4]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, J., Chang, M.W., Lee, K., Toutanova, K., 2018. Bert: Pre- training of deep bidirectional transformers for language understand- ing. arXiv preprint arXiv:1810.04805

work page internal anchor Pith review Pith/arXiv arXiv 2018

[5] [5]

Discriminativeunsupervisedfeaturelearningwithexemplarconvolu- tional neural networks

Dosovitskiy,A.,Beyer,L.,Kolesnikov,A.,Weissenborn,D.,Zhai,X., Unterthiner,T.,Hossain,I.,Kaiser,L.,Hou,Z.,Moczulski,M.,2016. Discriminativeunsupervisedfeaturelearningwithexemplarconvolu- tional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 1734–1747

work page 2016

[6] [6]

Elmasri, K., Hicks, Y., Yang, X., Sun, X., Pettit, R., Evans, W.,

work page

[7] [7]

ProcediaComputer Science 96, 1011–1021

Automatic detection and quantification of abdominal aortic calcificationindualenergyx-rayabsorptiometry. ProcediaComputer Science 96, 1011–1021

work page

[8] [8]

A dataset of scoliosis, spondylolisthesis, and normal vertebrae x-ray images

Fraiwan, M., Audat, Z., Manasreh, T., 2022. A dataset of scoliosis, spondylolisthesis, and normal vertebrae x-ray images. Mendeley Data. doi:10.17632/xkt857dsxk.1

work page doi:10.17632/xkt857dsxk.1 2022

[9] [9]

Show, attend and detect: Towards fine- grained assessment of abdominal aortic calcification on vertebral fracture assessment scans, in: MICCAI, Springer

Gilani, S.Z., Sharif, N., Suter, D., Schousboe, J.T., Reid, S., Leslie, W.D., Lewis, J.R., 2022. Show, attend and detect: Towards fine- grained assessment of abdominal aortic calcification on vertebral fracture assessment scans, in: MICCAI, Springer. pp. 439–450

work page 2022

[10] [10]

A keypoint transformer to discover spine structure for cobb angle estimation, in: ICME, IEEE

Guo, Y., Li, Y., Zhou, X., He, W., 2021. A keypoint transformer to discover spine structure for cobb angle estimation, in: ICME, IEEE. pp. 1–6

work page 2021

[11] [11]

Landmark localization from medical images with generative distribution prior

Huang, Z., Zhao, R., Leung, F.H., Banerjee, S., Lam, K.M., Zheng, Y.P., Ling, S.H., 2024. Landmark localization from medical images with generative distribution prior. IEEE TMI

work page 2024

[12] [12]

Ilyas,Z.,Saleem,A.,Suter,D.,Schousboe,J.T.,Leslie,W.D.,Lewis, J.R., Gilani, S.Z., 2024. A hybrid cnn-transformer feature pyramid networkforgranularabdominalaorticcalcificationdetectionfromdxa images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer Nature Switzerland. pp. 14–25

work page 2024

[13] [13]

Guidenet: Learning inter-vertebral guides in dxa lateral spine images, in: 2021 DICTA, IEEE

Ilyas, Z., Sharif, N., Schousboe, J.T., Lewis, J.R., Suter, D., Gilani, S.Z., 2021. Guidenet: Learning inter-vertebral guides in dxa lateral spine images, in: 2021 DICTA, IEEE. pp. 1–7

work page 2021

[14]

Kauppila, L.I., Polak, J.F., Cupples, L.A., Hannan, M.T., Kiel, D.P., Wilson, P.W., 1997. New indices to classify location, severity and progression of calcific lesions in the abdominal aorta: a 25-year follow-up study. Atherosclerosis 132, 245–250

[15]

Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P., 2017. Focal loss for dense object detection, in: Proceedings of the IEEE international conference on computer vision, pp. 2980–2988

[16]

Liu, C., Ge, R., Li, H., Zhu, Z., Xia, W., Liu, H., 2023. Thoracolumbar/lumbar degenerative kyphosis: the importance of thoracolumbar junction in sagittal alignment and balance. Journal of Personalized Medicine 14, 36

[17]

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B., 2021. Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012–10022

[19]

Pan, Z., Cai, J., Zhuang, B., 2022. Fast vision transformers with hilo attention. Advances in Neural Information Processing Systems 35, 14541–14554

[20]

Payer, C., Štern, D., Bischof, H., Urschler, M., 2019. Integrating spatial configuration into heatmap regression based cnns for landmark localization. Medical Image Analysis 54, 207–219

[21]

Reid, S., Schousboe, J.T., Kimelman, D., Monchka, B.A., Jozani, M.J., Leslie, W.D., 2021. Machine learning for automated abdominal aortic calcification scoring of dxa vertebral fracture assessment images: A pilot study. Bone 148, 115943

[22]

Saleem, A., Ilyas, Z., Suter, D., Hassan, G.M., Reid, S., Schousboe, J.T., Prince, R., Leslie, W.D., Lewis, J.R., Gilani, S.Z., 2023. Scol: Supervised contrastive ordinal loss for abdominal aortic calcification scoring on vertebral fracture assessment scans, in: MICCAI, Springer. pp. 273–283

[23]

Schousboe, J.T., Lewis, J.R., Kiel, D.P., 2017. Abdominal aortic calcification on dual-energy x-ray absorptiometry: methods of assessment and clinical significance. Bone 104, 91–100

[24]

Schousboe, J.T., Wilson, K.E., Kiel, D.P., 2006. Detection of abdominal aortic calcification with lateral spine imaging using dxa. Journal of Clinical Densitometry 9, 302–308

[25]

Sharif, N., Gilani, S.Z., Suter, D., Reid, S., Szulc, P., Kimelman, D., Monchka, B.A., Jozani, M.J., Hodgson, J.M., Sim, M., Zhu, K., 2023. Machine learning for abdominal aortic calcification assessment from bone density machine-derived lateral spine images. EBioMedicine 94

[26]

Sun, H., Zhen, X., Bailey, C., Rasoulinejad, P., Yin, Y., Li, S., 2017. Direct estimation of spinal cobb angles by structured multi-output regression, in: IPMI, Springer. pp. 529–540

[27]

Sun, K., Xiao, B., Liu, D., Wang, J., 2019. Deep high-resolution representation learning for human pose estimation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5693–5703

[28]

Tekeli, M., Erdem, H., Kilic, N., Boyan, N., Oguz, O., Soames, R.W., 2023a. Evaluation of lumbar lordosis in symptomatic individuals and comparative analysis of six different techniques: a retrospective radiologic study. European Spine Journal 32, 4118–4127

[29]

Tekeli, M., Erdem, H., Kilic, N., Boyan, N., Oguz, O., Soames, R.W., 2023b. Evaluation of lumbar lordosis in symptomatic individuals and comparative analysis of six different techniques: a retrospective radiologic study. European Spine Journal 32, 4118–4127

[30]

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention is all you need, in: Advances in NeurIPS

[31]

Wasserthal, J., 2023. Dataset with segmentations of 117 important anatomical structures in 1228 ct images. Zenodo. doi: 10.5281/zenodo.10047292. Accessed: Oct. 27, 2023

[32]

Wu, H., Bailey, C., Rasoulinejad, P., Li, S., 2017. Automatic landmark estimation for adolescent idiopathic scoliosis assessment using boostnet, in: MICCAI, Springer. pp. 127–135

[33]

Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G., 2022. Vision transformer with deformable attention, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 4794–4803

[34]

Yang, D., Xiong, T., Xu, D., Huang, Q., Liu, D., Zhou, S.K., Xu, Z., Park, J., Chen, M., Tran, T.D., et al., 2017. Automatic vertebra labeling in large-scale 3d ct using deep image-to-image network with message passing and sparsity regularization, in: IPMI, Springer. pp. 633–644

[35]

Yi, J., Wu, P., Huang, Q., Qu, H., Metaxas, D.N., 2020. Vertebra-focused landmark detection for scoliosis assessment, in: ISBI, IEEE. pp. 736–740

[36]

Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H., 2022. Restormer: Efficient transformer for high-resolution image restoration, in: Proceedings of the IEEE/CVF CVPR, pp. 5728–5739

[38]

Zhao, M., Meng, N., Cheung, J.P.Y., Yu, C., Lu, P., Zhang, T., 2023. Spinehrformer: a transformer-based deep learning model for automatic spine deformity assessment with prospective validation. Bioengineering 10, 1333