Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge
Pith reviewed 2026-05-22 21:51 UTC · model grok-4.3
The pith
CT segmentation reaches 0.93 fragment-wise IoU on pelvic fractures, while simulated X-ray reaches only 0.77
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The top-performing CT algorithm achieved an average fragment-wise intersection over union (IoU) of 0.930, while the best X-ray algorithm, evaluated on DeepDRR-simulated images, achieved 0.774. The authors judge the CT score satisfactory and the X-ray score not yet sufficient for intra-operative decision-making. The challenge also showed that instance representation choices, such as primary-secondary classification versus boundary-core separation, lead to different segmentation strategies, and that fragment definition carries inherent uncertainty in incomplete fractures.
What carries the argument
Fragment-wise intersection over union (IoU) computed on instance-level masks, used to compare algorithms that differ in how they represent fracture fragments.
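A minimal sketch of a fragment-wise IoU, assuming integer label maps of equal shape (0 = background, one positive ID per fragment) and assuming each ground-truth fragment is matched to the predicted label that overlaps it most. The matching rule, and the choice to ignore unmatched predicted fragments, are assumptions of this illustration; the challenge's official evaluation code may match and aggregate fragments differently.

```python
import numpy as np


def fragment_ious(gt: np.ndarray, pred: np.ndarray) -> dict:
    """IoU per ground-truth fragment.

    Assumes integer label maps of equal shape, 0 = background. Each
    ground-truth fragment is matched to the predicted label with the
    largest overlap (a simplifying assumption, not an official rule).
    """
    scores = {}
    for g in np.unique(gt):
        if g == 0:
            continue  # skip background
        gt_mask = gt == g
        hits = pred[gt_mask]
        hits = hits[hits != 0]  # predicted labels inside this fragment
        if hits.size == 0:
            scores[int(g)] = 0.0  # fragment missed entirely
            continue
        labels, counts = np.unique(hits, return_counts=True)
        pred_mask = pred == labels[np.argmax(counts)]
        inter = np.logical_and(gt_mask, pred_mask).sum()
        union = np.logical_or(gt_mask, pred_mask).sum()
        scores[int(g)] = float(inter / union)
    return scores


def mean_fragment_iou(gt: np.ndarray, pred: np.ndarray) -> float:
    """Average of the per-fragment IoUs (NaN if gt has no fragments)."""
    scores = fragment_ious(gt, pred)
    return float(np.mean(list(scores.values()))) if scores else float("nan")


gt = np.array([[1, 1, 0],
               [2, 2, 2]])
pred = np.array([[3, 3, 3],
                 [4, 4, 0]])
# Fragment 1 is over-segmented by one pixel, fragment 2 under-segmented by one,
# so each scores 2/3 regardless of the arbitrary predicted label IDs.
print(fragment_ious(gt, pred), mean_fragment_iou(gt, pred))
```

Scoring per fragment rather than per voxel keeps small fragments from being swamped by large ones, which is one reason an instance-level metric suits a task where fragments vary greatly in size.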
If this is right
- CT fragment segmentation is accurate enough to support pre-operative planning and post-operative assessment.
- X-ray fragment segmentation is not yet reliable enough for intra-operative use because of overlap in projection views.
- Instance representation choices, such as primary-secondary versus boundary-core, produce measurably different segmentation outcomes.
- Uncertainty in defining fragments in incomplete fractures limits fully automatic methods and points toward interactive segmentation.
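To make the boundary-core idea concrete, here is a minimal 2D sketch (the same SciPy calls work on 3D volumes). It assumes a network that predicts two binary maps: a bone mask and a "core" mask in which a thin band along every fracture surface has been removed. Touching fragments then separate into distinct core components, and boundary pixels are reassigned to the nearest core. The per-fragment erosion encoder and nearest-core decoder are one plausible construction chosen for illustration, not the procedure of any particular PENGWIN team; the primary-secondary alternative is not shown.

```python
import numpy as np
from scipy import ndimage


def encode_boundary_core(labels: np.ndarray, width: int = 1):
    """Training targets from an instance map (0 = background).

    Returns (bone, core): core keeps each fragment eroded by `width` pixels,
    so a gap opens wherever two fragments touch.
    """
    core = np.zeros(labels.shape, dtype=bool)
    for lab in np.unique(labels):
        if lab == 0:
            continue
        core |= ndimage.binary_erosion(labels == lab, iterations=width)
    return labels > 0, core


def decode_boundary_core(bone: np.ndarray, core: np.ndarray) -> np.ndarray:
    """Recover fragment instances from predicted bone and core masks."""
    seeds, n = ndimage.label(core & bone)  # one seed per core component
    if n == 0:
        return np.zeros(bone.shape, dtype=np.int32)
    # Index of the nearest seed pixel for every position in the array.
    _, idx = ndimage.distance_transform_edt(seeds == 0, return_indices=True)
    nearest = seeds[tuple(idx)]
    # Bone pixels take the label of their nearest core; background stays 0.
    return np.where(bone, nearest, 0).astype(np.int32)


# Two fragments that touch along a fracture line between columns 4 and 5.
labels = np.zeros((5, 10), dtype=int)
labels[:, :5] = 1
labels[:, 5:] = 2
bone, core = encode_boundary_core(labels)
decoded = decode_boundary_core(bone, core)
# The plain bone mask is a single connected component; the decoded map is not.
print(ndimage.label(bone)[1], len(np.unique(decoded)))
```

The design choice is visible in the example: a semantic bone mask alone cannot split fragments that are in contact, whereas the core map turns instance separation into connected-component labeling plus a nearest-seed assignment.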
Where Pith is reading between the lines
- Real clinical X-rays may yield lower IoU than the simulated set because the simulation may understate certain artifacts.
- A hybrid pipeline that starts with CT segmentation and then refines on X-ray could reduce the performance gap.
- The released multi-center dataset and simulated X-rays can serve as a public testbed for future projection-aware segmentation research.
Load-bearing premise
The simulated X-ray images generated by DeepDRR capture the noise, artifacts, and positioning variations present in real clinical X-rays.
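Why fragments overlap in X-ray, and what a simulator must add on top of geometry, can be seen in a toy model. The sketch below assumes an idealized parallel-beam geometry and a monoenergetic Beer-Lambert line integral. It is not DeepDRR, which additionally models the polychromatic source spectrum, material decomposition, scatter, and noise; those are exactly the effects whose realism this premise depends on. The function names and the attenuation value are hypothetical.

```python
import numpy as np


def project_fragments(labels: np.ndarray, axis: int = 0) -> np.ndarray:
    """Parallel-beam projection of every fragment in a 3D label volume.

    Returns a (n_fragments, H, W) boolean stack. A detector pixel can be True
    in several layers at once, which is why X-ray masks may overlap.
    """
    ids = [i for i in np.unique(labels) if i != 0]
    plane = tuple(s for k, s in enumerate(labels.shape) if k != axis)
    if not ids:
        return np.zeros((0,) + plane, dtype=bool)
    return np.stack([(labels == i).any(axis=axis) for i in ids])


def overlap_fraction(masks: np.ndarray) -> float:
    """Share of bone-covered detector pixels hit by two or more fragments."""
    counts = masks.sum(axis=0)
    covered = counts >= 1
    return float((counts >= 2).sum() / covered.sum()) if covered.any() else 0.0


def naive_drr(mu: np.ndarray, voxel_mm: float = 1.0, axis: int = 0) -> np.ndarray:
    """Monoenergetic Beer-Lambert projection: I / I0 = exp(-sum(mu * dx)).

    `mu` is a linear attenuation map in 1/mm; no scatter, noise or spectrum.
    """
    return np.exp(-mu.sum(axis=axis) * voxel_mm)


# Two fragments at different depths that share one detector column.
vol = np.zeros((4, 3, 3), dtype=int)
vol[0, :, 0:2] = 1
vol[3, :, 1:3] = 2
masks = project_fragments(vol)
print(masks.shape, overlap_fraction(masks))  # a third of covered pixels overlap
mu = np.where(vol > 0, 0.05, 0.0)  # hypothetical bone attenuation, 1/mm
drr = naive_drr(mu, voxel_mm=1.0)
```

In the overlapping column the toy DRR is simply darker, with nothing in the intensity that says which fragment lies in front; resolving that ambiguity is what the X-ray task asks of the models.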
What would settle it
Apply the winning X-ray algorithms to a held-out, expert-annotated set of real intra-operative X-ray images and measure whether fragment-wise IoU stays near 0.774 or drops substantially.
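One way to decide whether the real-image score "drops substantially" is a bootstrap confidence interval on the difference in mean fragment-wise IoU between the simulated and real test sets. The sketch below is a plain percentile bootstrap that resamples cases independently within each set, because the two sets contain different images and are not paired. The per-case scores are made-up placeholders, not challenge results, and reading a 95% interval that excludes zero as a substantial drop is an assumption.

```python
import numpy as np


def bootstrap_mean_diff(sim, real, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(sim) - mean(real).

    `sim` and `real` hold per-case fragment-wise IoU; cases are resampled
    with replacement independently within each set.
    """
    rng = np.random.default_rng(seed)
    sim = np.asarray(sim, dtype=float)
    real = np.asarray(real, dtype=float)
    sim_means = rng.choice(sim, size=(n_boot, sim.size), replace=True).mean(axis=1)
    real_means = rng.choice(real, size=(n_boot, real.size), replace=True).mean(axis=1)
    lo, hi = np.quantile(sim_means - real_means, [alpha / 2, 1 - alpha / 2])
    return float(sim.mean() - real.mean()), float(lo), float(hi)


# Hypothetical per-case scores, for illustration only.
sim_scores = [0.78, 0.75, 0.80, 0.77, 0.76, 0.79]
real_scores = [0.70, 0.62, 0.74, 0.66, 0.69, 0.71]
diff, lo, hi = bootstrap_mean_diff(sim_scores, real_scores)
print(f"drop = {diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Resampling whole cases, rather than individual fragments, respects the fact that fragments from the same patient are correlated.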
Original abstract
The segmentation of pelvic fracture fragments in CT and X-ray images is crucial for trauma diagnosis, surgical planning, and intraoperative guidance. However, accurately and efficiently delineating the bone fragments remains a significant challenge due to complex anatomy and imaging limitations. The PENGWIN challenge, organized as a MICCAI 2024 satellite event, aimed to advance automated fracture segmentation by benchmarking state-of-the-art algorithms on these complex tasks. A diverse dataset of 150 CT scans was collected from multiple clinical centers, and a large set of simulated X-ray images was generated using the DeepDRR method. Final submissions from 16 teams worldwide were evaluated under a rigorous multi-metric testing scheme. The top-performing CT algorithm achieved an average fragment-wise intersection over union (IoU) of 0.930, demonstrating satisfactory accuracy. However, in the X-ray task, the best algorithm achieved an IoU of 0.774, which is promising but not yet sufficient for intra-operative decision-making, reflecting the inherent challenges of fragment overlap in projection imaging. Beyond the quantitative evaluation, the challenge revealed methodological diversity in algorithm design. Variations in instance representation, such as primary-secondary classification versus boundary-core separation, led to differing segmentation strategies. Despite promising results, the challenge also exposed inherent uncertainties in fragment definition, particularly in cases of incomplete fractures. These findings suggest that interactive segmentation approaches, integrating human decision-making with task-relevant information, may be essential for improving model reliability and clinical applicability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript summarizes the PENGWIN 2024 MICCAI challenge benchmarking automated segmentation of pelvic fracture fragments. It uses a multi-center dataset of 150 CT scans and corresponding simulated X-ray projections generated by DeepDRR. Sixteen independent team submissions were evaluated on held-out test data under a multi-metric protocol. The top CT method reached a fragment-wise IoU of 0.930; the top X-ray method reached 0.774. The paper additionally catalogs variations in instance-representation strategies across submissions and notes definitional uncertainty for incomplete fractures, concluding that interactive segmentation may be required for clinical reliability.
Significance. If the reported metrics hold, the work supplies a valuable, reproducible public benchmark for a clinically important task. The use of independent submissions evaluated on held-out multi-center data, together with a rigorous multi-metric scheme, strengthens the reliability of the quantitative claims. The CT result demonstrates that current methods can reach high fragment-wise accuracy; the X-ray result quantifies the additional difficulty of projection imaging. The explicit discussion of methodological diversity (primary-secondary vs. boundary-core representations) provides concrete guidance for future algorithm design. These empirical strengths are the primary contribution.
Major comments (1)
- [Abstract] The claim that an IoU of 0.774 on the X-ray task is 'promising but not yet sufficient for intra-operative decision-making' rests on the unvalidated assumption that DeepDRR simulations faithfully reproduce real clinical X-ray characteristics (scatter, detector noise, beam hardening, and variable fragment overlap arising from patient positioning). No quantitative comparison between simulated and real X-ray images is reported, so this load-bearing clinical-sufficiency interpretation is not supported by the presented evidence.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of the work and for the constructive comment on the abstract. We address the point below.
Point-by-point responses
- Referee: [Abstract] The claim that an IoU of 0.774 on the X-ray task is 'promising but not yet sufficient for intra-operative decision-making' rests on the unvalidated assumption that DeepDRR simulations faithfully reproduce real clinical X-ray characteristics (scatter, detector noise, beam hardening, and variable fragment overlap arising from patient positioning). No quantitative comparison between simulated and real X-ray images is reported, so this load-bearing clinical-sufficiency interpretation is not supported by the presented evidence.
  Authors: We agree that the manuscript reports no quantitative comparison between DeepDRR-simulated projections and real clinical X-ray images, and that the interpretive claim regarding sufficiency for intra-operative decision-making therefore lacks direct supporting evidence. In the revised version we will remove this clinical-sufficiency statement from the abstract and replace it with a purely empirical observation on the performance gap between the CT and X-ray tasks.
  Revision: yes.
Circularity Check
Empirical benchmark results with no circular derivations
Full rationale
The paper reports performance metrics from a public challenge with independent team submissions evaluated on held-out test data. No equations, predictions, or first-principles derivations are present that reduce to fitted inputs or self-citations by construction. Central claims (CT IoU 0.930, X-ray IoU 0.774) are direct empirical outcomes. Any citations to DeepDRR or prior work are not load-bearing for the reported numbers and do not create self-definitional or fitted-input circularity.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: fragment-wise IoU is a suitable primary metric for assessing the clinical utility of fracture segmentation algorithms.
Forward citations
Cited by 1 Pith paper
- Vision Transformer-Conditioned UNet for Domain-Adaptive Semantic Segmentation: ViTC-UNet adapts frozen ViT representations to biomedical semantic segmentation by conditioning a UNet via learnable tokens and two-way attention decoding.
Reference graph
Works this paper leans on
[1] J. Kowal, F. Langlotz, and L.-P. Nolte, “Basics of computer-assisted orthopaedic surgery,” in Navigation and MIS in Orthopedic Surgery. Springer, 2007, pp. 2–8.
[2] N. Sugano, “Computer-assisted orthopaedic surgery and robotic surgery in total hip arthroplasty,” Clin. Orthop. Surg., vol. 5, no. 1, pp. 1–9, 2013.
[3] Y. Ge, C. Zhao, Y. Wang, and X. Wu, “Robot-assisted autonomous reduction of a displaced pelvic fracture: a case report and brief literature review,” Journal of Clinical Medicine, vol. 11, no. 6, p. 1598, 2022.
[4] Y. Liu, S. Yibulayimu, Y. Sang, G. Zhu, C. Shi, C. Liang, Q. Cao, C. Zhao, X. Wu, and Y. Wang, “Preoperative fracture reduction planning for image-guided pelvic trauma surgery: A comprehensive pipeline with learning,” Medical Image Analysis, p. 103506, 2025.
[5] C. Shi, Q. Yang, Y. Wang, X. Zhao, S. Shi, L. Zhang, S. Yibulayimu, Y. Liu, C. Liang, Y. Wang et al., “Automatic path planning for pelvic fracture reduction with multi-degree-of-freedom,” Computer Methods and Programs in Biomedicine, vol. 261, p. 108591, 2025.
[6] R. Han, A. Uneri, R. C. Vijayan, P. Wu, P. Vagdargi, N. Sheth, S. Vogt, G. Kleinszig, G. Osgood, and J. H. Siewerdsen, “Fracture reduction planning and guidance in orthopaedic trauma surgery via multi-body image registration,” Medical Image Analysis, vol. 68, p. 101917, 2021.
[7] Y. Liu, Y. Sang, S. Yibulayimu, G. Zhu, C. Shi, C. Liang, J. Liu, Q. Yang, C. Zhao, Q. Cao et al., “Automatic intraoperative CT-CBCT registration for image-guided pelvic fracture reduction,” in 2024 IEEE ISBI. IEEE, 2024, pp. 1–5.
[8] M. Unberath, J.-N. Zaech, S. C. Lee, B. Bier, J. Fotouhi, M. Armand, and N. Navab, “DeepDRR–a catalyst for machine learning in fluoroscopy-guided procedures,” in Proc. Med. Image. Comput. Comput. Assist. Interv. Springer, 2018, pp. 98–106.
[9] J. Yang, R. Shi, L. Jin, X. Huang, K. Kuang, D. Wei, S. Gu, J. Liu, P. Liu, Z. Chai et al., “Deep rib fracture instance segmentation and classification from CT on the RibFrac challenge,” arXiv preprint:2402.09372, 2024.
[10] Z. Hu, M. Patel, R. L. Ball, H. M. Lin, L. M. Prevedello, M. Naseri, S. Mathur, R. Moreland, J. Wilson, C. Witiw et al., “Assessing the performance of models from the 2022 RSNA cervical spine fracture detection competition at a level I trauma center,” Radiology: Artificial Intelligence, vol. 6, no. 6, p. e230550, 2024.
[11] A. Khanal, K. Santosh, and S. K. Bajracharya, “Fractured bone detection challenge,” https://kaggle.com/competitions/fractured-bone-detection-challenge, 2023, Kaggle.
[12] M. Krčah, G. Székely, and R. Blanc, “Fully automatic and fast segmentation of the femur bone from 3D-CT images with no shape prior,” in 2011 IEEE ISBI. IEEE, 2011, pp. 2087–2090.
[13] L. Yang, S. Gao, P. Li, J. Shi, and F. Zhou, “Recognition and segmentation of individual bone fragments with a deep learning approach in CT scans of complex intertrochanteric fractures: a retrospective study,” Journal of Digital Imaging, vol. 35, no. 6, pp. 1681–1689, 2022.
[14] H. Kim, Y. D. Jeon, K. B. Park, H. Cha, M.-S. Kim, J. You, S.-W. Lee, S.-H. Shin, Y.-G. Chung, S. B. Kang et al., “Automatic segmentation of inconstant fractured fragments for tibia/fibula from CT images using deep learning,” Scientific Reports, vol. 13, no. 1, p. 20431, 2023.
[15] D. Wang, Z. Wu, G. Fan, H. Liu, X. Liao, Y. Chen, and H. Zhang, “Accuracy and reliability analysis of a machine learning based segmentation tool for intertrochanteric femoral fracture CT,” Frontiers in Surgery, vol. 9, p. 913385, 2022.
[16] Y. Liu, S. Yibulayimu, Y. Sang, G. Zhu, Y. Wang, C. Zhao, and X. Wu, “Pelvic fracture segmentation using a multi-scale distance-weighted neural network,” in Proc. Med. Image. Comput. Comput. Assist. Interv. Springer, 2023, pp. 312–321.
[17] B. Zeng, H. Wang, L. Joskowicz, and X. Chen, “Fragment distance-guided dual-stream learning for automatic pelvic fracture segmentation,” Comput. Med. Imaging Graph, vol. 116, p. 102412, 2024.
[18] Z. Deng, J. Jiang, R. Huang, W. Zhang, Z. Chen, K. He, and Q. Yao, “Synergistically segmenting and reducing fracture bones via whole-to-whole deep dense matching,” Computers & Graphics, vol. 116, pp. 404–417, 2023.
[19] D. Joshi and T. P. Singh, “A survey of fracture detection techniques in bone X-ray images,” Artificial Intelligence Review, vol. 53, no. 6, pp. 4475–4517, 2020.
[20] C.-T. Cheng, T.-Y. Ho, T.-Y. Lee, C.-C. Chang, C.-C. Chou, C.-C. Chen, I. Chung, C.-H. Liao et al., “Application of a deep learning algorithm for detection and visualization of hip fractures on plain pelvic radiographs,” European Radiology, vol. 29, no. 10, pp. 5469–5477, 2019.
[21] Y. Wang, T. Bai, T. Li, and L. Huang, “Osteoporotic vertebral fracture classification in X-rays based on a multi-modal semantic consistency network,” J. Bionic. Eng., vol. 19, no. 6, pp. 1816–1829, 2022.
[22] W. W. Myint, K. S. Tun, H. M. Tun, and H. Myint, “Analysis on leg bone fracture detection and classification using X-ray images,” Machine Learning Research, vol. 3, no. 3, pp. 49–59, 2018.
[23] F. Hržić, I. Štajduhar, S. Tschauner, E. Sorantin, and J. Lerga, “Local-entropy based approach for X-ray image segmentation and fracture detection,” Entropy, vol. 21, no. 4, p. 338, 2019.
[24] F. Hardalaç, F. Uysal, O. Peker, M. Çiçeklidağ, T. Tolunay, N. Tokgöz, U. Kutbay, B. Demirciler, and F. Mert, “Fracture detection in wrist X-ray images using deep learning-based object detection models,” Sensors, vol. 22, no. 3, p. 1285, 2022.
[25] D. Joshi, T. P. Singh, and A. K. Joshi, “Deep learning-based localization and segmentation of wrist fractures on X-ray radiographs,” Neural Computing and Applications, vol. 34, no. 21, pp. 19061–19077, 2022.
[26] S. Turk, O. Bingol, A. Coskuncay, and T. Aydin, “The impact of implementing backbone architectures on fracture segmentation in X-ray images,” Eng. Sci. Technol. Int. J., vol. 59, p. 101883, 2024.
[27] J. M. Lee, J. Y. Park, Y. J. Kim, and K. G. Kim, “Deep-learning-based pelvic automatic segmentation in pelvic fractures,” Scientific Reports, vol. 14, no. 1, p. 12258, 2024.
[28] K. C. Kim, H. C. Cho, T. J. Jang, J. M. Choi, and J. K. Seo, “Automatic detection and segmentation of lumbar vertebrae from X-ray images for compression fracture evaluation,” Computer Methods and Programs in Biomedicine, vol. 200, p. 105833, 2021.
[29] P. Liu, H. Han, Y. Du, H. Zhu, Y. Li, F. Gu, H. Xiao, J. Li, C. Zhao, L. Xiao et al., “Deep learning to segment pelvic bones: large-scale CT datasets and baseline models,” International Journal of Computer Assisted Radiology and Surgery, vol. 16, pp. 749–756, 2021.
[30] C. Gao, B. D. Killeen, Y. Hu, R. B. Grupp, R. H. Taylor, M. Armand, and M. Unberath, “Synthetic data accelerates the development of generalizable learning-based algorithms for X-ray image analysis,” Nature Machine Intelligence, vol. 5, no. 3, pp. 294–308, 2023.
[31] S. Faghani, B. Khosravi, K. Zhang, M. Moassefi, J. M. Jagtap, F. Nugen, S. Vahdati, S. P. Kuanar, S. M. Rassoulinejad-Mousavi, Y. Singh et al., “Mitigating bias in radiology machine learning: 3. Performance metrics,” Radiology: Artificial Intelligence, vol. 4, no. 5, p. e220061, 2022.
[32] J. Wasserthal, H.-C. Breit, M. T. Meyer, M. Pradella, D. Hinck, A. W. Sauter, T. Heye, D. T. Boll, J. Cyriac, S. Yang et al., “TotalSegmentator: robust segmentation of 104 anatomic structures in CT images,” Radiology: Artificial Intelligence, vol. 5, no. 5, p. e230024, 2023.
[33] M. Wiesenfarth, A. Reinke, B. A. Landman, M. Eisenmann, L. A. Saiz, M. J. Cardoso, L. Maier-Hein, and A. Kopp-Schneider, “Methods and open-source toolkit for analyzing and visualizing challenge results,” Scientific Reports, vol. 11, no. 1, p. 2369, 2021.
[34] R. B. Grupp, R. A. Hegeman, R. J. Murphy, C. P. Alexander, Y. Otake, B. A. McArthur, M. Armand, and R. H. Taylor, “Pose estimation of periacetabular osteotomy fragments with intraoperative X-ray navigation,” IEEE Trans. Biomed. Eng., vol. 67, no. 2, pp. 441–452, 2019.
[35] S. Yibulayimu, Y. Liu, Y. Sang, J. Qin, C. Shi, C. Liang, G. Zhu, Y. Wang, C. Zhao, and X. Wu, “Fracformer: Fracture reduction planning with transformer-based shape restoration and fracture data simulation,” IEEE Transactions on Medical Imaging, 2025.
[36] S. Hu, J. Guo, B. Zhu, Y. Dong, and F. Li, “Epidemiology and burden of pelvic fractures: Results from the global burden of disease study 2019,” Injury, vol. 54, no. 2, pp. 589–597, 2023.