Rethinking Pulmonary Embolism Segmentation: A Study of Current Approaches and Challenges with an Open Weight Model
Pith reviewed 2026-05-18 14:05 UTC · model grok-4.3
The pith
A 3D U-Net with ResNet blocks remains the strongest performer for pulmonary embolism segmentation on a new 490-scan dataset.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through evaluation on a curated set of 490 CTPA scans, the study shows that a 3D U-Net with ResNet encoding blocks achieves the best segmentation performance measured by mean Dice and average symmetric surface distance, that 3D architectures consistently surpass 2D ones, that error patterns remain highly consistent across models, and that distal emboli continue to pose the greatest difficulty due to both inherent complexity and limited high-quality annotations.
What carries the argument
The 3D U-Net with ResNet encoding blocks, which ingests full volumetric CTPA data and produces voxel-wise embolus masks while serving as the top-ranked model in the nine-architecture comparison.
If this is right
- Future PE segmentation pipelines should default to 3D rather than 2D network designs.
- Greater attention must be paid to collecting and annotating distal emboli to close the remaining performance gap.
- Because error patterns prove stable across architectures, gains are more likely to come from improved data than from novel model designs.
- The released pretrained weights allow immediate testing and adaptation on other public or private CTPA collections without retraining from scratch.
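The third bullet rests on the observation that different models fail on the same cases. One way to check this is to correlate per-case Dice scores between every pair of models. High correlations mean the same scans are hard for every architecture. This is a minimal sketch that assumes per-case Dice arrays are already computed and listed in the same test-case order for every model. The function name and the toy scores are hypothetical, not values from the paper, and the paper's own consistency analysis may use a different statistic.

```python
from itertools import combinations

import numpy as np


def pairwise_case_correlation(scores: dict[str, np.ndarray]) -> dict[tuple[str, str], float]:
    """Pearson correlation of per-case Dice between every pair of models.

    `scores` maps a model name to a 1D array of per-case Dice values;
    all arrays must list the test cases in the same order.
    """
    result = {}
    for a, b in combinations(sorted(scores), 2):
        r = np.corrcoef(scores[a], scores[b])[0, 1]
        result[(a, b)] = float(r)
    return result


# Toy, made-up scores for three hypothetical models on four test cases.
toy = {
    "model_a": np.array([0.82, 0.40, 0.91, 0.15]),
    "model_b": np.array([0.79, 0.35, 0.88, 0.20]),
    "model_c": np.array([0.85, 0.45, 0.93, 0.10]),
}
print(pairwise_case_correlation(toy))
```

If a model scores exactly the same on every case, its correlation is undefined, and NumPy returns nan with a warning.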
Where Pith is reading between the lines
- Wider clinical use of the released model could help radiologists flag smaller clots that are currently missed, shortening time to treatment.
- Pairing the model with uncertainty maps might highlight regions where distal emboli are likely to be overlooked and trigger extra review.
- Testing the same architectures on non-contrast CT or MRI would show whether the 3D advantage generalizes beyond CTPA.
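As a rough illustration of the uncertainty-map idea, the voxel-wise binary entropy of a model's foreground probabilities peaks where the prediction is closest to 0.5. This sketch assumes a sigmoid probability map as input. The 0.5-bit review threshold is an arbitrary, hypothetical choice. Ensemble or Monte Carlo dropout estimates would usually be more informative than the entropy of a single forward pass.

```python
import numpy as np


def binary_entropy_map(prob: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Voxel-wise binary entropy (in bits) of a foreground probability map."""
    p = np.clip(prob, eps, 1.0 - eps)  # avoid log2(0)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))


def flag_uncertain(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Boolean mask of voxels whose entropy exceeds a hypothetical review threshold."""
    return binary_entropy_map(prob) > threshold
```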
Load-bearing premise
The 490-scan dataset supplies dense, consistent, and representative annotations that capture the clinical spectrum of pulmonary embolisms, including enough distal cases for unbiased comparisons on the 60-scan test set.
What would settle it
Re-running the nine models on a fresh external CTPA collection that contains a markedly higher share of distal emboli and observing either a reversal in model ranking or a large drop in the 3D U-Net's Dice score.
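A ranking reversal could be quantified by comparing the nine models' mean Dice on the internal and external test sets with Kendall's tau. Values near 1 mean the ranking is preserved, and values near or below 0 mean it has broken down. This is a minimal pure-Python sketch that assumes one score per model, in the same model order in both lists, and no ties. For tied scores, `scipy.stats.kendalltau` provides a tie-corrected variant.

```python
from itertools import combinations


def kendall_tau(x: list[float], y: list[float]) -> float:
    """Kendall's tau-a between two score lists for the same models (no tie correction)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two entries")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both test sets order this pair of models the same way
        elif s < 0:
            discordant += 1  # the pair swaps places between test sets
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```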
Original abstract
Pulmonary Embolism (PE) is a life-threatening condition for which accurate and timely detection is critical to patient care. However, our systematic study of PE segmentation algorithms reveals concerning limitations in the current state of research. Challenges such as small and inconsistent datasets, a lack of reproducible baselines, and limited comparative evaluation across models are hindering progress in the field. In this study, we curated a densely annotated dataset comprising 490 CTPA scans, each from a unique patient (430 for training and 60 for testing). We evaluated nine widely used segmentation architectures, including both CNN- and ViT-based models, in 2D and 3D configurations, using mean Dice Similarity Coefficient (mDSC) and Average Symmetric Surface Distance (ASSD) as evaluation metrics. Furthermore, the highest-performing model was evaluated on a public dataset without fine-tuning and achieved reasonable generalization performance. Our results show that: (1) a 3D U-Net with ResNet encoding blocks remains a highly effective architecture for PE segmentation; (2) 3D models consistently outperform their 2D counterparts; (3) across all architectures, when trained and evaluated on the same datasets, model error patterns are highly consistent; and (4) distal emboli remain particularly challenging due to both task complexity and the scarcity of high-quality datasets, highlighting the need for datasets with more comprehensive and consistent distal PE coverage. To promote research reproducibility, the architecture and pretrained weights of our best-performing model are publicly available at https://github.com/mazurowski-lab/PulmonaryEmbolismSegmentation
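The two evaluation metrics named in the abstract can be sketched for binary 3D masks as follows. This is a minimal NumPy illustration, not the authors' evaluation code. It makes three assumptions:

- Surfaces are foreground voxels that have at least one 6-connected background neighbour.
- Distances are scaled by the voxel spacing, for example in mm.
- When both masks are empty, Dice is 1.0. When either surface is empty, ASSD is infinite. These conventions vary between toolkits.

mDSC is then the mean of the per-scan Dice values.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def surface_voxels(mask: np.ndarray) -> np.ndarray:
    """(N, 3) indices of foreground voxels with a 6-connected background neighbour."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = mask.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # Neighbour along `axis`; the False padding marks out-of-volume voxels.
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(mask & ~interior)


def assd(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Average symmetric surface distance in the units of `spacing` (e.g. mm)."""
    sp = surface_voxels(pred) * np.asarray(spacing, dtype=float)
    sg = surface_voxels(gt) * np.asarray(spacing, dtype=float)
    if len(sp) == 0 or len(sg) == 0:
        return float("inf")  # assumed convention for an empty surface
    # Brute-force pairwise distances: fine for small masks, too memory-hungry for full CTPA volumes.
    d = np.linalg.norm(sp[:, None, :] - sg[None, :, :], axis=-1)
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(sp) + len(sg)))


def mean_dice(pairs) -> float:
    """mDSC: mean of per-scan Dice over (prediction, ground truth) pairs."""
    return float(np.mean([dice(p, g) for p, g in pairs]))
```

For full-size volumes, surface distances are normally computed with a Euclidean distance transform rather than the pairwise matrix used here.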
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a systematic benchmarking of nine 2D and 3D segmentation architectures (CNN- and ViT-based) for pulmonary embolism on a curated dataset of 490 CTPA scans (430 train, 60 test). It concludes that a 3D U-Net with ResNet encoding blocks is highly effective, 3D models outperform 2D counterparts, error patterns are consistent across architectures, and distal emboli remain challenging, while releasing the best model weights and architecture for reproducibility.
Significance. If the annotations prove reliable, this study supplies a much-needed reproducible baseline and open weights for a field hampered by small, inconsistent datasets. The finding of architecture-independent error patterns and the emphasis on distal PE coverage could usefully direct future data collection and model development.
major comments (2)
- §3.1 (Dataset curation): The manuscript states that a 'densely annotated dataset' of 490 scans was created but provides no inter-annotator agreement statistics, no explicit annotation protocol, and no breakdown of proximal versus distal emboli counts in the 60-scan test set. These omissions are load-bearing for claims (3) and (4) in the abstract, because noisy or imbalanced labels could artifactually produce the reported consistent error patterns and distal difficulty.
- §4.3 (Cross-dataset evaluation): The generalization test of the best model on a public dataset is reported only qualitatively ('reasonable generalization performance') without naming the dataset, reporting exact mDSC/ASSD values, or describing any preprocessing or resolution differences. This prevents assessment of whether the result supports the broader claim of model robustness.
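The agreement statistic requested in the first major comment could, for example, be reported as the mean pairwise Dice between annotators on a subset annotated by two or more readers. This sketch is illustrative only. It assumes one binary mask per annotator, all of identical shape. The review does not say which agreement measure the authors will use; per-embolus detection agreement and Cohen's kappa are common alternatives.

```python
from itertools import combinations

import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient between two binary masks of identical shape."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


def inter_annotator_dice(masks: list[np.ndarray]) -> float:
    """Mean pairwise Dice across two or more annotators' masks for one scan."""
    if len(masks) < 2:
        raise ValueError("need masks from at least two annotators")
    return float(np.mean([dice(a, b) for a, b in combinations(masks, 2)]))


def subset_agreement(cases: list[list[np.ndarray]]) -> float:
    """Average the per-scan agreement over the overlap subset."""
    return float(np.mean([inter_annotator_dice(m) for m in cases]))
```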
minor comments (2)
- Results table (Table 2 or equivalent): clarify whether the reported mDSC and ASSD values are means over the 60 test cases or include per-case standard deviations; the latter would better support the 'highly consistent' error-pattern claim.
- Figure 4 (qualitative results): the caption should explicitly state the slice thickness and windowing used for visualization so readers can judge whether distal emboli visibility is limited by imaging rather than model failure.
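For the first minor comment, each metric can be reported as mean ± standard deviation over the 60 test scans once per-case values exist. This is a small sketch that assumes a 1D sequence of per-case scores. It uses the sample standard deviation (ddof=1), which is one common choice but not necessarily the one the authors use.

```python
import numpy as np


def summarize_per_case(per_case) -> tuple[float, float]:
    """Mean and sample standard deviation (ddof=1) of per-case metric values."""
    values = np.asarray(per_case, dtype=float)
    if values.size == 0:
        raise ValueError("no per-case values")
    std = float(values.std(ddof=1)) if values.size > 1 else 0.0
    return float(values.mean()), std


def format_summary(per_case, digits: int = 3) -> str:
    """Render per-case values as 'mean ± std (n=...)'."""
    mean, std = summarize_per_case(per_case)
    return f"{mean:.{digits}f} ± {std:.{digits}f} (n={len(per_case)})"
```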
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The comments highlight important areas for improving transparency in dataset curation and evaluation reporting. We address each major comment below and will make the corresponding revisions to strengthen the manuscript.
Point-by-point responses
- Referee: §3.1 (Dataset curation): The manuscript states that a 'densely annotated dataset' of 490 scans was created but provides no inter-annotator agreement statistics, no explicit annotation protocol, and no breakdown of proximal versus distal emboli counts in the 60-scan test set. These omissions are load-bearing for claims (3) and (4) in the abstract, because noisy or imbalanced labels could artifactually produce the reported consistent error patterns and distal difficulty.
  Authors: We agree that these details are essential for validating annotation quality and supporting claims (3) and (4). In the revised manuscript, we will expand §3.1 to include a full description of the annotation protocol (including the number of annotators, their expertise, and the process for resolving disagreements), inter-annotator agreement statistics computed on an overlap subset, and a table or text breakdown of proximal versus distal emboli counts in the 60-scan test set. These additions will help confirm that the consistent error patterns reflect genuine task difficulty rather than label artifacts.
  Revision: yes.
- Referee: §4.3 (Cross-dataset evaluation): The generalization test of the best model on a public dataset is reported only qualitatively ('reasonable generalization performance') without naming the dataset, reporting exact mDSC/ASSD values, or describing any preprocessing or resolution differences. This prevents assessment of whether the result supports the broader claim of model robustness.
  Authors: We acknowledge the lack of specificity in the current reporting. We will revise the relevant section to name the public dataset, report the exact quantitative mDSC and ASSD values obtained without fine-tuning, and detail the preprocessing steps along with any differences in resolution, slice thickness, or acquisition parameters relative to our internal dataset. This will enable a clearer evaluation of generalization performance.
  Revision: yes.
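The promised description of resolution and slice-thickness differences usually goes hand in hand with resampling every scan to a shared voxel spacing before inference. The sketch below is a hypothetical nearest-neighbour resampler in plain NumPy, suitable for label masks. The review does not state the authors' actual preprocessing. Intensity volumes would normally use linear or higher-order interpolation instead, for example via SimpleITK or `scipy.ndimage.zoom`.

```python
import numpy as np


def resample_nearest(volume: np.ndarray, spacing: tuple[float, ...],
                     target_spacing: tuple[float, ...]) -> np.ndarray:
    """Nearest-neighbour resampling of an N-D array to a new voxel spacing (mm)."""
    spacing = np.asarray(spacing, dtype=float)
    target = np.asarray(target_spacing, dtype=float)
    # Keep the physical extent (shape * spacing) approximately constant.
    new_shape = np.maximum(np.round(np.array(volume.shape) * spacing / target), 1).astype(int)
    # For each output voxel centre, pick the input voxel that contains it.
    idx = [
        np.minimum(np.floor((np.arange(n) + 0.5) * t / s).astype(int), dim - 1)
        for n, t, s, dim in zip(new_shape, target, spacing, volume.shape)
    ]
    return volume[np.ix_(*idx)]
```

For example, a scan with 2.5 mm slices and 0.7 mm in-plane pixels would be brought to 1 mm isotropic voxels with `resample_nearest(mask, (2.5, 0.7, 0.7), (1.0, 1.0, 1.0))`.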
Circularity Check
No circularity: direct empirical benchmarking with independent results
Rationale
The paper is a straightforward empirical study. It curates a 490-scan dataset, trains nine segmentation architectures (2D and 3D, CNN and ViT variants), and reports mDSC and ASSD on a held-out 60-scan test set plus one external dataset. All central claims are direct observations from these experiments rather than derivations, fitted predictions, or self-citation chains. These claims are that a 3D U-Net with ResNet blocks is effective, that 3D models outperform 2D ones, and that error patterns are consistent across models. No equations, uniqueness theorems, or ansätze are invoked. The results rest on the reported training and testing protocol and the public model weights. The argument is self-contained and can be checked against external benchmarks, and none of its outputs reduces, by construction, to the paper's own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: CTPA scans with expert annotations constitute a reliable ground truth for PE segmentation evaluation.
- Domain assumption: Mean Dice and average symmetric surface distance are sufficient and appropriate metrics for comparing segmentation quality in this task.