arxiv: 2509.18308 · v3 · submitted 2025-09-22 · 💻 cs.CV

Rethinking Pulmonary Embolism Segmentation: A Study of Current Approaches and Challenges with an Open Weight Model

Pith reviewed 2026-05-18 14:05 UTC · model grok-4.3

classification 💻 cs.CV
keywords: pulmonary embolism, segmentation, CTPA, 3D U-Net, ResNet, deep learning, medical imaging, distal emboli

The pith

A 3D U-Net with ResNet blocks remains the strongest performer for pulmonary embolism segmentation on a new 490-scan dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates why progress in automated pulmonary embolism segmentation has been slow, citing small datasets, missing baselines, and few head-to-head tests. It introduces a densely annotated collection of 490 CTPA scans from distinct patients and runs nine common segmentation networks in both 2D and 3D forms. Results indicate that a 3D U-Net using ResNet encoders delivers the highest scores, that volumetric models beat their slice-wise versions, and that all architectures make similar mistakes on the same cases. Distal clots stay especially hard to segment, which the authors tie to gaps in current training data. The best model and its weights are released to support further work.

Core claim

Through evaluation on a curated set of 490 CTPA scans, the study shows that a 3D U-Net with ResNet encoding blocks achieves the best segmentation performance measured by mean Dice and average symmetric surface distance, that 3D architectures consistently surpass 2D ones, that error patterns remain highly consistent across models, and that distal emboli continue to pose the greatest difficulty due to both inherent complexity and limited high-quality annotations.
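
The two metrics named in the core claim are standard, and the overlap one is simple enough to sketch. The Dice Similarity Coefficient for a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|), and mDSC is its mean over test cases. The snippet below is a minimal illustration and not the authors' evaluation code: the function names, the flat 0/1 mask representation, the empty-mask convention, and the toy masks are all assumptions made for the example.

```python
from statistics import mean, stdev


def dice(pred, truth):
    """Dice Similarity Coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(cases):
    """Return (mean, sample standard deviation) of per-case Dice scores."""
    scores = [dice(pred, truth) for pred, truth in cases]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Toy, made-up masks for illustration only (flattened volumes).
cases = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),  # perfect overlap
    ([1, 0, 0, 0], [1, 1, 0, 0]),  # partial overlap: 2*1 / (1+2)
    ([0, 0, 1, 1], [1, 1, 0, 0]),  # no overlap
]
mdsc, sd = mean_dice(cases)
```

Reporting the per-case spread alongside the mean is a design choice here; a mean alone hides whether a model fails badly on a few scans or mildly on many.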

What carries the argument

The 3D U-Net with ResNet encoding blocks, which ingests full volumetric CTPA data and produces voxel-wise embolus masks while serving as the top-ranked model in the nine-architecture comparison.

If this is right

  • Future PE segmentation pipelines should default to 3D rather than 2D network designs.
  • Greater attention must be paid to collecting and annotating distal emboli to close the remaining performance gap.
  • Because error patterns prove stable across architectures, gains are more likely to come from improved data than from novel model designs.
  • The released pretrained weights allow immediate testing and adaptation on other public or private CTPA collections without retraining from scratch.
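
The claim that error patterns are stable across architectures rests on comparing per-case score vectors between models; the paper's Figure 3 caption mentions cosine similarity between DSC vectors. A rough sketch of that comparison follows. The model names and scores are hypothetical placeholders, not values from the paper, and the helper names are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length score vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_similarity(dsc_by_model):
    """Cosine similarity for every unordered pair of models."""
    names = sorted(dsc_by_model)
    return {
        (a, b): cosine_similarity(dsc_by_model[a], dsc_by_model[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical per-case Dice scores on the same four test scans.
dsc_by_model = {
    "model_a": [0.90, 0.80, 0.10, 0.70],
    "model_b": [0.85, 0.75, 0.15, 0.65],
}
sims = pairwise_similarity(dsc_by_model)
```

One caveat worth keeping in mind: Dice scores are non-negative, so cosine similarity between such vectors can never be negative and tends to sit near 1. Mean-centering each vector first (which gives Pearson correlation) is a stricter test of whether models fail on the same cases.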

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Wider clinical use of the released model could help radiologists flag smaller clots that are currently missed, shortening time to treatment.
  • Pairing the model with uncertainty maps might highlight regions where distal emboli are likely to be overlooked and trigger extra review.
  • Testing the same architectures on non-contrast CT or MRI would show whether the 3D advantage generalizes beyond CTPA.

Load-bearing premise

The 490-scan dataset supplies dense, consistent, and representative annotations that capture the clinical spectrum of pulmonary embolisms, including enough distal cases for unbiased comparisons on the 60-scan test set.

What would settle it

Re-running the nine models on a fresh external CTPA collection that contains a markedly higher share of distal emboli and observing either a reversal in model ranking or a large drop in the 3D U-Net's Dice score.
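
A ranking reversal can be quantified rather than eyeballed. The sketch below computes Kendall's tau (the tau-a variant, which ignores ties) between model scores on two datasets: 1 means identical ordering, -1 a full reversal. The model names and scores are invented for illustration and do not come from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two {model: score} dicts over their shared models."""
    models = sorted(set(scores_a) & set(scores_b))
    if len(models) < 2:
        raise ValueError("need at least two shared models to compare rankings")
    concordant = discordant = 0
    for m, n in combinations(models, 2):
        product = (scores_a[m] - scores_a[n]) * (scores_b[m] - scores_b[n])
        if product > 0:
            concordant += 1   # both datasets order this pair the same way
        elif product < 0:
            discordant += 1   # the two datasets disagree on this pair
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical mean Dice per model on an internal and an external test set.
internal = {"model_a": 0.70, "model_b": 0.65, "model_c": 0.60}
external = {"model_a": 0.50, "model_b": 0.55, "model_c": 0.40}
tau = kendall_tau(internal, external)
```

With only nine models there are 36 pairs, so a single tau value is a coarse summary; it says whether the ordering held, not whether the differences were meaningful.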

Figures

Figures reproduced from arXiv: 2509.18308 by Kevin Kramer, Lawrence Ngo, Maciej A. Mazurowski, Ryan Chamberlain, Yixin Zhang.

Figure 1: Examples of segmentation results by nnUNet3D-ResNetXL. The first row shows successful segmentation of PEs. The second ...
Figure 2: Vector of DSC scores for each test instance, predicted ...
Figure 3: Upper Triangular: cosine similarity between DSC vec...
Figure 4: Letter-value plots showing the distribution of embolus ...
Figure 5: Letter-value plots showing the distribution of embolus ...
Original abstract

Pulmonary Embolism (PE) is a life-threatening condition for which accurate and timely detection is critical to patient care. However, our systematic study of PE segmentation algorithms reveals concerning limitations in the current state of research. Challenges such as small and inconsistent datasets, a lack of reproducible baselines, and limited comparative evaluation across models are hindering progress in the field. In this study, we curated a densely annotated dataset comprising 490 CTPA scans, each from a unique patient (430 for training and 60 for testing). We evaluated nine widely used segmentation architectures, including both CNN- and ViT-based models, in 2D and 3D configurations, using mean Dice Similarity Coefficient (mDSC) and Average Symmetric Surface Distance (ASSD) as evaluation metrics. Furthermore, the highest-performing model was evaluated on a public dataset without fine-tuning and achieved reasonable generalization performance. Our results show that: (1) a 3D U-Net with ResNet encoding blocks remains a highly effective architecture for PE segmentation; (2) 3D models consistently outperform their 2D counterparts; (3) across all architectures, when trained and evaluated on the same datasets, model error patterns are highly consistent; and (4) distal emboli remain particularly challenging due to both task complexity and the scarcity of high-quality datasets, highlighting the need for datasets with more comprehensive and consistent distal PE coverage. To promote research reproducibility, the architecture and pretrained weights of our best-performing model are publicly available at https://github.com/mazurowski-lab/PulmonaryEmbolismSegmentation

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports a systematic benchmarking of nine 2D and 3D segmentation architectures (CNN- and ViT-based) for pulmonary embolism on a curated dataset of 490 CTPA scans (430 train, 60 test). It concludes that a 3D U-Net with ResNet encoding blocks is highly effective, 3D models outperform 2D counterparts, error patterns are consistent across architectures, and distal emboli remain challenging, while releasing the best model weights and architecture for reproducibility.

Significance. If the annotations prove reliable, this study supplies a much-needed reproducible baseline and open weights for a field hampered by small, inconsistent datasets. The finding of architecture-independent error patterns and the emphasis on distal PE coverage could usefully direct future data collection and model development.

major comments (2)
  1. [§3.1] §3.1 (Dataset curation): The manuscript states that a 'densely annotated dataset' of 490 scans was created but provides no inter-annotator agreement statistics, no explicit annotation protocol, and no breakdown of proximal versus distal emboli counts in the 60-scan test set. These omissions are load-bearing for claims (3) and (4) in the abstract, because noisy or imbalanced labels could artifactually produce the reported consistent error patterns and distal difficulty.
  2. [§4.3] §4.3 (Cross-dataset evaluation): The generalization test of the best model on a public dataset is reported only qualitatively ('reasonable generalization performance') without naming the dataset, reporting exact mDSC/ASSD values, or describing any preprocessing or resolution differences. This prevents assessment of whether the result supports the broader claim of model robustness.
minor comments (2)
  1. [Results tables] Table 2 (or equivalent results table): clarify whether the reported mDSC and ASSD values are means over the 60 test cases or include per-case standard deviations; the latter would better support the 'highly consistent' error-pattern claim.
  2. [Figure 1] Figure 1 (qualitative results): the caption should explicitly state the slice thickness and windowing used for visualization so readers can judge whether distal emboli visibility is limited by imaging rather than model failure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important areas for improving transparency in dataset curation and evaluation reporting. We address each major comment below and will make the corresponding revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.1] §3.1 (Dataset curation): The manuscript states that a 'densely annotated dataset' of 490 scans was created but provides no inter-annotator agreement statistics, no explicit annotation protocol, and no breakdown of proximal versus distal emboli counts in the 60-scan test set. These omissions are load-bearing for claims (3) and (4) in the abstract, because noisy or imbalanced labels could artifactually produce the reported consistent error patterns and distal difficulty.

    Authors: We agree that these details are essential for validating annotation quality and supporting claims (3) and (4). In the revised manuscript, we will expand §3.1 to include a full description of the annotation protocol (including the number of annotators, their expertise, and the process for resolving disagreements), inter-annotator agreement statistics computed on an overlap subset, and a table or text breakdown of proximal versus distal emboli counts in the 60-scan test set. These additions will help confirm that the consistent error patterns reflect genuine task difficulty rather than label artifacts. revision: yes

  2. Referee: [§4.3] §4.3 (Cross-dataset evaluation): The generalization test of the best model on a public dataset is reported only qualitatively ('reasonable generalization performance') without naming the dataset, reporting exact mDSC/ASSD values, or describing any preprocessing or resolution differences. This prevents assessment of whether the result supports the broader claim of model robustness.

    Authors: We acknowledge the lack of specificity in the current reporting. We will revise the relevant section to name the public dataset, report the exact quantitative mDSC and ASSD values obtained without fine-tuning, and detail the preprocessing steps along with any differences in resolution, slice thickness, or acquisition parameters relative to our internal dataset. This will enable a clearer evaluation of generalization performance. revision: yes
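
The inter-annotator agreement statistic promised in the first response is not specified. One common choice is Cohen's kappa computed on voxel labels from two annotators, sketched below. This is an illustrative assumption, not the authors' protocol: the function name, the binary 0/1 label format, and the toy labels are all hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' binary (0/1) labels on the same voxels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and equally long")
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    pos_a = sum(1 for a in labels_a if a) / n
    pos_b = sum(1 for b in labels_b if b) / n
    # Agreement expected by chance, given each annotator's foreground rate.
    expected = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    if expected == 1.0:
        # Assumed convention: both annotators used a single identical label.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels for eight voxels (made up for illustration).
annotator_1 = [1, 1, 0, 0, 0, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Because emboli occupy a tiny fraction of a CTPA volume, raw voxel agreement is dominated by background; kappa corrects for that, and the Dice overlap between the two annotators' masks is a frequently used alternative.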

Circularity Check

0 steps flagged

No circularity: direct empirical benchmarking with independent results

full rationale

The paper is a straightforward empirical study that curates a 490-scan dataset, trains nine segmentation architectures (2D/3D CNN and ViT variants), and reports mDSC/ASSD metrics on a held-out 60-scan test set plus one external dataset. All central claims (the 3D U-Net with ResNet blocks being effective, 3D outperforming 2D, and consistent error patterns) are direct observations from these experiments rather than derivations, fitted predictions, or self-citation chains. No equations, uniqueness theorems, or ansatzes are invoked; the results stand on the reported training/testing protocol and public model weights. The evaluation is therefore self-contained: results are measured on held-out and external data, and no reported output reduces to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The study rests on standard medical imaging assumptions and common deep-learning practices without introducing new free parameters, axioms beyond domain conventions, or invented entities.

axioms (2)
  • domain assumption: CTPA scans with expert annotations constitute a reliable ground truth for PE segmentation evaluation.
    Invoked when creating the 490-scan dataset and reporting mDSC/ASSD scores.
  • domain assumption: Mean Dice and average symmetric surface distance are sufficient and appropriate metrics for comparing segmentation quality in this task.
    Used as primary evaluation criteria across all models.
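
For readers less familiar with the second metric, the following is a brute-force sketch of Average Symmetric Surface Distance on small voxel sets. It assumes masks are given as collections of (z, y, x) integer coordinates, defines the surface through 6-connectivity, and pools distances from both directions into one average. These are illustrative choices and may differ from the paper's implementation; production code would use distance transforms instead of the quadratic search shown here.

```python
import math

# Offsets of the six face-adjacent neighbors of a voxel.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surface(voxels):
    """Foreground voxels with at least one face neighbor outside the mask."""
    voxels = set(voxels)
    return {
        v for v in voxels
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in voxels
               for dz, dy, dx in NEIGHBORS)
    }


def assd(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """Average Symmetric Surface Distance in the units of `spacing` (z, y, x)."""
    surf_pred, surf_truth = surface(pred), surface(truth)
    if not surf_pred or not surf_truth:
        raise ValueError("ASSD is undefined when either mask is empty")

    def dist(a, b):
        return math.sqrt(sum(((p - q) * s) ** 2 for p, q, s in zip(a, b, spacing)))

    # Nearest-surface distances in both directions, pooled into one mean.
    pred_to_truth = [min(dist(a, b) for b in surf_truth) for a in surf_pred]
    truth_to_pred = [min(dist(b, a) for a in surf_pred) for b in surf_truth]
    return (sum(pred_to_truth) + sum(truth_to_pred)) / (
        len(pred_to_truth) + len(truth_to_pred)
    )


# Toy masks: the prediction is the reference shifted by one voxel along x.
truth_mask = {(0, 0, 0), (0, 0, 1)}
pred_mask = {(0, 0, 1), (0, 0, 2)}
value = assd(pred_mask, truth_mask)
```

Unlike Dice, ASSD is undefined when either mask is empty, so how such cases are handled (skipped, penalized, or reported separately) affects the reported mean and is worth stating explicitly.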

pith-pipeline@v0.9.0 · 5829 in / 1445 out tokens · 64766 ms · 2026-05-18T14:05:34.118430+00:00 · methodology

