arxiv: 1907.08136 · v1 · pith:JSP2DHIWnew · submitted 2019-07-16 · 💻 cs.CV · cs.RO

Autonomous Driving in the Lung using Deep Learning for Localization

Pith reviewed 2026-05-24 20:43 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords bronchoscope localization · deep learning · autonomous navigation · lung biopsy · CT registration · AirwayNet · phantom lung · cadaver evaluation

The pith

A deep learning model trained only on CT-rendered images localizes the bronchoscope in real video and lets a robot reach its targets autonomously in 95 percent of trials in a lung phantom.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops AirwayNet to estimate bronchoscope pose from video input and register it to a preoperative CT map. The network is trained exclusively on images rendered from the patient CT scan. On phantom lung videos it reaches an area under the precision-recall curve of 0.97 and on eight cadaver lungs the range is 0.82 to 0.997. These localization results let a robot drive the scope to four targets in the left and right lungs in 95 percent of trials. A reader would care because current biopsy sampling misses the target tissue in 26 to 33 percent of cases largely due to poor intraoperative registration.

Core claim

AirwayNet and BifurcationNet are deep learning models that localize the bronchoscope in the preoperative CT map from bronchoscopic video. Both networks are trained entirely on simulated images derived from the patient-specific CT. AirwayNet outperforms other deep learning localization algorithms on phantom data with an area under the precision-recall curve of 0.97. The same model achieves areas under the precision-recall curve from 0.82 to 0.997 on recorded videos from eight human cadaver lungs. Using AirwayNet for real-time feedback, a robot reaches four targets in the left and right lungs in 95 percent of phantom trials.
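The headline metric here is the area under the precision-recall curve. As a rough illustration of what that number measures, the sketch below computes it with step-wise summation (often called average precision) over hypothetical per-frame scores and labels. The function names and the toy data are assumptions made for this example; the paper's exact estimator and its labeling scheme are not stated on this page.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Sweep the decision threshold from high to low.

    Returns (recall, precision) pairs, one per distinct score value.
    `labels` holds 1 for a true positive case and 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = []
    for rank, i in enumerate(order):
        if labels[i]:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all tied scores have been counted.
        last = rank + 1 == len(order)
        if last or scores[order[rank + 1]] != scores[i]:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the PR curve as a step-wise sum of precision * recall gain."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in precision_recall_points(scores, labels):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


if __name__ == "__main__":
    # Hypothetical detector scores and ground-truth labels, not paper data.
    toy_scores = [0.9, 0.8, 0.7, 0.6]
    toy_labels = [1, 0, 1, 0]
    print(round(average_precision(toy_scores, toy_labels), 4))
```

A value of 1.0 means every positive case is ranked above every negative one, so a reported 0.97 indicates a ranking that is close to, but not exactly, perfectly separated.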

What carries the argument

AirwayNet, a convolutional network that predicts bronchoscope pose from single video frames for registration against the CT map.

If this is right

  • Autonomous navigation to multiple targets in both lungs is possible using only video feedback.
  • No domain adaptation or fine-tuning on real bronchoscopic data is required for the reported accuracy.
  • The localization accuracy demonstrated on cadaver data supports potential clinical use.
  • The method outperforms prior deep learning approaches for bronchoscope localization on the phantom benchmark.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same training strategy could be tested on other endoscopic procedures that rely on preoperative imaging.
  • Successful real-time localization might reduce the manual skill currently needed to reach peripheral lung lesions.
  • Combining the localization output with robotic control could enable closed-loop biopsy sampling without continuous human input.
  • Direct comparison of navigation success rates with and without the network on the same phantom setup would quantify the gain from video-based feedback.

Load-bearing premise

Images rendered from the preoperative CT scan are similar enough in appearance and geometry to real bronchoscopic video that a network trained only on the rendered images produces accurate pose estimates on live footage.

What would settle it

Measuring the area under the precision-recall curve for AirwayNet on live patient bronchoscopy videos and checking whether it falls substantially below the cadaver range of 0.82 to 0.997.

Figures

Figures reproduced from arXiv: 1907.08136 by Chauncey Graetzel, David B. Camarillo, David Eng, Jake Sganga.

Figure 1: In A, a path through a preoperative lung CT is shown toward the site of …
Figure 2: The inputs and outputs of AirwayNet are shown. A camera image from the bronchoscope’s true position, …
Figure 3: In A, the experimental setup for the phantom lung tests is shown. A …
Figure 4: In the top row, example I_cam images taken within the phantom lung by the bronchoscope are shown. In the bottom row, the I_sim images are rendered at the same locations as the example I_cam images. Each image is grayscaled and normalized to zero mean and unit standard deviation before training and evaluation.
Figure 5: In the top row, example I_cam images taken within one of the eight human cadaver lungs by the bronchoscope are shown. In the bottom row, the I_rsim images are rendered at the same locations as the example I_cam images. The I_rsim images are shown with patch and stripe randomization. These images are shown in color to highlight the effect, however they would be grayscaled and normalized before being used in …
Figure 6: The driving control loop is shown, where a trained AirwayNet feeds …
Figure 7: The loop operates at 48 Hz for this experiment.
Figure 7: AirwayNet, BifurcationNet and OffsetNet, trained on …
Figure 8: Closed-loop control of the robotic bronchoscope is shown using AirwayNet, trained on …
Figure 9: Repeated closed-loop control tests of the robotic bronchoscope are …
Figure 10: AirwayNet, trained on I_rsim, is shown in 13 independent tracking tasks in eight cadaver lungs. Left, the F1 score in classifying airways is shown in color for each airway the bronchoscope saw. The number of frames with each airway visible is represented in the line thickness. Right, the precision-recall curve is shown averaged across all airways. Below, the tracking analysis shows the error in position, …
Figure 11: The control loop for BifurcationNet is shown. A camera image from the bronchoscope’s true position, …
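The Figure 4 caption states that each image is grayscaled and normalized to zero mean and unit standard deviation before training and evaluation. A minimal sketch of that preprocessing step follows, written with the standard library only so it runs anywhere. The equal-weight channel average, the nested-list image layout, and both function names are assumptions for illustration; the page does not say how the authors weight the color channels, and real code would normally use an array library.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def to_grayscale(rgb_image: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Collapse RGB pixels to one channel.

    Assumption: a plain average of the three channels. The source does not
    specify the channel weights.
    """
    return [[sum(px) / 3.0 for px in row] for row in rgb_image]


def standardize(gray_image: Sequence[Sequence[float]], eps: float = 1e-8) -> List[List[float]]:
    """Shift and scale an image to zero mean and unit standard deviation."""
    pixels = [value for row in gray_image for value in row]
    mean = fmean(pixels)
    std = pstdev(pixels)  # population standard deviation over all pixels
    # A constant image has zero spread; leave it centered rather than divide by ~0.
    scale = std if std > eps else 1.0
    return [[(value - mean) / scale for value in row] for row in gray_image]


if __name__ == "__main__":
    # Hypothetical 2x2 checkerboard frame, not an image from the paper.
    frame = [
        [(0, 0, 0), (255, 255, 255)],
        [(255, 255, 255), (0, 0, 0)],
    ]
    print(standardize(to_grayscale(frame)))
```

Applying the same normalization to rendered and camera images removes global brightness and contrast differences between the two domains, which is one plausible reason for including it in a simulation-only training pipeline.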
Original abstract

Lung cancer is the leading cause of cancer-related death worldwide, and early diagnosis is critical to improving patient outcomes. To diagnose cancer, a highly trained pulmonologist must navigate a flexible bronchoscope deep into the branched structure of the lung for biopsy. The biopsy fails to sample the target tissue in 26-33% of cases largely because of poor registration with the preoperative CT map. To improve intraoperative registration, we develop two deep learning approaches to localize the bronchoscope in the preoperative CT map based on the bronchoscopic video in real-time, called AirwayNet and BifurcationNet. The networks are trained entirely on simulated images derived from the patient-specific CT. When evaluated on recorded bronchoscopy videos in a phantom lung, AirwayNet outperforms other deep learning localization algorithms with an area under the precision-recall curve of 0.97. Using AirwayNet, we demonstrate autonomous driving in the phantom lung based on video feedback alone. The robot reaches four targets in the left and right lungs in 95% of the trials. On recorded videos in eight human cadaver lungs, AirwayNet achieves areas under the precision-recall curve ranging from 0.82 to 0.997.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces AirwayNet and BifurcationNet, deep learning models trained exclusively on images rendered from patient-specific preoperative CT scans to localize a bronchoscope in real time within the CT map. It reports an AUC-PR of 0.97 for AirwayNet on phantom bronchoscopy videos, a 95% success rate for closed-loop autonomous navigation to four targets in phantom lungs, and AUC-PR values of 0.82–0.997 on recorded videos from eight human cadaver lungs, all without domain adaptation or fine-tuning on real data.

Significance. If the zero-shot sim-to-real transfer holds, the work could meaningfully reduce the 26–33% biopsy failure rate in lung cancer diagnosis by enabling reliable video-based registration and autonomous navigation. The patient-specific CT rendering approach and the closed-loop phantom demonstration are concrete strengths that avoid reliance on large annotated real datasets.

major comments (2)
  1. [Abstract] Abstract: The central claim of accurate localization and 95% autonomous target-reaching on phantom video plus AUC-PR 0.82–0.997 on cadaver video rests on the untested assumption that CT-rendered images match real bronchoscopic appearance and geometry sufficiently for zero-shot transfer; no quantitative measure of simulation fidelity, ablation on lighting/specularity/deformation shifts, or domain-gap experiments is supplied, directly undermining verifiability of the reported metrics.
  2. [Abstract] Abstract and methods description: No network architecture, training hyperparameters, data-split protocol, loss functions, or statistical testing (e.g., confidence intervals on AUC-PR or success rates) are provided, so the quantitative support for outperforming other localization algorithms and the 95% success claim cannot be assessed or reproduced.
minor comments (1)
  1. [Abstract] The abstract states performance on 'recorded videos in eight human cadaver lungs' but does not clarify whether these are the same eight lungs used for any training or validation splits.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below, proposing revisions where they strengthen the manuscript without misrepresenting our results.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim of accurate localization and 95% autonomous target-reaching on phantom video plus AUC-PR 0.82–0.997 on cadaver video rests on the untested assumption that CT-rendered images match real bronchoscopic appearance and geometry sufficiently for zero-shot transfer; no quantitative measure of simulation fidelity, ablation on lighting/specularity/deformation shifts, or domain-gap experiments is supplied, directly undermining verifiability of the reported metrics.

    Authors: The high AUC-PR values and 95% success rate achieved on real phantom and cadaver videos without any fine-tuning or domain adaptation constitute empirical validation of the sim-to-real transfer. We acknowledge, however, that the manuscript would benefit from a more explicit treatment of simulation fidelity. In revision we will add a discussion subsection that includes qualitative side-by-side comparisons of rendered and real images together with an analysis of the principal domain-gap factors (lighting, specularity, minor deformations) and their observed impact on localization performance. revision: yes

  2. Referee: [Abstract] Abstract and methods description: No network architecture, training hyperparameters, data-split protocol, loss functions, or statistical testing (e.g., confidence intervals on AUC-PR or success rates) are provided, so the quantitative support for outperforming other localization algorithms and the 95% success claim cannot be assessed or reproduced.

    Authors: The methods section already outlines the architectures of AirwayNet and BifurcationNet as well as the overall training procedure. To improve reproducibility and directly address the referee’s concern, we will expand this section with the precise network configurations, hyperparameter values, data-split protocol, loss functions, and will report confidence intervals for all AUC-PR and success-rate figures. revision: yes
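The confidence intervals requested in the second major comment can be illustrated for the success-rate figure. The sketch below computes a Wilson score interval for a binomial proportion. The trial count is an assumption: this page gives only the 95 percent rate, so 19 successes out of 20 trials is a hypothetical split chosen because it matches that rate, not a number taken from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: 19 of 20 trials succeed (95%). Not paper data.
    low, high = wilson_interval(successes=19, trials=20)
    print(f"success rate 0.95, interval [{low:.3f}, {high:.3f}]")
```

With a small number of trials the interval is wide, which is why the referee's request matters: the same 95 percent rate supports a much stronger claim at 200 trials than at 20.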

Circularity Check

0 steps flagged

No significant circularity; results measured on independent real data

Full rationale

The paper trains AirwayNet and BifurcationNet exclusively on simulated images rendered from preoperative CT scans. Performance is then reported via AUC-PR on recorded phantom videos and eight cadaver lung videos never seen during training, plus closed-loop autonomous driving success rate (95%) on the phantom. No equations, fitted parameters, or self-citations are shown that reduce the reported metrics back to quantities defined or optimized on the same test data. The central claim therefore rests on empirical sim-to-real transfer rather than any definitional or self-referential reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central performance claims rest on the transferability of simulation-trained models to real video and on the assumption that phantom and cadaver results predict live-patient behavior.

free parameters (1)
  • neural network weights
    Millions of parameters in AirwayNet and BifurcationNet fitted by supervised training on simulated images.
axioms (1)
  • Domain assumption: rendered CT images match real bronchoscopic video appearance and geometry closely enough for direct transfer
    Training is stated to be entirely on simulated images derived from patient-specific CT.

pith-pipeline@v0.9.0 · 5751 in / 1285 out tokens · 25446 ms · 2026-05-24T20:43:28.590549+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BronchoLumen: Analysis of recent YOLO-based architectures for real-time bronchial orifice detection in video bronchoscopy

    cs.CV · 2026-05 · unverdicted · novelty 4.0

    BronchoLumen applies YOLOv8 and YOLOv12 to bronchial orifice detection, achieving mAP@0.5 of 0.91 and 0.84 on in-domain tests and 0.68 on cross-domain tests with open-weight models released.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · cited by 1 Pith paper

  1. C. S. D. Cruz, L. T. Tanoue, and R. A. Matthay, “Lung cancer: epidemiology, etiology, and prevention,” Clinics in chest medicine, vol. 32, no. 4, pp. 605–644, 2011.
  2. D. E. Ost, et al., “Diagnostic yield and complications of bronchoscopy for peripheral lung lesions. Results of the AQuIRE registry,” American journal of respiratory and critical care medicine, vol. 193, no. 1, pp. 68–77, 2016.
  3. D. M. DiBardino, L. B. Yarmus, and R. W. Semaan, “Transthoracic needle biopsy of the lung,” Journal of thoracic disease, vol. 7, no. Suppl 4, p. S304, 2015.
  4. Y. Wu, F. Tang, and H. Li, “Image-based camera localization: an overview,” Visual Computing for Industry, Biomedicine, and Art, vol. 1, no. 1, p. 8, 2018.
  5. H. Rafii-Tari, C. J. Payne, and G.-Z. Yang, “Current and Emerging Robot-Assisted Endovascular Catheterization Technologies: A Review,” Annals of Biomedical Engineering, vol. 42, no. 4, pp. 697–715.
  6. J. Rosell, A. Pérez, P. Cabras, and A. Rosell, “Motion planning for the virtual bronchoscopy,” in 2012 IEEE International Conference on Robotics and Automation. IEEE, 2012, pp. 2932–2937.
  7. P. J. Reynisson, H. O. Leira, T. N. Hernes, E. F. Hofstad, M. Scali, H. Sorger, T. Amundsen, F. Lindseth, and T. Langø, “Navigated bronchoscopy: a technical review,” Journal of bronchology & interventional pulmonology, vol. 21, no. 3, pp. 242–264, 2014.
  8. J. Sganga and D. B. Camarillo, “Orientation estimation of a continuum manipulator in a phantom lung,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 2399–2405.
  9. L. Rai, J. P. Helferty, and W. E. Higgins, “Combined video tracking and image-video registration for continuous bronchoscopic guidance,” International Journal of Computer Assisted Radiology and Surgery, vol. 3, no. 3-4, pp. 315–329, Sep. 2008.
  10. S. A. Merritt, R. Khare, R. Bascom, and W. E. Higgins, “Interactive CT-Video Registration for the Continuous Guidance of Bronchoscopy,” IEEE Transactions on Medical Imaging, vol. 32, no. 8, pp. 1376–1396, Aug. 2013.
  11. P. D. Byrnes and W. E. Higgins, “Construction of a multimodal ct-video chest model,” in Medical Imaging 2014: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 9036. International Society for Optics and Photonics, 2014, p. 903607.
  12. M. Visentini-Scarzanella, T. Sugiura, T. Kaneko, and S. Koto, “Deep monocular 3d reconstruction for assisted navigation in bronchoscopy,” International journal of computer assisted radiology and surgery, vol. 12, no. 7, pp. 1089–1099, 2017.
  13. M. Shen, S. Giannarou, P. L. Shah, and G.-Z. Yang, “Branch: Bifurcation recognition for airway navigation based on structural characteristics,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 182–189.
  14. C. Sánchez, A. Esteban-Lansaque, A. Borrás, M. Diez-Ferrer, A. Rosell, and D. Gil, “Towards a Videobronchoscopy Localization System from Airway Centre Tracking,” in VISIGRAPP (4: VISAPP), 2017, pp. 352–359.
  15. E. F. Hofstad, H. Sorger, H. O. Leira, T. Amundsen, and T. Langø, “Automatic registration of CT images to patient during the initial phase of bronchoscopy: A clinical pilot study,” Medical Physics, vol. 41, no. 4, p. 041903, Mar. 2014.
  16. X. Luo and K. Mori, “A Discriminative Structural Similarity Measure and its Application to Video-Volume Registration for Endoscope Three-Dimensional Motion Tracking,” IEEE Transactions on Medical Imaging, vol. 33, no. 6, pp. 1248–1261, Jun. 2014.
  17. X. Zhou, M. Zhu, S. Leonardos, K. G. Derpanis, and K. Daniilidis, “Sparseness Meets Deepness: 3D Human Pose Estimation From Monocular Video,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016.
  18. OpenAI, et al., “Learning Dexterous In-Hand Manipulation,” 2018.
  19. A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 3354–3361.
  20. J. Sganga, D. Eng, C. Graetzel, and D. B. Camarillo, “Offsetnet: Deep learning for localization in the lung using rendered images,” in 2019 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2019.
  21. J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, Sep. 2017, pp. 23–30.
  22. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  23. M. Abadi, et al., “Tensorflow: A system for large-scale machine learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
  24. A. Mansoor, U. Bagci, B. Foster, Z. Xu, G. Z. Papadakis, L. R. Folio, J. K. Udupa, and D. J. Mollura, “Segmentation and image analysis of abnormal lungs at ct: current approaches, challenges, and future trends,” RadioGraphics, vol. 35, no. 4, pp. 1056–1076, 2015.
  25. G. Fagogenis, M. Mencattelli, Z. Machaidze, B. Rosa, K. Price, F. Wu, V. Weixler, M. Saeed, J. Mayer, and P. Dupont, “Autonomous robotic intracardiac catheter navigation using haptic vision,” Science Robotics, vol. 4, no. 29, p. eaaw1977, 2019.