BronchoLumen: Analysis of recent YOLO-based architectures for real-time bronchial orifice detection in video bronchoscopy

Marian Himstedt; Yongchao Li

arxiv: 2605.11748 · v1 · submitted 2026-05-12 · 💻 cs.CV

BronchoLumen: Analysis of recent YOLO-based architectures for real-time bronchial orifice detection in video bronchoscopy

Yongchao Li , Marian Himstedt This is my paper

Pith reviewed 2026-05-13 05:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords bronchoscopyYOLObronchial orifice detectionobject detectionreal-timecross-domainmedical imagingnavigation assistance

0 comments

The pith

BronchoLumen shows YOLO models can detect bronchial orifices with 0.68 cross-domain accuracy using only public image data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BronchoLumen as a real-time YOLO-based system to locate bronchial orifices in video bronchoscopy, addressing the challenge of navigating the respiratory tract's complex branches. It tests whether state-of-the-art detectors trained solely on limited public datasets can perform robustly across different image domains rather than requiring private clinical data. YOLOv8 reaches mAP@0.5 of 0.91 in-domain and 0.68 cross-domain while YOLOv12 reaches 0.84 and 0.68 with slightly better localization scores. If these results hold, the approach supplies an accessible tool to support navigation assistance and computer-aided diagnosis in pulmonary procedures. The models are released as open weights to enable further development.

Core claim

Bronchial orifices can be detected robustly across image domains by training YOLOv8 and YOLOv12 exclusively on publicly available datasets, yielding mAP@0.5 scores of 0.91 and 0.84 on in-domain tests and 0.68 for both on cross-domain tests, with YOLOv12 showing marginally superior localization accuracy at mAP@0.5:0.9 of 0.48 versus 0.45, and the overall system proving stable in most scenarios despite occasional motion blur or low-contrast cases.

What carries the argument

BronchoLumen, the open-weight YOLO-based object detection pipeline that processes bronchoscopic frames to identify orifice locations for navigation support.

If this is right

The system supplies real-time orifice detection to guide navigation through the respiratory tract's branching anatomy.
It offers a practical component for integration into computer-aided diagnosis tools used in pulmonary clinics and intensive care.
Public release of the trained weights allows other researchers to adapt the approach without starting from scratch.
YOLOv12's modest edge in localization accuracy may favor it in applications where precise boundary placement matters most.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Embedding the detector into live video pipelines could reduce procedure time by highlighting orifices ahead of the scope tip.
Performance gains are likely if future work adds domain adaptation steps that account for video-specific artifacts beyond static images.
The same detection strategy could transfer to other endoscopic procedures that involve tubular branching structures.

Load-bearing premise

Performance measured on public static image datasets will translate directly to real-time clinical bronchoscopy videos that contain variable lighting, motion, and anatomy.

What would settle it

Running the released models on a new collection of uncurated clinical bronchoscopy video sequences and checking whether mAP@0.5 falls below 0.5 or localization errors rise sharply under live lighting and motion conditions.

read the original abstract

Bronchoscopy is routinely conducted in pulmonary clinics and intensive care units, but navigating the complex branching of the respiratory tract remains challenging. This paper introduces BronchoLumen, a real-time YOLO-based system for detecting bronchial orifices in video bronchoscopy, aiming to assist navigation and CAD systems. The paper investigates if bronchial orifices can be robustly detected across image domains using state-of-the-art object detection and a limited set of public image data. The study includes the description and comparison of YOLOv8, a widely adopted architecture, and YOLOv12, a more recent architecture integrating attention-based modules to improve spatial reasoning. Both models are trained and tested solely on publicly available datasets comprising different image domains. A comparison of both models is conducted based on the common metrics mAP@0.5 and mAP@0.5:0.9 with the latter emphasizing localization accuracy. For YOLOv8 we obtained a mAP@0.5 of 0.91 on an in-domain and 0.68 on a cross-domain test set. YOLOv12 achieved 0.84 and 0.68 respectively with slightly better localization accuracy with mAP@0.5:0.9 of 0.48 and 0.26 compared to YOLOv8 with 0.45 and 0.25. Challenges like motion blur and low contrast occasionally entailed uncertainties but the system demonstrated overall robustness in most scenarios. BronchoLumen is an open-weight, YOLO-based solution for bronchial orifice detection offering high accuracy and efficiency across multiple image domains. While the more recent YOLOv12 achieves better localization accuracy, we observed a slightly worse precision. The models have been made publicly available to foster further research in bronchoscopy navigation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces BronchoLumen, an open-weight YOLO-based system comparing YOLOv8 and YOLOv12 for bronchial orifice detection. Both models are trained and evaluated solely on public static image datasets from multiple domains, reporting mAP@0.5 of 0.91 (YOLOv8) / 0.84 (YOLOv12) in-domain and 0.68 cross-domain, along with mAP@0.5:0.9 values, with claims of overall robustness to artifacts like motion blur and low contrast for real-time video bronchoscopy assistance.

Significance. If the performance generalizes beyond the tested static images, the work supplies a reproducible open-weight baseline for orifice detection that could support bronchoscopy navigation and CAD systems, while documenting trade-offs between a standard YOLOv8 and an attention-augmented YOLOv12 in localization accuracy versus precision.

major comments (2)

[Abstract] Abstract: The title and abstract frame the contribution as a 'real-time YOLO-based system for detecting bronchial orifices in video bronchoscopy,' yet the evaluation is performed exclusively on static public image datasets with no reported testing on actual bronchoscopy video sequences, temporal consistency metrics, or clinical conditions (variable lighting, scope motion, anatomy). This mismatch is load-bearing for the central claim of clinical video applicability.
[Results] Results (inferred from reported mAP values and robustness statements): Claims of robustness to 'motion blur and low contrast' and 'overall robustness in most scenarios' rest on static images containing those artifacts rather than on video streams or real-time OR acquisition, leaving the translation to the asserted video bronchoscopy use case untested.

minor comments (2)

[Abstract] Abstract: Training hyperparameters, exact dataset composition (image counts per domain, train/val/test splits), and any error analysis or failure-case breakdown are omitted, which hinders verification of the reported mAP figures.
The manuscript would benefit from explicit reporting of inference speed (FPS on target hardware) to substantiate the 'real-time' and 'efficiency' assertions, even if limited to image-based testing.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments identifying the mismatch between the manuscript framing and the scope of the experiments. We agree that the title, abstract, and robustness claims overstate direct applicability to video bronchoscopy sequences. We will revise the title, abstract, results, and discussion sections to accurately reflect that the evaluation is performed on static public image datasets, while noting the models' computational efficiency supports potential real-time frame-by-frame use. No new video experiments will be added.

read point-by-point responses

Referee: [Abstract] Abstract: The title and abstract frame the contribution as a 'real-time YOLO-based system for detecting bronchial orifices in video bronchoscopy,' yet the evaluation is performed exclusively on static public image datasets with no reported testing on actual bronchoscopy video sequences, temporal consistency metrics, or clinical conditions (variable lighting, scope motion, anatomy). This mismatch is load-bearing for the central claim of clinical video applicability.

Authors: We acknowledge that the current title and abstract emphasize video bronchoscopy applicability, whereas all training and testing used static images from public datasets. The 'real-time' descriptor refers to the high inference speed of the YOLO architectures (approximately 100+ FPS on standard hardware), which enables processing of video frames without lag. The datasets include images with motion blur, low contrast, and other bronchoscopy-typical artifacts, allowing assessment of detection under those conditions. We will revise the title to 'BronchoLumen: Analysis of recent YOLO-based architectures for bronchial orifice detection in bronchoscopic images' and update the abstract to clarify the image-based evaluation scope, the potential for video extension, and the absence of temporal or clinical video testing. These changes will be incorporated in the revised manuscript. revision: yes
Referee: [Results] Results (inferred from reported mAP values and robustness statements): Claims of robustness to 'motion blur and low contrast' and 'overall robustness in most scenarios' rest on static images containing those artifacts rather than on video streams or real-time OR acquisition, leaving the translation to the asserted video bronchoscopy use case untested.

Authors: The robustness statements are based on performance on the static test sets, which contain images exhibiting motion blur, low contrast, and related artifacts. We agree this does not equate to evaluation on continuous video streams, temporal consistency, or full clinical acquisition conditions. We will revise the results and discussion sections to explicitly state that robustness was observed in static images containing these artifacts and to frame the video bronchoscopy use case as a prospective application rather than a tested outcome. The mAP metrics and model comparisons will remain unchanged as they accurately reflect the image-based experiments performed. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical training and evaluation on public image datasets

full rationale

The paper reports standard YOLO training on public static image datasets followed by mAP evaluation on held-out splits (in-domain and cross-domain). No equations, fitted parameters renamed as predictions, self-citations forming load-bearing uniqueness claims, or ansatzes are present. All performance numbers derive directly from data splits and standard metrics without reduction to prior inputs by construction. The central claim rests on empirical results rather than any self-referential derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that public bronchoscopy images suffice to train models that generalize to clinical video; no new mathematical axioms or invented physical entities are introduced.

axioms (1)

domain assumption YOLO-style object detectors trained on public images can locate bronchial orifices in video frames.
Invoked when the authors train and evaluate the models on the described datasets.

pith-pipeline@v0.9.0 · 5634 in / 1160 out tokens · 35163 ms · 2026-05-13T05:48:26.857411+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Both models are trained and tested solely on publicly available datasets comprising different image domains... mAP@0.5 of 0.91 on an in-domain and 0.68 on a cross-domain test set
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

YOLOv12-M architecture builds upon the YOLOv8 backbone but introduces attention-based modules

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1]

Sci Rep11(1), 23765 (2021)

Yoo, J.Y., Kang, S.Y., Park, J.S., Cho, Y.-J., Park, S.Y., Yoon, H.I., Park, S.J., Jeong, H.-G., Kim, T.: Deep learning for anatomical interpretation of video bronchoscopy images. Sci Rep11(1), 23765 (2021)

work page 2021
[2]

In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp

Zhao, J., Chen, H., Tian, Q., Chen, J., Yang, B., Zhang, Z., Liu, H.: Bronchocopi- lot: Towards autonomous robotic bronchoscopy via multimodal reinforcement learning. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6923–6930 (2024). IEEE

work page 2024
[3]

International Journal of Computer Assisted Radiology and Surgery19(4), 713–721 (2024)

Keuth, R., Heinrich, M., Eichenlaub, M., Himstedt, M.: Airway label prediction in video bronchoscopy: capturing temporal dependencies utilizing anatomical knowl- edge. International Journal of Computer Assisted Radiology and Surgery19(4), 713–721 (2024)

work page 2024
[4]

Eberhardt, R., Kahn, N., Gompelmann, D., Schumann, M., Heussel, C.P., Herth, F.J.F.: LungPoint—A New Approach to Peripheral Lesions. J. Thorac. Oncol. 5(10), 1559–1563 (2010) https://doi.org/10.1097/JTO.0b013e3181e8b308

work page doi:10.1097/jto.0b013e3181e8b308 2010
[5]

In: MICCAI 2004, pp

Nagao, J., Mori, K., Enjouji, T., Deguchi, D., Kitasaka, T., Suenaga, Y., Hasegawa, J.-i., Toriwaki, J.-i., Takabatake, H., Natori, H.: Fast and accu- rate bronchoscope tracking using image registration and motion prediction. In: MICCAI 2004, pp. 551–558 (2004). Springer

work page 2004
[6]

In: Med Imaging 2023: Image Process, vol

Keuth, R., Heinrich, M., Eichenlaub, M., Himstedt, M.: Weakly supervised airway orifice segmentation in video bronchoscopy. In: Med Imaging 2023: Image Process, vol. 12464. SPIE, San Diego, CA (2023). International Society for Optics and Photonics

work page 2023
[7]

Scientific Data11(1), 321 (2024)

Vu, V.G., Hoang, A.D., Phan, T.P., Nguyen, N.D., Nguyen, T.T., Nguyen, D.N., Dao, N.P., Doan, T.P.L., Nguyen, T.T.H., Trinh, T.H.,et al.: Bm-broncholc- a rich bronchoscopy dataset for anatomical landmarks and lung cancer lesion recognition. Scientific Data11(1), 321 (2024)

work page 2024
[8]

Sganga, J., Eng, D., Graetzel, C., Camarillo, D.B.: Autonomous Driving in the Lung using Deep Learning for Localization (2019) https://doi.org/10.48550/ arxiv.1907.08136

work page arXiv 2019
[9]

In: 2024 IEEE 14th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, pp

Xu, S., Wang, X., Qin, Y., Wang, H., Yu, N., Han, J.: Depth-awareness shared self-supervised bronchial orifice segmentation for center detection in vision-based robotic bronchoscopy. In: 2024 IEEE 14th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, pp. 345–351 (2024) 9

work page 2024
[10]

Medical image analysis73, 102164 (2021)

Banach, A., King, F., Masaki, F., Tsukada, H., Hata, N.: Visually navigated bron- choscopy using three cycle-consistent generative adversarial network for depth estimation. Medical image analysis73, 102164 (2021)

work page 2021
[11]

International Journal of Computer Assisted Radiology and Surgery20(8), 1741–1748 (2025)

Soliman, A., Keuth, R., Himstedt, M.: Bronchogan: anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy. International Journal of Computer Assisted Radiology and Surgery20(8), 1741–1748 (2025)

work page 2025
[12]

IEEE Transactions on Medical Imaging (2024)

Tian, Q., Liao, H., Huang, X., Yang, B., Wu, J., Chen, J., Li, L., Liu, H.: Bronchotrack: Airway lumen tracking for branch-level bronchoscopic localization. IEEE Transactions on Medical Imaging (2024)

work page 2024
[13]

Nature communications15(1), 241 (2024)

Zhang, J., Liu, L., Xiang, P., Fang, Q., Nie, X., Ma, H., Hu, J., Xiong, R., Wang, Y., Lu, H.: Ai co-pilot bronchoscope robot. Nature communications15(1), 241 (2024)

work page 2024
[14]

CVPR (2016)

Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. CVPR (2016)

work page 2016
[15]

Computers in biology and medicine141, 105031 (2022)

Pacal, I., Karaman, A., Karaboga, D., Akay, B., Basturk, A., Nalbantoglu, U., Coskun, S.: An efficient real-time colonic polyp detection with yolo algorithms trained by using negative samples and large datasets. Computers in biology and medicine141, 105031 (2022)

work page 2022
[16]

In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp

Deng, J., Li, P., Dhaliwal, K., Lu, C.X., Khadem, M.: Feature-based visual odome- try for bronchoscopy: A dataset and benchmark. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6557–6564 (2023)

work page 2023
[17]

Accessed: 2025-05-22 (2023)

Jocher, G., Ultralytics: YOLOv8: Open-Source Object Detection Framework. Accessed: 2025-05-22 (2023). https://github.com/ultralytics/ultralytics 10

work page 2025

[1] [1]

Sci Rep11(1), 23765 (2021)

Yoo, J.Y., Kang, S.Y., Park, J.S., Cho, Y.-J., Park, S.Y., Yoon, H.I., Park, S.J., Jeong, H.-G., Kim, T.: Deep learning for anatomical interpretation of video bronchoscopy images. Sci Rep11(1), 23765 (2021)

work page 2021

[2] [2]

In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp

Zhao, J., Chen, H., Tian, Q., Chen, J., Yang, B., Zhang, Z., Liu, H.: Bronchocopi- lot: Towards autonomous robotic bronchoscopy via multimodal reinforcement learning. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6923–6930 (2024). IEEE

work page 2024

[3] [3]

International Journal of Computer Assisted Radiology and Surgery19(4), 713–721 (2024)

Keuth, R., Heinrich, M., Eichenlaub, M., Himstedt, M.: Airway label prediction in video bronchoscopy: capturing temporal dependencies utilizing anatomical knowl- edge. International Journal of Computer Assisted Radiology and Surgery19(4), 713–721 (2024)

work page 2024

[4] [4]

Eberhardt, R., Kahn, N., Gompelmann, D., Schumann, M., Heussel, C.P., Herth, F.J.F.: LungPoint—A New Approach to Peripheral Lesions. J. Thorac. Oncol. 5(10), 1559–1563 (2010) https://doi.org/10.1097/JTO.0b013e3181e8b308

work page doi:10.1097/jto.0b013e3181e8b308 2010

[5] [5]

In: MICCAI 2004, pp

Nagao, J., Mori, K., Enjouji, T., Deguchi, D., Kitasaka, T., Suenaga, Y., Hasegawa, J.-i., Toriwaki, J.-i., Takabatake, H., Natori, H.: Fast and accu- rate bronchoscope tracking using image registration and motion prediction. In: MICCAI 2004, pp. 551–558 (2004). Springer

work page 2004

[6] [6]

In: Med Imaging 2023: Image Process, vol

Keuth, R., Heinrich, M., Eichenlaub, M., Himstedt, M.: Weakly supervised airway orifice segmentation in video bronchoscopy. In: Med Imaging 2023: Image Process, vol. 12464. SPIE, San Diego, CA (2023). International Society for Optics and Photonics

work page 2023

[7] [7]

Scientific Data11(1), 321 (2024)

Vu, V.G., Hoang, A.D., Phan, T.P., Nguyen, N.D., Nguyen, T.T., Nguyen, D.N., Dao, N.P., Doan, T.P.L., Nguyen, T.T.H., Trinh, T.H.,et al.: Bm-broncholc- a rich bronchoscopy dataset for anatomical landmarks and lung cancer lesion recognition. Scientific Data11(1), 321 (2024)

work page 2024

[8] [8]

Sganga, J., Eng, D., Graetzel, C., Camarillo, D.B.: Autonomous Driving in the Lung using Deep Learning for Localization (2019) https://doi.org/10.48550/ arxiv.1907.08136

work page arXiv 2019

[9] [9]

In: 2024 IEEE 14th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, pp

Xu, S., Wang, X., Qin, Y., Wang, H., Yu, N., Han, J.: Depth-awareness shared self-supervised bronchial orifice segmentation for center detection in vision-based robotic bronchoscopy. In: 2024 IEEE 14th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, pp. 345–351 (2024) 9

work page 2024

[10] [10]

Medical image analysis73, 102164 (2021)

Banach, A., King, F., Masaki, F., Tsukada, H., Hata, N.: Visually navigated bron- choscopy using three cycle-consistent generative adversarial network for depth estimation. Medical image analysis73, 102164 (2021)

work page 2021

[11] [11]

International Journal of Computer Assisted Radiology and Surgery20(8), 1741–1748 (2025)

Soliman, A., Keuth, R., Himstedt, M.: Bronchogan: anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy. International Journal of Computer Assisted Radiology and Surgery20(8), 1741–1748 (2025)

work page 2025

[12] [12]

IEEE Transactions on Medical Imaging (2024)

Tian, Q., Liao, H., Huang, X., Yang, B., Wu, J., Chen, J., Li, L., Liu, H.: Bronchotrack: Airway lumen tracking for branch-level bronchoscopic localization. IEEE Transactions on Medical Imaging (2024)

work page 2024

[13] [13]

Nature communications15(1), 241 (2024)

Zhang, J., Liu, L., Xiang, P., Fang, Q., Nie, X., Ma, H., Hu, J., Xiong, R., Wang, Y., Lu, H.: Ai co-pilot bronchoscope robot. Nature communications15(1), 241 (2024)

work page 2024

[14] [14]

CVPR (2016)

Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. CVPR (2016)

work page 2016

[15] [15]

Computers in biology and medicine141, 105031 (2022)

Pacal, I., Karaman, A., Karaboga, D., Akay, B., Basturk, A., Nalbantoglu, U., Coskun, S.: An efficient real-time colonic polyp detection with yolo algorithms trained by using negative samples and large datasets. Computers in biology and medicine141, 105031 (2022)

work page 2022

[16] [16]

In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp

Deng, J., Li, P., Dhaliwal, K., Lu, C.X., Khadem, M.: Feature-based visual odome- try for bronchoscopy: A dataset and benchmark. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6557–6564 (2023)

work page 2023

[17] [17]

Accessed: 2025-05-22 (2023)

Jocher, G., Ultralytics: YOLOv8: Open-Source Object Detection Framework. Accessed: 2025-05-22 (2023). https://github.com/ultralytics/ultralytics 10

work page 2025