EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images
Depth estimation is a foundational component for 3D reconstruction in minimally invasive endoscopic surgeries. However, existing monocular depth estimation techniques often perform poorly under the varying illumination and complex textures of the surgical environment. While applying foundation models offers a promising way to improve depth estimation, the domain gap between the natural images used for pre-training and the target endoscopic images leads to significant semantic perception deficiencies. In this study, EndoUFM is introduced as an unsupervised monocular depth estimation framework that Utilizes dual Foundation Models for Endoscopic images, improving depth estimation by leveraging their powerful pre-learned priors. The framework features a novel adaptive fine-tuning strategy that incorporates Random Vector Low-Rank Adaptation (RVLoRA) to enhance model adaptability, and a Residual block based on Depthwise Separable Convolution (Res-DSC) to improve the capture of fine-grained local features. A mask-guided smoothness loss is also introduced to enforce depth consistency within anatomical structures. Extensive experiments on the SCARED, Hamlyn, SERV-CT, and EndoNeRF datasets confirm that our method achieves state-of-the-art performance while maintaining an efficient model size. This work contributes to augmenting surgeons' spatial perception during minimally invasive procedures, thereby enhancing surgical precision and safety, with crucial implications for augmented reality and navigation systems. Our code is available at https://github.com/RealMindyY/EndoUFM.
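The abstract ties the "efficient model size" claim to two generic building blocks, low-rank adaptation and depthwise separable convolution. The sketch below only counts weights for the textbook versions of those blocks, to give a feel for why they are cheap. It is not taken from the EndoUFM code: the function names and the layer sizes (64 channels, a 768-wide projection, rank 4) are illustrative assumptions, and the RVLoRA variant described in the paper may count differently.

```python
# Illustrative weight counts only; sizes and names are assumptions,
# not values from the EndoUFM paper or repository. Biases are omitted.

def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """One k x k filter per input channel, followed by a 1 x 1 pointwise mix."""
    return k * k * c_in + c_in * c_out


def lora_params(d_in, d_out, rank):
    """Trainable weights in a plain LoRA update B @ A (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


standard = conv_params(64, 64, 3)                    # 36864
separable = depthwise_separable_params(64, 64, 3)    # 4672
full_linear = 768 * 768                              # 589824
low_rank = lora_params(768, 768, 4)                  # 6144

print(f"3x3 conv, 64->64: {standard} vs depthwise separable: {separable}")
print(f"768x768 linear: {full_linear} vs rank-4 LoRA update: {low_rank}")
```

Under these assumed sizes, the separable block uses roughly an eighth of the weights of the standard convolution, and the rank-4 update trains about 1% of the weights of the full projection while the pre-trained matrix stays frozen.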