pith. sign in

arxiv: 2606.07891 · v1 · pith:Y4FBQZXPnew · submitted 2026-06-05 · 💻 cs.CV

C3VD-DEFCOL: A Deformable Colonoscopy Dataset with Time-Resolved 3D Ground Truth and Realistic Appearance

Pith reviewed 2026-06-27 21:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords colonoscopydeformable reconstruction3D datasetsim-to-realperistalsispose estimationmedical imagingground truth
0
0 comments X

The pith

C3VD-DEFCOL supplies 110 colonoscopy videos with realistic appearance and time-resolved 3D ground truth across controlled deformation levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a generation framework that produces paired realistic RGB video and dense 3D geometry for colon scenes undergoing non-rigid deformation. Current datasets lack either in vivo-like texture or precise time-resolved ground truth during motion such as peristalsis, which limits development of reconstruction methods for estimating mucosal coverage. The method starts from existing colon meshes and trajectories, applies parameterized deformations including peristaltic waves at three severity levels, renders per-frame depth and other geometric data, and translates the results to realistic appearance while conditioning on depth to retain structural fidelity. Experiments then use the paired data to benchmark pose estimation and show error growth with deformation severity. The outcome is a reproducible quantitative platform for testing deformable reconstruction algorithms.

Core claim

The paper presents C3VD-DEFCOL as a dataset of 110 videos from 11 unique colon mesh geometries that includes varying camera trajectories, appearances, and parameterized deformation regimes. It supplies per-frame depth, surface normals, optical flow, camera poses, and time-stamped 3D meshes generated through controlled surface and centerline deformations followed by translation to in vivo-like mucosal color, texture, vasculature, and specular highlights. The paired ground truth enables direct measurement of how pose estimation error increases with peristaltic severity, providing evaluation axes unavailable in existing in vivo data.

What carries the argument

Parameterized deformation of colon meshes combined with depth-conditioned translation to realistic RGB that preserves underlying scene structure for valid ground truth.

If this is right

  • Pose estimation methods can be tested for robustness across three explicit levels of peristaltic deformation.
  • The dataset supplies controlled axes for measuring how non-rigid motion affects accuracy of downstream tasks such as coverage estimation.
  • Time-stamped meshes and optical flow allow direct quantitative comparison of reconstruction algorithms on identical deformation sequences.
  • The generation pipeline supports creation of additional videos with new trajectories or appearance variations while retaining geometric labels.
  • Researchers obtain a reproducible testbed that reduces reliance on real patient data for initial algorithm validation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The controlled severity axis could reveal specific failure modes in current reconstruction pipelines that remain hidden when tested only on real data.
  • The same deformation-plus-translation approach might extend to other soft-tissue endoscopic procedures to create comparable ground-truth resources.
  • If geometric fidelity holds under the translation, the dataset could serve as training material for models that must handle both appearance variation and deformation.
  • Future algorithms evaluated on this data may need explicit deformation modeling to avoid the observed error growth with severity.

Load-bearing premise

The translation step produces realistic video while keeping the rendered depth and meshes accurate enough to serve as valid 3D ground truth.

What would settle it

A measurement showing that 3D reconstructions computed from the generated RGB videos deviate substantially from the supplied ground-truth meshes and depth would indicate the translation altered the scene structure.

Figures

Figures reproduced from arXiv: 2606.07891 by Anthony Song, Ethan Luk, Jos\'e M.M. Montiel, Lalithkumar Seenivasan, Mayank V. Golhar, Nicholas J. Durr, Ra\'ul Iranzo, V\'ictor M. Batlle.

Figure 1
Figure 1. Figure 1: C3VD-DEFCOL provides realistic RGB video and dense ground truth for deformable colonoscopy scenes. For each time instance, we show sim-to-real RGB frames, depth, diffuse, optical flow, surface normals, time-stamped deforming meshes, and the associated camera trajectory, to illustrate the dataset’s paired appearance and 3D geometry annotations. [3]. 3D reconstruction of the colon could reveal missed regions… view at source ↗
Figure 2
Figure 2. Figure 2: C3VD-DEFCOL framework: Our framework first applies controlled geometric deforma￾tion to an undeformed colon mesh and renders the resulting deformable sequence to obtain dense ground-truth. The rendered depth sequence is then used to condition an LTX-based sim-to-real video model, together with a real colonoscopy appearance frame and text prompt, to generate realistic RGB videos that remain aligned with the… view at source ↗
Figure 3
Figure 3. Figure 3: Qualitative appearance comparison. C3VD-DEFCOL sim-to-real RGB images exhibit [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Estimated 3D trajectories from COLMAP (orange) compared with the ground truth (blue) [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative comparison of geometric consistency. Edges extracted from depth maps and generated RGB images are overlaid to assess haustral-fold alignment. The strong overlap shows that C3VD-DEFCOL generates RGB images that are highly consistent with the input depth map. with the theoretical constraints of rigid-body SfM methods, where non-rigid surface fluctuations are interpreted as noise or outliers, lead… view at source ↗
read the original abstract

3D reconstruction could improve colonoscopy by estimating mucosal coverage and alerting clinicians to missed regions during screening. However, algorithm development is limited as no current datasets provide both a realistic in vivo appearance and dense, time-resolved 3D ground truth, especially under non-rigid deformation. We present C3VD-DEFCOL, a framework and dataset for evaluating deformable colonoscopy reconstruction with paired geometry and realistic texture. Starting from C3VD/C3VDv2 colon meshes and camera trajectories, we generate controlled deformations of the colon surface, including peristaltic waves and centerline motion, and render per-frame depth, surface normals, optical flow, camera poses, and time-stamped 3D meshes. We then use the rendered geometry, primarily depth, to condition an LTX-2.3-based sim-to-real translation model that produces RGB clips with in vivo-like mucosal color, texture, vasculature, and specular appearance while preserving the underlying 3D scene structure. The resulting dataset contains 110 videos from 11 unique colon mesh geometries, with varying camera trajectories, appearances, and parameterized deformation regimes, including three peristaltic severity levels that serve as controlled evaluation axes. We evaluate the generated videos using appearance realism, geometric consistency, and temporal consistency metrics, and use the paired ground truth to benchmark the downstream task of pose estimation in deformable 3D reconstruction. Our experiments show how pose estimation error increases with increasing deformation severity, providing a controlled stress test that is not possible with existing in vivo datasets. Overall, C3VD-DEFCOL is designed as a reproducible, quantitative evaluation platform for testing deformable 3D reconstruction algorithms, with the goal of reducing the domain gap between synthetic datasets and in vivo colonoscopy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents C3VD-DEFCOL, a dataset and framework for evaluating deformable 3D colonoscopy reconstruction. Starting from C3VD/C3VDv2 colon meshes and trajectories, it generates controlled surface deformations (peristaltic waves at three severity levels plus centerline motion), renders per-frame depth, normals, optical flow, poses and time-stamped meshes, then applies a depth-conditioned LTX-2.3 sim-to-real model to produce 110 RGB videos with in-vivo-like appearance. The paired geometry is used to benchmark pose estimation, with the central experimental claim that error increases with deformation severity, supplying a controlled stress test unavailable in existing in-vivo datasets.

Significance. If the geometric-preservation claim holds, the work supplies a reproducible quantitative platform with time-resolved 3D ground truth under explicitly parameterized deformation regimes; the three peristaltic severity levels constitute a controlled evaluation axis that directly addresses the domain gap between synthetic and in-vivo colonoscopy data. The manuscript explicitly positions the release as a benchmark resource rather than a one-off collection.

major comments (1)
  1. [Abstract] Abstract: the claim that 'geometric consistency metrics' were computed to verify that the LTX-2.3 translation 'preserves the underlying 3D scene structure' is load-bearing for every downstream use of the rendered meshes and depth maps as ground truth. No metric definitions, quantitative thresholds, error bars, or ablation (e.g., depth re-estimation error from translated RGB) are supplied, leaving the central assumption that viewpoint-dependent appearance changes do not alter implied geometry untested.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that 'geometric consistency metrics' were computed to verify that the LTX-2.3 translation 'preserves the underlying 3D scene structure' is load-bearing for every downstream use of the rendered meshes and depth maps as ground truth. No metric definitions, quantitative thresholds, error bars, or ablation (e.g., depth re-estimation error from translated RGB) are supplied, leaving the central assumption that viewpoint-dependent appearance changes do not alter implied geometry untested.

    Authors: We agree that the abstract's reference to geometric consistency metrics requires explicit support to substantiate the claim that the sim-to-real translation preserves 3D structure. The manuscript states that such metrics were computed as part of the evaluation, but we acknowledge that definitions, numerical results with error bars, thresholds, and an ablation (e.g., re-projecting depth from the translated RGB) are not detailed in the current version. In the revision we will add these elements, including precise metric definitions (depth L1 error and surface normal angular deviation between original rendered geometry and geometry recovered from translated RGB), reported values with standard deviations across the 110 sequences, and the requested ablation study. revision: yes

Circularity Check

0 steps flagged

No circularity; dataset generation with external components

full rationale

The work is a dataset construction pipeline starting from prior C3VD meshes, applying parameterized deformations, rendering geometry outputs, and conditioning an external LTX-2.3 translation model. No equations, fitted parameters renamed as predictions, self-citations as load-bearing uniqueness theorems, or ansatzes smuggled via citation appear in the provided text. Evaluations use separate realism/consistency metrics on outputs, but these do not reduce the core claims to the inputs by construction. This is the expected non-finding for a reproducible dataset paper.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Only abstract available; ledger entries are inferred from the high-level description of the generation pipeline.

free parameters (1)
  • peristaltic severity levels
    Three discrete levels chosen as controlled evaluation axes; values not numerically specified in abstract.
axioms (1)
  • domain assumption The LTX-2.3-based sim-to-real translation preserves the underlying 3D geometry when generating realistic RGB appearance
    Validity of all paired ground truth depends on this preservation; invoked in the description of the translation step.

pith-pipeline@v0.9.1-grok · 5893 in / 1276 out tokens · 28277 ms · 2026-06-27T21:50:06.646129+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 8 canonical work pages · 5 internal anchors

  1. [1]

    Cancer statistics, 2025.CA: A Cancer Journal for Clinicians, 75(1):10, 2025

    Rebecca L Siegel, Tyler B Kratzer, Angela N Giaquinto, Hyuna Sung, and Ahmedin Jemal. Cancer statistics, 2025.CA: A Cancer Journal for Clinicians, 75(1):10, 2025

  2. [2]

    Magnitude, risk factors, and factors associated with adenoma miss rate of tandem colonoscopy: a systematic review and meta-analysis.Gastroenterology, 156(6):1661–1674, 2019

    Shengbing Zhao, Shuling Wang, Peng Pan, Tian Xia, Xin Chang, Xia Yang, Liliangzi Guo, Qianqian Meng, Fan Yang, Wei Qian, et al. Magnitude, risk factors, and factors associated with adenoma miss rate of tandem colonoscopy: a systematic review and meta-analysis.Gastroenterology, 156(6):1661–1674, 2019

  3. [3]

    Artificial intelligence identifies and quantifies colonoscopy blind spots.Endoscopy, 53(12):1284–1286, 2021

    Sarah K McGill, Julian Rosenman, Rui Wang, Ruibin Ma, Jan-Michael Frahm, and Stephen Pizer. Artificial intelligence identifies and quantifies colonoscopy blind spots.Endoscopy, 53(12):1284–1286, 2021

  4. [4]

    RNNSLAM: Reconstructing the 3D colon to visualize missing regions during a colonoscopy

    Ruibin Ma, Rui Wang, Yubo Zhang, Stephen Pizer, Sarah K McGill, Julian Rosenman, and Jan-Michael Frahm. RNNSLAM: Reconstructing the 3D colon to visualize missing regions during a colonoscopy. Medical image analysis, 72:102100, 2021

  5. [5]

    Improving colonoscopy lesion classification using semi-supervised deep learning.IEEE Access, 9:631–640, 2020

    Mayank Golhar, Taylor L Bobrow, Mirmilad Pourmousavi Khoshknab, Simran Jit, Saowanee Ngamru- engphong, and Nicholas J Durr. Improving colonoscopy lesion classification using semi-supervised deep learning.IEEE Access, 9:631–640, 2020

  6. [6]

    Gan inversion for data augmentation to improve colonoscopy lesion classification.IEEE Journal of Biomedical and Health Informatics, 2024

    Mayank V Golhar, Taylor L Bobrow, Saowanee Ngamruengphong, and Nicholas J Durr. Gan inversion for data augmentation to improve colonoscopy lesion classification.IEEE Journal of Biomedical and Health Informatics, 2024

  7. [7]

    A robust method for blood vessel extraction in endoscopic images with svm-based scene classification

    Mayank Golhar, Yuji Iwahori, Manas Kamal Bhuyan, Kenji Funahashi, and Kunio Kasugai. A robust method for blood vessel extraction in endoscopic images with svm-based scene classification. InICPRAM, pages 148–156, 2017

  8. [8]

    Blood vessel delineation in endoscopic images with deep learning based scene classification

    Mayank Golhar, Yuji Iwahori, Manas Kamal Bhuyan, Kenji Funahashi, and Kunio Kasugai. Blood vessel delineation in endoscopic images with deep learning based scene classification. InPattern Recognition Applications and Methods: 6th International Conference, ICPRAM 2017, Porto, Portugal, February 24–26, 2017, Revised Selected Papers 6, pages 147–168. Springer, 2018

  9. [9]

    A deep learning framework for quality assessment and restoration in video endoscopy.Medical image analysis, 68:101900, 2021

    Sharib Ali, Felix Zhou, Adam Bailey, Barbara Braden, James E East, Xin Lu, and Jens Rittscher. A deep learning framework for quality assessment and restoration in video endoscopy.Medical image analysis, 68:101900, 2021

  10. [10]

    Endomapper dataset of complete calibrated endoscopy procedures.Scientific Data, 10(1):671, 2023

    Pablo Azagra, Carlos Sostres, Ángel Ferrández, Luis Riazuelo, Clara Tomasini, O León Barbed, Javier Morlana, David Recasens, Víctor M Batlle, Juan J Gómez-Rodríguez, et al. Endomapper dataset of complete calibrated endoscopy procedures.Scientific Data, 10(1):671, 2023

  11. [11]

    Simcol3d—3d reconstruction during colonoscopy challenge.Medical Image Analysis, 96:103195, 2024

    Anita Rau, Sophia Bano, Yueming Jin, Pablo Azagra, Javier Morlana, Rawen Kader, Edward Sanderson, Bogdan J Matuszewski, Jae Young Lee, Dong-Jae Lee, et al. Simcol3d—3d reconstruction during colonoscopy challenge.Medical Image Analysis, 96:103195, 2024

  12. [12]

    Simintestine: A synthetic dataset from virtual capsule endoscope.Medical Image Analysis, 105:103706, 2025

    Sarita Singh, Basabi Bhaumik, and Shouri Chatterjee. Simintestine: A synthetic dataset from virtual capsule endoscope.Medical Image Analysis, 105:103706, 2025

  13. [13]

    ColonSplat: Reconstruction of peristaltic motion in colonoscopy with dynamic gaussian splatting.arXiv preprint arXiv:2603.06860, 2026

    Weronika Smolak-Dy˙zewska, Joanna Kaleta, Diego Dall’Alba, and Przemysław Spurek. ColonSplat: Reconstruction of peristaltic motion in colonoscopy with dynamic gaussian splatting.arXiv preprint arXiv:2603.06860, 2026

  14. [14]

    Colonoscopy 3d video dataset with paired depth from 2d-3d registration.Medical image analysis, 90:102956, 2023

    Taylor L Bobrow, Mayank Golhar, Rohan Vijayan, Venkata S Akshintala, Juan R Garcia, and Nicholas J Durr. Colonoscopy 3d video dataset with paired depth from 2d-3d registration.Medical image analysis, 90:102956, 2023. 10

  15. [15]

    C3VDv2–colonoscopy 3D video dataset with enhanced realism.arXiv preprint arXiv:2506.24074, 2025

    Mayank V Golhar, Lucas Sebastian Galeano Fretes, Loren Ayers, Venkata S Akshintala, Taylor L Bobrow, and Nicholas J Durr. C3VDv2–colonoscopy 3D video dataset with enhanced realism.arXiv preprint arXiv:2506.24074, 2025

  16. [16]

    NR-SLAM: Non-rigid monocular SLAM

    Juan J Gomez Rodriguez, José MM Montiel, and Juan D Tardós. NR-SLAM: Non-rigid monocular SLAM. IEEE Transactions on Robotics, 40:4252–4264, 2024

  17. [17]

    Bobrow, Gulfize Coskun, Kagan Incetan, Yasin Almalioglu, Faisal Mahmood, Eva Curto, Luis Perdigoto, Marina Oliveira, Hasan Sahin, Helder Araujo, Henrique Alexandrino, Nicholas J

    Kutsev Bengisu Ozyoruk, Guliz Irem Gokceler, Taylor L. Bobrow, Gulfize Coskun, Kagan Incetan, Yasin Almalioglu, Faisal Mahmood, Eva Curto, Luis Perdigoto, Marina Oliveira, Hasan Sahin, Helder Araujo, Henrique Alexandrino, Nicholas J. Durr, Hunter B. Gilbert, and Mehmet Turan. EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimat...

  18. [18]

    Vr-caps: a virtual environment for capsule endoscopy.Medical image analysis, 70:101990, 2021

    Ka˘gan ˙Incetan, Ibrahim Omer Celik, Abdulhamid Obeid, Guliz Irem Gokceler, Kutsev Bengisu Ozyoruk, Yasin Almalioglu, Richard J Chen, Faisal Mahmood, Hunter Gilbert, Nicholas J Durr, et al. Vr-caps: a virtual environment for capsule endoscopy.Medical image analysis, 70:101990, 2021

  19. [19]

    Structure-from-motion revisited

    Johannes L Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4104–4113, 2016

  20. [20]

    SAGE: SLAM with appearance and geometry prior for endoscopy

    Xingtong Liu, Zhaoshuo Li, Masaru Ishii, Gregory D Hager, Russell H Taylor, and Mathias Unberath. SAGE: SLAM with appearance and geometry prior for endoscopy. In2022 International conference on robotics and automation (ICRA), pages 5587–5593. IEEE, 2022

  21. [21]

    The drunkard’s odometry: Estimating camera motion in deforming scenes.Advances in Neural Information Processing Systems, 36, 2024

    David Recasens Lafuente, Martin R Oswald, Marc Pollefeys, and Javier Civera. The drunkard’s odometry: Estimating camera motion in deforming scenes.Advances in Neural Information Processing Systems, 36, 2024

  22. [22]

    Gaussian pancakes: Geometrically-regularized 3D gaussian splatting for realistic endoscopic reconstruction

    Sierra Bonilla, Shuai Zhang, Dimitrios Psychogyios, Danail Stoyanov, Francisco Vasconcelos, and Sophia Bano. Gaussian pancakes: Geometrically-regularized 3D gaussian splatting for realistic endoscopic reconstruction. InInt. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 274–283. Springer Nature, 2024

  23. [23]

    EndoGSLAM: Real-time dense reconstruction and tracking in endoscopic surgeries using gaussian splatting

    Kailing Wang, Chen Yang, Yuehao Wang, Sikuang Li, Yan Wang, Qi Dou, Xiaokang Yang, and Wei Shen. EndoGSLAM: Real-time dense reconstruction and tracking in endoscopic surgeries using gaussian splatting. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 219–229. Springer, 2024

  24. [24]

    ENeRF-SLAM: A dense endoscopic SLAM with neural implicit representation.IEEE Transactions on Medical Robotics and Bionics, 6(3):1030–1041, 2024

    Jiwei Shan, Yirui Li, Ting Xie, and Hesheng Wang. ENeRF-SLAM: A dense endoscopic SLAM with neural implicit representation.IEEE Transactions on Medical Robotics and Bionics, 6(3):1030–1041, 2024

  25. [25]

    SD- DefSLAM: Semi-direct monocular slam for deformable and intracorporeal scenes

    Juan J Gómez-Rodríguez, José Lamarca, Javier Morlana, Juan D Tardós, and José MM Montiel. SD- DefSLAM: Semi-direct monocular slam for deformable and intracorporeal scenes. In2021 IEEE interna- tional conference on robotics and automation (ICRA), pages 5170–5177. IEEE, 2021

  26. [26]

    LTX-2: Efficient Joint Audio-Visual Foundation Model

    Yoav HaCohen, Benny Brazowski, Nisan Chiprut, Yaki Bitterman, Andrew Kvochko, Avishai Berkowitz, Daniel Shalem, Daphna Lifschitz, Dudu Moshe, Eitan Porat, et al. Ltx-2: Efficient joint audio-visual foundation model.arXiv preprint arXiv:2601.03233, 2026

  27. [27]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

  28. [28]

    In-context LoRA for diffusion transformers

    Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Huanzhang Dou, Chen Liang, Yutong Feng, Yu Liu, and Jingren Zhou. In-context lora for diffusion transformers.arXiv preprint arXiv:2410.23775, 2024

  29. [29]

    Coloncrafter: A depth estimation model for colonoscopy videos using diffusion priors

    Romain Hardy, Tyler M Berzin, and Pranav Rajpurkar. Coloncrafter: A depth estimation model for colonoscopy videos using diffusion priors. InBiocomputing 2026: Proceedings of the Pacific Symposium, pages 27–41. World Scientific, 2025

  30. [30]

    Image-to-image translation with conditional adversarial networks

    Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017

  31. [31]

    AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

    Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, and Bo Dai. Animatediff: Animate your personalized text-to-image diffusion models without specific tuning.arXiv preprint arXiv:2307.04725, 2023. 11

  32. [32]

    High-resolution image synthesis with latent diffusion models [internet].arXiv [cs

    R Rombach. High-resolution image synthesis with latent diffusion models [internet].arXiv [cs. CV]., 2021

  33. [33]

    Adding conditional control to text-to-image diffusion models

    Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. InProceedings of the IEEE/CVF international conference on computer vision, pages 3836–3847, 2023

  34. [34]

    Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017

  35. [35]

    Demystifying MMD GANs

    Mikołaj Bi´nkowski, Danica J Sutherland, Michael Arbel, and Arthur Gretton. Demystifying mmd gans. arXiv preprint arXiv:1801.01401, 2018

  36. [36]

    Towards Accurate Generative Models of Video: A New Metric & Challenges

    Thomas Unterthiner, Sjoerd Van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly. Towards accurate generative models of video: A new metric & challenges.arXiv preprint arXiv:1812.01717, 2018

  37. [37]

    Soria, E

    X. Soria, E. Riba, and A. Sappa. Dense extreme inception network: Towards a robust cnn model for edge detection. In2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1912–1921, Los Alamitos, CA, USA, mar 2020. IEEE Computer Society

  38. [38]

    HalF-SAM: SAM-based haustral fold detection in colonoscopy with debris suppression and temporal consistency

    Mayank Golhar, Luojie Huang, and Nicholas J Durr. HalF-SAM: SAM-based haustral fold detection in colonoscopy with debris suppression and temporal consistency. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 300–309. Springer, 2025

  39. [39]

    On the content bias in fréchet video distance

    Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, and Jia-Bin Huang. On the content bias in fréchet video distance. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7277–7288, 2024

  40. [40]

    Distinctive image features from scale-invariant keypoints.International journal of computer vision, 60(2):91–110, 2004

    David G Lowe. Distinctive image features from scale-invariant keypoints.International journal of computer vision, 60(2):91–110, 2004

  41. [41]

    Superglue: Learning feature matching with graph neural networks

    Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020. 12