MODEST: Multi-Optics Depth-of-Field Stereo Dataset
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-17 04:09 UTC · model grok-4.3
The pith
The authors introduce the first high-resolution stereo DSLR dataset with 18000 images that systematically varies focal length and aperture across real scenes to capture professional camera optics for depth tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that MODEST supplies the first large-scale, high-resolution (5472 × 3648 pixels) stereo DSLR dataset in which focal length and aperture are varied systematically across complex real scenes. For each of nine scenes the authors record 2000 images using two matched camera rigs at focal lengths from 28 mm to 70 mm and apertures from f/2.8 to f/22, producing fifty optical configurations together with calibration images for every configuration. The scenes include reflective surfaces, transparent glass, mirrors, fine detail, and mixed lighting so that geometric and optical effects can be isolated for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction, and novel view synthesis.
What carries the argument
The central object is the dual identical DSLR capture protocol that records synchronized stereo pairs while stepping through ten focal lengths and five apertures for each scene, paired with per-configuration calibration images that enable separate analysis of geometric and optical influences.
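The 10 × 5 capture grid can be sketched programmatically. The specific focal-length and aperture stops below are illustrative placeholders within the stated ranges, not values taken from the paper:

```python
from itertools import product

# Illustrative capture grid: 10 focal lengths (28-70 mm) x 5 apertures
# (f/2.8-f/22). The intermediate stops are assumed, not from the paper.
focal_lengths_mm = [28, 33, 38, 42, 47, 51, 56, 61, 65, 70]
apertures = ["f/2.8", "f/4", "f/8", "f/11", "f/22"]

# Each (focal length, aperture) pair is one optical configuration,
# captured as synchronized stereo pairs plus a calibration image set.
configs = list(product(focal_lengths_mm, apertures))
print(len(configs))  # 50 configurations
```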
If this is right
- Controlled analysis of how focal length and aperture affect monocular and stereo depth estimation becomes feasible on real data.
- Classical and learning-based intrinsic and extrinsic calibration methods can be evaluated across fifty optical configurations.
- Current state-of-the-art monocular, stereo depth, and depth-of-field methods can be tested against documented real optical challenges.
- Research on shallow depth-of-field rendering, deblurring, 3D reconstruction, and novel view synthesis gains a real-world benchmark with calibration support.
- The realism gap between synthetic training data and actual professional camera optics can be measured and reduced.
Where Pith is reading between the lines
- Models trained on MODEST may generalize more reliably to professional camera inputs in robotics and augmented reality than models trained only on synthetic or fixed-optics data.
- Explicit modeling of varying depth of field could become a standard component of depth pipelines rather than an afterthought.
- Extensions that add temporal sequences or additional camera brands would test whether the current nine scenes already capture the essential optical variations.
- Vision algorithms might shift from assuming fixed pinhole optics toward pipelines that ingest focal length and aperture metadata as first-class inputs.
Load-bearing premise
The nine chosen scenes and the two identical camera assemblies sufficiently represent the diversity and optical complexity of real professional camera use without unaccounted capture artifacts or selection biases.
What would settle it
If depth estimation models trained on synthetic data achieve accuracy on independent real DSLR captures that matches or exceeds accuracy on MODEST, or if the optical effects in the dataset prove reproducible by simple pinhole models without the recorded parameter changes, the claim of unique optical realism would be challenged.
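The pinhole-sufficiency test in the last clause can be made concrete with the standard thin-lens circle of confusion (textbook optics, not an equation from the paper): for focal length $f$, f-number $N$, focus distance $d_f$, and scene depth $d$, the blur-circle diameter is

```latex
c(d) = \frac{f^{2}}{N\,(d_f - f)} \cdot \frac{\lvert d - d_f \rvert}{d}
```

A pinhole camera is the limit $N \to \infty$, where $c \to 0$ at every depth, so any systematic dependence of measured blur on the recorded $f$ and $N$ cannot be reproduced by a pinhole model and would support the dataset's claim of optical realism.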
Original abstract
Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in depth estimation and depth-of-field rendering, research remains constrained by the lack of large-scale, high-fidelity, real stereo DSLR datasets, limiting real-world generalization and evaluation of models trained on synthetic data as shown extensively in literature. We present the first high-resolution (5472×3648 px) stereo DSLR dataset with 18000 images, systematically varying focal length and aperture across complex real scenes and capturing the optical realism and complexity of professional camera systems. For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22), spanning 50 optical configurations in 2000 images per scene. This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis. Each focal configuration has a dedicated calibration image set, supporting evaluation of classical and learning based methods for intrinsic and extrinsic calibration. The dataset features challenging visual elements such as multi-scale optical illusions, reflective surfaces, mirrors, transparent glass walls, fine-grained details, and natural / artificial ambient light variations. This work attempts to bridge the realism gap between synthetic training data and real camera optics, and demonstrates challenges with the current state-of-the-art monocular, stereo depth and depth-of-field methods. We release the dataset, calibration files, and evaluation code to support reproducible research on real-world optical generalization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the MODEST dataset: the first high-resolution (5472×3648 px) stereo DSLR dataset with 18,000 images captured across 9 complex real scenes. It systematically varies 10 focal lengths (28-70 mm) and 5 apertures (f/2.8 to f/22) using two identical camera assemblies, providing 2000 images per scene along with dedicated calibration sets for each configuration. The dataset aims to support research on monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D reconstruction, and novel view synthesis under realistic optical conditions, including challenging elements like reflections, transparencies, and optical illusions.
Significance. If the captured data accurately represents professional camera optics without unaccounted artifacts, this dataset would fill an important gap in real-world high-fidelity stereo data for computer vision. It enables controlled studies of geometric and optical effects across a wide range of focal and aperture settings, which is currently limited by synthetic data or smaller real datasets. The public release of images, calibrations, and evaluation code promotes reproducible research on optical generalization.
major comments (1)
- [Abstract] The headline claim of capturing 'the optical realism and complexity of professional camera systems' across 'complex real scenes' rests on a selection of only 9 scenes described qualitatively by complexity, lighting, and background. No quantitative metrics of scene diversity (such as depth histograms, material coverage, or lighting variation statistics) or comparisons to standard scene benchmarks are provided, which is load-bearing for asserting that the dataset bridges the realism gap for broad professional use.
minor comments (2)
- [Abstract] The resolution is specified as 5472×3648px, but it would improve clarity to explicitly state whether this applies to each image in the stereo pair or if there is any downsampling involved in the release.
- [Abstract] The abstract states that the work 'demonstrates challenges with the current state-of-the-art monocular, stereo depth and depth-of-field methods,' but does not reference specific quantitative results, tables, or figures where these demonstrations are shown.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of the MODEST dataset to address gaps in real-world optical data. We address the single major comment below and will incorporate revisions to strengthen the manuscript.
Point-by-point responses
Referee: [Abstract] The headline claim of capturing 'the optical realism and complexity of professional camera systems' across 'complex real scenes' rests on a selection of only 9 scenes described qualitatively by complexity, lighting, and background. No quantitative metrics of scene diversity (such as depth histograms, material coverage, or lighting variation statistics) or comparisons to standard scene benchmarks are provided, which is load-bearing for asserting that the dataset bridges the realism gap for broad professional use.
Authors: We agree that the current manuscript describes the nine scenes primarily through qualitative attributes (varying complexity, lighting, and background) and specific challenging elements such as reflections, transparencies, mirrors, and optical illusions. Quantitative metrics of scene diversity are indeed absent, which limits the strength of claims about broad realism and generalization. In the revised manuscript we will add a dedicated subsection on scene characterization that includes: (1) categorical statistics (e.g., number of scenes containing reflective surfaces, transparent elements, fine-grained textures, and multi-scale illusions), (2) basic lighting variation measures derived from image histograms and exposure metadata, and (3) a comparison table contrasting key scene properties against common benchmarks such as KITTI, Middlebury, and NYU Depth V2. Because the dataset consists of real-world captures without dense ground-truth depth, we will not fabricate depth histograms; instead we will report statistics on estimated depth ranges obtained from the stereo pairs using a standard baseline method, clearly labeled as such. These additions will be placed in Section 3 and referenced in the abstract. Revision: yes.
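The proposed "basic lighting variation measures derived from image histograms" could look like the following sketch. This is a hypothetical implementation, not the authors' code: per-image mean luminance and histogram entropy, summarized by their spread across a scene's captures.

```python
import numpy as np

def lighting_stats(gray_images):
    """Per-scene lighting summary from 8-bit grayscale arrays: mean
    luminance and histogram entropy per image, then the spread across
    a scene's captures. Hypothetical sketch, not the authors' code."""
    means, entropies = [], []
    for img in gray_images:
        hist, _ = np.histogram(img, bins=256, range=(0, 256))
        p = hist / hist.sum()
        p = p[p > 0]                              # drop empty bins
        means.append(float(img.mean()))
        entropies.append(float(-(p * np.log2(p)).sum()))
    return {
        "mean_luminance_std": float(np.std(means)),
        "entropy_std": float(np.std(entropies)),
    }
```

A scene with strong natural/artificial lighting variation would show a large `mean_luminance_std` across its 2000 captures; a flat-lit scene would not.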
Circularity Check
No circularity: dataset release with no derivations or predictions
full rationale
The paper is a data-release contribution describing capture of 18000 high-resolution stereo images across 9 scenes with systematic variation in focal length and aperture. No equations, models, predictions, or fitted parameters appear in the provided text or abstract. The central claims rest on the empirical description of the capture protocol and scene selection rather than any derivation chain that could reduce to self-definition or self-citation. This is the expected non-finding for a dataset paper whose value is independent of internal mathematical structure.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard camera calibration models for intrinsics and extrinsics apply to the dedicated calibration image sets.
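The "standard camera calibration models" referenced here reduce to the pinhole projection x = K[R|t]X. A minimal numpy sketch with illustrative values (not parameters from the dataset):

```python
import numpy as np

# Pinhole model underlying the calibration axiom: a 3D point is mapped to
# pixel coordinates via intrinsics K and extrinsics (R, t).
# All numeric values below are illustrative, not from the paper.

def project(X, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates."""
    Xc = X @ R.T + t                  # world -> camera frame
    x = Xc[:, :2] / Xc[:, 2:]         # perspective divide
    return x @ K[:2, :2].T + K[:2, 2] # focal lengths, skew, principal point

K = np.array([[1000.0, 0.0, 320.0],   # fx, skew, cx
              [0.0, 1000.0, 240.0],   # fy, cy
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

# Point at (0.5, -0.25, 0) seen from 5 m:
# u = 1000 * 0.1 + 320 = 420, v = 1000 * (-0.05) + 240 = 190
pt = project(np.array([[0.5, -0.25, 0.0]]), K, R, t)
```

Fitting K per focal length and (R, t) per rig from the dedicated calibration sets is exactly the intrinsic/extrinsic estimation the ledger assumes is well posed.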
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  "We present the first high-resolution (5472×3648px) stereo DSLR dataset with 18000 images, systematically varying focal length and aperture across complex real scenes"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  "This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring..."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.