3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis
Pith reviewed 2026-05-10 15:41 UTC · model grok-4.3
The pith
A feedforward network called 3DTV performs real-time novel view synthesis from sparse multi-view video without per-scene optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
3DTV is a feedforward interpolation network for real-time sparse-view synthesis. A Delaunay-based triplet selection ensures angular coverage for each target view. A pose-aware depth module estimates a coarse-to-fine depth pyramid that supports efficient feature reprojection and occlusion-aware blending. The network runs entirely feedforward without retraining or explicit proxies and achieves a practical balance of quality and speed on challenging multi-view video datasets.
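Depth-guided feature reprojection, in its textbook pinhole form, lifts a source pixel to 3D using its depth, applies the relative pose, and projects the result into the target camera. The sketch below rests on three assumptions: both views share intrinsics K, the source-to-target transform is X_tgt = R X_src + t, and depth is measured along the source camera's z-axis. The helper names `mat_vec` and `reproject_pixel` are hypothetical and are not taken from the 3DTV code.

```python
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 row-major matrix (nested lists) by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def reproject_pixel(u: float, v: float, depth: float,
                    K: Mat3, R: Mat3, t: Vec3) -> Tuple[float, float, float]:
    """Warp source pixel (u, v) with z-depth `depth` into the target view.

    Hypothetical helper: assumes a zero-skew pinhole camera, intrinsics K
    shared by both views, and X_tgt = R @ X_src + t. Returns (u', v', z'),
    where z' is the point's depth in the target camera.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # Back-project: pixel -> 3D point in the source camera frame.
    x_src = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid transform into the target camera frame.
    rx = mat_vec(R, x_src)
    x_tgt = (rx[0] + t[0], rx[1] + t[1], rx[2] + t[2])
    z = x_tgt[2]
    if z <= 0:
        raise ValueError("point lies behind the target camera")
    # Perspective projection with the same intrinsics.
    return fx * x_tgt[0] / z + cx, fy * x_tgt[1] / z + cy, z
```

The returned z' matters for occlusion handling: when several source pixels land on the same target pixel, the one with the smallest z' is the visible surface.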
What carries the argument
The combination of Delaunay triangulation for selecting input camera triplets and a pose-aware depth module that produces a multi-scale depth pyramid for feature warping and blending.
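The triplet selection described above can be illustrated with a minimal sketch. It assumes camera positions are parameterized as 2D points, for example azimuth and elevation around the capture volume. The cameras are triangulated with the Bowyer-Watson algorithm, and the sketch returns the triangle that contains the target view together with its barycentric weights. The 2D parameterization, the function names (`delaunay`, `barycentric`, `select_triplet`), and the use of barycentric weights are illustrative assumptions, not details confirmed by the paper. Points are assumed to be in general position, meaning no three are collinear and no four are co-circular.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]   # camera position in a 2D parameterization (assumed)
Tri = Tuple[int, int, int]    # indices of three cameras


def _circumcircle(a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Return (center_x, center_y, radius**2) of the circle through a, b, c."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:  # collinear triple: treat its circumcircle as unbounded
        return 0.0, 0.0, float("inf")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay(points: Sequence[Point]) -> List[Tri]:
    """Bowyer-Watson triangulation; returns triangles as index triples."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    span = max(max(xs) - min(xs), max(ys) - min(ys), 1e-9)
    mx, my = (max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2
    # A large "super triangle" that encloses every input point.
    pts = list(points) + [(mx - 20 * span, my - span),
                          (mx, my + 20 * span),
                          (mx + 20 * span, my - span)]
    tris: List[Tri] = [(n, n + 1, n + 2)]
    for i in range(n):
        px, py = pts[i]
        # Triangles whose circumcircle contains the new point become invalid.
        bad = []
        for t in tris:
            ux, uy, r2 = _circumcircle(pts[t[0]], pts[t[1]], pts[t[2]])
            if (px - ux) ** 2 + (py - uy) ** 2 < r2:
                bad.append(t)
        # Cavity boundary: edges that belong to exactly one invalid triangle.
        edges: Dict[Tuple[int, int], int] = {}
        for a, b, c in bad:
            for e in ((a, b), (b, c), (c, a)):
                key = (min(e), max(e))
                edges[key] = edges.get(key, 0) + 1
        tris = [t for t in tris if t not in bad]
        # Re-triangulate the cavity by connecting its boundary to the new point.
        tris += [(a, b, i) for (a, b), count in edges.items() if count == 1]
    # Discard triangles that still touch a super-triangle vertex.
    return [t for t in tris if max(t) < n]


def barycentric(p: Point, a: Point, b: Point,
                c: Point) -> Optional[Tuple[float, float, float]]:
    """Barycentric coordinates of p in triangle (a, b, c); None if degenerate."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0.0:
        return None
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w0, w1, 1.0 - w0 - w1


def select_triplet(cameras: Sequence[Point], target: Point, eps: float = 1e-9
                   ) -> Optional[Tuple[Tri, Tuple[float, float, float]]]:
    """Hypothetical selector: the camera triangle containing `target` and its
    barycentric weights, or None if the target lies outside the convex hull."""
    for tri in delaunay(cameras):
        w = barycentric(target, *(cameras[i] for i in tri))
        if w is not None and min(w) >= -eps:
            return tri, w
    return None
```

A target outside the convex hull of the cameras gets no triplet. This is exactly the poor-coverage case that a real system would have to handle separately, for example by falling back to the nearest cameras.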
If this is right
- Real-time free-viewpoint rendering becomes feasible for interactive AR, VR, and telepresence without offline optimization.
- The system produces robust results across diverse scenes because it avoids reliance on explicit geometric proxies.
- Low-latency multi-view streaming and interactive rendering become practical on standard hardware.
- The quality-efficiency trade-off improves over prior real-time novel-view baselines on multi-view video data.
Where Pith is reading between the lines
- The same triplet-plus-depth-pyramid structure might extend to dynamic scenes if the depth module is replaced by a temporal version.
- Integration with existing video compression pipelines could further reduce bandwidth for streaming interpolated viewpoints.
- The feedforward design suggests that lightweight geometric priors can replace heavy optimization in other view-synthesis tasks.
Load-bearing premise
The Delaunay triplet selection always supplies enough angular coverage, and the estimated depth maps are accurate enough for occlusion handling without any scene-specific tuning.
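The premise above ties occlusion handling to depth accuracy. One common way to make blending occlusion-aware is to combine each source view's angular weight with a visibility term. A source sample whose reprojected depth disagrees with the target's estimated depth is probably seeing a different surface, so it is down-weighted. This sketch is an assumption, not 3DTV's actual blending rule. The Gaussian falloff, the `sigma` tolerance, and the function names `blend_weights` and `blend` are hypothetical choices.

```python
import math
from typing import List, Sequence


def blend_weights(angular_w: Sequence[float], src_depths: Sequence[float],
                  target_depth: float, sigma: float = 0.05) -> List[float]:
    """Hypothetical occlusion-aware blend weights for one target pixel.

    angular_w:    prior weight per source view (e.g. barycentric weights).
    src_depths:   depth of each source sample after reprojection into the
                  target camera.
    target_depth: depth estimated for the target pixel (must be > 0).
    sigma:        tolerance on relative depth disagreement (illustrative).
    """
    raw = []
    for w, d in zip(angular_w, src_depths):
        rel_err = abs(d - target_depth) / target_depth
        visibility = math.exp(-(rel_err / sigma) ** 2)  # 1.0 when depths agree
        raw.append(max(w, 0.0) * visibility)
    total = sum(raw)
    if total == 0.0:  # every sample rejected: fall back to the angular prior
        raw = [max(w, 0.0) for w in angular_w]
        total = sum(raw)
    if total == 0.0:  # no usable prior either: uniform weights
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def blend(colors: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-view colors (one channel) for one target pixel."""
    return sum(c * w for c, w in zip(colors, weights))
```

The hard fallback shows why the premise is load-bearing. If the depth estimate is wrong, every visibility term collapses, and the blend silently degrades to purely angular weighting.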
What would settle it
Running the method on a scene whose camera layout yields poor triangulation coverage or where depth estimation fails on thin structures or reflections, then measuring whether output quality drops below that of the compared real-time baselines.
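The falsification test above reduces to a per-scene comparison against the baselines. A sketch follows, assuming images are flat lists of intensities in [0, 1] and using PSNR (10 log10(MAX^2 / MSE)) as the quality metric. The function names and scene keys are placeholders, and SSIM or LPIPS could be substituted for PSNR.

```python
import math
from typing import Dict, List, Sequence


def psnr(pred: Sequence[float], ref: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def scenes_where_method_loses(method: Dict[str, float],
                              baseline: Dict[str, float]) -> List[str]:
    """Hypothetical check: scenes where the method scores below the baseline."""
    return sorted(s for s in method if s in baseline and method[s] < baseline[s])
```

Run on scenes chosen for poor triangulation coverage, thin structures, or reflections, a non-empty result from `scenes_where_method_loses` would count against the robustness claim.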
Original abstract
Real-time free-viewpoint rendering requires balancing multi-camera redundancy with the latency constraints of interactive applications. We address this challenge by combining lightweight geometry with learning and propose 3DTV, a feedforward network for real-time sparse-view interpolation. A Delaunay-based triplet selection ensures angular coverage for each target view. Building on this, we introduce a pose-aware depth module that estimates a coarse-to-fine depth pyramid, enabling efficient feature reprojection and occlusion-aware blending. Unlike methods that require scene-specific optimization, 3DTV runs feedforward without retraining, making it practical for AR/VR, telepresence, and interactive applications. Our experiments on challenging multi-view video datasets demonstrate that 3DTV consistently achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view baselines. Crucially, 3DTV avoids explicit proxies, enabling robust rendering across diverse scenes. This makes it a practical solution for low-latency multi-view streaming and interactive rendering. Project Page: https://stefanmschulz.github.io/3DTV_webpage/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces 3DTV, a feedforward interpolation network for real-time view synthesis from sparse multi-view inputs. It employs Delaunay-based triplet selection to ensure angular coverage and a pose-aware depth module with a coarse-to-fine pyramid for efficient feature reprojection and occlusion-aware blending. The approach is designed to run without scene-specific optimization or retraining, and the authors report that it achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view synthesis baselines on multi-view video datasets while avoiding explicit proxies for robust rendering across diverse scenes.
Significance. If the experimental results are substantiated, this work could have significant impact on practical applications in AR/VR, telepresence, and interactive rendering by providing a lightweight, feedforward alternative to optimization-heavy methods. The combination of geometric priors (Delaunay selection) with learned components (depth estimation) without requiring per-scene tuning is a promising direction for generalization. However, the current presentation lacks sufficient quantitative evidence to fully evaluate the claims.
major comments (2)
- [Abstract and Experiments] The abstract claims that 3DTV 'consistently achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view baselines' and 'enabling robust rendering across diverse scenes,' but provides no quantitative metrics (e.g., PSNR, SSIM, runtime), specific baselines, dataset details, or error analysis. This makes it impossible to verify the outperformance and robustness claims without the full results section.
- [Pose-aware depth module] The feedforward claim and robust rendering across diverse scenes rest on the pose-aware depth module (coarse-to-fine pyramid) producing depth maps sufficiently precise for feature reprojection and occlusion-aware blending without scene-specific optimization or explicit proxies. The manuscript should include ablation studies or depth accuracy metrics to demonstrate that depth errors do not lead to visible artifacts in textureless or occluded regions.
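Where ground-truth depth is available, the depth accuracy metrics requested in this comment are commonly reported as absolute relative error, RMSE, and the fraction of pixels whose prediction-to-ground-truth ratio is within 1.25 (δ < 1.25). The sketch below computes these metrics over valid pixels only. Treating non-positive ground truth as missing is an assumption, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """AbsRel, RMSE and delta<1.25 over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Non-positive predictions never count as within the ratio threshold.
    delta1 = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta<1.25": delta1}
```

Restricting these metrics to masks of textureless or occluded pixels would target the specific failure regions the comment names.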
minor comments (1)
- [Notation] Ensure consistent use of terms like 'pose-aware depth module' and 'coarse-to-fine depth pyramid' throughout the paper to avoid confusion.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to better substantiate the claims in the abstract and provide additional evidence for the depth module. We address each point below and have revised the manuscript to incorporate the suggestions where feasible.
Point-by-point responses
- Referee: [Abstract and Experiments] The abstract claims that 3DTV 'consistently achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view baselines' and 'enabling robust rendering across diverse scenes,' but provides no quantitative metrics (e.g., PSNR, SSIM, runtime), specific baselines, dataset details, or error analysis. This makes it impossible to verify the outperformance and robustness claims without the full results section.
Authors: We agree that the abstract would benefit from additional context to support the summarized claims. In the revised manuscript, we have updated the abstract to include brief references to key quantitative results (e.g., average PSNR/SSIM gains and runtime on the evaluated multi-view video datasets) and the main baselines compared. The full experimental details, including dataset descriptions, error analysis, and comparisons against the recent real-time novel-view synthesis (NVS) baselines, remain in Section 4. This change improves accessibility without lengthening the abstract excessively. revision: yes
- Referee: [Pose-aware depth module] The feedforward claim and robust rendering across diverse scenes rest on the pose-aware depth module (coarse-to-fine pyramid) producing depth maps sufficiently precise for feature reprojection and occlusion-aware blending without scene-specific optimization or explicit proxies. The manuscript should include ablation studies or depth accuracy metrics to demonstrate that depth errors do not lead to visible artifacts in textureless or occluded regions.
Authors: The pose-aware depth module is evaluated through its contribution to end-to-end rendering quality across diverse scenes, as shown in our experiments without per-scene optimization. We acknowledge the value of targeted ablations. In the revised version, we have added an ablation study in Section 4.3 comparing the coarse-to-fine pyramid against a single-scale depth estimator, with qualitative examples and rendering metrics demonstrating reduced artifacts in occluded and textureless regions. Standalone depth accuracy metrics (e.g., absolute depth error) are not reported because ground-truth depth is unavailable for the primary video datasets; our evaluation prioritizes perceptual rendering quality, which indirectly validates the depth module's precision for reprojection and blending. revision: partial
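A toy cost-budget argument can motivate the ablation described in this response, which compares the coarse-to-fine pyramid against a single-scale estimator. The sketch below is a generic cascade-style depth search over a per-pixel matching cost. It stands in for 3DTV's learned module, and that substitution is an assumption. With the same number of cost evaluations, narrowing the search window around each level's best hypothesis reaches a finer depth resolution than a single uniform sweep, provided the cost is unimodal near its minimum. The names `sweep` and `coarse_to_fine` are hypothetical.

```python
from typing import Callable, Tuple

Cost = Callable[[float], float]


def sweep(cost: Cost, lo: float, hi: float, k: int) -> Tuple[float, float]:
    """Evaluate k >= 2 evenly spaced depth hypotheses in [lo, hi]; return (best, step)."""
    step = (hi - lo) / (k - 1)
    best = min((lo + i * step for i in range(k)), key=cost)
    return best, step


def coarse_to_fine(cost: Cost, lo: float, hi: float,
                   k: int = 8, levels: int = 3) -> float:
    """Cascade search: each level re-sweeps +/- one step around the previous best."""
    best, step = sweep(cost, lo, hi, k)
    for _ in range(levels - 1):
        best, step = sweep(cost, max(lo, best - step), min(hi, best + step), k)
    return best
```

With k = 8 and three levels, the cascade uses 24 cost evaluations, the same budget as a single 24-hypothesis sweep. Yet its final step size is roughly (2/7)^2 of the first level's step, whereas the single sweep's step stays at (hi - lo) / 23.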
Circularity Check
No significant circularity; empirical architecture validated externally
The paper describes 3DTV as a feedforward neural network combining lightweight geometry with learning for sparse-view interpolation. It relies on architectural components (Delaunay triplet selection, pose-aware coarse-to-fine depth pyramid, feature reprojection, occlusion-aware blending) whose validity is asserted via experiments on multi-view video datasets and comparisons to real-time baselines. No equations, derivations, or first-principles results are presented that reduce by construction to fitted inputs, self-definitions, or self-citations. The method is explicitly positioned as running without scene-specific optimization or retraining, with claims resting on external empirical benchmarks rather than internal tautologies. This matches the default case of a self-contained empirical contribution.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights
axioms (2)
- Domain assumption: Delaunay triangulation ensures adequate angular coverage for each target view
- Domain assumption: Coarse-to-fine depth estimation supports efficient and accurate feature reprojection and occlusion handling
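The first axiom can be checked per target view. The sketch below assumes camera centers and a common look-at point are known in world coordinates. For each selected source camera, it computes the angle at the look-at point between the target's direction and that source's direction. A triplet is flagged when the largest such angle exceeds a threshold. The maximum angle is only a proxy for coverage. The 30° default and the function names are placeholders, not values or APIs from the paper, and no camera may coincide with the look-at point.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def view_angle_deg(cam_a: Vec3, cam_b: Vec3, look_at: Vec3) -> float:
    """Angle (degrees) at `look_at` between the directions to two camera centers."""
    va = [a - p for a, p in zip(cam_a, look_at)]
    vb = [b - p for b, p in zip(cam_b, look_at)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def max_triplet_angle(target: Vec3, sources: Sequence[Vec3], look_at: Vec3) -> float:
    """Largest angular gap between the target view and any selected source view."""
    return max(view_angle_deg(target, s, look_at) for s in sources)


def has_adequate_coverage(target: Vec3, sources: Sequence[Vec3], look_at: Vec3,
                          max_deg: float = 30.0) -> bool:
    """Hypothetical coverage test; the threshold is an illustrative placeholder."""
    return max_triplet_angle(target, sources, look_at) <= max_deg
```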