BulletGen: Improving 4D Reconstruction with Bullet-Time Generation
Pith reviewed 2026-05-19 07:59 UTC · model grok-4.3
The pith
BulletGen improves 4D reconstructions from monocular videos by aligning diffusion-generated frames at one frozen bullet-time step to supervise Gaussian optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BulletGen aligns the output of a diffusion-based video generation model with the 4D reconstruction at a single frozen bullet-time step. The generated frames are then used to supervise the optimization of the 4D Gaussian model, seamlessly blending generative content with both static and dynamic scene components.
What carries the argument
The bullet-time alignment step that matches diffusion-generated video frames to the existing 4D Gaussian reconstruction at one chosen frozen time to provide additional supervision signals.
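To make the supervision signal concrete, one plausible reading is a weighted sum of two photometric losses. The first is computed on the real input frames at their own timestamps. The second is computed on the generated frames, all rendered at the frozen bullet time t*. The sketch below is illustrative only. The L1 loss, the single weight `lam`, the flat-list image format, and the `render(time, camera_id)` callable are assumptions, not details taken from the paper.

```python
from typing import Callable, Sequence, Tuple

# Toy stand-ins (assumptions): an "image" is a flat list of pixel intensities, and
# render(time, camera_id) is a hypothetical renderer of the current 4D Gaussian model.
Image = Sequence[float]
RenderFn = Callable[[float, str], Image]


def l1(a: Image, b: Image) -> float:
    """Mean absolute pixel difference between two equally sized images."""
    assert len(a) == len(b) and len(a) > 0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def bullet_time_objective(
    render: RenderFn,
    real_frames: Sequence[Tuple[float, str, Image]],
    generated_frames: Sequence[Tuple[str, Image]],
    t_star: float,
    lam: float = 0.5,
) -> float:
    """Loss on real input frames plus a weighted loss on frames generated at t_star.

    real_frames: (timestamp, camera_id, image) triples from the monocular video.
    generated_frames: (camera_id, image) pairs from the video diffusion model; each one
    depicts the scene frozen at t_star, so the model is rendered at t_star for all of them.
    """
    real = sum(l1(render(t, cam), img) for t, cam, img in real_frames) / len(real_frames)
    gen = sum(l1(render(t_star, cam), img) for cam, img in generated_frames) / len(generated_frames)
    return real + lam * gen


# Usage: a toy renderer whose pixels all equal the timestamp, whatever the camera.
def toy_render(t: float, cam: str) -> Image:
    return [t, t]


real = [(0.0, "input", [0.0, 0.0]), (1.0, "input", [1.0, 1.0])]  # fitted exactly
gen = [("orbit_1", [0.5, 0.7])]                                    # generated at t* = 0.5
print(bullet_time_objective(toy_render, real, gen, t_star=0.5))    # about 0.05
```

Here the real frames are already fitted, so the whole loss comes from the one generated view that disagrees with the current model at t*.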
If this is right
- Better novel-view synthesis for dynamic scenes from casual monocular input.
- Improved accuracy on both 2D and 3D tracking tasks.
- More reliable handling of unseen regions and monocular depth ambiguities.
- Seamless integration of generative content into both static and moving parts of the scene.
Where Pith is reading between the lines
- The same alignment idea could be tested on other dynamic representations such as neural radiance fields or mesh-based models.
- Extending the method to multiple bullet-time steps might reduce drift over long video sequences.
- If generation speed improves, the approach could support online refinement during capture.
- Similar supervision could help correct other monocular reconstruction failures like those in structure-from-motion pipelines.
Load-bearing premise
The diffusion video model can generate frames that match the current 4D Gaussian reconstruction at the chosen bullet-time step without adding new inconsistencies or artifacts that damage the overall optimization.
What would settle it
A side-by-side comparison of final 4D models trained with and without the bullet-time generated supervision, measured by how well novel views match held-out real frames or how accurately 3D tracks follow ground-truth motion.
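The two measurements this comparison calls for can be sketched as follows. The sketch assumes PSNR for novel-view fidelity against held-out frames, with pixel values in [0, 1], and mean 3D end-point error (EPE) for track accuracy. The review does not say which metrics or datasets the paper reports, and the numbers in the usage lines are toy values, not results.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between a rendered view and a held-out frame."""
    assert len(pred) == len(target) and len(pred) > 0
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0.0 else 10.0 * math.log10(max_val ** 2 / mse)


def epe_3d(pred: Sequence[Point3], gt: Sequence[Point3]) -> float:
    """Mean Euclidean end-point error between predicted and ground-truth 3D track points."""
    assert len(pred) == len(gt) and len(pred) > 0
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Usage with toy numbers: score both variants on the same held-out frame and tracks.
held_out = [0.1, 0.5, 0.9]
with_gen, without_gen = [0.1, 0.5, 0.8], [0.3, 0.5, 0.6]
gt_tracks = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
tracks_with = [(0.0, 0.0, 1.1), (1.0, 0.0, 2.0)]
tracks_without = [(0.0, 0.3, 1.4), (1.0, 0.0, 2.5)]
print("PSNR gain (dB):", psnr(with_gen, held_out) - psnr(without_gen, held_out))
print("3D EPE reduction:", epe_3d(tracks_without, gt_tracks) - epe_3d(tracks_with, gt_tracks))
```

The generated supervision would only be shown to matter if, across real scenes, the PSNR gain is positive and the EPE reduction is positive relative to the otherwise identical baseline.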
Original abstract
Transforming casually captured, monocular videos into fully immersive dynamic experiences is a highly ill-posed task, and comes with significant challenges, e.g., reconstructing unseen regions, and dealing with the ambiguity in monocular depth estimation. In this work we introduce BulletGen, an approach that takes advantage of generative models to correct errors and complete missing information in a Gaussian-based dynamic scene representation. This is done by aligning the output of a diffusion-based video generation model with the 4D reconstruction at a single frozen "bullet-time" step. The generated frames are then used to supervise the optimization of the 4D Gaussian model. Our method seamlessly blends generative content with both static and dynamic scene components, achieving state-of-the-art results on both novel-view synthesis and 2D/3D tracking tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BulletGen, a method that improves 4D Gaussian-based dynamic scene reconstruction from monocular videos by conditioning a pre-trained diffusion video generation model on a single frozen bullet-time step extracted from the current reconstruction. The generated frames are used as additional supervision signals to optimize the 4D model, with the aim of correcting reconstruction errors and completing missing information in unseen regions while handling monocular depth ambiguity. The authors claim this yields state-of-the-art results on novel-view synthesis and 2D/3D tracking tasks.
Significance. If the generated frames maintain geometric and temporal consistency with the underlying 4D structure, the approach could provide an effective way to inject generative priors into reconstruction pipelines, addressing key ill-posed aspects of casual video capture. The single-step bullet-time conditioning offers a computationally lightweight integration point between diffusion models and Gaussian representations.
major comments (2)
- [§3.2] Bullet-time conditioning: The method description provides no explicit consistency loss, reprojection check, or warping verification between the diffusion-generated frames and the 4D Gaussian splats at the chosen bullet-time step. Because the initial reconstruction is incomplete due to monocular ambiguities, any misalignment would be directly incorporated as pseudo-ground-truth during optimization, undermining the central claim of seamless blending without new artifacts.
- [§5] Experiments: The reported state-of-the-art performance on novel-view synthesis and tracking lacks accompanying ablation studies isolating the contribution of the generative supervision versus a baseline 4D Gaussian optimization without bullet-time generation. This makes it difficult to assess whether the claimed improvements are load-bearing on the proposed alignment mechanism.
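The kind of check the first comment asks for could take roughly the following form. This is the referee's suggestion made concrete, not a step the manuscript describes. Because every generated frame depicts the same frozen instant, a static-scene reprojection is valid. Each pixel of the bullet-time render is back-projected with the reconstruction's depth, moved into a generated frame's camera, and marked consistent only if the colors agree within `tau`. Shared pinhole intrinsics, a known relative pose `(R, t)`, grayscale images as nested lists, nearest-pixel lookup, and the names `reproject` and `consistency_mask` are all hypothetical simplifications.

```python
from typing import List, Optional, Sequence, Tuple

Grid = Sequence[Sequence[float]]  # grayscale image, depth map, or 3x3 matrix, indexed [row][col]


def reproject(u: int, v: int, depth: float, fx: float, fy: float, cx: float, cy: float,
              R: Grid, t: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Map pixel (u, v) of the reference camera to the nearest pixel of a target camera."""
    # Back-project to a 3D point in the reference camera frame (pinhole model).
    p = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Rigid transform into the target camera frame: q = R p + t.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0.0:
        return None  # point lies behind the target camera
    return round(fx * q[0] / q[2] + cx), round(fy * q[1] / q[2] + cy)


def consistency_mask(ref_img: Grid, ref_depth: Grid, gen_img: Grid,
                     fx: float, fy: float, cx: float, cy: float,
                     R: Grid, t: Sequence[float], tau: float = 0.1) -> List[List[bool]]:
    """True where a generated frame agrees with the bullet-time render after reprojection.

    Pixels that land behind the target camera or outside the generated frame stay
    False (unverified). Occlusion is ignored, so this is only a coarse filter.
    """
    h, w = len(ref_img), len(ref_img[0])
    gh, gw = len(gen_img), len(gen_img[0])
    mask = [[False] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            hit = reproject(u, v, ref_depth[v][u], fx, fy, cx, cy, R, t)
            if hit is not None and 0 <= hit[0] < gw and 0 <= hit[1] < gh:
                mask[v][u] = abs(ref_img[v][u] - gen_img[hit[1]][hit[0]]) <= tau
    return mask


# Usage: a sideways camera translation shifts content by one pixel at depth 2.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ref = [[0.0, 1.0], [0.5, 0.2]]
depth = [[2.0, 2.0], [2.0, 2.0]]
gen = [[0.9, 0.0], [0.9, 0.5]]
print(consistency_mask(ref, depth, gen, 1.0, 1.0, 0.0, 0.0, I3, [2.0, 0.0, 0.0]))
# [[True, False], [True, False]]
```

Pixels left False could be excluded from the generated-frame loss, so that misaligned content is not treated as pseudo-ground-truth.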
minor comments (2)
- [Abstract] The abstract asserts quantitative superiority without referencing specific metrics, datasets, or baseline comparisons; moving a concise summary of key numbers to the abstract would improve accessibility.
- [§2] Notation for the 4D Gaussian parameters (e.g., distinguishing time-dependent deformation fields from static attributes) could be introduced earlier in §2 for clearer reading.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive assessment of the potential impact of BulletGen. We address each major comment point by point below, providing clarifications from the manuscript and indicating where we will revise the text for the next version.
Point-by-point responses
- Referee: [§3.2] Bullet-time conditioning: The method description provides no explicit consistency loss, reprojection check, or warping verification between the diffusion-generated frames and the 4D Gaussian splats at the chosen bullet-time step. Because the initial reconstruction is incomplete due to monocular ambiguities, any misalignment would be directly incorporated as pseudo-ground-truth during optimization, undermining the central claim of seamless blending without new artifacts.
Authors: We appreciate the referee raising this point about potential misalignment. In the BulletGen pipeline, a single frame is extracted directly from the current 4D Gaussian reconstruction at the chosen bullet-time step and supplied as the conditioning input to the pre-trained video diffusion model. This conditioning anchors the entire generated sequence to the geometry and appearance present in the reconstruction at that instant. The diffusion model, having been trained on large-scale video data, produces frames that respect the provided conditioning while synthesizing plausible content for novel viewpoints of that frozen instant. Although the manuscript does not introduce an additional explicit consistency or reprojection loss term (the generated frames serve as direct supervision), the iterative optimization loop, which updates the 4D model and then selects a new bullet-time step, allows progressive refinement. To address the concern explicitly, we will expand the description in §3.2 to clarify how the conditioning mechanism maintains alignment at the bullet-time step, and we will include qualitative examples in the supplement showing the match between the conditioning frame and the generated video outputs. Revision: yes.
- Referee: [§5] Experiments: The reported state-of-the-art performance on novel-view synthesis and tracking lacks accompanying ablation studies isolating the contribution of the generative supervision versus a baseline 4D Gaussian optimization without bullet-time generation. This makes it difficult to assess whether the claimed improvements are load-bearing on the proposed alignment mechanism.
Authors: We agree that an ablation isolating the generative supervision would strengthen the experimental section. The current results compare BulletGen against prior 4D reconstruction methods but do not include a direct head-to-head comparison with a 4D Gaussian baseline that omits the bullet-time diffusion component. In the revised manuscript we will add this ablation study, reporting novel-view synthesis and tracking metrics for both the full model and the baseline without generative supervision. This will allow readers to quantify the contribution of the proposed alignment and supervision mechanism. Revision: yes.
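The iterative loop described in the first response can be outlined as follows: render at a bullet-time step, generate views, supervise, update the model, then move to a new step. This outline is a hypothetical reading of the rebuttal rather than the paper's algorithm. The callables `render_bullet_time`, `generate_views`, and `optimize`, the evenly spaced schedule in `pick_bullet_times`, and the optional `accept` gate are all assumptions. The gate marks where a consistency check of the kind the referee requests could plug in; the authors state that no such check is currently used.

```python
from typing import Any, Callable, List, Optional, Sequence


def pick_bullet_times(num_frames: int, rounds: int) -> List[int]:
    """Evenly spaced frame indices; an assumed schedule, the paper's rule may differ."""
    if rounds <= 0:
        return []
    if rounds == 1:
        return [num_frames // 2]
    step = (num_frames - 1) / (rounds - 1)
    return [round(i * step) for i in range(rounds)]


def refine(model: Any, num_frames: int, rounds: int,
           render_bullet_time: Callable[[Any, int], Any],
           generate_views: Callable[[Any], Sequence[Any]],
           optimize: Callable[[Any, int, Sequence[Any]], Any],
           accept: Optional[Callable[[Any, Any], bool]] = None) -> Any:
    """Alternate between generating bullet-time views and updating the 4D model."""
    for t_star in pick_bullet_times(num_frames, rounds):
        cond = render_bullet_time(model, t_star)    # conditioning frame from the current model
        views = list(generate_views(cond))          # diffusion output anchored on that frame
        if accept is not None:                      # optional gate, e.g. a reprojection check
            views = [view for view in views if accept(cond, view)]
        model = optimize(model, t_star, views)      # generated views act as pseudo-ground-truth
    return model


# Usage with stubs: the "model" just logs (bullet time, number of views used).
log = refine([], num_frames=10, rounds=4,
             render_bullet_time=lambda m, t: t,
             generate_views=lambda cond: [cond, cond + 100],
             optimize=lambda m, t, views: m + [(t, len(views))],
             accept=lambda cond, view: view == cond)
print(log)  # [(0, 1), (3, 1), (6, 1), (9, 1)]
```

Because each round re-renders the conditioning frame from the updated model, errors accepted in one round can propagate to the next, which is why the gate sits before `optimize` rather than after it.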
Circularity Check
No significant circularity in BulletGen derivation chain
Full rationale
The paper describes a method that conditions an external pre-trained diffusion video model on a frozen bullet-time step from an existing 4D Gaussian reconstruction, then uses the generated frames as supervision to refine the Gaussian model. This chain depends on independent components (pre-trained diffusion models and prior Gaussian splatting representations) rather than any self-referential fitting, self-citation for uniqueness, or redefinition of inputs as outputs. No equations or steps reduce predictions to inputs by construction, and the approach remains falsifiable against external benchmarks such as novel-view synthesis metrics and tracking accuracy on held-out data.