Hestia: Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction
Pith reviewed 2026-05-22 00:06 UTC · model grok-4.3
The pith
Hestia achieves at least a 4% gain in coverage ratio and halves Chamfer distance for 3D reconstruction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Hestia systematically improves the planners through four components: a more diverse dataset to promote robustness, a hierarchical structure to manage the high-dimensional continuous action search space, a close-greedy strategy to mitigate spurious correlations, and a face-aware design to avoid overlooking geometry. This allows the system to predict five-degree-of-freedom viewpoints that yield efficient and robust 3D reconstruction in real time.
What carries the argument
Voxel-face-aware hierarchical next-best-view acquisition, which uses a structured search over viewpoints while accounting for the faces of voxels in the reconstruction volume to guide selection.
Load-bearing premise
The assumption that the four proposed components can be combined without introducing new failure modes that cancel the reported gains, and that the evaluation metrics on the chosen test objects and budgets generalize beyond the specific experimental setup.
What would settle it
Evaluating the planner on a broader range of shapes, including very irregular or symmetric objects, and checking whether the reported gains in coverage and error reduction still appear.
Figures
read the original abstract
Advances in 3D reconstruction and novel view synthesis have enabled efficient and photorealistic rendering. However, images for reconstruction are still either largely manual or constrained by simple preplanned trajectories. To address this issue, recent works propose generalizable next-best-view planners that do not require online learning. Nevertheless, robustness and performance remain limited across various shapes. Hence, this study introduces Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction (Hestia), which addresses the shortcomings of the reinforcement learning-based generalizable approaches for five-degree-of-freedom viewpoint prediction. Hestia systematically improves the planners through four components: a more diverse dataset to promote robustness, a hierarchical structure to manage the high-dimensional continuous action search space, a close-greedy strategy to mitigate spurious correlations, and a face-aware design to avoid overlooking geometry. Experimental results show that Hestia achieves non-marginal improvements, with at least a 4% gain in coverage ratio, while reducing Chamfer Distance by 50% and maintaining real-time inference. In addition, Hestia outperforms prior methods by at least 12% in coverage ratio with a 5-image budget and remains robust to object placement variations. Finally, we demonstrate that Hestia, as a next-best-view planner, is feasible for the real-world application. Our project page is https://johnnylu305.github.io/hestia web.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Hestia, a generalizable next-best-view (NBV) planner for efficient 3D reconstruction that extends reinforcement-learning approaches for 5-DoF viewpoint selection. It proposes four components—a more diverse training dataset, a hierarchical search structure over the continuous action space, a close-greedy selection strategy, and a voxel-face-aware geometric prior—to improve robustness and performance. Experiments report at least 4% higher coverage ratio, 50% lower Chamfer distance, real-time inference, at least 12% coverage gain under a 5-image budget, robustness to object placement changes, and real-world feasibility.
Significance. If the empirical gains are reproducible and generalize, the work would provide a practical advance in autonomous 3D scanning by delivering a non-learning-online NBV method that handles varied object geometries better than prior RL baselines while remaining computationally lightweight. The combination of hierarchical planning with geometric awareness and the reported real-time capability are particularly relevant for robotic deployment.
major comments (3)
- [Experimental results] Experimental results section: the reported improvements (≥4% coverage, 50% Chamfer reduction, ≥12% at 5-image budget) are presented without standard deviations, number of independent runs, or statistical significance tests, and without explicit confirmation that baselines and hyper-parameters were fixed prior to evaluating the test set; this directly affects whether the central performance claims can be considered load-bearing.
- [Results and real-world demonstration] Robustness and real-world evaluation: the abstract and results claim robustness to object placement variations and real-world feasibility, yet no quantitative details are given on the number or range of placement variations tested, the diversity of test objects relative to the training distribution, or metrics capturing sensor noise and calibration error; these omissions leave the generalization of the headline gains unverified.
- [Method and experiments] Ablation or component analysis: while four components are introduced, the manuscript does not present controlled ablations demonstrating that their joint use does not introduce new failure modes that offset the individual contributions; without such evidence the attribution of the observed gains to the proposed design remains incomplete.
minor comments (2)
- [Abstract] The abstract states 'five-degree-of-freedom viewpoint prediction' without clarifying whether roll is included or how the action space is discretized in the hierarchical search.
- [Figures] Figure captions and axis labels in the quantitative comparison plots should explicitly state the exact baselines and the number of objects or scenes averaged.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We address each major comment point-by-point below. We agree that the suggested additions will strengthen the manuscript and will incorporate revisions accordingly.
read point-by-point responses
-
Referee: [Experimental results] Experimental results section: the reported improvements (≥4% coverage, 50% Chamfer reduction, ≥12% at 5-image budget) are presented without standard deviations, number of independent runs, or statistical significance tests, and without explicit confirmation that baselines and hyper-parameters were fixed prior to evaluating the test set; this directly affects whether the central performance claims can be considered load-bearing.
Authors: We agree that reporting standard deviations, the number of runs, and statistical tests would improve the robustness of our claims. In the revised manuscript we will present all quantitative results as means over 5 independent runs with different random seeds, include standard deviations, and add paired t-tests for statistical significance between Hestia and each baseline. We will also add an explicit statement confirming that all baselines and hyperparameters were frozen before any test-set evaluation, consistent with the experimental protocol already described in Section 4. revision: yes
-
Referee: [Results and real-world demonstration] Robustness and real-world evaluation: the abstract and results claim robustness to object placement variations and real-world feasibility, yet no quantitative details are given on the number or range of placement variations tested, the diversity of test objects relative to the training distribution, or metrics capturing sensor noise and calibration error; these omissions leave the generalization of the headline gains unverified.
Authors: We acknowledge that additional quantitative details are needed to substantiate the robustness and real-world claims. In the revision we will expand the corresponding subsection to report: (i) results over 20 distinct random object placements spanning a translation range of ±15 cm and rotation range of ±20°, (ii) the composition of the 50 test objects (including that 35 % belong to shape categories absent from the training set), and (iii) reconstruction metrics obtained under added Gaussian sensor noise (σ = 1–3 mm) and calibration perturbations up to 2 mm. These numbers and metrics will be added to both the main results and the real-world demonstration. revision: yes
-
Referee: [Method and experiments] Ablation or component analysis: while four components are introduced, the manuscript does not present controlled ablations demonstrating that their joint use does not introduce new failure modes that offset the individual contributions; without such evidence the attribution of the observed gains to the proposed design remains incomplete.
Authors: We agree that controlled ablations are required to properly attribute performance gains. We will add a dedicated ablation subsection that evaluates each of the four components in isolation (diverse dataset, hierarchical search, close-greedy strategy, face-aware voxel prior) by training and testing variants with the component removed or disabled. All variants will be evaluated on the same metrics and test objects; we will also report any observed failure modes or performance regressions when components are combined, thereby clarifying that the joint design does not introduce offsetting drawbacks. revision: yes
Circularity Check
No circularity: empirical gains shown via held-out comparisons
full rationale
The paper's central claims consist of measured performance improvements (coverage ratio gains, Chamfer distance reductions) obtained by training and evaluating a next-best-view planner on a diverse dataset against prior methods. These results are produced by direct experimental comparison on held-out objects and budgets rather than any derivation that reduces to fitted parameters or self-referential definitions. No equations or first-principles steps are presented that equate outputs to inputs by construction. Self-citations to earlier RL baselines serve only as external reference points for comparison and do not carry the load of proving the new components' effectiveness. The architecture (hierarchical structure, face-aware design, etc.) is validated independently through ablation-style experiments, keeping the derivation chain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- hierarchical search depth and branching factors
- close-greedy threshold
axioms (2)
- domain assumption Voxel grid representation accurately captures unobserved geometry for face-aware scoring.
- domain assumption The training distribution of object shapes is sufficiently representative for robustness to placement variations.
Reference graph
Works this paper leans on
-
[1]
S. Agarwal, Y . Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski. Building rome in a day. Communications of the ACM, 54(10):105–112, 2011
work page 2011
-
[2]
H. Jiang, H. Liu, P. Tan, G. Zhang, and H. Bao. 3d reconstruction of dynamic scenes with multiple handheld cameras. In Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part II 12, pages 601–615. Springer, 2012
work page 2012
- [3]
-
[4]
M. Nießner, M. Zollh ¨ofer, S. Izadi, and M. Stamminger. Real-time 3d reconstruction at scale using voxel hashing. ACM Transactions on Graphics (ToG), 32(6):1–11, 2013
work page 2013
-
[5]
J. Xie, R. Girshick, and A. Farhadi. Deep3d: Fully automatic 2d-to-3d video conversion with deep convolutional neural networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14 , pages 842–857. Springer, 2016
work page 2016
-
[6]
J. L. Sch ¨onberger and J.-M. Frahm. Structure-from-motion revisited. In Conference on Com- puter Vision and Pattern Recognition (CVPR), 2016
work page 2016
-
[7]
Y . Yao, Z. Luo, S. Li, T. Fang, and L. Quan. Mvsnet: Depth inference for unstructured multi- view stereo. In Proceedings of the European conference on computer vision (ECCV) , pages 767–783, 2018
work page 2018
-
[8]
H. Xie, H. Yao, X. Sun, S. Zhou, and S. Zhang. Pix2vox: Context-aware 3d reconstruction from single and multi-view images. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2690–2698, 2019
work page 2019
-
[9]
Z. Murez, T. Van As, J. Bartolozzi, A. Sinha, V . Badrinarayanan, and A. Rabinovich. Atlas: End-to-end 3d scene reconstruction from posed images. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16 , pages 414–431. Springer, 2020
work page 2020
-
[10]
D. Wang, X. Cui, X. Chen, Z. Zou, T. Shi, S. Salcudean, Z. J. Wang, and R. Ward. Multi-view 3d reconstruction with transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 5722–5731, 2021
work page 2021
- [11]
-
[12]
H. Xu, J. Zhang, J. Cai, H. Rezatofighi, F. Yu, D. Tao, and A. Geiger. Unifying flow, stereo and depth estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023
work page 2023
-
[13]
S. Wang, V . Leroy, Y . Cabon, B. Chidlovskii, and J. Revaud. Dust3r: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697–20709, 2024. 11
work page 2024
-
[14]
B. P. Duisterhof, L. Zust, P. Weinzaepfel, V . Leroy, Y . Cabon, and J. Revaud. Mast3r-sfm: a fully-integrated solution for unconstrained structure-from-motion. CoRR, 2024
work page 2024
-
[15]
L. Pan, D. Barath, M. Pollefeys, and J. L. Sch ¨onberger. Global Structure-from-Motion Revis- ited. In European Conference on Computer Vision (ECCV), 2024
work page 2024
-
[16]
B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020
work page 2020
-
[17]
A. Yu, V . Ye, M. Tancik, and A. Kanazawa. pixelnerf: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4578–4587, 2021
work page 2021
-
[18]
S. Fridovich-Keil, A. Yu, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa. Plenoxels: Radi- ance fields without neural networks. In CVPR, 2022
work page 2022
-
[19]
A. Chen, Z. Xu, A. Geiger, J. Yu, and H. Su. Tensorf: Tensorial radiance fields. In European conference on computer vision, pages 333–350. Springer, 2022
work page 2022
-
[20]
T. M ¨uller, A. Evans, C. Schied, and A. Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. , 41(4):102:1–102:15, July 2022. doi: 10.1145/3528223.3530127. URL https://doi.org/10.1145/3528223.3530127
- [21]
- [22]
- [23]
- [24]
-
[25]
W. Ren, Z. Zhu, B. Sun, J. Chen, M. Pollefeys, and S. Peng. Nerf on-the-go: Exploiting uncertainty for distractor-free nerfs in the wild. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
work page 2024
-
[26]
D. Charatan, S. L. Li, A. Tagliasacchi, and V . Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19457–19467, 2024
work page 2024
- [27]
- [28]
- [29]
-
[30]
Y . Chen, H. Xu, C. Zheng, B. Zhuang, M. Pollefeys, A. Geiger, T.-J. Cham, and J. Cai. Mvs- plat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2025
work page 2025
-
[31]
M. Wallingford, A. Bhattad, A. Kusupati, V . Ramanujan, M. Deitke, A. Kembhavi, R. Mot- taghi, W.-C. Ma, and A. Farhadi. From an image to a scene: Learning to imagine the world from a million 360° videos. In The Thirty-eighth Annual Conference on Neural Information Processing Systems
-
[32]
T. Yu, Z. Zheng, K. Guo, P. Liu, Q. Dai, and Y . Liu. Function4d: Real-time human volumetric capture from very sparse consumer rgbd sensors. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR2021), June 2021
work page 2021
- [33]
-
[34]
T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely. Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[35]
A. Liu, R. Tucker, V . Jampani, A. Makadia, N. Snavely, and A. Kanazawa. Infinite nature: Per- petual view generation of natural scenes from a single image. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 14458–14467, 2021
work page 2021
-
[36]
L. Xu, V . Agrawal, W. Laney, T. Garcia, A. Bansal, C. Kim, S. Rota Bul `o, L. Porzi, P. Kontschieder, A. Boˇziˇc, et al. Vr-nerf: High-fidelity virtualized walkable spaces. In SIG- GRAPH Asia 2023 Conference Papers, pages 1–12, 2023
work page 2023
-
[37]
M. Broxton, J. Flynn, R. Overbeck, D. Erickson, P. Hedman, M. DuVall, J. Dourgarian, J. Busch, M. Whalen, and P. Debevec. Immersive light field video with a layered mesh repre- sentation. 39(4):86:1–86:15, 2020
work page 2020
-
[38]
K.-E. Lin, L. Xiao, F. Liu, G. Yang, and R. Ramamoorthi. Deep 3d mask volume for view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1749–1758, 2021
work page 2021
-
[39]
J. S. Yoon, K. Kim, O. Gallo, H. S. Park, and J. Kautz. Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5336–5345, 2020
work page 2020
-
[40]
T. Li, M. Slavcheva, M. Zollhoefer, S. Green, C. Lassner, C. Kim, T. Schmidt, S. Lovegrove, M. Goesele, R. Newcombe, et al. Neural 3d video synthesis from multi-view video. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 5521–5531, 2022
work page 2022
-
[41]
C.-Y . Lu, P. Zhou, A. Xing, C. Pokhariya, A. Dey, I. N. Shah, R. Mavidipalli, D. Hu, A. I. Comport, K. Chen, et al. Diva-360: The dynamic visual dataset for immersive neural fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22466–22476, 2024
work page 2024
-
[42]
S. Peng, Y . Zhang, Y . Xu, Q. Wang, Q. Shuai, H. Bao, and X. Zhou. Neural body: Implicit neu- ral representations with structured latent codes for novel view synthesis of dynamic humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9054–9063, 2021
work page 2021
-
[43]
X. Chen, Q. Li, T. Wang, T. Xue, and J. Pang. Gennbv: Generalizable next-best-view policy for active 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16436–16445, 2024. 13
work page 2024
-
[44]
R. Monica and J. Aleotti. Contour-based next-best view planning from point cloud segmenta- tion of unknown objects. Autonomous Robots, 42:443–458, 2018
work page 2018
-
[45]
H. Zha, K. Morooka, and T. Hasegawa. Next best viewpoint (nbv) planning for active object modeling based on a learning-by-showing approach. In Computer Vision—ACCV’98: Third Asian Conference on Computer Vision Hong Kong, China, January 8–10, 1998 Proceedings, Volume II 3, pages 185–192. Springer, 1997
work page 1998
-
[46]
L. Liu, X. Xia, H. Sun, Q. Shen, J. Xu, B. Chen, H. Huang, and K. Xu. Object-aware guidance for autonomous scene reconstruction. ACM Transactions on Graphics (TOG) , 37(4):1–12, 2018
work page 2018
-
[47]
G. Hardouin, F. Morbidi, J. Moras, J. Marzat, and E. M. Mouaddib. Surface-driven next-best- view planning for exploration of large-scale 3d environments. IFAC-PapersOnLine, 53(2): 15501–15507, 2020
work page 2020
-
[48]
G. Hardouin, J. Moras, F. Morbidi, J. Marzat, and E. M. Mouaddib. Next-best-view plan- ning for surface reconstruction of large-scale 3d environments with multiple uavs. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages 1567–
work page 2020
- [49]
-
[50]
X. Pan, Z. Lai, S. Song, and G. Huang. Activenerf: Learning where to see with uncertainty estimation. In European Conference on Computer Vision, pages 230–246. Springer, 2022
work page 2022
-
[51]
S. Lee, L. Chen, J. Wang, A. Liniger, S. Kumar, and F. Yu. Uncertainty guided policy for active robotic 3d reconstruction using neural radiance fields. IEEE Robotics and Automation Letters, 7(4):12070–12077, 2022
work page 2022
- [52]
-
[53]
L. Jin, X. Chen, J. R ¨uckin, and M. Popovi ´c. Neu-nbv: Next best view planning using uncer- tainty estimation in image-based neural rendering. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11305–11312. IEEE, 2023
work page 2023
-
[54]
N. S ¨underhauf, J. Abou-Chakra, and D. Miller. Density-aware nerf ensembles: Quantifying predictive uncertainty in neural radiance fields. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 9370–9376. IEEE, 2023
work page 2023
-
[55]
D. Peralta, J. Casimiro, A. M. Nilles, J. A. Aguilar, R. Atienza, and R. Cajote. Next-best view policy for 3d reconstruction. arXiv preprint arXiv:2008.12664, 2020
-
[56]
Y . Ran, J. Zeng, S. He, J. Chen, L. Li, Y . Chen, G. Lee, and Q. Ye. Neurar: Neural uncertainty for autonomous 3d reconstruction with implicit neural representations. IEEE Robotics and Automation Letters, 8(2):1125–1132, 2023
work page 2023
-
[57]
A. Gu ´edon, T. Monnier, P. Monasse, and V . Lepetit. Macarons: Mapping and coverage an- ticipation with rgb online self-supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 940–951, 2023
work page 2023
- [58]
-
[59]
A. Boneh and M. Hofri. The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1):39–66, 1997. 14
work page 1997
-
[60]
M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. VanderBilt, L. Schmidt, K. Ehsani, A. Kembhavi, and A. Farhadi. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13142–13153, June 2023
work page 2023
- [61]
-
[62]
A. C. INTERPRETATION. Spurious correlation: A causal interpretation herbert a. simon. Causal Models in the Social Sciences, page 5, 1971
work page 1971
-
[63]
Y . Kim, S. Mo, M. Kim, K. Lee, J. Lee, and J. Shin. Discovering and mitigating visual biases through keyword explanation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11082–11092, 2024
work page 2024
-
[64]
J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation. arXiv preprint arXiv:2309.16653, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[65]
Point-E: A System for Generating 3D Point Clouds from Complex Prompts
A. Nichol, H. Jun, P. Dhariwal, P. Mishkin, and M. Chen. Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[66]
J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove. Deepsdf: Learning con- tinuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 165–174, 2019
work page 2019
-
[67]
Z.-X. Zou, Z. Yu, Y .-C. Guo, Y . Li, D. Liang, Y .-P. Cao, and S.-H. Zhang. Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10324–10335, 2024
work page 2024
-
[68]
N. M ¨uller, A. Simonelli, L. Porzi, S. R. Bul`o, M. Nießner, and P. Kontschieder. Autorf: Learn- ing 3d object radiance fields from single view observations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3971–3980, 2022
work page 2022
-
[69]
Y .-L. Liu, C. Gao, A. Meuleman, H.-Y . Tseng, A. Saraf, C. Kim, Y .-Y . Chuang, J. Kopf, and J.-B. Huang. Robust dynamic radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13–23, 2023
work page 2023
-
[70]
Q. Wang, Z. Wang, K. Genova, P. Srinivasan, H. Zhou, J. T. Barron, R. Martin-Brualla, N. Snavely, and T. Funkhouser. Ibrnet: Learning multi-view image-based rendering. InCVPR, 2021
work page 2021
-
[71]
H. Lin, S. Peng, Z. Xu, Y . Yan, Q. Shuai, H. Bao, and X. Zhou. Efficient neural radiance fields for interactive free-viewpoint video. In SIGGRAPH Asia Conference Proceedings, 2022
work page 2022
-
[72]
A. Chen, Z. Xu, F. Zhao, X. Zhang, F. Xiang, J. Yu, and H. Su. Mvsnerf: Fast generaliz- able radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14124–14133, 2021
work page 2021
-
[73]
H. Gao, R. Li, S. Tulsiani, B. Russell, and A. Kanazawa. Monocular dynamic view synthesis: A reality check. Advances in Neural Information Processing Systems, 35:33768–33780, 2022
work page 2022
-
[74]
K. Park, U. Sinha, J. T. Barron, S. Bouaziz, D. B. Goldman, S. M. Seitz, and R. Martin-Brualla. Nerfies: Deformable neural radiance fields. ICCV, 2021
work page 2021
-
[75]
K. Park, U. Sinha, P. Hedman, J. T. Barron, S. Bouaziz, D. B. Goldman, R. Martin-Brualla, and S. M. Seitz. Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph., 40(6), dec 2021. 15
work page 2021
-
[76]
L. Ling, Y . Sheng, Z. Tu, W. Zhao, C. Xin, K. Wan, L. Yu, Q. Guo, Z. Yu, Y . Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024
work page 2024
-
[77]
C. Lu, F. Yin, X. Chen, W. Liu, T. Chen, G. Yu, and J. Fan. A large-scale outdoor multi- modal dataset and benchmark for novel view synthesis and implicit scene reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 7557– 7567, 2023
work page 2023
-
[78]
K. Grauman, A. Westbury, L. Torresani, K. Kitani, J. Malik, T. Afouras, K. Ashutosh, V . Baiyya, S. Bansal, B. Boote, et al. Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19383–19400, 2024
work page 2024
-
[79]
H. Chen, Y . Hou, C. Qu, I. Testini, X. Hong, and J. Jiao. 360+x: A panoptic multi-modal scene understanding dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
work page 2024
-
[80]
L. Li, Z. Shen, Z. Wang, L. Shen, and P. Tan. Streaming radiance fields for 3d video synthesis. Advances in Neural Information Processing Systems, 35:13485–13498, 2022
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.