MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo

Anpei Chen; Fanbo Xiang; Fuqiang Zhao; Hao Su; Jingyi Yu; Xiaoshuai Zhang; Zexiang Xu

arxiv: 2103.15595 · v2 · pith:PX6MJYREnew · submitted 2021-03-29 · 💻 cs.CV

Anpei Chen, Zexiang Xu, Fuqiang Zhao, Xiaoshuai Zhang, Fanbo Xiang, Jingyi Yu, Hao Su

classification 💻 cs.CV

keywords radiance, neural, field, reconstruction, approach, fast, fields, images

We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis. Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference. Our approach leverages plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning, and combines this with physically based volume rendering for neural radiance field reconstruction. We train our network on real objects in the DTU dataset, and test it on three different datasets to evaluate its effectiveness and generalizability. Our approach can generalize across scenes (even indoor scenes, completely different from our training scenes of objects) and generate realistic view synthesis results using only three input images, significantly outperforming concurrent works on generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.
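
The abstract's "physically based volume rendering" step can be illustrated with the standard alpha-compositing quadrature used in NeRF-style methods. The sketch below is not the authors' implementation: the function name `composite_ray` and its arguments are hypothetical, and it assumes the per-sample densities and colors along one ray have already been predicted (in MVSNeRF, by a network conditioned on the plane-swept cost volume, which is not shown here).

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along a single ray, front to back.

    densities: per-sample volume density sigma_i (non-negative floats)
    colors:    per-sample RGB triples
    deltas:    spacing between consecutive samples along the ray
    Returns (rgb, weights), where weights[i] is the contribution of sample i.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        # Opacity of this ray segment.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            rgb[channel] += weight * color[channel]
        # Light remaining after passing through this segment.
        transmittance *= 1.0 - alpha
    return rgb, weights


# Illustrative inputs only: an empty sample, an opaque green sample,
# then a blue sample hidden behind it.
rgb, weights = composite_ray(
    densities=[0.0, 1e9, 5.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[1.0, 1.0, 1.0],
)
```

Because every operation in this compositing is differentiable, a rendering loss on pixel colors can be backpropagated to whatever model predicts the densities and colors, which is what makes both feed-forward training and per-scene fine-tuning possible in this family of methods.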

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Cross-View Splatter: Feed-Forward View Synthesis with Georeferenced Images
cs.CV 2026-05 unverdicted novelty 6.0

A feed-forward model aligns ground and satellite features to predict Gaussian splats for improved novel-view synthesis on georeferenced outdoor scenes.

PAGE-4D: VGGT-4D Perception via Disentangled Pose and Geometry Estimation
cs.CV 2025-10 unverdicted novelty 6.0

PAGE-4D is a feedforward extension of VGGT that uses a dynamics-aware aggregator and mask to disentangle pose estimation from geometry reconstruction in videos with moving objects.