hub Canonical reference
NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction
Canonical reference. 100% of citing Pith papers cite this work as background.
abstract
We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground masks as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures. Meanwhile, recent neural methods for novel view synthesis, such as NeRF and its variants, use volume rendering to produce a neural scene representation with robustness of optimization, even for highly complex objects. However, extracting high-quality surfaces from this learned implicit representation is difficult because there are not sufficient surface constraints in the representation. In NeuS, we propose to represent a surface as the zero-level set of a signed distance function (SDF) and develop a new volume rendering method to train a neural SDF representation. We observe that the conventional volume rendering method causes inherent geometric errors (i.e., bias) for surface reconstruction, and therefore propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without mask supervision. Experiments on the DTU dataset and the BlendedMVS dataset show that NeuS outperforms the state of the art in high-quality surface reconstruction, especially for objects and scenes with complex structures and self-occlusion.
roles
background 5
polarities
background 5
citing papers explorer
- GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
  GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
- PAGaS: Pixel-Aligned 1DoF Gaussian Splatting for Depth Refinement
  PAGaS refines multi-view stereo depths by optimizing 1DoF Gaussians whose positions and sizes are fixed by back-projected pixel volumes, producing detailed depth maps that outperform reference baselines on 3D reconstruction benchmarks.
- SpUDD: Superpower Contouring of Unsigned Distance Data
  SpUDD defines superpower contours from power diagrams of unsigned distance samples, proves convergence to the true surface, and uses them to generate approximating polygonal meshes that outperform prior strategies.
- THOM: Generating Physically Plausible Hand-Object Meshes From Text
  THOM is a training-free two-stage framework that generates physically plausible hand-object 3D meshes directly from text by combining text-guided Gaussians with contact-aware physics optimization and VLM refinement.
- SVGS: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors
  SVGS improves Gaussian Splatting novel-view synthesis by replacing single-color primitives with spatially varying color and opacity functions implemented via bilinear interpolation, movable kernels, or tiny neural networks on 2D Gaussian surfels.
- ArtMesh: Part-Aware Articulated Mesh Fields with Motion-Consistent Dynamics
  ArtMesh presents a mesh-native pipeline for articulated reconstruction that uses restricted Delaunay remeshing and bidirectional motion consistency to outperform 3D Gaussian Splatting methods on joint estimation and part geometry.
- TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation
  TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.
- Attention Itself Could Retrieve. RetrieveVGGT: Training-Free Long Context Streaming 3D Reconstruction via Query-Key Similarity Retrieval
  RetrieveVGGT enables constant-memory long-context streaming 3D reconstruction by retrieving relevant frames via query-key similarities in VGGT's first attention layer, outperforming StreamVGGT and others.
- LagrangianSplats: Divergence-Free Transport of Gaussian Primitives for Fluid Reconstruction
  A framework that structurally enforces divergence-free velocity and long-range transport coherence in 3D fluid reconstruction from 2D videos via divergence-free kernels advecting Lagrangian Gaussian splats.
- Sat3R: Satellite DSM Reconstruction via RPC-Aware Depth Fine-tuning
  Sat3R adapts Depth Anything V2 via RPC-aware metric depth fine-tuning to deliver satellite DSM reconstruction with 38% lower MAE than zero-shot baselines and over 300x speedup versus optimization methods.
- Scalable GPU Construction of 3D Voronoi and Power Diagrams
  A new GPU clipping algorithm with directional culling and hierarchical traversal constructs scalable 3D Voronoi and power diagrams for arbitrary point distributions.
- High-Fidelity Single-Image Head Modeling with Industry-Grade Topology
  A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topology and preserved facial identity.
- Greed for the Spheres: A Signed Distance Interpolation Method
  A greedy algorithm interpolates consistent signed distance functions from discrete samples by treating SDF geometric properties as hard constraints.
- SurfelSplat: Learning Efficient and Generalizable Gaussian Surfel Representations for Sparse-View Surface Reconstruction
  A feed-forward model regresses accurate Gaussian surfel geometry from sparse views using Nyquist-guided cross-view feature aggregation, achieving 100x speedup over optimization-based 3DGS surface methods on DTU benchmarks.
- Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction
  Neural Harmonic Textures add periodic feature interpolation and deferred neural decoding to primitive representations, achieving state-of-the-art real-time novel-view synthesis and bridging primitive and neural-field methods.
- PAOLI: Pose-free Articulated Object Learning from Sparse-view Images
  A pipeline that reconstructs articulated objects from sparse unposed images by aligning independent per-pose reconstructions via learned deformation fields and progressive static/moving part disentanglement.
- Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry
  EpiS improves generalizable neural surface reconstruction from sparse views by guiding epipolar feature aggregation with cost volumes, using an epipolar transformer, and applying pretrained monocular depth constraints, outperforming prior methods on DTU and BlendedMVS.
- PREF: Phasorial Embedding Fields for Compact Neural Representations
  PREF introduces a phasor volume and tailored Fourier mapping to let shallow MLPs capture high-frequency signals compactly in 2D images, 3D SDFs, and 5D NeRFs.
- IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation
  IVGT implicitly models continuous neural scene representations from pose-free multi-view images to enable coherent surface extraction, novel view synthesis, and related 3D tasks via SDF and color prediction.
- First Shape, Then Meaning: Efficient Geometry and Semantics Learning for Indoor Reconstruction
  FSTM improves indoor reconstruction by training geometry first without semantic supervision, then adding semantics, achieving 2.3x faster training and higher object surface recall than joint optimization.
- A Geometric Algebra-informed NeRF Framework for Generalizable Wireless Channel Prediction
  GAI-NeRF combines geometric algebra attention and an adaptive ray tracing module inside a NeRF model to deliver more accurate and generalizable wireless channel predictions across varied indoor environments.
- Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation
  Hitem3D 2.0 combines multi-view image synthesis with native 3D texture projection to improve completeness, cross-view consistency, and geometry alignment over prior methods.
- MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes
  MetroGS combines distributed 2D Gaussian Splatting with structured dense enhancement, progressive hybrid optimization, and depth-guided appearance modeling to deliver higher geometric accuracy and stability in large-scale urban reconstruction.
- Geometry-Aware Scene Configurations for Novel View Synthesis
  Geometry-guided adaptive placement of bases and virtual viewpoints improves rendering quality and memory use over uniform arrangements in scalable NeRF for large indoor scenes.
- DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
  LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
  By fine-tuning DUSt3R to output per-timestep pointmaps on scarce dynamic video datasets, MonST3R achieves stronger video depth and pose estimation without explicit motion modeling.
- A Geometric Algorithm for Blood Vessel Reconstruction from Skeletal Representation
  A geometric algorithm reconstructs tubular shapes from skeletal points as a TSDF in voxel hashing using direct geometric distance computation without segmentation or matrix solving.
- Attention Is not Everything: Efficient Alternatives for Vision
  A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.
- TACO: Temporal Consensus Optimization for Continual Neural Mapping