Communications of the ACM 65(1), 99–106 (2021)
Canonical reference. 71% of citing Pith papers cite this work as background.
19 representative citing papers
-
GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
GlobalSplat achieves competitive novel-view synthesis on RealEstate10K and ACID using only 16K Gaussians via global scene tokens and coarse-to-fine training, with a 4 MB footprint and an inference time under 78 ms.
-
Novel View Synthesis as Video Completion
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
-
HairOrbit: Multi-view Aware 3D Hair Modeling from Single Portraits
HairOrbit leverages video generation priors and a neural orientation extractor to achieve state-of-the-art strand-level 3D hair reconstruction from single-view portraits in visible and invisible regions.
-
EndoVGGT: GNN-Enhanced Depth Estimation for Surgical 3D Reconstruction
EndoVGGT uses a dynamic DeGAT graph attention module to improve depth estimation and non-rigid 3D reconstruction in surgery, reporting 24.6% PSNR and 9.1% SSIM gains on SCARED with zero-shot generalization to new domains.
-
Echo4DIR: 4D Implicit Heart Reconstruction from 2D Echocardiography Videos
Echo4DIR reconstructs continuous 4D cardiac geometry from sparse 2D echocardiography videos using implicit representations, epipolar feature fusion, self-supervised domain adaptation, and radial SDF alignment to achieve up to 98.35% Dice overlap.
-
EndoGSim: Physics-Aware 4D Dynamic Endoscopic Scene Simulations via MLLM-Guided Gaussian Splatting
EndoGSim integrates MLLM-guided material initialization with 4D Gaussian Splatting and a differentiable Material Point Method to achieve physics-aware 4D reconstruction and simulation of endoscopic scenes.
-
GSMap: 2D Gaussians for Online HD Mapping
GSMap represents HD map elements as sequences of 2D Gaussians to unify geometric precision and topological regularity for online autonomous driving maps.
-
AsyncEvGS: Asynchronous Event-Assisted Gaussian Splatting for Handheld Motion-Blurred Scenes
AsyncEvGS reconstructs high-fidelity 3D scenes from motion-blurred images by first deblurring them with event data, then applying VGGT-based pose estimation and structure-driven losses within Gaussian Splatting.
-
FluSplat: Sparse-View 3D Editing without Test-Time Optimization
FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
-
ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation
ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and an adaptive execution policy, reaching a 65.09% success rate on ALFRED test-seen long-horizon mobile manipulation tasks.
-
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
-
Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models
Rein3D generates photorealistic, globally consistent 3D indoor scenes by using a restore-and-refine process where radial panoramic videos are restored via diffusion models and then used to update a 3D Gaussian field.
-
Real-Time Human Reconstruction and Animation using Feed-Forward Gaussian Splatting
A feed-forward network predicts per-SMPL-X-vertex 3D Gaussians in canonical space from multi-view RGB images, enabling single-pass reconstruction and real-time animation via linear blend skinning.
-
TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches
TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.
-
HOIGS: Human-Object Interaction Gaussian Splatting
HOIGS adds a cross-attention HOI module to Gaussian Splatting that combines HexPlane human features with Cubic Hermite Spline object features to model interaction-induced deformations.
-
Fake or Real, Can Robots Tell? Evaluating VLM Robustness to Domain Shift in Single-View Robotic Scene Understanding
VLMs caption real objects effectively but degrade on 3D-printed fakes in robotic scenes, and some standard metrics fail to detect the factual errors caused by this domain shift.
-
Pose-Aware Diffusion for 3D Generation
PAD synthesizes 3D geometry in observation space, using depth unprojection as an anchor to eliminate pose ambiguity in image-to-3D generation.
-
GeoRect4D: Geometry-Compatible Generative Rectification for Dynamic Sparse-View 3D Reconstruction
GeoRect4D couples 3D Gaussian splatting with a single-step diffusion rectifier via degradation-aware feedback and progressive optimization to improve fidelity and consistency in sparse-view dynamic 3D reconstruction.
-
MeshOn: Intersection-Free Mesh-to-Mesh Composition
MeshOn composes two input meshes realistically without intersections by using VLM-based rigid initialization, attractive geometric losses, a barrier loss, and a diffusion prior for final deformation.