3DReflecNet is a 22 TB+ dataset of over 120,000 synthetic and 1,000 real objects with millions of multi-view frames for benchmarking 3D reconstruction on reflective, transparent, and low-texture surfaces.
hub Canonical reference
3d gaussian splatting for real-time radiance field rendering.ACM Trans
Canonical reference. 71% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
fields
cs.CV 58representative citing papers
ULF-Loc removes bias from 3DGS landmark features via geometry-weighted fusion and consistency checks, cutting median translation error 17% while using 1/10 training time and 1/6 GPU memory of prior state-of-the-art.
PAGaS refines multi-view stereo depths by optimizing 1DoF Gaussians whose positions and sizes are fixed by back-projected pixel volumes, producing detailed depth maps that outperform reference baselines on 3D reconstruction benchmarks.
TokenGS uses learnable Gaussian tokens in an encoder-decoder architecture to regress 3D means directly, achieving SOTA feed-forward reconstruction on static and dynamic scenes with better robustness.
ClipGStream enables scalable flicker-free reconstruction of long dynamic multi-view videos by performing stream optimization at the clip level with clip-independent spatio-temporal fields, residual anchor compensation, and inter-clip inherited anchors.
A novel explicit neural height field method for descent-phase wide-angle imagery achieves greater spatial coverage than multi-view stereo while preserving estimation accuracy on simulated planetary terrains.
DreamStereo uses GAPW, PBDP, and SASI to enable real-time stereo video inpainting at 25 FPS for HD videos by reducing over 70% redundant computation while maintaining quality.
AnchorSplat uses anchor-aligned 3D Gaussians guided by geometric priors for feed-forward scene reconstruction, achieving SOTA novel view synthesis on ScanNet++ with fewer primitives and better view consistency.
AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.
Test-time constrained optimization incorporates priors into pre-trained multiview transformers via self-supervised losses and penalty terms to improve 3D reconstruction accuracy.
THOM is a training-free two-stage framework that generates physically plausible hand-object 3D meshes directly from text by combining text-guided Gaussians with contact-aware physics optimization and VLM refinement.
ProDiG progressively transforms aerial Gaussian splats into coherent ground-level 3D reconstructions via diffusion guidance and specialized attention modules.
MoGaF groups Gaussians by motion in 4D splatting representations to enable stable long-term forecasting of dynamic scenes.
PerpetualWonder introduces a closed-loop generative simulator with a unified physical-visual representation for long-horizon action-conditioned 4D scene generation from one image.
AGILE generates complete object meshes via VLM-guided synthesis and tracks poses with anchor-and-track plus contact-aware optimization to achieve robust hand-object reconstruction from video.
ART is a category-agnostic transformer that maps sparse multi-state RGB images to per-part 3D geometry, texture, and articulation parameters via learnable part slots.
RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.
A Z-order transformer organizes unstructured Gaussians for sparse attention, enabling feed-forward prediction of high-quality 3D splats with fewer primitives.
HumanSplatHMR jointly refines 3D human poses and learns Gaussian Splatting avatars by backpropagating photometric, segmentation, and depth losses through a differentiable renderer to improve novel-view and novel-pose synthesis from in-the-wild video.
Pruned local linear blendshapes on Gaussians capture pose-dependent appearance changes to deliver high-quality mobile avatars at 120 FPS from multi-view video without pretrained models.
GLMap combines explicit 3D Gaussians with multi-scale language semantics in a dual-modality structure and uses an analytical Gaussian Estimator for incremental map building, improving zero-shot performance on navigation and reasoning tasks.
GenWildSplat is a feed-forward model that reconstructs 3D Gaussians from sparse unposed unconstrained images by predicting depth and poses with learned priors, an appearance adapter, and semantic segmentation for transients.
Color-encoded illumination combined with dynamic Gaussian Splatting enables first-of-a-kind high-speed volumetric reconstruction from unaugmented low-speed multi-view cameras.
Unprojecting latent embeddings via depth maps and recalibrating with cross-view attention improves 3D Gaussian localization for generalizable sparse-view human rendering.
citing papers explorer
-
3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects
3DReflecNet is a 22 TB+ dataset of over 120,000 synthetic and 1,000 real objects with millions of multi-view frames for benchmarking 3D reconstruction on reflective, transparent, and low-texture surfaces.
-
ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting
ULF-Loc removes bias from 3DGS landmark features via geometry-weighted fusion and consistency checks, cutting median translation error 17% while using 1/10 training time and 1/6 GPU memory of prior state-of-the-art.
-
PAGaS: Pixel-Aligned 1DoF Gaussian Splatting for Depth Refinement
PAGaS refines multi-view stereo depths by optimizing 1DoF Gaussians whose positions and sizes are fixed by back-projected pixel volumes, producing detailed depth maps that outperform reference baselines on 3D reconstruction benchmarks.
-
TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens
TokenGS uses learnable Gaussian tokens in an encoder-decoder architecture to regress 3D means directly, achieving SOTA feed-forward reconstruction on static and dynamic scenes with better robustness.
-
ClipGStream: Clip-Stream Gaussian Splatting for Any Length and Any Motion Multi-View Dynamic Scene Reconstruction
ClipGStream enables scalable flicker-free reconstruction of long dynamic multi-view videos by performing stream optimization at the clip level with clip-independent spatio-temporal fields, residual anchor compensation, and inter-clip inherited anchors.
-
Neural 3D Reconstruction of Planetary Surfaces from Descent-Phase Wide-Angle Imagery
A novel explicit neural height field method for descent-phase wide-angle imagery achieves greater spatial coverage than multi-view stereo while preserving estimation accuracy on simulated planetary terrains.
-
DreamStereo: Towards Real-Time Stereo Inpainting for HD Videos
DreamStereo uses GAPW, PBDP, and SASI to enable real-time stereo video inpainting at 25 FPS for HD videos by reducing over 70% redundant computation while maintaining quality.
-
AnchorSplat: Feed-Forward 3D Gaussian Splatting with 3D Geometric Priors
AnchorSplat uses anchor-aligned 3D Gaussians guided by geometric priors for feed-forward scene reconstruction, achieving SOTA novel view synthesis on ScanNet++ with fewer primitives and better view consistency.
-
AvatarPointillist: AutoRegressive 4D Gaussian Avatarization
AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.
-
Learning 3D Reconstruction with Priors in Test Time
Test-time constrained optimization incorporates priors into pre-trained multiview transformers via self-supervised losses and penalty terms to improve 3D reconstruction accuracy.
-
THOM: Generating Physically Plausible Hand-Object Meshes From Text
THOM is a training-free two-stage framework that generates physically plausible hand-object 3D meshes directly from text by combining text-guided Gaussians with contact-aware physics optimization and VLM refinement.
-
ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction
ProDiG progressively transforms aerial Gaussian splats into coherent ground-level 3D reconstructions via diffusion guidance and specialized attention modules.
-
Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping
MoGaF groups Gaussians by motion in 4D splatting representations to enable stable long-term forecasting of dynamic scenes.
-
PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation
PerpetualWonder introduces a closed-loop generative simulator with a unified physical-visual representation for long-horizon action-conditioned 4D scene generation from one image.
-
AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation
AGILE generates complete object meshes via VLM-guided synthesis and tracks poses with anchor-and-track plus contact-aware optimization to achieve robust hand-object reconstruction from video.
-
ART: Articulated Reconstruction Transformer
ART is a category-agnostic transformer that maps sparse multi-state RGB images to per-part 3D geometry, texture, and articulation parameters via learnable part slots.
-
RDSplat: Robust Watermarking for 3D Gaussian Splatting Against 2D and 3D Diffusion Editing
RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.
-
Z-Order Transformer for Feed-Forward Gaussian Splatting
A Z-order transformer organizes unstructured Gaussians for sparse attention, enabling feed-forward prediction of high-quality 3D splats with fewer primitives.
-
HumanSplatHMR: Closing the Loop Between Human Mesh Recovery and Gaussian Splatting Avatar
HumanSplatHMR jointly refines 3D human poses and learns Gaussian Splatting avatars by backpropagating photometric, segmentation, and depth losses through a differentiable renderer to improve novel-view and novel-pose synthesis from in-the-wild video.
-
High-Fidelity Mobile Avatars with Pruned Local Blendshapes
Pruned local linear blendshapes on Gaussians capture pose-dependent appearance changes to deliver high-quality mobile avatars at 120 FPS from multi-view video without pretrained models.
-
Multi-Scale Gaussian-Language Map for Zero-shot Embodied Navigation and Reasoning
GLMap combines explicit 3D Gaussians with multi-scale language semantics in a dual-modality structure and uses an analytical Gaussian Estimator for incremental map building, improving zero-shot performance on navigation and reasoning tasks.
-
Generalizable Sparse-View 3D Reconstruction from Unconstrained Images
GenWildSplat is a feed-forward model that reconstructs 3D Gaussians from sparse unposed unconstrained images by predicting depth and poses with learned priors, an appearance adapter, and semantic segmentation for transients.
-
Color-Encoded Illumination for High-Speed Volumetric Scene Reconstruction
Color-encoded illumination combined with dynamic Gaussian Splatting enables first-of-a-kind high-speed volumetric reconstruction from unaugmented low-speed multi-view cameras.
-
Generalizable Human Gaussian Splatting via Multi-view Semantic Consistency
Unprojecting latent embeddings via depth maps and recalibrating with cross-view attention improves 3D Gaussian localization for generalizable sparse-view human rendering.
-
DualSplat: Robust 3D Gaussian Splatting via Pseudo-Mask Bootstrapping from Reconstruction Failures
DualSplat bootstraps object-level pseudo-masks from initial 3DGS reconstruction failures using residuals and SAM2 to enable robust second-pass optimization in transient-heavy scenes.
-
Gaussians on a Diet: High-Quality Memory-Bounded 3D Gaussian Splatting Training
A dynamic training framework for 3D Gaussian Splatting alternates incremental pruning and adaptive growing of primitives to maintain high rendering quality at up to 80% lower peak memory than standard 3DGS.
-
GS4City: Hierarchical Semantic Gaussian Splatting via City-Model Priors
GS4City derives geometry-grounded semantic masks from LoD3 CityGML models via raycasting and fuses them with 2D foundation model outputs to supervise identity encodings on Gaussians, improving coarse and fine semantic segmentation on urban datasets.
-
FreeScale: Scaling 3D Scenes via Certainty-Aware Free-View Generation
FreeScale generates scalable high-quality training data for generalizable novel view synthesis by certainty-aware sampling from imperfect scene reconstructions, delivering 2.7 dB PSNR gains on out-of-distribution tests.
-
Scene-Agnostic Object-Centric Representation Learning for 3D Gaussian Splatting
A scene-agnostic object codebook learned via unsupervised object-centric learning provides consistent identity-anchored representations for 3D Gaussians across multiple scenes.
-
In Depth We Trust: Reliable Monocular Depth Supervision for Gaussian Splatting
A selective regularization framework lets scale-ambiguous monocular depth priors improve Gaussian Splatting geometry and rendering by isolating and supervising only ill-posed regions.
-
3D Gaussian Splatting for Annular Dark Field Scanning Transmission Electron Microscopy Tomography Reconstruction
DenZa-Gaussian adapts 3D Gaussian Splatting for ADF-STEM tomography by modeling scattering as a learnable scalar field, adding tilt-angle normalization, and using a Fourier amplitude loss to improve sparse-view 3D reconstructions.
-
HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance
HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.
-
DINO-VO: Learning Where to Focus for Enhanced State Estimation
DINO-VO achieves state-of-the-art monocular visual odometry accuracy and generalization by training a differentiable patch selector together with multi-task features and inverse-depth bundle adjustment.
-
Rascene: High-Fidelity 3D Scene Imaging with mmWave Communication Signals
Rascene reconstructs high-precision 3D scenes from standard mmWave OFDM communication signals via multi-frame spatially adaptive fusion.
-
HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis
HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.
-
Scene Grounding In the Wild
A semantic feature optimization grounds disconnected partial 3D reconstructions to geospatially accurate reference models derived from Google Earth, improving global alignment across classical and learning-based pipelines.
-
Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes
A 3D Language-Embedded Gaussians framework with opacity-aware Poisson volumetric aggregation and progressive temperature decay achieves 59.50 IoU and 21.05 mIoU on Occ-ScanNet for open-vocabulary indoor occupancy.
-
Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians
Pixel-to-4D builds a dynamic 3D Gaussian representation from one image and samples object motion in a single forward pass to produce camera-controlled videos with claimed state-of-the-art quality and speed on KITTI, Waymo, RealEstate10K and DL3DV-10K.
-
GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation
GaussianDWM uses 3D Gaussians with embedded linguistic features, language-guided sampling, and dual-condition generation for unified scene understanding and multi-modal output in driving world models.
-
Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
Chorus pretrains a shared 3D Gaussian scene encoder via multi-teacher distillation to capture holistic features from high-level semantics to fine-grained structure, with strong transfer on segmentation and point-cloud tasks using far fewer scenes.
-
FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision
FlexAvatar introduces bias sinks in a transformer to unify monocular and multi-view training, yielding complete 3D head avatars with strong generalization and view extrapolation from single images.
-
Native and Compact Structured Latents for 3D Generation
Introduces O-Voxel omni-voxel representation and Sparse Compression VAE for structured native 3D latents, enabling efficient training of large flow-matching models that produce higher-quality geometry and materials than prior methods.
-
From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images
A technique reconstructs large urban areas from sparse extreme off-nadir satellite images by modeling geometry as a Z-monotonic 2.5D height map SDF and applying a generative network to restore plausible textures on the resulting mesh.
-
C3G: Learning Compact 3D Representations with 2K Gaussians
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
-
ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding
ShelfGaussian achieves state-of-the-art zero-shot semantic occupancy prediction on Occ3D-nuScenes by jointly supervising Gaussian representations with vision foundation model features at 2D image and 3D scene levels.
-
FACT-GS: Frequency-Aligned Complexity-Aware Texture Reparameterization for 2D Gaussian Splatting
FACT-GS allocates higher texture sampling density to high-frequency areas in 2D Gaussian Splatting through a learnable deformation field, recovering sharper details at the same parameter budget.
-
GRLoc: Geometric Representation Regression for Visual Localization
The paper reformulates absolute pose regression as regressing disentangled world-coordinate raymaps and pointmaps from images, then recovering pose via a differentiable solver, claiming SOTA results on 7-Scenes and Cambridge Landmarks.
-
MedGS: Gaussian Splatting for Multi-Modal 3D Medical Imaging
MedGS extends Gaussian Splatting with a relightable model tailored to endoscopic imaging where light and camera are co-located, achieving better novel-view synthesis and tissue editing than baselines.
-
GaussianZoom: Progressive Zoom-in Generative 3D Gaussian Splatting with Geometric and Semantic Guidance
GaussianZoom enables high-fidelity extreme zoom-in 3D rendering from low-res inputs via an iterative framework combining geometry-consistent modeling, depth-based super-resolution, VLM detail synthesis, and an expandable continuous Level-of-Detail hierarchy.
-
Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images
Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.