3DReflecNet is a 22 TB+ dataset of over 120,000 synthetic and 1,000 real objects with millions of multi-view frames for benchmarking 3D reconstruction on reflective, transparent, and low-texture surfaces.
3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans
Canonical reference. 71% of citing Pith papers cite this work as background.
fields: cs.CV 58
representative citing papers
ULF-Loc removes bias from 3DGS landmark features via geometry-weighted fusion and consistency checks, cutting median translation error by 17% while using 1/10 the training time and 1/6 the GPU memory of the prior state of the art.
PAGaS refines multi-view stereo depths by optimizing 1DoF Gaussians whose positions and sizes are fixed by back-projected pixel volumes, producing detailed depth maps that outperform reference baselines on 3D reconstruction benchmarks.
TokenGS uses learnable Gaussian tokens in an encoder-decoder architecture to regress 3D means directly, achieving SOTA feed-forward reconstruction on static and dynamic scenes with better robustness.
ClipGStream enables scalable flicker-free reconstruction of long dynamic multi-view videos by performing stream optimization at the clip level with clip-independent spatio-temporal fields, residual anchor compensation, and inter-clip inherited anchors.
A novel explicit neural height field method for descent-phase wide-angle imagery achieves greater spatial coverage than multi-view stereo while preserving estimation accuracy on simulated planetary terrains.
DreamStereo uses GAPW, PBDP, and SASI to enable real-time stereo video inpainting at 25 FPS for HD videos, cutting over 70% of redundant computation while maintaining quality.
AnchorSplat uses anchor-aligned 3D Gaussians guided by geometric priors for feed-forward scene reconstruction, achieving SOTA novel view synthesis on ScanNet++ with fewer primitives and better view consistency.
AvatarPointillist autoregressively generates adaptive 3D point clouds with a Transformer to build photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.
Test-time constrained optimization incorporates priors into pre-trained multiview transformers via self-supervised losses and penalty terms to improve 3D reconstruction accuracy.
THOM is a training-free two-stage framework that generates physically plausible hand-object 3D meshes directly from text by combining text-guided Gaussians with contact-aware physics optimization and VLM refinement.
ProDiG progressively transforms aerial Gaussian splats into coherent ground-level 3D reconstructions via diffusion guidance and specialized attention modules.
MoGaF groups Gaussians by motion in 4D splatting representations to enable stable long-term forecasting of dynamic scenes.
PerpetualWonder introduces a closed-loop generative simulator with a unified physical-visual representation for long-horizon action-conditioned 4D scene generation from one image.
AGILE generates complete object meshes via VLM-guided synthesis and tracks poses with anchor-and-track plus contact-aware optimization to achieve robust hand-object reconstruction from video.
ART is a category-agnostic transformer that maps sparse multi-state RGB images to per-part 3D geometry, texture, and articulation parameters via learnable part slots.
RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.
A Z-order transformer organizes unstructured Gaussians for sparse attention, enabling feed-forward prediction of high-quality 3D splats with fewer primitives.
HumanSplatHMR jointly refines 3D human poses and learns Gaussian Splatting avatars by backpropagating photometric, segmentation, and depth losses through a differentiable renderer to improve novel-view and novel-pose synthesis from in-the-wild video.
Pruned local linear blendshapes on Gaussians capture pose-dependent appearance changes to deliver high-quality mobile avatars at 120 FPS from multi-view video without pretrained models.
GLMap combines explicit 3D Gaussians with multi-scale language semantics in a dual-modality structure and uses an analytical Gaussian Estimator for incremental map building, improving zero-shot performance on navigation and reasoning tasks.
GenWildSplat is a feed-forward model that reconstructs 3D Gaussians from sparse unposed unconstrained images by predicting depth and poses with learned priors, an appearance adapter, and semantic segmentation for transients.
Color-encoded illumination combined with dynamic Gaussian Splatting enables first-of-a-kind high-speed volumetric reconstruction from unaugmented low-speed multi-view cameras.
Unprojecting latent embeddings via depth maps and recalibrating with cross-view attention improves 3D Gaussian localization for generalizable sparse-view human rendering.
citing papers explorer
- ART: Articulated Reconstruction Transformer
ART is a category-agnostic transformer that maps sparse multi-state RGB images to per-part 3D geometry, texture, and articulation parameters via learnable part slots.
- RDSplat: Robust Watermarking for 3D Gaussian Splatting Against 2D and 3D Diffusion Editing
RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.
- GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation
GaussianDWM uses 3D Gaussians with embedded linguistic features, language-guided sampling, and dual-condition generation for unified scene understanding and multi-modal output in driving world models.
- Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
Chorus pretrains a shared 3D Gaussian scene encoder via multi-teacher distillation to capture holistic features from high-level semantics to fine-grained structure, with strong transfer on segmentation and point-cloud tasks using far fewer scenes.
- FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision
FlexAvatar introduces bias sinks in a transformer to unify monocular and multi-view training, yielding complete 3D head avatars with strong generalization and view extrapolation from single images.
- Native and Compact Structured Latents for 3D Generation
Introduces O-Voxel omni-voxel representation and Sparse Compression VAE for structured native 3D latents, enabling efficient training of large flow-matching models that produce higher-quality geometry and materials than prior methods.
- From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images
A technique reconstructs large urban areas from sparse extreme off-nadir satellite images by modeling geometry as a Z-monotonic 2.5D height map SDF and applying a generative network to restore plausible textures on the resulting mesh.
- C3G: Learning Compact 3D Representations with 2K Gaussians
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
- ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding
ShelfGaussian achieves state-of-the-art zero-shot semantic occupancy prediction on Occ3D-nuScenes by jointly supervising Gaussian representations with vision foundation model features at 2D image and 3D scene levels.
- FACT-GS: Frequency-Aligned Complexity-Aware Texture Reparameterization for 2D Gaussian Splatting
FACT-GS allocates higher texture sampling density to high-frequency areas in 2D Gaussian Splatting through a learnable deformation field, recovering sharper details at the same parameter budget.
- GRLoc: Geometric Representation Regression for Visual Localization
The paper reformulates absolute pose regression as regressing disentangled world-coordinate raymaps and pointmaps from images, then recovering pose via a differentiable solver, claiming SOTA results on 7-Scenes and Cambridge Landmarks.
- MedGS: Gaussian Splatting for Multi-Modal 3D Medical Imaging
MedGS extends Gaussian Splatting with a relightable model tailored to endoscopic imaging where light and camera are co-located, achieving better novel-view synthesis and tissue editing than baselines.
- SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model
A sparse transformer predicts multi-frame 3D occupancy from images without BEV or VAE tokenization and reports SOTA results on nuScenes for 1-3s forecasting under arbitrary trajectories.
- MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes
MetroGS combines distributed 2D Gaussian Splatting with structured dense enhancement, progressive hybrid optimization, and depth-guided appearance modeling to deliver higher geometric accuracy and stability in large-scale urban reconstruction.
- SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors
SING3R-SLAM adds submap-level global alignment and reconstruction priors to a Gaussian map to reduce drift and improve local geometry in monocular indoor SLAM.
- WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling