TokenGS uses learnable Gaussian tokens in an encoder-decoder architecture to regress 3D means directly, achieving SOTA feed-forward reconstruction on static and dynamic scenes with better robustness.
Feed-forward bullet-time reconstruction of dynamic scenes from monocular videos.arXiv preprint arXiv:2412.03526
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 9roles
background 4polarities
background 4representative citing papers
LongDPM introduces an overlap-aware chunk-based framework that registers and fuses local dynamic reconstructions to achieve coherent long-range 4D geometry and tracking from monocular video.
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
LSRM scales transformer context windows with native sparse attention and geometric routing to deliver high-fidelity feed-forward 3D reconstruction and inverse rendering that approaches dense optimization quality.
Neural Harmonic Textures add periodic feature interpolation and deferred neural decoding to primitive representations, achieving state-of-the-art real-time novel-view synthesis and bridging primitive and neural-field methods.
A feed-forward video latent transformer that predicts time-varying 3D Gaussian primitives from one image to produce controllable 4D scenes with appearance, geometry, and motion.
ViPE estimates camera intrinsics, motion, and dense near-metric depth from uncalibrated videos, outperforming baselines on TUM and KITTI while releasing annotations for 96M frames across real and generated videos.
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
citing papers explorer
-
TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens
TokenGS uses learnable Gaussian tokens in an encoder-decoder architecture to regress 3D means directly, achieving SOTA feed-forward reconstruction on static and dynamic scenes with better robustness.
-
LongDPM: Overlap-Aware 4D Reconstruction from Long Monocular Videos
LongDPM introduces an overlap-aware chunk-based framework that registers and fuses local dynamic reconstructions to achieve coherent long-range 4D geometry and tracking from monocular video.
-
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
-
LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows
LSRM scales transformer context windows with native sparse attention and geometric routing to deliver high-fidelity feed-forward 3D reconstruction and inverse rendering that approaches dense optimization quality.
-
Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction
Neural Harmonic Textures add periodic feature interpolation and deferred neural decoding to primitive representations, achieving state-of-the-art real-time novel-view synthesis and bridging primitive and neural-field methods.
-
Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models
A feed-forward video latent transformer that predicts time-varying 3D Gaussian primitives from one image to produce controllable 4D scenes with appearance, geometry, and motion.
-
ViPE: Video Pose Engine for 3D Geometric Perception
ViPE estimates camera intrinsics, motion, and dense near-metric depth from uncalibrated videos, outperforming baselines on TUM and KITTI while releasing annotations for 96M frames across real and generated videos.
-
Cosmos World Foundation Model Platform for Physical AI
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
- PAGE-4D: VGGT-4D Perception via Disentangled Pose and Geometry Estimation