BA-Net: Dense bundle adjustment network
8 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 8
roles: background 2
polarities: background 2
citing papers explorer
- Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring
  LeanGate is a lightweight feed-forward network that predicts geometric utility scores to skip over 90% of redundant frames in GFM-based monocular SLAM, reducing tracking FLOPs by 85% and achieving 5x speedup while maintaining accuracy.
- Tango3D: Towards Alignment for Global and Local 2D-3D Correspondence
  Tango3D unifies dense pixel-to-point 2D-3D alignment and global retrieval in one shared space using a geometry-aware 2D backbone, 3D VAE tokens, and three-stage progressive training.
- Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction
  Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.
- PAGE-4D: VGGT-4D Perception via Disentangled Pose and Geometry Estimation
  PAGE-4D is a feedforward extension of VGGT that uses a dynamics-aware aggregator and mask to disentangle pose estimation from geometry reconstruction in videos with moving objects.
- Efficient 3D Content Reconstruction and Generation
  Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.
- TTT3R: 3D Reconstruction as Test-Time Training
  TTT3R derives a closed-form learning rate from memory-observation alignment confidence to boost length generalization in RNN-based 3D reconstruction by 2x in global pose estimation.
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
  By fine-tuning DUSt3R to output per-timestep pointmaps on scarce dynamic video datasets, MonST3R achieves stronger video depth and pose estimation without explicit motion modeling.
- VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences
  VGGT-Long extends VGGT with chunking, overlap alignment, and loop closure to produce consistent kilometer-scale 3D reconstructions from monocular RGB sequences without retraining or extra supervision.