Self-supervised monocular depth estimation improves in low-texture regions by using distance transforms on jointly estimated pre-semantic contours to create more informative loss signals.
arXiv preprint arXiv:1812.04605
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 2
citation-polarity summary
polarities: background 2
representative citing papers
Tango3D unifies dense pixel-to-point 2D-3D alignment and global retrieval in one shared space using a geometry-aware 2D backbone, 3D VAE tokens, and three-stage progressive training.
DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.
Flow4DGS-SLAM uses optical flow to generate motion masks, initialize poses, and guide 4D Gaussian modeling with scene flow and GMM for temporal properties, claiming SOTA results in dynamic tracking and reconstruction.
By fine-tuning DUST3R to output per-timestep pointmaps on scarce dynamic video datasets, MonST3R achieves stronger video depth and pose estimation without explicit motion modeling.
VGGT-SLAM++ improves on prior transformer SLAM by adding dense DEM submap graphs and high-cadence local optimization, achieving SOTA accuracy with reduced drift and bounded memory on benchmarks.
VGGT-Long extends VGGT with chunking, overlap alignment, and loop closure to produce consistent kilometer-scale 3D reconstructions from monocular RGB sequences without retraining or extra supervision.
citing papers explorer
- Improved monocular depth prediction using distance transform over pre-semantic contours with self-supervised neural networks
- Tango3D: Towards Alignment for Global and Local 2D-3D Correspondence
- Depth Anything 3: Recovering the Visual Space from Any Views
- Flow4DGS-SLAM: Optical Flow-Guided 4D Gaussian Splatting SLAM
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
- VGGT-SLAM++
- VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences
- PRISM-SLAM: Probabilistic Ray-Grounded Inference for Scale-aware Metric SLAM