FoundationStereo: Zero-Shot Stereo Matching, April 2025
7 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- LuMon: A Comprehensive Benchmark and Development Suite with Novel Datasets for Lunar Monocular Depth Estimation
  A new benchmark with real lunar stereo ground truth and analog data shows that sim-to-real fine-tuned monocular depth models achieve large in-domain gains but minimal generalization to actual lunar images.
- SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping
  SonarSweep adapts plane sweeping into an end-to-end neural network for sonar-vision fusion to produce dense accurate depth maps that outperform prior methods in high-turbidity underwater conditions.
- StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
  StereoPolicy fuses stereo image pairs via a Stereo Transformer on pretrained 2D encoders to boost robotic manipulation policies, showing gains over monocular, RGB-D, point cloud, and multi-view methods in simulations and real-robot tests.
- Zero-shot World Models Are Developmentally Efficient Learners
  A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
- Automated Annotation of Shearographic Measurements Enabling Weakly Supervised Defect Detection
  An automated annotation pipeline combining Grounded DINO and SAM produces usable bounding boxes and masks for weakly supervised defect detection in shearography.
- TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research
  TwinOR creates dynamic photorealistic digital twins of operating rooms that generate realistic RGB and depth data enabling embodied AI perception and localization tasks to match real-world performance levels.
- Geometry-aware 4D Video Generation for Robot Manipulation
  A geometry-aware 4D video generation model trained with cross-view pointmap alignment to produce spatio-temporally consistent future videos from novel viewpoints for robot manipulation.