Depth Anything V2
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: method 1
citation-polarity summary
polarities: use method 1
citing papers explorer
- CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation
  CDPR integrates polarization priors into a diffusion-based monocular depth estimator via shared latent space and adaptive gating, outperforming RGB-only methods in challenging scenes.
- UfM*: Uncertainty from Motion* for DNN Depth Estimation Using Gaussians
  UfM* uses Gaussian mixtures to compute multiview disagreement for uncertainty in depth estimation with single inference per image, reducing energy and memory use.
- Thermal-Only Crowd Counting with Deployment-Time Privacy Protection
  A privacy-preserving thermal-only crowd counting framework extracts enhanced features from thermal images via single-step LCM denoising in a depth-to-RGB diffusion model and matches RGB-T fusion performance without RGB input at inference.
- H-OmniStereo: Zero-Shot Omnidirectional Stereo Matching with Heading-Aligned Normal Priors
  H-OmniStereo trains a stereo matcher on 2.8 million synthetic equirectangular pairs and adds a heading-aligned normal prior to improve zero-shot accuracy and generalization on out-of-domain and real omnidirectional data.
- Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection
  Angle-I2P rejects outliers in cross-modality registration via scale-invariant angular consistency and hierarchical attention, reporting state-of-the-art inlier ratio and registration recall on 7Scenes, RGBD Scenes V2, and a self-collected dataset.
- Clutter-Robust Vision-Language-Action Models through Object-Centric and Geometry Grounding
  OBEYED-VLA improves VLA robustness in cluttered real-world manipulation by disentangling perception into VLM-based object-centric grounding and geometry-aware stages, then fine-tuning the policy only on single-object demonstrations.
- Geometry Reinforced Efficient Attention Tuning Equipped with Normals for Robust Stereo Matching
  GREATEN fuses surface normals with image features via gated contextual-geometric fusion and efficient sparse attentions to cut stereo matching errors by up to 30% on real datasets when trained solely on synthetic data.