S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Wenjun Zeng; Xiaotian Chen; Xuejin Chen; Yuwang Wang

arXiv: 2104.00877 · v2 · pith:ICNIYHH6new · submitted 2021-04-02 · cs.CV

keywords: depth, depth-specific, real-world, representation, s2r-depthnet, data, image, module

Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes. We are the first to explore the learning of a depth-specific structural representation, which captures the features essential for depth estimation and ignores irrelevant style information. Our S2R-DepthNet (Synthetic to Real DepthNet) generalizes well to unseen real-world data directly, even though it is trained only on synthetic data. S2R-DepthNet consists of: a) a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling the image into domain-invariant structure and domain-specific style components, b) a Depth-specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization, and c) a depth prediction (DP) module, which predicts depth from the depth-specific representation. Without access to any real-world images, our method even outperforms state-of-the-art unsupervised domain adaptation methods that use real-world images of the target domain for training. In addition, when using a small amount of labeled real-world data, we achieve state-of-the-art performance under the semi-supervised setting. The code and trained models are available at https://github.com/microsoft/S2R-DepthNet.
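The three-stage data flow described in the abstract (STE, then DSA, then DP) can be illustrated with a toy sketch. This is not the authors' implementation: the real modules are learned networks, whereas here each stage is a hand-written stand-in chosen only to make the composition concrete. The function names, the edge-strength "structure map", the sigmoid gate with its `weight` and `bias` values, the element-wise multiplication, and the linear `scale` are all assumptions made for illustration.

```python
import math
from typing import List

Grid = List[List[float]]  # a tiny single-channel "image" as nested lists


def structure_extraction(image: Grid) -> Grid:
    """Stand-in for STE (assumption): edge strength from forward differences.

    Differences discard a global brightness offset, a crude proxy for
    dropping "style" while keeping "structure".
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = image[y][min(x + 1, w - 1)] - image[y][x]
            dy = image[min(y + 1, h - 1)][x] - image[y][x]
            out[y][x] = abs(dx) + abs(dy)
    return out


def depth_specific_attention(structure: Grid,
                             weight: float = 4.0,
                             bias: float = -2.0) -> Grid:
    """Stand-in for DSA (assumption): a per-pixel sigmoid gate in (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-(weight * v + bias))) for v in row]
            for row in structure]


def depth_prediction(representation: Grid, scale: float = 10.0) -> Grid:
    """Stand-in for DP (assumption): a fixed linear map to a 'depth' value."""
    return [[scale * v for v in row] for row in representation]


def s2r_depthnet_sketch(image: Grid) -> Grid:
    """Compose the three stages: image -> structure -> gated structure -> depth."""
    structure = structure_extraction(image)
    attention = depth_specific_attention(structure)
    # Gate the structure map element-wise (assumed combination rule).
    gated = [[s * a for s, a in zip(s_row, a_row)]
             for s_row, a_row in zip(structure, attention)]
    return depth_prediction(gated)


if __name__ == "__main__":
    image = [[0.0, 0.0, 1.0],
             [0.0, 0.0, 1.0]]
    # A uniform brightness shift plays the role of a "style" change.
    shifted = [[v + 0.5 for v in row] for row in image]
    print(s2r_depthnet_sketch(image) == s2r_depthnet_sketch(shifted))
```

In this toy version the output is unchanged by the brightness shift because the stand-in structure map depends only on pixel differences. The paper's claim is far broader (invariance to synthetic-versus-real style learned from data), so the sketch should be read only as a picture of how the three modules chain together.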

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Stereo Visual SLAM System Using Object-Level Motion Estimation and Geometric Filtering Based on Cross Disparity
cs.RO 2026-07 unverdicted novelty 5.0

OCD SLAM adds cross-disparity inconsistency checks and object-level motion classification to ORB-SLAM2, reporting better trajectory accuracy than prior dynamic SLAM methods on KITTI sequences.