SGANVO: Unsupervised Deep Visual Odometry and Depth Estimation with Stacked Generative Adversarial Networks

Dongbing Gu; Tuo Feng

arXiv: 1906.08889 · v1 · pith:4U3ZTULInew · submitted 2019-06-20 · cs.RO · cs.CV · eess.IV

Pith reviewed 2026-05-25 19:16 UTC · model grok-4.3

classification: cs.RO, cs.CV, eess.IV

keywords: unsupervised visual odometry, depth estimation, generative adversarial networks, stacked GAN, ego-motion estimation, KITTI dataset, recurrent representation

The pith

The SGANVO stacked GAN is reported to produce unsupervised depth and ego-motion estimates on the KITTI dataset that are better than or comparable to those of prior unsupervised methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SGANVO, a system of stacked GAN layers for unsupervised visual depth and ego-motion estimation from video. The lowest layer handles direct depth and motion prediction, higher layers extract spatial features, and recurrent connections across layers capture temporal dynamics. This setup is positioned as an advance over encoder-decoder networks, RCNNs, and earlier GAN uses by leveraging the adversarial training process. Results on the KITTI dataset are reported as better or comparable to prior unsupervised methods, particularly in challenging scenes. A reader would care because such methods could support more reliable camera-based navigation without requiring labeled training data.

Core claim

This paper proposes a novel unsupervised network system for visual depth and ego-motion estimation: the Stacked Generative Adversarial Network (SGANVO). It consists of a stack of GAN layers, of which the lowest layer estimates the depth and ego-motion while the higher layers estimate the spatial features. It can also capture temporal dynamics through a recurrent representation across the layers. The evaluation results show that our proposed method can produce better or comparable results in depth and ego-motion estimation.

What carries the argument

The stack of GAN layers where the lowest layer estimates depth and ego-motion, higher layers estimate spatial features, and recurrent representation across layers captures temporal dynamics.
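For orientation, the adversarial training that each GAN layer builds on can be written in the standard minimax form of Goodfellow et al. [14]. This is the generic GAN objective, not the SGANVO loss, which this page does not reproduce. The symbols $G$ (generator), $D$ (discriminator), $p_{\text{data}}$ (distribution of real images), and $p_z$ (generator input distribution) follow the usual convention and are assumptions here.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

In a depth and ego-motion setting, the generator's output would typically be a synthesized view rather than a sample drawn from noise, so the second expectation would range over input frames; how SGANVO conditions each layer is not stated in the text above.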

Load-bearing premise

That the specific stack of GAN layers combined with recurrent representation across layers will capture both spatial features and temporal dynamics sufficiently to improve estimation accuracy beyond prior unsupervised methods.

What would settle it

Direct comparison of depth estimation errors (such as absolute relative error) and ego-motion accuracy (such as trajectory error) between SGANVO and prior unsupervised methods on the KITTI dataset; if SGANVO errors are not lower or equal, the central claim does not hold.
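The depth side of such a comparison usually comes down to a few per-pixel error measures. The following is a minimal sketch of Abs Rel, Sq Rel, and RMSE, assuming flat lists of predicted and ground-truth depths in metres. The function name `depth_metrics` and the validity bounds `min_depth` and `max_depth` are illustrative assumptions, not values taken from the paper.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Return Abs Rel, Sq Rel and RMSE over pixels with valid ground truth.

    pred, gt: equal-length sequences of depths in metres.
    min_depth / max_depth: assumed validity range for ground-truth depth.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Keep only pixels whose ground-truth depth lies in the valid range.
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse}
```

Lower values are better for all three measures, so the claim would hold if SGANVO's values were no higher than those of the baselines under the same evaluation protocol.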

Figures

Figures reproduced from arXiv: 1906.08889 by Dongbing Gu, Tuo Feng.

**Figure 2.** The network is unfolded in time. The temporal dynamic …

**Figure 3.** Illustrated above are qualitative comparisons of our …

**Figure 4.** Our proposed SGANVO system to estimate the ego …

Original abstract

Recently, end-to-end unsupervised deep learning methods have achieved results beyond those of geometric methods for visual depth and ego-motion estimation tasks. These data-based learning methods perform more robustly and accurately in some challenging scenes. The encoder-decoder network has been widely used in depth estimation, and the RCNN has brought significant improvements in ego-motion estimation. Furthermore, the latest use of Generative Adversarial Nets (GANs) in depth and ego-motion estimation has demonstrated that the estimation could be further improved by generating pictures in the game learning process. This paper proposes a novel unsupervised network system for visual depth and ego-motion estimation: the Stacked Generative Adversarial Network (SGANVO). It consists of a stack of GAN layers, of which the lowest layer estimates the depth and ego-motion while the higher layers estimate the spatial features. It can also capture temporal dynamics through a recurrent representation across the layers. See Fig. 1 for details. We select the most commonly used KITTI [1] dataset for evaluation. The evaluation results show that our proposed method can produce better or comparable results in depth and ego-motion estimation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SGANVO stacks GAN layers with recurrent cross-layer links for unsupervised depth and ego-motion, but the abstract gives no numbers or ablations so the performance claim stays unverified.

The main takeaway is a stacked GAN architecture in which the lowest layer handles depth and ego-motion while higher layers extract spatial features, with recurrent connections across layers to capture temporal dynamics. This is a clear step past the single-GAN and encoder-decoder setups cited in the abstract. Splitting the work across stacked layers and adding recurrence for video input is a reasonable extension of existing unsupervised VO work, and it targets the same KITTI evaluation that most papers in this area use. The paper describes in a straightforward way how the stack might improve feature capture, without overclaiming the mechanism.

The central weakness is that the abstract asserts better or comparable results but supplies no quantitative metrics, error breakdowns, or ablation results. All claims rest on training and testing within the same dataset, which makes it hard to tell whether the gains are real or the result of fitting to KITTI. Without the tables or independent checks, the soundness stays low.

This paper is aimed at people already following unsupervised deep visual odometry and GAN variants in robotics. A reader looking for new architecture ideas could extract the network description, but anyone needing reproducible evidence would have to wait for the full numbers. It deserves peer review so that the quantitative claims can be examined directly.

Referee Report

2 major / 2 minor

Summary. The paper proposes SGANVO, a stacked generative adversarial network for unsupervised depth and ego-motion estimation. The architecture consists of multiple GAN layers where the lowest layer estimates depth and ego-motion, higher layers estimate spatial features, and recurrent representations across layers capture temporal dynamics. Evaluation is performed on the KITTI dataset, with the claim that the method produces better or comparable results to prior unsupervised approaches.

Significance. If the quantitative improvements and architectural advantages are substantiated, the work would contribute a novel combination of stacked GANs and cross-layer recurrence to unsupervised visual odometry, potentially improving robustness in challenging scenes over standard encoder-decoder or RCNN baselines. The paper receives credit for explicitly describing the layered GAN structure and recurrent mechanism in the abstract and for selecting the standard KITTI benchmark for evaluation.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: the central claim that SGANVO produces 'better or comparable results' is stated without any quantitative metrics (e.g., Abs Rel, RMSE for depth or ATE for ego-motion), ablation studies, or explicit comparison tables against baselines such as SfMLearner or prior GAN methods. This absence prevents verification of the performance claim and makes the result load-bearing for the paper's contribution.
[Method] Method / Network Architecture section: the assumption that the specific stack of GAN layers plus recurrent cross-layer representation will sufficiently capture both spatial features and temporal dynamics to yield accuracy gains is presented without supporting analysis, such as feature visualization, ablation on recurrence, or comparison of loss terms. This is the weakest link in the central claim.

minor comments (2)

[Abstract] Abstract: 'Nets(GANs)' is missing a space; 'pictures in the game learning process' is informal and should be clarified to 'synthesized images during adversarial training'.
[Related Work] The manuscript should include a dedicated related-work subsection contrasting the stacked recurrent GAN design against existing unsupervised VO methods that also employ adversarial losses.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and agree that the manuscript requires revisions to include quantitative metrics, comparison tables, and supporting analysis for the architectural claims.

Referee: [Abstract / Experiments] Abstract and Experiments section: the central claim that SGANVO produces 'better or comparable results' is stated without any quantitative metrics (e.g., Abs Rel, RMSE for depth or ATE for ego-motion), ablation studies, or explicit comparison tables against baselines such as SfMLearner or prior GAN methods. This absence prevents verification of the performance claim and makes the result load-bearing for the paper's contribution.

Authors: We agree that the abstract and experiments section lack specific quantitative metrics and explicit comparison tables. In the revised manuscript we will add a results table reporting Abs Rel, Sq Rel, RMSE, and ATE values on the KITTI dataset together with direct numerical comparisons to SfMLearner and prior unsupervised GAN-based methods. This will allow verification of the 'better or comparable' claim. revision: yes

Referee: [Method] Method / Network Architecture section: the assumption that the specific stack of GAN layers plus recurrent cross-layer representation will sufficiently capture both spatial features and temporal dynamics to yield accuracy gains is presented without supporting analysis, such as feature visualization, ablation on recurrence, or comparison of loss terms. This is the weakest link in the central claim.

Authors: The current manuscript describes the stacked GAN layers and recurrent cross-layer mechanism but does not provide ablations or visualizations. We will add an ablation study isolating the contribution of the recurrent connections, together with feature visualizations and a comparison of loss terms, in the revised version to substantiate the architectural design. revision: yes
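The ATE figure promised in the first response can be illustrated with a simplified sketch. It assumes two equal-length lists of 3D camera positions, shifts both trajectories to a common origin, fits a single least-squares scale factor (monocular estimates have no absolute scale), and reports the RMSE of the remaining position error. The function name `ate_rmse` is hypothetical, and no rotational alignment is performed, so this is a simplification of the full similarity alignment that trajectory evaluations commonly apply, not the authors' evaluation protocol.

```python
import math


def ate_rmse(pred, gt):
    """RMSE of position error after origin shift and scale alignment.

    pred, gt: equal-length sequences of (x, y, z) camera positions.
    Simplification: no rotational alignment is applied.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and equal length")
    # Express both trajectories relative to their first position.
    p0, g0 = pred[0], gt[0]
    p_rel = [tuple(a - b for a, b in zip(p, p0)) for p in pred]
    g_rel = [tuple(a - b for a, b in zip(g, g0)) for g in gt]
    # Least-squares scale s minimising sum ||s * p - g||^2.
    num = sum(a * b for p, g in zip(p_rel, g_rel) for a, b in zip(p, g))
    den = sum(a * a for p in p_rel for a in p)
    scale = num / den if den > 0 else 1.0
    sq_err = sum(
        (scale * a - b) ** 2 for p, g in zip(p_rel, g_rel) for a, b in zip(p, g)
    )
    return math.sqrt(sq_err / len(p_rel))
```

A prediction that differs from the ground truth only by a constant scale yields an error of zero under this measure, which is why the scale fit matters when comparing monocular methods.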

Circularity Check

0 steps flagged

No significant circularity in claimed results

The paper presents an empirical unsupervised learning architecture (stacked GANs with recurrent cross-layer representation) and reports its performance on the standard KITTI benchmark. No derivation chain, first-principles prediction, or mathematical reduction is claimed or present in the abstract or described text. The evaluation results are obtained by training and testing the proposed network on KITTI splits, which is standard train/test validation of an empirical method rather than a self-definitional step, a fitted input renamed as a prediction, or a result that rests on self-citation. No equations, uniqueness theorems, or ansatzes are invoked that collapse to the inputs by construction. The central claim therefore remains externally falsifiable against the benchmark and does not reduce to its own training procedure.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the effectiveness of the stacked adversarial architecture and recurrent connections, which are introduced without independent evidence beyond the KITTI evaluation.

free parameters (1)

network architecture and loss weights
Deep network parameters and training hyperparameters are fitted to the KITTI data to achieve the reported performance.

axioms (1)

domain assumption: Adversarial training via stacked GANs improves depth and ego-motion estimation accuracy
Invoked when the abstract states that the latest use of GANs has demonstrated further improvement.

Reference graph

Works this paper leans on

[1] Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? the kitti vision benchmark suite." 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012.

[2] Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in neural information processing systems. 2015.

[3] Garg, Ravi, et al. "Unsupervised cnn for single view depth estimation: Geometry to the rescue." European Conference on Computer Vision. Springer, Cham, 2016.

[4] Godard, Clément, Oisin Mac Aodha, and Gabriel J. Brostow. "Unsupervised monocular depth estimation with left-right consistency." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

[5] Zhou, Tinghui, et al. "Unsupervised learning of depth and ego-motion from video." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

[6] Yin, Zhichao, and Jianping Shi. "Geonet: Unsupervised learning of dense depth, optical flow and camera pose." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[7] Godard, Clément, et al. "Digging into self-supervised monocular depth estimation." arXiv preprint arXiv:1806.01260 (2018).

[8] Pillai, Sudeep, Rares Ambrus, and Adrien Gaidon. "SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation." arXiv preprint arXiv:1810.01849 (2018).

[9] Li, Ruihao, et al. "Undeepvo: Monocular visual odometry through unsupervised deep learning." 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018.

[10] Ranjan, Anurag, et al. "Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

[11] Wang, Yang, et al. "Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos." arXiv preprint arXiv:1810.03654 (2018).

[12] Sun, Deqing, et al. "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[13] Almalioglu, Yasin, et al. "GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks." arXiv preprint arXiv:1809.05786 (2018).

[14] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.

[15] CS Kumar, Arun, Suchendra M. Bhandarkar, and Mukta Prasad. "Monocular depth prediction using generative adversarial networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018.

[16] Aleotti, Filippo, et al. "Generative Adversarial Networks for unsupervised monocular depth prediction." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

[17] Pilzer, Andrea, et al. "Unsupervised adversarial depth estimation using cycled generative networks." 2018 International Conference on 3D Vision (3DV). IEEE, 2018.

[18] Gwn Lore, Kin, et al. "Generative adversarial networks for depth map estimation from RGB video." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018.

[19] Shi, Wenzhe, et al. "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[20] Klambauer, Günter, et al. "Self-normalizing neural networks." Advances in neural information processing systems. 2017.

[21] Eigen, David, Christian Puhrsch, and Rob Fergus. "Depth map prediction from a single image using a multi-scale deep network." Advances in neural information processing systems. 2014.

[22] Mahjourian, Reza, Martin Wicke, and Anelia Angelova. "Unsupervised learning of depth and ego-motion from monocular video using 3d geometric constraints." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[23] Chavdarova, Tatjana, and François Fleuret. "SGAN: An Alternative Training of Generative Adversarial Networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[24] Wang, Sen, et al. "End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks." The International Journal of Robotics Research 37.4-5 (2018): 513-542.

[25] Mur-Artal, Raul, and Juan D. Tardós. "Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras." IEEE Transactions on Robotics 33.5 (2017): 1255-1262.

[26] Luo, Y., J. Ren, M. Lin, et al. "Single view stereo matching." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 155-163.

[27] Cordts, Marius, et al. "The cityscapes dataset for semantic urban scene understanding." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
