MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification

Lukas Liebel; Marco K\"orner

arxiv: 1907.11111 · v1 · pith:HMIUHSWGnew · submitted 2019-07-25 · 💻 cs.CV

MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification

Lukas Liebel , Marco K\"orner This is my paper

Pith reviewed 2026-05-24 16:14 UTC · model grok-4.3

classification 💻 cs.CV

keywords single image depth estimationmulti-task learningdepth regressiondepth classificationKITTIconvolutional neural networksroad scene understanding

0 comments

The pith

Multi-task learning with regression and depth classification improves single-image depth estimation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MultiDepth, a convolutional neural network that approaches single-image depth estimation as a multi-task problem. It combines a main regression task for predicting continuous depth values with an auxiliary classification task that predicts depth intervals. This multi-task setup is intended to address the instability and slow convergence issues common in pure regression-based depth estimation. The method is evaluated on the KITTI depth prediction dataset for road scenes, showing that the combined training leads to more accurate results while allowing the classification branch to be disabled at test time for efficient continuous depth prediction.

Core claim

End-to-end multi-task learning using both regression for continuous depth and classification for depth intervals considerably improves training and yields more accurate depth estimates from single images compared to regression alone.

What carries the argument

A shared CNN backbone with separate regression and classification heads, where the classification of depth intervals serves as an auxiliary task during training.

If this is right

Training converges faster and more stably for depth regression networks.
Depth predictions achieve higher accuracy on the KITTI benchmark.
The auxiliary task can be removed at inference without affecting the regression output.
Improved performance for applications in autonomous driving and advanced driver assistance systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar multi-task strategies might benefit other regression problems in computer vision that face convergence issues.
The classification task could encourage the network to learn more robust scene structure features.
Defining optimal depth intervals for the classification task may require dataset-specific tuning.

Load-bearing premise

The auxiliary classification task supplies helpful training signals to the shared features without interfering negatively with the regression task or needing heavy tuning of task weights.

What would settle it

Training the same architecture with only the regression task and observing equal or better accuracy on KITTI would show that the multi-task approach does not improve results.

Figures

Figures reproduced from arXiv: 1907.11111 by Lukas Liebel, Marco K\"orner.

**Figure 1.** Figure 1: Depth prediction (bottom) for a single RGB image (top) from the KITTI depth prediction benchmark [45, 14]. Auxiliary depth interval classification result (middle) used during training to support the depth value regression via optimization of a multi-task objective. depth completion [7] or as labels for actual depth prediction. Utilizing stereo image pairs and a photo-consistency loss for semi-supervised t… view at source ↗

**Figure 2.** Figure 2: Network architecture for the proposed MultiDepth approach featuring a shared ResNet-101 encoder with dilated convolutions and task-specific decoders including pyramid pooling for the main regression and the auxiliary classification task. The multi-task loss is constructed from the contributing single-task losses utilizing learned weighting terms, which represent task uncertainties. weighting of errors in l… view at source ↗

**Figure 3.** Figure 3: Sample from the KITTI depth prediction dataset [45, 14] with RGB image (a) and sparse ground-truth depth (b). Distribution of (c) available ground-truth depth values per pixel in the training set with an apparent lack of measurements in the upper part of the images, and (d) of valid depth values in the training set. loss used in the original implementation of Zhao et al. [51] was not implemented. Parameter… view at source ↗

**Figure 4.** Figure 4: Experimental estimation of a suitable learning rate. The shaded interval of learning rates decreases the loss during training, the dashed line marks the selected learning rate. image on average. For some samples, however, this number drops to only 0.8%. Figure 3c shows the unequal spatial distribution of depth measurements with a significant lack of data for the upper part of the images. We train on the f… view at source ↗

**Figure 6.** Figure 6: Validation results using different settings for (a) multi-task weights, (b) depth value scaling, and (c) patch size. Models using learned weighting, normalization bounds adapted to the data distribution, and large patches for training yield best results with weighting being the most important factor. −10 −5 0 0 0.5 1 0 0.5 1 0 2.5 5 0 0.5 1 0 2.5 5 0 0.5 1 0 2.5 5 0 1 2 0 2.5 5 0 1 2 0 2.5 5 −4 −2 0 2 0 2.… view at source ↗

**Figure 7.** Figure 7: Training results for the final model configuration showing an initial phase of adjusting the weights stask according to the uncertainty of Ltask followed by a continuous decrease of both. The combined Lmt is optimized using a decaying α. Validation results show the stable convergence of both outputs with the regression yielding superior results due to quantization errors in the auxiliary classification ou… view at source ↗

**Figure 8.** Figure 8: Qualitative evaluation of our estimation results for unseen KITTI depth prediction test images [45, 14] (first row) with results of the auxiliary classification task (second row), results of the main regression task (third row) and color coded error images of the regression result compared to the sparse LiDAR point cloud (fourth row). weighting terms saves precious computation time while introducing as li… view at source ↗

read the original abstract

We introduce MultiDepth, a novel training strategy and convolutional neural network (CNN) architecture that allows approaching single-image depth estimation (SIDE) as a multi-task problem. SIDE is an important part of road scene understanding. It, thus, plays a vital role in advanced driver assistance systems and autonomous vehicles. Best results for the SIDE task so far have been achieved using deep CNNs. However, optimization of regression problems, such as estimating depth, is still a challenging task. For the related tasks of image classification and semantic segmentation, numerous CNN-based methods with robust training behavior have been proposed. Hence, in order to overcome the notorious instability and slow convergence of depth value regression during training, MultiDepth makes use of depth interval classification as an auxiliary task. The auxiliary task can be disabled at test-time to predict continuous depth values using the main regression branch more efficiently. We applied MultiDepth to road scenes and present results on the KITTI depth prediction dataset. In experiments, we were able to show that end-to-end multi-task learning with both, regression and classification, is able to considerably improve training and yield more accurate results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MultiDepth adds depth-interval classification as an auxiliary task to stabilize regression for single-image depth on KITTI; the move is practical but the reported gains need the actual numbers to judge.

read the letter

The paper's main contribution is a multi-task CNN that runs depth regression as the primary output while adding an auxiliary classification head over discrete depth intervals. The classification loss is used only during training to give the shared backbone more stable gradients, then dropped at inference so the model still produces continuous depth values at no extra cost. They evaluate on the KITTI depth prediction benchmark, which fits the road-scene focus. This is a direct, incremental application of a known multi-task pattern rather than a new architecture or loss derivation. The design choice to keep the auxiliary task training-only is clean and addresses a real practical constraint. If the experiments include proper ablations against a pure-regression baseline and show consistent accuracy lifts without heavy hyper-parameter tuning, the work supplies a usable training trick for similar perception pipelines. The abstract states that the combined objective improves both training behavior and final accuracy, but supplies no tables, error metrics, or baseline deltas in the text provided. Without those numbers or an error analysis, it is impossible to tell whether the auxiliary task actually delivers net benefit or just trades one set of instabilities for another. The paper stays within standard CNN practice and cites relevant prior work on depth estimation and multi-task learning, so there is no obvious circularity or invented quantity. This is the sort of targeted empirical note that a reading group on depth estimation or autonomous-vehicle perception might discuss for half an hour. A serious editor should send it to review because the claim is falsifiable on public data and the domain matters, even if the gains prove modest after revision.

Referee Report

2 major / 0 minor

Summary. The paper introduces MultiDepth, a CNN architecture and training strategy for single-image depth estimation (SIDE) on road scenes. It frames SIDE as a multi-task problem with a primary regression branch for continuous depth values and an auxiliary classification branch over depth intervals; the auxiliary task is used only during training to stabilize optimization and is disabled at test time. The approach is evaluated on the KITTI depth prediction dataset, with the abstract asserting that the combined objective improves training behavior and final accuracy.

Significance. If the claimed empirical gains hold under proper controls, the multi-task formulation could provide a lightweight way to regularize depth regression without architectural changes at inference. The idea of leveraging classification robustness to aid regression is plausible and has precedents in other vision tasks, but the manuscript supplies no quantitative support for the improvement, limiting any assessment of practical impact or novelty relative to existing multi-task depth methods.

major comments (2)

[Abstract] Abstract: the central empirical claim that 'end-to-end multi-task learning with both, regression and classification, is able to considerably improve training and yield more accurate results' is unsupported; the manuscript contains no tables, figures, numerical metrics (e.g., RMSE, δ<1.25), baseline comparisons, ablation results, or training curves to substantiate the assertion.
The weakest assumption identified in the reader's report—that the auxiliary depth-interval classification task supplies useful gradient signal without negative transfer or extensive task-weight tuning—is never tested or quantified; no loss-weighting schedule, gradient-norm analysis, or ablation removing the auxiliary head appears in the text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback. We agree that the current manuscript version lacks sufficient quantitative evidence and ablations to support the central claims, and we will revise accordingly to address these gaps.

read point-by-point responses

Referee: [Abstract] Abstract: the central empirical claim that 'end-to-end multi-task learning with both, regression and classification, is able to considerably improve training and yield more accurate results' is unsupported; the manuscript contains no tables, figures, numerical metrics (e.g., RMSE, δ<1.25), baseline comparisons, ablation results, or training curves to substantiate the assertion.

Authors: We acknowledge this point. The abstract asserts empirical improvements on KITTI without accompanying metrics or visuals in the current text. We will add a results section with tables reporting RMSE, δ<1.25, baseline comparisons against standard regression-only models, ablation studies, and training curves showing convergence differences in the revised manuscript. revision: yes
Referee: The weakest assumption identified in the reader's report—that the auxiliary depth-interval classification task supplies useful gradient signal without negative transfer or extensive task-weight tuning—is never tested or quantified; no loss-weighting schedule, gradient-norm analysis, or ablation removing the auxiliary head appears in the text.

Authors: We agree that the contribution of the auxiliary task requires explicit validation. The revised manuscript will include an ablation study with the auxiliary head removed, a description of the loss-weighting schedule used (e.g., equal weighting or tuned ratios), and discussion of any observed negative transfer or gradient behavior to quantify the auxiliary task's benefit. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical CNN architecture and training strategy for single-image depth estimation, with the central claim resting on experimental results on the KITTI dataset rather than any derivation or prediction. No equations, fitted parameters renamed as predictions, self-citations as load-bearing uniqueness theorems, or ansatzes are present in the abstract or described approach. The multi-task regression+classification benefit is stated as an observed outcome from end-to-end training, not a self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; standard CNN training assumptions (e.g., existence of suitable loss weighting between tasks) are implicit but not enumerated.

pith-pipeline@v0.9.0 · 5726 in / 1123 out tokens · 23377 ms · 2026-05-24T16:14:03.750761+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

end-to-end multi-task learning with both, regression and classification, is able to considerably improve training
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

auxiliary depth interval classification task

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 3 internal anchors

[1]

Estimating Depth From Monocu- lar Images as Classiﬁcation Using Deep Fully Convolutional Residual Networks

Y . Cao, Z. Wu, and C. Shen. “Estimating Depth From Monocu- lar Images as Classiﬁcation Using Deep Fully Convolutional Residual Networks”. In: TCSVT 28.11 (2018), pp. 3174–3182

work page 2018
[2]

Multitask Learning

Rich Caruana. “Multitask Learning”. In: Machine Learning 28.1 (1997), pp. 41–75

work page 1997
[3]

Multitask Learning: A Knowledge-Based Source of Inductive Bias

Richard Caruana. “Multitask Learning: A Knowledge-Based Source of Inductive Bias”. In: ICML. 1993, pp. 41–48

work page 1993
[4]

Deep MANTA: A Coarse-To- Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis From Monocular Image

Florian Chabot, Mohamed Chaouch, Jaonary Rabarisoa, Celine Teuliere, and Thierry Chateau. “Deep MANTA: A Coarse-To- Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis From Monocular Image”. In: CVPR. 2017, pp. 1827–1836

work page 2017
[5]

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. “Semantic Image Segmenta- tion with Deep Convolutional Nets and Fully Connected CRFs”. In: ICLR. 2015, pp. 1–14. arXiv: 1412.7062v4 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2015
[6]

Multi- View 3D Object Detection Network for Autonomous Driving

Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia. “Multi- View 3D Object Detection Network for Autonomous Driving”. In: CVPR. 2017, pp. 6526–6534

work page 2017
[7]

Depth Esti- mation via Afﬁnity Learned with Convolutional Spatial Propa- gation Network

Xinjing Cheng, Peng Wang, and Ruigang Yang. “Depth Esti- mation via Afﬁnity Learned with Convolutional Spatial Propa- gation Network”. In: ECCV. 2018, pp. 108–125

work page 2018
[8]

AuxNet: Auxiliary tasks enhanced Semantic Segmentation for Automated Driving

Sumanth Chennupati, Ganesh Sistu, Senthil Yogamani, and Samir Rawashdeh. “AuxNet: Auxiliary tasks enhanced Seman- tic Segmentation for Automated Driving”. In: VISAPP. 2019, pp. 1–8. arXiv: 1901.05808v1 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2019
[9]

The Cityscapes Dataset for Semantic Urban Scene Understanding

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Re- hfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. “The Cityscapes Dataset for Semantic Urban Scene Understanding”. In: CVPR. 2016, pp. 3213–3223

work page 2016
[10]

Predicting Depth, Surface Nor- mals and Semantic Labels With a Common Multi-Scale Con- volutional Architecture

David Eigen and Rob Fergus. “Predicting Depth, Surface Nor- mals and Semantic Labels With a Common Multi-Scale Con- volutional Architecture”. In: ICCV. 2015, pp. 2650–2658

work page 2015
[11]

Depth Map Prediction from a Single Image using a Multi-Scale Deep Net- work

David Eigen, Christian Puhrsch, and Rob Fergus. “Depth Map Prediction from a Single Image using a Multi-Scale Deep Net- work”. In: NIPS. 2014, pp. 2366–2374. 1https://github.com/lukasliebel/MultiDepth

work page 2014
[12]

Deep Ordinal Regression Network for Monocular Depth Estimation

Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Bat- manghelich, and Dacheng Tao. “Deep Ordinal Regression Network for Monocular Depth Estimation”. In: CVPR. 2018, pp. 2002–2011

work page 2018
[13]

Monocular Depth Estimation with Afﬁnity, Vertical Pooling, and Label Enhancement

Yukang Gan, Xiangyu Xu, Wenxiu Sun, and Liang Lin. “Monocular Depth Estimation with Afﬁnity, Vertical Pooling, and Label Enhancement”. In: ECCV. 2018, pp. 232–247

work page 2018
[14]

Vision meets robotics: The KITTI dataset

A Geiger, P Lenz, C Stiller, and R Urtasun. “Vision meets robotics: The KITTI dataset”. In: Int. J. Robotics Res. 32.11 (2013), pp. 1231–1237

work page 2013
[15]

Unsupervised Monocular Depth Estimation With Left-Right Consistency

Clement Godard, Oisin Mac Aodha, and Gabriel J. Brostow. “Unsupervised Monocular Depth Estimation With Left-Right Consistency”. In: CVPR. 2017, pp. 6602–6611

work page 2017
[16]

Deep Learning

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. http : / / www . deeplearningbook . org. MIT Press, 2016

work page 2016
[17]

Dynamic Task Prioritization for Multitask Learning

Michelle Guo, Albert Haque, De-An Huang, Serena Yeung, and Li Fei-Fei. “Dynamic Task Prioritization for Multitask Learning”. In: ECCV. 2018, pp. 282–299

work page 2018
[18]

Learning Monocular Depth by Distilling Cross- domain Stereo Networks

Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, and Xi- aogang Wang. “Learning Monocular Depth by Distilling Cross- domain Stereo Networks”. In: ECCV. 2018, pp. 506–523

work page 2018
[19]

Monocular Depth Estima- tion by Learning from Heterogeneous Datasets

Akhil Gurram, Onay Urfalioglu, Ibrahim Halfaoui, Fahd Bouzaraa, and Antonio M. Lopez. “Monocular Depth Estima- tion by Learning from Heterogeneous Datasets”. In: IV. 2018, pp. 2176–2181

work page 2018
[20]

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition”. In: CVPR. 2016, pp. 770–778

work page 2016
[21]

Monocular Depth Estimation Using Whole Strip Masking and Reliability-Based Reﬁnement

Minhyeok Heo, Jaehan Lee, Kyung-Rae Kim, Han-Ul Kim, and Chang-Su Kim. “Monocular Depth Estimation Using Whole Strip Masking and Reliability-Based Reﬁnement”. In: ECCV. 2018, pp. 39–55

work page 2018
[22]

The ApolloScape Dataset for Autonomous Driving

Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. “The ApolloScape Dataset for Autonomous Driving”. In: CVPR Workshops. 2018, pp. 1067–1037

work page 2018
[23]

Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geom- etry and Semantics

Alex Kendall, Yarin Gal, and Roberto Cipolla. “Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geom- etry and Semantics”. In: CVPR. 2018, pp. 7482–7491

work page 2018
[24]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimization”. In: ICLR. 2015, pp. 1–15

work page 2015
[25]

Evaluation of CNN-based Single-Image Depth Esti- mation Methods

Tobias Koch, Lukas Liebel, Friedrich Fraundorfer, and Marco Körner. “Evaluation of CNN-based Single-Image Depth Esti- mation Methods”. In: ECCV Workshops. 2018, pp. 331–348

work page 2018
[26]

Pixel-wise Attentional Gat- ing for Scene parsing

Shu Kong and Charless Fowlkes. “Pixel-wise Attentional Gat- ing for Scene parsing”. In: WACV. 2019, pp. 1024–1033

work page 2019
[27]

Semi- Supervised Deep Learning for Monocular Depth Map Predic- tion

Yevhen Kuznietsov, Jorg Stuckler, and Bastian Leibe. “Semi- Supervised Deep Learning for Monocular Depth Map Predic- tion”. In: CVPR. 2017, pp. 2215–2223

work page 2017
[28]

Deeper depth prediction with fully convolutional residual networks

Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, and Nassir Navab. “Deeper depth prediction with fully convolutional residual networks”. In: 3DV. 2016, pp. 239–248

work page 2016
[29]

Monocular depth estima- tion with hierarchical fusion of dilated CNNs and soft-weighted- sum inference

Bo Li, Yuchao Dai, and Mingyi He. “Monocular depth estima- tion with hierarchical fusion of dilated CNNs and soft-weighted- sum inference”. In: Pattern Recognit. 83 (2018), pp. 328–339. 8

work page 2018
[30]

A Two-Streamed Network for Estimating Fine-Scaled Depth Maps From Single RGB Images

Jun Li, Reinhard Klein, and Angela Yao. “A Two-Streamed Network for Estimating Fine-Scaled Depth Maps From Single RGB Images”. In: CVPR. 2017, pp. 3372–3380

work page 2017
[31]

Deep attention-based classiﬁcation network for robust depth prediction

Ruibo Li, Ke Xian, Chunhua Shen, Zhiguo Cao, Hao Lu, and Lingxiao Hang. “Deep attention-based classiﬁcation network for robust depth prediction”. In: ACCV. (ACCV). 2018, pp. 1–

work page 2018
[32]

MegaDepth: Learning Single- View Depth Prediction From Internet Photos

Zhengqi Li and Noah Snavely. “MegaDepth: Learning Single- View Depth Prediction From Internet Photos”. In: CVPR. 2018, pp. 2041–2050

work page 2018
[33]

Auxiliary Tasks in Multi-task Learning

Lukas Liebel and Marco Körner. “Auxiliary Tasks in Multi- task Learning”. In: (2018), pp. 1–8. arXiv: 1805.06334v2 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2018
[34]

PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image

Chen Liu, Jimei Yang, Duygu Ceylan, Ersin Yumer, and Yasu- taka Furukawa. “PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image”. In: CVPR. 2018, pp. 2579–2588

work page 2018
[35]

Deep convolu- tional neural ﬁelds for depth estimation from a single image

Fayao Liu, Chunhua Shen, and Guosheng Lin. “Deep convolu- tional neural ﬁelds for depth estimation from a single image”. In: CVPR. 2015, pp. 5162–5170

work page 2015
[36]

Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Arun Mallya, Dillon Davis, and Svetlana Lazebnik. “Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights”. In: ECCV. 2018, pp. 72–88

work page 2018
[37]

GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation

Xiaojuan Qi, Renjie Liao, Zhengzhe Liu, Raquel Urtasun, and Jiaya Jia. “GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation”. In: CVPR. 2018, pp. 283– 291

work page 2018
[38]

Cross-Domain Self- Supervised Multi-Task Feature Learning Using Synthetic Im- agery

Zhongzheng Ren and Yong Jae Lee. “Cross-Domain Self- Supervised Multi-Task Feature Learning Using Synthetic Im- agery”. In: CVPR. 2018, pp. 762–771

work page 2018
[39]

Train Here, Deploy There: Robust Segmentation in Unseen Domains

E. Romera, L. M. Bergasa, J. M. Alvarez, and M. Trivedi. “Train Here, Deploy There: Robust Segmentation in Unseen Domains”. In: IV. 2018, pp. 1828–1833

work page 2018
[40]

An Overview of Multi-Task Learning in Deep Neural Networks

Sebastian Ruder. “An Overview of Multi-Task Learning in Deep Neural Networks”. In: (2017), pp. 1–14. arXiv: 1706. 05098v1 [cs.LG]

work page 2017
[41]

Multi-Task Learning as Multi- Objective Optimization

Ozan Sener and Vladlen Koltun. “Multi-Task Learning as Multi- Objective Optimization”. In: NIPS. 2018, pp. 525–536

work page 2018
[42]

Cyclical Learning Rates for Training Neural Networks

L. N. Smith. “Cyclical Learning Rates for Training Neural Networks”. In: WACV. 2017, pp. 464–472

work page 2017
[43]

On the Importance of Stereo for Accurate Depth Estima- tion: An Efﬁcient Semi-Supervised Deep Neural Network Ap- proach

Nikolai Smolyanskiy, Alexey Kamenev, and Stan Birchﬁeld. “On the Importance of Stereo for Accurate Depth Estima- tion: An Efﬁcient Semi-Supervised Deep Neural Network Ap- proach”. In: CVPR Workshops. 2018, pp. 1120–1128

work page 2018
[44]

MultiNet: Real-time Joint Se- mantic Reasoning for Autonomous Driving

Marvin Teichmann, Michael Weber, J. Marius Zöllner, Roberto Cipolla, and Raquel Urtasun. “MultiNet: Real-time Joint Se- mantic Reasoning for Autonomous Driving”. In: IV. 2018, pp. 1013–1020

work page 2018
[45]

Sparsity Invariant CNNs

Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, and Andreas Geiger. “Sparsity Invariant CNNs”. In: 3DV. 2017, pp. 11–20

work page 2017
[46]

PAD- Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing

Dan Xu, Wanli Ouyang, Xiaogang Wang, and Nicu Sebe. “PAD- Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing”. In: CVPR. 2018, pp. 675–684

work page 2018
[47]

SegStereo: Exploiting Semantic Information for Disparity Estimation

Guorun Yang, Hengshuang Zhao, Jianping Shi, Zhidong Deng, and Jiaya Jia. “SegStereo: Exploiting Semantic Information for Disparity Estimation”. In: ECCV. 2018, pp. 660–676

work page 2018
[48]

Multi-Scale Context Aggrega- tion by Dilated Convolutions

Fisher Yu and Vladlen Koltun. “Multi-Scale Context Aggrega- tion by Dilated Convolutions”. In: ICLR. 2016, pp. 1–13

work page 2016
[49]

Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation

Zhenyu Zhang, Zhen Cui, Chunyan Xu, Zequn Jie, Xiang Li, and Jian Yang. “Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation”. In: ECCV. 2018, pp. 238– 255

work page 2018
[50]

Deep hierarchical guidance and regularization learn- ing for end-to-end depth estimation

Zhenyu Zhang, Chunyan Xu, Jian Yang, Ying Tai, and Liang Chen. “Deep hierarchical guidance and regularization learn- ing for end-to-end depth estimation”. In: Pattern Recognit. 83 (2018), pp. 430–442

work page 2018
[51]

Pyramid Scene Parsing Network

Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. “Pyramid Scene Parsing Network”. In: CVPR. 2017, pp. 6230–6239

work page 2017
[52]

A Modulation Module for Multi-task Learn- ing with Applications in Image Retrieval

Xiangyun Zhao, Haoxiang Li, Xiaohui Shen, Xiaodan Liang, and Ying Wu. “A Modulation Module for Multi-task Learn- ing with Applications in Image Retrieval”. In: ECCV. 2018, pp. 415–432

work page 2018
[53]

OmniDepth: Dense Depth Estimation for In- doors Spherical Panoramas

Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, and Petros Daras. “OmniDepth: Dense Depth Estimation for In- doors Spherical Panoramas”. In: ECCV. 2018, pp. 453–471. 9

work page 2018

[1] [1]

Estimating Depth From Monocu- lar Images as Classiﬁcation Using Deep Fully Convolutional Residual Networks

Y . Cao, Z. Wu, and C. Shen. “Estimating Depth From Monocu- lar Images as Classiﬁcation Using Deep Fully Convolutional Residual Networks”. In: TCSVT 28.11 (2018), pp. 3174–3182

work page 2018

[2] [2]

Multitask Learning

Rich Caruana. “Multitask Learning”. In: Machine Learning 28.1 (1997), pp. 41–75

work page 1997

[3] [3]

Multitask Learning: A Knowledge-Based Source of Inductive Bias

Richard Caruana. “Multitask Learning: A Knowledge-Based Source of Inductive Bias”. In: ICML. 1993, pp. 41–48

work page 1993

[4] [4]

Deep MANTA: A Coarse-To- Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis From Monocular Image

Florian Chabot, Mohamed Chaouch, Jaonary Rabarisoa, Celine Teuliere, and Thierry Chateau. “Deep MANTA: A Coarse-To- Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis From Monocular Image”. In: CVPR. 2017, pp. 1827–1836

work page 2017

[5] [5]

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. “Semantic Image Segmenta- tion with Deep Convolutional Nets and Fully Connected CRFs”. In: ICLR. 2015, pp. 1–14. arXiv: 1412.7062v4 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2015

[6] [6]

Multi- View 3D Object Detection Network for Autonomous Driving

Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia. “Multi- View 3D Object Detection Network for Autonomous Driving”. In: CVPR. 2017, pp. 6526–6534

work page 2017

[7] [7]

Depth Esti- mation via Afﬁnity Learned with Convolutional Spatial Propa- gation Network

Xinjing Cheng, Peng Wang, and Ruigang Yang. “Depth Esti- mation via Afﬁnity Learned with Convolutional Spatial Propa- gation Network”. In: ECCV. 2018, pp. 108–125

work page 2018

[8] [8]

AuxNet: Auxiliary tasks enhanced Semantic Segmentation for Automated Driving

Sumanth Chennupati, Ganesh Sistu, Senthil Yogamani, and Samir Rawashdeh. “AuxNet: Auxiliary tasks enhanced Seman- tic Segmentation for Automated Driving”. In: VISAPP. 2019, pp. 1–8. arXiv: 1901.05808v1 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2019

[9] [9]

The Cityscapes Dataset for Semantic Urban Scene Understanding

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Re- hfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. “The Cityscapes Dataset for Semantic Urban Scene Understanding”. In: CVPR. 2016, pp. 3213–3223

work page 2016

[10] [10]

Predicting Depth, Surface Nor- mals and Semantic Labels With a Common Multi-Scale Con- volutional Architecture

David Eigen and Rob Fergus. “Predicting Depth, Surface Nor- mals and Semantic Labels With a Common Multi-Scale Con- volutional Architecture”. In: ICCV. 2015, pp. 2650–2658

work page 2015

[11] [11]

Depth Map Prediction from a Single Image using a Multi-Scale Deep Net- work

David Eigen, Christian Puhrsch, and Rob Fergus. “Depth Map Prediction from a Single Image using a Multi-Scale Deep Net- work”. In: NIPS. 2014, pp. 2366–2374. 1https://github.com/lukasliebel/MultiDepth

work page 2014

[12] [12]

Deep Ordinal Regression Network for Monocular Depth Estimation

Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Bat- manghelich, and Dacheng Tao. “Deep Ordinal Regression Network for Monocular Depth Estimation”. In: CVPR. 2018, pp. 2002–2011

work page 2018

[13] [13]

Monocular Depth Estimation with Afﬁnity, Vertical Pooling, and Label Enhancement

Yukang Gan, Xiangyu Xu, Wenxiu Sun, and Liang Lin. “Monocular Depth Estimation with Afﬁnity, Vertical Pooling, and Label Enhancement”. In: ECCV. 2018, pp. 232–247

work page 2018

[14] [14]

Vision meets robotics: The KITTI dataset

A Geiger, P Lenz, C Stiller, and R Urtasun. “Vision meets robotics: The KITTI dataset”. In: Int. J. Robotics Res. 32.11 (2013), pp. 1231–1237

work page 2013

[15] [15]

Unsupervised Monocular Depth Estimation With Left-Right Consistency

Clement Godard, Oisin Mac Aodha, and Gabriel J. Brostow. “Unsupervised Monocular Depth Estimation With Left-Right Consistency”. In: CVPR. 2017, pp. 6602–6611

work page 2017

[16] [16]

Deep Learning

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. http : / / www . deeplearningbook . org. MIT Press, 2016

work page 2016

[17] [17]

Dynamic Task Prioritization for Multitask Learning

Michelle Guo, Albert Haque, De-An Huang, Serena Yeung, and Li Fei-Fei. “Dynamic Task Prioritization for Multitask Learning”. In: ECCV. 2018, pp. 282–299

work page 2018

[18] [18]

Learning Monocular Depth by Distilling Cross- domain Stereo Networks

Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, and Xi- aogang Wang. “Learning Monocular Depth by Distilling Cross- domain Stereo Networks”. In: ECCV. 2018, pp. 506–523

work page 2018

[19] [19]

Monocular Depth Estima- tion by Learning from Heterogeneous Datasets

Akhil Gurram, Onay Urfalioglu, Ibrahim Halfaoui, Fahd Bouzaraa, and Antonio M. Lopez. “Monocular Depth Estima- tion by Learning from Heterogeneous Datasets”. In: IV. 2018, pp. 2176–2181

work page 2018

[20] [20]

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition”. In: CVPR. 2016, pp. 770–778

work page 2016

[21] [21]

Monocular Depth Estimation Using Whole Strip Masking and Reliability-Based Reﬁnement

Minhyeok Heo, Jaehan Lee, Kyung-Rae Kim, Han-Ul Kim, and Chang-Su Kim. “Monocular Depth Estimation Using Whole Strip Masking and Reliability-Based Reﬁnement”. In: ECCV. 2018, pp. 39–55

work page 2018

[22] [22]

The ApolloScape Dataset for Autonomous Driving

Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. “The ApolloScape Dataset for Autonomous Driving”. In: CVPR Workshops. 2018, pp. 1067–1037

work page 2018

[23] [23]

Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geom- etry and Semantics

Alex Kendall, Yarin Gal, and Roberto Cipolla. “Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geom- etry and Semantics”. In: CVPR. 2018, pp. 7482–7491

work page 2018

[24] [24]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimization”. In: ICLR. 2015, pp. 1–15

work page 2015

[25] [25]

Evaluation of CNN-based Single-Image Depth Esti- mation Methods

Tobias Koch, Lukas Liebel, Friedrich Fraundorfer, and Marco Körner. “Evaluation of CNN-based Single-Image Depth Esti- mation Methods”. In: ECCV Workshops. 2018, pp. 331–348

work page 2018

[26] [26]

Pixel-wise Attentional Gat- ing for Scene parsing

Shu Kong and Charless Fowlkes. “Pixel-wise Attentional Gat- ing for Scene parsing”. In: WACV. 2019, pp. 1024–1033

work page 2019

[27] [27]

Semi- Supervised Deep Learning for Monocular Depth Map Predic- tion

Yevhen Kuznietsov, Jorg Stuckler, and Bastian Leibe. “Semi- Supervised Deep Learning for Monocular Depth Map Predic- tion”. In: CVPR. 2017, pp. 2215–2223

work page 2017

[28] [28]

Deeper depth prediction with fully convolutional residual networks

Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, and Nassir Navab. “Deeper depth prediction with fully convolutional residual networks”. In: 3DV. 2016, pp. 239–248

work page 2016

[29] [29]

Monocular depth estima- tion with hierarchical fusion of dilated CNNs and soft-weighted- sum inference

Bo Li, Yuchao Dai, and Mingyi He. “Monocular depth estima- tion with hierarchical fusion of dilated CNNs and soft-weighted- sum inference”. In: Pattern Recognit. 83 (2018), pp. 328–339. 8

work page 2018

[30] [30]

A Two-Streamed Network for Estimating Fine-Scaled Depth Maps From Single RGB Images

Jun Li, Reinhard Klein, and Angela Yao. “A Two-Streamed Network for Estimating Fine-Scaled Depth Maps From Single RGB Images”. In: CVPR. 2017, pp. 3372–3380

work page 2017

[31] [31]

Deep attention-based classiﬁcation network for robust depth prediction

Ruibo Li, Ke Xian, Chunhua Shen, Zhiguo Cao, Hao Lu, and Lingxiao Hang. “Deep attention-based classiﬁcation network for robust depth prediction”. In: ACCV. (ACCV). 2018, pp. 1–

work page 2018

[32] [32]

MegaDepth: Learning Single- View Depth Prediction From Internet Photos

Zhengqi Li and Noah Snavely. “MegaDepth: Learning Single- View Depth Prediction From Internet Photos”. In: CVPR. 2018, pp. 2041–2050

work page 2018

[33] [33]

Auxiliary Tasks in Multi-task Learning

Lukas Liebel and Marco Körner. “Auxiliary Tasks in Multi- task Learning”. In: (2018), pp. 1–8. arXiv: 1805.06334v2 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2018

[34] [34]

PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image

Chen Liu, Jimei Yang, Duygu Ceylan, Ersin Yumer, and Yasu- taka Furukawa. “PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image”. In: CVPR. 2018, pp. 2579–2588

work page 2018

[35] [35]

Deep convolu- tional neural ﬁelds for depth estimation from a single image

Fayao Liu, Chunhua Shen, and Guosheng Lin. “Deep convolu- tional neural ﬁelds for depth estimation from a single image”. In: CVPR. 2015, pp. 5162–5170

work page 2015

[36] [36]

Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Arun Mallya, Dillon Davis, and Svetlana Lazebnik. “Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights”. In: ECCV. 2018, pp. 72–88

work page 2018

[37] [37]

GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation

Xiaojuan Qi, Renjie Liao, Zhengzhe Liu, Raquel Urtasun, and Jiaya Jia. “GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation”. In: CVPR. 2018, pp. 283– 291

work page 2018

[38] [38]

Cross-Domain Self- Supervised Multi-Task Feature Learning Using Synthetic Im- agery

Zhongzheng Ren and Yong Jae Lee. “Cross-Domain Self- Supervised Multi-Task Feature Learning Using Synthetic Im- agery”. In: CVPR. 2018, pp. 762–771

work page 2018

[39] [39]

Train Here, Deploy There: Robust Segmentation in Unseen Domains

E. Romera, L. M. Bergasa, J. M. Alvarez, and M. Trivedi. “Train Here, Deploy There: Robust Segmentation in Unseen Domains”. In: IV. 2018, pp. 1828–1833

work page 2018

[40] [40]

An Overview of Multi-Task Learning in Deep Neural Networks

Sebastian Ruder. “An Overview of Multi-Task Learning in Deep Neural Networks”. In: (2017), pp. 1–14. arXiv: 1706. 05098v1 [cs.LG]

work page 2017

[41] [41]

Multi-Task Learning as Multi- Objective Optimization

Ozan Sener and Vladlen Koltun. “Multi-Task Learning as Multi- Objective Optimization”. In: NIPS. 2018, pp. 525–536

work page 2018

[42] [42]

Cyclical Learning Rates for Training Neural Networks

L. N. Smith. “Cyclical Learning Rates for Training Neural Networks”. In: WACV. 2017, pp. 464–472

work page 2017

[43] [43]

On the Importance of Stereo for Accurate Depth Estima- tion: An Efﬁcient Semi-Supervised Deep Neural Network Ap- proach

Nikolai Smolyanskiy, Alexey Kamenev, and Stan Birchﬁeld. “On the Importance of Stereo for Accurate Depth Estima- tion: An Efﬁcient Semi-Supervised Deep Neural Network Ap- proach”. In: CVPR Workshops. 2018, pp. 1120–1128

work page 2018

[44] [44]

MultiNet: Real-time Joint Se- mantic Reasoning for Autonomous Driving

Marvin Teichmann, Michael Weber, J. Marius Zöllner, Roberto Cipolla, and Raquel Urtasun. “MultiNet: Real-time Joint Se- mantic Reasoning for Autonomous Driving”. In: IV. 2018, pp. 1013–1020

work page 2018

[45] [45]

Sparsity Invariant CNNs

Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, and Andreas Geiger. “Sparsity Invariant CNNs”. In: 3DV. 2017, pp. 11–20

work page 2017

[46] [46]

PAD- Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing

Dan Xu, Wanli Ouyang, Xiaogang Wang, and Nicu Sebe. “PAD- Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing”. In: CVPR. 2018, pp. 675–684

work page 2018

[47] [47]

SegStereo: Exploiting Semantic Information for Disparity Estimation

Guorun Yang, Hengshuang Zhao, Jianping Shi, Zhidong Deng, and Jiaya Jia. “SegStereo: Exploiting Semantic Information for Disparity Estimation”. In: ECCV. 2018, pp. 660–676

work page 2018

[48] [48]

Multi-Scale Context Aggrega- tion by Dilated Convolutions

Fisher Yu and Vladlen Koltun. “Multi-Scale Context Aggrega- tion by Dilated Convolutions”. In: ICLR. 2016, pp. 1–13

work page 2016

[49] [49]

Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation

Zhenyu Zhang, Zhen Cui, Chunyan Xu, Zequn Jie, Xiang Li, and Jian Yang. “Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation”. In: ECCV. 2018, pp. 238– 255

work page 2018

[50] [50]

Deep hierarchical guidance and regularization learn- ing for end-to-end depth estimation

Zhenyu Zhang, Chunyan Xu, Jian Yang, Ying Tai, and Liang Chen. “Deep hierarchical guidance and regularization learn- ing for end-to-end depth estimation”. In: Pattern Recognit. 83 (2018), pp. 430–442

work page 2018

[51] [51]

Pyramid Scene Parsing Network

Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. “Pyramid Scene Parsing Network”. In: CVPR. 2017, pp. 6230–6239

work page 2017

[52] [52]

A Modulation Module for Multi-task Learn- ing with Applications in Image Retrieval

Xiangyun Zhao, Haoxiang Li, Xiaohui Shen, Xiaodan Liang, and Ying Wu. “A Modulation Module for Multi-task Learn- ing with Applications in Image Retrieval”. In: ECCV. 2018, pp. 415–432

work page 2018

[53] [53]

OmniDepth: Dense Depth Estimation for In- doors Spherical Panoramas

Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, and Petros Daras. “OmniDepth: Dense Depth Estimation for In- doors Spherical Panoramas”. In: ECCV. 2018, pp. 453–471. 9

work page 2018