To Learn or Not to Learn: Analyzing the Role of Learning for Navigation in Virtual Environments

Jia Deng; Noriyuki Kojima

arxiv: 1907.11770 · v1 · pith:EMDGNKHGnew · submitted 2019-07-26 · 💻 cs.CV


Pith reviewed 2026-05-24 15:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords navigation · virtual environments · classical methods · learning-based agents · collision avoidance · memory management · MINOS benchmark · Stanford 3D Indoor Spaces


The pith

Classical navigation agents outperform state-of-the-art learning-based agents on two standard virtual environment benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs classical navigation agents and shows they surpass current learning-based methods on the MINOS and Stanford Large-Scale 3D Indoor Spaces benchmarks. It then breaks down the performance gaps to identify where each approach succeeds or fails. Learned agents prove weaker at avoiding collisions and managing memory but stronger when environments contain ambiguity or noise. The comparison supplies concrete evidence that can shape how future navigation systems are built. Readers in robotics and AI care because the work questions whether learning is always the better route for this task.

Core claim

Classical navigation agents outperform state-of-the-art learning-based agents on the MINOS and Stanford Large-Scale 3D Indoor Spaces benchmarks. Learned agents show inferior collision avoidance and memory management yet handle ambiguity and noise better than classical agents. These observations can directly inform the design of improved navigation agents.

What carries the argument

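The argument rests on the classical navigation agents the authors construct (each built from a mapper, a localizer, a planner and a controller) and evaluate as direct baselines against learning-based methods on the two benchmarks.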

If this is right

Navigation design should target better collision avoidance and memory use in learned agents.
Classical methods remain competitive when environments are structured and low-noise.
Hybrid systems could combine classical collision handling with learned tolerance for ambiguity.
Benchmark results for navigation should include explicit classical baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Classical methods may reduce the need for large training datasets in controlled virtual settings.
The noise-handling advantage of learning suggests it could prove stronger in real-world sensor data.
Repeating the comparison on new benchmarks would test whether the classical advantage generalizes.

Load-bearing premise

The classical agents built for the study fairly represent what classical navigation methods can achieve without hidden implementation advantages.

What would settle it

An experiment on the same two benchmarks in which the identical learning-based agents beat the paper's classical agents after both are re-implemented with equal care would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 1907.11770 by Jia Deng, Noriyuki Kojima.

**Figure 1.** Classical Navigation Agent: We construct a classical navigation agent that consists of a mapper, a localizer, a planner and a controller. See Sec 4 for more details. Inspired by its success in many domains of AI, deep learning has emerged as a promising alternative to classical methods for navigation [10, 11, 12, 14, 26]. Deep learning is attractive in that with sufficient data, effective solutions can eme…

**Figure 2.** Success and failure examples in MINOS and S3DIS: We visualize example success and failure cases of UNREAL and CMP. Blue dots and magenta dots in the figures are the goals and starts, respectively. Red dots are trajectories of the agents. Top row: Trajectories of UNREAL in MINOS. The left image is a success episode, the middle image is a failed episode due to collisions, and the right image is a failed episode d…

**Figure 3.** Effects of noisy depth on the classical agent in MINOS: We visualize the effects of depth noise coming from the FCRN depth estimator. For the middle and right columns, the dark gray regions are predicted obstacles, and the light gray regions are predicted free space. The blue dot is the goal, and the red dots show a trajectory of an agent. Left: an input RGB image and the depth predicted by FCRN. Middl…

**Figure 4.** All methods under Gaussian noise: We report the success rate of all agents on MINOS and S3DIS under different Gaussian noise levels. We use dotted lines, dashed lines and solid lines to show results for the validation episodes of the MINOS small house, the MINOS medium house, and the S3DIS 32 steps task, respectively. In MINOS, UNREAL suffers very little from Gaussian noise, while the …
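A noise sweep of the kind Figure 4 reports can be pictured with a minimal sketch. It assumes, purely for illustration, that zero-mean Gaussian noise is added to depth readings; the caption does not say which signal is perturbed. The function name `add_gaussian_noise`, the sample depths, and the sigma values are hypothetical and are not taken from the paper.

```python
import random


def add_gaussian_noise(depths, sigma, rng):
    """Perturb each depth reading with zero-mean Gaussian noise of std `sigma`.

    Readings are clamped at zero so a noisy depth never becomes negative.
    """
    return [max(0.0, d + rng.gauss(0.0, sigma)) for d in depths]


rng = random.Random(0)          # fixed seed so the sweep is repeatable
clean_depths = [1.0, 2.5, 4.0]  # hypothetical depth readings in metres

# Hypothetical noise levels; the levels used in the paper are not listed here.
sweep = {sigma: add_gaussian_noise(clean_depths, sigma, rng)
         for sigma in (0.0, 0.1, 0.5)}
```

In an evaluation loop, each entry of `sweep` would stand in for the observations an agent receives at that noise level, and the success rate would be recorded per level.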

**Figure 5.** Examples of ambiguity / complexity: The magenta dot is the start, and the blue dot is the goal. Top row: The left map is an ambiguous task (7.5 ambiguity score) and the right map is an unambiguous task (1.0 ambiguity score). The colors in the figure show a heatmap of 2D-MC’s trajectories. Bottom row: The left map is an episode with high complexity (10 turns), and the right map is an episode with low comple…

**Figure 6.** Classical Navigation Pipeline in MINOS: The figure above illustrates the classical navigation pipeline constructed for the MINOS environment. We describe details in Section 4 and A1. Diagram labels: depth image, camera parameters (focal length, field of view, camera elevation angle), pose (x_t, y_t, Θ_t), goal position, analytic mapper, vertical / horizontal occupancy map, planner, controller, environment, and an example action at t=1 ("Go Forward 0.4 m"). …
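The diagram labels in Figure 6 name a depth image, camera parameters and a pose as inputs to an analytic mapper that produces an occupancy map. The sketch below shows one way such a step could work. It is not the paper's mapper: it assumes a pinhole camera, z-depth values (distance along the optical axis), a single horizontal row of the depth image, and a flat 2D world, and it ignores the camera elevation angle. The function `depth_row_to_cells` and all numeric values are hypothetical.

```python
import math


def depth_row_to_cells(depth_row, pose, fov_deg, cell_size):
    """Back-project one row of z-depth values into occupied grid cells.

    pose is (x, y, theta) in the world frame, theta in radians (counter-clockwise).
    """
    x0, y0, theta = pose
    width = len(depth_row)
    cx = (width - 1) / 2.0                                   # principal point (column)
    focal = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    cells = set()
    for u, z in enumerate(depth_row):
        if z <= 0.0:
            continue                                         # skip invalid readings
        left = -z * (u - cx) / focal                         # lateral offset, left is positive
        wx = x0 + z * math.cos(theta) - left * math.sin(theta)
        wy = y0 + z * math.sin(theta) + left * math.cos(theta)
        cells.add((math.floor(wx / cell_size), math.floor(wy / cell_size)))
    return cells


# A wall 2 m ahead of an agent at the origin facing along +x, 90 degree field of view.
example_cells = depth_row_to_cells([2.0, 2.0, 2.0], (0.0, 0.0, 0.0), 90.0, 0.5)
```

Every returned cell shares the same x index here because a flat wall perpendicular to the viewing direction has constant z-depth.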

**Figure 7.** Classical Navigation Pipeline in S3DIS: The figure above illustrates the classical navigation pipeline constructed for the S3DIS environment. We describe details in Section 4 and A2. … described in the paper, we convert the 2D occupancy map into a directed graph; a cell in the map corresponds to a node in the graph. We calculate the weight of an edge from node A to B by taking the sum of (1) the weight of a cell…
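The grid-to-graph conversion mentioned in the Figure 7 caption can be sketched as a shortest-path search in which every free cell is a node. The caption's edge-weight formula is truncated, so the sketch does not reproduce it: as a stated assumption, the weight of the edge from A to B is simply the cost of entering cell B, which makes the graph directed. The 4-connected neighbourhood, the use of Dijkstra's algorithm, the function name `shortest_path_cost`, and the grid values are likewise assumptions for illustration.

```python
import heapq


def shortest_path_cost(cell_cost, start, goal):
    """Dijkstra over a 4-connected grid; cell_cost[r][c] is None for obstacles.

    Returns the cheapest total cost from start to goal, or None if unreachable.
    """
    rows, cols = len(cell_cost), len(cell_cost[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue                                  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and cell_cost[nr][nc] is not None:
                # Assumed edge weight A -> B: the cost of entering cell B.
                new_cost = cost + cell_cost[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    return None


# Hypothetical cost map: None marks an obstacle, 5.0 a cell that is costly to enter.
grid = [
    [1.0, 1.0, 1.0],
    [1.0, None, 5.0],
    [1.0, 1.0, 1.0],
]
route_cost = shortest_path_cost(grid, (0, 0), (2, 2))
```

With these values the planner prefers the route down the left column and along the bottom row, since the alternative passes through the expensive cell.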

**Figure 8.** Idealized 2D Monte Carlo Agent Pipeline: The figure above illustrates the 2D-MC agent proposed for experiments in Section 8.1 of the paper. A magenta dot in a map shows the start location, a blue dot indicates the goal location, and an orange dot is a sampled subgoal. Red regions on the maps show the observed free space, and green regions show frontiers. Finally, an emerald line illustrates a planned path …
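The frontiers mentioned in the Figure 8 caption can be illustrated with a small sketch. It uses a common definition, assumed here rather than taken from the paper: a frontier is an observed free cell with at least one unobserved 4-neighbour. The cell labels `FREE`, `OBSTACLE`, `UNKNOWN`, the function `find_frontiers`, and the example map are hypothetical.

```python
FREE, OBSTACLE, UNKNOWN = 0, 1, 2  # hypothetical cell labels


def find_frontiers(grid):
    """Return free cells that touch at least one unknown cell (4-connectivity)."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break                      # one unknown neighbour is enough
    return frontiers


# Hypothetical partially observed map.
observed = [
    [FREE, FREE, UNKNOWN],
    [FREE, OBSTACLE, UNKNOWN],
    [FREE, FREE, FREE],
]
frontier_cells = find_frontiers(observed)
```

A subgoal could then be sampled from `frontier_cells`, for example with `random.choice`, though how the 2D-MC agent samples its subgoals is not stated in the truncated caption.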

Original abstract

In this paper we compare learning-based methods and classical methods for navigation in virtual environments. We construct classical navigation agents and demonstrate that they outperform state-of-the-art learning-based agents on two standard benchmarks: MINOS and Stanford Large-Scale 3D Indoor Spaces. We perform detailed analysis to study the strengths and weaknesses of learned agents and classical agents, as well as how characteristics of the virtual environment impact navigation performance. Our results show that learned agents have inferior collision avoidance and memory management, but are superior in handling ambiguity and noise. These results can inform future design of navigation agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper compares classical navigation agents (constructed by the authors) against state-of-the-art learning-based agents on the MINOS and Stanford Large-Scale 3D Indoor Spaces benchmarks. It reports that the classical agents outperform the learning-based ones and provides an analysis of qualitative differences: learned agents show weaker collision avoidance and memory management but handle ambiguity and noise better. The work includes implementation details for both agent classes and ablation-style failure-mode analysis.

Significance. If the empirical comparison holds, the result is significant because it demonstrates that well-engineered classical methods can remain competitive with contemporary learning approaches on standard virtual-environment navigation benchmarks and supplies concrete guidance on where learning agents need improvement. The manuscript's provision of explicit implementation details for the classical agents and its ablation-style failure analysis are positive features that increase the reproducibility and utility of the comparison.

minor comments (3)

[Abstract] Abstract and §1: the phrase 'state-of-the-art learning-based agents' should be accompanied by an explicit list (with citations and versions) of the exact learning agents evaluated; while the full text supplies implementation details, a concise enumeration in the abstract or introduction would improve clarity.
[§4] Figure captions and §4: several figures comparing trajectories or failure cases would benefit from explicit scale bars or coordinate annotations so that collision-avoidance and memory differences can be visually quantified by readers.
[§5] §5: the discussion of how environment characteristics affect performance would be strengthened by a short table summarizing the key statistics (e.g., average path length, obstacle density) of the two benchmarks.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work, the assessment of its significance, and the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity


The paper is a purely empirical study that constructs classical navigation agents and compares their performance against learning-based agents on the MINOS and Stanford Large-Scale 3D Indoor Spaces benchmarks. No equations, derivations, fitted parameters presented as predictions, or self-referential claims appear in the abstract or the described content. The central claim rests on reported experimental outcomes and ablation-style analysis rather than on any chain that reduces to its own inputs by construction. Self-citations, if present, are not load-bearing for the comparison results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical benchmark study and introduces no mathematical model, fitted parameters, or new theoretical constructs.



Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 12 internal anchors

[1] P. Anderson, A. Chang, D. S. Chaplot, A. Dosovitskiy, S. Gupta, V. Koltun, J. Kosecka, J. Malik, R. Mottaghi, M. Savva, et al. On evaluation of embodied navigation agents. arXiv preprint arXiv:1807.06757, 2018.
[2] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1534–1543, 2016.
[3] M. Bloesch, J. Czarnowski, R. Clark, S. Leutenegger, and A. J. Davison. Codeslam-learning a compact, optimisable representation for dense visual slam. arXiv preprint arXiv:1804.00874, 2018.
[4] E. Brachmann, A. Krull, S. Nowozin, J. Shotton, F. Michel, S. Gumhold, and C. Rother. Dsac-differentiable ransac for camera localization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 3, 2017.
[5] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
[6] A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Nießner, M. Savva, S. Song, A. Zeng, and Y. Zhang. Matterport3d: Learning from rgb-d data in indoor environments. arXiv preprint arXiv:1709.06158, 2017.
[7] Y. Chen and G. Medioni. Object modelling by registration of multiple range images. Image and vision computing, 10(3):145–155, 1992.
[8] A. J. Davison and D. W. Murray. Mobile robot localisation using active vision. In European Conference on Computer Vision, pages 809–825. Springer, 1998.
[9] G. N. DeSouza and A. C. Kak. Vision for mobile robot navigation: A survey. IEEE transactions on pattern analysis and machine intelligence, 24(2):237–267, 2002.
[10] A. Dosovitskiy and V. Koltun. Learning to act by predicting the future. arXiv preprint arXiv:1611.01779, 2016.
[11] S. Gupta, J. Davidson, S. Levine, R. Sukthankar, and J. Malik. Cognitive mapping and planning for visual navigation. arXiv preprint arXiv:1702.03920, 3, 2017.
[12] S. Gupta, D. Fouhey, S. Levine, and J. Malik. Unifying map and landmark based representations for visual navigation. arXiv preprint arXiv:1712.08125, 2017.
[13] D. Hoiem, Y. Chodpathumwan, and Q. Dai. Diagnosing error in object detectors. In European conference on computer vision, pages 340–353. Springer, 2012.
[14] M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul, J. Z. Leibo, D. Silver, and K. Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. arXiv preprint arXiv:1611.05397, 2016.
[15] K. Kafle and C. Kanan. An analysis of visual question answering algorithms. In Proceedings of the IEEE International Conference on Computer Vision, pages 1965–1973, 2017.
[16] M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaśkowski. Vizdoom: A doom-based ai research platform for visual reinforcement learning. In Computational Intelligence and Games (CIG), 2016 IEEE Conference on, pages 1–8. IEEE, 2016.
[17] A. Kendall, M. Grimes, and R. Cipolla. Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the IEEE international conference on computer vision, pages 2938–2946, 2015.
[18] N. Konstantinova and C. Orasan. Interactive question answering. In Emerging Applications of Natural Language Processing: Concepts and New Research, pages 149–169. IGI Global, 2013.
[19] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab. Deeper depth prediction with fully convolutional residual networks. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 239–248. IEEE, 2016.
[20] Manolis Savva*, Abhishek Kadian*, Oleksandr Maksymets*, Y. Zhao, E. Wijmans, B. Jain, J. Straub, J. Liu, V. Koltun, J. Malik, D. Parikh, and D. Batra. Habitat: A Platform for Embodied AI Research. arXiv preprint arXiv:1904.01201, 2019.
[21] I. Melekhov, J. Ylioinas, J. Kannala, and E. Rahtu. Relative camera pose estimation using convolutional neural networks. In International Conference on Advanced Concepts for Intelligent Vision Systems, pages 675–687. Springer, 2017.
[22] J. Minguez, L. Montesano, and F. Lamiraux. Metric-based iterative closest point scan matching for sensor displacement estimation. IEEE Transactions on Robotics, 22(5):1047–1054, 2006.
[23] D. Mishkin, A. Dosovitskiy, and V. Koltun. Benchmarking classic and learned navigation in complex 3d environments. arXiv preprint arXiv:1901.10915, 2019.
[24] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937, 2016.
[25] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos. Orb-slam: a versatile and accurate monocular slam system. IEEE transactions on robotics, 31(5):1147–1163, 2015.
[26] E. Parisotto and R. Salakhutdinov. Neural map: Structured memory for deep reinforcement learning. arXiv preprint arXiv:1702.08360, 2017.
[27] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell. Curiosity-driven exploration by self-supervised prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 16–17, 2017.
[28] F. Pomerleau, F. Colas, R. Siegwart, and S. Magnenat. Comparing ICP Variants on Real-World Data Sets. Autonomous Robots, 34(3):133–148, Feb. 2013.
[29] M. Savva, A. X. Chang, A. Dosovitskiy, T. Funkhouser, and V. Koltun. Minos: Multimodal indoor simulator for navigation in complex environments. arXiv preprint arXiv:1712.03931, 2017.
[30] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 190–198. IEEE, 2017.
[31] A. Tamar, Y. Wu, G. Thomas, S. Levine, and P. Abbeel. Value iteration networks. In Advances in Neural Information Processing Systems, pages 2154–2162, 2016.
[32] K. Tateno, F. Tombari, I. Laina, and N. Navab. Cnn-slam: Real-time dense monocular slam with learned depth prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, 2017.
[33] S. Thrun, W. Burgard, and D. Fox. Probabilistic robotics. 2005.
[34] Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas. Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224, 2016.
[35] F. Xia, A. R. Zamir, Z. He, A. Sax, J. Malik, and S. Savarese. Gibson env: Real-world perception for embodied agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9068–9079, 2018.
[36] C. Yan, D. Misra, A. Bennett, A. Walsman, Y. Bisk, and Y. Artzi. Chalet: Cornell house agent learning environment. arXiv preprint arXiv:1801.07357, 2018.
[37] J. Zhang, L. Tai, J. Boedecker, W. Burgard, and M. Liu. Neural slam. arXiv preprint arXiv:1706.09520, 2017.
[38] Z. Zhang. Iterative point matching for registration of free-form curves and surfaces. International journal of computer vision, 13(2):119–152, 1994.
