
arxiv: 1906.11407 · v1 · pith:GUSJ2ND7new · submitted 2019-06-27 · 💻 cs.CV · cs.RO

Emergence of Exploratory Look-Around Behaviors through Active Observation Completion

Pith reviewed 2026-05-25 15:17 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords active perception · reinforcement learning · observation completion · look-around behavior · uncertainty reduction · sidekick policy learning · visual exploration · generalization

The pith

Training an agent to complete partial observations by reducing uncertainty produces policies that generalize to useful look-around behaviors in other active perception tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how an agent can learn to acquire informative visual observations on its own. It trains the agent with reinforcement learning, rewarding it for choosing glimpses that reduce uncertainty about the unseen parts of a scene before inferring the full environment. A sidekick policy learning method handles sparse rewards by using extra information available only during training. If this works, exploratory look-around behavior emerges from one self-supervised objective and transfers to multiple perception tasks without needing new rewards or retraining for each.

Core claim

The paper claims that the proposed reinforcement learning methods, which train agents to complete partial observations via uncertainty reduction and use sidekick policy learning, learn observation policies that not only succeed at the completion task but also generalize to exhibit useful look-around behavior for a range of active perception tasks.

What carries the argument

The central mechanism is a reinforcement learning policy trained to select short sequences of glimpses that minimize uncertainty when inferring the full environment, combined with sidekick policy learning that exploits greater observability at training time.
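
The glimpse-selection loop described above can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: the one-number-per-view `viewgrid`, the mean-fill `reconstruct` function, the per-step "error reduction" reward, and both example policies are assumptions made here for clarity.

```python
import random

def reconstruct(observed, num_views, prior=0.0):
    """Fill unseen views with the mean of the seen ones (assumed toy decoder)."""
    fill = sum(observed.values()) / len(observed) if observed else prior
    return [observed.get(i, fill) for i in range(num_views)]

def completion_error(viewgrid, observed):
    """Mean squared error between the reconstruction and the true viewgrid."""
    recon = reconstruct(observed, len(viewgrid))
    return sum((r - v) ** 2 for r, v in zip(recon, viewgrid)) / len(viewgrid)

def run_episode(viewgrid, policy, num_glimpses, rng):
    """Take `num_glimpses` glimpses; reward each step by the drop in error."""
    observed = {}
    rewards = []
    prev_error = completion_error(viewgrid, observed)
    for _ in range(num_glimpses):
        view = policy(observed, len(viewgrid), rng)
        observed[view] = viewgrid[view]
        error = completion_error(viewgrid, observed)
        rewards.append(prev_error - error)  # assumed reward: error reduction
        prev_error = error
    return rewards, prev_error

def random_policy(observed, num_views, rng):
    """Baseline: pick any view, possibly one already seen."""
    return rng.randrange(num_views)

def unseen_policy(observed, num_views, rng):
    """Hand-written stand-in for a learned policy: prefer unseen views."""
    unseen = [i for i in range(num_views) if i not in observed]
    return rng.choice(unseen) if unseen else rng.randrange(num_views)
```

Because the per-step rewards telescope, their sum equals the initial error minus the final error, so maximizing the episode return in this sketch is the same as minimizing the final completion error.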

If this is right

  • The policies succeed at the trained task of inferring full scenes from partial glimpses.
  • The same policies exhibit useful look-around behavior on other active perception tasks without retraining.
  • Exploratory behavior arises without designing separate rewards for each new task.
  • Sidekick policy learning mitigates sparse rewards during the initial training phase.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Uncertainty reduction may act as a broadly useful objective for bootstrapping exploration in visual agents.
  • The training setup could apply in simulation environments where full scene access is available only while learning the policy.
  • Similar single-objective training might transfer to related problems like active object search or mapping.

Load-bearing premise

That training solely to reduce uncertainty in observation completion will produce exploratory policies that transfer to other active perception tasks without task-specific rewards or fine-tuning.

What would settle it

A direct test where the learned policies show no performance gain over random glimpse selection or non-exploratory baselines on a held-out active perception task such as object classification from limited views.
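
One way to run such a test is a paired comparison of per-episode scores on the held-out task. The sketch below is an assumption about how that check could be set up, not a procedure taken from the paper: the function names, the bootstrap with 2000 resamples, and the 95% interval are all choices made here for illustration.

```python
import random
import statistics

def paired_bootstrap_gain(policy_scores, baseline_scores, num_resamples=2000, seed=0):
    """Return (mean gain, 95% bootstrap interval) for paired per-episode scores."""
    if len(policy_scores) != len(baseline_scores):
        raise ValueError("scores must be paired per episode")
    diffs = [p - b for p, b in zip(policy_scores, baseline_scores)]
    rng = random.Random(seed)
    means = []
    for _ in range(num_resamples):
        # Resample episodes with replacement and record the mean gain.
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(statistics.fmean(sample))
    means.sort()
    lower = means[int(0.025 * num_resamples)]
    upper = means[int(0.975 * num_resamples) - 1]
    return statistics.fmean(diffs), (lower, upper)

def shows_gain(policy_scores, baseline_scores):
    """True when the lower bound on the gain over the baseline is above zero."""
    _, (lower, _) = paired_bootstrap_gain(policy_scores, baseline_scores)
    return lower > 0.0
```

Here `policy_scores` would hold, for example, per-episode classification accuracy under the learned look-around policy, and `baseline_scores` the accuracy on the same episodes under random glimpse selection. If `shows_gain` returns `False` on a held-out task, the transfer claim fails the test described above.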

Figures

Figures reproduced from arXiv: 1906.11407 by Dinesh Jayaraman, Kristen Grauman, Santhosh K. Ramakrishnan.

Figure 1: Looking around efficiently is a complex task requiring the ability to reason about …
Figure 2: Approach overview: The agent (actor) encodes individual views from the environment …
Figure 3: Scene and object completion accuracy under different agent behaviors. Top plots …
Figure 4: Episodes of active observation completion for SUN360 (left) and ModelNet (right). …
Figure 5: Three examples of reconstructions after T = 6 glimpses (in order to generate more complete images). The first column shows the ground-truth viewgrids (equirectangular projections for SUN), the second column shows the corresponding GAN-refined reconstructions of lookaround and rnd-actions agents, and the third column shows handpicked unseen views (marked on the ground-truth) and the corresponding angles. P…
Figure 6: The ground truth 360 panorama or viewgrid, agent glimpse inputs, and final GAN …
Figure 7: Architecture of our active observation completion system. While the input-output …
Original abstract

Standard computer vision systems assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is a major challenge in itself. We address the problem of learning to look around: how can an agent learn to acquire informative visual observations? We propose a reinforcement learning solution, where the agent is rewarded for reducing its uncertainty about the unobserved portions of its environment. Specifically, the agent is trained to select a short sequence of glimpses after which it must infer the appearance of its full environment. To address the challenge of sparse rewards, we further introduce sidekick policy learning, which exploits the asymmetry in observability between training and test time. The proposed methods learn observation policies that not only perform the completion task for which they are trained, but also generalize to exhibit useful "look-around" behavior for a range of active perception tasks.
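
The objective stated in the abstract can be written compactly. The notation below is an assumption introduced here for illustration (the abstract defines no symbols), and the reward is shown in its sparse, end-of-episode form.

```latex
% Assumed notation, not taken from the paper:
%   x          full environment (panorama or viewgrid)
%   g_{1:T}    the T glimpses the agent has seen
%   a_t        camera motion chosen by the policy \pi at step t
%   \hat{x}_T  the agent's reconstruction of x after T glimpses
%   d          a reconstruction distance, e.g. mean squared error
\[
  \pi^{*} \;=\; \arg\max_{\pi}\;
  \mathbb{E}_{x,\; a_{1:T-1} \sim \pi}
  \left[\, -\, d\!\left(\hat{x}_T(g_{1:T}),\, x\right) \right]
\]
```

Under this reading the only learning signal arrives after the final glimpse, which is the sparse-reward difficulty that sidekick policy learning is introduced to address.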

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a reinforcement learning framework in which an agent learns to select short sequences of visual glimpses to reduce uncertainty about unobserved portions of its environment, thereby completing partial observations. To address sparse rewards, the authors introduce sidekick policy learning that exploits an asymmetry in observability between training and test time. The central claim is that policies trained solely on this observation-completion objective generalize zero-shot to exhibit useful exploratory look-around behavior across a range of unrelated active-perception tasks without task-specific rewards or fine-tuning.

Significance. If the zero-shot generalization results are robustly demonstrated, the work would be significant for active vision: it offers a self-supervised route to task-agnostic exploration policies, reducing reliance on hand-crafted rewards for each downstream perception problem. The sidekick technique is a practical contribution to sparse-reward RL in partially observable visual settings.

major comments (2)
  1. [§4] §4 (Experiments): the generalization claim requires explicit zero-shot transfer results on tasks that are demonstrably unrelated to observation completion (e.g., object detection or navigation); without quantitative metrics, baselines, and ablations showing that performance does not collapse when the reconstruction head is removed, the claim that the behavior is task-agnostic remains unverified.
  2. [§3.2] §3.2 (Sidekick policy learning): the formulation must clarify whether the sidekick policy is trained with access to ground-truth full observations only during training or whether any auxiliary loss inadvertently leaks test-time information; if the latter, the learned glimpse-selection policy may overfit to completion-specific uncertainty patterns rather than producing general exploration.
minor comments (2)
  1. [§3] Notation for the uncertainty reward and the glimpse-selection action space should be defined once in §3 and used consistently; currently the abstract and method use slightly different phrasing for the same quantities.
  2. [Figures] Figure captions should state whether error bars represent standard deviation across seeds or across environments, and whether the reported numbers are from the same policy checkpoint used for all downstream tasks.
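
The distinction raised in the second minor comment can be made concrete. The sketch below assumes results are stored as `scores[seed][environment]`; that layout and the function name `error_bars` are hypothetical and only illustrate why the two kinds of error bar differ.

```python
import statistics

def error_bars(scores):
    """scores[seed][env] -> score. Report both candidate error bars."""
    seeds = sorted(scores)
    envs = sorted(scores[seeds[0]])
    # Average over environments first, then measure spread across seeds.
    per_seed = [statistics.fmean(scores[s][e] for e in envs) for s in seeds]
    # Average over seeds first, then measure spread across environments.
    per_env = [statistics.fmean(scores[s][e] for s in seeds) for e in envs]
    return {
        "mean": statistics.fmean(per_seed),
        "std_across_seeds": statistics.stdev(per_seed),
        "std_across_envs": statistics.stdev(per_env),
    }
```

The two standard deviations answer different questions: the first reflects training variability, the second reflects how much difficulty varies between test environments, so a caption needs to say which one is plotted.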

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our contributions. We address each major point below and indicate where revisions will be made to the manuscript.

Point-by-point responses
  1. Referee: [§4] §4 (Experiments): the generalization claim requires explicit zero-shot transfer results on tasks that are demonstrably unrelated to observation completion (e.g., object detection or navigation); without quantitative metrics, baselines, and ablations showing that performance does not collapse when the reconstruction head is removed, the claim that the behavior is task-agnostic remains unverified.

    Authors: The active-perception tasks evaluated in the original manuscript (active object recognition and similar look-around problems) are unrelated to the observation-completion training objective, as they involve different reward structures and goals at test time. Nevertheless, to directly address the concern, the revised manuscript will include additional zero-shot transfer experiments on navigation and object detection, along with the requested quantitative metrics, baselines, and an ablation removing the reconstruction head. This strengthens the evidence without altering the core claims.

    revision: partial

  2. Referee: [§3.2] §3.2 (Sidekick policy learning): the formulation must clarify whether the sidekick policy is trained with access to ground-truth full observations only during training or whether any auxiliary loss inadvertently leaks test-time information; if the latter, the learned glimpse-selection policy may overfit to completion-specific uncertainty patterns rather than producing general exploration.

    Authors: The sidekick policy receives ground-truth full observations exclusively during training to generate dense rewards; at test time the policy has no access to full observations or any auxiliary signals derived from them. No auxiliary loss uses or leaks test-time information. We will revise §3.2 to state this asymmetry explicitly and emphasize that the resulting policy is not specialized to completion-specific uncertainty.

    revision: yes
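
The training/test asymmetry in this response can be illustrated with a reward-shaping sketch. Everything specific here is an assumption made for illustration: the per-view `view_scores` (imagined as computed from the full environment during training), the linear annealing schedule, and the function names are not taken from the paper.

```python
def sidekick_weight(epoch, anneal_epochs):
    """Linearly decay the sidekick's influence from 1 to 0 over training."""
    if anneal_epochs <= 0:
        return 0.0
    return max(0.0, 1.0 - epoch / anneal_epochs)

def shaped_reward(task_reward, view, view_scores, epoch, anneal_epochs, training=True):
    """Task reward plus a decaying bonus that needs full observability.

    `view_scores` can only be computed when the whole environment is
    visible, so it is ignored (and may be None) outside training.
    """
    if not training or view_scores is None:
        return task_reward
    bonus = sidekick_weight(epoch, anneal_epochs) * view_scores[view]
    return task_reward + bonus
```

With `training=False` the function returns the task reward unchanged, which mirrors the statement that nothing derived from full observations is available at test time. Annealing the bonus to zero is one way to keep the final policy driven by the completion objective alone.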

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained RL formulation

The paper defines an RL objective that rewards uncertainty reduction on an observation-completion task and augments it with sidekick policy learning that exploits an explicit training/test observability asymmetry. No load-bearing step equates a claimed prediction or generalization result to its own fitted inputs or to a self-citation chain; the generalization behavior is presented as an empirical outcome of the learned policy rather than a mathematical identity or renamed input. The derivation therefore remains independent of the target transfer results.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on standard RL assumptions plus the novel sidekick technique; no explicit free parameters or invented physical entities are named in the abstract, but the method implicitly depends on typical RL training choices and the domain assumption of glimpse-based partial observability.

free parameters (1)
  • RL training hyperparameters
    Standard learning rates, discount factors, and reward scaling are required for any RL implementation but are not specified in the abstract.
axioms (1)
  • domain assumption: Partial visual observations can be obtained via discrete glimpses, and uncertainty can be quantified for reward computation
    Invoked in the description of the reward signal and the completion task.
invented entities (1)
  • sidekick policy learning (no independent evidence)
    purpose: Exploit asymmetry in observability between training and test time to address sparse rewards
    New auxiliary training procedure introduced to make the main RL objective tractable
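
To show where the free parameters named in the ledger enter an RL update, here is a minimal sketch of a return computation. The `TrainingConfig` class and its default values are placeholders invented for illustration; the source does not report any hyperparameter values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    # Placeholder values only; none of these come from the paper.
    learning_rate: float = 1e-3
    discount: float = 0.99
    reward_scale: float = 1.0

def discounted_returns(rewards, config):
    """Return-to-go per step: G_t = scale * r_t + discount * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = config.reward_scale * rewards[t] + config.discount * running
        returns[t] = running
    return returns
```

The discount factor controls how much a late reconstruction reward credits early glimpses, and the reward scale sets the magnitude of the policy-gradient signal, which is why both count as free parameters even though the abstract does not mention them.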

pith-pipeline@v0.9.0 · 5679 in / 1367 out tokens · 31200 ms · 2026-05-25T15:17:26.783097+00:00 · methodology


