Neural Embedding for Physical Manipulations

Andong Cao; Jianbo Shi; Lingzhi Zhang; Rui Li

REVIEW · 2 major objections · 1 minor · 49 references

Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.


T0 review · grok-4.3

Enforcing normalized pairwise distances between latent and output spaces enables data-efficient discovery of full output topologies.

2026-05-24 21:42 UTC pith:BN3KXX4V

Load-bearing objection: The grid-cell distance constraint is a sensible practical idea for sparse robotic data, but the abstract gives no reason to think it actually forces full topology coverage.

arXiv 1907.06143 v1 · pith:BN3KXX4V · submitted 2019-07-13 · cs.LG, cs.CV

Neural Embedding for Physical Manipulations

Lingzhi Zhang, Andong Cao, Rui Li, Jianbo Shi

classification: cs.LG, cs.CV

keywords: generative model, latent space, output space, pairwise distance, mode collapse, grid cells, robotic manipulation, data efficiency

verification ladder: T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a generative model that draws on properties of grid cells to enforce matching normalized pairwise distances between points in the latent space and the corresponding generated outputs. The goal is to learn the complete structure of action and state spaces from only sparse observations, a common situation in robotic tasks. GANs and VAEs tend to suffer mode collapse and generate only a limited variety of outputs; the distance constraint is intended to promote exploration of the entire space. A sympathetic reader would care because it could make learning in high-dimensional, partially observed environments more reliable and data-efficient.

Core claim

The authors claim that their generative model, by imposing a normalized pairwise distance constraint between the latent space and the output space, achieves substantially better results than GANs and VAEs in discovering the full topology of output spaces from few and sparse observations, avoiding the mode collapse that limits prior models.

What carries the argument

The normalized pairwise distance constraint that aligns the geometry of the latent representation with that of the output space.
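As a rough illustration of what such a constraint might compute, the sketch below builds row-normalized pairwise distance matrices for a batch of latent codes and a batch of outputs, then applies a hinge penalty whenever a normalized output distance falls below a fraction `alpha` of the matching latent distance. This page does not reproduce the paper's equation, so the hinge form (loosely modeled on normalized diversification [4]), the Euclidean metric, the row-sum normalization, and every function and parameter name here are assumptions made for illustration only.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def normalize_rows(dist):
    """Divide each row by its sum so distances become relative, not absolute."""
    out = []
    for row in dist:
        total = sum(row)
        out.append([d / total if total > 0 else 0.0 for d in row])
    return out


def distance_matching_loss(latents, outputs, alpha=0.8):
    """Hinge penalty (assumed form) averaged over all ordered pairs i != j.

    A pair is penalized when its normalized output distance is smaller than
    alpha times its normalized latent distance.
    """
    n = len(latents)
    if n < 2:
        return 0.0  # no pairs to compare
    dz = normalize_rows(pairwise_distances(latents))
    dy = normalize_rows(pairwise_distances(outputs))
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += max(alpha * dz[i][j] - dy[i][j], 0.0)
    return total / (n * n - n)


# Hypothetical toy batch: three latent codes and two candidate output sets.
latents = [(0.0,), (0.5,), (1.0,)]
spread_outputs = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
collapsed_outputs = [(0.0, 0.0)] * 3

loss_spread = distance_matching_loss(latents, spread_outputs)
loss_collapsed = distance_matching_loss(latents, collapsed_outputs)
```

In this toy batch the spread-out outputs incur no penalty, while the fully collapsed outputs incur a positive one, which is the behavior the constraint is meant to encourage. Whether that pressure is enough to cover an entire output space is the question the referee report raises.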

Load-bearing premise

That the normalized pairwise distance constraint will consistently force exploration of the complete output space instead of permitting partial or collapsed solutions.

What would settle it

Training the model on a synthetic dataset with a known complete topology, such as all possible configurations in a low-dimensional space, and checking whether generated samples cover all regions or still exhibit clustering in subsets.
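One simple way to run such a check, sketched here under stated assumptions, is to discretize a known two-dimensional output region into a grid and report the fraction of cells that receive at least one generated sample. The unit-square region, the grid resolution, and the name `grid_coverage` are hypothetical choices for illustration; the paper's own evaluation may differ.

```python
def grid_coverage(samples, bins=4, low=0.0, high=1.0):
    """Fraction of cells in a bins x bins grid over [low, high]^2
    that contain at least one sample. Samples outside the region are ignored."""
    occupied = set()
    width = (high - low) / bins
    for x, y in samples:
        if not (low <= x <= high and low <= y <= high):
            continue
        # Clamp so points exactly on the upper edge fall in the last cell.
        col = min(int((x - low) / width), bins - 1)
        row = min(int((y - low) / width), bins - 1)
        occupied.add((row, col))
    return len(occupied) / (bins * bins)


# Hypothetical sample sets: one per cell centre versus a single repeated point.
full_samples = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
collapsed_samples = [(0.1, 0.1)] * 16

coverage_full = grid_coverage(full_samples)
coverage_collapsed = grid_coverage(collapsed_samples)
```

A model that explores the full topology should score close to 1 on a region it is known to be able to reach, while a collapsed generator scores near the reciprocal of the cell count. The metric is coarse: it depends on the grid resolution and says nothing about sample density within a cell.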


If this is right

The model explores the full output topology rather than collapsing to few modes.
Learning becomes more data-efficient for tasks with vast and unknown spaces.
Both qualitative and quantitative improvements are shown on various datasets.
The approach applies to robotic operations where observations are sparse.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This distance constraint approach could be tested in other generative tasks beyond physical manipulations, such as image synthesis.
If the grid cell inspiration holds, similar mechanisms might appear in other neural architectures for spatial reasoning.
Real-world robotic experiments would be needed to confirm whether the learned topologies translate into better manipulation performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Referee Report

2 major / 1 minor

Summary. The paper proposes a generative model inspired by grid cells in mammalian brains. It enforces a normalized pairwise distance constraint between latent and output spaces to enable data-efficient discovery of the full topology of action and state spaces from sparse observations in robotic settings. The method is claimed to substantially outperform GANs and VAEs by avoiding mode collapse, with qualitative and quantitative demonstrations on various datasets.

Significance. If the central claim holds with rigorous validation, the approach could offer a principled way to mitigate mode collapse in generative models for high-dimensional robotic spaces, improving data efficiency in topology discovery where observations are sparse.

major comments (2)

[Abstract] The central performance claim (substantially better results than GANs/VAEs via the distance constraint) is stated without any equations, implementation details, experimental setup, or quantitative numbers, rendering it impossible to verify whether the math or data support the claim.
[Abstract] The key assumption that a normalized pairwise distance constraint (grid-cell inspired) will reliably produce full output topology exploration and avoid mode collapse is not justified; distance preservation on observed pairs alone does not guarantee recovery of global topology on non-Euclidean manifolds and permits collapse to lower-dimensional subsets while satisfying the loss.
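The second objection can be made concrete with a short worked example. It assumes, since the page does not reproduce the paper's equation, that distances are Euclidean and that each distance is normalized by its row sum; the symbols $k$, $m$, $Q$, $s$, and $c$ are introduced here purely for illustration.

```latex
% Latent codes z_i in R^k, outputs y_i in R^m with k < m.
% Take any scaled isometric embedding of the latent space:
\[
  y_i = c + s\,Q z_i, \qquad Q \in \mathbb{R}^{m \times k},\quad Q^{\top} Q = I_k,\quad s > 0 .
\]
% Distances are preserved up to the common factor s:
\[
  \lVert y_i - y_j \rVert = s\,\lVert Q (z_i - z_j) \rVert = s\,\lVert z_i - z_j \rVert .
\]
% The factor s cancels under row-sum normalization:
\[
  D^{y}_{ij}
  = \frac{\lVert y_i - y_j \rVert}{\sum_{l} \lVert y_i - y_l \rVert}
  = \frac{s\,\lVert z_i - z_j \rVert}{s \sum_{l} \lVert z_i - z_l \rVert}
  = D^{z}_{ij} .
\]
```

Under these assumptions the normalized distances match exactly, yet every output lies in a $k$-dimensional affine subspace of $\mathbb{R}^m$, and the scale $s$ is left undetermined. A distance-matching term of this kind therefore cannot by itself force coverage of the full output space; any such guarantee would have to come from other parts of the objective or from the data.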

minor comments (1)

[Abstract] The phrase 'various datasets' is vague; specifying the datasets and metrics used for the qualitative/quantitative demonstrations would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate where revisions will be made to improve clarity and rigor.


Referee: [Abstract] The central performance claim (substantially better results than GANs/VAEs via the distance constraint) is stated without any equations, implementation details, experimental setup, or quantitative numbers, rendering it impossible to verify whether the math or data support the claim.

Authors: We agree the abstract is high-level by design. The normalized pairwise distance constraint is formalized in Equation (3) of Section 3, with implementation details in Section 4 and quantitative results (including specific metrics outperforming GANs/VAEs on topology coverage) in Section 5 and Tables 1-2. To address the concern, we will revise the abstract to include a brief reference to the constraint equation and example quantitative gains.

Revision: yes

Referee: [Abstract] The key assumption that a normalized pairwise distance constraint (grid-cell inspired) will reliably produce full output topology exploration and avoid mode collapse is not justified; distance preservation on observed pairs alone does not guarantee recovery of global topology on non-Euclidean manifolds and permits collapse to lower-dimensional subsets while satisfying the loss.

Authors: The constraint is enforced between all latent-output pairs during optimization (not solely observed pairs) to promote an approximately isometric embedding, as motivated by grid cell properties. While we acknowledge that strict theoretical guarantees for arbitrary non-Euclidean manifolds remain an open question and the loss could in principle admit lower-dimensional solutions, our empirical evaluations on multiple robotic and synthetic datasets demonstrate reliable topology exploration and reduced mode collapse relative to baselines. We will add a limitations paragraph in the discussion section to explicitly note this point and the supporting experimental evidence.

Revision: partial

Circularity Check

0 steps flagged

No circularity: method described at high level with no equations or self-citation chains


The abstract and summary present a generative model that enforces a normalized pairwise distance constraint between latent and output spaces, inspired by grid cells, and claim empirical superiority over GANs/VAEs in avoiding mode collapse. No mathematical derivations, equations, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. The central claim is an empirical performance improvement rather than a first-principles derivation that reduces to its inputs by construction. Without quotable equations or self-referential steps, no circularity patterns (self-definitional, fitted-input-called-prediction, etc.) can be exhibited.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.0 · 5654 in / 1006 out tokens · 35839 ms · 2026-05-24T21:42:33.271348+00:00 · methodology


Original abstract

In common real-world robotic operations, action and state spaces can be vast and sometimes unknown, and observations are often relatively sparse. How do we learn the full topology of action and state spaces when given only few and sparse observations? Inspired by the properties of grid cells in mammalian brains, we build a generative model that enforces a normalized pairwise distance constraint between the latent space and output space to achieve data-efficient discovery of output spaces. This method achieves substantially better results than prior generative models, such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs). Prior models have the common issue of mode collapse and thus fail to explore the full topology of output space. We demonstrate the effectiveness of our model on various datasets both qualitatively and quantitatively.

Figures

Figures reproduced from arXiv: 1907.06143 by Andong Cao, Jianbo Shi, Lingzhi Zhang, Rui Li.

**Figure 1.** Given a set of sparse observations of action and state, we aim to learn a generative model that can interpolate the intermediate actions and predict the corresponding future states.

Adjacent paper text (1 Introduction): Grid cells, the grid-like neural circuit in mammalian brains, are known to dynamically map the external environment as the animal navigates the world [1]. Remarkably, this encoding preserves metric distance relat…

**Figure 2.** An overview of our model architecture. Top Left: An auto-encoder that guides the network to learn a meaningful feature embedding of the input state. Bottom Left: The action decoder takes the input state embedding concatenated with noise sampled from a uniform distribution and predicts an action. Top Right: Conditioned on the input state, the discriminator takes actions as inputs and predicts the probabi…

**Figure 3.** The idea of normalized pairwise distance in the latent space and action space.

Adjacent paper text (3.1.1 Active Exploration Via Normalized Diversification): When mapping random variables from the latent space to the action space, our generative model preserves the normalized pairwise distance of different generated samples between the latent space and the action space. The distance metric d_z(·, ·) between a…

**Figure 4.** Left: Rolling dataset; Right: Rope dataset.

Adjacent paper text: Evaluation Metric. To evaluate whether the sampled actions are plausible or realistic, we use three evaluation metrics to quantify the similarity between the generated and real action distributions, including Fréchet Distance [45] and Jensen-Shannon Divergence (JS Divergence) [46]. Baseline Models. We conduct experiments in two settings. One is with a fixed initi…

**Figure 5.** Comparison of generative models' ability to discover the unknown action and state spaces.

**Figure 6.** JS Divergence between the approximate and real action distributions versus the number of training samples.

**Figure 7.** Qualitative results of diverse action sampling on rope and roller manipulations.

| Model | Rope: Fréchet Distance ↓ | Rope: JS Divergence ↓ | Roller: Fréchet Distance ↓ | Roller: JS Divergence ↓ |
|---|---|---|---|---|
| VAE [34] | 12.367 ± 1.049 | 0.670 ± 0.009 | 10.140 ± 0.002 | 0.660 ± 0.006 |
| GAN [35] | 16.481 ± 10.450 | 0.667 ± 0.007 | 13.045 ± 6.798 | 0.666 ± 0.005 |
| Ours | 11.084 ± 4.460 | 0.547 ± 0.101 | 9.662 ± 4.905 | 0.504 ± 0.085 |
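The JS Divergence values reported for the rope and roller datasets compare a generated action distribution with the real one. As a reminder of what that quantity measures, here is a minimal sketch for two discrete distributions given as probability lists. How the paper bins actions and which logarithm base it uses are not stated on this page, so the natural logarithm and the function name `js_divergence` are assumptions.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete
    distributions given as equal-length lists of probabilities."""

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 wherever a_i > 0
        # because b is the mixture of the two inputs.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical two-bin examples: identical versus completely disjoint support.
same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

With natural logarithms the divergence ranges from 0 for identical distributions up to ln 2 ≈ 0.693 for distributions with disjoint support, so lower values indicate a closer match between generated and real actions.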

**Figure 8.** Predicted and ground truth future states given input state and action.

| Model | Pixel MSE Error |
|---|---|
| Rope | 5.8908 |
| Roller | 54.7298 |

**Figure 9.** The rope and roller images on the t-SNE embeddings [49] using the features extracted by the current state encoder. Zoom in to see the details.

Adjacent paper text (5 Conclusion): In this work, we propose a generative model that can approximate vast and unknown action and state spaces using only sparse observations. Current generative models suffer from mode collapsing and mode dropping issues, and so we propose …


Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 25 internal anchors

[1] M.-B. Moser, D. C. Rowland, and E. I. Moser. Place cells, grid cells, and memory. Cold Spring Harbor perspectives in biology, 7(2):a021808, 2015.

[2] R. A. Epstein, E. Z. Patai, J. B. Julian, and H. J. Spiers. The cognitive map in humans: spatial navigation and beyond. Nature neuroscience, 20(11):1504, 2017.

[3] C. Barry, R. Hayman, N. Burgess, and K. J. Jeffery. Experience-dependent rescaling of entorhinal grids. Nature neuroscience, 10(6):682, 2007.

[4] S. Liu, X. Zhang, J. Wangni, and J. Shi. Normalized diversification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10306–10315, 2019.

[5] J. Oh, X. Guo, H. Lee, R. L. Lewis, and S. P. Singh. Action-conditional video prediction using deep networks in atari games. CoRR, abs/1507.08750, 2015. URL http://arxiv.org/abs/1507.08750

[6] C. Finn, I. J. Goodfellow, and S. Levine. Unsupervised learning for physical interaction through video prediction. CoRR, abs/1605.07157, 2016. URL http://arxiv.org/abs/1605.07157

[7] J. Wu, E. Lu, P. Kohli, B. Freeman, and J. Tenenbaum. Learning to see physics via visual de-animation. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 153–164. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6620-learni...

[8] M. Watter, J. T. Springenberg, J. Boedecker, and M. A. Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. CoRR, abs/1506.07365, 2015. URL http://arxiv.org/abs/1506.07365

[9] B. C. Stadie, S. Levine, and P. Abbeel. Incentivizing exploration in reinforcement learning with deep predictive models. CoRR, abs/1507.00814, 2015. URL http://arxiv.org/abs/1507.00814

[11] URL http://arxiv.org/abs/1605.09674

[12] M. G. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, and R. Munos. Unifying count-based exploration and intrinsic motivation. CoRR, abs/1606.01868, 2016. URL http://arxiv.org/abs/1606.01868

[13] J. Fu, J. Co-Reyes, and S. Levine. Ex2: Exploration with exemplar models for deep reinforcement learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2577–2587. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6851-ex2-exp...

[14] J. Achiam and S. Sastry. Surprise-based intrinsic motivation for deep reinforcement learning. CoRR, abs/1703.01732, 2017. URL http://arxiv.org/abs/1703.01732

[15] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell. Curiosity-driven exploration by self-supervised prediction. CoRR, abs/1705.05363, 2017. URL http://arxiv.org/abs/1705.05363

[16] G. Ostrovski, M. G. Bellemare, A. van den Oord, and R. Munos. Count-based exploration with neural density models. CoRR, abs/1703.01310, 2017. URL http://arxiv.org/abs/1703.01310

[18] URL http://arxiv.org/abs/1611.07507

[19] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In Proc. AAAI, pages 1433–1438, 2008.

[20] T. Haarnoja, H. Tang, P. Abbeel, and S. Levine. Reinforcement learning with deep energy-based policies. CoRR, abs/1702.08165, 2017. URL http://arxiv.org/abs/1702.08165

[21] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. CoRR, abs/1801.01290, 2018. URL http://arxiv.org/abs/1801.01290

[22] T. Jung, D. Polani, and P. Stone. Empowerment for continuous agent-environment systems. CoRR, abs/1201.6583, 2012. URL http://arxiv.org/abs/1201.6583

[23] B. Eysenbach, A. Gupta, J. Ibarz, and S. Levine. Diversity is all you need: Learning skills without a reward function. CoRR, abs/1802.06070, 2018. URL http://arxiv.org/abs/1802.06070

[24] T. Haarnoja, K. Hartikainen, P. Abbeel, and S. Levine. Latent space policies for hierarchical reinforcement learning. CoRR, abs/1804.02808, 2018. URL http://arxiv.org/abs/1804.02808

[25] S. Coros, P. Beaudoin, and M. van de Panne. Robust task-based control policies for physics-based characters. In ACM SIGGRAPH Asia 2009 Papers, SIGGRAPH Asia '09, pages 170:1–170:9, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-858-2. doi:10.1145/1661412.1618516. URL http://doi.acm.org/10.1145/1661412.1618516

[26] K. Frans, J. Ho, X. Chen, P. Abbeel, and J. Schulman. Meta learning shared hierarchies. CoRR, abs/1710.09767, 2017. URL http://arxiv.org/abs/1710.09767

[27] L. Liu and J. Hodgins. Learning to schedule control fragments for physics-based characters using deep q-learning. ACM Trans. Graph., 36(3), June 2017. ISSN 0730-0301. doi:10.1145/3083723. URL http://doi.acm.org/10.1145/3083723

[28] J. Merel, A. Ahuja, V. Pham, S. Tunyasuvunakool, S. Liu, D. Tirumala, N. Heess, and G. Wayne. Hierarchical visuomotor control of humanoids. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJfYvo09Y7

[29] X. B. Peng, G. Berseth, and M. van de Panne. Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Trans. Graph., 35(4):81:1–81:12, July 2016. ISSN 0730-0301. doi:10.1145/2897824.2925881. URL http://doi.acm.org/10.1145/2897824.2925881

[30] J. Z. Kolter and A. Y. Ng. Learning omnidirectional path following using dimensionality reduction. In Proceedings of Robotics: Science and Systems, 2007.

[31] N. Heess, G. Wayne, Y. Tassa, T. P. Lillicrap, M. A. Riedmiller, and D. Silver. Learning and transfer of modulated locomotor controllers. CoRR, abs/1610.05182, 2016. URL http://arxiv.org/abs/1610.05182

[32] K. Hausman, J. T. Springenberg, Z. Wang, N. Heess, and M. Riedmiller. Learning an embedding space for transferable robot skills. In International Conference on Learning Representations,

[33] URL https://openreview.net/forum?id=rk07ZXZRb

[34] J. Merel, L. Hasenclever, A. Galashov, A. Ahuja, V. Pham, G. Wayne, Y. W. Teh, and N. Heess. Neural probabilistic motor primitives for humanoid control. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJl6TjRcY7

[35] X. B. Peng, M. Chang, G. Zhang, P. Abbeel, and S. Levine. MCP: learning composable hierarchical control with multiplicative compositional policies. CoRR, abs/1905.09808, 2019. URL http://arxiv.org/abs/1905.09808

[36] J. Achiam, H. Edwards, D. Amodei, and P. Abbeel. Variational option discovery algorithms. CoRR, abs/1807.10299, 2018. URL http://arxiv.org/abs/1807.10299

[37] D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.

[38] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.

[39] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5485–5493, 2017.

[40] P. Upchurch, J. Gardner, G. Pleiss, R. Pless, N. Snavely, K. Bala, and K. Weinberger. Deep feature interpolation for image content changes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7064–7073, 2017.

[41] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018.

[42] X. Yan, J. Yang, K. Sohn, and H. Lee. Attribute2image: Conditional image generation from visual attributes. In European Conference on Computer Vision, pages 776–791. Springer, 2016.

[43] Y. Hristov, A. Lascarides, and S. Ramamoorthy. Interpretable latent spaces for learning from demonstration. arXiv preprint arXiv:1807.06583, 2018.

[45] J. H. Lim and J. C. Ye. Geometric gan. arXiv preprint arXiv:1705.02894, 2017.

[46] D. Tran, R. Ranganath, and D. M. Blei. Deep and hierarchical implicit models. arXiv preprint arXiv:1702.08896, 7, 2017.

[47] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.

[48] M. Lucic, K. Kurach, M. Michalski, S. Gelly, and O. Bousquet. Are gans created equal? a large-scale study. In Advances in neural information processing systems, pages 700–709, 2018.

[49] C. D. Manning and H. Schütze. Foundations of statistical natural language processing. MIT press, 1999.

[50] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

[51] C. Doersch. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908, 2016.

[52] L. v. d. Maaten and G. Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.

[1] [1]

Moser, D

M.-B. Moser, D. C. Rowland, and E. I. Moser. Place cells, grid cells, and memory. Cold Spring Harbor perspectives in biology , 7(2):a021808, 2015

work page 2015

[2] [2]

R. A. Epstein, E. Z. Patai, J. B. Julian, and H. J. Spiers. The cognitive map in humans: spatial navigation and beyond. Nature neuroscience, 20(11):1504, 2017

work page 2017

[3] [3]

Barry, R

C. Barry, R. Hayman, N. Burgess, and K. J. Jeffery. Experience-dependent rescaling of entorhinal grids. Nature neuroscience, 10(6):682, 2007

work page 2007

[4] [4]

S. Liu, X. Zhang, J. Wangni, and J. Shi. Normalized diversiﬁcation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10306–10315, 2019

work page 2019

[5] [5]

J. Oh, X. Guo, H. Lee, R. L. Lewis, and S. P. Singh. Action-conditional video prediction using deep networks in atari games. CoRR, abs/1507.08750, 2015. URL http://arxiv.org/abs/ 1507.08750

work page internal anchor Pith review Pith/arXiv arXiv 2015

[6] [6]

C. Finn, I. J. Goodfellow, and S. Levine. Unsupervised learning for physical interaction through video prediction. CoRR, abs/1605.07157, 2016. URL http://arxiv.org/abs/1605.07157

work page internal anchor Pith review Pith/arXiv arXiv 2016

[7] [7]

J. Wu, E. Lu, P. Kohli, B. Freeman, and J. Tenenbaum. Learning to see physics via vi- sual de-animation. In I. Guyon, U. V . Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish- wanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30 , pages 153–164. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/ 6620-learni...

work page 2017

[8] [8]

Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

M. Watter, J. T. Springenberg, J. Boedecker, and M. A. Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. CoRR, abs/1506.07365, 2015. URL http://arxiv.org/abs/1506.07365

work page internal anchor Pith review Pith/arXiv arXiv 2015

[9] [9]

B. C. Stadie, S. Levine, and P. Abbeel. Incentivizing exploration in reinforcement learning with deep predictive models. CoRR, abs/1507.00814, 2015. URL http://arxiv.org/abs/1507. 00814

work page internal anchor Pith review Pith/arXiv arXiv 2015

[10] [11]

URL http://arxiv.org/abs/1605.09674

work page internal anchor Pith review Pith/arXiv arXiv

[11] [12]

M. G. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, and R. Munos. Unifying count-based exploration and intrinsic motivation. CoRR, abs/1606.01868, 2016. URL http: //arxiv.org/abs/1606.01868

work page internal anchor Pith review Pith/arXiv arXiv 2016

[12] [13]

J. Fu, J. Co-Reyes, and S. Levine. Ex2: Exploration with exemplar models for deep reinforcement learning. In I. Guyon, U. V . Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2577–2587. Curran Associates, Inc., 2017. URLhttp://papers.nips.cc/paper/ 6851-ex2-exp...

work page 2017

[13] [14]

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

J. Achiam and S. Sastry. Surprise-based intrinsic motivation for deep reinforcement learning. CoRR, abs/1703.01732, 2017. URL http://arxiv.org/abs/1703.01732

work page internal anchor Pith review Pith/arXiv arXiv 2017

[14] [15]

Curiosity-driven Exploration by Self-supervised Prediction

D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell. Curiosity-driven exploration by self- supervised prediction. CoRR, abs/1705.05363, 2017. URL http://arxiv.org/abs/1705. 05363

work page internal anchor Pith review Pith/arXiv arXiv 2017

[15] [16]

Count-Based Exploration with Neural Density Models

G. Ostrovski, M. G. Bellemare, A. van den Oord, and R. Munos. Count-based exploration with neural density models. CoRR, abs/1703.01310, 2017. URL http://arxiv.org/abs/1703. 01310

work page internal anchor Pith review Pith/arXiv arXiv 2017

[16] [18]

URL http://arxiv.org/abs/1611.07507. 9

work page internal anchor Pith review Pith/arXiv arXiv

[17] [19]

B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In Proc. AAAI, pages 1433–1438, 2008

work page 2008

[18] [20]

Reinforcement Learning with Deep Energy-Based Policies

T. Haarnoja, H. Tang, P. Abbeel, and S. Levine. Reinforcement learning with deep energy-based policies. CoRR, abs/1702.08165, 2017. URL http://arxiv.org/abs/1702.08165

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [21]

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. CoRR, abs/1801.01290, 2018. URL http://arxiv.org/abs/1801.01290

work page internal anchor Pith review Pith/arXiv arXiv 2018

[20] [22]

T. Jung, D. Polani, and P. Stone. Empowerment for continuous agent-environment systems. CoRR, abs/1201.6583, 2012. URL http://arxiv.org/abs/1201.6583

work page internal anchor Pith review Pith/arXiv arXiv 2012

[21] [23]

Diversity is All You Need: Learning Skills without a Reward Function

B. Eysenbach, A. Gupta, J. Ibarz, and S. Levine. Diversity is all you need: Learning skills without a reward function. CoRR, abs/1802.06070, 2018. URL http://arxiv.org/abs/ 1802.06070

work page internal anchor Pith review Pith/arXiv arXiv 2018

[22] [24]

Latent Space Policies for Hierarchical Reinforcement Learning

T. Haarnoja, K. Hartikainen, P. Abbeel, and S. Levine. Latent space policies for hierarchical reinforcement learning. CoRR, abs/1804.02808, 2018. URL http://arxiv.org/abs/1804. 02808

work page internal anchor Pith review Pith/arXiv arXiv 2018

[23] [25]

Coros, P

S. Coros, P. Beaudoin, and M. van de Panne. Robust task-based control policies for physics- based characters. In ACM SIGGRAPH Asia 2009 Papers, SIGGRAPH Asia ’09, pages 170:1– 170:9, New York, NY , USA, 2009. ACM. ISBN 978-1-60558-858-2. doi:10.1145/1661412. 1618516. URL http://doi.acm.org/10.1145/1661412.1618516

work page doi:10.1145/1661412 2009

[24] [26]

Meta Learning Shared Hierarchies

K. Frans, J. Ho, X. Chen, P. Abbeel, and J. Schulman. Meta learning shared hierarchies. CoRR, abs/1710.09767, 2017. URL http://arxiv.org/abs/1710.09767

work page internal anchor Pith review Pith/arXiv arXiv 2017

[25] [27]

2017 , issue_date =

L. Liu and J. Hodgins. Learning to schedule control fragments for physics-based characters using deep q-learning. ACM Trans. Graph., 36(3), June 2017. ISSN 0730-0301. doi:10.1145/3083723. URL http://doi.acm.org/10.1145/3083723


[26] [28]

Hierarchical Visuomotor Control of Humanoids

J. Merel, A. Ahuja, V. Pham, S. Tunyasuvunakool, S. Liu, D. Tirumala, N. Heess, and G. Wayne. Hierarchical visuomotor control of humanoids. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJfYvo09Y7


[27] [29]

X. B. Peng, G. Berseth, and M. van de Panne. Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Trans. Graph., 35(4):81:1–81:12, July 2016. ISSN 0730-0301. doi:10.1145/2897824.2925881. URL http://doi.acm.org/10.1145/2897824.2925881


[28] [30]

J. Z. Kolter and A. Y. Ng. Learning omnidirectional path following using dimensionality reduction. In Proceedings of Robotics: Science and Systems, 2007


[29] [31]

Learning and Transfer of Modulated Locomotor Controllers

N. Heess, G. Wayne, Y. Tassa, T. P. Lillicrap, M. A. Riedmiller, and D. Silver. Learning and transfer of modulated locomotor controllers. CoRR, abs/1610.05182, 2016. URL http://arxiv.org/abs/1610.05182


[30] [32]

Learning an Embedding Space for Transferable Robot Skills

K. Hausman, J. T. Springenberg, Z. Wang, N. Heess, and M. Riedmiller. Learning an embedding space for transferable robot skills. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rk07ZXZRb


[32] [34]

Neural Probabilistic Motor Primitives for Humanoid Control

J. Merel, L. Hasenclever, A. Galashov, A. Ahuja, V. Pham, G. Wayne, Y. W. Teh, and N. Heess. Neural probabilistic motor primitives for humanoid control. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJl6TjRcY7


[33] [35]

X. B. Peng, M. Chang, G. Zhang, P. Abbeel, and S. Levine. MCP: learning composable hierarchical control with multiplicative compositional policies. CoRR, abs/1905.09808, 2019. URL http://arxiv.org/abs/1905.09808


[34] [36]

Variational Option Discovery Algorithms

J. Achiam, H. Edwards, D. Amodei, and P. Abbeel. Variational option discovery algorithms. CoRR, abs/1807.10299, 2018. URL http://arxiv.org/abs/1807.10299


[35] [37]

D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013


[36] [38]

Generative Adversarial Nets

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014


[37] [39]

R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5485–5493, 2017


[38] [40]

Deep Feature Interpolation for Image Content Changes

P. Upchurch, J. Gardner, G. Pleiss, R. Pless, N. Snavely, K. Bala, and K. Weinberger. Deep feature interpolation for image content changes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7064–7073, 2017


[39] [41]

Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018


[40] [42]

X. Yan, J. Yang, K. Sohn, and H. Lee. Attribute2image: Conditional image generation from visual attributes. In European Conference on Computer Vision, pages 776–791. Springer, 2016


[41] [43]

Interpretable Latent Spaces for Learning from Demonstration

Y. Hristov, A. Lascarides, and S. Ramamoorthy. Interpretable latent spaces for learning from demonstration. arXiv preprint arXiv:1807.06583, 2018


[42] [45]

J. H. Lim and J. C. Ye. Geometric gan. arXiv preprint arXiv:1705.02894, 2017


[43] [46]

D. Tran, R. Ranganath, and D. M. Blei. Deep and hierarchical implicit models. arXiv preprint arXiv:1702.08896, 7, 2017


[44] [47]

Spectral Normalization for Generative Adversarial Networks

T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018


[45] [48]

Are GANs Created Equal? A Large-Scale Study

M. Lucic, K. Kurach, M. Michalski, S. Gelly, and O. Bousquet. Are gans created equal? a large-scale study. In Advances in neural information processing systems, pages 700–709, 2018


[46] [49]

C. D. Manning and H. Schütze. Foundations of statistical natural language processing. MIT press, 1999


[47] [50]

Conditional Generative Adversarial Nets

M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014


[48] [51]

C. Doersch. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908, 2016


[49] [52]

L. v. d. Maaten and G. Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008
