Sensorimotor World Models: Perception for Action via Inverse Dynamics
Pith reviewed 2026-06-26 18:19 UTC · model grok-4.3
The pith
A single inverse-dynamics regularizer on latent states prevents collapse and aligns representations to controllable factors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A sensorimotor world model is a latent world model trained end-to-end with inverse dynamics regularization. This single regularizer prevents representation collapse and induces action-aligned representations. By forcing latent states to preserve information about the action underlying a transition, it biases the model toward the controllable degrees of freedom of the environment while discarding uncontrollable distractors. This yields stable latent world models trained from offline, reward-free trajectories, without frozen encoders, exponential moving averages, or complex latent regularizers.
What carries the argument
Inverse-dynamics prediction objective applied directly to latent states, which enforces retention of action information across transitions.
If this is right
- Latent world models can be trained stably from offline reward-free trajectories.
- Representations become compact and focused on controllable degrees of freedom.
- Planning performance becomes competitive on simple 2D and 3D control tasks.
- No need for frozen encoders, exponential moving averages, or multiple regularizers.
Where Pith is reading between the lines
- The same regularizer might reduce reliance on multi-term loss functions when scaling to higher-dimensional observations.
- The approach could be tested in environments where uncontrollable noise varies over time to check robustness of the separation.
- If the latent states prove interpretable, they might serve as inputs for downstream tasks beyond planning such as imitation learning.
Load-bearing premise
An inverse-dynamics prediction objective on latent states will reliably separate controllable from uncontrollable factors without degrading forward prediction quality or requiring additional loss terms.
What would settle it
Training the model on an environment with known uncontrollable distractors and checking whether the learned latent states still encode those distractors or whether forward prediction error increases relative to baselines.
Figures
read the original abstract
Perception for action suggests that representations of the world should be shaped not by visual fidelity alone, but by their relevance for actions. At the same time, latent JEPA-style world models advocate learning compact predictive states from high-dimensional observations to facilitate the prediction of future states, but end-to-end training of these models is nontrivial because representations may collapse if our only goal is to construct a latent state that is easy to predict. We introduce a sensorimotor world model (SMWM): a latent world model trained end-to-end with inverse dynamics regularization. This single regularizer addresses both issues: it prevents representation collapse and induces action-aligned representations. By forcing latent states to preserve information about the action underlying a transition, it biases the model toward the controllable degrees of freedom of the environment while discarding uncontrollable distractors. This yields stable latent world models trained from offline, reward-free trajectories, without frozen encoders, exponential moving averages, or complex latent regularizers. Empirically, SMWM learns compact, interpretable latent spaces and enables competitive planning performance across simple 2D and 3D control tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Sensorimotor World Models (SMWM), a latent JEPA-style world model trained end-to-end from offline reward-free trajectories using a single inverse-dynamics regularization term on latent states. This regularizer is presented as simultaneously preventing representation collapse and inducing action-aligned representations that bias the model toward controllable degrees of freedom while discarding uncontrollable distractors, yielding stable training without frozen encoders, EMAs or additional latent regularizers, and enabling competitive planning on simple 2D and 3D control tasks.
Significance. If the central empirical claims hold, the work would demonstrate that a minimal inverse-dynamics term suffices for both collapse prevention and sensorimotor alignment, offering a simpler alternative to existing world-model training pipelines that rely on multiple auxiliary losses or architectural constraints.
major comments (3)
- [Abstract] Abstract, final paragraph: the claim that the inverse-dynamics regularizer 'biases the model toward the controllable degrees of freedom of the environment while discarding uncontrollable distractors' is not supported by the stated objective. The regularizer only encourages z_t, z_{t+1} to retain sufficient information to predict a_t; it contains no explicit penalty on retention of action-irrelevant factors. When distractors are temporally predictable or correlated with controllable variables in the offline data, the forward-prediction loss can be satisfied while still encoding them, undermining the 'discarding' part of the central claim.
- [Abstract, §4] Abstract and §4 (empirical results): the manuscript asserts 'competitive planning performance' and 'compact, interpretable latent spaces' yet supplies no quantitative metrics, baselines, ablation studies, error bars, or statistical comparisons. Without these details it is impossible to evaluate whether the single regularizer alone accounts for any observed gains or whether forward-prediction quality is preserved.
- [§3] §3 (method): the description of the inverse-dynamics term as an independent training signal that reliably separates controllable from uncontrollable factors without degrading the forward model or requiring further loss terms is an assumption rather than a derived property. No analysis or bound is provided showing that action-predictive information is sufficient to exclude distractors under the joint optimization.
minor comments (2)
- [§3] Notation for the latent states and the inverse-dynamics predictor should be introduced with explicit equations rather than prose descriptions.
- [§4] Figure captions should state the exact tasks, number of runs, and what 'competitive' is measured against.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We respond point-by-point to the major comments below, indicating planned revisions where the concerns are valid.
read point-by-point responses
-
Referee: [Abstract] Abstract, final paragraph: the claim that the inverse-dynamics regularizer 'biases the model toward the controllable degrees of freedom of the environment while discarding uncontrollable distractors' is not supported by the stated objective. The regularizer only encourages z_t, z_{t+1} to retain sufficient information to predict a_t; it contains no explicit penalty on retention of action-irrelevant factors. When distractors are temporally predictable or correlated with controllable variables in the offline data, the forward-prediction loss can be satisfied while still encoding them, undermining the 'discarding' part of the central claim.
Authors: We agree that the inverse-dynamics term provides no explicit penalty against retaining action-irrelevant factors, so the 'discarding' effect is not a guaranteed theoretical outcome but an empirical tendency when distractors do not aid action prediction. We will revise the abstract to replace the stronger 'discarding' phrasing with language indicating that the regularizer encourages retention of action-relevant information, which in practice biases representations toward controllable factors in the tested settings. revision: partial
-
Referee: [Abstract, §4] Abstract and §4 (empirical results): the manuscript asserts 'competitive planning performance' and 'compact, interpretable latent spaces' yet supplies no quantitative metrics, baselines, ablation studies, error bars, or statistical comparisons. Without these details it is impossible to evaluate whether the single regularizer alone accounts for any observed gains or whether forward-prediction quality is preserved.
Authors: Section 4 presents planning results on 2D and 3D tasks along with latent-space visualizations. To strengthen the evaluation, we will add quantitative metrics with error bars from multiple seeds, explicit baseline comparisons, ablation studies isolating the inverse-dynamics term, and confirmation that forward-prediction quality is preserved under the joint objective. revision: yes
-
Referee: [§3] §3 (method): the description of the inverse-dynamics term as an independent training signal that reliably separates controllable from uncontrollable factors without degrading the forward model or requiring further loss terms is an assumption rather than a derived property. No analysis or bound is provided showing that action-predictive information is sufficient to exclude distractors under the joint optimization.
Authors: The method is presented empirically; we do not derive a theoretical bound showing that action-predictive information suffices to exclude distractors. We will revise §3 to state explicitly that the separation is an observed empirical outcome rather than a proven property of the joint optimization. revision: yes
- Providing a theoretical analysis or bound demonstrating that action-predictive information is sufficient to exclude distractors under the joint optimization.
Circularity Check
No circularity: regularizer presented as independent objective without reduction to fitted inputs
full rationale
The paper introduces inverse-dynamics regularization as an explicit additional training term on latent states z_t, z_{t+1} to predict a_t. The abstract and description claim this term simultaneously prevents collapse and biases toward controllable factors, but no equations, derivations, or self-citations are shown that define the claimed bias or distractor-discarding property as a direct algebraic consequence of the same fitted quantities. The benefit is asserted as a property of the added loss rather than derived by construction from the forward-prediction objective alone. No load-bearing self-citation chains or ansatzes appear in the provided text. This is the common case of an independent regularizer whose empirical effects are left for validation.
Axiom & Free-Parameter Ledger
free parameters (1)
- inverse-dynamics regularization weight
Forward citations
Cited by 1 Pith paper
-
Predictive Objectives Discard Exogenous Control-Relevant Features: A Controlled Mechanistic Study
JEPA-style objectives discard exogenous control-relevant features because they optimize temporal predictability; reward grounding recovers them with as little as 2% labeled data.
Reference graph
Works this paper leans on
-
[1]
World models.arXiv preprint arXiv:1803.10122, 2018
David Ha and Jürgen Schmidhuber. World models.arXiv preprint arXiv:1803.10122, 2018
Pith/arXiv arXiv 2018
-
[2]
Dream to control: Learning behaviors by latent imagination
Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. InICLR, 2020
2020
-
[3]
Causality for machine learning
Bernhard Schölkopf. Causality for machine learning. 2019. URL http://arxiv.org/abs/ 1911.10500. Published in: Probabilistic and Causal Inference: The Works of Judea Pearl
arXiv 2019
-
[4]
Mastering diverse domains through world models.arXiv preprint arXiv:2301.04104, 2023
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models.arXiv preprint arXiv:2301.04104, 2023
Pith/arXiv arXiv 2023
-
[5]
Temporal difference learning for model predictive control
Nicklas Hansen, Xiaolong Wang, and Hao Su. Temporal difference learning for model predictive control. InICML, 2022
2022
-
[6]
Td-mpc2: Scalable, robust world models for continuous control.ICLR, 2024
Nicklas Hansen, Hao Su, and Xiaolong Wang. Td-mpc2: Scalable, robust world models for continuous control.ICLR, 2024
2024
-
[7]
A path towards autonomous machine intelligence.OpenReview, 2022
Yann LeCun. A path towards autonomous machine intelligence.OpenReview, 2022
2022
-
[8]
Self-supervised learning from images with a joint- embedding predictive architecture
Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint- embedding predictive architecture. InCVPR, 2023
2023
-
[9]
Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, and Nicolas Ballas. V-jepa: Latent video prediction for visual representation learning.arXiv preprint arXiv:2402.04252, 2024
arXiv 2024
-
[10]
Bootstrap your own latent: A new approach to self-supervised learning
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. InNeurIPS, 2020
2020
-
[11]
Vicreg: Variance-invariance-covariance regular- ization for self-supervised learning
Adrien Bardes, Jean Ponce, and Yann LeCun. Vicreg: Variance-invariance-covariance regular- ization for self-supervised learning. InICLR, 2022
2022
-
[12]
DINO-WM: World models on pre-trained visual features enable zero-shot planning
Gaoyue Zhou, Hengkai Pan, Yann LeCun, and Lerrel Pinto. DINO-WM: World models on pre-trained visual features enable zero-shot planning. InProceedings of the 42nd International Conference on Machine Learning (ICML 2025), volume 267 ofProceedings of Machine Learning Research, pages 79115–79135. PMLR, 2025. URL https://proceedings.mlr. press/v267/zhou25t.html
2025
-
[13]
Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim G. J. Rudner, and Yann LeCun. Learning from reward-free offline data: A case for planning with latent dynamics models. InAdvances in Neural Information Processing Systems 38 (NeurIPS 2025), 2025. URL https://neurips.cc/virtual/2025/poster/116649. 11
2025
-
[14]
Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, and Randall Balestriero. LeWorld- Model: Stable end-to-end joint-embedding predictive architecture from pixels.arXiv preprint arXiv:2603.19312, 2026. URLhttps://arxiv.org/abs/2603.19312
Pith/arXiv arXiv 2026
-
[15]
Randall Balestriero and Yann LeCun. LeJEPA: Provable and scalable self-supervised learning without the heuristics.arXiv preprint arXiv:2511.08544, 2025. URL https://arxiv.org/ abs/2511.08544
Pith/arXiv arXiv 2025
-
[16]
V-JEPA 2: Self-supervised video models enable understanding, prediction and planning
Mahmoud Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, X...
Pith/arXiv arXiv 2025
-
[17]
Schölkopf, F
B. Schölkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y . Bengio. Toward causal representation learning.Proceedings of the IEEE, 109(5):612–634, 2021
2021
-
[18]
Goodale and A
Melvyn A. Goodale and A. David Milner. Separate visual pathways for perception and action. Trends in Neurosciences, 15(1):20–25, 1992
1992
-
[19]
A common coding approach to perception and action
Wolfgang Prinz. A common coding approach to perception and action. In Odmar Neumann and Wolfgang Prinz, editors,Relationships Between Perception and Action: Current Approaches, pages 167–201. Springer, Berlin, 1990
1990
-
[20]
Gibson.The Ecological Approach to Visual Perception
James J. Gibson.The Ecological Approach to Visual Perception. Houghton Mifflin, 1979
1979
-
[21]
Verlag von Julius Springer, Berlin, 1934
Jakob von Uexküll.Streifzüge durch die Umwelten von Tieren und Menschen: Ein Bilderbuch unsichtbarer Welten, volume 21 ofVerständliche Wissenschaft. Verlag von Julius Springer, Berlin, 1934
1934
-
[22]
Varela, Eleanor Rosch, and Evan Thompson.The Embodied Mind: Cognitive Science and Human Experience
Francisco J. Varela, Eleanor Rosch, and Evan Thompson.The Embodied Mind: Cognitive Science and Human Experience. MIT Press, Cambridge, MA, 1991
1991
-
[23]
Kevin O’Regan and Alva Noë
J. Kevin O’Regan and Alva Noë. A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24(5):939–1031, 2001
2001
-
[24]
Mastering Atari with discrete world models
Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering Atari with discrete world models. InICLR, 2021
2021
-
[25]
Learning latent dynamics for planning from pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InICML, 2019
2019
-
[26]
Representation learning with contrastive predictive coding
Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. InarXiv preprint arXiv:1807.03748, 2018
Pith/arXiv arXiv 2018
-
[27]
A simple framework for contrastive learning of visual representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InICML, 2020
2020
-
[28]
Exploring simple siamese representation learning
Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. InCVPR, 2021
2021
-
[29]
Barlow twins: Self- supervised learning via redundancy reduction
Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow twins: Self- supervised learning via redundancy reduction. InICML, 2021
2021
-
[30]
Curiosity-driven exploration by self-supervised prediction
Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. InICML, 2017
2017
-
[31]
Provably filtering exogenous distractors using multistep inverse dynamics
Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, and John Langford. Provably filtering exogenous distractors using multistep inverse dynamics. InInternational Conference on Learning Representations (ICLR 2022), 2022. URL https://openreview. net/forum?id=RQLLzMCefQu. 12
2022
-
[32]
Guaranteed discovery of control-endogenous latent states with multi-step inverse models.Transactions on Machine Learning Research, 2023
Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Rajiv Didolkar, Dipendra Misra, Dylan J Foster, Lekan P Molu, Rajan Chari, Akshay Krishnamurthy, and John Langford. Guaranteed discovery of control-endogenous latent states with multi-step inverse models.Transactions on Machine Learning Research, 2023. ISSN 2835-8856. URL https://openreview.net/ forum?id=T...
2023
-
[33]
Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, and John Langford. Agent-controller representations: Principled offline RL with rich exogenous information.arXiv preprint arXiv:2211.00164, 2022
arXiv 2022
-
[34]
Foster, and Alexander Rakhlin
Zakaria Mhammedi, Dylan J. Foster, and Alexander Rakhlin. Representation learning with multi-step inverse kinematics: An efficient and optimal approach to rich-observation RL. In Proceedings of the 40th International Conference on Machine Learning (ICML 2023), volume 202 ofProceedings of Machine Learning Research, pages 24659–24700. PMLR, 2023. URL https:...
2023
-
[35]
Enhancing policy learning with world-action model.arXiv preprint arXiv:2603.28955, 2026
Yuci Han and Alper Yilmaz. Enhancing policy learning with world-action model.arXiv preprint arXiv:2603.28955, 2026. URLhttps://arxiv.org/abs/2603.28955
arXiv 2026
-
[36]
Basile Terver, Randall Balestriero, Megi Dervishi, David Fan, Quentin Garrido, Tushar Na- garajan, Koustuv Sinha, Wancong Zhang, Mike Rabbat, Yann LeCun, and Amir Bar. A lightweight library for energy-based joint-embedding predictive architectures.arXiv preprint arXiv:2602.03604, 2026. URLhttps://arxiv.org/abs/2602.03604
Pith/arXiv arXiv 2026
-
[37]
Why and how auxiliary tasks improve JEPA representations
Jiacan Yu, Siyi Chen, Mingrui Liu, Nono Horiuchi, Vladimir Braverman, Zicheng Xu, Dan Haramati, and Randall Balestriero. Why and how auxiliary tasks improve JEPA representations. InUniReps: 3rd Edition of the Workshop on Unifying Representations in Neural Models, 2025. URLhttps://openreview.net/forum?id=ZVx4SdKhlc
2025
-
[38]
Learning to act without actions
Dominik Schmidt and Minqi Jiang. Learning to act without actions. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum? id=rvUq3cxpDF
2024
-
[39]
Dynamo: In- domain dynamics pretraining for visuo-motor control
Zichen Jeff Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, and Lerrel Pinto. Dynamo: In- domain dynamics pretraining for visuo-motor control. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum? id=vUrOuc6NR3
2024
-
[40]
James, and Pieter Abbeel
Younggyo Seo, Kimin Lee, Stephen L. James, and Pieter Abbeel. Reinforcement learning with action-free pre-training from videos. InProceedings of the 39th International Conference on Machine Learning (ICML 2022), volume 162 ofProceedings of Machine Learning Re- search, pages 19561–19579. PMLR, 2022. URL https://proceedings.mlr.press/v162/ seo22a.html
2022
-
[41]
Curl: Contrastive unsupervised repre- sentations for reinforcement learning
Michael Laskin, Aravind Srinivas, and Pieter Abbeel. Curl: Contrastive unsupervised repre- sentations for reinforcement learning. InInternational conference on machine learning, pages 5639–5650. PMLR, 2020
2020
-
[42]
Ilya Kostrikov, Denis Yarats, and Rob Fergus. Image augmentation is all you need: Regularizing deep reinforcement learning from pixels.arXiv preprint arXiv:2004.13649, 2020
arXiv 2004
-
[43]
Reinforcement learning with prototypical representations
Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Reinforcement learning with prototypical representations. InInternational Conference on Machine Learning, pages 11920–11931. PMLR, 2021
2021
-
[44]
Metrics for finite Markov decision processes
Norm Ferns, Prakash Panangaden, and Doina Precup. Metrics for finite Markov decision processes. InUAI, 2004
2004
-
[45]
Scalable methods for computing state similarity in deterministic Markov decision processes
Pablo Samuel Castro. Scalable methods for computing state similarity in deterministic Markov decision processes. InAAAI, 2020. 13
2020
-
[46]
P. K. Rubenstein*, S. Weichwald*, S. Bongers, J. M. Mooij, D. Janzing, M. Grosse-Wentrup, and B. Schölkopf. Causal consistency of structural equation models. InProceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence (UAI), 2017. URL http: //auai.org/uai2017/proceedings/papers/11.pdf
2017
-
[47]
Macmillan, London, 1899
Heinrich Hertz.The Principles of Mechanics Presented in a New Form. Macmillan, London, 1899
-
[48]
Optimization of computer simulation models with rare events.European Journal of Operational Research, 99(1):89–112, 1997
Reuven Y Rubinstein. Optimization of computer simulation models with rare events.European Journal of Operational Research, 99(1):89–112, 1997
1997
-
[49]
Lucas Maes, Quentin Le Lidec, Dan Haramati, Nassim Massaudi, Damien Scieur, Yann LeCun, and Randall Balestriero. stable-worldmodel-v1: Reproducible world modeling research and evaluation.arXiv preprint arXiv:2602.08968, 2026
arXiv 2026
-
[50]
Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim G. J. Rudner, and Yann LeCun. Stress-testing offline reward-free reinforcement learning: A case for planning with latent dynamics models. In7th Robot Learning Workshop: Towards Robots with Human-Level Abilities, 2025. URLhttps://openreview.net/forum?id=jON7H6A9UU
2025
-
[51]
OGBench: Bench- marking offline goal-conditioned RL
Seohong Park, Kevin Frans, Benjamin Eysenbach, and Sergey Levine. OGBench: Bench- marking offline goal-conditioned RL. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=M992mjgKzI
2025
-
[52]
Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. Deepmind control suite. arXiv preprint arXiv:1801.00690, 2018
Pith/arXiv arXiv 2018
-
[53]
Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor. InInternational conference on machine learning, pages 1861–1870. PMLR, 2018
2018
-
[54]
Balaraman Ravindran and Andrew G. Barto. Smdp homomorphisms: An algebraic approach to abstraction in semi-markov decision processes. InInternational Joint Conference on Artificial Intelligence (IJCAI), pages 1011–1016, 2003
2003
-
[55]
Keurti, H.-R
H. Keurti, H.-R. Pan, M. Besserve, B. F. Grewe, and B. Schölkopf. Homomorphism Au- toEncoder — learning group structured representations from observed transitions. InPro- ceedings of the 40th International Conference on Machine Learning, volume 202 ofPro- ceedings of Machine Learning Research, pages 16190–16215. PMLR, 2023. URL https: //proceedings.mlr.pr...
2023
-
[56]
The linear representation hypothesis and the geometry of large language models
Kiho Park, Yo Joong Choe, and Victor Veitch. The linear representation hypothesis and the geometry of large language models. InProceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 39643–39666. PMLR, 2024
2024
-
[57]
Multistep inverse is not all you need.Rein- forcement Learning Journal, 2:884–925, 2024
Alexander Levine, Peter Stone, and Amy Zhang. Multistep inverse is not all you need.Rein- forcement Learning Journal, 2:884–925, 2024. URL https://rlj.cs.umass.edu/2024/ papers/Paper117.html. Presented at the Reinforcement Learning Conference (RLC 2024)
2024
-
[58]
Inverse dynamics pretraining learns good representations for multitask imitation
David Brandfonbrener, Ofir Nachum, and Joan Bruna. Inverse dynamics pretraining learns good representations for multitask imitation. InAdvances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023. URL https://proceedings.neurips.cc/paper_files/ paper/2023/hash/d36dfcdb14473a8526111c221660f2ab-Abstract-Conference. html
2023
-
[59]
"" o_t, o_tp1: (B, C, H, W) consecutive pixel observations a_t: (B, A) action taken between them lambda_inv: (float) inverse dynamics loss weight
Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, and Philip Bachman. Data-efficient reinforcement learning with self-predictive representations. InICLR, 2021. 14 A Implementation details A.1 Training objective Alg. 1 gives PyTorch-style pseudocode for the mini-batch objective used to train SMWM. The encoder receives gradients from...
2021
-
[60]
Then both sides of Eq
No encoding.Take Z=O , f= id , and ga =a . Then both sides of Eq. (10) equal a(o). This solution satisfies equivariance but achieves no compression
-
[61]
Inv.” and “Fwd
Collapse.Take Z={z} , f≡z , and ga = id. Then both sides of Eq. (10) equal z. This solution satisfies equivariance but discards all information abouta. Useful representations therefore need more than equivariance: the latent dynamics should remain faithful to the physical action. In particular, if an action a∈ A changes observations nontrivially in O, it ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.