pith. machine review for the scientific record.

arxiv: 2605.07278 · v1 · submitted 2026-05-08 · 💻 cs.LG · cs.AI · cs.CV

Recognition: no theorem link

Predictive but Not Plannable: RC-aux for Latent World Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:08 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords latent world models · reachability supervision · model-based reinforcement learning · auxiliary objectives · goal-conditioned planning · representation geometry · finite-horizon reachability

The pith

Latent world models need explicit reachability supervision to support planning beyond accurate short-term prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Many latent world models predict the next state well yet produce representations that do not reflect which states are actually reachable within a limited action budget, so goal-directed search in latent space often fails. The paper introduces RC-aux, a lightweight auxiliary loss that adds multi-horizon open-loop prediction along the time axis and budget-conditioned reachability labels with temporal hard negatives along the space axis. These signals reshape the latent geometry so that Euclidean distances better approximate finite-horizon reachability. At inference time the same signal guides a planner toward attainable trajectories. Experiments on pixel-based goal-conditioned tasks show that planning success rises with only modest extra training cost, suggesting that representation structure matters as much as predictive fidelity for downstream search.
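To make the space-axis supervision concrete, the sketch below shows one way budget-conditioned reachability labels with temporal hard negatives could be built from a single latent rollout. It is a minimal illustration, not the paper's code: the `reach_head` interface, the pair-sampling scheme, and the choice of beyond-budget states as hard negatives are all assumptions.

```python
# Illustrative sketch only: budget-conditioned reachability labels with
# temporal hard negatives from one latent rollout z_0..z_{T-1}.
# `reach_head` is assumed to map (z_state, z_goal, budget) to a single logit;
# the paper's exact negative-sampling scheme may differ.
import torch
import torch.nn.functional as F

def reachability_loss(reach_head, z, k, n_pairs=64):
    """z: (T, D) latent states from one trajectory; k: action budget in steps."""
    T = z.shape[0]                                  # assumes T > k + 1
    t = torch.randint(0, T - k - 1, (n_pairs,))     # anchor states
    d_pos = torch.randint(1, k + 1, (n_pairs,))     # within-budget offsets -> label 1
    # Temporal hard negatives: states on the same trajectory just past the
    # budget, close in latent space but labeled unreachable (label 0).
    slack = (torch.rand(n_pairs) * (T - (t + k + 1)).float()).long()
    g_pos = t + d_pos
    g_neg = t + k + 1 + slack
    budget = torch.full((n_pairs, 1), float(k))
    pos_logit = reach_head(torch.cat([z[t], z[g_pos], budget], dim=-1))
    neg_logit = reach_head(torch.cat([z[t], z[g_neg], budget], dim=-1))
    return (F.binary_cross_entropy_with_logits(pos_logit, torch.ones_like(pos_logit))
            + F.binary_cross_entropy_with_logits(neg_logit, torch.zeros_like(neg_logit)))
```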

Core claim

RC-aux keeps the original world-model backbone unchanged and supplies two forms of planning-aligned supervision. Multi-horizon open-loop rollouts train consistency beyond one step. Budget-conditioned reachability targets, paired with hard negatives drawn from earlier time steps, push the latent space to separate states that are reachable within the current horizon from those that are not. The learned reachability score is then used by a modified planner that prefers trajectories that are both goal-directed and attainable under the action budget. On LeWorldModel backbones, both continuation-training and matched-from-scratch runs show improved planning success across goal-conditioned pixel tasks and a LIBERO-Goal extension, at modest additional training cost.
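How the reachability score enters the planner is not detailed above, so the following is only a plausible sketch of reachability-aware candidate ranking: terminal latent distance to the goal, adjusted by the predicted probability that the goal is reachable within the budget. The additive weighting via `lambda_reach` and the module names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical reachability-aware ranking of sampled action sequences.
# encoder/dynamics/reach_head are placeholder modules, not the released API.
import torch

@torch.no_grad()
def rank_candidates(encoder, dynamics, reach_head, obs, goal_obs,
                    action_candidates, budget, lambda_reach=1.0):
    """action_candidates: (N, H, A) open-loop action sequences for one start obs."""
    z = encoder(obs).expand(action_candidates.shape[0], -1)    # (N, D)
    z_goal = encoder(goal_obs)                                 # (1, D)
    for h in range(action_candidates.shape[1]):                # open-loop latent rollout
        z = dynamics(z, action_candidates[:, h])
    dist = torch.linalg.vector_norm(z - z_goal, dim=-1)        # terminal latent distance
    b = torch.full((z.shape[0], 1), float(budget))
    reach = torch.sigmoid(
        reach_head(torch.cat([z, z_goal.expand_as(z), b], dim=-1)).squeeze(-1))
    score = -dist + lambda_reach * reach                       # higher is better
    return score.argsort(descending=True)                      # best candidates first
```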

What carries the argument

The Reachability-Correction auxiliary objective (RC-aux), which injects budget-conditioned reachability supervision and temporal hard negatives to align latent geometry with finite-horizon reachability.

If this is right

  • Planning success improves on goal-conditioned tasks even when short-horizon prediction error stays the same.
  • The latent space geometry becomes more consistent with finite-horizon reachability constraints.
  • A reachability-aware planner can use the auxiliary signal to prune or rank trajectories at test time.
  • The same auxiliary objective works under both continued training and training from scratch on the same backbone.
  • Representation quality for planning is shown to be separable from pure predictive accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mismatch between prediction and reachability may appear in other world-model families, suggesting RC-aux could be ported as a modular fix.
  • In robotics domains where action budgets are strict, explicitly modeling reachability might reduce the number of unsafe or dead-end plans generated by latent search.
  • Scaling the horizon length of the reachability labels could reveal whether the benefit saturates or continues to grow with longer planning horizons.

Load-bearing premise

That the added reachability labels and hard negatives will make latent distances correspond to true reachable sets under a fixed action budget instead of simply fitting the auxiliary task.

What would settle it

An experiment in which RC-aux is added and predictive accuracy stays unchanged, yet planning success on held-out goal-conditioned tasks does not improve, or one in which the predicted reachability scores fail to match the true reachable sets measured by exhaustive search.
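In a small discrete environment, the reachability half of that test could be run directly: enumerate the true reachable set by breadth-first search over the ground-truth transition function and measure agreement with the model's predicted reachability. The sketch below assumes hashable states and an oracle `step_fn`; it outlines the diagnostic, not the paper's protocol.

```python
# Diagnostic sketch (assumed setup): compare predicted reachability against
# ground-truth reachable sets from exhaustive breadth-first search.
from collections import deque

def true_reachable_set(state, step_fn, actions, budget):
    """All states reachable from `state` within `budget` steps (states must be hashable)."""
    frontier, reachable = deque([(state, 0)]), {state}
    while frontier:
        s, depth = frontier.popleft()
        if depth == budget:
            continue
        for a in actions:
            s_next = step_fn(s, a)
            if s_next not in reachable:
                reachable.add(s_next)
                frontier.append((s_next, depth + 1))
    return reachable

def reachability_agreement(states, step_fn, actions, budget, predict_reachable):
    """Fraction of (start, goal) pairs where the model's call matches BFS ground truth."""
    correct, total = 0, 0
    for s in states:
        truth = true_reachable_set(s, step_fn, actions, budget)
        for g in states:
            correct += int(bool(predict_reachable(s, g, budget)) == (g in truth))
            total += 1
    return correct / total
```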

Figures

Figures reproduced from arXiv: 2605.07278 by Guang Li, Keisuke Maeda, Miki Haseyama, Takahiro Ogawa, Wenyuan Li.

Figure 1. Conceptual illustration of the latent-shortcut failure mode. A terminal latent-distance planner may favor a shortcut that is close in latent space but unreachable within the finite action budget. RC-aux encourages distance to align with finite-horizon reachability, making feasible trajectories more consistent with planning. The gap creates two coupled mismatches. The temporal mismatch arises because trai…
Figure 2. The framework of RC-aux. RC-aux keeps the latent world-model backbone unchanged…
Figure 3. Success rates across the five pixel-based control tasks. RC-aux improves four of the five…
Figure 4. Paired outcomes on fixed evaluation episodes. Each cell corresponds to one evaluation…
Figure 5. Local LeWM-family success rates. Bars show mean success over five fixed evaluation…
Figure 6. Paired Wall rollouts comparing LeWM and RC-aux. Wall highlights obstacle-constrained…
Figure 7. Full paired Cube rollouts comparing LeWM and RC-aux. The figure provides additional…
Figure 8. Additional successful RC-aux rollouts on Wall.
Figure 9. Additional successful RC-aux rollouts on TwoRoom.
Figure 10. Additional successful RC-aux rollouts on Reacher.
Figure 11. Additional successful RC-aux rollouts on Push-T.
Figure 12. Additional successful RC-aux rollouts on Cube.
Figure 13. Approximate pixel-space trajectory overlays. Paths are extracted from rendered rollout…
Figure 14. Latent reachability diagnostic on selected Wall and TwoRoom rollouts. Rollout frames…
Figure 15. Terminal latent distance summary for selected diagnostic rollouts. Lower distance…
Figure 16. Push-T physical probe with linear predictors. RC-aux remains comparable to LeWM on…
Original abstract

A latent world model may achieve accurate short-horizon prediction while still inducing a latent space that is poorly aligned with planning. A key issue is spatiotemporal mismatch: these models are often trained with local predictive supervision, but deployed for long-horizon goal-directed search in latent spaces where Euclidean distance may not reflect what is reachable within a finite action budget. We present the Reachability-Correction auxiliary objective (RC-aux), a lightweight correction for this mismatch in reconstruction-free latent world models. RC-aux keeps the world-model backbone unchanged and adds planning-aligned supervision along two axes. Along the time axis, multi-horizon open-loop prediction trains the model beyond one-step consistency. Along the space axis, budget-conditioned reachability supervision, together with temporal hard negatives, encourages the latent space to distinguish states that are eventually reachable from those reachable within the current planning horizon. At test time, the learned reachability signal can also be used by a reachability-aware planner to favor trajectories that are both goal-directed and attainable under the available budget. We instantiate RC-aux on LeWorldModel and evaluate it under both continuation-training and matched-from-scratch settings. Across goal-conditioned pixel-control tasks and a LIBERO-Goal extension, RC-aux improves LeWM-style planning with modest additional cost. These results suggest that planning with latent world models depends not only on predictive accuracy, but also on whether the learned representation encodes the temporal and geometric structure required by downstream search. The code is available at https://github.com/Guang000/RC-aux.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces RC-aux, a lightweight auxiliary objective added to reconstruction-free latent world models such as LeWorldModel. RC-aux combines multi-horizon open-loop prediction with budget-conditioned reachability supervision and temporal hard negatives to correct spatiotemporal mismatch between local predictive training and long-horizon goal-directed planning. The method is evaluated in continuation-training and matched-from-scratch regimes on goal-conditioned pixel-control tasks and a LIBERO-Goal extension, with the claim that the resulting latent space better encodes finite-horizon reachability, enabling improved planning (optionally using a reachability-aware planner at test time).

Significance. If the empirical gains are shown to arise from improved latent geometry rather than test-time use of the auxiliary head, the work would usefully separate predictive accuracy from plannability and provide a practical correction for existing world-model backbones in visual domains. The availability of code supports reproducibility.

major comments (2)
  1. [Experiments and Evaluation] The central claim that RC-aux aligns latent geometry with finite-horizon reachability (so that downstream search succeeds because of the representation) is load-bearing and requires an explicit ablation: replace the reachability-aware planner with standard latent-distance search using only the frozen encoder (no reachability head at test time). If the performance lift disappears in this setting, the results demonstrate utility of the auxiliary signal as a heuristic rather than a change in representation structure. This ablation is not described in the reported experiments.
  2. [Method] §3 (Method): the interaction between the budget-conditioned reachability head and the unchanged world-model backbone is not fully specified. It is unclear whether gradients from RC-aux flow into the backbone encoder or whether the auxiliary loss is strictly additive with frozen backbone parameters during the correction phase.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by reporting at least one key quantitative result (e.g., success rate delta and baseline comparison) rather than stating only that improvements occur.
  2. [Method] Notation for the reachability head output and its integration into the planner should be defined more explicitly (e.g., how the budget-conditioned probability is combined with latent distance).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major point below and will revise the manuscript to incorporate the suggested clarifications and additional experiments.

Point-by-point responses
  1. Referee: [Experiments and Evaluation] The central claim that RC-aux aligns latent geometry with finite-horizon reachability (so that downstream search succeeds because of the representation) is load-bearing and requires an explicit ablation: replace the reachability-aware planner with standard latent-distance search using only the frozen encoder (no reachability head at test time). If the performance lift disappears in this setting, the results demonstrate utility of the auxiliary signal as a heuristic rather than a change in representation structure. This ablation is not described in the reported experiments.

    Authors: We agree that this ablation is necessary to isolate whether the gains arise from improved latent geometry. The current evaluation focuses on the full RC-aux pipeline, which includes optional use of the reachability-aware planner at test time as described in the method. To directly address the concern, we will add this ablation in the revised version: we will report planning performance using only standard latent-distance search on the frozen encoder (without the reachability head) for both the baseline LeWorldModel and the RC-aux corrected model. This will clarify the contribution of the representation change independent of the test-time heuristic. revision: yes

  2. Referee: [Method] §3 (Method): the interaction between the budget-conditioned reachability head and the unchanged world-model backbone is not fully specified. It is unclear whether gradients from RC-aux flow into the backbone encoder or whether the auxiliary loss is strictly additive with frozen backbone parameters during the correction phase.

    Authors: We thank the referee for pointing out this ambiguity. As stated in the abstract ('RC-aux keeps the world-model backbone unchanged'), the backbone encoder and dynamics model remain frozen during the RC-aux correction phase. The auxiliary objectives (multi-horizon prediction and budget-conditioned reachability with temporal hard negatives) are implemented as separate heads whose parameters are trained on top of the fixed latent representations. Gradients from the RC-aux losses do not propagate into the backbone; the auxiliary loss is strictly additive. We will expand §3 with an explicit statement and diagram clarifying this frozen-backbone design to make the interaction unambiguous. revision: yes
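A minimal sketch of the frozen-backbone design described above, with RC-aux gradients explicitly blocked from the encoder and dynamics; the module and method names are placeholders rather than the released code, and the optimizer is assumed to cover only the auxiliary heads.

```python
# Sketch of the frozen-backbone correction phase as described in the rebuttal.
# `backbone` and `aux_heads` are placeholder modules; the optimizer is built
# over aux_heads.parameters() only, so the backbone stays unchanged.
import torch

def rc_aux_step(backbone, aux_heads, optimizer, batch):
    backbone.requires_grad_(False)              # backbone parameters excluded from updates
    with torch.no_grad():
        z = backbone.encode(batch["obs"])       # fixed latent representations
    loss = (aux_heads.multi_horizon_loss(z, batch["actions"])
            + aux_heads.reachability_loss(z, batch["budget"]))
    optimizer.zero_grad()
    loss.backward()                             # gradients reach only aux_heads
    optimizer.step()
    return loss.item()
```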

Circularity Check

0 steps flagged

No circularity detected in RC-aux claims

full rationale

The paper introduces RC-aux as an auxiliary supervision signal added to an unchanged latent world model backbone, with claims of improved planning supported by empirical results on goal-conditioned tasks and LIBERO-Goal. No equations, self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations are present in the provided text that would make the planning gains equivalent to the auxiliary loss by construction. The method is framed as a lightweight correction for spatiotemporal mismatch, and results are reported as external improvements rather than tautological outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the standard domain assumption that latent world models can be used for planning once the representation is suitably aligned, plus the new method itself; no free parameters or additional invented physical entities are visible in the abstract.

axioms (1)
  • domain assumption: Latent world models trained with local predictive supervision can be deployed for long-horizon goal-directed search in their latent spaces.
    This premise is stated as the key issue the paper addresses in the opening sentences of the abstract.
invented entities (1)
  • Reachability-Correction auxiliary objective (RC-aux): no independent evidence
    purpose: To supply planning-aligned supervision along time and space axes in reconstruction-free latent world models.
    New training objective introduced by the paper; no independent evidence outside the current work is provided in the abstract.

pith-pipeline@v0.9.0 · 5596 in / 1464 out tokens · 37147 ms · 2026-05-11T02:08:21.049364+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 14 internal anchors

  1. [1]

    Self-supervised learning from images with a joint-embedding predictive architecture

    Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15619–15629, 2023

  2. [2]

    V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-jepa 2: Self-supervised video models enable understanding, prediction and planning.arXiv preprint arXiv:2506.09985, 2025

  3. [3]

    Tldr: Unsupervised goal-conditioned rl via temporal distance-aware representations.arXiv preprint arXiv:2407.08464, 2024

    Junik Bae, Kwanyoung Park, and Youngwoon Lee. Tldr: Unsupervised goal-conditioned rl via temporal distance-aware representations.arXiv preprint arXiv:2407.08464, 2024

  4. [4]

    Lejepa: Provable and scalable self-supervised learning without the heuristics, 2025

    Randall Balestriero and Yann LeCun. Lejepa: Provable and scalable self-supervised learning without the heuristics.arXiv preprint arXiv:2511.08544, 2025

  5. [5]

    Revisiting Feature Prediction for Learning Visual Representations from Video

    Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, and Nicolas Ballas. Revisiting feature prediction for learning visual representations from video. arXiv preprint arXiv:2404.08471, 2024

  6. [6]

    Vicreg: Variance-invariance-covariance regularization for self-supervised learning

    Adrien Bardes, Jean Ponce, and Yann LeCun. Vicreg: Variance-invariance-covariance regularization for self-supervised learning. arXiv preprint arXiv:2105.04906, 2021

  7. [7]

    Mudreamer: Learning predictive world models without reconstruction.arXiv preprint arXiv:2405.15083, 2024

    Maxime Burchi and Radu Timofte. Mudreamer: Learning predictive world models without reconstruction.arXiv preprint arXiv:2405.15083, 2024

  8. [8]

    Mico: Improved representations via sampling-based state similarity for markov decision processes.Advances in Neural Information Processing Systems, 34:30113–30126, 2021

    Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, and Mark Rowland. Mico: Improved representations via sampling-based state similarity for markov decision processes.Advances in Neural Information Processing Systems, 34:30113–30126, 2021

  9. [9]

    Dreamerpro: Reconstruction-free model-based reinforcement learning with prototypical representations

    Fei Deng, Ingook Jang, and Sungjin Ahn. Dreamerpro: Reconstruction-free model-based reinforcement learning with prototypical representations. In International conference on machine learning, pages 4956–4975. PMLR, 2022

  10. [10]

    Value-guided action planning with JEPA world models

    Matthieu Destrade, Oumayma Bounou, Quentin Le Lidec, Jean Ponce, and Yann LeCun. Value-guided action planning with jepa world models. arXiv preprint arXiv:2601.00844, 2025

  11. [11]

    Deep visual foresight for planning robot motion

    Chelsea Finn and Sergey Levine. Deep visual foresight for planning robot motion. In2017 IEEE international conference on robotics and automation (ICRA), pages 2786–2793. IEEE, 2017

  12. [12]

    DeepMDP: Learning continuous latent space models for representation learning

    Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G Bellemare. DeepMDP: Learning continuous latent space models for representation learning. In International conference on machine learning, pages 2170–2179. PMLR, 2019

  13. [13]

    Learning to reach goals via iterated supervised learning.arXiv preprint arXiv:1912.06088, 2019

    Dibya Ghosh, Abhishek Gupta, Ashwin Reddy, Justin Fu, Coline Devin, Benjamin Eysenbach, and Sergey Levine. Learning to reach goals via iterated supervised learning. arXiv preprint arXiv:1912.06088, 2019

  14. [14]

    Bootstrap your own latent-a new approach to self-supervised learning.Advances in neural information processing systems, 33:21271–21284, 2020

    Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent-a new approach to self-supervised learning.Advances in neural information processing systems, 33:21271–21284, 2020

  15. [15]

    World Models

    David Ha and Jürgen Schmidhuber. World models.arXiv preprint arXiv:1803.10122, 2(3):440, 2018

  16. [16]

    Dream to Control: Learning Behaviors by Latent Imagination

    Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination.arXiv preprint arXiv:1912.01603, 2019

  17. [17]

    Learning latent dynamics for planning from pixels

    Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InInternational conference on machine learning, pages 2555–2565. PMLR, 2019

  18. [18]

    Mastering Atari with Discrete World Models

    Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering atari with discrete world models.arXiv preprint arXiv:2010.02193, 2020

  19. [19]

    Mastering Diverse Domains through World Models

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models.arXiv preprint arXiv:2301.04104, 2023

  20. [20]

    TD-MPC2: Scalable, Robust World Models for Continuous Control

    Nicklas Hansen, Hao Su, and Xiaolong Wang. Td-mpc2: Scalable, robust world models for continuous control.arXiv preprint arXiv:2310.16828, 2023

  21. [21]

    Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

    Moo Jin Kim, Chelsea Finn, and Percy Liang. Fine-tuning vision-language-action models: Optimizing speed and success.arXiv preprint arXiv:2502.19645, 2025

  22. [22]

    OpenVLA: An Open-Source Vision-Language-Action Model

    Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. Openvla: An open-source vision-language-action model.arXiv preprint arXiv:2406.09246, 2024

  23. [23]

    Offline Reinforcement Learning with Implicit Q-Learning

    Ilya Kostrikov, Ashvin Nair, and Sergey Levine. Offline reinforcement learning with implicit q-learning.arXiv preprint arXiv:2110.06169, 2021

  24. [24]

    Pclast: Discovering plannable continuous latent states.arXiv preprint arXiv:2311.03534, 2023

    Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, et al. Pclast: Discovering plannable continuous latent states.arXiv preprint arXiv:2311.03534, 2023

  25. [25]

    Learning plannable representations with causal infogan.Advances in Neural Information Processing Systems, 31, 2018

    Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart J Russell, and Pieter Abbeel. Learning plannable representations with causal infogan.Advances in Neural Information Processing Systems, 31, 2018

  26. [26]

    Curl: Contrastive unsupervised representations for reinforcement learning

    Michael Laskin, Aravind Srinivas, and Pieter Abbeel. Curl: Contrastive unsupervised representations for reinforcement learning. In International conference on machine learning, pages 5639–5650. PMLR, 2020

  27. [27]

    A path towards autonomous machine intelligence version 0.9

    Yann LeCun et al. A path towards autonomous machine intelligence version 0.9.2, 2022-06-27. Open Review, 62(1):1–62, 2022

  28. [28]

    LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

    Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. Libero: Benchmarking knowledge transfer for lifelong robot learning.arXiv preprint arXiv:2306.03310, 2023

  29. [29]

    LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

    Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, and Randall Balestriero. Leworldmodel: Stable end-to-end joint-embedding predictive architecture from pixels. arXiv preprint arXiv:2603.19312, 2026

  30. [30]

    Temporal predictive coding for model-based planning in latent space

    Tung D Nguyen, Rui Shu, Tuan Pham, Hung Bui, and Stefano Ermon. Temporal predictive coding for model-based planning in latent space. InInternational conference on machine learning, pages 8130–8139. PMLR, 2021

  31. [31]

    Dreamingv2: Reinforcement learning with discrete world models without reconstruction

    Masashi Okada and Tadahiro Taniguchi. Dreamingv2: Reinforcement learning with discrete world models without reconstruction. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 985–991. IEEE, 2022

  32. [32]

    DINOv2: Learning Robust Visual Features without Supervision

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023

  33. [33]

    Ogbench: Benchmarking offline goal-conditioned rl.arXiv preprint arXiv:2410.20092,

    Seohong Park, Kevin Frans, Benjamin Eysenbach, and Sergey Levine. Ogbench: Benchmarking offline goal-conditioned rl.arXiv preprint arXiv:2410.20092, 2024

  34. [34]

    Goal-conditioned reinforcement learning with disentanglement-based reachability planning.IEEE Robotics and Automation Letters, 8(8):4721–4728, 2023

    Zhifeng Qian, Mingyu You, Hongjun Zhou, Xuanhui Xu, and Bin He. Goal-conditioned reinforcement learning with disentanglement-based reachability planning.IEEE Robotics and Automation Letters, 8(8):4721–4728, 2023

  35. [35]

    Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

    Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020

  36. [36]

    Data-efficient reinforcement learning with self-predictive representations

    Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, and Philip Bachman. Data-efficient reinforcement learning with self-predictive representations.arXiv preprint arXiv:2007.05929, 2020

  37. [37]

    Planning to explore via self-supervised world models

    Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. InInternational conference on machine learning, pages 8583–8592. PMLR, 2020

  38. [38]

    Stress-testing offline reward-free reinforcement learning: A case for planning with latent dynamics models

    Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim GJ Rudner, and Yann LeCun. Stress-testing offline reward-free reinforcement learning: A case for planning with latent dynamics models. In7th Robot Learning Workshop: Towards Robots with Human-Level Abilities

  39. [39]

    Learning from reward-free offline data: A case for planning with latent dynamics models

    Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim GJ Rudner, and Yann LeCun. Learning from reward-free offline data: A case for planning with latent dynamics models.arXiv preprint arXiv:2502.14819, 2025

  40. [40]

    Universal planning networks: Learning generalizable representations for visuomotor control

    Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, and Chelsea Finn. Universal planning networks: Learning generalizable representations for visuomotor control. In International conference on machine learning, pages 4732–4741. PMLR, 2018

  41. [41]

    State representation learning for goal-conditioned reinforcement learning

    Lorenzo Steccanella and Anders Jonsson. State representation learning for goal-conditioned reinforcement learning. InJoint european conference on machine learning and knowledge discovery in databases, pages 84–99. Springer, 2022

  42. [42]

    DeepMind Control Suite

    Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. Deepmind control suite. arXiv preprint arXiv:1801.00690, 2018

  43. [43]

    Optimal goal-reaching reinforcement learning via quasimetric learning

    Tongzhou Wang, Antonio Torralba, Phillip Isola, and Amy Zhang. Optimal goal-reaching reinforcement learning via quasimetric learning. InInternational Conference on Machine Learning, pages 36411–36430. PMLR, 2023

  44. [44]

    Temporal straightening for latent planning.arXiv preprint arXiv:2603.12231, 2026

    Ying Wang, Oumayma Bounou, Gaoyue Zhou, Randall Balestriero, Tim GJ Rudner, Yann LeCun, and Mengye Ren. Temporal straightening for latent planning.arXiv preprint arXiv:2603.12231, 2026

  45. [45]

    Embed to control: A locally linear latent dynamics model for control from raw images.Advances in neural information processing systems, 28, 2015

    Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images.Advances in neural information processing systems, 28, 2015

  46. [46]

    Learning invariant representations for reinforcement learning without reconstruction

    Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Learning invariant representations for reinforcement learning without reconstruction. arXiv preprint arXiv:2006.10742, 2020

  47. [47]

    Dino-wm: World models on pre-trained visual features enable zero-shot planning

    Gaoyue Zhou, Hengkai Pan, Yann LeCun, and Lerrel Pinto. Dino-wm: World models on pre-trained visual features enable zero-shot planning. arXiv preprint arXiv:2411.04983, 2024
