stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

Ayush Chaurasia; Damien Scieur; Dan Haramati; Francesco Capuano; Lucas Maes; Luiz Facury; Nassim Massaudi; Quentin Le Lidec; Randall Balestriero; Richard Gao

arxiv: 2605.21800 · v1 · pith:D2A6PAUTnew · submitted 2026-05-20 · 💻 cs.LG · cs.RO

Lucas Maes, Quentin Le Lidec, Luiz Facury, Nassim Massaudi, Ayush Chaurasia, Francesco Capuano, Richard Gao, Taj Gillin, Dan Haramati, Damien Scieur, Yann LeCun, Randall Balestriero

Pith reviewed 2026-05-22 08:53 UTC · model grok-4.3

classification 💻 cs.LG cs.RO

keywords world models · reproducibility · benchmarking · data pipelines · open source platform · machine learning · generalization · reinforcement learning

The pith

stable-worldmodel unifies data pipelines, baselines, and benchmarks under one framework to cut research overhead for world models

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces stable-worldmodel, an open-source platform that targets the fragmentation of world model research: disparate codebases, slow data loading, and missing standardized benchmarks. It supplies a high-performance Lance-based data layer with conversion tools for MP4, HDF5, and LeRobot datasets, clean implementations of modern baselines and planning solvers, and environments extended with controllable visual, geometric, and physical factors. A sympathetic reader would care because together these pieces let researchers run reproducible experiments and fair comparisons without rebuilding pipelines from scratch each time. If the unification works as claimed, it would lower the barrier to developing agents that reason, plan, and generalize beyond their training data.

Core claim

The authors state that unifying the full pipeline under a single scalable framework dramatically reduces research overhead and accelerates trustworthy progress toward reliable world models. The data layer, baseline implementations, and extended environments are the concrete means offered for standardized, reproducible evaluation of dynamics understanding, control performance, representation quality, and out-of-distribution generalization.

What carries the argument

The stable-worldmodel platform itself, which integrates a Lance-based data layer for fast native support and conversion across dataset formats, clean baseline implementations, and environments with controllable factors of variation for systematic testing.

If this is right

Native support and conversion tools for MP4, HDF5, and LeRobot datasets remove the need for custom video loaders in most experiments.
Well-tested baseline implementations and planning solvers let researchers focus effort on novel components rather than reimplementation.
Environments with controllable visual, geometric, and physical factors enable systematic measurement of out-of-distribution generalization.
The single framework makes it straightforward to reproduce and compare results across different research groups.
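
To make the planning-solver point concrete, here is a minimal sketch of one sampling-based solver, the cross-entropy method, which appears in the paper's reference list [23]. It is not swm's implementation: `toy_dynamics`, `rollout_cost`, and `cem_plan` are hypothetical names, and the 1-D dynamics is an assumed stand-in for a learned world model.

```python
import random
import statistics


def toy_dynamics(state, action):
    """Assumed stand-in for a learned world model: a 1-D point nudged by the action."""
    return state + 0.1 * action


def rollout_cost(state, actions, goal):
    """Roll the model forward and return the squared distance to the goal at the end."""
    for action in actions:
        state = toy_dynamics(state, action)
    return (state - goal) ** 2


def cem_plan(state, goal, horizon=5, samples=64, elites=8, iters=10, seed=0):
    """Cross-entropy method: repeatedly refit a Gaussian over action sequences to the elites."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iters):
        # Sample candidate action sequences, clipped to the action range [-1, 1].
        candidates = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(samples)
        ]
        # Keep the lowest-cost sequences under the model.
        candidates.sort(key=lambda seq: rollout_cost(state, seq, goal))
        top = candidates[:elites]
        # Refit the sampling distribution per time step; a small floor keeps std positive.
        mean = [statistics.fmean(seq[t] for seq in top) for t in range(horizon)]
        std = [statistics.pstdev([seq[t] for seq in top]) + 1e-3 for t in range(horizon)]
    return mean


plan = cem_plan(state=0.0, goal=0.3)
final_cost = rollout_cost(0.0, plan, 0.3)
```

A shared solver of this kind only needs a model that maps a state and an action to a next state, which is why a common baseline interface lets the same planner be reused across world models.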

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same unification approach could be extended to real-robot data streams to test whether the platform's benefits transfer beyond simulation.
Neighbouring areas such as model-based reinforcement learning may adopt similar standardized layers to address their own reproducibility gaps.
A natural next measurement would be to track how many new papers cite or build on the platform's environments for their generalization claims.

Load-bearing premise

The provided Lance-based data layer, baseline implementations, and extended environments with controllable factors will be sufficient for systematic evaluation and fair comparison without requiring substantial additional custom engineering by users.

What would settle it

A direct comparison study in which independent teams implement the same new world model both inside and outside the platform and measure total engineering time plus result consistency would settle whether the claimed reduction in overhead holds.

Figures

Figures reproduced from arXiv: 2605.21800 by Ayush Chaurasia, Damien Scieur, Dan Haramati, Francesco Capuano, Lucas Maes, Luiz Facury, Nassim Massaudi, Quentin Le Lidec, Randall Balestriero, Richard Gao, Taj Gillin, Yann LeCun.

**Figure 1.** Overview of stable-worldmodel: data is efficiently collected from a world and used to train world models via provided baselines, then leveraged by solvers for control. The idea of using predictive models to guide decision-making dates back to the 1960s-1970s from the control theory community [1, 2]. Most of these approaches relied on analytical, closed-form models or hand-crafted simulators to predict the…

**Figure 2.** Environment families supported by swm. Top row: default (unperturbed) renderings of each environment. Bottom row: all visual factors of variation (e.g., agent, object, scene, geometry, lighting) jointly perturbed. Dynamic physical parameters (e.g., mass, density, gravity, or friction) can also be modified, but are omitted here as they are not visible in a single frame.

**Figure 3.** Performance comparison of different data formats for a dataset from the Push-T environment.

**Figure 4.** Distribution of trajectory-level prediction MSE for successful (blue) and failed (red) plans

**Figure 5.** Robustness analysis on Push-T. Left: effect of visual distractors. Right: planning success…

**Figure 6.** Examples of visual perturbations that can be applied on-the-fly to any supported envi…

**Figure 7.** Factors of variation supported per environment. Light blue bars indicate total factors…

**Figure 8.** Performance comparison of different data formats for a dataset from the Two-Room…

**Figure 9.** Push-T prediction error under increasing distribution shift for…

**Figure 10.** PLDM counterpart of Fig…

**Figure 11.** Planning success rate of LeWM on Push-T as a function of background color intensity

**Figure 12.** Benchmarking stable-worldmodel against standard baselines across diverse environments. Solid lines depict the mean episode reward over 5 random seeds, while shaded areas denote the standard deviation. TD-MPC2 consistently achieves faster convergence in continuous control tasks.

**Figure 13.** PCA projection of TD-MPC2’s latent state space on Push-T. Gray points show the… [Plot axes: PC1 (44.7% var), PC2 (10.0% var); legend: Expert, Actor.]
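
The Figure 4 caption refers to trajectory-level prediction MSE split by plan outcome. The page does not show the paper's exact definition, so the sketch below assumes the error is averaged over all time steps and state dimensions of one trajectory; `trajectory_mse`, `split_by_outcome`, and the episode dictionary layout are hypothetical.

```python
from statistics import fmean


def trajectory_mse(predicted, actual):
    """Mean squared error over every time step and state dimension of one trajectory."""
    if len(predicted) != len(actual):
        raise ValueError("trajectories must have the same length")
    errors = [
        (p - a) ** 2
        for pred_step, true_step in zip(predicted, actual)
        for p, a in zip(pred_step, true_step)
    ]
    return fmean(errors)


def split_by_outcome(episodes):
    """Group per-trajectory MSE values by whether the plan succeeded."""
    groups = {"success": [], "failure": []}
    for episode in episodes:
        key = "success" if episode["success"] else "failure"
        groups[key].append(trajectory_mse(episode["predicted"], episode["actual"]))
    return groups


# Two made-up episodes: one predicted perfectly, one with a constant offset in one dimension.
episodes = [
    {"predicted": [[0.0, 0.0], [1.0, 1.0]], "actual": [[0.0, 0.0], [1.0, 1.0]], "success": True},
    {"predicted": [[0.0, 0.0], [1.0, 1.0]], "actual": [[0.0, 2.0], [1.0, 3.0]], "success": False},
]
groups = split_by_outcome(episodes)
```

Comparing the two resulting lists, for instance as histograms, is one way to see whether prediction error separates successful plans from failed ones.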

Original abstract

World models are central to building agents that can reason, plan, and generalize beyond their training data. However, research on world models is currently fragmented, with disparate codebases, data pipelines, and evaluation protocols hindering reproducibility and fair comparison. Current practice is further limited by three key bottlenecks: fragile one-off codebases, slow video data loading, and the lack of standardized generalization benchmarks. We present stable-worldmodel (swm), an open-source platform for standardized and reproducible world modeling research and evaluation. It delivers (1) a high-performance Lance-based data layer with native support and conversion tools for MP4, HDF5, and LeRobot datasets, (2) clean, well-tested implementations of modern world model baselines and planning solvers, and (3) a broad suite of environments and tasks extended with controllable visual, geometric, and physical factors of variation for systematic in-silico evaluation of dynamics understanding, control performance, representation quality, and out-of-distribution generalization. By unifying the full pipeline under a single, scalable framework, swm dramatically reduces research overhead and accelerates trustworthy progress toward reliable world models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

stable-worldmodel bundles a Lance data layer, baselines, and factorized environments into one platform, but supplies no measurements or usage examples to support the claim of dramatically lower research overhead.

Hi colleague, the main takeaway here is that this paper releases an open-source platform called stable-worldmodel to address fragmentation in world modeling work, yet the central promise of big overhead cuts rests on an untested assumption that the delivered pieces are already complete enough for most users.

What is actually new is the specific combination of a high-performance Lance data layer with built-in converters for MP4, HDF5, and LeRobot formats, plus clean baseline implementations of modern world models and planning solvers, and environments extended with controllable visual, geometric, and physical factors of variation. These pieces target the stated bottlenecks of fragile one-off code, slow video loading, and missing standardized generalization tests. The paper does a straightforward job naming those real pain points and showing how a single framework could let researchers run comparable experiments on dynamics understanding, control, representation quality, and out-of-distribution behavior without starting from scratch each time. That kind of shared infrastructure can matter in a subfield where code reuse is currently low.

The soft spot is the missing evidence for the efficiency claim. The manuscript describes the three components but includes no usage traces, no timing comparisons to prior fragmented setups, no ablations on remaining custom code, and no concrete examples of end-to-end experiments run inside the platform. Without those, it is difficult to know whether the unification actually delivers the promised reduction or whether users will still need substantial extra engineering.

This paper is aimed at people actively running world model experiments who want a reproducible starting point rather than those seeking new theoretical mechanisms. A reader focused on practical evaluation of generalization or control performance would get the most direct value.

The thinking is clear and the intent to improve shared tooling is honest, so I would bring it to a reading group to discuss whether the field needs this kind of release. I recommend sending it for peer review rather than desk rejection; referees could usefully ask for the missing benchmarks and usage data while the platform itself has a reasonable chance of being adopted if the documentation holds up.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces stable-worldmodel (swm), an open-source platform for standardized and reproducible world modeling research. It claims to deliver three components: (1) a high-performance Lance-based data layer with native support and conversion tools for MP4, HDF5, and LeRobot datasets; (2) clean, well-tested implementations of modern world model baselines and planning solvers; and (3) a broad suite of environments and tasks extended with controllable visual, geometric, and physical factors of variation. By unifying the full pipeline under a single scalable framework, the paper asserts that swm dramatically reduces research overhead and accelerates trustworthy progress toward reliable world models.

Significance. If the delivered components prove jointly sufficient for end-to-end reproducible experiments and fair comparisons with minimal custom engineering, the platform could meaningfully address fragmentation in world-model research by enabling systematic in-silico evaluation of dynamics understanding, control, representation quality, and out-of-distribution generalization. The provision of factorized environments and baseline implementations is a constructive contribution toward standardized benchmarks. However, the manuscript supplies no usage traces, ablation studies, or overhead measurements, so the claimed significance remains prospective rather than demonstrated.

major comments (2)

Abstract: the central claim that unifying the pipeline under swm 'dramatically reduces research overhead' is unsupported; the text describes the three components but provides neither timing measurements relative to prior fragmented codebases nor ablation results quantifying remaining custom engineering required by users.
Abstract: the assertions of 'high-performance' Lance data layer, 'clean, well-tested' baselines, and 'broad suite' of extended environments are presented without any implementation details, performance benchmarks, validation results, or concrete usage examples that would substantiate sufficiency for zero-custom-engineering systematic evaluation.

minor comments (2)

Consider adding explicit quick-start code snippets or a minimal reproducible experiment trace in the main text or supplementary material to illustrate end-to-end usage of the Lance layer, a baseline, and a factorized environment.
Clarify the exact scope of 'controllable factors of variation' (visual, geometric, physical) with a table listing which factors are exposed per environment.
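
As an illustration of the table requested in the second minor comment, the sketch below stores factors per environment and renders them as plain text. The environment names Push-T and Two-Room appear in the figure captions, but the factor assignments here are made up for the example and do not come from the paper; `FactorSpec`, `FACTORS`, and `render_table` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FactorSpec:
    """Factors of variation exposed by one environment, grouped by category."""
    visual: List[str] = field(default_factory=list)
    geometric: List[str] = field(default_factory=list)
    physical: List[str] = field(default_factory=list)

    def total(self) -> int:
        return len(self.visual) + len(self.geometric) + len(self.physical)


# Hypothetical assignments, for illustration only.
FACTORS: Dict[str, FactorSpec] = {
    "Push-T": FactorSpec(
        visual=["agent color", "background color", "lighting"],
        geometric=["object size"],
        physical=["mass", "friction"],
    ),
    "Two-Room": FactorSpec(visual=["background color"], geometric=["door position"]),
}


def render_table(factors: Dict[str, FactorSpec]) -> str:
    """Render one row per environment with aligned, pipe-separated columns."""
    rows = [("Environment", "Visual", "Geometric", "Physical", "Total")]
    for env, spec in factors.items():
        rows.append((
            env,
            ", ".join(spec.visual) or "-",
            ", ".join(spec.geometric) or "-",
            ", ".join(spec.physical) or "-",
            str(spec.total()),
        ))
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)) for row in rows
    )


table = render_table(FACTORS)
```

Keeping the factors in a structure like this would let the same source generate both the documentation table and the perturbation options used at evaluation time.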

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract claims would benefit from additional substantiation and will revise the manuscript to include more details, benchmarks, and examples as outlined below.

Referee: Abstract: the central claim that unifying the pipeline under swm 'dramatically reduces research overhead' is unsupported; the text describes the three components but provides neither timing measurements relative to prior fragmented codebases nor ablation results quantifying remaining custom engineering required by users.

Authors: We acknowledge that the abstract asserts a reduction in overhead without direct quantitative comparisons in the current text. The manuscript's contribution centers on the integrated design of the data layer, baselines, and environments, which by construction eliminates the need for users to assemble disparate codebases. We will add a new subsection with preliminary timing measurements for data loading and setup effort relative to common prior practices, plus concrete usage traces showing the engineering steps required for a standard experiment. revision: yes
Referee: Abstract: the assertions of 'high-performance' Lance data layer, 'clean, well-tested' baselines, and 'broad suite' of extended environments are presented without any implementation details, performance benchmarks, validation results, or concrete usage examples that would substantiate sufficiency for zero-custom-engineering systematic evaluation.

Authors: We agree that the abstract would be strengthened by explicit support for these descriptors. The full manuscript already contains implementation descriptions of the Lance integration, baseline code structure, and environment factorizations, but we will expand the revised version with performance numbers for the data layer, test coverage statistics for the baselines, and step-by-step usage examples that illustrate end-to-end evaluation with controllable factors of variation. revision: yes
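
The timing measurements promised in the first response could follow a harness along these lines. This is a sketch under stated assumptions, not the authors' benchmark: `measure_random_access` and the two readers are hypothetical, and the in-memory list stands in for real storage formats.

```python
import random
import time


def measure_random_access(read_sample, num_samples, num_reads=1000, seed=0):
    """Return reads per second for uniformly random single-sample reads."""
    rng = random.Random(seed)
    indices = [rng.randrange(num_samples) for _ in range(num_reads)]
    start = time.perf_counter()
    for index in indices:
        read_sample(index)
    elapsed = time.perf_counter() - start
    return num_reads / elapsed if elapsed > 0 else float("inf")


# Assumed stand-in for stored frames; a real comparison would read from disk.
frames = [bytes(64) for _ in range(10_000)]


def indexed_read(index):
    """Emulates a format with direct random access to any sample."""
    return frames[index]


def sequential_scan_read(index, chunk=100):
    """Emulates a format that must step through a chunk to reach the requested sample."""
    last = None
    for position in range((index // chunk) * chunk, index + 1):
        last = frames[position]
    return last


results = {
    "indexed": measure_random_access(indexed_read, len(frames)),
    "chunk_scan": measure_random_access(sequential_scan_read, len(frames)),
}
```

Fixing the seed makes every reader see the same index sequence, so the comparison reflects the access pattern of the format rather than sampling noise.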

Circularity Check

0 steps flagged

No circularity: software platform paper with no derivations or fitted quantities

The manuscript presents an open-source platform (data layer, baselines, extended environments) rather than any derivation chain, equations, or statistical predictions. No load-bearing steps reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. Claims about reduced research overhead are descriptive assertions about the delivered components, not results derived from the paper's own inputs by construction. This is a standard non-finding for infrastructure papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software platform contribution; the abstract introduces no free parameters, mathematical axioms, or new postulated entities.

pith-pipeline@v0.9.0 · 5772 in / 1027 out tokens · 37363 ms · 2026-05-22T08:53:53.353985+00:00 · methodology

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 24 internal anchors

[1] AI Propoi. Use of linear programming methods for synthesizing sampled-data automatic systems. Automn. Remote Control, 24(7):837–844, 1963.
[2] Jacques Richalet. Industrial applications of model based predictive control. Automatica, 29(5):1251–1274, 1993.
[3] Basil Kouvaritakis and Mark Cannon. Model predictive control. Switzerland: Springer International Publishing, 38(13-56):7, 2016.
[4] James B Rawlings, David Q Mayne, and Moritz M Diehl. Model predictive control: theory, computation, and design. 2020.
[5] David Ha and Jürgen Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2(3):440, 2018.
[6] Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In International conference on machine learning, pages 2555–2565. PMLR, 2019.
[7] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.
[8] Nicklas Hansen, Hao Su, and Xiaolong Wang. Td-mpc2: Scalable, robust world models for continuous control. arXiv preprint arXiv:2310.16828, 2023.
[9] Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mido Assran, and Nicolas Ballas. V-jepa: Latent video prediction for visual representation learning. 2023.
[10] Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-jepa 2: Self-supervised video models enable understanding, prediction and planning. arXiv preprint arXiv:2506.09985, 2025.
[11] Raktim Gautam Goswami, Amir Bar, David Fan, Tsung-Yen Yang, Gaoyue Zhou, Prashanth Krishnamurthy, Michael Rabbat, Farshad Khorrami, and Yann LeCun. World models can leverage human videos for dexterous manipulation. arXiv preprint arXiv:2512.13644, 2025.
[12] Frederick P Brooks Jr. The mythical man-month: essays on software engineering. Pearson Education, 1995.
[13] Edward Raff. A step toward quantifying independently reproducible machine learning research. Advances in Neural Information Processing Systems, 32, 2019.
[14] Riashat Islam, Peter Henderson, Maziar Gomrokchi, and Doina Precup. Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. arXiv preprint arXiv:1708.04133, 2017.
[15] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
[16] Mark Towers, Ariel Kwiatkowski, Jordan Terry, John U Balis, Gianluca De Cola, Tristan Deleu, Manuel Goulão, Andreas Kallinteris, Markus Krimmel, Arjun KG, et al. Gymnasium: A standard interface for reinforcement learning environments. arXiv preprint arXiv:2407.17032, 2024.
[17] Danijar Hafner, Wilson Yan, and Timothy Lillicrap. Training agents inside of scalable world models. arXiv preprint arXiv:2509.24527, 2025.
[18] Yann LeCun et al. A path towards autonomous machine intelligence version 0.9. 2, 2022-06-27. Open Review, 62(1):1–62, 2022.
[19] Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim GJ Rudner, and Yann LeCun. Learning from reward-free offline data: A case for planning with latent dynamics models. arXiv preprint arXiv:2502.14819, 2025.
[20] Randall Balestriero and Yann LeCun. Lejepa: Provable and scalable self-supervised learning without the heuristics. arXiv preprint arXiv:2511.08544, 2025.
[21] Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, and Randall Balestriero. Leworldmodel: Stable end-to-end joint-embedding predictive architecture from pixels. arXiv preprint arXiv:2603.19312, 2026.
[22] Gaoyue Zhou, Hengkai Pan, Yann LeCun, and Lerrel Pinto. Dino-wm: World models on pre-trained visual features enable zero-shot planning. arXiv preprint arXiv:2411.04983, 2024.
[23] Reuven Y Rubinstein and Dirk P Kroese. The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning, volume 133. Springer, 2004.
[24] Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, et al. Genie: Generative interactive environments. In Forty-first International Conference on Machine Learning, 2024.
[25] Ziming Liu, Sophia Sanborn, Surya Ganguli, and Andreas Tolias. From kepler to newton: Inductive biases guide learned world models in transformers, 2026. URL https://arxiv.org/abs/2602.06923
[26] Weston Pace, Chang She, Lei Xu, Will Jones, Albert Lockett, Jun Wang, and Raunak Shah. Lance: Efficient random access in columnar storage through adaptive structural encodings. URL https://arxiv.org/abs/2504.15247
[28] Remi Cadene, Simon Alibert, Francesco Capuano, Michel Aractingi, Adil Zouitine, Pepijn Kooijmans, Jade Choghari, Martino Russi, Caroline Pascal, Steven Palma, et al. Lerobot: An open-source library for end-to-end robot learning. In The Fourteenth International Conference on Learning Representations, 2026.
[29] Taylor Howell, Nimrod Gileadi, Saran Tunyasuvunakool, Kevin Zakka, Tom Erez, and Yuval Tassa. Predictive sampling: Real-time behaviour synthesis with mujoco. 2022.
[30] Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, and Georg Martius. Sample-efficient cross-entropy method for real-time planning. In Conference on Robot Learning, pages 1049–1065. PMLR, 2021.
[31] Grady Williams, Paul Drews, Brian Goldfain, James M Rehg, and Evangelos A Theodorou. Aggressive driving with model predictive path integral control. In 2016 IEEE international conference on robotics and automation (ICRA), pages 1433–1440. IEEE, 2016.
[32] Mikael Henaff, William F Whitney, and Yann LeCun. Model-based planning with discrete and continuous actions. arXiv preprint arXiv:1705.07177, 2017.
[33] Michael Psenka, Michael Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar. Parallel stochastic gradient-based planning for world models. arXiv preprint arXiv:2602.00475, 2026.
[35] Ilya Kostrikov, Ashvin Nair, and Sergey Levine. Offline reinforcement learning with implicit q-learning. arXiv preprint arXiv:2110.06169, 2021.
[36] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
[37] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[38] Andrew G Barto, Richard S Sutton, and Charles W Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE transactions on systems, man, and cybernetics, (5):834–846, 2012.
[39] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems (RSS), 2023.
[40] Seohong Park, Kevin Frans, Benjamin Eysenbach, and Sergey Levine. OGBench: Benchmarking offline goal-conditioned RL. In The Thirteenth International Conference on Learning Representations, 2025.
[41] Marc G Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of artificial intelligence research, 47:253–279, 2013.
[42] Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pages 5026–5033. IEEE, 2012.
[43] Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, and Jakob Foerster. Craftax: A lightning-fast benchmark for open-ended reinforcement learning. In Proceedings of the 41st International Conference on Machine Learning (ICML), pages 35104–35137, 2024. URL https://arxiv.org/abs/2402.16801
[44] Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019.
[45] Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering atari with discrete world models. arXiv preprint arXiv:2010.02193, 2020.
[46] Nicklas Hansen, Xiaolong Wang, and Hao Su. Temporal difference learning for model predictive control. arXiv preprint arXiv:2203.04955, 2022.
[47] Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, and Yann LeCun. Learning and leveraging world models in visual representation learning. arXiv preprint arXiv:2403.00504, 2024.
[48] Amir Bar, Gaoyue Zhou, Danny Tran, Trevor Darrell, and Yann LeCun. Navigation world models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 15791–15801, 2025.
[49] Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, and Yongtao Ge. Worldmark: A unified benchmark suite for interactive video world models. arXiv preprint arXiv:2604.21686, 2026.
[50] Archana Warrier, Dat Nguyen, Michelangelo Naim, Moksh Jain, Yichao Liang, Karen Schroeder, Cambridge Yang, Joshua B Tenenbaum, Sebastian Vollmer, Kevin Ellis, et al. Benchmarking world-model learning. arXiv preprint arXiv:2510.19788, 2025.
[51] Basile Terver, Randall Balestriero, Megi Dervishi, David Fan, Quentin Garrido, Tushar Nagarajan, Koustuv Sinha, Wancong Zhang, Mike Rabbat, Yann LeCun, et al. A lightweight library for energy-based joint-embedding predictive architectures. arXiv preprint arXiv:2602.03604, 2026.

[52] Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.

[53] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[54] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR, 2018.

[55] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[56] Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, and François Fleuret. Diffusion for world modeling: Visual details matter in Atari. Advances in Neural Information Processing Systems, 37:58757–58791, 2024.

[57] Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.

[58] Amy Zhang, Yuxin Wu, and Joelle Pineau. Natural environment benchmarks for reinforcement learning. arXiv preprint arXiv:1811.06032, 2018.

[59] Austin Stone, Oscar Ramirez, Kurt Konolige, and Rico Jonschkowski. The distracting control suite–a challenging benchmark for reinforcement learning from pixels. arXiv preprint arXiv:2101.02722, 2021.

[60] Nicklas Hansen, Hao Su, and Xiaolong Wang. Stabilizing deep Q-learning with ConvNets and vision transformers under data augmentation. Advances in Neural Information Processing Systems, 34:3680–3693, 2021.

[61] Joseph Ortiz, Antoine Dedieu, Wolfgang Lehrach, J Swaroop Guntupalli, Carter Wendelken, Ahmad Humayun, Sivaramakrishnan Swaminathan, Guangyao Zhou, Miguel Lázaro-Gredilla, and Kevin P Murphy. DMC-VB: A benchmark for representation learning for control with visual distractors. Advances in Neural Information Processing Systems, 37:6574–6602, 2024.

[62] Lance Ying, Katherine M Collins, Prafull Sharma, Cedric Colas, Kaiya Ivy Zhao, Adrian Weller, Zenna Tavares, Phillip Isola, Samuel J Gershman, Jacob D Andreas, et al. Assessing adaptive world models in machines with novel games. arXiv preprint arXiv:2507.12821, 2025.

[63] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021.

[64] Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, Jeff Braga, Dipam Chakraborty, Kinal Mehta, and João GM Araújo. CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms. Journal of Machine Learning Research, 23(274):1–18, 2022.

[65] Luis Pineda, Brandon Amos, Amy Zhang, Nathan O Lambert, and Roberto Calandra. MBRL-Lib: A modular library for model-based reinforcement learning. arXiv preprint arXiv:2104.10159, 2021.

[66] Vikash Kumar, Rutav Shah, Gaoyue Zhou, Vincent Moens, Vittorio Caggiano, Abhishek Gupta, and Aravind Rajeswaran. RoboHive: A unified framework for robot learning. Advances in Neural Information Processing Systems, 36:44323–44340, 2023.

[67] Yuke Zhu, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Abhishek Joshi, Kevin Lin, Abhiram Maddukuri, Soroush Nasiriany, and Yifeng Zhu. robosuite: A modular simulation framework and benchmark for robot learning. arXiv preprint arXiv:2009.12293, 2020.

[68] Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. RLBench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 5(2):3019–3026, 2020.

[69] Dibya Ghosh, Abhishek Gupta, Ashwin Reddy, Justin Fu, Coline Devin, Benjamin Eysenbach, and Sergey Levine. Learning to reach goals via iterated supervised learning. arXiv preprint arXiv:1912.06088, 2019.

[70] Adrien Bardes, Jean Ponce, and Yann LeCun. VICReg: Variance-invariance-covariance regularization for self-supervised learning. 2021.

[71] John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pages 272–279, New York, NY, USA, 2008. Association for Computing Machinery. ISBN 9781605582054. doi: 10.1145/1390156.1390191. URL https...

[72] Omry Yadan. Hydra - a framework for elegantly configuring complex applications. GitHub, 2019.

[73] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, DDL Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 10, 2022.

Appendix

Our Appendix complements the main paper with a walkthrough of the stable-worldmodel pla...
