stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation
Pith reviewed 2026-05-22 08:53 UTC · model grok-4.3
The pith
stable-worldmodel unifies data pipelines, baselines, and benchmarks under one framework to cut research overhead for world models
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors state that unifying the full pipeline under a single scalable framework dramatically reduces research overhead and accelerates trustworthy progress toward reliable world models. The concrete means are the data layer, the baseline implementations, and the extended environments, which together are meant to support standardized and reproducible evaluation of dynamics understanding, control performance, representation quality, and out-of-distribution generalization.
What carries the argument
The argument rests on the stable-worldmodel platform itself, which integrates three parts: a Lance-based data layer with fast native support for, and conversion across, dataset formats; clean baseline implementations; and environments with controllable factors of variation for systematic testing.
If this is right
- Native support and conversion tools for MP4, HDF5, and LeRobot datasets remove the need for custom video loaders in most experiments.
- Well-tested baseline implementations and planning solvers let researchers focus effort on novel components rather than reimplementation.
- Environments with controllable visual, geometric, and physical factors enable systematic measurement of out-of-distribution generalization.
- The single framework makes it straightforward to reproduce and compare results across different research groups.
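To make the third consequence concrete, here is a minimal sketch of a held-out factor split for out-of-distribution testing. It does not use the swm API: the factor names, their values, and the `split_factor_grid` helper are all hypothetical, chosen only to show how controllable visual, geometric, and physical factors turn generalization into a grid of training and test conditions.

```python
from itertools import product

# Hypothetical factors of variation; names and values are illustrative only,
# not taken from the stable-worldmodel environments.
FACTORS = {
    "background_color": ["white", "gray", "black"],  # visual
    "object_scale": [0.8, 1.0, 1.2],                 # geometric
    "friction": [0.5, 1.0, 1.5],                     # physical
}


def split_factor_grid(factors, held_out):
    """Split the full factor grid into in-distribution and OOD configurations.

    A configuration counts as OOD if at least one factor takes a held-out value.
    """
    names = list(factors)
    train, ood = [], []
    for values in product(*(factors[name] for name in names)):
        config = dict(zip(names, values))
        is_ood = any(config[name] in held_out.get(name, ()) for name in names)
        (ood if is_ood else train).append(config)
    return train, ood


train_cfgs, ood_cfgs = split_factor_grid(
    FACTORS, held_out={"background_color": ["black"], "friction": [1.5]}
)
print(len(train_cfgs), len(ood_cfgs))
```

Holding out values per factor, rather than whole configurations, keeps it possible to attribute a performance drop to a single factor by filtering the OOD set.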
Where Pith is reading between the lines
- The same unification approach could be extended to real-robot data streams to test whether the platform's benefits transfer beyond simulation.
- Neighbouring areas such as model-based reinforcement learning may adopt similar standardized layers to address their own reproducibility gaps.
- A natural next measurement would be to track how many new papers cite or build on the platform's environments for their generalization claims.
Load-bearing premise
The provided Lance-based data layer, baseline implementations, and extended environments with controllable factors will be sufficient for systematic evaluation and fair comparison without requiring substantial additional custom engineering by users.
What would settle it
A direct comparison study in which independent teams implement the same new world model both inside and outside the platform and measure total engineering time plus result consistency would settle whether the claimed reduction in overhead holds.
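The two quantities such a study would report, engineering time and result consistency, can be summarized along the following lines. This is a sketch under stated assumptions: the record layout, team labels, and every number are placeholders invented for illustration, not measurements from the paper or from any real study.

```python
from statistics import mean, pstdev

# Hypothetical study records: (team, condition, engineering_hours, final_score).
# All values are placeholders for illustration, not measured results.
RECORDS = [
    ("team_a", "platform", 10.0, 0.80),
    ("team_b", "platform", 14.0, 0.82),
    ("team_a", "custom", 30.0, 0.70),
    ("team_b", "custom", 50.0, 0.90),
]


def summarize(records, condition):
    """Mean engineering hours and across-team score spread for one condition."""
    hours = [h for _, c, h, _ in records if c == condition]
    scores = [s for _, c, _, s in records if c == condition]
    return {"mean_hours": mean(hours), "score_spread": pstdev(scores)}


platform_stats = summarize(RECORDS, "platform")
custom_stats = summarize(RECORDS, "custom")

# Ratio above 1 means the custom pipeline cost more engineering time.
overhead_ratio = custom_stats["mean_hours"] / platform_stats["mean_hours"]
print(platform_stats, custom_stats, round(overhead_ratio, 2))
```

A lower score spread inside the platform condition would support the reproducibility claim, while the overhead ratio addresses the claim about reduced engineering effort; the two should be reported separately because one can hold without the other.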
Original abstract
World models are central to building agents that can reason, plan, and generalize beyond their training data. However, research on world models is currently fragmented, with disparate codebases, data pipelines, and evaluation protocols hindering reproducibility and fair comparison. Current practice is further limited by three key bottlenecks: fragile one-off codebases, slow video data loading, and the lack of standardized generalization benchmarks. We present stable-worldmodel (swm), an open-source platform for standardized and reproducible world modeling research and evaluation. It delivers (1) a high-performance Lance-based data layer with native support and conversion tools for MP4, HDF5, and LeRobot datasets, (2) clean, well-tested implementations of modern world model baselines and planning solvers, and (3) a broad suite of environments and tasks extended with controllable visual, geometric, and physical factors of variation for systematic in-silico evaluation of dynamics understanding, control performance, representation quality, and out-of-distribution generalization. By unifying the full pipeline under a single, scalable framework, swm dramatically reduces research overhead and accelerates trustworthy progress toward reliable world models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces stable-worldmodel (swm), an open-source platform for standardized and reproducible world modeling research. It claims to deliver three components: (1) a high-performance Lance-based data layer with native support and conversion tools for MP4, HDF5, and LeRobot datasets; (2) clean, well-tested implementations of modern world model baselines and planning solvers; and (3) a broad suite of environments and tasks extended with controllable visual, geometric, and physical factors of variation. By unifying the full pipeline under a single scalable framework, the paper asserts that swm dramatically reduces research overhead and accelerates trustworthy progress toward reliable world models.
Significance. If the delivered components prove jointly sufficient for end-to-end reproducible experiments and fair comparisons with minimal custom engineering, the platform could meaningfully address fragmentation in world-model research by enabling systematic in-silico evaluation of dynamics understanding, control, representation quality, and out-of-distribution generalization. The provision of factorized environments and baseline implementations is a constructive contribution toward standardized benchmarks. However, the manuscript supplies no usage traces, ablation studies, or overhead measurements, so the claimed significance remains prospective rather than demonstrated.
major comments (2)
- Abstract: the central claim that unifying the pipeline under swm 'dramatically reduces research overhead' is unsupported; the text describes the three components but provides neither timing measurements relative to prior fragmented codebases nor ablation results quantifying remaining custom engineering required by users.
- Abstract: the assertions of 'high-performance' Lance data layer, 'clean, well-tested' baselines, and 'broad suite' of extended environments are presented without any implementation details, performance benchmarks, validation results, or concrete usage examples that would substantiate sufficiency for zero-custom-engineering systematic evaluation.
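The timing measurements requested in the first comment could be gathered with a harness like the one below. It is a sketch, not the authors' benchmark: `time_loader` and `dummy_loader` are hypothetical names, and the stand-in loaders simulate per-batch cost with `time.sleep` where a real comparison would wrap the Lance-backed loader and an MP4 or HDF5 baseline behind the same iterator interface.

```python
import time
from statistics import median


def time_loader(make_iterator, n_batches, repeats=3):
    """Median wall-clock seconds needed to draw n_batches from a fresh iterator."""
    timings = []
    for _ in range(repeats):
        iterator = make_iterator()
        start = time.perf_counter()
        for _ in range(n_batches):
            next(iterator)
        timings.append(time.perf_counter() - start)
    return median(timings)


def dummy_loader(delay_s):
    """Hypothetical stand-in: an endless batch iterator with a fixed per-batch delay."""
    def make():
        def batches():
            while True:
                time.sleep(delay_s)  # simulated decode and I/O cost
                yield b"batch"
        return batches()
    return make


slow = time_loader(dummy_loader(0.002), n_batches=5)
fast = time_loader(dummy_loader(0.0), n_batches=5)
print(f"slow: {slow:.4f}s  fast: {fast:.4f}s")
```

Taking the median over repeats and building a fresh iterator each time reduces the influence of warm-up and caching effects, which matter a great deal when comparing storage formats.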
minor comments (2)
- Consider adding explicit quick-start code snippets or a minimal reproducible experiment trace in the main text or supplementary material to illustrate end-to-end usage of the Lance layer, a baseline, and a factorized environment.
- Clarify the exact scope of 'controllable factors of variation' (visual, geometric, physical) with a table listing which factors are exposed per environment.
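A minimal reproducible experiment trace, as suggested in the first minor comment, might record something like the following. This is an illustrative sketch only: `make_trace` and all configuration keys and values are hypothetical placeholders, not swm options or file names.

```python
import hashlib
import json
import platform
import random


def make_trace(config, seed):
    """Build a small, serializable record of one run for later reproduction."""
    # Sorting keys makes the hash independent of dictionary insertion order.
    canonical = json.dumps(config, sort_keys=True)
    return {
        "config": config,
        "config_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "seed": seed,
        "python": platform.python_version(),
    }


# Hypothetical configuration; keys and values are placeholders.
config = {
    "dataset": "example.lance",
    "baseline": "example_baseline",
    "factors": {"friction": 1.5},
}
trace = make_trace(config, seed=0)

# Re-seeding from the trace reproduces the same pseudo-random draw.
random.seed(trace["seed"])
first_draw = random.random()
random.seed(trace["seed"])
assert first_draw == random.random()

print(json.dumps(trace, indent=2))
```

Publishing such a record next to each reported number lets a reader confirm that two runs used the same configuration before comparing their results.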
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We agree that the abstract claims would benefit from additional substantiation and will revise the manuscript to include more details, benchmarks, and examples as outlined below.
- Referee: Abstract: the central claim that unifying the pipeline under swm 'dramatically reduces research overhead' is unsupported; the text describes the three components but provides neither timing measurements relative to prior fragmented codebases nor ablation results quantifying remaining custom engineering required by users.
  Authors: We acknowledge that the abstract asserts a reduction in overhead without direct quantitative comparisons in the current text. The manuscript's contribution centers on the integrated design of the data layer, baselines, and environments, which by construction eliminates the need for users to assemble disparate codebases. We will add a new subsection with preliminary timing measurements for data loading and setup effort relative to common prior practices, plus concrete usage traces showing the engineering steps required for a standard experiment.
  Revision: yes
- Referee: Abstract: the assertions of 'high-performance' Lance data layer, 'clean, well-tested' baselines, and 'broad suite' of extended environments are presented without any implementation details, performance benchmarks, validation results, or concrete usage examples that would substantiate sufficiency for zero-custom-engineering systematic evaluation.
  Authors: We agree that the abstract would be strengthened by explicit support for these descriptors. The full manuscript already contains implementation descriptions of the Lance integration, baseline code structure, and environment factorizations, but we will expand the revised version with performance numbers for the data layer, test coverage statistics for the baselines, and step-by-step usage examples that illustrate end-to-end evaluation with controllable factors of variation.
  Revision: yes
Circularity Check
No circularity: software platform paper with no derivations or fitted quantities
The manuscript presents an open-source platform (data layer, baselines, extended environments) rather than any derivation chain, equations, or statistical predictions. No load-bearing steps reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. Claims about reduced research overhead are descriptive assertions about the delivered components, not results derived from the paper's own inputs by construction. This is a standard non-finding for infrastructure papers.