Simulus: Combining Improvements in Sample-Efficient World Model Agents
Pith reviewed 2026-05-23 02:25 UTC · model grok-4.3
The pith
Simulus shows that four separate improvements to world-model agents combine without conflict to reach state-of-the-art sample efficiency among planning-free world models on visual, continuous-control, and symbolic benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Simulus integrates a flexible tokenization framework, intrinsic motivation for epistemic uncertainty reduction, prioritized world-model replay, and regression-as-classification for reward and return prediction. The resulting agent achieves state-of-the-art sample efficiency among planning-free world models on visual Atari 100K, continuous-control DMC Proprioception 500K, and symbolic Craftax-1M; ablations indicate that each component contributes individually and that their combination yields synergistic gains.
What carries the argument
Simulus, a modular token-based world-model agent whose tokenization framework accepts arbitrary combinations of observation and action modalities, with the remaining three improvements layered on a shared base learner.
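As a rough illustration of what a shared token interface can look like, the sketch below maps heterogeneous modalities into one vocabulary. It assumes uniform binning for bounded continuous inputs (e.g. proprioception) and an identity mapping for categorical inputs (discrete actions, symbolic cells), with a per-modality offset so that ID ranges never collide. The names `UniformQuantizer`, `DiscreteTokenizer`, `make_offsets`, and `tokenize_step`, and the bin count, are hypothetical and not taken from the Simulus code; image observations would need a learned tokenizer, which is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Sequence, Union


@dataclass(frozen=True)
class UniformQuantizer:
    """Hypothetical tokenizer for bounded continuous vectors (e.g. joint angles)."""
    low: float
    high: float
    num_bins: int

    def encode(self, values: Sequence[float]) -> list[int]:
        tokens = []
        for v in values:
            clipped = min(max(v, self.low), self.high)
            # Scale to [0, num_bins]; the upper edge is folded into the last bin.
            index = int((clipped - self.low) * self.num_bins / (self.high - self.low))
            tokens.append(min(index, self.num_bins - 1))
        return tokens

    @property
    def vocab_size(self) -> int:
        return self.num_bins


@dataclass(frozen=True)
class DiscreteTokenizer:
    """Hypothetical tokenizer for categorical inputs (discrete actions, symbolic cells)."""
    num_values: int

    def encode(self, values: Sequence[int]) -> list[int]:
        for v in values:
            if not 0 <= v < self.num_values:
                raise ValueError(f"value {v} outside [0, {self.num_values})")
        return list(values)

    @property
    def vocab_size(self) -> int:
        return self.num_values


Tokenizer = Union[UniformQuantizer, DiscreteTokenizer]


def make_offsets(tokenizers: Dict[str, Tokenizer]) -> Dict[str, int]:
    """Give each modality a disjoint ID range inside one shared vocabulary."""
    offsets, next_id = {}, 0
    for name, tok in tokenizers.items():
        offsets[name] = next_id
        next_id += tok.vocab_size
    return offsets


def tokenize_step(modalities: Dict[str, Sequence], tokenizers: Dict[str, Tokenizer],
                  offsets: Dict[str, int]) -> list[int]:
    """Flatten one time step (any combination of modalities) into a single token sequence."""
    sequence: list[int] = []
    for name, values in modalities.items():
        sequence.extend(offsets[name] + t for t in tokenizers[name].encode(values))
    return sequence
```

For example, with a 100-bin proprioception tokenizer and a 6-action tokenizer, action tokens start at ID 100. Disjoint offsets are only one design option; separate embedding tables per modality would serve the same purpose.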
If this is right
- Each of the four components improves performance when added alone.
- The combination of all four yields larger gains than any subset.
- Intrinsic motivation continues to help even when total environment steps are severely limited.
- A single token-based architecture can accommodate visual, proprioceptive, and symbolic inputs without task-specific redesign.
Where Pith is reading between the lines
- Modular token interfaces may make it easier to test future improvements without rewriting the entire agent stack.
- The success on three very different domains suggests the same four additions could be tried on planning-based world-model agents.
- If prioritizing which stored experience the world model is trained on remains useful, similar prioritization could be applied to the data used to train other components, such as the value function or policy.
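Prioritized world-model replay can be illustrated with a proportional sampler. Training segments are drawn with probability proportional to a priority raised to a power α, and each segment's priority is refreshed with its most recent world-model loss, in the spirit of curious replay [30]. This is a sketch under stated assumptions: using the loss as the priority, giving new segments the current maximum priority, and the class name `PrioritizedSegmentSampler` are illustrative choices, not necessarily what Simulus does.

```python
from __future__ import annotations

import random
from typing import Optional, Sequence


class PrioritizedSegmentSampler:
    """Hypothetical proportional sampler over world-model training segments."""

    def __init__(self, alpha: float = 1.0, min_priority: float = 1e-6,
                 seed: Optional[int] = None) -> None:
        self.alpha = alpha                # 0 -> uniform sampling, 1 -> fully proportional
        self.min_priority = min_priority  # keeps every segment sampleable
        self.priorities: list[float] = []
        self.rng = random.Random(seed)

    def add(self) -> int:
        """Register a new segment with the current max priority so it is trained on soon."""
        self.priorities.append(max(self.priorities, default=1.0))
        return len(self.priorities) - 1

    def sample(self, batch_size: int) -> list[int]:
        """Draw segment indices with probability proportional to priority ** alpha."""
        weights = [p ** self.alpha for p in self.priorities]
        return self.rng.choices(range(len(self.priorities)), weights=weights, k=batch_size)

    def update(self, indices: Sequence[int], losses: Sequence[float]) -> None:
        """Refresh priorities with the latest per-segment world-model losses."""
        for i, loss in zip(indices, losses):
            self.priorities[i] = max(float(loss), self.min_priority)
```

Setting `alpha=0` recovers uniform replay, which makes this kind of sampler easy to ablate against the unprioritized baseline.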
Load-bearing premise
That the four components complement each other without significant negative interactions and that intrinsic motivation remains beneficial even under the tight interaction budgets of sample-efficient RL.
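A common way to turn epistemic uncertainty into an intrinsic reward is ensemble disagreement. Independently trained predictors should agree where data is plentiful and diverge where the model is uncertain (compare [31], [49], [50]). The sketch below assumes that estimator, a per-dimension population variance averaged over outputs, and a hypothetical mixing coefficient `scale`. It is not presented as Simulus's exact formulation.

```python
from __future__ import annotations

from statistics import fmean, pvariance
from typing import Callable, Sequence

# A predictor maps input features to a predicted output vector (e.g. next-state features).
Predictor = Callable[[Sequence[float]], Sequence[float]]


def disagreement_bonus(ensemble: Sequence[Predictor], features: Sequence[float]) -> float:
    """Intrinsic reward: variance across ensemble members, averaged over output dims."""
    predictions = [member(features) for member in ensemble]
    num_dims = len(predictions[0])
    return fmean(pvariance([p[d] for p in predictions]) for d in range(num_dims))


def total_reward(extrinsic: float, intrinsic: float, scale: float = 0.1) -> float:
    """Hypothetical mixing rule; the weighting used by Simulus may differ."""
    return extrinsic + scale * intrinsic
```

Because such a bonus shrinks as the ensemble sees more data in a region, it steers exploration toward states where new data most reduces model uncertainty. That is one plausible reason it need not waste a small interaction budget.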
What would settle it
An ablation on any of the three benchmarks in which the full Simulus agent underperforms a version that omits one or more of the four components.
Original abstract
World models (WMs) represent the frontier of sample-efficient reinforcement learning, but their complexity leaves many promising improvements unrealized due to the significant expertise and effort required to identify and integrate them. Inspired by Rainbow, which showed that individually known improvements to DQN complement each other and can be effectively combined, we take on this challenge and ask whether the same principle applies to world model agents. We introduce Simulus, a modular token-based WM agent that integrates: (1) a flexible tokenization framework supporting arbitrary combinations of observation and action modalities; (2) intrinsic motivation for epistemic uncertainty reduction; (3) prioritized world model replay; and (4) regression-as-classification for reward and return prediction. Simulus achieves state-of-the-art sample efficiency for planning-free WMs across three diverse benchmarks: visual Atari 100K, continuous-control DMC Proprioception 500K, and symbolic Craftax-1M. Notably, intrinsic motivation proves beneficial even under the tight interaction budgets of sample-efficient RL, despite the risk of wasting scarce interactions on task-irrelevant experience. Ablation studies reveal that each component contributes individually, and their combination yields synergistic gains. Our code and model weights are publicly available at https://github.com/leor-c/Simulus.
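Regression-as-classification replaces a scalar regression head with a categorical distribution over fixed bins, trained with cross-entropy [16]. The sketch below uses two-hot targets over uniformly spaced bins, the variant used by DreamerV3 [20]. The bin range, the bin count, and the absence of a squashing transform such as symlog are assumptions. Simulus may use a different target encoding, such as the Gaussian-smoothed HL-Gauss variant studied in [16].

```python
from __future__ import annotations

import bisect
import math
from typing import Sequence


def make_bins(low: float, high: float, num_bins: int) -> list[float]:
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]


def two_hot(target: float, bins: Sequence[float]) -> list[float]:
    """Split a scalar between its two neighbouring bins so that E[bin] == target."""
    target = min(max(target, bins[0]), bins[-1])
    k = min(bisect.bisect_right(bins, target) - 1, len(bins) - 2)
    upper_weight = (target - bins[k]) / (bins[k + 1] - bins[k])
    probs = [0.0] * len(bins)
    probs[k] = 1.0 - upper_weight
    probs[k + 1] = upper_weight
    return probs


def log_softmax(logits: Sequence[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def cross_entropy(logits: Sequence[float], target_probs: Sequence[float]) -> float:
    """Classification loss that replaces squared error on the scalar target."""
    return -sum(p * lp for p, lp in zip(target_probs, log_softmax(logits)))


def decode(logits: Sequence[float], bins: Sequence[float]) -> float:
    """Scalar prediction = expected bin value under the predicted distribution."""
    return sum(math.exp(lp) * b for lp, b in zip(log_softmax(logits), bins))
```

With bins `[-1.0, -0.5, 0.0, 0.5, 1.0]`, a reward of 0.25 becomes the target `[0, 0, 0.5, 0.5, 0]`. The usual motivation is that cross-entropy gradients do not scale with the magnitude of the target, unlike squared error.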
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Simulus, a modular token-based world model agent integrating four components: flexible tokenization supporting arbitrary observation/action modalities, intrinsic motivation for epistemic uncertainty reduction, prioritized world model replay, and regression-as-classification for reward/return prediction. It claims state-of-the-art sample efficiency for planning-free world models on visual Atari 100K, continuous-control DMC Proprioception 500K, and symbolic Craftax-1M benchmarks. Ablation studies indicate each component contributes positively on its own with synergistic gains from the full combination; intrinsic motivation remains beneficial at the tight 100K/500K/1M interaction budgets. Public code and model weights are released.
Significance. If the results hold, the work shows that the Rainbow-style combination of complementary improvements can be successfully applied to world model agents, potentially reducing the expertise barrier for building sample-efficient RL systems. The public code and weights directly support reproducibility of the SOTA claims across three diverse benchmarks and let readers check experimental details against the released implementation.
minor comments (2)
- [Abstract] The abstract claims SOTA results but does not name the specific metrics (e.g., mean return, human-normalized score) or list the exact baselines against which superiority is measured.
- The manuscript would benefit from explicit reporting of the number of random seeds, confidence intervals, and any statistical tests used to support the ablation and benchmark comparisons.
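The sketch below makes the second comment concrete. It computes human-normalized scores, the interquartile mean (IQM) recommended in [2], and a percentile-bootstrap confidence interval over runs. This is a simplified illustration under assumptions. The rliable toolkit's interval estimates use a stratified bootstrap over runs and tasks, which this sketch does not reproduce, and the function names are hypothetical.

```python
from __future__ import annotations

import random
from statistics import fmean
from typing import Callable, Sequence, Tuple


def human_normalized(score: float, random_score: float, human_score: float) -> float:
    """Standard Atari normalization: 0 = random policy, 1 = human reference."""
    return (score - random_score) / (human_score - random_score)


def iqm(values: Sequence[float]) -> float:
    """Interquartile mean: drop the lowest and highest 25% of values, average the rest."""
    xs = sorted(values)
    cut = len(xs) // 4
    return fmean(xs[cut:len(xs) - cut])


def bootstrap_ci(values: Sequence[float], stat: Callable[[Sequence[float]], float] = iqm,
                 num_resamples: int = 2000, confidence: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample runs with replacement, recompute the statistic."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(num_resamples))
    tail = round((1.0 - confidence) / 2.0 * num_resamples)
    return estimates[tail], estimates[num_resamples - 1 - tail]
```

In an Atari 100K report, the values would be the human-normalized scores of every (game, seed) pair pooled together, and the number of seeds per game would be stated alongside the interval.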
Simulated Author's Rebuttal
We thank the referee for the positive assessment of Simulus, the recognition of its modular design and reproducibility contributions, and the recommendation for minor revision. No major comments were provided in the report.
Circularity Check
No significant circularity
The paper is an empirical study that combines four modular improvements to world-model agents and validates them via ablation experiments on three standard benchmarks (Atari 100K, DMC 500K, Craftax-1M). All performance claims rest on reported interaction counts, reward curves, and ablation tables rather than any derivation, equation, or fitted parameter that reduces to its own inputs by construction. Public code and weights are supplied, making the results externally reproducible against the same benchmarks without reliance on self-citation chains or self-definitional steps.
Axiom & Free-Parameter Ledger
free parameters (1)
- Various agent and training hyperparameters
axioms (1)
- Domain assumption: environments follow the standard Markov decision process (MDP) formulation used in RL.
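For reference, the standard formulation this ledger entry assumes can be written in the usual notation. The symbols are ours, not the paper's.

```latex
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots) = P(s_{t+1} \mid s_t, a_t), \qquad
\gamma \in [0, 1)

J(\pi) = \mathbb{E}_{a_t \sim \pi(\cdot \mid s_t),\; s_{t+1} \sim P(\cdot \mid s_t, a_t)}
  \Big[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \Big]
```

The benchmark budgets (100K, 500K, 1M) cap the number of real transitions the agent may collect while maximizing J(π).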
Forward citations
Cited by 1 Pith paper
- JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampli...
Reference graph
Works this paper leans on
[1] Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical AI. arXiv preprint arXiv:2501.03575, 2025
[2] Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron C Courville, and Marc Bellemare. Deep reinforcement learning at the edge of the statistical precipice. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 29304–29320. Curran Associates, Inc....
[3] Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, and François Fleuret. Diffusion for world modeling: Visual details matter in Atari. arXiv preprint arXiv:2405.12399, 2024
[4] Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, and Charles Blundell. Agent57: Outperforming the Atari human benchmark. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, p...
[5] Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martin Arjovsky, Alexander Pritzel, Andrew Bolt, and Charles Blundell. Never give up: Learning directed exploration strategies. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=Sye57xStvB
[6] Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013
[7] Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023
[8] Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, and Aditya Ramesh. Video generation models as world simulators. 2024. URL https://openai.com/research/video-generation-models-as-world-simulators
[9] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, S...
[10] Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by random network distillation. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=H1lJJnR5Ym
[11] Lior Cohen, Kaixin Wang, Bingyi Kang, and Shie Mannor. Improving token-based world models with parallel observation prediction. In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.net/forum?id=Lfp5Dk1xb6
[12] Decart, Julian Quevedo, Quinn McIntyre, Spruce Campbell, Xinlei Chen, and Robert Wachen. Oasis: A universe in a transformer, 2024. URL https://oasis-model.github.io/.
[13] Antoine Dedieu, Joseph Ortiz, Xinghua Lou, Carter Wendelken, Wolfgang Lehrach, J Swaroop Guntupalli, Miguel Lazaro-Gredilla, and Kevin Patrick Murphy. Improving transformer world models for data-efficient RL, 2025. URL https://arxiv.org/abs/2502.01591
[14] Google DeepMind. Genie 2: A large-scale foundation world model, 2024. URL https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/
[15] Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12873–12883, 2021
[16] Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taiga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, and Rishabh Agarwal. Stop regressing: Training value functions via classification for scalable deep RL. In Forty-first International Conference on Machine Learning, 2024. URL https://openr...
[17] David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. In Advances in Neural Information Processing Systems 31, pages 2451–2463. Curran Associates, Inc., 2018. URL https://papers.nips.cc/paper/7512-recurrent-world-models-facilitate-policy-evolution. https://worldmodels.github.io
[18] Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=S1lOTC4tDS
[19] Danijar Hafner, Timothy P. Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering Atari with discrete world models. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021. URL https://openreview.net/forum?id=0oabwyZbOu
[20] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023
[21] Nicklas Hansen, Hao Su, and Xiaolong Wang. TD-MPC2: Scalable, robust world models for continuous control. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=Oxh5CstDJU
[22] Elad Hazan, Sham Kakade, Karan Singh, and Abby Van Soest. Provably efficient maximum entropy exploration. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2681–2691. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.p...
[23] Mikael Henaff, Roberta Raileanu, Minqi Jiang, and Tim Rocktäschel. Exploration via elliptical episodic bonuses. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=Xg-yZos9qJQ
[24] Dan Hendrycks and Kevin Gimpel. Bridging nonlinearities and stochastic regularizers with gaussian error linear units, 2017. URL https://openreview.net/forum?id=Bk0MRI5lg
[25] Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P Kingma, Ben Poole, Mohammad Norouzi, David J Fleet, et al. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303, 2022
[26] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997
[27] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pages 694–711. Springer, 2016.
[28] Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłoś, Błażej Osiński, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, and Henryk Michalewski. Model based reinforcement learning for Atari. In International Conference on Learning Representations, 2020. URL https:/...
[29] Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 5156–5165. PMLR, 13–18 Jul 2...
[30] Isaac Kauvar, Chris Doyle, Linqi Zhou, and Nick Haber. Curious replay for model-based adaptation. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023
[31] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https:/...
[32] Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1558–1566, New York, N...
[33] Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, and Aniruddha Kembhavi. UNIFIED-IO: A unified model for vision, language, and multi-modal tasks. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=E01k9048soZ
[34] Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, and Aniruddha Kembhavi. Unified-IO 2: Scaling autoregressive multimodal models with vision, language, audio, and action. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 26439–26455, June 2024
[35] Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, and Jakob Foerster. Craftax: A lightning-fast benchmark for open-ended reinforcement learning. In International Conference on Machine Learning (ICML), 2024
[36] Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, and Deepak Pathak. Discovering and achieving goals via world models. Advances in Neural Information Processing Systems, 34:24379–24391, 2021
[37] Vincent Micheli, Eloi Alonso, and François Fleuret. Transformers are sample-efficient world models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/pdf?id=vhFu1Acb0xb
[38] Vincent Micheli, Eloi Alonso, and François Fleuret. Efficient world models with context-aware tokenization. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024. URL https://openreview.net/forum?id=BiWIERWBFX
[39] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[40] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performa...
[41] Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2778–2787. PMLR, 06–11 Aug 2017. URL https://pro...
[42] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for activation functions, 2018. URL https://openreview.net/forum?id=SkBYYyZRZ
[43] Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gómez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Giménez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, and Nando de Freitas. A generalist agent. Transactions on ...
[44] Jan Robine, Marc Höftmann, Tobias Uelwer, and Stefan Harmeling. Transformer-based world models are happy with 100k interactions. arXiv preprint arXiv:2303.07109, 2023
[45] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, June 2022
[46] Jürgen Schmidhuber. Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2, 2010. ISSN 19430604. doi: 10.1109/TAMD.2010.2056368
[47] Ingmar Schubert, Jingwei Zhang, Jake Bruce, Sarah Bechtle, Emilio Parisotto, Martin Riedmiller, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, and Nicolas Heess. A generalist dynamics model for control. arXiv preprint arXiv:2305.10912, 2023
[48] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017
[49] Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 8583–8592. PMLR, 13–18 Jul 2020...
[50] Pranav Shyam, Wojciech Jaśkowski, and Faustino Gomez. Model-based active exploration. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 5779–5788. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.press/v97/shyam19a.html
[51] Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. Retentive network: A successor to transformer for large language models. arXiv preprint arXiv:2307.08621, 2023
[52] Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In S. Solla, T. Leen, and K. Müller, editors, Advances in Neural Information Processing Systems, volume 12. MIT Press, 1999. URL https://proceedings.neurips.cc/paper_files/paper/1999/file/464d828b85b0b...
[53] Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Siqi Liu, Steven Bohez, Josh Merel, Tom Erez, Timothy Lillicrap, Nicolas Heess, and Yuval Tassa. dm_control: Software and tasks for continuous control. Software Impacts, 6:100022, 2020. ISSN 2665-9638. doi: https://doi.org/10.1016/j.simpa.2020.100022. URL https://www.sciencedirect.com/science/article/...
[54] Dani Valevski, Yaniv Leviathan, Moab Arar, and Shlomi Fruchter. Diffusion models are real-time game engines. CoRR, abs/2408.14837, 2024. URL https://doi.org/10.48550/arXiv.2408.14837
[55] Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/...
[56] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. UR...
[57] Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, and Yang Gao. EfficientZero V2: Mastering discrete and continuous control with limited data. In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.net/forum?id=LHGMXcr6zx
[58] ZiRui Wang, Yue DENG, Junfeng Long, and Yin Zhang. Parallelizing model-based reinforcement learning over the sequence length. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=R6N9AGyz13
[59] Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye HAO, and Mingsheng Long. iVideoGPT: Interactive VideoGPTs are scalable world models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=4TENzBftZR
[60] Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, and Gao Huang. Storm: Efficient stochastic transformer based world models for reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
Appendix A (Models and Hyperparameters), A.1 Hyperparameters: We detail shared hyperparameters in Table 1, training hyperparameters in Table 2, world model h...
[61] Claims. Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? Answer: [Yes] Justification: We provide extensive empirical evidence in Section 3, including ablation studies, which directly relate to our contributions and claims. The scope of our paper is sample-efficient, planning-free wor...
[62] Limitations. Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: Section 5 explicitly discusses the limitations of our work. Additional limitations are discussed in Section 3 (e.g., the absence of ablations on Craftax due to computational limitations). Guidelines: • The answer NA means that the ...
[63] Theory assumptions and proofs. Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [NA] Justification: Our paper does not include theoretical results. Guidelines: • The answer NA means that the paper does not include theoretical results. • All the theorems, formulas, and proo...
[64] Experimental result reproducibility. Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [Yes] Justification: In Section 2 and in the ap... Guidelines: • The answer NA means that the paper does not include experiments.
[65] Open access to data and code. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: In our abstract and appendix we provide a link to the code and trained model weights. Our code has a detail...
[66] Experimental setting/details. Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: We specify all experimental details in Section 3 and in Appendix A and C. Guidelines: • The answer NA means that the paper does not include experiments...
[67] Experiment statistical significance. Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [Yes] Justification: We utilize the rliable toolkit [2] to generate plots with appropriate error bars (Figure 4 bottom, Figure 6). Figure 5 also includes error bars...
[68] Experiments compute resources. Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: We provide this information in Appendix C. Guidelines: • The answer NA means that the paper does not include experiments...
[69] Code of ethics. Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines? Answer: [Yes] Justification: Our work follows the NeurIPS Code of Ethics. No human subjects or participants were involved. We found no special concerns beyond those related to the genera...
[70] Broader impacts. Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [NA] Justification: This paper presents a foundational work in the field of Machine Learning. As such, there are no direct positive or negative societal impacts. Guidelines: • The answer NA means that th...
[71] Safeguards. Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] Justification: Our work does not pose any additional risks beyond those of common deep reinforcement learning ... Hence, we do not introduce additional safeguards.
[72] Licenses for existing assets. Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected? Answer: [Yes] Justification: Our paper cites all relevant assets, and our open-sourced repository includes a credits section ... We follow the licenses of all assets used in our work.
[73] New assets. Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? Answer: [Yes] Justification: Our open-sourced repository includes all new assets and is well documented. Guidelines: • The answer NA means that the paper does not release new assets. • Researchers should communicate the detai...
[74] Crowdsourcing and research with human subjects. Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)? Answer: [NA] Justification: The paper does not involve crowdsourcing nor research...
[75] Institutional review board (IRB) approvals or equivalent for research with human subjects. Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...
[76] Declaration of LLM usage. Question: Does the paper describe the usage of LLMs if it is an important, original, or non-standard component of the core methods in this research? Note that if the LLM is used only for writing, editing, or formatting purposes and does not impact the core methodology, scientific rigorousness, or originality of the research, decla...