WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform
Pith reviewed 2026-05-20 10:56 UTC · model grok-4.3
The pith
WorldArena 2.0 broadens embodied world model testing to touch sensing, interactive policy training, and real robot platforms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WorldArena 2.0 extends embodied world model evaluation along three axes: modality, from vision-only to visuotactile; functionality, from policy evaluation and planning to use as interactive RL environments for policy optimization; and platform, from simulator-only to a suite of simulated and real-world robotic settings across multiple embodiments. A standardized protocol assesses perceptual quality, interactive utility, and cross-platform performance across all three.
What carries the argument
The WorldArena 2.0 benchmark, which broadens evaluation along modality, functionality, and platform dimensions under one standardized protocol.
If this is right
- World models become testable for their ability to incorporate tactile signals into future predictions.
- The same model can now be evaluated both as a predictor and as a live environment that improves robot policies through trial-and-error interaction.
- Performance gaps between simulation and physical robots can be quantified directly for each model.
- Progress in embodied world models can be tracked consistently across vision, touch, planning, and real hardware.
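The second point in the list above, using one model both as a predictor and as a live environment, can be illustrated with a minimal sketch. Nothing here is WorldArena 2.0's actual interface: `WorldModelEnv`, `toy_world_model`, the scalar state, and the random search over a policy gain are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from typing import Callable, Tuple


class WorldModelEnv:
    """Wraps a one-step predictor so a policy can interact with it like an environment.

    Hypothetical interface for illustration only.
    """

    def __init__(self, predict: Callable[[float, float], float],
                 goal: float, horizon: int = 20) -> None:
        self.predict = predict
        self.goal = goal
        self.horizon = horizon
        self.state = 0.0
        self.t = 0

    def reset(self) -> float:
        self.state, self.t = 0.0, 0
        return self.state

    def step(self, action: float) -> Tuple[float, float, bool]:
        # The next state comes from the model's prediction, not from a simulator.
        self.state = self.predict(self.state, action)
        self.t += 1
        reward = -abs(self.goal - self.state)  # closer to the goal is better
        done = self.t >= self.horizon
        return self.state, reward, done


def toy_world_model(state: float, action: float) -> float:
    """Assumed stand-in for learned dynamics: the clipped action shifts the state."""
    return state + max(-1.0, min(1.0, action))


def rollout_return(env: WorldModelEnv, gain: float) -> float:
    """Run the policy action = gain * (goal - state) and sum the rewards."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(gain * (env.goal - state))
        total += reward
    return total


def improve_policy(env: WorldModelEnv, trials: int = 50,
                   seed: int = 0) -> Tuple[float, float]:
    """Trial-and-error search over the gain (plain random search, not a specific RL algorithm)."""
    rng = random.Random(seed)
    best_gain = 0.0
    best_return = rollout_return(env, best_gain)
    for _ in range(trials):
        candidate = rng.uniform(0.0, 1.0)
        candidate_return = rollout_return(env, candidate)
        if candidate_return > best_return:
            best_gain, best_return = candidate, candidate_return
    return best_gain, best_return


env = WorldModelEnv(toy_world_model, goal=5.0)
initial_return = rollout_return(env, 0.0)      # untrained policy: never moves
best_gain, best_return = improve_policy(env)   # policy improved inside the model
```

The same `predict` function serves both roles: called once it is a predictor, and called repeatedly through `step` it becomes the environment a policy is optimized against. Any return measured this way is an in-model quantity and inherits the model's errors.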
Where Pith is reading between the lines
- A shared benchmark of this form could reduce duplication of effort when different research groups test new world-model architectures.
- Extending the protocol later to include audio or proprioception would follow the same logic already used for touch.
- Models that rank high here may still require separate checks for long-horizon safety before deployment on physical systems.
Load-bearing premise
The chosen additions for modality, functionality, and platform, together with the fixed protocol, are enough to judge increasingly capable world models without adding new evaluation biases or missing important gaps.
What would settle it
If top-scoring models on WorldArena 2.0 show no corresponding gains in real-world task success rates outside the benchmark suite, the claim that it provides a sufficient testbed would be weakened.
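One way to make this test concrete is to check whether benchmark rankings track real-world rankings, for example with a Spearman rank correlation. The sketch below is an assumption about how such a check might look, not a procedure from the paper; the function names and all numbers are hypothetical placeholders.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder values, one entry per model (not results from the paper).
benchmark_scores = [0.62, 0.71, 0.55, 0.80, 0.68]
real_world_success = [0.30, 0.45, 0.25, 0.40, 0.35]

rho = spearman(benchmark_scores, real_world_success)
```

A correlation near zero on tasks outside the benchmark suite would be the kind of evidence that weakens the sufficiency claim. With only a handful of models the estimate is noisy, so it would need to be read alongside the sample size.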
Original abstract
World models have emerged as a central paradigm for embodied intelligence, enabling agents to predict action-conditioned future and reason about environmental dynamics. However, existing embodied world model benchmarks are still largely confined to vision-only prediction, offline embodied applications, and simulator-based evaluation, making them insufficient for assessing increasingly comprehensive world models. In this work, we introduce WorldArena 2.0, an expanded benchmark that systematically broadens embodied world model evaluation along three dimensions: modality, functionality, and platform. Along the modality dimension, WorldArena 2.0 extends evaluation from vision-only to visuotactile modalities, enabling assessment of multimodal perception and prediction. Along the functionality dimension, it extends beyond policy evaluation and planning to assess world models as interactive RL environments for policy optimization. Along the platform dimension, it moves beyond simulator-only evaluation to a diverse suite of simulated and real-world robotic settings across multiple embodiments. Under a standardized protocol, WorldArena 2.0 comprehensively evaluates perceptual quality, interactive utility, and cross-platform performance, providing a comprehensive testbed for tracking progress toward embodied world models. The benchmark is available at: https://world-arena.ai.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces WorldArena 2.0 as an expanded benchmark for embodied world models. It extends evaluation along three axes: modality (adding visuotactile to vision-only), functionality (adding use as interactive RL environments for policy optimization beyond planning and offline evaluation), and platform (adding diverse simulated and real robotic embodiments beyond simulators). A standardized protocol is proposed to assess perceptual quality, interactive utility, and cross-platform performance, with the benchmark released at https://world-arena.ai.
Significance. If the extensions are accompanied by concrete validation data and closed-loop transfer results, the benchmark could serve as a useful standardized testbed for tracking progress on more comprehensive embodied world models, filling gaps left by existing vision-only, offline, and simulator-centric suites.
major comments (2)
- [Functionality dimension] Functionality dimension (as described in the abstract and § on extensions): the claim that world models can be assessed as interactive RL environments requires closed-loop policy transfer experiments. Policies optimized inside the world model must be transferred to the target simulated or real platforms and compared against direct baselines; without such results, in-model success rates risk inflation from compounding inaccuracies and do not substantiate the interactive utility dimension.
- [Abstract] Abstract and overall evaluation claims: the manuscript describes the intended extensions and standardized protocol but supplies no concrete results, validation data, error analysis, or quantitative tables. This absence prevents verification that the chosen modality, functionality, and platform extensions actually deliver comprehensive coverage without new biases or gaps.
minor comments (1)
- [Benchmark release] The benchmark availability statement could include explicit details on protocol documentation, dataset access, and reproducibility instructions beyond the URL.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript introducing WorldArena 2.0. We address the major comments point by point below, agreeing where revisions are needed to strengthen the validation of the benchmark extensions.
- Referee: [Functionality dimension] Functionality dimension (as described in the abstract and § on extensions): the claim that world models can be assessed as interactive RL environments requires closed-loop policy transfer experiments. Policies optimized inside the world model must be transferred to the target simulated or real platforms and compared against direct baselines; without such results, in-model success rates risk inflation from compounding inaccuracies and do not substantiate the interactive utility dimension.
  Authors: We agree that demonstrating closed-loop policy transfer is important for fully substantiating the interactive utility dimension. The manuscript currently focuses on establishing the standardized protocol for using world models as RL environments and includes preliminary in-model optimization results along with some cross-platform consistency checks. However, we acknowledge that more extensive transfer experiments comparing policies trained in the world model against direct baselines on both simulated and real platforms would provide stronger evidence against compounding errors. We will add these closed-loop transfer results and analyses in the revised manuscript.
  Revision: yes
- Referee: [Abstract] Abstract and overall evaluation claims: the manuscript describes the intended extensions and standardized protocol but supplies no concrete results, validation data, error analysis, or quantitative tables. This absence prevents verification that the chosen modality, functionality, and platform extensions actually deliver comprehensive coverage without new biases or gaps.
  Authors: We appreciate this point. The current manuscript emphasizes the design of the three-dimensional extensions and the unified evaluation protocol, supported by illustrative examples rather than exhaustive quantitative benchmarks. We recognize that including concrete validation data, error analyses, and quantitative tables would better demonstrate the coverage and lack of introduced biases. We will expand the evaluation sections with additional results, tables, and analyses in the revision to address this.
  Revision: yes
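The closed-loop transfer comparison discussed in the first exchange can be summarized with two differences in success rate: how much the in-model number overstates performance on the target platform, and how much the transferred policy gains over a baseline evaluated directly on that platform. The sketch below is one hedged way to compute them. The `Outcome` class, the `gap` helper, and all counts are hypothetical, and the standard error assumes independent binomial trials.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Tuple


@dataclass(frozen=True)
class Outcome:
    """Success count over a number of evaluation episodes (hypothetical container)."""
    successes: int
    trials: int

    def __post_init__(self) -> None:
        if self.trials <= 0 or not 0 <= self.successes <= self.trials:
            raise ValueError("need trials > 0 and 0 <= successes <= trials")

    @property
    def rate(self) -> float:
        return self.successes / self.trials

    @property
    def stderr(self) -> float:
        # Binomial standard error of the observed success rate.
        p = self.rate
        return sqrt(p * (1.0 - p) / self.trials)


def gap(a: Outcome, b: Outcome) -> Tuple[float, float]:
    """Return (a.rate - b.rate, standard error), assuming independent trials."""
    difference = a.rate - b.rate
    standard_error = sqrt(a.stderr ** 2 + b.stderr ** 2)
    return difference, standard_error


# Hypothetical placeholder counts, not results from the paper.
in_model = Outcome(successes=45, trials=50)      # policy scored inside the world model
transferred = Outcome(successes=30, trials=50)   # same policy on the target platform
baseline = Outcome(successes=25, trials=50)      # baseline policy on the target platform

# Positive inflation means the in-model score overstates target-platform success.
inflation, inflation_se = gap(in_model, transferred)
# Positive benefit means in-model optimization helped relative to the baseline.
benefit, benefit_se = gap(transferred, baseline)
```

Reporting both quantities separates two questions the referee raises: whether in-model success is inflated by compounding inaccuracies, and whether optimization inside the world model yields any real improvement once the policy leaves it.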
Circularity Check
No circularity: benchmark definition is self-contained with no derived predictions or load-bearing self-citations
The paper introduces WorldArena 2.0 as an expanded benchmark suite that broadens evaluation along modality, functionality, and platform dimensions under a standardized protocol. No equations, fitted parameters, or predictions are described that could reduce to the paper's own inputs by construction. The central claims concern the creation and application of this independent evaluation testbed rather than deriving results from prior self-citations or ansatzes. The functionality extension to interactive RL environments is presented as a direct assessment protocol without any reduction to fitted quantities or uniqueness theorems imported from the authors' prior work. This is the most common honest finding for benchmark papers that define new evaluation protocols without claiming to derive quantitative results from their own definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A standardized protocol across modalities and platforms can produce comparable and meaningful assessments of world model quality.
Appendix A: Platform Introduction
RoboTwin 2.0 is a scalable bimanual simulation environment comprising 731 objects ac...