{"total":12,"items":[{"citing_arxiv_id":"2605.30639","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PInVerify: An Offline Embodied Benchmark for Active Instance Verification","primary_cat":"cs.CV","submitted_at":"2026-05-28T22:42:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PInVerify is a new offline embodied benchmark for active instance verification that supplies multi-view captures and 6-sector navigation topology, with MLLM baselines reaching 85.6% after fine-tuning but showing no reliable benefit from tested next-best-view strategies.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09423","ref_index":45,"ref_count":4,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning","primary_cat":"cs.AI","submitted_at":"2026-05-10T08:51:50+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.","context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"Behavior-1k: A human-centered, embodied ai benchmark with 1,000 everyday activities and realistic simulation, 2024. URL https://arxiv.org/abs/2403.09227. [44] William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, and Yecheng Jason Ma. Eurekaverse: Environment curriculum generation via large language models. arXiv preprint arXiv:2411.01775, 2024. URL https://arxiv.org/abs/2411.01775. 
[45] Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, et al. Agentbench: Evaluating llms as agents. arXiv preprint arXiv:2308.03688, 2023. URL https://arxiv.org/abs/2308.03688. [46] Xiaokang Liu, Zechen Bai, Hai Ci, Kevin Yuchen Ma, and Mike Zheng Shou. World-vla-loop: Closed-loop learning of video world model and vla policy."},{"citing_arxiv_id":"2604.22409","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SpaMEM: Benchmarking Dynamic Spatial Reasoning via Perception-Memory Integration in Embodied Environments","primary_cat":"cs.CV","submitted_at":"2026-04-24T10:06:41+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[14] Lei, J., et al.: Tvqa: Localized, compositional video question answering. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (2018), https://arxiv.org/abs/1809.01696 [15] Li, B., et al.: Llava-onevision: Easy visual task transfer. arXiv preprint arXiv:2408.03326 (2024), https://arxiv.org/abs/2408.03326 [16] Li, C., et al.: igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. arXiv preprint arXiv:2108.03272 (2021), https://arxiv.org/abs/2108.03272 [17] Li, Y., Cao, Z., Liang, A., Liang, B., Chen, L., Zhao, H., Feng, C.: Egocentric prediction of action target in 3d. 
arXiv preprint arXiv:2203.13116 (2022), https://arxiv.org/abs/2203."},{"citing_arxiv_id":"2604.22363","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios","primary_cat":"cs.RO","submitted_at":"2026-04-24T08:53:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LeHome is a simulation platform offering high-fidelity dynamics for robotic manipulation of varied deformable objects in household settings, with support for multiple robot embodiments including low-cost hardware.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15805","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Seeing to Simulating: Generative High-Fidelity Simulation with Digital Cousins for Generalizable Robot Learning and Evaluation","primary_cat":"cs.RO","submitted_at":"2026-04-17T08:06:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Digital Cousins is a generative real-to-sim method that creates diverse high-fidelity simulation scenes from real panoramas to improve generalization in robot learning and evaluation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Xu, Zixiang Zhou, Zunnan Xu, Yangyu Tao, Qinglin Lu, Songtao Liu, Dax Zhou, Hongfa Wang, Yong Yang, Di Wang, Yuhong Liu, Jie Jiang, and Caesar Zhong. Hunyuanvideo: A systematic framework for large video generative models, 2025. URL https://arxiv.org/abs/2412.03603. [26] World Labs. Marble, 2025. URL https://marble.worldlabs.ai. Accessed: 2026-01-25. 
[27] Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, et al. igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. arXiv preprint arXiv:2108.03272, 2021. [28] Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen"},{"citing_arxiv_id":"2604.12626","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting","primary_cat":"cs.RO","submitted_at":"2026-04-14T11:52:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Embodied AI research relies on simulation environments for large-scale parallel training and subsequent Sim-to-Real policy transfer [23,30]. We compare existing platforms along four dimensions, namely render asset type, humanoid avatar support, platform openness, and hardware requirements, in Tab. 1. The predominant embodied AI simulators, including Habitat-Sim [21,23,27], iGibson [11,24], AI2-THOR [9], ThreeDWorld [5], and SAPIEN [32], all employ mesh-based rasterization for scene rendering. 
Regarding humanoid avatars, while some platforms now provide some form of mesh-based representation, such as URDF-driven articulated bodies [9,32], rigid-body tracks [11], and Unity Replicants [5], these uniformly lack the visual fidelity needed for training agents to"},{"citing_arxiv_id":"2602.08392","ref_index":62,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs","primary_cat":"cs.RO","submitted_at":"2026-02-09T08:47:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Ai2-thor: An interactive 3d environment for visual ai. arXiv preprint arXiv:1712.05474, 2017. [61] Zhiqian Lan, Yuxuan Jiang, Ruiqi Wang, Xuanbing Xie, Rongkui Zhang, Yicheng Zhu, Peihang Li, Tianshuo Yang, Tianxing Chen, Haoyu Gao, et al. Autobio: A simulation and benchmark for robotic automation in digital biology laboratory. arXiv preprint arXiv:2505.14030, 2025. [62] Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, et al. igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. arXiv preprint arXiv:2108.03272, 2021. 
[63] Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang,"},{"citing_arxiv_id":"2512.14671","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ART: Articulated Reconstruction Transformer","primary_cat":"cs.CV","submitted_at":"2025-12-16T18:35:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ART is a category-agnostic transformer that maps sparse multi-state RGB images to per-part 3D geometry, texture, and articulation parameters via learnable part slots.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.20349","ref_index":14,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior","primary_cat":"q-bio.NC","submitted_at":"2025-02-27T18:20:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Advocates integrating naturalistic paradigms and AI progress into cognitive science to develop generalizable models of natural behavior while retaining experimental control and theoretical insight.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.02523","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots","primary_cat":"cs.RO","submitted_at":"2024-06-04T17:41:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation 
learning for generalist robots.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[24] Sergey Levine, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen. Learning hand-eye coordination for robotic grasping with large-scale data collection. In ISER, pages 173-184, 2016. [25] Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, 2020. [26] Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, et al. igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. arXiv preprint arXiv:2108.03272, 2021. [27] Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen"},{"citing_arxiv_id":"2401.03568","ref_index":58,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agent AI: Surveying the Horizons of Multimodal Interaction","primary_cat":"cs.AI","submitted_at":"2024-01-07T19:11:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2310.06114","ref_index":157,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Interactive Real-World Simulators","primary_cat":"cs.AI","submitted_at":"2023-10-09T19:42:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"UniSim learns a universal real-world simulator from orchestrated diverse datasets, 
enabling zero-shot deployment of policies trained purely in simulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}