{"total":13,"items":[{"citing_arxiv_id":"2607.02092","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Guided Action Flow: Q-Guided Inference for Flow-Matching Vision-Language-Action Policies","primary_cat":"cs.RO","submitted_at":"2026-07-02T12:30:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Guided Action Flow applies a rollout-trained critic to steer frozen flow-matching VLA policies at inference time via action gradients, reporting success rate gains on LIBERO manipulation tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02735","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs","primary_cat":"cs.RO","submitted_at":"2026-06-01T18:02:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"S2 improves generalization in vision-language-action models by using goal-preserving refined language guidance and explicit visual evidence budgets, raising mean subtask success from 54.2% to 79.0% on eight real-robot tasks compared to pi0.5.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29864","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM-Guided Future Hypotheses for Horizon-Aware Exploration in Multi-Step Robot Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-28T12:49:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FEC conditions policies on LLM-guided short-horizon future videos via a three-stage pipeline, yielding performance gains for BC+RL over no-future baselines on RoboCasa and CALVIN while 
mismatched futures degrade results.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17077","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning","primary_cat":"cs.RO","submitted_at":"2026-05-16T16:52:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DeMiAn re-annotates robot and egocentric videos with VLM-generated dense labels across motion, scene, pose, and reasoning aspects, then uses a learned instructor to boost policy success by 5 points on RoboCasa over task-only baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15536","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkiP: When to Skip and When to Refine for Efficient Robot Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-15T02:16:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SkiP introduces action relabeling and Motion Spectrum Keying to skip redundant steps in robot trajectories, cutting executed steps by 15-40% while maintaining success rates across 72 simulated and 3 real tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13428","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SID: Sliding into Distribution for Robust Few-Demonstration Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-13T12:22:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SID achieves approximately 90% success on six real-world manipulation tasks with only two 
demonstrations under out-of-distribution initializations, with less than 10% performance drop under distractors and disturbances.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"Perceiver-actor: A multi-task transformer for robotic manipulation. In Conference on Robot Learning, pages 785-799. PMLR, 2023. [38] Andreas Sochopoulos, Nikolay Malkin, Nikolaos Tsagkas, João Moura, Michael Gienger, and Sethu Vijayakumar. Fast flow-based visuomotor policies via conditional optimal transport couplings, 2025. URL https://arxiv.org/abs/2505.01179. [39] Zhian Su, Yicheng Ma, Haotian Guo, and Huixu Dong. Construction of bin-picking system for logistic application: A hybrid robotic gripper and vision-based grasp planning. IEEE Robotics and Automation Letters, 10(8): 8300-8307, 2025. doi: 10.1109/LRA.2025.3585393. [40] Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh, and Jeannette Bohg. Kite: Keypoint-conditioned"},{"citing_arxiv_id":"2605.12090","ref_index":261,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Action Models: The Next Frontier in Embodied AI","primary_cat":"cs.RO","submitted_at":"2026-05-12T13:10:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09989","ref_index":4,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"StereoPolicy: Improving Robotic Manipulation Policies via Stereo 
Perception","primary_cat":"cs.RO","submitted_at":"2026-05-11T05:06:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"StereoPolicy fuses left-right image features via cross-attention to deliver consistent gains over RGB, RGB-D, point cloud, and multi-view baselines in simulation and real-robot manipulation tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Erez, S. Cabi, S. Tunyasuvunakool, J. Kramár, R. Hadsell, N. de Freitas, and N. Heess. Reinforcement and imitation learning for diverse visuomotor skills, 2018. URL https://arxiv.org/abs/1802.09564. [3] M. Shridhar, L. Manuelli, and D. Fox. Cliport: What and where pathways for robotic manipulation. arXiv preprint arXiv: Arxiv-2109.12098, 2021. [4] S. Reed, K. Zolna, E. Parisotto, S. G. Colmenarejo, A. Novikov, G. Barth-Maron, M. Gimenez, Y. Sulsky, J. Kay, J. T. Springenberg, T. Eccles, J. Bruce, A. Razavi, A. Edwards, N. Heess, Y. Chen, R. Hadsell, O. Vinyals, M. Bordbar, and N. de Freitas. A generalist agent. arXiv preprint arXiv: Arxiv-2205.06175, 2022. [5] Y. Jiang, A. Gupta, Z. 
Zhang, G."},{"citing_arxiv_id":"2605.06481","ref_index":67,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-07T16:06:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[65] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. SAM 2: Segment Anything in images and videos. arXiv preprint arXiv:2408.00714, 2024. [66] Hao Shi, Bin Xie, Yingfei Liu, Lin Sun, Fengrong Liu, et al. MemoryVLA: Perceptual-cognitive memory in vision-language-action models for robotic manipulation. arXiv preprint arXiv:2508.19236, 2025. [67] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. CLIPort: What and where pathways for robotic manipulation. In Conference on Robot Learning (CoRL), 2021. arXiv:2109.12098. [68] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Perceiver-Actor: A multi-task transformer for robotic manipulation. In Conference on Robot Learning (CoRL), 2022. 
arXiv:2209.05451."},{"citing_arxiv_id":"2511.02239","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LACY: A Vision-Language Model-based Language-Action Cycle for Self-Improving Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2025-11-04T04:02:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LACY is a VLM framework jointly trained on L2A, A2L, and L2C tasks that uses an active augmentation cycle to self-improve robotic manipulation policies, reporting a 56.46% average success rate gain in simulation and real-world experiments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.02117","ref_index":82,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation","primary_cat":"cs.RO","submitted_at":"2024-01-04T07:55:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A low-cost whole-body teleoperation system enables effective imitation learning for complex bimanual mobile manipulation by co-training on mobile and static demonstration datasets.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Karen Yang, and Jeannette Bohg. Concept2robot: Learning manipulation concepts from instructions and human demonstrations. The International Journal of Robotics Research, 40(12-14):1419-1434, 2021. 3 [81] Lucy Xiaoyang Shi, Archit Sharma, Tony Z Zhao, and Chelsea Finn. Waypoint-based imitation learning for robotic manipulation. CoRL, 2023. 2, 3, 5 [82] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Cliport: What and where pathways for robotic manipulation. ArXiv, abs/2109.12098, 2021. 3 [83] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. 
Perceiver-actor: A multi-task transformer for robotic manipulation. ArXiv, abs/2209.05451, 2022. 3 [84] Laura Smith, Nikita Dhawan, Marvin Zhang, Pieter Abbeel, and Sergey Levine."},{"citing_arxiv_id":"2305.16291","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","primary_cat":"cs.AI","submitted_at":"2023-05-25T17:46:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more unique items and 15.3x faster milestone unlocks than prior methods while generalizing skills","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 17021-17036, 2021. [19] Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, and Linxi (Jim) Fan. Vima: General robot manipulation with multimodal prompts. ARXIV.ORG, 2022. 12 [20] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Cliport: What and where pathways for robotic manipulation. arXiv preprint arXiv: Arxiv-2109.12098, 2021. [21] Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, and Animashree Anandkumar. SECANT: self-expert cloning for zero-shot generalization of visual policies. 
In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on"},{"citing_arxiv_id":"2304.13705","ref_index":52,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware","primary_cat":"cs.RO","submitted_at":"2023-04-23T19:10:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Low-cost imprecise robots achieve 80-90% success on six fine bimanual manipulation tasks using imitation learning with a new Action Chunking with Transformers algorithm trained on only 10 minutes of demonstrations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[50] Kaushik Shivakumar, Vainavi Viswanath, Anrui Gu, Yahav Avigal, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, and Ken Goldberg. Sgtm 2.0: Autonomously untangling long cables using interactive perception. ArXiv, abs/2209.13706, 2022. [51] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Cliport: What and where pathways for robotic manipulation. ArXiv, abs/2109.12098, 2021. [52] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Perceiver-actor: A multi-task transformer for robotic manipulation. ArXiv, abs/2209.05451, 2022. [53] Aravind Sivakumar, Kenneth Shaw, and Deepak Pathak. Robotic telekinesis: Learning a robotic hand imitator by watching humans on youtube. RSS, 2022. [54] Christian Smith, Yiannis Karayiannidis, Lazaros Nal-"}],"limit":50,"offset":0}