{"work":{"id":"f9ca0722-8855-48c3-a27a-0eefb7e19253","openalex_id":null,"doi":null,"arxiv_id":"2405.12213","raw_key":null,"title":"Octo: An Open-Source Generalist Robot Policy","authors":null,"authors_text":"Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees","year":2024,"venue":"cs.RO","abstract":"Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-source, widely applicable, generalist policies for robotic manipulation. As a first step, we introduce Octo, a large transformer-based policy trained on 800k trajectories from the Open X-Embodiment dataset, the largest robot manipulation dataset to date. It can be instructed via language commands or goal images and can be effectively finetuned to robot setups with new sensory inputs and action spaces within a few hours on standard consumer GPUs. In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces. We also perform detailed ablations of design decisions for the Octo model, from architecture to training data, to guide future research on building generalist robot models.","external_url":"https://arxiv.org/abs/2405.12213","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T14:13:30.503003+00:00","pith_arxiv_id":"2405.12213","created_at":"2026-05-10T00:59:49.466018+00:00","updated_at":"2026-06-29T14:13:30.503003+00:00","title_quality_ok":true,"display_title":"Octo: An Open-Source Generalist Robot Policy","render_title":"Octo: An Open-Source Generalist Robot Policy"},"hub":{"state":{"work_id":"f9ca0722-8855-48c3-a27a-0eefb7e19253","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":163,"external_cited_by_count":null,"distinct_field_count":5,"first_pith_cited_at":"2024-10-08T16:00:47+00:00","last_pith_cited_at":"2026-06-26T09:13:16+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T14:38:56.470323+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":36},{"context_role":"baseline","n":14},{"context_role":"dataset","n":2},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"background","n":36},{"context_polarity":"baseline","n":15},{"context_polarity":"use_dataset","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Octo: An Open-Source Generalist Robot Policy","claims":[{"claim_text":"Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-so","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"Linxi Fan, Yu Fang, Dieter Fox, et al. Gr00t n1: An open foundation model for generalist humanoid robots, 2025. URLhttps://arxiv.org/abs/2503.14734. [46] Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. Octo: An open-source generalist robot policy, 2024. URLhttps://arxiv.org/abs/2405.12213. [47] Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"manipvla: Transferring vision-language-action models for general mobile manip- ulation. InProceedings of the Computer Vision and Pattern Recognition Conference. 1714-1723. [30] Zhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che, and Jian Tang. 2024. A survey on robotics with foundation models: toward embodied ai.arXiv preprint arXiv:2402.02385(2024). [31] Jiabing Yang, Yixiang Chen, Yuan Xu, Peiyan Li, Xiangnan Wu, Zichen Wen, Bowen Fang, Tao Yu, Zhengbo Zhang, Yingda Li, et al.","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"-We validate the effectiveness of our method through extensive experiments, including simulations and real-world evaluations, showing consistent im- provements over existing baselines. 4 Li et al. 2 Related Work 2.1 Vision-Language-Action Models Building upon the success of Vision-Language Models (VLM) [1, 42, 57, 87], Vision-Language-Action (VLA) [8,36,61] models have made significant strides in embodied intelligence. Current VLA research predominantly employs autore- gressive tokenization stra","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Vision-Language-Action (VLA) models, which have significantly advanced the field of embodied AI. Early pioneering works, such as RT-1 [5] and RT-2 [47], demonstrated that pre-trained Vision- Language Models (VLMs) could be effectively repurposed to map raw visual observations and natural language instructions directly into low-level robot actions. Following this paradigm, open- source models like OpenVLA [ 21] and Octo [ 37] have become standard baselines, leveraging powerful 2D vision backbones","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"OneWM-VLA reaches 98.1% on average, above π0.5 (96.85%). On the Long suite, OneWM-VLA reaches 95.6%, +10.4 over π0 and +3.2 over π0.5. All 6 Table 2: Main results on LIBERO. Success rates (%) on the four suites. Inference-time configura- tion is detailed in Appendix D. Method Spatial Object Goal Long Avg% Diffusion Policy [14] 78.3 85.7 88.4 68.3 72.4 Octo [41] 78.9 84.6 89.9 79.2 76.5 OpenVLA [27] 84.7 88.2 92.5 78.6 78.1 SpatialVLA [35] 88.2 89.9 92.5 79.2 78.1 CoT-VLA [48] 81.1 87.5 91.6 87.6","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"We report success rate (%) for each category. 4.2 Baselines We compare against a diverse set of recent VLA and robot policy baselines. On LIBERO, we include π0+FAST [33], OpenVLA-OFT [25], π0 [2], FLOWER [35], GR00T-N1.5 [30], and BEAST [50]. On LIBERO-plus, we compare against OpenVLA [ 24], OpenVLA-OFT, π0, π0-fast, Nora [ 20], WorldVLA [8], UniVLA [47], and RIPT-VLA [45]. 5 Table 1: Experimental Results on the LIBERO Benchmark. Success rate (%) is reported for each task suite.Boldindicates the","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Octo: An Open-Source Generalist Robot Policy because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (14 contexts).","role_counts":[{"n":14,"context_role":"background"},{"n":7,"context_role":"baseline"},{"n":1,"context_role":"dataset"},{"n":1,"context_role":"method"}]},"error":null,"updated_at":"2026-05-16T21:29:14.738905+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"1bbf45e7-c542-4e7d-b72a-21e21b352309","orcid":null,"display_name":"Octo Model Team"},{"id":"62be6f33-eaa7-4375-aae5-d024341f6169","orcid":null,"display_name":"Dibya Ghosh"},{"id":"73036e2b-bfb7-482c-a4f8-700c73a93722","orcid":null,"display_name":"Homer Walke"},{"id":"9821de6a-6143-4939-8865-7d9c02424af9","orcid":null,"display_name":"Karl Pertsch"},{"id":"06730c35-b7c3-40f7-9cf7-d777f261f66c","orcid":null,"display_name":"Kevin Black"},{"id":"0acf1c8e-ce89-43c9-b418-59af5cc82d42","orcid":null,"display_name":"Oier Mees"}]},"error":null,"updated_at":"2026-05-16T21:29:15.377765+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T08:28:04.626398+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"OpenVLA: An Open-Source Vision-Language-Action Model","work_id":"3e7e65c5-5aed-4fe9-8414-2092bcb31cc7","shared_citers":55},{"title":"$\\pi_0$: A Vision-Language-Action Flow Model for General Robot Control","work_id":"f790abdc-a796-482f-a40d-f8ee035ecfc2","shared_citers":45},{"title":"RT-1: Robotics Transformer for Real-World Control at Scale","work_id":"e11bda85-8531-46bc-a07f-d0ade3643ab1","shared_citers":33},{"title":"$\\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization","work_id":"d1ad7304-d09a-49bc-809e-846439f6aff9","shared_citers":27},{"title":"Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success","work_id":"04f46bb3-4346-47e8-bf09-c75d91f96e87","shared_citers":25},{"title":"RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control","work_id":"ff438a8a-8003-4fae-9131-acd418b3597b","shared_citers":23},{"title":"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware","work_id":"6fe159e0-fa73-481a-88d4-4719c15140be","shared_citers":22},{"title":"RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation","work_id":"12319725-bc7d-4c32-a229-ad270a7460bc","shared_citers":22},{"title":"GR00T N1: An Open Foundation Model for Generalist Humanoid Robots","work_id":"e2db69c7-ee8a-4cb7-a761-7b8de1dfcf97","shared_citers":21},{"title":"CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation","work_id":"4b158d3e-3dff-4412-85cd-baa879465a5e","shared_citers":17},{"title":"FAST: Efficient Action Tokenization for Vision-Language-Action Models","work_id":"83a8f966-6cfa-4f21-81f3-87440aae238f","shared_citers":17},{"title":"Open X-Embodiment: Robotic Learning Datasets and RT-X Models","work_id":"62f0fb6c-e6ae-4dc4-95a4-d9dd64b240e8","shared_citers":16},{"title":"DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset","work_id":"13253de2-3d89-415c-8c2f-3adb25d4c337","shared_citers":15},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":14},{"title":"SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics","work_id":"0c5e9314-5fa7-4613-ad12-605a71d561d2","shared_citers":13},{"title":"DINOv2: Learning Robust Visual Features without Supervision","work_id":"26b304e5-b54a-4f26-be7e-83299eca52e4","shared_citers":12},{"title":"Do As I Can, Not As I Say: Grounding Language in Robotic Affordances","work_id":"037320f1-b0a9-4cbe-a639-bfb25409ce71","shared_citers":12},{"title":"3D-VLA: A 3D Vision-Language-Action Generative World Model","work_id":"aebf924c-e761-437e-9cee-f1ccc2e427bd","shared_citers":11},{"title":"Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation","work_id":"e92c2c13-4330-45fe-8231-34a6002626bd","shared_citers":11},{"title":"AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems","work_id":"f797e9ec-510f-43a7-8a0c-18009ce332e5","shared_citers":10},{"title":"Evaluating Real-World Robot Manipulation Policies in Simulation","work_id":"7f4ca6cb-1b94-454c-9623-b52441b74b61","shared_citers":10},{"title":"LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning","work_id":"662203ad-084f-42c4-8e60-977b3173755b","shared_citers":10},{"title":"SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model","work_id":"592041b3-3ca2-4836-8dd4-f8095d8a692b","shared_citers":10},{"title":"UniVLA: Learning to Act Anywhere with Task-centric Latent Actions","work_id":"e05d654d-db73-48f6-9318-381b6798bac9","shared_citers":10}],"time_series":[{"n":4,"year":2024},{"n":5,"year":2025},{"n":64,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T08:28:21.150945+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T08:28:17.225507+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Octo: An Open-Source Generalist Robot Policy","claims":[{"claim_text":"Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-so","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"Linxi Fan, Yu Fang, Dieter Fox, et al. Gr00t n1: An open foundation model for generalist humanoid robots, 2025. URLhttps://arxiv.org/abs/2503.14734. [46] Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. Octo: An open-source generalist robot policy, 2024. URLhttps://arxiv.org/abs/2405.12213. [47] Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"manipvla: Transferring vision-language-action models for general mobile manip- ulation. InProceedings of the Computer Vision and Pattern Recognition Conference. 1714-1723. [30] Zhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che, and Jian Tang. 2024. A survey on robotics with foundation models: toward embodied ai.arXiv preprint arXiv:2402.02385(2024). [31] Jiabing Yang, Yixiang Chen, Yuan Xu, Peiyan Li, Xiangnan Wu, Zichen Wen, Bowen Fang, Tao Yu, Zhengbo Zhang, Yingda Li, et al.","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"-We validate the effectiveness of our method through extensive experiments, including simulations and real-world evaluations, showing consistent im- provements over existing baselines. 4 Li et al. 2 Related Work 2.1 Vision-Language-Action Models Building upon the success of Vision-Language Models (VLM) [1, 42, 57, 87], Vision-Language-Action (VLA) [8,36,61] models have made significant strides in embodied intelligence. Current VLA research predominantly employs autore- gressive tokenization stra","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Vision-Language-Action (VLA) models, which have significantly advanced the field of embodied AI. Early pioneering works, such as RT-1 [5] and RT-2 [47], demonstrated that pre-trained Vision- Language Models (VLMs) could be effectively repurposed to map raw visual observations and natural language instructions directly into low-level robot actions. Following this paradigm, open- source models like OpenVLA [ 21] and Octo [ 37] have become standard baselines, leveraging powerful 2D vision backbones","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"OneWM-VLA reaches 98.1% on average, above π0.5 (96.85%). On the Long suite, OneWM-VLA reaches 95.6%, +10.4 over π0 and +3.2 over π0.5. All 6 Table 2: Main results on LIBERO. Success rates (%) on the four suites. Inference-time configura- tion is detailed in Appendix D. Method Spatial Object Goal Long Avg% Diffusion Policy [14] 78.3 85.7 88.4 68.3 72.4 Octo [41] 78.9 84.6 89.9 79.2 76.5 OpenVLA [27] 84.7 88.2 92.5 78.6 78.1 SpatialVLA [35] 88.2 89.9 92.5 79.2 78.1 CoT-VLA [48] 81.1 87.5 91.6 87.6","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"We report success rate (%) for each category. 4.2 Baselines We compare against a diverse set of recent VLA and robot policy baselines. On LIBERO, we include π0+FAST [33], OpenVLA-OFT [25], π0 [2], FLOWER [35], GR00T-N1.5 [30], and BEAST [50]. On LIBERO-plus, we compare against OpenVLA [ 24], OpenVLA-OFT, π0, π0-fast, Nora [ 20], WorldVLA [8], UniVLA [47], and RIPT-VLA [45]. 5 Table 1: Experimental Results on the LIBERO Benchmark. Success rate (%) is reported for each task suite.Boldindicates the","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Octo: An Open-Source Generalist Robot Policy because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (14 contexts).","role_counts":[{"n":14,"context_role":"background"},{"n":7,"context_role":"baseline"},{"n":1,"context_role":"dataset"},{"n":1,"context_role":"method"}]},"error":null,"updated_at":"2026-05-16T21:29:14.404162+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Octo: An Open-Source Generalist Robot Policy","claims":[{"claim_text":"Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-so","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Octo: An Open-Source Generalist Robot Policy because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T08:28:13.386278+00:00"}},"summary":{"title":"Octo: An Open-Source Generalist Robot Policy","claims":[{"claim_text":"Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-so","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Octo: An Open-Source Generalist Robot Policy because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"OpenVLA: An Open-Source Vision-Language-Action Model","work_id":"3e7e65c5-5aed-4fe9-8414-2092bcb31cc7","shared_citers":55},{"title":"$\\pi_0$: A Vision-Language-Action Flow Model for General Robot Control","work_id":"f790abdc-a796-482f-a40d-f8ee035ecfc2","shared_citers":45},{"title":"RT-1: Robotics Transformer for Real-World Control at Scale","work_id":"e11bda85-8531-46bc-a07f-d0ade3643ab1","shared_citers":33},{"title":"$\\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization","work_id":"d1ad7304-d09a-49bc-809e-846439f6aff9","shared_citers":27},{"title":"Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success","work_id":"04f46bb3-4346-47e8-bf09-c75d91f96e87","shared_citers":25},{"title":"RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control","work_id":"ff438a8a-8003-4fae-9131-acd418b3597b","shared_citers":23},{"title":"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware","work_id":"6fe159e0-fa73-481a-88d4-4719c15140be","shared_citers":22},{"title":"RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation","work_id":"12319725-bc7d-4c32-a229-ad270a7460bc","shared_citers":22},{"title":"GR00T N1: An Open Foundation Model for Generalist Humanoid Robots","work_id":"e2db69c7-ee8a-4cb7-a761-7b8de1dfcf97","shared_citers":21},{"title":"CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation","work_id":"4b158d3e-3dff-4412-85cd-baa879465a5e","shared_citers":17},{"title":"FAST: Efficient Action Tokenization for Vision-Language-Action Models","work_id":"83a8f966-6cfa-4f21-81f3-87440aae238f","shared_citers":17},{"title":"Open X-Embodiment: Robotic Learning Datasets and RT-X Models","work_id":"62f0fb6c-e6ae-4dc4-95a4-d9dd64b240e8","shared_citers":16},{"title":"DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset","work_id":"13253de2-3d89-415c-8c2f-3adb25d4c337","shared_citers":15},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":14},{"title":"SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics","work_id":"0c5e9314-5fa7-4613-ad12-605a71d561d2","shared_citers":13},{"title":"DINOv2: Learning Robust Visual Features without Supervision","work_id":"26b304e5-b54a-4f26-be7e-83299eca52e4","shared_citers":12},{"title":"Do As I Can, Not As I Say: Grounding Language in Robotic Affordances","work_id":"037320f1-b0a9-4cbe-a639-bfb25409ce71","shared_citers":12},{"title":"3D-VLA: A 3D Vision-Language-Action Generative World Model","work_id":"aebf924c-e761-437e-9cee-f1ccc2e427bd","shared_citers":11},{"title":"Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation","work_id":"e92c2c13-4330-45fe-8231-34a6002626bd","shared_citers":11},{"title":"AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems","work_id":"f797e9ec-510f-43a7-8a0c-18009ce332e5","shared_citers":10},{"title":"Evaluating Real-World Robot Manipulation Policies in Simulation","work_id":"7f4ca6cb-1b94-454c-9623-b52441b74b61","shared_citers":10},{"title":"LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning","work_id":"662203ad-084f-42c4-8e60-977b3173755b","shared_citers":10},{"title":"SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model","work_id":"592041b3-3ca2-4836-8dd4-f8095d8a692b","shared_citers":10},{"title":"UniVLA: Learning to Act Anywhere with Task-centric Latent Actions","work_id":"e05d654d-db73-48f6-9318-381b6798bac9","shared_citers":10}],"time_series":[{"n":4,"year":2024},{"n":5,"year":2025},{"n":64,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"62be6f33-eaa7-4375-aae5-d024341f6169","orcid":null,"display_name":"Dibya Ghosh","source":"manual","import_confidence":0.72},{"id":"73036e2b-bfb7-482c-a4f8-700c73a93722","orcid":null,"display_name":"Homer Walke","source":"manual","import_confidence":0.72},{"id":"9821de6a-6143-4939-8865-7d9c02424af9","orcid":null,"display_name":"Karl Pertsch","source":"manual","import_confidence":0.72},{"id":"06730c35-b7c3-40f7-9cf7-d777f261f66c","orcid":null,"display_name":"Kevin Black","source":"manual","import_confidence":0.72},{"id":"1bbf45e7-c542-4e7d-b72a-21e21b352309","orcid":null,"display_name":"Octo Model Team","source":"manual","import_confidence":0.72},{"id":"0acf1c8e-ce89-43c9-b418-59af5cc82d42","orcid":null,"display_name":"Oier Mees","source":"manual","import_confidence":0.72}]}}