{"paper":{"title":"A Pragmatic VLA Foundation Model","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A vision-language-action model trained on 20,000 hours of real-world dual-arm data outperforms competitors in generalization across tasks and platforms.","cross_cats":["cs.CV"],"primary_cat":"cs.RO","authors_text":"Fangjing Wang, Fan Lu, He Sun, Houlong Xiong, Hui Yu, Jingmei Zhao, Kecheng Zheng, Kejia Zhang, Qian Zhu, Ran Cheng, Shi Liu, Shuailei Ma, Shuai Yang, Shuai Zhou, Wei Wu, Xing Zhu, Yiyu Ren, Yong-Lu Li, Yongtao Huang, Yong Wang, Yujun Shen, Yunnan Wang, Zechen Wang, Zhenqi Qiu, Ziyu Wang","submitted_at":"2026-01-26T17:08:04Z","abstract_excerpt":"Offering great potential in robotic manipulation, a capable Vision-Language-Action (VLA) foundation model is expected to faithfully generalize across tasks and platforms while ensuring cost efficiency (e.g., data and GPU hours required for adaptation). To this end, we develop LingBot-VLA with around 20,000 hours of real-world data from 9 popular dual-arm robot configurations. Through a systematic assessment on 3 robotic platforms, each completing 100 tasks with 130 post-training episodes per task, our model achieves clear superiority over competitors, showcasing its strong performance and broa"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"our model achieves clear superiority over competitors, showcasing its strong performance and broad generalizability","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That evaluation on three platforms with 100 tasks and 130 post-training episodes each is sufficient to establish broad generalizability across tasks and platforms","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LingBot-VLA is a VLA foundation model trained on massive real robot data that shows superior generalization across tasks and platforms with fast training throughput.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A vision-language-action model trained on 20,000 hours of real-world dual-arm data outperforms competitors in generalization across tasks and platforms.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"cca3efe139414ea7cc5102f9b38e6508a83e6c4e474db4d5b2d22ca43ecce2b5"},"source":{"id":"2601.18692","kind":"arxiv","version":2},"verdict":{"id":"f3d9c974-5e4e-4ae9-830e-4c582176d5a3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T21:14:31.671667Z","strongest_claim":"our model achieves clear superiority over competitors, showcasing its strong performance and broad generalizability","one_line_summary":"LingBot-VLA is a VLA foundation model trained on massive real robot data that shows superior generalization across tasks and platforms with fast training throughput.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That evaluation on three platforms with 100 tasks and 130 post-training episodes each is sufficient to establish broad generalizability across tasks and platforms","pith_extraction_headline":"A vision-language-action model trained on 20,000 hours of real-world dual-arm data outperforms competitors in generalization across tasks and platforms."},"references":{"count":33,"sample":[{"doi":"","year":2025,"title":"RoboArena: Distributed real-world evaluation of generalist robot policies","work_id":"a02af411-4d93-4ac8-a15c-930c8f021765","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Qwen2.5-VL Technical Report","work_id":"69dffacb-bfe8-442d-be86-48624c60426f","ref_index":2,"cited_arxiv_id":"2502.13923","is_internal_anchor":true},{"doi":"","year":2024,"title":"PaliGemma: A versatile 3B VLM for transfer","work_id":"df6f48b3-5792-47c7-9614-cb856ea31ad9","ref_index":3,"cited_arxiv_id":"2407.07726","is_internal_anchor":true},{"doi":"","year":2025,"title":"GR00T N1: An Open Foundation Model for Generalist Humanoid Robots","work_id":"e2db69c7-ee8a-4cb7-a761-7b8de1dfcf97","ref_index":4,"cited_arxiv_id":"2503.14734","is_internal_anchor":true},{"doi":"","year":2025,"title":"Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Robert Equi, Chelsea Finn, Niccolo Fusai, Manuel Y . Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, br","work_id":"26cc9d6b-1484-44e3-9d38-ae7168bb2fd8","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":33,"snapshot_sha256":"aeb36ded5a7eaf9444080b133991ecd37d202c935ba17a89836f8c3a02a81b17","internal_anchors":13},"formal_canon":{"evidence_count":1,"snapshot_sha256":"02380f065aac530a84e8fcf9664b997d4135763c7bbfa427a638039e5e0fdb27"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}