{"paper":{"title":"CUBic: Coordinated Unified Bimanual Perception and Control Framework","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"CUBic unifies bimanual robot perception and control in a shared tokenized representation where independence and coordination arise from structure alone.","cross_cats":["cs.AI"],"primary_cat":"cs.RO","authors_text":"Donglin Wang, Jingkai Xu, Pengxiang Ding, Xingyu Wang, Zhaoxin Fan","submitted_at":"2026-05-13T12:48:23Z","abstract_excerpt":"Recent advances in visuomotor policy learning have enabled robots to perform control directly from visual inputs. Yet, extending such end-to-end learning from single-arm to bimanual manipulation remains challenging due to the need for both independent perception and coordinated interaction between arms. Existing methods typically favor one side -- either decoupling the two arms to avoid interference or enforcing strong cross-arm coupling for coordination -- thus lacking a unified treatment. 
We propose CUBic, a Coordinated and Unified framework for Bimanual perception and control that reformula"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"CUBic consistently surpasses standard baselines, achieving marked improvements in coordination accuracy and task success rates over state-of-the-art visuomotor baselines on the RoboTwin benchmark.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a shared tokenized representation learned through unidirectional aggregation and bidirectional codebook coordination will allow independence and coordination to emerge intrinsically from structure without requiring hand-crafted coupling mechanisms.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"CUBic learns a shared tokenized representation for bimanual robot perception and control via unidirectional aggregation, bidirectional codebook coordination, and a unified diffusion policy, yielding higher coordination accuracy and task success on the RoboTwin benchmark.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"CUBic unifies bimanual robot perception and control in a shared tokenized representation where independence and coordination arise from structure alone.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"97bd63d8099f3b89557d20320aa34717110cc3dfec70d43867e88c724d722a7a"},"source":{"id":"2605.13452","kind":"arxiv","version":1},"verdict":{"id":"40c116ff-390a-4b3e-bf48-9eee8dae2e49","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:02:28.770787Z","strongest_claim":"CUBic consistently surpasses standard baselines, achieving marked improvements in coordination accuracy and task 
success rates over state-of-the-art visuomotor baselines on the RoboTwin benchmark.","one_line_summary":"CUBic learns a shared tokenized representation for bimanual robot perception and control via unidirectional aggregation, bidirectional codebook coordination, and a unified diffusion policy, yielding higher coordination accuracy and task success on the RoboTwin benchmark.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a shared tokenized representation learned through unidirectional aggregation and bidirectional codebook coordination will allow independence and coordination to emerge intrinsically from structure without requiring hand-crafted coupling mechanisms.","pith_extraction_headline":"CUBic unifies bimanual robot perception and control in a shared tokenized representation where independence and coordination arise from structure alone."},"references":{"count":62,"sample":[{"doi":"","year":2025,"title":"arXiv preprint arXiv:2507.23523 (2025) 3, 10","work_id":"eed69612-81a9-49c8-811f-3696bfe4037c","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adria","work_id":"c1fbcc2a-00a8-4d20-8fab-39cb94c8f25f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"UniVLA: Learning to Act Anywhere with Task-centric Latent Actions","work_id":"e05d654d-db73-48f6-9318-381b6798bac9","ref_index":3,"cited_arxiv_id":"2505.06111","is_internal_anchor":true},{"doi":"","year":2025,"title":"villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models","work_id":"e4e687d7-64db-4ff4-8ddf-752b58365e0f","ref_index":4,"cited_arxiv_id":"2507.23682","is_internal_anchor":true},
{"doi":"","year":2025,"title":"Moto: Latent motion token as the bridging language for learning robot manipulation from videos","work_id":"9b9fe8b3-ad01-432e-9cde-9c90b61cca05","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":62,"snapshot_sha256":"b61907f8be9875c792495ff3ed44a2928f42aac70981e731088d469fc4442410","internal_anchors":7},"formal_canon":{"evidence_count":2,"snapshot_sha256":"6d85e797dec57aea7d020e57a797010426989a1341c4cf2706f4b58a02fdebec"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}