{"paper":{"title":"AttenA+: Rectifying Action Inequality in Robotic Foundation Models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Reweighting robotic action losses by inverse velocity improves foundation model performance on manipulation tasks","cross_cats":["cs.AI"],"primary_cat":"cs.RO","authors_text":"Andrew F. Luo, Boyu Zhou, Daojie Peng, Fulong Ma, Jiahang Cao, Jian Guo, Jun Ma, Ping Luo, Qiang Zhang, Xupeng Xie","submitted_at":"2026-05-13T13:55:37Z","abstract_excerpt":"Existing robotic foundation models, while powerful, are predicated on an implicit assumption of temporal homogeneity: treating all actions as equally informative during optimization. This \"flat\" training paradigm, inherited from language modeling, remains indifferent to the underlying physical hierarchy of manipulation. In reality, robot trajectories are fundamentally heterogeneous, where low-velocity segments often dictate task success through precision-demanding interactions, while high-velocity motions serve as error-tolerant transitions. Such a misalignment between uniform loss weighting a"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"AttenA+ significantly elevates the ceilings of current state-of-the-art models. Specifically, it improves OpenVLA-OFT to 98.6% (+1.5%) on the Libero benchmark and pushes FastWAM to 92.4% (+0.6%) on RoboTwin 2.0.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That reweighting the training objective by the inverse velocity field naturally aligns model learning capacity with the physical demands of manipulation, with velocity serving as the primary proxy for kinematic criticality.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"AttenA+ applies velocity-driven action attention to reweight training objectives toward kinematically critical low-velocity segments, yielding small benchmark gains on Libero and RoboTwin without added parameters.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Reweighting robotic action losses by inverse velocity improves foundation model performance on manipulation tasks","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"b2d8c37fba1f2dbb958137fe83522c53809bfc32c513edbffd7d1cc29cedbe47"},"source":{"id":"2605.13548","kind":"arxiv","version":1},"verdict":{"id":"5809133f-ad87-4cbb-b69e-2bbac22f2c2e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T18:39:33.801861Z","strongest_claim":"AttenA+ significantly elevates the ceilings of current state-of-the-art models. Specifically, it improves OpenVLA-OFT to 98.6% (+1.5%) on the Libero benchmark and pushes FastWAM to 92.4% (+0.6%) on RoboTwin 2.0.","one_line_summary":"AttenA+ applies velocity-driven action attention to reweight training objectives toward kinematically critical low-velocity segments, yielding small benchmark gains on Libero and RoboTwin without added parameters.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That reweighting the training objective by the inverse velocity field naturally aligns model learning capacity with the physical demands of manipulation, with velocity serving as the primary proxy for kinematic criticality.","pith_extraction_headline":"Reweighting robotic action losses by inverse velocity improves foundation model performance on manipulation tasks"},"references":{"count":42,"sample":[{"doi":"","year":2024,"title":"OpenVLA: An Open-Source Vision-Language-Action Model","work_id":"3e7e65c5-5aed-4fe9-8414-2092bcb31cc7","ref_index":1,"cited_arxiv_id":"2406.09246","is_internal_anchor":true},{"doi":"","year":2024,"title":"$\\pi_0$: A Vision-Language-Action Flow Model for General Robot Control","work_id":"f790abdc-a796-482f-a40d-f8ee035ecfc2","ref_index":2,"cited_arxiv_id":"2410.24164","is_internal_anchor":true},{"doi":"","year":2026,"title":"Structured observation language for efficient and generalizable vision-language navigation.arXiv preprint arXiv:2603.27577, 2026","work_id":"1f8c0bb5-7cc9-4aeb-bb19-bd8838354af4","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"RT-1: Robotics Transformer for Real-World Control at Scale","work_id":"e11bda85-8531-46bc-a07f-d0ade3643ab1","ref_index":4,"cited_arxiv_id":"2212.06817","is_internal_anchor":true},{"doi":"","year":2023,"title":"Rt-2: Vision-language-action models transfer web knowledge to robotic control","work_id":"ece5db96-21e1-4168-97fe-a2addb84ec83","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":42,"snapshot_sha256":"74d58c2ecda88306f31418d0896f0c17c4dc6dbe71303fec9406f058ebe1034a","internal_anchors":22},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}