{"work":{"id":"0b0ada39-bb8d-4c7b-8df0-4863621a3a5b","openalex_id":null,"doi":null,"arxiv_id":"2605.02881","raw_key":null,"title":"MolmoAct2: Action Reasoning Models for Real-world Deployment","authors":null,"authors_text":null,"year":2026,"venue":"cs.RO","abstract":"Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency for their grounding, and fine-tuned success rates remain below the threshold for dependable use. We present MolmoAct2, a fully open action reasoning model built for practical deployment, advancing its predecessor along five axes. We introduce MolmoER, a VLM backbone specialized for spatial and embodied reasoning, trained on a 3.3M-sample corpus with a specialize-then-rehearse recipe. We release three new datasets spanning low-to-medium cost platforms, including MolmoAct2-BimanualYAM, 720 hours of teleoperated bimanual trajectories that constitute the largest open bimanual dataset to date, together with quality-filtered Franka (DROID) and SO100/101 subsets. We provide OpenFAST, an open-weight, open-data action tokenizer trained on millions of trajectories across five embodiments. We redesign the architecture to graft a flow-matching continuous-action expert onto a discrete-token VLM via per-layer KV-cache conditioning. Finally, we propose MolmoThink, an adaptive-depth reasoning variant that re-predicts depth tokens only for scene regions that change between timesteps, retaining geometric grounding at a fraction of prior latency. In the most extensive empirical study of any open VLA to date, spanning 7 simulation and real-world benchmarks, MolmoAct2 outperforms strong baselines including Pi-05, while MolmoER surpasses GPT-5 and Gemini Robotics ER-1.5 across 13 embodied-reasoning benchmarks. We release model weights, training code, and complete training data. Project page: https://allenai.org/blog/molmoact2","external_url":"https://arxiv.org/abs/2605.02881","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-07-04T14:19:53.794088+00:00","pith_arxiv_id":"2605.02881","created_at":"2026-06-27T22:01:20.757522+00:00","updated_at":"2026-07-04T14:19:53.794088+00:00","title_quality_ok":true,"display_title":null,"render_title":"MolmoAct2: Action Reasoning Models for Real-world Deployment"},"hub":{"state":{"work_id":"0b0ada39-bb8d-4c7b-8df0-4863621a3a5b","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":13,"external_cited_by_count":null,"distinct_field_count":3,"first_pith_cited_at":"2026-06-05T10:01:37+00:00","last_pith_cited_at":"2026-06-29T17:48:01+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-07-04T08:26:44.712591+00:00","tier_text":"hub"},"tier":"hub","role_counts":[],"polarity_counts":[],"runs":{},"summary":{},"graph":{},"authors":[]}}