The model is trained with a flow-matching loss, where the interpolated noisy action a w is defined in Eq

Reference Policy, Baseline Implementations: We obtain the reference policy by supervised fine-tuning [53] the pretrained π 0 · 2000

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

cs.RO · 2026-05-01 · unverdicted · novelty 6.0

Fleet-scale RL framework improves a single generalist VLA policy from deployment data to 95% average success on eight real-world manipulation tasks with 16 dual-arm robots.

citing papers explorer

Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

cs.RO · 2026-05-01 · unverdicted · none · ref 70