RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
1 Pith paper cites this work. Polarity classification is still indexing.

Pith papers citing it: 1
Fields: cs.LG
Years: 2025
Verdicts: UNVERDICTED
Representative citing papers: 1
Citing papers explorer

- Vidar: Embodied Video Diffusion Model for Generalist Manipulation
Vidar shows that a video diffusion prior, continually pre-trained on 750K multi-view robot trajectories and paired with a label-free masked inverse dynamics adapter, can generalize manipulation to new robot embodiments using only 1% of the demonstration data typically required.