Mixed citations

Scalable vision-language-action model pretraining for robotic manipulation with real-life human activity videos

Mixed citation behavior. Most common role is background (60%).

7 Pith papers citing it
Background 60% of classified citations

citation-role summary

background 4 · baseline 1

fields

cs.RO 5 · cs.CV 2

years

2026 7

verdicts

UNVERDICTED 7

representative citing papers

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

GazeVLA: Learning Human Intention for Robotic Manipulation

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.
