1 Pith paper cites this work (polarity classification is still indexing): Structured world models from human videos.
Fields: cs.RO · Years: 2025 · Verdicts: UNVERDICTED
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
UniVLA trains cross-embodiment vision-language-action policies from unlabeled videos via a latent action model in DINO space, beating OpenVLA on benchmarks with 1/20th pretraining compute and 1/10th downstream data.
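To make the one-line summary concrete, below is a minimal sketch, not UniVLA's released code, of what a latent action model over DINO-style features can look like: an inverse-dynamics encoder compresses a pair of consecutive frame features into a discrete latent action through a small vector-quantized bottleneck, and a forward model must reconstruct the next frame's features from the current features plus that action. All class names, layer sizes, the single-token action space, and the loss weighting are illustrative assumptions.

```python
# Hypothetical sketch of a latent action model trained on unlabeled video features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through estimator."""
    def __init__(self, num_codes: int = 16, dim: int = 128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (B, dim) continuous latent -> quantized code, code index, commitment loss
        dist = torch.cdist(z, self.codebook.weight)            # (B, num_codes)
        idx = dist.argmin(dim=-1)                              # (B,)
        z_q = self.codebook(idx)                               # (B, dim)
        commit = F.mse_loss(z, z_q.detach()) + F.mse_loss(z_q, z.detach())
        z_q = z + (z_q - z).detach()                           # straight-through gradient
        return z_q, idx, commit


class LatentActionModel(nn.Module):
    """Inverse dynamics: (frame_t, frame_{t+1}) -> discrete latent action.
    Forward model: (frame_t, latent action) -> predicted frame_{t+1} features."""
    def __init__(self, feat_dim: int = 768, act_dim: int = 128):
        super().__init__()
        self.inverse = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.GELU(), nn.Linear(512, act_dim))
        self.quantizer = VectorQuantizer(num_codes=16, dim=act_dim)
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 512), nn.GELU(), nn.Linear(512, feat_dim))

    def forward(self, f_t: torch.Tensor, f_next: torch.Tensor):
        z = self.inverse(torch.cat([f_t, f_next], dim=-1))
        a_q, a_idx, commit = self.quantizer(z)
        f_pred = self.forward_model(torch.cat([f_t, a_q], dim=-1))
        recon = F.mse_loss(f_pred, f_next)
        return recon + 0.25 * commit, a_idx


if __name__ == "__main__":
    # Stand-ins for pooled DINO features of two consecutive video frames.
    f_t, f_next = torch.randn(4, 768), torch.randn(4, 768)
    model = LatentActionModel()
    loss, actions = model(f_t, f_next)
    loss.backward()
    print(loss.item(), actions.tolist())
```

Once such a model is trained, the inferred discrete action indices can serve as pseudo-action labels for pretraining a policy on action-free video, which is the role the latent action model plays in the summary above.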