Taco: Benchmarking generalizable bimanual tool-action-object understanding
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
dataset 1
citation-polarity summary
fields
cs.RO 2
roles
dataset 1
polarities
background 1
representative citing papers
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
citing papers explorer
- EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
  EgoVLA pretrains VLA models on egocentric human videos, retargets the predicted actions to robots via inverse kinematics (IK), and fine-tunes on a few robot demonstrations to improve bimanual manipulation performance on a new simulation benchmark.
- World Action Models: The Next Frontier in Embodied AI
  The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.