VEGA reconstructs local geometry from monocular egocentric video to create supervised trajectories that train a flow-matching VLA policy, yielding lower collision rates on a new benchmark and in real-world tests.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.RO (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
SALSA aligns social features and adds future-risk signals in VLA models to cut near-collisions by 86.4% and raise social accuracy from 53% to 93% on SCAND and real robots.
Citing papers explorer
- VEGA: Learning Navigation VLAs from In-the-Wild Egocentric Video with Geometric Trajectory Supervision
  VEGA reconstructs local geometry from monocular egocentric video to create supervised trajectories that train a flow-matching VLA policy, yielding lower collision rates on a new benchmark and in real-world tests.
- Act on What You See: Unlocking Safe Social Navigation in Vision-Language-Action Models
  SALSA aligns social features and adds future-risk signals in VLA models to cut near-collisions by 86.4% and raise social accuracy from 53% to 93% on SCAND and real robots.