RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.
Bridgedata v2: A dataset for robot learning at scale
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
dataset 2
citation-polarity summary
fields
cs.RO 2years
2026 2verdicts
UNVERDICTED 2roles
dataset 2polarities
use dataset 2representative citing papers
VEGA improves spatial reasoning in VLA models for robotics by aligning visual encoder features with 3D-supervised DINOv2 representations via a temporary projector and cosine similarity loss.
citing papers explorer
-
RotVLA: Rotational Latent Action for Vision-Language-Action Model
RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.
-
VEGA: Visual Encoder Grounding Alignment for Spatially-Aware Vision-Language-Action Models
VEGA improves spatial reasoning in VLA models for robotics by aligning visual encoder features with 3D-supervised DINOv2 representations via a temporary projector and cosine similarity loss.